PGDT 2016
HARAMAYA UNIVERSITY
COLLEGE OF EDUCATION AND
BEHAVIORAL SCIENCES
Department of Psychology
Assessment and Evaluation of
Learning
Unit 1: Assessment: Concept,
Purpose, and Principles
 You might have come across the
concepts of
1. test,
2. measurement,
3. assessment, and
4. evaluation. How do you
understand them?
Test
 A test, in the educational context, is the presentation of a
standard set of questions to be answered by students.
 It is one instrument that is used for collecting
information about students’ behaviors or performance.
Cont.
o A test is a task or a series of tasks or questions
that students must answer or perform.
o It is used to get information regarding the extent to
which the students have mastered the subject
matter taught and the attainment of instructional
objectives.
o Test is a systematic procedure for observing (i.e.,
getting information) and describing one or more
characteristics of a person with the aid of either a
numerical scale (measurement such as test
scores) or a category (qualitative means).
Cont.
Measurement
 Measurement is the process by which the
attributes of a person are measured and described
in numbers based on certain rules.
 It is a quantitative description of the behavior or
performance of students.
 Measurement permits more objective description
concerning traits and facilitates comparisons.
Assessment
 Assessment is the planned process of gathering and synthesizing
information relevant to the purposes of discovering and
documenting students' strengths and weaknesses, planning and
enhancing instruction, and/or checking progress or status in
preparation for decision making.
 Information may be collected using various instruments including
tests, observations of students, checklists, questionnaires and
interviews.
Cont.
 It is a process of collecting, synthesizing, and
interpreting information to aid in decision-
making (Nitko, 1996; and Airasian, 1996).
 It can be qualitative and quantitative.
Evaluation
 It is the process of judging the quality of student learning on the basis
of established performance standards and assigning a value to
represent the worthiness or quality of that learning or performance.
 It is concerned with determining how well students have learned.
 When we evaluate, we are saying that something is good, appropriate,
valid, positive, and so forth.
 Evaluation includes both quantitative and qualitative descriptions of
student behavior and value judgment concerning the desirability of that
behavior
Cont.
 Evaluation = Quantitative description of
students’ behavior (measurement) + qualitative
description of students’ behavior (non-
measurement) + value judgment
 Evaluation involves judgment. The quantitative
values that we obtain through measurement
will not have any meaning until they are
evaluated against some standards.
Importance and Purposes of Assessment
 It can be summarized that assessment in education
focuses on:
helping LEARNING, and;
improving TEACHING.
 With regard to the learner, assessment is aimed at
providing information that will help us make decisions
concerning remediation, enrichment, selection,
exceptionality, progress and certification.
 With regard to teaching, assessment provides
information about the attainment of objectives, the
effectiveness of teaching methods and learning
materials.
Overall, assessment serves the following main purposes
in education.
1. Assessment is used to inform and guide teaching and
learning
2. Assessment is used to help students set learning
goals
3. Assessment is used to assign report card grades
In addition, assessment is used for:
 Grading
 Awards
 Diagnosis
 Placement
 Information on entry behavior
 Evidence of effectiveness and/or possible
shortcomings
 Opportunity to communicate to stakeholders
 Guidance
 Plan and adapt instruction
 Provide feedback and incentives(to motivate students)
Role of objectives in assessment
 The first step in planning any good teaching is to
clearly define the learning objectives or
outcomes.
 A learning objective is an outcome statement
that captures specifically what knowledge, skills, and
attitudes learners should be able to exhibit
following instruction.
 Effective assessment practice requires relating
the assessment procedures as directly as
possible to the learning objectives.
 Instructional objectives which are commonly
known as learning outcomes play a key role in
both the instructional and assessment process.
Cont.
 They serve as guides for both teaching and
learning, communicate the intent of instruction
to others, and provide guidelines for assessing
students’ learning.
 A learning outcome stated in this way clearly
indicates the kind of performance students are
expected to exhibit as a result of the
instruction.
 Well-stated learning outcomes make clear the
types of student performance we are willing to
accept as evidence that the instruction has
been successful.
Principles of Assessment
Different educators and school systems
have developed different sets of assessment
principles.
Miller, Linn and Gronlund (2009) have
identified the following general principles of
assessment.
 Clearly specifying what is to be assessed has
priority in the assessment process.
 An assessment procedure should be selected
because of its relevance to the characteristics or
performance to be measured.
 Comprehensive assessment requires a variety of
procedures.
 Proper use of assessment procedures requires an
awareness of their limitations.
Cont.
 The New South Wales Department of
Education and Training (2008) in Australia is
more inclusive and lists the following
principles:
 Assessment should be relevant
 Assessment should be appropriate
 Assessment should be fair.
 Assessment should be accurate
 Assessment should provide useful information
 Assessment should be integrated into the teaching
and learning cycle.
 Assessment should draw on a wide range of
evidence
 Assessment should be manageable
Assessment and Some Basic
Assumptions
 The quality of student learning is directly, although
not exclusively, related to the quality of teaching.
 To improve their effectiveness, teachers need first
to make their goals and objectives explicit and
then to get specific, comprehensible feedback on
the extent to which they are achieving those goals
and objectives.
 To improve their learning, students need to receive
appropriate and focused feedback early and often;
they also need to learn how to assess their own
learning.
Concept of Continuous
Assessment
 Continuous assessment is a learning process between teachers,
students and stakeholders (parents).
 It is a process that fosters dialogue between these stakeholders to bring
out the child’s best learning.
 It is also a holistic process that not only brings together multiple
stakeholders, but also integrates assessment and teaching as
interconnected activities that are integral to the child’s learning.
 It is an assessment approach which involves the use of a variety of
assessment instruments, assessing various components of learning.
Characteristics of Continuous
Assessment
Characteristics of continuous assessment.
a) Systematic
b) Comprehensive
c) Cumulative
d) Guidance-Oriented
A/ Systematic nature of CA
 It requires an operational plan which indicates what
measurements are to be made of the pupils’
performance, at what intervals or times during the
school year the measurements are to be made and the
results recorded, and the nature of the tools or
instruments to be used in the measurements.
 It is planned, graded to suit the age and experience of
the children and given at suitable intervals during the
school year.
Cont.
B/ Comprehensive nature of CA
 It is comprehensive in the sense that many types of instruments are
used in determining the performance.
 It means that continuous assessment is not focused on academic skills
alone.
 It embraces the cognitive, psychomotor and affective domains. A child
is assessed as a total entity using all the psychometric devices, such as
test and non-test techniques.
Cont.
(c) Cumulative Nature of CA: It is cumulative since any decision to be
made at any point on the pupil takes into account all previous
decisions.
d) Guidance-Oriented Nature of CA:
 It means that the information collected is to be used for educational,
vocational and personal- social decision-making for the child.
Guidance and counseling activities thrive better on valid, sequential,
systematic, continuous, cumulative and comprehensive information.
Assessment
1. It measures what a student knows, understands and
can do.
2. It is fair
3. It is carried out periodically over a term and over a
year
4. There are a number of different types of assessment
activities
5. Assessment and instruction are similar/going side by
side
6. There is a lack of pupil fear
7. Focus is on pupil progress
Assessment, Learning, and the Involvement of
Students
Classroom assessment promotes learning when
teachers use it in the following ways:
 When they use it to become aware of the knowledge, skills, and
beliefs that their students bring to a learning task, and;
 When they use this knowledge as a starting point for new
instruction, and monitor students’ changing perceptions as
instruction proceeds.
 When teachers and students collaborate and use ongoing
assessment and pertinent feedback to move learning forward.
 When classroom assessment is frequent and varied, teachers can
learn a great deal about their students.
Cont.
Assessment provides the feedback loop for this process.
 By increasing students’ motivation. Motivation is essential for
students’ engagement in their learning. The higher the motivation, the
more time and energy a student is willing to devote to any given task.
When a student finds the content interesting and the activity
enjoyable, they show sustained concentration and effort.
 Assessment can be a motivator, not through reward and punishment,
but by stimulating students’ intrinsic interest. Assessment can enhance
student motivation by:
Cont.
• emphasizing progress and achievement rather than failure
• providing feedback to move learning forward
• reinforcing the idea that students have control over, and responsibility
for, their own learning
• building confidence in students so they can and need to take risks
• being relevant, and appealing to students’ imaginations
• providing the scaffolding that students need to genuinely succeed
Assessment and Teacher Professional
Competence
o A teacher should have some basic competencies in classroom
assessment so as to be able to effectively assess his/her students’
learning.
o Assessment activities occur prior to, during, and after instruction.
o In the American education system a list of seven standards for teacher
competence in educational assessment of students has been developed.
Cont.
The seven standards are stated below.
Teachers should be skilled in:-
1. Choosing assessment options appropriate for instructional decisions.
2. Developing assessment methods appropriate for instructional
decisions.
3. Administering, scoring, and interpreting the results of assessment
methods.
4. Using assessment results when making decisions about individual
students, planning teaching, developing curriculum, and school
improvement
Cont.
5. Developing valid student grading procedures that use
student assessments
6. Communicating assessment results to students, parents,
other audiences, and educators.
7. Recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment
information.
UNIT TWO
ASSESSMENT STRATEGIES, METHODS, AND
TOOLS
There are three pairs of assessment typologies:
o formal vs. informal,
o formative vs. summative, and
o criterion-referenced vs. norm-referenced.
Formal and Informal Assessment
 Formal Assessment: implies a written document,
such as a test, quiz, or paper. A formal assessment
gives a numerical score or grade based on student
performance.
 Informal Assessment: "informal" indicates
techniques that can easily be incorporated into
classroom routines and learning activities.
 It can be used at any time without interfering with
instructional time.
 It usually occurs in a more casual manner and may
include observation, inventories, checklists, rating
scales, rubrics, performance and portfolio
assessments, participation, and peer and self-assessment.
Cont.
 Methods for informal assessment can be
unstructured (e.g., student work samples, journals)
and structured (e.g., checklists, observations).
 Unstructured methods frequently are somewhat
more difficult to score and evaluate, but they can
provide a great deal of valuable information about
the skills of the students.
 Structured methods can be reliable and valid
techniques when time is spent creating the
"scoring" procedures.
 Informal assessments actively involve the
students in the evaluation process - they are not
just paper-and-pencil tests.
Formative and Summative Assessments
 Based on their functional role during classroom instruction,
assessments are classified as formative and summative.
 Formative Assessment: used to shape and guide classroom
instruction.
 Include both informal and formal assessments
 It can be given before, during, and even after instruction, the
goal is to improve instruction.
 It consists of ongoing assessments, appraisals, and observations in a
classroom.
 It serves a diagnostic function for both students and
teachers.
Cont.
 It helps students to adjust, improve their performance or engagement
in the unit.
 Teachers receive feedback on the quality of learners’ understandings
and consequently, can modify their teaching approaches to provide
enrichment or remedial activities to more effectively guide learners.
 It is also known by the name ‘assessment for learning’ or ‘continuous
assessment’.
Cont.
 Summative Assessment: comes at the end of a course (or unit) of
instruction.
 It evaluates the quality of students’ learning and assigns a mark to
students’ work based on how effectively learners have addressed the
performance standards and criteria.
 Assessment tasks conducted during the progress of a semester may be
regarded as summative in nature if they only contribute to the final
grades of the students.
 A particular assessment task can be both formative and summative
Criterion-referenced and Norm-referenced Assessments
Based on interpreting student performance:
 Criterion-referenced Assessment: is carried out against
previously specified criteria and performance standards of
the subject matter.
 Grade is assigned on the basis of the standard the student
has achieved on each of the criteria.
 Norm-referenced Assessment: This type of assessment
aims at determining student performance
based on a position within a cohort of students – the norm
group.
Assessment Strategies
 Assessment strategy refers to those assessment tasks
(methods/approaches/activities) in which students are
engaged to ensure that all the learning objectives of a
subject, a unit or a lesson have been adequately
addressed.
Criteria for selecting assessment strategies include:
 Its appropriateness for the particular behavior being
assessed.
 It should also be related to the course material and
relevant to students’ lives.
Cont.
 There are many different ways to categorize learning goals for
students.
o Knowledge and understanding: What facts do students know outright?
What information can they retrieve? What do they understand?
o Reasoning proficiency: Can students analyze, categorize, and sort into
component parts? Can they generalize and synthesize what they have
learned?
o Skills: We have certain skills that we want students to master such as
reading fluently, working productively in a group, making an oral
presentation, speaking a foreign language, or designing an experiment.
Cont.
 Ability to create products: Another kind of learning
target is student-created products - tangible evidence
that the student has mastered knowledge, reasoning,
and specific production skills. Examples include a
research paper, a piece of furniture, or artwork.
 Dispositions: We also frequently care about student
attitudes and habits of mind, including attitudes
toward school, persistence, responsibility, flexibility,
and desire to learn.
Cont.
 Of the various assessment strategies that can be used by
classroom teachers, some are described below.
Classroom presentations:
 require students to verbalize their knowledge, select and present
samples of finished work, and organize their thoughts about a topic in
order to present a summary of their learning.
 Conferences: a conference is a formal or informal meeting between the teacher and
a student for the purpose of exchanging information or sharing ideas.
Cont.
 Exhibitions/Demonstrations: an exhibition/demonstration is a performance in a public setting,
during which a student explains and applies a process, procedure, etc.,
in concrete ways to show individual achievement of specific skills and
knowledge.
 Interviews: an interview is a face-to-face conversation in which teacher
and student use inquiry to share their knowledge and understanding of
a topic or problem.
This form of assessment can be used by the teacher to:
 explore the student’s thinking;
 assess the student’s level of understanding of a concept or procedure;
and
 gather information, obtain clarification, determine positions, and probe
for motivations
Cont.
o Observation: a process of systematically
viewing and recording students while they work,
for the purpose of making instructional decisions.
o It can take place at any time and in any setting. It
provides information on students' strengths and
weaknesses, learning styles, interests, and
attitudes.
 Observations may be informal or highly structured,
and incidental or scheduled over different periods of
time in different learning contexts.
There are various observational techniques. They
include anecdotal records, checklists, rating scales,
socio-metric techniques.
Cont.
Performance tasks: students create, produce, perform,
or present works on "real world" issues.
It may be used to assess a skill or proficiency, and
provides useful information on the process as well as
the product.
Portfolios: a portfolio is a collection of samples of a student’s work
over time.
o It offers a visual demonstration of a student’s
achievement, capabilities, strengths, weaknesses,
knowledge, and specific skills, over time and in a
variety of contexts.
o For a portfolio to serve as an effective assessment
instrument, it has to be focused, selective, reflective,
and collaborative.
Cont.
 Questions and answers: Perhaps this is a widely used
strategy by teachers with the intention of involving
their students in the learning and teaching process. In
this strategy, the teacher poses a question and the
student answers verbally, rather than in writing.
Cont.
Students’ self-assessments:
 It is the student’s own assessment of personal progress in
terms of knowledge, skills, processes, or attitudes. Self-
assessment leads students to a greater awareness and
understanding of themselves as learners
Checklists, Rating Scales and Rubrics
 These are tools that state specific criteria and allow teachers and
students to gather information and to make judgments about what
students know and can do in relation to the outcomes.
Cont.
 Checklists usually offer a yes/no format in relation to student
demonstration of specific criteria. They may be used to record
observations of an individual, a group or a whole class.
 Rating Scales allow teachers to indicate the degree or frequency of the
behaviors, skills and strategies displayed by the learner. Rating scales
state the criteria and provide three or four response selections to
describe the quality or frequency of student work.
 Rubrics use a set of criteria to evaluate a student's performance. They
consist of a fixed measurement scale and detailed description of the
characteristics for each level of performance. These descriptions focus
on the quality of the product or performance and not the quantity.
Cont.
The purpose of checklists, rating scales and rubrics is
to:
 provide tools for systematic recording of observations
 provide tools for self-assessment
 provide samples of criteria for students prior to
collecting and evaluating data on their work
 record the development of specific skills, strategies,
attitudes and behaviors necessary for demonstrating
learning
 clarify students' instructional needs by presenting a
record of current accomplishments.
Cont.
 One- Minute paper: During the last few minutes of the
class period, you may ask students to answer on a
half-sheet of paper: "What is the most important point
you learned today?" and, "What point remains least
clear to you?"
 Muddiest Point: This is similar to ‘One-Minute Paper’
but only asks students to describe what they didn't
understand and what they think might help.
 It is to determine which key points of the lesson were
missed by the students.
 Here also you have to review the responses before the next class
meeting and use them to clarify, correct, or elaborate.
Cont.
 Student- generated test questions: You may allow students to write test
questions and model answers for specified topics, in a format
consistent with course exams. This will give students the opportunity
to evaluate the course topics, reflect on what they understand, and
what good test items are. You may evaluate the questions and use the
good ones as prompts for discussion.
 Tests: this is the type of assessment that you are most familiar with. A
test requires students to respond to prompts in order to demonstrate
their knowledge (orally or in writing) or their skills (e.g., through
performance).
Assessment in large classes
 Due to time and resource constraints, teachers
often use less time-demanding assessment
methods.
 Assessment issues associated with large classes
include:
1. Surface Learning Approach: teachers rely on time-efficient and
exam-based assessment methods for assessing large classes, such
as multiple-choice and short-answer examinations.
 Assess learning at the lower levels of intellectual complexity.
 Students tend to adopt a surface rote learning approach when
preparing for these kinds of assessment methods.
Cont.
 Feedback is often inadequate
 Inconsistency in marking: Large class usually consists of a
diverse and complex group of students. The issues of different
perception towards assessments, cultural and educational
background, prior knowledge and level of interest to the subject
all pose challenges to the fairness of marking and grading.
 Difficulty in monitoring cheating and plagiarism
 Lack of interaction and engagement
 When teachers raise questions in large classes, many students
are not willing to respond.
Cont.
Assessment approaches in large classes for effective student
learning include:
Front ending: by putting in increased effort at the
beginning to set students up for the
work they are going to do, the work submitted
can be improved.
 Therefore, the time needed to mark it is reduced.
Making use of in-class assignments
In-class assignments are usually quick and relatively easy
to mark and provide feedback on, and help you to identify gaps
in understanding.
Cont.
• Self-assessment reduces the marking load
because it ensures a higher quality of work that is
submitted, thereby minimizing the amount of time
expended on marking and feedback.
Peer-assessment
• provide useful learning experiences for students
at the same time as reducing the marking load of
staff.
Cont.
 Group Assessments: significantly reduce the marking
load if the group submits only one piece of work.
 The major problem is that group members may not
contribute equally, so how are they to be rewarded
fairly?
 Changing the assessment method, or at least
shortening it
Being faced with large numbers of students will
present challenges but may also provide opportunities
to either modify existing assessments or to explore
new methods of assessment.
Selecting and developing assessment methods
and tools
 The process of assessing student performance
must begin with educational outcomes.
 A wide variety of tools are available for assessing
student performance.
Cont.
Constructing Tests
 Classroom tests can consist of objective test
items and performance assessments.
 Objective tests are highly structured and require
the test taker to select the correct answer from
several alternatives or to supply a word or short
phrase to answer a question or complete a
statement.
 They are called objective because they have a
single right or best answer that can be determined
in advance.
 Performance assessment tasks permit the student
to organize and construct the answer in essay
form, by using equipment, generating hypotheses,
making observations, constructing something, or performing similar tasks.
Cont.
Constructing Objective Test Items
 There are various types of objective test items.
 These can be classified into supply type items and selection type items.
 Supply type items include completion items and short answer
questions.
 Selection type test items include True/False, multiple choice and
matching.
True/False Test Items
Advantages of true/false items are that they:
 do not require much of the student's time for
answering.
Cont.
 allow a teacher to cover a wide range of content.
 can be scored quickly, reliably, and objectively
 can measure higher mental processes of understanding, application,
and interpretation.
Disadvantages
 promote memorization of factual information
 encourage students to guess and cheat.
 do not discriminate between students of varying ability as well as other test
items
 include more irrelevant clues than other item types
 lead a teacher to favor testing of trivial knowledge.
 items are not always unequivocally (clearly) true or false (the difficulty of
writing statements which are clearly true or false)
Suggestions
The following will help you construct good quality true/false test items.
 Avoid negative statements, and never use double negatives.
 If opinion is used, attribute it to some source, unless the
ability to identify opinion is being specifically measured.
 Restrict single-item statements to single concepts
 Avoid ambiguous words and broad general
statements
 Use an approximately equal number of items, reflecting the
two categories tested
Cont.
 Make statements representing both categories equal in
length.
 Avoid trivial statements-which have little importance
for knowledge and understanding.
 Avoid specific determiners like most, all, always,
sometimes, in most cases, etc.
 Avoid long complex sentences.
Matching Items
 A matching item consists of two lists of words or
phrases.
 The test-taker must match components in one list (the
premises, on the left) with components in the other list
(the responses, presented on the right), according to a
particular kind of association.
 It can cover a good deal of content in an efficient
fashion.
Cont.
 Useful for measuring memorized factual information that
is related.
 Its compact form makes it possible to measure a large
amount of related factual material in a relatively
short time.
 It is easy to score and easy to construct.
Limitations
 Restricted to the measurement of factual information
based on rote learning.
 Difficulty of finding homogeneous material.
 May tempt the teacher to include material which is less
significant.
 Susceptible to irrelevant clues.
In teacher made matching type tests, some of the
more common faults are found to be that:
 the set directions are vague
 the items to be matched are excessively long
 the list of responses lacks homogeneity
 the premises are vaguely stated.
Suggestions for the construction of good matching items:
1. Use fairly brief lists, placing the shorter entries on
the right. The words and phrases that make up the
premises should be short, and those that make
up the responses should be shorter still. If too
long, students tend to lose track of what they
originally set out to look for.
2. Employ homogeneous lists.
3. List responses in a logical order
4. Describe the basis for matching and the number of
times a response can be used:
Cont.
5. Try to place all premises and responses for any
matching item on a single page
6. Make sure that there are never multiple correct
responses for one stem
7. Avoid giving inadvertent grammatical clues to the
correct response
8. Use no more than 10 items in one set.
9. Provide more responses than stems to make
process-of-elimination guessing less effective.
10. Use capital letters for the response signs rather than
lower-case letters.
Short Answer/Completion Test Items
 The short answer type uses a direct question, whereas
the completion test item consists of an incomplete
statement that the student must complete.
 The short-answer test items are one of the easiest to
construct, partly because of the relatively simple
learning outcomes it usually measures.
 Except for the problem-solving outcomes measured in
Mathematics and Science, it is used almost exclusively
to measure the recall of memorized information.
Cont.
 Partial knowledge, which might enable students to choose the correct
answer on a selection item, is insufficient for answering a short
answer test item correctly.
 There are two limitations:
 unsuitable for assessing complex learning outcomes.
 difficulty of scoring.
 Scoring is especially difficult where the item is not clearly phrased to
require a definitely correct answer, and where the student’s spelling
ability affects the response.
Cont.
The following suggestions will help to make short-
answer type test items function as intended.
1. Word the item so that the required answer is both brief
and specific.
2. Do not take statements directly from textbooks to use
as a basis for short-answer items.
3. A direct question is generally more desirable than an
incomplete statement.
4. If the answer is to be expressed in numerical units,
indicate the type of answer wanted.
Cont.
5. Avoid using a long quote with multiple blanks to
complete.
6. Require only one word or phrase in each blank.
7. Facilitate scoring by having the students write their
responses on lines arranged in a column to the left of
the items.
8. Ask students only for important terms or expressions in
completion items.
9. Avoid providing grammatical clues to the correct
answer by using a/an, etc., instead of specific
modifiers.
Multiple-Choice Items
 This is the most popular and versatile type of selected-
response item.
 It can effectively measure learning outcomes
measured by the short-answer item, the true-false
item, and the matching item types.
 It can measure a variety of complex cognitive learning
outcomes.
 A multiple-choice item consists of a problem and a list
of suggested solutions.
 A student is first given either a question or a partially
complete statement. This part of the item is referred to
as the item’s stem.
 Then three or more potential answer-options are
presented. These are usually called alternatives, choices, or options.
Cont.
Variants in a multiple-choice item:
(1) The stem consists of a direct question or an incomplete
statement, and
(2) The student chooses the alternative that is the correct
answer or the best answer
Advantages
 widespread applicability to the assessment of cognitive skills and
knowledge,
 It’s possible to make them quite varied in the levels of difficulty
they possess.
 Items are fairly easy to score.
 The results are amenable to diagnosis.
 They provide greater structure to the question (e.g., a stem such as
“South America . . .” followed by alternatives a) to d)). The alternatives make the intended task clear.
Cont.
 They can test students’ ability to think quickly under
pressure.
 They can be easier to modify in order to test students
at the appropriate level.
Limitations/weaknesses of multiple-choice items
 when students review a set of alternatives for an item,
they may be able to recognize a correct answer. So it
can present an exaggerated picture of a student’s
understanding or competence, which might lead
teachers to invalid inferences.
 can never measure a student’s ability to creatively
synthesize content of any sort.
 They are difficult to construct, especially getting
plausible distracters is difficult.
 It is relatively labour intensive and time consuming to
prepare the test.
Cont.
 In an effort to come up with the necessary number of plausible
alternatives, novice item-writers sometimes toss in some alternatives
that are obviously incorrect.
 They are not well adapted to measure some learning outcomes in
mathematics, chemistry and physics etc.
 There is a possibility that students may guess the correct answer if it is
subjected to many irrelevant clues.
 As a result of recycling questions, students may get access to the
questions and achieve good marks without achieving the instructional
objectives.
Here are some useful rules for you to follow:
1. The question or problem in the stem must be self-
contained. The stem should contain as much of the
item’s content as possible, thereby rendering the
alternatives much shorter than would otherwise be the
case.
2. Avoid negatively stated stems. Just as with the
True/False items, negatively stated stems can create
genuine confusion in students.
3. Each alternative must be grammatically consistent
with the item’s stem.
4. Make all alternatives plausible, but be sure that one of
them is indisputably the correct or best answer.
Cont.
5. Randomly use all answer positions in approximately
equal numbers.
6. List alternatives on separate lines rather than
including them as part of the stem so that they can be
clearly distinguished.
7. Keep all alternatives in a similar format (e.g., all
phrases, all sentences, etc.).
8. Try to make alternatives for an item approximately the
same length. (Making the correct response
consistently longer is a common error.)
9. Use misconceptions which students have indicated in
class or errors commonly made by students in the
class as the basis for incorrect alternatives.
Cont.
10. If possible, do not use “all of the above” and “none
of the above” or use them sparingly since these
alternatives are often chosen on the basis of
incomplete knowledge.
11. Never use words such as “all,” “always,” and
“never”; they are likely to signal incorrect options.
12. Use capital letters (A, B, C, D, E) on tests as
responses rather than lower-case letters (“a” gets
confused with “d” and “c” with “e” if the type or
duplication is poor).
13. Try to write items with equal numbers of alternatives
in order to avoid asking students to continually adjust
to a new pattern caused by different numbers.
14. Put the incomplete part of the sentence at the end
rather than the beginning of the stem.
Suggestions for Constructing good
distracters
 Base distracters on the most frequent errors made by students in
homework, assignments or class discussions related to that
concept.
 Use words in the distracters that are associated with words in the
stem (for example, explorer-exploration).
 Use concepts from the instructional material that have similar
vocabulary or were used in the same context as the correct
answer.
 Use distracters that are similar in content or form to the correct
answer (for example, if the correct answer is the name of a
place, have all distracters be places instead of using names of
people and other facts).
 Make the distracters similar to the correct answer in terms of
complexity, sentence structure, and length.
Constructing Performance Assessments
 Performance assessments measure such outcomes as the ability to
recall, organize, and integrate ideas; the ability to express oneself in
writing; and the ability to create.
 The most familiar form of performance-based
assessment is the essay question.
 Essay questions suit learning outcomes concerned with the ability to
conceptualize, construct, organize, relate, and evaluate
ideas.
Cont.
Essay questions can be classified into two types –
restricted-response essay questions and extended
response essay questions.
o Restricted-response: these usually limit both the content
and the response. The content is usually restricted by
the scope of the topic to be discussed.
o Extended response
These types of questions allow students:
 To select any factual information that they think is
relevant,
 To organize the answer in accordance with their best
judgment;
 To integrate and evaluate ideas as they deem
appropriate.
Cont.
In addition to measuring higher-order thinking skills, the
advantages also include the following:
 Extended-response essays focus on the integration
and application of thinking and problem solving skills.
 Essay assessments enable the direct evaluation of
writing skills.
 Essay questions, as compared to objective tests, are
easy to construct.
 Essay questions have a positive effect on students
learning.
Cont.
Limitations
 The most common limitation is the unreliability of scoring.
Thus, the same paper may be scored differently by
different teachers, and even the same teacher may
give different scores for the same paper at different
times.
 The amount of time required for scoring.
 The limited sampling of content they provide.
The improvement of the essay question requires
attention to two problems:
 How to construct essay questions that call forth the
desired student response, and
 How to score the answers so that achievement is
reliably measured
Suggestions for the construction of good
essay questions
 Restrict the use of essay questions to those learning
outcomes that cannot be measured satisfactorily by
objective items
 Structure items so that the student’s task is explicitly
bounded
 For each question, specify the point value, an
acceptable response-length, and a recommended time
allocation
 Employ more questions requiring shorter answers
rather than fewer questions requiring longer answers
 Don’t employ optional questions
 Test a question’s quality by creating a trial response to
the item.
Guidelines in the scoring of essay items
The following helps to make scoring easier and more
reliable.
 Ensure that you are emotionally and mentally composed
before scoring
 All responses to one item should be scored before
moving to the next item
 Write out in advance a model answer to guide yourself
in grading the students’ answers
 Shuffle exam papers after scoring every question
before moving to the next
 The names of test takers should not be known while
scoring to avoid bias
Table of Specification and Arrangement of
Items
Table of Specification
 The development of valid, reliable and usable
questions involves proper planning.
 The validity, reliability and usability of such tests
depend on the care taken in their planning and
preparation.
 Planning helps to ensure that the test covers the pre-
specified instructional objectives and the subject
matter (content).
 Planning a classroom test involves identifying the
instructional objectives earlier stated and the subject
matter (content) covered during the teaching/learning
process.
Planning a classroom test.
1. Determine the purpose of the test;
2. Describe the instructional objectives and content
to be measured.
3. Determine the relative emphasis to be given to
each learning outcome;
4. Select the most appropriate item formats (essay
or objective);
5. Develop the test blue print to guide the test
construction;
6. Prepare test items that are relevant to the
learning outcomes specified in the test plan;
Cont.
7. Decide on the pattern of scoring and the
interpretation of result;
8. Decide on the length and duration of the test, and
9. Assemble the items into a test, prepare direction
and administer the test.
 The instructional objectives of the course are critically
considered while developing the test items.
Cont.
 A table of specification is a two-way table that matches
the objectives and content taught with the level at
which you expect your students to perform.
 It contains an estimate of the percentage of the test to
be associated with each topic at each level at which it is
to be measured.
 In effect we establish how much emphasis to give to
each objective or content.
Cont.
Developing a table of specification involves:
1. Preparing a list of learning outcomes, i.e. the type
of performance students are expected to
demonstrate
2. Outlining the contents of instruction, i.e. the area
in which each type of performance is to be
shown, and
3. Preparing the two way chart that relates the
learning outcomes to the instructional content.
Cont.
Contents       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total  Percentage
Air pressure       2            2             1           1         -          -          6      24%
Wind               1            1             1           1         -          -          4      16%
Temperature        2            2             1           1         -          1          7      28%
Rainfall           1            2             1           -         1          -          5      20%
Clouds             1            1             -           1         -          -          3      12%
Total              7            8             4           4         1          1         25     100%
Cont.
 The rows show the content areas from which the test
is to be sampled; and the columns indicate the level
of thinking students are required to demonstrate in
each of the content areas.
 Thus, the test items are distributed among each of the
five content areas with their corresponding
representation among the six levels of the cognitive
domain.
 The percentage row and column also show the
degree of representation of both the contents and
levels of the cognitive domain in this particular test.
 Objectives that are more important should get more
representation in the test items.
 Similarly, content areas on which you have spent
more instructional time should be allotted more test items.
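To make the blueprint arithmetic concrete, here is a minimal sketch in Python (the topic names and item counts are taken from the sample table above; the data structure and printed layout are illustrative, not part of the original procedure):

```python
# A two-way table of specification stored as topic -> item counts,
# one count per cognitive level; percentages are derived from totals.
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

blueprint = {
    "Air pressure": [2, 2, 1, 1, 0, 0],
    "Wind":         [1, 1, 1, 1, 0, 0],
    "Temperature":  [2, 2, 1, 1, 0, 1],
    "Rainfall":     [1, 2, 1, 0, 1, 0],
    "Clouds":       [1, 1, 0, 1, 0, 0],
}

total_items = sum(sum(counts) for counts in blueprint.values())  # 25

# weight of each topic in the whole test
for topic, counts in blueprint.items():
    share = 100 * sum(counts) / total_items
    print(f"{topic:13s} {sum(counts):2d} items  {share:.0f}%")

# number of items planned at each cognitive level
for i, level in enumerate(LEVELS):
    per_level = sum(counts[i] for counts in blueprint.values())
    print(f"{level:13s} {per_level:2d} items")
```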
Cont.
 There are also other ways of developing a test blue
print.
 One of these is a way of showing the distribution of test
items among the content areas and the type of test
items to be developed from each content area.
Cont.
Contents       True/False  Matching  Short answer  Multiple choice  Total  Percentage
Air pressure        1          1          1               3            6      24%
Wind                1          1          1               1            4      16%
Temperature         1          2          1               3            7      28%
Rainfall            1          1          1               2            5      20%
Clouds              1          -          1               1            3      12%
Total               5          5          5              10           25     100%
Percent            20%        20%        20%             40%                 100%
Arrangement of test items
For most purposes the items can be arranged by a
systematic consideration of:
 The type of items used
 The learning outcomes measured
 The difficulty of the items, and
 Subject matter measured
First, the items should be arranged in sections by item
type. That is, all true-false items should be grouped
together, then matching items, then all short answer or
completion items, and then all multiple choice items.
Cont.
 Extended-response essay questions and performance
tasks usually take so much time that they would be
administered alone.
 If combined with some of the other types of items and
tasks, the extended response tasks should come last.
Cont.
This has the following advantages:
- we will have a single set of directions for each type
- students can maintain the same mental set throughout
each section
- scoring will be easier
Linn and Gronlund (2000: 350-351) suggest the following
arrangement order of items by format
- True False
- Matching
- Short Answer/completion
- Multiple choice
- Essay
Cont.
 For this purpose, items that measure similar outcomes
should be placed together and then arranged in order
of ascending difficulty.
For example, the items under the multiple choice section
might be arranged in the following order:
 knowledge of terms,
 knowledge of specific facts,
 knowledge of principles, and
 application of principles.
 Keeping together items that measure similar learning
outcomes is especially helpful in determining the type
of learning outcomes causing students the greatest
difficulty.
Cont.
 If it is not feasible to group the items by the learning
outcomes measured, then it is still desirable to
arrange them in order of increasing difficulty.
 Beginning with the easiest items and proceeding
gradually to the most difficult has a motivating effect
on students.
 Encountering difficult items early in the test often
causes students to spend a disproportionate amount
of time on such items.
Cont.
o Items within each test section can be arranged in
order of increasing difficulty.
o To summarize, the most effective method for
organizing items in the typical classroom test is to:
 Form sections by item type
 Group the items within each section by the
learning outcomes measured, and
 Arrange both the sections and the items
within sections in an ascending order of
difficulty, and by subject matter.
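As a rough sketch of this arrangement in Python (the item records and p-values below are hypothetical; a higher p-value means an easier item):

```python
# Group items by format in the suggested order, then sort each
# section from easiest to hardest (descending p-value).
FORMAT_ORDER = ["true_false", "matching", "short_answer",
                "multiple_choice", "essay"]

items = [
    {"id": 1, "format": "multiple_choice", "p_value": 0.45},
    {"id": 2, "format": "true_false",      "p_value": 0.90},
    {"id": 3, "format": "multiple_choice", "p_value": 0.80},
    {"id": 4, "format": "true_false",      "p_value": 0.60},
]

arranged = sorted(
    items,
    key=lambda it: (FORMAT_ORDER.index(it["format"]), -it["p_value"]),
)
for it in arranged:
    print(it["id"], it["format"], it["p_value"])
# prints items 2, 4 (true/false), then 3, 1 (multiple choice)
```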
Administration of Tests
 It is the procedure of actually presenting the learning
task that the examinees are required to perform in
order to ascertain the degree of learning that has taken
place during the teaching-learning process.
 It is as important as the process of preparing the test.
 This is because the validity and reliability of test scores
can be greatly reduced when a test is poorly
administered.
Cont.
 This requires the provision of a physical and
psychological environment which is conducive to
students making their best efforts.
Conditions that may create test anxiety in students
include:
 Threatening students with tests if they do not behave
 Warning students to do their best “because the test
is important”
 Telling students they must work fast in order to
finish on time.
 Threatening dire consequences if they fail.
i. Ensuring Quality in Test Administration
 Guidelines and steps in ensuring quality in test
administration are:
 Collect the question papers on time from the course
teacher.
 Ensure compliance with the stipulated sitting
arrangements
 Ensure orderly and proper distribution of questions
papers to the test takers.
 Do not talk unnecessarily before the test.
 Avoid unnecessary remarks, instructions or threat that
may develop test anxiety.
 Remind the test takers of the need to avoid
unprofessional conduct
Cont.
 Avoid giving hints to test takers who ask about particular
items.
 Make corrections or clarifications to the test takers whenever
necessary.
 Keep interruptions during the test to a minimum
ii) Credibility and Civility
Credibility is the value the eventual recipients and users of
results of assessment place on the result with respect to the
grades obtained, certificates issued or the issuing institution.
Cont.
Civility is whether the persons being assessed
are in such conditions as to give their best
without hindrances and burdens in the attributes
being assessed and whether the exercise is seen
as integral to or as external to the learning
process.
Cont.
 Instructions: A test should contain a set of instructions,
which are usually of two types.
 One is the instruction to the test administrator, while
the other one is to the test taker.
 The instruction to the test administrator should explain
how the test is to be administered, the arrangements to
be made for proper administration of the test, and the
handling of the scripts and other materials.
 The instructions to the administrator should be clear
for effective compliance. For the test takers, the
instruction should direct them on the amount of work
to be done or the tasks to be accomplished.
Cont.
 The instruction should explain how the test should be
performed. The language used for the instruction
should be appropriate to the level of the test takers.
The administrators should explain the test takers’
instructions for proper understanding, especially when
the ability to understand and follow instructions is not
part of the test.
 Duration of the Test: The time for accomplishing the
test is technically important in test administration and
should be clearly stated for both the test
administrators and test takers. Ample time should be
provided for candidates to demonstrate what they
know and what they can do. The duration of test
should reflect the age and attention span of the test
takers and the purpose of the test.
Cont.
 Venue and Sitting Arrangement: The test environment
should be learner friendly.
 Adequate physical conditions, such as work space,
good and comfortable writing desks, proper lighting,
good ventilation, moderate temperature, conveniences
within reasonable distance, and the serenity necessary for
maximum concentration, should be provided.
 Adequate lighting, good ventilation and moderate
temperature reduce test anxiety and loss of
concentration.
 Other necessary conditions: the questions and
question papers should be reader-friendly, with bold
characters, neat, decent, clear and appealing, and not
such as to intimidate test takers into mistakes.
UNIT THREE
ITEM ANALYSIS
It is the process involved in examining or analyzing
testees’ responses to each item on a test with the basic
intent of judging the quality of the items.
It is the process of examining students’ responses to each
item to determine the quality of test items.
Item analysis involves determining the difficulty level and
discrimination power of test items, and judging how
effectively distracters are functioning in the case of multiple-
choice items.
It helps to determine the adequacy of the items
within a test as well as the adequacy of the test
itself.
Cont.
Some of the reasons for Item Analysis are :
1) Identify content that has not been adequately covered and
should be re-taught,
2) Provide feedback to students,
3) Determine if any items need to be revised, be used again or
become part of an item file or bank.
4) Identify items that may not have functioned as they were
intended,
5) Direct the teacher's attention to individual student
weaknesses.
Item difficulty level index
 It is a measure of the proportion of examinees who answered the
item correctly; for this reason it is frequently called the p-value.
 If scores from all students in a group are included, the difficulty
index is simply the total percent correct.
 When there is a sufficient number of scores available (i.e., 100 or
more), difficulty indexes are calculated using scores from the top
and bottom 27 percent of the group.
Item analysis procedures
 Rank the papers in order from the highest to the lowest score.
 For each test item, tabulate the number of students in the upper &
lower groups who selected each option
Cont.
 Compute the difficulty of each item (the percentage of
students who got the item right).
 The item difficulty index can be calculated using the
following formula:

P = (Success in HSG + Success in LSG) / N

 Where, HSG = High Scoring Group
 LSG = Low Scoring Group
 N = the total number of students in the HSG and LSG
The difficulty indexes can range between 0.0 and
1.0 and are usually expressed as a percentage.
A higher value indicates that a greater proportion
of examinees responded to the item correctly, and
it was thus an easier item.
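A minimal sketch of the p-value computation in Python (the function name and example numbers are illustrative; the upper and lower scoring groups are assumed to have been formed already):

```python
def difficulty_index(correct_high, correct_low, n_high, n_low):
    """p-value: the proportion of the combined upper and lower
    scoring groups who answered the item correctly."""
    return (correct_high + correct_low) / (n_high + n_low)

# e.g., 18 of 27 upper-group and 9 of 27 lower-group students correct
p = difficulty_index(18, 9, 27, 27)
print(p)  # 0.5 -> an item of average difficulty
```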
Cont.
P-Value             Percent Range   Interpretation
>= 0.75             75-100          Easy
<= 0.25             0-25            Difficult
between .25 & .75   26-74           Average
o The average difficulty of a test is the average of the
individual item difficulties.
o For maximum discrimination among students, an average
difficulty of .60 is ideal.
Cont.
 For criterion-referenced tests, with their emphasis on
mastery-testing, many items on an exam form will have
p-values of .9 or above.
 Norm-referenced tests are designed to be harder
overall and to spread out the examinees’ scores. Thus,
many of the items on an NRT will have difficulty
indexes between .4 and .6.
Item discrimination index
 The index of discrimination is a numerical
indicator that enables us to determine whether the
question discriminates appropriately between
lower scoring and higher scoring students.
 When students who earn high scores are
compared with those who earn low scores, we
would expect to find more students in the high
scoring group answering a question correctly than
students from the low scoring group.
 In the case of very difficult items which no one in
either group answered correctly or fairly easy
questions which even the students in the low group
answered correctly, the numbers of correct answers
might be equal for the two groups.
 What we would not expect to find is a case in which
the low scoring students answered correctly more
frequently than students in the high group.
 Item discrimination index can be calculated using the
following formula:

D = (Success in HSG − Success in LSG) / (0.5 × (HSG + LSG))

 Where, HSG = High Scoring Group
 LSG = Low Scoring Group
Cont.
 The item discrimination index can vary from -1.00 to
+1.00.
 A negative discrimination index (between -1.00 and
zero) results when more students in the low group
answered correctly than students in the high group.
 A discrimination index of zero means equal numbers
of high and low students answered correctly, so the
item did not discriminate between groups.
Cont.
 A positive index occurs when more students in the
high group answer correctly than the low group.
 If the students in the class are fairly homogeneous
in ability and achievement, their test performance
is also likely to be similar, resulting in little
discrimination between high and low groups.
Cont.
 Questions that have an item difficulty index of 1.00 or
0.00 need not be included when calculating item
discrimination indices.
 An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered
correctly.
 Neither type of item discriminates between students.
 When computing the discrimination index, the scores
are divided into three groups with the top 27% of the
scores in the upper group and the bottom 27% in the
lower group.
Cont.
 The number of correct responses for an item by the
lower group is subtracted from the number of correct
responses for the item in the upper group.
 The difference is divided by the number of students in
either group. The process is repeated for each item.
The value is interpreted in terms of both:
 direction (positive or negative) and
 strength (non-discriminating to strongly-discriminating).
These values can range from -1.00 to +1.00.
Cont.
D-Value         Direction   Strength
> +.40          Positive    Strong
+.20 to +.40    Positive    Moderate
-.20 to +.20    None        Non-discriminating
< -.20          Negative    Moderate to strong
Cont.
 For a small group of students, an index of
discrimination for an item that exceeds .20 is
considered satisfactory.
 For larger groups, the index should be higher because
more difference between groups would be expected.
 The guidelines for an acceptable level of
discrimination depend upon item difficulty.
 For very easy or very difficult items, low discrimination
levels would be expected; most students, regardless
of ability, would get the item correct or incorrect as the
case may be.
 For items with a difficulty level of about 70 percent, higher
discrimination levels would be expected.
Cont.
 When an item is discriminating negatively, overall the
most knowledgeable examinees are getting the item
wrong and the least knowledgeable examinees are
getting the item right.
 A negative discrimination index may indicate that the
item is measuring something other than what the rest
of the test is measuring. More often, it is a sign that the
item has been mis-keyed.
Distracter Analysis
 One important element in the quality of a multiple choice
item is the quality of the item’s distracters. However,
neither the item difficulty nor the item discrimination index
considers the performance of the incorrect response
options, or distracters.
 A distracter analysis evaluates the effectiveness of the
distracters in each item by comparing the number of
students in the upper and lower groups who selected each
incorrect alternative (a good distracter will attract more
students from the lower group than the upper group).
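A minimal sketch of such a comparison in Python (the option counts and the key are hypothetical):

```python
# Counts of upper- and lower-group students choosing each option
# of one multiple-choice item; the key (correct answer) is "B".
KEY = "B"
upper = {"A": 2, "B": 20, "C": 3, "D": 2}
lower = {"A": 8, "B": 9,  "C": 7, "D": 3}

for option in upper:
    pull = lower[option] - upper[option]   # lower-group surplus
    if option == KEY:
        note = "key"
    elif pull > 0:
        note = "working distracter"   # attracts more low scorers
    else:
        note = "review this distracter"
    print(option, upper[option], lower[option], note)
```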
Cont.
 Just as the key, or correct response option, must be
definitively correct, the distracters must be clearly
incorrect (or clearly not the "best" option). In addition
to being clearly incorrect, the distracters must also be
plausible. That is, the distracters should seem likely
or reasonable to an examinee who is not sufficiently
knowledgeable in the content area.
 If a distracter appears so unlikely that almost no
examinee will select it, it is not contributing to the
performance of the item. In fact, the presence of one or
more implausible distracters in a multiple choice item
can make the item artificially far easier than it ought to
be.
Cont.
 It is not desirable to have one of the distracters chosen
more often than the correct answer. This result
indicates a potential problem with the question.
 If students do not know the correct answer and are
purely guessing, their answers would be expected to
be distributed among the distracters as well as the
correct answer.
 If one or more distracters are not chosen, the
unselected distracters probably are not plausible. If
the teacher wants to make the test more difficult, those
distracters should be replaced in future tests.
Cont.
 Whenever the proportion of examinees who selected a
distracter is greater than the proportion of examinees
who selected the key, the item should be examined to
determine if it has been mis-keyed or double-keyed.
 If examinees consistently fail to select a given
distracter, this may be evidence that the distracter is
implausible or simply too easy.
Item Banking
 Building a file of effective test items and assessment
tasks involves recording the items or tasks, adding
information from analyses of students' responses, and
filing the records by both the content area and the
objective that the item or task measures.
 Such a file is especially valuable in areas of complex
achievement, when the construction of test items and
assessment tasks is difficult and time consuming.
When enough high-quality items and tasks have been
assembled, the burden of preparing tests and
assessments is considerably lightened. Computerized item
banking makes these tasks even easier.
UNIT FOUR
RELIABILITY AND VALIDITY OF ASSESSMENT TOOLS
VALIDITY
 Validity is the most important idea to consider when preparing or
selecting a test or other measuring instrument for use. Drawing
correct conclusions and making sound decisions based on the data
obtained from an assessment is the essence of validity.
 If the test lacks validity, the information it provides is useless. The
validity of a test can be viewed as the "correctness" of the decisions or
conclusions made from performance of students gathered through the
tests.
 Validity has been defined as referring to the appropriateness,
meaningfulness, and usefulness of the decisions teachers make based
on the information they collect from their students using tests and
other instruments.
Cont.
 Validity evidence should be related to (1) performance on a universe of
items (content validity), (2) performance on some criterion
(criterion-related validity), or (3) the degree to which certain
psychological traits or constructs are actually represented by
test performance (construct validity).
 The term validity refers to the accuracy of test results of
students; that is, it addresses the question of how confident
can we be that a test actually indicates a person's true score
on a trait.
Types of Validity
 Validity is divided into three categories or types.
A. Content Validity
 The most relevant type of validity in the measurement of a
behavior is content validity.
 In assessing the content validity of a test, the teacher
asks, "To what extent do the test tasks require the
students who have taken the test to demonstrate all
aspects of the knowledge or other behavior being
measured?" This type of validity refers
to the adequacy of the assessment. According to
Whitely (1996), adequate assessment has two
components:
1. Relevance of the content of the test to the objectives
or behavior being measured, and
2. Representativeness of the content in the test.
 A relevant test assesses only those objectives that are
stated in the instruction.
 For a test to have high content validity, it should be a
representative sample of both the objectives and
contents being measured.
Cont.
1. For critical examination of the items of a measure in
relation to the behavior or the purpose of the tests, one
must make the following professional judgments:
A. Does the test content parallel the instructional
objectives to be assessed?
B. Do the items of the measure cover different aspects
of the course and objectives?
Criterion-Related Validity
 Criterion-related validity indicates whether the scores
on a test predict scores on a well-specified,
predetermined criterion.
 There are two types of criterion-related validity. They
are concurrent validity and predictive validity.
Cont.
Concurrent validity uses correlation coefficients to
describe the degree of relationship between the scores
of two tests of students given at about the same time.
A high relationship suggests that the tests are
assessing something similar. This type of evidence is
used in the development of new standardized tests or
other measuring instruments that measure – in a
different, perhaps more efficient, way – the same thing
as an old instrument. The purpose of doing this is
usually to substitute one test for another.
In predictive validity, the data on the criterion variable
are collected some time after the data on the predictor
variable are collected. Take the case of the ESLCE and
college performance: students take the national
examination in May, and their college performance data are
collected some months later, in February of the following
year. In both concurrent and predictive procedures a test
is related to a criterion measure.
Construct Validity
 The term construct refers to a psychological construct,
a theoretical conceptualization about an aspect of
human behavior that cannot be measured or observed
directly (Ebel and Frisbie, 1991).
 Construct validity is the interpretation or meaning that
is given to a set of scores from tests that assess a
behavior or trait that cannot be measured directly,
such as an unobservable trait like
intelligence, creativity, or anxiety. For example, we use
tests to measure intelligence, a variable
that we cannot directly observe.
 We infer it from the students’ test scores. Students
who scored high on the test are said to be intelligent.
Cont.
Face Validity
 Strictly speaking, face validity is not a major type of
validity. It refers to the degree to which the content of a
test looks valid, that is, the extent to which the test
appears to measure what it is intended to measure, to
those, for example, teachers who prepare the test items,
who administer it, or who evaluate it (Worthen et al., 1999).
Face validity may not be as important as content
validity, criterion-related validity or construct-related
validity from a measurement perspective.
 Validity is influenced by a number of factors. The
following are some major ones.
Factors in the test itself
 The following factors can prevent the test items from
functioning as intended and thereby lower the validity
of the interpretations from the assessment results. The
first five factors are equally applicable to
assessments requiring extended student performance
and to traditional tests. The last five factors apply most
directly to tests with fixed choice or short answer
items that are scored right or wrong.
Cont.
 Unclear directions
 Vocabulary and sentence structure that are too difficult
 Ambiguity in sentence structure
 Inadequate time limits
 Overemphasis of easy-to-assess aspects of the domain at the
expense of important but hard-to-assess aspects
 Test items inappropriate for the outcomes being
measured
 Poorly constructed test items
 Test too short
 Improper arrangement of items
 Identifiable pattern of answers
Cont.
Factors in administration and scoring
 In case of teacher made tests, such factors as insufficient
time, unfair aid to individual students who ask for help,
cheating, and unreliable scoring of student performances
tend to lower validity. In the case of published tests, failure
to follow the standard directions and time limits, giving
students unauthorized assistance, and errors in scoring
similarly contribute to lower validity.
Factors in student responses
 Some students may be bothered by emotional disturbances
that interfere with their performance. Others may be
frightened by the assessment situation and so are unable
to respond normally, and still others may not be motivated
to put forth their best effort.
RELIABILITY:
 The degree of consistency between two measures of
the same thing.
 The measure of how stable, dependable, trustworthy,
and consistent a test is in measuring the same thing
each time.
 Reliability can be defined as a measure of how
consistent our measurements are. Scores that are
highly reliable are accurate and can be reproduced.
Reliability refers to the consistency of assessment
results over time and with different samples of
students. It is the adequacy of assessment devices to
assess what they are supposed to assess again and
again.
Cont.
 Consistency can be (a) consistency over a period of
time (stability), (b) consistency over different forms
(equivalence), and (c) internal consistency.
 Consistent measurement is a necessary condition for
high quality educational and psychological testing.
 Although reliability is a necessary condition for valid
test scores, it is not a sufficient condition.
 Reliability affects validity and quality of decision.
Methods of estimating reliability
 There are several methods of estimating reliability of a
measuring instrument or a test. The common ones are
stability, equivalence, stability and equivalence, internal
consistency, and rater agreement.
Stability (Test-retest)
 A coefficient of stability is obtained by correlating scores
from the same test of a group of individuals on two
different occasions. If the scores of the individuals are
consistent (that is, if those scoring high the first time also
score high the second time, and so on) then the correlation
coefficient, and the reliability, are high. This test-retest
procedure assumes that the characteristic measured
remains constant.
Cont.
Equivalence (Parallel forms)
 It is obtained by giving two forms (with equal content,
means, and variances) of a test to the same group on
the same day and correlating these results. Here we
are determining how confidently we can generalize a
person’s score to what he would receive if he took a
test composed of similar but different questions. When
two equivalent or parallel forms of the same
instrument are administered to the same group of
students at about the same time, and the scores are
related, the reliability that results is a coefficient of
equivalence.
Cont.
Equivalence and Stability
 When a teacher needs to give a pretest and posttest to
assess a change in behavior, a reliability coefficient of
equivalence and stability should be established. In this
procedure, reliability data are obtained by
administering to the same group of individuals one
form of a test at one time and a second form at a later
date. To minimize the effects of memory, chance and
maturation factors, reliability coefficients estimated
from parallel-forms of a test are preferred to test-retest
reliability coefficients.
Cont.
Internal Consistency
 Internal consistency is another type of estimating
reliability. It is the most common type of reliability since it
can be estimated from giving one form of a test once.
There are three common types of internal consistency:
Split-half method, Kuder-Richardson, and the Cronbach
Alpha methods.
a. The Split-Half Method
In split-half reliability, the items of a test that have been
administered to a group are divided into two comparable
halves, and a correlation coefficient is calculated
between the halves. If each student has about the same
scores on each half, then the correlation is high and the
test has high reliability.
Cont.
 Each half should be of similar difficulty. This method
provides a lower reliability estimate than other methods,
since the correlation is based on only half the items (and
we know that, other things being equal, longer tests are
more reliable than short tests).
 This technique should not be used with speeded tests.
This is because not all students answer all items, a
factor that tends to inflate the correlations between
the halves.
 In splitting the test into two halves, one might place, for
example, the odd-numbered items in one half and the
even-numbered items in the other.
Cont.
 Fortunately, one can estimate the reliability (rxx) of the full
test from roe via the Spearman-Brown formula:
rxx = 2roe / (1 + roe)
 Where roe = the Pearson correlation between the half-test
scores on the odd and the even items. The Spearman-
Brown split-half method assumes the two halves have
equal standard deviations.
Kuder-Richardson Method
 Kuder and Richardson developed a number of formulas in order to
correlate all items on a single test with each other when
each item is scored right or wrong, correct or incorrect, yes
or no, and so on. K-R reliability is thus determined from a
single administration of a test for the same group of
students, but without having to split the test into equivalent
halves.
Cont.
 This procedure assumes that all items in the test are
equivalent to each other, and it is appropriate when the
purpose of the test is to measure a single behavior, for
example, reading ability of students.
 If a test has items of varying difficulty or if it measures
more than one behavior, the KR estimates would
usually be lower than the split-half reliabilities.
 The Cronbach Alpha, sometimes called the Alpha
Coefficient, developed by Cronbach (1951), also
assumes that all items have similar difficulty levels. It
is a much more general form of internal consistency
than the KR, and it is used for items that are not
scored right or wrong, yes or no, true or false. The
Cronbach Alpha is generally the most appropriate type
of reliability for tests or questionnaires in which there
is a range of possible answers for each item.
Raters Agreement
The fifth type of reliability is expressed as a coefficient
of agreement. This is established by determining the
extent to which two or more persons agree about what
they have seen, heard, scored, or rated. For example,
when two or more teachers score the answers of
students for essay items, will they give the same or
similar scores for the students, i.e., do they agree on
what they score? If they do, then there is some
consistency in measurement.
FACTORS INFLUENCING RELIABILITY
A number of factors influence reliability of test scores.
a. Test related factors
 These factors include test length, difficulty of test items and
score variability.
 Test length. Other things being equal, a test with a larger number of
items yields more reliable scores.
 Difficulty of test items. Score variability depends on the difficulty
level of items. If items are too difficult, only a few students will
answer them correctly. As a result, scores will all be low. On the other
hand, if items are too easy, many students will answer them correctly. As a
result, scores will mostly be high. In both instances, scores do
not vary very much, which contributes to a low reliability index. In
contrast, with moderately difficult items students' scores are highly
likely to vary, which will result in a high reliability index.
Cont.
 According to Ebel & Frisbie (1991), items with a 40% to 80%
difficulty level contribute much to reliability. On the other
hand, items that more than 90 percent or fewer than
30 percent of the examinees answer correctly
contribute little to reliability.
Score Variability
 As scores vary, reliability tends to be higher. Compared to
true-false items, multiple-choice items yield higher
reliability indices. This is so because in true-false items
students have a 50% probability of getting a correct answer
by chance thereby contributing to low score variability
among students. In multiple choice items with four options,
on the other hand, the probability of getting an item right by
chance is 25% which results in better score variability
among students.
b. Student related factors
 These factors include the nature of the group tested,
student testwiseness, and student motivation.
 Nature of the group tested. Assume that all students in your
class are brilliant. When you administer a test, they score
very high and the difference between the highest and the
lowest score is very narrow. If all the students are weak,
their scores would be low. Accordingly, the range of the
scores between the maximum and minimum would be low.
But if there are high achieving, average, and low achieving
students in the same classroom, their scores vary from
high to low. Thus, reliability is higher in heterogeneous
students than in homogenous students.
Cont.
 Student testwiseness. Some students are wise when
they are taking tests. Though they do not know the
content, they are very clever at guessing the correct
answer for an item using clues that lead them to the
correct answer. Students therefore vary in their skill
at taking tests. This skill creates a difference in the
students’ scores. When students vary considerably in
their level of testwiseness, error scores will be higher,
which in turn results in lower reliability of scores.
Cont.
 Student Motivation. Who performs well: A motivated
student or an unmotivated student? Does motivation
have any effect on academic performance of students?
Obviously, academic motivation has a strong effect on
students’ performance. Students who are motivated
academically tend to achieve higher than those who
are not motivated. When students are unmotivated,
their test scores may not reflect their
actual abilities. In a classroom where students
vary in terms of motivation, there will be variability in
scores that reflects motivation rather than achievement,
leading to lowered reliability.
c. Test administration related factors
 These factors include time limits and cheating
opportunities.
 Time Limits. Whether a test is a speed test or a power test
matters when it comes to the reliability of the test. When
internal consistency reliability indices are determined from a
single administration of a speeded test, they will be
spuriously high. This is because in speeded tests students'
scores largely reflect the number of items attempted. Thus,
it is suggested that when reliability is to be determined for
speeded tests, two test administrations be used and the
correlation calculated.
 Cheating. Any form of cheating reduces score reliability.
Cheating includes copying answers from others, using
cheat sheets, passing answers between examination halls,
getting a test prior to its administration, etc.
Ethical Standards of assessment
Ethical and Professional Standards of
Assessment and its Use
 Ethical standards guide teachers in fulfilling their
obligation to provide and use tests that are fair to all
test takers regardless of age, gender, disability,
ethnicity, religion, linguistic background, or other
personal characteristics.
Cont.
 Fairness is a primary consideration in all aspects of
testing. It:
helps to ensure that all test takers are
given a comparable opportunity to
demonstrate what they know and how they
can perform in the area being tested.
implies that every test taker has the
opportunity to prepare for the test and is
informed about the general nature and
content of the test.
also extends to the accurate reporting of
individual and group test results.
The following are standards that teachers may consider
in their assessment practices.
1. Teachers should be skilled in choosing assessment
methods appropriate for instructional decisions.
2. Teachers should develop tests that meet the intended
purpose and that are appropriate for the intended test
takers.
3. The teacher should be skilled in administering,
scoring and interpreting the results from diverse
assessment methods.
Cont
4. Teachers should be skilled in using assessment results
when making decisions about individual students, planning
teaching, developing curriculum, and school improvement
5. Teachers should be skilled in developing valid pupil
grading procedures which use pupil assessments.
6. Teachers should be skilled in communicating assessment
results to students, parents, other stakeholders and other
educators.
7. Teachers should be skilled in recognizing unethical, illegal,
and otherwise inappropriate assessment methods and uses
of assessment information.
Cont.
In addition, the following are principles of grading that can
guide the development of a grading system.
 The system of grading should be clear and understandable
(to parents, other stakeholders, and most especially
students).
 The system of grading should be communicated to all
stakeholders (e.g., students, parents, administrators).
 Grading should be fair for all students regardless of gender,
socioeconomic status or any other personal
characteristics.
 Grading should support, enhance, and inform the
instructional process.
Cultural and linguistic diversity in assessments
 Fairness is fundamentally a socio-cultural, rather than
a technical, issue.
 Students represent a variety of cultural and linguistic
backgrounds. If the cultural and linguistic
backgrounds are ignored, students may become
alienated or disengaged from the learning and
assessment process. Teachers need to be aware of
how such backgrounds may influence student
performance and the potential impact on learning.
Teachers should be ready to provide accommodations
where needed.
Cont
Classroom assessment practices should be sensitive to
the cultural and linguistic diversity of students in order
to obtain accurate information about their learning.
Assessment practices that attend to issues of cultural
diversity include those that
 acknowledge students’ cultural backgrounds.
 are sensitive to those aspects of an assessment that
may hamper students’ ability to demonstrate their
knowledge and understanding.
 use that knowledge to adjust or scaffold assessment
practices if necessary.
Cont
Assessment practices that attend to issues of linguistic
diversity include those that
 acknowledge students’ differing linguistic abilities.
 use that knowledge to adjust or scaffold assessment
practices if necessary.
 use assessment practices in which the language demands
do not unfairly prevent the students from understanding
what is expected of them.
 use assessment practices that allow students to accurately
demonstrate their understanding by responding in ways
that accommodate their linguistic abilities, if the response
method is not relevant to the concept being assessed (e.g.,
allow a student to respond orally rather than in writing).
Cont
 Teachers must make every effort to address and minimize the
effect of bias in classroom assessment practices. Bias occurs
when irrelevant or arbitrary factors systematically influence
the interpretations made of results and affect the performance
of an individual student or a subgroup of students.
 Assessment should be culturally and linguistically appropriate,
fair and bias-free.
For an assessment task to be fair, its content, context, and
performance expectations should:
 reflect knowledge, values, and experiences that are equally
familiar and appropriate to all students;
 tap knowledge and skills that all students have had adequate
time to acquire;
 be as free as possible of cultural and ethnic stereotypes
Disability and Assessment
Practices
 It is quite obvious that many countries' education
systems have been exclusionary in fully accommodating
the educational needs of disabled students.
 This has been true not only in our country but in the
rest of the world as well, although the magnitude might
differ from country to country.
 It was in response to this situation that UNESCO has
been promoting the principle of inclusive education to
guide the educational policies and practice of all
governments.
Cont
 Different world conventions were held and documents
signed towards the implementation of inclusive
education. Our country, Ethiopia, has been a signatory
of these documents and therefore has accepted
inclusive education as a basic principle to guide its
policy and practice in relation to the education of
disabled students
Cont
 Inclusive education is based on the idea that all
students, including those with disabilities, should be
provided with the best possible education to develop
themselves. This calls for the provision of all
possible accommodations to address the educational
needs of disabled students. Accommodations should
not refer only to the teaching and learning process; they
should also extend to assessment mechanisms and
procedures.
cont
 There are different strategies that can be considered to
make assessment practices accessible to students with
disabilities depending on the type of disability. The
following strategies could be considered in summative
assessments:
 Modifying assessments: - This should enable disabled
students to have full access to the assessment without
giving them any unfair advantage.
 Others’ support: - Disabled students may need the support
of others in certain assessment activities which they
cannot do independently. For instance, they may require
readers and scribes in written exams; they may also need
others’ assistance in practical activities, such as using
equipment, locating materials, drawing and measuring.
Cont
 Time allowances: - Disabled students should be given
additional time to complete their assessments; the amount
is for the individual instructor to decide based on the
purpose and nature of the assessment.
 Rest breaks: Some students may need rest breaks during
the examination. This may be to relieve pain or to attend to
personal needs.
 Flexible schedules: In some cases disabled students may
require flexibility in the scheduling of examinations. For
example, some students may find it difficult to manage a
number of examinations in quick succession and need to
have examinations scheduled over a period of days.
cont
 Alternative methods of assessment: - In certain
situations where formal methods of assessment may
not be appropriate for disabled students, the instructor
should assess them using non-formal methods such
as class work, portfolios, oral presentations, etc.
 Assistive Technology: Specific equipment may need to
be available to the student in an examination. Such
arrangements often include the use of personal
computers, voice activated software and screen
readers.
Gender issues in assessment
 Teachers’ assessment practices can also be affected
by gender stereotypes. The issues of gender bias and
fairness in assessment are concerned with differences
in opportunities for boys and girls. A test is biased if
boys and girls with the same ability levels tend to
obtain different scores.
cont
Test questions should be checked for:
 material or references that may be offensive to
members of one gender,
 references to objects and ideas that are likely to be
more familiar to men or to women,
 unequal representation of men and women as actors in
test items or representation of members of each
gender only in stereotyped roles.
Cont.
 If the questions involve objects and ideas that are
more familiar or less offensive to members of one
gender, then the test may be easier for individuals of
that gender. Standards for achievement on such a test
may be unfair to individuals of the gender that is less
familiar with or more offended by the objects and ideas
discussed, because it may be more difficult for such
individuals to demonstrate their abilities or their
knowledge of the material.

More Related Content

What's hot

Classroom management
Classroom managementClassroom management
Classroom managementNadia Khurram
 
Teaching profession powerpoint
Teaching profession powerpointTeaching profession powerpoint
Teaching profession powerpointMaryjane Tura
 
Measurement and evaluation in education
Measurement and evaluation in educationMeasurement and evaluation in education
Measurement and evaluation in educationCheryl Asia
 
Types of grading system
Types of grading systemTypes of grading system
Types of grading systemRedPaspas
 
Formative and summative evaluation in Education
Formative and summative evaluation in EducationFormative and summative evaluation in Education
Formative and summative evaluation in EducationSuresh Babu
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learningAtul Thakur
 
marks and mark system.pptx
marks and mark system.pptxmarks and mark system.pptx
marks and mark system.pptxLailaIlajasTagbo
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous AssessmentManuel Reyes
 
BEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVESBEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVESjonasmole
 
Educational Measurement
Educational MeasurementEducational Measurement
Educational MeasurementAJ Briones
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrumentJohn Paul Hablado
 
Lesson plan new
Lesson plan newLesson plan new
Lesson plan newNehaNupur8
 
Glaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdfGlaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdfBeulahJayarani
 

What's hot (20)

Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous Assessment
 
Classroom management
Classroom managementClassroom management
Classroom management
 
Teaching profession powerpoint
Teaching profession powerpointTeaching profession powerpoint
Teaching profession powerpoint
 
Measurement and evaluation in education
Measurement and evaluation in educationMeasurement and evaluation in education
Measurement and evaluation in education
 
Types of grading system
Types of grading systemTypes of grading system
Types of grading system
 
Formative and summative evaluation in Education
Formative and summative evaluation in EducationFormative and summative evaluation in Education
Formative and summative evaluation in Education
 
Marks and marking system final
Marks and marking system final Marks and marking system final
Marks and marking system final
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learning
 
marks and mark system.pptx
marks and mark system.pptxmarks and mark system.pptx
marks and mark system.pptx
 
Teaching competence
Teaching competenceTeaching competence
Teaching competence
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous Assessment
 
Basics of Assessment
Basics of AssessmentBasics of Assessment
Basics of Assessment
 
BEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVESBEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVES
 
Modes of education
Modes of educationModes of education
Modes of education
 
IMPORTANCE OF KNOWLEDGE, SUBJECT AND DISCIPLINE
IMPORTANCE OF KNOWLEDGE, SUBJECT  AND  DISCIPLINEIMPORTANCE OF KNOWLEDGE, SUBJECT  AND  DISCIPLINE
IMPORTANCE OF KNOWLEDGE, SUBJECT AND DISCIPLINE
 
Educational Measurement
Educational MeasurementEducational Measurement
Educational Measurement
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrument
 
Lesson plan new
Lesson plan newLesson plan new
Lesson plan new
 
Glaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdfGlaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdf
 
Programmes for professional growth
Programmes for professional growth  Programmes for professional growth
Programmes for professional growth
 

Similar to 2015 PGDT 423 (1).pptx

Unit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptxUnit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptxSamruddhi Chepe
 
Basic Concept in Assessment
Basic Concept in AssessmentBasic Concept in Assessment
Basic Concept in AssessmentJarry Fuentes
 
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...Dereck Downing
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and EvaluationSuresh Babu
 
Assessment of learning Chapter 1
Assessment of learning Chapter 1Assessment of learning Chapter 1
Assessment of learning Chapter 1Jarry Fuentes
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxANIOAYRochelleDaoaya
 
Chapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in AssessmentChapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in AssessmentKaenah Faye Padongao
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptagoggigupta
 
Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation HennaAnsari
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxMarjorie Malveda
 
distinction between assessment evaluation and measurement
distinction between assessment evaluation and measurementdistinction between assessment evaluation and measurement
distinction between assessment evaluation and measurementhidayatulhaq
 
Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Roqui Gonzaga
 
EVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptxEVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptxKavitha Krishnan
 
Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)dheerajvyas5
 
ASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOKASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOKLisa Brewer
 

Similar to 2015 PGDT 423 (1).pptx (20)

Unit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptxUnit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptx
 
Basic Concept in Assessment
Basic Concept in AssessmentBasic Concept in Assessment
Basic Concept in Assessment
 
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
 
Assessment (1)
Assessment (1)Assessment (1)
Assessment (1)
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and Evaluation
 
Assessment of learning Chapter 1
Assessment of learning Chapter 1Assessment of learning Chapter 1
Assessment of learning Chapter 1
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptx
 
UNIT 1.pptx
UNIT 1.pptxUNIT 1.pptx
UNIT 1.pptx
 
Chapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in AssessmentChapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in Assessment
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi gupta
 
Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
 
distinction between assessment evaluation and measurement
distinction between assessment evaluation and measurementdistinction between assessment evaluation and measurement
distinction between assessment evaluation and measurement
 
Assessment
AssessmentAssessment
Assessment
 
Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)
 
EVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptxEVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptx
 
Dup(01)portfolio (1)
Dup(01)portfolio (1)Dup(01)portfolio (1)
Dup(01)portfolio (1)
 
Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)
 
ASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOKASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOK
 
Report 5
Report 5Report 5
Report 5
 

Recently uploaded

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 

Recently uploaded (20)

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 

2015 PGDT 423 (1).pptx

  • 1. PGDT 2016 HARAMAYA UNIVERSITY COLLEGE OF EDUCATION AND BEHAVIORAL SCIENCES Department of Psychology Assessment and Evaluation of Learning
  • 2. Unit 1: Assessment: Concept, Purpose, and Principles  You might have come across with the concepts 1. test, 2. measurement, 3. assessment, & 4. evaluation. How do you understand them? Test  Test in educational context is presentation of a standard set of questions to be answered by students.  It is one instrument that is used for collecting information about students’ behaviors or
  • 3. Cont. o A test is a task or a serious of tasks or questions that students must answer or perform. o It is used to get information regarding the extent to which the students have mastered the subject matter taught and the attainment of instructional objectives. o Test is a systematic procedure for observing (i.e., getting information) and describing one or more characteristics of a person with the aid of either a numerical scale (measurement such as test scores) or a category (qualitative means).
  • 4. Cont. Measurement  Measurement is the process by which the attributes of a person are measured and described in numbers based on certain rules.  It is a quantitative description of the behavior or performance of students.  Measurement permits more objective description concerning traits and facilitates comparisons.
  • 5. Assessment  Assessment is the planned process of gathering and synthesizing information relevant to the purposes of discovering and documenting students' strengths and weaknesses, planning and enhancing instruction, and/or to check progress or status to make it ready for decision making.  Information may be collected using various instruments including tests, observations of students, checklists, questionnaires and interviews.
  • 6. Cont.  It is a process of collecting, synthesizing, and interpreting information to aid in decision- making (Nitko, 1996; and Airasian, 1996).  It can be qualitative and quantitative.
  • 7. Evaluation  Is the processes of judging the quality of student learning on the basis of established performance standards and assigning a value to represent the worthiness or quality of that learning or performance.  It is concerned with determining how well they have learned.  When we evaluate, we are saying that something is good, appropriate, valid, positive, and so forth.  Evaluation includes both quantitative and qualitative descriptions of student behavior and value judgment concerning the desirability of that behavior
  • 8. Cont.  Evaluation = Quantitative description of students’ behavior (measurement) + qualitative description of students’ behavior (non- measurement) + value judgment  Evaluation involves judgment. The quantitative values that we obtain through measurement will not have any meaning until they are evaluated against some standards. Importance and Purposes of Assessment  It can be summarized that assessment in education focuses on: helping LEARNING, and; improving TEACHING.
  • 9.  With regards to the learner, assessment is aimed at providing information that will help us make decisions concerning remediation, enrichment, selection, exceptionality, progress and certification.  With regard to teaching, assessment provides information about the attainment of objectives, the effectiveness of teaching methods and learning materials. Overall, assessment serves the following main purposes in education. 1. Assessment is used to inform and guide teaching and learning 2. Assessment is used to help students set learning goals 3. Assessment is used to assign report card grades
  • 10. for  Grading  Awards  Diagnosis  Placement  Information on entry behavior  Evidence of effectiveness and/or possible shortcomings  Opportunity to communicate to stakeholders  Guidance  Plan and adapt instruction  Provide feedback and incentives(to motivate students)
  • 11. Role of objectives in assessment  The first step in planning any good teaching is to clearly define the learning objectives or outcomes.  A learning objective is an outcome statement that captures specifically what knowledge, skills, attitudes learners should be able to exhibit following instruction.  Effective assessment practice requires relating the assessment procedures as directly as possible to the learning objectives.  Instructional objectives which are commonly known as learning outcomes play a key role in both the instructional and assessment process.
  • 12. Cont.  They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students learning.  A learning outcome stated in this way clearly indicates the kind of performance students are expected to exhibit as a result of the instruction.  Well stated learning outcomes make clear the types of students performance we are willing to accept as evidence that the instruction has been successful.
  • 13. Principles of Assessment Different educators and school systems have developed different sets of assessment principles. Miller, Linn and Grunland (2009) have identified the following general principles of assessment.  Clearly specifying what is to be assessed has priority in the assessment process.  An assessment procedure should be selected because of its relevance to the characteristics or performance to be measured.  Comprehensive assessment requires a variety of procedures.  Proper use of assessment procedures require an awareness of their limitations.
  • 14. Cont.  The New South West Wales Department of Education and Training (2008) in Australia are more inclusive and listed the following principles:  Assessment should be relevant  Assessment should be appropriate  Assessment should be fair.  Assessment should be accurate  Assessment should provide useful information  Assessment should be integrated into the teaching and learning cycle.  Assessment should draw on a wide range of evidence  Assessment should be manageable
  • 15. Assessment and Some Basic Assumptions  The quality of student learning is directly, although not exclusively related to the quality of teaching.  To improve their effectiveness, teachers need first to make their goals and objectives explicit and then to get specific, comprehendible feedback on the extent to which they are achieving those goals and objectives.  To improve their learning, students need to receive appropriate and focused feedback early and often; they also need to learn how to assess their own learning.
  • 16. Concept of Continuous Assessment  Continuous assessment is a learning process between teachers, students and stakeholders (parents).  It is a process that fosters dialogue between these stakeholders to bring out the child’s best learning.  It is also a holistic process that not only brings together multiple stakeholders, but also integrates assessment and teaching as interconnected activities that are integral to the child’s learning.  It is an assessment approach which involves the use of a variety of assessment instruments, assessing various components of learning.
  • 17. Characteristics of Continuous Assessment Characteristics of continuous assessment. a) Systematic b) Comprehensive c) Cumulative d) Guidance –Oriented A/ Systematic nature of CA  It requires an operational plan which indicates what measurements are to be made about the pupils’ performance, at what time intervals or times during the school year, the measurements to be made and the results recorded, and the nature of the tools or instruments to be used in the measurements.  It is planned, graded to suit the age and experience of the children and given at suitable intervals during the school year.
  • 18. Cont. B/ Comprehensive nature of CA  It is comprehensive in the sense that many types of instruments are used in determining the performance.  It means that Continuous assessment is not focused on academic skills alone.  It embraces the cognitive, psychomotor and affective domains. A child is assessed as a total entity using all the psychometric devises such as test and non test techniques.
  • 19. Cont. (c) Cumulative Nature of CA: It is cumulative since any decision to be made at any point on the pupil takes into account all previous decisions. d) Guidance-Oriented Nature of CA:  It means that the information collected is to be used for educational, vocational and personal- social decision-making for the child. Guidance and counseling activities thrive better on valid, sequel, systematic, continuous, cumulative and comprehensive information.
  • 20. Assessment 1. It measures what a student knows, understands and can do. 2. It is fair 3. It is carried out periodically over a term and over a year 4. There are a number of different types of assessment activities 5. Assessment and instruction are similar/going side by side 6. There is a lack of pupil fear 7. Focus is on pupil progress
  • 21. Assessment, Learning, and the Involvement of Students Classroom assessment promotes learning when teachers use it in the following ways:  When they use it to become aware of the knowledge, skills, and beliefs that their students bring to a learning task, and;  When they use this knowledge as a starting point for new instruction, and monitor students’ changing perceptions as instruction proceeds.  When teachers and students collaborate and use ongoing assessment and pertinent feedback to move learning forward.  When classroom assessment is frequent and varied, teachers can learn a great deal about their students.
  • 22. Cont. Assessment provides the feedback loop for this process.  By increasing students’ motivation. Motivation is essential for students’ engagement in their learning. The higher the motivation, the more time and energy a student is willing to devote to any given task. Even when a student finds the content interesting and the activity enjoyable, they show sustained concentration and effort.  Assessment can be a motivator, not through reward and punishment, but by stimulating students’ intrinsic interest. Assessment can enhance student motivation by:
  • 23. Cont. • emphasizing progress and achievement rather than failure • providing feedback to move learning forward • reinforcing the idea that students have control over, and responsibility for their own learning • building confidence in students so they can and need to take risks being relevant, and appealing to students’ imaginations • providing the scaffolding that students need to genuinely succeed
  • 24. Assessment and Teacher Professional Competence o A teacher should have some basic competencies on classroom assessment so as to be able to effectively assess his/her students learning. o Assessment activities occur prior, during, and after instruction. o In the American education system a list of seven standards for teacher competence in educational assessment of students has been developed.
  • 25. Cont. The seven standards are stated below. Teachers should be skilled in:- 1. Choosing assessment options appropriate for instructional decisions. 2. Developing assessment methods appropriate for instructional decisions. 3. Administering, scoring, and interpreting the results of assessment methods. 4. Using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement
  • 26. Cont. 5. Developing valid student grading procedures that use student assessments 6. Communicating assessment results to students, parents, other audiences, and educators. 7. Recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
  • 27. UNIT TWO ASSESSMENT STRATEGIES, METHODS, AND TOOLS There are three pairs of assessment typologies: o formal vs. informal assessment o formative vs. summative assessment o criterion-referenced vs. norm-referenced assessment
  • 28. Formal and Informal Assessment  Formal Assessment: implies a written document, such as a test, quiz, or paper. A formal assessment gives a numerical score or grade based on student performance.  Informal Assessment: “Informal” indicates techniques that can easily be incorporated into classroom routines and learning activities.  It can be used at any time without interfering with instructional time.  It usually occurs in a more casual manner and may include observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, and peer and self-assessment.
  • 29. Cont.  Methods for informal assessment can be unstructured (e.g., student work samples, journals) and structured (e.g., checklists, observations).  Unstructured methods frequently are somewhat more difficult to score and evaluate, but they can provide a great deal of valuable information about the skills of the students.  Structured methods can be reliable and valid techniques when time is spent creating the "scoring" procedures.  Informal assessments actively involve the students in the evaluation process - they are not just paper-and-pencil tests.
  • 30. Formative and Summative Assessments  Based on their functional role during classroom instruction, assessments are either formative or summative.  Formative Assessment: used to shape and guide classroom instruction.  Includes both informal and formal assessments.  It can be given before, during, and even after instruction; the goal is to improve instruction.  It consists of ongoing assessments, appraisals, and observations in a classroom.  It serves a diagnostic function for both students and teachers.
  • 31. Cont.  It helps students to adjust, improve their performance or engagement in the unit.  Teachers receive feedback on the quality of learners’ understandings and consequently, can modify their teaching approaches to provide enrichment or remedial activities to more effectively guide learners.  It is also known by the name ‘assessment for learning’ or ‘continuous assessment’.
  • 32. Cont.  Summative Assessment: comes at the end of a course (or unit) of instruction.  It evaluates the quality of students’ learning and assigns a mark to students’ work based on how effectively learners have addressed the performance standards and criteria.  Assessment tasks conducted during the progress of a semester may be regarded as summative in nature if they only contribute to the final grades of the students.  A particular assessment task can be both formative and summative
  • 33. Criterion-referenced and Norm-referenced Assessments Based on interpreting student performance:  Criterion-referenced Assessment: is carried out against previously specified criteria and performance standards for the subject matter.  A grade is assigned on the basis of the standard the student has achieved on each of the criteria.  Norm-referenced Assessment: this type of assessment has as its end point the determination of student performance based on a position within a cohort of students – the norm group.
  • 34. Assessment Strategies  Assessment strategy refers to those assessment tasks (methods/approaches/activities) in which students are engaged to ensure that all the learning objectives of a subject, a unit or a lesson have been adequately addressed. Criteria for selecting assessment strategies include:  appropriateness for the particular behavior being assessed, and  relatedness to the course material and relevance to students’ lives.
  • 35. Cont.  There are many different ways to categorize learning goals for students. o Knowledge and understanding: What facts do students know outright? What information can they retrieve? What do they understand? o Reasoning proficiency: Can students analyze, categorize, and sort into component parts? Can they generalize and synthesize what they have learned? o Skills: We have certain skills that we want students to master such as reading fluently, working productively in a group, making an oral presentation, speaking a foreign language, or designing an experiment.
  • 36. Cont.  Ability to create products: Another kind of learning target is student-created products - tangible evidence that the student has mastered knowledge, reasoning, and specific production skills. Examples include a research paper, a piece of furniture, or artwork.  Dispositions: We also frequently care about student attitudes and habits of mind, including attitudes toward school, persistence, responsibility, flexibility, and desire to learn.
  • 37. Cont.  There are various assessment strategies that can be used by classroom teachers; some are described below. Classroom presentations:  require students to verbalize their knowledge, select and present samples of finished work, and organize their thoughts about a topic in order to present a summary of their learning.  Conferences: a conference is a formal or informal meeting between the teacher and a student for the purpose of exchanging information or sharing ideas.
  • 38. Cont.  Exhibitions/Demonstrations: an exhibition or demonstration is a performance in a public setting, during which a student explains and applies a process, procedure, etc., in concrete ways to show individual achievement of specific skills and knowledge.  Interviews: an interview is a face-to-face conversation in which teacher and student use inquiry to share their knowledge and understanding of a topic or problem. This form of assessment can be used by the teacher to:  explore the student’s thinking;  assess the student’s level of understanding of a concept or procedure; and  gather information, obtain clarification, determine positions, and probe for motivations.
  • 39. Cont. o Observation: is a process of systematically viewing and recording students while they work, for the purpose of making instructional decisions. o It can take place at any time and in any setting. It provides information on students' strengths and weaknesses, learning styles, interests, and attitudes.  Observations may be informal or highly structured, and incidental or scheduled over different periods of time in different learning contexts. There are various observational techniques. They include anecdotal records, checklists, rating scales, socio-metric techniques.
  • 40. Cont. Performance tasks: students create, produce, perform, or present works on “real world” issues. A performance task may be used to assess a skill or proficiency, and provides useful information on the process as well as the product. Portfolios: a portfolio is a collection of samples of a student’s work over time. o It offers a visual demonstration of a student’s achievement, capabilities, strengths, weaknesses, knowledge, and specific skills, over time and in a variety of contexts. o For a portfolio to serve as an effective assessment instrument, it has to be focused, selective, reflective, and collaborative.
  • 41. Cont.  Questions and answers: This is perhaps the strategy most widely used by teachers to involve their students in the learning and teaching process. In this strategy, the teacher poses a question and the student answers verbally, rather than in writing.
  • 42. Cont. Students’ self-assessments:  It is the student’s own assessment of personal progress in terms of knowledge, skills, processes, or attitudes. Self- assessment leads students to a greater awareness and understanding of themselves as learners Checklists, Rating Scales and Rubrics  These are tools that state specific criteria and allow teachers and students to gather information and to make judgments about what students know and can do in relation to the outcomes.
  • 43. Cont.  Checklists usually offer a yes/no format in relation to student demonstration of specific criteria. They may be used to record observations of an individual, a group or a whole class.  Rating Scales allow teachers to indicate the degree or frequency of the behaviors, skills and strategies displayed by the learner. Rating scales state the criteria and provide three or four response selections to describe the quality or frequency of student work.  Rubrics use a set of criteria to evaluate a student's performance. They consist of a fixed measurement scale and detailed description of the characteristics for each level of performance. These descriptions focus on the quality of the product or performance and not the quantity.
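To make the description above concrete, here is a minimal sketch, in Python, of a rubric represented as data: each criterion carries a fixed four-level scale with a description of the quality expected at each level, plus a simple analytic scoring function. The criteria, level descriptions, and function names are hypothetical illustrations, not taken from this module.

```python
# Hypothetical two-criterion analytic rubric: a fixed scale (4 = best)
# with a quality description for every level of each criterion.
rubric = {
    "organization": {
        4: "Ideas ordered logically; transitions guide the reader throughout.",
        3: "Mostly logical order; occasional abrupt transitions.",
        2: "Some ordering evident, but the reader must work to follow it.",
        1: "No discernible organization.",
    },
    "use_of_evidence": {
        4: "Every claim supported by relevant, accurate evidence.",
        3: "Most claims supported; minor gaps.",
        2: "Evidence present but often irrelevant or inaccurate.",
        1: "Claims unsupported.",
    },
}

def analytic_score(ratings):
    """Sum the level awarded on each criterion (analytic scoring)."""
    return sum(ratings[criterion] for criterion in rubric)

# A student rated 3 on organization and 4 on use of evidence scores 7 of 8.
print(analytic_score({"organization": 3, "use_of_evidence": 4}))
```

Note that each level description addresses the quality of the work rather than its quantity, which is what distinguishes a rubric from a simple yes/no checklist.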
  • 44. Cont. The purpose of checklists, rating scales and rubrics is to:  provide tools for systematic recording of observations  provide tools for self-assessment  provide samples of criteria for students prior to collecting and evaluating data on their work  record the development of specific skills, strategies, attitudes and behaviors necessary for demonstrating learning  clarify students' instructional needs by presenting a record of current accomplishments.
  • 45. Cont.  One- Minute paper: During the last few minutes of the class period, you may ask students to answer on a half-sheet of paper: "What is the most important point you learned today?" and, "What point remains least clear to you?"  Muddiest Point: This is similar to ‘One-Minute Paper’ but only asks students to describe what they didn't understand and what they think might help.  It is to determine which key points of the lesson were missed by the students.  Here also you have to review before next class meeting and use to clarify, correct, or elaborate.
  • 46. Cont.  Student-generated test questions: You may allow students to write test questions and model answers for specified topics, in a format consistent with course exams. This will give students the opportunity to evaluate the course topics, reflect on what they understand, and learn what good test items are. You may evaluate the questions and use the good ones as prompts for discussion.  Tests: this is the type of assessment that you are most familiar with. A test requires students to respond to prompts in order to demonstrate their knowledge (orally or in writing) or their skills (e.g., through performance).
  • 47. Assessment in large classes  Due to time and resource constraints, teachers often use less time-demanding assessment methods.  Assessment issues associated with large classes include: 1. Surface learning approach: teachers rely on time-efficient and exam-based assessment methods for assessing large classes, such as multiple-choice and short-answer examinations.  These assess learning at the lower levels of intellectual complexity.  Students tend to adopt a surface, rote learning approach when preparing for these kinds of assessment methods.
  • 48. Cont.  Feedback is often inadequate.  Inconsistency in marking: a large class usually consists of a diverse and complex group of students. The issues of different perceptions of assessment, cultural and educational background, prior knowledge, and level of interest in the subject all pose challenges to the fairness of marking and grading.  Difficulty in monitoring cheating and plagiarism.  Lack of interaction and engagement.  When teachers raise questions in large classes, many students are not willing to respond.
  • 49. Cont. Approaches to assessment in large classes for effective student learning include: Front ending: by putting increased effort at the beginning into setting up the students for the work they are going to do, the quality of the work submitted can be improved.  Therefore the time needed to mark it is reduced. Making use of in-class assignments: in-class assignments are usually quick and relatively easy to mark and provide feedback on, and help you to identify gaps in understanding.
  • 50. Cont. • Self-assessment reduces the marking load because it ensures a higher quality of work that is submitted, thereby minimizing the amount of time expended on marking and feedback. Peer-assessment • provide useful learning experiences for students at the same time as reducing the marking load of staff.
  • 51. Cont.  Group assessments: significantly reduce the marking load if the group submits only one piece of work.  The major problem is that group members may not contribute equally, so how are they to be rewarded fairly?  Changing the assessment method, or at least shortening it: being faced with large numbers of students will present challenges, but may also provide opportunities to either modify existing assessments or to explore new methods of assessment.
  • 52. Selecting and developing assessment methods and tools  The process of assessing student performance must begin with educational outcomes.  A wide variety of tools are available for assessing student performance.
  • 53. Cont. Constructing Tests  Classroom tests can consist of objective test items and performance assessments.  Objective tests are highly structured and require the test taker to select the correct answer from several alternatives or to supply a word or short phrase to answer a question or complete a statement.  They are called objective because they have a single right or best answer that can be determined in advance.  Performance assessment tasks permit the student to organize and construct the answer in essay form, by using equipment, generating hypotheses, making observations, constructing something, or performing a task.
  • 54. Cont. Constructing Objective Test Items  There are various types of objective test items.  These can be classified into supply type items and selection type items.  Supply type items include completion items and short answer questions.  Selection type test items include true/false, multiple choice and matching. True/False Test Items Advantages of true/false items are that they:  do not require much time to answer.
  • 55. Cont.  allow a teacher to cover a wide range of content.  can be scored quickly, reliably, and objectively.  can measure higher mental processes of understanding, application and interpretation. Disadvantages  promote memorization of factual information.  encourage students to guess and cheat.  do not discriminate between students of varying ability as well as other item types.  include more irrelevant clues than other item types.  lead a teacher to favor testing of trivial knowledge.  Items are often not unequivocally (clearly) true or false (it is difficult to write statements which are clearly true or false).
  • 56. Suggestions for constructing good quality true/false test items  Avoid negative statements, and never use double negatives.  If opinion is used, attribute it to some source, unless the ability to identify opinion is being specifically measured.  Restrict single-item statements to single concepts.  Avoid ambiguous words and broad general statements.  Use an approximately equal number of items reflecting the two categories tested.
  • 57. Cont.  Make statements representing both categories equal in length.  Avoid trivial statements – those which have little importance for knowledge and understanding.  Avoid specific determiners like most, all, always, sometimes, in most cases, etc.  Avoid long complex sentences.
  • 58. Matching Items  A matching item consists of two lists of words or phrases.  The test-taker must match components in one list (the premises, on the left) with components in the other list (the responses, presented on the right), according to a particular kind of association.  It can cover a good deal of content in an efficient fashion.
  • 59. Conti…  It is useful for measuring memorized factual information whose elements are related.  Its compact form makes it possible to measure a large amount of related factual material in a relatively short time.  It is easy to score and easy to construct.
  • 60. Limitations  Restricted to the measurement of factual information based on rote learning.  It is difficult to find homogeneous material.  Teachers may include in their matching items material which is less significant.  Susceptible to irrelevant clues. In teacher-made matching type tests, some of the more common faults are that:  the directions are vague  the items to be matched are excessively long  the list of responses lacks homogeneity  the premises are vaguely stated.
  • 61. Construction of good matching items. 1. Use fairly brief lists, placing the shorter entries on the right. The words and phrases that make up the premises should be short, and those that make up the responses should be shorter still. If too long, students tend to lose track of what they originally set out to look for. 2. Employ homogeneous lists. 3. List responses in a logical order 4. Describe the basis for matching and the number of times a response can be used:
  • 62. Cont. 5. Try to place all premises and responses for any matching item on a single page 6. Make sure that there are never multiple correct responses for one stem 7. Avoid giving inadvertent grammatical clues to the correct response 8. Use no more than 10 items in one set. 9. Provide more responses than stems to make process-of-elimination guessing less effective. 10. Use capital letters for the response signs rather than lower-case letters.
  • 63. Short Answer/Completion Test Items  The short answer type uses a direct question, whereas the completion test item consists of an incomplete statement that the student must complete.  The short-answer test item is one of the easiest to construct, partly because of the relatively simple learning outcomes it usually measures.  Except for the problem-solving outcomes measured in Mathematics and Science, it is used almost exclusively to measure the recall of memorized information.
  • 64. Cont.  Partial knowledge, which might enable them to choose the correct answer on a selection item, is insufficient for answering a short answer test item correctly.  There are two limitations:  unsuitability for assessing complex learning outcomes.  difficulty of scoring.  This is especially true where the item is not clearly phrased to require a definitely correct answer, and where the student’s spelling ability affects the response.
  • 65. Cont. The following suggestions will help short-answer type test items to function as intended. 1. Word the item so that the required answer is both brief and specific. 2. Do not take statements directly from textbooks to use as a basis for short-answer items. 3. A direct question is generally more desirable than an incomplete statement. 4. If the answer is to be expressed in numerical units, indicate the type of answer wanted.
  • 66. Cont. 5. Avoid using a long quote with multiple blanks to complete. 6. Require only one word or phrase in each blank. 7. Facilitate scoring by having the students write their responses on lines arranged in a column to the left of the items. 8. Only ask students important terms or expressions in completion items. 9. Avoid providing grammatical clues to the correct answer by using a/an, etc., instead of specific modifiers.
  • 67. Multiple-Choice Items  This is the most popular and versatile type of selected-response item.  It can effectively measure the learning outcomes measured by the short-answer item, the true-false item, and the matching item types.  It can measure a variety of complex cognitive learning outcomes.  A multiple-choice item consists of a problem and a list of suggested solutions.  A student is first given either a question or a partially complete statement. This part of the item is referred to as the item’s stem.  Then three or more potential answer-options are presented. These are usually called alternatives or options.
  • 68. Cont. Variants in a multiple-choice item: (1) the stem consists of a direct question or an incomplete statement, and (2) the student’s choice of alternatives may be a correct answer or a best answer. Advantages  widespread applicability to the assessment of cognitive skills and knowledge.  It is possible to make them quite varied in the levels of difficulty they possess.  Items are fairly easy to score.  The results are amenable to diagnosis.  They provide greater structure to the question (e.g., “South America is . . .” followed by alternatives a), b), c), d)). The alternatives make the task clear.
  • 69. Cont.  They can test students’ ability to think quickly under pressure.  They can be easier to modify in order to test students at the appropriate level. Limitation/weakness of multiple-choice  when students review a set of alternatives for an item, they may be able to recognize a correct answer. So it can present an exaggerated picture of a student’s understanding or competence, which might lead teachers to invalid inferences.  can never measure a student’s ability to creatively synthesize content of any sort.  They are difficult to construct, especially getting plausible distracters is difficult.  It is relatively labour intensive and time consuming to prepare the test.
  • 70. Cont.  In an effort to come up with the necessary number of plausible alternatives, novice item-writers sometimes toss in some alternatives that are obviously incorrect.  They are not well adapted to measuring some learning outcomes in mathematics, chemistry, physics, etc.  There is a possibility that students may guess the correct answer if the item contains many irrelevant clues.  As a result of recycling questions, students may get access to the questions and achieve good marks without achieving the instructional objectives.
  • 71. Here are some useful rules for you to follow: 1. The question or problem in the stem must be self-contained. The stem should contain as much of the item’s content as possible, thereby rendering the alternatives much shorter than would otherwise be the case. 2. Avoid negatively stated stems. Just as with true/false items, negatively stated stems can create genuine confusion in students. 3. Each alternative must be grammatically consistent with the item’s stem. 4. Make all alternatives plausible, but be sure that one of them is indisputably the correct or best answer.
  • 72. Cont. 5. Randomly use all answer positions in approximately equal numbers. 6. List alternatives on separate lines rather than including them as part of the stem so that they can be clearly distinguished. 7. Keep all alternatives in a similar format (e.g., all phrases, all sentences, etc.). 8. Try to make alternatives for an item approximately the same length. (Making the correct response consistently longer is a common error.) 9. Use misconceptions which students have indicated in class or errors commonly made by students in the class as the basis for incorrect alternatives.
  • 73. Cont. 10. If possible, do not use “all of the above” and “none of the above,” or use them sparingly, since these alternatives are often chosen on the basis of incomplete knowledge. 11. Never use words such as “all,” “always,” and “never,” as they are likely to signal incorrect options. 12. Use capital letters (A, B, C, D, E) on tests as responses rather than lower-case letters (“a” gets confused with “d” and “c” with “e” if the type or duplication is poor). 13. Try to write items with equal numbers of alternatives in order to avoid asking students to continually adjust to a new pattern caused by different numbers. 14. Put the incomplete part of the sentence at the end rather than the beginning of the stem.
  • 74. Suggestions for Constructing good distracters  Base distracters on the most frequent errors made by students in homework, assignments or class discussions related to that concept.  Use words in the distracters that are associated with words in the stem (for example, explorer-exploration).  Use concepts from the instructional material that have similar vocabulary or were used in the same context as the correct answer.  Use distracters that are similar in content or form to the correct answer (for example, if the correct answer is the name of a place, have all distracters be places instead of using names of people and other facts).  Make the distracters similar to the correct answer in terms of complexity, sentence structure, and length.
  • 75. Constructing Performance Assessments  Performance assessments measure such outcomes as the ability to recall, organize, and integrate ideas; the ability to express oneself in writing; and the ability to create.  The most familiar form of performance-based assessment is the essay question.  It measures learning outcomes concerned with the ability to conceptualize, construct, organize, relate, and evaluate ideas.
  • 76. Cont. Essay questions can be classified into two types – restricted-response essay questions and extended response essay questions. o Restricted-response: Are usually limit both the content and the response. The content is usually restricted by the scope of the topic to be discussed. o Extended response These types of questions allow students:  To select any factual information that they think is relevant,  To organize the answer in accordance with their best judgment;  To integrate and evaluate ideas as they deem appropriate.
  • 77. Cont. In addition to measuring higher-order thinking skills, the advantages also include the following:  Extended-response essays focus on the integration and application of thinking and problem-solving skills.  Essay assessments enable the direct evaluation of writing skills.  Essay questions, as compared to objective tests, are easy to construct.  Essay questions have a positive effect on students’ learning.
  • 78. Cont. Limitations  The most common limitation is unreliability of scoring. The same paper may be scored differently by different teachers, and even the same teacher may give different scores for the same paper at different times.  The amount of time required for scoring.  The limited sampling of content they provide. The improvement of the essay question requires attention to two problems:  How to construct essay questions that call forth the desired student response, and  How to score the answers so that achievement is reliably measured.
  • 79. Suggestions for the construction of good essay questions  Restrict the use of essay questions to those learning outcomes that cannot be measured satisfactorily by objective items.  Structure items so that the student’s task is explicitly bounded.  For each question, specify the point value, an acceptable response length, and a recommended time allocation.  Employ more questions requiring shorter answers rather than fewer questions requiring longer answers.  Don’t employ optional questions.  Test a question’s quality by creating a trial response to the item.
  • 80. Guidelines in the scoring of essay items The following help to make scoring easier and more reliable.  Ensure that you are emotionally and mentally composed before scoring.  All responses to one item should be scored before moving to the next item.  Write out in advance a model answer to guide yourself in grading the students’ answers.  Shuffle exam papers after scoring every question before moving to the next.  The names of test takers should not be known while scoring, to avoid bias.
  • 81. Table of Specification and Arrangement of Items Table of Specification  The development of valid, reliable and usable questions involves proper planning.  The validity, reliability and usability of such tests depend on the care with which they are planned and prepared.  Planning helps to ensure that the test covers the pre-specified instructional objectives and the subject matter (content).  Planning a classroom test involves identifying the instructional objectives stated earlier and the subject matter (content) covered during the teaching/learning process.
  • 82. Planning a classroom test. 1. Determine the purpose of the test; 2. Describe the instructional objectives and content to be measured. 3. Determine the relative emphasis to be given to each learning outcome; 4. Select the most appropriate item formats (essay or objective); 5. Develop the test blue print to guide the test construction; 6. Prepare test items that are relevant to the learning outcomes specified in the test plan;
  • 83. Cont. 7. Decide on the pattern of scoring and the interpretation of result; 8. Decide on the length and duration of the test, and 9. Assemble the items into a test, prepare direction and administer the test.  The instructional objectives of the course are critically considered while developing the test items.
  • 84. Cont.  A table of specification is a two-way table that matches the objectives and content taught with the level at which you expect your students to perform.  It contains an estimate of the percentage of the test to be associated to each topic at each level at which it is to be measured.  In effect we establish how much emphasis to give to each objective or content.
  • 85. Cont. Developing a table of specification involves: 1. Preparing a list of learning outcomes, i.e. the type of performance students are expected to demonstrate 2. Outlining the contents of instruction, i.e. the area in which each type of performance is to be shown, and 3. Preparing the two way chart that relates the learning outcomes to the instructional content.
  • 86. Cont. (Cell entries are the numbers of test items planned for each content area at each level of the instructional objectives.)

| Contents | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total | Percentage |
|---|---|---|---|---|---|---|---|---|
| Air pressure | 2 | 2 | 1 | 1 | - | - | 6 | 24% |
| Wind | 1 | 1 | 1 | 1 | - | - | 4 | 16% |
| Temperature | 2 | 2 | 1 | 1 | - | 1 | 7 | 28% |
| Rainfall | 1 | 2 | 1 | - | 1 | - | 5 | 20% |
| Clouds | 1 | 1 | - | 1 | - | - | 3 | 12% |
| Total | 7 | 8 | 4 | 4 | 1 | 1 | 25 | 100% |
  • 87. Cont.  The rows show the content areas from which the test is to be sampled; the columns indicate the level of thinking students are required to demonstrate in each of the content areas.  Thus, the test items are distributed among each of the five content areas with their corresponding representation among the six levels of the cognitive domain.  The percentage row and column also show the degree of representation of both the contents and the levels of the cognitive domain in this particular test.  Objectives that are more important should get more representation in the test items.  Similarly, content areas on which you have spent more instructional time should be allotted more test items.
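The arithmetic behind such a blueprint is simply proportional allocation. The following minimal Python sketch uses the weights from the hypothetical sample table above to show how the number of items per content area follows from the emphasis given to each area:

```python
# Relative emphasis (instructional time / importance) per content area,
# taken from the sample table above; a 25-item test is assumed.
weights = {"Air pressure": 0.24, "Wind": 0.16, "Temperature": 0.28,
           "Rainfall": 0.20, "Clouds": 0.12}
total_items = 25

# Allocate items in proportion to emphasis. In general, rounding can leave
# the total off by an item or two, so the final counts should be reconciled
# by hand against the intended test length.
allocation = {topic: round(w * total_items) for topic, w in weights.items()}
print(allocation)
# {'Air pressure': 6, 'Wind': 4, 'Temperature': 7, 'Rainfall': 5, 'Clouds': 3}
```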
  • 88. Cont.  There are also other ways of developing a test blueprint.  One of these is to show the distribution of test items among the content areas by the type of test items to be developed from each content area.
  • 89. Cont.

| Contents | True/False | Matching | Short answer | Multiple choice | Total | Percentage |
|---|---|---|---|---|---|---|
| Air pressure | 1 | 1 | 1 | 3 | 6 | 24% |
| Wind | 1 | 1 | 1 | 1 | 4 | 16% |
| Temperature | 1 | 2 | 1 | 3 | 7 | 28% |
| Rainfall | 1 | 1 | 1 | 2 | 5 | 20% |
| Clouds | 1 | - | 1 | 1 | 3 | 12% |
| Total | 5 | 5 | 5 | 10 | 25 | 100% |
| Percent | 20% | 20% | 20% | 40% | 100% | |
  • 90. Arrangement of test items For most purposes the items can be arranged by a systematic consideration of:  the type of items used  the learning outcomes measured  the difficulty of the items, and  the subject matter measured First, the items should be arranged in sections by item type. That is, all true-false items should be grouped together, then all matching items, then all short answer or completion items, and then all multiple choice items.
  • 91. Cont.  Extended-response essay questions and performance tasks usually take so much time that they would normally be administered alone.  If combined with some of the other types of items and tasks, the extended-response tasks should come last.
  • 92. Cont. This has the following advantages: - we will have a single set of directions for each item type - students can maintain the same mental set throughout each section - scoring will be easier Linn and Gronlund (2000: 350-351) suggest the following arrangement order of items by format: - True/False - Matching - Short Answer/Completion - Multiple Choice - Essay
  • 93. Cont.  For this purpose, items that measure similar outcomes should be placed together and then arranged in order of ascending difficulty. For example, the items under the multiple choice section might be arranged in the following order:  knowledge of terms,  knowledge of specific facts,  knowledge of principles, and  application of principles.  Keeping together items that measure similar learning outcomes is especially helpful in determining the type of learning outcomes causing students the greatest difficulty.
  • 94. Cont.  If it is not feasible to group the items by the learning outcomes measured, then it is still desirable to arrange them in order of increasing difficulty.  Beginning with the easiest items and proceeding gradually to the most difficult has a motivating effect on students.  Encountering difficult items early in the test often causes students to spend a disproportionate amount of time on such items.
  • 95. Cont. o Items within each test section can be arranged in order of increasing difficulty. o To summarize, the most effective method for organizing items in the typical classroom test is to:  form sections by item type,  group the items within each section by the learning outcomes measured and by subject matter, and  arrange both the sections and the items within sections in ascending order of difficulty.
  • 96. Administration of Tests  It is the procedure of actually presenting the learning task that the examinees are required to perform in order to ascertain the degree of learning that has taken place during the teaching-learning process.  It is as important as the process of preparing the test.  This is because the validity and reliability of test scores can be greatly reduced when a test is poorly administered.
  • 97. Cont.  This requires the provision of a physical and psychological environment which is conducive for students to make their best efforts. Conditions that may create test anxiety in students include:  Threatening students with tests if they do not behave.  Warning students to do their best “because the test is important”.  Telling students they must work fast in order to finish on time.  Threatening dire consequences if they fail.
  • 98. i. Ensuring Quality in Test Administration  Guidelines and steps in ensuring quality in test administration are:  Collection of the question papers in time from course teacher.  Ensure compliance with the stipulated sitting arrangements  Ensure orderly and proper distribution of questions papers to the test takers.  Do not talk unnecessarily before the test.  Avoid unnecessary remarks, instructions or threat that may develop test anxiety.  remind the test takers of the need to avoid unprofessional conduct
  • 99. Cont.  Avoid giving hints to test takers who ask about particular items.  Make corrections or clarifications to the test takers whenever necessary.  Keep interruptions during the test to a minimum ii) Credibility and Civility Credibility is the value the eventual recipients and users of results of assessment place on the result with respect to the grades obtained, certificates issued or the issuing institution.
  • 100. Conti… Civility is whether the persons being assessed are in such conditions as to give their best without hindrances and burdens in the attributes being assessed and whether the exercise is seen as integral to or as external to the learning process.
  • 101. Cont.  Instructions: A test should contain a set of instructions, which are usually of two types.  One is the instruction to the test administrator, while the other is to the test taker.  The instruction to the test administrator should explain how the test is to be administered, the arrangements to be made for proper administration of the test, and the handling of the scripts and other materials.  The instructions to the administrator should be clear for effective compliance. For the test takers, the instruction should direct them on the amount of work to be done or the tasks to be accomplished.
  • 102. Cont.  The instruction should explain how the test should be performed. The language used for the instruction should be appropriate to the level of the test takers. The administrators should explain the test takers instruction for proper understanding especially when the ability to understand and follow instructions is not part of the test.  Duration of the Test: The time for accomplishing the test is technically important in test administration and should be clearly stated for both the test administrators and test takers. Ample time should be provided for candidates to demonstrate what they know and what they can do. The duration of test should reflect the age and attention span of the test takers and the purpose of the test.
  • 103. Cont.  Venue and Sitting Arrangement: The test environment should be learner friendly.  Adequate physical conditions should be provided, such as work space, good and comfortable writing desks, proper lighting, good ventilation, moderate temperature, conveniences within reasonable distance, and the serenity necessary for maximum concentration.  Adequate lighting, good ventilation and moderate temperature reduce test anxiety and loss of concentration.  Other necessary conditions: These include the requirement that the questions and question papers should be friendly, with bold characters, neat, decent, clear and appealing, and not such that they intimidate the test taker into mistakes.
  • 105. Item Analysis It is the process of examining or analyzing testees’ responses to each item on a test with the basic intent of judging the quality of the items. Item analysis involves determining the difficulty level and discrimination power of test items, and judging how effectively distracters are functioning in the case of multiple-choice items. It helps to determine the adequacy of the items within a test as well as the adequacy of the test itself.
  • 106. Cont. Some of the reasons for item analysis are: 1) Identify content that has not been adequately covered and should be re-taught, 2) Provide feedback to students, 3) Determine if any items need to be revised, be used again or become part of an item file or bank, 4) Identify items that may not have functioned as they were intended, 5) Direct the teacher's attention to individual student weaknesses.
  • 107. Item Difficulty Level Index  It is a measure of the proportion of examinees who answered the item correctly; for this reason it is frequently called the p-value.  If scores from all students in a group are included, the difficulty index is simply the total percent correct.  When there is a sufficient number of scores available (i.e., 100 or more), difficulty indexes are calculated using scores from the top and bottom 27 percent of the group. Item analysis procedures:  Rank the papers in order from the highest to the lowest score.  For each test item, tabulate the number of students in the upper and lower groups who selected each option.
  • 108. Cont.  Compute the difficulty of each item (the percentage of students who answered it correctly).  The item difficulty index can be calculated using the following formula:

P = (Success in HSG + Success in LSG) / N

where HSG = the high scoring group, LSG = the low scoring group, and N = the total number of students in the HSG and LSG combined. Difficulty indexes can range between 0.0 and 1.0 and are usually expressed as percentages. A higher value indicates that a greater proportion of examinees responded to the item correctly, and that the item was thus easier.
  • 109. Cont.

| P-Value | Percent Range | Interpretation |
|---|---|---|
| ≥ 0.75 | 75–100 | Easy |
| ≤ 0.25 | 0–25 | Difficult |
| 0.26–0.74 | 26–74 | Average |

o The average difficulty of a test is the average of the individual item difficulties. o For maximum discrimination among students, an average difficulty of .60 is ideal.
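As a worked illustration with hypothetical counts: suppose 30 students fall in each scoring group, and 24 of the high scorers and 12 of the low scorers answer an item correctly. Then

P = (24 + 12) / (30 + 30) = 36 / 60 = 0.60

which places the item in the “average” band of the table above, exactly at the ideal average difficulty of .60.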
  • 110. Cont.  For criterion-referenced tests, with their emphasis on mastery testing, many items on an exam form will have p-values of .9 or above.  Norm-referenced tests are designed to be harder overall and to spread out the examinees’ scores. Thus, many of the items on an NRT will have difficulty indexes between .4 and .6.
  • 111. Item discrimination index  The index of discrimination is a numerical indicator that enables us to determine whether the question discriminates appropriately between lower scoring and higher scoring students.  When students who earn high scores are compared with those who earn low scores, we would expect to find more students in the high scoring group answering a question correctly than students from the low scoring group.
  • 112.  In the case of very difficult items which no one in either group answered correctly, or fairly easy questions which even the students in the low group answered correctly, the numbers of correct answers might be equal for the two groups.  What we would not expect to find is a case in which the low scoring students answered correctly more frequently than students in the high group.  The item discrimination index can be calculated using the following formula:

D = (Success in HSG − Success in LSG) / (0.5 × (HSG + LSG))

where HSG = the high scoring group and LSG = the low scoring group; the denominator, 0.5 × (HSG + LSG), is the number of students in one group.
  • 113. Cont.  The item discrimination index can vary from -1.00 to +1.00.  A negative discrimination index (between -1.00 and zero) results when more students in the low group answered correctly than students in the high group.  A discrimination index of zero means equal numbers of high and low students answered correctly, so the item did not discriminate between groups.
  • 114. Conti…  A positive index occurs when more students in the high group answer correctly than the low group.  If the students in the class are fairly homogeneous in ability and achievement, their test performance is also likely to be similar, resulting in little discrimination between high and low groups.
  • 115. Cont.  Questions that have an item difficulty index of 1.00 or 0.00 need not be included when calculating item discrimination indices.  An item difficulty of 1.00 indicates that everyone answered correctly, while 0.00 means no one answered correctly.  Neither type of item discriminates between students.  When computing the discrimination index, the scores are divided into three groups, with the top 27% of the scores in the upper group and the bottom 27% in the lower group.
  • 116. Cont.  The number of correct responses for an item by the lower group is subtracted from the number of correct responses for the item in the upper group.  The difference is divided by the number of students in either group. The process is repeated for each item. The value is interpreted in terms of both:  direction (positive or negative), and  strength (non-discriminating to strongly discriminating). The possible range of the discrimination index is -1.00 to +1.00.
  • 117. Cont.

| D-Value | Direction | Strength |
|---|---|---|
| > +.40 | Positive | Strong |
| +.20 to +.40 | Positive | Moderate |
| -.20 to +.20 | None | Non-discriminating |
| < -.20 | Negative | Moderate to strong |
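The two indices and their interpretation bands can be tied together in a short computation. The following is a minimal Python sketch; the function names, group sizes, and counts are hypothetical illustrations, not part of the module:

```python
def item_difficulty(correct_upper, correct_lower, n_upper, n_lower):
    """P = (successes in HSG + successes in LSG) / N."""
    return (correct_upper + correct_lower) / (n_upper + n_lower)

def item_discrimination(correct_upper, correct_lower, n_upper, n_lower):
    """D = (successes in HSG - successes in LSG) / (0.5 * N)."""
    return (correct_upper - correct_lower) / (0.5 * (n_upper + n_lower))

def interpret(p, d):
    """Map P and D onto the interpretation bands in the tables above."""
    difficulty = "easy" if p >= 0.75 else "difficult" if p <= 0.25 else "average"
    if d > 0.40:
        strength = "strong positive"
    elif d >= 0.20:
        strength = "moderate positive"
    elif d > -0.20:
        strength = "non-discriminating"
    else:
        strength = "negative - check for mis-keying"
    return difficulty, strength

# 30 students in each 27% group; 24 upper and 12 lower answered correctly.
p = item_difficulty(24, 12, 30, 30)      # 0.60
d = item_discrimination(24, 12, 30, 30)  # 0.40
print(p, d, interpret(p, d))
```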
  • 118. Cont.  For a small group of students, an index of discrimination for an item that exceeds .20 is considered satisfactory.  For larger groups, the index should be higher because more difference between groups would be expected.  The guidelines for an acceptable level of discrimination depend upon item difficulty.  For very easy or very difficult items, low discrimination levels would be expected; most students, regardless of ability, would get the item correct or incorrect as the case may be.  For items with a difficulty level of about 70 percent, higher discrimination levels would be expected.
  • 119. Cont.  When an item is discriminating negatively, overall the most knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting the item right.  A negative discrimination index may indicate that the item is measuring something other than what the rest of the test is measuring. More often, it is a sign that the item has been mis-keyed.
  • 120. Distracter Analysis  One important element in the quality of a multiple choice item is the quality of the item’s distracters. However, neither the item difficulty nor the item discrimination index considers the performance of the incorrect response options, or distracters.  A distracter analysis evaluates the effectiveness of the distracters in each item by comparing the number of students in the upper and lower groups who selected each incorrect alternative (a good distracter will attract more students from the lower group than the upper group).
  • 121. Cont.  Just as the key, or correct response option, must be definitively correct, the distracters must be clearly incorrect (or clearly not the “best” option). In addition to being clearly incorrect, the distracters must also be plausible. That is, the distracters should seem likely or reasonable to an examinee who is not sufficiently knowledgeable in the content area.  If a distracter appears so unlikely that almost no examinee will select it, it is not contributing to the performance of the item. In fact, the presence of one or more implausible distracters in a multiple choice item can make the item artificially far easier than it ought to be.
  • 122. Cont.  It is not desirable to have one of the distracters chosen more often than the correct answer. This result indicates a potential problem with the question.  If students do not know the correct answer and are purely guessing, their answers would be expected to be distributed among the distracters as well as the correct answer.  If one or more distracters are not chosen, the unselected distracters probably are not plausible. If the teacher wants to make the test more difficult, those distracters should be replaced in future tests.
  • 123. Cont.  Whenever the proportion of examinees who selected a distracter is greater than the proportion of examinees who selected the key, the item should be examined to determine if it has been mis-keyed or double-keyed.  If examinees consistently fail to select a given distracter, this may be evidence that the distracter is implausible or simply too easy.
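A distracter analysis itself needs no more than a tally and a comparison. Here is a minimal Python sketch, with hypothetical option counts for one multiple-choice item, of the checks described above:

```python
# Choices of the top-27% and bottom-27% groups (30 students each) for a
# hypothetical four-option item whose key is B.
upper = {"A": 3, "B": 25, "C": 2, "D": 0}
lower = {"A": 10, "B": 12, "C": 8, "D": 0}
key = "B"

for option in upper:
    if option == key:
        continue  # the key is judged via the difficulty/discrimination indices
    if upper[option] + lower[option] == 0:
        print(f"{option}: never chosen - probably implausible; consider replacing it")
    elif upper[option] >= lower[option]:
        print(f"{option}: attracts the upper group - review the item for mis-keying or ambiguity")
    else:
        print(f"{option}: functioning ({lower[option]} low vs {upper[option]} high scorers)")
```

In this hypothetical tally, A and C are functioning distracters, while D is never chosen and therefore adds nothing to the item.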
  • 124. Item Banking  Building a file of effective test items and assessment tasks involves recording the items or tasks, adding information from analyses of students’ responses, and filing the records by both the content area and the objective that the item or task measures.  Such a file is especially valuable in areas of complex achievement, where the construction of test items and assessment tasks is difficult and time consuming. When enough high-quality items and tasks have been assembled, the burden of preparing tests and assessments is considerably lightened. Computer item banking makes these tasks even easier.
  • 125. UNIT FOUR RELIABILITY AND VALIDITY OF ASSESSMENT TOOLS VALIDITY  Validity is the most important idea to consider when preparing or selecting a test or other measuring instrument for use. The drawing of correct conclusions or making decisions based on the data obtained from an assessment is validity.  If the test lacks validity, the information it provides is useless. The validity of a test can be viewed as the "correctness" of the decisions or conclusions made from performance of students gathered through the tests.  Validity has been defined as referring to the appropriateness, meaningfulness, and usefulness of the decisions teachers make based on the information they collect from their students using tests and other instruments.
  • 126. Cont.  It should be related to (1) performance on a universe of items (content validity), (2) performance on some criterion (criterion-related validity), or (3) the degree to which certain psychological traits or constructs are actually represented by test performance (construct validity).  The term validity refers to the accuracy of test results of students; that is, it addresses the question of how confident can we be that a test actually indicates a person's true score on a trait. Types of Validity  Validity is divided into three categories or types. A. Content Validity  The relevant type of validity in the measurement of a behavior is content validity
  • 127.  In assessing the content validity of a test, the teacher asks, “To what extent does the test require students to demonstrate all aspects of the knowledge or other behavior being measured?” This type of validity refers to the adequacy of the assessment. According to Whitely (1996), adequate assessment has two components: 1. Relevance of the content of the test to the objectives or behavior being measured, and 2. Representativeness of the content in the test.  A relevant test assesses only those objectives that are stated in the instruction.  For a test to have high content validity, it should be a representative sample of both the objectives and the contents being measured.
  • 128. Cont. For critical examination of the items of a measure in relation to the behavior or the purpose of the test, one must make the following professional judgments: 1. Does the test content parallel the instructional objectives to be assessed? 2. Do the items of the measure cover different aspects of the course and objectives? B. Criterion-Related Validity  Criterion-related validity indicates whether the scores on a test predict scores on a well-specified, predetermined criterion.  There are two types of criterion-related validity: concurrent validity and predictive validity.
  • 129. Cont. Concurrent validity uses correlation coefficients to describe the degree of relationship between the scores of two tests given to students at about the same time. A high relationship suggests that the tests are assessing something similar. This type of evidence is used in the development of new standardized tests or other measuring instruments that measure – in a different, perhaps more efficient, way – the same thing as an old instrument. The purpose of doing this is usually to substitute one test for another. In predictive validity, the data on the criterion variable are collected some time after the data on the predictor variable are collected. Take the case of the ESLCE and college performance. Students take the national examination in May. Their college performances are collected months later, in February. In both concurrent and predictive procedures, a test is related to a criterion measure.
  • 130. C. Construct Validity  The term construct refers to a psychological construct, a theoretical conceptualization about an aspect of human behavior that cannot be measured or observed directly (Ebel and Frisbie, 1991).  Construct validity is an interpretation or meaning that is given to a set of scores from tests that assess a behavior or theory that cannot be measured directly, such as an unobservable trait like intelligence, creativity, or anxiety. For example, we use tests to measure intelligence. Intelligence is a variable that we cannot directly observe.  We infer it from the students’ test scores. Students who score high on the test are said to be intelligent.
  • 131. Cont. Face Validity  Strictly speaking, face validity is not a major type of validity. It refers to the degree to which the content of the test looks valid, or the extent to which a test appears to measure what it is intended to measure, to those who, for example, prepared the test items, administer the test, and/or evaluate it (Worthen et al., 1999). Face validity may not be as important as content validity, criterion-related validity or construct-related validity from a measurement perspective.
  • 132.  Validity is influenced by a number of factors. The following are some major ones. Factors in the test itself  The following factors can prevent the test items from functioning as intended and thereby lower the validity of the interpretations from the assessment results. The first five factors are equally applicable for assessments requiring extended student performance and traditional tests. The last five factors apply most directly to tests with fixed choice or short answer items that are scored right or wrong.
  • 133. Cont.  Unclear direction  Too difficult vocabulary and sentence structure  Ambiguity in sentence structure  Inadequate time limits  Overemphasis of easy to-assess aspects of domain at the expense of important –but hard to assess aspects  Test items inappropriate for the outcomes being measured  Poorly constructed test items  Test too short  Improper arrangement of items  Identifiable pattern of answers
  • 134. Cont. Factors in administration and scoring  In the case of teacher-made tests, such factors as insufficient time, unfair aid to individual students who ask for help, cheating, and unreliable scoring of student performances tend to lower validity. In the case of published tests, failure to follow the standard directions and time limits, giving students unauthorized assistance, and errors in scoring similarly contribute to lower validity. Factors in student responses  Some students may be bothered by emotional disturbances that interfere with their performance. Others may be frightened by the assessment situation and so are unable to respond normally, and still others may not be motivated to put forth their best effort.
  • 135. RELIABILITY:  The degree of consistency between two measures of the same thing.  The measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time.  Reliability can be defined as a measure of how consistent our measurements are. Scores that are highly reliable are accurate and can be reproduced. Reliability refers to the consistency of assessment results over time and with different samples of students. It is the adequacy of assessment devices to assess what they are supposed to assess again and again.
  • 136. Cont.  Consistency can be: a) consistency over a period of time (stability), b) consistency over different forms (equivalence), and c) internal consistency.  Consistent measurement is a necessary condition for high quality educational and psychological testing.  Although reliability is a necessary condition for valid test scores, it is not a sufficient condition.  Reliability affects validity and the quality of decisions.
  • 137. Methods of estimating reliability  There are several methods of estimating reliability of a measuring instrument or a test. The common ones are stability, equivalence, stability and equivalence, internal consistency, and rater agreement. Stability (Test-retest)  A coefficient of stability is obtained by correlating scores from the same test of a group of individuals on two different occasions. If the scores of the individuals are consistent (that is, if those scoring high the first time also score high the second time, and so on) then the correlation coefficient, and the reliability, are high. This test-retest procedure assumes that the characteristic measured remains constant.
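Computationally, the coefficient of stability is just the Pearson correlation between the two sets of scores. A minimal Python sketch, with hypothetical scores for six students:

```python
from statistics import correlation  # Pearson r; available in Python 3.10+

first = [55, 62, 70, 48, 80, 66]   # scores on the first administration
second = [58, 60, 73, 50, 78, 64]  # same students, same test, later occasion

r = correlation(first, second)     # coefficient of stability
print(round(r, 2))                 # close to 1.0 -> stable measurement
```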
  • 138. Cont. Equivalence (Parallel forms)  It is obtained by giving two forms (with equal content, means, and variances) of a test to the same group on the same day and correlating these results. Here we are determining how confidently we can generalize a person’s score to what he would receive if he took a test composed of similar but different questions. When two equivalent or parallel forms of the same instrument are administered to the same group of students at about the same time, and the scores are related, the reliability that results is a coefficient of equivalence.
  • 139. Cont. Equivalence and Stability  When a teacher needs to give a pretest and posttest to assess a change in behavior, a reliability coefficient of equivalence and stability should be established. In this procedure, reliability data are obtained by administering to the same group of individuals one form of a test at one time and a second form at a later date. To minimize the effects of memory, chance and maturation factors, reliability coefficients estimated from parallel-forms of a test are preferred to test-retest reliability coefficients.
  • 140. Cont. Internal Consistency  Internal consistency is another type of estimating reliability. It is the most common type of reliability since it can be estimated from giving one form of a test once. There are three common types of internal consistency: Split-half method, Kuder-Richardson, and the Cronbach Alpha methods. a. The Split-Half Method In split-half reliability, the items of a test that have been administered to a group are divided into two comparable halves, and a correlation coefficient is calculated between the halves. If each student has about the same scores on each half, then the correlation is high and the test has high reliability.
  • 141. Cont.  Each half should be of similar difficulty. This method provides a lower reliability estimate than other methods, since the total number in the correlation equation contains only half the items (and we know that, other things being equal, longer tests are more reliable than shorter tests).  This technique should not be used with speeded tests. This is because not all students answer all items, a factor that tends to increase the correlations between the items.  In splitting the test into two halves, one might, for example, put the odd-numbered items in one half and the even-numbered items in the other.
• 142. Cont.  Fortunately, one can estimate the reliability (rxx) of the full test from roe via the Spearman-Brown formula: rxx = 2roe / (1 + roe), where roe = the Pearson correlation between the half-test scores on the odd and the even items. The Spearman-Brown split-half method assumes the two halves have equal standard deviations. Kuder-Richardson Method  Kuder and Richardson developed a number of formulas that, in effect, correlate all items on a single test with each other when each item is scored right or wrong, correct or incorrect, yes or no, and so on. K-R reliability is thus determined from a single administration of a test to the same group of students, but without having to split the test into equivalent halves.
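Putting the split-half procedure and the Spearman-Brown correction together, a minimal Python/NumPy sketch (with a hypothetical 0/1 item-score matrix, not data from the slides) looks like this:

    import numpy as np

    # Hypothetical data: rows = 6 students, columns = 8 items scored 1/0.
    items = np.array([
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 0, 0, 1],
        [1, 1, 0, 1, 1, 0, 1, 0],
        [1, 0, 1, 0, 0, 1, 0, 1],
        [0, 1, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0, 0],
    ])

    # Split into odd- and even-numbered items and total each half.
    odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
    even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

    # roe: Pearson correlation between the two half-test scores.
    r_oe = np.corrcoef(odd_half, even_half)[0, 1]

    # Spearman-Brown correction to estimate full-test reliability.
    r_xx = 2 * r_oe / (1 + r_oe)
    print(f"Half-test r: {r_oe:.2f}, corrected rxx: {r_xx:.2f}")

Note how the corrected coefficient is higher than the raw half-test correlation, compensating for the halved test length.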
  • 143. Cont.  This procedure assumes that all items in the test are equivalent to each other, and it is appropriate when the purpose of the test is to measure a single behavior, for example, reading ability of students.  If a test has items of varying difficulty or if it measures more than one behavior, the KR estimates would usually be lower than the split-half reliabilities.
• 144.  The Cronbach Alpha, sometimes called the Alpha Coefficient, developed by Cronbach (1951), also assumes that all items have similar difficulty levels. It is a much more general form of internal consistency than the KR formulas, and it is used for items that are not scored simply right or wrong, yes or no, or true or false. The Cronbach Alpha is generally the most appropriate type of reliability estimate for tests or questionnaires in which there is a range of possible answers for each item.
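The alpha computation itself is short enough to sketch directly. The Python/NumPy example below uses hypothetical ratings on a 1-5 scale; with 0/1 item scores, the identical formula reduces to the Kuder-Richardson KR-20 coefficient:

    import numpy as np

    # Hypothetical data: rows = 5 students, columns = 4 items on a 1-5 scale.
    ratings = np.array([
        [4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [3, 2, 3, 2],
        [4, 4, 5, 4],
    ])

    k = ratings.shape[1]                          # number of items
    item_vars = ratings.var(axis=0, ddof=1)       # variance of each item
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of total scores

    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(f"Cronbach's alpha: {alpha:.2f}")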
• 145. Rater Agreement The fifth type of reliability is expressed as a coefficient of agreement. This is established by determining the extent to which two or more persons agree about what they have seen, heard, scored, or rated. For example, when two or more teachers score students' answers to essay items, will they give the same or similar scores, i.e., do they agree on what they score? If they do, then there is some consistency in measurement.
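As an illustrative sketch (the essay grades below are hypothetical), rater agreement can be quantified with raw percent agreement, or with Cohen's kappa, which corrects for the agreement two raters would reach by chance alone:

    from collections import Counter

    # Hypothetical letter grades given by two teachers to 8 essays.
    rater_a = ["A", "B", "B", "C", "A", "B", "C", "A"]
    rater_b = ["A", "B", "C", "C", "A", "B", "B", "A"]
    n = len(rater_a)

    # Observed proportion of exact agreements.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[g] * count_b[g] for g in count_a) / n**2

    # Cohen's kappa: agreement beyond chance, scaled to a maximum of 1.
    kappa = (p_o - p_e) / (1 - p_e)
    print(f"Percent agreement: {p_o:.2f}, Cohen's kappa: {kappa:.2f}")

Here the raters agree on 6 of 8 essays (0.75), but kappa (about 0.62) is lower because some of that agreement would occur by chance.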
• 146. FACTORS INFLUENCING RELIABILITY A number of factors influence the reliability of test scores. a. Test related factors  These factors include test length, difficulty of test items and score variability.  Test length. Other things being equal, the larger the number of items, the higher the reliability of the test (see the worked illustration after the next slide).  Difficulty of test items. Score variability depends on the difficulty level of items. If items are too difficult, only a few students will answer them correctly and scores will all be low. On the other hand, if items are too easy, most students will answer them correctly and scores will mostly be high. In both instances, scores do not vary very much, which contributes to a low reliability index. In contrast, on moderately difficult items students' scores are likely to vary, which results in a higher reliability index.
• 147. Cont.  According to Ebel & Frisbie (1991), items with a difficulty level of 40% to 80% contribute much to reliability. On the other hand, items that more than 90 percent or fewer than 30 percent of the examinees answer correctly contribute little or nothing to reliability. Score Variability  The more scores vary, the higher reliability tends to be. Compared to true-false items, multiple-choice items yield higher reliability indices. This is because on true-false items students have a 50% probability of getting the correct answer by chance, which contributes to low score variability among students. On multiple-choice items with four options, on the other hand, the probability of getting an item right by chance is 25%, which results in better score variability among students.
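As a worked illustration of the test-length point from the previous slide (the numbers are chosen purely for illustration): the general Spearman-Brown prophecy formula predicts the reliability of a test lengthened by a factor n,

    r_new = (n × r) / (1 + (n − 1) × r)

Doubling (n = 2) a test whose reliability is r = 0.60 is predicted to raise reliability to (2 × 0.60) / (1 + 0.60) = 0.75, while halving it (n = 0.5) would lower it to (0.5 × 0.60) / (1 + (0.5 − 1) × 0.60) = 0.30 / 0.70, or about 0.43, assuming the added or removed items are of comparable quality.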
• 148. b. Student related factors  These factors include the nature of the group tested, student testwiseness, and student motivation.  Nature of the group tested. Assume that all students in your class are brilliant. When you administer a test, they all score very high, and the difference between the highest and the lowest score is very narrow. If all the students are weak, their scores will all be low, and the range between the maximum and minimum will again be narrow. But if there are high achieving, average, and low achieving students in the same classroom, their scores vary from high to low. Thus, reliability is higher for heterogeneous groups of students than for homogeneous ones.
• 149. Cont.  Student testwiseness. Some students are wise when taking tests. Even when they do not know the content, they are very clever at guessing the correct answer to an item using clues in the test. Students therefore vary in their test-taking skills, and this variation creates differences in students' scores that have nothing to do with achievement. When students vary considerably in their level of testwiseness, error scores will be higher, which in turn lowers the reliability of scores.
• 150. Cont.  Student Motivation. Who performs well: a motivated student or an unmotivated student? Does motivation have any effect on the academic performance of students? Obviously, academic motivation has a strong effect on students' performance. Students who are academically motivated tend to achieve higher than those who are not. When students are unmotivated, their scores may not reflect their actual ability. In a classroom where students vary in motivation, scores will vary for reasons unrelated to achievement, leading to lowered reliability.
• 151. c. Test administration factors  These factors include time limits and cheating opportunities.  Time Limits. Whether a test is speeded or a power test matters for the reliability of the test. When internal consistency reliability indices are computed from a single administration of a speeded test, they are misleading (typically inflated), because on speeded tests students' scores largely reflect the number of items attempted. Thus, it is suggested that when reliability is to be estimated for speeded tests, two test administrations be used and the correlation between them calculated.  Cheating. Any form of cheating reduces score reliability. Cheating includes copying answers from others, using cheat sheets, passing answers to other exam halls, obtaining a test prior to its administration, etc.
  • 152. Ethical Standards of assessment Ethical and Professional Standards of Assessment and its Use  Ethical standards guide teachers in fulfilling their obligation to provide and use tests that are fair to all test takers regardless of age, gender, disability, ethnicity, religion, linguistic background, or other personal characteristics.
• 153. Cont.  Fairness is a primary consideration in all aspects of testing. It:  helps to ensure that all test takers are given a comparable opportunity to demonstrate what they know and how they can perform in the area being tested;  implies that every test taker has the opportunity to prepare for the test and is informed about the general nature and content of the test;  extends to the accurate reporting of individual and group test results.
• 154. The following are ethical and professional standards that teachers may consider in their assessment practices. 1. Teachers should be skilled in choosing assessment methods appropriate for instructional decisions. 2. Teachers should develop tests that meet the intended purpose and that are appropriate for the intended test takers. 3. Teachers should be skilled in administering, scoring and interpreting the results from diverse assessment methods.
• 155. Cont 4. Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and improving schools. 5. Teachers should be skilled in developing valid pupil grading procedures which use pupil assessments. 6. Teachers should be skilled in communicating assessment results to students, parents, other stakeholders, and other educators. 7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
  • 156. Cont. In addition, the following are principles of grading that can guide the development of a grading system.  The system of grading should be clear and understandable (to parents, other stakeholders, and most especially students).  The system of grading should be communicated to all stakeholders (e.g., students, parents, administrators).  Grading should be fair for all students regardless of gender, socioeconomic status or any other personal characteristics.  Grading should support, enhance, and inform the instructional process.
• 157. Cultural and linguistic diversity in assessments  Fairness is fundamentally a socio-cultural, rather than a technical, issue.  Students represent a variety of cultural and linguistic backgrounds. If these backgrounds are ignored, students may become alienated or disengaged from the learning and assessment process. Teachers need to be aware of how such backgrounds may influence student performance and the potential impact on learning. Teachers should be ready to provide accommodations where needed.
  • 158. Cont Classroom assessment practices should be sensitive to the cultural and linguistic diversity of students in order to obtain accurate information about their learning. Assessment practices that attend to issues of cultural diversity include those that  acknowledge students’ cultural backgrounds.  are sensitive to those aspects of an assessment that may hamper students’ ability to demonstrate their knowledge and understanding.  use that knowledge to adjust or scaffold assessment practices if necessary.
  • 159. Cont Assessment practices that attend to issues of linguistic diversity include those that  acknowledge students’ differing linguistic abilities.  use that knowledge to adjust or scaffold assessment practices if necessary.  use assessment practices in which the language demands do not unfairly prevent the students from understanding what is expected of them.  use assessment practices that allow students to accurately demonstrate their understanding by responding in ways that accommodate their linguistic abilities, if the response method is not relevant to the concept being assessed (e.g., allow a student to respond orally rather than in writing).
• 160. Cont  Teachers must make every effort to address and minimize the effect of bias in classroom assessment practices. Bias occurs when irrelevant or arbitrary factors systematically influence the interpretation of results or the performance of an individual student or a subgroup of students.  Assessment should be culturally and linguistically appropriate, fair and bias-free. For an assessment task to be fair, its content, context, and performance expectations should:  reflect knowledge, values, and experiences that are equally familiar and appropriate to all students;  tap knowledge and skills that all students have had adequate time to acquire;  be as free as possible of cultural and ethnic stereotypes.
• 161. Disability and Assessment Practices  It is quite obvious that the education systems of many countries have been exclusionary, failing to fully accommodate the educational needs of students with disabilities.  This has been true not only in our country but in the rest of the world as well, although the magnitude differs from country to country.  It was in response to this situation that UNESCO has been promoting the principle of inclusive education to guide the educational policies and practices of all governments.
  • 162. Cont  Different world conventions were held and documents signed towards the implementation of inclusive education. Our country, Ethiopia, has been a signatory of these documents and therefore has accepted inclusive education as a basic principle to guide its policy and practice in relation to the education of disabled students
• 163. Cont  Inclusive education is based on the idea that all students, including those with disabilities, should be provided with the best possible education to develop themselves. This implies the provision of all possible accommodations to address the educational needs of students with disabilities. Accommodations should not be limited to the teaching and learning process; they should also extend to assessment mechanisms and procedures.
• 164. cont  There are different strategies that can be considered to make assessment practices accessible to students with disabilities, depending on the type of disability. The following strategies could be considered in summative assessments:  Modifying assessments: - This should enable disabled students to have full access to the assessment without giving them any unfair advantage.  Others' support: - Disabled students may need the support of others in certain assessment activities that they cannot do independently. For instance, they may require readers and scribes in written exams; they may also need others' assistance in practical activities, such as using equipment, locating materials, drawing and measuring.
• 165. Cont  Time allowances: - Disabled students should be given additional time to complete their assessments; how much extra time is appropriate is for the individual instructor to decide based on the purpose and nature of the assessment.  Rest breaks: Some students may need rest breaks during the examination. This may be to relieve pain or to attend to personal needs.  Flexible schedules: In some cases disabled students may require flexibility in the scheduling of examinations. For example, some students may find it difficult to manage a number of examinations in quick succession and need to have examinations scheduled over a period of days.
• 166. cont  Alternative methods of assessment: - In certain situations where formal methods of assessment may not be appropriate for disabled students, the instructor should assess them using non-formal methods such as class work, portfolios, oral presentations, etc.  Assistive Technology: Specific equipment may need to be available to the student in an examination. Such arrangements often include the use of personal computers, voice-activated software and screen readers.
  • 167. Gender issues in assessment  Teachers’ assessment practices can also be affected by gender stereotypes. The issues of gender bias and fairness in assessment are concerned with differences in opportunities for boys and girls. A test is biased if boys and girls with the same ability levels tend to obtain different scores.
  • 168. cont Test questions should be checked for:  material or references that may be offensive to members of one gender,  references to objects and ideas that are likely to be more familiar to men or to women,  unequal representation of men and women as actors in test items or representation of members of each gender only in stereotyped roles.
  • 169. Cont.  If the questions involve objects and ideas that are more familiar or less offensive to members of one gender, then the test may be easier for individuals of that gender. Standards for achievement on such a test may be unfair to individuals of the gender that is less familiar with or more offended by the objects and ideas discussed, because it may be more difficult for such individuals to demonstrate their abilities or their knowledge of the material.