1. Mulatu Desalew (MSc in Clinical Psychology)
Clinical Assessment and Report
DIRE DAWA UNIVERSITY
COLLEGE OF SOCIAL SCIENCES AND HUMANITIES
DEPARTMENT OF PSYCHOLOGY
ASSESSMENT AND EVALUATION OF LEARNING
(PGDT)
2. Basic Concepts or Terms
• Test
• Measurement
• Assessment
• Evaluation
• What do you know about these concepts?
• Are they similar?
• Can you differentiate these concepts?
Let's learn to teach before we learn to test!!!
9/9/2022
3. Test
• A test is the most commonly used assessment method in education.
• A test is a particular form of measurement.
• It is a formal, systematic, usually paper-and-pencil procedure for gathering information about pupils' behavior or performance.
• It is the presentation of a standard set of questions to be answered by students.
• It is designed to measure a quality, ability, skill, or knowledge.
• There is a right or wrong answer.
4. Measurement
• It is the process of quantifying, or assigning a number to, performance.
• Measurement can take many forms, ranging from the application of very elaborate and complex electronic devices, to paper-and-pencil exams, to rating scales or checklists.
• It is the process of obtaining a numerical description of the degree to which an individual possesses a given characteristic.
• We measure temperature, for example, and express it in degrees Celsius.
• We need appropriate, standard instruments such as a ruler, a meter stick, or a weighing scale.
5. Assessment
• It is a general term that includes all the different ways teachers gather information in their classrooms.
• It includes observations, oral questions, paper-and-pencil tests, homework, lab work, research papers, and the like.
• It is a process of collecting, synthesizing, and interpreting information to aid in decision making (evaluation).
• It is the process by which evidence of student achievement is obtained and evaluated.
• It comprises methods of measuring and evaluating the nature of the learner (what was learned, how it was learned).
6. Evaluation
• It is the process of making judgments about pupils' performance, instruction, or classroom climate.
• It occurs after assessment information has been collected, synthesized, and thought about, because this is when the teacher is in a position to make informed judgments.
• It is concerned with making judgments on the worth or value of a performance, answering the question "how good, adequate, or desirable?"
• It is also the process of obtaining, analyzing, and interpreting information to determine the extent to which students achieve instructional objectives.
• Pass or fail; black or white.
7. Evaluation
• Evaluation includes both quantitative and
qualitative description of pupil behavior plus
value judgment concerning the desirability of
that behavior.
8. • Assessment is important because:
• it drives and directs students’ learning
• provides feedback to students on their performance
• provides feedback on instruction
• ensures that standards of learning progression are
met, and
• it provides the staff with feedback about their
effectiveness as teachers.
Purposes of Assessment
9. • Generally, assessment in education focuses on:
• Helping LEARNING: making decisions concerning remediation, enrichment, selection, exceptionality, progress, and certification.
• Improving TEACHING: assessment provides information about the attainment of objectives and the effectiveness of teaching methods and learning materials.
• It also provides information that serves as a basis for a variety of educational decisions, such as:
• instructional management decisions, selection decisions, placement decisions, counseling and guidance decisions, classification decisions, and credentialing and certification decisions.
10. Learning Objectives
• A learning objective is an outcome statement that captures specifically what knowledge, skills, and attitudes learners should be able to exhibit following instruction.
• Effective assessment practice requires relating the assessment procedures as directly as possible to the learning objectives.
• Learning objectives play a key role in both the instructional process and the assessment process.
• They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students' learning.
11. Principles of Assessment
• The assessment principles developed by Miller, Linn, and Gronlund are:
1. Assessment should provide relevant and useful information.
2. Assessment should be appropriate.
3. Assessment should be fair and accurate.
4. Assessment should be integrated into the teaching and
learning cycle.
5. Assessment should incorporate feedback mechanisms.
6. Assessment should draw on a wide range of evidence.
7. Assessment should be manageable.
12. Assessment Assumptions
• Angelo and Cross (1993) have listed seven basic
assumptions of classroom assessment as follows:
• 1. The quality of student learning is directly, although
not exclusively related to the quality of teaching.
• Therefore, one of the most promising ways to improve
learning is to improve teaching.
• 2. To improve their effectiveness, teachers need first to make their goals and objectives explicit and then to get specific, comprehensible feedback on the extent to which they are achieving those goals and objectives.
13. Cont’
• 3. To improve their learning, students need to receive appropriate and focused feedback early and often; they also need to learn how to assess their own learning.
• 4. The type of assessment most likely to improve teaching and learning is that conducted by teachers to answer questions in response to issues or problems in their own teaching.
• 5. Systematic inquiry and intellectual challenge are powerful sources of motivation, growth, and renewal for teachers, and classroom assessment can provide such challenge.
14. Cont’
• 6. Classroom assessment does not require specialized training; it can be carried out by dedicated teachers from all disciplines.
• 7. By collaborating with colleagues and actively involving students in classroom assessment efforts, teachers (and students) enhance learning and personal satisfaction.
15. Unit Two:
Assessment Strategies, Methods,
and Tools
Types of assessment
• There are different approaches in conducting
assessment in the classroom.
• Here we will examine three pairs of assessment typologies, namely:
• Formal vs. informal,
• Criterion referenced vs. norm referenced,
• Formative vs. summative assessments.
16. Formal and Informal Assessment
• Formal Assessment:
• This usually implies a written document, such as a
test, quiz, or paper.
• A formal assessment is given a numerical score or
grade based on student performance.
• Informal Assessment:
• Informal assessment comprises techniques that can easily be incorporated into classroom routines and learning activities.
• Such assessment techniques can be used at any time without interfering with instructional time.
• Their results are indicative of the student's performance on the skill or subject of interest.
17. Informal Assessment (cont'd)
• An informal assessment usually occurs in a more
casual manner and may include;
• observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, peer and self-evaluation, and discussion.
• Informal assessment seeks to identify the strengths
and needs of individual students without regard to
grade or age norms.
• Methods for informal assessment can be divided into
two main types:
• unstructured (e.g., student work samples, journals) and
• structured (e.g., checklists, observations).
18. Informal Assessment (cont'd)
• The unstructured methods frequently are somewhat
more difficult to score and evaluate, but they can
provide a great deal of valuable information about the
skills of the students.
• Structured methods can be reliable and valid
techniques when time is spent creating the "scoring"
procedures.
• In informal assessments, students are actively involved in the evaluation process - they are not just paper-and-pencil tests.
19. Formative and Summative Assessments
• Formative Assessment:
• Formative assessments are used to shape and guide
classroom instruction.
• They can include both informal and formal
assessments and help us to gain a clearer picture of
where our students are and what they still need help
with.
• They can be given before, during, and even after instruction, as long as the goal is to improve instruction (an ongoing process).
• They serve a diagnostic function for both students and
teachers.
20. Formative cont’d
• Both teachers and students receive feedback.
• Formative assessment is also known as 'assessment for learning' and 'continuous assessment'.
• Continuous assessment (as opposed to terminal assessment) is based on the premise that assessment must be ongoing:
• if assessment is to help students improve their learning, and
• if a teacher is to determine the progress of students towards the achievement of the learning goals.
21. Strategies of formative assessment
• Have students write their understanding of vocabulary or concepts before and after instruction.
• Ask students to summarize the main ideas they've taken away from your presentation, discussion, or assigned reading.
• Have students complete a few problems or questions at the end of instruction, and check the answers.
• Assign brief, in-class writing assignments (e.g., "Why is this person or event representative of this time period in history?").
• Tests and homework can also be used formatively if teachers analyze where students are in their learning and provide specific, focused feedback regarding performance and ways to improve it.
22. Summative Assessment
• Summative assessment typically comes at the end of a course (or unit) of instruction.
• It evaluates the quality of students' learning and assigns a mark to students' work based on how effectively learners have addressed the performance standards and criteria.
• It includes achievement tests, ratings of various types of performance, and assessment of products (reports, drawings, etc.).
• A particular assessment task can be both formative and summative (if it counts toward the final grade).
• Students could also receive extensive feedback on their work.
23. Criterion-referenced and Norm-
referenced Assessments
• Criterion-referenced Assessment: This type of assessment allows us to quantify the extent to which students have achieved the goals of a unit of study or a course.
• It is carried out against previously specified criteria and performance standards.
• Criterion referenced classrooms are mastery-
oriented, informing all students of the expected
standard and teaching them to succeed on related
outcome measures.
• Criterion referenced assessments help to eliminate
competition and may improve cooperation.
24. Norm-referenced Assessment
• This type of assessment has as its end point the
determination of student performance based on a
position within a cohort of students – the norm
group.
• This type of assessment is most appropriate when
one wishes to make comparisons across large
numbers of students or important decisions
regarding student placement and advancement.
• The criterion-referenced assessment emphasizes
description of student’s performance, and
• The norm-referenced assessment emphasizes
discrimination among individual students in terms
of relative level of learning.
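The contrast between the two reference frames can be shown in a short sketch. This is a hypothetical example: the class scores and the mastery cutoff of 70 are invented for illustration, not drawn from the slides.

```python
# Illustrative only: invented class scores and an arbitrary mastery cutoff.
scores = {"Abel": 72, "Sara": 85, "Lena": 58, "Milkias": 91, "Hana": 66}
MASTERY_CUTOFF = 70  # criterion-referenced: a fixed, preset performance standard

def criterion_referenced(score, cutoff=MASTERY_CUTOFF):
    """Describe performance against the criterion, ignoring other students."""
    return "mastered" if score >= cutoff else "not yet mastered"

def percentile_rank(score, all_scores):
    """Norm-referenced: locate the score within the cohort (the norm group)."""
    below = sum(1 for s in all_scores if s < score)
    return round(100 * below / len(all_scores))

for name, s in scores.items():
    print(name, criterion_referenced(s),
          f"percentile rank = {percentile_rank(s, list(scores.values()))}")
```

Note how the criterion-referenced function describes each student's performance against the standard, while the percentile rank only discriminates among students relative to one another.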
25. Assessment Strategies
• Assessment strategy refers to those assessment
tasks (methods/approaches/activities) in which
students are engaged to ensure that all the
learning objectives of a subject, a unit or a lesson
have been adequately addressed.
• Assessment strategies range from informal, almost
unconscious, observation to formal examinations.
• There are many different ways to categorize
learning goals for students.
• Categorizing helps us to thoroughly think through
what we want students to know and be able to do.
26. The learning goals can be categorized as follows:
• Knowledge and understanding: What facts do
students know outright? What information can they
retrieve? What do they understand?
• Reasoning proficiency: Can students analyze,
categorize, and sort into component parts? Can they
generalize and synthesize what they have learned?
Can they evaluate and justify the worth of a process or
decision?
• Skills: We have certain skills that we want students to
master such as reading fluently, working productively in
a group, making an oral presentation, speaking a
foreign language, or designing an experiment.
27. Cont’
• Dispositions: We also frequently care about
student attitudes and habits of mind, including
attitudes toward school, persistence,
responsibility, flexibility, and desire to learn.
• Ability to create products: Another kind of
learning target is student-created products -
tangible evidence that the student has
mastered knowledge, reasoning, and specific
production skills. Examples include a research paper, a piece of furniture, or artwork.
28. Assessment strategies that can be used
by classroom teachers
• Classroom presentations: A classroom presentation
is an assessment strategy that requires students to
verbalize their knowledge, select and present samples
of finished work, and organize their thoughts about a
topic in order to present a summary of their learning.
• It may provide the basis for assessment upon
completion of a student’s project or essay.
• Conferences: A conference is a formal or informal
meeting between the teacher and a student for the
purpose of exchanging information or sharing ideas.
• A conference might be held to explore the student’s
thinking and suggest next steps; assess the student’s
level of understanding of a particular concept or
procedure.
29. Cont’
• Exhibitions/Demonstrations: An exhibition/
demonstration is a performance in a public
setting, during which a student explains and
applies a process, procedure, etc., in concrete
ways to show individual achievement of
specific skills and knowledge.
• Interviews:
• Observation: Observation is a process of
systematically viewing and recording students
while they work, for the purpose of making
instruction decisions.
30. Cont’
• Performance tasks: During a performance task,
students create, produce, perform, or present works on
"real world" issues. The performance task may be used
to assess a skill or proficiency, and provides useful
information on the process as well as the product.
• Portfolios: A portfolio is a collection of samples of a
student’s work over time.
• It offers a visual demonstration of a student’s
achievement, capabilities, strengths, weaknesses,
knowledge, and specific skills, over time and in a
variety of contexts.
• For a portfolio to serve as an effective assessment
instrument, it has to be focused, selective, reflective,
and collaborative.
31. Cont’
• Questions and answers:
• Strategies for effective question and answer
assessment include:
• Apply a wait time or 'no hands-up rule' to
provide students with time to think after a
question before they are called upon randomly
to respond.
• Ask a variety of questions, including closed-ended and open-ended questions that require more than a right or wrong answer.
32. Cont’
• Students’ self-assessments: Self-assessment is
a process by which the student gathers information
about, and reflects on, his or her own learning.
• Checklists usually offer a yes/no format in relation
to student demonstration of specific criteria. They
may be used to record observations of an
individual, a group or a whole class.
• Rating Scales allow teachers to indicate the
degree or frequency of the behaviors, skills and
strategies displayed by the learner. Rating scales
state the criteria and provide three or four
response selections to describe the quality or
frequency of student work.
33. Cont’
• Rubrics use a set of criteria to evaluate a
student's performance. They consist of a fixed
measurement scale and detailed description of
the characteristics for each level of
performance.
• These descriptions focus on the quality of the
product or performance and not the quantity.
• Rubrics may be used to assess individuals or groups and, as with rating scales, results may be compared over time.
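As a sketch of how a rubric's fixed measurement scale and per-level descriptors fit together, the fragment below uses invented criteria, descriptors, and a 1-3 scale; it is an illustration, not a rubric from the slides.

```python
# Hypothetical rubric: each criterion has a fixed scale (1-3) with a
# descriptor for every level of performance.
rubric = {
    "organization": {1: "ideas hard to follow", 2: "mostly ordered", 3: "clear, logical flow"},
    "evidence":     {1: "no support given",     2: "some support",   3: "claims well supported"},
}

def score_with_rubric(ratings):
    """Sum the chosen level for each criterion; ratings maps criterion -> level."""
    for criterion, level in ratings.items():
        assert level in rubric[criterion], f"{level} is not on the scale for {criterion}"
    return sum(ratings.values())

print(score_with_rubric({"organization": 3, "evidence": 2}))  # 5 out of a possible 6
```

The descriptors make the quality of each level explicit, which is what distinguishes a rubric from a bare rating scale.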
34. Cont’
• The purpose of checklists, rating scales and
rubrics is to:
• Provide tools for systematic recording of
observations
• Provide tools for self-assessment
• Provide samples of criteria for students prior to
collecting and evaluating data on their work
• Record the development of specific skills,
strategies, attitudes and behaviours necessary for
demonstrating learning
• Clarify students' instructional needs by presenting
a record of current accomplishments.
35. Cont’
• One-Minute Paper: During the last few minutes of the class period, you may ask students to answer a brief question on a half-sheet of paper.
• Muddiest Point: This is similar to ‘One-Minute
Paper’ but only asks students to describe what
they didn't understand and what they think
might help.
• Student- generated test questions
• Tests:
36. Assessment in large classes
• The existing educational literature has
identified various assessment issues
associated with large classes. They include:
• Surface Learning Approach: Traditionally, teachers rely on time-efficient, exam-based assessment methods for assessing large classes, such as multiple-choice and short-answer examinations.
• Higher-level learning, such as critical thinking and analysis, is often not fully assessed.
• Feedback is often inadequate.
37. Cont’
• Inconsistency in marking
• Difficulty in monitoring cheating and plagiarism
• Lack of interaction and engagement
• There are a number of ways to make the
assessment of large numbers of students more
effective whilst still supporting effective student
learning. These include:
38. Cont’
• 1. Front ending: The basic idea of this strategy is that by putting in increased effort at the beginning, setting up the students for the work they are going to do, the work submitted can be improved. The time needed to mark it is therefore reduced (and time is also saved through fewer requests for tutorial guidance).
• 2.Making use of in-class assignments: In-class
assignments are usually quick and therefore
relatively easy to mark and provide feedback on,
but help you to identify gaps in understanding.
Students could be asked to complete a task within
the timeframe of a scheduled lecture, field exercise
or practical class.
39. Cont’
• 3.Self-and peer-assessment
• Self-assessment reduces the marking load
because it ensures a higher quality of work is
submitted, thereby minimizing the amount of
time expended on marking and feedback.
• Peer-assessment can provide useful learning experiences for students.
• This could involve providing students with
answer sheets or model answers to a piece of
coursework that you had set them previously
and then requiring students to undertake the
marking of those assignments in class.
40. The benefits of peer assessment
• students can get to see how their peers have
tackled a particular piece of work,
• they can see how you would assess the work (e.g.
from the model answers/answer sheets you've
provided) and;
• they are put in the position of being an assessor,
thereby giving them an opportunity to internalize
the assessment criteria.
• 4.Group Assessments
• 5.Changing the assessment method, or at least
shortening it.
41. General Classification of test items
1. Selection-type items (the student is required to select the answer)
• multiple choice, true/false, matching
2. Supply-type items (the student is required to supply the answer)
• essay, short answer
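Because selection-type items have one predetermined right answer, scoring them reduces to comparing each response against an answer key. The key and responses below are invented for illustration.

```python
# Invented answer key mixing multiple-choice letters and true/false (T/F) items.
answer_key = {1: "B", 2: "T", 3: "A", 4: "F", 5: "C"}

def score_test(responses, key=answer_key):
    """Objective items have a single predetermined right answer, so scoring
    is just a response-by-response comparison with the key."""
    return sum(1 for item, ans in key.items() if responses.get(item) == ans)

print(score_test({1: "B", 2: "F", 3: "A", 4: "F", 5: "D"}))  # 3 of 5 correct
```

This is also why such items can be "scored quickly, reliably, and objectively by anybody using an answer key", as noted later in the deck.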
42. Developing assessment methods and
tools
• Appropriate tools or combinations of tools must be
selected and used if the assessment process is to
successfully provide information relevant to stated
educational outcomes.
• Constructing Tests
• Miller, Linn, & Gronlund (2009) identified classroom tests that consist of:
• objective test items, and
• performance assessments, which require students to construct responses (e.g., write an essay) or perform a particular task (e.g., measure air pressure).
43. Cont’
• Objective tests are highly structured and require the
test taker to select the correct answer from several
alternatives or to supply a word or short phrase to
answer a question.
• They are called objective because they have a single
right or best answer that can be determined in
advance.
• Performance assessment tasks permit the student to
organize and construct the answer in essay form.
• Other types of performance assessment tasks may require the student to use equipment, generate hypotheses, make observations, construct something, or perform for an audience.
44. Cont’
• Constructing Objective Test Items
• There are various types of objective test items.
• These can be classified into those that require
the student to supply the answer (supply type
items) and those that require the student to
select the answer from a given set of
alternatives (selection type items).
• Supply type items include completion items
and short answer questions.
• Selection type test items include True/False,
multiple choice and matching.
45. True/False Test Items
• The chief advantage of true/false items is that they require little student time to answer.
• This allows a teacher to cover a wide range of content by using a large number of such items.
• True/false test items can be scored quickly, reliably, and objectively by anybody using an answer key.
• If carefully constructed, true/false test items also have the advantage of measuring the higher mental processes of understanding, application, and interpretation.
46. The major disadvantage of
true/false items
• is that when they are used exclusively, they tend to promote
memorization of factual information: names, dates, definitions, and
so on.
• Some argue that another weakness of true/false items is that they encourage guessing.
• In addition, true/false items:
• Can often lead a teacher to write ambiguous statements, owing to the difficulty of writing statements that are clearly true or false
• Do not discriminate between students of varying ability as well as other item types do
• Can often include more irrelevant clues than other item types do
• Can often lead a teacher to favour testing of trivial knowledge
47. Suggestions for constructing good-quality true/false test items
• Avoid negative statements, and never use double
negatives.
• In Right-Wrong or True-False items, negatively
phrased statements make it needlessly difficult for
students to decide whether that statement is
accurate or inaccurate.
• Restrict single-item statements to single concepts
• Use an approximately equal number of items,
reflecting the two categories tested.
• Make statements representing both categories
equal in length.
48. Matching Items
• A matching item consists of two lists of words
or phrases according to a particular kind of
association indicated in the item’s directions.
• Matching items sometimes can work well if you
want your students to cross-reference and
integrate their knowledge regarding the listed
premises and responses.
• Matching items can cover a good deal of content in an efficient fashion.
49. Merits and Limitations of Matching Items
The major advantage of matching items is their compact form, which makes it possible to measure a large amount of related factual material in a relatively short time.
Another advantage is their ease of construction.
• The main limitation of matching test items is that
they are restricted to the measurement of factual
information based on rote learning.
• Another limitation is the difficulty of finding
homogenous material that is significant from the
perspective of the learning outcomes.
• As a result, test constructors tend to include in their matching items material that is less significant.
50. Suggestions for the construction of
good matching items
• Use fairly brief lists, placing the shorter entries on
the right
• Employ homogeneous lists
• Include more responses than premises
• List responses in a logical order
• Describe the basis for matching and the number of times a response can be used ("Each response in the list at the right may be used once, more than once, or not at all.")
• Try to place all premises and responses for any
matching item on a single page.
51. Short Answer/Completion Test
Items
• Short-answer items and completion test items are essentially the same: both can be answered by a word, phrase, number, or formula.
• They differ in the way the problem is presented: the short-answer type uses a direct question, whereas the completion test item consists of an incomplete statement that the student must complete.
• Short-answer test items are among the easiest to construct.
• Because students must supply the answer, the possibility that they will obtain the correct answer by guessing is reduced.
52. There are two limitations cited in
the use of short-answer test items
• One is that they are unsuitable for assessing complex learning outcomes.
• The other is scoring difficulty; this is especially true where the item is not phrased clearly enough to require a single, definitely correct answer, and where the student's spelling affects the response.
• The following suggestions will help short-answer test items to function as intended.
53. Cont’
• Word the item so that the required answer is
both brief and specific
• Do not take statements directly from textbooks
to use as a basis for short-answer items.
• A direct question is generally more desirable
than an incomplete statement.
• If the answer is to be expressed in numerical
units, indicate the type of answer wanted
54. Multiple-Choice Items
• Multiple-choice items can effectively measure many simple learning outcomes; in addition, they can measure a variety of complex cognitive learning outcomes.
• A multiple-choice item consists of a problem (the item stem) and a list of suggested solutions (alternatives, choices, or options).
• There are two important variants in a multiple-choice item:
• (1) whether the stem consists of a direct question or an incomplete statement, and
• (2) whether the student's choice among the alternatives is supposed to be a correct answer or a best answer.
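One way to picture the stem-plus-alternatives structure is as a small data type. The item content, field names, and helper below are invented for illustration (the `list[str]` annotation assumes Python 3.9+).

```python
# Hypothetical representation of a multiple-choice item: a stem plus a set
# of alternatives, one of which is keyed as the correct/best answer.
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    stem: str                # a direct question or an incomplete statement
    alternatives: list[str]  # one correct answer plus plausible distractors
    key: int                 # index of the correct (or best) alternative

item = MultipleChoiceItem(
    stem="Which assessment type compares a student to a cohort?",
    alternatives=["Criterion-referenced", "Norm-referenced", "Formative", "Diagnostic"],
    key=1,
)

def is_correct(item, choice):
    """Objective scoring: compare the chosen index against the key."""
    return choice == item.key

print(is_correct(item, 1))  # True
```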
55. Advantage of the multiple-choice item
• Its widespread applicability to the assessment of
cognitive skills and knowledge, as well as to the
measurement of students’ affect.
• Another advantage is that it’s possible to make
them quite varied in the levels of difficulty they
possess.
• Cleverly constructed multiple-choice items can
present very high-level cognitive challenges to
students.
• And, of course, as with all selected-response items, multiple-choice items are fairly easy to score.
56. Weaknesses of multiple-choice items
• One weakness is that when students review a set of alternatives for an item, they may be able to recognize a correct answer that they would never have been able to generate on their own.
• In that sense, multiple-choice items can present an
exaggerated picture of a student’s understanding or
competence, which might lead teachers to invalid
inferences.
• Another serious weakness, one shared by all selected-
response items, is that multiple-choice items can never
measure a student’s ability to creatively synthesize
content of any sort.
• Finally, in an effort to come up with the necessary
number of plausible alternatives, novice item-writers
sometimes toss in some alternatives that are obviously
incorrect.
57. Rules for preparing multiple-choice items
• The question or problem in the stem must be self-
contained.
• Avoid negatively stated stems.
• Each alternative must be grammatically consistent
with the item’s stem
• Make all alternatives plausible, but be sure that
one of them is indisputably the correct or best
answer.
• Randomly use all answer positions in
approximately equal numbers.
• Never use "all of the above" as an answer choice, but "none of the above" can be used to make items more demanding.
58. Constructing Performance
Assessments
• The distinctive feature of essay questions is
that students are free to construct, relate, and
present ideas in their own words.
• Learning outcomes concerned with the ability
to conceptualize, construct, organize, relate,
and evaluate ideas require the freedom of
response and the originality provided by essay
questions.
• Essay questions can be classified into two types - restricted-response essay questions and extended-response essay questions.
59. Cont’
• Restricted-response essay questions: These types
of questions usually limit both the content and the
response.
• The content is usually restricted by the scope of the
topic to be discussed. Limitations on the form of
response are generally indicated in the question.
• Extended-response essays: these types of questions allow students:
• to select any factual information that they think is
relevant,
• to organize the answer in accordance with their best
judgment, and;
• to integrate and evaluate ideas as they deem
appropriate.
60. Advantages of essay questions
• Extended-response essays focus on the
integration and application of thinking and
problem solving skills.
• Essay assessments enable the direct
evaluation of writing skills.
• Essay questions, as compared to objective
tests, are easy to construct.
• Essay questions have a positive effect on students' learning.
61. Suggestions for the construction of
good essay questions:
• Restrict the use of essay questions to those learning outcomes that cannot be measured satisfactorily by objective items.
• Structure items so that the student’s task is
explicitly bounded.
• For each question, specify the point value, an
acceptable response-length, and a
recommended time allocation.
• Employ more questions requiring shorter
answers rather than fewer questions requiring
longer answers.
62. Cont..
• Don’t employ optional questions.
• Test a question’s quality by creating a trial response to the
item.
• The following guidelines would be helpful in making the
scoring of essay items easier and more reliable.
• Ensure that you are emotionally and mentally settled before scoring.
• All responses to one item should be scored before moving
to the next item
• Write out in advance a model answer to guide yourself in
grading the students’ answers
• Shuffle exam papers after scoring every question before
moving to the next
• The names of test takers should not be known while scoring
to avoid bias
63. Table of Specification and
Arrangement of Items
• If tests are to be valid and reliable they have to
be developed based on carefully designed
plans.
• Planning classroom test involves identifying the
instructional objectives earlier stated and the
subject matter (content) covered during the
teaching/learning process.
64. Guide in planning a classroom test
Determine the purpose of the test;
Describe the instructional objectives and content to be
measured.
Determine the relative emphasis to be given to each
learning outcome;
Select the most appropriate item formats (essay or
objective);
Develop the test blueprint to guide the test construction;
Prepare test items that are relevant to the learning outcomes
specified in the test plan;
Decide on the pattern of scoring and the interpretation of
result;
Decide on the length and duration of the test, and
Assemble the items into a test, prepare direction and
administer the test.
65. Developing a table of specification
involves:
1. Preparing a list of learning outcomes, i.e. the type
of performance students are expected to
demonstrate
2. Outlining the contents of instruction, i.e. the area
in which each type of performance is to be shown,
and
3. Preparing the two way chart that relates the
learning outcomes to the instructional content.
66. Example table of specification in a geography subject

Contents       True/False   Matching   Short Answer   Multiple Choice   Total   Percent
Air pressure       1            1            1               3             6      24%
Wind               1            1            1               1             4      16%
Temperature        1            2            1               3             7      28%
Rainfall           1            1            1               2             5      20%
67. Arrangement of test items
• There are various methods of grouping items in an
achievement test depending on their purposes.
• For most purposes the items can be arranged by a
systematic consideration of:
• The type of items used
• The learning outcomes measured
• The difficulty of the items, and
• The subject matter measured
68. Cont
• To summarize, the most effective method for
organizing items in the typical classroom test is
to:
• Form sections by item type
• Group the items within each section by the
learning outcomes measured, and
• Arrange both the sections and the items
within sections in an ascending order of
difficulty.
69. Administration of Tests
• Test Administration refers to the procedure of
actually presenting the learning task that the
examinees are required to perform in order to
ascertain the degree of learning that has taken
place during the teaching-learning process.
• The following practices should be avoided during test administration:
• Threatening students with tests if they do not behave
• Warning students to do their best “because the test
is important”
• Telling students they must work fast in order to
finish on time.
• Threatening dire consequences if they fail
70. Ensuring Quality in Test
Administration
• The following are guidelines and steps involved in test administration, aimed at ensuring its quality.
• Collect the question papers from the custodian in time to be able to start the test at the stipulated time.
• Ensure compliance with the stipulated sitting arrangements to prevent collusion between or among the test takers.
• Ensure orderly and proper distribution of question papers to the test takers.
• Make it clear that cheating will be penalized.
• Avoid giving hints to test takers who ask about particular
items. But make corrections or clarifications to the test takers
whenever necessary.
• Keep interruptions during the test to a minimum
71. Credibility and Civility in
Test Administration
• Credibility deals with the value the eventual
recipients and users of the results of
assessment place on the result with respect to
the grades obtained, certificates issued or the
issuing institution.
• Civility, on the other hand, asks whether the persons being assessed are in conditions that allow them to give their best, without hindrances and burdens, in the attributes being assessed, and whether the exercise is seen as integral to or external to the learning process.
72. Test administration
• Effort should be made to see that the test takers
are given a fair and unaided chance to
demonstrate what they have learnt with respect to:
• Instructions: test should contain a set of
instructions which are usually of two types. One is
the instruction to the test administrator while the
other one is to the test taker.
• Duration of the test
• Venue and sitting arrangement
• Other necessary conditions
All of these are necessary to enhance test administration and to make the assessment civil in practice.
73. Scoring Essay Test
• There are two common methods of scoring essay
questions. These are:
A. The Point or Analytic Method
• In this method each answer is compared with an already-prepared ideal marking scheme (scoring key), and marks are assigned according to the adequacy of the answer.
• When used conscientiously, the analytic method provides a means for maintaining uniformity in scoring between scorers and between scripts, thus improving the reliability of the scoring.
74. B. The Global/Holistic Rating Method
• In this method the examiner first sorts the
response into categories of varying quality
based on his general or global impression on
reading the response.
• The standard of quality helps to establish a
relative scale, which forms the basis for
ranking responses from those with the poorest
quality response to those that have the highest
quality response.
75. Scoring Objective Tests
i. Manual Scoring
• In this method of scoring, the answers to test items are scored by direct comparison of the examinee's answers with the marking key.
• If the answers are recorded on the test paper, for instance, a scoring key can be made by marking the correct answers on a blank copy of the test.
• Scoring is then done by simply comparing the
columns of answers on the master copy with the
columns of answers on each examinee’s test
paper.
76. ii. Stencil Scoring
• On the other hand, when separate answer sheets are used by examinees for recording their answers, it is most convenient to prepare and use a scoring stencil.
• A scoring stencil is prepared by punching holes in a blank answer sheet where the correct answers are supposed to appear.
• Scoring is then done by laying the stencil over each answer
sheet and the number of answer checks appearing through
the holes is counted.
iii. Machine Scoring
• Usually, for a large number of examinees, specially prepared answer sheets are used to answer the questions.
• The answers are normally shaded at the appropriate places assigned to the various items.
77. Unit 3: Item Analysis
• It is the process of examining or analyzing testees' responses to each item on a test, with the basic intent of judging the quality of each item.
• Item analysis helps to determine the adequacy of
the items within a test as well as the adequacy of
the test itself.
• There are several reasons for analyzing questions
and tests that students have completed and that
have already been graded. Some of the reasons
that have been cited include the following:
• Identify content that has not been adequately
covered and should be re-taught,
78. Item Analysis
• Provide feedback to students,
• Determine if any items need to be revised in
the event they are to be used again or become
part of an item file or bank,
• Identify items that may not have functioned as
they were intended,
• Direct the teacher's attention to individual
student weaknesses.
79. Item Analysis
• The results of an item analysis provide
information about the difficulty of the items and
the ability of the items to discriminate between
better and poorer students.
• If an item is too easy, too difficult, failing to
show a difference between skilled and
unskilled examinees, or even scored
incorrectly, an item analysis will reveal it.
• The two most common statistics reported in an
item analysis are the item difficulty and the
item discrimination.
80. Item difficulty level index
• It is one of the most useful, and most frequently
reported, item analysis statistics.
• It is a measure of the proportion of examinees who
answered the item correctly; for this reason it is
frequently called the p-value.
• If scores from all students in a group are included
the difficulty index is simply the total percent
correct.
• When there is a sufficient number of scores
available (i.e., 100 or more) difficulty indexes are
calculated using scores from the top and bottom
27 percent of the group.
81. Item analysis procedures
• Rank the papers in order from the highest to the
lowest score
• Select one-third of the papers with the highest total
score and another one-third of the papers with
lowest total scores
• For each test item, tabulate the number of
students in the upper & lower groups who selected
each option
• Compute the difficulty of each item (the percentage of students who answered the item correctly)
• Item difficulty index can be calculated using the
following formula:
82. Item difficulty analysis procedures
• P = (successes in the HSG + successes in the LSG) / N
Where, HSG = High Scoring Group
• LSG = Low Scoring Group
• N = the total number of testees in the HSG and LSG
The difficulty indexes can range between 0.0
and 1.0 and are usually expressed as a
percentage.
A higher value indicates that a greater
proportion of examinees responded to the item
correctly,
83. • For the above data, P = 70%.
• For maximum discrimination among students, an
average difficulty of .60 is ideal.
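As a sketch of the computation, the difficulty formula can be applied directly; the group sizes and correct counts below are hypothetical, chosen so that P matches the 70% worked value above.

```python
def item_difficulty(correct_high, correct_low, n_high, n_low):
    """Proportion of examinees (high + low scoring groups combined)
    who answered the item correctly."""
    return (correct_high + correct_low) / (n_high + n_low)

# Hypothetical item: 16 of 20 high scorers and 12 of 20 low scorers correct.
p = item_difficulty(16, 12, 20, 20)
print(p)  # 0.7, i.e., a difficulty index of 70%
```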
84. Item difficulty interpretation
P-Value           Percent Range   Interpretation
>= 0.75           75-100          Easy
<= 0.25           0-25            Difficult
.25 < P < .75     26-74           Average
85. Item difficulty interpretation
• For criterion-referenced tests (CRTs), with their
emphasis on mastery-testing, many items on
an exam form will have p-values of .9 or above.
• Norm-referenced tests (NRTs), on the other
hand, are designed to be harder overall and to
spread out the examinees’ scores.
• Thus, many of the items on an NRT will have
difficulty indexes between .4 and .6.
86. Item discrimination index
• The index of discrimination is a numerical indicator that
enables us to determine whether the question
discriminates appropriately between lower scoring and
higher scoring students.
• Item discrimination index can be calculated using the
following formula:
• D = (successes in the HSG - successes in the LSG) / [½ (number in HSG + LSG)]
• Where,
• HSG = High Scoring Groups
• LSG = Low Scoring Groups
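A minimal sketch of the discrimination formula, reusing the same hypothetical counts as in the difficulty example (16 of 20 high scorers and 12 of 20 low scorers correct):

```python
def item_discrimination(correct_high, correct_low, n_high, n_low):
    """D = (successes in HSG - successes in LSG) / (half the combined group size)."""
    return (correct_high - correct_low) / ((n_high + n_low) / 2)

d = item_discrimination(16, 12, 20, 20)
print(d)  # 0.2 -> a positive, moderate discriminator
```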
88. Item discrimination index
• The item discrimination index can vary from -1.00
to +1.00.
• A negative discrimination index (between -1.00
and zero) results when more students in the low
group answered correctly than students in the high
group.
• A discrimination index of zero means equal
numbers of high and low students answered
correctly, so the item did not discriminate between
groups.
• A positive index occurs when more students in the
high group answer correctly than the low group.
89. Item discrimination index
• Questions that have an item difficulty index
(NOT item discrimination) of 1.00 or 0.00 need
not be included when calculating item
discrimination indices.
• An item difficulty of 1.00 indicates that
everyone answered correctly, while 0.00
means no one answered correctly.
• When computing the discrimination index, the
scores are divided into three groups with the
top 27% of the scores in the upper group and
the bottom 27% in the lower group.
90. Item discrimination interpretation
D-Value Direction Strength
> +.40 positive strong
+.20 to +.40 positive moderate
-.20 to +.20 none ---
< -.20 negative moderate to strong
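The interpretation table can be sketched as a small helper function; note that the handling of the exact ±.20 boundaries below is a choice, since the table leaves them ambiguous.

```python
def interpret_discrimination(d):
    """Map a discrimination index D onto the interpretation table
    (boundary values at exactly +/-0.20 are resolved by assumption)."""
    if d > 0.40:
        return "positive, strong"
    if d >= 0.20:
        return "positive, moderate"
    if d > -0.20:
        return "no discrimination"
    return "negative, moderate to strong"

print(interpret_discrimination(0.45))   # positive, strong
print(interpret_discrimination(0.25))   # positive, moderate
print(interpret_discrimination(-0.35))  # negative, moderate to strong
```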
91. Item discrimination interpretation
• For a small group of students, an index of
discrimination for an item that exceeds .20 is
considered satisfactory.
• For larger groups, the index should be higher because
more difference between groups would be expected.
• The guidelines for an acceptable level of discrimination
depend upon item difficulty.
• For items with a difficulty level of about 70 percent, the
discrimination should be at least .30.
• When an item is discriminating negatively, overall the
most knowledgeable examinees are getting the item
wrong and the least knowledgeable examinees are
getting the item right.
• More often, it is a sign that the item has been miskeyed.
92. Distractor Analysis
• It evaluates the effectiveness of the distracters
(options) in each item by comparing the
number of students in the upper and lower
groups who selected each incorrect alternative
(a good distracter will attract more students
from the lower group than the upper group).
• In addition to being clearly incorrect, the
distractors must also be plausible.
• That is, the distractors should seem likely or
reasonable to an examinee who is not
sufficiently knowledgeable in the content area
93. Evaluating the Effectiveness of Distracters
• The distraction power of a distractor is its ability to differentiate between those who do not know and those who know what the item is measuring. That is, a good distractor attracts more testees from the lower group than from the upper group.
Formula:
Option Distractor Power (Do) = (L - H) / n
Where, L = number of low scorers who marked the option
H = number of high scorers who marked the option
n = total number of testees in the upper group
94. Incorrect options with positive distraction power are good distractors; those with negative distraction power must be changed or revised, and those with zero power should be improved, because they fail to distract the low achievers.
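A sketch of the distraction-power computation; the option counts and group size below are hypothetical.

```python
def distractor_power(marked_low, marked_high, n_group):
    """Do = (L - H) / n: low-group picks minus high-group picks,
    over the size of one (upper) group."""
    return (marked_low - marked_high) / n_group

# Hypothetical option: 8 low scorers and 2 high scorers chose it; 20 per group.
do = distractor_power(8, 2, 20)
print(do)  # 0.3 -> positive, so this distractor is working as intended
```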
95. Item Banking
• Items and tasks are recorded as they are constructed; information from analysis of students' responses is added after the items and tasks have been used, and then the effective items and tasks are deposited in the file.
• Such a file is especially valuable in areas of
complex achievement, when the construction of
test items and assessment tasks is difficult and
time consuming.
• When enough high-quality items and tasks have
been assembled, the burden of preparing tests
and assessments is considerably lightened
96. Unit 4: Interpretation of Scores
Test interpretation is a process of assigning meaning and usefulness to the scores obtained from classroom tests.
The test scores on their own lack a true
zero point and equal units.
Moreover, they are not based on the same
standard of measurement and as such
meaning cannot be read into the scores on
the basis of which academic and
psychological decisions may be taken.
97. Kinds of scores
Data differ in terms of what properties of the real
number series (order, distance, or origin) we can
attribute to the scores.
• A nominal scale involves the assignment of different numerals to categories that are qualitatively different.
• For example, we may assign the numeral 1 for males and 2 for females.
• These symbols do not have any of the three characteristics (order, distance, or origin) we attribute to the real number series.
• The 1 does not indicate more of something than the 2.
98. • An ordinal scale has the order property of a
real number series and gives an indication
of rank order.
For example, ranking students based on
their performance on a certain athletic event
would involve an ordinal scale. We know
who is best, second best, third best, etc.
But the ranks do not tell us anything about the differences between the scores.
99. • With interval data we can interpret the distances between scores. If, on a test with interval data, Almaz has a score of 60, Abebe a score of 50, and Beshadu a score of 30, we could say that the distance between Abebe's and Beshadu's scores (50 to 30) is twice the distance between Almaz's and Abebe's scores (60 to 50).
• If one measures with a ratio scale, the ratio of the
scores has meaning.
• Thus, a person whose height is 2 meters is twice as tall as a person whose height is 1 meter.
• We can make this statement because a measurement of
0 actually indicates no height.
• That is, there is a meaningful zero point. However, if a
student scored 0 on a spelling test, we would not
interpret the score to mean that the student had no
spelling ability.
100. • Measures of Central Tendency
Mean, Median, Mode
• Measures of Variability
Range, Quartile Deviation, Standard
Deviation
• Point Measures
Quartiles, Deciles, Percentiles
101. Measures of Central Tendency
• MODE – the crude or inspectional average measure. It is the most frequently occurring score. It is the poorest measure of central tendency.
• Advantage: The mode is always an actual observed value. It is simple to approximate by observation for small cases, and it does not necessitate arrangement of values.
• Disadvantage: It is not rigidly defined and is inapplicable to irregular distributions.
• What is the mode of these scores?
• 75, 60, 78, 75, 76, 75, 88, 75, 81, 75
102. Measures of Central Tendency
• MEDIAN – The score that divides the distribution into halves. It is sometimes called the counting average.
• Advantage: It is the best measure when the distribution is irregular or skewed. It can be located in an open-ended distribution or when the data are incomplete (e.g., only 80% of the cases are reported).
• Disadvantage: It necessitates arranging the items by size before it can be computed.
• What is the median?
• 75, 60, 78, 75, 76, 75, 88, 75, 81, 75
103. Measures of Central Tendency
• MEAN – The most widely used and familiar average. The most reliable and the most stable of all measures of central tendency.
• Advantage: It is the best measure for a regular distribution.
• Disadvantage: It is affected by extreme values.
• What is the mean?
• 75, 60, 78, 75, 76, 75, 88, 75, 81, 75
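The three measures for the example scores can be checked with Python's standard `statistics` module:

```python
import statistics

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

print(statistics.mode(scores))    # 75   (the most frequent score)
print(statistics.median(scores))  # 75.0 (middle of the sorted scores)
print(statistics.mean(scores))    # 75.8 (sum 758 divided by 10)
```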
104. • To the extent that differences are observed
among these three measures, the
distribution is asymmetrical or “skewed”.
• In a positively-skewed distribution most of
the scores concentrate at the low end of the
distribution. This might occur, for example, if
the test was extremely difficult for the
students.
• In a negatively-skewed distribution, the
majority of scores are toward the high end of
the distribution. This could occur if we gave
a test that was easy for most of the
students.
105. • With perfectly bell shaped distributions, the
mean, median, and mode are identical.
• With positively skewed data, the mode is
lowest, followed by the median and mean.
• With negatively skewed data, the mean is
lowest, followed by the median and mode.
106. Point Measures:
Quartiles
• Point measures where the distribution is divided
into four equal parts.
• Q1 : N/4 or the 25% of distribution
• Q2 : N/2 or the 50% of distribution ( this is the same as
the median of the distribution)
• Q3 : 3N/4 or the 75% of distribution
107. Point Measures:
•Deciles
• point measures where the distribution is
divided into 10 equal groups.
• D1 : N/10 or the 10% of the distribution
• D2 : 2N/10 or the 20% of the distribution
• D3 : 3N/10 or the 30% of the distribution
• D4 : 4N/10 or the 40% of the distribution
• D5 : 5N/10 or the 50% of the distribution
• D….
• D9 : 9N/10 or the 90% of the distribution
108. Point Measures:
Percentiles
point measures where the distribution is divided
into 100 equal groups
P1 : N/100 or the 1% of the distribution
P10 : 10N/100 or the 10% of the distribution
P25 : 25N/100 or the 25% of the distribution
P50 : 50N/100 or the 50% of the distribution
P75 : 75N/100 or the 75% of the distribution
P90 : 90N/100 or the 90% of the distribution
P99 : 99N/100 or the 99% of the distribution
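For a hypothetical ranked distribution of N = 40 scores, the positions of the point measures follow the kN/4 (quartile), kN/10 (decile), and kN/100 (percentile) pattern:

```python
# Hypothetical class of N = 40 ranked scores; each point measure's position
# counts up from the bottom of the distribution.
N = 40

q1_pos = 1 * N / 4      # 10.0 -> 25% of cases fall at or below this position
d9_pos = 9 * N / 10     # 36.0 -> 90% of cases
p99_pos = 99 * N / 100  # 39.6 -> 99% of cases

print(q1_pos, d9_pos, p99_pos)
```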
109. Measures of Variability or
Scatter
1. RANGE
R = highest score – lowest score
2. Quartile Deviation
QD = ½ (Q3 – Q1)
It is known as semi inter quartile range
It is often paired with median
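A short sketch for the example scores used earlier; Q1 and Q3 are computed here as medians of the lower and upper halves, which is one common convention among several.

```python
import statistics

scores = sorted([75, 60, 78, 75, 76, 75, 88, 75, 81, 75])

# Range: highest score minus lowest score
r = scores[-1] - scores[0]
print(r)  # 28

# Quartile deviation: half the interquartile range.
lower, upper = scores[:5], scores[5:]
q1, q3 = statistics.median(lower), statistics.median(upper)
qd = (q3 - q1) / 2
print(qd)  # 1.5
```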
110. Measures of Variability or Scatter:
STANDARD DEVIATION
• It is the most important and best
measure of variability of test
scores.
• A small standard deviation means that the group has small variability, i.e., it is relatively homogeneous.
• It is used with mean.
111. Calculating a standard
deviation
• Compute the mean.
• Subtract the mean from each individual's score to obtain the deviations.
• Square each of these deviations.
• Find the sum of the squared deviations, Σ(X - M)².
• Divide the sum obtained in step 4 by N, the number of students, to get the variance.
• Find the square root of the result of step 5. This number is the standard deviation (SD) of the scores.
• Thus the formula for the standard deviation (SD) is:
SD = √[ Σ(X - M)² / N ]
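The steps above can be followed literally for the example scores used earlier:

```python
import math

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

mean = sum(scores) / len(scores)          # step 1: 75.8
deviations = [x - mean for x in scores]   # step 2: subtract the mean
squared = [d ** 2 for d in deviations]    # step 3: square the deviations
total = sum(squared)                      # step 4: sum of squared deviations
variance = total / len(scores)            # step 5: divide by N
sd = math.sqrt(variance)                  # step 6: square root

print(round(variance, 2))  # 43.36
print(round(sd, 2))        # 6.58
```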
113. MEAN
Mean = ΣfM / Σf
ΣfM – total of the products of the frequency (f) and midpoint (M)
Σf – total of the frequencies
114. MEDIAN
•Median = L + c (N/2 - cum f<) / fc
L – lower real limit of the median class
cum f< – cumulative frequency ('less than') up to but not including the median class
fc – frequency of the median class
c – class interval
N – number of cases
115. MODE
MODE = LMo + c (f0 - f2) / (2f0 - f2 - f1)
LMo – lower limit of the modal class
c – class interval
f1 – frequency of the class after the modal class
f2 – frequency of the class before the modal class
f0 – frequency of the modal class
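A sketch applying the usual grouped-data formulas to a hypothetical frequency table (class interval 10, N = 40); the classes and frequencies are invented for illustration, and the modal class is assumed not to be the first or last class.

```python
# Hypothetical grouped frequency table: (lower real limit, midpoint, frequency)
classes = [(49.5, 54.5, 4), (59.5, 64.5, 10), (69.5, 74.5, 16),
           (79.5, 84.5, 7), (89.5, 94.5, 3)]
c = 10                                 # class interval
N = sum(f for _, _, f in classes)      # 40 cases

# Grouped mean: sum of f*M over sum of f
mean = sum(f * m for _, m, f in classes) / N
print(mean)  # 73.25

# Grouped median: L + c*(N/2 - cum f<)/fc, in the class holding the N/2-th case
cum = 0
for L, _, f in classes:
    if cum + f >= N / 2:
        median = L + c * (N / 2 - cum) / f
        break
    cum += f
print(median)  # 73.25

# Grouped mode: LMo + c*(f0 - f2)/(2*f0 - f2 - f1), with f2 the frequency
# before the modal class and f1 the frequency after it
freqs = [f for _, _, f in classes]
i = freqs.index(max(freqs))            # index of the modal class
f0, f2, f1 = freqs[i], freqs[i - 1], freqs[i + 1]
mode = classes[i][0] + c * (f0 - f2) / (2 * f0 - f2 - f1)
print(mode)  # 73.5
```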
Editor's Notes
Thus, continuous assessment is a teaching approach as well as a process of deciding to what extent the educational objectives are actually being realized during instruction
For very easy or very difficult items, low discrimination levels would be expected; most students, regardless of ability, would get the item correct or incorrect as the case may be.