This document discusses key concepts in the assessment of learning. It defines assessment, measurement, evaluation, and testing. It outlines different modes of assessment, including traditional, performance, and portfolio assessment, and describes types of assessment processes such as placement, diagnostic, formative, and summative assessment. Principles of quality assessment are outlined, including clarity, appropriateness, validity, reliability, fairness, and practicality. Methods of developing tests are also discussed, such as identifying objectives, determining the test type, constructing items, and validating tests.
Assessment of learning 1
ASSESSMENT OF LEARNING (Basic Concepts)
Prof. Yonardo Agustin Gabuyo
Basic Concepts in Assessment of Learning
Assessment
refers to the collection of data to describe or better understand an issue.
measures "where we are in relation to where we should be". Many consider it the same as formative evaluation.
is a process by which information is obtained relative to some known objective or goal.
the teacher's way of gathering information about what students have learned, used to make important decisions about students' grades, the content of future lessons, and the revision of the structure or content of a course.
Measurement
refers to the process by which the attributes or dimensions of some physical object are determined.
is a process of measuring the individual's intelligence, personality, attitudes and values, achievement, and anything else that can be expressed quantitatively.
it answers the question, "How much?"
Evaluation
determines "how well did we do what we set out to do?" Evaluation is tied to stated goals and objectives. Many equate this with summative evaluation.
refers to the process of determining the extent to which instructional objectives are attained.
refers to the comparison of data to a standard for the purpose of judging worth or quality.
A test is an instrument designed to measure any quality, ability, skill, or knowledge.
Testing is the method used to measure the level of performance or achievement of the learner; it refers to the administration, scoring, and interpretation of an instrument designed to elicit information about performance in a sample of a particular area of behavior.
MODES OF ASSESSMENT
A. Traditional Assessment
the objective paper-and-pen test, which usually assesses low-level thinking skills.
preparation of the instrument is time consuming, and it is prone to cheating.
scoring is objective and administration is easy because students can take the test at the same time.
B. Performance Assessment
a mode of assessment that requires actual demonstration of skills or creation of products of learning.
the learner performs the behavior to be measured in a "real-world" context, and the locus of control is with the student.
preparation of the instrument is relatively easy, and it measures behavior that cannot be faked.
scoring tends to be subjective without rubrics.
C. Portfolio Assessment
A process of gathering multiple indicators of students' progress to support course goals in a dynamic, ongoing, and collaborative process.
Measures a student's growth and development.
Development is time consuming, and rating tends to be subjective without rubrics.
TYPES OF ASSESSMENT PROCESSES
A. Placement Assessment
Determine the entry behavior of the students.
Determine the student's performance at the beginning of instruction.
Determine the position of the students in the instructional sequence.
Determine the mode of evaluation beneficial for each student.
B. Diagnostic Assessment
is given at the start:
to determine the student’s levels of competence.
to identify those who have already achieved mastery of the requisite learning.
to help classify students into tentative small groups for instruction.
C.Formative Assessment
is given to:
monitor the learning progress of the students.
provide feedback to both parents and students.
it answers the question "Where are we in relation to where we should be?"
this type of assessment can be done informally and need not use traditional instruments such as quizzes and tests.
D. Summative Assessment
given at the end of a unit:
to determine if the objectives were achieved.
tends to be formal and uses traditional instruments such as tests and quizzes.
it answers the question "How well did we do what we set out to do?"
determines the extent of the student's achievement and competence.
provides a basis for assigning grades.
provides the data from which reports to parents and transcripts can be prepared.
Principles of Quality Assessment
1. Clarity of the Learning Target
2. Appropriateness of the Assessment Method
3. Validity
4. Reliability
5. Fairness
6. Practicality and Efficiency
Principles of Quality Assessment
1. Clarity of the Learning Target
Learning Target. Clearly stated; focuses on the student learning objective rather than the teacher activity; a meaningful and important target.
Skill Assessed. Clearly presented; can you "see" how students would demonstrate the skill in the task itself?
Performance Task - Clarity. Could students tell exactly what they are supposed to do and how the final product should be done?
Rubric - Clarity. Would students understand how they are to be evaluated? Are the criteria observable and clearly described?
2. Appropriateness of the Assessment Method
Does it work with type of task and learning target?
Does it allow for several levels of performance?
Does it assess skills as stated?
The type of test used should match the learning objective of the subject matter.
Two general categories of test items:
1. Objective items
require students to select the correct response from several alternatives or to supply a word or short phrase to answer a question or complete a statement.
2. Subjective or essay items
which permit the student to organize and present an original answer.
A) Objective Test
include true-false, fill-in-the-blank, matching type, and multiple choice questions.
the word objective refers to the scoring and indicates there is only one correct answer.
objective tests rely heavily on your skill to read quickly and to reason out the answer.
they measure both your ability to remember facts and figures and your understanding of course materials.
prepare yourself for high-level critical reasoning and for making fine discriminations to determine the best answer.
a) Multiple-Choice Items
used to measure knowledge outcomes and various other types of learning outcomes.
they are most widely used for measuring knowledge, comprehension, and application outcomes.
scoring is easy, objective, and reliable.
Advantages in Using Multiple-Choice Items
Multiple-choice items can provide ...
versatility in measuring all levels of cognitive ability.
highly reliable test scores.
scoring efficiency and accuracy.
objective measurement of student achievement or ability.
a wide sampling of content or objectives.
a reduced guessing factor when compared to true-false items.
different response alternatives which can provide diagnostic feedback.
b) True-False Items
typically used to measure the ability to identify whether statements of fact are correct.
the basic format is simply a declarative statement that the student must judge as true or false.
item is useful for outcomes where there are two possible alternatives.
True-False Items…..
do not discriminate between students of varying ability as well as other item types.
can often include more irrelevant clues than do other item types.
can often lead an instructor to favor testing of trivial knowledge.
c) Matching Type Items
consist of a column of key words presented on the left side of the page and a column of options placed on the right side of the page. Students are required to match the options associated with a given key word(s).
provide objective measurement of student achievement.
provide efficient and accurate test scores.
Matching Type Items
if options cannot be used more than once, the items are not mutually exclusive; getting one answer incorrect automatically means a second question is incorrect.
all items should be of the same class, and all options should be of the same class (e.g., a list of events to be matched with a list of dates).
d) Short Answer Items
requires the examinee to supply the appropriate words, numbers, or symbols to answer a question or
complete a statement.
items should require a single-word answer or a brief and definite statement.
can efficiently measure the lower levels of the cognitive domain.
B) Essays or Subjective Test
may include either short-answer questions or long general questions. These exams have no one specific answer per student.
they are usually scored on an opinion basis, although certain facts and understanding are expected in the answer.
essay tests are generally easier and less time consuming to construct than most objective test items.
the main reason students fail essay tests is not that they cannot write, but that they fail to answer the questions fully and specifically, or their answer is not well organized.
students with good writing skills have an advantage over students who have difficulty expressing themselves through writing.
essays are more subjective in nature due to their susceptibility to scoring influences.
C) PERFORMANCE TEST
also known as alternative or authentic assessment
is designed to assess the ability of a student to perform correctly in a simulated situation (i.e., a situation in
which the student will be ultimately expected to apply his/her learning).
a performance test will simulate to some degree a real life situation to accomplish the assessment.
in theory, a performance test could be constructed for any skill and real life situation.
most performance tests have been developed for the assessment of vocational, managerial, administrative, leadership, communication, interpersonal, and physical education skills in various simulated situations.
Advantages in Using Performance Test Items
Performance test items:
can appropriately measure learning objectives which focus on the ability of the students to apply skills or
knowledge in real life situations.
usually provide a degree of test validity not possible with standard paper and pencil test items.
are useful for measuring learning objectives in the psychomotor domain.
SUGGESTIONS FOR WRITING PERFORMANCE TEST ITEMS
1. Prepare items that elicit the type of behavior you want to measure.
2. Clearly identify and explain the simulated situation to the student.
3. Make the simulated situation as "life-like" as possible.
4. Provide directions which clearly inform the students of the type of response called for.
5. When appropriate, clearly state time and activity limitations in the directions.
6. Adequately train the observer(s)/scorer(s) to ensure that they are fair in scoring the appropriate behaviors.
D) Oral questioning
the most commonly used of all forms of assessment in class.
assumes that the learner can hear, of course, and shares a common language with the assessor.
the ability to communicate orally is relevant to this type of assessment.
the other major role for the "oral" in summative assessment is in language learning, where the capacity to carry on a conversation at an appropriate level of fluency is relatively distinct from the ability to read and write the language.
E) Observation
refers to measurement procedures in which child behaviors in the school or classroom are systematically monitored, described, classified, and analyzed, with particular attention typically given to the antecedent and consequent events involved in the performance and maintenance of such behaviors.
F) Self-reports
Students are asked to reflect on, make a judgment about, and then report on their own or a peer's behavior and performance.
typical evaluation tools could include sentence completion, Likert scales, checklists, or holistic scales.
responses may be used to evaluate both performance and attitude.
3. Validity
is the degree to which the test measures what it intends to measure.
it is the usefulness of the test for a given purpose.
a valid test is always reliable.
Approaches in Validating a Test
Factors Affecting Content Validity of Test Items
A. Test itself
B. The administration and scoring of a test.
C. Personal factors influencing how students respond to the test.
D. Validity is always specific to a particular group.
Factors Affecting Content Validity of Test Items
A. Test Itself:
Ways that can reduce the validity of test results:
1. Unclear directions
2. Poorly constructed test items
3. Ambiguity
4. Inappropriate level of difficulty
5. Improper arrangement of items
6. Inadequate time limits
7. Test that is too short
8. Identifiable pattern of answers
9. Test items inappropriate for the outcomes being measured
10. Reading vocabulary and sentence structure too difficult
B. The administration and scoring of a test.
assessment procedures must be administered uniformly to all students; otherwise, scores will vary due to factors other than differences in student knowledge and skills.
the test should be administered with ease, clarity, and uniformity so that the scores obtained are comparable.
uniformity can be obtained by setting the time limit and standardizing the oral instructions.
factors that reduce validity here include:
insufficient time to complete the test
giving assistance to students during the testing
subjectivity in scoring essay tests
C. Personal factors influencing how students respond to the test.
students might not be mentally prepared for the test.
students can subconsciously be exercising what is called a response set.
D. Validity is always specific to a particular group.
the measurement of test results can be influenced by such factors as age, sex, ability level, educational background, and cultural background.
Validity
is the most important quality of a test.
does not refer to the test itself.
generally addresses the question: "Does the test measure what it is intended to measure?"
refers to the appropriateness, meaningfulness, and usefulness of the specific inferences that can be made from test scores.
is the extent to which test scores allow decision makers to infer how well students have attained program
objectives.
4. Reliability
refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.
refers to the results obtained with an evaluation instrument and not the instrument itself.
an estimate of reliability always refers to a particular type of consistency.
reliability is necessary but not a sufficient condition for validity.
reliability is primarily statistical.
Methods of Computing Reliability Coefficient
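One common method, sketched below under the assumption of dichotomously scored (right/wrong) items, is the Kuder–Richardson Formula 20 (KR-20); the student response data are hypothetical.

```python
# Kuder-Richardson Formula 20 (KR-20): an internal-consistency
# reliability estimate for a test scored 0/1 per item.
# KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance_of_totals)

def kr20(item_scores):
    """item_scores: one list of 0/1 item scores per student."""
    n = len(item_scores)
    k = len(item_scores[0])                       # number of items
    totals = [sum(s) for s in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # p: proportion answering an item correctly; q = 1 - p
    pq_sum = 0.0
    for i in range(k):
        p = sum(s[i] for s in item_scores) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Hypothetical responses of 4 students to a 3-item test:
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))   # 0.75 for this tiny sample
```

Values closer to 1.0 indicate more consistent scores; real analyses use far more students and items than this sketch.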
Relationship of Reliability and Validity
test reliability is requisite to test validity.
if a test is not reliable, then validity is moot. In other words, if a test is not reliable there is no point in discussing validity, because reliability is required before validity can be considered in any meaningful way.
Reliability
is the degree to which test scores are free of errors of measurement due to factors such as student fatigue, item sampling, and student guessing.
if a test is not reliable, it is also not valid.
5. Fairness
the assessment procedures do not discriminate against a particular group of students (for example, students
from various racial, ethnic, or gender groups, or students with disabilities).
6. Practicality and Efficiency
Teacher’s familiarity with the method
Time required
Complexity with the administration
Ease in scoring - the test should be easy to score: directions for scoring are clear, the scoring key is simple, and provisions for answer sheets are made.
Cost (economy) - the test should be administered as economically as possible, which means that answer sheets should be provided so that the test can be given from time to time.
Development of Classroom Assessment Tools
Steps in Planning for a Test
Identifying test objectives
Deciding on the type of objective test to be prepared
Preparing a Table of Specifications (TOS)
Constructing the draft test items
Try-out and validation
Identifying Test Objectives.
An objective test, if it is to be comprehensive, must cover the various levels of Bloom's taxonomy. Each objective consists of a statement of what is to be achieved and, preferably, the percentage of students expected to achieve it.
Cognitive Domain
1. Knowledge
recognizes students' ability to use rote memorization and recall certain facts. Test questions focus on identification and recall of information.
Sample verbs of stating specific learning outcomes
Cite, define, identify, label, list, match, name, recognize, reproduce, select, state.
At the end of the topic, students should be able to identify the major food groups without error. (instructional objective)
Test Item:
What are the four major food groups?
What are the three measures of central tendency?
2. Comprehension
involves students’ ability to read course content, interpret important information and put other’s ideas into
their own words. Test questions should focus on the use of facts, rules and principles.
Sample verbs of stating specific learning outcomes.
Classify, convert, describe, distinguish between, give examples, interpret, summarize.
At the end of the lesson, the students should be able to summarize the main events of the story in grammatically correct English. (instructional objective)
Summarize the main events of the story in grammatically correct English. (test item)
3. Application
students take new concepts and apply them to new situations. Test questions focus on applying facts and principles.
Sample verbs of stating specific learning outcomes.
Apply, arrange, compute, construct, demonstrate, discover, extend, operate, predict, relate, show, solve, use.
At the end of the lesson, the students should be able to write a short poem in iambic pentameter. (instructional objective)
Write a short poem in iambic pentameter.
4. Analysis
students have the ability to take new information and break it down into parts and differentiate between them. Test questions focus on the separation of a whole into its component parts.
Sample verbs of stating specific learning outcomes.
Analyze, associate, determine, diagram, differentiate, discriminate, distinguish, estimate, point out, infer,
outline, separate.
At the end of the lesson, the students should be able to describe the statistical tools needed in testing the difference between two means. (instructional objective)
What kind of statistical test would you run to see if there is a significant difference between pre-test and post-
test?
5. Synthesis
students are able to take various pieces of information and form a whole, creating a pattern where one did not previously exist. Test questions focus on combining ideas to form a new whole.
Sample verbs of stating specific learning outcomes.
Combine, compile, compose, construct, create, design, develop, devise, formulate, integrate, modify, revise,
rewrite, tell, write.
At the end of the lesson, the student should be able to compare and contrast the two types of error. (instructional objective)
What is the difference between type I and type II error?
6. Evaluation
involves students' ability to look at someone else's ideas or principles and judge the worth of the work and the value of the conclusion.
Sample verbs of stating specific learning outcomes.
Appraise, assess, compare, conclude, contrast, criticize, evaluate, judge, justify, support.
At the end of the lesson, the students should be able to draw a conclusion about the relationship between two means.
Example: What should the researcher conclude about therelationship in the population?
Preparing Table of Specification
A table of specifications
is a useful guide in determining the type of test items that you need to construct. If properly prepared, a table of specifications will help you limit the coverage of the test and identify the necessary skill or cognitive level required to answer the test item correctly.
Gronlund (1990) lists several examples of how a table of specifications should be prepared.
Format of a Table of Specifications
Specific Objectives. These refer to the intended learning outcomes, stated as specific instructional objectives covering a particular test topic.
Cognitive Level. This pertains to the intellectual skill or ability needed to correctly answer a test item, using Bloom's taxonomy of educational objectives. We sometimes refer to this as the cognitive demand of a test item. Entries in this column could be knowledge, comprehension, application, analysis, synthesis, or evaluation.
Type of Test Item. This identifies the type or kind of test an item belongs to. Entries in this column could be multiple choice, true or false, or even essay.
Item Number. This simply identifies the question number as it appears in the test.
Total Number of Points. This summarizes the score given to a particular test item.
(1) Sample of Table of specifications
(2) Sample of Table of specifications
(3) Sample of Table of specifications
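The column structure described above can be sketched in code; the minimal layout below is illustrative only, and the topic, objectives, item numbers, and point values are hypothetical.

```python
# A minimal, illustrative Table of Specifications (TOS) sketch.
# Each row: (specific objective, cognitive level, item type,
#            item numbers, points). All entries are hypothetical.

tos = [
    ("Identify the measures of central tendency", "Knowledge",
     "Multiple choice", [1, 2, 3], 3),
    ("Compute the mean of a data set", "Application",
     "Short answer", [4, 5], 4),
    ("Interpret a frequency distribution", "Comprehension",
     "Essay", [6], 5),
]

total_items = sum(len(nums) for _, _, _, nums, _ in tos)
total_points = sum(points for *_, points in tos)
print(f"Total items: {total_items}")    # 6
print(f"Total points: {total_points}")  # 12
```

Summing the item and point columns this way is a quick check that the planned test matches the intended length and weighting.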
Points to Remember in preparing a table of Specifications
1) Define and limit the subject matter coverage of the test depending on the length of the test.
2) Decide on the point distribution per subtopic.
3) Decide on the type of test you will construct per subtopic.
4) Make certain that the type of test is appropriate to the degree of difficulty of the topic.
5) State the specific instructional objectives in terms of the specific types of performance students are expected to demonstrate at the end of instruction.
6) Be careful in identifying the necessary intellectual skill needed to correctly answer the test item. Use Bloom's taxonomy as a reference.
Suggestions for Constructing Short-Answer Items
1) Word the item so that the required answer is both brief and specific.
2) Do not take statements directly from textbooks to use as a basis for short-answer items.
3) A direct question is generally more desirable than an incomplete statement.
4) If the answer is to be expressed in numerical units, indicate the type of answer wanted.
5) Blanks for answers should be equal in length and placed in a column to the right of the questions.
6) When completion items are used, do not include too many blanks.
Example for:
1) Poor: An animal that eats the flesh of other animals is (carnivorous)
Better: An animal that eats the flesh of other animals is classified as (carnivorous)
2) Poor: Chlorine is a (halogen).
Better: Chlorine belongs to a group of elements that combine with metals to form salt. It is therefore called a
(halogen)
3) Poor: John Glenn made his first orbital flight around the earth in (1962).
Better: In what year did John Glenn make his first orbital flight around the earth? (1962)
Selecting the Test Format
Selective Test – a test where there are choices for the answer, like multiple choice, true or false, and matching type.
Supply Test – a test where there are no choices for the answer, like short answer, completion, and extended-response essay.
Construction and Tryouts
Item Writing
Content Validation
Item Tryout
Item Analysis
Item Analysis
refers to the process of examining the students' responses to each item in the test.
There are two kinds of item characteristics: desirable and undesirable. An item that has desirable characteristics can be retained for subsequent use, while one with undesirable characteristics is either revised or rejected.
Use of Item Analysis
Item analysis data provide a basis for efficient class discussion of the test results.
Item analysis data provide a basis for remedial work.
Item analysis data provide a basis for general improvement of classroom instruction.
Item analysis data provide a basis for increased skills in test construction.
Item analysis procedures provide a basis for constructing a test bank.
Three criteria determine the desirability or undesirability of an item:
a) difficulty of an item
b) discriminating power of an item
c) measures of attractiveness
Difficulty index
refers to the proportion of students in the upper and lower groups who answered an item correctly.
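The definition above amounts to a simple proportion; the sketch below uses hypothetical group sizes and counts.

```python
# Difficulty index sketch: proportion of students (upper and lower
# groups combined) who answered the item correctly.

def difficulty_index(correct_upper, correct_lower, n_upper, n_lower):
    return (correct_upper + correct_lower) / (n_upper + n_lower)

# e.g., 8 of 10 upper-group and 4 of 10 lower-group students correct:
p = difficulty_index(8, 4, 10, 10)
print(p)   # 0.6 - a moderately easy item
```

Higher values mean easier items; conventional bands for "easy", "moderate", and "difficult" vary by author.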
Level of Difficulty of an Item
Discrimination Index
refers to the proportion of the students in the upper group who got an item right minus the proportion of the students in the lower group who got the item right.
Level of Discrimination
Types of Discrimination Index
Positive Discrimination Index
- more students from the upper group than from the lower group got the item correctly.
Negative Discrimination Index
- more students from the lower group than from the upper group got the item correctly.
Zero Discrimination Index
- the numbers of students from the upper group and the lower group who got the item correctly are equal.
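The index defined above can be sketched as follows; the counts are hypothetical, and a positive result indicates the item discriminates in the intended direction.

```python
# Discrimination index sketch: proportion correct in the upper group
# minus proportion correct in the lower group.

def discrimination_index(correct_upper, correct_lower, n_upper, n_lower):
    return correct_upper / n_upper - correct_lower / n_lower

# e.g., 8 of 10 upper-group and 4 of 10 lower-group students correct:
d = discrimination_index(8, 4, 10, 10)
print(d)   # positive discrimination (about 0.4)
```

A negative value would flag the item for revision, since weaker students outperformed stronger ones on it.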
MEASURES OF ATTRACTIVENESS
To measure the attractiveness of the incorrect options (distractors) in a multiple-choice test, count the number of students who selected each incorrect option in both the upper and lower groups. An incorrect option should attract fewer students from the upper group than from the lower group.
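This tally can be sketched as follows; the response data and the "effective distractor" rule (the lower group chooses it at least as often as the upper group) are illustrative assumptions.

```python
# Distractor-attractiveness sketch: count how many students in each
# group chose each incorrect option. Data below are hypothetical.

from collections import Counter

upper_choices = ["A", "A", "B", "A", "C", "A"]   # "A" is the key
lower_choices = ["B", "C", "B", "A", "D", "B"]

upper_counts = Counter(upper_choices)
lower_counts = Counter(lower_choices)

for option in ["B", "C", "D"]:                   # the distractors
    u = upper_counts[option]
    l = lower_counts[option]
    flag = "ok" if l >= u else "check"           # effective if lower >= upper
    print(f"option {option}: upper={u}, lower={l} ({flag})")
```

A distractor chosen by no one, or one that pulls mainly from the upper group, is a candidate for rewriting.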
Rubrics
a systematic guideline for evaluating students' performance through the use of a detailed description of performance standards.
used to get consistent scores across all students.
provides students with feedback regarding their weaknesses and strengths, thus enabling them to develop their skills.
allows students to be more aware of the expectations for performance and consequently improve their performance.
Holistic Rubric vs Analytic Rubric
Holistic Rubric is more global and does little to separate the tasks in any given product; rather, it views the final product as a set of interrelated tasks contributing to the whole.
Provides a single score based on an overall impression of a student's performance on a task.
It may be difficult to provide one overall score.
Advantage: quick scoring; provides an overview of student achievement.
Disadvantage: does not provide detailed information about the student's performance in specific areas of the content and skills.
Use a holistic rubric when:
You want a quick snapshot of achievement.
A single dimension is adequate to define quality.
Example of Holistic Rubrics
Analytic Rubric
breaks down the objective or final product into component parts; each part is scored independently.
provides specific feedback along several dimensions.
Analytic Rubric
Advantage: more detailed feedback; scoring is more consistent across students and graders.
Disadvantage: time consuming to score.
Use an analytic rubric when:
you want to see relative strengths and weaknesses.
you want detailed feedback.
you want to assess complicated skills or performance.
you want students to self-assess their understanding or performance.
Example of Analytic Writing Rubric
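As a minimal sketch of analytic scoring, each criterion is rated independently and the weighted ratings are summed; the criteria, weights, and 1–4 scale below are hypothetical assumptions.

```python
# Analytic-rubric scoring sketch: each criterion is rated independently
# on a 1-4 scale, weighted, then summed. Criteria and weights are
# hypothetical examples.

rubric = {
    # criterion: weight
    "content":      2,
    "organization": 1,
    "mechanics":    1,
}

def analytic_score(ratings, rubric):
    """ratings: dict of criterion -> level (1-4)."""
    return sum(rubric[c] * level for c, level in ratings.items())

ratings = {"content": 3, "organization": 4, "mechanics": 2}
score = analytic_score(ratings, rubric)     # 2*3 + 1*4 + 1*2 = 12
max_score = sum(4 * w for w in rubric.values())   # 16
print(f"{score}/{max_score}")
```

Reporting the per-criterion ratings alongside the total is what gives the analytic rubric its diagnostic value over a single holistic score.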
Utilization of Assessment Data
Norm-Referenced Interpretation
the result is interpreted by comparing one student with other students, where some will pass and some will not.
designed to measure the performance of students compared to other students; an individual score is compared to others'.
usually expressed in terms of percentile, grade equivalent, or stanine.
Norm-referenced grading is a system typically used to evaluate students based on the performance of those around them. IQ tests and SAT exams are two examples of this system, as is grading "on the curve." Norm-referenced grading is more common in schools that emphasize class rank rather than understanding of skills or facts.
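A percentile rank, one of the norm-referenced statistics mentioned above, can be sketched as follows; the class scores are hypothetical.

```python
# Percentile-rank sketch: the percentage of the group scoring below
# a given score. Class scores below are hypothetical.

def percentile_rank(score, scores):
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

class_scores = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]
print(percentile_rank(80, class_scores))   # 60.0 - better than 60% of peers
```

Note that the interpretation depends entirely on the comparison group, which is exactly the contrast with criterion-referenced interpretation below.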
Utilization of Assessment Data
Criterion-Referenced Interpretation. The result is interpreted by comparing the student's performance against a predefined standard, where all or none may pass.
designed to measure the performance of students compared to a pre-determined criterion or standard, usually expressed in terms of percentage.
Criterion-referenced evaluation should be used to evaluate student performance in classrooms.
it is referenced to criteria based on learning outcomes described in the provincial curriculum.
the criteria reflect a student's performance based on specific learning activities.
a student's performance is compared to established criteria rather than to the performance of other students.
evaluation referenced to prescribed curriculum requires that criteria are established based on the learning
outcomes listed under the curriculum.