Assessment
Test
• An instrument designed to measure any characteristic, quality, ability, knowledge or skill. It is composed of items in the area it is designed to measure.
Measurement
• A process of quantifying the degree to which someone or something possesses a given trait.
Assessment
• A process of gathering and organizing quantitative or qualitative data into an interpretable form to have a basis for judgement or decision making. It is a prerequisite to evaluation: it provides the information which enables evaluation to take place.
Evaluation
• A process of systematic interpretation, analysis, appraisal or judgement of the worth of organized data as a basis for decision making. It involves judgement about the desirability of changes in students.
Traditional Assessment
• Refers to the use of paper-and-pencil objective tests.
Alternative Assessment
• Refers to the use of methods other than paper-and-pencil objective tests, including performance tests, projects, portfolios, journals and the like.
Authentic Assessment
• Refers to the use of assessment methods that simulate true-to-life situations. These could be objective tests that reflect real-life situations or alternative methods that parallel what we experience in real life.
Purposes of Classroom Assessment

1. Assessment FOR Learning – includes three types of assessment done before and during instruction: placement, formative and diagnostic.
a. Placement – done prior to instruction
• Its purpose is to assess the needs of the learners to have a basis for planning relevant instruction.
• Teachers use this assessment to know what their students bring into the learning situation and use this as a starting point for instruction.
• The results of this assessment place students in specific learning groups to facilitate teaching and learning.
b. Formative – done during instruction
• This assessment is where teachers continuously monitor the students' level of attainment of the learning objectives.
• The results of this assessment are communicated clearly and promptly to the students for them to know their strengths and weaknesses and the progress of their learning.
c. Diagnostic – done before or during instruction
• This is used to determine students' recurring or persistent difficulties.
• It searches for the underlying causes of students' learning problems that do not respond to first-aid treatment. It helps formulate a plan for detailed remedial instruction.
2. Assessment OF Learning
• This is done after instruction and is usually referred to as summative assessment.
• It is used to certify what students know and can do and the level of their proficiency or competency.
• The information from assessment of learning is usually expressed as marks or grades.
• The results are communicated to the students, parents and other stakeholders for decision making.
3. Assessment AS Learning
• This is done so that teachers can understand and perform well their role of assessing FOR and OF learning. It requires teachers to undergo training on how to assess learning and to be equipped with the competencies needed in performing their work as assessors.
MODES OF ASSESSMENT

Traditional
- Description: the objective paper-and-pencil test, which usually assesses low-level thinking skills
- Examples: standardized tests; teacher-made tests
- Advantages: scoring is objective; administration is easy because students can take the test at the same time
- Disadvantages: preparation of the instrument is time-consuming; prone to cheating

Performance
- Description: requires actual demonstration of skills or creation of products of learning
- Examples: practical tests; oral tests; projects
- Advantages: preparation of the instrument is relatively easy; measures behaviors that cannot be faked
- Disadvantages: scoring tends to be subjective without rubrics; administration is time-consuming

Portfolio
- Description: a process of gathering multiple indicators of student progress to support course goals in a dynamic, ongoing and collaborative process
- Examples: working portfolios; show portfolios; documentary portfolios
- Advantages: measures the student's growth and development; intelligence-fair
- Disadvantages: development is time-consuming; ratings tend to be subjective without rubrics
FOUR TYPES OF EVALUATION PROCEDURES

1. Placement Evaluation
- done before instruction
- determines mastery of prerequisite skills
- not graded
2. Summative Evaluation
-done after instruction
-certifies mastery of the intended learning
outcomes
-graded
-examples include quarterly exams, unit or
chapter tests, final exams
Both Placement and Summative Evaluations:
• determine the extent of what the pupils have achieved or mastered in the objectives of the intended instruction
• place students in specific learning groups to facilitate teaching and learning
• serve as a pretest for the next unit
• serve as a basis for planning relevant instruction
3. Formative Evaluation
• reinforces successful learning
• provides continuous feedback to both students and teachers concerning learning successes and failures
• not graded
• examples: short quizzes, recitations
4. Diagnostic Evaluation
• determines persistent deficiencies
• helps formulate a plan for remedial instruction
Formative and Diagnostic Evaluation are both:
1. administered during instruction
2. designed to formulate a plan for remedial instruction
3. used to modify the teaching and learning process
4. not graded
PRINCIPLES OF HIGH-QUALITY ASSESSMENT

Principle 1: Clarity of Learning Targets
• Clear and appropriate learning targets include what students know and can do and the criteria for judging student performance.
Principle 2: Appropriateness of Assessment Methods
• The method of assessment to be used should match the learning targets.
Principle 3: Balance
• A balanced assessment sets targets in all domains of learning or domains of intelligence.
• A balanced assessment makes use of both traditional and alternative assessments.
Principle 4: Validity
• Validity is the degree to which the assessment instrument measures what it intends to measure. It also refers to the usefulness of the instrument for a given purpose. It is the most important criterion of a good assessment instrument.
Ways of Establishing Validity
1. Face validity – done by examining the physical appearance of the instrument to make it readable and understandable.
2. Content validity – done through a careful and critical examination of the objectives of assessment so that they reflect the curricular objectives.
3. Criterion-related validity – established statistically by correlating a set of scores with an external predictor or measure. It has two types: concurrent and predictive.
a. Concurrent validity – describes the present status of the individual by correlating the sets of scores obtained from two measures given at a close interval.
b. Predictive validity – describes the future performance of an individual by correlating the sets of scores obtained from two measures given at a longer time interval.
4. Construct validity – established statistically by comparing psychological traits or factors that theoretically influence scores in a test.
a. Convergent validity – established if the instrument defines a similar trait other than what it is intended to measure (e.g., a critical thinking test may be correlated with a creative thinking test).
b. Divergent validity – established if an instrument can describe only the intended trait and not other traits (e.g., a critical thinking test may not be correlated with a reading comprehension test).
Principle 5: Reliability
• Reliability refers to the degree of consistency when several items in a test measure the same thing, and to stability when the same measure is given across time.
• Ways of establishing reliability: the split-half method, the test-retest method, and parallel or equivalent forms.
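To make the split-half method concrete, here is a minimal sketch (not part of the original module; the response data are hypothetical). It splits a test into odd- and even-numbered items, correlates the two half-test scores across examinees, and applies the Spearman-Brown correction to estimate the reliability of the full test. It uses statistics.correlation, available in Python 3.10+.

```python
from statistics import correlation  # Pearson r; Python 3.10+

def split_half_reliability(item_scores):
    """Estimate reliability from one test administration.

    item_scores: one list of 0/1 item scores per examinee.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = correlation(odd, even)                 # correlation of the two halves
    return (2 * r_half) / (1 + r_half)              # Spearman-Brown correction

# Hypothetical responses of 5 examinees to a 6-item test (1 = correct)
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
print(round(split_half_reliability(scores), 2))
```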
Principle 6: Fairness
• A fair assessment is unbiased and provides students with opportunities to demonstrate what they have learned.
Principle 7: Practicality and Efficiency
• When assessing learning, the information obtained should be worth the resources and time required to obtain it. The easier the procedure, the more practical the assessment is.
Principle 8: Continuity
• Assessment takes place in all phases of instruction. It can be done before, during and after instruction.
Principle 9: Communication
• Assessment targets and standards should be communicated. Assessment results should be communicated to important users, and to students through direct interaction or regular, ongoing feedback on their progress.
Principle 10: Positive Consequences
• Assessment should have a positive consequence for students; that is, it should motivate them to learn.
• Assessment should have a positive consequence for teachers; that is, it should help them improve the effectiveness of their instruction.
Principle 11: Ethics
• Teachers should free students from harmful consequences of the misuse or overuse of assessment procedures, such as embarrassing students or violating students' right to confidentiality.
• Teachers should be guided by the laws and policies that affect their classroom assessment.
• Administrators and teachers should understand that it is inappropriate to use standardized student achievement tests to measure teaching effectiveness.
Performance-based Assessment
• A process of gathering information about students' learning through actual demonstration of essential and observable skills and the creation of products that are grounded in real-world contexts and constraints. It is an assessment that is open to many possible answers and judged using multiple criteria or standards of excellence that are pre-specified and public.
Types of Performance-based Tasks
1. Demonstration type – a task that requires no product. Examples: cooking demonstrations, presentations.
2. Creation type – a task that requires a tangible product. Examples: project plans, research papers, project flyers.
Portfolio Assessment
• Also an alternative to paper-and-pencil objective tests. It is a purposeful, ongoing, dynamic and collaborative process of gathering multiple indicators of the learner's growth and development. Portfolio assessment is also performance-based, but more authentic than any performance-based task.
Principles Underlying Portfolio Assessment
1. Content principle – suggests that portfolios should reflect the subject matter that is important for the students to learn.
2. Learning principle – suggests that portfolios should enable the students to become active and thoughtful learners.
3. Equity principle – explains that portfolios should allow students to demonstrate their learning styles and multiple intelligences.
Types of Portfolios
1. The working portfolio is a collection of a student's day-to-day work which reflects his/her learning.
2. The show portfolio is a collection of a student's best works.
3. The documentary portfolio is a combination of a working and a show portfolio.
A. COGNITIVE DOMAIN

Levels of learning outcomes, with descriptions and some question clues:

- Knowledge: involves remembering or recalling previously learned material or a wide range of materials. Clues: list, define, identify, name, recall, state, arrange.
- Comprehension: ability to grasp the meaning of material by translating it from one form to another or by interpreting it. Clues: describe, interpret, classify, differentiate, explain, translate.
- Application: ability to use learned material in new and concrete situations. Clues: apply, demonstrate, solve, interpret, use, experiment.
- Analysis: ability to break down material into its component parts so that the whole structure is understood. Clues: analyze, separate, explain, examine, discriminate, infer.
- Synthesis: ability to put parts together to form a new whole. Clues: integrate, plan, generalize, construct, design, propose.
- Evaluation: ability to judge the value of material on the basis of definite criteria. Clues: assess, decide, judge, support, summarize, defend.
B. AFFECTIVE DOMAIN

Categories, with descriptions and some illustrative verbs:

- Receiving: willingness to receive or attend to a particular phenomenon or stimulus. Verbs: acknowledge, ask, choose, follow, listen, reply, watch.
- Responding: refers to active participation on the part of the student. Verbs: answer, assist, contribute, cooperate, follow up.
- Valuing: ability to see worth or value in a subject or activity. Verbs: adopt, commit, desire, display, explain, initiate.
- Organization: bringing together a complex of values, resolving conflicts between them, and beginning to build an internally consistent value system. Verbs: adapt, categorize, establish, generalize, integrate, organize.
C. PSYCHOMOTOR DOMAIN

Categories, with descriptions and some illustrative verbs:

- Imitation: the early stages of learning a complex skill, after an indication of readiness to take a particular type of action. Verbs: carry out, assemble, practice, follow, repeat, sketch, move.
- Manipulation: a particular skill or sequence is practiced continuously until it becomes habitual and is done with some confidence and proficiency. Verbs: acquire, complete, conduct, improve, perform, produce.
- Precision: a skill has been attained with proficiency and efficiency. Verbs: achieve, accomplish, excel, master, succeed.
- Articulation: an individual can modify movement patterns to meet a particular situation. Verbs: adapt, change, excel, reorganize, rearrange.
- Naturalization: an individual responds automatically and creates new motor ways of manipulation out of the understanding, abilities and skills developed. Verbs: arrange, combine, compose, construct, create, design.
DIFFERENT TYPES OF TESTS
(by main point of comparison)

Purpose: Psychological vs. Educational
- Psychological: aims to measure students' intelligence or mental ability, largely without reference to what the student has learned (e.g., aptitude tests, personality tests, intelligence tests).
- Educational: aims to measure the results of instruction and learning (e.g., achievement tests, performance tests).
Scope of Content: Survey vs. Mastery
- Survey: covers a broad range of objectives; measures general achievement in certain subjects; constructed by trained professionals.
- Mastery: covers a specific objective; measures fundamental skills and abilities; typically constructed by the teacher.
Language Mode: Verbal vs. Non-verbal
- Verbal: words are used by students in attaching meaning to or responding to test items.
- Non-verbal: students do not use words in attaching meaning to or responding to test items.
Construction: Standardized vs. Informal
- Standardized: constructed by a professional item writer; covers a broad range of content within a subject area; uses mainly multiple choice; items are screened and the best are chosen for the final instrument; can be scored by machine; interpretation of results is usually norm-referenced.
- Informal: constructed by a classroom teacher; covers a narrow range of content; uses various types of items; the teacher picks or writes items as needed for the test; scored manually by the teacher; interpretation is usually criterion-referenced.
Manner of Administration: Individual vs. Group
- Individual: mostly given orally or requires actual demonstration of skill; one-on-one situations offer many opportunities for clinical observation; there is a chance to follow up the examinee's responses in order to clarify or comprehend them more clearly.
- Group: a paper-and-pencil test; rapport, insights and knowledge about each examinee are lost; in the time needed to gather information from one student individually, information can be gathered from the whole group.
Effect of Biases: Objective vs. Subjective
- Objective: the scorer's personal judgment does not affect the scoring; items are worded so that only one answer is acceptable; little or no disagreement on what the correct answer is.
- Subjective: affected by the scorer's personal opinions, biases and judgments; several answers are possible; disagreement on the correct answer is possible.
Time Limit and Level of Difficulty: Power vs. Speed
- Power: consists of a series of items arranged in ascending order of difficulty; measures the student's ability to answer more and more difficult items.
- Speed: consists of items of approximately equal difficulty; measures the student's speed or rate and accuracy in responding.
Format: Selective vs. Supply
- Selective: there are choices for the answer (multiple choice, true-false, matching type); can be answered quickly; prone to guessing; time-consuming to construct.
- Supply: there are no choices for the answer (short answer, completion, restricted or extended essay); may require a longer time to answer; less prone to guessing but prone to bluffing; time-consuming to answer and score.
Nature of Assessment: Maximum Performance vs. Typical Performance
- Maximum performance: determines what individuals can do when performing at their best.
- Typical performance: determines what individuals will do under natural conditions.
Interpretation: Norm-referenced vs. Criterion-referenced
- Norm-referenced: results are interpreted by comparing one student's performance with other students' performance; some will surely pass; there is competition for a limited percentage of high scores; typically covers a large domain of learning tasks; emphasizes discrimination among individuals in terms of level of learning; favors items of average difficulty and typically omits very easy and very difficult items; interpretation requires a clearly defined group.
- Criterion-referenced: results are interpreted by comparing students' performance against a predefined standard (mastery); all or none may pass; there is no competition for a limited percentage of high scores; typically focuses on a delimited domain of learning tasks; emphasizes description of what learning tasks individuals can and cannot perform; matches item difficulty to the learning tasks, without altering item difficulty or omitting easy items; interpretation requires a clearly defined and delimited achievement domain.
FOUR COMMONLY-USED REFERENCES FOR CLASSROOM INTERPRETATION

- Ability-referenced. Interpretation provided: how are students performing relative to what they are capable of doing? Condition that must be present: good measures of the students' maximum possible performance.
- Growth-referenced. Interpretation provided: how much have students changed or improved relative to what they were doing? Condition: pre- and post-measures of performance that are highly reliable.
- Norm-referenced. Interpretation provided: how well are students doing with respect to what is typical or reasonable? Condition: a clear understanding of whom the students are being compared to.
- Criterion-referenced. Interpretation provided: what can students do and not do? Condition: a well-defined content domain that was assessed.
TYPES OF TESTS ACCORDING TO FORMAT

1. Selective Type – provides choices for the answer.
a. Multiple Choice – consists of a stem, which describes the problem, and three or more alternatives, which give the suggested solutions. The incorrect alternatives are the distractors.
b. True-False or Alternative Response – consists of a declarative statement that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.
c. Matching Type – consists of two parallel columns: Column A, the column of premises from which a match is sought; Column B, the column of responses from which the selection is made.
Advantages and limitations by type:

Multiple Choice
- Advantages: more adequate sampling of content; tends to structure the problem to be addressed more effectively; can be quickly and objectively scored.
- Limitations: prone to guessing; often measures targeted behaviors indirectly; time-consuming to construct.

Alternative Response
- Advantages: more adequate sampling of content; easy to construct; can be quickly and objectively scored.
- Limitations: prone to guessing; can be used only when dichotomous answers represent sufficient response options; usually must indirectly measure performance related to procedural knowledge.

Matching Type
- Advantages: allows comparison of related ideas, concepts or theories; effectively assesses association between a variety of items within a topic; encourages integration of information; can be quickly and objectively scored; can be easily administered.
- Limitations: difficult to produce a sufficient number of plausible premises; not effective in testing isolated facts; may be limited to lower levels of understanding; useful only when there is a sufficient number of related items; may be influenced by guessing.
2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word, phrase, number or symbol.
b. Completion Test – consists of an incomplete statement.
- Advantages: easy to construct; requires the student to supply the answer; many items can be included in one test.
- Limitations: generally limited to measuring recall of information; more likely to be scored erroneously due to the variety of possible responses.
3. Essay Test
a. Restricted Response – limits the content of the response by restricting the scope of the topic.
b. Extended Response – allows the students to select any factual information that they think is pertinent and to organize their answers in accordance with their best judgment.
- Advantages: measures more directly the behaviors specified by performance objectives; examines students' written communication skills; requires the students to supply the response.
- Limitations: provides a less adequate sampling of content; less reliable scoring; time-consuming.
GENERAL SUGGESTIONS IN WRITING TESTS

1. Use your TOS (Table of Specifications) as a guide to item writing.
2. Write more test items than needed.
3. Write the test items well in advance of the testing date.
4. Write each item so that the task to be performed is clearly defined.
5. Write each test item at the appropriate reading level.
6. Write each test item so that it does not provide help in answering other items in the test.
7. Write each test item so that the answer is one that would be agreed upon by experts.
8. Write each test item at the proper level of difficulty.
9. Whenever a test is revised, recheck its relevance.
SPECIFIC SUGGESTIONS

A. SUPPLY TYPE
1. Word the items so that the required answer is both brief and specific.
2. Do not take statements directly from textbooks to use as a basis for short-answer items.
3. A direct question is generally more desirable than an incomplete statement.
4. If an item is to be expressed in numerical units, indicate the type of answer wanted.
5. Blanks should be equal in length.
6. Answers should be written before the item number for easy checking.
7. When completion items are used, do not have too many blanks. Blanks should be at the center of the sentence, not at the beginning.
B. ESSAY TYPE
1. Restrict the use of essay questions to those learning outcomes that cannot be satisfactorily measured by objective items.
2. Formulate questions that will call forth the behavior specified in the learning outcomes.
3. Phrase each question so that the pupil's task is clearly indicated.
4. Indicate an approximate time limit for each question.
5. Avoid the use of optional questions.
C. SELECTIVE TYPE

Alternative Response
1. Avoid broad statements.
2. Avoid trivial statements.
3. Avoid the use of negative statements, especially double negatives.
4. Avoid long and complex sentences.
Matching Type
1. Use only homogeneous material in a single matching exercise.
2. Include an unequal number of responses and premises, and instruct the pupils that a response may be used once, more than once, or not at all.
3. Keep the list of items to be matched brief and place the shorter responses at the right.
4. Arrange the list of responses in logical order.
5. Indicate in the directions the basis for matching the responses and premises.
6. Place all the items for one matching exercise on the same page.
Multiple Choice
1. The stem of the item should be meaningful by itself and should present a definite problem.
2. The stem should include as much of the item as possible and should be free of irrelevant information.
3. Use a negatively stated item only when a significant learning outcome requires it.
4. Highlight negative words in the stem for emphasis.
5. All the alternatives should be grammatically consistent with the stem of the item.
6. An item should have only one correct or clearly best answer.
ALTERNATIVE ASSESSMENT

PERFORMANCE AND AUTHENTIC ASSESSMENTS

When to use:
- Specific behaviors or behavioral outcomes are to be observed.
- It is possible to judge the appropriateness of students' actions.
- A process or outcome cannot be directly measured by paper-and-pencil tests.

Advantages:
- Allow evaluation of complex skills which are difficult to assess using written tests.
- Positive effect on instruction and learning.
- Can be used to evaluate both the process and the product.

Limitations:
- Time-consuming to administer, develop and score.
- Subjectivity in scoring.
- Inconsistencies in performance on alternative skills.
PORTFOLIO ASSESSMENT

Characteristics
1. Adaptable to individualized instructional goals.
2. Focuses on assessment of products.
3. Identifies students' strengths rather than weaknesses.
4. Actively involves students in the evaluation process.
5. Communicates student achievement to others.
6. Time-consuming.
7. Needs a scoring plan to increase reliability.
Types:
- Showcase: a collection of students' best work.
- Reflective: used for helping teachers, students and family members think about various dimensions of student learning (e.g., effort, achievement).
- Cumulative: a collection of items done over an extended period of time, analyzed to verify changes in the products and processes associated with student learning.
- Goal-based: a collection of works chosen by students and teachers to match pre-established objectives.
- Process: a way of documenting the steps and processes a student has gone through to complete a piece of work.
RUBRICS
• A scoring guide consisting of specific pre-established performance criteria, used in evaluating student work in performance assessments.
Two Types
1. Holistic Rubric – requires the teacher to score the overall process or product as a whole, without judging the component parts separately.
2. Analytic Rubric – requires the teacher to score individual components of the product or performance first, then sum the individual scores to obtain a total score.
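The analytic type lends itself to a simple computation. Below is a minimal sketch, not from the original module: each component is scored separately and the weighted component scores are summed into a total. The criteria, weights and scores are invented for illustration.

```python
# Hypothetical analytic rubric for an essay: each criterion is scored 1-4,
# and the weights reflect an assumed relative importance of the criteria.
weights = {"content": 0.4, "organization": 0.3, "mechanics": 0.3}

def analytic_total(component_scores, weights):
    """Sum the individually scored components, applying the rubric weights."""
    return sum(weights[c] * score for c, score in component_scores.items())

student = {"content": 4, "organization": 3, "mechanics": 2}
print(analytic_total(student, weights))  # 0.4*4 + 0.3*3 + 0.3*2 = 3.1
```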
AFFECTIVE ASSESSMENTS

1. Closed-Item or Forced-Choice Instruments – ask for one specific answer.
a. Checklist – measures students' preferences, hobbies, attitudes, feelings, beliefs, interests, etc. by marking a set of possible responses.
b. Scales – instruments that indicate the extent or degree of one's response.
1. Rating Scale – measures the degree or extent of one's attitudes, feelings and perceptions about ideas, objects and people by marking a point along a 3- or 5-point scale.
2. Semantic Differential Scale – measures the degree of one's attitudes, feelings and perceptions about ideas and people by marking a point along a 3-, 7- or 11-point scale of semantic adjectives.
3. Likert Scale – measures the degree of one's agreement or disagreement with positive or negative statements about objects and people (see the scoring sketch after this list of instruments).
c. Alternate Response – measures students' preferences, hobbies, attitudes, feelings, beliefs and interests by choosing between two possible responses.
d. Ranking – measures students' preferences or priorities by ranking a set of responses.
2. Open-ended Instruments – open to more than one answer.
a. Sentence Completion – measures students' preferences over a variety of attitudes and allows students to answer by completing an unfinished statement, which may vary in length.
b. Surveys – measure the values held by an individual by writing one or many responses to a given question.
c. Essays – allow the students to reveal and clarify their preferences, hobbies, attitudes, feelings, beliefs and interests by writing their reactions or opinions to a given question.
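As a concrete illustration of how Likert-scale responses could be scored (a sketch under assumed conventions, not part of the original module): responses to positively worded statements are scored directly on a 1-5 scale, while negatively worded statements are reverse-scored before averaging, so a higher mean always indicates a more favorable attitude. The items and responses are hypothetical.

```python
# Each entry: (statement is positively worded?, response on a 1-5 scale,
# where 1 = strongly disagree and 5 = strongly agree)
responses = [
    (True, 5),   # e.g. "I enjoy mathematics."              -> scored as 5
    (True, 4),   # e.g. "Math class is interesting."        -> scored as 4
    (False, 2),  # e.g. "Math makes me anxious." (negative) -> reverse-scored as 4
]

def likert_mean(items, scale_max=5):
    """Average the item scores, reverse-scoring negatively worded statements."""
    scored = [r if positive else (scale_max + 1 - r) for positive, r in items]
    return sum(scored) / len(scored)

print(round(likert_mean(responses), 2))  # (5 + 4 + 4) / 3 = 4.33
```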
CRITERIA TO CONSIDER IN CONSTRUCTING GOOD TESTS

• VALIDITY – the degree to which a test measures what it is intended to measure. It is the usefulness of the test for a given purpose. It is the most important criterion of a good examination.
Factors influencing the validity of a test in general:
a. Appropriateness of the test – it should measure the abilities, skills and information it is supposed to measure.
b. Directions – it should indicate how the learners should answer and record their answers.
c. Reading vocabulary and sentence structure – it should be based on the intellectual level of maturity and the background experience of the learners.
d. Difficulty of items – it should have items that are neither too difficult nor too easy, so that it can discriminate the bright from the slow pupils.
e. Construction of items – it should not provide clues (so it does not become a test of clue-finding), nor should it be ambiguous (so it does not become a test of interpretation).
f. Length of the test – it should be of sufficient length to measure what it is supposed to measure, not so short that it cannot adequately measure the performance we want to measure.
g. Arrangement of items – items should be arranged in ascending level of difficulty, starting with the easy ones, so that pupils will persevere in taking the test.
h. Pattern of answers – it should not allow the creation of patterns in answering the test.
WAYS OF ESTABLISHING VALIDITY

1. Face Validity – done by examining the physical appearance of the test.
2. Content Validity – done through a careful and critical examination of the objectives of the test so that it reflects the curricular objectives.
3. Criterion-related Validity – established statistically by correlating a set of scores revealed by the test with scores obtained from another external measure.
•Concurrent Validity – describes the present
status of the individual by correlating the sets
of scores obtained from two measures given
concurrently
•Predictive Validity – describes the future
performance of an individual by correlating
the sets of scores obtained from two
measures given at a longer time interval.
•Construct validity – is established statistically
by comparing psychological traits or factors
that influence scores in a test, e.g. verbal,
numerical, spatial, etc.
• Convergent Validity – is established if the instrument defines a similar trait other than what it is intended to measure (e.g., a Critical Thinking Test may be correlated with a Creative Thinking Test).
•Divergent Validity – is established if an
instrument can describe only the intended
trait and not other traits (e.g. Critical Thinking
Test may not be correlated with Reading
Comprehension Test)
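Concurrent, predictive, convergent and divergent validity are all established by correlating two sets of scores, so a short computation makes the idea concrete. The sketch below (not part of the original module; the score lists are hypothetical) computes the usual Pearson r between scores on a new test and scores on an external measure; a coefficient near 1 indicates a strong positive relationship, while one near 0 indicates none.

```python
from statistics import mean
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores on a new critical thinking test and on an
# established creative thinking test taken at a close interval.
new_test = [78, 85, 62, 90, 70]
external = [74, 88, 65, 93, 72]
print(round(pearson_r(new_test, external), 2))  # near 1.0: evidence of convergent validity
```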
RELIABILITY
• Refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.
Factors affecting Reliability

1. Length of the test – as a general rule, the longer the test, the higher the reliability. A longer test provides a more adequate sample of the behavior being measured and is less distorted by chance factors like guessing.
2. Difficulty of the test – ideally, achievement tests should be constructed so that the average score is 50 percent correct and the scores range from zero to near perfect. The bigger the spread of scores, the more reliable the measured differences are likely to be. A test is reliable if the coefficient of correlation is not less than 0.85.
3. Objectivity – obtained by eliminating the biases, opinions and judgments of the person who checks the test.
4. Administrability – the test should be administered with clarity and uniformity so that the scores obtained are comparable. Uniformity can be obtained by setting a time limit and giving uniform oral instructions.
5. Scorability – the test should be easy to score: directions for scoring are clear, the scoring key is simple, and provisions for answer sheets are made.
6. Economy – the test should be given in the cheapest way, which means that answer sheets should be provided so the test can be given from time to time.
7. Adequacy – the test should contain a wide sampling of items to determine the educational outcomes or abilities, so that the resulting scores are representative of total performance in the areas measured.
ITEM ANALYSIS

Steps:
1. Score the test. Arrange the scores from highest to lowest.
2. Take the top 27% (upper group) and the bottom 27% (lower group) of the examinees.
3. Count the number of examinees in the upper group and in the lower group who got each item correct.
4. Compute the difficulty index of each item.
5. Compute the discrimination index of each item.
Difficulty Index: Df = (DU + DL) / 2
where:
Df = difficulty index
DU = difficulty index for the upper group (proportion of the upper group answering the item correctly)
DL = difficulty index for the lower group (proportion of the lower group answering the item correctly)

Discrimination Index: D = DU - DL
INTERPRETATION OF THE DIFFICULTY INDEX (Df)

0.00 - 0.25  Very difficult            Revise/Discard
0.26 - 0.75  Average/Right difficulty  Retain
0.76 - 1.00  Very easy                 Revise/Discard
INTERPRETATION OF THE DISCRIMINATION INDEX

 0.46 to 1.00   Positive discriminating power  Retain
-0.50 to 0.45   Could not discriminate         Revise
-1.00 to -0.51  Negative discriminating power  Discard
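A minimal sketch of the computation described above (not from the original module; the group sizes and counts are hypothetical): given the number of correct responses in the upper and lower 27% groups, it computes the difficulty and discrimination indices and applies the interpretation tables.

```python
def item_analysis(correct_upper, correct_lower, n_upper, n_lower):
    """Return (difficulty index Df, discrimination index D) for one item."""
    du = correct_upper / n_upper   # difficulty index for the upper group
    dl = correct_lower / n_lower   # difficulty index for the lower group
    return (du + dl) / 2, du - dl

def interpret(df, d):
    """Apply the interpretation tables above to one item's indices."""
    if df <= 0.25:
        difficulty = "very difficult: revise/discard"
    elif df <= 0.75:
        difficulty = "average/right difficulty: retain"
    else:
        difficulty = "very easy: revise/discard"
    if d >= 0.46:
        discrimination = "positive discriminating power: retain"
    elif d >= -0.50:
        discrimination = "could not discriminate: revise"
    else:
        discrimination = "negative discriminating power: discard"
    return difficulty, discrimination

# Hypothetical item: 16 examinees per 27% group; 10 correct in the
# upper group and 6 correct in the lower group.
df, d = item_analysis(10, 6, 16, 16)
print(round(df, 2), round(d, 2))  # 0.5 0.25
print(interpret(df, d))
```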
Distractor Analysis Sample – Item 1

Option         A    B    C    D
Upper group    3   10    2    1
Lower group    5    6    5    0
Total         12   25   13   10
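Here is a sketch of how the sample above could be read (not part of the original module; it assumes, purely for illustration, that B is the keyed answer). A distractor generally functions well when it attracts more examinees from the lower group than from the upper group, and a distractor no one chooses contributes nothing.

```python
# Upper/lower counts per option from the sample item above.
# Assumption for illustration only: B is the keyed (correct) answer.
options = {"A": (3, 5), "B": (10, 6), "C": (2, 5), "D": (1, 0)}
key = "B"

for option, (upper, lower) in options.items():
    if option == key:
        continue  # the keyed answer is judged by the discrimination index instead
    if upper + lower == 0:
        print(option, "- chosen by no one: replace this distractor")
    elif upper > lower:
        print(option, "- attracts more upper-group examinees: review this distractor")
    else:
        print(option, "- attracts more lower-group examinees: functioning distractor")
```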
SCORING ERRORS AND BIASES

• Leniency error – faculty tends to judge work as better than it really is.
• Generosity error – faculty tends to use only the high end of the scale.
• Severity error – faculty tends to use only the low end of the scale.
• Central tendency error – faculty avoids both extremes of the scale.
• Bias – letting other factors influence the score (e.g., handwriting, typos).
• Halo effect – letting a general impression (e.g., of the student's prior work) influence the judgment of the work at hand.
• Contamination effect – judgment is influenced by irrelevant knowledge about the student or by other factors that have no bearing on performance level (e.g., student appearance).
• Similar-to-me effect – judging more favorably those students whom faculty see as similar to themselves (e.g., expressing similar interests or points of view).
• First-impression effect – judgment is based on early opinions rather than on a complete picture (e.g., the opening paragraph).
• Contrast effect – judging by comparing a student against other students instead of against established criteria and standards.
• Rater drift – unintentionally redefining criteria and standards over time or across a series of scorings (e.g., getting tired and cranky and therefore more severe, or getting tired and reading more quickly/leniently to get the job done).
FOUR TYPES OF MEASUREMENT SCALES

- Nominal: groups and labels data. Example: gender (1 = male, 2 = female).
- Ordinal: distances between points are indefinite. Example: income (1 = low, 2 = average, 3 = high).
- Interval: distances between points are equal, but there is no absolute zero. Examples: test scores, temperature.
- Ratio: has an absolute zero. Examples: height, weight.
SHAPES OF FREQUENCY POLYGONS

1. Normal/Bell-shaped/Symmetrical
2. Positively skewed – most scores are below the mean and there are extremely high scores.
3. Negatively skewed – most scores are above the mean and there are extremely low scores.
4. Leptokurtic – highly peaked, with tails more elevated above the baseline.
5. Mesokurtic – moderately peaked.
6. Platykurtic – flattened peak.
7. Bimodal curve – a curve with 2 peaks or modes.
8. Polymodal curve – a curve with 3 or more modes.
9. Rectangular distribution – there is no mode.
• Skewness – distortion or asymmetry from a symmetrical bell curve (normal distribution) in a set of data.
• Positively skewed (or right-skewed) – the mean is greater than the median; most scores are low, which suggests that students found the test difficult or did not understand the concepts (or were not taught well).
• Negatively skewed (or left-skewed) – the mean is less than the median; most scores are high, which suggests that most students understood the material or the test was easy.
MEASURES OF CENTRAL TENDENCY
AND VARIABILITY
Which statistical tool is appropriate depends on the data. Measures of central tendency describe the representative value of a set of data; measures of variability describe the degree of spread or dispersion of a set of data.

- When the frequency distribution is regular or symmetrical (normal), usually with numeric data (interval or ratio): use the Mean (the arithmetic average) with the Standard Deviation (the root mean square of the deviations from the mean).
- When the frequency distribution is irregular or skewed, usually with ordinal data: use the Median (the middle score in a group of ranked scores) with the Quartile Deviation (the average deviation of the 1st and 3rd quartiles from the median).
- When the distribution of scores is normal and a quick answer is needed, usually with nominal data: use the Mode (the most frequent score) with the Range (the difference between the highest and the lowest score in the distribution).
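A short sketch (not from the original module) computing the three pairs of statistics above for a hypothetical set of scores, using Python's statistics module (statistics.quantiles requires Python 3.8+):

```python
import statistics as st

scores = [70, 75, 75, 80, 82, 85, 88, 90, 95]  # hypothetical test scores

print("mean:", round(st.mean(scores), 2))
print("median:", st.median(scores))
print("mode:", st.mode(scores))

q1, _, q3 = st.quantiles(scores, n=4)          # 1st and 3rd quartiles
print("standard deviation:", round(st.stdev(scores), 2))
print("quartile deviation:", round((q3 - q1) / 2, 2))
print("range:", max(scores) - min(scores))
```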
How to Interpret the Measures of Central Tendency
• The value that represents a set of data will be the basis for determining whether the group is performing better or more poorly than other groups.
How to Interpret the Standard Deviation
• The result will help you determine whether the group is homogeneous or not.
• The result will help you determine the number of students who fall below and above the average performance.
• The standard deviation is a measure of how spread out numbers are: it indicates how much a group of scores varies from the average. It can tell you how much a group of grades varied on any given test, and it may tell you whether the test was too easy or too difficult.
• A small standard deviation means that the values in a statistical data set are close to the mean of the data set, while a large SD means the values are farther away from the mean.
How to Interpret the Quartile Deviation
• The result will help you determine whether the group is homogeneous or not.
• The result will also help you determine the number of students who fall below and above the average performance.
PERCENTILE
– tells the percentage of examinees that lies below one's score.
• Percentile scores – the percentage of scores in a frequency distribution that are equal to or lower than a given score.
• Scores are arranged in rank order from lowest to highest and divided into 100 equally sized groups or bands. The lowest score is "in the 1st percentile" (there is no 0th percentile); the highest score is "in the 99th percentile".
• If your score is in the 60th percentile, it means that you scored better than 60 percent of all the test takers.
Z-SCORES
– tell how many standard deviations a given raw score lies from the mean: z = (x - mean) / SD.
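A sketch combining the two ideas above (not part of the original module; the class scores are hypothetical): the z-score standardizes a raw score against the group mean and standard deviation, and the percentile rank counts the share of scores at or below a given score.

```python
from statistics import mean, pstdev

scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]  # hypothetical class scores

def z_score(x, data):
    """Number of standard deviations x lies from the mean of data."""
    return (x - mean(data)) / pstdev(data)

def percentile_rank(x, data):
    """Percentage of scores in data that are equal to or lower than x."""
    return 100 * sum(s <= x for s in data) / len(data)

print(round(z_score(85, scores), 2))   # 0.85: about 0.85 SDs above the mean
print(percentile_rank(85, scores))     # 80.0: as high as or higher than 80% of scores
```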
GRADES

a. could represent:
- how a student is performing in relation to other students (norm-referenced grading)
- the extent to which a student has mastered a particular body of knowledge (criterion-referenced grading)
- how a student is performing in relation to a teacher's judgment of his or her potential
b. could be for:
- certification that gives assurance that a student has mastered specific content or achieved a certain level of accomplishment
- selection that provides a basis for identifying or grouping students for certain educational paths or programs
- direction that provides information for diagnosis and planning
- motivation that emphasizes the specific material or skills to be learned and helps students understand and improve their performance
c. could be assigned by using:

• Criterion-Referenced Grading – grading based on fixed or absolute standards, where the grade is assigned based on how the student has met the criteria or the well-defined objectives of a course that were spelled out in advance. It is then up to the student to earn the grade he or she wants to receive, regardless of how other students in the class have performed. This is done by transmuting test scores into marks or ratings.
• Norm-Referenced Grading – grading based on relative standards, where a student's grade reflects his or her level of achievement relative to the performance of other students in the class. In this system, the grade is assigned based on the average of test scores.
• Point or Percentage Grading System – the teacher identifies points or percentages for various tests and class activities, and students earn them depending on their performance. The total of these points is the basis for the grade assigned to the student.
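A minimal sketch of a point or percentage grading system (not from the original module; the components, weights and scores are hypothetical): points earned in each component are converted to percentages, weighted, and combined into a final grade.

```python
# Hypothetical grading components: (points earned, points possible, weight)
components = {
    "quizzes":    (45, 50, 0.30),
    "unit tests": (80, 100, 0.40),
    "project":    (27, 30, 0.30),
}

def final_grade(parts):
    """Weighted sum of the percentage earned in each component."""
    return sum(weight * (earned / possible) * 100
               for earned, possible, weight in parts.values())

print(round(final_grade(components), 1))  # 0.3*90 + 0.4*80 + 0.3*90 = 86.0
```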
• Contract Grading System – each student agrees to work for a particular grade according to agreed-upon standards.
Conducting Parent-Teacher Conferences

The following points provide helpful reminders when preparing for and conducting parent-teacher conferences.
1. Make plans for the conference. Set the goals and objectives of the conference ahead of time.
2. Begin the conference in a positive manner. Starting the conference by making a positive statement about the student sets the tone for the meeting.
3. Present the student's strong points before describing the areas needing improvement. It is helpful to present examples of the student's work when discussing the student's performance.
4. Encourage parents to participate and share information. Although as a teacher you are in charge of the conference, you must be willing to listen to parents and share information rather than "talk at" them.
5. Plan a course of action cooperatively. The discussion
should lead to what steps can be taken by the teacher and
the parent to help the student.
6. End the conference with a positive comment. At the end of the conference, thank the parents for coming and say something positive about the student, like "Lucas has a good sense of humor and I enjoy having him in class."
7. Use good human relations skills during the conference. Some of these skills can be summarized by following the do's and don'ts.
Thank You!

Assessment.pptx module 1: professional education

  • 1.
  • 2.
    •an instrument designedto measure any characteristic, quality, ability, knowledge or skill. It comprised of items in the area it is designed to measure. Test
  • 3.
    •a process ofquantifying the degree to which someone/something possess a given trait Measurement
  • 4.
    •a process ofgathering and organizing quantitative or qualitative data into an interpretable form to have a basis for judgement or decision making. It is a prerequisite to evaluation. It provides the information which enables evaluation to take place. Assessment
  • 5.
    •a process ofsystematic interpretation, analysis, appraisal or judgement of the worth of organized data as basis for decision making. It involves judgement about the desirability of changes in students. Evaluation
  • 6.
    •It refers tothe use of pen and paper objective test. Traditional Assessment
  • 7.
    •it refers tothe use of methods other than pen and paper objective test which includes performance tests, projects, portfolios, journals and the likes. Alternative Assessment
  • 8.
    •it refers tothe use of an assessment method that simulate true to life situations. This could be objective tests that reflect real-life situations or alternative methods that are parallel to what we experience in real life. Authentic Assessment
  • 9.
    1. Assessment FORLearning – this includes three types of assessment done before and during instruction. These are placement, formative and diagnostic. Purposes of Classroom Assessment
  • 10.
    • Its purposeis to assess the needs of the learners to have basis in planning for a relevant instruction. • Teachers use this assessment to know what their students are bringing into the learning situation and use this as a starting point for instruction. • The results of this assessment place students in specific learning groups to facilitate teaching and learning. a. Placement – done prior to instruction
  • 11.
    • This assessmentis where teachers continuously monitor the students’ level of attainment of the learning objectives. • The results of this assessment are communicated clearly and promptly to the students for them to know their strengths and weaknesses and the progress of their learning. b. Formative – done during instruction
  • 12.
    • This isused to determine students’ recurring or persistent difficulties. • It searches for the underlying causes of student’s learning problems that do not respond to first aid treatment. It helps formulate a plan for detailed remedial instruction. c. Diagnostic – done before or during instruction
  • 13.
    • this isdone after instruction. This is usually referred to as the summative assessment. • It is used to certify what students know and can do and the level of their proficiency or competency. • The information from assessment of learning is usually expressed as marks or grades. • The results of which are communicated to the students, parents and other stakeholders for decision making. 2. Assessment OF Learning
  • 14.
    • this isdone for teachers to understand and perform well their role of assessing FOR and OF learning. It requires teachers to undergo training on how to assess learning and be equipped with the following competencies needed in performing their work as assessors. 3. Assessment AS Learning
  • 15.
    MODE DESCRIPTION EXAMPLESADVANTAGE DISADVANTAGE Traditional The objective paper and pen test which usually assesses s low-level thinking skills. Standardized tests Teacher-made tests -Scoring is objective. -Administration is easy because students can take the test at the same time. -Preparation of instrument is time- consuming. -Prone to cheating. Performance Requires actual demonstration of skills of creation of products of learning -Practical test -oral test -projects -preparation of the instrument is relatively easy -measures behaviors that cannot be deceived -scoring tends to be subjective without rubrics -administration is time consuming Portfolio -a process of gathering multiple indicators of student progress to support course goals in dynamic, ongoing and collaborative process. -working portfolios -show portfolios -documentary portfolios -measures student’s growth and development -intelligence-fair -development is time consuming -ratings tends to be subjective without rubrics MODES OF ASSESSMENT
  • 16.
    1. Placement Evaluation -donebefore instruction -determines mastery of prerequisite skills -not graded FOUR TYPES OF EVALUATION PROCEDURES
  • 17.
    2. Summative Evaluation -doneafter instruction -certifies mastery of the intended learning outcomes -graded -examples include quarterly exams, unit or chapter tests, final exams
  • 18.
    • determine theextent of what the pupils have achieved or mastered in the objectives of the intended instruction • determine the students in specific learning groups to facilitate teaching and learning • serve as a pretest for the next unit • serve as basis in planning for a relevant instruction Both Placement and Summative Evaluations:
  • 19.
    •reinforces successful learning •providescontinues feedback to both students and teachers concerning learning success and failures •not graded •examples: short quizzes, recitations 3. Formative Evaluation
  • 20.
    •determine persistent deficiencies •helpsformulate a plan for remedial instruction 4. Diagnostic Evaluation
  • 21.
    1. administered duringinstruction 2. designed to formulate a plan for remedial instruction 3. modify the teaching and learning process 4. not graded Formative and Diagnostic Evaluation both:
  • 22.
    Principle 1: Clarityof Learning Targets • Clear and appropriate learning targets include what students know and can do and the criteria for judging student performance. PRINCIPLES OF HIGH QUALITY ASSESSMENT
  • 23.
    •the method ofassessment to be used should match the learning targets Principle 2: Appropriateness of Assessment Methods
  • 24.
    • A balancedassessment sets target in all domains of learning or domains of intelligence. • A balanced assessment makes use of both traditional and alternative assessments. Principle 3: Balance
  • 25.
    •is the degreeto which the assessment instrument measures what it intends to measure. It is also refers to the usefulness of the instrument for a given purpose. It is the most important criterion of a good assessment instrument. Principle 4: Validity
  • 26.
    1. Face validity– is done by examining the physical appearance of the instrument to make it readable and understandable. 2. Content validity – is done through a careful and critical examination of the objectives of assessment to reflect the curricular objectives. Ways in Establishing Validity
  • 27.
    –is established statisticallysuch that a set of scores obtained in another external predictor or measure. It has two purposes: concurrent and predictive. a. Concurrent validity – describes the present status of the individual by correlating the sets of scores obtained from two measures given at a close interval. b. Predictive validity – describes the future performance of an individual by correlating the sets of scores obtained from two measures given at a longer time interval. 3. Criterion-related validity
  • 28.
    – is establishedstatistically by comparing psychological traits of factors that theoretically influence scores in a test. a. Convergent validity – is established if the instrument defines another similar trait other than what it is intended to measure. Ex. Critical thinking may be correlated with creative thinking test b. Divergent Validity – is established if an instrument can describe only the intended trait and not the other traits. Ex. Critical thinking test may not be correlated with reading comprehension test. 4. Construct validity
  • 29.
    •this refers tothe degree of consistency when several items in a test measure the same thing and the stability when the same measures are given across time •Split-Half method, test-retest method, parallel or equivalent form Principle 5: Reliability
  • 30.
    •fair assessment isunbiased and provides students with opportunities to demonstrate what they have learned. Principle 6: Fairness
  • 31.
    •When assessing learning,the information obtained should be worth the resources and time required to obtain it. The easier the procedure, the more reliable the assessment is. Principle 7: Practicality and Efficiency
  • 32.
    •Assessment takes placein all phases of instruction. It could be done before, during and after instruction. Principle 8: Continuity
  • 33.
    •Assessment targets andstandards should be communicated. Assessment results should be communicated to important users. Assessment results should be communicated to students through direct interaction or regular ongoing of feedback on their progress. Principle 9: Communication
  • 34.
    •Assessment should havea positive consequence to students; that is; it should motivate them to learn. •Assessment should have a positive consequence to teachers; that is, it should help them improve the effectiveness of their instruction. Principle 10: Positive Consequences
  • 35.
    • Teachers shouldfree the students from harmful consequences of misuse or overuse of various assessment procedures such as embarrassing students and violating students right to confidentiality. • Teachers should be guided by laws and policies that affect their classroom assessment. • Administrators and teachers should understand that it is inappropriate to use standardized student achievement to measure teaching effectiveness. Principle 11: Ethics
  • 36.
    • is aprocess of gathering information about student’s learning through actual demonstration of essential and observable skills and creation of products that are grounded in real world contexts and constraints. It is an assessment that is open to many possible answers and judged using multiple criteria or standards of excellence that are pre- specified and public. Performance-based Assessment
  • 37.
    1. Demonstration type– this is a task that requires no product. Examples: cooking demonstration, presentations 2. Creation-type – this is a task that requires tangible products •Example: project plan, research paper, project flyers Types of Performance-based Task
  • 38.
    •is also analternative to pen and paper objective test. It is a purposeful, ongoing, dynamic and collaborative process of gathering multiple indicators of the learner’s growth and development. Portfolio assessment is also performance based but more authentic than any performance- based task. Portfolio Assessment
  • 39.
    1. Content principle– suggests that portfolios should reflect the subject matter that is important for the students to learn. 2. Learning principle – suggests that portfolios should enable the students to become active and thoughtful learners. 3. Equity principle -explains that portfolios should allow students to demonstrate their learning styles and multiple intelligences. Principles Underlying Portfolio Assessment
  • 40.
    1. The workingportfolio is a collection of a student’s day to day works which reflect his/her learning. 2. The show portfolio is a collection of a student’s best works. 3. The documentary portfolio is a combination of a working and a show portfolio. Types of Portfolios
  • 41.
    Levels of Learning Outcomes DescriptionSome Question Clues Knowledge Involves remembering or recalling previously learned material or a wide range of materials -list, define, identify, name, recall, state, arrange Comprehension Ability to grasp the meaning of material b translating materials from one form to another or by interpreting material -describe, interpret, classify, differentiate, explain, translate Application Ability to use learned material in new and concrete situations -apply, demonstrate, solve, interpret, use, experiment Analysis Ability to break down material into its component parts so that the whole structure is understood -analyze, separate, explain, examine, discriminate, infer Synthesis Ability to put parts together to form a new whole -integrate, plan, generalize, construct, design, propose Evaluation Ability to judge the value of material on the basis of a definite criteria -assess, decide, judge, support, summarize, defend A. COGNITIVE DOMAIN
  • 43.
    Categories Description SomeIllustrative Verbs Receiving Willingness to receive or to attend to a particular phenomenon or stimulus -acknowledge, ask, choose, follow, listen, reply, watch Responding Refers to active participation on the part of the student -answer, assist, contribute, cooperate, follow-up Valuing Ability to see worth or value in a subject activity -adopt, commit, desire, display, explain, initiate Organization Bringing together a complex of values, resolving conflicts between them, and beginning to build an internally consistent value system -adapt, categories, establish, generalize, integrate, organize B. AFFECTIVE DOMAIN
  • 44.
    Categories Description SomeIllustrative Verbs Imitation Early stages in learning a complex skill after an indication or readiness to take a particular type of action -carry out, assemble, practice, follow, repeat, sketch, move Manipulation A particular skill or sequence; is practiced continuously until it becomes habitual and done with some confidence and proficiency -acquire, complete, conduct, improve, perform, produce Precision A skill has been attained with proficiency and efficiency -achieve, accomplish, excel. master, succeed Articulation An individual can modify movement patters to meet a particular situation -adapt, change, excel, reorganize, rearrange Naturalization An individual responds automatically and creates new motor ways of manipulation out of understanding, abilities and skills developed  arrange, combine, compose, construct, create, design C. PSYCHOMOTOR DOMAIN
  • 45.
    MAIN POINT OF COMPARISON TYPESOF TESTS Purpose Psychological Educational -aims to measure students’ intelligence or mental ability in a large degree without reference to what the students has learned (e.g. aptitude test, personality tests, intelligence tests) -aims to measure the result of instructions and learning (e.g. achievement tests, performance test) DIFFERENT TYPES OF TEST
  • 46.
    Scope of ContentSurvey Mastery -covers a broad range of objectives -covers a specific objective -measures general achievement in certain subjects -measures fundamental skills and abilities -constructed by trained professional -typically constructed by the teacher
  • 47.
    Language Mode VerbalNon-verbal -words are used by students in attaching meaning to or responding to test items -students do not use words in attaching meaning to or in responding to test items
  • 48.
    Construction Standardized Informal -constructedby a professional item writer -constructed by a classroom teacher -covers a broad range of content covered in a subject area -covers a narrow range of content Uses mainly multiple choice -various types of items are used -items written are screened and the best items were chosen for the final instrument -teacher picks or writes items as needed for the test -can be scored by a machine -scored manually by the teacher -interpretation of results is usually norm-referenced -interpretation is usually criterion-referenced
  • 49.
    Manner of Administration Individual Group -mostlygiven orally or requires actual demonstration of skill -this is a paper and pen test -one on one situations, thus, many opportunities for clinical observation -loss of rapport, insights and knowledge about each examinee -chance to follow up examinee’s response in order to clarify or comprehend more clearly -same amount of time needed to gather information from one student
  • 50.
Main point of comparison: Effect of Biases
• Objective – the scorer's personal judgment does not affect the scoring; worded so that only one answer is acceptable; little or no disagreement on what is the correct answer.
• Subjective – affected by the scorer's personal opinions, biases and judgments; several answers are possible; disagreement on what is the correct answer is possible.
Main point of comparison: Time Limit and Level of Difficulty
• Power – consists of a series of items arranged in ascending order of difficulty; measures the student's ability to answer more and more difficult items.
• Speed – consists of items approximately equal in difficulty; measures the student's speed or rate and accuracy in responding.
Main point of comparison: Format
• Selective – there are choices for the answer (multiple choice, true-false, matching type); can be answered quickly; prone to guessing; time-consuming to construct.
• Supply – there are no choices for the answer (short answer, completion, restricted or extended essay); may require a longer time to answer; less chance of guessing but prone to bluffing; time-consuming to answer and score.
Main point of comparison: Nature of Assessment
• Maximum Performance – determines what individuals can do when performing at their best.
• Typical Performance – determines what individuals will do under natural conditions.
Main point of comparison: Interpretation
• Norm-referenced – the result is interpreted by comparing one student's performance with other students' performance; some will really pass; there is competition for a limited percentage of high scores; typically covers a large domain of learning tasks; emphasizes discrimination among individuals in terms of level of learning; favors items of average difficulty and typically omits very easy and very difficult items; interpretation requires a clearly defined group.
• Criterion-referenced – the result is interpreted by comparing the student's performance against a predefined standard (mastery); all or none may pass; there is no competition for a limited percentage of high scores; typically focuses on a delimited domain of learning tasks; emphasizes description of what learning tasks individuals can and cannot perform; matches item difficulty to the learning tasks without altering item difficulty or omitting easy items; interpretation requires a clearly defined and delimited achievement domain.
FOUR COMMONLY-USED REFERENCES FOR CLASSROOM INTERPRETATION

Reference, the interpretation it provides, and the condition that must be present:
• Ability-referenced – How are students performing relative to what they are capable of doing? Requires good measures of the students' maximum possible performance.
• Growth-referenced – How much have students changed or improved relative to what they were doing? Requires pre- and post-measures of performance that are highly reliable.
• Norm-referenced – How well are students doing with respect to what is typical or reasonable? Requires a clear understanding of whom the students are being compared to.
• Criterion-referenced – What can students do and not do? Requires a well-defined content domain to be assessed.
TYPES OF TEST ACCORDING TO FORMAT

1. Selective Type – provides choices for the answer.
a. Multiple Choice – consists of a stem, which describes the problem, and three or more alternatives, which give the suggested solutions. The incorrect alternatives are the distractors.
b. True-False or Alternative Response – consists of a declarative statement that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.
c. Matching Type – consists of two parallel columns: Column A, the column of premises from which a match is sought; Column B, the column of responses from which the selection is made.
Advantages and limitations by type:
• Multiple Choice – Advantages: more adequate sampling of content; tends to structure the problem to be addressed more effectively; can be quickly and objectively scored. Limitations: prone to guessing; often measures targeted behaviors only indirectly; time-consuming to construct.
• Alternative Response – Advantages: more adequate sampling of content; easy to construct; can be effectively and objectively scored. Limitations: prone to guessing; can be used only when dichotomous answers represent sufficient response options; usually must measure performance related to procedural knowledge indirectly.
• Matching Type – Advantages: allows comparison of related ideas, concepts or theories; effectively assesses associations between a variety of items within a topic; encourages integration of information; can be quickly and objectively scored; can be easily administered. Limitations: difficult to produce a sufficient number of plausible premises; not effective in testing isolated facts; may be limited to lower levels of understanding; useful only when there is a sufficient number of related items; may be influenced by guessing.
2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word, phrase, number or symbol.
b. Completion Test – consists of an incomplete statement.
Advantages: easy to construct; requires the student to supply the answer; many items can be included in one test.
Limitations: generally limited to measuring recall of information; more likely to be scored erroneously due to the variety of possible responses.
3. Essay Test
a. Restricted Response – limits the content of the response by restricting the scope of the topic.
b. Extended Response – allows the students to select any factual information that they think is pertinent and to organize their answers in accordance with their best judgment.
Advantages: measures more directly the behaviors specified by performance objectives; examines students' written communication skills; requires the students to supply the response.
Limitations: provides a less adequate sampling of content; less reliable scoring; time-consuming.
GENERAL SUGGESTIONS IN WRITING TESTS

1. Use your TOS (Table of Specifications) as a guide to item writing.
2. Write more test items than needed.
3. Write the test items well in advance of the testing date.
4. Write each item so that the task to be performed is clearly defined.
5. Write each test item at an appropriate reading level.
6. Write each test item so that it does not provide help in answering other items in the test.
7. Write each test item so that the answer is one that would be agreed upon by experts.
8. Write each test item at the proper level of difficulty.
9. Whenever a test is revised, recheck its relevance.
SPECIFIC SUGGESTIONS

A. SHORT ANSWER AND COMPLETION TYPE
1. Word the item(s) so that the required answer is both brief and specific.
2. Do not take statements directly from textbooks to use as a basis for short-answer items.
3. A direct question is generally more desirable than an incomplete statement.
4. If the item is to be expressed in numerical units, indicate the type of answer wanted.
5. Blanks should be equal in length.
6. Answers should be written before the item number for easy checking.
7. When completion items are used, do not have too many blanks. Blanks should be at the center of the sentence, not at the beginning.
B. ESSAY TYPE
1. Restrict the use of essay questions to those learning outcomes that cannot be satisfactorily measured by objective items.
2. Formulate questions that will call forth the behavior specified in the learning outcome.
3. Phrase each question so that the pupil's task is clearly indicated.
4. Indicate an approximate time limit for each question.
5. Avoid the use of optional questions.
C. SELECTIVE TYPE

Alternative Response
1. Avoid broad statements.
2. Avoid trivial statements.
3. Avoid the use of negative statements, especially double negatives.
4. Avoid long and complex sentences.
Matching Type
1. Use only homogeneous material in a single matching exercise.
2. Include an unequal number of responses and premises, and instruct the pupils that a response may be used once, more than once, or not at all.
3. Keep the list of items to be matched brief, and place the shorter responses at the right.
4. Arrange the list of responses in logical order.
5. Indicate in the directions the basis for matching the responses and premises.
6. Place all the items for one matching exercise on the same page.
Multiple Choice
1. The stem of the item should be meaningful by itself and should present a definite problem.
2. The stem should include as much of the item as possible and should be free of irrelevant information.
3. Use a negatively stated stem only when a significant learning outcome requires it.
4. Highlight negative words in the stem for emphasis.
5. All the alternatives should be grammatically consistent with the stem of the item.
6. An item should have only one correct or clearly best answer.
PERFORMANCE AND AUTHENTIC ASSESSMENTS

When to use:
• Specific behaviors or behavioral outcomes are to be observed.
• There is a possibility of judging the appropriateness of students' actions.
• A process or outcome cannot be directly measured by paper-and-pencil tests.
Advantages:
• Allow evaluation of complex skills which are difficult to assess with written tests.
• Positive effect on instruction and learning.
• Can be used to evaluate both the process and the product.
Limitations:
• Time-consuming to administer, develop and score.
• Subjectivity in scoring.
• Inconsistencies in performance across alternative skills.
PORTFOLIO ASSESSMENT

Characteristics:
1. Adaptable to individualized instructional goals.
2. Focuses on the assessment of products.
3. Identifies students' strengths rather than weaknesses.
4. Actively involves students in the evaluation process.
5. Communicates student achievement to others.
6. Time-consuming.
7. Needs a scoring plan to increase reliability.
Types of portfolios:
• Showcase – a collection of students' best work.
• Reflective – used for helping teachers, students and family members think about various dimensions of student learning (e.g., effort, achievement).
• Cumulative – a collection of items done over an extended period of time, analyzed to verify changes in the products and processes associated with student learning.
• Goal-based – a collection of works chosen by students and teachers to match pre-established objectives.
• Process – a way of documenting the steps and processes a student has gone through to complete a piece of work.
RUBRICS

• A scoring guide consisting of specific pre-established performance criteria, used in evaluating student work in performance assessments.
Two Types
1. Holistic Rubric – requires the teacher to score the overall process or product as a whole, without judging the component parts separately.
2. Analytic Rubric – requires the teacher to score individual components of the product or performance first, then sum the individual scores to obtain a total score (see the sketch below).
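To make the analytic approach concrete, here is a minimal Python sketch that sums per-criterion ratings into a total score. The criteria and maximum points are hypothetical examples, not values prescribed by these notes.

# Hypothetical analytic rubric: criterion -> maximum points.
RUBRIC = {
    "content": 10,
    "organization": 5,
    "mechanics": 5,
}

def analytic_score(ratings):
    """Sum per-criterion ratings, checking each against the rubric maximum."""
    for criterion, points in ratings.items():
        if points > RUBRIC[criterion]:
            raise ValueError(f"{criterion}: rating exceeds the rubric maximum")
    return sum(ratings.values())

# One student's component scores, summed to a total of 15 out of 20.
print(analytic_score({"content": 8, "organization": 4, "mechanics": 3}))

A holistic rubric, by contrast, would replace the per-criterion loop with a single overall rating of the whole performance.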
AFFECTIVE ASSESSMENTS

1. Closed-Item or Forced-Choice Instruments – ask for one specific answer.
a. Checklist – measures students' preferences, hobbies, attitudes, feelings, beliefs, interests, etc. by having them mark a set of possible responses.
b. Scales – instruments that indicate the extent or degree of one's response.
1. Rating Scale – measures the degree or extent of one's attitudes, feelings and perceptions about ideas, objects and people by marking a point along a 3- or 5-point scale.
2. Semantic Differential Scale – measures the degree of one's attitudes, feelings and perceptions about ideas and people by marking a point along a 3-, 7- or 11-point scale of semantic adjectives.
3. Likert Scale – measures the degree of one's agreement or disagreement with positive or negative statements about objects and people (a scoring sketch follows).
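A common scoring convention for Likert scales, sketched below in Python, is to reverse-score negatively worded statements so that a high total always indicates a favorable attitude. The item keys and the 5-point range are illustrative assumptions, not taken from these notes.

# Hypothetical item keys: q1 and q3 are positively worded, q2 negatively.
POSITIVE = {"q1", "q3"}

def likert_total(responses, points=5):
    """Sum ratings (1..points), reverse-scoring negatively worded items."""
    total = 0
    for item, rating in responses.items():
        total += rating if item in POSITIVE else (points + 1 - rating)
    return total

# Strong agreement with the positive items and strong disagreement
# with the negative one yields the maximum score of 15.
print(likert_total({"q1": 5, "q2": 1, "q3": 5}))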
c. Alternate Response – measures students' preferences, hobbies, attitudes, feelings, beliefs and interests by having them choose between two possible responses.
d. Ranking – measures students' preferences or priorities by having them rank a set of responses.
2. Open-ended Instruments – open to more than one answer.
a. Sentence Completion – measures students' preferences over a variety of attitudes; students answer by completing an unfinished statement, which may vary in length.
b. Surveys – measure the values held by an individual, who writes one or many responses to a given question.
c. Essays – allow the students to reveal and clarify their preferences, hobbies, attitudes, feelings, beliefs and interests by writing their reactions or opinions to a given question.
CRITERIA TO CONSIDER IN CONSTRUCTING GOOD TESTS

• VALIDITY – the degree to which a test measures what it is intended to measure. It is the usefulness of the test for a given purpose, and the most important criterion of a good examination.
FACTORS influencing the validity of tests in general:
a. Appropriateness of the test – it should measure the abilities, skills and information it is supposed to measure.
b. Directions – it should indicate how the learners should answer and record their answers.
c. Reading vocabulary and sentence structure – it should be suited to the intellectual level of maturity and background experience of the learners.
d. Difficulty of items – its items should be neither too difficult nor too easy, so that it can discriminate the bright from the slow pupils.
e. Construction of items – it should not provide clues (so it does not become a test of finding clues) nor be ambiguous (so it does not become a test of interpretation).
f. Length of the test – it should be of sufficient length to measure what it is supposed to measure; if it is too short, it cannot adequately measure the performance we want to measure.
g. Arrangement of items – its items should be arranged in ascending level of difficulty, starting with the easy ones, so that pupils will persevere in taking the test.
h. Pattern of answers – it should not allow the creation of patterns in answering the test.
WAYS OF ESTABLISHING VALIDITY

1. Face Validity – done by examining the physical appearance of the test.
2. Content Validity – done through a careful and critical examination of the objectives of the test, so that it reflects the curricular objectives.
3. Criterion-related Validity – established statistically: the set of scores revealed by the test is correlated with the scores obtained on a criterion measure.
• Concurrent Validity – describes the present status of the individual by correlating the sets of scores obtained from two measures given concurrently.
• Predictive Validity – describes the future performance of an individual by correlating the sets of scores obtained from two measures given at a longer time interval.
• Construct Validity – established statistically by comparing psychological traits or factors that influence scores in a test, e.g., verbal, numerical, spatial, etc.
• Convergent Validity – established if the instrument correlates with a measure of a similar trait other than the one it is intended to measure (e.g., a Critical Thinking Test may be correlated with a Creative Thinking Test).
• Divergent Validity – established if the instrument describes only the intended trait and not other traits (e.g., a Critical Thinking Test may not be correlated with a Reading Comprehension Test).
RELIABILITY

• Refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it. It may be estimated through the test-retest method, the parallel or equivalent forms method, or the split-half method.
Factors affecting Reliability
1. Length of the test – as a general rule, the longer the test, the higher the reliability. A longer test provides a more adequate sample of the behavior being measured and is less distorted by chance factors such as guessing (see the sketch after this list).
2. Difficulty of the test – ideally, achievement tests should be constructed so that the average score is 50 percent correct and the scores range from zero to near perfect. The bigger the spread of scores, the more reliable the measured differences are likely to be. A test is reliable if the coefficient of correlation is not less than 0.85.
3. Objectivity – can be obtained by eliminating the bias, opinions and judgments of the person who checks the test.
4. Administrability – the test should be administered with clarity and uniformity so that the scores obtained are comparable. Uniformity can be obtained by setting a time limit and standardizing the oral instructions.
5. Scorability – the test should be easy to score: directions for scoring are clear, the scoring key is simple, and provisions for answer sheets are made.
6. Economy – the test should be given in the cheapest way, which means that answer sheets should be provided so the test can be given from time to time.
7. Adequacy – the test should contain a wide sampling of items to determine the educational outcomes or abilities, so that the resulting scores are representative of the total performance in the areas measured.
ITEM ANALYSIS

Steps:
1. Score the test. Arrange the scores from highest to lowest.
2. Get the top 27% (upper group) and the bottom 27% (lower group) of the examinees.
3. Count the number of examinees in the upper group and in the lower group who got each item correct.
4. Compute the difficulty index of each item.
5. Compute the discrimination index of each item.
Df = (CU + CL) / (NU + NL)
where: Df = difficulty index; CU and CL = number of examinees in the upper and lower groups who answered the item correctly; NU and NL = number of examinees in the upper and lower groups.

Discrimination Index = DU - DL
where: DU = difficulty index of the upper group (CU/NU); DL = difficulty index of the lower group (CL/NL).

A worked sketch follows the two interpretation tables below.
INTERPRETATION

Difficulty Index (Df)   Description                Action Taken
0.00 - 0.25             Very difficult             Revise/Discard
0.26 - 0.75             Average/right difficulty   Retain
0.76 - 1.00             Very easy                  Revise/Discard
INTERPRETATION

Discrimination Index   Description                      Action Taken
0.46 to 1.00           Positive discriminating power    Retain
-0.50 to 0.45          Could not discriminate           Revise
-1.00 to -0.51         Negative discriminating power    Discard
Item 1 Distractor Analysis Sample

Option        A    B    C    D
Upper group   3   10    2    1
Lower group   5    6    5    0
Total        12   25   13   10

Reading the sample: option B, chosen by more upper-group than lower-group examinees, behaves like the key; options A and C attract more lower-group examinees and are working as distractors; option D draws almost no examinees from either extreme group and may be worth reviewing.
SCORING ERRORS AND BIASES

• Leniency error – faculty tend to judge work as better than it really is.
• Generosity error – faculty tend to use only the high end of the scale.
• Severity error – faculty tend to use only the low end of the scale.
• Central tendency error – faculty avoid both extremes of the scale.
• Bias – letting other factors influence the score (e.g., handwriting, typing).
• Halo effect – letting a general impression (e.g., the student's prior work) influence the judgment of the current performance.
• Contamination effect – judgment is influenced by irrelevant knowledge about the student or other factors that have no bearing on performance level (e.g., student appearance).
• Similar-to-me effect – judging more favorably those students whom faculty see as similar to themselves (e.g., expressing similar interests or points of view).
• First-impression effect – judgment is based on early opinions rather than on a complete picture (e.g., the opening paragraph).
• Contrast effect – judging by comparing the student against other students instead of against established criteria and standards.
• Rater drift – unintentionally redefining criteria and standards over time or across a series of scorings (e.g., getting tired and cranky and therefore more severe, or getting tired and reading more quickly and leniently to get the job done).
FOUR TYPES OF MEASUREMENT SCALES

• Nominal – groups and labels data. Example: gender (1 = male, 2 = female).
• Ordinal – distances between points are indefinite. Example: income (1 = low, 2 = average, 3 = high).
• Interval – distances between points are equal; no absolute zero. Examples: test scores, temperature.
• Ratio – has an absolute zero. Examples: height, weight.
SHAPES OF FREQUENCY POLYGONS

1. Normal/Bell-shaped/Symmetrical
2. Positively Skewed – most scores are below the mean and there are extremely high scores.
3. Negatively Skewed – most scores are above the mean and there are extremely low scores.
4. Leptokurtic – highly peaked, with tails more elevated above the baseline.
5. Mesokurtic – moderately peaked.
6. Platykurtic – flattened peak.
7. Bimodal Curve – a curve with two peaks or modes.
8. Polymodal Curve – a curve with three or more modes.
9. Rectangular Distribution – there is no mode.
• Skewness – distortion or asymmetry from the symmetrical bell curve (normal distribution) in a set of data.
• Positively skewed (right-skewed) – the mean is greater than the median; most students scored low, suggesting they did not understand the concepts or the test was difficult.
• Negatively skewed (left-skewed) – the mean is less than the median; most students scored high, suggesting they understood the course or the test was easy.
MEASURES OF CENTRAL TENDENCY AND VARIABILITY
Measures of Central Tendency describe the representative value of a set of data; Measures of Variability describe the degree of spread or dispersion of a set of data. The appropriate pair depends on the assumptions about the data (a computational sketch follows this list):
• When the frequency distribution is regular or symmetrical (normal) and the data are numeric (interval or ratio): Mean – the arithmetic average; Standard Deviation – the root mean square of the deviations from the mean.
• When the frequency distribution is irregular or skewed, or the data are ordinal: Median – the middle score in a group of ranked scores; Quartile Deviation – the average deviation of the 1st and 3rd quartiles from the median.
• When the distribution is normal and a quick answer is needed, or the data are nominal: Mode – the most frequent score; Range – the difference between the highest and the lowest score in the distribution.
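The sketch below computes each pair of statistics with Python's standard statistics module. The quartile deviation follows the definition given above, which works out to half the distance between the 1st and 3rd quartiles; the score list is sample data.

import statistics

scores = [10, 12, 12, 15, 18, 20, 22, 25, 30]

# Normal, numeric data: mean and standard deviation.
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population standard deviation

# Skewed or ordinal data: median and quartile deviation.
median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

# Nominal data, or when a quick answer is needed: mode and range.
mode = statistics.mode(scores)
score_range = max(scores) - min(scores)

print(mean, sd, median, quartile_deviation, mode, score_range)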
How to Interpret the Measures of Central Tendency
• The value that represents a set of data is the basis for determining whether the group is performing better or worse than other groups.
How to Interpret the Standard Deviation
• The result will help you determine whether the group is homogeneous or not.
• The result will help you determine the number of students that fall below and above the average performance.
• Standard deviation is a measure of how spread out numbers are: it indicates how much a group of scores varies from the average. It can tell you how much a group of grades varied on any given test, and it may indicate whether the test was too easy or too difficult.
• A small standard deviation means that the values in a data set are close to the mean, while a large SD means the values are farther away from the mean.
How to Interpret the Quartile Deviation
• The result will help you determine whether the group is homogeneous or not.
• The result will also help you determine the number of students that fall below and above the average performance.
PERCENTILE

• Tells the percentage of examinees that lie below one's score.
• Percentile score – the percentage of scores in a frequency distribution that are equal to or lower than it.
• Scores are arranged in rank order from lowest to highest and divided into 100 equally sized groups or bands. The lowest score is "in the 1st percentile" (there is no 0th percentile); the highest score is "in the 99th percentile."
• If your score is in the 60th percentile, it means that you scored better than 60 percent of all the test takers.
Z-SCORES

• Tell the number of standard deviations a given raw score lies above or below the mean: z = (X - mean) / SD. A sketch computing both statistics follows.
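A short Python sketch computing z-scores and percentile ranks for a sample score distribution; the percentile rank here is taken as the percentage of scores equal to or lower than a given score, per the definition above.

import statistics

scores = [45, 50, 55, 60, 60, 65, 70, 75, 80, 90]
mean = statistics.mean(scores)   # 65.0
sd = statistics.pstdev(scores)   # population standard deviation

def z_score(x):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile_rank(x):
    """Percentage of scores equal to or lower than x."""
    return 100 * sum(s <= x for s in scores) / len(scores)

print(round(z_score(80), 2))   # a raw score of 80 expressed in SD units
print(percentile_rank(80))     # 90.0 -> equal to or better than 90% of scores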
GRADES

a. Could represent:
- how a student is performing in relation to other students (norm-referenced grading);
- the extent to which a student has mastered a particular body of knowledge (criterion-referenced grading);
- how a student is performing in relation to a teacher's judgment of his or her potential.
b. Could be for:
- certification, giving assurance that a student has mastered specific content or achieved a certain level of accomplishment;
- selection, providing a basis for identifying or grouping students for certain educational paths or programs;
- direction, providing information for diagnosis and planning;
- motivation, emphasizing specific material or skills to be learned and helping students understand and improve their performance.
c. Could be assigned by using:

• Criterion-Referenced Grading – grading based on fixed or absolute standards, where the grade is assigned according to how well the student has met the criteria or well-defined objectives of a course that were spelled out in advance. It is then up to the student to earn the grade he or she wants to receive, regardless of how other students in the class have performed. This is done by transmuting test scores into marks or ratings (a sketch follows).
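A minimal sketch of one way to transmute raw scores into marks under a criterion-referenced scheme. The linear formula and the base grade of 50 are illustrative assumptions; actual transmutation tables differ from school to school.

def transmute(raw, total, base=50.0):
    """Map a raw score linearly onto a base..100 grade scale (hypothetical formula)."""
    return base + (raw / total) * (100.0 - base)

# A student answering 38 of 50 items correctly receives a mark of 88.0.
print(transmute(38, 50))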
• Norm-Referenced Grading – grading based on relative standards, where a student's grade reflects his or her level of achievement relative to the performance of other students in the class. In this system, the grade is assigned based on the average of test scores (a sketch follows).
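A hedged sketch of norm-referenced grade assignment by relative standing: students are ranked and grades are filled by quota. The quota percentages and names are illustrative assumptions, not values from these notes.

def norm_referenced_grades(scores, quotas=(("A", 0.20), ("B", 0.30),
                                           ("C", 0.30), ("D", 0.20))):
    """Rank students by score and assign grades by fixed shares of the class."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades, i = {}, 0
    for grade, share in quotas:
        take = round(share * len(ranked))
        for name in ranked[i:i + take]:
            grades[name] = grade
        i += take
    for name in ranked[i:]:  # anyone left over by rounding gets the lowest grade
        grades[name] = quotas[-1][0]
    return grades

print(norm_referenced_grades({"Ana": 92, "Ben": 85, "Cara": 78, "Dan": 70, "Eli": 66}))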
• Point or Percentage Grading System – the teacher assigns points or percentages to various tests and class activities, which students earn depending on their performance. The total of these points is the basis for the grade assigned to the student.
• Contract Grading System – each student agrees to work for a particular grade according to agreed-upon standards.
Conducting Parent-Teacher Conferences

The following points provide helpful reminders when preparing for and conducting parent-teacher conferences.
1. Make plans for the conference. Set the goals and objectives of the conference ahead of time.
2. Begin the conference in a positive manner. Starting with a positive statement about the student sets the tone for the meeting.
3. Present the student's strong points before describing the areas needing improvement. It is helpful to present examples of the student's work when discussing the student's performance.
4. Encourage parents to participate and share information. Although as a teacher you are in charge of the conference, you must be willing to listen to parents and share information rather than "talk at" them.
5. Plan a course of action cooperatively. The discussion should lead to what steps can be taken by the teacher and the parents to help the student.
6. End the conference with a positive comment. At the end of the conference, thank the parents for coming and say something positive about the student, like "Lucas has a good sense of humor and I enjoy having him in class."
7. Use good human-relations skills during the conference. Some of these skills can be summarized by following the do's and don'ts.
