2. VALIDITY:
Validity is a term derived from the Latin word validus, meaning strong.
In contrast to what some teachers believe, it is not a property of a test. It pertains to the accuracy of the inferences teachers make about students based on the information gathered from an assessment (McMillan, 2007; Fives & DiDonato-Barnes, 2013).
3. This implies that the conclusions teachers come up with in their evaluation of student performance are valid if there is strong and sound evidence of the extent of students' learning.
4. An assessment is valid if it measures a student's actual knowledge and performance with respect to the intended outcomes, and not something else.
It is representative of the area of learning or content of the curricular aim being assessed (McMillan, 2007; Popham, 2011).
5. For instance, an assessment purportedly measuring the arithmetic skills of grade 4 pupils is invalid if used with grade 1 pupils because of issues with the content (test content evidence) and the level of performance (response process evidence).
6. A test that measures recall of mathematical formulas is invalid if it is supposed to assess problem solving.
7. There are three sources of information
that can be used to establish validity:
Content-Related Evidence
Criterion-Related Evidence
Construct-Related Evidence
8. A. Content-Related Evidence
9. For example, if a grade 4 pupil was able to correctly answer 80% of the items in a Science test about matter, the teacher may infer that the pupil knows 80% of the content area.
10. Face validity
A test that appears to adequately
measure the learning outcomes
and content is said to possess
face validity.
11. As the name suggests, it looks only at the surface, or face value, of the instrument.
It is based on the subjective opinion of the one reviewing it.
Hence, it is considered non-systematic or non-scientific.
12. A test that was prepared to assess the ability of pupils to construct simple sentences with correct subject-verb agreement has face validity if it looks like an adequate measure of that skill.
13. Instructional Validity
The extent to which an assessment is systematically sensitive to the nature of the instruction offered.
Popham (2006, p. 1) defined it as the "degree to which students' performances on a test accurately reflect the quality of instruction to promote students' mastery of what is being assessed."
14. Yoon & Resnick (1998) asserted that an instructionally valid test is one that registers differences in the amount and kind of instruction to which students have been exposed.
They also described the degree of overlap between the content tested and the content taught as the opportunity to learn, which has an impact on test scores.
15. For example, suppose that in the first grading period a class will cover three economic issues:
Unemployment
Globalization
Sustainable development
16. Only two were discussed in class, but the assessment covered all three issues. Although these were all identified in the curriculum guide and may even be found in a textbook, the question remains as to whether the topics were all taught or not.
Including items that were not taken up in class reduces validity because students had no opportunity to learn the knowledge or skill being assessed.
17. Table of Specifications (ToS)
It is prepared before developing the test. It is a test blueprint that identifies the content areas and describes the learning outcomes at each level of the cognitive domain (Notar et al., 2004).
It is a tool used in conjunction with lesson and unit planning to help teachers make genuine connections between planning, instruction, and assessment (Fives & DiDonato-Barnes, 2013).
It assures teachers that they are testing students' learning across a wide range of content and readings, as well as cognitive processes requiring higher-order thinking.
18. Carey (as cited in Notar et al., 2004) specified six elements in ToS development (see the sketch after this list):
1. balance among the goals selected for the examination
2. balance among the levels of learning
3. the test format
4. the total items
5. the number of items for each goal and level of learning
6. the enabling skills to be selected from each goal framework
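A minimal sketch of these elements in Python, using the three economic issues from the earlier example as hypothetical goals, with made-up item counts and formats; the comments mark which of Carey's elements each part addresses.

```python
# A minimal Table of Specifications sketch with hypothetical goals,
# item counts, and formats; comments mark Carey's six elements.

# Each row: (goal/topic, {cognitive level: number of items}, test format)
tos = [
    ("Unemployment",            {"Remembering": 3, "Understanding": 4, "Applying": 3}, "multiple choice"),
    ("Globalization",           {"Remembering": 2, "Understanding": 3, "Applying": 2}, "multiple choice"),
    ("Sustainable development", {"Remembering": 2, "Understanding": 2, "Applying": 1}, "short answer"),
]

# Element 1: balance among goals -- items allotted to each goal
items_per_goal = {goal: sum(levels.values()) for goal, levels, _ in tos}

# Element 2: balance among levels of learning -- items at each cognitive level
items_per_level: dict[str, int] = {}
for _, levels, _ in tos:
    for level, n in levels.items():
        items_per_level[level] = items_per_level.get(level, 0) + n

# Element 3: the test format chosen for each goal
formats = {goal: fmt for goal, _, fmt in tos}

# Element 4: the total number of items
total_items = sum(items_per_goal.values())

# Element 5: the number of items for each goal and level of learning
print(items_per_goal)   # {'Unemployment': 10, 'Globalization': 7, 'Sustainable development': 5}
print(items_per_level)  # {'Remembering': 7, 'Understanding': 9, 'Applying': 6}
print(formats, total_items)

# Element 6 (enabling skills per goal) would extend each row with a list
# of the prerequisite skills sampled from that goal's framework.
```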
19. Meanwhile, determining the number of items for each topic in the ToS depends on instructional time. This means that topics that consumed longer instructional time, with more teaching-learning activities, should be given more emphasis.
Test items that demand higher-order thinking skills obviously require more time to answer, whereas simple recall items entail the least amount of time.
Nitko & Brookhart (2011) give the average response time for each type of assessment task, as seen in the table below; a worked example follows the table.
20. Table 4.1: Time Requirements for Certain Assessment Tasks

Type of Test Question: Time Required to Answer
Modified response (true-false): 20-30 seconds
Modified true or false: 30-45 seconds
Sentence completion (one-word fill-in): 40-60 seconds
Multiple choice with four responses (lower level): 40-60 seconds
Multiple choice (higher level): 70-90 seconds
Matching type (5 stems, 6 choices): 2-4 minutes
Short answer: 2-4 minutes
Multiple choice (with calculation): 2-5 minutes
Word problems (simple arithmetic): 5-10 minutes
Short essays: 15-20 minutes
Data analysis/graphing: 15-25 minutes
Drawing models/labeling: 20-30 minutes
Extended essays: 35-50 minutes
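As a rough illustration of how the instructional-time rule and Table 4.1 can be used together, here is a short Python sketch; the instructional hours, intended test length, and planned item counts are all made up, and the per-item times are midpoints of the ranges in the table.

```python
# Two quick planning checks, with made-up numbers:
# (1) allocate items in proportion to instructional time, and
# (2) estimate how long students need to finish the whole test.

hours_taught = {"Unemployment": 6, "Globalization": 6, "Sustainable development": 3}
total_items = 30  # intended test length

# (1) Topics that consumed more instructional time get more items.
total_hours = sum(hours_taught.values())
items_per_topic = {t: round(total_items * h / total_hours) for t, h in hours_taught.items()}
print(items_per_topic)  # {'Unemployment': 12, 'Globalization': 12, 'Sustainable development': 6}

# (2) Midpoints (in seconds) of the ranges in Table 4.1 for the chosen formats.
avg_seconds = {"true-false": 25, "multiple choice (lower level)": 50, "short essay": 1050}
planned_counts = {"true-false": 10, "multiple choice (lower level)": 18, "short essay": 2}

total_minutes = sum(avg_seconds[k] * planned_counts[k] for k in planned_counts) / 60
print(f"Estimated completion time: {total_minutes:.0f} minutes")  # ~54 minutes
```

If the estimate exceeds the class period, the teacher can trim items or swap formats before administering the test, rather than discovering the problem afterward.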
21. Validity suffers if the test is too short to sufficiently measure the behavior and cover the content.
Adding more items to the test may increase its validity. However, an excessively long test may be taxing to students.
Given this trade-off, teachers must construct tests that students can finish within a reasonable time.
22. B. Criterion-Related Evidence
It refers to the degree to which test scores agree with an external criterion. As such, it is related to external validity. It examines the relationship between an assessment and another measure of the same trait (McMillan, 2007).
23. Three types of criteria (Nitko & Brookhart, 2011):
1. Achievement test scores
2. Ratings, grades, and other numerical judgments made by the teacher
3. Career data
24. Types of Criterion-Related Evidence
1. Concurrent validity provides an estimate of a student's current performance in relation to a previously validated or established measure.
2. Predictive validity pertains to the power or usefulness of test scores to predict future performance.
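Both types are commonly quantified as a correlation between the test and its criterion, often called a validity coefficient. Below is a minimal Python sketch with made-up scores; for concurrent validity the criterion would be an established measure taken at about the same time, while for predictive validity it would be a later measure, such as the next term's grades.

```python
# Pearson correlation between a new test and a criterion measure,
# used here as a rough validity coefficient (scores are made up).

from statistics import correlation  # available in Python 3.10+

new_test  = [78, 85, 62, 90, 74, 88, 69, 81]   # scores on the new assessment
criterion = [75, 88, 60, 93, 70, 85, 72, 80]   # scores on the criterion measure

r = correlation(new_test, criterion)
print(f"Validity coefficient r = {r:.2f}")  # values near 1.0 indicate strong agreement
```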
25. C. Construct-Related Evidence
A construct is an individual characteristic that explains some aspect of behavior (Miller, Linn & Gronlund, 2009).
Construct-related evidence of validity is an assessment of the quality of the instrument used.
It measures the extent to which the assessment is a meaningful measure of an unobservable trait or characteristic (McMillan, 2007).
27. Unified Concept of Validity
In 1989, Messick proposed a unified concept of validity based on an expanded theory of construct validity, which addresses score meaning and social values in test interpretation and test use.
His concept of unified validity "integrates considerations of content, criteria, and consequences into a construct framework for the empirical testing of rational hypotheses about score meaning and theoretically relevant relationships" (Messick, 1995, p. 741).
28. Validity of Assessment Methods
Moskal (2003) laid down five recommendations that are intrinsically associated with the validity of an assessment:
1. The selected performance should reflect a valued activity.
2. The completion of the performance assessment should provide a valuable learning experience.
3. The statement of goals and objectives should be clearly aligned with the measurable outcomes of the performance activity.
4. The task should not examine extraneous or unintended variables.
5. Performance assessments should be fair and free from bias.
29. Threats to Validity
Miller, Linn & Gronlund (2009) identified ten factors that affect the validity of assessment results:
1. Unclear test directions
2. Complicated vocabulary and sentence structure
3. Ambiguous statements
4. Inadequate time limits
5. Inappropriate level of difficulty of test items
6. Poorly constructed test items
7. Inappropriate test items for the outcomes being measured
8. Tests that are too short
9. Improper arrangement of items
10. Identifiable patterns of answers