2. Validity is a term derived from the Latin word
validus, meaning strong.
In the context of assessment, a test is deemed valid if
it measures what it is supposed to measure.
3. Content-related evidence for validity pertains to the
extent to which the test covers the entire domain of
content.
4. A test that appears to adequately measure the
learning outcomes and content is said to possess
face validity.
Instructional validity is the extent to which an
assessment is systematically sensitive to the nature
of the instruction offered.
Table of Specifications (ToS) – a test
blueprint that identifies the content areas covered and
describes the learning outcomes expected at each level of
the cognitive domain.
5. Course title: Math
Grade level: V
Grading period in which the test is being used: 2nd
Date of test: August 8, 2014
Subject matter digest: Number and Number Sense
Type of test: Power, Speed, Partially speeded (Circle one)
Test time: 45 minutes
Test value: 100 points
Base number of test questions: 75
Constraints: Test time
| Learning Objective No. | Level | Instructional Time (min) | % | Q/P | Item Type | Remember | Understand | Apply | Analyze | Evaluate | Create | Total (Q/P) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Apply | 95 | 16% | 11/16 | Matching | | | 6(1), 5(2) | | | | 11/16 |
| 2 | Understand | 55 | 9% | 7/10 | MC | | 5(2) | | | | | 5/10 |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |
| 10 | Evaluate | 40 | 7% | 5/7 | Essay | | | | | 1(7) | | 1/7 |
| Total | | 600 | 100% | 75/100 | | 11/12 | 23/31 | 16/34 | 4/10 | 3/6 | 1/7 | 58/100 |

(Q/P reads questions/points. The columns from Remember to Create are the levels of the Revised Bloom's Taxonomy; entries there give the number of questions with points per question in parentheses, e.g., 6(1) means six 1-point questions.)
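The Q/P figures are derived from each objective's share of instructional time: 95 of 600 minutes is about 16%, so objective 1 gets roughly 16% of the 75 questions and 100 points. A minimal Python sketch of that computation (objectives and times come from the table; simple rounding is an assumption, so hand-adjusted blueprint values may differ by an item or a point):

```python
# Allocate questions and points to each objective in proportion to its
# share of instructional time, as in the Table of Specifications above.
# Simple rounding is an assumption; real blueprints adjust by judgment.

objectives = {1: 95, 2: 55, 10: 40}  # objective no. -> instructional minutes
total_time = 600                     # total instructional minutes
total_questions = 75                 # base number of test questions
total_points = 100                   # test value

for num, minutes in objectives.items():
    weight = minutes / total_time
    q = round(weight * total_questions)
    p = round(weight * total_points)
    print(f"Objective {num}: {weight:.0%} of time -> {q} questions / {p} points")
```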
6. Time required to answer, by type of test question (Notar et al., 2004):

| Type of Test Question | Time Required to Answer |
| --- | --- |
| Alternate response (true-false) | 20-30 seconds |
| Modified true or false | 30-45 seconds |
| Sentence completion (one-word fill-in) | 40-60 seconds |
| Multiple choice with four responses (lower level) | 40-60 seconds |
| Multiple choice (higher level) | 70-90 seconds |
| Matching type (5 stems, 6 choices) | 2-4 minutes |
| Short answer | 2-4 minutes |
| Multiple choice (with calculations) | 2-5 minutes |
| Word problems (simple arithmetic) | 5-10 minutes |
| Short essays | 15-20 minutes |
| Data analysis/graphing | 15-25 minutes |
| Drawing models/labelling | 20-30 minutes |
| Extended essays | 35-50 minutes |
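These estimates make it easy to check a draft test against the available testing time, such as the 45-minute limit in the blueprint above. A small sketch; the item mix is hypothetical, and each type is charged the midpoint of its range:

```python
# Estimate total testing time from the per-item ranges in the table above.
# The item mix is hypothetical; each type uses the midpoint of its range.

item_mix = {  # item type -> (number of items, midpoint seconds per item)
    "true-false": (10, 25),
    "multiple choice, lower level": (20, 50),
    "matching (5 stems, 6 choices)": (2, 180),
    "short essay": (1, 1050),  # 15-20 minutes -> 17.5 minutes
}

total_seconds = sum(n * secs for n, secs in item_mix.values())
print(f"Estimated testing time: {total_seconds / 60:.0f} minutes")  # ~44 minutes
```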
7. Criterion-related evidence of validity refers to the degree to which
test scores agree with an external criterion.
It examines the relationship between an assessment and another
measure of the same trait.
8. There are three types of criteria:
Achievement test scores
Ratings, grades, and other
numerical judgments made by the
teacher
Career data
9. Concurrent validity provides an estimate
of a student's current performance in relation
to a previously validated or established
measure.
Predictive validity pertains to the power or
usefulness of test scores to predict future
performance.
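Both forms are usually reported as a validity coefficient: the correlation between test scores and the criterion measure (current standing on an established measure for concurrent validity, later performance for predictive validity). A minimal sketch with hypothetical scores:

```python
# Criterion-related validity as a Pearson correlation between test
# scores and an external criterion. All scores are hypothetical.

from statistics import correlation  # requires Python 3.10+

test_scores = [78, 85, 62, 90, 70, 88, 95, 60]
criterion = [75, 80, 65, 92, 68, 85, 90, 58]  # e.g., an established measure

r = correlation(test_scores, criterion)
print(f"Validity coefficient r = {r:.2f}")
```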
10. Construct-related evidence of validity is an assessment of the
quality of the instrument used.
It measures the extent to which the assessment is a meaningful
measure of an unobservable trait or characteristic.
13. Convergent validity occurs when measures of constructs
that are expected to be related are in fact observed to be related.
14. Discriminant validity occurs when constructs that are
expected to be unrelated are in fact observed not to be related.
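Both kinds of evidence are typically read off correlations: two measures of the same construct should correlate highly (convergent), while measures of unrelated constructs should correlate weakly (discriminant). A sketch with hypothetical scores:

```python
# Convergent evidence: two measures of the same construct correlate highly.
# Discriminant evidence: measures of unrelated constructs correlate weakly.
# All scores are hypothetical.

from statistics import correlation  # requires Python 3.10+

math_test_a = [70, 85, 60, 90, 75, 80]
math_test_b = [68, 88, 58, 92, 73, 83]   # same construct, different method
reading_test = [80, 62, 85, 70, 90, 65]  # a different construct

print(f"Convergent r   = {correlation(math_test_a, math_test_b):.2f}")   # expect high
print(f"Discriminant r = {correlation(math_test_a, reading_test):.2f}")  # expect low
```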
15. The unified concept of validity integrates considerations of content,
criteria, and consequences into a construct framework for the empirical
testing of rational hypotheses about score meaning and
theoretically relevant relationships. (Messick, 1989)
16. The content aspect parallels content-related evidence,
calling for content relevance and representativeness.
The substantive aspect pertains to the theoretical
constructs and empirical evidence.
The structural aspect assesses how well the scoring
structure matches the construct domain.
17. The generalizability aspect examines how score properties
and interpretations generalize to and across
population groups, contexts, and tasks.
The external aspect includes convergent and discriminant
evidence taken from multitrait-multimethod (MTMM) studies.
The consequential aspect pertains to the intended and
unintended effects of assessment on teaching and
learning.
19. 1. The selected performance should reflect a valued activity.
2. The completion of performance assessments should
provide a valuable learning experience.
3. The statement of goals and objectives should be clearly
aligned with the measurable outcomes of the
performance activity.
4. The task should not examine extraneous or unintended
variables.
5. Performance assessments should be fair and free from
bias.
20. 1. Unclear test directions
2. Complicated vocabulary and sentence structure
3. Ambiguous statements
4. Inadequate time limits
5. Inappropriate level of difficulty of test items
6. Poorly constructed test items
7. Inappropriate test items for outcomes being measured
8. Tests that are too short
9. Improper arrangement of items
10. Identifiable pattern of answers
21. Ask others to judge the clarity of what you
are assessing.
Check to see if different ways of assessing
the same thing give the same result.
Sample a sufficient number of examples of
what is being assessed.
Prepare a detailed table of specifications.
22. Ask others to judge the match between the
assessment items and the objectives of the
assessment.
Compare groups known to differ on what is
being assessed.
Compare scores taken before instruction to those
taken after (a small sketch follows this list).
Compare predicted consequences to actual
consequences.
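The before/after comparison can be sketched quickly: if the assessment is sensitive to what was taught, post-instruction scores should be clearly higher. All scores here are hypothetical:

```python
# Pre/post comparison as validity evidence: scores should rise after
# instruction if the assessment reflects what was taught. Hypothetical data.

from statistics import mean

pre = [45, 50, 38, 60, 55, 42]
post = [70, 72, 65, 85, 78, 60]

gains = [b - a for a, b in zip(pre, post)]
print(f"Mean pre = {mean(pre):.1f}, mean post = {mean(post):.1f}, "
      f"mean gain = {mean(gains):.1f}")
```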
23. Compare scores on similar but different traits.
Provide adequate time to complete the
assessment.
Ensure appropriate vocabulary, sentence
structure, and item difficulty.
Ask easy questions first.
Use different methods to assess the same thing.
Use assessment results only for their intended purposes.
29. Lengthen the assessment procedure by providing more
time, more questions, and more observations whenever
practical (the sketch after this list quantifies the effect).
Broaden the scope of the procedure by assessing all the
significant aspects of the target learning performance.
Improve objectivity by using a systematic and more
formal procedure for scoring student performance. A
scoring scheme or rubric would prove useful.
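The gain from lengthening can be estimated with the Spearman-Brown prophecy formula, r_k = k*r / (1 + (k - 1)*r), where r is the current reliability and k is the factor by which the test is lengthened. A minimal sketch:

```python
# Spearman-Brown prophecy formula: predicted reliability of a test
# lengthened by a factor k, given its current reliability r.

def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test whose reliability is 0.60:
print(f"{spearman_brown(0.60, 2):.2f}")  # -> 0.75
```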
30. Use multiple markers and check their inter-rater
reliability (a sketch follows this list).
Combine results from several assessments,
especially when making crucial educational
decisions.
Give students sufficient time to complete
the assessment procedure.
Teach students how to perform their best by
providing practice and training and by
motivating them.
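Agreement between two markers can be summarized with simple percent agreement, or with Cohen's kappa, which corrects for agreement expected by chance. A sketch with hypothetical rubric ratings:

```python
# Percent agreement and Cohen's kappa for two raters scoring the same
# student works. The ratings are hypothetical rubric levels.

from collections import Counter

rater1 = ["A", "B", "B", "A", "C", "B", "A", "C", "B", "A"]
rater2 = ["A", "B", "A", "A", "C", "B", "A", "C", "C", "A"]

n = len(rater1)
p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: sum over categories of the product of marginal counts.
c1, c2 = Counter(rater1), Counter(rater2)
p_chance = sum(c1[cat] * c2[cat] for cat in c1.keys() | c2.keys()) / n**2

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```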
31. Match the assessment difficulty to the
students’ ability levels by providing tasks
that are neither too easy nor too difficult,
and tailoring the assessment to each
student’s ability level when possible.
Differentiate among students by selecting
assessment tasks that distinguish or
discriminate the best from the least able
students.
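How well a task separates the best from the least able students is commonly quantified with the item discrimination index, D = p_upper - p_lower: the difference between the proportions of the top- and bottom-scoring groups answering the item correctly. A minimal sketch with hypothetical responses:

```python
# Item discrimination index D = p_upper - p_lower, using the proportions
# of the top- and bottom-scoring groups that answered an item correctly.
# The responses below are hypothetical (1 = correct, 0 = incorrect).

upper_group = [1, 1, 1, 0, 1, 1]
lower_group = [0, 1, 0, 0, 1, 0]

d = sum(upper_group) / len(upper_group) - sum(lower_group) / len(lower_group)
print(f"D = {d:.2f}")  # D of 0.40 or higher is conventionally a good item
```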