Validity and Reliability
Joanna Ochoa V.
Validity
• To assert that a test has construct validity, empirical evidence is needed.
• A valid test measures accurately what it is intended to measure.
• Subordinate forms of validity:
  • Content validity
  • Criterion-related validity
Content Validity
• The content of the test constitutes a representative sample of the skills it is supposed to measure = content validity.
• Requires a specification of the skills or structures that the test is meant to cover.
• The test must include a proper sample of the relevant structures.
• ATTENTION! Content validation should be carried out while a test is being developed.
Criterion-related validity

Concurrent validity
• The test and the criterion are administered at the same time.
• Example: an oral exam, where a long version and a short version are given to a random sample of candidates.
• The level of agreement between the two sets of scores is expressed as a correlation coefficient.
• Perfect agreement = coefficient of 1. Total lack of agreement = coefficient of zero.

Predictive validity
• The degree to which a test can predict candidates' future performance.
• Example: a proficiency test used to predict a student's ability to cope with a graduate course at a British university. Criterion measure: the student's English as perceived by his or her supervisor, or the outcome of the course.
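Such a coefficient is simply the correlation between the two sets of scores. As a minimal sketch (all scores below are invented, and Pearson's r is assumed as the measure of agreement), Python's standard library can compute it directly:

```python
# Concurrent-validity sketch: correlate scores from the full-length oral
# exam (the criterion) with scores from the short version, taken by the
# same candidates. All scores are hypothetical. Requires Python 3.10+.
from statistics import correlation

long_version  = [78, 65, 90, 52, 84, 71, 60, 88]   # criterion: full exam
short_version = [75, 62, 93, 55, 80, 74, 58, 85]   # test being validated

r = correlation(long_version, short_version)  # Pearson's r
print(f"Validity coefficient: {r:.2f}")  # 1 = perfect agreement, 0 = none
```

A coefficient close to 1 would support using the short version in place of the full exam.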
Validity in scoring
• Items and the way in which they are scored must be valid.
• Example: a reading test. Should grammar and spelling mistakes in the responses count against the score? (If the test is meant to measure reading, arguably they should not.)
A test is said to have face validity if it looks as if it measures what it is supposed to measure.
For example: a test meant to measure pronunciation ability would lack face validity if it did not require the candidate to speak.
How to make tests more valid
• The scoring must be related to what is being tested.
• Whenever feasible, use direct testing.
• Write explicit specifications for the test.
• Make the test reliable: without reliability there can be no validity.
Reliability
• We have to construct, administer, and score tests in such a way that we would obtain similar results on different occasions.
The reliability coefficient
• The reliability coefficient quantifies the reliability of a test.
• Ideal reliability coefficient = 1: the test would always give the same results.
• Reliability coefficient of zero: the sets of results are entirely unconnected with each other.
• Two sets of scores are required for comparison.
• TEST-RETEST METHOD: a group of students take the same test twice. Problems:
  1. Too soon: students may remember their answers.
  2. Too late: students may have forgotten, and their ability may have changed.
• Solutions:
  • Alternate forms method: two equivalent versions of the test.
  • Split-half method: only one administration of one test (see the sketch below).
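A minimal split-half sketch (the item-level responses below are invented): divide the items into two halves, commonly odd-numbered vs. even-numbered items, correlate the two half-scores, then apply the Spearman-Brown correction to estimate the reliability of the full-length test:

```python
# Split-half reliability sketch: each row is one candidate's item scores
# (1 = correct, 0 = incorrect) from a single administration. Invented data.
from statistics import correlation

responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
]

# Score each half separately: odd-numbered items vs. even-numbered items.
odd_half  = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]

r_half = correlation(odd_half, even_half)

# Spearman-Brown correction: the correlation between two half-tests
# underestimates the reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)
print(f"Half-test correlation: {r_half:.2f}")
print(f"Estimated full-test reliability: {r_full:.2f}")
```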
Scorer reliability
• If the scoring of a test is not reliable, then the test results cannot be reliable either.
• For example, on a composition-writing test:
  • Scorer reliability coefficient = .92
  • Reliability coefficient for the test itself = .84
• Variability in the performance of individual candidates accounts for the difference between the two coefficients.
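As a rough sketch of how a scorer reliability coefficient is obtained (the marks below are invented), correlate the marks awarded independently by two scorers to the same set of compositions:

```python
# Scorer-reliability sketch: two raters independently mark the same eight
# compositions; their level of agreement is the scorer reliability.
from statistics import correlation

scorer_a = [14, 11, 17, 9, 15, 12, 18, 10]
scorer_b = [13, 12, 16, 9, 14, 13, 17, 11]

r_scorers = correlation(scorer_a, scorer_b)
print(f"Scorer reliability coefficient: {r_scorers:.2f}")
# Test reliability can never exceed scorer reliability: if the scoring
# itself is inconsistent, the results built on it cannot be consistent.
```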
How to make tests more reliable
1. Take enough samples of behaviour: more items mean more reliability (quantified in the sketch below).
2. Exclude items that fail to discriminate between weaker and stronger students: items that are too easy or too difficult tell us little.
3. Do not allow candidates too much freedom: a choice of questions makes scores hard to compare.
4. Write unambiguous items: do not assume their meaning is clear to every candidate.
5. Provide clear and explicit instructions: never suppose that all the students understand them.
6. Ensure that tests are well laid out and perfectly legible: institutional tests are often badly typed.
7. Make candidates familiar with the format and testing techniques: unfamiliar aspects of a test lower reliability.
8. Provide uniform conditions of administration: precautions must be taken on every occasion.
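Point 1 has a quantitative basis: the Spearman-Brown formula (the same one used in the split-half sketch above) predicts how reliability changes when a test is lengthened. A small sketch, assuming a hypothetical starting reliability of .70:

```python
# Spearman-Brown prophecy sketch: estimated reliability of a test whose
# length is multiplied by a factor n, given its current reliability r.
def lengthened_reliability(r: float, n: float) -> float:
    return (n * r) / (1 + (n - 1) * r)

r = 0.70  # hypothetical current reliability
for n in (1, 2, 3):
    print(f"{n}x items -> estimated reliability {lengthened_reliability(r, n):.2f}")
```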
Ways of obtaining scorer reliability
• Use items that permit scoring which is as objective as possible.
• Make comparisons between candidates as direct as possible.
• Provide a detailed scoring key.
• Train scorers.
• Agree acceptable responses and appropriate scores at the outset of scoring.
• Identify candidates by number, not name.
• Employ multiple, independent scoring.
Reliability and validity
• To be valid, a test must provide consistently accurate measurements.
• A reliable test, however, may not be valid at all.
• Example: a writing test could be scored with perfect consistency and still fail to measure genuine writing ability.
• In making tests more reliable, we must be wary of reducing their validity.