VALIDATION (Alderson)
Testing and Evaluation
Katherine Andrade
The importance of validity
Does the exam test what it is supposed to test?
Validity refers to the appropriateness of a given test in measuring what it is designed to measure and accurately performing the functions it is purported to perform. If a test is not valid for the purpose for which it is designed, then the scores do not mean what they are believed to mean.
Types of validity
• The more
evidence that
can be
gathered for
any ‘type’ of
validity, the
better.
• It is best to
validate a test
in as many
ways as
possible.
• The more
different ‘types’
of validity that
can be
established,
the better.
Rational validation
• Logical analysis of the test’s content.
• Checks whether the test contains a representative sample of the relevant language skills.

Empirical validation
• Depends on empirical and statistical evidence.
• Checks whether students’ marks on the test are similar to independent measures of their ability.

Construct validation
• Refers to what the test scores actually mean.
• Relies on subjective judgments and empirical data.
Test validity
Internal validity
• Relates to studies of the perceived content of the test and its perceived effect.
External validity
• Compares students’ test scores with measures of their ability (criterion) assembled from outside the test.
Internal Validity
Face validity: surface/public credibility.
Content validity: adequacy of the content/judgment.
Content Validity
• Communicative Language Ability (CLA): the level of ability required of test takers in the areas of grammatical, contextual, illocutionary, sociolinguistic, and strategic competence.
• Test Method Characteristics (TMC): related to test items and test passages, including the test environment, test rubric, item type, and nature of the test input.
Test input: complexity of language, rhetorical organization, degree of contextualization, test topic, cultural bias, and pragmatic characteristics.
Response validity
Gather introspective data from learners/test takers on how individuals respond to test items: behaviors and thoughts.
External validity
Predictive validity: correlate test scores with criterion scores after examinees have had a chance to perform what is predicted by the test.
Concurrent validity: compare test scores with criterion scores obtained at the same time.
Criterion measures: teachers’ ratings, self-assessment.
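In practice, concurrent validation reduces to correlating two sets of scores gathered at the same time. A minimal Python sketch, with invented test scores and teacher ratings (all numbers hypothetical):

```python
# Hypothetical data: one test score and one concurrent teacher rating
# (1-5 scale) per student.
test_scores = [62, 75, 48, 90, 81, 55, 70, 66, 85, 59]
teacher_ratings = [3, 4, 2, 5, 4, 2, 4, 3, 5, 3]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A high coefficient suggests the test ranks students similarly to the
# criterion measure collected at the same time.
r = pearson_r(test_scores, teacher_ratings)
print(f"concurrent validity coefficient: r = {r:.2f}")
```

The same computation, with the criterion scores collected some time after the test, would serve as a predictive-validity check.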
CONSTRUCT VALIDATION
Construct: a psychological construct is a theoretical conceptualization about an aspect of human behavior that cannot be measured or observed directly.
Examples: intelligence, achievement motivation, anxiety, attitude, reading comprehension.
• Construct validation is the process of gathering evidence that a given test indeed measures the intended construct, and of determining the meaning of scores from the test, to ensure that the scores mean what we expect them to mean (judgmental-empirical).
Another form of construct validation
Compare test performance with biodata and other data gathered from students at the time they took the test.
Intention: detect bias in the test.
Biodata: gender, age, L1, number of years studying the language, etc.
FACTOR ANALYSIS
Takes a matrix of correlation coefficients and reduces the complexity of the matrix to more manageable proportions.
Confirmatory factor analysis (CFA)
The researcher predicts which tests or components will relate to which others and how, and then tests the goodness of fit of the predictions against the data.
Exploratory factor analysis (EFA)
Explores the data to try to make sense of the factors that emerge.
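As an illustration (all numbers invented, not from the original slides), the reduction can be shown on a tiny correlation matrix: the eigenvalues of the matrix indicate how many underlying factors account for most of the shared variance among subtests.

```python
import numpy as np

# Hypothetical correlation matrix for four subtests: the first two
# (e.g. reading, vocabulary) and the last two (e.g. listening, speaking)
# are highly intercorrelated within each pair.
R = np.array([
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.7],
    [0.2, 0.3, 0.7, 1.0],
])

# Eigendecomposition of the symmetric matrix, largest eigenvalue first.
# Large eigenvalues correspond to factors explaining much shared variance.
eigvals = np.linalg.eigvalsh(R)[::-1]
explained = eigvals / eigvals.sum()

print("eigenvalues:", np.round(eigvals, 2))
print("proportion of variance:", np.round(explained, 2))
```

With these invented numbers, the first two components carry most of the variance, suggesting the four subtests reduce to roughly two underlying factors; a full EFA or CFA would add rotation and fit testing on top of this core idea.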
Reliability & Validity
A test cannot be valid unless it is reliable.
• If a test does not measure something consistently, it follows that it cannot always be measuring it accurately.
It is quite possible for a test to be reliable but invalid.
• A test can consistently give the same results even though it is not measuring what it is supposed to.
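The reliable-but-invalid case can be made concrete with numbers (a hypothetical sketch): a test whose scores reproduce almost exactly on retest is reliable, yet if those scores track something other than the ability of interest, it is not valid.

```python
# Hypothetical scores: a "test" that consistently rewards an irrelevant
# trait (say, handwriting speed) rather than the ability of interest.
true_ability = [50, 60, 70, 80, 90]
attempt_1 = [65, 72, 58, 71, 64]   # first administration
attempt_2 = [66, 71, 59, 72, 63]   # retest: nearly identical scores

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# High test-retest correlation: the test is reliable...
r_reliability = pearson_r(attempt_1, attempt_2)
# ...but essentially no correlation with true ability: it is not valid.
r_validity = pearson_r(attempt_1, true_ability)
print(f"reliability (test-retest): {r_reliability:.2f}")
print(f"validity (vs. ability):    {r_validity:.2f}")
```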
TYPES OF VALIDITY AND PROCEDURES FOR EVALUATION

INTERNAL VALIDITY
FACE VALIDITY:
- Questionnaires to, and interviews with, candidates, administrators, and other users.
CONTENT VALIDITY:
- Compare test content with specifications/syllabus.
- Questionnaires to, and interviews with, experts: teachers, subject specialists, applied linguists.
- Expert judges rate test items and texts according to a precise list of criteria.
RESPONSE VALIDITY:
- Students introspect on their test-taking procedures, concurrently/retrospectively.

EXTERNAL VALIDITY
CONCURRENT VALIDITY:
- Correlate students’ test scores with their scores on other tests.
- Correlate students’ test scores with teachers’ rankings.
- Correlate students’ test scores with other measures of ability: students’/teachers’ ratings.
TYPES OF VALIDITY AND PROCEDURES FOR EVALUATION (continued)

EXTERNAL VALIDITY
PREDICTIVE VALIDITY:
- Correlate students’ test scores with their scores on tests taken some time later.
- Correlate students’ test scores with success in final exams.
- Correlate students’ test scores with other measures of their ability taken some time later: subject teachers’ assessments, language teachers’ assessments.
- Correlate students’ scores with the success of later placement.
CONSTRUCT VALIDITY:
- Correlate each subtest with the other subtests.
- Correlate each subtest with the total test.
- Correlate each subtest with the total minus self.
- Compare students’ test scores with students’ biodata and psychological characteristics.
- Multitrait-multimethod studies.
- Factor analysis.
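The subtest procedures can be sketched numerically. “Total minus self” means each subtest is correlated with the total score after its own contribution has been removed, so the coefficient is not inflated by the subtest correlating with itself. A hypothetical Python sketch (all scores invented):

```python
# Hypothetical scores: rows = students, columns = subtests
# (e.g. reading, listening, grammar).
scores = [
    [12, 15, 10],
    [18, 14, 16],
    [9, 8, 11],
    [20, 19, 17],
    [14, 12, 13],
    [7, 10, 9],
]

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

totals = [sum(row) for row in scores]
for j in range(len(scores[0])):
    subtest = [row[j] for row in scores]
    # "Total minus self": exclude subtest j from each student's total.
    rest = [t - s for t, s in zip(totals, subtest)]
    print(f"subtest {j}: r(subtest, total) = {pearson_r(subtest, totals):.2f}, "
          f"r(subtest, total minus self) = {pearson_r(subtest, rest):.2f}")
```

The corrected coefficient is typically lower than the naive subtest-total correlation, which is why the “minus self” step is listed as a separate procedure.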
