4. The concept of validity has
historically seen a variety
of iterations that involved
“packing” different aspects
into the concept and
subsequently “unpacking” some
of them.
5. Points of broad consensus
Validity is the most fundamental consideration in the evaluation of the appropriateness of claims about, and uses and interpretations of, assessment results.
Validity is a matter of degree rather than all or none.
SICI Conference 2010
North Rhine-Westphalia
Quality Assurance in the Work of “Inspectors”
6. Main controversial aspect
…empirical evidence and theoretical rationales…
Validity is “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.”
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). Washington, DC: American Council on Education/Macmillan.
7. Broad, but not universal agreement (for a dissenting viewpoint, Lissitz & Samuelsen, 2007)
Karen Samuelsen, Assistant Professor in the Department of Educational Psychology and Instructional Technology.
Robert W. Lissitz, Professor of Education in the College of Education at the University of Maryland and Director of the Maryland Assessment Research Center for Education Success (MARCES).
8. Broad, but not universal agreement (for a dissenting viewpoint, Lissitz & Samuelsen, 2007)
It is the uses and interpretations of an assessment result, i.e. the inferences, rather than the assessment result itself, that are validated.
Validity may be relatively high for one use of assessment results but quite low for another use or interpretation.
10. According to Angoff (1988), theoretical conceptions of validity and validation practices have changed appreciably over the last 60 years, largely because of Messick’s many contributions to our contemporary conception of validity.
Ruhe, V., & Zumbo, B., Evaluation in Distance Education and E-Learning, pp. 73-91
11. 1951 Cureton: the essential feature of validity was “how well a test does the job it was employed to do” (p. 621).
1954 American Psychological Association (APA): listed four distinct types of validity.
12. Types of Validity
1. Construct Validity refers to how well a particular test can be shown to assess the construct that it is said to measure.
2. Content Validity refers to how well test scores adequately represent the content domain that these scores are said to measure.
13. 3. Predictive Validity is the degree to which the predictions made by a test are confirmed by the later behavior of the tested individuals.
4. Concurrent Validity is the extent to which individuals’ scores on a new test correspond to their scores on an established test of the same construct, determined shortly before or after the new test.
14. 1966 APA, Standards for Educational and Psychological Tests and Manuals: concurrent validity and predictive validity were collapsed into criterion-related validity.
1980 Guion: the three aspects of validity were referred to as the “Holy Trinity.”
15. 1996 Hubley & Zumbo: the Holy Trinity referred to by Guion means that at least one type of validity is needed, but one has three chances to get it.
1957 Loevinger: argued that construct validity was the whole of validity, anticipating a shift away from multiple types to a single type of validity.
16. 1988 Angoff: validity was once viewed as a property of tests, but the focus later shifted to the validity of a test in a specific context or application, such as the workplace.
17. 1974 Standards for Educational and Psychological Tests (APA, American Educational Research Association, and National Council on Measurement in Education): shifted the focus of content validity from a representative sample of content knowledge to a representative sample of behaviors in a specific context.
18. 1989 Messick: professional standards were established for a number of applied testing areas such as “counseling, licensure, certification and program evaluation.”
19. 1985 Standards (APA, American Educational Research Association, and National Council on Measurement in Education): validity was redefined as the “appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.”
20. 1985: the unintended social consequences of the use of tests (for example, bias and adverse impact) were also included in the Standards (Messick, 1989).
21. Validation Practice
Validation is “disciplined inquiry” (Hubley & Zumbo, 1996) that historically started out with the calculation of measures of a single aspect of validity (content validity or predictive validity).
It now involves building an argument based on multiple sources of evidence (e.g., statistical calculations, qualitative data, reflections on one’s own values and those of others, and an analysis of unintended consequences).
These calculations are based on logical or mathematical models that date from the early 20th century (Crocker & Algina, 1986).
Messick (1989) describes the older procedures as fragmented approaches to validation.
22. Hubley and Zumbo (1996) describe them as “scanty, disconnected bits of evidence…to make a two-point decision about the validity of a test.”
Cronbach (1982) recommended a more comprehensive, argument-based approach to validation that considered multiple and diverse sources of evidence.
Validation practice has thus evolved from a fragmented approach to a comprehensive, unified approach in which multiple sources of data are used to support an argument.
24. What is Validity?
Validity is “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989).
Validity is a unified concept, and validation is a scientific activity based on the collection of multiple and diverse types of evidence (Messick, 1989; Zumbo, 1998, 2007).
25. Messick’s Conception of Validity

Justification \ Outcomes   Test Interpretation             Test Use
Evidential basis           Construct Validity (CV)         CV + Relevance/Utility (RU)
Consequential basis        Value Implications (CV+RU+VI)   Social Consequences (CV+RU+VI+UC)
26. The columns of Messick’s matrix distinguish the functions of testing (test interpretation vs. test use); the rows distinguish the basis for justifying validity (evidential basis vs. consequential basis).
27. In the evidential basis row of Messick’s matrix, Construct Validity (CV) refers to traditional scientific evidence (traditional psychometrics), while Relevance/Utility (RU) refers to relevance to learners and to society, and to cost-benefit.
28. The consequential basis of Messick’s matrix is not about poor test practice; rather, the consequences of testing refer to the unanticipated or unintended consequences of legitimate test interpretation and use.
29. In Messick’s matrix, Value Implications (VI) refers to underlying values, including language or rhetoric, theory, and ideology.
32. The evidential basis of Messick’s framework contains two facets:
1. Traditional psychometric evidence
2. Evidence for relevance in applied settings such as the workplace, as well as utility or cost-benefit.
33. Evidential Basis for Test
Inferences and Use
The evidential basis for test interpretation is an
appraisal of the scientific evidence for construct
validity.
A construct is a “definition of skills and
knowledge included in the domain to be
measured by a tool such as a test” (Reckase,
1998b)
The four traditional types of validity are included
in this first facet.
34. Evidential Basis for Test Inferences and Use
The evidential basis for test use includes measures of predictive validity (e.g., correlations with other tests of behaviors) as well as utility (i.e., a cost-benefit analysis).
Predictive validity coefficients are measures of behavior to be predicted from the test (e.g., a correlation between scores on a road test and a written driver qualification test).
Cost-benefit refers to an analysis of costs compared with benefits, which in education are often difficult to quantify.
35. The consequential basis of Messick’s framework contains two facets:
1. Value Implications (VI): (CV + RU + VI)
2. Social Consequences: (CV + RU + VI + UC)
37. Value implications require an investigation of three components:
Rhetoric, or value-laden language and terminology: value-laden language conveys both a concept and an opinion of that concept.
Underlying theories: the underlying assumptions or logic of how a program is supposed to work (Chen, 1990).
Underlying ideologies: a complex mix of shared values and beliefs that provide a framework for interpreting the world (Messick, 1989).
38. Rhetoric
Includes language that is discriminatory, exaggerated, or overblown, such as derogatory language used to refer to the homeless.
In validation practice, the rhetoric surrounding standardized tests should be critically evaluated to determine whether these terms are accurate descriptions of the knowledge and skills said to be assessed by a test (Messick, 1989).
39. Theory
The second component of the value implications category is an appraisal of the theory underlying the test. A theory connotes a body of knowledge that organizes, categorizes, describes, predicts, explains, and otherwise aids in understanding phenomena and in organizing and directing thoughts, observations, and actions (Sidani & Sechrest, 1999).
40. Ideology
The third component of value implications is an appraisal of the “broader ideologies that give theories their perspective and purpose” (Messick, 1989).
An ideology is a “complex configuration of shared values, affects and beliefs that provides, among other things, an existential framework for interpreting the world” (Messick, 1989).
41. Value implications challenge us to reflect upon:
a. The personal or social values suggested by our interest in the construct and the name/label selected to represent that construct
b. The personal or social values reflected by the theory underlying the construct and its measurement
c. The values reflected by the broader social ideologies that impacted the development of the identified theory
(Messick, 1980, 1989)
44. Remember that construct
validity, relevance and utility,
value implications and social
consequences all work
together and impact one
another in test interpretation
and use.