Public Personnel Management
2021, Vol. 50(2) 232 –257
© The Author(s) 2020
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/0091026020935582
journals.sagepub.com/home/ppm
Article
A Critical Examination of Content Validity Evidence and Personality Testing for Employee Selection

David M. Fisher1, Christopher R. Milane2, Sarah Sullivan3, and Robert P. Tett1
Abstract
Prominent standards/guidelines concerning test validation
provide contradictory
information about whether content-based evidence should be
used as a means
of validating personality test inferences for employee selection.
This unresolved
discrepancy is problematic considering the prevalence of
personality testing, the
importance of gathering sound validity evidence, and the
deference given to these
standards/guidelines in contemporary employee selection
practice. As a consequence,
test users and practitioners are likely to be reticent or uncertain
about gathering
content-based evidence for personality measures, which, in turn,
may cause such
evidence to be underutilized when personality testing is of
interest. The current
investigation critically examines whether (and how) content
validity evidence should
be used for measures of personality in relation to employee
selection. The ensuing
discussion, which is especially relevant in highly litigious
contexts such as personnel
selection in the public sector, sheds new light on test validation
practices.
Keywords
test validation, content validity, personality testing, employee
selection
1The University of Tulsa, OK, USA
2Qualtrics, Provo, UT, USA
3Rice University, Houston, TX, USA

Corresponding Author:
David M. Fisher, Assistant Professor of Psychology, The University of Tulsa, 800 S. Tucker Drive, Tulsa, OK 74104, USA.
Email: [email protected]

An essential consideration when using any test or measurement tool for employee selection is gathering and evaluating relevant validity evidence. In the contemporary employee selection context, validity evidence is generally understood to mean
evidence that substantiates inferences made from test scores.
Various sources provide
standards and guidelines for gathering validity evidence,
including the Uniform
Guidelines on Employee Selection Procedures (Equal
Employment Opportunity
Commission, Civil Service Commission, Department of Labor,
& Department of
Justice, 1978; hereafter, Uniform Guidelines, 1978), the
Principles for the Validation
and Use of Personnel Selection Procedures (Society for
Industrial and Organizational
Psychology [SIOP], 2003; hereafter, SIOP Principles, 2003),
and Standards for
Educational and Psychological Testing (American Educational
Research Association,
American Psychological Association, & National Council on
Measurement in
Education, 1999/2014; hereafter, Joint Standards, 1999/2014),
as well as the academic
literature (e.g., Aguinis et al., 2001). Having such a variety of
sources available is
beneficial, but challenges arise when the various sources
provide ambiguous or con-
tradictory information. Such ambiguity can be particularly
troublesome in highly liti-
gious contexts, such as the public sector, where adherence to
regulations governing
selection is of paramount importance.
The current investigation attempts to shed light on one such
area of ambiguity—
whether evidence based on test content should be used as a
means of validating per-
sonality test inferences for employee selection. Rothstein and
Goffin (2006) noted, “It
has been estimated that personality testing is a $400 million
industry in the United
States and it is growing at an average of 10% a year” (Hsu,
2004, p. 156). Given this
reality, it is important to carefully consider appropriate
validation procedures for such
measures. However, the various sources mentioned above
present conflicting direc-
tions on this issue, specifically in relation to content-based
validity evidence. On one
hand, evidence based on test content is one of five potential
sources of validity evi-
dence described by the Joint Standards (1999/2014), which is
similarly endorsed by
the SIOP Principles (2003). This form of evidence has further
been suggested by some
to be particularly relevant to personality tests (e.g., Murphy et
al., 2009; O’Neill et al.,
2009), and especially under challenging validation conditions,
such as small sample
sizes, test security concerns, or lack of a reliable criterion
measure (Landy, 1986; Tan,
2009; Thornton, 2009). On the other hand, the Uniform
Guidelines (1978) assert that
“. . . a content strategy is not appropriate for demonstrating the
validity of selection
procedures which purport to measure traits or constructs, such
as intelligence, apti-
tude, personality, common sense, judgment, leadership, and
spatial ability [emphasis
added]” (Section 14.C.1). Other sources similarly convey
reticence toward content
validity for measures of traits or constructs (e.g., Goldstein et
al., 1993; Lawshe, 1985;
Wollack, 1976). Thus, there appears to be conflicting guidance
on the use of content
validity evidence to support personality measures.
In light of this discrepancy, the current investigation offers a
critical examination of
content validity evidence and personality testing for employee
selection. Such an
investigation is valuable for several reasons. First, an important
consequence of the
inconsistency noted above is that content-based evidence may
be overlooked as a
valuable approach to validation when personality testing is of
interest. Evidence for
this can be seen in the fact that other approaches such as
criterion-related validation
are sometimes viewed as the only option for personality
measures (Biddle, 2011).
Similarly, prominent writings on personality testing in the
workplace (e.g., Morgeson
et al., 2007b; O’Neill et al., 2013; Ones et al., 2007; Rothstein
& Goffin, 2006; Tett &
Christiansen, 2007) have tended to ignore the applicability of
content validation to
personality measures. Furthermore, considering the deference
given to the various
standards and guidelines in contemporary employee selection
practice (Schmit &
Ryan, 2013), those concerned about strict adherence to such
standards/guidelines are
likely to be reticent or uncertain about gathering content-based
evidence for personal-
ity measures—in no small part due to conflicting or ambiguous
recommendations. The
above circumstances tend to relegate content-based evidence to the status of a less desirable option or an afterthought. In turn, this
represents a missed opportu-
nity for valuable insight into the use of personality measures.
Second, the neglect or underutilization of content-based
evidence is, in many ways,
antithetical to the broader goal of developing a theory-based
and scientifically
grounded understanding of tests and measures used for
employee selection (Binning
& Barrett, 1989). For example, as elaborated below, there are
various situations in
which content-based evidence may be preferable to criterion-based evidence, not least when the available sample is too small for a
criterion-based investiga-
tion (McDaniel et al., 2011). Similarly, an exclusive focus on
empirical prediction
ignores the importance of underlying theory, which is critical
for advancing employee
selection research. Of relevance, the examination of content
validity evidence forces
one to carefully consider the correspondence between selection
measures and underly-
ing construct domains, as informed by theoretical
considerations. Evidence for the
value of content validity can also be found in trait activation
theory (Tett & Burnett,
2003; Tett et al., 2013), which highlights the importance of a
clear conceptual linkage
between the content of personality traits/constructs and the job
domain in question.
Thus, content validity evidence should be of primary
importance for personality test
validation.
Third, it is useful to acknowledge that the prohibition against
content validity evi-
dence in relation to personality measures noted in the Uniform
Guidelines (1978)
appears to be at odds with contemporary thinking on validation
(Joint Standards,
1999/2014). The focal passage quoted above from the Uniform
Guidelines has been
described as being “. . . as destructive to the interface of
psychological theory and
practice as any that might have been conceived” (Landy, 1986,
p. 1189). Although
there have been well-argued critiques of the Uniform Guidelines
(e.g., McDaniel
et al., 2011), in addition to thoughtful elaboration of issues
surrounding content valid-
ity (e.g., Binning & LeBreton, 2009), a direct attempt at
resolving the noted contradic-
tion remains conspicuously absent from the literature. This
contradiction, in
conjunction with the absence of a satisfactory explanation, is
problematic given the
importance of gathering sound validity evidence pertaining to
psychological test use.
As such, a critical examination of this issue is warranted.
Finally, the findings of the current investigation are likely to
have broad applicabil-
ity. Namely, although focused on personality testing, the
discussion below is relevant
to measures of other commonly assessed attributes classified
under the Uniform
Guidelines (1978) as “traits or constructs” (Section 14.C.1).
Similarly, while we
address the Uniform Guidelines—which some argue are
outdated (e.g., Jeanneret &
Zedeck, 2010) and further limited by their applicability to
employee selection in the
United States—we believe the value of this discussion extends
far beyond these guide-
lines. It is important to carefully consider appropriate validation
strategies in all cir-
cumstances where psychological tests are used. Hence, the
discussion presented herein
is likely to be of relevance for content-based validation efforts
in other areas beyond
employee selection in the United States (e.g., educational
testing, clinical practice,
international employee selection efforts).
Following a brief overview of validity and content-based
validation, our investiga-
tion is organized around three fundamental questions. Question
1 asks whether current
standards and guidelines support the use of content validity
evidence for validation of
personality test inferences in an employee selection context.
Based on the concerns
raised above, a preliminary answer to this question is that it is
unclear. Question 2 then
asks about the underlying bases of the inconsistency. Building
on the identified causes
of disagreement, Question 3 asks how one might actually gather
evidence based on test
content for personality measures. Ultimately, our goal in this
effort is to reduce ambigu-
ity and promote clarity regarding content-based validation of
personality measures.
Overview of Validity and Evidence Based on Test
Content
Broadly speaking, validity in measurement refers to how well an
assessment device
measures what it is supposed to (Schmitt, 2006). The focus of
measurement is typi-
cally described as a construct (Joint Standards, 1999/2014),
which represents a latent
attribute on which individuals can vary (e.g., cognitive ability,
diligence, interpersonal
skill, knowledge, the capacity to complete a given task).
Importantly, a person’s level
or relative standing with regard to the construct of interest is
inferred from the test
scores (SIOP Principles, 2003). As such, the notion of validity
addresses the simple yet
fundamental issue of whether test scores actually reflect the
attribute or construct that
the test is intended to measure. However, this succinct
characterization of validity also
belies the true complexity of this topic (Furr & Bacharach,
2014). Two particular com-
plexities bear discussion in light of our current aims.
First, contemporary thinking holds that validity is not the
property of a test per se,
but rather of the inferences made from test scores (Binning &
Barrett, 1989; Furr &
Bacharach, 2014; Joint Standards, 1999/2014; Landy, 1986;
SIOP Principles, 2003).
The value of this approach can be seen when the same test is
used for two different
purposes—for example, when an interpersonal skills test
developed for the selection
of sales personnel is used for hiring both sales representatives
and accountants.
Notably, the test itself does not change, but the inferences made
from the test scores
regarding the job performance potential of the applicants may
be more or less valid
given the focal job in question. In accord with this perspective,
the Joint Standards
(1999/2014) describe validity as “the degree to which evidence
and theory support the
interpretations of test scores for proposed uses of the test” (p.
11). Inherent in this view
is the idea that validity is difficult to fully assess without a
clear explication of the
intended interpretation of scores and corresponding purpose of
testing. Thus, substan-
tiating relevant inferences in terms of the intended purpose of
the test is of primary
concern in the contemporary view of validity.
Second, validity has come to be understood as a unitary
concept, as compared with
the dated notion of distinct types of validity (Binning & Barrett,
1989; Furr &
Bacharach, 2014; Joint Standards, 1999/2014; Landy, 1986;
SIOP Principles, 2003).
The older trinitarian view (Guion, 1980) posits three different
types of validity, includ-
ing criterion-related, content, and construct validity, each
relevant for different test
applications (Lawshe, 1985). By contrast, the more recent
unitarian perspective
(Landy, 1986) emphasizes that all measurement attempts are
ultimately about assess-
ing a target construct, and validation entails the collection of
evidence to support the
argument that test scores actually reflect the construct (and that
the construct is rele-
vant to the intended use of the test). Consistent with this latter
perspective, the Joint
Standards (1999/2014) espouse a unitary view of validity and
identify five sources of
validity evidence, including evidence based on test content,
response processes, inter-
nal structure, relations to other variables, and consequences of
testing. In summary, the
contemporary view of validity suggests that measurement
efforts ultimately implicate
constructs, and different sources of evidence can be marshaled
to substantiate the
validity of inferences based on test scores.
Drawing on the above discussion, evidence based on test
content represents one of
several potential sources of evidence for validity judgments.
The collection of content-
based evidence has become well-established as an important and
viable validation
strategy, as can be seen in the common discussion and
endorsement of content validity
in the academic literature (e.g., Aguinis et al., 2001; Binning &
Barrett, 1989; Furr &
Bacharach, 2014; Haynes et al., 1995; Landy, 1986) as well as
in legal, professional,
and technical standards or guidelines (e.g., Joint Standards,
1999/2014; SIOP
Principles, 2003; Uniform Guidelines, 1978). The specific
manner in which evidence
based on test content can substantiate the validity of test score
inferences is via an
informed and judicious examination of the match between the
content of an assess-
ment tool (e.g., test instructions, item wording, response
format) and the target con-
struct in light of the assessment purpose (Haynes et al., 1995).
For the sake of simplicity
and ease of exposition, throughout this article, we use various
terms interchangeably
to represent the concept of evidence based on test content, such
as content validity
evidence, content validation strategy, content-based strategy, or
simply content valid-
ity. However, each reference to this concept is intended to
reflect contemporary think-
ing regarding validity as described above—specifically, content
validity evidence is
not a separate “type” of validity but rather, a category of
evidence that can be used to
substantiate the validity of inferences regarding test scores.
Do Current Standards Support Content Validity for
Personality?
Having introduced the concepts of validity and evidence based
on test content, we now
turn to our primary purpose of discussing whether a content
validation strategy should
be used as a means of validating personality test inferences for
employee selection
purposes. In doing so, a preliminary question becomes whether
current standards and
guidelines support this practice. The following four
sources/areas are considered: (a)
the Uniform Guidelines (1978), (b) the SIOP Principles (2003),
(c) the Joint Standards
(1999/2014), and (d) a general review of relevant academic
literature. A summary of
information derived from these sources is shown in Table 1.
The Uniform Guidelines (1978)
The Uniform Guidelines (1978) are federally endorsed standards
pertaining to
employee selection procedures, which were jointly developed by
the Equal
Employment Opportunity Commission, the Civil Service
Commission, the Department
of Labor, and the Department of Justice in the United States.
Regarding content vali-
dation, the guidelines state that,
Evidence of the validity of a test or other selection procedure by
a content validity study
should consist of data showing that the content of the selection
procedure is representative
of important aspects of performance on the job for which the
candidates are to be
evaluated. (Section 5.B)
The guidelines go on to describe specific technical standards
and requirements for
content validity studies. For example, a content validity study
should include a review
of information about the job under consideration (Section 14.A;
Section 14.C.2).
Furthermore, when the selection procedure focuses on work
tasks or behaviors, it must
be shown that the selection procedure includes a representative
sample of on-the-job
behaviors or work products (Section 14.C.1; Section 14.C.4).
Conversely, under cer-
tain circumstances, the guidelines also permit content validation
where the selection
procedure focuses on worker requirements or attributes,
including knowledge, skills,
or abilities (KSAs). In such cases, beyond showing that the
selection procedure reflects
a representative sample of the implicated KSA, it must
additionally be documented
that the KSA is needed to perform important work tasks
(Section 14.C.1; Section
14.C.4), and the KSA must be operationally defined in terms of
observable work
behaviors (Section 14.C.4).
The above notwithstanding, the Uniform Guidelines (1978)
explicitly prohibit con-
tent validity for tests focusing on traits or constructs, including
personality (Section
14.C.1). The logic underlying this restriction appears to be
based on the seemingly
reasonable notion that content-based validation becomes
increasingly difficult as the
focus of the selection test is farther removed from actual work
behaviors (Section
14.C.4; Landy, 1986; Lawshe, 1985). This logic was confirmed
in a subsequent
“Questions and Answers” document, where it is stated that,
The Guidelines emphasize the importance of a close
approximation between the content of
the selection procedure and the observable behaviors or
products of the job, so as to minimize
the inferential leap between performance on the selection
procedure and job performance
[emphasis added]. (See
http://www.uniformguidelines.com/questionandanswers.html)
Table 1. Review of Various Sources Regarding Content Validity and Personality Testing.

Source: Uniform Guidelines (1978)
Description of content validity: "Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated" (Section 5.B)
Position on personality measures: Explicit prohibition related to the use of content validity for tests that focus on traits or constructs, such as personality: ". . . a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, common sense, judgment, leadership, and spatial ability" (Section 14.C.1)

Source: SIOP Principles (2003)
Description of content validity: "Evidence for validity based on content typically consists of a demonstration of a strong linkage between the content of the selection procedure and important work behaviors, activities, worker requirements, or outcomes on the job" (p. 21)
Position on personality measures: Approval of a content validity approach for personality measures can be inferred from [1] the absence of an explicit prohibition against the use of content validity evidence for tests that focus on traits or constructs and [2] the stated scope of applicability for content-based evidence, which includes tests that focus on knowledge, skills, abilities, and other personal characteristics

Source: Joint Standards (1999/2014)
Description of content validity: "Important validity evidence can be obtained from an analysis of the relationship between the content of a test and the construct it is intended to measure" (p. 14)
Position on personality measures: Approval of a content validity approach for personality measures can be inferred from [1] the absence of an explicit prohibition against the use of content validity evidence for tests that focus on traits or constructs, [2] the explicit description of content validity as pertaining to "the relationship between the content of a test and the construct it is intended to measure" (p. 14), and [3] the broad definition of the term construct (see p. 217), which makes it clear that personality variables would fall under the definition of a construct

Source: General review of academic literature
Description of content validity: Most, if not all, descriptions of content validity found in the literature embody the core notion of documenting the linkage between the content of a test and a particular domain that represents the target of measurement and/or purpose of testing (Haynes et al., 1995)
Position on personality measures: The sources that specifically discuss this issue collectively indicate mixed opinions; while some authors have expressed reticence toward the use of content-based evidence for measures of personality (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976), others consider this restriction to be problematic (e.g., Landy, 1986; McDaniel et al., 2011) or view content validity as particularly relevant to personality testing (e.g., Murphy et al., 2009; O'Neill et al., 2009)

Note. SIOP = Society for Industrial and Organizational Psychology.
Interestingly, in an apparent application of this logic, the
guidelines permit content
validation for selection procedures focusing on KSAs (as noted
in the preceding para-
graph). In such cases, the inferential leap necessary to link
KSAs to job performance
is ostensibly greater than if the selection procedures were to
focus directly on work
behaviors, which explains why the guidelines include additional
requirements related
to the content validation of tests focusing on these worker
attributes (see Sections
14.C.1 and 14.C.4). Presumably, these additional requirements
serve to bridge the
larger inferential leap made when the test does not directly
focus on work behaviors.
Thus, the Uniform Guidelines do not limit the use of content
validity to actual samples
of work behavior, but additional evidence is needed to help
bridge the larger inferen-
tial leap made when selection tests target worker attributes (i.e.,
KSAs)—yet this same
reasoning is not extended to what the guidelines characterize as
traits or constructs.
The SIOP Principles (2003)
The SIOP Principles (2003) embody the formal pronouncements
of the Society for
Industrial and Organizational Psychology pertaining to
appropriate validation and use
of employee selection procedures. For content validation, the
principles state that,
“Evidence for validity based on content typically consists of a
demonstration of a
strong linkage between the content of the selection procedure
and important work
behaviors, activities, worker requirements, or outcomes on the
job” (p. 21). Like the
Uniform Guidelines (1978), the SIOP Principles stress the
importance of capturing a
representative sample of the target of measurement and further
establishing a close
correspondence between the selection procedure and the work
domain. The principles
also acknowledge that content validity evidence can be either
“logical or empirical”
(p. 6), highlighting the role of job analysis and expert judgment
in generating content-
based evidence. However, unlike the Uniform Guidelines, the
SIOP Principles do not
make a substantive distinction between work tasks/behaviors
and worker require-
ments/attributes in relation to content-based evidence but
rather, collectively, consider
selection procedures that focus on “work behaviors, activities,
and/or worker KSAOs”
(p. 21). Importantly, the addition of “O” to the KSA acronym
represents “other per-
sonal characteristics,” which are generally understood to
include “interests, prefer-
ences, temperament, and personality characteristics [emphasis
added]” (Brannick
et al., 2007, p. 62). Accordingly, although not explicitly stated,
the use of content
validity evidence as a means of validating personality test
inferences for employee
selection purposes appears to be consistent with the SIOP
Principles.
The Joint Standards (1999/2014)
The Joint Standards (1999/2014) are a set of guidelines for test
development and valida-
tion in the areas of psychological and educational testing, which
were developed by a
joint committee including representatives from the American
Educational Research
Association, the American Psychological Association, and the
National Council of
Measurement in Education. According to the standards, content
validity is examined by
specifying the content domain to be measured and then
conducting “logical or empirical
analyses of the adequacy with which the test content represents
the content domain and of
the relevance of the content domain to the proposed
interpretation of test scores” (p. 14).
In other words, content validity is described as pertaining to
“the relationship between the
content of a test and the construct it is intended to measure” (p.
14), where “construct” is
defined as “The concept or characteristic that a test is designed
to measure” (p. 217).
Because personality traits are easily understood as constructs,
the Joint Standards suggest
that personality test inferences may be subject to content-based
validation.
Academic Literature
It is also informative to examine the academic literature
regarding validation and per-
sonality testing. In doing so, several general observations can
be made. First, most if
not all definitions of content validity share the core notion of
documenting the linkage
between the content of a test and a particular domain that
represents the target of mea-
surement and/or purpose of testing (e.g., Aguinis et al., 2001;
Goldstein et al., 1993;
Haynes et al., 1995; Sireci, 1998). Second, as noted previously,
prominent writings on
personality testing in the workplace (e.g., Morgeson et al.,
2007b; O’Neill et al., 2013;
Ones et al., 2007; Rothstein & Goffin, 2006; Tett &
Christiansen, 2007) have tended
to ignore the applicability of content validation to personality
measures. Third, the
sources that do specifically address this issue present mixed
opinions. While some
have expressed reticence about content-based evidence for
measures of personality
(e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976),
others consider this
restriction to be problematic (e.g., Landy, 1986; McDaniel et
al., 2011) or view content
validity as particularly relevant to personality testing (e.g.,
Murphy et al., 2009;
O’Neill et al., 2009). Thus, as with the technical standards and
guidelines discussed
above, those turning to the academic literature for guidance
might similarly come
away uncertain regarding the use of content validity evidence to
support personality
measures in an employee selection context.
What Are the Bases of Inconsistency?
This section attempts to identify the conceptual issues that form
the bases for disagree-
ment/misunderstanding regarding the use of content validity
evidence for personality
measures. Making these underlying matters explicit will help to
identify some com-
mon ground and the potential for a way forward. Based on the
review of documents
and literature above, the primary areas to be addressed include
(a) vestiges of the trini-
tarian view of validity, (b) the focus of the content match, and
(c) a clear understanding
of the inferences to be substantiated.
Vestiges of the Trinitarian View of Validity
Although it is now well-established that validity should be
characterized in a manner
consistent with the contemporary perspective described above
(Binning & Barrett,
1989; Joint Standards, 1999/2014; Landy, 1986; SIOP
Principles, 2003), the outdated
trinitarian view continues to exert substantial influence. Perhaps
the most prominent
example of this can be found in the Uniform Guidelines (1978),
which clearly reflects
a trinitarian view of validity yet remains an important document
that holds consider-
able weight in contemporary employee selection practice
(McDaniel et al., 2011).
There are at least two important concerns related to this residual
influence of the trini-
tarian perspective.
First, the trinitarian view of validity suggests that constructs
represent a separate
category of measurement that is somehow distinct from other
types of measurement
efforts. This can most readily be seen in the simple fact that
there is a separate label for
construct validity, as compared with other categories of
validity. As a result, the deter-
mination of which “type” of validity to focus on rests on
whether or not a construct—as
opposed to some other type of attribute—is the target of
measurement (Landy, 1986).
This same logic is embodied in the Uniform Guidelines (1978),
where it is indicated
that certain validation strategies are appropriate when
measuring traits or constructs
(e.g., construct validity studies), whereas other strategies are
not (e.g., content validity
studies). Importantly, this perspective is in direct opposition to
contemporary thinking
regarding validity, which suggests that all measurement efforts
ultimately implicate
constructs (Joint Standards, 1999/2014). This notion is
illuminated in a telling example
provided by Landy (1986), where he contrasts a hypothetical
typing ability test with a
measure of reasoning ability. Of particular relevance is the idea
that the typing ability
test more readily implicates observable behaviors and, thus,
could be subject to content
validation according to the Uniform Guidelines. Conversely, the
reasoning ability test
is more easily described as trait- or construct-focused, in turn,
precluding the use of
content validation according to the Uniform Guidelines.
Landy’s point was that both of
these tests actually focus on constructs (i.e., typing ability,
reasoning ability), neither of
which is directly observable. Rather, in both cases, one must
infer the level of the con-
struct possessed by an individual via the administration of a
test.
The above example is intended to highlight that all variables
measured by psycho-
logical tests can be characterized as constructs. As such, the
notion that some variables
are constructs while others are not is problematic at best and
also in direct opposition
to prevailing conceptions of test validation. However, this is not
meant to imply that
all constructs are the same. In the example above, it is clear that
the typing ability
construct might more easily be linked to job performance, as
contrasted with the rea-
soning ability construct, given that typing ability has a more
direct and obvious behav-
ioral manifestation (i.e., typing). Conversely, one might say that
a greater inferential
leap is required when validating a measure of reasoning ability,
as reasoning ability is
relatively farther removed from actual job behavior (although
not irreconcilably far
removed). This critical distinction will be discussed further in
the sections to follow.
For now, the important conclusion is that all psychological
variables that might be
measured in an employee selection context, including
personality, are no more or less constructs than any other.
A second concern related to the residual influence of the
trinitarian perspective is
that there appears to be a de facto preference given to criterion-
related validity
evidence when considering employee selection procedures
(Binning & Barrett, 1989).
Nevertheless, it is critical to acknowledge that there are various
circumstances where
criterion-related validity evidence is far from optimal. It has
been suggested that the
minimum sample size for a criterion-related study be no fewer
than 250 study partici-
pants (Biddle, 2011). Yet, McDaniel et al. (2011) assert that
most employers simply do
not have enough employees or applicants to conduct such a
study. The outcome of a
criterion-related study is also highly contingent on the quality
of the criterion measure.
In this regard, it is useful to note that the appropriate
development and validation of
criterion measures is often given far less attention than the
predictor side of the equa-
tion (Binning & Barrett, 1989). Furthermore, in the context of a
high-stakes testing
situation, conducting a criterion-related study might represent a
test security risk, as
sensitive test content will be presented to study participants
who might subsequently
share confidential test information. None of the above is meant
to suggest that crite-
rion-related validity evidence is always bad. However, it is
similarly important to real-
ize that there are various situations in which evidence based on
test content might well
be the preferred approach toward validation.
Taken together, once one rejects the notion that personality measures somehow represent a different category of construct measurement than other "non-construct" variables—and further acknowledges that criterion-related validity evidence (although extremely useful in many situations) should not be treated by default as the strategy of choice—then the idea of gathering content validity evidence for
personality measures
becomes much more relevant. In the words of Binning and
Barrett (1989), “One could
reasonably argue that content-related and construct-related
evidence, when based on
sound professional judgment about appropriate test use, are
often superior to criterion-
related evidence” (p. 484).
Focus of the Content Match
Another issue that may be fueling misunderstanding regarding
the use of content
validity evidence has to do with the focus of the content match.
This issue becomes
apparent when one carefully examines the various definitions
for content validity dis-
cussed previously and shown in Table 1. For example, the
Uniform Guidelines (1978)
indicate that content validity is applicable when it can be shown
that “the content of
the selection procedure is representative of important aspects of
performance on the
job” (Section 5.B). This suggests that the primary focus of
content match should be
between the content of the selection procedure and
representative elements of job
performance, such as work tasks or behaviors. In contrast, the
Joint Standards
(1999/2014) indicate that content validity is applicable
whenever it can be shown that
there is an overlap between the content of a test and the
construct that is the focus of
measurement, which for employee selection is often some
worker attribute or require-
ment. Hence, there appears to be a duality of focus when it
comes to content validity
evidence, where some would argue such evidence is derived
from documenting over-
lap with the job performance domain of application (e.g.,
specific tasks), whereas
others would argue that content-based evidence is based on
content overlap with the
construct domain the test is intended to measure (e.g., some
personal attribute).
Clarity regarding this issue can be found by considering the
distinction between
work samples versus signs (Binning & Barrett, 1989;
Wernimont & Campbell, 1968).
In the context of employee selection procedures, samples refer
to those assessments
that directly implicate or elicit behaviors relevant to
performance on the job, such as
the typing test from Landy’s (1986) example discussed above,
where the test elicits
behavior (i.e., typing) that can be seen as interchangeable with
relevant on-the-job
behavior. In contrast, employee selection measures
characterized as signs refer to
those assessments that do not directly target behaviors from the
performance domain,
but nonetheless attempt to assess attributes or capabilities that
are thought to be rele-
vant for job performance. An illustration of this would be the
reasoning test from
Landy’s example, where the actual behaviors elicited by the
assessment (e.g., reading
logic problems, completing multiple choice questions) may be
less obviously relevant
to primary job functions, but the test nevertheless measures an
attribute that is undoubt-
edly critical for effective performance in many occupations
(i.e., the capacity for
effective reasoning).
In light of this distinction, employee selection measures that
fall on the work-sam-
ple end of the spectrum are likely to focus on work tasks or
behaviors, while those that
fall on the sign end of the spectrum are likely to focus on
worker attributes and require-
ments. By extension, it appears that the Uniform Guidelines
(1978) primarily permit
the use of content validity evidence in relation to work samples
that target important
tasks and behaviors, whereas the SIOP Principles (2003) and
Joint Standards
(1999/2014) would additionally permit the use of content
validity evidence for sign-
based measures that focus on job-relevant personal capacities
and worker require-
ments. Importantly, although both work samples and signs can
be used as predictors
for employee selection, these two meaningfully differ in terms
of whether the behav-
iors implicated and/or elicited by the test are isomorphic with,
or functionally similar
to, the performance construct domain (Binning & Barrett, 1989;
Binning & LeBreton,
2009). More specifically, work samples tend to implicate or
elicit behaviors that
exhibit a high degree of isomorphism with performance
behaviors, whereas this tends
to be less true (although not necessarily untrue) for sign-based
measures.
The above discussion helps to clarify the divergent perspectives
concerning the focus
of the content match. First, work samples tend to be constructed
with the intention of
sampling from the performance domain, as exhibited in the
increased isomorphism with
performance behaviors (Binning & Barrett, 1989). As such, the
appropriate focus of
content validity in the case of work-sample-based measures
should be the degree of
match between the content of the measure and the job
performance domain (Binning &
LeBreton, 2009). Indeed, this appears to be the central logic
espoused in the Uniform
Guidelines (1978), which primarily permit the use of content
validity evidence in rela-
tion to work samples that target important tasks and behaviors.
Second, sign-based mea-
sures tend to be constructed with the intention of sampling from
a separate construct
domain that is technically distinct from—yet conceptually
relevant to—the job perfor-
mance domain (Binning & Barrett, 1989). Therefore, the
appropriate focus of content
validity in the case of sign-based measures should be the degree
of match between the
content of the selection measure and whatever distinct construct
domain represents the
target of measurement (Binning & LeBreton, 2009). The logic
behind this latter asser-
tion appears consistent with the SIOP Principles (2003) and
Joint Standards (1999/2014),
which additionally permit the use of content validity evidence
for sign-based measures
that focus on job-relevant worker requirements and attributes.
Understanding the Inferences to Be Substantiated
As described previously, the collection of validity evidence is
ultimately aimed at
substantiating the inferences made from test scores. However, a
careful consideration
of the validation process indicates that there are several
potential inferences that might
be of relevance, especially when the intended purpose of testing
is taken into account
(Binning & Barrett, 1989). For example, one might aim to
substantiate the inference
that test scores accurately reflect varying levels of the
underlying construct being mea-
sured. While this is, indeed, a crucial inference with regard to
validation, it does not
necessarily capture the intended use of the test. As such, it is
additionally relevant to
substantiate the inference that test scores (and corresponding
levels of the target con-
struct) have relevance with regard to the purpose of testing. In
the case of employee
selection tests, this purpose typically informs inferences about
job performance. This,
in turn, suggests the importance of understanding the
performance domain, which may
additionally implicate other inferences, such as substantiating
the degree to which
operational measures of job performance accurately reflect an
underlying performance
construct. Given these various potential inferences, it becomes
important to clearly
understand the specific inferences that must be addressed as
part of the validation
process. Toward this end, a better understanding of such
inferences can be achieved by
visually depicting the relevant inferences that are necessary for
linking the test in
question to the construct it is intended to measure, in addition
to the purpose of testing.
This can be seen in Figure 1a, which is based on the seminal
work of Binning and
Barrett (1989; also see Arthur & Villado, 2008; Binning &
LeBreton, 2009; Guion,
2004; Joint Standards, 1999/2014).
The framework depicted in Figure 1a allows one to clearly
discuss the specific infer-
ences involved in various approaches to validation. First and
foremost, given that the
ultimate purpose of most (if not all) employee selection
measures is to understand job
performance, it has been argued that the inference linking the
predictor measure to the
job performance construct domain represents the most critical
inference in the employee
selection context (Binning & Barrett, 1989). This inference is
depicted in Figure 1a as
Inference 1. To the extent that the predictor measure is work-
sample-based, and, thus,
exhibits a high degree of isomorphism with the job performance
domain, this primary
link can be directly substantiated by documenting the degree of
overlap (or content
match) between the predictor assessment and job performance
(Binning & LeBreton,
2009). This is consistent with what the Uniform Guidelines
(1978) describe as a content
validity study for a selection measure that focuses on work
tasks or behaviors, and is
further consistent with what the SIOP Principles (2003) and
Joint Standards (1999/2014)
would consider evidence for validity based on test content.

Figure 1. Inferences in the validation process: (a) common framework for depicting inferences in the validation process; (b) modified framework for depicting inferences in the validation process.

However, to the degree that the selection measure in question is sign-based, and, thus, departs from isomorphism with job performance, then the direct substantiation of Inference
1 becomes more tenu-
ous, in turn, requiring indirect substantiation via the pairing of
additional inferences—
which nonetheless represents an appropriate means of validation
(Binning & Barrett,
1989; Binning & LeBreton, 2009; Joint Standards, 1999/2014).
A second viable approach for validation would be to
collectively substantiate
Inference 2 and Inference 3, as shown in Figure 1a. Inference
2 represents the degree
to which the operational predictor measure reflects the
underlying construct it is pur-
ported to measure, while Inference 3 represents the degree to
which the underlying
predictor construct is relevant to the job performance domain.
Using the nomenclature
of the traditional trinitarian view of validity, this approach
might be labeled as con-
struct validity (Binning & Barrett, 1989; Joint Standards,
1999/2014), given that refer-
ence is made to underlying constructs. However, in light of the
fact that contemporary
conceptualizations of validity eschew the notion of a distinct
form of “construct valid-
ity,” Binning and LeBreton (2009) argue that this approach is
better characterized as
content-based evidence. Specifically, although the indirect
approach discussed here is
distinct from the direct substantiation of Inference 1 described
above, both approaches
rely heavily on comparing predictor content to some underlying
construct domain.
The difference lies in the fact that the substantiation of
Inference 1 compares the pre-
dictor content with the job performance construct domain, while
the substantiation of
Inference 2 compares the predictor content with the underlying
predictor construct
domain. Furthermore, to account for the less direct route of
substantiation in the latter
approach, additional evidence is required to bridge the larger
inferential leap to the job
performance domain, which is reflected in Inference 3.
Importantly, the collective
examination of Inference 2 and Inference 3 represents an
appropriate means for deriv-
ing content validity evidence for sign-based selection measures
that exhibit less iso-
morphism with the job performance domain (Binning &
LeBreton, 2009). Indeed, this
logic is explicitly included in the Joint Standards (1999/2014;
see pp. 172–173) and
also implicitly described in the Uniform Guidelines, where it is
stated that,
For any selection procedure measuring a knowledge, skill, or
ability [i.e., sign-based
measure] the user should show that (a) the selection procedure
measures and is a
representative sample of that knowledge, skill, or ability [i.e.,
Inference 2]; and (b) that
knowledge, skill, or ability is used in and is a necessary
prerequisite to performance of
critical or important work behavior(s) [i.e., Inference 3].
(Section 14.C.4)
Another approach to validation suggested by Figure 1a would be
to collectively
substantiate Inference 4 and Inference 5. Here, Inference 4
represents the empirical
relationship between the predictor measure and a job
performance/criterion measure,
while Inference 5 represents the degree to which the
performance/criterion measure
reflects the underlying performance construct domain it is
intended to capture. This
approach is analogous to what the Uniform Guidelines (1978)
would characterize as a
criterion-related validity study, and is further consistent with
what the SIOP Principles
(2003) and Joint Standards (1999/2014) would describe as
evidence for validity based
on relations to other variables. In practice, however, criterion-
related validity studies
often focus primarily and/or exclusively on Inference 4, at the
exclusion of Inference
5. Unfortunately, this means that validation efforts of this
nature are typically com-
pleted with only cursory reference to underlying theory or
consideration for the impli-
cated construct domains. This again highlights the fact that
criterion-related validation
should not necessarily or always be seen as the optimal
strategy, especially from
the perspective of generating a theory-based and scientifically
grounded (as opposed
to purely empirical) understanding of tests and measures used
for employee selection.
Conversely, when both Inference 4 and Inference 5 are given
due consideration, one
can see that this approach mirrors the approach pertaining to
Inferences 2 and 3, sug-
gesting two different but comparably informative approaches to
understanding the
relevance of the focal predictor measure to the underlying job
performance domain.
Returning to the topic of content validity, the above discussion
suggests two poten-
tially viable approaches for examining content-based evidence
pertaining to an
employee selection test. First, to the degree that the measure is
work-sample-based,
and, thus, exhibits a high degree of isomorphism with the job
performance domain,
then the focus should be on Inference 1. This can be referred to
as validity evidence
based on test content for work-sample-based measures. Second,
to the extent that the
measure is sign-based, and, thus, departs from isomorphism
with job performance,
then the focus should be collectively on Inference 2 and
Inference 3. This can be
referred to as validity evidence based on test content for sign-
based measures. Figure
1b presents a modified framework that accounts for these
divergent approaches to
validation. Importantly, this framework is consistent with a
contemporary conceptual-
ization of validity that places primary emphasis on inferences in
the validation process
and also explicitly acknowledges the purpose of testing. At the
same time, this frame-
work also explicitly represents the degree of inferential leap
necessary for linking the
predictor measure with the performance construct domain—an
issue that is of para-
mount importance in the Uniform Guidelines (1978).
Specifically, as sign-based mea-
sures exhibit lower isomorphism with the job performance
domain as compared with
work-sample-based measures, additional evidence is needed to
bridge the larger infer-
ential leap, which is manifested in the requirement to
substantiate two inferences (e.g.,
Inferences 2 and 3), as opposed to just one (i.e., Inference 1).
As aptly summarized by
Binning and LeBreton (2009), “content validation involves
either (a) directly match-
ing predictor content to criterion CDs [construct domains] or (b)
matching predictor
content to psychological CDs which are in turn related to
criterion CDs (i.e., delineat-
ing psychological traits believed to influence job behavior)” (p.
489).
How to Gather Content Evidence for Personality
Measures?
Building on the preceding sections, our view is that evidence
based on test content can
and should be used as a means of validating personality test
inferences for employee
selection purposes. This practice is consistent with
contemporary conceptualizations of
validity, as embodied in the SIOP Principles (2003) and Joint
Standards (1999/2014),
which both place primary emphasis on inferences in the
validation process and further
view content-based evidence as one of several viable
approaches to validating such
inferences. Furthermore, the explicit prohibition against this
practice in the Uniform
Guidelines (1978) appears to be largely based on outdated
notions of validity and the
fallacious idea that tests focusing on constructs somehow
represent a different category
of measurement than other “non-construct” variables. Thus,
content validity evidence
should be treated as an appropriate means of validating
personality test inferences.
Accordingly, we can now make informed recommendations
regarding how best to
collect content validity evidence for personality measures.
There are at least two pri-
mary conduits through which one might maximize the content-
relevance of personal-
ity tests and generate appropriate content-based evidence,
including (a) maximizing
isomorphism with the job performance domain during initial test
development, and (b)
substantiating the appropriate inferences via expert judgment
after test development,
but prior to operational use.
Maximizing Isomorphism During Test Development
One way to increase the content-relevance of personality
measures used for employee
selection is to draw directly from the job performance domain
while developing the
test. The difference between work-sample-based measures (that
sample primarily
from the job performance domain) and sign-based measures
(that sample primarily
from a distinct yet related construct domain) is a matter of
degree, as opposed to a
strict categorical distinction. In other words, psychological tests
can move along this
continuum by sampling predominantly from one domain or the
other, in addition to a
combination of both (Spengler et al., 2009). Of relevance,
sampling from the perfor-
mance domain has the effect of increasing the degree of
isomorphism between the test
content and job performance, in turn, moving the measure
toward the work-sample
end of the spectrum.
By way of illustration, traditional measures of personality such
as those that might
be found in the International Personality Item Pool
(http://ipip.ori.org/; also see
Goldberg et al., 2006) sample primarily from the personality
construct domain of
interest. As such, an example item for the trait of extraversion
might include “I am the
life of the party.” Conversely, personality measures specifically
designed to be rele-
vant in the workplace (e.g., Ellingson et al., 2013) sample both
from the personality
construct domain as well as the intended domain of application
(i.e., work). Here, an
example item for extraversion would be “I involve my coworker
in what I am doing.”
Notably, the personality statement in the latter example is more
obviously and directly
applicable to the job performance domain. Related to this, there
is a growing body of
literature that supports the practice of contextualizing
personality scale content to spe-
cifically reference the domain of work (e.g., Hunthausen et al.,
2003; Lievens et al.,
2008; Schmit et al., 1995; Shaffer & Postlethwaite, 2012; also
see Ones & Viswesvaran,
2001). While this research on contextualization does not
primarily focus on the issue
of content validity, the practice of modifying personality scale
content to explicitly
reference the work context nonetheless has the consequence of
extending content-
relevance to the job performance domain. Indeed, it has been
suggested that the use of
custom-developed personality tests based on work-
contextualization “is extremely
valuable because it may open up content validation as a
potential validation strategy”
(Morgeson et al., 2007a, p. 1043).
In summary, the content-relevance of personality measures used
for employee
selection can be improved by sampling both from the
personality construct domain
and the job performance domain. This practice is ultimately
manifested in the creation
and use of personality scale items that directly reference and/or
implicate the perfor-
mance domain in question. Practically, this can be accomplished
by first identifying
critical features of performance via job analysis efforts and
subsequently creating
personality-based items that explicitly reflect the identified
performance elements. If
this is done, the personality construct of interest will ultimately
be operationalized in
terms of relevant behaviors and experiences that collectively
comprise important
aspects of performance on-the-job. Consequently, the more the
personality items
directly implicate or reference job behaviors (e.g., “I show up to
work on time”)—in
addition to cognitive and affective performance-related
experiences (e.g., “I get ner-
vous when I talk to clients”)—the lower the inferential leap
necessary for linking the
content of the test directly to the job performance domain.
Substantiation of Appropriate Inferences via Expert Judgment
The considerations discussed in the preceding section are only
applicable if a new test
is being created or if the opportunity to modify an existing
general-focused personality
test is available. The current section focuses on
generating content-based evi-
dence for existing or unmodifiable personality measures to be
used for employee
selection. For this, the primary mechanism of generating such
evidence involves elicit-
ing expert judgment regarding content-relevance. More
specifically, when the person-
ality measure in question exhibits a high degree of isomorphism
with the performance
domain of interest (e.g., contextualized personality scales),
content validity evidence
can be generated by eliciting expert judgment regarding the
direct overlap (or content
match) between the personality measure content and the job
performance domain (i.e.,
Inference 1 from Figure 1). However, to the extent that this is
not the case, as with
noncontextualized or general-focused personality measures,
content-based validation
would proceed via expert judgment regarding Inference 2 and
Inference 3. Example
scales that might be used to substantiate these various
inferences via expert judgment
are shown in Figure 2. These scales can be used to assist in the
collection of content-
based validity evidence.
The first approach involves substantiating the direct link
between the predictor
measure and the job performance construct domain (i.e.,
Inference 1). In other words,
expert judges are asked to indicate the degree to which the
specific items that comprise
the test are directly relevant to performance on-the-job. A
common method for quanti-
fying such ratings is the content validity ratio (Lawshe, 1975),
which—as originally
conceptualized—asks subject-matter experts (SMEs) whether a
particular skill or
knowledge area measured by test items is “Essential,” “Useful
but not essential,” or
“Not necessary” for performance of the job in question.
Although the content validity
ratio was originally intended for tests that focus on knowledge
or skills, with slight
modifications, it can be applied to measures of personality as
well. Examples of this
can be seen in Figure 2. Other scales may also be created to
serve the same purpose as
long as they adequately capture the degree to which the test
content is relevant to the
job performance domain of interest.
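For readers who wish to compute this index, Lawshe's (1975) ratio for a single item can be written as

CVR = \frac{n_e - N/2}{N/2},

where n_e denotes the number of SMEs rating the item "Essential" (or, in the modified application described here, rating the personality statement as essential for performance or for measuring the attribute) and N denotes the total number of SMEs on the panel; the symbols n_e and N are introduced here for exposition and do not appear in the sources cited above. The ratio ranges from -1 to +1, equals zero when exactly half of the panel endorses "Essential," and is positive when more than half do. For example, if eight of ten SMEs rate an item "Essential," CVR = (8 - 5)/5 = .60.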
Figure 2. Example scales for substantiating relevant inferences
related to content validation.
The second approach involves substantiating the link between
the test in question
and the underlying construct that it is purported to measure
(i.e., Inference 2), in addi-
tion to the link between the predictor construct and the
underlying job performance
domain (i.e., Inference 3). Aguinis et al.’s (2001) discussion of
the content validation
ratio suggests that it can also be modified to serve the purpose
of substantiating
Inference 2, as was the case with Inference 1 above. As noted
by Aguinis et al., expert
judges can be asked to “rate whether each item is essential,
useful but not essential, or
not necessary for measuring the attribute [emphasis added]” (p.
38). Thus, in accor-
dance with the above discussion, the primary distinction
between the methods for
substantiating Inference 1 and Inference 2 is that the former
focuses on the job perfor-
mance domain whereas the latter focuses on the predictor
construct domain. Again,
other scales may be created to serve the same purpose as long as
they adequately
capture the degree to which the test content is reflective of the
underlying predictor
construct (see Figure 2).
In terms of Inference 3, Arthur and Villado (2008) indicate that
this inference “is
established via job analysis processes that are intended to
identify the predictor con-
structs deemed requisite for the successful performance of the
specified job (or perfor-
mance) behaviors in question” (p. 436). In this regard,
personality-oriented job analysis
efforts can be used to identify and substantiate the relevance of
particular traits for the
job under investigation (O’Neill et al., 2013; Raymark et al.,
1997; Tett & Burnett,
2003; Tett & Christiansen, 2007). For example, as shown in
Figure 2, Raymark et al.
(1997) adopted the following stem for their Personality-Related
Position Requirements
Form: “Effective performance in this position requires the
person to . . .” (p. 724).
Taken together, expert judgment regarding the above inferences
constitutes evidence
for validity based on test content (Binning & LeBreton, 2009).
Discussion
There are conflicting recommendations regarding the use of
content validity evidence
to support personality test inferences for employee selection.
Unfortunately, inconsis-
tencies of this nature may be the inevitable result of various
different constituencies
(e.g., legal, professional, scientific) jointly vying to determine
standards and guide-
lines within the collective enterprise of test use and validation
(see Binning & Barrett,
1989; Landy, 1986; McDaniel et al., 2011). As such, it may be
an unrealistic ideal to
achieve a perfect resolution that fully addresses any and all
potential inconsistencies.
Nonetheless, it is our hope that the discussion presented above
has ameliorated the
ambiguity to some meaningful degree. In particular, we believe
that the recommenda-
tions above concerning content validity evidence and
personality testing are—to the
greatest extent possible—consistent with the spirit and intention
of prevailing stan-
dards and guidelines, even if not with certain technical
proscriptions (Uniform
Guidelines, 1978). That being said, there remain a few caveats
to be discussed below.
First, aside from the explicit prohibition against content validity
for traits and con-
structs, another requirement in the Uniform Guidelines (1978) is
that tests validated
via a content-based strategy should focus on attributes that are
operationally defined
in terms of observable job behaviors (Section 14.C.4). This is
potentially problematic
for measures of personality, as personality constructs implicate
not only observable
behavioral manifestations (e.g., “I show up to work on time”),
but also various cogni-
tive and affective experiences (e.g., “I get nervous when I talk
to clients”). As a poten-
tial solution, when considering the cognitive and affective
experiences implicated by
a particular construct, it is helpful to think about the work-
relevant behavioral conse-
quences of such experiences (as informed by job analysis
efforts), and correspond-
ingly develop items to reflect such behaviors. Despite this
potential solution, we view
the strict requirement to operationalize all measures that are
subject to content valida-
tion in terms of observable behaviors as problematic.
Specifically, in an information-
based economy, many critical features of job performance may
not be directly or
obviously observable. As a result, measures that are limited to
observable behaviors
may suffer from construct deficiency to the extent that non-
observable experiences
represent important aspects of the construct domain.
Furthermore, the recommenda-
tions above provide viable paths for content validation
regardless of whether the test
in question exhibits complete isomorphism with performance in
terms of observable
job behaviors. To the extent that it does, the substantiation of
Inference 1 represents an
appropriate focus. Conversely, to the extent that it does not, the
collective substantia-
tion of Inference 2 and Inference 3 becomes an appropriate
alternative (Binning &
LeBreton, 2009). Regardless, those concerned about strict
adherence to the Uniform
Guidelines (1978) should consider limiting the personality items
used to those that
directly reference observable behavior.
Second, considering the critical attention given to the Uniform
Guidelines (1978),
it might come across as though our goal is to somehow vilify
these guidelines. This is
certainly not the case. The guidelines were created with
admirable intentions regard-
ing standardization of validation efforts and the prevention of
discrimination in
employment practices. These are extremely important concerns,
and we support efforts
that strive to accomplish these noble ideals. At the same time, it
is important that
efforts of this nature are concordant with contemporary
scientific understanding per-
taining to the subject matter. Unfortunately, as described by
McDaniel et al. (2011),
the Uniform Guidelines have “not been revised in over 3
decades [and as a result are]
substantially inconsistent with scientific knowledge and
professional guidelines and
practice” (p. 494). As such, our intention is not to disparage
these guidelines, but rather
to ensure that validation practices are consistent with
contemporary conceptualiza-
tions of validity. We also believe that the value of the above
discussion extends far
beyond the Uniform Guidelines and employee selection in the
United States. As noted
in the introduction, it is important to carefully consider
appropriate validation strate-
gies so that accurate inferences are made about all test-takers,
regardless of when,
where, or why testing and validation efforts occur. Accordingly,
the discussion pre-
sented herein is likely to be of relevance for content-based
validation efforts in all
areas that utilize psychological testing, including, for example,
educational testing,
clinical practice, and international employee selection efforts.
Third, not everyone may agree with our choice of terminology
in relation to the two
paths for content-based validation described above.
Specifically, we opted to label the
substantiation of Inference 1 as validity evidence based on test
content for work-sam-
ple-based measures and the collective substantiation of
Inferences 2 and 3 as validity
evidence based on test content for sign-based measures (see
Figure 1). In considering
our choice for terminology, we explored three possible avenues.
First, one possibility
was to adopt terminology that is consistent with the historical
trinitarian view of valid-
ity, which places primary emphasis on three “types” of
validity—content, criterion, and
construct. From this perspective, Inference 1 might be labeled
as content validity and
Inferences 2 and 3 as construct validity. However, as is clearly
outlined above, this trini-
tarian view is considered problematic and inconsistent with
contemporary thinking
regarding validation. Second, another possibility is to create a
new label to replace
construct validity for Inferences 2 and 3, given that a primary
criticism of the historical
trinitarian view is the notion of a separate category of construct
validity. While we
understand the appeal of this second approach, we also believe
that creating a new label
might have the unintended consequence of adding further
confusion to the literature,
especially considering that there is already much potential for
confusion in the arena of
validation terminology. This brings us to our third and favored
approach, which is to
adopt terminology that is consistent with a contemporary
perspective on validity,
wherein validity is a unitary concept with various sources of
supporting evidence. From
this perspective, the labeling of different inferential pathways
should be based on a
thoughtful consideration of which form of validity evidence
best aligns with the valida-
tion activities implicated by the paths of interest. Regarding
validity evidence based on
test content, this broadly refers to an analysis of whether the
content of a measure ade-
quately reflects (or samples from) a relevant underlying
construct domain, which is
precisely what is involved in both Inference 1 and the
combination of Inferences 2/3.
Finally, and perhaps most critically, we are by no means
arguing that a content-
based strategy should necessarily be the preferred or only
method of validation. The
important point is that content-based evidence is not inherently
more or less appropri-
ate than other sources of validity evidence. Rather, evidence
based on test content
represents one of several potential sources (Joint Standards,
1999/2014), and the par-
ticular circumstances of the validation effort should be carefully
considered before
determining which source(s) of evidence are most appropriate.
Validity generaliza-
tion, validity transport, and synthetic validity also represent
viable options.
Furthermore, validation efforts need not (and should not) be
limited to just one form
of evidence. For example, content-based evidence can be
combined with evidence
pertaining to an empirical relationship with a criterion measure,
which, in turn, would
result in a stronger validity argument. Or, to the extent that
faking or response distor-
tion is a concern (Morgeson et al., 2007b; Ones et al., 2007),
evidence pertaining to
response processes might be gathered as well. In a similar vein,
although we discuss
two potential paths for content validation (i.e., Inference 1 vs.
Inferences 2 and 3; see
Figure 1), this does not preclude efforts at validating all three
inferences, which again
would result in a stronger validity argument. Ultimately, “the
process of validation
involves accumulating relevant evidence to provide a sound
scientific basis for the
proposed score interpretations” (Joint Standards, 1999/2014, p.
11). To the extent
feasible, the more evidence the better.
Authors’ Note
An earlier version of this manuscript was presented as a poster
session at the 31st Annual
Conference of the Society for Industrial and Organizational
Psychology in Anaheim, California.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with
respect to the research, authorship,
and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial
support for the research, authorship,
and/or publication of this article: This work was, in part,
supported by a Faculty Development
Summer Fellowship Grant awarded by The University of Tulsa
to the first author.
ORCID iD
David M. Fisher https://orcid.org/0000-0002-7810-3494
References
Aguinis, H., Henle, C. A., & Ostroff, C. (2001). Measurement in
work and organizational psy-
chology. In N. Anderson, D. S. Ones, H. K. Sinangil, & C.
Viswesvaran (Eds.), Handbook
of industrial, work and organizational psychology (Vol. 1, pp.
27–50). SAGE.
American Educational Research Association, American
Psychological Association, &
National Council on Measurement in Education. (2014).
Standards for Educational and
Psychological Testing. (Original work published 1999)
Arthur, W., Jr., & Villado, A. J. (2008). The importance of
distinguishing between constructs
and methods when comparing predictors in personnel selection
research and practice.
Journal of Applied Psychology, 93, 435–442.
https://doi.org/10.1037/0021-9010.93.2.435
Biddle, D. A. (2011). Adverse impact and test validation: A
practitioner’s handbook (3rd ed.).
Infinity Publishing.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel
decisions: A conceptual analysis
of the inferential and evidential bases. Journal of Applied
Psychology, 74, 478–494. https://
doi.org/10.1037/0021-9010.74.3.478
Binning, J. F., & LeBreton, J. M. (2009). Coherent
conceptualization is useful for many things,
and understanding validity is one of them. Industrial and
Organizational Psychology, 2,
486–492. https://doi.org/10.1111/j.1754-9434.2009.01178.x
Brannick, M. T., Levine, E. L., & Morgeson, F. P. (2007). Job
and work analysis: Methods,
research, and applications for human resource management (2nd
ed.). SAGE.
Ellingson, J. E., Heggestad, E. D., & Myers, H. (2013). The
workplace IPIP: A contextualized
measure of personality [Unpublished manuscript].
Equal Employment Opportunity Commission, Civil Service
Commission, Department of Labor,
& Department of Justice. (1978). Uniform guidelines on
employee selection procedures.
Federal Register, 43, 38290–38315.
Furr, R. M., & Bacharach, V. R. (2014). Psychometrics: An
introduction (2nd ed.). SAGE.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R.,
Ashton, M. C., Cloninger, C. R., &
Gough, H. G. (2006). The international personality item pool
and the future of public-
domain personality measures. Journal of Research in
Personality, 40, 84–96. https://doi.
org/10.1016/j.jrp.2005.08.007
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An
exploration of the job analysis–content
validity process. In N. Schmitt & W. C. Borman (Eds.),
Personnel selection in organiza-
tions (pp. 3–34). Jossey-Bass.
Guion, R. M. (1980). On trinitarian doctrines of validity.
Professional Psychology, 11, 385–
398. https://doi.org/10.1037/0735-7028.11.3.385
Guion, R. M. (2004). Validity and reliability. In S. G.
Rogelberg (Ed.), Handbook of research
methods in industrial and organizational psychology (pp. 57–
76). Blackwell Publishing.
Haynes, S. N., Richard, D., & Kubany, E. S. (1995). Content
validity in psychological assess-
ment: A functional approach to concepts and methods.
Psychological Assessment, 7, 238–
247. https://doi.org/10.1037/1040-3590.7.3.238
Hsu, C. (2004). The testing of America. U.S. News and World
Report, 137, 68–69.
Hunthausen, J. M., Truxillo, D. M., Bauer, T. N., & Hammer, L.
B. (2003). A field study of
frame-of-reference effects on personality test validity. Journal
of Applied Psychology, 88,
545–551. https://doi.org/10.1037/0021-9010.88.3.545
Jeanneret, P. R., & Zedeck, S. (2010). Professional
guidelines/standards. In J. L. Farr & N. T.
Tippins (Eds.), Handbook of employee selection (pp. 593–625).
Routledge.
Landy, F. J. (1986). Stamp collecting versus science: Validation
as hypothesis testing. American
Psychologist, 41, 1183–1192. https://doi.org/10.1037/0003-
066X.41.11.1183
Lawshe, C. H. (1975). A quantitative approach to content
validity. Personnel Psychology, 28,
563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Lawshe, C. H. (1985). Inferences from personnel tests and their
validity. Journal of Applied
Psychology, 70, 237–238. https://doi.org/10.1037/0021-
9010.70.1.237
Lievens, F., De Corte, W., & Schollaert, E. (2008). A closer
look at the frame-of-reference
effect in personality scale scores and validity. Journal of
Applied Psychology, 93(2), 268–
279. https://doi.org/10.1037/0021-9010.93.2.268
McDaniel, M. A., Kepes, S., & Banks, G. C. (2011). The
uniform guidelines are a detriment
to the field of personnel selection. Industrial and Organizational
Psychology, 4, 494–514.
https://doi.org/10.1111/j.1754-9434.2011.01382.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck,
J. R., Murphy, K., & Schmitt,
N. (2007a). Are we getting fooled again? Coming to terms with
limitations in the use of
personality tests for personnel selection. Personnel Psychology,
60, 1029–1047. https://doi.
org/10.1111/j.1744-6570.2007.00100.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck,
J. R., Murphy, K., & Schmitt,
N. (2007b). Reconsidering the use of personality tests in
personnel selection contexts.
Personnel Psychology, 60, 683–729.
https://doi.org/10.1111/j.1744-6570.2007.00089.x
Murphy, K. R., Dzieweczynski, J. L., & Zhang, Y. (2009).
Positive manifold limits the rel-
evance of content-matching strategies for validating selection
test batteries. Journal of
Applied Psychology, 94, 1018–1031.
https://doi.org/10.1037/a0014075
O’Neill, T. A., Goffin, R. D., & Rothstein, M. (2013).
Personality and the need for personality-
oriented work analysis. In N. D. Christiansen & R. P. Tett
(Eds.), Handbook of personality
at work (pp. 226–252). Routledge.
O’Neill, T. A., Goffin, R. D., & Tett, R. P. (2009). Content
validation is fundamental for opti-
mizing the criterion validity of personality tests. Industrial and
Organizational Psychology,
2, 509–513. https://doi.org/10.1111/j.1754-9434.2009.01184.x
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A.
(2007). In support of personality
assessment in organizational settings. Personnel Psychology,
60, 995–1027. https://doi.
org/10.1111/j.1744-6570.2007.00099.x
Ones, D. S., & Viswesvaran, C. (2001). Integrity tests and other
criterion-focused occupational
personality scales (COPS) used in personnel selection.
International Journal of Selection
and Assessment, 9, 31–39. https://doi.org/10.1111/1468-
2389.00161
Raymark, P. H., Schmit, M. J., & Guion, R. M. (1997).
Identifying potentially useful person-
ality constructs for employee selection. Personnel Psychology,
50, 723–736. https://doi.
org/10.1111/j.1744-6570.1997.tb00712.x
Rothstein, M. G., & Goffin, R. D. (2006). The use of
personality measures in personnel selec-
tion: What does current research support? Human Resource
Management Review, 16, 155–
180. https://doi.org/10.1016/j.hrmr.2006.03.004
Schmit, M. J., & Ryan, A. M. (2013). Legal issues in
personality testing. In N. D. Christiansen
& R. P. Tett (Eds.), Handbook of personality at work (pp. 525–
542). Routledge.
Schmit, M. J., Ryan, A. M., Stierwalt, S. L., & Powell, A. B.
(1995). Frame-of-reference effects
on personality scale scores and criterion-related validity.
Journal of Applied Psychology,
80, 607–620. https://doi.org/10.1037/0021-9010.80.5.607
Schmitt, M. (2006). Conceptual, theoretical, and historical
foundations of multimethod assess-
ment. In M. Eid & E. Diener (Eds.), Handbook of multimethod
measurement in psychology
(pp. 9–25). American Psychological Association.
Shaffer, J. A., & Postlethwaite, B. W. (2012). A matter of
context: A meta-analytic investiga-
tion of the relative validity of contextualized and
noncontextualized personality measures.
Personnel Psychology, 65, 445–494.
https://doi.org/10.1111/j.1744-6570.2012.01250.x
Sireci, S. G. (1998). The construct of content validity. Social
Indicators Research, 45, 83–117.
https://doi.org/10.1023/A:1006985528729
Society for Industrial and Organizational Psychology. (2003).
Principles for the validation and
use of personnel selection procedures (4th ed.).
Spengler, M., Gelléri, P., & Schuler, H. (2009). The construct
behind content validity: New
approaches to a better understanding. Industrial and
Organizational Psychology, 2, 504–
508. https://doi.org/10.1111/j.1754-9434.2009.01183.x
Tan, J. A. (2009). Babies, bathwater, and validity: Content
validity is useful in the validation
process. Industrial and Organizational Psychology, 2, 514–516.
https://doi.org/10.1111/
j.1754-9434.2009.01185.x
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based
interactionist model of job per-
formance. Journal of Applied Psychology, 88, 500–517.
https://doi.org/10.1037/0021-
9010.88.3.500
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at
the crossroads: A response to
Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and
Schmitt (2007). Personnel
Psychology, 60, 967–993. https://doi.org/10.1111/j.1744-
6570.2007.00098.x
Tett, R. P., Simonet, D. V., Walser, B., & Brown, C. (2013).
Trait activation theory: Applications,
developments, and implications for person–workplace fit. In N.
D. Christiansen & R. P.
Tett (Eds.), Handbook of personality at work (pp. 71–100).
Routledge.
Thornton, G. C., III. (2009). Evidence of content matching is
evidence of validity.
Industrial and Organizational Psychology, 2, 469–474.
https://doi.org/10.1111/j.1754-
9434.2009.01175.x
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples and
criteria. Journal of Applied
Psychology, 52, 372–376. https://doi.org/10.1037/h0026244
Wollack, S. (1976). Content validity: Its legal and psychometric
basis. Public Personnel
Management, 5, 397–408.
Author Biographies
David M. Fisher is an assistant professor of psychology at The
University of Tulsa. Prior to his
academic position, he did consulting work that focused on
selection and testing for public safety
agencies. His research interests include employee selection,
organizational work teams, and
occupational health/resilience.
Christopher R. Milane is a senior project manager of research
services at Qualtrics. The
majority of this manuscript was written while he was a graduate
student at The University of
Tulsa. His research interests include employee selection,
organizational work teams, and leader-
ship development.
Sarah Sullivan is the department coordinator at the Doerr
Institute for New Leaders at Rice
University. The majority of this manuscript was written while
she was a graduate student at The
University of Tulsa. Her research interests include leadership
development, employee selection,
and organizational work teams.
Robert P. Tett is professor of Industrial-Organizational (I-O)
Psychology and director of the
I-O Graduate Program at The University of Tulsa where he
teaches courses in personnel selec-
tion, psychometrics, statistics, personality at work, and
evolutionary psychology. His research
targets personality trait-situation interactions, meta-analysis,
leadership competencies, and trait-
emotional intelligence.
Methodological and Statistical Advances in the Consideration of
Cultural
Diversity in Assessment: A Critical Review of Group
Classification and
Measurement Invariance Testing
Kyunghee Han, Stephen M. Colarelli, and Nathan C. Weed
Central Michigan University
One of the most important considerations in psychological and
educational assessment is the extent to
which a test is free of bias and fair for groups with diverse
backgrounds. Establishing measurement
invariance (MI) of a test or items is a prerequisite for
meaningful comparisons across groups as it ensures
that test items do not function differently across groups.
Demonstration of MI is particularly important
in assessment settings where test scores are used in decision
making. In this review, we begin with an
overview of test bias and fairness, followed by a discussion of
issues involving group classification,
focusing on categorizations of race/ethnicity and sex/gender.
We then describe procedures used to
establish MI, detailing steps in the implementation of
multigroup confirmatory factor analysis, and
discussing recent developments in alternative procedures for
establishing MI, such as the alignment
method and moderated nonlinear factor analysis, which
accommodate reconceptualization of group
categorizations. Lastly, we discuss a variety of important
statistical and conceptual issues to be
considered in conducting multigroup confirmatory factor
analysis and related methods and conclude with
some recommendations for applications of these procedures.
Public Significance Statement
This article highlights some important conceptual and statistical issues that researchers should
consider in research involving MI to maximize the
meaningfulness of their results. Additionally, it
offers recommendations for conducting MI research with
multigroup confirmatory factor analysis
and related procedures.
Keywords: test bias and fairness, categorizations of
race/ethnicity and sex/gender, measurement
invariance, multigroup CFA
Supplemental materials:
http://dx.doi.org/10.1037/pas0000731.supp
When psychological tests are used in diverse populations, it
is assumed that a given test score represents the same level of
the underlying construct across groups and predicts the same
outcome score. Suppose that two hypothetical examinees, a
middle-aged Mexican immigrant woman and a Jewish European
American male college student, each produced the same score
on a measure of depression. We would like to conclude that the
examinees exhibit the same severity and breadth of depression
symptoms and that their therapists would rate them similarly on
relevant behavioral and symptom measures. If empirical evi-
dence indicates otherwise, and such conclusions are not justi-
fied, scores on the measure are said to be biased.
Although it has been defined variously, a representative
definition refers to psychometric bias as “systematic error in
estimation of a value.” A biased test “is one that systematically
overestimates or underestimates the value of the variable it is
intended to assess” due to group membership, such as ethnicity
or gender (Reynolds & Suzuki, 2013, p. 83). The “value of the
variable it is intended to assess” can either be a “true score”
(see
S1 in the online supplemental materials) on the latent construct
or a score on a specified criterion measure. The former appli-
cation concerns what is sometimes termed measurement bias, in
which the relationship between test scores and the latent attri-
bute that these test scores measure varies for different groups
(Borsboom, Romeijn, & Wicherts, 2008; Millsap, 1997),
whereas the latter application concerns what is referred to as
predictive bias, which entails systematic inaccuracies in the
prediction of a criterion from a test depending upon group
membership (Cleary, 1968; Millsap, 1997).
Kyunghee Han, Stephen M. Colarelli, and Nathan C. Weed,
Department
of Psychology, Central Michigan University.
This article has not been published elsewhere, nor has it been
submitted simultaneously for publication elsewhere. The
author(s) de-
clared no potential conflicts of interest with respect to the
research,
authorship, and/or publication of this article. The author(s)
received no
funding for this study.
Correspondence concerning this article should be addressed to
Kyung-
hee Han, Department of Psychology, Central Michigan
University, Mount
Pleasant, MI 48859. E-mail: [email protected]
Psychological Assessment
2019, Vol. 31, No. 12, 1481–1496
© 2019 American Psychological Association
1040-3590/19/$12.00 http://dx.doi.org/10.1037/pas0000731
Test bias should not be confused with test fairness. Although
the
two concepts have been used interchangeably at times (e.g.,
Hunter
& Schmidt, 1976), test fairness entails a broader and more sub-
jective evaluation of assessment outcomes from perspectives of
social justice (Kline, 2013), whereas test bias is an empirical
property of test scores, estimated statistically (Jensen, 1980).
Ap-
praisals of test fairness include multifaceted aspects of the
assess-
ment process, lack of test bias being only one facet (American
Educational Research Association, American Psychological
Asso-
ciation [APA], & National Council on Measurement in
Education,
2014; Society for Industrial and Organizational Psychology, 2018;
see
S2 in the online supplemental materials).
In the example above, the measure of depression may be unfair
for the Mexican female client if an English language version of
the
measure was used without evaluating her English proficiency, if
her score was derived using American norms only, if
computerized
administration was used, or if use of the test leads her to be less
likely than members of other groups to be hired for a job.
Although
test bias is not a necessary condition for test unfairness to exist,
it
may be a sufficient condition (Kline, 2013). Accordingly, it is
especially important to evaluate whether test scores are biased
against vulnerable groups.
The evaluation of test bias and test fairness each entails a
comparison of one group of people with another. While asking
the
question, “Is a test biased?” we are also implicitly asking
“against
or for which group?” Similarly, if we are concerned about using
a
test fairly, we must ask: are the outcomes based on the results
of
the test apportioned fairly to groups of people who have taken
the
test? Thus, the categorization of people into distinct groups is a
sine qua non of many aspects of psychological assessment re-
search. Racial/ethnic and sex/gender categories are prominent
fea-
tures of the social, cultural, and political landscapes in the
United
States (e.g., Helms, 2006; Hyde, Bigler, Joel, Tate, & van
Anders,
2019; Jensen, 1980; Newman, Hanges, & Outtz, 2007), and have
therefore been the most commonly studied group variables in
bias
research (e.g., Warne, Yoon, & Price, 2014). Most of the initial
research on and debates about test bias and fairness in the
United
States stemmed from political movements addressing race and
sex
discrimination (e.g., Sackett & Wilk, 1994). In service of
pressing
research on questions of discrimination and economic
inequality, it
thus became commonplace among psychologists and social
scien-
tists to categorize people crudely into groups (based primarily
on
race, ethnicity, and sex/gender) without much thought to the
mean-
ing and validity of those categorizations (e.g., Hyde et al.,
2019;
Yee, 1983; Yee, Fairchild, Weizmann, & Wyatt, 1993). This has
changed somewhat over the past two decades as scholarship by
psychologists and others has increasingly focused on nuances of
identity, multiculturalism, intersectionality, and multiple
position-
alities (Cole, 2009; Song, 2017). This scholarship has
emphasized
that racial, ethnic, and gender classifications can be complex,
ambiguous, and debatable—and that identities are often self-
constructed and can be fluid (Helms, 2006; Hyde et al., 2019).
The
first goal of this review, therefore, is to overview contemporary
issues involving race/ethnicity and sex/gender classifications in
bias research and to describe alternative approaches to the mea -
surement of these variables.
The psychometric methods used to examine test bias usually
depend on the definition of test bias operating for a given appli -
cation. Evaluating predictive bias (i.e., establishing predictive
in-
variance) often involves regressing total scores from a criterion
measure onto total scores on the measure of interest, and
compar-
ing regression slopes and intercepts across groups (Cleary,
1968).
Evaluating measurement bias (i.e., establishing measurement in-
variance [MI]) often necessitates more advanced quantitative
meth-
ods, such as confirmatory factor analysis (CFA) or methods
deriv-
ing from item response theory, to compare the properties of
item
scores and scores on latent variables across different groups.
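As a sketch of the predictive-bias approach just described (the notation is ours rather than Cleary's, whose original treatment compared separate within-group regression lines), the test can be expressed as a moderated regression of criterion scores Y on test scores X, a group indicator G, and their product:

\hat{Y} = b_0 + b_1 X + b_2 G + b_3 (X \times G).

A nonzero b_3 indicates that regression slopes differ across groups, and, when slopes are equal, a nonzero b_2 indicates an intercept difference; predictive invariance in this framework corresponds to b_2 = b_3 = 0, so that a single regression line describes prediction in both groups.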
Multigroup confirmatory factor analysis (MGCFA) has been one
of the most commonly used techniques to examine MI (Davidov,
Meuleman, Cieciuch, Schmidt, & Billiet, 2014) because it pro-
vides a comprehensive framework for evaluating different forms
of MI. The second goal of this review is to provide a broad
overview of MGCFA and related procedures and their relevance
to
psychological assessment.
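To make the forms of MI referred to above concrete, the measurement model estimated in MGCFA for item i in group g can be written, in standard linear CFA notation (introduced here for exposition), as

x_{ig} = \tau_{ig} + \lambda_{ig} \eta_{g} + \varepsilon_{ig},

where \tau_{ig} is the item intercept, \lambda_{ig} the factor loading, \eta_{g} the latent variable for an examinee in group g, and \varepsilon_{ig} a residual. Configural invariance requires only that the same pattern of loadings holds in every group; metric (weak) invariance additionally constrains \lambda_{ig} = \lambda_{i} across groups; scalar (strong) invariance further constrains \tau_{ig} = \tau_{i}, which is commonly treated as the prerequisite for comparing latent means; and strict invariance also equates the residual variances.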
Although MGCFA is a well-established procedure in the eval-
uation of MI, it has limitations. MGCFA is not an optimal
method
for conducting MI tests when many groups are involved. More-
over, the grouping variable in MGCFA must be categorical, and
therefore does not permit MI testing with continuous grouping
variables (e.g., age). As modern research questions may require
MI
testing across many groups, and with continuous
reconceptualiza-
tions of some of the grouping variables (e.g., gender), more
flex-
ible techniques are needed. Our third goal, therefore, is to
describe
two recent alternative methods for MI testing, the alignment
method and moderated nonlinear factor analysis, that aim to
over-
come these limitations. We conclude the review with a
discussion
of some important statistical and conceptual issues to be consid-
ered when evaluating MI, and include a list of recommended
practices.
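One way to convey how moderated nonlinear factor analysis relaxes the categorical-grouping requirement is to note that, in that framework, measurement parameters are written as functions of one or more observed background variables v, which may be continuous. A simplified sketch, under a linear moderation assumption and with notation introduced here for illustration (implementations differ in their exact parameterization), is

\lambda_{i}(v) = \lambda_{0i} + \lambda_{1i} v, \qquad \tau_{i}(v) = \tau_{0i} + \tau_{1i} v,

so that invariance corresponds to the moderation terms \lambda_{1i} and \tau_{1i} being zero, and potential violations can be probed without first partitioning examinees into discrete groups.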
Group Classifications Used in Bias Research
Racial and Ethnic Classifications
Race and ethnicity (see S3 in the online supplemental materials)
are conceptually vague and empirically complex social
constructs
that have been examined by numerous researchers across many
disciplines (Betancourt & López, 1993; Helms, Jernigan, &
Mascher, 2005; Yee et al., 1993). Consider race. As a biological
concept, it is essentially meaningless. In most cases, there is
more
genetic variation within so-called racial groups than between
racial
groups (Witherspoon et al., 2007). Even if we allow race to be
defined by a combination of specific morphological features and
ancestry, few “racial” populations are pure (Gibbons, 2017).
Most
are mixed—like real numbers, with infinite gradations. For
exam-
ple, although many African Americans trace their ancestry to
West
Africa, about 20% to 30% of their genetic heritage is from
Euro-
pean and American Indian ancestors (Parra et al., 1998), and
racial
admixture continues as the frequency of interracial marriages
increases (Rosenfeld, 2006; U.S. Census Bureau, 2008). Even if
one were to accept race as a combination of biological features
and
cultural and social identities (shared cultural heritage,
hardships,
and discrimination), there is the problem of degree. For
example,
while many Black Americans share social and cultural identities
based on roots in American slavery and racial discrimination,
not
all do, such as recent Black immigrants from the Caribbean.
Racial
and ethnic classifications are often conflated. In psychological
research, “Asian” is commonly used both as a cultural (Nisbett,
Peng, Choi, & Norenzayan, 2001) and racial category (Rushton,
1994). Yet it is a catch-all term based primarily on geography.
It
typically refers to people from (or whose ancestors are from)
South, Southeast, and Eastern Asia. The term Hispanic often
conflates linguistic, cultural, and sometimes even
morphological
features (Humes, Jones, & Ramirez, 2010).
In public policy, mixtures of racial (or ethnic) background have
only recently begun to be addressed. The U.S. Census, for
exam-
ple, did not include a multiracial category until 2000 (Nobles,
2000). We are only beginning to see assessment studies that
parse
people from traditional broad groupings into smaller, more
mean-
ingful and homogeneous groups. In one of the few studies that
identified different types of Asians, Appel, Huang, Ai, and Lin
(2011) found significant (and sometimes major) differences in
physical, behavioral, and mental health problems among
Chinese,
Vietnamese, and Filipina women in the U.S. More recently, Tal-
helm et al. (2014) found important differences in culture and
thought patterns within only one Asian country, China. People
in
northern China were significantly more individualistic than
those
in southern China, who were more collectivistic. With current
and
historical farming practices as their theoretical centerpiece, they
examined farming practices as causal factors. In northern China
wheat has been farmed as a staple crop for millennia, whereas
in
southern China rice has been (and is) the staple crop. Talhelm et
al.
argued that the farming practices required by these two crops
required different types of social organization that, over time,
influenced cultural values and cognition. The work by Talhelm
and
colleagues is important because it is one of the first studies to
show—along with a powerful theoretical rationale—that there
are
important cultural differences between people from what has
typ-
ically been thought of as a relatively homogeneous racial and
cultural group.
In another seminal article, Gelfand and colleagues (2011) ex-
amined the looseness-tightness dimension of cultures in 33
coun-
tries. This dimension reflects the strength of norms and the
toler-
ance of deviant behavior. Loose cultures have weaker norms and
are more tolerant of deviant behavior. While there was
substantial
variation between countries, there was still considerable
variation
among countries typically considered “Asian.” Hong Kong was
the
loosest (6.3), while Malaysia was the tightest (11.8), with the
People’s Republic of China (7.9), Japan (8.6), South Korea
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing
Content validity evidence for personality testing

More Related Content

Similar to Content validity evidence for personality testing

Advances In The Use Of Career Choice Process Measures
Advances In The Use Of Career Choice Process MeasuresAdvances In The Use Of Career Choice Process Measures
Advances In The Use Of Career Choice Process MeasuresJoaquin Hamad
 
APPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONS
APPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONSAPPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONS
APPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONSAngelina Johnson
 
A conceptual model of the influence of r sum components on personnel decisio...
A conceptual model of the influence of r sum  components on personnel decisio...A conceptual model of the influence of r sum  components on personnel decisio...
A conceptual model of the influence of r sum components on personnel decisio...Sarah Morrow
 
An Analysis Of Quality Criteria For Qualitative Research
An Analysis Of Quality Criteria For Qualitative ResearchAn Analysis Of Quality Criteria For Qualitative Research
An Analysis Of Quality Criteria For Qualitative ResearchJoaquin Hamad
 
The Duty of Loyalty and Whistleblowing Please respond to the fol.docx
The Duty of Loyalty and Whistleblowing Please respond to the fol.docxThe Duty of Loyalty and Whistleblowing Please respond to the fol.docx
The Duty of Loyalty and Whistleblowing Please respond to the fol.docxcherry686017
 
Journal of Applied Psychology Copyright 2000 by the American P.docx
Journal of Applied Psychology Copyright 2000 by the American P.docxJournal of Applied Psychology Copyright 2000 by the American P.docx
Journal of Applied Psychology Copyright 2000 by the American P.docxpriestmanmable
 
CHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed m
CHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed mCHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed m
CHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed mEstelaJeffery653
 
Research methodologies increasing understanding of the world
Research methodologies increasing understanding of the worldResearch methodologies increasing understanding of the world
Research methodologies increasing understanding of the worldDr. Mary Jane Coy, PhD
 
Stepby-step guide to critiquingresearch. Part 1 quantitati.docx
Stepby-step guide to critiquingresearch. Part 1 quantitati.docxStepby-step guide to critiquingresearch. Part 1 quantitati.docx
Stepby-step guide to critiquingresearch. Part 1 quantitati.docxsusanschei
 
©2016 Business Ethics Quarterly 264 (October 2016). ISSN 1.docx
 ©2016  Business Ethics Quarterly  264 (October 2016). ISSN 1.docx ©2016  Business Ethics Quarterly  264 (October 2016). ISSN 1.docx
©2016 Business Ethics Quarterly 264 (October 2016). ISSN 1.docxmayank272369
 
Sally rm 11
Sally rm 11Sally rm 11
Sally rm 11ejit
 
ANALYTIC GENERALIZATION VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...
ANALYTIC GENERALIZATION  VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...ANALYTIC GENERALIZATION  VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...
ANALYTIC GENERALIZATION VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...Mary Calkins
 
Developing Innovative Models of Prqactice at the Interface Between the NHS an...
Developing Innovative Models of Prqactice at the Interface Between the NHS an...Developing Innovative Models of Prqactice at the Interface Between the NHS an...
Developing Innovative Models of Prqactice at the Interface Between the NHS an...BASPCAN
 
Class Research Project Final
Class Research Project FinalClass Research Project Final
Class Research Project FinalAlexandra Collins
 
Face Construct And Criterion-Related Validity Essay
Face Construct And Criterion-Related Validity EssayFace Construct And Criterion-Related Validity Essay
Face Construct And Criterion-Related Validity EssayDeb Birch
 
Discussion.docx
Discussion.docxDiscussion.docx
Discussion.docx4934bk
 
Doctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docx
Doctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docxDoctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docx
Doctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docxelinoraudley582231
 
The Pros Of Construct Validity
The Pros Of Construct ValidityThe Pros Of Construct Validity
The Pros Of Construct ValidityJennifer Wood
 
A Case in Case Study Methodology.pdf
A Case in Case Study Methodology.pdfA Case in Case Study Methodology.pdf
A Case in Case Study Methodology.pdfKarla Long
 

Similar to Content validity evidence for personality testing (20)

Advances In The Use Of Career Choice Process Measures
Advances In The Use Of Career Choice Process MeasuresAdvances In The Use Of Career Choice Process Measures
Advances In The Use Of Career Choice Process Measures
 
APPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONS
APPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONSAPPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONS
APPLYING CASE STUDY METHODOLOGY TO CHILD CUSTODY EVALUATIONS
 
A conceptual model of the influence of r sum components on personnel decisio...
A conceptual model of the influence of r sum  components on personnel decisio...A conceptual model of the influence of r sum  components on personnel decisio...
A conceptual model of the influence of r sum components on personnel decisio...
 
An Analysis Of Quality Criteria For Qualitative Research
An Analysis Of Quality Criteria For Qualitative ResearchAn Analysis Of Quality Criteria For Qualitative Research
An Analysis Of Quality Criteria For Qualitative Research
 
The Duty of Loyalty and Whistleblowing Please respond to the fol.docx
The Duty of Loyalty and Whistleblowing Please respond to the fol.docxThe Duty of Loyalty and Whistleblowing Please respond to the fol.docx
The Duty of Loyalty and Whistleblowing Please respond to the fol.docx
 
Journal of Applied Psychology Copyright 2000 by the American P.docx
Journal of Applied Psychology Copyright 2000 by the American P.docxJournal of Applied Psychology Copyright 2000 by the American P.docx
Journal of Applied Psychology Copyright 2000 by the American P.docx
 
CHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed m
CHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed mCHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed m
CHAPTER 10 MIXED METHODS PROCEDURESHow would you write a mixed m
 
Research methodologies increasing understanding of the world
Research methodologies increasing understanding of the worldResearch methodologies increasing understanding of the world
Research methodologies increasing understanding of the world
 
Stepby-step guide to critiquingresearch. Part 1 quantitati.docx
Stepby-step guide to critiquingresearch. Part 1 quantitati.docxStepby-step guide to critiquingresearch. Part 1 quantitati.docx
Stepby-step guide to critiquingresearch. Part 1 quantitati.docx
 
©2016 Business Ethics Quarterly 264 (October 2016). ISSN 1.docx
 ©2016  Business Ethics Quarterly  264 (October 2016). ISSN 1.docx ©2016  Business Ethics Quarterly  264 (October 2016). ISSN 1.docx
©2016 Business Ethics Quarterly 264 (October 2016). ISSN 1.docx
 
Sally rm 11
Sally rm 11Sally rm 11
Sally rm 11
 
ANALYTIC GENERALIZATION VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...
ANALYTIC GENERALIZATION  VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...ANALYTIC GENERALIZATION  VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...
ANALYTIC GENERALIZATION VALIDATING THEORIES THROUGH RESEARCH BY MANAGEMENT P...
 
Developing Innovative Models of Prqactice at the Interface Between the NHS an...
Developing Innovative Models of Prqactice at the Interface Between the NHS an...Developing Innovative Models of Prqactice at the Interface Between the NHS an...
Developing Innovative Models of Prqactice at the Interface Between the NHS an...
 
Class Research Project Final
Class Research Project FinalClass Research Project Final
Class Research Project Final
 
Qualitative research
Qualitative researchQualitative research
Qualitative research
 
Face Construct And Criterion-Related Validity Essay
Face Construct And Criterion-Related Validity EssayFace Construct And Criterion-Related Validity Essay
Face Construct And Criterion-Related Validity Essay
 
Discussion.docx
Discussion.docxDiscussion.docx
Discussion.docx
 
Doctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docx
Doctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docxDoctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docx
Doctoral StudentUNIT 1 – Discussion 2U1D2 – Qualitative Rese.docx
 
The Pros Of Construct Validity
The Pros Of Construct ValidityThe Pros Of Construct Validity
The Pros Of Construct Validity
 
A Case in Case Study Methodology.pdf
A Case in Case Study Methodology.pdfA Case in Case Study Methodology.pdf
A Case in Case Study Methodology.pdf
 

More from PazSilviapm

Case Study Clinical LeadersDavid Rochester enjoys his role as a C.docx
Case Study Clinical LeadersDavid Rochester enjoys his role as a C.docxCase Study Clinical LeadersDavid Rochester enjoys his role as a C.docx
Case Study Clinical LeadersDavid Rochester enjoys his role as a C.docxPazSilviapm
 
CASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docx
CASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docxCASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docx
CASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docxPazSilviapm
 
CASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docx
CASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docxCASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docx
CASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docxPazSilviapm
 
Case Study 2A 40 year-old female presents to the office with the c.docx
Case Study 2A 40 year-old female presents to the office with the c.docxCase Study 2A 40 year-old female presents to the office with the c.docx
Case Study 2A 40 year-old female presents to the office with the c.docxPazSilviapm
 
Case Study Horizon Horizon Consulting Patti Smith looked up at .docx
Case Study Horizon  Horizon Consulting Patti Smith looked up at .docxCase Study Horizon  Horizon Consulting Patti Smith looked up at .docx
Case Study Horizon Horizon Consulting Patti Smith looked up at .docxPazSilviapm
 
Case Study EvaluationBeing too heavy or too thin, having a disabil.docx
Case Study EvaluationBeing too heavy or too thin, having a disabil.docxCase Study EvaluationBeing too heavy or too thin, having a disabil.docx
Case Study EvaluationBeing too heavy or too thin, having a disabil.docxPazSilviapm
 
Case Study Disney Corporation1, What does Disney do best to connec.docx
Case Study Disney Corporation1, What does Disney do best to connec.docxCase Study Disney Corporation1, What does Disney do best to connec.docx
Case Study Disney Corporation1, What does Disney do best to connec.docxPazSilviapm
 
Case Study 3  Exemplar of Politics and Public Management Rightly Un.docx
Case Study 3  Exemplar of Politics and Public Management Rightly Un.docxCase Study 3  Exemplar of Politics and Public Management Rightly Un.docx
Case Study 3  Exemplar of Politics and Public Management Rightly Un.docxPazSilviapm
 
Case Study 2 Structure and Function of the Kidney Rivka is an ac.docx
Case Study 2 Structure and Function of the Kidney Rivka is an ac.docxCase Study 2 Structure and Function of the Kidney Rivka is an ac.docx
Case Study 2 Structure and Function of the Kidney Rivka is an ac.docxPazSilviapm
 
Case Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docx
Case Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docxCase Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docx
Case Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docxPazSilviapm
 
Case Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docx
Case Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docxCase Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docx
Case Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docxPazSilviapm
 
Case FormatI. Write the Executive SummaryOne to two para.docx
Case FormatI. Write the Executive SummaryOne to two para.docxCase FormatI. Write the Executive SummaryOne to two para.docx
Case FormatI. Write the Executive SummaryOne to two para.docxPazSilviapm
 
Case Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docx
Case Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docxCase Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docx
Case Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docxPazSilviapm
 
case scenario being used for this discussion postABS 300 Week One.docx
case scenario being used for this discussion postABS 300 Week One.docxcase scenario being used for this discussion postABS 300 Week One.docx
case scenario being used for this discussion postABS 300 Week One.docxPazSilviapm
 
Case Study #2Alleged improper admission orders resulting in mor.docx
Case Study #2Alleged improper admission orders resulting in mor.docxCase Study #2Alleged improper admission orders resulting in mor.docx
Case Study #2Alleged improper admission orders resulting in mor.docxPazSilviapm
 
Case Study 1Denise is a sixteen-year old 11th grade student wh.docx
Case Study 1Denise is a sixteen-year old 11th grade student wh.docxCase Study 1Denise is a sixteen-year old 11th grade student wh.docx
Case Study 1Denise is a sixteen-year old 11th grade student wh.docxPazSilviapm
 
Case AssignmentI. First read the following definitions of biodiver.docx
Case AssignmentI. First read the following definitions of biodiver.docxCase AssignmentI. First read the following definitions of biodiver.docx
Case AssignmentI. First read the following definitions of biodiver.docxPazSilviapm
 
Case and questions are In the attchmentExtra resources given.H.docx
Case and questions are In the attchmentExtra resources given.H.docxCase and questions are In the attchmentExtra resources given.H.docx
Case and questions are In the attchmentExtra resources given.H.docxPazSilviapm
 
Case C Hot GiftsRose Stone moved into an urban ghetto in order .docx
Case C Hot GiftsRose Stone moved into an urban ghetto in order .docxCase C Hot GiftsRose Stone moved into an urban ghetto in order .docx
Case C Hot GiftsRose Stone moved into an urban ghetto in order .docxPazSilviapm
 
Case Assignment must be 850 words and use current APA format with a .docx
Case Assignment must be 850 words and use current APA format with a .docxCase Assignment must be 850 words and use current APA format with a .docx
Case Assignment must be 850 words and use current APA format with a .docxPazSilviapm
 

More from PazSilviapm (20)

Case Study Clinical LeadersDavid Rochester enjoys his role as a C.docx
Case Study Clinical LeadersDavid Rochester enjoys his role as a C.docxCase Study Clinical LeadersDavid Rochester enjoys his role as a C.docx
Case Study Clinical LeadersDavid Rochester enjoys his role as a C.docx
 
CASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docx
CASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docxCASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docx
CASE STUDY Clinical  Journal Entry  1 to 2 pages A 21 month .docx
 
CASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docx
CASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docxCASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docx
CASE STUDY 5Exploring Innovation in Action The Dimming of the Lig.docx
 
Case Study 2A 40 year-old female presents to the office with the c.docx
Case Study 2A 40 year-old female presents to the office with the c.docxCase Study 2A 40 year-old female presents to the office with the c.docx
Case Study 2A 40 year-old female presents to the office with the c.docx
 
Case Study Horizon Horizon Consulting Patti Smith looked up at .docx
Case Study Horizon  Horizon Consulting Patti Smith looked up at .docxCase Study Horizon  Horizon Consulting Patti Smith looked up at .docx
Case Study Horizon Horizon Consulting Patti Smith looked up at .docx
 
Case Study EvaluationBeing too heavy or too thin, having a disabil.docx
Case Study EvaluationBeing too heavy or too thin, having a disabil.docxCase Study EvaluationBeing too heavy or too thin, having a disabil.docx
Case Study EvaluationBeing too heavy or too thin, having a disabil.docx
 
Case Study Disney Corporation1, What does Disney do best to connec.docx
Case Study Disney Corporation1, What does Disney do best to connec.docxCase Study Disney Corporation1, What does Disney do best to connec.docx
Case Study Disney Corporation1, What does Disney do best to connec.docx
 
Case Study 3  Exemplar of Politics and Public Management Rightly Un.docx
Case Study 3  Exemplar of Politics and Public Management Rightly Un.docxCase Study 3  Exemplar of Politics and Public Management Rightly Un.docx
Case Study 3  Exemplar of Politics and Public Management Rightly Un.docx
 
Case Study 2 Structure and Function of the Kidney Rivka is an ac.docx
Case Study 2 Structure and Function of the Kidney Rivka is an ac.docxCase Study 2 Structure and Function of the Kidney Rivka is an ac.docx
Case Study 2 Structure and Function of the Kidney Rivka is an ac.docx
 
Case Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docx
Case Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docxCase Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docx
Case Study 2 Plain View, Open Fields, Abandonment, and Border Searc.docx
 
Case Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docx
Case Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docxCase Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docx
Case Study 2 Collaboration Systems at Isuzu Australia LimitedDue .docx
 
Case FormatI. Write the Executive SummaryOne to two para.docx
Case FormatI. Write the Executive SummaryOne to two para.docxCase FormatI. Write the Executive SummaryOne to two para.docx
Case FormatI. Write the Executive SummaryOne to two para.docx
 
Case Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docx
Case Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docxCase Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docx
Case Study #2 Diabetes Hannah is a 10-year-old girl who has recentl.docx
 
case scenario being used for this discussion postABS 300 Week One.docx
case scenario being used for this discussion postABS 300 Week One.docxcase scenario being used for this discussion postABS 300 Week One.docx
case scenario being used for this discussion postABS 300 Week One.docx
 
Case Study #2Alleged improper admission orders resulting in mor.docx
Case Study #2Alleged improper admission orders resulting in mor.docxCase Study #2Alleged improper admission orders resulting in mor.docx
Case Study #2Alleged improper admission orders resulting in mor.docx
 
Case Study 1Denise is a sixteen-year old 11th grade student wh.docx
Case Study 1Denise is a sixteen-year old 11th grade student wh.docxCase Study 1Denise is a sixteen-year old 11th grade student wh.docx
Case Study 1Denise is a sixteen-year old 11th grade student wh.docx
 
Case AssignmentI. First read the following definitions of biodiver.docx
Case AssignmentI. First read the following definitions of biodiver.docxCase AssignmentI. First read the following definitions of biodiver.docx
Case AssignmentI. First read the following definitions of biodiver.docx
 
Case and questions are In the attchmentExtra resources given.H.docx
Case and questions are In the attchmentExtra resources given.H.docxCase and questions are In the attchmentExtra resources given.H.docx
Case and questions are In the attchmentExtra resources given.H.docx
 
Case C Hot GiftsRose Stone moved into an urban ghetto in order .docx
Case C Hot GiftsRose Stone moved into an urban ghetto in order .docxCase C Hot GiftsRose Stone moved into an urban ghetto in order .docx
Case C Hot GiftsRose Stone moved into an urban ghetto in order .docx
 
Case Assignment must be 850 words and use current APA format with a .docx
Case Assignment must be 850 words and use current APA format with a .docxCase Assignment must be 850 words and use current APA format with a .docx
Case Assignment must be 850 words and use current APA format with a .docx
 

Recently uploaded

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 

Recently uploaded (20)

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 

Content validity evidence for personality testing

area of ambiguity—whether evidence based on test content should be used as a means of validating personality test inferences for employee selection. Rothstein and Goffin (2006) noted, "It has been estimated that personality testing is a $400 million industry in the United States and it is growing at an average of 10% a year" (Hsu, 2004, p. 156). Given this reality, it is important to carefully consider appropriate validation procedures for such measures. However, the various sources mentioned above present conflicting directions on this issue, specifically in relation to content-based validity evidence. On one hand, evidence based on test content is one of five potential sources of validity evidence described by the Joint Standards (1999/2014), which is similarly endorsed by the SIOP Principles (2003). This form of evidence has further been suggested by some to be particularly relevant to personality tests (e.g., Murphy et al., 2009; O'Neill et al., 2009), and especially under challenging validation conditions, such as small sample sizes, test security concerns, or lack of a reliable criterion measure (Landy, 1986; Tan, 2009; Thornton, 2009). On the other hand, the Uniform Guidelines (1978) assert that ". . . a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, commonsense, judgment, leadership, and spatial ability [emphasis added]" (Section 14.C.1). Other sources similarly convey reticence toward content validity for measures of traits or constructs (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976). Thus, there appears to be conflicting guidance on the use of content validity evidence to support personality measures.

In light of this discrepancy, the current investigation offers a critical examination of content validity evidence and personality testing for employee selection. Such an investigation is valuable for several reasons. First, an important consequence of the inconsistency noted above is that content-based evidence may be overlooked as a valuable approach to validation when personality testing is of interest. Evidence for this can be seen in the fact that other approaches such as criterion-related validation are sometimes viewed as the only option for personality measures (Biddle, 2011). Similarly, prominent writings on personality testing in the workplace (e.g., Morgeson et al., 2007b; O'Neill et al., 2013; Ones et al., 2007; Rothstein & Goffin, 2006; Tett & Christiansen, 2007) have tended to ignore the applicability of content validation to personality measures. Furthermore, considering the deference given to the various standards and guidelines in contemporary employee selection practice (Schmit & Ryan, 2013), those concerned about strict adherence to such standards/guidelines are likely to be reticent or uncertain about gathering content-based evidence for personality measures—in no small part due to conflicting or ambiguous recommendations. The above circumstances tend to relegate content-based evidence to be seen as less desirable or otherwise viewed as an afterthought. In turn, this represents a missed opportunity for valuable insight into the use of personality measures.

Second, the neglect or underutilization of content-based evidence is, in many ways, antithetical to the broader goal of developing a theory-based and scientifically grounded understanding of tests and measures used for employee selection (Binning & Barrett, 1989). For example, as elaborated below, there are various situations in which content-based evidence may be more optimal than criterion-based evidence, not the least of which includes an insufficient sample size for a criterion-based investigation (McDaniel et al., 2011). Similarly, an exclusive focus on empirical prediction ignores the importance of underlying theory, which is critical for advancing employee selection research. Of relevance, the examination of content validity evidence forces one to carefully consider the correspondence between selection measures and underlying construct domains, as informed by theoretical considerations. Evidence for the value of content validity can also be found in trait activation theory (Tett & Burnett, 2003; Tett et al., 2013), which highlights the importance of a clear conceptual linkage between the content of personality traits/constructs and the job domain in question. Thus, content validity evidence should be of primary importance for personality test validation.

Third, it is useful to acknowledge that the prohibition against content validity evidence in relation to personality measures noted in the Uniform Guidelines (1978) appears to be at odds with contemporary thinking on validation (Joint Standards, 1999/2014). The focal passage quoted above from the Uniform Guidelines has been described as being ". . . as destructive to the interface of psychological theory and practice as any that might have been conceived" (Landy, 1986, p. 1189). Although there have been well-argued critiques of the Uniform Guidelines (e.g., McDaniel et al., 2011), in addition to thoughtful elaboration of issues surrounding content validity (e.g., Binning & LeBreton, 2009), a direct attempt at resolving the noted contradiction remains conspicuously absent from the literature. This contradiction, in conjunction with the absence of a satisfactory explanation, is problematic given the importance of gathering sound validity evidence pertaining to psychological test use. As such, a critical examination of this issue is warranted.

Finally, the findings of the current investigation are likely to have broad applicability. Namely, although focused on personality testing, the discussion below is relevant to measures of other commonly assessed attributes classified under the Uniform Guidelines (1978) as "traits or constructs" (Section 14.C.1). Similarly, while we address the Uniform Guidelines—which some argue are outdated (e.g., Jeanneret & Zedeck, 2010) and further limited by their applicability to employee selection in the United States—we believe the value of this discussion extends far beyond these guidelines. It is important to carefully consider appropriate validation strategies in all circumstances where psychological tests are used. Hence, the discussion presented herein is likely to be of relevance for content-based validation efforts in other areas beyond employee selection in the United States (e.g., educational testing, clinical practice, international employee selection efforts).
Following a brief overview of validity and content-based validation, our investigation is organized around three fundamental questions. Question 1 asks whether current standards and guidelines support the use of content validity evidence for validation of personality test inferences in an employee selection context. Based on the concerns raised above, a preliminary answer to this question is that it is unclear. Question 2 then asks about the underlying bases of the inconsistency. Building on the identified causes of disagreement, Question 3 asks how one might actually gather evidence based on test content for personality measures. Ultimately, our goal in this effort is to reduce ambiguity and promote clarity regarding content-based validation of personality measures.

Overview of Validity and Evidence Based on Test Content

Broadly speaking, validity in measurement refers to how well an assessment device measures what it is supposed to (Schmitt, 2006). The focus of measurement is typically described as a construct (Joint Standards, 1999/2014), which represents a latent attribute on which individuals can vary (e.g., cognitive ability, diligence, interpersonal skill, knowledge, the capacity to complete a given task). Importantly, a person's level or relative standing with regard to the construct of interest is inferred from the test scores (SIOP Principles, 2003). As such, the notion of validity addresses the simple yet fundamental issue of whether test scores actually reflect the attribute or construct that the test is intended to measure. However, this succinct characterization of validity also belies the true complexity of this topic (Furr & Bacharach, 2014). Two particular complexities bear discussion in light of our current aims.

First, contemporary thinking holds that validity is not the property of a test per se, but rather of the inferences made from test scores (Binning & Barrett, 1989; Furr & Bacharach, 2014; Joint Standards, 1999/2014; Landy, 1986; SIOP Principles, 2003). The value of this approach can be seen when the same test is used for two different purposes—for example, when an interpersonal skills test developed for the selection of sales personnel is used for hiring both sales representatives and accountants. Notably, the test itself does not change, but the inferences made from the test scores regarding the job performance potential of the applicants may be more or less valid given the focal job in question. In accord with this perspective, the Joint Standards (1999/2014) describe validity as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of the test" (p. 11). Inherent in this view is the idea that validity is difficult to fully assess without a clear explication of the intended interpretation of scores and corresponding purpose of testing. Thus, substantiating relevant inferences in terms of the intended purpose of the test is of primary concern in the contemporary view of validity.

Second, validity has come to be understood as a unitary concept, as compared with the dated notion of distinct types of validity (Binning & Barrett, 1989; Furr & Bacharach, 2014; Joint Standards, 1999/2014; Landy, 1986; SIOP Principles, 2003). The older trinitarian view (Guion, 1980) posits three different types of validity, including criterion-related, content, and construct validity, each relevant for different test applications (Lawshe, 1985). By contrast, the more recent unitarian perspective (Landy, 1986) emphasizes that all measurement attempts are ultimately about assessing a target construct, and validation entails the collection of evidence to support the argument that test scores actually reflect the construct (and that the construct is relevant to the intended use of the test). Consistent with this latter perspective, the Joint Standards (1999/2014) espouse a unitary view of validity and identify five sources of validity evidence, including evidence based on test content, response processes, internal structure, relations to other variables, and consequences of testing. In summary, the contemporary view of validity suggests that measurement efforts ultimately implicate constructs, and different sources of evidence can be marshaled to substantiate the validity of inferences based on test scores.

Drawing on the above discussion, evidence based on test content represents one of several potential sources of evidence for validity judgments. The collection of content-based evidence has become well-established as an important and viable validation strategy, as can be seen in the common discussion and endorsement of content validity in the academic literature (e.g., Aguinis et al., 2001; Binning & Barrett, 1989; Furr & Bacharach, 2014; Haynes et al., 1995; Landy, 1986) as well as in legal, professional, and technical standards or guidelines (e.g., Joint Standards, 1999/2014; SIOP Principles, 2003; Uniform Guidelines, 1978). The specific manner in which evidence based on test content can substantiate the validity of test score inferences is via an informed and judicious examination of the match between the content of an assessment tool (e.g., test instructions, item wording, response format) and the target construct in light of the assessment purpose (Haynes et al., 1995). For the sake of simplicity and ease of exposition, throughout this article, we use various terms interchangeably to represent the concept of evidence based on test content, such as content validity evidence, content validation strategy, content-based strategy, or simply content validity. However, each reference to this concept is intended to reflect contemporary thinking regarding validity as described above—specifically, content validity evidence is not a separate "type" of validity but rather, a category of evidence that can be used to substantiate the validity of inferences regarding test scores.
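To make the expert-judgment process described above more concrete, one widely used way of quantifying such judgments is Lawshe's content validity ratio (CVR), in which a panel of subject matter experts rates whether each item is essential to the targeted construct or job domain. The sketch below is purely illustrative and is not drawn from the article itself; the items, panel size, and rating counts are hypothetical.

```python
# Illustrative sketch (not the authors' method): aggregating subject matter
# expert (SME) judgments with Lawshe's content validity ratio (CVR).
# CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs rating an item
# "essential" and N is the total number of SMEs on the panel.

def content_validity_ratio(essential_votes: int, panel_size: int) -> float:
    """Return Lawshe's CVR for a single item; values range from -1 to +1."""
    half = panel_size / 2
    return (essential_votes - half) / half

# Hypothetical data: counts of SMEs (out of 10) who judged each personality
# item "essential" to the job domain identified through job analysis.
panel_size = 10
item_ratings = {
    "Stays calm under pressure": 9,
    "Prefers working alone": 3,
    "Double-checks work for errors": 8,
}

for item, essential in item_ratings.items():
    cvr = content_validity_ratio(essential, panel_size)
    print(f"{item}: CVR = {cvr:+.2f}")
```

In practice, items with a CVR below a panel-size-dependent cutoff would typically be revised or dropped; the point here is simply that content-based evidence can be recorded and summarized systematically rather than asserted informally.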
  • 14. Commission, the Department of Labor, and the Department of Justice in the United States. Regarding content vali- dation, the guidelines state that, Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated. (Section 5.B) The guidelines go on to describe specific technical standards and requirements for content validity studies. For example, a content validity study should include a review of information about the job under consideration (Section 14.A; Section 14.C.2). Furthermore, when the selection procedure focuses on work tasks or behaviors, it must be shown that the selection procedure includes a representative sample of on-the-job behaviors or work products (Section 14.C.1; Section 14.C.4). Conversely, under cer- tain circumstances, the guidelines also permit content validation where the selection procedure focuses on worker requirements or attributes, including knowledge, skills, or abilities (KSAs). In such cases, beyond showing that the selection procedure reflects a representative sample of the implicated KSA, it must additionally be documented that the KSA is needed to perform important work tasks (Section 14.C.1; Section 14.C.4), and the KSA must be operationally defined in terms of
  • 15. observable work behaviors (Section 14.C.4). The above notwithstanding, the Uniform Guidelines (1978) explicitly prohibit con- tent validity for tests focusing on traits or constructs, including personality (Section 14.C.1). The logic underlying this restriction appears to be based on the seemingly reasonable notion that content-based validation becomes increasingly difficult as the focus of the selection test is farther removed from actual work behaviors (Section 14.C.4; Landy, 1986; Lawshe, 1985). This logic was confirmed in a subsequent “Questions and Answers” document, where it is stated that, The Guidelines emphasize the importance of a close approximation between the content of the selection procedure and the observable behaviors or products of the job, so as to minimize the inferential leap between performance on the selection procedure and job performance [emphasis added]. (See http://www.uniformguidelines.com/questionandanswers.html) http://www.uniformguidelines.com/questionandanswers.html 238 Public Personnel Management 50(2) Table 1. Review of Various Sources Regarding Content Validity and Personality Testing. Source Description of content validity Position on personality measures
  • 16. Uniform Guidelines (1978) “Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated” (Section 5.B) Explicit prohibition related to the use of content validity for tests that focus on traits or constructs, such as personality: “. . . a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, commonsense, judgment, leadership, and spatial ability” (Section 14.C.1) SIOP Principles (2003) “Evidence for validity based on content typically consists of a demonstration of a strong linkage between the content
  • 17. of the selection procedure and important work behaviors, activities, worker requirements, or outcomes on the job” (p. 21) Approval of content validity approach for personality measures can be inferred from [1] the absence of an explicit prohibition against the use of content validity evidence for tests that focus on traits or constructs and [2] the stated scope of applicability for content-based evidence, which includes tests that focus on knowledge, skills, abilities, and other personal characteristics Joint Standards (1999/2014) “Important validity evidence can be obtained from an analysis of the relationship between the content of a test and the construct it is intended to measure” (p. 14) Approval of content validity approach for personality measures can be inferred from [1] the absence of an explicit prohibition against the use of content validity evidence for tests that focus on traits or constructs, [2] the explicit description of content validity as pertaining to “the relationship between the content of a test and the construct it is intended to measure” (p.
  • 18. 14), and [3] the broad definition of the term construct (see p. 217), which makes it clear that personality variables would fall under the definition of a construct General review of academic literature Most, if not all, descriptions of content validity found in the literature embody the core notion of documenting the linkage between the content of a test and a particular domain that represents the target of measurement and/or purpose of testing (Haynes et al., 1995) The sources that specifically discuss this issue collectively indicate mixed opinions; while some authors have expressed reticence toward the use of content-based evidence for measures of personality (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976); others consider this restriction to be problematic (e.g., Landy, 1986; McDaniel et al., 2011) or view content validity as particularly relevant to personality testing (e.g., Murphy et al., 2009; O’Neill et al., 2009) Note. SIOP = Society for Industrial and Organizational Psychology.
Interestingly, in an apparent application of this logic, the guidelines permit content validation for selection procedures focusing on KSAs (as noted in the preceding paragraph). In such cases, the inferential leap necessary to link KSAs to job performance is ostensibly greater than if the selection procedures were to focus directly on work behaviors, which explains why the guidelines include additional requirements related to the content validation of tests focusing on these worker attributes (see Sections 14.C.1 and 14.C.4). Presumably, these additional requirements serve to bridge the larger inferential leap made when the test does not directly focus on work behaviors. Thus, the Uniform Guidelines do not limit the use of content validity to actual samples of work behavior, but additional evidence is needed to help bridge the larger inferential leap made when selection tests target worker attributes (i.e., KSAs); yet this same reasoning is not extended to what the guidelines characterize as traits or constructs.

The SIOP Principles (2003)

The SIOP Principles (2003) embody the formal pronouncements of the Society for Industrial and Organizational Psychology pertaining to appropriate validation and use of employee selection procedures. For content validation, the principles state that, “Evidence for validity based on content typically consists of a demonstration of a strong linkage between the content of the selection procedure and important work behaviors, activities, worker requirements, or outcomes on the job” (p. 21). Like the Uniform Guidelines (1978), the SIOP Principles stress the importance of capturing a representative sample of the target of measurement and further establishing a close correspondence between the selection procedure and the work domain. The principles also acknowledge that content validity evidence can be either “logical or empirical” (p. 6), highlighting the role of job analysis and expert judgment in generating content-based evidence. However, unlike the Uniform Guidelines, the SIOP Principles do not make a substantive distinction between work tasks/behaviors and worker requirements/attributes in relation to content-based evidence but rather, collectively, consider selection procedures that focus on “work behaviors, activities, and/or worker KSAOs” (p. 21). Importantly, the addition of “O” to the KSA acronym represents “other personal characteristics,” which are generally understood to include “interests, preferences, temperament, and personality characteristics [emphasis added]” (Brannick et al., 2007, p. 62). Accordingly, although not explicitly stated, the use of content validity evidence as a means of validating personality test inferences for employee selection purposes appears to be consistent with the SIOP Principles.

The Joint Standards (1999/2014)

The Joint Standards (1999/2014) are a set of guidelines for test development and validation in the areas of psychological and educational testing, which were developed by a joint committee including representatives from the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. According to the standards, content validity is examined by specifying the content domain to be measured and then conducting “logical or empirical analyses of the adequacy with which the test content represents the content domain and of the relevance of the content domain to the proposed interpretation of test scores” (p. 14). In other words, content validity is described as pertaining to “the relationship between the content of a test and the construct it is intended to measure” (p. 14), where “construct” is defined as “The concept or characteristic that a test is designed to measure” (p. 217). Because personality traits are easily understood as constructs, the Joint Standards suggest that personality test inferences may be subject to content-based validation.
Academic Literature

It is also informative to examine the academic literature regarding validation and personality testing. In doing so, several general observations can be made. First, most if not all definitions of content validity share the core notion of documenting the linkage between the content of a test and a particular domain that represents the target of measurement and/or purpose of testing (e.g., Aguinis et al., 2001; Goldstein et al., 1993; Haynes et al., 1995; Sireci, 1998). Second, as noted previously, prominent writings on personality testing in the workplace (e.g., Morgeson et al., 2007b; O’Neill et al., 2013; Ones et al., 2007; Rothstein & Goffin, 2006; Tett & Christiansen, 2007) have tended to ignore the applicability of content validation to personality measures. Third, the sources that do specifically address this issue present mixed opinions. While some have expressed reticence about content-based evidence for measures of personality (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976), others consider this restriction to be problematic (e.g., Landy, 1986; McDaniel et al., 2011) or view content validity as particularly relevant to personality testing (e.g., Murphy et al., 2009; O’Neill et al., 2009). Thus, as with the technical standards and guidelines discussed above, those turning to the academic literature for guidance might similarly come away uncertain regarding the use of content validity evidence to support personality measures in an employee selection context.

What Are the Bases of Inconsistency?

This section attempts to identify the conceptual issues that form the bases for disagreement/misunderstanding regarding the use of content validity evidence for personality measures. Making these underlying matters explicit will help to identify some common ground and the potential for a way forward. Based on the review of documents and literature above, the primary areas to be addressed include (a) vestiges of the trinitarian view of validity, (b) the focus of the content match, and (c) a clear understanding of the inferences to be substantiated.
Vestiges of the Trinitarian View of Validity

Although it is now well-established that validity should be characterized in a manner consistent with the contemporary perspective described above (Binning & Barrett, 1989; Joint Standards, 1999/2014; Landy, 1986; SIOP Principles, 2003), the outdated trinitarian view continues to exert substantial influence. Perhaps the most prominent example of this can be found in the Uniform Guidelines (1978), which clearly reflects a trinitarian view of validity yet remains an important document that holds considerable weight in contemporary employee selection practice (McDaniel et al., 2011). There are at least two important concerns related to this residual influence of the trinitarian perspective.

First, the trinitarian view of validity suggests that constructs represent a separate category of measurement that is somehow distinct from other types of measurement efforts. This can most readily be seen in the simple fact that there is a separate label for construct validity, as compared with other categories of validity. As a result, the determination of which “type” of validity to focus on rests on whether or not a construct, as opposed to some other type of attribute, is the target of measurement (Landy, 1986). This same logic is embodied in the Uniform Guidelines (1978), where it is indicated that certain validation strategies are appropriate when measuring traits or constructs (e.g., construct validity studies), whereas other strategies are not (e.g., content validity studies). Importantly, this perspective is in direct opposition to contemporary thinking regarding validity, which suggests that all measurement efforts ultimately implicate constructs (Joint Standards, 1999/2014). This notion is illuminated in a telling example provided by Landy (1986), where he contrasts a hypothetical typing ability test with a measure of reasoning ability. Of particular relevance is the idea that the typing ability test more readily implicates observable behaviors and, thus, could be subject to content validation according to the Uniform Guidelines. Conversely, the reasoning ability test is more easily described as trait- or construct-focused, in turn, precluding the use of content validation according to the Uniform Guidelines. Landy’s point was that both of these tests actually focus on constructs (i.e., typing ability, reasoning ability), neither of which is directly observable. Rather, in both cases, one must infer the level of the construct possessed by an individual via the administration of a test.

The above example is intended to highlight that all variables measured by psychological tests can be characterized as constructs. As such, the notion that some variables are constructs while others are not is problematic at best and also in direct opposition to prevailing conceptions of test validation. However, this is not meant to imply that all constructs are the same. In the example above, it is clear that the typing ability construct might more easily be linked to job performance, as contrasted with the reasoning ability construct, given that typing ability has a more direct and obvious behavioral manifestation (i.e., typing). Conversely, one might say that a greater inferential leap is required when validating a measure of reasoning ability, as reasoning ability is relatively farther removed from actual job behavior (although not irreconcilably far removed). This critical distinction will be discussed further in the sections to follow. For now, the important conclusion is that all psychological variables that might be measured in an employee selection context, including personality, are no more or less constructs than any other.

A second concern related to the residual influence of the trinitarian perspective is that there appears to be a de facto preference given to criterion-related validity evidence when considering employee selection procedures (Binning & Barrett, 1989). Nevertheless, it is critical to acknowledge that there are various circumstances where criterion-related validity evidence is far from optimal. It has been suggested that the minimum sample size for a criterion-related study be no fewer than 250 study participants (Biddle, 2011). Yet, McDaniel et al. (2011) assert that most employers simply do not have enough employees or applicants to conduct such a study. The outcome of a criterion-related study is also highly contingent on the quality of the criterion measure. In this regard, it is useful to note that the appropriate development and validation of criterion measures is often given far less attention than the predictor side of the equation (Binning & Barrett, 1989). Furthermore, in the context of a high-stakes testing situation, conducting a criterion-related study might represent a test security risk, as sensitive test content will be presented to study participants who might subsequently share confidential test information. None of the above is meant to suggest that criterion-related validity evidence is always bad. However, it is similarly important to realize that there are various situations in which evidence based on test content might well be the preferred approach toward validation.

Taken together, once one debunks the notion that personality measures somehow represent a different category of construct measurement than other “non-construct” variables, and further acknowledges that criterion-related validity evidence (although extremely useful in many situations) should not be treated, de facto, as the strategy of choice, the idea of gathering content validity evidence for personality measures becomes much more relevant. In the words of Binning and Barrett (1989), “One could reasonably argue that content-related and construct-related evidence, when based on sound professional judgment about appropriate test use, are often superior to criterion-related evidence” (p. 484).
Focus of the Content Match

Another issue that may be fueling misunderstanding regarding the use of content validity evidence has to do with the focus of the content match. This issue becomes apparent when one carefully examines the various definitions for content validity discussed previously and shown in Table 1. For example, the Uniform Guidelines (1978) indicate that content validity is applicable when it can be shown that “the content of the selection procedure is representative of important aspects of performance on the job” (Section 5.B). This suggests that the primary focus of the content match should be between the content of the selection procedure and representative elements of job performance, such as work tasks or behaviors. In contrast, the Joint Standards (1999/2014) indicate that content validity is applicable whenever it can be shown that there is an overlap between the content of a test and the construct that is the focus of measurement, which for employee selection is often some worker attribute or requirement. Hence, there appears to be a duality of focus when it comes to content validity evidence, where some would argue such evidence is derived from documenting overlap with the job performance domain of application (e.g., specific tasks), whereas others would argue that content-based evidence is based on content overlap with the construct domain the test is intended to measure (e.g., some personal attribute).

Clarity regarding this issue can be found by considering the distinction between work samples versus signs (Binning & Barrett, 1989; Wernimont & Campbell, 1968). In the context of employee selection procedures, samples refer to those assessments that directly implicate or elicit behaviors relevant to performance on the job, such as the typing test from Landy’s (1986) example discussed above, where the test elicits behavior (i.e., typing) that can be seen as interchangeable with relevant on-the-job behavior. In contrast, employee selection measures characterized as signs refer to those assessments that do not directly target behaviors from the performance domain, but nonetheless attempt to assess attributes or capabilities that are thought to be relevant for job performance. An illustration of this would be the reasoning test from Landy’s example, where the actual behaviors elicited by the assessment (e.g., reading logic problems, completing multiple-choice questions) may be less obviously relevant to primary job functions, but the test nevertheless measures an attribute that is undoubtedly critical for effective performance in many occupations (i.e., the capacity for effective reasoning).

In light of this distinction, employee selection measures that fall on the work-sample end of the spectrum are likely to focus on work tasks or behaviors, while those that fall on the sign end of the spectrum are likely to focus on worker attributes and requirements. By extension, it appears that the Uniform Guidelines (1978) primarily permit the use of content validity evidence in relation to work samples that target important tasks and behaviors, whereas the SIOP Principles (2003) and Joint Standards (1999/2014) would additionally permit the use of content validity evidence for sign-based measures that focus on job-relevant personal capacities and worker requirements. Importantly, although both work samples and signs can be used as predictors for employee selection, these two meaningfully differ in terms of whether the behaviors implicated and/or elicited by the test are isomorphic with, or functionally similar to, the performance construct domain (Binning & Barrett, 1989; Binning & LeBreton, 2009). More specifically, work samples tend to implicate or elicit behaviors that exhibit a high degree of isomorphism with performance behaviors, whereas this tends to be less true (although not necessarily untrue) for sign-based measures.

The above discussion helps to clarify the divergent perspectives concerning the focus of the content match. First, work samples tend to be constructed with the intention of sampling from the performance domain, as exhibited in the increased isomorphism with performance behaviors (Binning & Barrett, 1989). As such, the appropriate focus of content validity in the case of work-sample-based measures should be the degree of match between the content of the measure and the job performance domain (Binning & LeBreton, 2009). Indeed, this appears to be the central logic espoused in the Uniform Guidelines (1978), which primarily permit the use of content validity evidence in relation to work samples that target important tasks and behaviors. Second, sign-based measures tend to be constructed with the intention of sampling from a separate construct domain that is technically distinct from, yet conceptually relevant to, the job performance domain (Binning & Barrett, 1989). Therefore, the appropriate focus of content validity in the case of sign-based measures should be the degree of match between the content of the selection measure and whatever distinct construct domain represents the target of measurement (Binning & LeBreton, 2009). The logic behind this latter assertion appears consistent with the SIOP Principles (2003) and Joint Standards (1999/2014), which additionally permit the use of content validity evidence for sign-based measures that focus on job-relevant worker requirements and attributes.
Understanding the Inferences to Be Substantiated

As described previously, the collection of validity evidence is ultimately aimed at substantiating the inferences made from test scores. However, a careful consideration of the validation process indicates that there are several potential inferences that might be of relevance, especially when the intended purpose of testing is taken into account (Binning & Barrett, 1989). For example, one might aim to substantiate the inference that test scores accurately reflect varying levels of the underlying construct being measured. While this is, indeed, a crucial inference with regard to validation, it does not necessarily capture the intended use of the test. As such, it is additionally relevant to substantiate the inference that test scores (and corresponding levels of the target construct) have relevance with regard to the purpose of testing. In the case of employee selection tests, this purpose typically informs inferences about job performance. This, in turn, suggests the importance of understanding the performance domain, which may additionally implicate other inferences, such as substantiating the degree to which operational measures of job performance accurately reflect an underlying performance construct. Given these various potential inferences, it becomes important to clearly understand the specific inferences that must be addressed as part of the validation process. Toward this end, a better understanding of such inferences can be achieved by visually depicting the relevant inferences that are necessary for linking the test in question to the construct it is intended to measure, in addition to the purpose of testing. This can be seen in Figure 1a, which is based on the seminal work of Binning and Barrett (1989; also see Arthur & Villado, 2008; Binning & LeBreton, 2009; Guion, 2004; Joint Standards, 1999/2014).

Figure 1. Inferences in the validation process: (a) common framework for depicting inferences in the validation process; (b) modified framework for depicting inferences in the validation process.

The framework depicted in Figure 1a allows one to clearly discuss the specific inferences involved in various approaches to validation. First and foremost, given that the ultimate purpose of most (if not all) employee selection measures is to understand job performance, it has been argued that the inference linking the predictor measure to the job performance construct domain represents the most critical inference in the employee selection context (Binning & Barrett, 1989). This inference is depicted in Figure 1a as Inference 1. To the extent that the predictor measure is work-sample-based, and, thus, exhibits a high degree of isomorphism with the job performance domain, this primary link can be directly substantiated by documenting the degree of overlap (or content match) between the predictor assessment and job performance (Binning & LeBreton, 2009). This is consistent with what the Uniform Guidelines (1978) describe as a content validity study for a selection measure that focuses on work tasks or behaviors, and is further consistent with what the SIOP Principles (2003) and Joint Standards (1999/2014) would consider evidence for validity based on test content. However, to the degree that the selection measure in question is sign-based, and, thus, departs from isomorphism with job performance, the direct substantiation of Inference 1 becomes more tenuous, in turn, requiring indirect substantiation via the pairing of additional inferences, which nonetheless represents an appropriate means of validation (Binning & Barrett, 1989; Binning & LeBreton, 2009; Joint Standards, 1999/2014).

A second viable approach for validation would be to collectively substantiate Inference 2 and Inference 3, as shown in Figure 1a. Inference 2 represents the degree to which the operational predictor measure reflects the underlying construct it is purported to measure, while Inference 3 represents the degree to which the underlying predictor construct is relevant to the job performance domain. Using the nomenclature of the traditional trinitarian view of validity, this approach might be labeled as construct validity (Binning & Barrett, 1989; Joint Standards, 1999/2014), given that reference is made to underlying constructs.
However, in light of the fact that contemporary conceptualizations of validity eschew the notion of a distinct form of “construct validity,” Binning and LeBreton (2009) argue that this approach is better characterized as content-based evidence. Specifically, although the indirect approach discussed here is distinct from the direct substantiation of Inference 1 described above, both approaches rely heavily on comparing predictor content to some underlying construct domain. The difference lies in the fact that the substantiation of Inference 1 compares the predictor content with the job performance construct domain, while the substantiation of Inference 2 compares the predictor content with the underlying predictor construct domain. Furthermore, to account for the less direct route of substantiation in the latter approach, additional evidence is required to bridge the larger inferential leap to the job performance domain, which is reflected in Inference 3. Importantly, the collective examination of Inference 2 and Inference 3 represents an appropriate means for deriving content validity evidence for sign-based selection measures that exhibit less isomorphism with the job performance domain (Binning & LeBreton, 2009). Indeed, this logic is explicitly included in the Joint Standards (1999/2014; see pp. 172–173) and also implicitly described in the Uniform Guidelines, where it is stated that,

For any selection procedure measuring a knowledge, skill, or ability [i.e., sign-based measure] the user should show that (a) the selection procedure measures and is a representative sample of that knowledge, skill, or ability [i.e., Inference 2]; and (b) that knowledge, skill, or ability is used in and is a necessary prerequisite to performance of critical or important work behavior(s) [i.e., Inference 3]. (Section 14.C.4)

Another approach to validation suggested by Figure 1a would be to collectively substantiate Inference 4 and Inference 5. Here, Inference 4 represents the empirical relationship between the predictor measure and a job performance/criterion measure, while Inference 5 represents the degree to which the performance/criterion measure reflects the underlying performance construct domain it is intended to capture. This approach is analogous to what the Uniform Guidelines (1978) would characterize as a criterion-related validity study, and is further consistent with what the SIOP Principles (2003) and Joint Standards (1999/2014) would describe as evidence for validity based on relations to other variables. In practice, however, criterion-related validity studies often focus primarily and/or exclusively on Inference 4, to the exclusion of Inference 5. Unfortunately, this means that validation efforts of this nature are typically completed with only cursory reference to underlying theory or consideration for the implicated construct domains. This again highlights the fact that criterion-related validation should not necessarily or always be seen as the optimal strategy, especially from the perspective of generating a theory-based and scientifically grounded (as opposed to purely empirical) understanding of tests and measures used for employee selection. Conversely, when both Inference 4 and Inference 5 are given due consideration, one can see that this approach mirrors the approach pertaining to Inferences 2 and 3, suggesting two different but comparably informative approaches to understanding the relevance of the focal predictor measure to the underlying job performance domain.

Returning to the topic of content validity, the above discussion suggests two potentially viable approaches for examining content-based evidence pertaining to an employee selection test. First, to the degree that the measure is work-sample-based, and, thus, exhibits a high degree of isomorphism with the job performance domain, the focus should be on Inference 1. This can be referred to as validity evidence based on test content for work-sample-based measures. Second, to the extent that the measure is sign-based, and, thus, departs from isomorphism with job performance, the focus should be collectively on Inference 2 and Inference 3. This can be referred to as validity evidence based on test content for sign-based measures. Figure 1b presents a modified framework that accounts for these divergent approaches to validation. Importantly, this framework is consistent with a contemporary conceptualization of validity that places primary emphasis on inferences in the validation process and also explicitly acknowledges the purpose of testing. At the same time, this framework also explicitly represents the degree of inferential leap necessary for linking the predictor measure with the performance construct domain, an issue that is of paramount importance in the Uniform Guidelines (1978). Specifically, as sign-based measures exhibit lower isomorphism with the job performance domain as compared with work-sample-based measures, additional evidence is needed to bridge the larger inferential leap, which is manifested in the requirement to substantiate two inferences (i.e., Inferences 2 and 3), as opposed to just one (i.e., Inference 1). As aptly summarized by Binning and LeBreton (2009), “content validation involves either (a) directly matching predictor content to criterion CDs [construct domains] or (b) matching predictor content to psychological CDs which are in turn related to criterion CDs (i.e., delineating psychological traits believed to influence job behavior)” (p. 489).
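To make the two content-based routes concrete, the sketch below summarizes, in code, which inferences each validation route must substantiate according to the framework described above. This is only an illustrative summary of the logic, not an implementation drawn from the article; the function and route labels are our own assumptions.

```python
# Illustrative sketch (not from the source article): a minimal lookup that
# records which inferences each validation route must substantiate, following
# the Binning and Barrett (1989) style framework discussed in the text.

VALIDATION_ROUTES = {
    # Content-based evidence for work-sample-based measures:
    # predictor content is matched directly to the job performance domain.
    "content_work_sample": [1],
    # Content-based evidence for sign-based measures (e.g., personality):
    # predictor content is matched to its own construct domain (Inference 2),
    # and that construct is linked to the performance domain (Inference 3).
    "content_sign_based": [2, 3],
    # Criterion-related evidence: an empirical predictor-criterion relation
    # (Inference 4) plus evidence that the criterion measure reflects the
    # underlying performance construct domain (Inference 5).
    "criterion_related": [4, 5],
}

def content_route(measure_is_work_sample: bool) -> list[int]:
    """Return the inferences a content-based strategy must substantiate."""
    key = "content_work_sample" if measure_is_work_sample else "content_sign_based"
    return VALIDATION_ROUTES[key]

# Example: a noncontextualized personality inventory is a sign-based measure,
# so a content strategy would need to substantiate Inferences 2 and 3.
print(content_route(measure_is_work_sample=False))  # [2, 3]
```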
How to Gather Content Evidence for Personality Measures?

Building on the preceding sections, our view is that evidence based on test content can and should be used as a means of validating personality test inferences for employee selection purposes. This practice is consistent with contemporary conceptualizations of validity, as embodied in the SIOP Principles (2003) and Joint Standards (1999/2014), which both place primary emphasis on inferences in the validation process and further view content-based evidence as one of several viable approaches to validating such inferences. Furthermore, the explicit prohibition against this practice in the Uniform Guidelines (1978) appears to be largely based on outdated notions of validity and the fallacious idea that tests focusing on constructs somehow represent a different category of measurement than other “non-construct” variables. Thus, content validity evidence should be treated as an appropriate means of validating personality test inferences.

Accordingly, we can now make informed recommendations regarding how best to collect content validity evidence for personality measures. There are at least two primary conduits through which one might maximize the content-relevance of personality tests and generate appropriate content-based evidence, including (a) maximizing isomorphism with the job performance domain during initial test development, and (b) substantiating the appropriate inferences via expert judgment after test development, but prior to operational use.

Maximizing Isomorphism During Test Development

One way to increase the content-relevance of personality measures used for employee selection is to draw directly from the job performance domain while developing the test. The difference between work-sample-based measures (that sample primarily from the job performance domain) and sign-based measures (that sample primarily from a distinct yet related construct domain) is a matter of degree, as opposed to a strict categorical distinction. In other words, psychological tests can move along this continuum by sampling predominantly from one domain or the other, in addition to a combination of both (Spengler et al., 2009). Of relevance, sampling from the performance domain has the effect of increasing the degree of isomorphism between the test content and job performance, in turn, moving the measure toward the work-sample end of the spectrum.

By way of illustration, traditional measures of personality such as those that might be found in the International Personality Item Pool (http://ipip.ori.org/; also see Goldberg et al., 2006) sample primarily from the personality construct domain of interest. As such, an example item for the trait of extraversion might include “I am the life of the party.” Conversely, personality measures specifically designed to be relevant in the workplace (e.g., Ellingson et al., 2013) sample both from the personality construct domain as well as the intended domain of application (i.e., work). Here, an example item for extraversion would be “I involve my coworker in what I am doing.” Notably, the personality statement in the latter example is more obviously and directly applicable to the job performance domain. Related to this, there is a growing body of literature that supports the practice of contextualizing personality scale content to specifically reference the domain of work (e.g., Hunthausen et al., 2003; Lievens et al., 2008; Schmit et al., 1995; Shaffer & Postlethwaite, 2012; also see Ones & Viswesvaran, 2001). While this research on contextualization does not primarily focus on the issue of content validity, the practice of modifying personality scale content to explicitly reference the work context nonetheless has the consequence of extending content-relevance to the job performance domain.
Indeed, it has been suggested that the use of custom-developed personality tests based on work-contextualization “is extremely valuable because it may open up content validation as a potential validation strategy” (Morgeson et al., 2007a, p. 1043).

In summary, the content-relevance of personality measures used for employee selection can be improved by sampling both from the personality construct domain and the job performance domain. This practice is ultimately manifested in the creation and use of personality scale items that directly reference and/or implicate the performance domain in question. Practically, this can be accomplished by first identifying critical features of performance via job analysis efforts and subsequently creating personality-based items that explicitly reflect the identified performance elements. If this is done, the personality construct of interest will ultimately be operationalized in terms of relevant behaviors and experiences that collectively comprise important aspects of performance on the job. Consequently, the more the personality items directly implicate or reference job behaviors (e.g., “I show up to work on time”), in addition to cognitive and affective performance-related experiences (e.g., “I get nervous when I talk to clients”), the lower the inferential leap necessary for linking the content of the test directly to the job performance domain.
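To illustrate the item-writing practice just described, the snippet below pairs generic, IPIP-style trait statements with hypothetical work-contextualized counterparts of the kind a job analysis might suggest. Apart from the two extraversion items quoted above, the item wordings and trait labels are our own illustrative assumptions, not items drawn from the article or from any published inventory.

```python
# Illustrative sketch only: hypothetical item pairs showing how a generic trait
# statement can be rewritten to reference performance elements identified
# through job analysis. None of these is a validated scale item.

CONTEXTUALIZED_ITEMS = [
    {
        "trait": "extraversion",
        "generic_item": "I am the life of the party.",             # quoted in the text above
        "work_item": "I involve my coworker in what I am doing.",  # quoted in the text above
    },
    {
        "trait": "conscientiousness",
        "generic_item": "I pay attention to details.",                          # hypothetical
        "work_item": "I double-check client records before submitting them.",   # hypothetical
    },
    {
        "trait": "emotional stability",
        "generic_item": "I remain calm under pressure.",                        # hypothetical
        "work_item": "I stay composed when a customer raises a complaint.",     # hypothetical
    },
]

for item in CONTEXTUALIZED_ITEMS:
    print(f"{item['trait']}: '{item['generic_item']}' -> '{item['work_item']}'")
```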
Substantiation of Appropriate Inferences via Expert Judgment

The considerations discussed in the preceding section are only applicable if a new test is being created or if the opportunity to modify an existing general-focused personality test is available. The topic of the current section is generating content-based evidence for existing or unmodifiable personality measures to be used for employee selection. For this, the primary mechanism of generating such evidence involves eliciting expert judgment regarding content-relevance. More specifically, when the personality measure in question exhibits a high degree of isomorphism with the performance domain of interest (e.g., contextualized personality scales), content validity evidence can be generated by eliciting expert judgment regarding the direct overlap (or content match) between the personality measure content and the job performance domain (i.e., Inference 1 from Figure 1). However, to the extent that this is not the case, as with noncontextualized or general-focused personality measures, content-based validation would proceed via expert judgment regarding Inference 2 and Inference 3. Example scales that might be used to substantiate these various inferences via expert judgment are shown in Figure 2. These scales can be used to assist in the collection of content-based validity evidence.

The first approach involves substantiating the direct link between the predictor measure and the job performance construct domain (i.e., Inference 1). In other words, expert judges are asked to indicate the degree to which the specific items that comprise the test are directly relevant to performance on the job. A common method for quantifying such ratings is the content validity ratio (Lawshe, 1975), which, as originally conceptualized, asks subject-matter experts (SMEs) whether a particular skill or knowledge area measured by test items is “Essential,” “Useful but not essential,” or “Not necessary” for performance of the job in question. Although the content validity ratio was originally intended for tests that focus on knowledge or skills, with slight modifications, it can be applied to measures of personality as well. Examples of this can be seen in Figure 2. Other scales may also be created to serve the same purpose as long as they adequately capture the degree to which the test content is relevant to the job performance domain of interest.
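For readers who want to quantify such judgments, a minimal sketch of Lawshe's (1975) content validity ratio is shown below. The ratio for a given item is CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs rating the item "Essential" and N is the total number of SMEs. The panel size, items, and ratings shown are hypothetical, and decisions about retaining items should rely on Lawshe's published critical values rather than on this sketch.

```python
# Minimal sketch: computing Lawshe's (1975) content validity ratio (CVR) for
# each item from SME judgments of job relevance (Inference 1). The ratings
# below are hypothetical; thresholds for retaining items should come from
# Lawshe's critical values, which are not reproduced here.

def content_validity_ratio(ratings: list[str]) -> float:
    """CVR = (n_e - N/2) / (N/2), where n_e = number of 'Essential' ratings."""
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == "Essential")
    return (n_essential - n / 2) / (n / 2)

# Hypothetical panel of 8 SMEs rating two personality items for job relevance,
# using Lawshe's three response options.
sme_ratings = {
    "I show up to work on time": [
        "Essential", "Essential", "Essential", "Essential",
        "Essential", "Essential", "Useful but not essential", "Essential",
    ],
    "I am the life of the party": [
        "Not necessary", "Useful but not essential", "Not necessary", "Essential",
        "Not necessary", "Useful but not essential", "Not necessary", "Not necessary",
    ],
}

item_cvrs = {item: content_validity_ratio(r) for item, r in sme_ratings.items()}
for item, cvr in item_cvrs.items():
    print(f"{item}: CVR = {cvr:+.2f}")

# Averaging item-level CVRs gives a simple scale-level summary of content relevance.
print(f"Mean CVR across items: {sum(item_cvrs.values()) / len(item_cvrs):+.2f}")
```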
Figure 2. Example scales for substantiating relevant inferences related to content validation.

The second approach involves substantiating the link between the test in question and the underlying construct that it is purported to measure (i.e., Inference 2), in addition to the link between the predictor construct and the underlying job performance domain (i.e., Inference 3). Aguinis et al.’s (2001) discussion of the content validation ratio suggests that it can also be modified to serve the purpose of substantiating Inference 2, as was the case with Inference 1 above. As noted by Aguinis et al., expert judges can be asked to “rate whether each item is essential, useful but not essential, or not necessary for measuring the attribute [emphasis added]” (p. 38). Thus, in accordance with the above discussion, the primary distinction between the methods for substantiating Inference 1 and Inference 2 is that the former focuses on the job performance domain whereas the latter focuses on the predictor construct domain. Again, other scales may be created to serve the same purpose as long as they adequately capture the degree to which the test content is reflective of the underlying predictor construct (see Figure 2).

In terms of Inference 3, Arthur and Villado (2008) indicate that this inference “is established via job analysis processes that are intended to identify the predictor constructs deemed requisite for the successful performance of the specified job (or performance) behaviors in question” (p. 436). In this regard, personality-oriented job analysis efforts can be used to identify and substantiate the relevance of particular traits for the job under investigation (O’Neill et al., 2013; Raymark et al., 1997; Tett & Burnett, 2003; Tett & Christiansen, 2007). For example, as shown in Figure 2, Raymark et al. (1997) adopted the following stem for their Personality-Related Position Requirements Form: “Effective performance in this position requires the person to . . .” (p. 724). Taken together, expert judgment regarding the above inferences constitutes evidence for validity based on test content (Binning & LeBreton, 2009).
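As a rough illustration of how the two sets of judgments might be brought together for a sign-based (e.g., noncontextualized) personality measure, the sketch below screens hypothetical items on construct relevance (Inference 2) and conditions the whole exercise on a trait-level job-relevance judgment of the kind a personality-oriented job analysis would provide (Inference 3). The rating values, cutoff, and trait judgment are placeholders for illustration only, not recommended standards.

```python
# Illustrative sketch only: combining expert judgments about Inference 2
# (does each item reflect the intended trait?) with a judgment about
# Inference 3 (is the trait required for the job, per a personality-oriented
# job analysis?). All values are hypothetical, and the 0.50 cutoff is a
# placeholder rather than a recommended standard.

# Inference 2: item-level content validity ratios for "measuring the attribute"
# (Aguinis et al., 2001 style judgments), computed as in the previous sketch.
item_cvr_for_construct = {
    "I am the life of the party": 0.75,
    "I keep in the background": 0.50,
    "I have a vivid imagination": -0.25,  # weak match to extraversion
}

# Inference 3: trait-level relevance established via job analysis
# (e.g., ratings on a Raymark et al., 1997 style position requirements form).
trait_required_for_job = {"extraversion": True}

ITEM_CUTOFF = 0.50  # placeholder retention threshold

def retain_items(trait: str, item_cvrs: dict[str, float]) -> list[str]:
    """Keep items only if the trait is job-relevant and each item reflects it."""
    if not trait_required_for_job.get(trait, False):
        return []  # Inference 3 unsupported: content evidence cannot be built
    return [item for item, cvr in item_cvrs.items() if cvr >= ITEM_CUTOFF]

print(retain_items("extraversion", item_cvr_for_construct))
# ['I am the life of the party', 'I keep in the background']
```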
Discussion

There are conflicting recommendations regarding the use of content validity evidence to support personality test inferences for employee selection. Unfortunately, inconsistencies of this nature may be the inevitable result of various constituencies (e.g., legal, professional, scientific) jointly vying to determine standards and guidelines within the collective enterprise of test use and validation (see Binning & Barrett, 1989; Landy, 1986; McDaniel et al., 2011). As such, it may be an unrealistic ideal to achieve a perfect resolution that fully addresses any and all potential inconsistencies. Nonetheless, it is our hope that the discussion presented above has ameliorated the ambiguity to some meaningful degree. In particular, we believe that the recommendations above concerning content validity evidence and personality testing are, to the greatest extent possible, consistent with the spirit and intention of prevailing standards and guidelines, even if not with certain technical proscriptions (Uniform Guidelines, 1978). That being said, there remain a few caveats to be discussed below.

First, aside from the explicit prohibition against content validity for traits and constructs, another requirement in the Uniform Guidelines (1978) is that tests validated via a content-based strategy should focus on attributes that are operationally defined in terms of observable job behaviors (Section 14.C.4). This is potentially problematic for measures of personality, as personality constructs implicate not only observable behavioral manifestations (e.g., “I show up to work on time”), but also various cognitive and affective experiences (e.g., “I get nervous when I talk to clients”). As a potential solution, when considering the cognitive and affective experiences implicated by a particular construct, it is helpful to think about the work-relevant behavioral consequences of such experiences (as informed by job analysis efforts), and correspondingly develop items to reflect such behaviors. Despite this potential solution, we view the strict requirement to operationalize all measures that are subject to content validation in terms of observable behaviors as problematic. Specifically, in an information-based economy, many critical features of job performance may not be directly or obviously observable.
As a result, measures that are limited to observable behaviors may suffer from construct deficiency to the extent that non-observable experiences represent important aspects of the construct domain. Furthermore, the recommendations above provide viable paths for content validation regardless of whether the test in question exhibits complete isomorphism with performance in terms of observable job behaviors. To the extent that it does, the substantiation of Inference 1 represents an appropriate focus. Conversely, to the extent that it does not, the collective substantiation of Inference 2 and Inference 3 becomes an appropriate alternative (Binning & LeBreton, 2009). Regardless, those concerned about strict adherence to the Uniform Guidelines (1978) should consider limiting the personality items used to those that directly reference observable behavior.

Second, considering the critical attention given to the Uniform Guidelines (1978), it might come across as though our goal is to somehow vilify these guidelines. This is certainly not the case. The guidelines were created with admirable intentions regarding standardization of validation efforts and the prevention of discrimination in employment practices. These are extremely important concerns, and we support efforts that strive to accomplish these noble ideals. At the same time, it is important that efforts of this nature are concordant with contemporary scientific understanding pertaining to the subject matter. Unfortunately, as described by McDaniel et al. (2011), the Uniform Guidelines have “not been revised in over 3 decades [and as a result are] substantially inconsistent with scientific knowledge and professional guidelines and practice” (p. 494). As such, our intention is not to disparage these guidelines, but rather to ensure that validation practices are consistent with contemporary conceptualizations of validity. We also believe that the value of the above discussion extends far beyond the Uniform Guidelines and employee selection in the United States. As noted in the introduction, it is important to carefully consider appropriate validation strategies so that accurate inferences are made about all test-takers, regardless of when, where, or why testing and validation efforts occur. Accordingly, the discussion presented herein is likely to be of relevance for content-based validation efforts in all areas that utilize psychological testing, including, for example, educational testing, clinical practice, and international employee selection efforts.
Third, not everyone may agree with our choice of terminology in relation to the two paths for content-based validation described above. Specifically, we opted to label the substantiation of Inference 1 as validity evidence based on test content for work-sample-based measures and the collective substantiation of Inferences 2 and 3 as validity evidence based on test content for sign-based measures (see Figure 1). In considering our choice of terminology, we explored three possible avenues. First, one possibility was to adopt terminology that is consistent with the historical trinitarian view of validity, which places primary emphasis on three “types” of validity: content, criterion, and construct. From this perspective, Inference 1 might be labeled as content validity and Inferences 2 and 3 as construct validity. However, as is clearly outlined above, this trinitarian view is considered problematic and inconsistent with contemporary thinking regarding validation. Second, another possibility was to create a new label to replace construct validity for Inferences 2 and 3, given that a primary criticism of the historical trinitarian view is the notion of a separate category of construct validity. While we understand the appeal of this second approach, we also believe that creating a new label might have the unintended consequence of adding further confusion to the literature, especially considering that there is already much potential for confusion in the arena of validation terminology. This brings us to our third and favored approach, which is to adopt terminology that is consistent with a contemporary perspective on validity, wherein validity is a unitary concept with various sources of supporting evidence. From this perspective, the labeling of different inferential pathways should be based on a thoughtful consideration of which form of validity evidence best aligns with the validation activities implicated by the paths of interest. Regarding validity evidence based on test content, this broadly refers to an analysis of whether the content of a measure adequately reflects (or samples from) a relevant underlying construct domain, which is precisely what is involved in both Inference 1 and the combination of Inferences 2/3.

Finally, and perhaps most critically, we are by no means arguing that a content-based strategy should necessarily be the preferred or only method of validation. The important point is that content-based evidence is not inherently more or less appropriate than other sources of validity evidence. Rather, evidence based on test content represents one of several potential sources (Joint Standards, 1999/2014), and the particular circumstances of the validation effort should be carefully considered before determining which source(s) of evidence are most appropriate. Validity generalization, validity transport, and synthetic validity also represent viable options. Furthermore, validation efforts need not (and should not) be limited to just one form of evidence. For example, content-based evidence can be combined with evidence pertaining to an empirical relationship with a criterion measure, which, in turn, would result in a stronger validity argument. Or, to the extent that faking or response distortion is a concern (Morgeson et al., 2007b; Ones et al., 2007), evidence pertaining to response processes might be gathered as well. In a similar vein, although we discuss two potential paths for content validation (i.e., Inference 1 vs. Inferences 2 and 3; see Figure 1), this does not preclude efforts at validating all three inferences, which again would result in a stronger validity argument. Ultimately, “the process of validation involves accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations” (Joint Standards, 1999/2014, p. 11). To the extent feasible, the more evidence the better.

Authors’ Note

An earlier version of this manuscript was presented as a poster session at the 31st Annual Conference of the Society for Industrial and Organizational Psychology in Anaheim, California.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was, in part, supported by a Faculty Development Summer Fellowship Grant awarded by The University of Tulsa to the first author.

ORCID iD

David M. Fisher https://orcid.org/0000-0002-7810-3494

References

Aguinis, H., Henle, C. A., & Ostroff, C. (2001). Measurement in work and organizational psychology. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work and organizational psychology (Vol. 1, pp. 27–50). SAGE.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. (Original work published 1999)
Arthur, W., Jr., & Villado, A. J. (2008). The importance of distinguishing between constructs and methods when comparing predictors in personnel selection research and practice. Journal of Applied Psychology, 93, 435–442. https://doi.org/10.1037/0021-9010.93.2.435
Biddle, D. A. (2011). Adverse impact and test validation: A practitioner’s handbook (3rd ed.). Infinity Publishing.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74, 478–494. https://doi.org/10.1037/0021-9010.74.3.478
Binning, J. F., & LeBreton, J. M. (2009). Coherent conceptualization is useful for many things, and understanding validity is one of them. Industrial and Organizational Psychology, 2, 486–492. https://doi.org/10.1111/j.1754-9434.2009.01178.x
Brannick, M. T., Levine, E. L., & Morgeson, F. P. (2007). Job and work analysis: Methods, research, and applications for human resource management (2nd ed.). SAGE.
Ellingson, J. E., Heggestad, E. D., & Myers, H. (2013). The workplace IPIP: A contextualized measure of personality [Unpublished manuscript].
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43, 38290–38315.
Furr, R. M., & Bacharach, V. R. (2014). Psychometrics: An introduction (2nd ed.). SAGE.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. C. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96. https://doi.org/10.1016/j.jrp.2005.08.007
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis–content validity process. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 3–34). Jossey-Bass.
Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385–398. https://doi.org/10.1037/0735-7028.11.3.385
Guion, R. M. (2004). Validity and reliability. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 57–76). Blackwell Publishing.
Haynes, S. N., Richard, D., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238–247. https://doi.org/10.1037/1040-3590.7.3.238
Hsu, C. (2004). The testing of America. U.S. News and World Report, 137, 68–69.
Hunthausen, J. M., Truxillo, D. M., Bauer, T. N., & Hammer, L. B. (2003). A field study of frame-of-reference effects on personality test validity. Journal of Applied Psychology, 88, 545–551. https://doi.org/10.1037/0021-9010.88.3.545
Jeanneret, P. R., & Zedeck, S. (2010). Professional guidelines/standards. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp. 593–625). Routledge.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist, 41, 1183–1192. https://doi.org/10.1037/0003-066X.41.11.1183
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Lawshe, C. H. (1985). Inferences from personnel tests and their validity. Journal of Applied Psychology, 70, 237–238. https://doi.org/10.1037/0021-9010.70.1.237
Lievens, F., De Corte, W., & Schollaert, E. (2008). A closer look at the frame-of-reference effect in personality scale scores and validity. Journal of Applied Psychology, 93(2), 268–279. https://doi.org/10.1037/0021-9010.93.2.268
McDaniel, M. A., Kepes, S., & Banks, G. C. (2011). The uniform guidelines are a detriment to the field of personnel selection. Industrial and Organizational Psychology, 4, 494–514. https://doi.org/10.1111/j.1754-9434.2011.01382.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007a). Are we getting fooled again? Coming to terms with limitations in the use of personality tests for personnel selection. Personnel Psychology, 60, 1029–1047. https://doi.org/10.1111/j.1744-6570.2007.00100.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007b). Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60, 683–729. https://doi.org/10.1111/j.1744-6570.2007.00089.x
Murphy, K. R., Dzieweczynski, J. L., & Zhang, Y. (2009). Positive manifold limits the relevance of content-matching strategies for validating selection test batteries. Journal of Applied Psychology, 94, 1018–1031. https://doi.org/10.1037/a0014075
O’Neill, T. A., Goffin, R. D., & Rothstein, M. (2013). Personality and the need for personality-oriented work analysis. In N. D. Christiansen & R. P. Tett (Eds.), Handbook of personality at work (pp. 226–252). Routledge.
O’Neill, T. A., Goffin, R. D., & Tett, R. P. (2009). Content validation is fundamental for optimizing the criterion validity of personality tests. Industrial and Organizational Psychology, 2, 509–513. https://doi.org/10.1111/j.1754-9434.2009.01184.x
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027. https://doi.org/10.1111/j.1744-6570.2007.00099.x
Ones, D. S., & Viswesvaran, C. (2001). Integrity tests and other criterion-focused occupational personality scales (COPS) used in personnel selection. International Journal of Selection and Assessment, 9, 31–39. https://doi.org/10.1111/1468-2389.00161
Raymark, P. H., Schmit, M. J., & Guion, R. M. (1997). Identifying potentially useful personality constructs for employee selection. Personnel Psychology, 50, 723–736. https://doi.org/10.1111/j.1744-6570.1997.tb00712.x
  • 59. Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selec- tion: What does current research support? Human Resource Management Review, 16, 155– 180. https://doi.org/10.1016/j.hrmr.2006.03.004 Schmit, M. J., & Ryan, A. M. (2013). Legal issues in personality testing. In N. D. Christiansen & R. P. Tett (Eds.), Handbook of personality at work (pp. 525– 542). Routledge. Schmit, M. J., Ryan, A. M., Stierwalt, S. L., & Powell, A. B. (1995). Frame-of-reference effects on personality scale scores and criterion-related validity. Journal of Applied Psychology, 80, 607–620. https://doi.org/10.1037/0021-9010.80.5.607 Schmitt, M. (2006). Conceptual, theoretical, and historical foundations of multimethod assess- ment. In M. Eid & E. Diener (Eds.), Handbook of multimethod measurement in psychology (pp. 9–25). American Psychological Association. Shaffer, J. A., & Postlethwaite, B. W. (2012). A matter of context: A meta-analytic investiga- tion of the relative validity of contextualized and noncontextualized personality measures. Personnel Psychology, 65, 445–494. https://doi.org/10.1111/j.1744-6570.2012.01250.x Sireci, S. G. (1998). The construct of content validity. Social Indicators Research, 45, 83–117. https://doi.org/10.1023/A:1006985528729 Society for Industrial and Organizational Psychology. (2003). Principles for the validation and
  • 60. use of personnel selection procedures (4th ed.). Spengler, M., Gelléri, P., & Schuler, H. (2009). The construct behind content validity: New approaches to a better understanding. Industrial and Organizational Psychology, 2, 504– 508. https://doi.org/10.1111/j.1754-9434.2009.01183.x Tan, J. A. (2009). Babies, bathwater, and validity: Content validity is useful in the validation process. Industrial and Organizational Psychology, 2, 514–516. https://doi.org/10.1111/ j.1754-9434.2009.01185.x Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job per- formance. Journal of Applied Psychology, 88, 500–517. https://doi.org/10.1037/0021- 9010.88.3.500 Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993. https://doi.org/10.1111/j.1744- 6570.2007.00098.x Tett, R. P., Simonet, D. V., Walser, B., & Brown, C. (2013). Trait activation theory: Applications, developments, and implications for person–workplace fit. In N. D. Christiansen & R. P. Tett (Eds.), Handbook of personality at work (pp. 71–100). Routledge. Thornton, G. C., III. (2009). Evidence of content matching is evidence of validity.
Author Biographies

David M. Fisher is an assistant professor of psychology at The University of Tulsa. Prior to his academic position, he did consulting work that focused on selection and testing for public safety agencies. His research interests include employee selection, organizational work teams, and occupational health/resilience.
Christopher R. Milane is a senior project manager of research services at Qualtrics. The majority of this manuscript was written while he was a graduate student at The University of Tulsa. His research interests include employee selection, organizational work teams, and leadership development.

Sarah Sullivan is the department coordinator at the Doerr Institute for New Leaders at Rice University. The majority of this manuscript was written while she was a graduate student at The University of Tulsa. Her research interests include leadership development, employee selection, and organizational work teams.

Robert P. Tett is professor of Industrial-Organizational (I-O) Psychology and director of the I-O Graduate Program at The University of Tulsa, where he teaches courses in personnel selection, psychometrics, statistics, personality at work, and evolutionary psychology. His research targets personality trait-situation interactions, meta-analysis, leadership competencies, and trait emotional intelligence.
Methodological and Statistical Advances in the Consideration of Cultural Diversity in Assessment: A Critical Review of Group Classification and Measurement Invariance Testing

Kyunghee Han, Stephen M. Colarelli, and Nathan C. Weed
Central Michigan University

Psychological Assessment, 2019, Vol. 31, No. 12, 1481–1496
http://dx.doi.org/10.1037/pas0000731
© 2019 American Psychological Association

One of the most important considerations in psychological and educational assessment is the extent to which a test is free of bias and fair for groups with diverse backgrounds. Establishing measurement invariance (MI) of a test or items is a prerequisite for meaningful comparisons across groups as it ensures that test items do not function differently across groups. Demonstration of MI is particularly important in assessment settings where test scores are used in decision making. In this review, we begin with an overview of test bias and fairness, followed by a discussion of issues involving group classification, focusing on categorizations of race/ethnicity and sex/gender. We then describe procedures used to establish MI, detailing steps in the implementation of multigroup confirmatory factor analysis, and discussing recent developments in alternative procedures for
establishing MI, such as the alignment method and moderated nonlinear factor analysis, which accommodate reconceptualization of group categorizations. Lastly, we discuss a variety of important statistical and conceptual issues to be considered in conducting multigroup confirmatory factor analysis and related methods and conclude with some recommendations for applications of these procedures.

Public Significance Statement
This article highlights some important conceptual and statistical issues that researchers should consider in research involving MI to maximize the meaningfulness of their results. Additionally, it offers recommendations for conducting MI research with multigroup confirmatory factor analysis and related procedures.

Keywords: test bias and fairness, categorizations of race/ethnicity and sex/gender, measurement invariance, multigroup CFA

Supplemental materials: http://dx.doi.org/10.1037/pas0000731.supp

When psychological tests are used in diverse populations, it is assumed that a given test score represents the same level of the underlying construct across groups and predicts the same outcome score. Suppose that two hypothetical examinees, a middle-aged Mexican immigrant woman and a Jewish European American male college student, each produced the same score on a measure of depression. We would like to conclude that the examinees exhibit the same severity and breadth of depression symptoms and that their therapists would rate them similarly on relevant behavioral and symptom measures.
If empirical evidence indicates otherwise, and such conclusions are not justified, scores on the measure are said to be biased. Although it has been defined variously, a representative definition refers to psychometric bias as "systematic error in estimation of a value." A biased test "is one that systematically overestimates or underestimates the value of the variable it is intended to assess" due to group membership, such as ethnicity or gender (Reynolds & Suzuki, 2013, p. 83). The "value of the variable it is intended to assess" can either be a "true score" (see S1 in the online supplemental materials) on the latent construct or a score on a specified criterion measure. The former application concerns what is sometimes termed measurement bias, in which the relationship between test scores and the latent attribute that these test scores measure varies for different groups (Borsboom, Romeijn, & Wicherts, 2008; Millsap, 1997), whereas the latter application concerns what is referred to as predictive bias, which entails systematic inaccuracies in the prediction of a criterion from a test depending upon group membership (Cleary, 1968; Millsap, 1997).
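Stated a bit more formally, and purely as an illustrative rendering in common notation rather than a quotation from the sources cited above: let X denote the observed test score, \theta the latent attribute the test is intended to measure, Y a criterion of interest, and G group membership. Absence of measurement bias (measurement invariance) then requires
\[ P(X \mid \theta, G) = P(X \mid \theta), \]
that is, the distribution of observed scores given the latent attribute does not depend on group, whereas absence of predictive bias (predictive invariance) requires
\[ P(Y \mid X, G) = P(Y \mid X), \]
which in applied work is usually examined by testing whether the regression of Y on X has equal intercepts and slopes across groups.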
Kyunghee Han, Stephen M. Colarelli, and Nathan C. Weed, Department of Psychology, Central Michigan University. This article has not been published elsewhere, nor has it been submitted simultaneously for publication elsewhere. The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The author(s) received no funding for this study. Correspondence concerning this article should be addressed to Kyunghee Han, Department of Psychology, Central Michigan University, Mount Pleasant, MI 48859. E-mail: [email protected]
Test bias should not be confused with test fairness. Although the two concepts have been used interchangeably at times (e.g., Hunter & Schmidt, 1976), test fairness entails a broader and more subjective evaluation of assessment outcomes from perspectives of social justice (Kline, 2013), whereas test bias is an empirical property of test scores, estimated statistically (Jensen, 1980).
Appraisals of test fairness include multifaceted aspects of the assessment process, lack of test bias being only one facet (American Educational Research Association, American Psychological Association [APA], & National Council on Measurement in Education, 2014; Society for Industrial Organizational Psychology, 2018; see S2 in the online supplemental materials). In the example above, the measure of depression may be unfair for the Mexican female client if an English-language version of the measure was used without evaluating her English proficiency, if her score was derived using American norms only, if computerized administration was used, or if use of the test leads her to be less likely than members of other groups to be hired for a job. Although test bias is not a necessary condition for test unfairness to exist, it may be a sufficient condition (Kline, 2013). Accordingly, it is especially important to evaluate whether test scores are biased against vulnerable groups.

The evaluation of test bias and of test fairness each entails a comparison of one group of people with another. While asking the question, "Is a test biased?" we are also implicitly asking "against or for which group?" Similarly, if we are concerned about using a test fairly, we must ask: are the outcomes based on the results of the test apportioned fairly to groups of people who have taken
the test? Thus, the categorization of people into distinct groups is a sine qua non of many aspects of psychological assessment research. Racial/ethnic and sex/gender categories are prominent features of the social, cultural, and political landscapes in the United States (e.g., Helms, 2006; Hyde, Bigler, Joel, Tate, & van Anders, 2019; Jensen, 1980; Newman, Hanges, & Outtz, 2007), and have therefore been the most commonly studied group variables in bias research (e.g., Warne, Yoon, & Price, 2014). Most of the initial research on and debates about test bias and fairness in the United States stemmed from political movements addressing race and sex discrimination (e.g., Sackett & Wilk, 1994). In service of pressing research on questions of discrimination and economic inequality, it thus became commonplace among psychologists and social scientists to categorize people crudely into groups (based primarily on race, ethnicity, and sex/gender) without much thought to the meaning and validity of those categorizations (e.g., Hyde et al., 2019; Yee, 1983; Yee, Fairchild, Weizmann, & Wyatt, 1993). This has changed somewhat over the past two decades as scholarship by psychologists and others has increasingly focused on nuances of identity, multiculturalism, intersectionality, and multiple positionalities (Cole, 2009; Song, 2017). This scholarship has emphasized
that racial, ethnic, and gender classifications can be complex, ambiguous, and debatable, and that identities are often self-constructed and can be fluid (Helms, 2006; Hyde et al., 2019). The first goal of this review, therefore, is to overview contemporary issues involving race/ethnicity and sex/gender classifications in bias research and to describe alternative approaches to the measurement of these variables.

The psychometric methods used to examine test bias usually depend on the definition of test bias operating for a given application. Evaluating predictive bias (i.e., establishing predictive invariance) often involves regressing total scores from a criterion measure onto total scores on the measure of interest, and comparing regression slopes and intercepts across groups (Cleary, 1968), as in the sketch below.
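The following Python fragment is a minimal sketch of that slope-and-intercept comparison, not a procedure prescribed by the review; the data and the column names (criterion, test_score, group) are hypothetical, and pandas/statsmodels are simply one convenient way to fit the moderated regression.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical example data: a criterion score, a test score, and a
# categorical grouping variable for two examinee groups.
df = pd.DataFrame({
    "criterion":  [3.1, 2.4, 4.0, 3.6, 2.9, 3.8, 2.2, 3.3],
    "test_score": [10, 7, 14, 12, 9, 13, 6, 11],
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# criterion ~ test_score * C(group) expands to the test score, the group,
# and their interaction: the C(group) coefficient tests for intercept
# differences across groups, and the interaction coefficient tests for
# slope differences (the Cleary-type definition of predictive bias).
model = smf.ols("criterion ~ test_score * C(group)", data=df).fit()
print(model.summary())

Nonsignificant group and interaction terms are consistent with (though they do not prove) predictive invariance; with realistic sample sizes one would also attend to effect sizes and statistical power rather than p values alone.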
  • 74. Although MGCFA is a well-established procedure in the eval- uation of MI, it has limitations. MGCFA is not an optimal method for conducting MI tests when many groups are involved. More- over, the grouping variable in MGCFA must be categorical, and therefore does not permit MI testing with continuous grouping variables (e.g., age). As modern research questions may require MI testing across many groups, and with continuous reconceptualiza- tions of some of the grouping variables (e.g., gender), more flex- ible techniques are needed. Our third goal, therefore, is to describe two recent alternative methods for MI testing, the alignment method and moderated nonlinear factor analysis, that aim to over- come these limitations. We conclude the review with a discussion of some important statistical and conceptual issues to be consid- ered when evaluating MI, and include a list of recommended practices. Group Classifications Used in Bias Research Racial and Ethnic Classifications Race and ethnicity (see S3 in the online supplemental materials) are conceptually vague and empirically complex social constructs that have been examined by numerous researchers across many disciplines (Betancourt & López, 1993; Helms, Jernigan, & Mascher, 2005; Yee et al., 1993). Consider race. As a biological concept, it is essentially meaningless. In most cases, there is more genetic variation within so-called racial groups than between
Group Classifications Used in Bias Research

Racial and Ethnic Classifications

Race and ethnicity (see S3 in the online supplemental materials) are conceptually vague and empirically complex social constructs that have been examined by numerous researchers across many disciplines (Betancourt & López, 1993; Helms, Jernigan, & Mascher, 2005; Yee et al., 1993). Consider race. As a biological concept, it is essentially meaningless. In most cases, there is more genetic variation within so-called racial groups than between racial groups (Witherspoon et al., 2007). Even if we allow race to be defined by a combination of specific morphological features and ancestry, few "racial" populations are pure (Gibbons, 2017). Most are mixed, like real numbers, with infinite gradations. For example, although many African Americans trace their ancestry to West Africa, about 20% to 30% of their genetic heritage is from European and American Indian ancestors (Parra et al., 1998), and racial admixture continues as the frequency of interracial marriages increases (Rosenfeld, 2006; U.S. Census Bureau, 2008). Even if one were to accept race as a combination of biological features and cultural and social identities (shared cultural heritage, hardships, and discrimination), there is the problem of degree. For example, while many Black Americans share social and cultural identities based on roots in American slavery and racial discrimination, not all do, such as recent Black immigrants from the Caribbean. Racial and ethnic classifications are often conflated. In psychological research, "Asian" is commonly used both as a cultural (Nisbett,
Peng, Choi, & Norenzayan, 2001) and racial category (Rushton, 1994). Yet it is a catch-all term based primarily on geography. It typically refers to people from (or whose ancestors are from) South, Southeast, and Eastern Asia. The term Hispanic often conflates linguistic, cultural, and sometimes even morphological features (Humes, Jones, & Ramirez, 2010).

In public policy, mixtures of racial (or ethnic) background have only recently begun to be addressed. The U.S. Census, for example, did not include a multiracial category until 2000 (Nobles, 2000). We are only beginning to see assessment studies that parse people from traditional broad groupings into smaller, more meaningful and homogeneous groups. In one of the few studies that identified different types of Asians, Appel, Huang, Ai, and Lin (2011) found significant (and sometimes major) differences in physical, behavioral, and mental health problems among Chinese, Vietnamese, and Filipina women in the U.S. More recently, Talhelm et al. (2014) found important differences in culture and thought patterns within only one Asian country, China. People in
northern China were significantly more individualistic than those in southern China, who were more collectivistic. With current and historical farming practices as their theoretical centerpiece, they examined these practices as causal factors. In northern China, wheat has been farmed as a staple crop for millennia, whereas in southern China rice has been (and is) the staple crop. Talhelm et al. argued that the farming practices demanded by these two crops required different types of social organization that, over time, influenced cultural values and cognition. The work by Talhelm and colleagues is important because it is one of the first studies to show, along with a powerful theoretical rationale, that there are important cultural differences between people from what has typically been thought of as a relatively homogeneous racial and cultural group.

In another seminal article, Gelfand and colleagues (2011) examined the looseness-tightness dimension of cultures in 33 countries. This dimension reflects the strength of norms and the tolerance of deviant behavior. Loose cultures have weaker norms and are more tolerant of deviant behavior. While there was substantial variation between countries, there was still considerable variation among countries typically considered "Asian." Hong Kong was the loosest (6.3), while Malaysia was the tightest (11.8), with the People's Republic of China (7.9), Japan (8.6), South Korea