Some Better Practices for Measuring Racial and Ethnic Identity Constructs
Janet E. Helms
Boston College
Racial and ethnic identity (REI) measures are in danger of becoming conceptually meaningless because of evaluators’ insistence that they conform to measurement models intended to assess unidimensional constructs, rather than the multidimensional constructs necessary to capture the complexity of internalized racial or cultural socialization. Some aspects of the intersection of REI theoretical constructs with research design and psychometric practices are discussed, and recommendations for more informed use of each are provided. A table that summarizes some psychometric techniques for analyzing multidimensional measures is provided.
Keywords: racial identity, ethnic identity, reliability, validity, factor analysis
In counseling psychology, the measurement of racial identity constructs is a relatively new phenomenon. Arguably, the practice began when Jackson and Kirschner (1973) attempted to introduce complexity into the measurement of Black students’ racial identity by using a single categorical item with multiple options (e.g., “Black,” “Negro”) that the students could use to describe themselves. Helms and Parham (used in Parham & Helms, 1981) and Helms and Carter (1990) built on the idea that assessment of individual differences in racial identity is important, and they added complexity to the measurement process by (a) developing measures that were based on racial identity theoretical frameworks, (b) using multiple items to assess the constructs inherent to the theories, and (c) asking participants to use continua (i.e., 5-point Likert scales) rather than categories to self-describe. These principles underlie the Black Racial Identity Attitudes Scale (BRIAS; formerly RIAS–B) and White Racial Identity Attitudes Scale (WRIAS).
In response to perceived conceptual, methodological, or content concerns with Helms and associates’ racial identity measures, many rebuttal measures followed. Rebuttal measures are scales that the new scale originator(s) specifically described as corrections for one or more such deficiencies in preexisting identity measures (e.g., Phinney, 1992, p. 157). Subsequent measures have tended to rely on the previously listed basic measurement principles introduced by Parham and Helms (1981), although the theoretical rationales for the measures have varied. Phinney’s Multigroup Ethnic Identity Measure (MEIM), the most frequently used of the rebuttal measures to date, added the principle of measuring “ethnic” rather than “racial identity,” which she seemingly viewed as interchangeable constructs. The MEIM also introduced the principle of measuring the same identity constructs across racial or ethnic groups rather than group-specific constructs within them.
The BRIAS and WRIAS may be thought of as representative of a class of identity measures in which opposing stages, statuses, or schemas are assessed, whereas the MEIM may be conceptualized as representative of a class of measures in which different behaviors or attitudes are used to assess levels of commitment to a single group (i.e., one’s own). Consequently, these measures are used as exemplars of their classes in subsequent discussions. The two classes of measures imply some similar as well as some different desirable practices with respect to research design, measurement or psychometrics, and interpretation that have not been addressed in the racial or ethnic identity literature heretofore. In fact, virtually no literature exists that focuses specifically on good practices for using or evaluating already developed theory-based racial or ethnic identity (REI) measures.
It is important to describe better practices for using already developed REI scales to avoid oversimplifying essentially complex measurement issues that are often inherent in REI theoretical constructs. The primary sources of my belief that a discussion of better practices is necessary are my experiences reviewing manuscripts, submitting manuscripts, advising researchers, and being fully engaged in REI research. Therefore, the purposes of this article are to make explicit better practices for designing research and conducting psychometric analyses when using REI measures to study identity constructs with new samples. I sometimes use published studies to illustrate a practice or procedure; in most instances, the studies were selected because their authors reported results in enough detail to permit the studies’ use for illustrative purposes. More generally, the article is divided into two broad sections, research design practices and psychometric practices. The first section addresses conceptual issues pertinent to research design; the psychometric section addresses scale development concerns.
Research Design Practices
The content of REI scales is intended to reflect standard samples of particular types of life experiences (racial vs. ethnic) as postulated by the relevant theory. A central empirical question with respect to researchers’ use of REI scales is whether racial identity and ethnic identity scales measure the same constructs. However, the question cannot be adequately addressed if researchers do not use research design practices that are congruent with the theoretical model(s) underlying each scale(s) under study. In this section, I (a) discuss some conceptual issues related to measuring racial identity and ethnic identity as potentially different constructs, (b) discuss some poor practices that obscure differences if they exist, and (c) proffer some better practices.

Correspondence concerning this article should be addressed to Janet E. Helms, Department of Counseling, Developmental, and Educational Psychology, Boston College, 317 Campion Hall, Chestnut Hill, MA 02467. E-mail: [email protected]

Journal of Counseling Psychology, 2007, Vol. 54, No. 3, 235–246. Copyright 2007 by the American Psychological Association. 0022-0167/07/$12.00. DOI: 10.1037/0022-0167.54.3.235
Differentiating Racial Identity From Ethnic Identity
In REI research designs, if the researcher’s intent is to
substitute
one class of REI measures for the other, then it is important to
demonstrate that the two types of measures assess the same
racial
or ethnic constructs. Factors to consider are (a)
conceptualization
of the research question, (b) sample selection, (c) use of other
measures for assessing one type of identity rather than the
other,
and (d) comparability of validity evidence within and across
REI
measures.
Racial Identity Scales as Replacements for Racial Categories
Racial groups or categories are not psychological constructs because they do not connote any explicit behaviors, traits, or biological or environmental conditions (Helms, Jernigan, & Mascher, 2005). Instead, racial categories are sociopolitical constructions that society uses to aggregate people on the basis of ostensible biological characteristics (Helms, 1990). Because racial categories are null constructs, Helms et al. (2005) contended that they should not be used as the conceptual focus (e.g., independent variables) for empirical studies but may be used to describe or define samples or issues. Ascribed racial-group membership implies different group-level racial socialization experiences that vary according to whether the group is accorded advantaged or disadvantaged status in society. The content of racial-identity scales is individual group members’ internalization of the racial socialization (e.g., discrimination, undeserved privileges) that pertains to their group.
Ascribed racial group defines the type of life experiences to which a person is exposed and that are available for internalizing (i.e., group oppression or privilege). For example, Black Americans internalize different racial identities than White Americans, and, conversely, White Americans internalize different racial identities than Black Americans. Also, the nature of the racial identities of Americans and immigrants or other nationals differs if they have not experienced similar racial socialization during their lifetimes. Thus, racial identity theories are intended to describe group-specific development in particular sociopolitical contexts.
Racial identity measures are designed to assess the differential impact of racial dynamics on individuals’ psychological development. One expects items in racial identity scales or inventories to include some mention of race, racial groups, or conditions that commonly would be construed as racial in nature (e.g., discrimination or advantage on the basis of skin color). For example, Helms and Carter’s (1990) WRIAS consists of five 10-item scales, each of which assesses the differential internalization of societal anti-Black racism on Whites’ identity development. Relevant sampling and measurement concerns are specifying samples and measures for which race and racism in various forms are presumably relevant constructs.
Ethnic Groups as Proxies for Theoretical Constructs
Ethnicity refers to the cultural practices (e.g., customs,
language,
values) of a group of people, but the group need not be the same
ascribed racial group. Betancourt and López (1993) use the term
ethnic group to connote membership in a self-identified kinship
group, defined by specific cultural values, language, and
traditions,
and that engages in transmission of the group’s culture to its
members. Ethnic identity refers to commitment to a cultural
group
and engagement in its cultural practices (e.g., culture, religion),
irrespective of racial ascriptions. Because ethnic groups imply
psychological culture-defined constructs, the constructs rather
than
the categories should be used as conceptual focuses of studies
(e.g., independent variables).
The content domain of ethnic identity measures is internalized experiences of ethnic cultural socialization. Phinney and associates (Phinney, 1992; Phinney & Alipuria, 1990) initially developed the MEIM to assess adolescents’ search for and commitment to an ethnic identity in a manner consistent with Erikson’s (1968) multistage psychosocial identity theory and without regard to group-specific cultural components. Originally, she conceptualized ethnic identity as “a continuous variable [construct or scale], ranging from the lack of exploration and commitment . . . to evidence of both exploration and commitment, reflected in efforts to learn more about one’s background” (Phinney, 1992, p. 161). Her continuous scale was composed of items assessing several dimensions of identity (e.g., ethnic behaviors, affirmation, and belonging); hence, it was a multidimensional scale (Helms, Henze, Sass, & Mifsud, 2006), with a focus on cultural characteristics that are assumed to be relevant to individuals across ethnic groups.
Although the structure of the MEIM has varied, its underlying conceptual theme is conformance to ethnic culture rather than exposure to racism. The conceptual, sampling, and measurement issues specific to ethnic identity measures pertain to identifying participants who might be reasonably expected to engage in the cultural practices of the ethnic cultural kinship group in question and ensuring that ethnic identity measures assess relevant culture-related rather than race-inspired psychological construct(s).
Selection and Use of Appropriate REI Measures
Researchers often use one type of REI measure (e.g., ethnic identity) but provide a conceptual rationale for the other type (e.g., racial identity) without empirical justification for doing so. Empirical support for the interchangeability of identity constructs and measures would include evidence that (a) exemplars of the two classes of measures are similarly related to the same external racial or cultural criteria or (b) one type of measure explains its own as well as the other theory’s external criteria best. Support for the distinctiveness of constructs would be lack of support for interchangeability and evidence that other identity measures from the same class relate to each other in a logically consistent manner.
Empirical Comparisons of the MEIM and BRIAS as Measures of REI Constructs
Researchers do not seem to consider whether their cultural or racial outcome measures are theoretically congruent with the type of REI measure that they have selected. Consequently, lack of support for their hypotheses is attributed to deficient REI measures rather than possible incongruence between the researchers’ conceptualization and measurement of REI constructs in their research designs. It is difficult to find a single study in which both classes of REI measures and racial and cultural outcome measures were used. Yet for the purpose of illustrating the type of study necessary to support interchangeable use of REI measures, perhaps it is reasonable to think that scores on racial identity measures, such as the BRIAS, should be related to scores on explicit measures of racial constructs (e.g., perceived individual racism, institutional racism), whereas scores on ethnic identity measures, such as the MEIM, should be related to scores on explicit measures of cultural constructs (e.g., acculturation, cultural values). Confirmation of each of these propositions would be evidence of construct explication in that each measure would be assessing constructs germane to it.
Johnson’s (2004) study provides sufficient psychometric summary statistics to permit illustration of the test for interchangeability of REI measures at least in part. His sample of Black college students (N = 167) responded to the MEIM and the RIAS–B (Parham & Helms, 1981), the earliest version of the BRIAS. Table 1 summarizes alpha coefficients (rxx; in the last column) for the REI measures, correlation coefficients between REI scores and perceived discrimination scores, and the same correlation coefficients corrected for disattenuation due to measurement error (i.e., low alpha coefficients) attributable to MEIM or RIAS–B scores. In this example, the correction for attenuation may be interpreted as an estimate of the extent to which an REI and a discrimination subscale measure the same underlying theoretical construct when the effects of REI error are eliminated (Schmidt & Hunter, 1996, 1999).
The dependent measures in the study, assessed by the Index of Race-Related Stress (Utsey & Ponterotto, 1996), were three types of racism: (a) cultural—belief in the superiority of one’s own culture (e.g., values, traditions); (b) institutional—discrimination in social policies and practices; and (c) individual—personally experienced acts of discrimination. The dependent constructs favor the RIAS–B rather than the MEIM, as is evident in the table. If alpha is the correct reliability estimate and the correlations were calculated on untransformed scores, then the correlations, corrected for attenuation attributable to REI measurement error, suggest even stronger relations between the racial-identity constructs and perceived discrimination than the ethnic-identity constructs. The best correlation for MEIM scores is with cultural racism, which is theory consistent and suggests that a more full-blown cultural measure might have favored it.
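The single-sided correction applied in Table 1 is simple enough to sketch in a few lines of Python. This is a minimal illustration using two of the reported values; the helper function name is mine, not Johnson’s:

```python
from math import sqrt

def correct_for_attenuation(r_xy, r_xx):
    """Single-sided disattenuation: r_xy' = r_xy / sqrt(r_xx),
    removing measurement error in the REI measure only
    (Schmidt & Hunter, 1996)."""
    return r_xy / sqrt(r_xx)

# RIAS-B Encounter (alpha = .51) with cultural racism, obtained r = .36
print(round(correct_for_attenuation(0.36, 0.51), 2))  # 0.5, i.e., .50 in Table 1

# MEIM (alpha = .87) with cultural racism, obtained r = .16
print(round(correct_for_attenuation(0.16, 0.87), 2))  # 0.17
```

Note how the low Encounter alpha inflates the correction substantially, whereas the high MEIM alpha leaves the obtained correlation nearly unchanged.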
Alternative Measures of the Same Construct(s)
The question of whether scores on different measures of the
same theoretical constructs are related to scores on the original
measures is a matter of seeking evidence of convergent validity,
a
type of construct validity. Researchers seemingly have not developed alternative measures of the same constructs postulated in
oped alternative measures of the same constructs postulated in
Phinney’s (1990, 1992) theoretical perspective, and only one set
of
researchers (Claney & Parker, 1989) has developed independent
measures of the theoretical constructs of Helms’s (1984, 1990)
White racial identity model. Choney and Rowe (1994)
conducted
an evaluation of Claney and Parker’s (1989) 15-item White
Racial
Consciousness Development Scale (WRCDS), a measure of
Helms’s (1984) stages of White racial identity development that
predated her own measure, the WRIAS. It is worth examining
the
study for what it can reveal about good practices in empirical
investigation of construct validity of scores on REI measures.
The expressed purpose of Choney and Rowe’s (1994) study was
to investigate “how the WRCDS compares with the RIAS–W
[sic],
the current instrument of choice for investigations of White
racial
identity” (p. 102). Yet their conclusion that “it seems
reasonable to
conclude that the WRCDS is not capable of adequately
assessing
the stages of White identity proposed by Helms (1984)”
(Choney
& Rowe, 1994, p. 104) suggests that an unspoken purpose may
have been to examine scores on the two scales for evidence of
convergent validity.
In Table 2, Cronbach’s alpha reliability coefficients for the WRCDS and WRIAS scales are summarized in the last two columns, and correlations between parallel subscales, adapted from Choney and Rowe (1994, p. 103), are shown in the second column. The authors did not report a full correlation matrix for the two measures, and so it is not possible to examine within-measure patterns of correlations. Table 2 also includes correlations corrected for measurement error in each of the REI measures using each scale’s reported Cronbach’s alpha coefficient.

Table 1
Comparing Obtained and Corrected Correlations Between MEIM and RIAS-B Scores and Perceived Racism

                           Cultural             Institutional        Individual
Scale                  Obtained  Corrected   Obtained  Corrected   Obtained  Corrected   Alpha
MEIM                     .16       .17         -.01      -.01        .08       .09        .87
Racial
  Preencounter          -.01      -.01         .39       .47         .07       .08        .69
  Encounter              .36       .50         .25       .35         .28       .39        .51
  Immersion              .33       .41         .29       .36         .22       .27        .65
  Internalization        .37       .43         .13       .15         .35       .47        .75

Note. From “The Relation of Racial Identity, Ethnic Identity, and Perceived Racial Discrimination Among African Americans,” Tables 1 (p. 18) and 6 (p. 33), by S. C. Johnson, 2004, unpublished doctoral dissertation, University of Houston, Texas. “Obtained” correlations were those reported by Johnson. “Corrected” correlations are estimates with measurement error removed. Only measurement error for the MEIM or racial identity subscales (i.e., alpha) was used to correct for disattenuation of correlations attributable to measurement error. The correction for attenuation used was rxy′ = rxy/rxx^.5, where rxy′ equals the corrected correlation, rxy equals the obtained correlation, and rxx^.5 equals the square root of the reliability coefficient for the relevant MEIM or RIAS-B subscale. MEIM = Multigroup Ethnic Identity Measure; RIAS-B = Black Racial Identity Attitudes Scale.
The corrected correlations for these data suggest that the two measures assessed the parallel constructs of Reintegration, Autonomy, and Pseudo-Independence quite strongly and the constructs of Contact and Disintegration much more strongly than mere examination of the obtained correlations would suggest, thereby refuting Choney and Rowe’s assertion that the WRCDS was “incapable” of assessing Helms’s constructs. However, the fact that the corrected Reintegration correlation coefficient exceeded 1.00 suggests either that Choney and Rowe’s original correlations were downwardly biased by sampling error or that alpha coefficients were not the appropriate reliability estimates for their data. In such circumstances, sampling error concerns can be avoided to some extent by using the better conceptual, sampling, and interpretative practices discussed subsequently. Procedures for judging the appropriateness of Cronbach’s alpha as one’s reliability estimator are addressed in the later section on better psychometric practices.
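When, as in the Table 2 note, both measures’ reliabilities are used, the denominator becomes the square root of the product of the two alphas. A minimal sketch (function name mine), reproducing the Reintegration value that exceeds 1.00:

```python
from math import sqrt

def correct_both(r_xy, r_xx, r_yy):
    """Double-sided disattenuation: r_xy' = r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / sqrt(r_xx * r_yy)

# Reintegration: obtained r = .53, WRIAS alpha = .80, WRCDS alpha = .25
print(round(correct_both(0.53, 0.80, 0.25), 2))  # 1.19

# A corrected value above 1.00 is impossible for a true correlation,
# signaling sampling error or an inappropriate reliability estimate.
```

The very low WRCDS Reintegration alpha (.25) is what drives the corrected coefficient past its logical bound.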
Poor and Better Construct Defining Practices
Research design practices are considered poor if they do not
permit data obtained from REI measures to be interpreted in a
manner consistent with the relevant REI theory. Practices are
better if they make it possible to subject theory-congruent hypotheses to empirical testing.
Conceptual Practices
Researchers often assume that because the MEIM is intended to be a multi-ethnic group measure, it is appropriate to collapse data across racial and ethnic groups without examining whether the responses of the subgroups on the MEIM items and scales are similar (e.g., Phan & Tylka, 2006). Analogously, researchers aggregate data across ethnic groups within ascribed racial categories when using the BRIAS or WRIAS without investigating the types of racial socialization to which they have been exposed. Alternatively, researchers find that ethnic groups differ but still report aggregated descriptive statistics (Phelps, Taylor, & Gerard, 2001). A conceptual problem associated with a priori aggregation is that researchers presume rather than demonstrate that potentially diverse racial and ethnic categories (e.g., African Americans and other Black ethnic groups) share the same cultural or racial socialization experiences. A methodological consequence is the potential loss of statistical power for subsequent analyses, if the groups’ responses are actually different.
Phinney’s (1992) studies show that the term ethnicity may have different meaning to different populations. More specifically, she found ethnic group differences in responses to the MEIM such that White participants had lower scores than groups of color. On the basis of their responses to an open-ended item, Phinney (1992) observed that “few White subjects in either sample identified themselves as belonging to a distinct ethnic group. . . . The numbers of Whites who considered themselves as ethnic group members was too small to permit a separate analysis” (p. 174). The implications of this observation are rarely heeded.
A better research design practice is that users of racial identity measures should provide a “racial” conceptual rationale, focused on racial socialization; users of ethnic identity measures should provide an “ethnic cultural” rationale, focused on cultural socialization; and researchers interested in both types of identity should provide both types of rationales. Matching samples to the appropriate type of REI measure should enhance investigations of validity.
Also, it follows from the foregoing analyses that correction for attenuation attributable to measurement error is a good practice when the results of the researcher’s study are intended to have far-reaching implications for REI theoretical constructs or to lead to substantive advice about the theoretical constructs assessed by the measures (Schmidt & Hunter, 1999). Correlations were the focus of the corrections in the examples, but virtually any statistic can be corrected for measurement error if it conforms to the assumptions of the general linear model (GLM).
Sampling Practices
Because scale respondents’ attributes interact with their responses to scales generally and REI scales specifically (Dawis, 1987; Helms, 2005; Vacha-Haase, Kogan, & Thompson, 2000), researchers minimally should provide both a conceptual rationale for why the particular REI measure is appropriate for the research participants that were studied as well as empirical support derived from previous studies. However, researchers typically do not describe any inclusion criteria for defining their participants as members of a racial or ethnic group as such designations are used in the United States. At best, they indicate that the racial/ethnic categories were “self-identified” without explaining how such identification occurred (e.g., Mercer & Cunningham, 2003, p. 221). At worst, they either do not describe the racial or ethnic composition of their sample in the Participants section at all (e.g., Reese, Vera, & Paikoff, 1998) or they assign participants to a racial or ethnic group without any indication of how the assignment was determined (e.g., Goodstein & Ponterotto, 1997).

Better practices are that researchers should describe their procedures for recruiting research participants, collecting racial or ethnic data, quantifying the data, and assigning participants to racial or ethnic categories. These aspects of researchers’ research design should be described as thoroughly as the researchers describe the other measures or manipulations and analyses in their studies. For example, if respondents were asked to describe themselves, were they provided with checklists or open-ended items? How were responses coded? If different racial or ethnic groups were included, the researcher should provide descriptive information (e.g., means, standard deviations, and reliability coefficients) as evidence that aggregating the various groups was appropriate.

Table 2
An Example of a Convergent Validity Study of White Racial Identity Constructs as Assessed by the White Racial Identity Attitudes Scale (WRIAS) and the White Racial Consciousness Development Scale (WRCDS)

                                              Alpha
Scale                 Correlation  Correction  WRIAS  WRCDS
Contact                   .11         0.42      .54    .13
Disintegration            .17         0.30      .77    .43
Reintegration             .53         1.19      .80    .25
Pseudo-Independence       .29         0.61      .69    .32
Autonomy                  .55         0.91      .67    .55

Note. Alphas are adapted from Choney and Rowe (1994), p. 103. Correlations are between parallel subscales. Computation of the correction for attenuation was as follows: Contact = .11/(.54 × .13)^.5 = .42; Disintegration = .17/(.77 × .43)^.5 = .30; Reintegration = .53/(.80 × .25)^.5 = 1.19; Pseudo-Independence = .29/(.69 × .32)^.5 = .61; Autonomy = .55/(.67 × .55)^.5 = .91.

Careful attention to the racial and cultural aspects of research designs will provide better contexts for conducting the psychometric studies of REI measures that have so intrigued researchers and reviewers since the measures’ inception.
Psychometric Practices
The original focus of racial identity scales—assessment tools for assisting counselors to better diagnose and remediate the varying psychological effects on individuals of internalized racial socialization (Jackson & Kirschner, 1973; Parham & Helms, 1981)—has virtually disappeared from the REI literature. In fact, researchers have been severely chastised for attempting to use the measures for diagnostic or treatment purposes, and evidence of their usefulness has been discounted (Behrens, 1997; Behrens & Rowe, 1997; Fischer & Moradi, 2001). Instead, researchers have focused on evaluating the worthiness of REI scales by using reliability analyses, principal-components analyses, and factor analyses to examine the internal structure of the measures. However, poor practices associated with each of these methodologies threaten to reduce REI scales to simpler measures than are necessary to explain individuals’ complex racial and cultural functioning.
Reliability Analyses
Researchers typically use Cronbach’s (1951) alpha coefficients to estimate the reliability of a sample’s responses to REI items comprising subscales or scales even though Cronbach’s alpha was not designed to assess the reliability of multidimensional measures (Hattie, 1985). Nevertheless, much of the threat to REI theoretical constructs from reliability analyses results as much from researchers’ poor practices with respect to their use of Cronbach’s alpha as from the likelihood that it is the wrong statistic most of the time (Helms, 2005).
To explain when Cronbach’s alpha is the wrong statistic, it is necessary to provide a brief overview of the calculation and assumptions underlying use of Cronbach’s alpha because researchers seem to be unaware of them. Consequently, the researchers do not examine the fit of their data to the implied reliability measurement model or to the REI theoretical assumptions under investigation. The Cronbach’s alpha coefficient is the focus of my REI reliability discussion because it is virtually the only reliability estimator that is ever used to evaluate REI measures in the REI measurement literature or other types of measures in the social and behavioral sciences literature more generally (Behrens, 1997; Cortina, 1993; Helms, 1999; Peterson, 1994).
Overview of Cronbach’s Alpha
This overview of Cronbach’s alpha (henceforth, alpha) is not intended to be a technical treatise on the statistic. A number of primers for applied researchers are available to fulfill that goal. These include Cortina (1993), Helms et al. (2006), Thompson and Vacha-Haase (2000), and the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association, and National Council on Measurement in Education, 1999). Sometimes poor practices occur because researchers are not aware of the nature of the data necessary to use and interpret alpha; sometimes they occur because it is not the appropriate statistic for the researcher’s intended use. Therefore, the purpose of this overview is to provide enough information to explain why the proposed “better” practices are better.
An alpha coefficient is a statistic that summarizes the degree of interrelatedness among a sample of participants’ responses to a set of items intended to measure a single construct. Alpha coefficients typically range from zero to 1.00, with values approaching 1.00 suggesting a high level of positive intercorrelation among items. The nature of the interitem relationships is captured in the formula for standardized alpha:

α = kr̄ / [1 + (k − 1)r̄],  (1)

where k refers to the number of items in the subscale or scale and r̄ is the average correlation between participants’ responses to all pairs of subscale items. It should be noted that this alternative alpha formula should not be used unless the researcher standardizes item responses before calculating total scores for subsequent analyses because it assumes homogeneity of item responses; however, it is useful for making some necessary points (Helms et al., 2006).
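Equation 1 is easy to verify numerically. The sketch below (illustrative values of mine, not drawn from any of the studies discussed) shows one of the necessary points: alpha grows with the number of items even when the mean inter-item correlation stays modest.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha (Equation 1): k*r / (1 + (k - 1)*r),
    where r is the mean inter-item correlation."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Hypothetical mean inter-item correlation of .30:
print(round(standardized_alpha(10, 0.30), 2))  # 0.81 with 10 items
print(round(standardized_alpha(50, 0.30), 2))  # 0.96 with 50 items
```

As the text cautions, this version of the formula presumes standardized item responses; it is used here only to illustrate the arithmetic.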
As is true of all reliability coefficients, alpha is not an inherent or stable property of scales, regardless of its value (AERA et al., 1999; Thompson, 1994). Rather, it is a value that describes one sample’s responses to a set of items under one set of circumstances. Therefore, appraisals of REI measures based solely on the magnitude of alpha coefficients, such as “Practitioners should withhold use of the WRIAS until there is clear evidence with regard to what it measures” (Behrens, 1997, p. 10), reflect multiple misunderstandings concerning how alpha coefficients should be used and interpreted.
With few exceptions (e.g., Owens, 1947), researchers’ reliability ideal is an alpha coefficient of 1.00 or as close as possible, which is the standard used to evaluate REI measures and non-REI measures alike. A coefficient of such magnitude would be correctly interpreted as indicating that 1.00 or 100% of the variance of the total subscale or scale scores is reliable or systematic variance for the sample under study. No conclusions could rightfully be made about “what” the scale scores measure (i.e., the nature of the systematic variance), which is a matter to be addressed by means of validity studies involving measurement or manipulations of relevant constructs external to the target measure. Generalizing the obtained level of reliability to other samples would not be proper because reliability coefficients likely differ from sample to sample.

Moreover, obtaining alpha coefficients close to 1.00 is a Pyrrhic ideal generally because a unit value signifies that (a) certain kinds of statistical analyses cannot be conducted at the item level or will yield spurious results, and (b) the usefulness of scale scores for validity studies under such circumstances is quite limited. Owens’s reductio ad absurdum is that perfect or almost perfect internal consistency reliability (ICR) coefficients indicate that a sample’s responses to each item are perfectly predictable from their responses to every other item, as well as the total scale scores from which the alpha coefficient was derived. Thus, the item-level responses would be redundant with each other, and results of statistical analyses requiring matrix inversion—such as factor analyses, structural equation modeling, and item analyses via multiple regression analyses—would be trivial. Some computer packages warn the user that the data matrix is “non-positive definite” when this type of redundancy occurs.
The relevant validity issue is that when the ICR coefficient is
nearly perfect, it suggests that the items and subscale scores
have
some single construct (i.e., systematic variance) in common and
for validity evidence to be obtained, the same single construct
must
be salient in the external criteria. For example, the researcher
would need to identify the same cultural construct in the MEIM
and the external criteria; in the case of the WRIAS or BRIAS
and
external criteria, the researcher would have to identify the same
racial construct in each, if alpha coefficients were perfect. Use
of
alpha to evaluate either class of REI measures presumes that
item
responses fit a unidimensional structure, because alpha is
intended
to evaluate the unidimensionality of item responses or scale
scores,
whether it does or not (Hattie, 1985). Thus, validity analyses
might
result in “too small” validity coefficients when alpha is too
large.
In sum, it is not clear why alpha is so popular a statistic for
evaluating measures generally given that, at its best, it
describes
and promotes development of very simple constructs. In some
ways, the simple structure assumption is least problematic for
REI
measures such as the MEIM if their theoretical rationale
proposes
positively related constructs and their item responses are
positively
correlated. Positively correlated item responses and subscale
scores yield high alpha coefficients, even if they restrict the
other
kinds of analyses that should be conducted.
Simplification of constructs is most problematic for evaluation
of REI measures such as the WRIAS because their theories pro-
pose that persons endorsing some subscale items as self-
descriptive will reject others. Depending on the samples’ racial
socialization experiences, datasets may be defined by some
nega-
tive and some positive correlations among item responses; some
item responses may be more homogeneous as indicated by small
standard deviations; and the sample’s level of endorsement (i.e.,
item means) might differ across items. Any of these conditions
would contribute to low alpha coefficients because they are vio-
lations of basic alpha measurement assumptions, but each of the
conditions may be consistent with some REI theory.
It is important to understand these basic aspects of alpha as a
measurement model so that the researcher can make an informed
decision about whether its use is consistent with the theoretical
framework of the selected REI measure and the researcher’s
rea-
son for selecting it. Once data are collected, it is important to
check
the validity of assumptions associated with anticipated
psychomet-
ric analyses, particularly if REI scale modification depends on
the
magnitude of obtained reliability coefficients.
Assumptions and Scale Modification
Because alpha is virtually the only reliability coefficient used to
evaluate and modify REI measures, it is useful to discuss
assump-
tion checks and scale modification practices as they pertain to
alpha. Yet the better practices are relevant to virtually any reli-
ability coefficient.
Alpha Measurement Assumptions
The basic assumptions that should be examined to support use
of alpha are (a) item responses are positively correlated, (b)
item
responses are homogeneous, (c) item means are essentially
equal,
and (d) the REI theory postulates unidimensional or
homogeneous
constructs. The first assumption can be investigated by
examining
the interitem correlation matrix. The presence of any negative
correlations means that the resulting alpha coefficient will be an
underestimate of item relatedness. If Assumption A is
confirmed,
Assumption B can be checked by using Feldt and Charter’s
(2003)
ratio for examining interitem homogeneity of variances (i.e.,
com-
pare the largest item standard deviation to the smallest item
stan-
dard deviation in the data set). If the result of the comparison is
less than 1.3 (i.e., SDL/SDS < 1.3), alpha may be an
appropriate
estimate of ICR. If Assumption B is supported, then the
smallest
item mean and largest item mean should be compared via statis-
tical tests (e.g., paired comparison t tests, within-subjects
analysis
of variance). The check for Assumption D is conceptual, but it
may
be rejected if the REI theory proposes clusters of items or
people
intended to measure more than one construct using a single
scale
or multiple subscales.
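The checks for Assumptions A and B are mechanical enough to script. The sketch below applies them to a set of item responses; the 1.3 threshold is Feldt and Charter's (2003), but the toy data are invented.

```python
def check_alpha_assumptions(items):
    """Screen item responses against two testable alpha assumptions.

    items: list of items, each a list of scores from the same respondents.
    Returns (all_correlations_positive, sd_ratio), where sd_ratio is
    Feldt and Charter's (2003) largest-to-smallest item SD ratio.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    def sd(xs):
        m = mean(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

    def corr(xs, ys):
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
        return cov / (sd(xs) * sd(ys))

    all_positive = all(
        corr(items[i], items[j]) > 0
        for i in range(len(items)) for j in range(i + 1, len(items))
    )
    sds = [sd(item) for item in items]
    return all_positive, max(sds) / min(sds)

items = [[4, 5, 3, 4, 2], [4, 4, 3, 5, 2], [5, 4, 2, 4, 3]]
positive, ratio = check_alpha_assumptions(items)
# alpha is a defensible ICR estimate only if positive is True and ratio < 1.3
```

Assumption C (essentially equal item means) would still require a statistical test, and Assumption D remains a conceptual judgment about the theory.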
If any of the assumptions are not supported, then alternative
procedures for estimating ICR are available. Some of these
alter-
native procedures are summarized in Table 3, which cites re-
sources and describes alternatives to use when specific
measure-
ment conditions exist. For example, if alpha is “too low,”
Rogers,
Schmitt, and Mullins (2002), cited in the last row, recommend
exploratory factor analysis, followed by calculation of alphas
for
identified item subsets and composite alpha if the researcher
intends to use the items as a single multidimensional scale. The
procedures in Table 3 ought to work well for MEIM-like scales
but
not for WRIAS-like scales if they are ipsative.
Ipsative Measurement Assumptions
Conceptually, a measure is ipsative if individuals’ scores are
their explicit or implicit self-rankings on a set of items or
subscales
such that some scores must be higher than some others.
Ordinarily,
ipsativity results from response formats (e.g., forced choice,
rank-
ings) or transformations (e.g., subtracting person’s mean scores
from total scores). However, for some REI measures, ipsativity
is
induced by individual participants’ need to be logically
consistent
and, therefore, unwillingness to endorse contradictory items as
self-descriptive. Consequently, when some subscale scores are
high, some others are relatively lower (i.e., some subscales are
inversely related within individuals and therefore negatively
cor-
related between individuals).
Three properties that Helms and Carter's (1990) WRIAS shares
with ipsative measures are (a) half or more correlations in a
matrix
of correlations typically are negative, (b) samples’ mean
correla-
tion among subscales typically is negative and approaches zero,
and (c) the average of the full-scale item responses is a
constant.
Of the 21 correlation matrices analyzed by Behrens (1997), for
which 10 correlation coefficients were reported, 100% consisted
of
five or more negative correlations and the mean correlation,
weighted by sample size, was –.03. For the 50 WRIAS items,
the
average value within individuals and samples rounds to the
scale
midpoint (i.e., 3). Also, except for measurement error, such as
missing data, item means for any four subscales (total number
of
subscales minus one) will also equal a rounded value of 3 in
most
samples, meaning that the contribution of the remaining items is
not unique.
Experts in measurement have long debated how best to analyze
ipsative datasets given that the data violate all of the
assumptions
of the GLM and CTT, particularly the assumption of random
error
among items (Baron, 1996; Johnson, Wood, & Blinkhorn, 1988).
There is no consensual resolution to the debate, and most of the
compromises do not pertain to REI scales because their
ipsativity
is theoretically induced rather than an artifact of the response
format of scales. For now, the best compromises are first to
explore Points a–c from the previous paragraph to determine
whether data conform to an ipsative pattern. Second, if so, do
not
include all of the items in reliability or factor analyses. Conduct
analyses with single subscales or subsets of subscales. Also,
con-
sider analyses that do not depend on correlation coefficients,
such
as cluster or profile analyses. Third, if data are not ipsative,
multidimensional analyses in Table 3 might yield meaningful
results.
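The first of these compromises can be scripted directly from a subscale correlation matrix. The sketch below checks the share of negative correlations and the mean correlation; the 4-subscale matrix is hypothetical.

```python
def ipsative_diagnostics(corr_matrix):
    """Check a subscale correlation matrix for an ipsative pattern.

    Returns (share_negative, mean_correlation) over the off-diagonal
    entries: half or more negative correlations and a mean correlation
    that is negative and near zero are consistent with ipsativity.
    """
    k = len(corr_matrix)
    off_diag = [corr_matrix[i][j] for i in range(k) for j in range(i + 1, k)]
    share_negative = sum(1 for r in off_diag if r < 0) / len(off_diag)
    return share_negative, sum(off_diag) / len(off_diag)

# Hypothetical correlations among four subscales of one measure
R = [
    [1.00, -0.35, -0.20, 0.25],
    [-0.35, 1.00, 0.15, -0.30],
    [-0.20, 0.15, 1.00, 0.10],
    [0.25, -0.30, 0.10, 1.00],
]
share_neg, mean_r = ipsative_diagnostics(R)
# share_neg >= 0.5 with mean_r slightly negative fits the ipsative pattern
```

The third property (a near-constant average of full-scale item responses within individuals) would be checked on the raw responses rather than the correlation matrix.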
REI Scale Modification
Many researchers modify or negatively evaluate REI measures
when they obtain unsatisfactory alpha coefficients. Such
practices
typically occur without regard to any of the previously
discussed
research design or psychometric issues and thereby contribute
to
the development of nonstandard, atheoretical scales and perhaps
the discarding of important information about samples.
Vacha-Haase (1998) developed reliability– generalization (RG)
methodology to assess the effects of sample and research design
characteristics on the magnitude of reliability coefficients
across
studies. RG studies allow the researcher to adjust for the fact
that
tests per se are not reliable by identifying the conditions (e.g.,
sample demographic and response characteristics, settings)
under
which one is likely to obtain desired levels of reliability for
some
intended purpose using the same set of items. Utsey and
Gernat’s
(2002) study provides an example of how RG studies might be
used to improve psychometric reliability practices.
Utsey and Gernat used their obtained alpha coefficient of .28
for
scores on the 10-item WRIAS Autonomy subscale as the
rationale
for dropping two of its items, “Sometimes jokes based on Black
people’s experiences are funny” and “I understand that White
women and men must end racism in this country because White
people created it.” The two items assessed sense of humor and
historical knowledge in Helms’s (1990) White identity theory.
To
their credit, the researchers (a) reported which items were
omitted;
(b) attempted to find justification for their unusually low alpha
in
previous literature; and, finding none quite as low, (c) checked
their data for input errors before revising the subscale on the
basis
of corrected item–total correlations. The revised alpha was .55.
Had Utsey and Gernat conducted RG comparisons of their
sample to samples in other studies as Helms (2005) recently
recommended, they would have discovered that relative to
Helms
and Carter’s (1990) referent total sample, the responses of their
sample were much more variable on the Contact subscale; they
were less variable with respect to the remaining subscales,
except
Autonomy for which the authors did not report a standard devia-
tion for the full subscale. Participants in their study differed on
a
variety of attributes relative to Helms and Carter’s referent
sample
(including sense of humor and racial historical knowledge), but
the
point here is that if researchers do not rule out sample attributes
as
a rationale for their reliability results, then it is improper to
ascribe
the “erratic pattern of Cronbach’s alpha coefficients” to the REI
scale or subscales rather than their “unusual” sample (Utsey &
Gernat, 2002, p. 477).
An alpha coefficient is a point estimate of a population value
that is obtained with some level of precision and is sample
depen-
Table 3
Summary of Some Recommended Methodologies for Evaluating Measures of Multidimensional Constructs

Bacon, Sauer, & Young (1995): In SEM, use weighted omega to estimate rxx so that items can violate alpha assumptions and receive weights proportional to their true score variances.

Ferketich (1990): Compute rxx (a) from the first eigenvalue of a PCA (omega) or (b) from the item communalities of an FA (theta) if items are heterogeneous.

Komaroff (1997): If CFA reveals correlated error, adjust alpha by subtracting the estimated positive sum of error covariances from it.

Lee, Dunbar, & Frisbie (2001): SEM multifactor partially tau-equivalent model: assumes items within subscales are homogeneous and positively correlated but subscales are not. SEM multifactor congeneric (heterogeneous) model: assumes subscale-specific common factors; structural coefficients are not restricted for subscales, and different parameters in each subscale may be estimated.

Raykov (1998): 1. Use SEM to test whether the scale is congeneric (not homogeneous); if so, examine MIs associated with error covariances and expected parameter changes. MIs > 5 may reflect heterogeneous item subsets. 2. Use EFA with maximum likelihood extraction; examine chi-square for fit and compare eigenvalues.

Raykov (1997): Use Raykov's LVM to examine the underlying factor structure of item sets before deleting items to increase alpha.

Rogers et al. (2002): If CFI < .80, use EFA to identify item subsets and calculate composite alpha, if a single multidimensional scale is desired.

Note. SEM = structural equation model; PCA = principal-components analysis; FA = factor analysis; CFA = confirmatory factor analysis; MI = modification index; EFA = exploratory factor analysis; LVM = latent variable model; CFI = comparative fit index.
dent. RG studies may be used to discover whether one’s
obtained
reliability coefficients are aberrant relative to samples in other
studies (Helms et al., 2006). Fan and Thompson (2001) recom-
mended that researchers report confidence intervals (CIs) for
ob-
tained alpha coefficients and provided the methodology for
calcu-
lating them (pp. 522–523). They also illustrated the analysis for
statistically comparing an obtained coefficient to a population
value(s).
Helms et al.’s (2006) and Fan and Thompson’s (2001) advice
may be applied to Utsey and Gernat’s (2002) previously
discussed
Autonomy coefficient (.28). Behrens’s (1997) meta-analysis of
Autonomy alpha coefficients from 23 studies, which was cited
by
Utsey and Gernat, yielded an alpha coefficient population
estimate
of .61. The upper and lower limits of the (presumably) 95% CI
for
the population estimate were respectively .63 and .60. The 95%
CI
(based on a central F distribution) calculated for Utsey and Ger-
nat’s alpha of .28 is .09 (lower limit) to .44 (upper limit). Thus,
the
range of population estimates for their obtained and revised alpha (.55) coefficients was considerably below the average range Behrens reported. Additional support is that an analysis of
variance,
conducted with the smaller alpha value in the numerator (i.e.,
.28),
indicates that the researchers’ reported alpha was significantly
lower than Behrens’s population estimate of .61, F(144, 1296)
�
1.85, p � .0001.
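The F comparison itself is simple arithmetic on the two alpha values: the ratio of the (1 − alpha) terms, with the smaller alpha in the numerator, is referred to a central F distribution with (n − 1) and (n − 1)(k − 1) degrees of freedom. A sketch of the computation, with the sample size reconstructed from the reported degrees of freedom:

```python
def alpha_comparison_f(alpha_small, alpha_large, n, k):
    """F statistic for comparing an obtained alpha with a reference value.

    F = (1 - alpha_small) / (1 - alpha_large), with df1 = n - 1 and
    df2 = (n - 1) * (k - 1) under the central-F approach that Fan and
    Thompson (2001) describe.
    """
    f_stat = (1 - alpha_small) / (1 - alpha_large)
    df1 = n - 1
    df2 = (n - 1) * (k - 1)
    return f_stat, df1, df2

# Utsey and Gernat's Autonomy alpha (.28) vs. Behrens's estimate (.61);
# n = 145 respondents and k = 10 items give the reported df of (144, 1296)
f_stat, df1, df2 = alpha_comparison_f(0.28, 0.61, n=145, k=10)
```

Obtaining the confidence limits additionally requires quantiles of the same F distribution, as illustrated in Fan and Thompson (2001, pp. 522-523).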
Therefore, it is reasonable to conclude either that Utsey and
Gernat’s sample was aberrant or that alpha was not the
appropriate
statistic for analyzing their data. The more general principle is
that
to maintain the theoretical meaningfulness of REI scales,
modify-
ing the scales or subscales should not be the automatic response
to
“too-low” alpha coefficients. Instead, alternative psychometric
hy-
potheses should be explored, including effects of sample
attributes.
Whole Scale REI Scale Revisions
Researchers routinely engage in a variety of practices intended
to develop new REI measures from the original items. Most of
these practices involve analyses of responses to the entire scales
by
means of techniques intended to assess the fit of data to a unidi-
mensional measurement model (i.e., interscale correlations,
principal-components analysis, and factor analysis). Many of
these
analyses are conducted without regard to the interplay between
psychometrics and theory; some others rely on improperly con-
ducted or incorrectly interpreted psychometric procedures.
Nonstandard REI Scales
One consequence of disregarding the interactions between the-
ory and psychometric practices is that researchers replace
standard
sets of items with whatever sets of items best describe their
samples. Fit may be determined by reliability analyses, factor
analysis, principal-components analysis, correlation analyses, or
some combination.
Reliability Analyses for Subscales
When using the MEIM, researchers frequently describe the
measure as consisting of different numbers of items (range: 10
to
24) and different numbers of scales or subscales, dimensions, or
components (range: 1 to 5 subscales), each of which consists of
varying numbers of items and item anchors (e.g., Carter,
Sbrocco,
Lewis, & Friedman, 2001; Cokley, 2005). Often it is impossible
to
discern whether the items used by authors correspond to those
listed by Phinney (1992). The analogue for racial identity
measures
is that researchers drop scales or recombine items on the basis
of
their own or someone else’s reliability analyses or personal
pref-
erences (Kelly, 2004).
It is not clear why researchers are so flexible about the structure
of the MEIM in particular given that Phinney (1992) intended
her
measure to assess the same constructs across ethnic groups. Per-
haps they are confused because she did not report reliability
coefficients for the scores of her two developmental samples on
the two-item Ethnic Behaviors subscale. She asserted that “reli-
ability [i.e., alpha] cannot be calculated with only two items”
(Phinney, 1992, p. 165). Subsequent researchers have followed
suit and cite her as the source for this poor practice. However,
Phinney’s assertion is demonstrably untrue. Much of CTT has
focused on developing methodologies for estimating the
reliability
of scores on two-item (e.g., split-half, alternate form) tests
(Feldt
& Charter, 2003). In fact, if the researcher calculates item vari-
ances and covariances, then the standard formula for alpha, used
in
Table 4, may be used to estimate reliability.
The Spearman–Brown formula was used to estimate the alpha
coefficient for the two-item Ethnic Behavior subscale scores of
Phinney’s college student sample whose overall reliability for
responses to 14 items was .90. As shown in Table 4, the
estimated
alpha for the Ethnic Behavior subscale responses was much
lower
than the alphas reported for the responses of her college student
sample to the other subscales, but it could be calculated and
doing
so is consistent with the advice in the Testing Standards that
researchers report reliability coefficients for all scales and sub-
scales used in their studies (AERA et al., 1999).
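The Spearman-Brown estimate can be reproduced in a few lines: step the 14-item alpha down to a single-item reliability, then back up to two items. A sketch using Phinney's reported full-scale alpha of .90:

```python
def spearman_brown(r, factor):
    """Spearman-Brown prophecy: reliability after changing test length.

    r is the current reliability; factor is the ratio of new to old
    length (factor < 1 shortens the test, factor > 1 lengthens it).
    """
    return factor * r / (1 + (factor - 1) * r)

full_scale_alpha = 0.90  # Phinney's 14-item Ethnic Identity scale
single_item = spearman_brown(full_scale_alpha, 1 / 14)
two_item = spearman_brown(single_item, 2)
# two_item rounds to .56, the Ethnic Behaviors estimate reported in Table 4
```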
Table 4
Summary of Calculation of Two-Item Reliability and Composite Alpha for Phinney's (1992) College Student Sample

                               Phinney data        Untransformed(b)
Scale                    k    alpha   M     SD       M      SD
Affirmation/Belonging    5     .86   3.36   .59    16.80   2.95
Identity Achievement     7     .80   2.90   .64    20.30   4.48
Ethnic Behaviors(a)      2     .56   2.67   .85     5.34   1.70
Ethnic Identity         14     .90   3.04   .59    42.56   8.26
Composite alpha          3     .80

Note. Data adapted from the college student sample in "The Multigroup Ethnic Identity Measure: A New Scale for Use With Diverse Groups," by J. S. Phinney, 1992, Journal of Adolescent Research, 7, Tables 2 & 3, p. 167. Calculation of composite alpha (CA) was as follows:

CA = [k/(k - 1)][1 - (sum of SDss^2 / SDtotal^2)]
   = (3/2)[1 - (2.95^2 + 4.48^2 + 1.70^2)/8.26^2]
   = (3/2)[1 - (31.6629/68.2276)]
   = .80

(a) Alpha for Ethnic Behaviors was estimated from the alpha of the Ethnic Identity scale scores using the Spearman-Brown formula. (b) Untransformed scores were computed by weighting Phinney's data by the number of items (k) and used to compute composite alpha.
Reliability Analyses for Total Scale Scores
Judging from her reporting of reliability coefficients for four
individual subscales (Phinney, 1992, p. 165), Phinney intended
one early version of the MEIM to assess four unidimensional
constructs: Affirmation/Belonging (five items, alphas = .75, .86), Ethnic Identity Achievement (seven items, alphas = .69, .80), Ethnic Behaviors (two items, alphas not reported), and Other Group Orientation (six items, alphas = .71, .74). The two alphas in parentheses are for samples of high school students (N = 417) and college students (N = 136), respectively. The Ethnic Identity total scale (14 items, alphas = .81, .90) is the aggregated responses to the 14 items comprising the three subscales (excluding Other Group Orientation).
Researchers routinely report and analyze reliability data only
for
the Ethnic Identity (EI) total, thereby overlooking potentially
the-
oretically interesting information. Alternatively, they report
alpha
only for the EI total but use the individual subscales in their
analyses, thereby ignoring the better practice of reporting
reliabil-
ity coefficients for all scales and subscales (AERA et al., 1999).
A
third poor practice is that researchers use item variances rather
than subscale variances to calculate alpha for the composite
scores
(i.e., composite alpha) in spite of the fact that the EI total
obviously
consists of multiple components or subscales (i.e., is
multidimen-
sional).
Cronbach (1951) advised that for multisubscale measures, re-
searchers should use subscale values (e.g., variances,
intercorrela-
tions) in the standard formula to calculate lumpy alpha. Doing
so
determines whether a principal component, defining a
superordi-
nate dimension (e.g., ethnic identity), runs through the subscale
responses more strongly than individual constructs run through
subscale responses. In Table 4, I show the difference between
composite or lumpy alpha (.80), calculated for Phinney’s (1992)
data for college students, and the (presumably) item-level alpha
that she reported (.90). To do so, the original subscale variances
were estimated by weighting their standard deviations by the
number of scale items (k). Only one of the subscales has a lower
alpha than composite alpha, which means that although a
common
theme runs through the subscales, it does not warrant
abandoning
the theoretical constructs that the individual subscales assess by
replacing them with total scale scores.
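Cronbach's subscale-level ("lumpy") calculation is the standard alpha formula with subscale variances substituted for item variances. The sketch below reproduces the composite alpha of .80 from Table 4 using Phinney's (1992) untransformed subscale standard deviations:

```python
def composite_alpha(subscale_sds, total_sd):
    """Composite (lumpy) alpha: the standard alpha formula applied with
    subscale variances in place of item variances."""
    k = len(subscale_sds)
    return (k / (k - 1)) * (1 - sum(sd ** 2 for sd in subscale_sds) / total_sd ** 2)

# Untransformed SDs for Affirmation/Belonging, Identity Achievement, and
# Ethnic Behaviors; 8.26 is the SD of the 14-item Ethnic Identity total
lumpy = composite_alpha([2.95, 4.48, 1.70], 8.26)
# lumpy rounds to .80, versus the item-level alpha of .90 Phinney reported
```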
When researchers use responses to individual items to calculate
ICR for total scale scores rather than responses to subscales,
total-scale alpha coefficients typically will be inflated for one
or
both of the following reasons: (a) the number of items for the
scale
overall exceeds the number of items for the separate subscales
(note that shared variance is weighted by the number of items in
alpha formulas) and (b) nominal positive correlations across
item
sets elevate the level of shared variance (Cortina, 1993; Helms
et
al., 2006). Thus, users of the MEIM and other REI measures
often
have been seduced by higher total-scale alpha coefficients into
abandoning the theoretical constructs underlying the selected
mea-
sure.
Yet conceptually meaningful measures are likely to yield better
validity evidence. Table 1 illustrates this point to some extent.
Notice that scores on the measure that yielded the best ICR (rxx = .87) correlated worst with measures of other constructs, but the measure whose scores yielded the worst ICR (rxx = .51)
demon-
strated better correlations overall than any of the other measures
with better ICR (i.e., alphas). For some of the revisions of the
MEIM, it would not be surprising to discover that the subscales
yielded higher correlations with measures of other constructs
than
did the total scale because conceptual complexity is lost by
aggre-
gating subscales. Thus, a better practice is that if composite
ICR
must be calculated for REI measures for some purpose, then
researchers should calculate it at the subscale level as a means
of
determining whether it is meaningful or necessary to collapse
across theoretical REI constructs.
Analyses of Correlation Coefficients
Researchers frequently calculate correlations among subscales
and when large correlations are found between pairs of scales,
they
either (a) collapse the subscales, (b) claim “multicollinearity”
as
the explanation for why their hypotheses were not confirmed, or
(c) use the findings as a rationale for creating new scales. Both
statistical and REI theoretical assumptions interact to suggest
that
these are not good practices. Examination of the dataset’s fit to
statistical assumptions is necessary if the researcher intends to
use
inferential statistics to test hypotheses concerning correlations
between REI subscales or to reconfigure REI subscales.
Onwuegbuzie and Daniel (1999, pp. 8-10) reported that researchers fail to confirm that Pearson correlation coefficients
are
the correct statistic for evaluating associations between
measures
by assessing the conformance of the distributions of the pair(s)
of
variables to the assumptions of GLM. Some of the major
assump-
tions and their relevance to REI measures are as follows:
1. One variable of the pair is presumed to be the independent or
predictor variable and the other is presumed to be the dependent
or
criterion variable. Of course, this assumption is not true in the
case
of REI measures given that subscales are administered to the
same
people at the same time via the same measure and that REI
theories
suppose that subscale scores are interrelated within individuals.
Consequently, sampling and measurement error are likely corre-
lated and, therefore, influence the magnitude of correlations in
one
direction or another.
2. The dependent variable must be normally distributed. This
also is unlikely to be true because many of the REI theories
suppose that people behave differently according to the
setting(s)
in which they find themselves or the people with whom they are
interacting. Thus, skewness or kurtosis of subscale responses
may
affect the size of correlation coefficients, thereby contributing
to
Type I or Type II error.
3. The variability of scores for the dependent variable is about
the same at all levels of the independent variable. Because no
variable is the independent or dependent variable when all intra-
subscale correlations are compared, this assumption in effect re-
quires variances to be equal for all subscales at all levels, which
is
unlikely given that REI theories postulate sample heterogeneity
across subscales.
In sum, attributions that REI subscales are flawed, which cite
bivariate correlations as evidence, might be counterindicated if
the
researcher cannot provide evidence that relevant GLM assump-
tions were tested. Moreover, that a correlation differs
significantly
from zero is not evidence of multicollinearity of either of the
involved subscales. Researchers often use the term
multicollinear-
ity to mean redundancy, both of which are inferred from “large”
correlations of various sizes between scales, although the
defini-
tion of large varies from study to study.
Branch (1990) pointed out that it is fallacious to conclude on
the
basis of even substantial correlations (e.g., .80) that two
subscales
measure the same construct and are interchangeable as a result.
He
contended that (a) correlations do not necessarily reveal that
each
person obtained similar scores on each subscale, and (b) inter-
changeability requires evidence that the subscales involved
share
observed scores, means, variances, and content. These are
aspects
of whole scale correlation analyses that are rarely examined and
interpreted in evaluations of REI measures.
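Branch's point is easy to demonstrate: two score sets can correlate perfectly yet differ in level and spread, so one cannot substitute for the other. A toy illustration with invented subscale scores:

```python
def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    """Pearson correlation between two score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

subscale_a = [1, 2, 3, 4, 5]
subscale_b = [12, 14, 16, 18, 20]  # b = 2a + 10: same ranking, different scale

r = pearson_r(subscale_a, subscale_b)
# r equals 1.0, yet the subscales share neither means nor variances,
# so by Branch's (1990) criteria they are not interchangeable
```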
Principal-Components and Factor Analysis
When alpha coefficients are too small, researchers routinely
conduct post hoc principal-components analysis (PCA) or factor
analysis (FA) to develop “reliable and conceptually meaningful
scales” (Mercer & Cunningham, 2003, p. 217) or to “identify
and
describe a [more useful] subset of items” (Yancey, Aneshensel,
&
Driscoll, 2001, p. 194) from already developed conceptually
mean-
ingful REI scales. In doing so, they typically confuse PCA with
FA, although many psychometric texts indicate that the two
types
of analyses are based on different mathematical assumptions
and
serve different purposes (Kim & Mueller, 1978). Yet some of
the
poor psychometric practices are similar when they are used to
evaluate entire REI measures at the item level and some are
different.
PCA issues. PCA has been the primary methodology used to
evaluate responses to both types of REI measures. PCA is
intended
to reduce a large number of items (in this case) to a smaller
number, and the first component accounts for the maximum
amount of variance possible among the items. The implicit re-
search question when PCA is used to analyze the WRIAS,
BRIAS,
or the MEIM is whether the items can be transformed into some
other smaller set of variables—a question whose answer is theo-
retically meaningless. Helms and Carter (1990) used PCA in
developing the WRIAS and should not have because the number
of
dimensions (i.e., subscales) was already rationally defined by
theory.
PCA conducted at the item level assembles the strongest posi-
tively related items across subscales until it has accounted for
as
much variance as possible. Yet typically the analysis accounts
for
less interitem variance overall than the average of the alpha
coef-
ficients of the multidimensional subscales that inspired the PCA
analysis because the units of analysis are different. That is, reli-
ability analyses generally examine item responses within sub-
scales, whereas PCA analyses examine items without regard to
subscale. For example, in Mercer and Cunningham’s (2003)
WRIAS study, alpha coefficients accounted for an average of
62%
of the interitem variance, whereas their PCA accounted for 42%
of
the interitem variance.
Yet researchers may fool themselves into believing that they
have discovered better subscales because alpha coefficients
calcu-
lated for PCA-derived scales must be large because the PCA
statistically maximizes the shared covariance among item re-
sponses of the sample. Thus, alpha coefficients calculated for
PCA-derived subscales are statistical artifacts. Also, it should
be
noted that the amount of interitem variance explained by alpha
is
equivalent to the first principal component if the same data are
used in the analyses (e.g., subscale items) and the previously
discussed alpha assumptions are supported (e.g., homogeneity
of
variances). Support is recognizable from equal pattern/structure
coefficients (formerly “loadings”). If responses are
heterogeneous,
then the eigenvalue for the first component may be used to
calcu-
late a variety of statistics other than alpha to assess ICR of
measurements (see Table 3; Hattie, 1985). Nevertheless, use of
results from analyses of full scales to replace subscales
endangers
theoretical constructs because the analyses involve different
clus-
ters of items as well as different implicit hypotheses.
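One such eigenvalue-based alternative is Armor's theta, the statistic Ferketich (1990) lists in Table 3 for heterogeneous items. It is computed from the first principal-component eigenvalue rather than from item covariances; the values below are hypothetical.

```python
def armor_theta(k, first_eigenvalue):
    """Armor's theta, an ICR estimate based on the first principal
    component: theta = k/(k - 1) * (1 - 1/lambda_1)."""
    return (k / (k - 1)) * (1 - 1 / first_eigenvalue)

# Hypothetical 10-item scale whose first PCA eigenvalue is 4.2
theta = armor_theta(10, 4.2)
```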
FA issues. In general, Phinney and her associates have con-
ducted increasingly more sophisticated FAs of her measures, al-
though evaluators of her measure typically have favored PCA
(Phinney, 1992; Roberts et al., 1999). On the face of it, FA and
confirmatory FA should permit the test of theoretical models
associated with the relevant items, if the theory and data fit the
measurement model. As previously discussed, when alpha
coeffi-
cients are too large (e.g., close to 1.00) or data are ipsative,
then
FA of entire scales either will not be possible or will yield
specious
results because of item redundancy. Warnings to the effect that
covariance matrices are “nonpositive definite” or “ill-
conditioned”
are indicative of interdependencies among items, but such warn-
ings are not necessarily indicative of flawed measures. Instead
they
might signal the need to shift one’s psychometric focus from
traditional ICR studies to alternatives, such as, for example,
anal-
yses to identify clusters or profiles of people rather than items
(Johnson, Wood, & Blinkhorn, 1988).
Common issues. For PCA and FA, researchers typically do not report their analyses well enough to permit replication. Either they do not report the methods used to decide the number of components or factors extracted or the rotation methods, or they use outmoded methods, or they test models that are incongruent with theory. A few remedies for these poor practices not discussed elsewhere are as follows:

1. As appropriate, researchers should report all of the pattern/structure coefficients for their PCA, FA, or structural equation modeling (SEM), regardless of whether the coefficients conform to a preferred cut score (Lance, Butts, & Michels, 2006). They should also report eigenvalues, numbers of items analyzed, communalities, and any other structural properties of the analysis that would permit other investigators to verify or better understand their findings.

2. Researchers should specify the assumptions of the measurement model that they tested (e.g., homogeneity of variances). For SEM researchers, a good practice is to indicate what parameters were constrained so that other researchers can determine whether the measurement model fits their interpretation of the relevant REI theory.

3. Users of PCA and FA should report what procedure was used to decide the number of factors or components to extract and should use parallel analysis rather than Kaiser's criterion of eigenvalues greater than 1.00 (Hayton, Allen, & Scarpello, 2004).

4. Most REI theories do not propose orthogonal or independent constructs; therefore, models and rotation methods that assume independence should not be used if they are inconsistent with the theories.

5. If ipsative data are analyzed, they will necessarily yield bipolar factors or components equal to one fewer than the number of REI subscales, and the resulting "new" PCA/FA-derived subscales will be ipsative or partially ipsative, too.
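Recommendation 3 can be implemented in a few lines. The following is a minimal sketch of Horn's parallel analysis under the mean-eigenvalue criterion (Hayton et al., 2004, describe refinements such as percentile thresholds); the two-factor item data are simulated for illustration and are not drawn from any REI measure:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Retain components whose observed correlation-matrix eigenvalues
    exceed the mean eigenvalues of same-sized random-data matrices."""
    rng = np.random.default_rng(seed)
    n, k = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sim = np.zeros(k)
    for _ in range(n_sims):
        random_data = rng.normal(size=(n, k))
        sim += np.sort(np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False)))[::-1]
    return int(np.sum(obs > sim / n_sims))

# Simulated responses: eight items driven by two uncorrelated dimensions
rng = np.random.default_rng(3)
n = 300
factors = rng.normal(size=(n, 2))
items = np.hstack([
    np.column_stack([factors[:, 0] + rng.normal(scale=0.8, size=n) for _ in range(4)]),
    np.column_stack([factors[:, 1] + rng.normal(scale=0.8, size=n) for _ in range(4)]),
])
n_retained = parallel_analysis(items)  # recovers the two simulated dimensions
```

Because the comparison baseline comes from random data of the same size rather than a fixed cutoff of 1.00, parallel analysis is less prone to retaining components that merely capitalize on sampling error.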
Conclusions

Researchers have not differentiated scale development from theory-testing research. The former is necessary if no tool exists for assessing relevant theoretical constructs or if the researcher wants to develop alternative measures to evaluate constructs, but this typically has not been the case where REI scales are concerned. Existent REI scales may need to be revised to better represent their underlying hypothetical constructs, but this cannot happen if (a) each researcher changes the content of already developed REI scales to reflect the responses of each new sample, (b) the measurement models and samples used to inform the revisions do not fit the model implied by the REI theory being tested, and (c) revisions rely exclusively on evaluations of the internal structure of the REI measures, no matter how well the research is conducted.

REI scale development began as a quest to replace one-item measures (i.e., racial categories) with more complex tools for assessing individual differences in internalized racial-group socialization (Jackson & Kirschner, 1973). However, contemporary researchers have commonly engaged in a variety of research design and psychometric practices that threaten to return the measures and their associated theories to their more simplistic atheoretical roots, thereby limiting their usefulness for understanding the effects of racial and cultural socialization on people's mental health.

In an effort to discourage the routine practice of using research design and ICR psychometric analyses to reduce all REI scales to measures of unidimensional constructs, whether or not such reduction is theory consistent, the focus of this article has been on introducing counseling psychologists to some better practices for matching REI theoretical constructs and measures to the theories' implicit measurement models. Doing so should not be construed as an endorsement of the practice of giving preeminence to studies of the internal structure of REI measures rather than to other types of validity studies, even if methodologies that permit development of heterogeneous REI scales are used. Ultimately, it does not matter how well REI scale items relate to each other if they do not help explain complex behaviors beyond themselves (AERA et al., 1999).

[The] complexity of psychosocial behavior may require tests to be heterogeneous, perhaps irreducibly so, to maintain their reliability, validity, and predictive utility. . . . If a theory claims that an entity has multiple attributes, then the test measuring that entity should measure all relevant attributes. Therefore, tests must be heterogeneous. The meaningfulness of a test lies not in a methodological prescription of homogeneity but in the test's ability to capture all relevant attributes of the entity it purports to measure. (Lucke, 2005, p. 66)

What psychosocial behaviors can be more complex than racial identity and ethnic identity in the United States?
References

American Educational Research Association (AERA), American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: AERA.

Bacon, D. R., Sauer, P. L., & Young, M. (1995). Composite reliability in structural equations modeling. Educational and Psychological Measurement, 55, 394–406.

Baron, H. (1996). Strengths and limitations of ipsative measurement. Journal of Occupational and Organizational Psychology, 69, 49–56.

Behrens, J. T. (1997). Does the White Racial Identity Scale measure racial identity? Journal of Counseling Psychology, 44, 3–12.

Behrens, J. T., & Rowe, W. (1997). Measuring White racial identity: A reply to Helms (1997). Journal of Counseling Psychology, 44, 17–19.

Betancourt, H., & López, S. R. (1993). The study of culture, ethnicity, and race in American psychology. American Psychologist, 48, 629–637.

Branch, W. (1990). On interpreting correlation coefficients. American Psychologist, 45, 296.

Carter, M. M., Sbrocco, T., Lewis, E. L., & Friedman, E. K. (2001). Parental bonding and anxiety: Differences between African American and European American college students. Anxiety Disorders, 15, 555–569.

Choney, S. K., & Rowe, W. (1994). Assessing White racial identity: The White Racial Consciousness Development Scale (WRCDS). Journal of Counseling & Development, 73, 102–104.

Claney, C., & Parker, W. M. (1989). Assessing White racial consciousness and perceived comfort with Black individuals: A preliminary study. Journal of Counseling & Development, 67, 449–451.

Cokley, K. O. (2005). Racial(ized) identity, ethnic identity, and Africentric values: Conceptual and methodological challenges in understanding African American identity. Journal of Counseling Psychology, 52, 517–526.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 96–104.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481–489.

Erikson, E. H. (1968). Identity: Youth and crisis. New York: Norton.

Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517–531.

Feldt, L. S., & Charter, R. A. (2003). Estimating the reliability of a test split into two parts of equal or unequal length. Psychological Methods, 8, 102–109.

Ferketich, S. (1990). Focus on psychometrics: Internal consistency estimates of reliability. Research in Nursing & Health, 13, 437–440.

Fischer, A. R., & Moradi, B. (2001). Racial and ethnic identity: Recent developments and needed directions. In J. G. Ponterotto, J. M. Casas, L. A. Suzuki, & C. M. Alexander (Eds.), Handbook of multicultural counseling (2nd ed., pp. 341–370). Thousand Oaks, CA: Sage.

Goodstein, R., & Ponterotto, J. G. (1997). Racial and ethnic identity: Their relationship and their contribution to self-esteem. Journal of Black Psychology, 23, 275–292.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139–164.

Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205.

Helms, J. E. (1984). Toward a theoretical explanation of the effects of race on counseling: A Black and White model. The Counseling Psychologist, 12, 153–165.

Helms, J. E. (Ed.). (1990). Black and White racial identity: Theory, research, and practice. Westport, CT: Greenwood Press.

Helms, J. E. (1999). Another meta-analysis of the White Racial Identity Attitudes Scale's alphas: Implications for validity. Measurement and Evaluation in Counseling and Development, 32, 122–137.

Helms, J. E. (2005). Challenging some misuses of reliability coefficients as reflected in evaluations of the White Racial Identity Attitude Scale (WRIAS). In R. T. Carter (Ed.), Handbook of racial–cultural psychology and counseling: Theory and research (Vol. 1, pp. 360–390). New York: Wiley.

Helms, J. E., & Carter, R. T. (1990). Development of the White Racial Identity Inventory. In J. E. Helms (Ed.), Black and White racial identity: Theory, research, and practice (pp. 67–80). Westport, CT: Greenwood Press.

Helms, J. E., Henze, K., Sass, T., & Mifsud, V. (2006). Treating Cronbach's alpha reliability coefficients as data in counseling research. The Counseling Psychologist, 34, 630–660.

Helms, J. E., Jernigan, M., & Mascher, J. (2005). The meaning of race in psychology and how to change it. American Psychologist, 60, 27–36.

Jackson, G. G., & Kirschner, S. A. (1973). Racial self-designation and preference for a counselor. Journal of Counseling Psychology, 20, 560–564.

Johnson, C. E., Wood, R., & Blinkhorn, S. F. (1988). Spuriouser and spuriouser: The use of ipsative personality tests. Journal of Occupational Psychology, 61, 153–162.

Johnson, S. C. (2004). The relation of racial identity, ethnic identity, and perceived racial discrimination among African Americans. Unpublished doctoral dissertation, University of Houston, Texas.

Kelly, S. (2004). Underlying components of scores assessing African Americans' racial perspectives. Measurement and Evaluation in Counseling and Development, 37, 28–40.

Kim, J., & Mueller, C. W. (1978). Factor analysis: Statistical methods and practical issues. Newbury Park, CA: Sage.

Komaroff, E. (1997). Effect of simultaneous violations of essential τ-equivalence and uncorrelated error on coefficient α. Applied Psychological Measurement, 21, 337–348.

Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9, 202–220.

Lee, G., Dunbar, S. B., & Frisbie, D. A. (2001). The relative appropriateness of eight measurement models for analyzing scores from tests composed of testlets. Educational and Psychological Measurement, 61, 958–975.

Lucke, J. F. (2005). The α and the ω of congeneric test theory: An extension of reliability and internal consistency to heterogeneous tests. Applied Psychological Measurement, 29, 65–81.

Mercer, S. H., & Cunningham, M. (2003). Racial identity in White American college students: Issues of conceptualization and measurement. Journal of College Student Development, 44, 217–230.

Onwuegbuzie, A. J., & Daniel, L. G. (1999, November). Uses and misuses of the correlation coefficient. Paper presented at the annual meeting of the Mid-South Educational Research Association, Point Clear, AL.

Owens, W. A. (1947). An empirical study of the relationship between item validity and internal consistency. Educational and Psychological Measurement, 7, 281–288.

Parham, T. A., & Helms, J. E. (1981). The influence of Black students' racial identity attitudes on preferences for counselor's race. Journal of Counseling Psychology, 28, 250–257.

Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381–391.

Phan, T., & Tylka, T. L. (2006). Exploring a model and moderators of disordered eating with Asian American college women. Journal of Counseling Psychology, 53, 36–47.

Phelps, R. E., Taylor, J. D., & Gerard, P. A. (2001). Cultural mistrust, ethnic identity, racial identity, and self-esteem among ethnically diverse Black university students. Journal of Counseling & Development, 79, 209–216.

Phinney, J. S. (1990). Ethnic identity in adolescence and adulthood: A review of research. Psychological Bulletin, 108, 499–514.

Phinney, J. S. (1992). The Multigroup Ethnic Identity Measure: A new scale for use with diverse groups. Journal of Adolescent Research, 7, 156–176.

Phinney, J. S., & Alipuria, L. L. (1990). Ethnic identity in college students from four ethnic groups. Journal of Adolescence, 13, 171–183.

Raykov, T. (1997). Scale reliability, Cronbach's coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329–353.

Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22, 375–385.

Reese, L. E., Vera, E. M., & Paikoff, R. L. (1998). Ethnic identity assessment among inner-city African American children: Evaluating the applicability of the Multigroup Ethnic Identity Measure. Journal of Black Psychology, 24, 289–304.

Roberts, R. E., Phinney, J. S., Masse, L. C., Chen, Y. R., Roberts, C. R., & Romero, A. (1999). The structure of ethnic identity of young adolescents from diverse ethnocultural groups. Journal of Early Adolescence, 19, 301–322.

Rogers, W. M., Schmitt, N., & Mullins, M. E. (2002). Correction for unreliability of multifactor measures: Comparison of alpha and parallel forms approaches. Organizational Research Methods, 5, 184–199.

Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–223.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27, 183–198.

Thompson, B. (1994). Guidelines for authors reporting score reliability estimates. Educational and Psychological Measurement, 54, 837–847.

Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174–195.

Utsey, S. O., & Gernat, C. A. (2002). White racial identity attitudes and the ego defense mechanisms used by counselor trainees in racially provocative counseling situations. Journal of Counseling & Development, 80, 475–483.

Utsey, S. O., & Ponterotto, J. G. (1996). Development and validation of the Index of Race-Related Stress (IRRS). Journal of Counseling Psychology, 43, 490–501.

Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6–20.

Vacha-Haase, T., Kogan, L. R., & Thompson, B. (2000). Sample compositions and variability in published studies versus those in test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60, 509–522.

Yancey, A. K., Aneshensel, C. S., & Driscoll, A. K. (2001). The assessment of ethnic identity in a diverse urban youth population. Journal of Black Psychology, 27, 190–208.

Received August 30, 2006
Revision received January 9, 2007
Accepted January 14, 2007
In response to perceived conceptual, methodological, or content concerns with Helms and associates' racial identity measures, many rebuttal measures followed. Rebuttal measures are scales that the new scale originator(s) specifically described as corrections for one or more such deficiencies in preexisting identity measures (e.g., Phinney, 1992, p. 157). Subsequent measures have tended to rely on the previously listed basic measurement principles introduced by Parham and Helms (1981), although the theoretical rationales for the measures have varied. Phinney's Multigroup Ethnic Identity Measure (MEIM), the most frequently used of the rebuttal measures to date, added the principle of measuring "ethnic" rather than "racial" identity, which she seemingly viewed as interchangeable constructs. The MEIM also introduced the principle of measuring the same identity constructs across racial or ethnic groups rather than group-specific constructs within them.

The BRIAS and WRIAS may be thought of as representative of a class of identity measures in which opposing stages, statuses, or schemas are assessed, whereas the MEIM may be conceptualized as representative of a class of measures in which different behaviors or attitudes are used to assess levels of commitment to a single group (i.e., one's own). Consequently, these measures are used as exemplars of their classes in subsequent discussions. The two classes of measures imply some similar as well as some different desirable practices with respect to research design, measurement or psychometrics, and interpretation that have not been addressed in the racial or ethnic identity literature heretofore. In fact, virtually no literature exists that focuses specifically on good practices for using or evaluating already developed theory-based racial or ethnic identity (REI) measures.
It is important to describe better practices for using already developed REI scales to avoid oversimplifying the essentially complex measurement issues that are often inherent in REI theoretical constructs. The primary sources of my belief that a discussion of better practices is necessary are my experiences reviewing manuscripts, submitting manuscripts, advising researchers, and being fully engaged in REI research. Therefore, the purpose of this article is to make explicit better practices for designing research and conducting psychometric analyses when using REI measures to study identity constructs with new samples. I sometimes use published studies to illustrate a practice or procedure; in most instances, the studies were selected because their authors reported results in enough detail to permit the studies' use for illustrative purposes. More generally, the article is divided into two broad sections, research design practices and psychometric practices. The first section addresses conceptual issues pertinent to research design; the psychometric section addresses scale development concerns.
Research Design Practices
The content of REI scales is intended to reflect standard samples of particular types of life experiences (racial vs. ethnic) as postulated by the relevant theory. A central empirical question with respect to researchers' use of REI scales is whether racial identity and ethnic identity scales measure the same constructs. However, the question cannot be adequately addressed if researchers do not use research design practices that are congruent with the theoretical model(s) underlying each scale under study. In this section, I (a) discuss some conceptual issues related to measuring racial identity and ethnic identity as potentially different constructs, (b) discuss some poor practices that obscure differences if they exist, and (c) proffer some better practices.

Correspondence concerning this article should be addressed to Janet E. Helms, Department of Counseling, Developmental, and Educational Psychology, Boston College, 317 Campion Hall, Chestnut Hill, MA 02467. E-mail: [email protected]

Journal of Counseling Psychology, 2007, Vol. 54, No. 3, 235–246. Copyright 2007 by the American Psychological Association. DOI: 10.1037/0022-0167.54.3.235
Differentiating Racial Identity From Ethnic Identity
In REI research designs, if the researcher's intent is to substitute one class of REI measures for the other, then it is important to demonstrate that the two types of measures assess the same racial or ethnic constructs. Factors to consider are (a) conceptualization of the research question, (b) sample selection, (c) use of other measures for assessing one type of identity rather than the other, and (d) comparability of validity evidence within and across REI measures.
Racial Identity Scales as Replacements for Racial Categories

Racial groups or categories are not psychological constructs because they do not connote any explicit behaviors, traits, or biological or environmental conditions (Helms, Jernigan, & Mascher, 2005). Instead, racial categories are sociopolitical constructions that society uses to aggregate people on the basis of ostensible biological characteristics (Helms, 1990). Because racial categories are null constructs, Helms et al. (2005) contended that they should not be used as the conceptual focus (e.g., independent variables) for empirical studies but may be used to describe or define samples or issues. Ascribed racial-group membership implies different group-level racial socialization experiences that vary according to whether the group is accorded advantaged or disadvantaged status in society. The content of racial identity scales is individual group members' internalization of the racial socialization (e.g., discrimination, undeserved privileges) that pertains to their group.

Ascribed racial group defines the type of life experiences to which a person is exposed and that are available for internalizing (i.e., group oppression or privilege). For example, Black Americans internalize different racial identities than White Americans, and, conversely, White Americans internalize different racial identities than Black Americans. Also, the nature of the racial identities of Americans and immigrants or other nationals differs if they have not experienced similar racial socialization during their lifetimes. Thus, racial identity theories are intended to describe group-specific development in particular sociopolitical contexts.

Racial identity measures are designed to assess the differential impact of racial dynamics on individuals' psychological development. One expects items in racial identity scales or inventories to include some mention of race, racial groups, or conditions that commonly would be construed as racial in nature (e.g., discrimination or advantage on the basis of skin color). For example, Helms and Carter's (1990) WRIAS consists of five 10-item scales, each of which assesses the differential internalization of societal anti-Black racism in Whites' identity development. The relevant sampling and measurement concerns are specifying samples and measures for which race and racism in various forms are presumably relevant constructs.
Ethnic Groups as Proxies for Theoretical Constructs
Ethnicity refers to the cultural practices (e.g., customs, language, values) of a group of people, but the group need not coincide with an ascribed racial group. Betancourt and López (1993) used the term ethnic group to connote membership in a self-identified kinship group that is defined by specific cultural values, language, and traditions and that engages in transmission of the group's culture to its members. Ethnic identity refers to commitment to a cultural group and engagement in its cultural practices (e.g., customs, religion), irrespective of racial ascriptions. Because ethnic groups imply psychological culture-defined constructs, the constructs rather than the categories should be used as conceptual focuses of studies (e.g., independent variables).

The content domain of ethnic identity measures is internalized experiences of ethnic cultural socialization. Phinney and associates (Phinney, 1992; Phinney & Alipuria, 1990) initially developed the MEIM to assess adolescents' search for and commitment to an ethnic identity in a manner consistent with Erikson's (1968) multistage psychosocial identity theory and without regard to group-specific cultural components. Originally, she conceptualized ethnic identity as "a continuous variable [construct or scale], ranging from the lack of exploration and commitment . . . to evidence of both exploration and commitment, reflected in efforts to learn more about one's background" (Phinney, 1992, p. 161). Her continuous scale was composed of items assessing several dimensions of identity (e.g., ethnic behaviors, affirmation, and belonging); hence, it was a multidimensional scale (Helms, Henze, Sass, & Mifsud, 2006), with a focus on cultural characteristics that are assumed to be relevant to individuals across ethnic groups.

Although the structure of the MEIM has varied, its underlying conceptual theme is conformance to ethnic culture rather than exposure to racism. The conceptual, sampling, and measurement issues specific to ethnic identity measures pertain to identifying participants who might reasonably be expected to engage in the cultural practices of the ethnic cultural kinship group in question and ensuring that ethnic identity measures assess relevant culture-related rather than race-inspired psychological construct(s).
Selection and Use of Appropriate REI Measures
Researchers often use one type of REI measure (e.g., ethnic identity) but provide a conceptual rationale for the other type (e.g., racial identity) without empirical justification for doing so. Empirical support for the interchangeability of identity constructs and measures would include evidence that (a) exemplars of the two classes of measures are similarly related to the same external racial or cultural criteria or (b) one type of measure explains its own as well as the other theory's external criteria best. Support for the distinctiveness of the constructs would be lack of support for interchangeability and evidence that other identity measures from the same class relate to each other in a logically consistent manner.
Empirical Comparisons of the MEIM and BRIAS as Measures of REI Constructs

Researchers do not seem to consider whether their cultural or racial outcome measures are theoretically congruent with the type of REI measure that they have selected. Consequently, lack of support for their hypotheses is attributed to deficient REI measures rather than to possible incongruence between the researchers' conceptualization and measurement of REI constructs in their research designs. It is difficult to find a single study in which both classes of REI measures and racial and cultural outcome measures were used. Yet for the purpose of illustrating the type of study necessary to support interchangeable use of REI measures, perhaps it is reasonable to think that scores on racial identity measures, such as the BRIAS, should be related to scores on explicit measures of racial constructs (e.g., perceived individual racism, institutional racism), whereas scores on ethnic identity measures, such as the MEIM, should be related to scores on explicit measures of cultural constructs (e.g., acculturation, cultural values). Confirmation of each of these propositions would be evidence of construct explication in that each measure would be assessing constructs germane to it.
Johnson's (2004) study provides sufficient psychometric summary statistics to permit illustration of the test for interchangeability of REI measures, at least in part. His sample of Black college students (N = 167) responded to the MEIM and the RIAS-B (Parham & Helms, 1981), the earliest version of the BRIAS. Table 1 summarizes alpha coefficients (rxx; in the last column) for the REI measures, correlation coefficients between REI scores and perceived discrimination scores, and the same correlation coefficients corrected for attenuation due to measurement error (i.e., low alpha coefficients) attributable to MEIM or RIAS-B scores. In this example, the correction for attenuation may be interpreted as an estimate of the extent to which an REI subscale and a discrimination subscale measure the same underlying theoretical construct when the effects of REI measurement error are eliminated (Schmidt & Hunter, 1996, 1999).
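The correction applied in Table 1 is straightforward to reproduce. The following is a minimal sketch of the formula given in the table note, rxy′ = rxy/√rxx, checked against one pair of values that Johnson (2004) reported (the Encounter subscale's obtained correlation of .36 with cultural racism and its alpha of .51):

```python
import math

def correct_for_attenuation(r_xy, r_xx):
    """Disattenuate an observed correlation for unreliability in one
    of the two measures: r_xy' = r_xy / sqrt(r_xx)."""
    return r_xy / math.sqrt(r_xx)

# Encounter subscale vs. cultural racism (Table 1): obtained .36, alpha .51
corrected = correct_for_attenuation(0.36, 0.51)  # about .50, as in Table 1
```

Note that correcting for unreliability in both measures would divide by the square root of the product of the two reliabilities; as the table note states, only REI-side error is corrected here.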
The dependent measures in the study, assessed by the Index of Race-Related Stress (Utsey & Ponterotto, 1996), were three types of racism: (a) cultural racism (belief in the superiority of one's own culture, e.g., values, traditions); (b) institutional racism (discrimination in social policies and practices); and (c) individual racism (personally experienced acts of discrimination). The dependent constructs favor the RIAS-B rather than the MEIM, as is evident in the table. If alpha is the correct reliability estimate and the correlations were calculated on untransformed scores, then the correlations, corrected for attenuation attributable to REI measurement error, suggest even stronger relations between the racial identity constructs and perceived discrimination than between the ethnic identity constructs and perceived discrimination. The strongest correlate of MEIM scores is cultural racism, which is theory consistent and suggests that a more fully developed cultural measure might have favored the MEIM.
Alternative Measures of the Same Construct(s)
The question of whether scores on different measures of the same theoretical constructs are related to scores on the original measures is a matter of seeking evidence of convergent validity, a type of construct validity. Researchers seemingly have not developed alternative measures of the same constructs postulated in Phinney's (1990, 1992) theoretical perspective, and only one set of researchers (Claney & Parker, 1989) has developed independent measures of the theoretical constructs of Helms's (1984, 1990) White racial identity model. Choney and Rowe (1994) conducted an evaluation of Claney and Parker's (1989) 15-item White Racial Consciousness Development Scale (WRCDS), a measure of Helms's (1984) stages of White racial identity development that predated her own measure, the WRIAS. It is worth examining the study for what it can reveal about good practices in the empirical investigation of the construct validity of scores on REI measures.

The expressed purpose of Choney and Rowe's (1994) study was to investigate "how the WRCDS compares with the RIAS-W [sic], the current instrument of choice for investigations of White racial identity" (p. 102). Yet their conclusion that "it seems reasonable to conclude that the WRCDS is not capable of adequately assessing the stages of White identity proposed by Helms (1984)" (Choney & Rowe, 1994, p. 104) suggests that an unspoken purpose may have been to examine scores on the two scales for evidence of convergent validity.
In Table 2, Cronbach's alpha reliability coefficients for the WRCDS and WRIAS scales are summarized in the last two columns, and correlations between parallel subscales, adapted from Choney and Rowe (1994, p. 103), are shown in the second column. The authors did not report a full correlation matrix for the two measures, and so it is not possible to examine within-measure patterns of correlations. Table 2 also includes correlations corrected for measurement error in each of the REI measures using each scale's reported Cronbach's alpha coefficient.

Table 1
Comparing Obtained and Corrected Correlations Between MEIM and RIAS-B Scores and Perceived Racism

                                  Perceived discrimination
                       Cultural           Institutional          Individual
Scale              Obtained Corrected  Obtained Corrected   Obtained Corrected   Alpha
MEIM                 .16      .17        -.01     -.01        .08      .09        .87
Racial
  Preencounter      -.01     -.01         .39      .47        .07      .08        .69
  Encounter          .36      .50         .25      .35        .28      .39        .51
  Immersion          .33      .41         .29      .36        .22      .27        .65
  Internalization    .37      .43         .13      .15        .35      .47        .75

Note. From "The Relation of Racial Identity, Ethnic Identity, and Perceived Racial Discrimination Among African Americans," Tables 1 (p. 18) and 6 (p. 33), by S. C. Johnson, 2004, unpublished doctoral dissertation, University of Houston, Texas. "Obtained" values are the correlations reported by Johnson; "Corrected" values are estimates of the correlations with measurement error removed. Only measurement error in the MEIM or racial identity subscales (i.e., alpha) was used to correct the correlations for attenuation attributable to measurement error. The correction for attenuation was rxy' = rxy / rxx^.5, where rxy' equals the corrected correlation, rxy equals the obtained correlation, and rxx^.5 equals the square root of the reliability coefficient for the relevant MEIM or RIAS-B subscale. MEIM = Multigroup Ethnic Identity Measure; RIAS-B = Black Racial Identity Attitudes Scale.

The corrected correlations for these data suggest that the two measures assessed the parallel constructs of Reintegration, Autonomy, and Pseudo-Independence quite strongly, and the constructs of Contact and Disintegration much more strongly than mere examination of the obtained correlations would suggest, thereby refuting Choney and Rowe's assertion that the WRCDS was "incapable" of assessing Helms's constructs. However, the fact that the corrected Reintegration correlation coefficient exceeded 1.00 suggests either that Choney and Rowe's original correlations were downwardly biased by sampling error or that alpha coefficients were not the appropriate reliability estimates for their data.
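The attenuation corrections behind Tables 1 and 2 are straightforward to reproduce. The sketch below implements both versions of Spearman's correction: the single-scale correction used for Table 1 and the two-scale correction used for Table 2. The Table 1 values are taken from the table above; the final, exceeding-1.00 example uses hypothetical numbers, not Choney and Rowe's actual data.

```python
# Spearman's corrections for attenuation.
# Table 1 corrects for unreliability in one measure only (the REI scale):
#     r' = r / sqrt(alpha_x)
# Table 2 corrects for unreliability in both measures:
#     r' = r / sqrt(alpha_x * alpha_y)
from math import sqrt

def correct_one(r: float, alpha_x: float) -> float:
    """Disattenuate r for measurement error in one scale (Table 1)."""
    return r / sqrt(alpha_x)

def correct_both(r: float, alpha_x: float, alpha_y: float) -> float:
    """Disattenuate r for measurement error in both scales (Table 2)."""
    return r / sqrt(alpha_x * alpha_y)

# Reproducing two "Corrected" entries from Table 1 (Johnson, 2004):
print(round(correct_one(.36, .51), 2))  # Encounter x Cultural -> 0.5
print(round(correct_one(.33, .65), 2))  # Immersion x Cultural -> 0.41

# With two unreliable scales, a modest obtained r can disattenuate past 1.00
# (the anomaly noted for Reintegration). Illustrative values only:
print(round(correct_both(.60, .55, .60), 2))  # -> 1.04
```

Because the corrected coefficient is the obtained correlation divided by a number less than 1.00, low reliability estimates inflate the correction sharply, which is why a corrected value above 1.00 signals either sampling error in the obtained correlation or an inappropriate reliability estimate.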
Some Better Practices for Measuring Racial and Ethnic Identity.docx

  • 1. Some Better Practices for Measuring Racial and Ethnic Identity Constructs Janet E. Helms Boston College Racial and ethnic identity (REI) measures are in danger of becoming conceptually meaningless because of evaluators’ insistence that they conform to measurement models intended to assess unidimensional constructs, rather than the multidimensional constructs necessary to capture the complexity of internal- ized racial or cultural socialization. Some aspects of the intersection of REI theoretical constructs with research design and psychometric practices are discussed, and recommendations for more informed use of each are provided. A table that summarizes some psychometric techniques for analyzing multidimen- sional measures is provided. Keywords: racial identity, ethnic identity, reliability, validity, factor analysis In counseling psychology, the measurement of racial identity constructs is a relatively new phenomenon. Arguably, the practice began when Jackson and Kirschner (1973) attempted to introduce complexity into the measurement of Black students’ racial identity by using a single categorical item with multiple options (e.g., “Black,” “Negro”) that the students could use to describe them-
  • 2. selves. Helms and Parham (used in Parham & Helms, 1981) and Helms and Carter (1990) built on the idea that assessment of individual differences in racial identity is important, and they added complexity to the measurement process by (a) developing measures that were based on racial identity theoretical frame- works, (b) using multiple items to assess the constructs inherent to the theories, and (c) asking participants to use continua (i.e., 5-point Likert scales) rather than categories to self-describe. These principles underlie the Black Racial Identity Attitudes Scale (BRIAS; formerly RIAS–B) and White Racial Identity Attitudes Scale (WRIAS). In response to perceived conceptual, methodological, or content concerns with Helms and associates’ racial identity measures, many rebuttal measures followed. Rebuttal measures are scales that the new scale originator(s) specifically described as correc- tions for one or more such deficiencies in preexisting identity measures (e.g., Phinney, 1992, p. 157). Subsequent measures have tended to rely on the previously listed basic measurement princi- ples introduced by Parham and Helms (1981), although the theo- retical rationales for the measures have varied. Phinney’s Multi- group Ethnic Identity Measure (MEIM), the most frequently used of the rebuttal measures to date, added the principle of measuring “ethnic” rather than “racial identity,” which she seemingly viewed as interchangeable constructs. The MEIM also introduced the principle of measuring the same identity constructs across racial or ethnic groups rather than group-specific constructs within them.
  • 3. The BRIAS and WRIAS may be thought of as representative of a class of identity measures in which opposing stages, statuses, or schemas are assessed, whereas the MEIM may be conceptualized as representative of a class of measures in which different behav- iors or attitudes are used to assess levels of commitment to a single group (i.e., one’s own). Consequently, these measures are used as exemplars of their classes in subsequent discussions. The two classes of measures imply some similar as well as some different desirable practices with respect to research design, measurement or psychometrics, and interpretation that have not been addressed in the racial or ethnic identity literature heretofore. In fact, virtually no literature exists that focuses specifically on good practices for using or evaluating already developed theory-based racial or ethnic identity (REI) measures. It is important to describe better practices for using already developed REI scales to avoid oversimplifying essentially com- plex measurement issues that are often inherent in REI theoretical constructs. The primary sources of my belief that a discussion of better practices is necessary are my experiences reviewing manu- scripts, submitting manuscripts, advising researchers, and being
fully engaged in REI research. Therefore, the purposes of this article are to make explicit better practices for designing research and conducting psychometric analyses when using REI measures to study identity constructs with new samples. I sometimes use published studies to illustrate a practice or procedure; in most instances, the studies were selected because their authors reported results in enough detail to permit the studies' use for illustrative purposes. More generally, the article is divided into two broad sections, research design practices and psychometric practices. The first section addresses conceptual issues pertinent to research design; the psychometric section addresses scale development concerns.

Research Design Practices

The content of REI scales is intended to reflect standard samples of particular types of life experiences (racial vs. ethnic) as postulated by the relevant theory. A central empirical question with respect to researchers' use of REI scales is whether racial identity and ethnic identity scales measure the same constructs. However, the question cannot be adequately addressed if researchers do not use research design practices that are congruent with the theoretical model(s) underlying each scale(s) under study. In this section, I (a) discuss some conceptual issues related to measuring racial identity and ethnic identity as potentially different constructs, (b) discuss some poor practices that obscure differences if they exist, and (c) proffer some better practices.

Correspondence concerning this article should be addressed to Janet E. Helms, Department of Counseling, Developmental, and Educational Psychology, Boston College, 317 Campion Hall, Chestnut Hill, MA 02467. E-mail: [email protected]

Journal of Counseling Psychology, 2007, Vol. 54, No. 3, 235-246. Copyright 2007 by the American Psychological Association. 0022-0167/07/$12.00 DOI: 10.1037/0022-0167.54.3.235

Differentiating Racial Identity From Ethnic Identity

In REI research designs, if the researcher's intent is to substitute one class of REI measures for the other, then it is important to demonstrate that the two types of measures assess the same racial or ethnic constructs. Factors to consider are (a) conceptualization of the research question, (b) sample selection, (c) use of other measures for assessing one type of identity rather than the other, and (d) comparability of validity evidence within and across REI measures.
Racial Identity Scales as Replacements for Racial Categories

Racial groups or categories are not psychological constructs because they do not connote any explicit behaviors, traits, or biological or environmental conditions (Helms, Jernigan, & Mascher, 2005). Instead racial categories are sociopolitical constructions that society uses to aggregate people on the basis of ostensible biological characteristics (Helms, 1990). Because racial categories are null constructs, Helms et al. (2005) contended that they should not be used as the conceptual focus (e.g., independent variables) for empirical studies but may be used to describe or define samples or issues. Ascribed racial-group membership implies different group-level racial socialization experiences that vary according to whether the group is accorded advantaged or disadvantaged status in society. The content of racial-identity scales is individual group members' internalization of the racial socialization (e.g., discrimination, undeserved privileges) that pertains to their group.

Ascribed racial group defines the type of life experiences to which a person is exposed and that are available for internalizing (i.e., group oppression or privilege). For example, Black Americans internalize different racial identities than White Americans, and, conversely, White Americans internalize different racial identities than Black Americans. Also, the nature of the racial identities of Americans and immigrants or other nationals differs if they
have not experienced similar racial socialization during their lifetimes. Thus, racial identity theories are intended to describe group-specific development in particular sociopolitical contexts. Racial identity measures are designed to assess the differential impact of racial dynamics on individuals' psychological development. One expects items in racial identity scales or inventories to include some mention of race, racial groups, or conditions that commonly would be construed as racial in nature (e.g., discrimination or advantage on the basis of skin color). For example, Helms and Carter's (1990) WRIAS consists of five 10-item scales, each of which assesses the differential internalization of societal anti-Black racism on Whites' identity development. Relevant sampling and measurement concerns are specifying samples and measures for which race and racism in various forms are presumably relevant constructs.

Ethnic Groups as Proxies for Theoretical Constructs

Ethnicity refers to the cultural practices (e.g., customs, language, values) of a group of people, but the group need not be the same ascribed racial group. Betancourt and López (1993) use the term ethnic group to connote membership in a self-identified kinship group, defined by specific cultural values, language, and traditions, and that engages in transmission of the group's culture to its
members. Ethnic identity refers to commitment to a cultural group and engagement in its cultural practices (e.g., culture, religion), irrespective of racial ascriptions. Because ethnic groups imply psychological culture-defined constructs, the constructs rather than the categories should be used as conceptual focuses of studies (e.g., independent variables). The content domain of ethnic identity measures is internalized experiences of ethnic cultural socialization.

Phinney and associates (Phinney, 1992; Phinney & Alipuria, 1990) initially developed the MEIM to assess adolescents' search for and commitment to an ethnic identity in a manner consistent with Erikson's (1968) multistage psychosocial identity theory and without regard to group-specific cultural components. Originally, she conceptualized ethnic identity as "a continuous variable [construct or scale], ranging from the lack of exploration and commitment . . . to evidence of both exploration and commitment, reflected in efforts to learn more about one's background" (Phinney, 1992, p. 161). Her continuous scale was composed of items assessing several dimensions of identity (e.g., ethnic behaviors, affirmation, and belonging); hence, it was a multidimensional scale (Helms, Henze, Sass, & Mifsud, 2006), with a focus on cultural characteristics that are assumed to be relevant to individuals across ethnic groups. Although the structure of the MEIM has varied, its underlying conceptual theme is conformance to ethnic culture rather than exposure to racism. The conceptual, sampling, and measurement
issues specific to ethnic identity measures pertain to identifying participants who might be reasonably expected to engage in the cultural practices of the ethnic cultural kinship group in question and ensuring that ethnic identity measures assess relevant culture-related rather than race-inspired psychological construct(s).

Selection and Use of Appropriate REI Measures

Researchers often use one type of REI measure (e.g., ethnic identity) but provide a conceptual rationale for the other type (e.g., racial identity) without empirical justification for doing so. Empirical support for the interchangeability of identity constructs and measures would include evidence that (a) exemplars of the two classes of measures are similarly related to the same external racial or cultural criteria or (b) one type of measure explains its own as well as the other theory's external criteria best. Support for the distinctiveness of constructs would be lack of support for interchangeability and evidence that other identity measures from the same class relate to each other in a logically consistent manner.

Empirical Comparisons of the MEIM and BRIAS as Measures of REI Constructs

Researchers do not seem to consider whether their cultural or racial outcome measures are theoretically congruent with the type of REI measure that they have selected. Consequently, lack of
support for their hypotheses is attributed to deficient REI measures rather than possible incongruence between the researchers' conceptualization and measurement of REI constructs in their research designs. It is difficult to find a single study in which both classes of REI measures and racial and cultural outcome measures were used. Yet for the purpose of illustrating the type of study necessary to support interchangeable use of REI measures, perhaps it is reasonable to think that scores on racial identity measures, such as the BRIAS, should be related to scores on explicit measures of racial constructs (e.g., perceived individual racism, institutional racism), whereas scores on ethnic identity measures, such as the MEIM, should be related to scores on explicit measures of cultural constructs (e.g., acculturation, cultural values). Confirmation of each of these propositions would be evidence of construct explication in that each measure would be assessing constructs germane to it.

Johnson's (2004) study provides sufficient psychometric summary statistics to permit illustration of the test for interchangeability of REI measures at least in part. His sample of Black college students (N = 167) responded to the MEIM and the RIAS-B (Parham & Helms, 1981), the earliest version of the BRIAS. Table 1 summarizes alpha coefficients (rxx; in the last column) for the REI measures, correlation coefficients between REI scores and
perceived discrimination scores, and the same correlation coefficients corrected for disattenuation due to measurement error (i.e., low alpha coefficients) attributable to MEIM or RIAS-B scores. In this example, the correction for attenuation may be interpreted as an estimate of the extent to which an REI and a discrimination subscale measure the same underlying theoretical construct when the effects of REI error are eliminated (Schmidt & Hunter, 1996, 1999). The dependent measures in the study, assessed by the Index of Race Related Stress (Utsey & Ponterotto, 1996), were three types of racism: (a) cultural—belief in the superiority of one's own culture (e.g., values, traditions); (b) institutional—discrimination in social policies and practices; and (c) individual—personally experienced acts of discrimination.

The dependent constructs favor the RIAS-B rather than the MEIM, as is evident in the table. If alpha is the correct reliability estimate and the correlations were calculated on untransformed scores, then the correlations, corrected for attenuation attributable to REI measurement error, suggest even stronger relations between the racial-identity constructs and perceived discrimination than the ethnic-identity constructs. The best correlation for MEIM scores is cultural racism, which is
theory consistent and suggests that a more full-blown cultural measure might have favored it.

Alternative Measures of the Same Construct(s)

The question of whether scores on different measures of the same theoretical constructs are related to scores on the original measures is a matter of seeking evidence of convergent validity, a type of construct validity. Researchers seemingly have not developed alternative measures of the same constructs postulated in Phinney's (1990, 1992) theoretical perspective, and only one set of researchers (Claney & Parker, 1989) has developed independent measures of the theoretical constructs of Helms's (1984, 1990) White racial identity model. Choney and Rowe (1994) conducted an evaluation of Claney and Parker's (1989) 15-item White Racial Consciousness Development Scale (WRCDS), a measure of Helms's (1984) stages of White racial identity development that predated her own measure, the WRIAS. It is worth examining the study for what it can reveal about good practices in empirical investigation of construct validity of scores on REI measures.

The expressed purpose of Choney and Rowe's (1994) study was to investigate "how the WRCDS compares with the RIAS-W [sic], the current instrument of choice for investigations of White racial identity" (p. 102). Yet their conclusion that "it seems reasonable to conclude that the WRCDS is not capable of adequately assessing
the stages of White identity proposed by Helms (1984)" (Choney & Rowe, 1994, p. 104) suggests that an unspoken purpose may have been to examine scores on the two scales for evidence of convergent validity.

In Table 2, Cronbach's alpha reliability coefficients for the WRCDS and WRIAS scales are summarized in the last two columns, and correlations between parallel subscales, adapted from Choney and Rowe (1994, p. 103), are shown in the second column. The authors did not report a full correlation matrix for the two measures, and so it is not possible to examine within-measure patterns of correlations. Table 2 also includes correlations corrected for measurement error in each of the REI measures using each scale's reported Cronbach's alpha coefficient.

Table 1
Comparing Obtained and Corrected Correlations Between MEIM and RIAS-B Scores and Perceived Racism

                            Perceived discrimination
                      Cultural      Institutional   Individual
Scale                 Obt.  Corr.   Obt.  Corr.     Obt.  Corr.   Alpha
MEIM                  .16   .17     -.01  -.01      .08   .09     .87
Racial identity
  Preencounter        -.01  -.01    .39   .47       .07   .08     .69
  Encounter           .36   .50     .25   .35       .28   .39     .51
  Immersion           .33   .41     .29   .36       .22   .27     .65
  Internalization     .37   .43     .13   .15       .35   .47     .75

Note. From The Relation of Racial Identity, Ethnic Identity, and Perceived Racial Discrimination Among African Americans, Tables 1 (p. 18) and 6 (p. 33), by S. C. Johnson, 2004, unpublished doctoral dissertation, University of Houston, Texas. "Obtained" were correlations reported by Johnson. "Corrected" are estimates of correlations with measurement error removed. Only measurement error for the MEIM or racial identity subscales (i.e., alpha) was used to correct for disattenuation of correlations attributable to measurement error. The correction for attenuation used was rxy′ = rxy/rxx^.5, where rxy′ equals the corrected correlation, rxy equals the obtained correlation, and rxx^.5 equals the square root of the reliability coefficient for the relevant MEIM or RIAS-B subscales. MEIM = Multigroup Ethnic Identity Measure; RIAS-B = Black Racial Identity Attitudes Scale.

The corrected correlations for these data suggest that the two measures assessed the parallel constructs of Reintegration, Autonomy, and Pseudo-Independence quite strongly and the constructs of Contact and Disintegration much more strongly than
  • 15. constructs of Contact and Disintegration much more strongly than mere examination of the obtained correlations would suggest, thereby refuting Choney and Rowe’s assertion that the WRCDS was “incapable” of assessing Helms’s constructs. However, the fact that the corrected Reintegration correlation coefficient exceeded 1.00 suggests either that Choney and Rowe’s original correlations were downwardly biased by sampling error or that alpha coeffi- cients were not the appropriate reliability estimates for their data. In such circumstances, sampling error concerns can be avoided to some extent by using the better conceptual, sampling, and inter- pretative practices discussed subsequently. Procedures for judging the appropriateness of Cronbach’s alpha as one’s reliability esti- mator are addressed in the later section on better psychometric practices. Poor and Better Construct Defining Practices Research design practices are considered poor if they do not permit data obtained from REI measures to be interpreted in a manner consistent with the relevant REI theory. Practices are better if they make it possible to subject theory-congruent hypoth- eses to empirical testing. Conceptual Practices Researchers often assume that because the MEIM is intended to be a multi-ethnic group measure, it is appropriate to collapse data across racial and ethnic groups without examining whether the
responses of the subgroups on the MEIM items and scales are similar (e.g., Phan & Tylka, 2006). Analogously, researchers aggregate data across ethnic groups within ascribed racial categories when using the BRIAS or WRIAS without investigating the types of racial socialization to which they have been exposed. Alternatively, researchers find that ethnic groups differ but still report aggregated descriptive statistics (Phelps, Taylor, & Gerard, 2001). A conceptual problem associated with a priori aggregation is that researchers presume rather than demonstrate that potentially diverse racial and ethnic categories (e.g., African Americans and other Black ethnic groups) share the same cultural or racial socialization experiences. A methodological consequence is the potential loss of statistical power for subsequent analyses, if the groups' responses are actually different.

Phinney's (1992) studies show that the term ethnicity may have different meaning to different populations. More specifically, she found ethnic group differences in responses to the MEIM such that White participants had lower scores than groups of color. On the basis of their responses to an open-ended item, Phinney (1992) observed that "few White subjects in either sample identified themselves as belonging to a distinct ethnic group. . . . The numbers of Whites who considered themselves as ethnic group
members was too small to permit a separate analysis" (p. 174). The implications of this observation are rarely heeded.

A better research design practice is that users of racial identity measures should provide a "racial" conceptual rationale, focused on racial socialization; users of ethnic identity measures should provide an "ethnic cultural" rationale, focused on cultural socialization; and researchers interested in both types of identity should provide both types of rationales. Matching samples to the appropriate type of REI measure should enhance investigations of validity. Also, it follows from the foregoing analyses that correction for attenuation attributable to measurement error is a good practice when the results of the researcher's study are intended to have far-reaching implications for REI theoretical constructs or to lead to substantive advice about the theoretical constructs assessed by the measures (Schmidt & Hunter, 1999). Correlations were the focus of the corrections in the examples, but virtually any statistic can be corrected for measurement error if it conforms to the assumptions of the general linear model (GLM).

Sampling Practices

Because scale respondents' attributes interact with their responses to scales generally and REI scales specifically (Dawis, 1987; Helms, 2005; Vacha-Haase, Kogan, & Thompson, 2000), researchers minimally should provide both a conceptual
rationale for why the particular REI measure is appropriate for the research participants that were studied as well as empirical support derived from previous studies. However, researchers typically do not describe any inclusion criteria for defining their participants as members of a racial or ethnic group as such designations are used in the United States. At best, they indicate that the racial/ethnic categories were "self-identified" without explaining how such identification occurred (e.g., Mercer & Cunningham, 2003, p. 221). At worst, they either do not describe the racial or ethnic composition of their sample in the Participants section at all (e.g., Reese, Vera, & Paikoff, 1998) or they assign participants to a racial or ethnic group without any indication of how the assignment was determined (e.g., Goodstein & Ponterotto, 1997).

Table 2
An Example of a Convergent Validity Study of White Racial Identity Constructs as Assessed by the White Racial Identity Attitudes Scale (WRIAS) and the White Racial Consciousness Development Scale (WRCDS)

                                               Alpha
Scale                 Correlation  Correction  WRIAS  WRCDS
Contact               .11          0.42        .54    .13
Disintegration        .17          0.30        .77    .43
Reintegration         .53          1.19        .80    .25
Pseudo-Independence   .29          0.61        .69    .32
Autonomy              .55          0.91        .67    .55

Note. Alphas are adapted from Choney and Rowe (1994, p. 103). Correlations are between parallel subscales. Computation of the correction for attenuation was as follows: Contact = .11/(.54 × .13)^.5 = .42; Disintegration = .17/(.77 × .43)^.5 = .30; Reintegration = .53/(.80 × .25)^.5 = 1.19; Pseudo-Independence = .29/(.69 × .32)^.5 = .61; Autonomy = .55/(.67 × .55)^.5 = .91.

Better practices are that researchers should describe their procedures for recruiting research participants, collecting racial or ethnic data, quantifying the data, and assigning participants to racial or ethnic categories. These aspects of researchers' research design should be described as thoroughly as the researchers describe the other measures or manipulations and analyses in their studies. For example, if respondents were asked to describe themselves, were they provided with checklists or open-ended items? How were responses coded? If different racial or ethnic groups were included, the researcher should provide descriptive information (e.g., means, standard deviations, and reliability coefficients) as evidence that aggregating the various groups was appropriate.
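The correction for attenuation applied in the notes to Tables 1 and 2 is simple enough to script. The sketch below is illustrative only (the function name is mine, not from the article): it divides an observed correlation by the square root of the product of the measures' reliabilities, correcting for error in one measure when only one reliability is supplied (as in Table 1) or in both measures when two are supplied (as in Table 2).

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Disattenuate an observed correlation for measurement error.

    r_xy: observed correlation between scales X and Y
    r_xx: reliability (e.g., Cronbach's alpha) of X
    r_yy: reliability of Y; leave at the default 1.0 to correct
          for error in X only, as in the Table 1 note
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Table 1: MEIM vs. cultural racism, correcting for MEIM error only
print(round(correct_for_attenuation(0.16, 0.87), 2))        # 0.17
# Table 2: Reintegration, correcting for error in WRIAS and WRCDS
print(round(correct_for_attenuation(0.53, 0.80, 0.25), 2))  # 1.19
```

As the Reintegration value illustrates, the corrected estimate can exceed 1.00 when sampling error or an inappropriate reliability estimator distorts the inputs.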
Careful attention to the racial and cultural aspects of research designs will provide better contexts for conducting the psychometric studies of REI measures that have so intrigued researchers and reviewers since the measures' inception.

Psychometric Practices

The original focus of racial identity scales—assessment tools for assisting counselors to better diagnose and remediate the varying psychological effects on individuals of internalized racial socialization (Jackson & Kirschner, 1973; Parham & Helms, 1981)—has virtually disappeared from the REI literature. In fact, researchers have been severely chastised for attempting to use the measures for diagnostic or treatment purposes, and evidence of their usefulness has been discounted (Behrens, 1997; Behrens & Rowe, 1997; Fischer & Moradi, 2001). Instead researchers have focused on evaluating the worthiness of REI scales by using reliability analyses, principal-components analyses, and factor analyses to examine the internal structure of the measures. However, poor practices associated with each of these methodologies threaten to reduce REI scales to simpler measures than are necessary to explain individuals' complex racial and cultural functioning.

Reliability Analyses
Researchers typically use Cronbach's (1951) alpha coefficients to estimate the reliability of a sample's responses to REI items comprising subscales or scales even though Cronbach's alpha was not designed to assess the reliability of multidimensional measures (Hattie, 1985). Nevertheless, much of the threat to REI theoretical constructs from reliability analyses results as much from researchers' poor practices with respect to their use of Cronbach's alpha as from the likelihood that it is the wrong statistic most of the time (Helms, 2005).

To explain when Cronbach's alpha is the wrong statistic, it is necessary to provide a brief overview of the calculation and assumptions underlying use of Cronbach's alpha because researchers seem to be unaware of them. Consequently, the researchers do not examine the fit of their data to the implied reliability measurement model or to the REI theoretical assumptions under investigation. The Cronbach's alpha coefficient is the focus of my REI reliability discussion because it is virtually the only reliability estimator that is ever used to evaluate REI measures in the REI measurement literature or other types of measures in the social and behavioral sciences literature more generally (Behrens, 1997; Cortina, 1993; Helms, 1999; Peterson, 1994).

Overview of Cronbach's Alpha
This overview of Cronbach's alpha (henceforth, alpha) is not intended to be a technical treatise on the statistic. A number of primers for applied researchers are available to fulfill that goal. These include Cortina (1993), Helms et al. (2006), Thompson and Vacha-Haase (2000), and the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association, and National Council on Measurement in Education, 1999). Sometimes poor practices occur because researchers are not aware of the nature of the data necessary to use and interpret alpha; sometimes they occur because it is not the appropriate statistic for the researcher's intended use. Therefore, the purpose of this overview is to provide enough information to explain why the proposed "better" practices are better.

An alpha coefficient is a statistic that summarizes the degree of interrelatedness among a sample of participants' responses to a set of items intended to measure a single construct. Alpha coefficients typically range from zero to 1.00, with values approaching 1.00 suggesting a high level of positive intercorrelation among items. The nature of the interitem relationships is captured in the formula for standardized alpha:
  • 23. and r is the average correlation between participants’ responses to all pairs of subscale items. It should be noted that this alternative alpha formula should not be used unless the researcher standard- izes item responses before calculating total scores for subsequent analyses because it assumes homogeneity of item responses; how- ever, it is useful for making some necessary points (Helms et al., 2006). As is true of all reliability coefficients, alpha is not an inherent or stable property of scales, regardless of its value (AERA et al., 1999; Thompson, 1994). Rather it is a value that describes one sample’s responses to a set of items under one set of circum- stances. Therefore, appraisals of REI measures based solely on the magnitude of alpha coefficients, such as “Practitioners should withhold use of the WRIAS until there is clear evidence with regard to what it measures” (Behrens, 1997, p. 10), reflect multiple misunderstandings concerning how alpha coefficients should be used and interpreted. With few exceptions (e.g., Owens, 1947), researchers’ reliability ideal is an alpha coefficient of 1.00 or as close as possible, which is the standard used to evaluate REI measures and non-REI mea- sures alike. A coefficient of such magnitude would be correctly interpreted as indicating that 1.00 or 100% of the variance of the
  • 24. total subscale or scale scores is reliable or systematic variance for the sample under study. No conclusions could rightfully be made about “what” the scale scores measure (i.e., the nature of the systematic variance), which is a matter to be addressed by means of validity studies involving measurement or manipulations of relevant constructs external to the target measure. Generalizing the obtained level of reliability to other samples would not be proper because reliability coefficients likely differ from sample to sample. Moreover, obtaining alpha coefficients close to 1.00 is a Pyrrhic ideal generally because a unit value signifies that (a) certain kinds of statistical analyses cannot be conducted at the item level or will yield spurious results, and (b) the usefulness of scale scores for validity studies under such circumstances is quite limited. Owens’s reductio ad absurdum is that perfect or almost perfect internal consistency reliability (ICR) coefficients indicate that sample’s 239SPECIAL SECTION: RACIAL IDENTITY MEASUREMENT responses to each item are perfectly predictable from their re- sponses to every other item as well as the total scale scores from which the alpha coefficient was derived. Thus, the item-level responses would be redundant with each other and results of
  • 25. statistical analyses requiring matrix inversion—such as factor analyses, structural equation modeling, and item analyses via multiple regression analyses—would be trivial. Some computer packages warn the user that the data matrix is “non-positive definite” when this type of redundancy occurs. The relevant validity issue is that when the ICR coefficient is nearly perfect, it suggests that the items and subscale scores have some single construct (i.e., systematic variance) in common and for validity evidence to be obtained, the same single construct must be salient in the external criteria. For example, the researcher would need to identify the same cultural construct in the MEIM and the external criteria; in the case of the WRIAS or BRIAS and external criteria, the researcher would have to identify the same racial construct in each, if alpha coefficients were perfect. Use of alpha to evaluate either class of REI measures presumes that item responses fit a unidimensional structure, because alpha is intended to evaluate the unidimensionality of item responses or scale scores, whether it does or not (Hattie, 1985). Thus, validity analyses might result in “too small” validity coefficients when alpha is too large. In sum, it is not clear why alpha is so popular a statistic for evaluating measures generally given that, at its best, it describes and promotes development of very simple constructs. In some ways, the simple structure assumption is least problematic for REI
  • 26. measures such as the MEIM if their theoretical rationale proposes positively related constructs and their item responses are positively correlated. Positively correlated item responses and subscale scores yield high alpha coefficients, even if they restrict the other kinds of analyses that should be conducted. Simplification of constructs is most problematic for evaluation of REI measures such as the WRIAS because their theories pro- pose that persons endorsing some subscale items as self- descriptive will reject others. Depending on the samples’ racial socialization experiences, datasets may be defined by some nega- tive and some positive correlations among item responses; some item responses may be more homogeneous as indicated by small standard deviations; and the sample’s level of endorsement (i.e., item means) might differ across items. Any of these conditions would contribute to low alpha coefficients because they are vio- lations of basic alpha measurement assumptions, but each of the conditions may be consistent with some REI theory. It is important to understand these basic aspects of alpha as a measurement model so that the researcher can make an informed decision about whether its use is consistent with the theoretical framework of the selected REI measure and the researcher’s rea- son for selecting it. Once data are collected, it is important to check the validity of assumptions associated with anticipated psychomet- ric analyses, particularly if REI scale modification depends on the magnitude of obtained reliability coefficients.
  • 27. Assumptions and Scale Modification Because alpha is virtually the only reliability coefficient used to evaluate and modify REI measures, it is useful to discuss assump- tion checks and scale modification practices as they pertain to alpha. Yet the better practices are relevant to virtually any reli- ability coefficient. Alpha Measurement Assumptions The basic assumptions that should be examined to support use of alpha are (a) item responses are positively correlated, (b) item responses are homogeneous, (c) item means are essentially equal, and (d) the REI theory postulates unidimensional or homogeneous constructs. The first assumption can be investigated by examining the interitem correlation matrix. The presence of any negative correlations means that the resulting alpha coefficient will be an underestimate of item relatedness. If Assumption A is confirmed, Assumption B can be checked by using Feldt and Charter’s (2003) ratio for examining interitem homogeneity of variances (i.e., com- pare the largest item standard deviation to the smallest item stan- dard deviation in the data set). If the result of the comparison is less than 1.3 (i.e., SDL/SDS � 1.3), alpha may be an appropriate estimate of ICR. If Assumption B is supported, then the smallest
  • 28. item mean and largest item mean should be compared via statis- tical tests (e.g., paired comparison t tests, within-subjects analysis of variance). The check for Assumption D is conceptual, but it may be rejected if the REI theory proposes clusters of items or people intended to measure more than one construct using a single scale or multiple subscales. If any of the assumptions are not supported, then alternative procedures for estimating ICR are available. Some of these alter- native procedures are summarized in Table 3, which cites re- sources and describes alternatives to use when specific measure- ment conditions exist. For example, if alpha is “too low,” Rogers, Schmitt, and Mullins (2002), cited in the last row, recommend exploratory factor analysis, followed by calculation of alphas for identified item subsets and composite alpha if the researcher intends to use the items as a single multidimensional scale. The procedures in Table 3 ought to work well for MEIM-like scales but not for WRIAS-like scales if they are ipsative. Ipsative Measurement Assumptions Conceptually, a measure is ipsative if individuals’ scores are their explicit or implicit self-rankings on a set of items or subscales such that some scores must be higher than some others. Ordinarily, ipsativity results from response formats (e.g., forced choice,
  • 29. rank- ings) or transformations (e.g., subtracting person’s mean scores from total scores). However, for some REI measures, ipsativity is induced by individual participants’ need to be logically consistent and, therefore, unwillingness to endorse contradictory items as self-descriptive. Consequently, when some subscale scores are high, some others are relatively lower (i.e., some subscales are inversely related within individuals and therefore negatively cor- related between individuals). Three properties that Helms and Carter’s (1990) WRIAS share with ipsative measures are (a) half or more correlations in a matrix of correlations typically are negative, (b) samples’ mean correla- tion among subscales typically is negative and approaches zero, and (c) the average of the full-scale item responses is a constant. Of the 21 correlation matrices analyzed by Behrens (1997), for which 10 correlation coefficients were reported, 100% consisted of five or more negative correlations and the mean correlation, 240 HELMS weighted by sample size, was –.03. For the 50 WRIAS items, the average value within individuals and samples rounds to the scale midpoint (i.e., 3). Also, except for measurement error, such as missing data, item means for any four subscales (total number
of subscales minus one) will also equal a rounded value of 3 in most samples, meaning that the contribution of the remaining items is not unique.

Experts in measurement have long debated how best to analyze ipsative datasets given that the data violate all of the assumptions of the GLM and CTT, particularly the assumption of random error among items (Baron, 1996; Johnson, Wood, & Blinkhorn, 1988). There is no consensual resolution to the debate, and most of the compromises do not pertain to REI scales because their ipsativity is theoretically induced rather than an artifact of the response format of scales. For now, the best compromises are first to explore Points a–c from the previous paragraph to determine whether data conform to an ipsative pattern. Second, if so, do not include all of the items in reliability or factor analyses. Conduct analyses with single subscales or subsets of subscales. Also, consider analyses that do not depend on correlation coefficients, such as cluster or profile analyses. Third, if data are not ipsative, the multidimensional analyses in Table 3 might yield meaningful results.

REI Scale Modification

Many researchers modify or negatively evaluate REI measures when they obtain unsatisfactory alpha coefficients. Such practices typically occur without regard to any of the previously discussed research design or psychometric issues and thereby contribute to the development of nonstandard, atheoretical scales and perhaps the discarding of important information about samples.

Vacha-Haase (1998) developed reliability generalization (RG) methodology to assess the effects of sample and research design characteristics on the magnitude of reliability coefficients across studies. RG studies allow the researcher to adjust for the fact that tests per se are not reliable by identifying the conditions (e.g., sample demographic and response characteristics, settings) under which one is likely to obtain desired levels of reliability for some intended purpose using the same set of items.

Utsey and Gernat's (2002) study provides an example of how RG studies might be used to improve psychometric reliability practices. Utsey and Gernat used their obtained alpha coefficient of .28 for scores on the 10-item WRIAS Autonomy subscale as the rationale for dropping two of its items, "Sometimes jokes based on Black people's experiences are funny" and "I understand that White women and men must end racism in this country because White people created it." The two items assessed sense of humor and historical knowledge in Helms's (1990) White identity theory. To their credit, the researchers (a) reported which items were omitted; (b) attempted to find justification for their unusually low alpha in
previous literature; and, finding none quite as low, (c) checked their data for input errors before revising the subscale on the basis of corrected item–total correlations. The revised alpha was .55. Had Utsey and Gernat conducted RG comparisons of their sample to samples in other studies, as Helms (2005) recently recommended, they would have discovered that relative to Helms and Carter's (1990) referent total sample, the responses of their sample were much more variable on the Contact subscale; they were less variable with respect to the remaining subscales, except Autonomy, for which the authors did not report a standard deviation for the full subscale. Participants in their study differed on a variety of attributes relative to Helms and Carter's referent sample (including sense of humor and racial historical knowledge), but the point here is that if researchers do not rule out sample attributes as a rationale for their reliability results, then it is improper to ascribe the "erratic pattern of Cronbach's alpha coefficients" to the REI scale or subscales rather than their "unusual" sample (Utsey & Gernat, 2002, p. 477).

An alpha coefficient is a point estimate of a population value that is obtained with some level of precision and is sample dependent.

Table 3
Summary of Some Recommended Methodologies for Evaluating Measures of Multidimensional Constructs

Bacon, Sauer, & Young (1995): In SEM, use weighted omega to estimate rxx so that items can violate alpha assumptions and receive weights proportional to their true score variances.

Ferketich (1990): Compute rxx (a) from the first eigenvalue of a PCA (omega) or (b) from the item communalities of an FA (theta) if items are heterogeneous.

Komaroff (1997): If CFA reveals correlated error, adjust alpha by subtracting the estimated positive sum of error covariances from it.

Lee, Dunbar, & Frisbie (2001): SEM multifactor partially tau equivalent: assumes items within subscales are homogeneous and positively correlated but subscales are not. SEM multifactor congeneric (heterogeneous): assumes subscale-specific common factors; structural coefficients are not restricted for subscales, and different parameters in each subscale may be estimated.

Raykov (1998): (1) Use SEM to test whether the scale is congeneric (not homogeneous); if so, examine MIs associated with error covariances and expected parameter changes. MIs > 5 may reflect heterogeneous item subsets. (2) Conduct EFA using maximum likelihood extraction and examine the chi-square for fit; compare eigenvalues.

Raykov (1997): Use Raykov's LVM to examine the underlying factor structure of item sets before deleting items to increase alpha.

Rogers et al. (2002): If CFI is < .80, use EFA to identify item subsets and calculate composite alpha, if a single multidimensional scale is desired.

Note. SEM = structural equation model; PCA = principal-components analysis; FA = factor analysis; CFA = confirmatory factor analysis; MI = modification indices; EFA = exploratory factor analysis; LVM = latent variable model; CFI = comparative fit index.

RG studies may be used to discover whether one's obtained reliability coefficients are aberrant relative to samples in other studies (Helms et al., 2006). Fan and Thompson (2001) recommended that researchers report confidence intervals (CIs) for obtained alpha coefficients and provided the methodology for calculating them (pp. 522–523). They also illustrated the analysis for statistically comparing an obtained coefficient to a population value(s). Helms et al.'s (2006) and Fan and Thompson's (2001) advice may be applied to Utsey and Gernat's (2002) previously discussed Autonomy coefficient (.28). Behrens's (1997) meta-analysis of Autonomy alpha coefficients from 23 studies, which was cited by Utsey and Gernat, yielded an alpha coefficient population
estimate of .61. The upper and lower limits of the (presumably) 95% CI for the population estimate were, respectively, .63 and .60. The 95% CI (based on a central F distribution) calculated for Utsey and Gernat's alpha of .28 is .09 (lower limit) to .44 (upper limit). Thus, the ranges of population estimates for their obtained and revised alpha (.55) coefficients were considerably below the average range Behrens reported. Additional support is that an analysis of variance, conducted with the smaller alpha value in the numerator (i.e., .28), indicates that the researchers' reported alpha was significantly lower than Behrens's population estimate of .61, F(144, 1296) = 1.85, p < .0001. Therefore, it is reasonable to conclude either that Utsey and Gernat's sample was aberrant or that alpha was not the appropriate statistic for analyzing their data. The more general principle is that to maintain the theoretical meaningfulness of REI scales, modifying the scales or subscales should not be the automatic response to "too-low" alpha coefficients. Instead, alternative psychometric hypotheses should be explored, including effects of sample attributes.

Whole Scale REI Scale Revisions
Researchers routinely engage in a variety of practices intended to develop new REI measures from the original items. Most of these practices involve analyses of responses to the entire scales by means of techniques intended to assess the fit of data to a unidimensional measurement model (i.e., interscale correlations, principal-components analysis, and factor analysis). Many of these analyses are conducted without regard to the interplay between psychometrics and theory; some others rely on improperly conducted or incorrectly interpreted psychometric procedures.

Nonstandard REI Scales

One consequence of disregarding the interactions between theory and psychometric practices is that researchers replace standard sets of items with whatever sets of items best describe their samples. Fit may be determined by reliability analyses, factor analysis, principal-components analysis, correlation analyses, or some combination.

Reliability Analyses for Subscales

When using the MEIM, researchers frequently describe the measure as consisting of different numbers of items (range: 10 to 24) and different numbers of scales or subscales, dimensions, or components (range: 1 to 5 subscales), each of which consists of varying numbers of items and item anchors (e.g., Carter, Sbrocco, Lewis, & Friedman, 2001; Cokley, 2005). Often it is impossible to discern whether the items used by authors correspond to those listed by Phinney (1992). The analogue for racial identity measures is that researchers drop scales or recombine items on the basis of their own or someone else's reliability analyses or personal preferences (Kelly, 2004).

It is not clear why researchers are so flexible about the structure of the MEIM in particular, given that Phinney (1992) intended her measure to assess the same constructs across ethnic groups. Perhaps they are confused because she did not report reliability coefficients for the scores of her two developmental samples on the two-item Ethnic Behaviors subscale. She asserted that "reliability [i.e., alpha] cannot be calculated with only two items" (Phinney, 1992, p. 165). Subsequent researchers have followed suit and cite her as the source for this poor practice.

However, Phinney's assertion is demonstrably untrue. Much of CTT has focused on developing methodologies for estimating the reliability of scores on two-item (e.g., split-half, alternate form) tests (Feldt & Charter, 2003). In fact, if the researcher calculates item variances and covariances, then the standard formula for alpha, used in Table 4, may be used to estimate reliability. The Spearman–Brown formula was used to estimate the alpha coefficient for the two-item Ethnic Behaviors subscale scores of Phinney's college student sample, whose overall reliability for responses to 14 items was .90. As shown in Table 4, the estimated alpha for the Ethnic Behaviors subscale responses was much lower than the alphas reported for the responses of her college student
sample to the other subscales, but it could be calculated, and doing so is consistent with the advice in the Testing Standards that researchers report reliability coefficients for all scales and subscales used in their studies (AERA et al., 1999).

Table 4
Summary of Calculation of Two-Item Reliability and Composite Alpha for Phinney's (1992) College Student Sample

Affirmation/Belonging (k = 5): α = .86; Phinney data: M = 3.36, SD = .59; untransformed (b): M = 16.80, SD = 2.95
Identity Achievement (k = 7): α = .80; Phinney data: M = 2.90, SD = .64; untransformed (b): M = 20.30, SD = 4.48
Ethnic Behaviors (a) (k = 2): α = .56; Phinney data: M = 2.67, SD = .85; untransformed (b): M = 5.34, SD = 1.70
Ethnic Identity (k = 14): α = .90; Phinney data: M = 3.04, SD = .59; untransformed (b): M = 42.56, SD = 8.26
Composite alpha (k = 3 subscales): α = .80

Note. Data adapted from the college student sample in "The Multigroup Ethnic Identity Measure: A New Scale for Use With Diverse Groups," by J. S. Phinney, 1992, Journal of Adolescent Research, 7, Tables 2 & 3, p. 167. Calculation of composite alpha (CA) was as follows:

CA = [k/(k − 1)] × [1 − (Σ SDss² / SDtotal²)]
   = (3/2) × [1 − (2.95² + 4.48² + 1.70²)/8.26²]
   = (3/2) × [1 − (31.6629/68.2276)]
   = .80

(a) Alpha for Ethnic Behaviors was estimated from the alpha of the Ethnic Identity scale scores using the Spearman–Brown formula. (b) Untransformed scores were computed by weighting Phinney's data by the number of items (k) and were used to compute composite alpha.
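The two derived coefficients in Table 4 can be reproduced with a short script. The sketch below is illustrative and not part of the original analyses; the function names are mine, and the only inputs are the published statistics from the table.

```python
# Illustrative sketch: reproducing Table 4's two derived coefficients from
# Phinney's (1992) published college-sample statistics. Function names are
# hypothetical; only the input values come from the table.

def spearman_brown(alpha: float, length_ratio: float) -> float:
    """Project the reliability of a test whose length is changed by length_ratio."""
    return (length_ratio * alpha) / (1 + (length_ratio - 1) * alpha)

def composite_alpha(subscale_sds: list, total_sd: float) -> float:
    """Cronbach's standard formula applied at the subscale level ('lumpy' alpha)."""
    k = len(subscale_sds)
    return (k / (k - 1)) * (1 - sum(sd ** 2 for sd in subscale_sds) / total_sd ** 2)

# Two-item Ethnic Behaviors alpha, stepped down from the 14-item total (alpha = .90)
ethnic_behaviors_alpha = spearman_brown(0.90, 2 / 14)

# Composite alpha from the three untransformed subscale SDs and the total SD
lumpy_alpha = composite_alpha([2.95, 4.48, 1.70], 8.26)

print(round(ethnic_behaviors_alpha, 2))  # 0.56, as in Table 4
print(round(lumpy_alpha, 2))             # 0.80, as in Table 4
```

That the composite (subscale-level) alpha of .80 is lower than the reported item-level alpha of .90 makes the surrounding argument concrete: the larger total-scale coefficient partly reflects the greater number of items rather than a stronger common dimension.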
  • 40. scale (14 items, �s � .81, .90) is the aggregated responses to the 14 items comprising the three subscales (excluding Other Group Orientation). Researchers routinely report and analyze reliability data only for the Ethnic Identity (EI) total, thereby overlooking potentially the- oretically interesting information. Alternatively, they report alpha only for the EI total but use the individual subscales in their analyses, thereby ignoring the better practice of reporting reliabil- ity coefficients for all scales and subscales (AERA et al., 1999). A third poor practice is that researchers use item variances rather than subscale variances to calculate alpha for the composite scores (i.e., composite alpha) in spite of the fact that the EI total obviously consists of multiple components or subscales (i.e., is multidimen- sional). Cronbach (1951) advised that for multisubscale measures, re- searchers should use subscale values (e.g., variances, intercorrela- tions) in the standard formula to calculate lumpy alpha. Doing so determines whether a principal component, defining a superordi- nate dimension (e.g., ethnic identity), runs through the subscale responses more strongly than individual constructs run through subscale responses. In Table 4, I show the difference between composite or lumpy alpha (.80), calculated for Phinney’s (1992)
  • 41. data for college students, and the (presumably) item-level alpha that she reported (.90). To do so, the original subscale variances were estimated by weighting their standard deviations by the number of scale items (k). Only one of the subscales has a lower alpha than composite alpha, which means that although a common theme runs through the subscales, it does not warrant abandoning the theoretical constructs that the individual subscales assess by replacing them with total scale scores. When researchers use responses to individual items to calculate ICR for total scale scores rather than responses to subscales, total-scale alpha coefficients typically will be inflated for one or both of the following reasons: (a) the number of items for the scale overall exceeds the number of items for the separate subscales (note that shared variance is weighted by the number of items in alpha formulas) and (b) nominal positive correlations across item sets elevate the level of shared variance (Cortina, 1993; Helms et al., 2006). Thus, users of the MEIM and other REI measures often have been seduced by higher total-scale alpha coefficients into abandoning the theoretical constructs underlying the selected mea- sure. Yet conceptually meaningful measures are likely to yield better validity evidence. Table 1 illustrates this point to some extent. Notice that scores on the measure that yielded the best ICR (rxx � .87) correlated worst with measures of other constructs, but the measure whose scores yielded the worst ICR (rxx � .51)
  • 42. demon- strated better correlations overall than any of the other measures with better ICR (i.e., alphas). For some of the revisions of the MEIM, it would not be surprising to discover that the subscales yielded higher correlations with measures of other constructs than did the total scale because conceptual complexity is lost by aggre- gating subscales. Thus, a better practice is that if composite ICR must be calculated for REI measures for some purpose, then researchers should calculate it at the subscale level as a means of determining whether it is meaningful or necessary to collapse across theoretical REI constructs. Analyses of Correlation Coefficients Researchers frequently calculate correlations among subscales and when large correlations are found between pairs of scales, they either (a) collapse the subscales, (b) claim “multicollinearity” as the explanation for why their hypotheses were not confirmed, or (c) use the findings as a rationale for creating new scales. Both statistical and REI theoretical assumptions interact to suggest that these are not good practices. Examination of the dataset’s fit to statistical assumptions is necessary if the researcher intends to use inferential statistics to test hypotheses concerning correlations between REI subscales or to reconfigure REI subscales. Onwuegbuzie and Daniel (1999, pp. 8 –10) reported that re- searchers fail to confirm that Pearson correlation coefficients
  • 43. are the correct statistic for evaluating associations between measures by assessing the conformance of the distributions of the pair(s) of variables to the assumptions of GLM. Some of the major assump- tions and their relevance to REI measures are as follows: 1. One variable of the pair is presumed to be the independent or predictor variable and the other is presumed to be the dependent or criterion variable. Of course, this assumption is not true in the case of REI measures given that subscales are administered to the same people at the same time via the same measure and that REI theories suppose that subscale scores are interrelated within individuals. Consequently, sampling and measurement error are likely corre- lated and, therefore, influence the magnitude of correlations in one direction or another. 2. The dependent variable must be normally distributed. This also is unlikely to be true because many of the REI theories suppose that people behave differently according to the setting(s) in which they find themselves or the people with whom they are interacting. Thus, skewness or kurtosis of subscale responses may affect the size of correlation coefficients, thereby contributing to Type I or Type II error. 3. The variability of scores for the dependent variable is about
  • 44. the same at all levels of the independent variable. Because no variable is the independent or dependent variable when all intra- subscale correlations are compared, this assumption in effect re- quires variances to be equal for all subscales at all levels, which is unlikely given that REI theories postulate sample heterogeneity across subscales. In sum, attributions that REI subscales are flawed, which cite bivariate correlations as evidence, might be counterindicated if the researcher cannot provide evidence that relevant GLM assump- tions were tested. Moreover, that a correlation differs significantly from zero is not evidence of multicollinearity of either of the involved subscales. Researchers often use the term multicollinear- ity to mean redundancy, both of which are inferred from “large” 243SPECIAL SECTION: RACIAL IDENTITY MEASUREMENT correlations of various sizes between scales, although the defini- tion of large varies from study to study. Branch (1990) pointed out that it is fallacious to conclude on the basis of even substantial correlations (e.g., .80) that two subscales measure the same construct and are interchangeable as a result. He contended that (a) correlations do not necessarily reveal that each
  • 45. person obtained similar scores on each subscale, and (b) inter- changeability requires evidence that the subscales involved share observed scores, means, variances, and content. These are aspects of whole scale correlation analyses that are rarely examined and interpreted in evaluations of REI measures. Principal-Components and Factor Analysis When alpha coefficients are too small, researchers routinely conduct post hoc principal-components analysis (PCA) or factor analysis (FA) to develop “reliable and conceptually meaningful scales” (Mercer & Cunningham, 2003, p. 217) or to “identify and describe a [more useful] subset of items” (Yancey, Aneshensel, & Driscoll, 2001, p. 194) from already developed conceptually mean- ingful REI scales. In doing so, they typically confuse PCA with FA, although many psychometric texts indicate that the two types of analyses are based on different mathematical assumptions and serve different purposes (Kim & Mueller, 1978). Yet some of the poor psychometric practices are similar when they are used to evaluate entire REI measures at the item level and some are different. PCA issues. PCA has been the primary methodology used to evaluate responses to both types of REI measures. PCA is intended to reduce a large number of items (in this case) to a smaller number, and the first component accounts for the maximum amount of variance possible among the items. The implicit re-
  • 46. search question when PCA is used to analyze the WRIAS, BRIAS, or the MEIM is whether the items can be transformed into some other smaller set of variables—a question whose answer is theo- retically meaningless. Helms and Carter (1990) used PCA in developing the WRIAS and should not have because the number of dimensions (i.e., subscales) was already rationally defined by theory. PCA conducted at the item level assembles the strongest posi- tively related items across subscales until it has accounted for as much variance as possible. Yet typically the analysis accounts for less interitem variance overall than the average of the alpha coef- ficients of the multidimensional subscales that inspired the PCA analysis because the units of analysis are different. That is, reli- ability analyses generally examine item responses within sub- scales, whereas PCA analyses examine items without regard to subscale. For example, in Mercer and Cunningham’s (2003) WRIAS study, alpha coefficients accounted for an average of 62% of the interitem variance, whereas their PCA accounted for 42% of the interitem variance. Yet researchers may fool themselves into believing that they have discovered better subscales because alpha coefficients calcu- lated for PCA-derived scales must be large because the PCA statistically maximizes the shared covariance among item re- sponses of the sample. Thus, alpha coefficients calculated for PCA-derived subscales are statistical artifacts. Also, it should be
  • 47. noted that the amount of interitem variance explained by alpha is equivalent to the first principal component if the same data are used in the analyses (e.g., subscale items) and the previously discussed alpha assumptions are supported (e.g., homogeneity of variances). Support is recognizable from equal pattern/structure coefficients (formerly “loadings”). If responses are heterogeneous, then the eigenvalue for the first component may be used to calcu- late a variety of statistics other than alpha to assess ICR of measurements (see Table 3; Hattie, 1985). Nevertheless, use of results from analyses of full scales to replace subscales endangers theoretical constructs because the analyses involve different clus- ters of items as well as different implicit hypotheses. FA issues. In general, Phinney and her associates have con- ducted increasingly more sophisticated FAs of her measures, al- though evaluators of her measure typically have favored PCA (Phinney, 1992; Roberts et al., 1999). On the face of it, FA and confirmatory FA should permit the test of theoretical models associated with the relevant items, if the theory and data fit the measurement model. As previously discussed, when alpha coeffi- cients are too large (e.g., close to 1.00) or data are ipsative, then FA of entire scales either will not be possible or will yield specious results because of item redundancy. Warnings to the effect that covariance matrices are “nonpositive definite” or “ill- conditioned” are indicative of interdependencies among items, but such warn-
  • 48. ings are not necessarily indicative of flawed measures. Instead they might signal the need to shift one’s psychometric focus from traditional ICR studies to alternatives, such as, for example, anal- yses to identify clusters or profiles of people rather than items (Johnson, Wood, & Blinkhorn, 1988). Common issues. For PCA and FA, researchers typically do not report their analyses well enough to permit replication. Either they do not report methods for deciding the number of components or factors that were extracted or rotation methods, or they use out- moded methods, or they test models that are incongruent with theory. A few remedies for these poor practices not discussed elsewhere are as follows: 1. As appropriate, researchers should report all of the pattern/ structure coefficients for their PCA, FA, or structural equation modeling (SEM) regardless of whether the coefficients conform to a preferred cut score (Lance, Butts, & Michels, 2006). They should also report eigenvalues, numbers of items analyzed, commonali- ties, and any other structural properties of the analysis that would permit other investigators to verify or better understand their findings. 2. Researchers should specify the assumptions of the measure- ment model that they tested (e.g., homogeneity of variances). For SEM researchers, a good practice is to indicate what parameters were constrained so that other researchers can determine whether the measurement model fits their interpretation of the relevant
  • 49. REI theory. 3. Users of PCA and FA should report what procedure was used to decide the number of factors or components to extract and should use parallel analysis rather than Kaiser’s criterion of eig- envalues greater than 1.00 (Hayton, Allen, & Scarpello, 2004). 4. Most REI theories do not propose orthogonal or independent constructs; therefore, models and rotation methodologies that as- sume independence should not be used if they are inconsistent with theories. 5. If ipsative data are analyzed, they will necessarily yield bipolar factors or components equal to one fewer than the number 244 HELMS of REI subscales, and the resulting “new” PCA/FA-derived sub- scales will be ipsative or partially ipsative, too. Conclusions Researchers have not differentiated scale development from theory testing research. The former is necessary if no tool exists for assessing relevant theoretical constructs or if the researcher wants to develop alternative measures to evaluate constructs, but this typically has not been the case where REI scales are concerned.
  • 50. Existent REI scales may need to be revised to better represent their underlying hypothetical constructs, but this cannot happen if (a) each researcher changes the content of already developed REI scales to reflect the responses of each new sample, (b) the mea- surement models and samples used to inform the revisions do not fit the model implied by the REI theory being tested, and (c) revisions rely exclusively on evaluations of the internal structure of the REI measures, no matter how well the research is conducted. REI scale development began as a quest to replace one-item measures (i.e., racial categories) with more complex tools for assessing individual differences in internalized racial-group social- ization (Jackson & Kirschner, 1973). However, contemporary re- searchers have commonly engaged in a variety of research design and psychometric practices that threaten to return the measures and their associated theories back to their more simplistic atheoretical roots, thereby limiting their usefulness for understanding the ef- fects of racial and cultural socialization on people’s mental health. In an effort to discourage the routine practices of using research design and ICR psychometric analyses to reduce all REI scales to measures of unidimensional constructs whether or not such reduc- tion is theory consistent, the focus of this article has been on introducing counseling psychologists to some better practices
  • 51. for matching REI theoretical constructs and measures to the theories’ implicit measurement models. Doing so should not be construed as an endorsement of the practice of giving preeminence to studies of the internal structure of REI measures rather than to other types of validity studies, even if methodologies that permit development of heterogeneous REI scales are used. Ultimately it does not matter how well REI scale items relate to each other if they do not help explain complex behaviors beyond themselves (AERA et al., 1999). [The] complexity of psychosocial behavior may require tests to be heterogeneous, perhaps irreducibly so, to maintain their reliability, validity, and predictive utility. . .If a theory claims that an entity has multiple attributes, then the test measuring that entity should measure all relevant attributes. Therefore, tests must be heterogeneous. The meaningfulness of a test lies not in a methodological prescription of homogeneity but in the test’s ability to capture all relevant attributes of the entity it purports to measure. (Lucke, 2005, p. 66) What psychosocial behaviors can be more complex than racial identity and ethnic identity in the United States?
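As a computational companion to the reliability-generalization discussion earlier in the article, the F-based confidence interval and alpha-comparison procedures recommended by Fan and Thompson (2001) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the sample size (N = 145) and item count (k = 10) are inferred from the reported degrees of freedom, F(144, 1296).

```python
# Illustrative reconstruction of the Feldt-type F-based procedures that Fan and
# Thompson (2001) recommend, applied to Utsey and Gernat's Autonomy alpha (.28).
# N = 145 and k = 10 are inferred from the article's reported F(144, 1296).
from scipy.stats import f

alpha_obs, n, k = 0.28, 145, 10
df1, df2 = n - 1, (n - 1) * (k - 1)

# 95% confidence interval for the population alpha
lower = 1 - (1 - alpha_obs) * f.ppf(0.975, df1, df2)
upper = 1 - (1 - alpha_obs) * f.ppf(0.025, df1, df2)

# Comparison of the obtained alpha with Behrens's (1997) population estimate
# (.61), with the smaller alpha in the numerator of the F ratio
f_stat = (1 - alpha_obs) / (1 - 0.61)
p_value = f.sf(f_stat, df1, df2)

print(f"95% CI for alpha = .28: [{lower:.2f}, {upper:.2f}]")  # about [.09, .44]
print(f"F({df1}, {df2}) = {f_stat:.2f}")                      # about 1.85
```

The interval and test statistic agree with the values reported in the article, which is the point of the exercise: the computations require only the obtained alpha, the sample size, and the number of items.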
  • 52. References American Educational Research Association (AERA), American Psycho- logical Association and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washing- ton, DC: AERA. Bacon, D. R., Sauer, P. L., & Young, M. (1995). Composite reliability in structural equations modeling. Educational and Psychological Measure- ment, 55, 394 – 406. Baron, H. (1996). Strengths and limitations of ipsative measurement. Journal of Occupational and Organizational Psychology, 69, 49 –56. Behrens, J. T. (1997). Does the White Racial Identity Scale measure racial identity? Journal of Counseling Psychology, 44, 3–12. Behrens, J. T., & Rowe, W. (1997). Measuring White racial identity: A reply to Helms (1997). Journal of Counseling Psychology, 44, 17–19. Betancourt, H., & López, S. R. (1993). The study of culture, ethnicity, and race in American psychology. American Psychologist, 48, 629 – 637.
  • 53. Branch, W. (1990). On interpreting correlation coefficients. American Psychologist, 45, 296. Carter, M. M., Sbrocco, T., Lewis, E. L., & Friedman, E. K. (2001). Parental bonding and anxiety: Differences between African American and European American college students. Anxiety Disorders, 15, 555– 569. Choney, S. K., & Rowe, W. (1994). Assessing White racial identity: The White Racial Consciousness Development Scale (WRCDS). Journal of Counseling & Development, 73, 102–104. Claney, C., & Parker, W. M. (1989). Assessing White racial consciousness and perceived comfort with Black individuals: A preliminary study. Journal of Counseling & Development, 67, 449 – 451. Cokley, K. O. (2005). Racial(ized) identity, ethnic identity, and Africentric values: Conceptual and methodological challenges in understanding African American identity. Journal of Counseling Psychology, 52, 517– 526. Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 96 –104.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481–489.
Erikson, E. H. (1968). Identity: Youth and crisis. New York: Norton.
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517–531.
Feldt, L. S., & Charter, R. A. (2003). Estimating the reliability of a test split into two parts of equal or unequal length. Psychological Methods, 8, 102–109.
Ferketich, S. (1990). Focus on psychometrics: Internal consistency estimates of reliability. Research in Nursing & Health, 13, 437–440.
Fischer, A. R., & Moradi, B. (2001). Racial and ethnic identity: Recent developments and needed directions. In J. G. Ponterotto, J. M. Casas, L. A. Suzuki, & C. M. Alexander (Eds.), Handbook of multicultural counseling (2nd ed., pp. 341–370). Thousand Oaks, CA: Sage.
Goodstein, R., & Ponterotto, J. G. (1997). Racial and ethnic identity: Their relationship and their contribution to self-esteem. Journal of Black Psychology, 23, 275–292.
Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139–164.
Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205.
Helms, J. E. (1984). Toward a theoretical explanation of the effects of race on counseling: A Black and White model. The Counseling Psychologist, 12, 153–165.
Helms, J. E. (Ed.). (1990). Black and White racial identity: Theory, research, and practice. Westport, CT: Greenwood Press.
Helms, J. E. (1999). Another meta-analysis of the White Racial Identity Attitudes Scale's alphas: Implications for validity. Measurement and Evaluation in Counseling and Development, 32, 122–137.
Helms, J. E. (2005). Challenging some misuses of reliability coefficients as reflected in evaluations of the White Racial Identity Attitude Scale (WRIAS). In R. T. Carter (Ed.), Handbook of racial–cultural psychology and counseling: Theory and research (Vol. 1, pp. 360–390). New York: Wiley.
Helms, J. E., & Carter, R. T. (1990). Development of the White Racial Identity Inventory. In J. E. Helms (Ed.), Black and White racial identity: Theory, research, and practice (pp. 67–80). Westport, CT: Greenwood Press.
Helms, J. E., Henze, K., Sass, T., & Mifsud, V. (2006). Treating Cronbach's alpha reliability coefficients as data in counseling research. The Counseling Psychologist, 34, 630–660.
Helms, J. E., Jernigan, M., & Mascher, J. (2005). The meaning of race in psychology and how to change it. American Psychologist, 60, 27–36.
Jackson, G. G., & Kirschner, S. A. (1973). Racial self-designation and preference for a counselor. Journal of Counseling Psychology, 20, 560–564.
Johnson, C. E., Wood, R., & Blinkhorn, S. F. (1988). Spuriouser and spuriouser: The use of ipsative personality tests. Journal of Occupational Psychology, 61, 153–162.
Johnson, S. C. (2004). The relation of racial identity, ethnic identity, and perceived racial discrimination among African Americans. Unpublished doctoral dissertation, University of Houston, Texas.
Kelly, S. (2004). Underlying components of scores assessing African Americans' racial perspectives. Measurement and Evaluation in Counseling and Development, 37, 28–40.
Kim, J., & Mueller, C. W. (1978). Factor analysis: Statistical methods and practical issues. Newbury Park, CA: Sage.
Komaroff, E. (1997). Effect of simultaneous violations of essential τ-equivalence and uncorrelated error on coefficient α. Applied Psychological Measurement, 21, 337–348.
Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9, 202–220.
Lee, G., Dunbar, S. B., & Frisbie, D. A. (2001). The relative appropriateness of eight measurement models for analyzing scores from tests composed of testlets. Educational and Psychological Measurement, 61, 958–975.
Lucke, J. F. (2005). The α and the ω of congeneric test theory: An extension of reliability and internal consistency to heterogeneous tests. Applied Psychological Measurement, 29, 65–81.
Mercer, S. H., & Cunningham, M. (2003). Racial identity in White American college students: Issues of conceptualization and measurement. Journal of College Student Development, 44, 217–230.
Onwuegbuzie, A. J., & Daniel, L. G. (1999, November). Uses and misuses of the correlation coefficient. Paper presented at the annual meeting of the Mid-South Educational Research Association, Point Clear, AL.
Owens, W. A. (1947). An empirical study of the relationship between item validity and internal consistency. Educational and Psychological Measurement, 7, 281–288.
Parham, T. A., & Helms, J. E. (1981). The influence of Black students' racial identity attitudes on preferences for counselor's race. Journal of Counseling Psychology, 28, 250–257.
Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381–391.
Phan, T., & Tylka, T. L. (2006). Exploring a model and moderators of disordered eating with Asian American college women. Journal of Counseling Psychology, 53, 36–47.
Phelps, R. E., Taylor, J. D., & Gerard, P. A. (2001). Cultural mistrust, ethnic identity, racial identity, and self-esteem among ethnically diverse Black university students. Journal of Counseling & Development, 79, 209–216.
Phinney, J. S. (1990). Ethnic identity in adolescence and adulthood: A review of research. Psychological Bulletin, 108, 499–514.
Phinney, J. S. (1992). The Multigroup Ethnic Identity Measure: A new scale for use with diverse groups. Journal of Adolescent Research, 7, 156–176.
Phinney, J. S., & Alipuria, L. L. (1990). Ethnic identity in college students from four ethnic groups. Journal of Adolescence, 13, 171–183.
Raykov, T. (1997). Scale reliability, Cronbach's coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329–353.
Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22, 375–385.
Reese, L. E., Vera, E. M., & Paikoff, R. L. (1998). Ethnic identity assessment among inner-city African American children: Evaluating the applicability of the Multigroup Ethnic Identity Measure. Journal of Black Psychology, 24, 289–304.
Roberts, R. E., Phinney, J. S., Masse, L. C., Chen, Y. R., Roberts, C. R., & Romero, A. (1999). The structure of ethnic identity of young adolescents from diverse ethnocultural groups. Journal of Early Adolescence, 19, 301–322.
Rogers, W. M., Schmitt, N., & Mullins, M. E. (2002). Correction for unreliability of multifactor measures: Comparison of alpha and parallel forms approaches. Organizational Research Methods, 5, 184–199.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–223.
Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27, 183–198.
Thompson, B. (1994). Guidelines for authors reporting score reliability estimates. Educational and Psychological Measurement, 54, 837–847.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174–195.
Utsey, S. O., & Gernat, C. A. (2002). White racial identity attitudes and the ego defense mechanisms used by counselor trainees in racially provocative counseling situations. Journal of Counseling & Development, 80, 475–483.
Utsey, S. O., & Ponterotto, J. (1996). Development and validation of the Index of Race-Related Stress (IRRS). Journal of Counseling Psychology, 43, 490–501.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in
measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6–20.
Vacha-Haase, T., Kogan, L. R., & Thompson, B. (2000). Sample compositions and variability in published studies versus those in test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60, 509–522.
Yancey, A. K., Aneshensel, C. S., & Driscoll, A. K. (2001). The assessment of ethnic identity in a diverse urban youth population. Journal of Black Psychology, 27, 190–208.

Received August 30, 2006
Revision received January 9, 2007
Accepted January 14, 2007