Some Better Practices for Measuring Racial and Ethnic Identity Constructs
Janet E. Helms
Boston College
Racial and ethnic identity (REI) measures are in danger of becoming conceptually meaningless because of evaluators' insistence that they conform to measurement models intended to assess unidimensional constructs, rather than the multidimensional constructs necessary to capture the complexity of internalized racial or cultural socialization. Some aspects of the intersection of REI theoretical constructs with research design and psychometric practices are discussed, and recommendations for more informed use of each are provided. A table that summarizes some psychometric techniques for analyzing multidimensional measures is provided.
Keywords: racial identity, ethnic identity, reliability, validity, factor analysis
In counseling psychology, the measurement of racial identity constructs is a relatively new phenomenon. Arguably, the practice began when Jackson and Kirschner (1973) attempted to introduce complexity into the measurement of Black students' racial identity by using a single categorical item with multiple options (e.g., "Black," "Negro") that the students could use to describe themselves. Helms and Parham (used in Parham & Helms, 1981) and Helms and Carter (1990) built on the idea that assessment of individual differences in racial identity is important, and they added complexity to the measurement process by (a) developing measures that were based on racial identity theoretical frameworks, (b) using multiple items to assess the constructs inherent to the theories, and (c) asking participants to use continua (i.e., 5-point Likert scales) rather than categories to self-describe. These principles underlie the Black Racial Identity Attitudes Scale (BRIAS; formerly RIAS–B) and White Racial Identity Attitudes Scale (WRIAS).
In response to perceived conceptual, methodological, or content concerns with Helms and associates' racial identity measures, many rebuttal measures followed. Rebuttal measures are scales that the new scale originator(s) specifically described as corrections for one or more such deficiencies in preexisting identity measures (e.g., Phinney, 1992, p. 157). Subsequent measures have tended to rely on the previously listed basic measurement principles introduced by Parham and Helms (1981), although the theoretical rationales for the measures have varied. Phinney's Multigroup Ethnic Identity Measure (MEIM), the most frequently used of the rebuttal measures to date, added the principle of measuring "ethnic" rather than "racial identity," which she seemingly viewed as interchangeable constructs. The MEIM also introduced the principle of measuring the same identity constructs across racial or ethnic groups rather than group-specific constructs within them.
The BRIAS and WRIAS may be thought of as representative of a class of identity measures in which opposing stages, statuses, or schemas are assessed, whereas the MEIM may be conceptualized as representative of a class of measures in which different behaviors or attitudes are used to assess levels of commitment to a single group (i.e., one's own). Consequently, these measures are used as exemplars of their classes in subsequent discussions. The two classes of measures imply some similar as well as some different desirable practices with respect to research design, measurement or psychometrics, and interpretation that have not been addressed in the racial or ethnic identity literature heretofore. In fact, virtually no literature exists that focuses specifically on good practices for using or evaluating already developed theory-based racial or ethnic identity (REI) measures.
It is important to describe better practices for using already developed REI scales to avoid oversimplifying essentially complex measurement issues that are often inherent in REI theoretical constructs. The primary sources of my belief that a discussion of better practices is necessary are my experiences reviewing manuscripts, submitting manuscripts, advising researchers, and being fully engaged in REI research. Therefore, the purposes of this article are to make explicit better practices for designing research and conducting psychometric analyses when using REI measures to study identity constructs with new samples. I sometimes use published studies to illustrate a practice or procedure; in most instances, the studies were selected because their authors reported results in enough detail to permit the studies' use for illustrative purposes. More generally, the article is divided into two broad sections, research design practices and psychometric practices. The first section addresses conceptual issues pertinent to research design; the psychometric section addresses scale development concerns.
Research Design Practices
The content of REI scales is intended to reflect standard samples of particular types of life experiences (racial vs. ethnic) as postulated by the relevant theory. A central empirical question with respect to researchers' use of REI scales is whether racial identity and ethnic identity scales measure the same constructs. However, the question cannot be adequately addressed if researchers do not use research design practices that are congruent with the theoretical model(s) underlying each scale(s) under study. In this section, I (a) discuss some conceptual issues related to measuring racial identity and ethnic identity as potentially different constructs, (b) discuss some poor practices that obscure differences if they exist, and (c) proffer some better practices.

Correspondence concerning this article should be addressed to Janet E. Helms, Department of Counseling, Developmental, and Educational Psychology, Boston College, 317 Campion Hall, Chestnut Hill, MA 02467. E-mail: [email protected]

Journal of Counseling Psychology, 2007, Vol. 54, No. 3, 235–246. Copyright 2007 by the American Psychological Association. 0022-0167/07/$12.00. DOI: 10.1037/0022-0167.54.3.235
Differentiating Racial Identity From Ethnic Identity
In REI research designs, if the researcher's intent is to substitute one class of REI measures for the other, then it is important to demonstrate that the two types of measures assess the same racial or ethnic constructs. Factors to consider are (a) conceptualization of the research question, (b) sample selection, (c) use of other measures for assessing one type of identity rather than the other, and (d) comparability of validity evidence within and across REI measures.
Racial Identity Scales as Replacements for Racial Categories
Racial groups or categories are not psychological constructs because they do not connote any explicit behaviors, traits, or biological or environmental conditions (Helms, Jernigan, & Mascher, 2005). Instead, racial categories are sociopolitical constructions that society uses to aggregate people on the basis of ostensible biological characteristics (Helms, 1990). Because racial categories are null constructs, Helms et al. (2005) contended that they should not be used as the conceptual focus (e.g., independent variables) for empirical studies but may be used to describe or define samples or issues. Ascribed racial-group membership implies different group-level racial socialization experiences that vary according to whether the group is accorded advantaged or disadvantaged status in society. The content of racial-identity scales is individual group members' internalization of the racial socialization (e.g., discrimination, undeserved privileges) that pertains to their group.
Ascribed racial group defines the type of life experiences to which a person is exposed and that are available for internalizing (i.e., group oppression or privilege). For example, Black Americans internalize different racial identities than White Americans, and, conversely, White Americans internalize different racial identities than Black Americans. Also, the nature of the racial identities of Americans and immigrants or other nationals differs if they have not experienced similar racial socialization during their lifetimes. Thus, racial identity theories are intended to describe group-specific development in particular sociopolitical contexts.
Racial identity measures are designed to assess the differential impact of racial dynamics on individuals' psychological development. One expects items in racial identity scales or inventories to include some mention of race, racial groups, or conditions that commonly would be construed as racial in nature (e.g., discrimination or advantage on the basis of skin color). For example, Helms and Carter's (1990) WRIAS consists of five 10-item scales, each of which assesses the differential internalization of societal anti-Black racism on Whites' identity development. Relevant sampling and measurement concerns are specifying samples and measures for which race and racism in various forms are presumably relevant constructs.
Ethnic Groups as Proxies for Theoretical Constructs
Ethnicity refers to the cultural practices (e.g., customs, language, values) of a group of people, but the group need not be the same ascribed racial group. Betancourt and López (1993) use the term ethnic group to connote membership in a self-identified kinship group, defined by specific cultural values, language, and traditions, and that engages in transmission of the group's culture to its members. Ethnic identity refers to commitment to a cultural group and engagement in its cultural practices (e.g., culture, religion), irrespective of racial ascriptions. Because ethnic groups imply psychological culture-defined constructs, the constructs rather than the categories should be used as conceptual focuses of studies (e.g., independent variables).
The content domain of ethnic identity measures is internalized experiences of ethnic cultural socialization. Phinney and associates (Phinney, 1992; Phinney & Alipuria, 1990) initially developed the MEIM to assess adolescents' search for and commitment to an ethnic identity in a manner consistent with Erikson's (1968) multistage psychosocial identity theory and without regard to group-specific cultural components. Originally, she conceptualized ethnic identity as "a continuous variable [construct or scale], ranging from the lack of exploration and commitment . . . to evidence of both exploration and commitment, reflected in efforts to learn more about one's background" (Phinney, 1992, p. 161). Her continuous scale was composed of items assessing several dimensions of identity (e.g., ethnic behaviors, affirmation, and belonging); hence, it was a multidimensional scale (Helms, Henze, Sass, & Mifsud, 2006), with a focus on cultural characteristics that are assumed to be relevant to individuals across ethnic groups. Although the structure of the MEIM has varied, its underlying conceptual theme is conformance to ethnic culture rather than exposure to racism. The conceptual, sampling, and measurement issues specific to ethnic identity measures pertain to identifying participants who might be reasonably expected to engage in the cultural practices of the ethnic cultural kinship group in question and ensuring that ethnic identity measures assess relevant culture-related rather than race-inspired psychological construct(s).
Selection and Use of Appropriate REI Measures
Researchers often use one type of REI measure (e.g., ethnic identity) but provide a conceptual rationale for the other type (e.g., racial identity) without empirical justification for doing so. Empirical support for the interchangeability of identity constructs and measures would include evidence that (a) exemplars of the two classes of measures are similarly related to the same external racial or cultural criteria or (b) one type of measure explains its own as well as the other theory's external criteria best. Support for the distinctiveness of constructs would be lack of support for interchangeability and evidence that other identity measures from the same class relate to each other in a logically consistent manner.
Empirical Comparisons of the MEIM and BRIAS as Measures of REI Constructs
Researchers do not seem to consider whether their cultural or racial outcome measures are theoretically congruent with the type of REI measure that they have selected. Consequently, lack of support for their hypotheses is attributed to deficient REI measures rather than possible incongruence between the researchers' conceptualization and measurement of REI constructs in their research designs. It is difficult to find a single study in which both classes of REI measures and racial and cultural outcome measures were used. Yet for the purpose of illustrating the type of study necessary to support interchangeable use of REI measures, perhaps it is reasonable to think that scores on racial identity measures, such as the BRIAS, should be related to scores on explicit measures of racial constructs (e.g., perceived individual racism, institutional racism), whereas scores on ethnic identity measures, such as the MEIM, should be related to scores on explicit measures of cultural constructs (e.g., acculturation, cultural values). Confirmation of each of these propositions would be evidence of construct explication in that each measure would be assessing constructs germane to it.
Johnson's (2004) study provides sufficient psychometric summary statistics to permit illustration of the test for interchangeability of REI measures at least in part. His sample of Black college students (N = 167) responded to the MEIM and the RIAS–B (Parham & Helms, 1981), the earliest version of the BRIAS. Table 1 summarizes alpha coefficients (rxx; in the last column) for the REI measures, correlation coefficients between REI scores and perceived discrimination scores, and the same correlation coefficients corrected for disattenuation due to measurement error (i.e., low alpha coefficients) attributable to MEIM or RIAS–B scores. In this example, the correction for attenuation may be interpreted as an estimate of the extent to which an REI and a discrimination subscale measure the same underlying theoretical construct when the effects of REI error are eliminated (Schmidt & Hunter, 1996, 1999).
The dependent measures in the study, assessed by the Index of Race-Related Stress (Utsey & Ponterotto, 1996), were three types of racism: (a) cultural (belief in the superiority of one's own culture, e.g., values, traditions); (b) institutional (discrimination in social policies and practices); and (c) individual (personally experienced acts of discrimination). The dependent constructs favor the RIAS–B rather than the MEIM, as is evident in the table. If alpha is the correct reliability estimate and the correlations were calculated on untransformed scores, then the correlations, corrected for attenuation attributable to REI measurement error, suggest even stronger relations between the racial-identity constructs and perceived discrimination than the ethnic-identity constructs. The best correlation for MEIM scores is with cultural racism, which is theory consistent and suggests that a more full-blown cultural measure might have favored it.
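The one-sided correction used for Table 1 can be sketched in a few lines of Python. The helper function below is illustrative only (it is not part of Johnson's study); the sample values plugged in are the Encounter subscale's alpha (.51) and its obtained correlation with perceived cultural racism (.36) from Table 1.

```python
from math import sqrt

def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimate the disattenuated correlation r'xy = r_xy / (r_xx * r_yy)**.5.

    r_xy : obtained correlation between the two measures
    r_xx : reliability (e.g., Cronbach's alpha) of the first measure
    r_yy : reliability of the second measure; the default of 1.0
           corrects for error in the first measure only, as in Table 1
    """
    return r_xy / sqrt(r_xx * r_yy)

# Encounter subscale (alpha = .51) vs. perceived cultural racism (r = .36)
print(round(correct_for_attenuation(0.36, 0.51), 2))  # -> 0.5 (the .50 in Table 1)
```

The same call reproduces the other corrected entries in Table 1, e.g., `correct_for_attenuation(0.16, 0.87)` returns the MEIM's corrected cultural-racism correlation of .17.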
Alternative Measures of the Same Construct(s)
The question of whether scores on different measures of the same theoretical constructs are related to scores on the original measures is a matter of seeking evidence of convergent validity, a type of construct validity. Researchers seemingly have not developed alternative measures of the same constructs postulated in Phinney's (1990, 1992) theoretical perspective, and only one set of researchers (Claney & Parker, 1989) has developed independent measures of the theoretical constructs of Helms's (1984, 1990) White racial identity model. Choney and Rowe (1994) conducted an evaluation of Claney and Parker's (1989) 15-item White Racial Consciousness Development Scale (WRCDS), a measure of Helms's (1984) stages of White racial identity development that predated her own measure, the WRIAS. It is worth examining the study for what it can reveal about good practices in empirical investigation of construct validity of scores on REI measures.
The expressed purpose of Choney and Rowe's (1994) study was to investigate "how the WRCDS compares with the RIAS–W [sic], the current instrument of choice for investigations of White racial identity" (p. 102). Yet their conclusion that "it seems reasonable to conclude that the WRCDS is not capable of adequately assessing the stages of White identity proposed by Helms (1984)" (Choney & Rowe, 1994, p. 104) suggests that an unspoken purpose may have been to examine scores on the two scales for evidence of convergent validity.
Table 1
Comparing Obtained and Corrected Correlations Between MEIM and RIAS-B Scores and Perceived Racism

                           Cultural             Institutional           Individual
Scale                Obtained  Corrected   Obtained  Corrected   Obtained  Corrected   Alpha
MEIM                   .16        .17        -.01       -.01       .08        .09       .87
Racial
  Preencounter        -.01       -.01        .39        .47        .07        .08       .69
  Encounter            .36        .50        .25        .35        .28        .39       .51
  Immersion            .33        .41        .29        .36        .22        .27       .65
  Internalization      .37        .43        .13        .15        .35        .47       .75

Note. From "The Relation of Racial Identity, Ethnic Identity, and Perceived Racial Discrimination Among African Americans," Tables 1 (p. 18) and 6 (p. 33), by S. C. Johnson, 2004, unpublished doctoral dissertation, University of Houston, Texas. "Obtained" correlations are those reported by Johnson. "Corrected" correlations are estimates with measurement error removed. Only measurement error for the MEIM or racial identity subscales (i.e., alpha) was used to correct for disattenuation of correlations attributable to measurement error. The correction for attenuation used was r'xy = rxy/(rxx)^.5, where r'xy equals the corrected correlation, rxy equals the obtained correlation, and (rxx)^.5 equals the square root of the reliability coefficient for the relevant MEIM or RIAS-B subscale. MEIM = Multigroup Ethnic Identity Measure; RIAS-B = Black Racial Identity Attitudes Scale.

In Table 2, Cronbach's alpha reliability coefficients for the WRCDS and WRIAS scales are summarized in the last two columns, and correlations between parallel subscales, adapted from Choney and Rowe (1994, p. 103), are shown in the second column. The authors did not report a full correlation matrix for the two measures, and so it is not possible to examine within-measure patterns of correlations. Table 2 also includes correlations corrected for measurement error in each of the REI measures using each scale's reported Cronbach's alpha coefficient.
The corrected correlations for these data suggest that the two
measures assessed the parallel constructs of Reintegration,
Auton-
omy, and Pseudo-Independence quite strongly and the
15. constructs
of Contact and Disintegration much more strongly than mere
examination of the obtained correlations would suggest, thereby
refuting Choney and Rowe’s assertion that the WRCDS was
“incapable” of assessing Helms’s constructs. However, the fact
that the corrected Reintegration correlation coefficient exceeded
1.00 suggests either that Choney and Rowe’s original
correlations
were downwardly biased by sampling error or that alpha coeffi-
cients were not the appropriate reliability estimates for their
data.
In such circumstances, sampling error concerns can be avoided
to
some extent by using the better conceptual, sampling, and inter-
pretative practices discussed subsequently. Procedures for
judging
the appropriateness of Cronbach’s alpha as one’s reliability
esti-
mator are addressed in the later section on better psychometric
practices.
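The double-sided correction behind Table 2, r' = r/(alpha_WRIAS × alpha_WRCDS)^.5, can be replicated with a short script. The alphas and obtained correlations below are taken from the table; small rounding differences from the published values are possible, and the variable names are mine.

```python
from math import sqrt

# (observed r, WRIAS alpha, WRCDS alpha) for each pair of parallel subscales,
# from Choney and Rowe (1994, p. 103) as summarized in Table 2
subscales = {
    "Contact":             (0.11, 0.54, 0.13),
    "Disintegration":      (0.17, 0.77, 0.43),
    "Reintegration":       (0.53, 0.80, 0.25),
    "Pseudo-Independence": (0.29, 0.69, 0.32),
    "Autonomy":            (0.55, 0.67, 0.55),
}

for name, (r, a_wrias, a_wrcds) in subscales.items():
    corrected = r / sqrt(a_wrias * a_wrcds)  # remove error from BOTH measures
    note = "  <- exceeds 1.00" if corrected > 1.0 else ""
    print(f"{name:20s} {corrected:5.2f}{note}")
```

Running it flags Reintegration as exceeding 1.00, which is the interpretive problem discussed above: either sampling error biased the obtained correlation or alpha was not the appropriate reliability estimate.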
Poor and Better Construct Defining Practices
Research design practices are considered poor if they do not permit data obtained from REI measures to be interpreted in a manner consistent with the relevant REI theory. Practices are better if they make it possible to subject theory-congruent hypotheses to empirical testing.
Conceptual Practices
Researchers often assume that because the MEIM is intended to be a multi-ethnic group measure, it is appropriate to collapse data across racial and ethnic groups without examining whether the responses of the subgroups on the MEIM items and scales are similar (e.g., Phan & Tylka, 2006). Analogously, researchers aggregate data across ethnic groups within ascribed racial categories when using the BRIAS or WRIAS without investigating the types of racial socialization to which they have been exposed. Alternatively, researchers find that ethnic groups differ but still report aggregated descriptive statistics (Phelps, Taylor, & Gerard, 2001). A conceptual problem associated with a priori aggregation is that researchers presume rather than demonstrate that potentially diverse racial and ethnic categories (e.g., African Americans and other Black ethnic groups) share the same cultural or racial socialization experiences. A methodological consequence is the potential loss of statistical power for subsequent analyses, if the groups' responses are actually different.
Phinney's (1992) studies show that the term ethnicity may have different meaning to different populations. More specifically, she found ethnic group differences in responses to the MEIM such that White participants had lower scores than groups of color. On the basis of their responses to an open-ended item, Phinney (1992) observed that "few White subjects in either sample identified themselves as belonging to a distinct ethnic group. . . . The numbers of Whites who considered themselves as ethnic group members was too small to permit a separate analysis" (p. 174). The implications of this observation are rarely heeded.
A better research design practice is that users of racial identity measures should provide a "racial" conceptual rationale, focused on racial socialization; users of ethnic identity measures should provide an "ethnic cultural" rationale, focused on cultural socialization; and researchers interested in both types of identity should provide both types of rationales. Matching samples to the appropriate type of REI measure should enhance investigations of validity.
Also, it follows from the foregoing analyses that correction for attenuation attributable to measurement error is a good practice when the results of the researcher's study are intended to have far-reaching implications for REI theoretical constructs or to lead to substantive advice about the theoretical constructs assessed by the measures (Schmidt & Hunter, 1999). Correlations were the focus of the corrections in the examples, but virtually any statistic can be corrected for measurement error if it conforms to the assumptions of the general linear model (GLM).
Sampling Practices
Because scale respondents' attributes interact with their responses to scales generally and REI scales specifically (Dawis, 1987; Helms, 2005; Vacha-Haase, Kogan, & Thompson, 2000), researchers minimally should provide both a conceptual rationale for why the particular REI measure is appropriate for the research participants that were studied as well as empirical support derived from previous studies. However, researchers typically do not describe any inclusion criteria for defining their participants as members of a racial or ethnic group as such designations are used in the United States. At best, they indicate that the racial/ethnic categories were "self-identified" without explaining how such identification occurred (e.g., Mercer & Cunningham, 2003, p. 221). At worst, they either do not describe the racial or ethnic composition of their sample in the Participants section at all (e.g., Reese, Vera, & Paikoff, 1998) or they assign participants to a racial or ethnic group without any indication of how the assignment was determined (e.g., Goodstein & Ponterotto, 1997).
Better practices are that researchers should describe their procedures for recruiting research participants, collecting racial or ethnic data, quantifying the data, and assigning participants to racial or ethnic categories. These aspects of researchers' research design should be described as thoroughly as the researchers describe the other measures or manipulations and analyses in their studies. For example, if respondents were asked to describe themselves, were they provided with checklists or open-ended items? How were responses coded? If different racial or ethnic groups were included, the researcher should provide descriptive information (e.g., means, standard deviations, and reliability coefficients) as evidence that aggregating the various groups was appropriate.

Table 2
An Example of a Convergent Validity Study of White Racial Identity Constructs as Assessed by the White Racial Identity Attitudes Scale (WRIAS) and the White Racial Consciousness Development Scale (WRCDS)

                                                    Alpha
Scale                 Correlation  Corrected   WRIAS   WRCDS
Contact                   .11        0.42       .54     .13
Disintegration            .17        0.30       .77     .43
Reintegration             .53        1.19       .80     .25
Pseudo-Independence       .29        0.61       .69     .32
Autonomy                  .55        0.91       .67     .55

Note. Alphas are adapted from Choney and Rowe (1994, p. 103). Correlations are between parallel subscales. Computation of the correction for attenuation was as follows: Contact = .11/(.54 × .13)^.5 = .42; Disintegration = .17/(.77 × .43)^.5 = .30; Reintegration = .53/(.80 × .25)^.5 = 1.19; Pseudo-Independence = .29/(.69 × .32)^.5 = .61; Autonomy = .55/(.67 × .55)^.5 = .91.
Careful attention to the racial and cultural aspects of research designs will provide better contexts for conducting the psychometric studies of REI measures that have so intrigued researchers and reviewers since the measures' inception.
Psychometric Practices
The original focus of racial identity scales (assessment tools for assisting counselors to better diagnose and remediate the varying psychological effects on individuals of internalized racial socialization; Jackson & Kirschner, 1973; Parham & Helms, 1981) has virtually disappeared from the REI literature. In fact, researchers have been severely chastised for attempting to use the measures for diagnostic or treatment purposes, and evidence of their usefulness has been discounted (Behrens, 1997; Behrens & Rowe, 1997; Fischer & Moradi, 2001). Instead, researchers have focused on evaluating the worthiness of REI scales by using reliability analyses, principal-components analyses, and factor analyses to examine the internal structure of the measures. However, poor practices associated with each of these methodologies threaten to reduce REI scales to simpler measures than are necessary to explain individuals' complex racial and cultural functioning.
Reliability Analyses
Researchers typically use Cronbach's (1951) alpha coefficients to estimate the reliability of a sample's responses to REI items comprising subscales or scales, even though Cronbach's alpha was not designed to assess the reliability of multidimensional measures (Hattie, 1985). Nevertheless, much of the threat to REI theoretical constructs from reliability analyses results as much from researchers' poor practices with respect to their use of Cronbach's alpha as from the likelihood that it is the wrong statistic most of the time (Helms, 2005).
To explain when Cronbach's alpha is the wrong statistic, it is necessary to provide a brief overview of the calculation and assumptions underlying use of Cronbach's alpha, because researchers seem to be unaware of them. Consequently, the researchers do not examine the fit of their data to the implied reliability measurement model or to the REI theoretical assumptions under investigation. The Cronbach's alpha coefficient is the focus of my REI reliability discussion because it is virtually the only reliability estimator that is ever used to evaluate REI measures in the REI measurement literature or other types of measures in the social and behavioral sciences literature more generally (Behrens, 1997; Cortina, 1993; Helms, 1999; Peterson, 1994).
Overview of Cronbach’s Alpha
This overview of Cronbach's alpha (henceforth, alpha) is not intended to be a technical treatise on the statistic. A number of primers for applied researchers are available to fulfill that goal. These include Cortina (1993), Helms et al. (2006), Thompson and Vacha-Haase (2000), and the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association, & National Council on Measurement in Education, 1999). Sometimes poor practices occur because researchers are not aware of the nature of the data necessary to use and interpret alpha; sometimes they occur because it is not the appropriate statistic for the researcher's intended use. Therefore, the purpose of this overview is to provide enough information to explain why the proposed "better" practices are better.
An alpha coefficient is a statistic that summarizes the degree of interrelatedness among a sample of participants’ responses to a set of items intended to measure a single construct. Alpha coefficients typically range from zero to 1.00, with values approaching 1.00 suggesting a high level of positive intercorrelation among items. The nature of the interitem relationships is captured in the formula for standardized alpha:

α = kr/[1 + (k − 1)r],   (1)
where k refers to the number of items in the subscale or scale and r is the average correlation between participants’ responses to all pairs of subscale items. It should be noted that this alternative alpha formula should not be used unless the researcher standardizes item responses before calculating total scores for subsequent analyses because it assumes homogeneity of item responses; however, it is useful for making some necessary points (Helms et al., 2006).
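Equation 1 is easy to verify numerically. The sketch below (Python; the function name is mine, not from the article) shows how standardized alpha rises with the number of items even when the average inter-item correlation is held constant:

```python
def standardized_alpha(k, mean_r):
    """Equation 1: standardized alpha for k items with average inter-item
    correlation mean_r (assumes standardized, homogeneous item responses)."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Holding the average inter-item correlation at .30, alpha grows with k alone.
five_items = standardized_alpha(5, 0.30)   # ~.68
ten_items = standardized_alpha(10, 0.30)   # ~.81
```

Because k enters the formula directly, a longer scale can reach a “respectable” alpha without any improvement in item relatedness, which is one reason total-scale alphas tend to exceed subscale alphas.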
As is true of all reliability coefficients, alpha is not an inherent or stable property of scales, regardless of its value (AERA et al., 1999; Thompson, 1994). Rather it is a value that describes one sample’s responses to a set of items under one set of circumstances. Therefore, appraisals of REI measures based solely on the magnitude of alpha coefficients, such as “Practitioners should withhold use of the WRIAS until there is clear evidence with regard to what it measures” (Behrens, 1997, p. 10), reflect multiple misunderstandings concerning how alpha coefficients should be used and interpreted.
With few exceptions (e.g., Owens, 1947), researchers’ reliability ideal is an alpha coefficient of 1.00 or as close as possible, which is the standard used to evaluate REI measures and non-REI measures alike. A coefficient of such magnitude would be correctly interpreted as indicating that 1.00 or 100% of the variance of the total subscale or scale scores is reliable or systematic variance for the sample under study. No conclusions could rightfully be made about “what” the scale scores measure (i.e., the nature of the systematic variance), which is a matter to be addressed by means of validity studies involving measurement or manipulations of relevant constructs external to the target measure. Generalizing the obtained level of reliability to other samples would not be proper because reliability coefficients likely differ from sample to sample. Moreover, obtaining alpha coefficients close to 1.00 is a Pyrrhic ideal generally because a unit value signifies that (a) certain kinds of statistical analyses cannot be conducted at the item level or will yield spurious results, and (b) the usefulness of scale scores for validity studies under such circumstances is quite limited. Owens’s reductio ad absurdum is that perfect or almost perfect internal consistency reliability (ICR) coefficients indicate that a sample’s
responses to each item are perfectly predictable from their responses to every other item as well as the total scale scores from which the alpha coefficient was derived. Thus, the item-level responses would be redundant with each other and results of statistical analyses requiring matrix inversion—such as factor analyses, structural equation modeling, and item analyses via multiple regression analyses—would be trivial. Some computer packages warn the user that the data matrix is “non-positive definite” when this type of redundancy occurs.
The relevant validity issue is that when the ICR coefficient is nearly perfect, it suggests that the items and subscale scores have some single construct (i.e., systematic variance) in common and, for validity evidence to be obtained, the same single construct must be salient in the external criteria. For example, the researcher would need to identify the same cultural construct in the MEIM and the external criteria; in the case of the WRIAS or BRIAS and external criteria, the researcher would have to identify the same racial construct in each, if alpha coefficients were perfect. Use of alpha to evaluate either class of REI measures presumes that item responses fit a unidimensional structure, because alpha is intended to evaluate the unidimensionality of item responses or scale scores, whether it does or not (Hattie, 1985). Thus, validity analyses might result in “too small” validity coefficients when alpha is too large.
In sum, it is not clear why alpha is so popular a statistic for evaluating measures generally given that, at its best, it describes and promotes development of very simple constructs. In some ways, the simple structure assumption is least problematic for REI measures such as the MEIM if their theoretical rationale proposes positively related constructs and their item responses are positively correlated. Positively correlated item responses and subscale scores yield high alpha coefficients, even if they restrict the other kinds of analyses that should be conducted.
Simplification of constructs is most problematic for evaluation of REI measures such as the WRIAS because their theories propose that persons endorsing some subscale items as self-descriptive will reject others. Depending on the samples’ racial socialization experiences, datasets may be defined by some negative and some positive correlations among item responses; some item responses may be more homogeneous as indicated by small standard deviations; and the sample’s level of endorsement (i.e., item means) might differ across items. Any of these conditions would contribute to low alpha coefficients because they are violations of basic alpha measurement assumptions, but each of the conditions may be consistent with some REI theory.
It is important to understand these basic aspects of alpha as a measurement model so that the researcher can make an informed decision about whether its use is consistent with the theoretical framework of the selected REI measure and the researcher’s reason for selecting it. Once data are collected, it is important to check the validity of assumptions associated with anticipated psychometric analyses, particularly if REI scale modification depends on the magnitude of obtained reliability coefficients.
Assumptions and Scale Modification
Because alpha is virtually the only reliability coefficient used to evaluate and modify REI measures, it is useful to discuss assumption checks and scale modification practices as they pertain to alpha. Yet the better practices are relevant to virtually any reliability coefficient.
Alpha Measurement Assumptions
The basic assumptions that should be examined to support use of alpha are (a) item responses are positively correlated, (b) item responses are homogeneous, (c) item means are essentially equal, and (d) the REI theory postulates unidimensional or homogeneous constructs. The first assumption can be investigated by examining the interitem correlation matrix. The presence of any negative correlations means that the resulting alpha coefficient will be an underestimate of item relatedness. If Assumption A is confirmed, Assumption B can be checked by using Feldt and Charter’s (2003) ratio for examining interitem homogeneity of variances (i.e., compare the largest item standard deviation to the smallest item standard deviation in the data set). If the result of the comparison is less than 1.3 (i.e., SDL/SDS < 1.3), alpha may be an appropriate estimate of ICR. If Assumption B is supported, then the smallest item mean and largest item mean should be compared via statistical tests (e.g., paired comparison t tests, within-subjects analysis of variance) to check Assumption C. The check for Assumption D is conceptual, but it may be rejected if the REI theory proposes clusters of items or people intended to measure more than one construct using a single scale or multiple subscales.
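The checks for Assumptions A and B can be automated. The following sketch (Python; the function names and toy data are hypothetical, not from the article) screens a set of item-response vectors for negative interitem correlations and applies Feldt and Charter’s (2003) largest-to-smallest standard deviation ratio:

```python
from statistics import mean, stdev

def pearson(x, y):
    """Plain Pearson correlation for two equal-length response vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def check_alpha_assumptions(items):
    """items: one response vector per item, same respondents throughout."""
    k = len(items)
    rs = [pearson(items[i], items[j]) for i in range(k) for j in range(i + 1, k)]
    sds = [stdev(item) for item in items]
    ratio = max(sds) / min(sds)  # Feldt-Charter SDL/SDS ratio
    return {
        "all_positive_r": all(r > 0 for r in rs),  # Assumption A
        "sd_ratio": ratio,
        "homogeneous": ratio < 1.3,                # Assumption B
    }

# Toy data: three 5-point items answered by five respondents.
items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 2, 4, 4, 5]]
report = check_alpha_assumptions(items)
```

If either check fails, alpha should be supplemented or replaced by one of the alternatives in Table 3 rather than reported by default.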
If any of the assumptions are not supported, then alternative procedures for estimating ICR are available. Some of these alternative procedures are summarized in Table 3, which cites resources and describes alternatives to use when specific measurement conditions exist. For example, if alpha is “too low,” Rogers, Schmitt, and Mullins (2002), cited in the last row, recommend exploratory factor analysis, followed by calculation of alphas for identified item subsets and composite alpha if the researcher intends to use the items as a single multidimensional scale. The procedures in Table 3 ought to work well for MEIM-like scales but not for WRIAS-like scales if they are ipsative.
Ipsative Measurement Assumptions
Conceptually, a measure is ipsative if individuals’ scores are their explicit or implicit self-rankings on a set of items or subscales such that some scores must be higher than some others. Ordinarily, ipsativity results from response formats (e.g., forced choice, rankings) or transformations (e.g., subtracting a person’s mean score from total scores). However, for some REI measures, ipsativity is induced by individual participants’ need to be logically consistent and, therefore, unwillingness to endorse contradictory items as self-descriptive. Consequently, when some subscale scores are high, some others are relatively lower (i.e., some subscales are inversely related within individuals and therefore negatively correlated between individuals).
Three properties that Helms and Carter’s (1990) WRIAS shares with ipsative measures are (a) half or more correlations in a matrix of correlations typically are negative, (b) samples’ mean correlation among subscales typically is negative and approaches zero, and (c) the average of the full-scale item responses is a constant. Of the 21 correlation matrices analyzed by Behrens (1997), for which 10 correlation coefficients were reported, 100% consisted of five or more negative correlations and the mean correlation,
weighted by sample size, was –.03. For the 50 WRIAS items, the average value within individuals and samples rounds to the scale midpoint (i.e., 3). Also, except for measurement error, such as missing data, item means for any four subscales (total number of subscales minus one) will also equal a rounded value of 3 in most samples, meaning that the contribution of the remaining items is not unique.
Experts in measurement have long debated how best to analyze ipsative datasets given that the data violate all of the assumptions of the GLM and CTT, particularly the assumption of random error among items (Baron, 1996; Johnson, Wood, & Blinkhorn, 1988). There is no consensual resolution to the debate, and most of the compromises do not pertain to REI scales because their ipsativity is theoretically induced rather than an artifact of the response format of scales. For now, the best compromises are first to explore Points a–c from the previous paragraph to determine whether data conform to an ipsative pattern. Second, if so, do not include all of the items in reliability or factor analyses. Conduct analyses with single subscales or subsets of subscales. Also, consider analyses that do not depend on correlation coefficients, such as cluster or profile analyses. Third, if data are not ipsative, multidimensional analyses in Table 3 might yield meaningful results.
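Points a and b can be screened mechanically. A minimal sketch (Python; the function, its cutoffs, and the example correlations are illustrative, not from the article) flags a set of pairwise subscale correlations as fitting an ipsative pattern when half or more are negative and the mean correlation is negative but near zero:

```python
def looks_ipsative(pairwise_rs, tol=0.15):
    """Points (a) and (b): mostly negative correlations whose mean is
    negative and close to zero (tol is an arbitrary nearness cutoff)."""
    neg_share = sum(r < 0 for r in pairwise_rs) / len(pairwise_rs)
    mean_r = sum(pairwise_rs) / len(pairwise_rs)
    return neg_share >= 0.5 and -tol < mean_r <= 0

# Ten pairwise correlations among five subscales: five negative, mean = -.03,
# comparable to the weighted mean of -.03 reported for Behrens's matrices.
rs = [-0.30, -0.25, -0.20, -0.10, -0.05, 0.10, 0.15, 0.20, 0.05, 0.10]
```

Data that pass this screen should then be analyzed subscale by subscale, as recommended above, rather than pooled into a full-scale reliability or factor analysis.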
REI Scale Modification
Many researchers modify or negatively evaluate REI measures when they obtain unsatisfactory alpha coefficients. Such practices typically occur without regard to any of the previously discussed research design or psychometric issues and thereby contribute to the development of nonstandard, atheoretical scales and perhaps the discarding of important information about samples.
Vacha-Haase (1998) developed reliability-generalization (RG) methodology to assess the effects of sample and research design characteristics on the magnitude of reliability coefficients across studies. RG studies allow the researcher to adjust for the fact that tests per se are not reliable by identifying the conditions (e.g., sample demographic and response characteristics, settings) under which one is likely to obtain desired levels of reliability for some intended purpose using the same set of items. Utsey and Gernat’s (2002) study provides an example of how RG studies might be used to improve psychometric reliability practices.
Utsey and Gernat used their obtained alpha coefficient of .28 for scores on the 10-item WRIAS Autonomy subscale as the rationale for dropping two of its items, “Sometimes jokes based on Black people’s experiences are funny” and “I understand that White women and men must end racism in this country because White people created it.” The two items assessed sense of humor and historical knowledge in Helms’s (1990) White identity theory. To their credit, the researchers (a) reported which items were omitted; (b) attempted to find justification for their unusually low alpha in previous literature; and, finding none quite as low, (c) checked their data for input errors before revising the subscale on the basis of corrected item–total correlations. The revised alpha was .55.
Had Utsey and Gernat conducted RG comparisons of their sample to samples in other studies as Helms (2005) recently recommended, they would have discovered that relative to Helms and Carter’s (1990) referent total sample, the responses of their sample were much more variable on the Contact subscale; they were less variable with respect to the remaining subscales, except Autonomy, for which the authors did not report a standard deviation for the full subscale. Participants in their study differed on a variety of attributes relative to Helms and Carter’s referent sample (including sense of humor and racial historical knowledge), but the point here is that if researchers do not rule out sample attributes as a rationale for their reliability results, then it is improper to ascribe the “erratic pattern of Cronbach’s alpha coefficients” to the REI scale or subscales rather than their “unusual” sample (Utsey & Gernat, 2002, p. 477).
An alpha coefficient is a point estimate of a population value that is obtained with some level of precision and is sample dependent. RG studies may be used to discover whether one’s obtained reliability coefficients are aberrant relative to samples in other studies (Helms et al., 2006). Fan and Thompson (2001) recommended that researchers report confidence intervals (CIs) for obtained alpha coefficients and provided the methodology for calculating them (pp. 522–523). They also illustrated the analysis for statistically comparing an obtained coefficient to a population value(s).

Table 3
Summary of Some Recommended Methodologies for Evaluating Measures of Multidimensional Constructs

Author | Description
Bacon, Sauer, & Young (1995) | In SEM, use weighted omega to estimate rxx so that items can violate alpha assumptions and receive weights proportional to their true score variances.
Ferketich (1990) | Compute rxx (a) from the first eigenvalue of a PCA (omega) or (b) the item communalities of an FA (theta) if items are heterogeneous.
Komaroff (1997) | If CFA reveals correlated error, adjust alpha by subtracting the estimated positive sum of error covariances from it.
Lee, Dunbar, & Frisbie (2001) | SEM multifactor partially tau equivalent—assumes items within subscales are homogeneous and positively correlated but subscales are not. SEM multifactor congeneric (heterogeneous)—assumes subscale-specific common factors; structural coefficients are not restricted for subscales, and different parameters in each subscale may be estimated.
Raykov (1998) | 1. Use SEM to test whether scale is congeneric (not homogeneous); if so, examine MI associated with error covariances and expected parameter changes. MIs > 5 may reflect heterogeneous item subsets. 2. EFA using maximum likelihood extraction; examine chi-square for fit; compare eigenvalues.
Raykov (1997) | Use Raykov LVM to examine underlying factor structure of item sets before deleting items to increase alpha.
Rogers et al. (2002) | If CFI is < .80, use EFA to identify item subsets and calculate composite alpha, if a single multidimensional scale is desired.

Note. SEM = structural equation model; PCA = principal-components analysis; FA = factor analysis; CFA = confirmatory factor analysis; MI = modification indices; EFA = exploratory factor analysis; LVM = latent variable model; CFI = comparative fit index.
Helms et al.’s (2006) and Fan and Thompson’s (2001) advice may be applied to Utsey and Gernat’s (2002) previously discussed Autonomy coefficient (.28). Behrens’s (1997) meta-analysis of Autonomy alpha coefficients from 23 studies, which was cited by Utsey and Gernat, yielded an alpha coefficient population estimate of .61. The upper and lower limits of the (presumably) 95% CI for the population estimate were respectively .63 and .60. The 95% CI (based on a central F distribution) calculated for Utsey and Gernat’s alpha of .28 is .09 (lower limit) to .44 (upper limit). Thus, the range of population estimates for their obtained and revised alpha (.55) coefficients was considerably below the average range Behrens reported. Additional support is that an analysis of variance, conducted with the smaller alpha value in the numerator (i.e., .28), indicates that the researchers’ reported alpha was significantly lower than Behrens’s population estimate of .61, F(144, 1296) = 1.85, p < .0001.
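The test statistic above can be reproduced with a Feldt-style F ratio for comparing internal-consistency coefficients, which works with the complements of the alphas (a sketch under the assumption that this is the test applied; the degrees of freedom follow from the sample size and the 10-item subscale):

```python
def feldt_f(alpha_obtained, alpha_population):
    """Feldt-style F ratio for testing an obtained alpha against a larger
    population value: F = (1 - obtained alpha) / (1 - population alpha)."""
    return (1 - alpha_obtained) / (1 - alpha_population)

# Utsey and Gernat's Autonomy alpha (.28) vs. Behrens's estimate (.61);
# with n = 145 respondents and k = 10 items, df = (144, 1296) as in the text.
f_ratio = feldt_f(0.28, 0.61)  # ~1.85
```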
Therefore, it is reasonable to conclude either that Utsey and Gernat’s sample was aberrant or that alpha was not the appropriate statistic for analyzing their data. The more general principle is that to maintain the theoretical meaningfulness of REI scales, modifying the scales or subscales should not be the automatic response to “too-low” alpha coefficients. Instead, alternative psychometric hypotheses should be explored, including effects of sample attributes.
Whole Scale REI Scale Revisions
Researchers routinely engage in a variety of practices intended to develop new REI measures from the original items. Most of these practices involve analyses of responses to the entire scales by means of techniques intended to assess the fit of data to a unidimensional measurement model (i.e., interscale correlations, principal-components analysis, and factor analysis). Many of these analyses are conducted without regard to the interplay between psychometrics and theory; some others rely on improperly conducted or incorrectly interpreted psychometric procedures.
Nonstandard REI Scales
One consequence of disregarding the interactions between theory and psychometric practices is that researchers replace standard sets of items with whatever sets of items best describe their samples. Fit may be determined by reliability analyses, factor analysis, principal-components analysis, correlation analyses, or some combination.
Reliability Analyses for Subscales
When using the MEIM, researchers frequently describe the measure as consisting of different numbers of items (range: 10 to 24) and different numbers of scales or subscales, dimensions, or components (range: 1 to 5 subscales), each of which consists of varying numbers of items and item anchors (e.g., Carter, Sbrocco, Lewis, & Friedman, 2001; Cokley, 2005). Often it is impossible to discern whether the items used by authors correspond to those listed by Phinney (1992). The analogue for racial identity measures is that researchers drop scales or recombine items on the basis of their own or someone else’s reliability analyses or personal preferences (Kelly, 2004).
It is not clear why researchers are so flexible about the structure of the MEIM in particular given that Phinney (1992) intended her measure to assess the same constructs across ethnic groups. Perhaps they are confused because she did not report reliability coefficients for the scores of her two developmental samples on the two-item Ethnic Behaviors subscale. She asserted that “reliability [i.e., alpha] cannot be calculated with only two items” (Phinney, 1992, p. 165). Subsequent researchers have followed suit and cite her as the source for this poor practice. However, Phinney’s assertion is demonstrably untrue. Much of CTT has focused on developing methodologies for estimating the reliability of scores on two-item (e.g., split-half, alternate form) tests (Feldt & Charter, 2003). In fact, if the researcher calculates item variances and covariances, then the standard formula for alpha, used in Table 4, may be used to estimate reliability.
The Spearman–Brown formula was used to estimate the alpha coefficient for the two-item Ethnic Behavior subscale scores of Phinney’s college student sample, whose overall reliability for responses to 14 items was .90. As shown in Table 4, the estimated alpha for the Ethnic Behavior subscale responses was much lower than the alphas reported for the responses of her college student sample to the other subscales, but it could be calculated, and doing so is consistent with the advice in the Testing Standards that researchers report reliability coefficients for all scales and subscales used in their studies (AERA et al., 1999).
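The two-item estimate in Table 4 can be reconstructed in two steps (a sketch in Python; the function names are mine): first invert Equation 1 to recover the average inter-item correlation implied by the 14-item alpha of .90, then project that correlation onto a two-item scale with the Spearman–Brown formula:

```python
def mean_interitem_r(alpha, k):
    """Invert Equation 1: average inter-item r implied by alpha for k items."""
    return alpha / (k - alpha * (k - 1))

def spearman_brown(r, k_new):
    """Projected reliability of a k_new-item scale built from items with
    average inter-item correlation r."""
    return (k_new * r) / (1 + (k_new - 1) * r)

r_bar = mean_interitem_r(0.90, 14)         # ~.39 for Phinney's EI total
two_item_alpha = spearman_brown(r_bar, 2)  # ~.56, as shown in Table 4
```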
Table 4
Summary of Calculation of Two-Item Reliability and Composite Alpha for Phinney’s (1992) College Student Sample

                               Phinney data      Untransformed(b)
Scale                     k     α     M    SD      M      SD
Affirmation/Belonging     5   .86  3.36   .59   16.80   2.95
Identity Achievement      7   .80  2.90   .64   20.30   4.48
Ethnic Behaviors(a)       2   .56  2.67   .85    5.34   1.70
Ethnic Identity          14   .90  3.04   .59   42.56   8.26
Composite alpha           3   .80

Note. Data adapted from the college student sample in “The Multigroup Ethnic Identity Measure: A New Scale for Use With Diverse Groups,” by J. S. Phinney, 1992, Journal of Adolescent Research, 7, Tables 2 & 3, p. 167. Calculation of composite alpha (CA) was as follows:

CA = [k/(k − 1)][1 − (ΣSDss²/SDtotal²)]
   = (3/2)[1 − (2.95² + 4.48² + 1.70²)/8.26²]
   = (3/2)[1 − (31.6629/68.2276)]
   = .80

(a) Alpha for Ethnic Behaviors was estimated from alpha of the Ethnic Identity Scale scores using the Spearman–Brown formula. (b) Untransformed scores were computed by weighting Phinney’s data by the number of items (k) and used to compute composite alpha.
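The composite (lumpy) alpha derivation in the table note can likewise be scripted; a minimal sketch in Python using the untransformed subscale standard deviations:

```python
def composite_alpha(subscale_sds, total_sd):
    """Cronbach's formula applied at the subscale level: the k 'components'
    are subscale scores rather than individual items."""
    k = len(subscale_sds)
    return (k / (k - 1)) * (1 - sum(sd ** 2 for sd in subscale_sds) / total_sd ** 2)

# Untransformed SDs for the three EI subscales and the 14-item total (Table 4).
ca = composite_alpha([2.95, 4.48, 1.70], 8.26)  # ~.80
```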
Reliability Analyses for Total Scale Scores
Judging from her reporting of reliability coefficients for four individual subscales (Phinney, 1992, p. 165), Phinney intended one early version of the MEIM to assess four unidimensional constructs: Affirmation/Belonging (five items, αs = .75, .86), Ethnic Identity Achievement (seven items, αs = .69, .80), Ethnic Behaviors (two items, alphas not reported), and Other Group Orientation (six items, αs = .71, .74). The two alphas in parentheses are for samples of high school students (N = 417) and college students (N = 136), respectively. The Ethnic Identity total scale (14 items, αs = .81, .90) is the aggregated responses to the 14 items comprising the three subscales (excluding Other Group Orientation).
Researchers routinely report and analyze reliability data only for the Ethnic Identity (EI) total, thereby overlooking potentially theoretically interesting information. Alternatively, they report alpha only for the EI total but use the individual subscales in their analyses, thereby ignoring the better practice of reporting reliability coefficients for all scales and subscales (AERA et al., 1999). A third poor practice is that researchers use item variances rather than subscale variances to calculate alpha for the composite scores (i.e., composite alpha) in spite of the fact that the EI total obviously consists of multiple components or subscales (i.e., is multidimensional).
Cronbach (1951) advised that for multisubscale measures, researchers should use subscale values (e.g., variances, intercorrelations) in the standard formula to calculate lumpy alpha. Doing so determines whether a principal component, defining a superordinate dimension (e.g., ethnic identity), runs through the subscale responses more strongly than individual constructs run through subscale responses. In Table 4, I show the difference between composite or lumpy alpha (.80), calculated for Phinney’s (1992) data for college students, and the (presumably) item-level alpha that she reported (.90). To do so, the original subscale variances were estimated by weighting their standard deviations by the number of scale items (k). Only one of the subscales has a lower alpha than composite alpha, which means that although a common theme runs through the subscales, it does not warrant abandoning the theoretical constructs that the individual subscales assess by replacing them with total scale scores.
When researchers use responses to individual items to calculate ICR for total scale scores rather than responses to subscales, total-scale alpha coefficients typically will be inflated for one or both of the following reasons: (a) the number of items for the scale overall exceeds the number of items for the separate subscales (note that shared variance is weighted by the number of items in alpha formulas) and (b) nominal positive correlations across item sets elevate the level of shared variance (Cortina, 1993; Helms et al., 2006). Thus, users of the MEIM and other REI measures often have been seduced by higher total-scale alpha coefficients into abandoning the theoretical constructs underlying the selected measure.
Yet conceptually meaningful measures are likely to yield better validity evidence. Table 1 illustrates this point to some extent. Notice that scores on the measure that yielded the best ICR (rxx = .87) correlated worst with measures of other constructs, but the measure whose scores yielded the worst ICR (rxx = .51) demonstrated better correlations overall than any of the other measures with better ICR (i.e., alphas). For some of the revisions of the MEIM, it would not be surprising to discover that the subscales yielded higher correlations with measures of other constructs than did the total scale because conceptual complexity is lost by aggregating subscales. Thus, a better practice is that if composite ICR must be calculated for REI measures for some purpose, then researchers should calculate it at the subscale level as a means of determining whether it is meaningful or necessary to collapse across theoretical REI constructs.
Analyses of Correlation Coefficients
Researchers frequently calculate correlations among subscales, and when large correlations are found between pairs of scales, they either (a) collapse the subscales, (b) claim “multicollinearity” as the explanation for why their hypotheses were not confirmed, or (c) use the findings as a rationale for creating new scales. Both statistical and REI theoretical assumptions interact to suggest that these are not good practices. Examination of the dataset’s fit to statistical assumptions is necessary if the researcher intends to use inferential statistics to test hypotheses concerning correlations between REI subscales or to reconfigure REI subscales.
Onwuegbuzie and Daniel (1999, pp. 8–10) reported that researchers fail to confirm that Pearson correlation coefficients are the correct statistic for evaluating associations between measures by assessing the conformance of the distributions of the pair(s) of variables to the assumptions of GLM. Some of the major assumptions and their relevance to REI measures are as follows:
1. One variable of the pair is presumed to be the independent or predictor variable and the other is presumed to be the dependent or criterion variable. Of course, this assumption is not true in the case of REI measures given that subscales are administered to the same people at the same time via the same measure and that REI theories suppose that subscale scores are interrelated within individuals. Consequently, sampling and measurement error are likely correlated and, therefore, influence the magnitude of correlations in one direction or another.
2. The dependent variable must be normally distributed. This also is unlikely to be true because many of the REI theories suppose that people behave differently according to the setting(s) in which they find themselves or the people with whom they are interacting. Thus, skewness or kurtosis of subscale responses may affect the size of correlation coefficients, thereby contributing to Type I or Type II error.
3. The variability of scores for the dependent variable is about the same at all levels of the independent variable. Because no variable is the independent or dependent variable when all intrasubscale correlations are compared, this assumption in effect requires variances to be equal for all subscales at all levels, which is unlikely given that REI theories postulate sample heterogeneity across subscales.
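The normality concern in Assumption 2 is straightforward to screen before trusting Pearson correlations. A minimal sketch (Python; the statistic is ordinary moment-based skewness, and the toy scores are hypothetical):

```python
def skewness(scores):
    """Moment-based skewness: ~0 for symmetric data, > 0 for a right tail."""
    n = len(scores)
    m = sum(scores) / n
    s = (sum((x - m) ** 2 for x in scores) / n) ** 0.5
    return sum(((x - m) / s) ** 3 for x in scores) / n

symmetric = skewness([1, 2, 3, 4, 5])           # ~0: Pearson r is defensible
right_tailed = skewness([1, 1, 1, 2, 2, 3, 5])  # > 1: normality is doubtful
```

Markedly skewed subscale distributions are one signal to prefer nonparametric or distribution-free alternatives before interpreting intersubscale correlations.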
In sum, attributions that REI subscales are flawed, which cite bivariate correlations as evidence, might be contraindicated if the researcher cannot provide evidence that relevant GLM assumptions were tested. Moreover, that a correlation differs significantly from zero is not evidence of multicollinearity of either of the involved subscales. Researchers often use the term multicollinearity to mean redundancy, both of which are inferred from “large”
correlations of various sizes between scales, although the definition of large varies from study to study.
Branch (1990) pointed out that it is fallacious to conclude on the basis of even substantial correlations (e.g., .80) that two subscales measure the same construct and are interchangeable as a result. He contended that (a) correlations do not necessarily reveal that each person obtained similar scores on each subscale, and (b) interchangeability requires evidence that the subscales involved share observed scores, means, variances, and content. These are aspects of whole scale correlation analyses that are rarely examined and interpreted in evaluations of REI measures.
Principal-Components and Factor Analysis
When alpha coefficients are too small, researchers routinely conduct post hoc principal-components analysis (PCA) or factor analysis (FA) to develop “reliable and conceptually meaningful scales” (Mercer & Cunningham, 2003, p. 217) or to “identify and describe a [more useful] subset of items” (Yancey, Aneshensel, & Driscoll, 2001, p. 194) from already developed, conceptually meaningful REI scales. In doing so, they typically confuse PCA with FA, although many psychometric texts indicate that the two types of analyses are based on different mathematical assumptions and serve different purposes (Kim & Mueller, 1978). Yet some of the poor psychometric practices are similar when they are used to evaluate entire REI measures at the item level and some are different.
PCA issues. PCA has been the primary methodology used to evaluate responses to both types of REI measures. PCA is intended to reduce a large number of items (in this case) to a smaller number, and the first component accounts for the maximum amount of variance possible among the items. The implicit research question when PCA is used to analyze the WRIAS, BRIAS, or the MEIM is whether the items can be transformed into some other smaller set of variables—a question whose answer is theoretically meaningless. Helms and Carter (1990) used PCA in developing the WRIAS and should not have because the number of dimensions (i.e., subscales) was already rationally defined by theory.
PCA conducted at the item level assembles the most strongly positively related items across subscales until it has accounted for as much variance as possible. Yet the analysis typically accounts for less interitem variance overall than the average of the alpha coefficients of the multidimensional subscales that inspired the PCA, because the units of analysis are different. That is, reliability analyses generally examine item responses within subscales, whereas PCA examines items without regard to subscale. For example, in Mercer and Cunningham’s (2003) WRIAS study, alpha coefficients accounted for an average of 62% of the interitem variance, whereas their PCA accounted for 42% of the interitem variance.
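Alpha as discussed throughout this article is a within-subscale statistic. For readers who want to verify reported coefficients, the sketch below applies Cronbach’s (1951) formula directly; the four-item “subscale” and its Likert responses are invented for illustration and are not drawn from any REI measure:

```python
def cronbach_alpha(items):
    """Cronbach's alpha. `items` is a list of item-score lists,
    one inner list per item, aligned across the same respondents."""
    k = len(items)
    n = len(items[0])

    def variance(xs):  # population variance, as in the classical formula
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_variances = sum(variance(item) for item in items)
    totals = [sum(item[p] for item in items) for p in range(n)]
    return k / (k - 1) * (1 - item_variances / variance(totals))

# Hypothetical 4-item subscale answered by five respondents on a 5-point scale.
subscale = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
    [3, 4, 3, 4, 2],
]
print(round(cronbach_alpha(subscale), 3))  # high alpha: these items covary strongly
```

Because the computation depends on the variance of the subscale total, alpha answers a within-subscale question; pooling items across subscales changes the total and, with it, the coefficient.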
Yet researchers may fool themselves into believing that they have discovered better subscales, because alpha coefficients calculated for PCA-derived scales must be large: the PCA statistically maximizes the shared covariance among the sample’s item responses. Thus, alpha coefficients calculated for PCA-derived subscales are statistical artifacts. Also, it should be noted that the amount of interitem variance explained by alpha is equivalent to that explained by the first principal component if the same data are used in both analyses (e.g., subscale items) and the previously discussed alpha assumptions are supported (e.g., homogeneity of variances). Support is recognizable from equal pattern/structure coefficients (formerly “loadings”). If responses are heterogeneous, then the eigenvalue for the first component may be used to calculate a variety of statistics other than alpha to assess ICR of measurements (see Table 3; Hattie, 1985). Nevertheless, use of results from analyses of full scales to replace subscales endangers theoretical constructs because the analyses involve different clusters of items as well as different implicit hypotheses.
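To make the eigenvalue route concrete: one eigenvalue-based ICR statistic of the kind Hattie (1985) reviews is Armor’s theta, computed as [k/(k − 1)](1 − 1/λ1), where λ1 is the first eigenvalue of the interitem correlation matrix. The sketch below is illustrative only; the correlation matrix is invented, and the leading eigenvalue is found by power iteration so that no statistical library is required:

```python
def first_eigenvalue(matrix, iterations=200):
    """Leading eigenvalue of a symmetric matrix via power iteration."""
    n = len(matrix)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)       # infinity-norm estimate of lambda_1
        v = [x / lam for x in w]
    return lam

def armor_theta(corr):
    """Armor's theta, an ICR coefficient based on the first eigenvalue."""
    k = len(corr)
    return k / (k - 1) * (1 - 1 / first_eigenvalue(corr))

# Hypothetical interitem correlation matrix for a 3-item subscale.
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
print(round(armor_theta(R), 3))
```

The statistic rises with λ1, so it summarizes how much of the interitem variance the first component captures without assuming the homogeneity of variances that alpha requires.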
FA issues. In general, Phinney and her associates have conducted increasingly sophisticated FAs of her measures, although evaluators of her measure typically have favored PCA (Phinney, 1992; Roberts et al., 1999). On the face of it, FA and confirmatory FA should permit tests of the theoretical models associated with the relevant items, if the theory and data fit the measurement model. As previously discussed, when alpha coefficients are too large (e.g., close to 1.00) or data are ipsative, then FA of entire scales either will not be possible or will yield specious results because of item redundancy. Warnings to the effect that covariance matrices are “nonpositive definite” or “ill-conditioned” are indicative of interdependencies among items, but such warnings are not necessarily indicative of flawed measures. Instead, they might signal the need to shift one’s psychometric focus from traditional ICR studies to alternatives, such as analyses to identify clusters or profiles of people rather than items (Johnson, Wood, & Blinkhorn, 1988).
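Such warnings can be reproduced deliberately. In the sketch below, each hypothetical respondent distributes a fixed number of points across four items (i.e., the data are ipsative), so every row of the item covariance matrix sums to zero and the matrix is singular; this is exactly the kind of interdependency that FA/PCA routines flag as ill-conditioned. The data are invented for illustration:

```python
# Hypothetical ipsative responses: each respondent allocates the same
# fixed total (10 points) across four items, so scores are interdependent.
respondents = [
    [4, 3, 2, 1],
    [1, 2, 3, 4],
    [2, 4, 1, 3],
    [3, 1, 4, 2],
    [2, 2, 3, 3],
]
assert all(sum(r) == 10 for r in respondents)

k = len(respondents[0])
n = len(respondents)
means = [sum(r[i] for r in respondents) / n for i in range(k)]

def cov(i, j):
    """Sample covariance between items i and j."""
    return sum((r[i] - means[i]) * (r[j] - means[j]) for r in respondents) / (n - 1)

C = [[cov(i, j) for j in range(k)] for i in range(k)]

# Because every data row sums to a constant, each item covaries with the
# (constant) total not at all, so each row of C sums to zero: C is singular.
print([round(sum(row), 10) for row in C])
```

A singular covariance matrix reflects the scoring format, not a flawed measure, which is why such warnings should prompt a change of analytic strategy rather than item deletion.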
Common issues. For PCA and FA, researchers typically do not report their analyses well enough to permit replication. Either they do not report the methods used to decide the number of components or factors extracted or the rotation methods, or they use outmoded methods, or they test models that are incongruent with theory. A few remedies for these poor practices not discussed elsewhere are as follows:
1. As appropriate, researchers should report all of the pattern/structure coefficients for their PCA, FA, or structural equation modeling (SEM), regardless of whether the coefficients conform to a preferred cut score (Lance, Butts, & Michels, 2006). They should also report eigenvalues, numbers of items analyzed, communalities, and any other structural properties of the analysis that would permit other investigators to verify or better understand their findings.
2. Researchers should specify the assumptions of the measurement model that they tested (e.g., homogeneity of variances). For SEM researchers, a good practice is to indicate what parameters were constrained so that other researchers can determine whether the measurement model fits their interpretation of the relevant REI theory.
3. Users of PCA and FA should report what procedure was used to decide the number of factors or components to extract and should use parallel analysis rather than Kaiser’s criterion of eigenvalues greater than 1.00 (Hayton, Allen, & Scarpello, 2004).
4. Most REI theories do not propose orthogonal or independent constructs; therefore, models and rotation methodologies that assume independence should not be used if they are inconsistent with the theories.
5. If ipsative data are analyzed, they will necessarily yield bipolar factors or components equal to one fewer than the number of REI subscales, and the resulting “new” PCA/FA-derived subscales will be ipsative or partially ipsative, too.
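Remedy 3 can be sketched concretely. In Horn’s parallel analysis, a component is retained only if its eigenvalue exceeds the corresponding eigenvalue obtained from random data of the same dimensions (Hayton et al., 2004, provide the full tutorial). The simplified pure-Python sketch below compares only the leading eigenvalue and uses invented one-factor data; it is an illustration of the retention logic, not a complete implementation:

```python
import random

def corr_matrix(data):
    """Pearson correlation matrix; `data` is a list of respondent rows."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in data) / n) ** 0.5
           for j in range(k)]
    def r(i, j):
        s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
        return s / (sds[i] * sds[j])
    return [[r(i, j) for j in range(k)] for i in range(k)]

def first_eigenvalue(m, iterations=200):
    """Leading eigenvalue via power iteration."""
    n = len(m)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

random.seed(0)
n, k = 100, 6

# Hypothetical correlated item responses: one latent trait plus noise.
observed = []
for _ in range(n):
    trait = random.gauss(0, 1)
    observed.append([trait + random.gauss(0, 1) for _ in range(k)])

# Reference distribution: leading eigenvalues of purely random data.
random_lams = [first_eigenvalue(corr_matrix(
                   [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]))
               for _ in range(50)]
threshold = sum(random_lams) / len(random_lams)

observed_lam = first_eigenvalue(corr_matrix(observed))
# Retain the first component only if it beats the random benchmark;
# Kaiser's rule would instead compare against a fixed 1.00.
print(observed_lam > threshold)
```

Note that the random benchmark itself exceeds 1.00 in expectation, which is why Kaiser’s criterion tends to retain too many components.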
Conclusions
Researchers have not differentiated scale development from theory-testing research. The former is necessary if no tool exists for assessing relevant theoretical constructs or if the researcher wants to develop alternative measures to evaluate constructs, but this typically has not been the case where REI scales are concerned.
Existent REI scales may need to be revised to better represent their underlying hypothetical constructs, but this cannot happen if (a) each researcher changes the content of already developed REI scales to reflect the responses of each new sample, (b) the measurement models and samples used to inform the revisions do not fit the model implied by the REI theory being tested, and (c) revisions rely exclusively on evaluations of the internal structure of the REI measures, no matter how well the research is conducted.
REI scale development began as a quest to replace one-item measures (i.e., racial categories) with more complex tools for assessing individual differences in internalized racial-group socialization (Jackson & Kirschner, 1973). However, contemporary researchers have commonly engaged in a variety of research design and psychometric practices that threaten to return the measures and their associated theories to their more simplistic, atheoretical roots, thereby limiting their usefulness for understanding the effects of racial and cultural socialization on people’s mental health.
In an effort to discourage the routine practice of using research design and ICR psychometric analyses to reduce all REI scales to measures of unidimensional constructs, whether or not such reduction is theory consistent, the focus of this article has been on introducing counseling psychologists to some better practices for matching REI theoretical constructs and measures to the theories’ implicit measurement models. Doing so should not be construed as an endorsement of the practice of giving preeminence to studies of the internal structure of REI measures rather than to other types of validity studies, even if methodologies that permit development of heterogeneous REI scales are used. Ultimately, it does not matter how well REI scale items relate to each other if they do not help explain complex behaviors beyond themselves (AERA et al., 1999).
[The] complexity of psychosocial behavior may require tests to be heterogeneous, perhaps irreducibly so, to maintain their reliability, validity, and predictive utility. . . . If a theory claims that an entity has multiple attributes, then the test measuring that entity should measure all relevant attributes. Therefore, tests must be heterogeneous. The meaningfulness of a test lies not in a methodological prescription of homogeneity but in the test’s ability to capture all relevant attributes of the entity it purports to measure. (Lucke, 2005, p. 66)
What psychosocial behaviors can be more complex than racial identity and ethnic identity in the United States?
References

American Educational Research Association (AERA), American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: AERA.
Bacon, D. R., Sauer, P. L., & Young, M. (1995). Composite reliability in structural equations modeling. Educational and Psychological Measurement, 55, 394–406.
Baron, H. (1996). Strengths and limitations of ipsative measurement. Journal of Occupational and Organizational Psychology, 69, 49–56.
Behrens, J. T. (1997). Does the White Racial Identity Scale measure racial identity? Journal of Counseling Psychology, 44, 3–12.
Behrens, J. T., & Rowe, W. (1997). Measuring White racial identity: A reply to Helms (1997). Journal of Counseling Psychology, 44, 17–19.
Betancourt, H., & López, S. R. (1993). The study of culture, ethnicity, and race in American psychology. American Psychologist, 48, 629–637.
Branch, W. (1990). On interpreting correlation coefficients. American Psychologist, 45, 296.
Carter, M. M., Sbrocco, T., Lewis, E. L., & Friedman, E. K. (2001). Parental bonding and anxiety: Differences between African American and European American college students. Anxiety Disorders, 15, 555–569.
Choney, S. K., & Rowe, W. (1994). Assessing White racial identity: The White Racial Consciousness Development Scale (WRCDS). Journal of Counseling & Development, 73, 102–104.
Claney, C., & Parker, W. M. (1989). Assessing White racial consciousness and perceived comfort with Black individuals: A preliminary study. Journal of Counseling & Development, 67, 449–451.
Cokley, K. O. (2005). Racial(ized) identity, ethnic identity, and Africentric values: Conceptual and methodological challenges in understanding African American identity. Journal of Counseling Psychology, 52, 517–526.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 96–104.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481–489.
Erikson, E. H. (1968). Identity: Youth and crisis. New York: Norton.
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517–531.
Feldt, L. S., & Charter, R. A. (2003). Estimating the reliability of a test split into two parts of equal or unequal length. Psychological Methods, 8, 102–109.
Ferketich, S. (1990). Focus on psychometrics: Internal consistency estimates of reliability. Research in Nursing & Health, 13, 437–440.
Fischer, A. R., & Moradi, B. (2001). Racial and ethnic identity: Recent developments and needed directions. In J. G. Ponterotto, J. M. Casas, L. A. Suzuki, & C. M. Alexander (Eds.), Handbook of multicultural counseling (2nd ed., pp. 341–370). Thousand Oaks, CA: Sage.
Goodstein, R., & Ponterotto, J. G. (1997). Racial and ethnic identity: Their relationship and their contribution to self-esteem. Journal of Black Psychology, 23, 275–292.
Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139–164.
Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205.
Helms, J. E. (1984). Toward a theoretical explanation of the effects of race on counseling: A Black and White model. The Counseling Psychologist, 12, 153–165.
Helms, J. E. (Ed.). (1990). Black and White racial identity: Theory, research, and practice. Westport, CT: Greenwood Press.
Helms, J. E. (1999). Another meta-analysis of the White Racial Identity Attitudes Scale’s alphas: Implications for validity. Measurement and Evaluation in Counseling and Development, 32, 122–137.
Helms, J. E. (2005). Challenging some misuses of reliability coefficients as reflected in evaluations of the White Racial Identity Attitude Scale (WRIAS). In R. T. Carter (Ed.), Handbook of racial–cultural psychology and counseling: Theory and research (Vol. 1, pp. 360–390). New York: Wiley.
Helms, J. E., & Carter, R. T. (1990). Development of the White Racial Identity Inventory. In J. E. Helms (Ed.), Black and White racial identity: Theory, research, and practice (pp. 67–80). Westport, CT: Greenwood Press.
Helms, J. E., Henze, K., Sass, T., & Mifsud, V. (2006). Treating Cronbach’s alpha reliability coefficients as data in counseling research. The Counseling Psychologist, 34, 630–660.
Helms, J. E., Jernigan, M., & Mascher, J. (2005). The meaning of race in psychology and how to change it. American Psychologist, 60, 27–36.
Jackson, G. G., & Kirschner, S. A. (1973). Racial self-designation and preference for a counselor. Journal of Counseling Psychology, 20, 560–564.
Johnson, C. E., Wood, R., & Blinkhorn, S. F. (1988). Spuriouser and spuriouser: The use of ipsative personality tests. Journal of Occupational Psychology, 61, 153–162.
Johnson, S. C. (2004). The relation of racial identity, ethnic identity, and perceived racial discrimination among African Americans. Unpublished doctoral dissertation, University of Houston, Texas.
Kelly, S. (2004). Underlying components of scores assessing African Americans’ racial perspectives. Measurement and Evaluation in Counseling and Development, 37, 28–40.
Kim, J., & Mueller, C. W. (1978). Factor analysis: Statistical methods and practical issues. Newbury Park, CA: Sage.
Komaroff, E. (1997). Effect of simultaneous violations of essential τ-equivalence and uncorrelated error on coefficient α. Applied Psychological Measurement, 21, 337–348.
Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9, 202–220.
Lee, G., Dunbar, S. B., & Frisbie, D. A. (2001). The relative appropriateness of eight measurement models for analyzing scores from tests composed of testlets. Educational and Psychological Measurement, 61, 958–975.
Lucke, J. F. (2005). The α and the ω of congeneric test theory: An extension of reliability and internal consistency to heterogeneous tests. Applied Psychological Measurement, 29, 65–81.
Mercer, S. H., & Cunningham, M. (2003). Racial identity in White American college students: Issues of conceptualization and measurement. Journal of College Student Development, 44, 217–230.
Onwuegbuzie, A. J., & Daniel, L. G. (1999, November). Uses and misuses of the correlation coefficient. Paper presented at the annual meeting of the Mid-South Educational Research Association, Point Clear, AL.
Owens, W. A. (1947). An empirical study of the relationship between item validity and internal consistency. Educational and Psychological Measurement, 7, 281–288.
Parham, T. A., & Helms, J. E. (1981). The influence of Black students’ racial identity attitudes on preferences for counselor’s race. Journal of Counseling Psychology, 28, 250–257.
Peterson, R. A. (1994). A meta-analysis of Cronbach’s coefficient alpha. Journal of Consumer Research, 21, 381–391.
Phan, T., & Tylka, T. L. (2006). Exploring a model and moderators of disordered eating with Asian American college women. Journal of Counseling Psychology, 53, 36–47.
Phelps, R. E., Taylor, J. D., & Gerard, P. A. (2001). Cultural mistrust, ethnic identity, racial identity, and self-esteem among ethnically diverse Black university students. Journal of Counseling & Development, 79, 209–216.
Phinney, J. S. (1990). Ethnic identity in adolescence and adulthood: A review of research. Psychological Bulletin, 108, 499–514.
Phinney, J. S. (1992). The Multigroup Ethnic Identity Measure: A new scale for use with diverse groups. Journal of Adolescent Research, 7, 156–176.
Phinney, J. S., & Alipuria, L. L. (1990). Ethnic identity in college students from four ethnic groups. Journal of Adolescence, 13, 171–183.
Raykov, T. (1997). Scale reliability, Cronbach’s coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329–353.
Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22, 375–385.
Reese, L. E., Vera, E. M., & Paikoff, R. L. (1998). Ethnic identity assessment among inner-city African American children: Evaluating the applicability of the Multigroup Ethnic Identity Measure. Journal of Black Psychology, 24, 289–304.
Roberts, R. E., Phinney, J. S., Masse, L. C., Chen, Y. R., Roberts, C. R., & Romero, A. (1999). The structure of ethnic identity of young adolescents from diverse ethnocultural groups. Journal of Early Adolescence, 19, 301–322.
Rogers, W. M., Schmitt, N., & Mullins, M. E. (2002). Correction for unreliability of multifactor measures: Comparison of alpha and parallel forms approaches. Organizational Research Methods, 5, 184–199.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–223.
Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27, 183–198.
Thompson, B. (1994). Guidelines for authors reporting score reliability estimates. Educational and Psychological Measurement, 54, 837–847.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174–195.
Utsey, S. O., & Gernat, C. A. (2002). White racial identity attitudes and the ego defense mechanisms used by counselor trainees in racially provocative counseling situations. Journal of Counseling & Development, 80, 475–483.
Utsey, S. O., & Ponterotto, J. (1996). Development and validation of the Index of Race-Related Stress (IRRS). Journal of Counseling Psychology, 43, 490–501.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6–20.
Vacha-Haase, T., Kogan, L. R., & Thompson, B. (2000). Sample compositions and variability in published studies versus those in test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60, 509–522.
Yancey, A. K., Aneshensel, C. S., & Driscoll, A. K. (2001). The assessment of ethnic identity in a diverse urban youth population. Journal of Black Psychology, 27, 190–208.
Received August 30, 2006
Revision received January 9, 2007
Accepted January 14, 2007
Some Better Practices for Measuring Racial and Ethnic Identity
Constructs
Janet E. Helms
Boston College
Racial and ethnic identity (REI) measures are in danger of
63. becoming conceptually meaningless because
of evaluators’ insistence that they conform to measurement
models intended to assess unidimensional
constructs, rather than the multidimensional constructs
necessary to capture the complexity of internal-
ized racial or cultural socialization. Some aspects of the
intersection of REI theoretical constructs with
research design and psychometric practices are discussed, and
recommendations for more informed use
of each are provided. A table that summarizes some
psychometric techniques for analyzing multidimen-
sional measures is provided.
Keywords: racial identity, ethnic identity, reliability, validity,
factor analysis
In counseling psychology, the measurement of racial identity
constructs is a relatively new phenomenon. Arguably, the
practice
began when Jackson and Kirschner (1973) attempted to
introduce
complexity into the measurement of Black students’ racial
identity
by using a single categorical item with multiple options (e.g.,
“Black,” “Negro”) that the students could use to describe them-
selves. Helms and Parham (used in Parham & Helms, 1981) and
Helms and Carter (1990) built on the idea that assessment of
individual differences in racial identity is important, and they
added complexity to the measurement process by (a) developing
measures that were based on racial identity theoretical frame-
works, (b) using multiple items to assess the constructs inherent
to
the theories, and (c) asking participants to use continua (i.e.,
5-point Likert scales) rather than categories to self-describe.
These
principles underlie the Black Racial Identity Attitudes Scale
64. (BRIAS; formerly RIAS–B) and White Racial Identity Attitudes
Scale (WRIAS).
In response to perceived conceptual, methodological, or content
concerns with Helms and associates’ racial identity measures,
many rebuttal measures followed. Rebuttal measures are scales
that the new scale originator(s) specifically described as correc-
tions for one or more such deficiencies in preexisting identity
measures (e.g., Phinney, 1992, p. 157). Subsequent measures
have
tended to rely on the previously listed basic measurement
princi-
ples introduced by Parham and Helms (1981), although the theo-
retical rationales for the measures have varied. Phinney’s Multi-
group Ethnic Identity Measure (MEIM), the most frequently
used
of the rebuttal measures to date, added the principle of
measuring
“ethnic” rather than “racial identity,” which she seemingly
viewed
as interchangeable constructs. The MEIM also introduced the
principle of measuring the same identity constructs across racial
or
ethnic groups rather than group-specific constructs within them.
The BRIAS and WRIAS may be thought of as representative of
a class of identity measures in which opposing stages, statuses,
or
schemas are assessed, whereas the MEIM may be
conceptualized
as representative of a class of measures in which different
behav-
iors or attitudes are used to assess levels of commitment to a
single
group (i.e., one’s own). Consequently, these measures are used
65. as
exemplars of their classes in subsequent discussions. The two
classes of measures imply some similar as well as some
different
desirable practices with respect to research design,
measurement or
psychometrics, and interpretation that have not been addressed
in
the racial or ethnic identity literature heretofore. In fact,
virtually
no literature exists that focuses specifically on good practices
for
using or evaluating already developed theory-based racial or
ethnic
identity (REI) measures.
It is important to describe better practices for using already
developed REI scales to avoid oversimplifying essentially com-
plex measurement issues that are often inherent in REI
theoretical
constructs. The primary sources of my belief that a discussion
of
better practices is necessary are my experiences reviewing
manu-
scripts, submitting manuscripts, advising researchers, and being
fully engaged in REI research. Therefore, the purposes of this
article are to make explicit better practices for designing
research
and conducting psychometric analyses when using REI measures
to study identity constructs with new samples. I sometimes use
published studies to illustrate a practice or procedure; in most
instances, the studies were selected because their authors
reported
results in enough detail to permit the studies’ use for illustrative
purposes. More generally, the article is divided into two broad
sections, research design practices and psychometric practices.
66. The first section addresses conceptual issues pertinent to
research
design; the psychometric section addresses scale development
concerns.
Research Design Practices
The content of REI scales is intended to reflect standard
samples
of particular types of life experiences (racial vs. ethnic) as
postu-
lated by the relevant theory. A central empirical question with
respect to researchers’ use of REI scales is whether racial
identity
and ethnic identity scales measure the same constructs.
However,
the question cannot be adequately addressed if researchers do
not
use research design practices that are congruent with the
theoret-
Correspondence concerning this article should be addressed to
Janet E.
Helms, Department of Counseling, Developmental, and
Educational Psy-
chology, Boston College, 317 Campion Hall, Chestnut Hill, MA
02467.
E-mail: [email protected]
Journal of Counseling Psychology Copyright 2007 by the
American Psychological Association
2007, Vol. 54, No. 3, 235–246 0022-0167/07/$12.00 DOI:
10.1037/0022-0167.54.3.235
235
67. ical model(s) underlying each scale(s) under study. In this
section,
I (a) discuss some conceptual issues related to measuring racial
identity and ethnic identity as potentially different constructs,
(b)
discuss some poor practices that obscure differences if they
exist,
and (c) proffer some better practices.
Differentiating Racial Identity From Ethnic Identity
In REI research designs, if the researcher’s intent is to
substitute
one class of REI measures for the other, then it is important to
demonstrate that the two types of measures assess the same
racial
or ethnic constructs. Factors to consider are (a)
conceptualization
of the research question, (b) sample selection, (c) use of other
measures for assessing one type of identity rather than the
other,
and (d) comparability of validity evidence within and across
REI
measures.
Racial Identity Scales as Replacements for Racial
Categories
Racial groups or categories are not psychological constructs
because they do not connote any explicit behaviors, traits, or
biological or environmental conditions (Helms, Jernigan, &
Mascher, 2005). Instead racial categories are sociopolitical con-
structions that society uses to aggregate people on the basis of
ostensible biological characteristics (Helms, 1990). Because
racial
68. categories are null constructs, Helms et al. (2005) contended
that
they should not be used as the conceptual focus (e.g.,
independent
variables) for empirical studies but may be used to describe or
define samples or issues. Ascribed racial-group membership im-
plies different group-level racial socialization experiences that
vary according to whether the group is accorded advantaged or
disadvantaged status in society. The content of racial-identity
scales is individual group members’ internalization of the racial
socialization (e.g., discrimination, undeserved privileges) that
per-
tains to their group.
Ascribed racial group defines the type of life experiences to
which a person is exposed and that are available for
internalizing
(i.e., group oppression or privilege). For example, Black Ameri-
cans internalize different racial identities than White
Americans,
and, conversely, White Americans internalize different racial
iden-
tities than Black Americans. Also, the nature of the racial
identities
of Americans and immigrants or other nationals differs if they
have not experienced similar racial socialization during their
life-
times. Thus, racial identity theories are intended to describe
group-
specific development in particular sociopolitical contexts.
Racial identity measures are designed to assess the differential
impact of racial dynamics on individuals’ psychological
develop-
ment. One expects items in racial identity scales or inventories
to
69. include some mention of race, racial groups, or conditions that
commonly would be construed as racial in nature (e.g.,
discrimi-
nation or advantage on the basis of skin color). For example,
Helms and Carter’s (1990) WRIAS consists of five 10-item
scales,
each of which assesses the differential internalization of
societal
anti-Black racism on Whites’ identity development. Relevant
sam-
pling and measurement concerns are specifying samples and
mea-
sures for which race and racism in various forms are presumably
relevant constructs.
Ethnic Groups as Proxies for Theoretical Constructs
Ethnicity refers to the cultural practices (e.g., customs,
language,
values) of a group of people, but the group need not be the same
ascribed racial group. Betancourt and López (1993) use the term
ethnic group to connote membership in a self-identified kinship
group, defined by specific cultural values, language, and
traditions,
and that engages in transmission of the group’s culture to its
members. Ethnic identity refers to commitment to a cultural
group
and engagement in its cultural practices (e.g., culture, religion),
irrespective of racial ascriptions. Because ethnic groups imply
psychological culture-defined constructs, the constructs rather
than
the categories should be used as conceptual focuses of studies
(e.g., independent variables).
The content domain of ethnic identity measures is internalized
experiences of ethnic cultural socialization. Phinney and
70. associates
(Phinney, 1992; Phinney & Alipuria, 1990) initially developed
the
MEIM to assess adolescents’ search for and commitment to an
ethnic identity in a manner consistent with Erikson’s (1968)
mul-
tistage pychosocial identity theory and without regard to group-
specific cultural components. Originally, she conceptualized
eth-
nic identity as “a continuous variable [construct or scale],
ranging
from the lack of exploration and commitment . . . to evidence of
both exploration and commitment, reflected in efforts to learn
more about one’s background” (Phinney, 1992, p. 161). Her
con-
tinuous scale was composed of items assessing several
dimensions
of identity (e.g., ethnic behaviors, affirmation, and belonging);
hence, it was a multidimensional scale (Helms, Henze, Sass, &
Mifsud, 2006), with a focus on cultural characteristics that are
assumed to be relevant to individuals across ethnic groups.
Although the structure of the MEIM has varied, its underlying
conceptual theme is conformance to ethnic culture rather than
exposure to racism. The conceptual, sampling, and measurement
issues specific to ethnic identity measures pertain to identifying
participants who might be reasonably expected to engage in the
cultural practices of the ethnic cultural kinship group in
question
and ensuring that ethnic identity measures assess relevant
culture-
related rather than race-inspired psychological construct(s).
Selection and Use of Appropriate REI Measures
Researchers often use one type of REI measure (e.g., ethnic
71. identity) but provide a conceptual rationale for the other type
(e.g.,
racial identity) without empirical justification for doing so. Em-
pirical support for the interchangeability of identity constructs
and
measures would include evidence that (a) exemplars of the two
classes of measures are similarly related to the same external
racial
or cultural criteria or (b) one type of measure explains its own
as
well as the other theory’s external criteria best. Support for the
distinctiveness of constructs would be lack of support for inter-
changeability and evidence that other identity measures from
the
same class relate to each other in a logically consistent matter.
Empirical Comparisons of the MEIM and BRIAS as
Measures of REI Constructs
Researchers do not seem to consider whether their cultural or
racial outcome measures are theoretically congruent with the
type
of REI measure that they have selected. Consequently, lack of
236 HELMS
support for their hypotheses is attributed to deficient REI
measures
rather than possible incongruence between the researchers’ con-
ceptualization and measurement of REI constructs in their
research
designs. It is difficult to find a single study in which both
classes
of REI measures and racial and cultural outcome measures were
72. used. Yet for the purpose of illustrating the type of study
necessary
to support interchangeable use of REI measures, perhaps it is
reasonable to think that scores on racial identity measures, such
as
the BRIAS, should be related to scores on explicit measures of
racial constructs (e.g., perceived individual racism, institutional
racism), whereas scores on ethnic identity measures, such as the
MEIM, should be related to scores on explicit measures of
cultural
constructs (e.g., acculturation, cultural values). Confirmation of
each of these propositions would be evidence of construct expli-
cation in that each measure would be assessing constructs
germane
to it.
Johnson's (2004) study provides sufficient psychometric summary statistics to permit illustration of the test for interchangeability of REI measures at least in part. His sample of Black college students (N = 167) responded to the MEIM and the RIAS–B (Parham & Helms, 1981), the earliest version of the BRIAS. Table 1 summarizes alpha coefficients (rxx; in the last column) for the REI measures, correlation coefficients between REI scores and perceived discrimination scores, and the same correlation coefficients corrected for attenuation due to measurement error (i.e., low alpha coefficients) attributable to MEIM or RIAS–B scores. In this example, the correction for attenuation may be interpreted as an estimate of the extent to which an REI and a discrimination subscale measure the same underlying theoretical construct when the effects of REI error are eliminated (Schmidt & Hunter, 1996, 1999).
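The single-measure correction applied in Table 1 can be expressed compactly. The following sketch is illustrative only; the helper name disattenuate is mine, and the numeric values are taken from Table 1 (the Encounter subscale's obtained correlation with cultural racism and its alpha), not from any reanalysis of Johnson's data:

```python
from math import sqrt

def disattenuate(r_xy: float, r_xx: float) -> float:
    """Spearman's correction for attenuation in one measure:
    r_xy' = r_xy / sqrt(r_xx), where r_xx is the reliability
    (here, Cronbach's alpha) of the predictor measure."""
    return r_xy / sqrt(r_xx)

# Table 1 example: Encounter vs. cultural racism,
# obtained r = .36 with alpha = .51.
corrected = disattenuate(0.36, 0.51)
print(round(corrected, 2))  # .50, matching the tabled value
```

The same two-line computation reproduces every "Corrected" entry in Table 1 from its "Obtained" neighbor and the subscale's alpha.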
The dependent measures in the study, assessed by the Index of Race-Related Stress (Utsey & Ponterotto, 1996), were three types of racism: (a) cultural, belief in the superiority of one's own culture (e.g., values, traditions); (b) institutional, discrimination in social policies and practices; and (c) individual, personally experienced acts of discrimination. The dependent constructs favor the RIAS–B rather than the MEIM, as is evident in the table. If alpha is the correct reliability estimate and the correlations were calculated on untransformed scores, then the correlations, corrected for attenuation attributable to REI measurement error, suggest even stronger relations between the racial-identity constructs and perceived discrimination than between the ethnic-identity constructs and perceived discrimination. The best correlation for MEIM scores is with cultural racism, which is theory consistent and suggests that a more fully developed cultural measure might have favored it.
Alternative Measures of the Same Construct(s)
The question of whether scores on different measures of the same theoretical constructs are related to scores on the original measures is a matter of seeking evidence of convergent validity, a type of construct validity. Researchers seemingly have not developed alternative measures of the same constructs postulated in Phinney's (1990, 1992) theoretical perspective, and only one set of researchers (Claney & Parker, 1989) has developed independent measures of the theoretical constructs of Helms's (1984, 1990) White racial identity model. Choney and Rowe (1994) conducted an evaluation of Claney and Parker's (1989) 15-item White Racial Consciousness Development Scale (WRCDS), a measure of Helms's (1984) stages of White racial identity development that predated her own measure, the WRIAS. It is worth examining the study for what it can reveal about good practices in empirical investigation of construct validity of scores on REI measures.
The expressed purpose of Choney and Rowe's (1994) study was to investigate "how the WRCDS compares with the RIAS–W [sic], the current instrument of choice for investigations of White racial identity" (p. 102). Yet their conclusion that "it seems reasonable to conclude that the WRCDS is not capable of adequately assessing the stages of White identity proposed by Helms (1984)" (Choney & Rowe, 1994, p. 104) suggests that an unspoken purpose may have been to examine scores on the two scales for evidence of convergent validity.
Table 1
Comparing Obtained and Corrected Correlations Between MEIM and RIAS–B Scores and Perceived Racism

                                 Perceived discrimination
                     Cultural             Institutional        Individual
Scale             Obtained Corrected   Obtained Corrected   Obtained Corrected   Alpha
MEIM                .16      .17        -.01     -.01         .08      .09        .87
Racial
  Preencounter     -.01     -.01         .39      .47         .07      .08        .69
  Encounter         .36      .50         .25      .35         .28      .39        .51
  Immersion         .33      .41         .29      .36         .22      .27        .65
  Internalization   .37      .43         .13      .15         .35      .47        .75

Note. From The Relation of Racial Identity, Ethnic Identity, and Perceived Racial Discrimination Among African Americans, Tables 1 (p. 18) and 6 (p. 33), by S. C. Johnson, 2004, unpublished doctoral dissertation, University of Houston, Texas. "Obtained" correlations are those reported by Johnson. "Corrected" correlations are estimates with measurement error removed. Only measurement error for the MEIM or racial identity subscales (i.e., alpha) was used to correct for attenuation of correlations attributable to measurement error. The correction for attenuation used was rxy' = rxy/rxx^.5, where rxy' equals the corrected correlation, rxy equals the obtained correlation, and rxx^.5 equals the square root of the reliability coefficient for the relevant MEIM or RIAS–B subscale. MEIM = Multigroup Ethnic Identity Measure; RIAS–B = Black Racial Identity Attitudes Scale.

In Table 2, Cronbach's alpha reliability coefficients for the WRCDS and WRIAS scales are summarized in the last two columns, and correlations between parallel subscales, adapted from Choney and Rowe (1994, p. 103), are shown in the second column. The authors did not report a full correlation matrix for the two measures, and so it is not possible to examine within-measure patterns of correlations. Table 2 also includes correlations corrected for measurement error in each of the REI measures using each scale's reported Cronbach's alpha coefficient.
The corrected correlations for these data suggest that the two measures assessed the parallel constructs of Reintegration, Autonomy, and Pseudo-Independence quite strongly and the constructs of Contact and Disintegration much more strongly than mere examination of the obtained correlations would suggest, thereby refuting Choney and Rowe's assertion that the WRCDS was "incapable" of assessing Helms's constructs. However, the fact that the corrected Reintegration correlation coefficient exceeded 1.00 suggests either that Choney and Rowe's original correlations were downwardly biased by sampling error or that alpha coefficients were not the appropriate reliability estimates for their data.
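Because the Table 2 corrections divide by the square roots of both scales' alphas, a low alpha in either measure can push a corrected coefficient past the logical ceiling of 1.00. The sketch below shows the mechanism with hypothetical numbers (they are not Choney and Rowe's actual statistics), under the double-correction formula rxy' = rxy/(rxx * ryy)^.5:

```python
from math import sqrt

def disattenuate_both(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correction for attenuation in both measures:
    r_xy' = r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / sqrt(r_xx * r_yy)

# Hypothetical illustration: a modest obtained correlation (.60)
# combined with low alphas (.45 and .70) yields a corrected value
# above 1.00, which signals sampling error or an inappropriate
# reliability estimate rather than a "true" correlation above 1.
print(round(disattenuate_both(0.60, 0.45, 0.70), 2))  # 1.07
```

An out-of-bounds result like this is diagnostic information, not a usable effect-size estimate, which is exactly the interpretive point made above about the corrected Reintegration coefficient.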