Journal of Personality and Social Psychology
1988, Vol, 54, No. 6, 1063-1070
Copyright 1988 by the American Psychological Association, Inc.
0022-3514/88/Í00.75
Th
is
do
cu
m
en
t i
s c
op
yr
ig
ht
ed
b
y
th
e
A
m
er
ic
an
P
sy
ch
ol
og
ic
al
A
ss
oc
ia
tio
n
or
o
ne
o
f i
ts
al
lie
d
pu
bl
ish
er
s.
Th
is
ar
tic
le
is
in
te
nd
ed
so
le
ly
fo
r t
he
p
er
so
na
l u
se
o
f t
he
in
di
vi
du
al
u
se
r a
nd
is
n
ot
to
b
e
di
ss
em
in
at
ed
b
ro
ad
ly
.
Development and Validation of Brief Measures of Positive
and Negative Affect: The PANAS Scales
David Watson and Lee Anna Clark
Southern Methodist University
Auke Tellegen
University of Minnesota
in recent studies of the structure of affect, positive and negative affect have consistently emerged as
two dominant and relatively independent dimensions. A number of mood scales have been created
to measure these factors; however, many existing measures are inadequate, showing low reliability
or poor convergent or discriminant validity. To fill the need for reliable and valid Positive Affect and
Negative Affect scales that are also brief and easy to administer, we developed two 10-item mood
scales that comprise the Positive and Negative Affect Schedule (PANAS). The scales are shown to be
highly internally consistent, largely uncorrelated, and stable at appropriate levels over a 2-month
time period. Normative data and factorial and external evidence of convergent and discriminant
validity for the scales are also presented.
Two dominant dimensions consistently emerge in studies of
affective structure, both in the United States and in a number
of other cultures. They appear as the first two factors in factor
analyses of self-rated mood and as the first two dimensions in
multidimensional scalings of facial expressions or mood terms
(Diener, Larsen, Levine, & Emmons, 1985; Russell, 1980,
1983; Stone, 1981; Watson, Clark, & Tellegen, 1984; Zevon &
Tellegen, 1982).
Watson and Tellegen (1985) have summarized the relevant
evidence and presented a basic, consensual two-factor model.
Whereas some investigators work with the unrotated dimen
sions (typically labeled pleasantness-unpleasantness and
arousal), the varimax-rotated factors—usually called Positive
Affect and Negative Affect—have been used more extensively in
the self-report mood literature; they are the focus of this article.
Although the terms Positive Affect and Negative Affect might
suggest that these two mood factors are opposites (that is,
strongly negatively correlated), they have in fact emerged as
highly distinctive dimensions that can be meaningfully repre
sented as orthogonal dimensions in factor analytic studies of
affect.
Briefly, Positive Affect (PA) reflects the extent to which a per
son feels enthusiastic, active, and alert. High PA is a state of
high energy, full concentration, and pleasurable engagement,
whereas low PA is chara ...
Measures of Central Tendency: Mean, Median and Mode
Journal of Personality and Social Psychology 1988, Vol, 54,
1. Journal of Personality and Social Psychology
1988, Vol, 54, No. 6, 1063-1070
Copyright 1988 by the American Psychological Association,
Inc.
0022-3514/88/Í00.75
Th
is
do
cu
m
en
t i
s c
op
yr
ig
ht
ed
b
y
th
e
5. di
ss
em
in
at
ed
b
ro
ad
ly
.
Development and Validation of Brief Measures of Positive
and Negative Affect: The PANAS Scales
David Watson and Lee Anna Clark
Southern Methodist University
Auke Tellegen
University of Minnesota
in recent studies of the structure of affect, positive and negative
affect have consistently emerged as
two dominant and relatively independent dimensions. A number
of mood scales have been created
to measure these factors; however, many existing measures are
inadequate, showing low reliability
or poor convergent or discriminant validity. To fill the need for
reliable and valid Positive Affect and
Negative Affect scales that are also brief and easy to
administer, we developed two 10-item mood
6. scales that comprise the Positive and Negative Affect Schedule
(PANAS). The scales are shown to be
highly internally consistent, largely uncorrelated, and stable at
appropriate levels over a 2-month
time period. Normative data and factorial and external evidence
of convergent and discriminant
validity for the scales are also presented.
Two dominant dimensions consistently emerge in studies of
affective structure, both in the United States and in a number
of other cultures. They appear as the first two factors in factor
analyses of self-rated mood and as the first two dimensions in
multidimensional scalings of facial expressions or mood terms
(Diener, Larsen, Levine, & Emmons, 1985; Russell, 1980,
1983; Stone, 1981; Watson, Clark, & Tellegen, 1984; Zevon &
Tellegen, 1982).
Watson and Tellegen (1985) have summarized the relevant
evidence and presented a basic, consensual two-factor model.
Whereas some investigators work with the unrotated dimen-
sions (typically labeled pleasantness-unpleasantness and
arousal), the varimax-rotated factors—usually called Positive
Affect and Negative Affect—have been used more extensively
in
the self-report mood literature; they are the focus of this article.
Although the terms Positive Affect and Negative Affect might
suggest that these two mood factors are opposites (that is,
strongly negatively correlated), they have in fact emerged as
highly distinctive dimensions that can be meaningfully repre-
sented as orthogonal dimensions in factor analytic studies of
affect.
Briefly, Positive Affect (PA) reflects the extent to which a per-
son feels enthusiastic, active, and alert. High PA is a state of
high energy, full concentration, and pleasurable engagement,
whereas low PA is characterized by sadness and lethargy. In
7. con-
trast, Negative Affect (NA) is a general dimension of subjective
distress and unpleasurable engagement that subsumes a variety
of aversive mood states, including anger, contempt, disgust,
guilt, fear, and nervousness, with low NA being a state of calm-
We wish to thank Lisa Binz, Sondra Brumbelow, Richard Cole,
Mary
Dieffenwierth, Robert Folger, Jay Leeka, Curt McIntyre, James
Pen-
nebaker, and Karen Schneider for their help in collecting the
data re-
ported in this article.
Correspondence should be addressed to David Watson,
Department
of Psychology, Southern Methodist University, Dallas, Texas,
75275.
ness and serenity. These two factors represent affective state di-
mensions, but Tellegen (1985; see also Watson & Clark, 1984)
has demonstrated that they are related to corresponding affec-
tive trait dimensions of positive and negative emotionality
(indi-
vidual differences in positive and negative emotional
reactivity).
Trait PA and NA roughly correspond to the dominant personal-
ity factors of extraversion and anxiety/neuroticism, respectively
(Tellegen, 1985; Watson & Clark, 1984). Drawing on these and
other findings, Tellegen has linked trait NA and PA,
respectively,
to psychobiological and psychodynamic constructs of sensitiv-
ity to signals of reward and punishment. He has also suggested
that low PA and high NA (both state and trait) are major distin-
guishing features of depression and anxiety, respectively (Tel-
legen, 1985; see also Hall, 1977).
8. Numerous PA and NA scales have been developed and stud-
ied in a variety of research areas. Generally speaking, the find-
ings from these studies indicate that the two mood factors relate
to different classes of variables. NA—but not PA—is related to
self-reported stress and (poor) coping (Clark & Watson, 1986;
Kanner, Coyne, Schaefer, & Lazarus, 1981; Wills, 1986), health
complaints (Beiser, 1974; Bradburn, 1969; Tessier & Mechanic,
1978; Watson & Pennebaker, in press), and frequency of un-
pleasant events (Stone, 1981; Warr, Barter, & Brownbridge,
1983). In contrast, PA—but not NA—is related to social activ-
ity and satisfaction and to the frequency of pleasant events
(Beiser, 1974; Bradburn, 1969; Clark & Watson, 1986, 1988;
Watson, 1988).
Anomalous and inconsistent findings have also been re-
ported, however. For example, whereas most studies have found
these NA and PA scales to have low or nonsignificant corre-
lations with one another (e.g., Clark & Watson, 1986, 1988;
Harding, 1982; Moriwaki, 1974; Warr, 1978; Wills, 1986), oth-
ers have found them to be substantially related (Brenner, 1975;
Diener & Emmons, 1984; Kammann, Christie, Irwin, &
Dixon, 1979). There are many possible explanations for such
inconsistencies (e.g., see Diener & Emmons, 1984), but one that
must be considered concerns the various scales themselves. It
1063
Th
is
do
cu
13. 1064 D. WATSON, L. CLARK,
may be, for example, that some scales are simply better, purer
measures of the underlying factors than are others. Watson (in
press) reported evidence supporting this idea. He found that
some scale pairs (such as those used by Diener and his
associates
in a number of studies; e.g., Diener & Emmons, 1984; Diener &
Iran-Nejad, 1986; Diener et al., 1985) yield consistently higher
NA-PA correlations than do others (such as our own scales, to
be described shortly).
More generally, one must question the reliability and validity
of many of these measures. Some mood scales have been devel-
oped through factor analysis (e.g., Stone, 1981), but others have
been constructed on a purely ad hoc basis with no supporting
reliability or validity data (e.g., McAdams & Constantian,
1983). Watson (in press) analyzed the psychometric properties
of several popular measures and found many of them to be
wanting, at least for use in student populations. For example,
Bradburn’s (1969) widely used NA and PA scales were unreli
able (coefficient « = .52 for NA, .54 for PA) and only moder-
ately related to other measures of the same factor (for NA, the
convergent correlations ranged from .39 to .52; for PA, they
ranged from .41 to .53). The short PA and NA scales used by
Stone and his colleagues (Hedges, Jandorf, & Stone, 1985;
Stone, 1987; Stone, Hedges, Neale, & Satin, 1985) were also
unreliable (in two samples, the NA scale had coefficient as of
.48 and .52, whereas the PA scale had corresponding values of
.64 and .70).
Clearly there is a need for reliable and valid PA and NA scales
that are also brief and easy to administer. In this article we de-
scribe the development of such scales, the 10-item NA and PA
scales that comprise the Positive and Negative Affect Schedule
(PANAS), and present reliability and validity evidence to sup-
14. port their use.
Development of the PANAS Scales
Much of our previous mood research has been concerned
with identifying these dominant dimensions of affect and clari-
fying their nature (Clark & Watson, 1986, 1988; Tellegen, 1985;
Watson, in press; Watson & Clark, 1984; Watson et al., 1984;
Watson & Tellegen, 1985; Zevon & Tellegen, 1982). To have a
broad and representative sample of mood descriptors, we have
used questionnaires that contained a large number (57-65) of
mood terms. Once the basic NA and PA factors were clearly
identified, however, we wanted to measure them more simply
and economically. We therefore turned our attention to the de-
velopment of brief PA and NA scales.
Our greatest concern was to select terms that were relatively
pure markers of either PA or NA; that is, terms that had a sub-
stantial loading on one factor but a near-zero loading on the
other. As a starting point, we used the 60 terms included in the
factor analyses reported by Zevon and Tellegen (1982). This
sample of descriptors was constructed by selecting three terms
from each of 20 content categories; for example, the terms
guilty, ashamed, and blameworthy comprise the guilty category
(see Zevon & Tellegen, 1982, Table 1 ). The categories were
iden-
tified through a principal-components analysis of content sort-
ings of a large sample of descriptors and provide a comprehen-
sive sample of the affective lexicon.
From this list we selected those terms that had an average
loading of .40 or greater on the relevant factor across both the
AND A. TELLEGEN
R- and P-analyses reported in Zevon & Tellegen ( 1982).
15. Twenty
PA markers and 30 NA markers met this initial criterion. How-
ever, as noted previously, we were also concerned that the terms
not have strong secondary loadings on the other factor. We
therefore specified that a term could not have a secondary load-
ing of |.25| or greater in either analysis. This reduced the pool
of candidate descriptors to 12 for PA and 25 for NA.
Preliminary reliability analyses convinced us that 10 terms
were sufficient for the PANAS PA scale; we therefore dropped
2
terms (delightedand healthy) that had relatively high secondary
loadings on NA. This yielded the final list of 10 descriptors for
the PA scale: attentive, interested, alert, excited, enthusiastic,
inspired, proud, determined, strong and active.
The 25 NA candidate terms included all 3 terms from seven
of the content categories (distressed, angry, contempt, revul-
sion, fearful, guilty, and jittery) and 2 from each of two others
(rejected and angry at self). Because we wanted to tap a broad
range of content, we constructed a preliminary 14-item scale
that included 2 terms from each of the seven complete triads.
We found, however, that the contempt and revulsion terms did
not significantly enhance the reliability and validity of the
scale.
Moreover, these terms were less salient to our subjects and were
occasionally left unanswered. We therefore settled on a final
10-
item version that consisted of 2 terms from each of the other
five
triads: distressed, upset (distressed); hostile, irritable (angry);
scared, afraid (fearful); ashamed, guilty (guilty); and nervous,
jittery (jittery). The final version of PANAS is given in the Ap-
pendix.
Reliability and Validity of the PANAS Scales
16. Subjects and Measures
The basic psychometric data were gathered primarily from
undergraduates enrolled in various psychology courses at
Southern Methodist University (SMU), a private southwestern
university. The students participated in return for extra course
credit. In addition, groups of SMU employees completed ques-
tionnaires asking how they felt “during the past few weeks”
{n = 164) and “during the past few days” (n = 50). A sample of
53 adults not affiliated with SMU also filled out a mood form
with “today” time instructions. Preliminary analyses revealed
no systematic differences between student and nonstudent re-
sponses, and they have been combined in all analyses.
Neverthe-
less, because most of our data were collected from college stu-
dents, it is important to establish that the PANAS scales also
work reasonably well in adult and clinical samples. We briefly
address this issue in a later section.
The mood questionnaire consisted of a single раде with the
60 Zevon and Tellegen (1982) descriptors arrayed in various
orders. The subjects were asked to rate on a 5-point scale the
extent to which they had experienced each mood state during
a specified time frame. The points of the scale were labeled
very
slightly or not at all, a little, moderately, quite a bit, and very
much, respectively. The PANAS terms were randomly distrib-
uted throughout the questionnaire. It is important to note that
we have since used the 20 PANAS descriptors without these ad-
ditional terms and obtained essentially identical results (Clark
& Watson, 1986; Watson, 1988).
We obtained ratings with seven different temporal instruc-
17. DEVELOPMENT AND VALIDATION OF THE PANAS
SCALES 1065
Th
is
do
cu
m
en
t i
s c
op
yr
ig
ht
ed
b
y
th
e
A
m
er
ic
an
21. ed
b
ro
ad
ly
.
Table 1
Positive and Negative Affect Schedule (PANAS) Scale Means
and Standard Deviations for Each Rated Time Frame
Time
instructions n
PANAS PA
Scale
PANAS NA
Scale
Μ SD Μ SD
Moment 660 29.7 7.9 14.8 5.4
Today 657 29.1 8.3 16.3 6.4
Past few days 1,002 33.3 7.2 17.4 6.2
Past few weeks 586 32.0 7.0 19.5 7.0
Year 649 36.2 6.3 22.1 6.4
General 663 35.0 6.4 18.1 5.9
Note. PA = Positive Affect. NA = Negative Affect.
tions. Subjects were asked to rate how they felt (a) “right now
(that is, at the present moment)” {moment instructions); (b)
22. “today” (today); (c) “during the past few days” (past few days);
(d) “during the past week” (week); (e) “during the past few
weeks” (past few weeks); (f) “during the past year” (year); and
(g) “in general, that is, on the average” (general). For six of
these
time frames, we collected data on large samples to be used for
normative, internal consistency, and factor analyses. The ns are
660 (moment), 657 (today), 1,002 (past few days), 586 (past few
weeks), 649 (year), and 663 (general). These samples are
largely
but not completely independent: Some subjects completed
mood forms involving two or more different temporal instruc-
tions; such multiple ratings were always spaced at least 1 week
apart. In addition, a subset of these subjects (n - 101) com-
pleted ratings on all seven time frames on two different occa-
sions, providing retest data.
Normative and Reliability Data
Basic scale data. Table 1 presents basic descriptive data on
the PANAS PA and NA scales for the various time instructions.
Given the large sample sizes, these provide reasonably good
col-
lege student norms. In our data, we have not found any large or
consistent sex differences, so the data are collapsed across sex.
Nevertheless, it seems advisable to test for sex differences in
any
new (especially nonstudent) sample.
Inspecting Table 1, one sees that subjects report more PA
than NA, regardless of the time frame. Moreover, mean scores
on both scales tend to increase as the rated time frame length-
ens. This pattern is expectable: As the rated time period in-
creases, the probability that a subject will have experienced a
significant amount of a given affect also increases.
23. The PANAS scale intercorrelations and internal consistency
reliabilities (Cronbach’s coefficient a) are reported in Table 2.
The alpha reliabilities are all acceptably high, ranging from .86
to .90 for PA and from .84 to .87 for NA. The reliability of the
scales is clearly unaffected by the time instructions used.
The correlation between the NA and PA scales is invariably
low, ranging from —.12 to —.23; thus, the two scales share ap-
proximately 1 % to 5% of their variance. These discriminant
val-
ues indicate quasi-independence, an attractive feature for many
purposes, and are substantially lower than those of many other
short PA and NA scales (see Watson, in press). Interestingly,
our PA-NA correlation was unaffected by the rated time frame,
whereas Diener and Emmons (1984) found that the correlation
between their PA and NA scales decreased as the rated time
frame lengthened. However, this discrepancy is beyond the
scope of our article; see Watson (in press) for a detailed discus-
sion of the effects of different temporal instructions on various
mood scales.
Test-relest reliability. As noted previously, 101 SMU under-
graduates filled out PANAS ratings for each of the seven time
frames on two different occasions. The mood ratings were col-
lected at weekly intervals. The first set of ratings was collected
during Weeks 1-7 of the fall 1986 semester in the following or-
der: year, past few days, today, past few weeks, general,
moment,
and week. Then, following a 1-week break, the PANAS scales
were readministered during Weeks 9-15 in the same sequence.
Thus, each scale was retested after an 8-week interval.
These reliability data are shown in Table 3. The NA and PA
stability values were first compared at each rated time frame
and no significant differences were found (p > .05, 2-tailed t
24. test). Multiple comparisons were then made across the time
frames for each affect separately (p < .002, Bonferroni cor-
rected for 21 comparisons). Not surprisingly, the retest stability
tends to increase as the rated time frame lengthens. Ratings of
longer time periods, such as how one has felt during the past
few weeks or the past year, are implicit aggregations. In a
sense,
subjects average their responses over a longer time frame and
hence over more occasions. Thus, these data replicate the fre-
quent finding that stability rises with increasing temporal ag-
gregation (e.g., Diener & Larsen, 1984; Epstein, 1979). The sta-
bility coefficients of the general ratings are high enough to sug-
gest that they may in fact be used as trait measures of affect.
It is also noteworthy that the PANAS scales exhibit a signifi-
cant level of stability in every time frame, even in the moment
ratings. These results are also consistent with earlier findings
(e.g., Watson & Clark, 1984, Table 8) and reflect the strong dis-
positional component of affect. That is, even momentary
moods are, to a certain extent, reflections of one’s general
affective level (Costa & McCrae, 1980; Watson & Clark, 1984).
Generalizability to nonstudent samples. Our largest nonstu-
dent sample consisted of 164 SMU employees who rated how
they had felt during the past few weeks. A separate analysis of
this sample yielded results comparable with the values listed in
Alpha reliabilities
Table 2
Internal Consistency Reliabilities (Coefficient A Ipha)
and Scale Intercorrelations
Time
instructions n
25. PANAS PA
scale
PANAS NA
scale
PA-NA
intercor-
relation
Moment 660 .89 .85 -.15
Today 657 .90 .87 -.12
Past few days 1,002 .88 .85 -.22
Past few weeks 586 .87 .87 -.22
Year 649 .86 .84 -.23
General 663 .88 .87 -.17
Note. PANAS = Positive and Negative Affect Schedule. PA =
Positive
Affect. NA = Negative Affect.
1066 D. WATSON, L. CLARK, AND A. TELLEGEN
Tesi-Retest Reiiabilities of the Positive and Negative
Table 3
Affect Schedule (PANAS) Scales (8- Week Retest Interval)
Time
instructions
PANAS PA
scale
26. PANAS NA
scale
Moment .54Bb .45”
Today ,47b .39b
Past few days .48” .42b
Past week .47” .47”
Past few weeks ,58“b ,48b
Year ,63“b .60“b
General .68“ .71“
Note. n = 101. Coefficients not sharing the same superscript are
differ-
ent at p < .05 (two-tailed, Bonferroni corrected for multiple
compari-
sons). PA = Positive Affect. NA = Negative Affect.
Significance tests are
computed separately for each scale. See text for further details.
Th
is
do
cu
m
en
t i
s c
30. se
r a
nd
is
n
ot
to
b
e
di
ss
em
in
at
ed
b
ro
ad
ly
.
Table 2. Specifically, the alpha reliabilities of the PANAS PA
and NA scales were .86 and .87, respectively, and the
correlation
between the scales was —.09. Given these data, we believe that
the PANAS scales will provide useful information in adult sam-
ples as well, although further data are desirable to establish this
31. fully.
We have also collected data on a small (n = 61) psychiatric
inpatient sample using the general instructions. Again, the PA-
NAS scales were reliable (for PA, a = .85; for NA, a = .91) and
only moderately intercorrelated with one another (r = -.27).
Given the small sample size, these data cannot be considered
definitive, but they are encouraging and suggest that the
PANAS
scales retain their reliability and quasi-independence in clinical
samples. In addition, all but four of the patients retook the mea-
sure after a 1-week interval, and the resulting stability analyses
yielded high test-retest reliabilities: .81 for NA and .79 for PA.
Finally, consistent with previous studies (Watson & Clark,
1984), we found significant group differences for NA, with the
patients considerably higher (Μ = 26.6) and more variable
(SD = 9.2) than the normative group (M = 18.1, SD = 5.9; see
Table 1). The corresponding differences for PA (patient group
Л/ = 32.5, SD = 7.5; normative group Μ = 35.0, SD = 6.4)
were also statistically significant because of the very large n of
the normative sample, but it would be premature to accept a
mean scale difference of 2.5 points as clinically meaningful
without further study.
Factorial Validity
Scale validity. An important step in evaluating the PANAS
scales is to demonstrate that they adequately capture the under-
lying mood factors. To do this, we subjected ratings on the 60
Zevon and Tellegen (1982) mood descriptors in each of the six
large data sets to a principal factor analysis with squared multi-
ple correlations as the communality estimates. Two dominant
factors emerged in each solution. Together, they accounted for
roughly two thirds of the common variance, ranging from
62.8% in the moment solution to 68.7% in the general ratings.
The first two factors in each solution were then rotated to or-
32. thogonal simple structure according to the varimax criterion.
Each of the six solutions generated two sets of factor scoring
weights that can be used to compute regression estimates of the
underlying PA and NA factors in those data. Within each data
set, we then correlated these estimated factor scores with the
PANAS PA and NA scales. The results, shown in Table 4, dem-
onstrate the expected convergent/discriminant pattern: Both
PANAS scales are very highly correlated with their correspond-
ing regression-based factor scores in each solution, with conver-
gent correlations ranging from .89 to .95, whereas the discrimi-
nant correlations are quite low, ranging from -.02 to -. 18.
Item validity. It is also important to demonstrate the factorial
validity of the individual PANAS items. To do this, we factored
subjects’ ratings on the 20 PANAS descriptors in each of the
six data sets; as before, we used a principal factor analysi s with
squared multiple correlations as the initial communality esti-
mates. Because the PANAS terms were selected to be relatively
pure factor markers, it is not surprising that two dimensions
accounted for virtually all of the common variance in these so-
lutions (ranging from 87.4% in the moment data to 96.1 % in
the general ratings).
Median varimax loadings for the PANAS terms on these two
factors are presented in Table 5. All of the descriptors have
strong primary loadings (.50 and above) on the appropriate fac-
tor, and the secondary loadings are all acceptably low. Thus, all
of the PANAS items are good markers of their corresponding
factors.
Rating scale effects. The data shown in Tables 1 through 5
are all based on the same 5-point rating scale. Because the sub-
jects were instructed to rate the extent to which they experi-
enced each mood state, this may be termed an extent format. It
33. seems reasonable to ask, however, whether different response
formats might yield different results. Warr et al. (1983) have
presented data indicating that the correlation between PA and
NA scales varies according to the response scale used. Specifi-
cally, their PA and NA scales were highly correlated when they
used a frequency-type format in which subjects rated the pro-
portion of time they had experienced each mood state during a
specified time period.
To test the effect of rating format, we collected ratings on 54
mood terms in two different student samples, both using past
few weeks time instructions. In the first sample, 413 subjects
rated their mood using the usual extent rating format. In the
second, 338 students rated themselves on a 4-point frequency
Table 4
Correlations Between the Positive and Negative Affect Schedule
(PANAS) Scales and Scores of the First Two
Varimax Factors in Each Sample
Time
instructions n
PANAS PA scale
correlations
PANAS NA scale
correlations
Factor 1 Factor 2 Factor 1 Factor 2
Moment 660 -.02 .95 .91 -.15
Today 657 -.02 .95 .93 -.11
Past few days 1,002 -.15 .92 .93 -.10
Past few weeks 586 -.10 .92 .92 -.18
Year 649 -.17 .89 .93 -.09
34. General 663 -.08 .94 .93 -.12
Note. Factor analyses are based on the set of 60 mood terms
reported
in Zevon & Tellegen ( 1982). PA = Positive Affect. NA =
Negative Affect.
DEVELOPMENT AND VALIDATION OF THE PANAS
SCALES 1067
Th
is
do
cu
m
en
t i
s c
op
yr
ig
ht
ed
b
y
th
40. Hostile -.07 .52
format (the options were little or none of the time, some of the
time, a good part of the time, and most ofthe time).
In addition to the PANAS terms, the mood descriptors used
in these samples allowed us to compare the factorial validity of
our scales with those of other investigators. In both samples, we
were able to measure the brief NA and PA scales developed by
Diener and Emmons (1984, Studies 3 through 5; see also Diener
& Iran-Nejad, 1986; Diener & Larsen, 1984; Diener et al.,
1985), Stone and his associates (Hedges et al., 1985; Stone,
1987; Stone et al., 1985), and McAdams and Constantian
(1983). Further, in the extent sample, 301 subjects rated them-
selves on Bradbum’s (1969) widely used NA and PA scales;
these were replaced by Warr et al.’s (1983) revised measures in
the frequency sample.
The ratings in each sample were subjected to separate princi-
pal factor analyses with squared multiple correlations in the di-
agonal (these analyses are reported in detail in Watson, in
press). Two large factors emerged in each solution, accounti ng
for 75.4% and 73.3% of the common variance in the extent and
frequency data, respectively. The first two factors in each solu-
tion were therefore rotated using varimax.
41. Table 6 presents correlations between the various mood
scales and regression estimates of these factors. Considering
first
the PANAS scales, Table 6 demonstrates that they have excel-
lent factorial validity even when a frequency response format is
used: In both samples the convergent correlations are above .90
and the discriminant coefficients are all low. Thus, while we
prefer an extent-type rating scale, other response formats can
be used without diminishing the factorial validity of the scales.
Table 6 also demonstrates that the PANAS scales compare
favorably with other brief affect measures. With the exception
of the Bradburn scales, all of the mood scales have good
conver-
gent correlations (i.e., .76 to .92) with the appropriate factor,
but none are higher than the corresponding values for the PA-
NAS scales. Thus, in terms of convergent validity, most of these
scales are reasonable approximations of the underlying factors,
although some are clearly more precise representations than
others. The discriminant correlations vary widely, however, es-
pecially in the frequency-format data, where many of the co-
efficients exceed —.30; across both samples, only the PANAS
scales have discriminant correlations consistently under -.20.
42. Overall, the PANAS scales offer the clearest convergent/dis -
criminant pattern of any pair.
In summary, the PANAS scales provide reliable, precise, and
largely independent measures of Positive Affect and Negative
Affect, regardless of the subject population studied or the time
frame and response format used.
External Validity
Correlations with measures of distress and psychopathology.
It is also interesting to examine correlations between the PA-
NAS scales and measures of related constructs, such as state
anxiety, depression, and general psychological distress (for an
extended discussion of how Positive and Negative Affect relate
to anxiety, depression, and general psychological dysfunction,
see Tellegen, 1985; Watson & Clark, 1984). We have used the
PANAS scales in conjunction with a number of other com-
monly used measures and report here on three of them: the
Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rick-
Table 6
Correlations Between Various Positive Affect (PA) and
Negative
Affect (NA) Mood Scales and the Factor Scores From the
51. Schedule (PANAS) Scales and the Hopkins Symptom
Checklist (HSCL), Beck Depression Inventory (BDI),
and STAI State Anxiety Scale (A-State)
Measure and PANAS
time instructions n
Correlations with
PANAS NA PANAS PA
HSCL
Past few weeks 398 .74 -.19
Today® 53 .65 -.29
BDI
Past few days 880 .56 -.35
Past few weeks 208 .58 -.36
A-State
Past few weeks 203 .51 -.35
Note. Unless otherwise noted, subjects are college students. PA
= Posi-
tive Affect. NA = Negative Affect.
52. • Normal adult sample.
els, Uhlenhuth, & Covi, 1974), the Beck Depression Inventory
(BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), and
the State-Trait Anxiety Inventory State Anxiety Scale (A-State;
Spielberger, Gorsuch, & Lushene, 1970).
The HSCL (Derogatis et al., 1974) is a measure of general
distress and dysfunction. Subjects rate the extent to which they
have experienced each of 58 symptoms or problems during the
past week. The HSCL and a subsequent 90-item version, the
SCL-90 (Derogatis, Rickels, & Rock, 1976), have been used fre-
quently as measures of clinical symptomatology in both normal
and clinical populations (e.g., Gotlib, 1984; Kanner et al., 1981;
Rickels, Lipman, Garcia, & Fisher, 1972). Although the HSCL
and SCL-90 each contain several subscales, analyses have re-
peatedly shown that both instruments reflect a large general dis-
tress factor (e.g., Dinning & Evans, 1977; Gotlib, 1984).
The BDI (Beck et al., 1961 ) is a 2 1-item self-report measure
of depressive symptomatology. Subjects rate whether they have
experienced each symptom during the past few days. The BDI
is commonly used to assess mild to moderate levels of depres-
sion, and studies have generally supported its validity in this
context (e.g., Bumberry, Oliver, & McClure, 1978; Coyne &
53. Gotlib, 1983; Hammen, 1980).
The A-State (Spielberger et al., 1970) is a 20-item scale that
asks subjects to rate their current affect. Researchers have used
the A-State to study subjects’ responses to a variety of stressful
and aversive events, including surgery, shock, pain, failure,
criti-
cism, interviews, and exams (see Watson & Clark, 1984).
Correlations between the PANAS scales and the HSCL, BDI,
and A-State are presented in Table 7. Looking first at the
HSCL,
Table 7 indicates that it is largely a measure of NA, although it
also shows modest (negative) correlations with PA. In fact, the
correlations between the HSCL and the PANAS NA scale are
high enough to suggest that the two measures are roughly inter-
changeable, at least in normal populations. Insofar as this is the
case, the PANAS NA scale seems to offer a shorter (10 vs. 58
items), simpler, and conceptually more straightforward mea-
sure of general psychological distress.
The BDI is also substantially correlated with the PANAS NA
scale, but the coefficients are not so high as to indicate inter-
changeability. In addition, the BDI has significant (negative)
54. correlations with PA, consistent with previous findings that de-
pressive symptomatology is affectively complex (Tellegen,
1985;
Watson & Clark, 1984; Watson, Clark, & Carey, in press). That
is, it involves the lack of pleasurable experiences (low PA) in
addition to anger, guilt, apprehension, and general psychologi-
cal distress (high NA). The PANAS scales offer the advantage
of providing reliable and independent measures of these two
affective components. Researchers interested in studying de-
pressed affect might therefore want to use the PANAS scales as
a complement to more traditional depression measures.
The A-State is also a mixture of high NA and low PA, repli-
cating the results of Watson and Clark (1984, Table 4) using
NA and PA factor scores. An inspection of the A-State’s items
indicates why this is the case. Many of the items tap mood
states
traditionally associated with anxiety (e.g., feeling tense, upset,
worried, anxious, nervous, jittery, and highstrung) or its
absence
(e.g., feeling calm, relaxed, and content), and such items will
produce a substantial correlation with the PANAS NA scale.
Other (reverse-keyed) items, however, reflect pleasant or high
PA states (e.g., feeling joyful, pleasant, self-confident, and
rested) that account for the A-State’s significant correlation
55. with PA. The A-State has repeatedly demonstrated its useful-
ness as a sensitive measure of unpleasant mood states; but, as
with the BDI, the PANAS scales offer the advantage of
assessing
these two affective components separately.
Intraindividual analyses of nontest correlates.1 When used
with short-term time frame instructions (i.e., moment or to-
day), the PANAS scales are sensitive to changing internal or ex-
ternal circumstances. We have used the PANAS scales in three
large scale within-subjects investigations that illustrate their
usefulness in studying qualitatively distinctive intraindividual
mood fluctuations. In the first (Watson, 1988), 80 subjects com-
pleted a PANAS questionnaire each evening for 5-7 weeks, us-
ing today time instructions. At each assessment the subjects
also
estimated their social activity (number of hours spent with
friends that day) and rated the level of stress they had experi-
enced. A total of 3,554 measurements were collected (Μ = 44.4
per subject). As hypothesized, within-subject variations in per-
ceived stress were strongly correlated with fluctuati ons in NA
but not in PA. Also, as expected, social activity was more
highly
related to PA than to NA.
56. 1 The data reported in Watson ( 1988) and Clark and Watson (
1986)
are based on PA and NA factor scores. We have reanalyzed
these data
using the PANAS scales and have obtained virtually identical
results.
The other two studies were primarily concerned with diurnal
variation in mood. Clark and Watson (1986) had 123 subjects
fill out a PANAS form every 3 waking hours for a week using
moment time instructions. Subjects also rated their current
stress and noted whether they had been interacting socially
within the past hour. A total of5,476 assessments were collected
(Μ = 44.9 per subject). Leeka (1987) replicated this design with
an additional 73 subjects (a total of 3,206 measurements; Μ =
43.9 per subject). In both studies, perceived stress was again
consistently correlated with intraindividual fluctuations in NA
but not in PA. And, as before, social interaction was more
strongly related to PA than to NA.
PA also showed a strong time-of-day effect in both studies.
Specifically, PA scores tended to rise throughout the morning,
63. ly
.
remain steady during the rest of the day, and then decline again
during the evening. However, NA did not exhibit a significant
diurnal pattern in either sample.
Conclusion
We have presented information regarding the development of
brief scales to measure the two primary dimensions of mood-
Positive and Negative Affect. Whereas existing scales are
unreli-
able, have poor convergent or discriminant properties, or are
cumbersome in length, these 10-item scales are internally con-
sistent and have excellent convergent and discriminant corre-
lations with lengthier measures of the underlying mood factors.
They also demonstrate appropriate stability over a 2-month
time period. When used with short-term instructions (e.g., right
now or today), they are sensitive to fluctuations in mood,
whereas they exhibit traitlike stability when longer-term in-
structions are used (e.g., past year or general). The scales corre-
late at predicted levels with measures of related constructs and
show the same pattern of relations with external variables that
have been seen in other studies. For example, the PA scale (but
64. not the NA scale) is related to social activity and shows signifi-
cant diurnal variation, whereas the NA scale (but not the PA
scale) is significantly related to perceived stress and shows no
circadian pattern.
Thus, we offer the Positive and Negative Affect Schedule as
a reliable, valid, and efficient means for measuring these two
important dimensions of mood.
References
Beck, A. T., Ward, C. H., Mendelson, Μ., Mock, J., & Erbaugh,
J.
(1961). An inventory for measuring depression. Archives of
General
Psychiatry, 4, 561-571.
Beiser, Μ. (1974). Components and correlates of mental well -
being.
Journal of Health and Social Behavior, 15, 320-327.
Bradburn, N. Μ. (1969). The structure of psychological well -
being. Chi-
cago: Aldine.
65. Brenner, B. ( 1975). Enjoyment as a preventive of depressive
affect. Jour-
nal of Community Psychology, 3, 346-357.
Bumberry, W, Oliver, J. Μ., & McClure, J. (1978). Validation
of the
Beck Depression Inventory in a university population using
psychiat-
ric estimate as a criterion. Journal of Consulting and Clinical
Psychol-
ogy, 46, 150-155.
Clark, L. A„ & Watson, D. (1986, August). Diurnal variation in
mood:
Interaction with daily events and personality. Paper presented at
the
meeting of the American Psychological Association,
Washington,
DC.
Clark, L. A., & Watson, D. (1988). Mood and the mundane:
Relations
between daily life events and self-reported mood. Journal of
Personal-
ity and Social Psychology, 54, 296-308.
66. Costa, P. T., Jr., & McCrae, R. R. (1980). Influence of
extraversion and
neuroticism on subjective well-being: Happy and unhappy
people.
Journal of Personality and Social Psychology, 38, 668-678.
Coyne, J. C„ & Gotlib, I. H, ( 1983). The role of cognition in
depression:
A critical appraisal. Psychological Bulletin, 94, 472-505.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H.,
& Covi,
L. (1974). The Hopkins Symptom Checklist (HSCL): A self-
report
symptom inventory. Behavioral Science, 19, 1-15.
Derogatis, L. R., Rickels, K., & Rock, A. (1976), The SCL-90
and the
MMPI: A step in the validation of a new self-report scale.
British
Journal of Psychiatry, 128, 280-289.
Dienet; E., & Emmons, R. A. ( 1984). The independence of
positive and
67. negative affect. Journal of Personality and Social Psychology,
47,
1105-1117.
Diener, E., & Iran-Nejad, A. ( 1986). The relationship in
experience be-
tween various types of affect. Journal of Personality and Social
Psy-
chology, 50, 1031-1038.
Diener, E., & Larsen, R. J. (1984). Temporal stability and cross -
situa-
tional consistency of affective, behavioral, and cognitive
responses.
Journal of Personality and Social Psychology, 47, 871-883.
Diener, E., Larsen, R. J., Levine, S., & Emmons, R. A. (1985).
Intensity
and frequency: Dimensions underlying positive and negative
affect.
Journal of Personality and Social Psychology, 48, 1253-1265.
Dinning, W. D., & Evans, R. G. (1977). Discriminant and
convergent
68. validity of the SCL-90 in psychiatric inpatients. Journal of
Personal-
ity Assessment. 41, 304-310.
Epstein, S. (1979). The stability of behavior: I. On predicting
most of
the people much of the time. Journal of Personality and Social
Psy-
chology, 37. 1097-1126.
Gotlib, I. H. ( 1984). Depression and general psychopathology
in univer-
sity students. Journal of Abnormal Psychology, 93, 19-30.
Hall, C. A. (1977). Differential relationships of pl easure and
distress
with depression and anxiety over a past, present, and future
time
framework. Unpublished doctoral dissertation, University of
Minne-
sota, Minneapolis.
Hammen, C. L. (1980). Depression in college students: Beyond
the
69. Beck Depression Inventory. Journal of Consulting and Clinical
Psy-
chology, 48, 126-128.
Harding, S. D. (1982). Psychological well-being in Great
Britain: An
evaluation of the Bradburn Affect Balance Scale. Personality
and In-
dividual Differences, 3, 167-175.
Hedges, S. Μ., Jandorf, L., & Stone, A. A. (1985). Meaning of
daily
mood assessments. Journal of Personality and Social
Psychology, 48,
428-434.
Kammann, R., Christie, D., Irwin, R-, & Dixon, G. (1979).
Properties
of an inventory to measure happiness (and psychological
health). New
Zealand Psychologist, 8, 1-9.
Kanner, A. D., Coyne, J. C., Schaefer, C., & Lazarus, R. S.
(1981). Com-
parison of two modes of stress measurement: Daily hassles and
70. uplifts
versus major life events. Journal of Behavioral Medicine, 4, 1-
39.
Leeka, J. ( 1987). Patterns of diurnal variation in mood in
depressed and
nondepressed college students. Unpublished master’s thesis,
Southern
Methodist University, Dallas, TX.
McAdams, D. P., & Constantian, C. A. (1983). Intimacy and
affiliation
motives in daily living: An experience sampling analysis.
Journal of
Personality and Social Psychology, 45, 851-861.
Moriwaki, S. Y. (1974). The Affect Balance Scale: A validity
study with
aged samples. Journal of Gerontology, 29. 73-78.
Rickels, K,, Lipman, R. S., Garcia, C. R., &Fisher, E. (1972).
Evaluat-
ing clinical improvement in anxious outpatients. American
Journal
of Psychiatry, 128, 119-123.
71. Russell, J. A. (1980). A circumplex model of affect. Journal of
Personal-
ity and Social Psychology, 39, 1161-1178.
Russell, J. A. (1983). Pancultural aspects of the human
conceptual orga-
nization of emotions. Journal of Personality and Social
Psychology,
45. 1281-1288.
Spielberger, C. D., Gorsuch, R. L„ & Lushene, R. E. (1970).
Manual
for the Slate-Trait Anxiety Inventory. Palo Alto, CA:
Consulting Psy-
chologists Press.
Stone, A. A. ( 1981 ). The association between perceptions of
daily expe-
riences and self- and spouse-rated mood. Journal of Research in
Per-
sonality. 15, 510-522.
Stone, A. A. (1987). Event content in a daily survey is
differentially
72. associated with concurrent mood. Journal of Personality and
Social
Psychology, 52, 56-58.
Stone, A. A., Hedges, S. Μ., Neale, J. Μ., & Satin, M. S.
(1985). Pro-
spective and cross-sectional mood reports offer no evidence of a
1070 D. WATSON, L. CLARK, AND A. TELLEGEN
Th
is
do
cu
m
en
t i
s c
op
78. in
at
ed
b
ro
ad
ly
.
“Blue Monday” phenomenon. Journal of Personality and Social
Psy-
chology, 49, 129-134.
Tellegen, A. (1985). Structures of mood and personality and
their rele-
vance to assessing anxiety, with an emphasis on self-report. In
A. H.
Tuma & J. D. Maser (Eds.), Anxiety and the anxiety disorders
(pp.
681-706). Hillsdale, NJ: Erlbaum.
Tessier, R., & Mechanic, D. ( 1978). Psychological distress and
79. perceived
health status. Journal of Health and Social Behavior, 19. 254-
262.
Warr, P. ( 1978). A study of psychological well-being. British
Journal of
Psychology; 69, 111-121.
Warr, P., Barter, J., & Brownbridge, G. ( 1983). On the
independence of
positive and negative affect. Journal of Personality and Social
Psy-
chology, 44, 644-651.
Watson, D. (in press). The vicissitudes of mood measurement:
Effects
of varying descriptors, time frames, and response formats on
mea-
sures of Positive and Negative Affect. Journal of Personality
and So-
cial Psychology.
Watson, D. ( 1988). Intraindividual and interindividual analyses
of Posi-
tive and Negative Affect: Their relation to health complaints,
80. per-
ceived stress, and daily activities. Journal of Personality and
Social
Psychology, 54, 1020-1030.
Watson, D., &Clark, L. A. (1984). Negative Affectivity: The
disposition
to experience aversive emotional states. Psychological Bull etin,
96,
465-490.
Watson, D., Clark, L. A., & Carey, G. (in press). Positive and
Negative
Affectivity and their relation to anxiety and depressive
disorders.
Journal of Abnormal Psychology.
Watson, D., Clark, L. A., & Tellegen, A. (1984). Cross-cultural
conver-
gence in the structure of mood: A Japanese replication and a
compar-
ison with U.S. findings. Journal of Personality and Social
Psychology,
47. 127-144.
81. Watson, D., & Pennebaker, J. W. (in press). Health complaints,
stress,
and distress: Exploring the central role of negative affectivity.
Psycho-
logical Review.
Watson, D., & Tellegen, A. (1985) Toward a consensual
structure of
mood. Psychological Bulletin, 98. 219-235.
Wills, T. A. ( 1986). Stress and coping in early adolescence:
Relationships
to substance use in urban school samples. Health Psychology, 5,
503-
529.
Zevon, Μ. A., & Tellegen, A. (1982). The structure of mood
change: An
idiographic/nomothetic analysis. Journal of Personality and
Social
Psychology 4 J, 111-122.
Appendix
The PANAS
82. This scale consists of a number of words that describe different
feelings and emotions. Read each item and then mark
the appropriate answer in the space next to that word. Indicate
to what extent [INSERT APPROPRIATE TIME
INSTRUCTIONS HERE]. Use the following scale to record
your answers.
We have used PANAS with the following time instructions:
1
very slightly
or not at all
2
a little
3
moderately
4
quite a bit
5
extremely
83. _____ interested _____ irritable
_____ distressed _____ alert
excited ashamed
upset inspired
strong nervous
_____ guilty _____ determined
scared attentive
_____ hostile _____ jittery
enthusiastic active
proud afraid
Moment
Today
Past few days
Week
Past few weeks
Year
General
(you feel this way right now, that is, at the present moment)
(you have felt this way today)
84. (you have felt this way during the past few days)
(you have felt this way during the past week)
(you have felt this way during the past few weeks)
(you have felt this way during the past year)
(you generally feel this way, that is, how you feel on the
average)
Received May 10, 1987
Revision received September 14,1987
Accepted November 11, 1987 ■
PERSONNEL PSYCHOLOGY
1991,44
THE BIG FIVE PERSONALITY DIMENSIONS AND JOB
PERFORMANCE: A META-ANALYSIS
MURRAY R. BARRICK, MICHAEL K. MOUNT
Department of Management and Organizations
University of Iowa
85. This study investigated the relation of the "Big Five"
personality di-
mensions (Extraversion, Emotional Stability, Agreeableness,
Consci-
entiousness, and Openness to Experience) to three job
performance
criteria (job proficiency, training proficiency, and personnel
data) for
five occupational groups (professionals, police, managers, sales,
and
skilled/semi-skilled). Results indicated that one dimension of
person-
ality. Conscientiousness, showed consistent relations with all
job per-
formance criteria for all occupational groups. For the remaining
per-
sonality dimensions, the estimated true score correlations varied
by
occupational group and criterion type. Extraversion was a valid
pre-
dictor for two occupations involving social interaction,
managers and
sales (across criterion types). Also, both Openness to
Experience and
86. Extraversion were valid predictors of the training proficiency
criterion
(across occupations). Other personality dimensions were also
found
to be valid predictors for some occupations and some criterion
types,
but the magnitude of the estimated true score correlations was
small
(p < .10). Overall, the results illustrate the benefits of using the
5-
factor model of personality to accumulate and communicate
empirical
findings. The findings have numerous implications for research
and
practice in personnel psychology, especially in the subfields of
person-
nel selection, training and development, and performance
appraisal.
Introduction
Over the past 25 years, a number of researchers have
investigated the
validity of personality measures for personnel selection
purposes. The
88. 1
2 PERSONNEL PSYCHOLOGY
Gooding, Noe, & Kirsch, 1984). However, at the time these
studies were
conducted, no well-accepted taxonomy existed for classifying
personality
traits. Consequently, it was not possible to determine whether
there
were consistent, meaningful relationships between particular
personality
constructs and performance criteria in different occupations.
In the past 10 years, the views of many personalify
psychologists have
converged regarding the structure and concepts of personalify.
Gener-
ally, researchers agree that there are five robust factors of
personalify
(described below) which can serve as a meaningful taxonomy
for classi-
fying personalify attributes (Digman, 1990). Our purpose in the
89. present
study is to examine the relationship of these five personalify
constructs
to job performance measures for different occupations, rather
than to
focus on the overall validify of personalify as previous
researchers have
done.
Emergence of the 5-Factor Model
Systematic efforts to organize the taxonomy of personalify
began
shortly after McDougall (1932) wrote that, "Personalify may to
advan-
tage be broadly analyzed into five distinguishable but separate
factors,
namely intellect, character, temperament, disposition, and
temper..."
(p. 15). About 10 years later, Cattell (1943, 1946, 1947, 1948)
devel-
oped a relatively complex taxonomy of individual differences
that con-
sisted of 16 primary factors and 8 second-order factors.
However, re-
90. peated attempts by researchers to replicate his work were
unsuccessful
(Fiske, 1949; Tupes, 1957; Tupes & Christal, 1961) and, in each
case,
researchers found that the 5-factor model accounted for the data
quite
well. For example, Tupes and Christal (1961) reanalyzed the
correlations
reported by Cattell and Fiske and found that there was good
support for
five factors: Surgency, Emotional Stabilify, Agreeableness,
Dependabil-
ify, and Culture. As it would turn out later, these factors (and
those of
McDougall 35 years before) were remarkably similar to those
generally
accepted by researchers today. However, as Digman (1990)
points out,
the work of Tupes and Christal had only a minor impact because
their
study was published in an obscure Air Force technical report.
The 5-
factor model obtained by Fiske (1949) and Tupes and Christal
(1961)
was corroborated in four subsequent studies (Borgatta, 1964;
91. Hakel,
1974; Norman, 1963; Smith 1967). Borgatta's findings are
noteworthy
because he obtained five stable factors across five methods of
data gath-
ering. Norman's work is especially significant because his
labels (Ex-
traversion. Emotional Stabilify, Agreeableness,
Conscientiousness, and
Culture) are used commonly in the literature and have been
referred to,
subsequently, as "Norman's Big Five" or simply as the "Big
Five."
BARRICK AND MOUNT 3
During the past decade, an impressive body of literature has
accu-
mulated which provides compelling evidence for the robustness
of the 5-
factor model: across different theoretical frameworks
(Goldberg, 1981);
using different instruments (e.g., Conley, 1985; Costa &
92. McCrae, 1988;
Lorr & Youniss, 1973; McCrae, 1989; McCrae & Costa, 1985,
1987,
1989); in different cultures (e.g.. Bond, Nakazato, & Shiraishi,
1975;
Noller, Law, & Comrey, 1987); using ratings obtained from
different
sources (e.g., Digman & Inouye, 1986; Digman & Takemoto-
Chock,
1981; Fiske, 1949; McCrae & Costa, 1987; Norman, 1963;
Norman &
Goldberg, 1966; Watson, 1989); and with a variety of samples
(see Dig-
man, 1990, for a more detailed discussion). An important
consideration
for the field of personnel psychology is that these dimensions
are also rel-
atively independent of measures of cognitive ability (McCrae &
Costa,
1987).
It should be pointed out that some researchers have reservations
about the 5-factor model, particularly the imprecise
specification of
these dimensions (Briggs, 1989; John, 1989; Livneh & Livneh,
93. 1989;
Waller & Ben-Porath, 1987). Some researchers suggest that
more than
five dimensions are needed to encompass the domain of
personality. For
example, Hogan (1986) advocates six dimensions (Sociability,
Ambition,
Adjustment, Likability, Prudence, and Intellectance). The
principle dif-
ference seems to be the splitting of the Extraversion dimension
into So-
ciability and Ambition.
Interpretations of the "Big Five"
While there is general agreement among researchers concerning
the
number of factors, there is some disagreement about their
precise mean-
ing, particularly Norman's Conscientiousness and Culture
factors. Of
course, some variation from study to study is to be expected
with factors
as broad and inclusive as the 5-factor model. As shown below,
however,
94. there is a great deal of commonality in the traits that define
each factor,
even though the name attached to the factor differs.
It is widely agreed that the first dimension is Eysenck's
Extraver-
sion/Intraversion. Most frequently this dimension has been
called Ex-
traversion or Surgency (Botwin & Buss, 1989; Digman &
Takemoto-
Chock, 1981; Hakel, 1974; Hogan, 1983; Howarth, 1976; John,
1989;
Krug & Johns, 1986; McCrae & Costa, 1985; Noller et al.,
1987; Nor-
man, 1963; Smith, 1967). Traits frequently associated with it
include be-
ing sociable, gregarious, assertive, talkative, and active. As
mentioned
above, Hogan (1986) interprets this dimension as consisting of
two com-
ponents. Ambition (initiative, surgency, ambition, and
impetuous) and
Sociability (sociable, exhibitionist, and expressive).
95. 4 PERSONNEL PSYCHOLOGY
There is also general agreement about the second dimension.
This
factor has been most frequently called Emotional Stability,
Stability,
Emotionality, or Neuroticism (Borgatta, 1964; Conley, 1985;
Hakel,
1974; John, 1989; Lorr & Manning, 1978; McCrae & Costa,
1985; Noller
et al., 1987; Norman, 1963; Smith, 1967). Common traits
associated with
this factor include being anxious, depressed, angry,
embarrassed, emo-
tional, worried, and insecure. These two dimensions
(Extraversion and
Emotional Stability) represent the "Big Two" described by
Eysenck over
40 years ago.
The third dimension has generally been interpreted as
Agreeable-
ness or Likability (Borgatta, 1964; Conley, 1985; Goldberg,
1981; Hakel,
96. 1974; Hogan, 1983; John, 1989; McCrae & Costa, 1985; Noller
et al.,
1987; Norman, 1963; Smith, 1967; Tupes & Christal, 1961).
Others have
labeled it Friendliness (Guilford & Zimmerman, 1949), Social
Confor-
mity (Fiske, 1949), Compliance versus Hostile Non-Compliance
(Dig-
man & Thkemoto-Chock, 1981), or Love (Peabody & Goldberg,
1989).
Traits associated with this dimension include being courteous,
flexible,
trusting, good-natured, cooperative, forgiving, soft-hearted, and
toler-
ant.
The fourth dimension has most frequently been called Conscien-
tiousness or Conscience (Botwin & Buss, 1989; Hakel, 1974;
John, 1989;
McCrae & Costa, 1985; Noller et al., 1987; Norman, 1963;),
although it
has also been called Conformity or Dependability (Fiske, 1949;
Hogan,
1983). Because of its relationship to a variety of educational
achieve-
97. ment measures and its association with volition, it has also been
called
Will to Achieve or Will (Digman, 1989; Smith, 1967; Wiggins,
Black-
burn, & Hackman, 1969), and Work (Peabody & Goldberg,
1989). As
the disparity in labels suggests, there is some disagreement
regarding the
essence of this dimension. Some writers (Botwin & Buss, 1989;
Fiske,
1949; Hogan, 1983; John, 1989; Noller et al., 1987) have
suggested that
Conscientiousness reflects dependability; that is, being careful,
thor-
ough, responsible, organized, and planful. Others have
suggested that
in addition to these traits, it incorporates volitional variables,
such as
hardworking, achievement-oriented, and persevering. Based on
the evi-
dence cited by Digman (1990), the preponderance of evidence
supports
the definition of conscientiousness as including these volitional
aspects
(Bernstein, Garbin, & McClellan, 1983; Borgatta, 1964; Conley,
98. 1985;
Costa & McCrae, 1988; Digman & Inouye, 1986; Digman &
Takemoto-
Chock, 1981; Howarth, 1976; Krug & Johns, 1986; Lei &
Skinner, 1982;
Lorr & Manning, 1978; McCrae & Costa, 1985, 1987, 1989;
Norman,
1963; Peabody & Goldberg, 1989; Smith, 1967).
The last dimension has been the most difficult to identify. It has
been
interpreted most frequently as Intellect or Intellectence
(Borgatta, 1964;
BARRICK AND MOUNT 5
Digman & Takemoto-Chock, 1981; Hogan, 1983; John, 1989;
Peabody
and Goldberg, 1989). It has also been called Openness to
Experience
(McCrae & Costa, 1985) or Culture (Hakel, 1974; Norman,
1963). Dig-
man (1990) points out that it is most likely all of these. Itaits
99. commonly
associated with this dimension include being imaginative,
cultured, curi-
ous, original, broad-minded, intelligent, and artistically
sensitive.
The emergence of the 5-factor model has important implications
for
the field of personnel psychology. It illustrates that personality
consists
of five relatively independent dimensions which provide a
meaningful
taxonomy for studying individual differences. In any field of
science, the
availability of such an orderly classification scheme is essential
for the
communication and accumulation of empirical findings. For
purposes
of this study, we adopted names and definitions similar to those
used
by Digman (1990): Extraversion, Emotional Stability,
Agreeableness,
Conscientiousness, and Openness to Experience.
Expected Relations Between PersonaUty Dimensions and Job
100. Performance
In the present study, we investigate the validity of the five
dimen-
sions of personality for five occupational groups (professionals,
police,
managers, sales, and skilled/semi-skilled) and for three types of
job per-
formance criteria (job proficiency, training proficiency, and
personnel
data) using meta-analytic methods. We also investigate the
validity of
the five personality dimensions for objective versus subjective
criteria.
We hypothesize that two of the dimensions of personality.
Consci-
entiousness and Emotional Stability, will be valid predictors of
all job
performance criteria for all jobs. Conscientiousness is expected
to be
related to job performance because it assesses personal
characteristics
such as persistent, planful, careful, responsible, and
hardworking, which
101. are important attributes for accomplishing work tasks in all
jobs. There
is some evidence that in educational settings there are
consistent cor-
relations between scores on this dimension and educational
achieve-
ment (Digman & Takemoto-Chock, 1981; Smith, 1967). Thus,
we ex-
pect that the validity of this dimension will generalize across all
occupa-
tional groups and criterion categories. We also expect that the
validity
of Emotional Stability will generalize across occupations and
criterion
types. Viewing this dimension from its negative pole, we expect
that em-
ployees exhibiting neurotic characteristics, such as worry,
nervousness,
temperamentalness, high-strungness, and self-pity, will tend to
be less
successful than more emotionally stable individuals in all
occupations
studied because these traits tend to inhibit rather than facilitate
the ac-
complishment of work tasks.
102. 6 PERSONNEL PSYCHOLOGY
We expect that other personality dimensions may be related to
job
performance, but only for some occupations or some criteria.
For ex-
ample, in those occupations that involve frequent interaction or
cooper-
ation with others, we expect that two personality dimensions,
Extraver-
sion and Agreeableness, will be valid predictors. These two
dimensions
should be predictive of performance criteria for occupations
such as
management and sales, but would not be expected to be valid
predic-
tors for occupations such as production worker or engineer.
In a similar vein, we expect that Openness to Experience will be
a
valid predictor of one of the performance criteria, training
proficiency.
103. This dimension is expected to be related to training proficiency
because it
assesses personal characteristics such as curious, broadminded,
cultured,
and intelligent, which are attributes associated with positive
attitudes
toward learning experiences. We believe that such individuals
are more
likely to be motivated to learn upon entry into the training
program and,
consequently, are more likely to benefit from the training.
Finally, we investigated a research question of general interest
to per-
sonnel psychologists for which we are not testing a specific
hypothesis.
The question is whether the validity coefficients for the five
personality
dimensions diflfer for two types of criteria, objective and
subjective. A
recent meta-analysis by Nathan and Alexander (1988) indicates
that, in
general, there is no difference between the magnitude of the
validities
for cognitive ability tests obtained for objective and subjective
104. criteria for
clerical jobs. In another study, Schmitt et al. (1984)
investigated the va-
lidity of personality measures (across dimensions and
occupations) for
different types of criteria, but no definitive conclusions were
apparent
from the data. The average validity for the subjective criterion
(perfor-
mance ratings) was .206. Validities for three of four objective
criteria
were lower (.121 for turnover, .152 for achievement/grades, and
.126 for
status change), whereas the validity was higher for wages
(.268). Thus,
conclusions regarding whether the validities for personality
measures are
higher for objective, compared to subjective, criteria depend to
a large
extent on which objective measures are used. Because our study
exam-
ines personality using a 5-factor model, we are able to assess
whether
dimensions have differential relationships to various objective
and sub-
105. jective criteria.
In summary, the following hypotheses will be tested in this
study.
Of the five dimensions of personality. Conscientiousness and
Emotional
Stability are expected to be valid predictors of job performance
for all
jobs and all criteria because Conscientiousness measures those
personal
characteristics that are important for accomplishing work tasks
in all
jobs, while Emotional Stability (when viewed from the negative
pole)
measures those characteristics that may hinder successful
performance.
BARRICK AND MOUNT 7
In contrast, Extraversion and Agreeableness are expected to
correlate
with job performance for two occupations, sales and
management, be-
106. cause interpersonal dispositions are likely to be important
determinants
of success in those occupations. Finally, Openness to
Experience is ex-
pected to correlate with one of the criterion types, training
proficiency,
because Openness to Experience appears to assess individuals'
readiness
to participate in learning experiences. In addition, we
investigated the
validity of various objective and subjective criteria for the five
personality
dimensions.
Method
Literature Review
A literature search was conducted to identify published and
unpub-
lished criterion-related validity studies of personality for
selection pur-
poses between 1952 and 1988. Three strategies were used to
search
the relevant literature. First, a computer search was done of
107. PsycINFO
(1967-1988) and Dissertation Abstracts (1952-1988) in order to
find all
references to personality in occupational selection. Second, a
manual
search was conducted that consisted of checking the sources
cited in the
reference section of literature reviews, articles, and books on
this topic,
as well as personality inventory manuals, Buros Tests in Print
(volumes 4-
9,1953-1985), and journals that may have included such articles
(includ-
ing the Journal of Applied Psychology, Personnel Psychology,
Academy of
Management Journal, Organizational Behavior and Human
Decision Pro-
cesses/Organizational Behavior and Human Performance,
Journal of Man-
agement, Journal of Vocational Behavior, Journal of Personality
and Social
Psychology, Journal of Personality, and Journal of Consulting
and Clinical
Psychology). Finally, personality test publishers and over 60
practition-
108. ers known to utilize personalify inventories in selection
contexts were
contacted by letter, requesting their assistance in sending or
locating ad-
ditional published or unpublished validation studies.
Overall, these searches yielded 231 criterion-related validify
studies,
117 of which were acceptable for inclusion in this analysis. The
remain-
ing 114 studies were excluded for several reasons: 44 reported
results
for interest and value inventories only and were excluded
because they
did not focus on the validity of personality measures; 24 used
composite
scores or, conversely, extracted specific items from difî erent
scales and
instruments; 19 reported only significant validity coefficients;
15 used
military or laboratory "subjects"; and 12 either were not
selection stud-
ies or provided insufficient information.
109. 8 PERSONNEL PSYCHOLOGY
A total of 162 samples were obtained from the 117 studies.
Sample
sizes ranged from 13 to 1,401 (M = 148.11; SD = 185.79),
yielding a total
sample of 23,994. Thirty-nine samples were reported in the
1950s, 52 in
the 1960s, 33 in the 1970s, and 38 in the 1980s. Fifty samples
(31%) were
collected from unpublished sources, most of which were
unpublished
dissertations.
The studies were categorized into five major occupational
groupings
and three criterion types. The occupational groups were
professionals
(5% of the samples), which consisted of engineers, architects,
attorneys,
accountants, teachers, doctors, and ministers; police (13% of the
sam-
ples); managers (41% of the samples), which ranged from
foremen to
110. top executives; sales (17% of the samples); and skilledlsemi -
skilled (24%
of the samples), which consisted of jobs such as clerical, nurses
aides,
farmers, flight attendants, medical assistants, orderlies, airline
baggage
handlers, assemblers, telephone operators, grocery clerks, truck
drivers,
and production workers.
The three criterion types were fob proficiency (included in 68%
of the
samples), training proficiency (12% of the samples), and
personnel data
(33% of the samples). It should be noted that in 21 samples,
data were
available from two of the three criterion categories, which
explains why
the total percent of sample for the three criterion types exceeds
100%.
Similarly, the total sample size on which these analyses are
based will be
larger than those for analyses by occupation. Job proficiency
measures
primarily included performance ratings (approximately 85% of
111. the mea-
sures) as well as productivity data; training proficiency
measures con-
sisted mostly of training performance ratings (approximately
90% of the
measures) in addition to productivity data, such as work sample
data and
time to complete training results; and personnel data included
data from
employee files, such as salary level, turnover, status change,
and tenure.
Key variables of interest in this study were the validity
coefficients,
sample sizes, range restriction data for those samples, reliability
esti-
mates for the predictors and criteria, the personality scales (and
the in-
ventories used), and the types of occupations. A subsample of
approx-
imately 25% of the studies was selected to assess interrater
agreement
on the coding of the key variables of interest. Agreement was
95% for
these variables and disagreement between coders was resolved
112. by refer-
ring back to the original study.
Scales from all the inventories were classified into the five
dimensions
defined earlier (i.e., Extraversion, Emotional Stability,
Agreeableness,
Conscientiousness, and Openness to Experience) or a sixth
Miscella-
neous dimension. The personality scales were categorized into
these di-
mensions by six trained raters. Five of these raters had received
Ph.D.s in
BARRICK AND MOUNT 9
psychology (three were practicing consulting psychologists with
respon-
sibilities for individual assessment; the other two were
professors of psy-
chology and human resources management, respectively, and
both had
taught personnel selection courses) and the other taught similar
113. courses
while completing his Ph.D. in human resources management and
was
very familiar with the literature on personality. A short training
session
was provided to the raters to familiarize them with the rating
task and
examples were provided. The description of the five factors
provided to
the raters corresponded to those presented by Digman (1990)
and as de-
scribed above. Raters were provided a list of the personality
scales and
their definitions for each inventory and were instructed to
assign each
to the dimension to which it best fit. A sixth category.
Miscellaneous,
was used in those cases where the scale could not be assigned
clearly
into one of the five categories. If at least five of the six raters
agreed
on a dimension, the scale was coded in that dimension. If four
of the
six raters agreed and the two authors' ratings (completed
independently
114. of the raters) agreed with the raters, the scale was coded into
that di-
mension. If three or fewer raters agreed, the scale was coded
into the
Miscellaneous dimension. At least five of six raters agreed in
68% of the
cases, four of six raters agreed in 23% of the cases, and three or
fewer
raters agreed on 9% of the cases. Of the 191 scales, 39 were
categorized
as representing Emotional Stability; 32 as Extraversion; 31 as
Openness
to Experience; 29 as Agreeableness; 32 as Conscientiousness;
28 as Mis-
cellaneous. (A list of the inventories, their respective scales,
and dimen-
sional category assigned are available from the first author.) It
should
be noted that an alternative method for assigning the scales
would be to
use empirical data, such as factor analyses of inventories or
correlations
among scales from different inventories. However, we were
unable to
locate sufficient factor analytic studies or correlational data to
115. allow us
to use these approaches because in both cases data was
available for only
about half of the variables.
To arrive at an overall validity coefficient for each scale from
an in-
ventory, the following decision rules were applied in situations
where
more than one validity coefficient was reported from a sample:
(a) If an
overall criterion was provided, that coefficient was used and (b)
when
multiple criteria were provided, they were assigned to the
appropriate
criterion category (job proficiency, training proficiency, or
personnel
data). If there were multiple measures from a criterion category,
the
coefficients were averaged. However, because our analyses
focused on
personality dimensions rather than individual personality scales
(from
various inventories), the following decision rules were applied
to estab-
116. lish the validity coefficient for each personality dimension from
a sample:
(a) If a personality dimension had only one scale categorized
into that
10 PERSONNEL PSYCHOLOGY
dimension for that sample, the overall validify coefficient from
that scale
(calculated as previously explained) was used and (b) if
multiple scales
were available for a dimension, the coefficients from each of
these scales
from that sample were averaged and the resulting average
validify coef-
ficient was used in all analyses.
A number of analyses were conducted. The first was an analysis
of
the validities for the five personalify dimensions for each
occupational
group (across criterion types). The second was an analysis of
personalify
117. dimensions for the three criterion types (across occupations).
The final
analysis investigated the validify of the dimensions for
objective versus
subjective criteria (across occupations and criterion fypes).
The meta-analytic procedure adopted in this study used the
formu-
las available in Hunter and Schmidt (1990)-' and corrected the
mean and
variance of validify coefficients across studies for artifactual
variance due
to sampling error, range restriction, and attenuation due to
measure-
ment error. However, because the vast majorify of studies did
not report
information on range restriction and measurement error,
particularly
predictor reliabilities, it was necessary to use artifact
distributions to es-
timate artifactually induced variance on the validify coefficients
(Hunter
& Schmidt, 1990).
Because reliabilify coefficients for predictors were only rarely
118. pre-
sented in the validify studies, the distributions were based upon
informa-
tion obtained from the inventories' manuals. The mean of the
predictor
reliabilify distribution was .76 (SD = .08). Similarly, because
informa-
tion for the criterion reliabilities was available in less than one -
third of
the studies, we developed an artifact distribution for criterion
reliabili-
ties based on data provided by Hunter, Schmidt, and Judiesch
(1990) for
productivify data (with a mean of .92, SD = .05) and Rothstein
(1990) for
performance ratings (with a mean of .52, SD = .05). It should be
noted,
however, that 30 studies included criteria which were
categorized as per-
sonnel data. For these criteria (e.g., turnover, tenure, accidents,
wages,
etc.), reliabilify estimates were unknown because no estimates
have been
provided in the literature. Therefore, the artifact distributions
for crite-
119. rion reliabilities did not include reliabilify estimates for these
criteria.
Thus, for the objective versus subjective analysis, the
productivity and
performance rating artifact distributions were used in each
analysis, re-
spectively, for each personalify dimension. For all other
analyses, the
two criterion distributions were combined (with a mean value of
.56, SD
= .10). Finally, the artifact distribution for range restriction
data was
based upon those studies that reported both restricted and
unrestricted
^All analyses were conducted using a microcomputer program
developed by Frank
Schmidt and reported in Hunter and Schmidt, 1990.
BARRICK AND MOUNT 11
Standard deviation data (i.e., from accepted and rejected
applicants).
120. The effects on the mean validities due to range restriction were
relatively
small because the mean range restriction was .94 (SD = .05).
As previously stated, the Schmidt-Hunter non-interactive
validity
generalization procedure (Hunter & Schmidt, 1990) was applied
to the
data for assumed (predictors and criteria) and sample-based
artifact dis-
tributions (range restriction). (These distributions are available
from the
first author.) However, because the purpose of our study is to
enhance
theoretical understanding of the five personality constructs, we
present
fully corrected correlations that correct for unreliability in the
predictor
as well as the criterion.
Finally, there has been some confusion regarding the use and
inter-
pretation of confidence and credibility values in meta-analysis
(Whitener,
1990). The confidence interval is centered around the sample-
121. size
weighted mean effects sizes (r, before being corrected for
measurement
error or range restriction) and is used to assess the influence of
sampling
error on the uncorrected estimate. In contrast, the credibility
value is
centered around the estimated true score correlations (generated
from
the corrected standard deviation) and is used to assess the
influence of
moderators. Our purpose in the present study is to understand
the true
score correlations between the personality dimensions and job
perfor-
mance criteria for different occupations and to assess the
presence of
moderators. Therefore, the focus in this study is on p and the
corre-
sponding credibility values.
Results
Analysis by Occupational Group
122. The number of correlations upon which the meta-analysis is
based is
shown in Table 1 for the five personality dimensions, five
occupational
types, and three criterion types. It can be seen that the
frequencies differ
substantially from cell to cell. For example, the number of
correlations
for the job proficiency criterion is generally larger for all
personality
dimensions and occupations than for the other criterion types. It
can also
be seen that the number of correlations for the management
occupation
is greater than for the other occupations. The table also shows
that
for some cells there are two or fewer correlations for
professionals and
sales for the training proficiency criterion, and for professionals
and
police for the personnel data criterion. Consequently, we were
unable to
123. 12 PERSONNEL PSYCHOLOGY
TABLE 1
Call Frequencies of Correlations for Personality Dimensions,
Occupational Groups, and Criterion Types
Occupational group
Job proficiency
Professionals
Police
Managers
Sales
Skilled/Semi-skilled
Training proficiency
Professionals
Police
Managers
Sales
Skilled/Semi-skilled
Personnel data
Professionals
129. Table 2 presents the results of the meta-analysis for the five
person-
alify dimensions across the occupational groups (professionals,
police,
managers, sales, and skilled/semi-skilled labor). The first six
columns
of the table contain, respectively, the total sample size, the
number of
correlation coefficients on which each distribution was based,
the un-
corrected (i.e., observed) mean validify, the estimated true
correlation
(p), the estimated true residual standard deviation (SDp), and
the lower
bound of the 90% credibilify value for each distribution, based
on its true
correlation and SDp estimates. The true SDp is the square root
of the
variance that was not attributed to the four artifacts (i.e.,
sampling error
and between-study differences in test unreliabilify, criterion
unreliabil-
ify, and degree of range restriction), after correcting for those
artifacts.
The last column in Table 2 reports the percentage of observed
130. variance
that was accounted for by the four artifacts.
As shown in Table 2, the correlations for the occupational
categories
differed across the five personalify dimensions. Consistent with
our hy-
pothesis, the Conscientiousness dimension was a valid predictor
for all
occupational groupings. It can be seen that the estimated true
score cor-
relations are noticeably larger for Conscientiousness compared
to the
BARRICK AND MOUNT
TABLE 2
Meta-Analysis Results for Personality Dimension-Occupation
Combinations (all Criterion Types Included)
13
145. 70°
94
181
37
46
49
59°
° An unbiased estimate of mean percentage of variance
accounted for across meta-
analyses, calculated by taking the reciprocal of the average of
reciprocals of individual
predicted to observed variance ratios (Hunter & Schmidt, 1990).
Other personality dimensions and are remarkably consistent
across the
five occupational groups (p ranges from .20 to .23).
14 PERSONNEL PSYCHOLOGY
Very little support was found for the hypothesis regarding
146. Emotional
Stability. Compared to the Conscientiousness dimension, the
correla-
tions for Emotional Stability are lower (p ranges from -.13 to
.12). In
fact, for professionals the relationship was in the opposite
direction pre-
dicted (p = -.13).
It was also hypothesized that Extraversion and Agreeableness
would
be valid predictors for the two occupations involving
interpersonal skills,
managers and sales representatives. This hypothesis was
supported for
Extraversion for both occupations (p = .18 and .15,
respectively). How-
ever, very little support was obtained for Agreeableness, as p =
.10 for
managers and .00 for sales. With respect to the other
dimensions, the
remaining true score correlations reported in the table were
quite low
(i.e., p = .10 or less).
147. Analysis by Criteria Type
Table 3 shows the correlation coefficients for the fi ve
personality di-
mensions for the three criterion types. Consistent with our
hypothe-
sis. Conscientiousness is a valid predictor for each of the three
crite-
rion types. As was the case with the occupational analysis in
Table 2,
the results for Conscientiousness are quite consistent across the
crite-
rion types (p ranges from .20 to .23). As reported, the
correlations are
generally higher than for the other personality dimensions. Also
consis-
tent with our hypothesis. Openness to Experience predicted the
training
proficiency criterion relatively well (p = .25). Interestingly,
Extraversion
was also a significant predictor of training proficiency (p =
.26). Most
of the remaining correlations for the three criterion types are
relatively
small (i.e., p = .10 or less).
148. Analysis by Objective and Subjective Criteria
Table 4 shows the validity of the five personality dimensions
for cri-
teria categorized as objective and subjective. It should be noted
that this
analysis is different from that reported in Table 3 because two
of the three
criterion types contain some objective and subjective measures.
First, it
can be seen that the subjective criteria are used about twice as
frequently
as objective criteria. Second, the estimated true score
correlations are
generally higher for subjective, compared to objective, criteria.
In fact,
only one objective criterion, status change, has true score
correlations
equal to or larger than the subjective ratings for four of the
personal-
ity dimensions. For the fifth personality dimension.
Conscientiousness,
the estimated true correlations for the subjective criteria are
higher (p
149. = .23) than for all objective criteria (p ranges from .12 to .17).
BARRICK AND MOUNT
TABLE 3
Meta-Analysis Results for Personality Dimension and Criteria
(Pooled Across Occupational Groups)
15
Criterion type
Extraversion
Job proficiency
Ti-aining proficiency
Personnel data
Mean (across criteria)
Emotional stability
Job proficiency
Training proficiency
150. Personnel data
Mean (across criteria)
Agreeableness
Job proficiency
TVaining proficiency
Personnel data
Mean (across criteria)
Conscientiousness
Job proficiency
•Raining proficiency
Personnel data
Mean (across criteria)
Openness to experience
Job proficiency
•Raining proficiency
Personnel data
Mean (across criteria)
Total N
160. 44
51°
° An unbiased estimate of mean percentage of variance
accounted for across meta-
analyses, calculated by taking the reciprocal of the average of
reciprocals of individual
predicted to observed variance ratios (Hunter & Schmidt, 1990).
We conducted additional analyses of the correlation coeflicients
by
personality dimensions, criterion types, and occupational
subgroups.
Data from these analyses are not reported here (though available
upon
request) because for many of the subgroup categories there were
too few
validity studies. Overall, however, the results for those
subcategories
where data were available do not alter the conclusions reported
above.
A key outcome in any meta-analysis of selection studies is the
amount
161. of variation in the validities that is attributed to different
situations. For
a majority of the analyses reported in Tables 2, 3, and 4, the
percentage
of variance accounted for by the four statistical artifacts (i.e.,
sampling
16 PERSONNEL PSYCHOLOGY
TABLE 4
Meta-Analysis Results for Personality Dimensions and
Objective
and Subjective Criteria (Pooled Across Occupational Groups)
Criterion type
Extraversion
Productivity data
Turnover/Tenure
Status change
Salary
Objective mean (across criteria)
162. Subjective ratings
Emotional stability
Productivity data
Turnover/Tenure
Status change
Salary
Objective mean (across criteria)
Subjective ratings
Agreeableness
Productivity data
TurnoverATenure
Status change
Salary
Objective mean (across criteria)
Subjective ratings
Conscientiousness
Productivity data
Turnover/Tenure
Status change
Salary
163. Objective mean (across criteria)
Subjective ratings
Openness to experience
Productivity data
Turnover/Tenure
Status change
Salary
Objective mean (across criteria)
Subjective ratings
Total N
1,774
1,437
4,374
666
12,943
1,436
1,495
3,483
170. 82°
60
161
80
119
120
113°
42
° An unbiased estimate of mean percentage of variance
accounted for across meta-
analyses, calculated by taking the reciprocal of the average of
reciprocals of individual
predicted to observed variance ratios (Hunter & Schmidt, 1990).
error and between-study differences in test unreliability,
criterion unre-
liability, and degree of range restriction) failed to exceed the
75% rule
(Hunter & Schmidt, 1990). This suggests that differences in
correlations
may exist across subpopulations.
171. BARRICK AND MOUNT 17
Discussion
This study differs from previous studies by using an accepted
taxon-
omy to study the relation of personality to job performance
criteria. The
results illustrate the benefits of using this classification scheme
to com-
municate and accumulate empirical findings. Using this
taxonomy, we
were able to show that there are differential relations between
the per-
sonality dimensions and occupations and performance criteria.
Before discussing the substantive findings, a comment is in
order
regarding the relatively small observed and true score
correlations ob-
tained in this study. We would like to re-emphasize that our
purpose
172. was not to determine the overall validity of personality; in fact,
we ques-
tion whether such an analysis is meaningful. Rather, the purpose
was to
increase our understanding of the way the Big Five personality
dimen-
sions relate to selected occupational groups and criterion types.
It is likely that the purpose and methodology used in the present
study, both of which differ from other reviews, may have
contributed
to the lower correlations. For example, in the present study,
only those
samples that reported zero-order correlations for all scales from
an in-
ventory were included in the analysis. Studies were excluded if
they re-
ported composite validities or reported only those scales with
significant
correlations. Thus, the results for each of the five dimensions
are based
on the average of the correlations between personality scales
and job
performance criteria. Further, for those studies reporting
multiple mea-
173. sures for each dimension, an average correlation was used in the
meta-
analysis, rather than a composite score correlation (which
adjusts the
average correlation by the sum of the covariances among the
measures
incorporated in the average estimate). Use of the composite
score cor-
relation always results in a mean validity estimate larger in size
than that
resulting from the average correlation (Hunter & Schmidt,
1990). How-
ever, because intercorrelations among personality scales or
dimensions
were generally not reported (even inventory manuals report only
a few
intercorrelations), it was not possible to use the composite score
corre-
lation in this analysis. A better estimate of the validity of a
personality
dimension would be provided by combining all scales measuring
a sin-
gle dimension into a predictor composite. Doing this would
provide a
better measure of the predictive validity of the construct in
174. question.
Therefore, in interpreting the results of this study, the reader
should fo-
cus on understanding which dimensions are the best predictors
for spe-
cific occupations and criterion types rather than on the
magnitude of the
validities because they are underestimates.
The most significant finding in the study relates to the
Conscientious-
ness dimension. It was found to be a consistently valid predictor
for all
18 PERSONNEL PSYCHOLOGY
occupational groups studied and for all criterion types. Thus,
this as-
pect of personality appears to tap traits which are important to
the ac-
complishment of work tasks in all jobs. That is, those
individuals who
exhibit traits associated with a strong sense of purpose,
175. obligation, and
persistence generally perform better than those who do not.
Similar find-
ings have been reported in educational settings where
correlations be-
tween scores on this dimension and educational achievement
(Digman &
Takemoto-Chock, 1981; Smith, 1967) and vocational
achievement (Take-
moto, 1979) have consistently been reported in the range of .50
to .60.
Further evidence that this dimension is a valid predictor of job
per-
formance is found in two studies conducted as part of the U.S.
Army Se-
lection and Classification Study (Project A) (Hough, Hanser, &
Eaton
1988; McHenry, Hough, Toquam, Hanson, & Ashworth, 1990).
Two of
the personality constructs. Achievement Orientation and
Dependability,
were found to be valid predictors of job performance measur es
in both
studies. Although the relationship of the personality constructs