Big Five [Features of Personality] Inventory Review-Kenneth Smith 10 DEC 2013

Running head: A Review of BFI
Review of the Big Five Inventory (BFI)
Kenneth Smith
Union Institute and University
December 2013

Abstract
This paper examines the use of the popular Big Five Inventory (BFI) as a tool of personality
assessment. The BFI is a short (44-item) inventory that can be taken and scored online for free.
The BFI is based on the Big Five model of personality, whose five factors are Openness,
Conscientiousness, Extraversion, Agreeableness, and Neuroticism. The BFI’s very good
reliability and validity data are explored. Possible weaknesses of the BFI results are also
discussed. The BFI appears to be a very useful tool of broad personality assessment based on
the Big Five model of personality.
Keywords: Big Five Inventory, BFI, personality assessments

Introduction and General information of Big Five Inventory (BFI)
The Big Five Inventory (BFI)
The BFI (Versions 4a and 54) was created by John, Donahue, & Kentle (1991). It was
first published in 1991 by the Institute of Personality and Social Research at the University of
California at Berkeley. The test is fully available online at www.outofservice.com/bigfive (The
Big Five Personality Test, 2013), and scoring is done by algorithm once the subject has
completed the test. A simple interpretation of the results is displayed to the subject, with a
brief comparison showing the percentile she falls in relative to the population that has taken
the test (Srivastava et al., 2003). The paper version, with scoring instructions, can be
downloaded from the Berkeley Personality Lab (John, 2013). Other forms of the BFI exist, e.g.
the BFI-10 (Rammstedt & John, 2007); however, the most widely used form is the 44-item
inventory, and the BFI designation in this paper refers to that form.
The inventory is free if used for non-commercial and research purposes. For commercial
purposes, a request can be made to the publisher; however, at this time it cannot be used
commercially (John, 2013). It is estimated that the BFI takes approximately five minutes to
complete (Gosling, Rentfrow, & Swann, 2003). No published manual exists for the test. The best
way to examine the norming group (which Srivastava (2013) describes as better understood as
“comparison samples” than as a norming group) is to consult the comparison samples sheet
(Berkeley Personality Lab, 2013), which reports mean responses and standard deviations by age.
For a further understanding of the comparison groups see Srivastava et al. (2003).
Purpose and Background of the Test

Purpose of Test and Theory. This is a personality assessment based on the Big Five
model (referred to interchangeably as the Five Factor Model) of personality. The Big Five are
based on natural language terms that provide a taxonomy and a common parlance intended to
accommodate any theoretical orientation on personality (John, Naumann, & Soto, 2008). The
Big Five personality factors used in the BFI are Openness to Experience, Conscientiousness,
Extraversion/Introversion, Agreeableness, and Neuroticism (nervousness) (De Raad & Perugini,
2003). The Big Five model is based on the idea that personality can be generalized into five
major areas into which all other personality attributes fit (John, Naumann, & Soto, 2008).
Intended population. The focus of the Big Five model should make it applicable to a
wide population. It is claimed that the Big Five factors of the BFI are based on discovered
concepts of personality and not on theoretical underpinnings (John et al., 2008). This suggests
that the test should be applicable to most of the population over the age of 21, regardless of
social status and gender (Srivastava et al., 2003). A modified test with simplified language
exists for younger persons (John, 2013). The test seems to have applications across cultures, as
it has been translated into at least ten languages, including Chinese, Hebrew, and Lithuanian
(John, 2013). Its cultural applicability to Spanish speakers has been validated (Benet-Martinez
& John, 1998). The lack of a standard manual makes it difficult to define a standard
population, but certainly, based on easily accessible data (Berkeley Personality Lab, 2013), an
American between the ages of 21 and 60 would be a good candidate for the assessment. Further
discussion of the norming groups takes place in the Technical Attributes section of this paper.
Nature and structure of the BFI. The BFI is a questionnaire containing 44 items (The
Big Five Personality Test, 2013). Items are designed to measure each of the Big Five aspects of
personality by having the assessee indicate how much they agree with a statement. Each item
assesses just one of the five categories. For instance, question 19 of the BFI (The Big Five
Personality Test, 2013), which is designed to evaluate the Neuroticism scale of the test, reads
“I see myself as someone who worries a lot.” The assessee then assigns a value on a five-point
Likert scale ranging from Strongly Disagree (1) to Strongly Agree (5). All items are
written-response in form. There are no subscales within the BFI, though data are provided to
compare subjects to their respective age groups (Berkeley Personality Lab, 2013).
Evaluation of Practical Administration and Scoring
Test design, layout, and administration. Both the layout and questions of the written
test (John, 2013) and the online version (The Big Five Personality Test, 2013) are very simple
and easy to use. The instructions are clear and free from technical language. Though the site
seems a bit dated in design and color, it seems to work well on multiple platforms and operating
systems. The paper version is a single page. The 44 items are easy to understand and are in
non-technical language. No problems in syntax or grammar were noted in the instructions or the
questions. The 44 items are not intimidating due to length, unlike other personality assessments
that have hundreds of questions.
The test can be self-administered by the assessee or, for those with limited reading
ability, read aloud to the assessee, with the administrator recording the number assigned to
each item. As the majority of persons will take the online version of the assessment, the size
of font and text can be adjusted for age and disability. Some may find the online or the paper
version less intimidating, according to preference (e.g. the online version may be intimidating
for persons not familiar with computers).
Face Validity, Tester Qualifications, and Scoring. The face validity of the BFI is very
good. The BFI uses language in common parlance and simple question forms. The
descriptive words (e.g. worried, aloof, calm, etc.) are directly related to attributes that people
exhibit and have knowledge of. An assessee should definitely understand why the questions are
being asked.
There are no tester qualifications necessary with the BFI, as it is self-administered. The
online version reports scores as percentiles. Along with each percentile is a descriptor
statement. For example, a person who scores at the 93rd percentile on the online version will
also receive a descriptor saying (paraphrased from an actual test) that the person is very
outgoing, very social, and energetic with others.
While the percentile form of result from the BFI seems preferable to some researchers (e.g.
Srivastava, 2012), standard scores expressing the deviation from the mean can be calculated with
the paper inventory. To generate a scale score, a simple arithmetic transformation is applied to
the negatively keyed (reverse-scored) items, and the responses for each scale are then summed
and averaged (John, Naumann, & Soto, 2008). Once computed, the scale score can be compared to
the comparison-sample mean (Berkeley Personality Lab, 2013), and its distance from that mean
can be expressed in standard deviations. The calculations are very simple; while a small
knowledge of statistical inference is necessary for understanding standard deviation, it is well
within the realm of a college-educated person. In other words, no special training is needed for
scoring and interpretation.
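The hand-scoring steps described above can be illustrated with a minimal Python sketch. It is not the official scoring key: the item numbers, the choice of which items are reverse-keyed, and the comparison mean and standard deviation are all hypothetical placeholders, and the reversal rule (6 minus the response) is simply the usual way to flip a 1-to-5 Likert response.

```python
from statistics import mean


def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a Likert response so that 1 <-> 5 and 2 <-> 4 on a 1-5 scale."""
    return scale_max + scale_min - response


def scale_score(responses, items, reversed_items):
    """Average one scale's item responses, flipping reverse-keyed items first."""
    values = [
        reverse_score(responses[i]) if i in reversed_items else responses[i]
        for i in items
    ]
    return mean(values)


def standard_score(score, comparison_mean, comparison_sd):
    """Distance of a scale score from the comparison mean, in SD units."""
    return (score - comparison_mean) / comparison_sd


# Hypothetical four-item scale; items 2 and 4 are assumed to be reverse-keyed.
responses = {1: 4, 2: 2, 3: 5, 4: 1}
score = scale_score(responses, items=[1, 2, 3, 4], reversed_items={2, 4})

# Hypothetical comparison-sample values, not taken from the published sheet.
z = standard_score(score, comparison_mean=3.5, comparison_sd=0.8)
print(score, z)
```

In practice the item lists and reverse-keyed items would come from the scoring instructions distributed with the paper form, and the mean and standard deviation from the age-appropriate row of the comparison samples sheet.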
Age Subset. The only subset provided for the comparison group is age (Berkeley
Personality Lab, 2013). This is due to findings suggesting that age significantly affects
expression of the Big Five as recorded by the BFI (Srivastava et al., 2003). Age effects are
discussed further in the Technical Attributes section of this paper; however, since age has
an effect on BFI results, it would be best to compare an individual assessee against both the
general and the age-specific comparison groups to get the best overall view.
Technical Attributes of BFI
Comparison Groups (Norms)
Age. The ages ranged from 21 to 60; those over 60 were not included in the sample due to
their lack of internet use (Helson, Kwan, John, & Jones, 2002). The mean age of the respondents
was 31, with a standard deviation of 9 years. Personality as measured by the BFI does seem to
be affected by age; however, these differences are more likely due to developmental or secular
trends than to assessment bias (Srivastava et al., 2003). Therefore, it is best to compare an
individual against their age group on the written test. It is unclear whether the online version
compares the individual to the whole comparison group or to the age subset, which could be a
weakness of the online version.
Recruitment Process and Nationality. Data collected online from participants on
nationality and age were self-reported. The comparison group (the term preferred over “norming
group” by the authors who collected the data; John, 2013; Srivastava, 2012) is based on
data from a paper by Srivastava et al. (2003). The subjects were recruited over the
internet (http://www.outofservice.com/bigfive/). The sample (n=132,515) was American
(90.8%) and Canadian (9.2%) (Srivastava et al., 2003).
Ethnicity. Ethnicity data were self-reported. The vast majority of the respondents in the
comparison group were White (n=110,004). While a small percentage of the sample, a large
number of ethnic minority persons took part: Asian (n=5,710), Black (n=2,414), Latino
(n=2,094). These numbers are taken from Srivastava et al. (2003), the study that
collected the data, because Helson et al. (2002) reported a much larger number of ethnic
minority participants for that study; e.g. Helson et al. reported that 12,000 Asian
Americans were in the comparison group, far more than Srivastava et al. (2003) reported.
Srivastava et al. (2003) ran regressions to control for ethnicity and found that ethnicity had little
effect on the findings. This agrees with the findings of a study by Benet-Martinez & John (1998)
that used the BFI to compare American English speakers and American Spanish speakers. The
studies suggest that the BFI is appropriate to use with different ethnic groups in the US for
general personality measures.
Social status of participants. Social class was self-reported by the participants. The
percentages of respondents reporting each class were: poor (1%),
working class (18%), middle class (54%), upper-middle class (25%), and upper class (2%)
(Srivastava et al., 2003). Controlling for social class did not significantly affect the results of the
comparison groups. The sample also seems well spread and diverse (Helson et al., 2002).
Therefore, the BFI should be appropriate regardless of social class.
Gender. Gender does seem to affect the results of the BFI. Srivastava et al. (2003)
reported that gender is correlated with results at all ages. However, no subset on gender exists
for the comparison group on the BFI; therefore, it is not possible to do an inter-gender
comparison with the comparison group.
Summary of Comparison Group. As the above reports suggest, the only subset, and the only
area that seems to require consideration in interpreting the assessment, is the age of the
individual taking the inventory. The diversity of the comparison group appears to be wide enough
to ensure validity of results. As the comparison sample was predominantly American, it is unclear
how valid the comparison group would be for subjects of other nationalities, though some
studies (e.g. Benet-Martinez & John, 1998) suggest nationality would have very little effect on
results. Also, as the social class and ethnicity of subjects had no significant effect on the BFI,
the test should be acceptable for participants regardless of these attributes. The lack of gender
groups in the comparison group data provided (Berkeley Personality Lab, 2013) may be a weakness
of the BFI, since gender may have an effect on BFI results.
Reliability of BFI
Parallel forms.
Different languages. The BFI has been produced in parallel forms in various ways. The
BFI’s translations into other languages (Benet-Martinez & John, 1998; Rammstedt & John, 2007)
provide parallel forms for testing the measures on the BFI. Rammstedt & John’s (2007) comparison
of the BFI (which they refer to as the BFI-44) in German and English
showed that both versions of the test seemed to correctly measure the Big Five. Similarly, when
the BFI is given in Spanish and English forms, both seem to have high reliability and to
accurately reflect the test scales (Benet-Martinez & John, 1998).
Different forms of test. Another test of reliability presented for the BFI was a comparison
between different versions. Rammstedt & John (2007) used the BFI (referred to as the BFI-44) and
a short version (BFI-10) in a study seeking to validate the BFI-10. The short form was built by
taking two questions from the BFI-44 to measure each of the Big Five. They found that the short
form and the long form were highly correlated with each other (on specific items the range was
between .74 and .9 in English), suggesting that the two forms were measuring similar
attributes. It can be inferred, then, that the individual questions
themselves on the BFI are highly related to what is being measured.
Test-Retest Reliability. The test-retest reliability of the BFI appears to be good.
Retests of the BFI given three months apart have been reported to have an average correlation of
.85 (John et al., 2008). In a study by Hampson & Goldberg (2006, as cited in John et al., 2008),
the BFI was given to a middle-aged sample, and the average correlation between the
test and retest was .74. These high correlations support the reliability of the test with individuals.
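For readers unfamiliar with how a test-retest figure of this kind is obtained, the following minimal Python sketch computes a Pearson correlation between two administrations of one scale. The scores are invented for illustration and are not data from any of the cited studies; the function name is likewise only a label chosen for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scale scores for five people at the first and second testing.
first_testing = [3.0, 3.5, 4.0, 2.5, 4.5]
second_testing = [3.2, 3.4, 4.1, 2.9, 4.2]

print(round(pearson_r(first_testing, second_testing), 2))
```

A coefficient near 1 indicates that people keep roughly the same rank order across the two testings, which is what the reported averages of .85 and .74 summarize across the five scales.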
Scorer Reliability. There is very little risk of scorer error on the online version of the
BFI, as it is automated. Some small risk of scorer error exists in the written version, mostly due
to the risk of failing to transform certain number scores as required (Berkeley Personality Lab,
2013). However, due to the simplicity of calculating results, scorer reliability should be high.
Long Term Stability of the BFI. As discussed previously, age seems to affect the
results of the BFI (Srivastava et al., 2003). However, it should not be surprising that personality
might change with age. Srivastava et al. (2003) point out that many studies have reported that age
may have an effect on personality attributes as a part of human development. Therefore, if
changes in personality are noted as an individual ages, these may not in any way be due to test
reliability. However, if there is a secular trend in personality along generational lines, as
suggested by Twenge (2000, as cited in Srivastava et al., 2003), then the comparison group
provided for the BFI may become less representative of the population over time.
Validity of the BFI
Item Validity. The items/questions on the BFI were “developed to represent the Big Five
prototype definitions…a canonical representation of factors intended to capture their core
elements across the previous studies, samples or instruments” (John et al., 2008, p. 129). In
other words, by using data collected from previous studies to construct the BFI, the authors
aimed to base the BFI items on empirical findings, not theoretical constructs. The
initial items for the development of the BFI were selected by a consensus of expert
judges (Rammstedt & John, 2007). The initial proto-BFI had 54 items (John, 2013), and the final
44 items were determined by factor analysis from a sample of junior college and university
students (John et al., 2008). Empirical item analyses were carried out to confirm that the expert
judges’ choices of BFI questions/items captured the core Big Five factors that the BFI was
designed to measure (Rammstedt & John, 2007). This expert judgment and these empirical
item analyses provide the basis for the validity of the items on the BFI (Srivastava et al., 2003).
Validity of BFI by comparison to other Big Five assessments of personality. The BFI
was compared to other measures of the Big Five model to confirm the validity of the BFI. The two
most used short measures of the Big Five apart from the BFI are Goldberg’s (1992, as cited in
John et al., 2008) Trait Descriptive Adjectives (TDA) and Costa & McCrae’s (1992, as cited in
John et al., 2008) NEO Five-Factor Inventory (NEO-FFI). Results from the TDA, NEO-FFI, and BFI
were compared to find convergent validity coefficients between them (John et al., 2008).
While some variation between these three assessments exists, the positive correlations
between all three are very good. The mean corrected convergent validity correlations between the
three instruments, for each of the five factors the BFI tests, are as follows: .94
Extraversion, .95 Agreeableness, .95 Conscientiousness, .86 Neuroticism, and .9 Openness (John
et al., 2008). With such strong correlations between three separately developed instruments for
measuring the Big Five factors, it can be asserted that “…the Big Five are fairly independent
dimensions that can be measured by several instruments with impressive convergent and
discriminant validity” (John et al., 2008, p. 134). The results of the comparison between other
measures and the BFI suggest that the BFI is a valid instrument for measuring the Big Five areas
of personality.
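The term “corrected” above is not defined in this paper. Assuming it refers to the standard correction for attenuation (an assumption of this sketch, not something stated here), an observed correlation between two instruments is divided by the square root of the product of their reliabilities. The Python below illustrates that formula with hypothetical numbers only.

```python
from math import sqrt


def correct_for_attenuation(observed_r, reliability_x, reliability_y):
    """Estimate the correlation between two measures if both were perfectly
    reliable: observed r divided by sqrt(reliability_x * reliability_y)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return observed_r / sqrt(reliability_x * reliability_y)


# Hypothetical values: an observed correlation of .80 between two
# Extraversion scales whose reliabilities are .85 and .80.
corrected = correct_for_attenuation(0.80, 0.85, 0.80)
print(round(corrected, 2))
```

Because the divisor is at most 1, a corrected coefficient is never smaller in magnitude than the observed one, which is worth keeping in mind when reading values in the .90s.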
BFI and Peer Ratings Validity. A major way of evaluating the validity of personality
measures is to demonstrate strong correlations between the measure’s results and ratings of
personality factors by people who know the subject being assessed (peer ratings) (Rammstedt &
John, 2007). Correlations between the BFI and peer ratings are reported to be very good
(Rammstedt & John, 2007; John et al., 2008). Strong correlations between peer ratings and the BFI
are not affected by the language in which the BFI is administered, as shown in English
(John et al., 2008), Spanish (Benet-Martinez & John, 1998), and German (Rammstedt & John,
2007). The reported agreement between BFI results and peer ratings gives strong support to BFI
validity.
Wide use of the BFI
Very few critical reviews dealing directly with the BFI exist. However, many in the study of
personality have embraced and enthusiastically use the Big Five model of personality (Cohen,
Swerdlik, & Sturman, 2013). In fact, “…the Big Five model has acquired the status of a
reference-model” (De Raad & Perugini, 2003, p. 143). This is evident in the BFI’s very wide
use in the research literature. The breadth of the BFI’s use in various languages strongly
suggests that the BFI is a strong, consistent measure of the Big Five model of personality.
Possible Weaknesses of the BFI
The possible weakness of language to describe Big Five. Weaknesses of the BFI
cited directly in the literature appear to be very rare. However, critiques of the Big Five
model may indicate an area of possible weakness of the BFI. A major criticism of the Big Five
is its reliance on language to describe traits of personality. This broad use of comparative
statements to assess personality (McAdams, 1992) may mean that the BFI is simply recording the
consistency of the language used to describe traits, possibly independent of personality. This
thought may be supported by a study that used the BFI in a pre-literate society in Bolivia
(Gurven, von Rueden, Massenkoff, Kaplan, & Vie, 2012). The study found that the BFI could
not reliably be used to test personality in this pre-literate society. This may suggest that
personality is not as universal as the Big Five model would posit, or it could be a demonstration
that the perception of personality is based on language cues that may not capture real personality
(Gurven et al., 2012).
Cultural pressures. Another possible weakness of BFI results is cultural pressure.
Gurven et al. (2012) cite numerous studies suggesting that culture can have a great influence on
the expression of personality traits, even in non-human species. The expression or repression of
certain parts of the personality has been hypothesized to have survival value, but may cause an
individual not to express all parts of her whole personality. It is also well known that the
results of personality tests may lead an individual to accept as true vague descriptions that
may not really capture the personality; this is known as the Barnum effect (Cohen et al.,
2013).
Applicability of the BFI results to individuals. Finally, a weakness can be noted in what
the results from the BFI really tell about an individual. Descriptions of the Big Five are very
encompassing. For instance, a person who scores in the 69th percentile of Agreeableness in the
online version will be advised that [paraphrased] “you tend to weigh the feelings of others” (The
Big Five Personality Test, 2013). What such vague feedback may do for a person is unknown.
Also, in any personality test, it is best to compare the individual to the smallest group
you want to know something about (Srivastava, 2013). With very large samples and a lack of
subgroups, the results of the BFI may be very useful in large studies but of little use to
the individual (e.g. if the BFI tells you that you are highly neurotic compared to the general
population, how does that help you?). Some have described the Big Five model as a psychology
of the stranger: the information provided is so nebulous that it would only be useful to a person
who has never met the subject who took the BFI (McAdams, 1992).

Summary of the BFI
As this paper demonstrates, the BFI is a cost-effective and easy way to assess
personality traits based on the popular Big Five model of personality. The shortness and ease of
use of the BFI lend themselves to wide use with both individuals and psychological research. The
reliability of the test is well established by parallel forms, test-retest, and scorer reliability
evidence. Validity of the measures is supported by the use of expert panels to construct items,
comparisons to other assessments of the Big Five, and peer ratings. The BFI’s very wide use in
research also suggests that it is very respected as a measure of personality.
The possible weaknesses of the BFI tend to be focused on the Big Five model itself. The
reliance of the BFI on language constructs may affect what is really being assessed: is language
or personality what the BFI really measures? Also, as with all personality assessments, do the
results reflect the person, or does the person just accept vague descriptors of personality as true?
Furthermore, do the descriptors provided really give any useful information to the person, or are
they so broad that the results would only be useful to a stranger who does not know the subject
(McAdams, 1992)?
Despite some of the possible weaknesses of the BFI, for a simple overview of a person’s
personality, the BFI would be an effective tool. The BFI should certainly be valid for any
literate person in the US for the assessment of personality, due to its very large comparison
groups. Its availability online with automated scoring is also a great advantage over some other
tools of personality assessment. While other tools may be needed for much more specific
descriptors of personality (John, 2013), the BFI would be a good starting place in any area
requiring personality assessment.

References
Benet-Martinez, V., & John, O. (1998). Los Cinco Grandes Across Cultures and Ethnic
Groups: Multitrait Multimethod Analyses of the Big Five in Spanish and English. Journal
of Personality and Social Psychology, 75(3), 729-750.
Berkeley Personality Lab. (2013, December 7). Comparison Sample: Means and Standard
Deviations for Big Five Inventory (John & Srivastava, 1999) by Age. Retrieved from
http://www.ocf.berkeley.edu/~johnlab/pdfs/BFI%20Comparison%20Samples%20%28Ag
es%2021%20-%2060%29.doc
Cohen, R., Swerdlik, M., & Sturman, E. (2013). Psychological Testing and Assessment. New
York: McGraw-Hill.
De Raad, B., & Perugini, M. (2003). Big Five Model Assessment. In R. Fernández Ballesteros,
Encyclopedia of Psychological Assessment (pp. 138-144). London: SAGE Publications.
Gosling, S., Rentfrow, P., & Swann, W. (2003). A very brief measure of the Big-Five personality
domains. Journal of Research in Personality, 37, 504-528.
Gurven, M., von Rueden, C., Massenkoff, M., Kaplan, H., & Vie, L. (2012). How Universal Is
the Big Five? Testing the Five-Factor Model of Personality Variation Among
Forager–Farmers in the Bolivian Amazon. Journal of Personality and Social Psychology, doi:
10.1037/a0030841.
Helson, R., Kwan, V., John, O., & Jones, C. (2002). The growing evidence for personality
change in adulthood: Findings from research with personality inventories. Journal of
Research in Personality, 36, 287-306.

John, O. (2013, DEC 7). Berkeley Personality Lab. Retrieved from
http://www.ocf.berkeley.edu/~johnlab/bfi.htm
John, O., Donahue, E., & Kentle, R. (1991). The Big Five Inventory—Versions 4a and 54.
Berkeley: University of California, Berkeley, Institute of Personality and Social
Research.
John, O., Naumann, L., & Soto, C. (2008). Paradigm Shift to the Integrative Big Five Trait
Taxonomy: History, Measurement, and Conceptual Issues. In O. P. John, R. W. Robins, & L. A.
Pervin (Eds.), Handbook of Personality: Theory and Research (pp. 114-158). New York:
Guilford Press.
McAdams, D. (1992). The five-factor model in personality: A critical appraisal. Journal of
Personality, 60, 329-361.
Rammstedt, B., & John, O. (2007). Measuring personality in one minute or less: A 10-item short
version of the Big Five Inventory in English and German. Journal of Research in
Personality, 41, 203-212.
Srivastava, S. (2012, OCT 17). Norms for the Big Five Inventory and other personality
measures. Retrieved from The Hardest Science:
http://hardsci.wordpress.com/2012/10/17/norms-for-the-big-five-inventory-and-other-
personality-measures/
Srivastava, S. (2013, DEC 7). Measuring the Big Five Personality Factors. Retrieved from
http://psdlab.uoregon.edu/bigfive.html

Srivastava, S., John, O., Gosling, S., & Potter, J. (2003). Development of personality in early
and middle adulthood: Set like plaster or persistent change? Journal of Personality and
Social Psychology, 84(5), 1041-1053.
The Big Five Personality Test. (2013, December 7). Retrieved from outofservice.com:
http://www.outofservice.com/bigfive/
