3 Foundations of Psychological Testing
Learning Outcomes
After reading this chapter, you should be able to
• Identify the purpose and uses of psychological testing.
• Describe the characteristics of a high-quality psychological
assessment tool or selection method.
• Explain the importance of reliability and validity.
• Identify commonly used psychological test design formats.
• Recognize the types of tests used to assess individual
differences.
• List the steps needed to develop and administer tests most
effectively.
• Discuss special issues in testing, including applicants’
reactions to testing and online administration.
• Summarize the importance of testing for positivity.
3.1 The Importance of Testing
When you hear the word test, what comes to mind? For many
people, tests are not a pleasant
activity. Most of us can remember wishing, as children, to grow
up and finish school so we
would never have to take a test again. Of course, as adults, we
discover that tests affect our
lives long after we earn our diplomas. Tests determine whether
we can drive a car, get into
a graduate or job-training program, or earn a professional
certification. They influence our
career choices and, quite often, our career advancement.
What profession do you plan to pursue? Do you want to be a
doctor or a lawyer? How about
a police officer or firefighter? Perhaps you would like to earn
your MBA and start your own
business? Each of these professions, like most others, requires many years of testing,
demands high levels of knowledge and skill, and calls for continuing education and
recertification testing.
Businesses use tests to help determine whether job applicants
possess the skills and abilities
needed to perform a job. After an applicant is hired, tests will
help determine placement in an
appropriate training and development program. Throughout an
employee’s career, the orga-
nization may require testing for new job placements or
promotions.
As you can see, tests can have a significant influence on
people’s lives; they can help identify
talent and promote deserving candidates. But they can also be
misused. Unfortunately, there
are many poorly designed psychological tests on the market.
They seduce organizations with
promises of fantastic results but do little to identify quality
employees. I/O psychologists pos-
sess the knowledge, skills, and education to design, implement,
and score measures that meet
the legal and ethical standards for an effective psychological
test.
The goal of this chapter is not to teach you how to design
quality psychological tests, but
rather to acquaint you with the requirements, challenges, and
advantages of doing so. Fur-
thermore, understanding the test-making methods that I/O
psychologists use will make you
a more informed consumer of tests for your own personal and
professional goals.
3.2 What Are Tests?
In general, a test is an instrument or procedure that measures
samples of behavior or perfor-
mance. In an employment situation, tests measure an
individual’s employment and career-
related qualifications and characteristics. The Uniform
Guidelines on Employee Selection
Procedures (1978) defines a test as any method used to make a
decision about whether to
hire, retain, promote, place, demote, or dismiss an employee or
potential employee. By this
definition, then, any procedure that eliminates an applicant from
the selection process would
be defined as a test. As discussed in Chapter 2, examples
include application forms that eval-
uate education and experience; résumé screening processes;
interviews; reference checks;
performance in training programs; and psychological, physical
ability, cognitive, or knowl-
edge-based tests.
[Photo: Students take tests to demonstrate their knowledge of a particular subject. Similarly, employers administer exams to job applicants to measure employment and career-related qualifications and characteristics.]
I/O psychologists are concerned with design-
ing and implementing selection systems that
identify quality job candidates. Clearly, orga-
nizations want to hire the best workers, but
it is a real challenge to screen candidates
who vary widely in their KSAOs, behaviors,
traits, and attitudes—or what is known as
their individual differences. This is especially
true when hiring decisions are made with
only basic tools such as application forms
and short interviews. To help organizations
better measure applicants’ personal charac-
teristics, I/O psychologists have developed
psychological measurements to identify how
and to what extent people vary regarding
individual differences (Berry, 2003).
What Is the Purpose of
Psychological Testing and Selection Methods?
In employment, tests and other selection tools are used to
predict job performance. Keep in
mind that job performance can never be predicted with 100%
accuracy. The only way employ-
ers could reach such a high level of accuracy would be to hire
all the applicants for a particu-
lar job, have them perform the job, and then choose those with
the highest performance. Of
course, this approach is neither practical nor cost effective.
Moreover, even if an organiza-
tion could afford to hire a large number of applicants and retain
only those who performed
best, performance prediction would still not be perfectly accurate. For
example, many organizations
have probationary periods in which the employer and the
employee try each other out before
a more permanent arrangement is established. Employees may
be motivated to perform at
a much higher level during the probationary period in order to
secure permanent employ-
ment, but performance levels may drop once the probationary
period is over. Moreover, job
performance may be influenced over time by a myriad of factors
that cannot be predicted or
managed.
Although it is impossible to perfectly predict job performance,
psychological testing and
selection methods can provide reasonable levels of prediction if
they accurately and consis-
tently assess predictors that are related to specific performance
criteria. As briefly introduced
in Chapter 2, accurately predicting performance criteria—
usually referred to as validity—
ensures that test results indicate performance outcomes, so that
those who score favorably
on the test are more likely to be high performers than those who
do not. Simply put, valid-
ity reflects the correlation between applicants’ scores on the
test or selection tool and their
actual performance. A high correlation indicates that test scores
can accurately predict per-
formance. A low correlation indicates that test scores are poorly
related to performance.
In order to assess the validity of a selection tool, job
performance must be quantifiable, mean-
ing that there is a set of numbers associated with applicants’
test scores so one can calculate
a correlation. Performance scores are usually obtained from
performance appraisal systems,
which will be discussed in more detail in Chapter 4. Poorly
designed performance appraisal
systems can hinder an organization’s ability to assess the
validity of its selection methods.
For example, performance evaluations are sometimes highly
subjective. Some managers tend
to score all of their employees similarly regardless of
performance in order to ensure they all
receive a raise. Some do so to sidestep confrontation or to avoid
having to justify the decision.
This lack of variability in scores can bias the statistical analyses that underlie validity,
preventing the organization from adequately calculating or comparing the validity of
various selection methods.
Another important property of tests and other selection tools is reliability. Also
referred to as consistency, reliability is the extent to which test
scores can be replicated over
time and across situations. A reliable test will reflect an
applicant’s aptitude rather than the
influence of other factors such as the interviewer, room
temperature, or noise level. For exam-
ple, many organizations use multiple interviews or panel
interviews to evaluate applicants so
that there are multiple raters scoring each applicant. Such
processes have scorers assign both
an absolute score, which measures how the applicant did in
relation to the highest possible
score, and a relative score, which measures how the applicant
did in relation to the rest of the
interviewees. When the scores assigned by these multiple raters
are comparable in terms of
absolute scores for each applicant, as well as relative scores and
rankings across applicants,
the interview process is considered reliable. On the other hand,
if different raters score the
same applicant very differently, and if the interview process
yields different rankings across
applicants and thus different hiring recommendations, then the
process is unreliable. Similar
to validity, no test or selection method has perfect reliability,
but the more reliable and con-
sistent a selection tool is, the more accurate it will be in
determining quality candidates, and
the more legally defensible it will be if the organization is sued
for discriminatory hiring. An
objective and systematic selection process that leads to
consistent results across candidates
and raters is an organization’s first line of defense against such
accusations.
Ensuring that tests are both valid and reliable is an important
part of the assessment process.
Of course, the more accurate the testing process, the more likely
the best candidates will be
selected, promoted, or matched with the right job. However,
that is not the only reason to
do so: invalid or unreliable tests can be costly. Many tests must be purchased or licensed
for use. Testing also takes time, both for the candidate and the organization. Tests must
be administered and scored and their results reported, all of which requires managers'
and HR professionals' time and effort. Tests that are
not valid or reliable also have
opportunity costs, such as the time spent using them as well as
the lower productivity of
those who were hired or promoted using the wrong test.
Finally, there are legal implications for ineffective testing. An
invalid test may not be related
to performance, but it may be discriminatory. It may favor
certain protected classes over oth-
ers. For example, if younger job applicants consistently score
higher than older ones on a test,
and these scores are not related to job performance, then that
test may be found discrimina-
tory. Similarly, if a particular test favors men over women or
places minority applicants at a
disadvantage, it can be considered discriminatory and thus
illegal. For example, complaints
were filed against the pharmacy chain CVS Caremark for using
a discriminatory personality
test. The test included questions about the applicant’s
propensity to get angry, trust others,
and build friendships. These questions were found to be
potentially discriminatory against
applicants with mental disabilities or emotional disorders
(Tahmincioglu, 2011). Although
the organization may have had no intent to discriminate, using
invalid discriminatory tests
can result in what was referred to in Chapter 2 as “disparate
impact,” which is also illegal.
Uses of Tests
Companies of all sizes are integrating tests into their
employment practices. In 2001 a study
by the American Management Association found that 68% of
large U.S. companies used job-
skill testing as part of their employment process; psychological
(29%) and cognitive (20%)
measurements were used less frequently. More recent studies,
however, show that the test-
ing trend is on the rise, with nearly 80% of Fortune 500 organizations using assessments
of some sort (Dattner, 2008).
Consider This: Tests and Testing
Make a list of as many tests as you can remember having taken
during times when you were up
for selection from a pool of potential candidates (e.g., jobs,
volunteering opportunities, college
admissions, scholarships, military service). Remember that a
test can be any instrument or
procedure that measures samples of behavior or performance. It
does not have to be a written,
proctored exam.
Questions to Consider
1. What do you think each of those tests was attempting to
measure?
2. What is your opinion of each of those tests?
3. Did the tests adequately measure what they were trying to
measure?
4. What were some strengths and weaknesses of each test?
Find Out for Yourself: The Validity and Reliability of
Commonly
Used Selection Tools
Visit the following websites to read about the different types of
validity and reliability, as well
as how they are measured.
Validity & Reliability of Methods
Psychometric Assessment Validity
Psychometric Test Reliability
What Did You Learn?
1. Compare the validity of various selection methods such as
interviews, reference checks,
and others. If you were a recruiter, which ones would you use?
Why?
2. If you were to use an actual test, how long would you design
the test to be? Why?
3. If you were to design an interview process, how many
interviewers would you use for
each candidate? What are the benefits of using more than one
interviewer (rater)?
4. What are the key factors in increasing the validity of the
selection process?
5. What are the key factors in increasing the reliability of the
selection process?
http://www.nicheconsulting.co.nz/validity_reliability.htm
http://www.nicheconsulting.co.nz/validity.htm
http://www.nicheconsulting.co.nz/reliability.htm
Moreover, about a quarter of
employers utilize online personality
testing to weed out applicants early in the recruitment process,
even before any other screen-
ing tool is used; this trend is expected to increase by about 20%
annually. Examples include
large, nationwide employers such as McDonald’s and CVS
Caremark (Tahmincioglu, 2011).
The growing popularity of testing within organizations has
resulted in the use of tests not
only for selection but also for a number of other HR functions.
One of the most important ways in which organizations use tests
is to evaluate job applicant
or employee fit. Organizations often administer tests to
accurately evaluate an applicant’s
job-related characteristics or determine an employee’s
suitability for promotion or place-
ment in a new position within the company. Because promotions
and training are expensive,
organizations place high importance on being able to determine
which employees possess
the higher-level abilities and skills needed to assume advanced
job positions. Similarly, during
job reorganizations, companies must be able to place
individuals into new jobs that align with
their skills and abilities. Keep in mind that selection,
promotion, and job-placement processes
all involve employment decisions and thus must be well
designed in order to meet the requi-
site legal and professional standards.
HR professionals make use of tests in areas outside the realm of
employment selection. Train-
ing and development is one such example. Trainees are often
tested on their job knowledge
and skills to determine the level of training that will fit their
proficiency. At the end of train-
ing, they may take tests to assess their mastery of the training
materials or to identify areas
where they need to be retrained. Other types of tests help
individuals identify areas for self-
improvement, and sometimes job teams take tests to help
facilitate team-building activities.
Finally, tests can help individuals make educational or
vocational choices. People who work
at jobs that utilize their skills and interests are more likely to be
successful and satisfied, so
it is important that vocational and educational tests make
accurate matches and predictions.
Note that tests used solely for career exploration or counseling
need not meet the same strict
legal standards as tests used for employee selection.
3.3 Requirements of Psychological Measurement
Tests designed by I/O psychologists possess a number of
important characteristics that set
them apart from other tests you may have taken. Scientifically
designed tests differ from mag-
azine quizzes or informal tests that you find online in that they
are more than a set of ques-
tions related to a specific topic. Instead, they must meet
standards related to administration
(the way in which the test is given), scoring methods, score
interpretation, reliability, and
validity. Unfortunately, many employers and consultants think
they can simply put together a
test or an interview protocol that they believe measures what
they feel is necessary and start
using the selection tool without any statistical analysis to assess its quality. Not only
does this approach fail to distinguish applicants with higher performance potential,
wasting time and resources, but it can also yield inconsistent and discriminatory
results, which makes it illegal.
Standardized Administration
To administer a selection test properly, the conditions under
which applicants complete the
test must be standard—that is, they must be the same every time
the test is given. These con-
ditions include the test materials, instructions, testing facilities,
time allowed for testing, and
allowed resources and materials. To ensure standardization,
organizations put instructions
into written form or administer the test to large groups so that
all applicants hear the same
instructions. Additionally, applicants will all complete the test
in the same location using
well-functioning equipment and comfortable seating. Test
administrators are also careful to
keep the testing environment comfortable in terms of
temperature and humidity as well as
free from extraneous noise or other distractions.
Variations in testing conditions can significantly interfere with results, making it
impossible to draw accurate comparisons between applicants who were tested under
different conditions.
Consider how changing even one aspect of test-
ing conditions can affect test performance. What
would happen if, on a cold day in the middle
of winter, the heat stopped working partway
through a series of applicant evaluations? Appli-
cants might not perform well on a typing test
because their hands were cold and stiff, or they
might not complete a written test because they
were shivering and could not concentrate. Now think about how
differently two groups of
test takers would perform if one group accidentally received
incomplete instructions from an
inexperienced administrator, while a second group received the
proper instructions from an
experienced administrator. You can easily see how unfair it
would be to try to compare test
results of applicants who were not all tested under equal
conditions!
Of course, it is sometimes not only appropriate but also
necessary to alter the testing condi-
tions. Applicants with disabilities may need specific
accommodations, such as a sign language
interpreter for a person with a hearing impairment or a reader or
Braille version of a written
test for a person with a visual impairment. For applicants with
disabilities, then, not allow-
ing for changes in the testing conditions would make it
difficult, if not impossible, for them to
perform their best.
A real-life example of this occurred when an on-campus
recruiter for a highly coveted intern-
ship program noticed that, despite performing as well as other
students when interviewed
on campus, minority students from very low-income areas fared
poorly when invited to an
on-site interview. The recruiter suspected that the
organization’s luxurious office building
and extravagant furnishings may have intimidated those
students and caused their poor per-
formance. To further investigate his point, he changed the
location of the interview to a local
community youth center—everything else about the interview
process was kept identical.
Under these conditions, the students’ performance improved
significantly and was no differ-
ent from the overall applicant pool. In other words, the change
was necessary to neutralize
the distracting effects of an otherwise unrelated testing
condition.
[Photo: It is important that all applicants take the same test under the same conditions.]
Objective Scoring
Just as performance on a test can be affected by testing
conditions, results can be affected
by the way in which the test is scored. To eliminate the
possibility of bias, test scores must
be standardized, which means that a test must be scored in the
same way for everyone who
takes it. Establishing a scoring key or a clear scoring criterion
before administering a test will
reduce an evaluator’s subjective judgments and produce the
same or similar test scores no
matter who evaluates the test.
Organizations can utilize both objective tests and subjective
tests. Objective tests, such
as achievement tests and some cognitive ability tests, have one
clearly correct answer—
multiple-choice tests are an example. As long as a scoring key
is available, scoring an objec-
tive test should be free from bias. In contrast, subjective tests,
such as résumé evaluations,
interviews, some personality tests, and work simulations, have
no definitive right or wrong
answers; their scores rely on the interpretation and judgment of
the evaluator. To minimize
the influence of personal biases and increase scoring accuracy,
the evaluator uses a predeter-
mined scoring guide or template, also sometimes referred to as a
rubric, which establishes
a very specific set of scoring criteria, usually with examples of
what to look for in order to
assign a particular score. For example, if a rater is scoring an
applicant on professionalism
using a scale of 1 to 5, with 3 being “average” or “meeting
expectations,” this score can also
be accompanied by a description of what is considered “meeting
expectations” so the assess-
ment is not subject to the rater’s interpretation of what is
expected. Although subjective tests
are commonly used to make employment decisions, objective
tests are preferred for making
fair, accurate evaluations of and comparisons between job
candidates. In the majority of situ-
ations, not everything can be objectively measured, so both
subjective and objective tests are
used to combine the specificity of objective tests and the
richness of subjective tests.
Score Interpretation
After a test has been scored, the score needs to be interpreted.
Once again, this process must
be standardized. Additionally, to be interpreted properly, a
person’s score on a test must be
compared to other people’s scores on the same test. I/O
psychologists use a standardization
sample, which is a large group of people who have taken the
test against whose scores an
individual’s score can be compared. The comparison scores
provided by the standardization
sample are called test norms. The demographics of the
standardization sample can be used
to establish test norms for various racial and ethnic groups, men
and women, and groups of
different ages and various education levels.
Let’s look at an example of how test-score interpretation works.
As part of a selection process,
an applicant might answer 30 out of 40 questions correctly on a
multiple-choice cognitive
ability test. By itself, this score provides little information
about the applicant’s level of cogni-
tive ability. However, if we can compare it to how others
performed on the same test—spe-
cifically, the test norms established by a standardization
sample—we will be able to ascribe
some meaning to the score.
Often, raw scores, the number of points a person scores on a
test, are transformed into per-
centile scores, which tell the evaluator the percentage of people
in the standardization sample
who scored below an individual’s raw score. Continuing the
example above, if the score of 30
falls in the 90th percentile, 90% of the people in the
standardization sample scored lower than
our fictional applicant.
Test Reliability
As introduced earlier, test reliability refers to the dependability
and consistency of a test’s
measurements. If a person takes a test several times and scores
similarly on it each time, that
test is said to measure consistently. If a test measures
inconsistently, then outside factors
must be influencing the results. For example, if an applicant
takes a mechanical ability test one
week and correctly answers 90 out of 100 items, but then takes
another form of the test the
following week and gets only 50 out of 100 items correct, the
test evaluator must ask whether
the tests are actually doing what they’re supposed to be doing—
measuring mechanical abil-
ity—or if something else is influencing the scores. Examples of
common but often unreliable
interview questions include "tell me about yourself" and
“discuss your strengths and weak-
nesses.” Such questions are unreliable because the answer may
vary widely depending on the
applicant’s mood or recollection of recent events. Moreover,
interpretation of the answers
is subjective and depends on whether the interviewer likes or
dislikes what the applicant
says. There is also a very limited scope for comparing answers
across applicants to determine
which answers are higher or lower quality. On the other hand,
more targeted and job-related
questions—such as “tell me about a situation where you had to .
. .” or “what would you do
if you were faced with this situation”—are more likely to yield
consistent and comparable
responses. Thus, before trusting scores from any test that
measures inconsistently, I/O psy-
chologists must discover what is influencing the scores.
The test taker’s emotional and physical state can influence his
or her score. A person’s mood,
state of anxiety, and level of fatigue may change from one test-
taking time to another, and
these factors can have a profound effect on test performance.
Illness can also impact perfor-
mance. If a person is healthy the first time she takes a test, but
then has a cold when she takes
the test again, her score will likely be lower the second time
around.
Changing environmental factors can also make a test measure
inconsistently. Differences
from one testing environment to another, such as room lighting,
noise level, temperature,
humidity, and equipment, will affect a person’s performance, as
will the relative completeness
or incompleteness of instructions and manner in which they are
given.
Differences between versions of the same test also influence
reliability. Many tests have
more than one version or form (the written portion of a driver’s
license test is an example).
Although the test versions aim to measure the same knowledge,
the test items or questions
vary. If the questions in one version are more difficult than
those in another, the test taker
may perform better on one version of the test.
Finally, some inconsistency in test scores stems from real
changes in the test taker’s KSAOs.
These changes often appear if a significant period of time
passes between tests. High school
students taking the SAT, for example, may show vast increases
in scores from their junior year
to their senior year because they have improved their cognitive
ability, subject knowledge,
and/or test-taking skills.
Measures of Reliability
Typically, reliability is measured by gathering scores from two
sets of tests and then deter-
mining their association. The reliability coefficient states the
correlation—or relation-
ship—between the two score sets, and ranges from 0 to +1.00.
Although we won’t go into
mathematical details of calculating correlations here, it is
important to understand that the
closer the score sets approach a perfect +1.00 correlation, the
more reliable the test. For
employment tests, reliability coefficients above +.90 are
considered excellent, and those
above +.70 are considered adequate. Tests with reliability
estimates lower than +.70 may
possess sufficient errors to make them unusable in employment
situations. I/O psychologists
measure test reliability with the internal consistency, test–
retest, interrater, alternate-form
(or parallel-form), and split-halves methods described below.
Internal Consistency Reliability
Internal consistency reliability assesses the extent to which
different test items or questions
are correlated and thus consistently measure the same trait or
characteristic. For example,
if a 20-question test is used to measure extroversion, then
scores on these 20 items should
be highly correlated. If they are not, then some of the items may
be measuring a different
concept. The most common measure of internal consistency
reliability is Cronbach’s alpha,
which is an overall statistical measure of all the
intercorrelations across items in a particular
test. It can also pinpoint which items are poorly performing and
which should be removed to
improve the test’s internal consistency.
Test–Retest Reliability
Test–retest reliability involves administering the same test to
the same group of people at two
different times and then correlating the two sets of scores. To
the extent that the scores are
consistent over time, the reliability coefficient shows the test’s
stability (see Figure 3.1). This
method has a few limitations. First, it can be uneconomical,
because it requires considerable
time for employees to complete the tests on two or more
occasions. Second, it can be difficult
to determine the optimal length of time that should pass
between one test-taking session
and the next. If the interval is short, say, a few hours, test
takers may remember all the ques-
tions and simply answer everything the same way on the retest,
which could artificially inflate
the reliability coefficient. Conversely, waiting too long between
tests, say, 6 months to 1 year,
could cause retesting scores to be affected by changes that
result from outside learning. This
can artificially reduce the test’s reliability. The best time
interval is therefore relatively short,
such as a few weeks or up to a few months.
Interrater Reliability
As the name implies, and as introduced in earlier examples,
interrater reliability involves
allowing more than one evaluator (rater) to assess each
candidate’s performance, and then
correlating the scores of the raters across candidates. To the
extent that the scores are con-
sistent across raters, the test is considered more reliable. This
method is particularly relevant
for subjective tests that are prone to personal interpretation.
Even with well-designed rubrics
and valid questions, these methods introduce some variability in
assessment across candi-
dates and raters. Interrater reliability ensures that such
variability is under control and insuf-
ficient to bias the results of the process or alter hiring
decisions. Various advanced statistical
methods are available to take into account both the consistency
of the absolute scores of the
candidates across raters and their relative scores and rankings
compared to each other. Both
types of consistency are important. For example, absolute
scores can affect selection deci-
sions in situations where there is a cutoff score, such as in the
case of college admissions and
certifications. Relative scores and rankings can affect
promotion decisions or come into play
when only a predetermined number of candidates (e.g., top five)
can be selected.
Figure 3.1: Test–retest reliability

One way to determine a test's reliability is to conduct a test–retest. The more consistent test scores are over time, the more reliable the test is thought to be. [The figure shows two panels plotting mechanical comprehension at the first test administration against the second, one illustrating high reliability and one illustrating low reliability.]

From Levy, P.E. (2016). Industrial/organizational psychology: Understanding the workplace (5th ed.), p. 26, Fig. 2.3. Copyright 2017 by Worth Publishers. All rights reserved. Reprinted by permission of Worth Publishers.
Alternate-Form Reliability
Alternate- (or parallel-) form reliability has to do with how
consistent test scores are likely to
be if a person takes two similar, but not identical, forms of a
test. As with test–retest reliability,
this measure requires two sets of scores from a group of people
who have taken two varia-
tions of a test. If the score sets yield a high parallel-form
reliability coefficient, then the tests
are not materially different. If the reverse happens, the tests are
probably not equivalent and
therefore cannot be used interchangeably.
The major limitations of parallel-form reliability are that it is
time consuming and costly. Fur-
thermore, it requires the test developer to design two versions
of a test that both cover the
same subject matter and are equivalent in difficulty and reading
level.
Split-Halves Reliability
Split-halves reliability tests are more cost effective than either
test–retest or parallel-form
reliability because they can be assessed with scores from one
test administered just one time.
After the test has been given, it is split in half, and scores from
the two halves of the test are
correlated. A high reliability coefficient indicates that each
section of the test is consistently
measuring similar content, whereas the reverse is true with a
low reliability coefficient.
The tricky part of split-halves reliability is determining how
best to split the test. For example,
if test items increase in difficulty as the test progresses, or if
the first half of the test contains
fewer difficult questions than the second, it won’t work to
simply split the test down the
middle and compare the scores from each half. To solve this
dilemma, tests are often split by
odd- and even-numbered questions.
Find Out for Yourself: Test Reliability
This exercise is designed to help you grasp the challenges
involved in designing a reliable test.
To begin, choose a trait or skill in which you are interested. For
example, consider an academic
subject, mastery of an online game, or a personality trait that
you admire. Then, write 10 state-
ments you believe to be good measures of the selected trait or
skill. Ask three friends or family
members to rate themselves on each statement on a scale of 1–5
(1 = strongly disagree, 5 =
strongly agree). Finally, without looking at their scores, rate
each individual on the same 10
statements based on your own perceptions of that individual.
What Did You Learn?
1. Assess interrater reliability: Add up each individual’s scores
based on his or her own
assessment. Rank order the scores. Add up each individual’s
scores based on your
assessment of that individual. Rank order the scores. Did the
rankings change?
2. Assess split-halves reliability: Add up each individual’s
scores on the first five questions.
Add up each individual’s scores on the second set of five
questions. Are the scores of
each individual on the two test halves similar? Rank order the
three individuals based
on their first five questions. Rank order them again based on the
second set of five ques-
tions. Did the rankings change?
3. Ask the same three individuals to rate the same statements
again a week later. Assess
test–retest reliability: Add up each individual’s scores on the
first time he or she took
the test. Add up each individual’s scores on the second time he
or she took the test. Are
the scores similar? Rank order the three individuals based on
their first test. Rank order
them again based on their second test. Did the rankings change?
As you can probably appreciate from this exercise, anyone can
“whip up” a test, but the test
may be highly subjective and unreliable if the scores are not
consistent across raters, ques-
tions, and times of test administration. You probably now have
some idea as to how to improve
your test. Similarly, scientific test design requires numerous
iterations of writing and rewrit-
ing items and statistically examining results with multiple
samples to ensure reliability before
the test can be used.
Test Validity
Validity is the most important aspect of an employment test.
Although a test is reliable if it is
able to make consistent measurements, a test is valid if it truly
measures the characteristics it
is supposed to measure. An employment test may yield reliable
scores, but if it doesn’t mea-
sure the skills that are needed to perform a job successfully, it
is not very useful. I/O psycholo-
gists use three methods to establish test validity: criterion-
related validity, content validity,
and construct validity.
Criterion-Related Validity
The purpose of criterion-related validity is to establish a
predictive, empirical (number-based)
link between test scores and actual job performance (see Figure
3.2). To do this, I/O psycholo-
gists compare applicants’ employment test scores with their
subsequent job performance.
This correlation is called the validity coefficient, and it can range from −1.00 to +1.00. Tests that yield
validity coefficients ranging from +.35 to +.45 are considered
useful for making employment
decisions, whereas those with validity coefficients of less than
+.10 probably have little rela-
tionship to job performance.
I/O psychologists use two different methods to establish
criterion-related validity: predictive
validity and concurrent validity. Predictive validity involves administering a new test to
all job applicants but not using the test scores in the hiring decision. Instead,
the scores are filed away to be analyzed later. After a time,
managers will have accumulated
performance ratings or other information that indicates how new
hires are performing on
the job. At that point, the new hires’ preemployment test scores
will be correlated with their
performance, and the organization can look at how successfully
the test predicted perfor-
mance. If the test proves to be a valid predictor of performance,
it can be used in future hiring
decisions. Although this approach is considered the gold
standard of test validation, many
organizations are unwilling to use the predictive validity
method because filing away employ-
ment test scores lets some unqualified applicants slip through
the preemployment screening
process. However, developers of scientifically designed tests regularly use this method to
validate their tests before releasing them for use.
The concurrent validation approach is more popular because,
instead of job applicants, on-
the-job employees are used to establish the test’s validity. With
this method, both the current
employees’ test scores and their job performance ratings are
gathered at the same time, and
test validity is established by correlating these measures.
Organizations appreciate the cost-
effectiveness offered by concurrent validation. Tests that are
found to be of high concurrent
validity based on the results from current employees are then
used to assess job applicants.
Other forms of concurrent validation include convergent and
divergent validity. Convergent
validity refers to the correlation between test scores and scores
on other related tests. For
example, SAT and ACT tests are expected to correlate, so the
validation of new SAT questions
may involve examining their correlation with ACT questions.
Divergent validity refers to the
correlation between test scores and scores on other tests or
factors that should not be related.
For example, test scores should not be related to gender, race,
religion, or other protected
classes. To ensure that a test is not discriminatory, the
correlation between its scores and
each of these factors can be examined. Divergent validity is
supported when these correla-
tions are low or statistically insignificant. Taken together,
convergent and divergent validity
examine the extent to which a test relates to what it should be
related to and does not relate
to what it should not be related to, respectively.
Figure 3.2: Criterion-related validity

In order to establish a connection between test scores and actual job performance, I/O psychologists use criterion-related validity. Tests with high correlations between scores and job performance are considered to be high-validity tests and can be useful for making employment decisions. Tests with low correlations between scores and job performance are considered low-validity tests and would not be ideal for assessing job performance. [The figure shows two panels plotting cognitive ability score against job performance, one illustrating high validity and one illustrating low validity.]

From Levy, P.E. (2016). Industrial/organizational psychology: Understanding the workplace (5th ed.), p. 29, Fig. 2.4. Copyright 2017 by Worth Publishers. All rights reserved. Reprinted by permission of Worth Publishers.

Despite the time- and cost-saving advantages, concurrent validation does have a number of
drawbacks. First, the group of employees who validate a test could be very different from the
group of applicants who actually make use of the test. The
former would likely skew the vali-
dation because lower performing employees (and their low
scores) would have already been
removed from their positions and thus not be part of a test-
validating process. This can cause
the validity of the test to appear higher than it really is. On the
other hand, employees may
also skew the validation by not trying as hard as job applicants.
Employees who already have
jobs might not be as motivated to do their best as applicants
eager for a job would be. This can
cause the test’s validity to appear lower than it really is.
Content-Related Validity
Content-related validity is the rational link between the test
content and the critical job-related
behaviors. In other words, test items should be directly related
to the important require-
ments and qualifications for the job. The rationale behind
content-related validation is that if
a test samples actual job behaviors, then individuals who
perform well on it will also perform
well on the job. Remember that the goal of testing is to predict
performance.
As you can probably guess, content-related validation studies
rely heavily on information
gathered from a job analysis. If test questions are directly
related to the specific skills needed
to perform a job, the test will have high content-related validity.
For example, a test for admin-
istrative professionals might ask questions related to effective
filing methods, schedule man-
agement, and typing techniques. Because these skills are
important for the administrative
professional position, this test would have high content-related
validity. Content validity can-
not be evaluated numerically or statistically as readily as
criterion validity. It is often qualita-
tively evaluated by subject matter experts. However, qualitative
evaluations can be quantified
using well-designed rubrics and assessed for reliability and
consistency.
Construct Validity
Construct validity is the extent to which a test accurately
assesses the abstract personal attri-
butes, or constructs, that it intends to measure. Although there
are valid measures of many
personality traits, numerous invalid measures of the same traits
can be found in less scientific
sources such as magazines or the Internet. Because constructs
are intangible, it can be chal-
lenging to design tests that measure them.
How do we know if tests of personality, reasoning, or
motivation actually measure these
intangible, unobservable characteristics? One way to establish
construct validity is to cor-
relate a new test with an established test that is known to
measure the construct in question.
If the new test correlates highly with the established test, the
new test is likely measuring the
construct it is intended to measure.
Validity Generalization
Initially, I/O psychologists thought that validity evidence was
situation specific; that is, a test
that was validated and used for applicants for one job could not
be used for applicants for a
different job unless an additional job-specific validation study
was performed. Further, I/O
psychologists believed that tests that had been validated for a
position at one company could
not be used for the same position at a different company—
again, unless it was validated for
the second company, which is a tedious and costly process.
However, research over the past few decades has shown that
validity specificity is unfounded
(Lubinski & Dawis, 1992). I/O psychologists now believe that
validity evidence transfers
across situations, a notion that is referred to as validity
generalization. Researchers discov-
ered that the studies that supported validity specificity were
flawed (Schmidt & Hunter, 1981).
Validity generalization has been a huge breakthrough for
organizations. Establishing validity
evidence for every employment test in every situation was, for
most organizations, both cost
and time prohibitive. Because of its cost- and time-saving
benefits, the advent of validity gen-
eralization has meant that organizations are more willing to
integrate quality tests into their
employment practices.
Face Validity
Face validity is not a form of validity in a technical sense; rather, it is an applicant's
subjective impression of how job relevant a test appears to be. For
example, a bank teller would find
nothing strange about taking an employment test that dealt with
numerical ability or money-
counting skills, because these skills are obviously related to job
performance. On the other
hand, the applicant may not see the relevance of a personality
test that asks questions about
personal relationships. This test would thus have low face
validity for this job.
Organizations need to pay close attention to their applicants’
perceptions of a test’s face valid-
ity, because low face validity can cause an applicant to feel
negatively about the organization
(Chan, Schmitt, DeShon, Clause, & Delbridge, 1997; Smither,
Reilly, Millsap, Pearlman, & Stof-
fey, 1993). If organizations have the opportunity to pick
between two tests that are otherwise
equally valid, they should use the test with the greater level of
face validity.
Find Out for Yourself: Your Personality Type
Visit the 16 Personalities website and take the personality-type
assessment provided. Read
your results, then visit the Encyclopedia Britannica entry on
personality assessment and read
about the reliability and validity of assessment methods.
16 Personalities
Reliability and Validity of Assessment Methods
What Did You Learn?
1. What have you learned about yourself through this
assessment?
2. Is this assessment accurate? Which types of validity and
reliability apply to it?
3. Based on this assessment, what are some examples of jobs
that would fit your type?
https://www.16personalities.com/free-personality-test
https://www.britannica.com/science/personality-assessment/Reliability-and-validity-of-assessment-methods
Qualitative Methods
Most of the methods discussed in this chapter are quantitative in
nature. However, employ-
ers often find it necessary to take into consideration other
factors that are not necessarily
quantifiable but can be very important in employee selection.
Qualitative selection methods
may include observation, unstructured interview questions,
consultation with references, or
solicitation of opinions from past or future managers and
coworkers. Qualitative methods can
yield a much richer and broader array of information about an
applicant that may be impos-
sible to capture using quantitative methods.
Unfortunately, however, qualitative methods are much harder to
assess in terms of validity
and reliability, and thus they may lead to erroneous decisions.
Their subjectivity can also lead
to discriminatory decisions. Qualitative methods can be
particularly problematic in ranking
applicants. Without a predetermined set of evaluation criteria
and a rating scale, interrater
reliability can be very low. That is why psychologists attempt to
quantify even what would
be considered qualitative criteria—such as person–organization
fit and person–job fit—by
creating survey measures of these factors. With the help of I/O
psychologists, employers can
also create quantitative scoring themes for qualitative data to
increase the integrity and legal-
ity of these methods.
3.4 Test Formats
Thousands of employment tests are on the market today. Naturally, no two tests are alike;
they differ in their construction and administration. Tests
vary in their quality depending
on the rigor of their validation processes. They also vary in cost
depending on their extensive-
ness and popularity. However, it is important to note that
quality and cost do not always go
hand in hand. Some of the most valid and reliable tests are
available in the scientific literature
free of charge, but they are not very popular among
practitioners, who are often unfamiliar
with the scholarly literature. On the other hand, some of the
popular and expensive tests
marketed by well-known consulting companies have
questionable validity and reliability. In
some cases these tests are accepted on face validity alone and are never statistically
analyzed. In
other cases the test developers make sure their tests are valid
and reliable but are reluctant to
publicly share their analyses for proprietary reasons. In all
cases prudent employers should
demand evidence of validity and reliability in order to ensure
that they are (a) investing their
time and resources in the right selection tools that will yield the
most qualified workforce and
(b) using legally defensible and nondiscriminatory methods.
Commonly used test-design for-
mats include assessment centers, computer-adaptive tests, speed
and power tests, situational
judgment tests, and work-sample tests.
Assessment Centers
An assessment center is one of the most comprehensive tests
available and is often used to
select management and sales personnel because of its ability to
assess interpersonal, commu-
nication, and managerial skills. A typical assessment center
includes personality inventories,
cognitive assessments, and interviews, as well as simulated
activities that mimic the types of
activities performed on the job. Common types of simulated
activities include in-basket tasks,
leaderless group discussions, and role-play exercises.
Although assessment centers can predict the level of success
both in training and on the job,
they have a number of limitations. First, assessment centers are
expensive to design and
administer, because administrators must be specifically trained
to evaluate discussions and
perform role plays. Assessment centers for senior management
positions can cost more than
$10,000, a price tag that is prohibitive for many organizations.
Second, because scoring an
assessment center relies on the judgment of its assessors, it can
be difficult to standardize
scores across time and location. This issue can be mitigated by
training assessors to evaluate
behaviors against an established set of scoring criteria.
Computer-Adaptive Tests
Typical tests include items that sample all levels of a
candidate’s ability. In other words, they
contain some questions that are easy and will be answered
correctly by almost all test tak-
ers, some that are difficult and will be answered correctly by
only a few, and some that are
in-between. A computer-adaptive test, however, tailors the test
to each test taker’s individual
ability.
In this type of test, the candidate begins by answering a
question that has an average level of
difficulty. If he or she answers correctly, the next question will
be more difficult; if he or she
answers incorrectly, the next question will be easier. This
process continues until the candi-
date’s proficiency level is determined. The fact that candidates
do not waste time answering
questions of inappropriate difficulty is a clear advantage of the
computer-adaptive test. Addi-
tionally, because each test is tailored to the individual, test
security (i.e., cheating) is less of a
concern.
Speed and Power Tests
Tests can be designed to assess either an individual’s depth of
knowledge or rate of response.
The first type of test is called a power test. Power tests are
designed to be difficult, and very
few individuals are able to answer all of the items correctly.
Test takers receive either a gen-
erous time limit or no time limit at all. The overall purpose of
the power test is to evaluate
depth of knowledge in a particular domain. Therefore, response
accuracy is the focus, instead
of response speed.
Speed tests contain a homogeneous content set, and test takers
receive a limited amount of
time to complete the test. These tests are well suited to jobs in
which tasks must be per-
formed both quickly and accurately, such as bookkeeping or
word processing. For these jobs,
a data-entry test would be an appropriate speed test for
measuring an applicant’s potential
for success.
Situational Judgment Tests
A situational judgment test is a type of job simulation that is
composed of a number of job-
related situations designed to assess the applicant’s judgment.
Each situation includes mul-
tiple options for how to respond. The applicant must select the
options that will produce
the most and least effective outcomes. Statistically, situational
judgment tests have validities
comparable to structured interviews, biographical data, and
assessment centers (Schmidt &
Hunter, 1998).
Situational judgment tests are frequently admin-
istered to candidates for management positions,
although research shows them to be predictive
of performance for a wide variety of jobs. Stud-
ies have found validity evidence for situational
judgment tests’ ability to predict supervisory
performance (Motowidlo, Hanson, & Craft, 1997)
and to predict job performance for sales profes-
sionals (Phillips, 1992), insurance agents (Dales-
sio, 1994), and retail store employees (Weekley
& Jones, 1997).
Work-Sample Tests
Work-sample tests evaluate an applicant’s level of
performance when demonstrating a small sample
set of a job’s specific tasks. The two general areas for work-
sample tests are motor skills and
verbal abilities. A test that requires a machinist applicant to
properly operate a drill press is
an example of a motor skills work-sample test; a test that asks a
training applicant to present a
portion of the organization’s training program is a verbal ability
work-sample test.
One advantage of work-sample tests is that they are closely tied to actual job tasks, so applicants perceive them as having a high degree of face validity. Additionally, these
tests provide applicants with a realistic job preview. The
disadvantage is that they can be
expensive to develop and administer.
[Photo: Situational judgment tests present applicants with job-related scenarios in order to assess their decision-making skills.]
Consider This: The Costs of Testing
1. For most of the tests discussed in this section, expense is a
major drawback. Why do you
think an organization would go to the trouble of developing and
using employment tests?
2. How might the expense of a test be justified, offset, or
overcome?
3.5 Testing for Individual Differences
People differ in psychological and physical characteristics, and
identifying and categorizing
people in respect to these differences is important for
successfully predicting both job per-
formance and job satisfaction. The most commonly tested
categories of individual differences
are cognitive ability, physical ability, personality, integrity, and
vocational interests. Each of
these categories has an important theoretical foundation as well
as its own set of advantages
and disadvantages.
Cognitive Abilities
The past hundred years of study have produced two distinct
concepts of cognitive ability.
Beginning with Spearman’s seminal research in 1904 on general
intelligence, one concept is
based on the belief that cognitive ability is a single, unitary
construct (called the g, or general,
factor) along with multiple subfactors (called s factors).
According to this two-factor, or hier-
archical, theory of cognitive ability, the g factor is important to
all cognitive performance,
whereas s factors influence specific intelligence domains. For
example, your performance on
a math test will be influenced by both your general, overall
intelligence (the g factor) and your
knowledge of the specific math topic being tested (an s factor).
From test to test, your scores
will be strongly correlated, because all performance is
influenced by the g factor; however, the
influence of s factors will keep the correlation from being
perfect. So, although a high g factor
of overall intelligence might mean that you would score higher
than most on both a math test
and a verbal reasoning test, your math test scores could be
lower than your verbal reasoning
scores simply because you never took a class that covered the
specific topics on the math test
(an s factor).
Beginning with Thurstone's research in 1938, scientists challenged Spearman's theories by
proposing that cognitive ability was a combination of multiple
distinct factors, with no over-
arching factor. Using a statistical technique called factor
analysis, Thurstone and his col-
leagues identified seven primary mental abilities: spatial
visualization, number facility, verbal
comprehension, word fluency, associative memory, perceptual
speed, and reasoning (Thur-
stone, 1947). This theory suggests that employment tests should
evaluate the primary men-
tal abilities that are most closely linked to a specific job. For
example, a test for engineering
applicants would focus on spatial and numerical abilities.
Although there is no consensus, research has supported
Spearman’s hierarchical model of
cognitive ability (Carroll, 1993; Schmid & Leiman, 1957).
Consequently, employers tend to
use tests to measure both general intelligence and specific
mental domains. Those that focus
on the g factor are called general cognitive ability tests. They
measure one or more broad
mental abilities, such as verbal, mathematical, or reasoning
skills. General cognitive ability
tests can be used to evaluate candidates for almost any job,
especially those in which cogni-
tive abilities such as reading, computing, analyzing, or
communicating are involved. Specific
cognitive ability tests measure the s factors and focus on
discrete mental abilities such as reac-
tion time, written comprehension, and mathematical reasoning.
These tests must be closely
linked to the job’s specific functions.
Cognitive ability tests are among the most widely used tests
because they are highly effec-
tive at predicting job and training success across many
occupations. In their landmark study,
Schmidt and Hunter (1998) examined validity evidence for 19
different selection processes
from thousands of studies over an 85-year period. After
compiling a meta-analysis (which is
a combination of the results of several studies that address a set
of related research hypoth-
eses), Schmidt and Hunter found that cognitive ability test
scores correlated with job perfor-
mance at .51 and training success at .53, which were the highest
validity coefficients among
all the types of tests they examined. Other researchers found
similar validities using data
from European countries (Bertua, Anderson, & Salgado, 2005;
Salgado, Anderson, Moscoso,
Bertua, & Fruyt, 2003). Interestingly, additional research has
found that a job’s complexity
positively affects the validity of cognitive ability tests. In other
words, the more complex the
job, the better the test is at predicting future job performance.
For jobs with low complexity,
on the other hand, high cognitive ability test scores are less
important for predicting success-
ful job performance (Hunter, 1980; Schmidt & Hunter, 2004).
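As a rough illustration of the arithmetic at the core of a meta-analysis, the sketch below averages the validity coefficients reported by several hypothetical studies, weighting each by its sample size. Published meta-analyses such as Schmidt and Hunter's go further, correcting the coefficients for range restriction and measurement unreliability, which this sketch omits.

# Bare-bones sketch: a sample-size-weighted average of validity
# coefficients. The study values below are hypothetical.

def weighted_mean_validity(studies):
    """studies: list of (sample_size, observed_correlation) pairs."""
    total_n = sum(n for n, _ in studies)
    return sum(n * r for n, r in studies) / total_n

studies = [(120, 0.45), (300, 0.55), (80, 0.38)]
print(round(weighted_mean_validity(studies), 2))  # weighted estimate, about 0.50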
The most popular general cognitive ability test is the Wonderlic
Cognitive Ability Test. Devel-
oped by Eldon F. Wonderlic in the 1930s, this test contains 50
items, has a 12-minute time
limit, and is used by both private businesses and
government agencies. Test norms have been set by
more than 450,000 working adults and are avail-
able for over 140 different jobs. Test content cov-
ers numerical reasoning, verbal comprehension,
and spatial ability. The test begins with very easy
questions and progresses to very difficult ones;
due to this range and the short time limit, few peo-
ple are able to answer all 50 questions correctly.
The test is offered in a variety of versions, includ-
ing computer- and paper-based formats, and is
now available in nine languages, including French,
Spanish, British English, and American English.
In all, more than 130 million job applicants have
taken the Wonderlic.
The Wechsler Adult Intelligence Scale (WAIS), first developed by David Wechsler in 1955 and currently in its fourth edition, is another commonly used general cognitive ability test. It differs from the Wonderlic in both length and scope and is composed of
11 different tests (6 verbal
and 5 performance), requiring 75 minutes to complete. The 6
verbal tests are comprehension,
information, digit span, vocabulary, arithmetic, and similarities.
The performance tests are
picture completion, picture arrangement, object assembly, digit symbol, and block design.
Naturally, this complex psychological assessment requires well-
trained administrators to
ensure accurate scoring and score interpretation. The WAIS is typically used when selecting for senior management or other positions that require complex cognitive thinking.
Outside the world of work, a person’s cognitive ability also
predicts his or her academic suc-
cess. Using meta-analytic research, psychologists examined the
relationship between the
Miller Analogies cognitive ability test (a test commonly used to
select both graduate students
and professional-level employees) and student and professional
success. Interestingly, results
showed that there was no significant difference between the
cognitive abilities required for
academic and business success (Kuncel, Hezlett, & Ones, 2004).
Although they are valid performance predictors for many jobs, cognitive ability tests can produce different selection rates across demographic groups.
Whites typically score higher
than African Americans, and there is much concern that these
differences are due to bias
within the test. If a test is biased to favor one ethnic group or
population over others, then
any employment selection or promotion program that utilizes it
will be inherently flawed.
Discovering bias within a test can be extremely difficult, but
I/O psychologists have been able
to reduce the impact of potentially biased cognitive ability tests
by adding noncognitive tests,
such as personality tests, to selection processes (Olson-
Buchanan et al., 1998).
Physical Abilities
Many jobs require a significant amount of physical effort.
Firefighters and police officers,
for example, may need physical strength, and mechanics may
need finger dexterity. Other
examples of hazardous or physically demanding work
environments include factories, power
plants, and hospitals. Organizations must be careful to use
information from a job analysis to
understand the specific physical requirements of a job before
purchasing or developing any
work-sample tests to use in the selection process. Fleishman
(1967) identifies nine physical
ability characteristics present in many jobs (see Table 3.1).
Measures of each physical ability
are not strongly correlated, however, which suggests that there
is no overall measure of gen-
eral physical ability.
Table 3.1: Fleishman's physical ability dimensions

Static strength: Maximum muscle force exerted by muscle groups (e.g., legs, arms, hands) over a continuous period of time
Explosive strength: Explosive bursts of muscle energy over a very short duration to move the individual or an object
Dynamic strength: Repeated use of a single muscle over an extended period of time
Trunk strength: Ability of back and core muscles to support the body over repeated lifting movements
Extent flexibility: Depth of flexibility of arms, legs, and body
Dynamic flexibility: Speed of flexibility of arms, legs, and body
Gross body coordination: Ability to coordinate arms, legs, and body to perform activities requiring whole-body movements
Gross body equilibrium: Ability to coordinate arms, legs, and body to maintain balance and remain upright in unstable positions
Stamina: Ability to exert oneself physically over a long duration
Research consistently demonstrates that physical ability tests
predict job performance for
physically demanding jobs (Hogan, 1991). Identifying
individuals who cannot perform the
essential physical functions of a job—especially in hazardous
positions such as police officer,
firefighter, and military personnel—can minimize the risk of
physical harm to the job candi-
date, other employees, and civilians. Another positive feature of
physical ability tests is that
they are not strongly correlated with cognitive ability tests,
which as mentioned earlier tend to
be biased. Thus, using physical ability tests in conjunction with
cognitive ability tests can help
decrease potential bias and make job performance predictions
more accurate (Carroll, 1993).
I/O psychologists must be careful to design and measure
physical ability tests so they do
not discriminate against minority groups. Unfortunately,
although a job may legitimately
need candidates who possess specific physical skills, the
standards and measures of those
skills are often arbitrarily or inaccurately made. For example,
height and weight are often
used as a proxy for physical strength. Even though these
measurements are quick and easy
to make, they are not always the most accurate, and they have
resulted in the underselection
of women for physically demanding jobs (Meadows v. Ford
Motor Co., 1975). Companies that
fail to implement accurate, careful measurements of an applicant's ability to perform a job's necessary, specific physical requirements run the risk of discriminating against protected classes, a misstep that can land them in court, where judges and juries have often ruled in favor of plaintiffs and awarded large settlements.
One example of a valid, commercially available physical ability
test is the Crawford Small Parts
Dexterity Test. This test is used to assess the fine motor skills
and manual dexterity required
for industrial jobs. It examines applicants’ ability to place small
objects into small holes on
a board. For their first task, test takers use a pair of tweezers to
place 36 pins into holes and
position a collar around each pin. In their second task, test
takers use a screwdriver to insert
36 screws into threaded holes. The test is scored in two ways:
by measuring the amount of
time taken to complete both parts of the test and by measuring
the number of items com-
pleted during a set time limit (3 minutes for part 1 and 5
minutes for part 2).
Personality
I/O psychologists have studied how personality affects the
prediction of job performance
since the early 20th century. After examining 113 studies
published from 1913 to 1953, Ghis-
elli and Barthol (1953) found positive but small correlations
between personality and the
prediction of job performance. The researchers were surprised
that the correlation was not
stronger and suggested the companies had used personality tests
with weak validity evidence.
Developing and using tests with stronger validity evidence, they
concluded, would facilitate
better predictions of applicants’ future job performance.
Guion and Gottier (1965) disagreed with this notion, suggesting
instead that personality
measures were unrelated to job performance—even though the
two noted that their data
came from studies that used poorly designed or theoretically
unfounded personality mea-
sures. In fact, most researchers at the time developed their own
concepts of personality and
created tests to match them, which naturally led to considerable
inconsistency in the ways
they measured personality constructs. Thus, although
organizations continued to use person-
ality tests to select candidates for management and sales
positions, academic research in this
area waned for the next 20 years.
In the early 1990s Barrick and Mount’s (1991) landmark meta-
analysis established the five-
factor model of personality, now commonly referred to as the
Big Five personality factors—
extraversion, agreeableness, conscientiousness, neuroticism,
and openness to experience—
which are described in detail in Table 3.2. This model is
broader than earlier theories and
therefore lends itself more readily to a useful classification for
the interpretation of personal-
ity constructs (Digman, 1990).
Table 3.2: Big Five personality factors

Neuroticism (also referred to as adjustment): Insecure, untrusting, worried, guilty
Extraversion (also referred to as sociability): Sociable, gregarious, fun, a people person
Openness to experience (also referred to as inquisitiveness): Risk taking, creative, independent
Agreeableness (also referred to as interpersonal sensitivity): Empathetic, approachable, courteous
Conscientiousness (also referred to as mindfulness): Dependable, organized, hardworking
The most important advantage of the five-factor model is the
functional structure it provides
for predicting the relationships between personality and job
performance. Barrick and Mount
(1991) reviewed 117 criterion-related validation studies
published between 1952 and 1988
that measured at least one of the five personality factors.
Results showed that conscientious-
ness (a measure of dependability, planfulness, and persistence)
was predictive of job perfor-
mance for all types of jobs. Further, extraversion (a measure of
energy, enthusiasm, and gre-
gariousness) predicted performance in sales and management
jobs. The other three factors
were found to be valid but were weaker predictors of some
dimensions of performance in
some occupations. That same year, Tett, Jackson, and Rothstein
(1991) found strong positive
predictive evidence not only for conscientiousness and
extraversion but also for agreeable-
ness and openness to experience. On the other hand, they found
neuroticism to be negatively
related to job performance. These researchers also discovered
that validity was higher among
studies that referenced job analysis information to create tests
that linked specific personal-
ity traits with job requirements. In summary, then, measures of
the Big Five personality fac-
tors can significantly predict job performance, but to do so, they
must be carefully aligned
with critical job functions.
In addition to strong criterion-related validity, measures of the
Big Five factors also gener-
ally show very little bias. Across a number of personality
factors, score differences across
racial groups and/or genders are minor and would be unlikely to
adversely impact employ-
ment decisions; two areas that fall outside this generalization
are agreeableness, in which
women score higher than men, and dominance (an element of
extraversion), in which men
score higher (Feingold, 1994; Foldes, Duehr, & Ones, 2008).
Because they produce almost no
adverse impact, personality tests can be used in conjunction
with cognitive ability tests dur-
ing selection processes to increase validity while reducing the
threat of potential bias (Hough,
Oswald, & Ployhart, 2001).
The first test to examine the Big Five personality factors was
the NEO Personality Inventory,
named for the first three factors of neuroticism, extraversion,
and openness to experience.
Composed of 240 items, the test can be completed in 30 to 40
minutes and is available in a
number of languages, including Spanish, German, and British
English. However, research now
supports much shorter versions, as short as 10 items (Gosling,
Rentfrow, & Swann, 2003),
which are ideal for use in work settings, either separately or in
combination with other tests
and survey measures.
Types of Personality Tests
There are two basic types of personality tests: projective tests
and self-report inventories. The
former presents test takers with an ambiguous image, such as an
inkblot, and asks them to
describe what they see. Trained psychologists then interpret the
descriptions. The rationale
for this type of test is that test takers will project their
unconscious personalities into their
descriptions of the image. Two examples of projective tests are
the Rorschach (inkblot) test
and the Thematic Apperception Test.
Projective tests are most often used by clinical psychologists
and are rarely used in employee
selection processes because they are expensive, are time
consuming, and require professional
interpretation. Instead, employers make use of self-report
personality inventories, which ask
individuals to identify how a situation, behavior, activity, or
feeling is related to them. The
rationale for this type of test is that test takers know themselves
well enough to make an
accurate report of their own personality. Some advantages of
the self-report inventories over
projective tests are cost-effectiveness, standardization of
administration practices, and ease
in scoring and interpreting results. For example, the website
below can help you assess your
Big Five personality traits.
Find Out for Yourself: Your Big Five Personality Traits
Visit the following website to find extensive information and
research about the Big Five per-
sonality traits, take the Big Five personality test, and get instant
feedback.
The Big Five Project Personality Test: http://www.outofservice.com/bigfive/
Unfortunately, self-report inventories have drawbacks. A major
one is the tendency of test
takers to distort or fake their responses. Because test items
usually have no right or wrong
answers, test takers can easily choose to provide socially
acceptable responses instead of
true answers in order to make themselves look better to the
hiring organization. Indeed, in
one controlled study, researchers instructed test takers to try to
respond to a personality
inventory in a way they felt would create the best impression,
which resulted in more posi-
tive scores (Hough, Eaton, Dunnette, Kamp, & McCloy, 1990).
Furthermore, a significant num-
ber of actual applicants who took personality inventories as part
of a selection process were
found to have distorted their responses to appear more
attractive, even without having been
told to do so (Stark, Chernyshenko, Chan, Lee, & Drasgow,
2001).
The real question for I/O psychologists is whether response
distortion significantly affects the
validity of personality inventories. Ones, Viswesvaran, and
Reiss (1996) conducted a meta-
analysis that examined the effects of social desirability on the
relationship between measures
of Big Five factors and both job performance and
counterproductive behaviors. They found
that test takers’ attempts to provide socially acceptable—not
necessarily truthful—answers
did affect their scores in the areas of emotional stability and
conscientiousness but did not seri-
ously influence test validity. However, faking answers can
influence hiring decisions by chang-
ing the rank ordering of job candidates (Christiansen, Goffin,
Johnston, & Rothstein, 1994).
One interesting and paradoxical finding of personality test research is that people who can recognize the socially acceptable answers, whether or not those answers accurately describe them, tend to perform better on the job than people who cannot (Ones et
al., 1996). How can this be? One explanation is that people in
the former group are better at
reading a situation’s subtle social cues and are therefore more
able to extrapolate what they
need to do to fulfill coworkers’ and managers’ expectations.
To balance the advantages and disadvantages of projective and
self-report tests, a new type of
tests, called implicit measures, has emerged. Implicit measures
are self-report tests in which
the questions are intentionally designed to make the purpose of
the test less obvious and thus
less amenable to faking and social desirability biases. For
example, the test taker may be given
a few seemingly neutral situations and a list of thoughts,
feelings, and actions and be directed
to select the ones that most closely represent him or her in each
situation. This intentional
vagueness allows implicit measures to assess a construct more
accurately and comprehen-
sively (Bing, LeBreton, Davison, Migetz, & James, 2007;
LeBel & Paunonen, 2011).
Honesty and Integrity
Organizations need to be able to identify individuals who are
likely to engage in dishonest
behaviors. Employee misconduct is more serious than distortion
of answers on a personality
test and can have a significant impact on both coworkers and
the organization as a whole.
Employee theft, embezzlement, and other forms of dishonesty
may cost American businesses
billions of dollars annually. According to the National White
Collar Crime Center, embezzle-
ment alone is estimated to cost companies as much as $90
billion each year (Bressler, 2009).
In the past, organizations used polygraph tests, but polygraph
test results are not always
accurate, and applicants may find them to be an invasion of
privacy. When the Employee Poly-
graph Protection Act was passed in 1988, most private
employers became unable to use these
tests, thus requiring them to find another way to assess honesty.
A more valid way to measure employee dishonesty is with an
integrity test. Integrity tests fall
into two categories: overt integrity tests and personality-based
integrity tests. The first type
assesses an individual’s direct attitudes and actions toward theft
and employment dishon-
esty. Test items typically ask individuals to consider their
opinions about theft behaviors or
to think of their own dishonest behaviors. Sample questions
include “Is it OK to take money
from someone who is rich?” and “Have you taken illegal drugs
in the past year?”
Personality-based integrity tests typically contain disguised-
purpose, or covert, questions
that measure various personality factors—such as responsibility,
virtue, rule following,
excitement seeking, anger, hostility, and social conformity—
that are related to both produc-
tive and counterproductive employee behaviors. Although overt
integrity tests can predict
theft and other glaring forms of dishonesty, personality-based
integrity tests are able to pre-
dict behaviors that are more subtly or secretly dishonest, such
as absenteeism, insubordina-
tion, and substance abuse.
Vocational Interests
Unlike most of the tests we have discussed so far, vocational
interest inventories are designed
for career counseling and should not be used for employee
selection. In these inventories,
test takers respond to a series of statements pertaining to
various interests and preferences.
In theory, people who share the preferences and interests of
successful workers in a given
occupation should experience a high level of job satisfaction if
they pursue that line of work.
Vocational interest scores predict future occupational choices
reasonably well, with between
50% and 60% of test takers subsequently working in jobs
consistent with their vocational
interests (Hansen & Dik, 2005). However, even though people
are likely to get a job doing
something in which they are interested, research does not
support the notion that vocational
interest will always lead to high job performance or
satisfaction. In fact, interest congruence
is only weakly related to job satisfaction and does not
effectively predict either job or train-
ing performance (Tranberg, Slane, & Ekeberg, 1993; Schmidt &
Hunter, 1998). Keep in mind
that just because someone is interested in a certain job does not
mean he or she will be able
to perform it well.
One frequently used measure of vocational interest is the Strong
Interest Inventory (SII),
previously known as the Strong Vocational Interest Blank. The
SII is a self-report inventory
composed of 291 items divided into six themes: artistic,
conventional, social, realistic, enter-
prising, and investigative. The test requires 25 minutes to
complete and is administered and
scored by computer. The results help test takers identify
occupations, leisure activities, and
work preferences that match their interests. It possesses norms
for 211 different occupa-
tions. Because interests tend to remain stable throughout a
person’s life, it makes sense for
high school and college students to take an interest inventory
like the SII as they begin the
process of developing their professional careers.
Another example is the Armed Services Vocational Aptitude
Battery (ASVAB). This test
assesses a wide range of abilities that predict future success in
the military. More than 1 mil-
lion military applicants, high school students, and
postsecondary students take this test every
year. Test subcategories include reading comprehension, word knowledge, science, math,
electronics, and mechanics.
Find Out for Yourself: Occupational Interests and Personal
Abilities
Complete the Interest Profiler available on the O*NET website.
O*NET Interest Profiler: http://www.mynextmove.org/explore/ip
What Did You Learn?
1. What did you find out about your occupational interests and
personal abilities?
2. Are you currently working in a job that aligns with your
occupational interests? Why or
why not?
3.6 Developing a Testing Program
Although creating, identifying, and using valid tests are
essential for any quality testing pro-
gram, organizations also face a number of administrative
decisions that can affect the pro-
gram’s overall success.
Deciding When Not to Test
Most important, an organization must decide whether to use
testing in the selection process.
Time and cost are extremely important considerations, and they
include test development
and design, necessary equipment, facility usage, and
administrator/evaluator training and
pay. Naturally, organizations will need to ensure that the
benefits of their testing program
outweigh the costs.
Sometimes, the level of employee productivity that a test is able
to identify is not high enough
to warrant the expense of a testing program. In these cases other
measures, such as improv-
ing employee training and development, can help advance new
hires’ performance. Alter-
nately, conducting a more careful review of applicants’
educational backgrounds or asking
more in-depth interview questions can provide greater insight
into the job-related skills and
abilities of potential employees, without having to add tests to
the preemployment process.
In summary, then, it is important for an organization both to
establish its employment needs
and to determine the potential benefits and expected costs of
testing programs before imple-
menting these useful selection tools.
Finding Quality Tests
Over the past few decades, the volume and variety of selection
tests have increased dramati-
cally. Unfortunately, the test publishers and consulting
companies that design them use vary-
ing levels of rigor and expertise. How, then, is an organization
to know which tests have met
acceptable standards of validity, reliability, and test norming?
Two periodicals, the Mental Measurements Yearbook and Tests in Print, are excellent reference
sources that provide descrip-
tions and expert reviews of a large number of tests, including
those used for selection. As you
learned in Chapter 1, you can also consult the original scientific
research that was conducted
to develop and validate various tests in order to assess their
quality and rigor.
Find Out for Yourself: Quality of Selection Methods
Research various selection methods with which you are familiar
or that you have undergone in
the past. Examples may include job applications, interviews,
reference checks, medical exams,
and referrals. Try to find validity and reliability scores for each
method. Which ones are more
valid? Which ones are more reliable? Why do you think that is
the case?
Test Administrators
A test’s usefulness depends in part on its proper administration,
scoring, and interpretation.
Organizations must not only train their testing administrators on
these key functions but also
establish quality controls and retrain administrators when
necessary. The requirements for
administrator qualifications and abilities vary from test to test,
so it is important for organi-
zations to be aware of the requirements outlined in each test’s
manual when selecting and
administering tests.
Addressing Ethical and Privacy Issues
Test security is a major concern for I/O psychologists and
organizations in order to maintain
high ethical testing practices. Tests and scores must remain
confidential. Questions should
never be published or distributed to the public, and tests should
only be administered to
qualified individuals.
Some applicants may view tests as an invasion of privacy,
particularly ones that assess per-
sonality and integrity or screen for drugs. As we have noted,
fear or mistrust in the selection
process can have adverse consequences. Organizations can
alleviate some of these concerns
by communicating to applicants the reasons for the test, how
test results will be used, and
how confidentiality will be maintained.
Testing People With Cultural and Language Differences
Differences in cultural backgrounds can shape how test takers
interpret and respond to test
questions. As the American workforce becomes more racially
and ethnically diverse, it is criti-
cal that organizations emphasize the use of culturally unbiased
tests. Moreover, English is
no longer the primary language for a growing number of
applicants. Naturally, applicants
who cannot fluently read or speak the language used in a test
will likely return artificially
low scores, not because they lack skills or knowledge but
simply because they cannot com-
prehend the instructions or understand the test questions. To
overcome language barriers,
a test can be translated into a variety of languages. However, a
common problem with this
approach is that expressions and phrases used in the test items
may be lost in translation,
which decreases the test’s validity generalization. Thus,
additional validation is necessary
to assess validity generalization whenever a test will be used for
a different racial or ethnic
group or translated into a different language, and the test may
need to be adapted accordingly.
Testing People With Disabilities
The ADA protects qualified individuals with disabilities from
discrimination in all areas of
employment, including employment testing. It can be challenging for organizations to accommodate individuals with disabilities; they must aim to be sensitive to the needs of the individual while also maintaining the integrity of the testing process and avoiding undue hardship. Test administrators require training to understand and
properly respond to accom-
modation requests. Examples of reasonable accommodations
include modifying test equip-
ment or seating, ensuring accessibility to the testing facility,
and providing a Braille or large-
print version of a test to visually impaired candidates.
Establishing Appeals and Retest Processes
Every applicant should have the opportunity to perform at his or
her best on a test. Despite
every intention to create this opportunity, sometimes it is
simply not possible. Equipment can
malfunction, the testing environment could be poor (noise,
temperature, bad odors, or even
disasters such as fire or flood), and candidates can be affected
by outside stressors (illness or
hospitalization of a family member, among others). With each
of these situations, candidates
could perform significantly better if given the opportunity to
retake the test under conditions
in which the negative influences are not present.
Test administrators should be trained to identify situations that
produce invalid results and
then implement a specific process to retest the candidate based
on information and guidance
provided by the test publisher. The organization should also
establish policies for handling
complaints regarding testing in order to resolve concerns fairly
and consistently.
3.7 Psychological Testing: Special Issues
Over the past decade, I/O psychologists have become interested
in a number of questions
related to employment testing. How do applicants feel about
being tested? Do these feel-
ings affect their perceptions of the company? Do online tests
show the same validity as
paper-and-pencil tests, and how can organizations keep
applicants from cheating on them?
Recent research findings shed some light on these interesting
questions.
Applicants’ Reactions to Tests
Most research about testing has focused on technical aspects—
content, type, statistical mea-
sures, scoring, and interpretation—and not on social aspects.
The fact is, no matter how use-
ful and important tests are, applicants generally do not like
taking them. According to a study
by Schmit and Ryan (1997), 1 out of 3 Americans have a
negative perception of employment
testing. Another study found that students, after completing a
number of different selection
measures as part of a simulated application process, preferred
hiring processes that excluded
testing (Rosse, Ringer, & Miller, 1996). Additional research has
shown that applicants’ nega-
tive perceptions about tests significantly affect their opinions
about the organization giving
the test. It is important for I/O psychologists to understand how
and why this occurs so they
can adopt testing procedures that are more agreeable to test
takers and that reflect a more
positive organizational image.
Typically, negative reactions to tests lower applicants’
perceptions of several organizational
outcome variables, including organizational attraction (how
much they like a company), job
acceptance intentions (whether they will accept a job offer),
recommendation intentions
(whether they will tell others to patronize or apply for a job at
the company), and purchasing
intentions (whether they will shop at or do business with the
company). A study conducted
in 2006 found that “[e]mployment tests provide organizations with a serious dilemma: . . . how can [they] administer assessments in order to take advantage of their predictive capabilities without offending the applicants they are trying to attract?” (Noon, 2006, p. 2). With
the increasing war for top-quality talent, organizations must
develop recruitment and selec-
tion tools that attract, rather than drive away, highly qualified
personnel.
What Can Be Done?
I/O psychologists have addressed this dilemma by identifying
several ways to improve testing
perceptions. One is to increase the test’s face validity. For
example, applicants typically view
cognitive ability tests negatively, but their reactions change if
the test items are rewritten
to reflect a business-situation perspective. Similarly,
organizations can use test formats that
already tend to be viewed positively, such as assessment centers
and work samples, because
they are easily relatable to the job (Smither et al., 1993).
Providing applicants with information about the test is another,
less costly way to improve
perceptions. Applicants can be told what a test is intended to
measure, why it is necessary,
who will see the results, and how these will be used (Ployhart &
Hayes, 2003). Doing so should
lead applicants to view both the organization and its treatment
of them during the selection
process more favorably (Gilliland, 1993). Noon’s 2006 study
investigated applicants’ reac-
tions to completing cognitive ability and personality tests as
part of the selection process for
a data-processing position. Half of the applicants received
detailed information explaining
the testing process, while the other half received the standard
information normally provided
by the company. Applicants who received the detailed
information found the company more
attractive and were more likely to recommend it to others, even
if they did not receive a job
offer. Although this tactic is not always used during the testing process, providing detailed information about a test is a quick, cost-effective, and practical way
for organizations to improve
applicant test perceptions (Lounsbury, Bobrow, & Jensen,
1989).
Online Administration
Over the past decade, online testing has increased dramatically.
Also referred to as unproctored
Internet testing, the online test has replaced many traditional
paper-and-pencil alternatives,
and almost every type of test is now available
through online administration. Online tests have
a number of advantages over paper-and-pencil
tests. They can be taken by anyone from nearly
anywhere in the world, increasing the pool of
applicants for a job. Brick-and-mortar testing
facilities and proctors no longer need to be a
part of the testing program, because applicants
can complete the test from home or a public
library. The amount of administrative support
also decreases, further lowering costs. Hiring
decisions are made faster using online admin-
istration, because the testing software provides
immediate scoring and feedback on test perfor-
mance. Finally, online tests often take advantage
of new interactive technology, such as video
clips, simulations, and gaming platforms, and
thus test takers find them more engaging.
Despite the ease and mass marketability of Internet tests,
however, I/O psychologists struggle
to reach a consensus on their efficacy and ethicality as well as
the validity of the test scores
they yield. Online tests present a number of challenges,
including applicant cheating, poten-
tial for subgroup differences, and the inability to identify the
test taker (Tippins et al., 2006).
Current research suggests that unproctored Internet tests are not
compromised by cheating
(Nye, Do, Drasgow, & Fine, 2008), although undoubtedly there
will be occasions when an
applicant will feel the stakes are high enough to rationalize
cheating. One solution to this
potential problem is to require candidates to complete a portion
of the test under proctored
conditions at the organization prior to receiving a job offer.
Regardless of the challenges, online tests are here to stay. The
advantages far outweigh the
disadvantages, leaving I/O psychologists with the “delicate task
of balancing best ‘academic’
practice with often conflicting practical and organizational
needs to implement valid and reli-
able online assessments” (Kaminski & Hemingway, 2009, p.
26).
[Photo: Online testing is becoming more prevalent and has many benefits, such as increasing the applicant pool and lowering costs.]
3.8 Testing for Positivity
Selecting for KSAOs is important, but so is selecting for
positivity. Psychological testing can
be an effective way to assess applicants’ positivity levels.
Although valid and reliable tests are
available to assess many physical, cognitive, social, and
psychological traits and abilities, tests
are just starting to emerge that adequately assess pos-
itivity in general as well as specific positive psycho-
logical qualities in an applicant or an employee.
This is not to say that established psychological
assessments are predominantly negative. To the con-
trary, many of the most recognized psychological tests
are positive in nature. For example, the Big Five per-
sonality traits include four clearly positive traits—
conscientiousness, extraversion, agreeableness, and
openness to experience—and only one negative trait,
neuroticism. However, many of the existing psycho-
logical tests are primarily based on problem-oriented
models and processes, which mainly focus on detect-
ing what an applicant may be lacking rather than on
assessing what makes an applicant flourish and thrive
in the workplace.
Inherently, an applicant’s positivity and negativity
are often considered to be opposite sides of the same
coin. For example, an employer might assume that an
employee who tests high on optimism will be low on
pessimism, or that an employee who is high on posi-
tive affect should also be low on negative affect. These
assumptions have gone unchallenged for years, and
many organizations and consultants have readily
made extrapolations from positive to negative and negative to
positive psychological charac-
teristics, as if they are two ends of the same continuum.
However, recent studies show that
these assumptions may not always hold true. For example,
Schaufeli and Bakker’s (2004)
study showed strong evidence that experiencing burnout and
engagement at work are two
distinct psychological constructs rather than polar opposites of
the same continuum, and
each is affected by different job characteristics. Another
example is work behavior. Positive
and negative work behaviors have been shown to be distinct and
to yield different, not just
opposite, performance outcomes (Dunlop & Lee, 2004; Sackett,
Berry, Wiemann, & Laczo,
2006).
Several resources are available for those employers searching
for valid and reliable tests of
positivity. The reference Positive Psychological Assessment: A
Handbook of Models and Mea-
sures (Lopez & Snyder, 2003), published by the American
Psychological Association, evalu-
ates the validity and reliability of numerous tests of positive
psychological constructs such
as optimism, hope, confidence, creativity, and courage. Of
course, for each psychological con-
struct, there are several alternative tests. Some offer higher
validity or reliability than others,
so employers should carefully select not only which constructs
they should test and evaluate
but also which specific tests to use.
Three specific positivity tests have been found particularly
relevant and predictive of work
performance and other desirable outcomes such as job
satisfaction, organizational commit-
ment, and overall well-being. The first of these is Gallup’s
StrengthsFinder (Rath, 2007), which
assesses test takers on 34 different “talents” and is widely used
in the United States and
around the world to select, place, and fit job candidates in the
right jobs, usually based on the
test taker’s top five talents.
The second is the Psychological Capital Questionnaire (PCQ-
24), which has been recently
developed and validated by Fred Luthans and colleagues
(Luthans, Avolio, Avey, & Norman,
2007; Luthans, Youssef-Morgan, & Avolio, 2015) at the
University of Nebraska. It is a 24-item
measure that assesses the test taker’s levels of hope, confidence,
resilience, and optimism and
combines these four psychological resources into one higher
order positive construct. There
is also a short, 12-item version of this test (PCQ-12; Avey,
Avolio, & Luthans, 2011; Luthans,
Youssef, Sweetman, & Harms, 2013) that has been translated
into a number of languages
and tested in numerous cultures (Wernsing, 2014), as well as an
implicit version that uses
adaptable positive, negative, and neutral situations to assess test
takers’ reactions (Harms &
Luthans, 2012).
The third is Barbara Fredrickson’s (2009) positivity ratio
assessment, which measures posi-
tivity and negativity as two independent constructs and then
calculates the ratio of positive-
to-negative responses. See the feature box Find Out for
Yourself: How Positive Are You? to learn
more about this assessment.
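For readers curious about the mechanics, the sketch below shows the ratio calculation in Python using hypothetical response labels rather than items from Fredrickson's actual instrument; the key point is that positive and negative responses are tallied as two separate counts before the ratio is taken.

# Minimal sketch of a positivity ratio: two independent tallies, then a ratio.
# The response labels are hypothetical, not Fredrickson's actual items.

def positivity_ratio(responses):
    """responses: list of "positive", "negative", or "neutral" labels."""
    positives = sum(1 for r in responses if r == "positive")
    negatives = sum(1 for r in responses if r == "negative")
    return positives / max(negatives, 1)  # guard against an all-positive day

day = ["positive", "neutral", "positive", "negative", "positive"]
print(positivity_ratio(day))  # 3 positives to 1 negative -> 3.0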
Find Out for Yourself: How Positive Are You?
Visit the Positivity website to complete Barbara Fredrickson’s
positivity ratio assessment and
instantly obtain your own positivity ratio. Keep in mind that
this assessment is somewhat
volatile and will change depending on the situations you
encountered the day before. To get a
more accurate assessment, it is recommended that you complete
this test multiple times over
several days and take an average of your scores. Also keep in
mind that some of the statisti-
cal analysis behind positivity ratios has been criticized, so be
sure to review the 2013 update
posted by Fredrickson on the same website.
Positivity Ratio Assessment: http://www.positivityratio.com/single.php
Summary and Conclusion
Selecting the right candidates from the available pool of job
applicants is critical for
employee and organizational success. When the right employee
is selected and placed in
the right job, performance is higher, which can translate into
higher productivity and finan-
cial returns for the organization. For example, well-placed
employees tend to go above and
beyond the immediate job requirements, which can positively
influence coworkers and pro-
mote a positive organizational culture. Properly selected
employees are also likely to stay
with the organization longer and be absent less often, which can
translate into enormous
cost savings. Equally important, well-placed employees will
likely experience more satisfac-
tion with their jobs and the organization, have higher work
engagement levels, and perceive
their jobs as more meaningful, all of which contribute to higher
employee well-being.
Because effective employee selection relies on predicting
subsequent performance on the
job, it is beneficial for managers to use the most accurate and
consistent predictors avail-
able. Psychological testing affords managers the opportunity to
use valid and reliable tests
that can fulfill this important role. However, many managers
continue to use highly subjec-
tive approaches to selection, which often end up wasting their
time and their organization’s
resources on selecting, training, and managing the wrong job
applicants, or worse, exposing
their organization to discrimination-based lawsuits that can be
time consuming and costly
and can compromise its reputation.
The role an I/O psychologist plays in the test-selection process
is threefold. First, I/O psy-
chologists can educate managers and organizational decision
makers on the importance of
finding evidence for a test’s validity and reliability before
attempting to use it. Second, they
can use available evidence to help organizations discern among
multiple tests, a process
that managers often perceive as intimidating or difficult to
understand. Third, I/O psycholo-
gists contribute to the development of more valid and reliable
tests and selection tools in
areas where none currently exist, helping create the most
appropriate and efficient methods
for selecting the right candidates in an ever-changing
workplace.
Key Terms

constructs: Abstract, intangible personal attributes.

Cronbach's alpha: A statistical measure of the intercorrelations across items in a test.

meta-analysis: A combination of the results of several studies that address a set of related research hypotheses.

objective tests: Tests that have one clearly correct answer.

percentile scores: The percentage of people in the standardized sample who scored below an individual's raw score.

raw scores: The number of points a person scores on a test.

reliability: The extent to which the results from a predictor such as a selection tool, method, or procedure can be consistently replicated over time and across situations.

standardization sample: A large group of people who take a test and act as a comparison group against whose scores an individual applicant's scores can be compared.

subjective tests: Tests that have no definitive right or wrong answer; thus, scores rely heavily on the evaluator's interpretation and judgment.

test: A method used to make an employment decision about whether to hire, retain, promote, place, demote, or dismiss someone; an instrument or procedure that measures an individual's employment and career-related qualifications and characteristics or measures samples of behavior or performance.

test norms: The comparison scores provided by a standardization sample.

validity: The extent to which a selection tool or procedure can accurately predict subsequent performance.

validity generalization: The notion that validity evidence transfers across situations.
4 Performance Appraisal
Learning Outcomes
After reading this chapter, you should be able to
• Recognize the importance of performance measurement for
organizational success.
• Identify and critique common approaches to measuring
employee performance in organizations.
• Explain the differences between objective and subjective
performance appraisals.
• Describe the different performance appraisal formats.
• Apply the concepts of validity and reliability to performance
appraisal tools and processes.
• Assess the effects of rating errors on performance appraisal
accuracy.
• Implement an effective performance management system.
• Apply positive psychology to performance appraisal
processes.
• Link employee performance and performance appraisal results
to financial outcomes.
4.1 The Importance of Performance Appraisals
Throughout your life, people will evaluate your performance in
ways that shape who you
become and where you will go. From elementary school through
college, on the athletic field
and in your community, from your first part-time job to your
adult career, others will test and
evaluate and compare your performance, the results of which
will determine whether you
advance to the next phase of life.
Within organizations, assessing employees’ performance tends
to be perceived as a necessary
evil that neither managers nor staff particularly enjoy. Many
employees fear that even one
low performance rating could affect their pay or damage their
career. Even more concerning
is the prospect of receiving low ratings from a manager who
doesn’t ever directly observe or
work with you but uses secondhand information or personal
biases to make his or her evalu-
ations. Sadly, this is frequently the case.
Consider This: How Do You Feel About Being Evaluated?
Think about one or more occasions in which you were being
evaluated. It could be at work, at
school, while playing a sport, or elsewhere.
Questions to Consider
1. Describe your feelings and thoughts before you received
these evaluations. Were you
anxious? Were you looking forward to the evaluation?
2. Describe your feelings and thoughts while receiving these
evaluations. Were you sur-
prised? Upbeat? Interested in receiving feedback? Actively
involved? Did you passively
receive the information? Feel under attack?
3. Describe your feelings and thoughts immediately after these
evaluations. Were you
excited? Flattered? Humiliated? Angry? Defensive?
4. What effects did these evaluations have on your personal,
social, or professional life?
Did they make you a better person in any way? Explain your
answer.
Managers also experience anxiety when completing performance
appraisals. Most often, they
worry that criticisms, no matter how small, might provoke
negative reactions, ranging from
disappointment and frustration to anger and hostility. These
emotions can strain the man-
ager–employee relationship or cause the employee to become
less motivated or even to quit.
As a result, managers tend to shy away from providing negative performance feedback, which of course undermines the accuracy of the appraisal.
If everyone dislikes performance appraisals, why keep doing
them? For one thing, unman-
aged performance is chaotic and random. Employees’ work
needs to be aligned with the orga-
nization’s overall goals, and clear performance feedback helps
everyone know if this is indeed
happening. In fact, a well-designed performance appraisal
system should not only provide
employees with rich feedback but should also communicate
clear performance expectations
and include information that will help them perform at their
highest level possible (Pulakos
& O’Leary, 2011). Appraisals that meet each of these concerns
will enable the organization to
further its mission to succeed. The question, then, is not
whether to keep doing performance
appraisals but how to make them most effective.
Using Performance Appraisals
A performance appraisal is the formal process through which
employee performance is
assessed. It involves providing feedback to the employee and
designing corrective action
plans if necessary. Organizations conduct performance
appraisals for the following reasons:
1. To evaluate performance objectively. Organizations need
some sort of system to mea-
sure the value of each employee’s performance. These measures
must be objective
and allow managers to consistently compare the performance of
people who have
the same job function.
Consider This: How Do You Feel About Evaluating Others?
Think about one or more occasions in which you had to evaluate
or give feedback to someone.
Again, it can be at work, at school, or in the context of a sport.
Personal and social settings can
also be used for this exercise.
Questions to Consider
1. Describe your feelings and thoughts before you gave your
evaluation or feedback. Were
you anxious? Hesitant? Excited?
2. What were your primary concerns? The fairness of your
evaluations? The reactions of
the people you were evaluating? The repercussions of your
evaluation for yourself and/
or the person you were evaluating?
3. Describe the settings in which you had to communicate your
evaluations. Was it face-to-
face? On the phone? Via e-mail? In a written report?
4. Describe the content of your feedback. Was it positive,
neutral, or negative?
5. How did the person you were evaluating react? Was your
feedback appreciated? Toler-
ated? Rejected?
6. How did you manage or leverage the reaction of the person
you were evaluating? Did
you involve him or her? Did you ask for his or her input?
7. Describe your feelings and thoughts after giving your
evaluation. Were you stressed?
Drained? Relieved? Did you feel more or less confident about
your feedback communi-
cation skills and your ability to accurately assess others’
performance?
8. What effects did your evaluation have on others’ personal,
social, or professional lives?
Did it make them better? Did it help them advance their
careers? How did it affect your
relationships with the individual you evaluated? Explain.
2. To increase worker motivation.
Appraisals provide employees with
specific feedback regarding their
strengths and weaknesses. When
workers know what they should be
doing, how they actually are doing,
and how they can improve, they are
often motivated to perform better.
3. To make administrative decisions.
Managers rely heavily on data from
performance appraisals when making
decisions about employee raises and
bonuses, promotions, demotions, or
even terminations. Employees must
perceive these decisions as fair and
free from bias; a good performance
appraisal will facilitate those favor-
able perceptions.
4. To improve organizational perfor-
mance. Performance appraisals are
essential to improving organizational performance (DeNisi &
Sonesh, 2010). They
pinpoint skill deficiencies in specific parts of the organization,
helping managers
focus their training and selection efforts. Appraisals also
enhance an organization’s
opportunities for success by identifying poor performers. This
not only helps weed
out subpar personnel but also motivates top performers to keep
their performance
levels high. It is important to note that the link between
performance appraisals
and organizational performance is not always direct. As
discussed throughout this
chapter, performance appraisal needs to be an integrated part of
an effective set of
human resource practices in order to have a positive impact on
organizational per-
formance (DeNisi & Smith, 2014).
5. To establish training requirements. Appraisal data provides
insight into workers’
knowledge, skill, and deficiency levels. This information helps
managers establish
specific training objectives, update or redesign training
programs, and provide
appropriate retraining for specific employees.
6. To enhance selection and testing processes and outcomes. An
important use of perfor-
mance appraisal data is to establish the criterion-related validity
of selection tests.
Recall from Chapter 3 that criterion-related validity establishes
a predictive, empiri-
cal (number-based) link between test scores and actual job
performance by cor-
relating applicants’ employment test scores with their
subsequent performance on
the job. How can the predictive capacity of a test be determined
if the organization
does not design and implement objective performance measures
and procedures?
It would have no accurate data with which to correlate test scores, which would hinder its ability to design and use accurate tests and implement effective selection processes (a small correlation sketch follows this list).
7. To provide a level playing field. Performance appraisals help
clarify expectations.
This is especially important in large organizations that feature
many managers who
oversee similar positions. Both managers and employees should
ensure that they are
pursuing a unified set of goals and expectations and that there
are no discrepancies. A
formal performance appraisal process can help with that
standardization and calibra-
tion process. Without performance appraisals, employees may
be evaluated unfairly
because some managers may be intentionally or unintentionally
more lenient than
others. You will learn about some of these common biases in
this chapter.
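To make the criterion-related validity link described in reason 6 concrete, here is a minimal sketch in Python, assuming hypothetical test scores and later appraisal ratings; a real study would draw both columns from organizational records and a much larger sample.

from statistics import correlation  # available in Python 3.10+

# Hypothetical data: each position i is one employee.
test_scores = [72, 85, 60, 90, 78, 66, 88, 74]          # scores at hiring
performance = [3.1, 4.2, 2.8, 4.5, 3.6, 3.0, 4.0, 3.4]  # later appraisal ratings

# A strong positive Pearson r is evidence that the test predicts
# subsequent job performance.
r = correlation(test_scores, performance)
print(f"Criterion-related validity estimate (Pearson r): {r:.2f}")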
4.2 Approaches to Measuring Performance
I/O psychologists have identified a number of techniques for
measuring employee perfor-
mance. These measures can be either objective or subjective.
Generally, what is measured and
how it is done depends on the type of work an employee
performs. Some jobs, such as sales
and assembly-line work, have objective outcome measures
(sales revenue, number of pieces
assembled), whereas others are more subjective (wait staff
performance, art design work).
Objective Performance Measures
Objective performance measures are quantitative measures of
performance that can be
found in unbiased organizational records. They are generally
easy to gather and include two
types of data: production measures (such as sales volume,
number of units made, number of
error occurrences) and personnel measures (such as
absenteeism, tardiness, theft, and safety
behaviors). Both measures are also usually evaluated according
to the quality of performance.
However, objective measures can be deceptively
simple. Consider the performance of two sales
professionals in an insurance company. Over
the course of a year, Salesperson A sold 500 pol-
icies and Salesperson B sold 1,000. According to
this data, Salesperson B appears to be the better
salesperson—twice as good, in fact. However, if
we examine the quality of each worker’s per-
formance, we might learn that Salesperson B
sold unprofitable policies, resulting in a $1 mil-
lion loss for the company. Salesperson A, on the
other hand, sold very profitable policies, result-
ing in a $1 million profit.
Alternatively, Salesperson B may have focused
on selling more policies while cutting corners
on after-sale service and follow-up, which could
have resulted in dissatisfied customers. On the
other hand, Salesperson A may have invested more time per
sold policy on such interactions.
Although the organization may not directly measure or reward
after-sale service and follow-
up, these customer interactions can help build its reputation and
are known to result in more
satisfied customers returning for additional products and
referring others. Repeat business
from and referrals by satisfied customers are significantly less
costly for an organization to
generate than is building new clientele. However, these
additional sales may not be easily
attributable to Salesperson A if the returning or referred
customers are assigned to another
salesperson. As you can see, evaluating worker performance by
quantity alone is not a wise
course of action.
Unfortunately, even after accounting for performance quality,
objective data may not provide
an accurate or complete picture of an employee’s performance.
Many factors beyond workers’
control can limit their ability to perform their best. Looking
more closely at our two insurance
salespeople, we might discover that the difference in sales
volume could be attributable to the
location of each employee’s branch office. Salesperson A could
work in a small town, while
Salesperson B works in a large metropolitan region. Thus,
Salesperson A could have captured
a larger market share of his designated region than Salesperson
B, even though he sold fewer
policies. Alternatively, perhaps Salesperson B was assigned an
easy-to-sell policy because she
was a new employee, while Salesperson A, as a veteran
employee, was assigned a hard-to-sell
but very lucrative policy. As you can see, accurate performance
evaluations require more than
a cursory look at sales and production numbers, although adding
manager interpretation into
the mix does make objective performance measures more
subjective.
Personnel data, another objective measure, includes components
such as theft, tardiness,
absenteeism, safety behaviors, and rate of advancement. Though
not typically related to a
worker’s ability to do the job, these elements do indicate job
success or failure. Many jobs, such
as teachers, customer service representatives, and bank tellers,
require consistent and timely
daily attendance. Thus, absenteeism and tardiness are often used
to evaluate these workers’
performance. Other jobs, such as machine operators, assembly-
line workers, and truck driv-
ers, have serious safety risks. With jobs such as these, it makes
sense to keep count of employ-
ees’ accidents and safety incidents and use them as objective
measures of performance.
As with any type of objective data, however, taken on its own,
personnel data can be mis-
leading. Once again, circumstances outside a worker’s control
could affect performance. Sick
children, a death in the family, or transportation troubles may
affect an otherwise superior
employee’s ability to come to work. Similarly, a workplace
accident could have been caused
by faulty company equipment, not worker error. Because you
now understand the limits of
objective data, let’s turn to subjective performance measures,
their limitations, and ways to
keep this data fair and free from bias.
Subjective Performance Measures
The allure of objective performance measures has to do with
their ability to provide bias-free
information on all workers across a specific job. Of course, we
now know that objective data
can still be misleading. Further, most jobs require much more
than simply looking at sales or
production numbers, because most jobs are composed of a
complex web of tasks, not all of
which can be measured objectively. For example, a teacher’s
performance must be made up of
more than his or her students’ test scores, just as a police
officer cannot be evaluated solely
on the number of arrests he or she makes each month. While it
is in most respects easier to
measure things objectively, this can exclude important aspects
of a situation that may not be
objectively quantifiable but should still be taken into
consideration. Examples in the context
of performance appraisal include “friendliness” of a
salesperson, “helpfulness” of a customer
service representative, or “leadership potential” of a frontline
employee. To account for these
hard-to-quantify characteristics, I/O psychologists created
subjective performance mea-
sures, which rely on human judgment for evaluation and are
thus exposed to some degree
of subjectivity and personal judgment. Of course, with personal
judgment comes some per-
sonal bias. To reduce such bias, evaluators must base their
ratings on observations of worker
behaviors that are critical for successful job performance.
Furthermore, these behaviors must
be identified by conducting an accurate job analysis.
Interestingly, research shows only a small correlation between
objective and subjective per-
formance measures, which suggests that they measure different
aspects of worker perfor-
mance (Bommer, Johnson, Rich, Podsakoff, & MacKenzie,
1995). Thus, the two sets of measures
are complementary and should be used in conjunction whenever
possible.
Organizations use many types of subjective performance
measures, ranging from manager-
composed performance narratives to numerically oriented rating
scales. Each method differs
in complexity as well as the amount of time required to create
and implement it. The next sec-
tion provides a brief review of some common subjective
performance measures.
Written Narratives
With the written narrative, one of the easiest performance
measures to develop, the man-
ager writes a paragraph or two summarizing an employee’s
performance over a certain length
of time. An example of a written narrative is a reference letter
written by a supervisor for an
intern at the end of an internship. When used to measure
performance, managers often share
specific examples of the worker’s strengths and weaknesses,
which the worker can then use
to help improve his or her performance during the next
appraisal cycle.
Although written narratives are quick and easy, they have a
number of drawbacks. First, every
manager will set different evaluation standards, making it
impossible to compare workers
with different managers. As a result, the written narrative
should not be used to make deci-
sions about compensation, promotions, or layoffs. Second,
managers vary in their quality of
written communication skills. Some may use ambiguous,
incomplete, or misleading language,
which can lead the employee to misinterpret or not understand
the feedback. Finally, manag-
ers are often reluctant to address poor performance in a
straightforward manner and some-
times deliberately write the narrative to cast negative behavior
in a positive light.
The drawbacks of the written narrative have prompted I/O
psychologists to develop a num-
ber of techniques to both improve the objectivity of subjective
performance measures and
reduce managerial biases.
Rank Ordering
Rank ordering requires no forms or instruments and is the
easiest way to evaluate workers.
Managers simply rank their employees from best to worst. Some
managers have the tendency
to evaluate employees similarly. Rank ordering provides much-
needed differentiation, even
though the small differences between median employees still
make it a challenge for manag-
ers to rank employees. Rankings also do not provide workers
with performance feedback,
which means they are not useful for self-improvement or
training guidance. Because of their
limitations, rankings should be used only during periods of
promotion, downsizing, reorgani-
zation, or any other situation in which it would be valuable to
understand a worker’s standing
relative to other workers.
Paired Comparison
As with rank ordering, the paired comparison technique requires
the manager to evaluate
a worker’s performance compared to the other workers on the
team. In a systematic fashion,
the manager compares one pair of workers at a time and then
judges which of the two dem-
onstrates superior performance. After comparing all the workers
in all possible pairings, the
manager then creates a rank ordering based on the number of
times each worker is the better
performer of a pair. For example, using the formula N(N − 1)/2 to determine the number
of discrete pairings in a group, a manager with 10 employees
would need to make 45 paired
comparisons. A manager with a team of 20 employees would
need to make 190 comparisons.
As you can see, the number of pairs goes up quite quickly as the
size of the team increases. For
this reason, paired comparisons are only advantageous for
smaller groups.
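As a minimal sketch of the comparison workload, assuming a hypothetical ten-person roster, the pairings can be generated directly; the count matches the N(N − 1)/2 formula above.

from itertools import combinations

# Hypothetical roster; in practice these would be real team members.
team = [f"Employee {i}" for i in range(1, 11)]

# Every discrete pairing the manager must judge.
pairs = list(combinations(team, 2))
print(len(pairs))        # 45, i.e., 10 * 9 / 2

n = 20
print(n * (n - 1) // 2)  # 190 comparisons for a 20-person team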
Like general rank orderings, paired comparisons do not provide
performance feedback. How-
ever, they are generally simpler to use because managers need
only compare one employee
pair at a time, instead of the entire work team. Organizational
leaders should keep in mind
that rankings are not standard across the entire workplace. The
lowest ranked member of a
high-performing team might, for example, actually perform
better than the highest ranked
member of a poorly performing team.
Forced Distribution
When an organization needs to evaluate a large number of
employees, forced distribution is
a viable option. With this technique, managers place employees
into categories based on pre-
established proportions. A typical performance distribution uses
the following performance
categories and proportions:
Superior 10%
Above average 20%
Average 40%
Below average 20%
Poor 10%
Using this distribution for a team of 100 workers, a manager
would identify the top 10
employees (10%) and the bottom 10 employees (10%) and place
them in the superior and
poor categories, respectively. From the remaining 80 workers,
the manager would then select
the next 20 highest performers (20%) for the above-average
category and the next 20 lowest
performers (20%) for the below-average category. The final 40
workers (40% of the origi-
nal 100) would fall into the average category. The lowest
performance group would then be
assigned to additional training, reprimanded, put on probation,
or terminated. This approach
is most commonly associated with Jack Welch, former CEO of
General Electric. The company
eliminated the lowest 10% of performers every year using this
method.
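As a minimal sketch, the bucketing logic just described can be expressed directly; the worker names and scores here are hypothetical, and the proportions follow the 10/20/40/20/10 distribution shown above.

def forced_distribution(scores):
    """Place each worker in a category by rank, using the
    10/20/40/20/10 proportions shown above."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    # Cumulative cutoffs: top 10%, next 20%, middle 40%, next 20%, bottom 10%.
    bands = [("Superior", 0.10), ("Above average", 0.30),
             ("Average", 0.70), ("Below average", 0.90), ("Poor", 1.00)]
    categories, start = {}, 0
    for label, cum in bands:
        end = round(n * cum)
        for worker in ranked[start:end]:
            categories[worker] = label
        start = end
    return categories

# Hypothetical team of 100 workers with distinct scores.
team = {f"Worker {i}": 101 - i for i in range(1, 101)}
print(forced_distribution(team)["Worker 1"])    # Superior
print(forced_distribution(team)["Worker 100"])  # Poor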
Obviously, one of the major drawbacks of forced distribution is
that it assumes that worker
performance follows a normal distribution (some high
performers, some low, most some-
where in the middle). This method makes no concessions for
teams that are filled with supe-
rior performers or, conversely, teams fraught with poor
performers. Furthermore, it makes no
distinctions among workers in a category; all average workers,
for example, are simply con-
sidered average. Finally, as with rank ordering and paired
comparisons, forced distribution
can add artificial luster to “superior” members of poor-
performing teams or unfairly tarnish
“poor” members of a high-performing team. However, many
organizations find forced distri-
bution methods necessary to maintain a high-quality workforce
in a competitive market. In
order for forced distribution to realize its benefits, an
organization should consider factors
such as how low performers are treated and how high
performers are differentiated from low
performers in terms of rewards (Blume, Baldwin, & Rubin, 2009).
Consider This: Leadership
In this video, former General Electric CEO Jack Welch
discusses his leadership strategies,
including the value of the forced distribution approach.
Jack Welch on Leadership: https://www.youtube.com/watch?v=l5GryYk5hV8
Graphic Rating Scale
Graphic rating scales are the most commonly used method for
rating worker performance.
Managers observe specific employee behaviors or job duties
along a number of predeter-
mined performance dimensions such as quality of work,
teamwork, initiative, leadership abil-
ity, and judgment. Then the manager rates the quality of
performance for each dimension on
a scale ranging from high to low performance. Looking at
Figure 4.1, you can see that this
employee received a below-average rating on teamwork.
Figure 4.1: General graphic rating scale
[A 5-point scale for the Teamwork dimension: 1 = Poor, 2 = Below average, 3 = Average, 4 = Above average, 5 = Superior. The sample employee is marked at 2, below average.]
Each point on a graphic rating scale, called an anchor, is
defined along the continuum. Anchors
can vary in number, description, and depth of detail and can be
stated in numbers, words,
longer phrases, or a combination of these forms. Typically, the
manager rates workers’ per-
formance for each anchor using the 5-point rating scale,
although 7- or even 9-point scales
are not uncommon.
Graphic rating scales are versatile, inexpensive, and quickly
made. However, in order for the
manager to make clear, accurate distinctions in worker
performance across different dimen-
sions, care must be taken to create specific and unambiguous
anchor descriptions. For exam-
ple, in Figure 4.2, Scale A uses only qualitative anchors and
requires the rater to place a check
mark at the point that represents the worker’s current
performance level. This is a poorly
designed rating scale because the rating anchors are left
undefined. Similarly, Scales B, C, and
D include both verbal and numerical anchors, but ratings rely
solely on manager judgment.
Figure 4.2: Examples of different graphic rating scales
[Four sample scales for the Judgment dimension: (A) "Judgment: Makes clear, logical decisions," rated with a check mark among qualitative anchors only (Outstanding, Excellent, Satisfactory, Unsatisfactory), plus a comments line; (B) a numbered scale: 1. Outstanding, 2. Very good, 3. Good, 4. Improvement needed, 5. Unsatisfactory; (C) "Judgment: Makes sound decisions that affect his/her work," rated 1–5 with verbal anchors running from Unsatisfactory through Needs improvement, Meets expectations, and Above expectations to Exceeds expectations; (D) "Judgment," rated 1 = Poor to 5 = Superior, with a full behavioral definition: "Evaluates situations and people effectively when choosing to respond. Makes ethical decisions. Is able to identify relevant and irrelevant issues in a situation and respond appropriately. Applies the correct standards and policies."]
Of course, this can be problematic, because one manager might
judge his or her employees
more or less stringently than another. Standard measures allow
for clear comparisons across
workers, even if they have different managers.
Some organizations ask managers to provide written examples
that support their ratings of
employees for each performance dimension and/or for the
employee’s overall performance
level. By combining both their rating scores and written
feedback, employees learn how their
performance compares to the company’s expectations and what
their current strengths and
weaknesses are. This allows the company to set goals or devise
training strategies accord-
ingly, and it allows the employee to seek out self-improvement
or educational resources.
Behaviorally Anchored Rating Scale
First proposed by Smith and Kendall in 1963, the behaviorally
anchored rating scale
(BARS) attempts to evaluate workers’ performance of very
specific behaviors critical for job
success. These behaviors are established using the critical
incidents job analysis technique
discussed in Chapter 2.
Developing a BARS can be a long and difficult process. To
begin, a group of supervisors famil-
iar with the job both identifies the performance dimensions
(quality of work, teamwork,
initiative, etc.) that should be measured and observes critical
incidents that exemplify both
effective and ineffective job performance. Another group of
subject matter experts transforms
the list of critical incidents into behavioral statements that
describe different levels of perfor-
mance. A final group evaluates the behavioral statements and
assigns a numerical value scale
to each. See Figure 4.3 for an example.
Figure 4.3: Example of a BARS
[A 7-point behaviorally anchored scale for the Responsibility dimension. "Ratee can be expected to…": 7 = take on major projects beyond his or her duties with little or no supervision; 6 = always meet deadlines and require only minimal supervision; 5 = complete work at a high standard and by the deadline assigned; 4 = regularly meet deadlines and accept responsibility for only his/her duties; 3 = accept ownership of projects only when assigned; 2 = sometimes accept as little responsibility as possible and often miss deadlines; 1 = never meet deadlines.]
One positive feature of the BARS approach is that the
behaviorally defined anchors are very
explicit as to what performance criteria are being measured.
This makes it much easier for
managers to distinguish between high and low performers.
Additionally, because the rating
scale is standardized, managers can compare BARS performance
ratings across individuals
and teams. Furthermore, workers perceive BARS to have high
face validity, which reduces
negative reactions to low ratings.
Despite the advantages, the significant time investment needed
to develop BARS means that
most organizations do not employ this technique. Additionally,
this method’s overall rating
quality still depends on each manager’s observational skills (or
lack thereof). Finally, research
shows that the BARS is no more valid or reliable than any other
rating method, nor is it more
successful at decreasing rater error (Landy & Farr, 1980).
Behavioral Observation Scale
The behavioral observation scale (BOS) is similar to BARS in
that both use critical incidents
to identify worker behaviors observed on the job. The biggest
difference between BOS and
BARS is the rating format. Instead of quality, BOS rates the
frequency with which a worker
is observed to perform a critical job behavior (see Figure 4.4 for
an example). Frequency is
typically measured on a 5-point rating scale, comprising
numerical (0%–20% of the time,
21%–40% of the time, etc.) or verbal assignments (sometimes,
always, never, etc.) or a combi-
nation. The ratings are aggregated across all behavioral
statements to establish a total score.
Some researchers have tried to determine which method—BARS
or BOS—is superior, but the
research has been mixed and inconclusive.
Figure 4.4: Example of a BOS for a bank teller
[Four behavioral statements, each rated on a 5-point frequency scale (1 = Never, 3 = Sometimes, 5 = Always): greets customers upon arrival; correctly counts money; accurately describes product characteristics to customers; quickly addresses customer problems.]
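A minimal sketch of BOS scoring under this format: frequency ratings are summed across behavioral statements into a total score. The teller behaviors mirror Figure 4.4; the ratings themselves are hypothetical.

# 1 = never ... 5 = always, one frequency rating per behavioral statement.
bos_ratings = {
    "Greets customers upon arrival": 4,
    "Correctly counts money": 5,
    "Accurately describes product characteristics to customers": 3,
    "Quickly addresses customer problems": 4,
}

total = sum(bos_ratings.values())
maximum = 5 * len(bos_ratings)
print(f"BOS total score: {total} of {maximum}")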
4.3 Sources of Performance Appraisal
The goal of performance appraisal is to accurately measure
employees’ performance. To do so,
raters must directly observe an employee’s actions and
behaviors. In many cases managers
do not get to directly observe their employees. A police chief,
for example, cannot accompany
all of his or her officers as they perform their daily patrols, nor
can a school principal sit in
classrooms all day long. Who, then, should evaluate such
workers? For many jobs, input from
a variety of sources helps create a more accurate, well-rounded
performance appraisal—for
a police officer, these sources might include his or her partner
or members of the community;
for a professor, they might include students or fellow faculty
members. The following section
describes the most common sources of input.
Supervisor Evaluation
Supervisors are the most common source of input for a
performance appraisal, and rightly so.
After all, managers are in the best position to evaluate
employees’ performance as it relates
to the organization’s objectives. Furthermore, because managers
are responsible for recom-
mending rewards and punishments, they must be able to tie their
evaluations to employees’
performance. Without this link between performance and
rewards, employees can become
less motivated, resulting in poorer performance. Indeed,
research shows that supervisors’
performance ratings are more strongly correlated to employees’
actual performance than any
other rating source (Becker & Klimoski, 1989).
Peer Evaluations
Peer evaluations, or those made by one worker about a
coworker, are common in jobs that
require employees to work as a team. Peer feedback can be
especially insightful. Coworkers
often understand the job in greater depth than managers and can
analyze how team mem-
bers’ behaviors affect each other and contribute to the team’s
success. Similarly, peer ratings
on the dimension of leadership effectiveness provide a valuable
perspective into a worker’s
leadership skills and abilities.
How do employees respond to peer evaluations? Generally,
reactions are mixed. In some sit-
uations, workers are appreciative because their peers are the
only ones who ever directly
observe their performance and are therefore the only ones who
can accurately evaluate them.
Furthermore, because peer ratings are nearly as accurate as
supervisory ratings, they are
excellent guides for self-improvement (Harris & Schaubroeck,
1988). On the other hand,
workers may question the validity of a negative peer review,
which can detrimentally affect
their future performance. DeNisi, Randolph, and Blencoe (1983)
found that workers who
received negative peer-rating feedback went on to hold more
negative perceptions of group
performance, group cohesion, and overall satisfaction during the
subsequent team task. Con-
versely, positive peer feedback did not significantly affect any
of these variables on the next
task. Peer ratings, therefore, should serve as a supplemental—
not sole—source of a worker’s
performance evaluation.
Subordinate Evaluations
Subordinates are uniquely capable of assessing their manager’s
actions and effectiveness
across a broad range of dimensions, including delegation,
coaching, communication, leader-
ship, and goal setting. The process of subordinate evaluation,
also called upward feedback,
involves allowing subordinates to evaluate their superior’s
performance.
Some research supports the notion that upward feedback can
improve management’s perfor-
mance. In one study, subordinates rated their managers on a
number of different dimensions
(quality, fairness, support, and communication). Superiors who received low to moderate ratings showed significantly greater improvement in their ratings 6 months later than those who received high ratings (Smither et al., 1995). Further research shows that
managers can improve their
ratings even more by discussing upward feedback with their
subordinates (Walker & Smither,
1999).
Confidentiality is critical to ensure that subordinate evaluations are accurate. Many employees fear repercussions for giving managers negative feedback and will artificially inflate their evaluations if they know the manager will be able to identify them (Antonioni, 1994).
Self-Evaluation
Employees who complete self-evaluations, or evaluations of
their own performance, feel as
though they have a voice in the appraisal process, which in turn
increases their acceptance of
and decreases their potential defensiveness about the final
ratings. Although typically used as
supplemental evaluative data, self-ratings are especially useful
with workers who work alone
or independently.
Generally, self-ratings show more leniency, less
variability, greater bias, and less agreement with
those provided by supervisors, peers, or subor-
dinates (Harris & Schaubroeck, 1988). These
differences could stem from the worker’s use
of a different evaluative standard (Schrader &
Steiner, 1996), but research has identified sev-
eral ways to make self-ratings more accurate.
First, workers can be told that their self-ratings
will be compared against objective performance
criteria. Second, the organization can make it
clear that the self-evaluation will be used only
for self-developmental purposes (Meyer, 1991).
Third, educating both superiors and subordi-
nates on the rating criteria and appraisal process
leads to greater agreement between supervisor
ratings and self-ratings (Williams & Levy, 1992).
360° Appraisals
360° appraisal is a multisource evaluative process; it utilizes
performance input from many
different viewpoints. In this process, a manager might, for
example, receive feedback from his
or her supervisor, peers, subordinates, and internal or external
customers, as well as conduct
a self-evaluation. Normally, each source uses a rating scale to
evaluate the manager’s current
proficiency level for a predetermined set of job duties and/or
leadership dimensions (coach-
ing, delegating, communicating, etc.). After all ratings are
complete, they are compiled in a
report and shared with the manager.
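As a minimal sketch of how such a report might be compiled, each source group's ratings can be averaged separately so the manager sees each perspective side by side; the dimensions, sources, and numbers below are hypothetical.

from statistics import mean

# dimension -> source group -> 1-5 ratings from that group's raters
ratings = {
    "Coaching": {"self": [4], "supervisor": [3],
                 "peers": [3, 4, 2], "subordinates": [2, 3, 2]},
    "Delegating": {"self": [5], "supervisor": [4],
                   "peers": [4, 4, 3], "subordinates": [4, 5, 4]},
}

# Average within each source so differing perspectives stay visible.
for dimension, sources in ratings.items():
    summary = {src: round(mean(vals), 1) for src, vals in sources.items()}
    print(dimension, summary)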
Over the past 30 years, 360° appraisals have become
significantly more popular. In the mid-
1980s fewer than 10% of companies used this method to
evaluate managers (Bernardin,
1986). Today, even though there is no exact percentage, it is
likely that every Fortune 1000
company has some experience with conducting 360° appraisals.
Interestingly, there is no con-
sensus on how exactly to use these evaluations. Some believe it
is appropriate to use them
to make administrative decisions (Church & Bracken, 1997), but
most disagree, suggesting
they be used only for management development purposes
(Antonioni, 1996; Coates, 1996).
Some of this debate stems from concerns that employees may
feel uncomfortable with rating
their superiors and may fear the repercussions of providing an
unfavorable evaluation. On
the other hand, managers may feel that their peers or employees
are unqualified to rate their
performance because they may not have the full picture.
However, despite differing opinions
on how 360° appraisals should be used, research shows that
consulting various sources pro-
vides unique and distinct perspectives about an employee’s
competencies, performance, and
effectiveness (Semeijn, Van Der Heijden, & Van Der Lee,
2014).
I/O psychologists recommend a number of practices to increase
the effectiveness of 360°
appraisals. First, both raters and the manager should receive
instructions on how to interpret
the different performance dimensions and the rating scale.
Second, all participants must be
explicitly told that feedback will only be used for the purpose
of manager development. Fur-
thermore, maintaining rater anonymity tends to prompt
subordinates to view 360° apprais-
als more positively, although, interestingly, managers tend to
prefer that employees be held
accountable for their ratings (Antonioni, 1994). Finally, to
ensure the highest quality, a 360°
appraisal program must include skilled coaches to help
managers interpret and use their
feedback to create goals and courses of developmental action
(Coates, 1996; Crystal, 1994).
Absence of such coaching can compromise the desired
behavioral changes and performance
improvements and can lead to disengagement (Nowack &
Mashihi, 2012).
Consider This: Seeking Feedback
1. Most of us often seek feedback on our performance in various
life domains, whether at
work, at school, at home, in sports, or at church. In which
domains of your life do you
normally seek or get feedback on your performance?
2. What are the advantages and disadvantages of each source of
feedback?
Find Out for Yourself: Performance Measures and Sources
Review the following templates from the HR offices at
University of California–Berkeley and
University of California–Davis for examples of performance
measures that utilize feedback
from different sources as well as performance evaluations that
use these measures.
UC Berkeley Annual Performance Planning and Review: http://hr.berkeley.edu/sites/default/files/attachments/FY16_Perf_Mgt_Form-Director_of_Finance_and_Strategic_Planning_RP.docx
UC Davis Employee Performance Appraisal Report: http://www.ucdmc.ucdavis.edu/hr/hrdepts/forms/MSP_EPAR.doc
4.4 Sources of Rating Error and Bias in Performance
Evaluation
Performance appraisal relies on the assumption that human
judgment is capable of some
degree of accuracy. However, humans are not objective
observers. We can never be completely
certain that our judgments are free from error or personal
biases. Often, of course, errors are
unintentional. In the workplace, rating errors can occur if
managers do not observe workers’
performance or if they do not use the rating scale correctly.
More insidiously, managers can
also harbor unacknowledged biases against certain types of
workers. At other times, rating
errors are intentional. Managers can deliberately inflate a
poorly performing employee’s rat-
ings because they don’t want to jeopardize their relationship
with the employee or because
they don’t want to cause negative reactions. I/O psychologists
have worked hard to identify,
understand, and correct sources of rating error.
Types of Rating Error
In order to improve accuracy, one must first identify and
understand error. With performance
appraisals, rater errors typically fall into three major categories:
observational errors, distribu-
tional errors, and rating-scale errors. In this section, we review
the specific rating errors associ-
ated with each category and discuss how these errors reduce
performance appraisal accuracy.
Observational Errors
As stressed throughout this chapter, appraisals must be based on
thorough observation of
an employee’s performance in order to be accurate. Without
direct observation, managers
may rate employees based on unreliable sources of information
such as general impressions,
observations from past ratings, or hearsay.
Even if managers are able to observe workers’ performance,
they are often unable to remem-
ber more than an employee’s most memorable performance
accomplishments (or failures).
A typical appraisal cycle lasts up to 12 months, and as you
might expect, a manager will
remember and thus more strongly emphasize recent
performance, an effect called recency
error. Of course, when a recency error occurs, a worker’s
ratings do not accurately represent
his or her overall performance throughout the entire appraisal
cycle.
The best way to overcome recency error is simple: Reduce the
amount of performance
information the manager needs to remember. One practical way
for managers to do this is
to shorten the appraisal cycle by conducting more frequent
appraisals throughout the year
instead of once annually. Additionally, managers can improve
recall by keeping a detailed
performance log for each employee, especially at the beginning
of the rating cycle. Regular
feedback is important for motivation and development. Thus,
performance appraisal should
not be viewed as a once-a-year event but as an ongoing process.
The timing and frequency of
performance feedback should be determined by employee needs,
situations that necessitate
such feedback, and logical milestones in the employee’s
performance goals.
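A minimal sketch of the kind of performance log suggested above, with hypothetical fields and one hypothetical entry; any structure that timestamps observed behaviors across the whole cycle would serve.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class LogEntry:
    when: date       # when the behavior was observed
    dimension: str   # e.g., "Teamwork", "Judgment"
    behavior: str    # the observed behavior, described concretely
    effective: bool  # effective or ineffective incident

@dataclass
class PerformanceLog:
    employee: str
    entries: list = field(default_factory=list)

    def record(self, entry):
        self.entries.append(entry)

log = PerformanceLog("A. Rivera")
log.record(LogEntry(date(2017, 2, 3), "Teamwork",
                    "Resolved a team scheduling conflict", True))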
Distributional Errors
Within an organization, evaluation standards tend to differ from
manager to manager. Signifi-
cant error occurs if managers inaccurately distribute rating
scores along the rating scale. For
example, some managers may clump everyone together with
average scores. Others may be
overly positive or afraid to give anyone negative scores. A third
group of managers may be too
stringent and thus give most of their employees low scores.
These faulty judgments are called
distributional errors.
First, one of the most common forms of rating error in general
is leniency error. In this sit-
uation, managers have a low performance standard and rate their
employees higher than
their performance deserves. A graphic representation of a
lenient manager’s rating scores
will show that they tend to cluster on the positive end of the
distribution. Even though the
high scores could be a reflection of a truly high-performing
team, research tells us that such
a result is unlikely. Normal worker performance distribution
tends to follow a bell-curve pat-
tern, with some workers falling at the high and low ends but
most clustering somewhere in
the middle. Leniency error is very obvious when it occurs. For
example, one study found that
out of 12,000 federal employees, 85% received ratings at the
superior level on their perfor-
mance appraisals, whereas only 1% received ratings below the
fully successful level (Marrelli
& Tsugawa, 2009). It is extremely unlikely that this distribution
is accurate.
Research shows that rater personality characteristics such as
agreeableness and conscien-
tiousness could be linked to leniency error. In an experimental
lab study, people with low con-
scientiousness and high agreeableness rated their peers more
positively, regardless of actual
performance (Bernardin, Cooke, & Villanova, 2000). As
previously discussed, a manager’s
reluctance to give negative feedback is another common impetus
for rating leniency. Most
managers want to develop and maintain positive relationships
with their subordinates, and
offering positive feedback during a performance discussion is
certainly more enjoyable than
the alternative. Unfortunately, being lenient not only fails to
challenge and improve workers’
future performance, it also makes it legally difficult to
terminate a poor performer.
The second form of distributional error is central tendency.
Central tendency error occurs
when a manager is reluctant to rate employees as either superior
or inferior. As a result,
ratings scores cluster around the middle of the performance
scale. Third is severity error, in
which a manager holds excessively high standards and rates
employee performance as lower
than it actually is. A severe manager’s ratings scores will
cluster around the low end of the
performance scale. Although these two distribution errors are
less common than leniency
error, they are just as problematic. Because they fail to address
the real differences among
employees’ performance, scores tainted by severity and central
tendency error are worthless
for making employee decisions.
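A minimal sketch of screening one manager's ratings for the three distributional errors; the thresholds here are hypothetical illustrations, not validated cutoffs.

from statistics import mean, stdev

def flag_distribution(ratings, low=2.5, high=3.5, min_spread=0.8):
    """Flag rating sets whose mean or spread departs from a rough
    bell-curve expectation on a 1-5 scale."""
    m, s = mean(ratings), stdev(ratings)
    if m > high:
        return "possible leniency error"
    if m < low:
        return "possible severity error"
    if s < min_spread:
        return "possible central tendency error"
    return "no obvious distributional error"

print(flag_distribution([5, 5, 4, 5, 4, 5, 5]))  # possible leniency error
print(flag_distribution([3, 3, 3, 3, 3, 3, 3]))  # possible central tendency error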
Find Out for Yourself: Distributional Errors and Biases
The next time you participate in a group activity or team
project, rate each of the group mem-
bers (excluding yourself) on his or her performance and
contribution to the project on a scale
of 1–10. Then ask each team member to evaluate each of the
other members (excluding him-
self or herself).
What Did You Learn?
1. Are your evaluations consistently more lenient than, more
stringent than, or compa-
rable to your team members’ evaluations?
Rating-Scale Error
Sometimes performance appraisal errors occur because the rater
does not know how to
use the rating scale correctly. In other cases a manager’s
general opinion about a specific
employee can color his or her ratings of all performance
dimensions for that employee (Lance,
LaPointe, & Stewart, 1994). This tendency, called the halo
effect, is the most common form of
rating-scale error and can either artificially inflate or deflate
ratings. For example, if a man-
ager believes one of his or her subordinates is extremely smart,
he or she might transfer that
positive opinion to evaluations of other performance areas, such
as collaboration, ethics, and
loyalty. Basically, then, managers who rate workers high (or
low) on one significant dimen-
sion will go on to score them high (or low) on all other
dimensions on the appraisal, especially
if the other dimensions are not well defined or directly
observed.
One way to counteract the halo effect is for managers to rate all
employees on the same dimen-
sion before moving on to the next one. This helps managers
keep employee performance in
perspective. Another option is to use more than one source to
rate employees.
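A minimal sketch of the first countermeasure, rating dimension by dimension rather than employee by employee; collect_rating is a hypothetical placeholder for however a rating would actually be gathered.

employees = ["Ana", "Ben", "Chio"]
dimensions = ["Quality of work", "Teamwork", "Judgment"]

def collect_rating(employee, dimension):
    # Hypothetical placeholder: a real system would prompt the rater here.
    return 3

ratings = {e: {} for e in employees}
for dim in dimensions:        # outer loop: one dimension at a time...
    for emp in employees:     # ...rated for every employee before moving on
        ratings[emp][dim] = collect_rating(emp, dim)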
Although most researchers believe that the halo effect is present
in almost all ratings and set-
tings, some studies suggest that it is less prevalent—and less of
a concern—than previously
thought. In a review of past research, Murphy, Jako, and Anhalt
(1993) concluded that, in the
studies they examined, halo was not nearly as common as
traditionally believed and, even
when it did occur, did not negatively affect rating accuracy.
Surprisingly, when organizations
consciously try to control halo, they end up with less accurate
ratings (Murphy & Reynolds,
1988). Organizations must therefore not become overzealous in
their attempts to eliminate
halo. Indeed, some employees really are very strong (or very
weak) across all performance
dimensions, and their consistent ratings reflect an accurate
evaluation of their performance.
Rater Biases
Ratings can also be influenced by a worker’s per-
sonal relationship with the evaluator. Similar-to-
me error, for example, occurs when evaluators give
higher ratings to workers whom they perceive to be
like them (Wexley, Alexander, Greenawalt, & Couch,
1980). A study of 104 air force officers showed that familiarity between the officers and the aviators they debriefed positively affected the aviators' ratings of those officers (Van Scotter, Moustafa, Burnett, & Michael, 2007). Other research has exam-
ined personal characteristics such as attractiveness
and demographic characteristics, each of which
influences performance ratings.
Another source of rater bias is implicit person the-
ory. People tend to hold implicit theories about the
extent to which a person can change. Those who adopt an
incremental implicit theory believe
in people’s malleability and ability to change, while those who
adopt an entity implicit theory
believe that people’s characteristics are inherently fixed and
difficult to change. Research
shows that managers who adopt an incremental theory are more
likely to observe changes
in employee behaviors from one performance appraisal to the
next and are less likely to be
bogged down by prior impressions in judging current
performance than their entity theory
counterparts. Thus, these managers are more likely to provide
more accurate and objec-
tive performance evaluations. Managers can be trained to use an
incremental theory when
evaluating their employees and thus become more effective
performance evaluators (Heslin,
Latham, & VandeWalle, 2005). In other words, when managers
believe in their employees’
ability to change and develop, they are more likely to observe
changes in their behaviors
over time and to evaluate them more accurately and objectively.
Thus, effective performance
management systems should emphasize supervisor training in
order for raters to have an
appropriate mind-set for yielding accurate and beneficial
results.
Three demographic characteristics require particular attention:
race, gender, and age. As you
recall from Chapter 2, any tool organizations use to make
decisions about employees (hiring,
promotion, placement, compensation, termination, etc.) must
not discriminate against pro-
tected classes. In most organizations, performance appraisals
provide important data used
to make those decisions. You can easily understand, then, that
any personal biases based on
race, gender, and age are especially problematic for an
organization if they significantly influ-
ence performance appraisal ratings.
Research on race, gender, and age biases has produced mixed
results. In the category of
race, research shows that overall, Black employees receive
slightly lower ratings than White
employees (McKay & McDaniel, 2006). However, because
closer examination shows that all
managers tend to give higher ratings for individuals of their
own race, the overall results
seem to be due to a combination of similar-to-me error and the
higher proportion of Whites
in managerial roles.
Likewise, gender bias occurs in some situations, but it tends to
be limited to situations of
gender role incongruence. Specifically, male employees’
performance tends to be rated higher
than female employees’ performance in traditionally masculine
roles but lower in tradition-
ally feminine roles, and vice versa for women (Pulakos, White,
Oppler, & Borman, 1989). A
more recent study of 448 upper level managers examined the
equity of promotions for men
and women and found that women needed to show significantly
higher performance appraisal
ratings than men in order to be considered for promotion
(Lyness & Heilman, 2006).
Finally, research shows that managers may favor younger
workers. In general, older work-
ers receive lower ratings than their younger coworkers, and the
bias increases when older
employees work for younger managers (Shore, Cleveland, &
Goldberg, 2003).
Improving Rater Accuracy
Clearly, performance appraisals are susceptible to numerous
types of rating errors, which
affect the appraisals’ usefulness in employee decision making
and self-improvement. To
thwart the negative effects of error, I/O psychologists have
identified several ways to increase
rating accuracy. Let’s take a moment to review some of these
techniques.
Rating-Error Training
One way to deal with rating errors is to train raters not to make
them. Rating-error training
involves teaching raters about the different types of rating
errors along with practical strate-
gies for avoiding them. When dealing with leniency error, for
example, raters will first learn
about the type of error (in this case, inflated ratings with little
variability), then its possible
causes (e.g., raters wish to maintain a positive relationship with
the employee), and finally
strategies to overcome it (e.g., rank order employees before
rating each one). This process
assumes that educating raters about potential errors and possible
solutions can help them
overcome their biases.
Unfortunately, the perceptual biases that lead to rating errors
(such as stereotypes, the need
to please, and high expectations) are ingrained in our cognitive
processes and are difficult to
change. Rating-error training thus requires significant time and
effort, with sessions typically
lasting from 6 to 8 hours (Latham, Wexley, & Pursell, 1975).
Yet even with extensive training,
improved rating accuracy is short lived (Fay & Latham, 1982).
Rating-Accuracy Training
Like rating-error training, frame-of-reference (FOR) training
aims to help raters improve their
accuracy. With this method, raters learn about the type of
employee performance their orga-
nization wishes to see. They are given a frame of reference, or
model, against which to com-
pare their employees. If raters know the organization’s
performance standard for a specific
job position, they will be less likely to use their own
idiosyncratic standards to judge a worker
in that position.
Designing FOR training involves a number of important steps.
First, the raters receive a clear
and concrete definition of the various performance dimensions
and rating scale included on
the appraisal they will be using. Next, the trainer discusses
work behavior that illustrates
appropriate performance at each rating-scale level. After the
raters have a basic understand-
ing of the different performance expectations, they practice
using the rating scale, usually
by watching video scenarios of people working at various levels
of performance and then
attempting to appropriately rate them. Finally, the trainer goes
over each scenario, discussing
at which performance level each sample employee should have
been rated.
Research has found that FOR training does indeed
lead to more accurate performance appraisal
ratings (Woehr & Huffcutt, 1994). As an added
bonus, research by Davis and Mount (1984)
found that managers who received FOR training
became more effective at creating development
plans for their direct reports because they more
clearly understood what the company expected
of its employees.
Rater Accountability
Raters need to be motivated to make accurate rat-
ings, but organizations often do not provide much
incentive to do so. If your boss never underwent
formal performance appraisal training, or if he
or she was never really held accountable for the
quality of his or her performance ratings, how
motivated would he or she be to take performance appraisal
seriously? In turn, how likely will
you be to take it seriously when it is time for you to evaluate
your own staff? Unfortunately,
the poor behavior modeled by some executives often provokes
similarly cavalier attitudes
among the company’s entire management team. If, however,
managers are held accountable,
they will take the appraisal process seriously, and their ratings
will be more accurate (Harris,
Ipsas, & Schmidt, 2008; Mero & Motowidlo, 1995). One way to
increase accountability is for
organizations to hold calibration meetings, at which managers
discuss the performance and
ratings of workers in similar jobs. As you might surmise,
because they force managers to jus-
tify their ratings to their peers, these meetings increase rating
consistency across employees
(McIntyre, Smith, & Hassett, 1984).
4.5 Performance Management Systems
So far in this chapter, we have focused on the need for and
strategies used to improve the
accuracy of performance appraisals. However, even the most
accurate appraisal can fuel neg-
ative manager and employee reactions if it does not include
quality feedback. This knowledge
has prompted some I/O psychologists to suggest that worker
self-improvement, rather than
rating accuracy, should be the focus of a performance appraisal
(Ilgen, 1993). In order to build
a successful self-improvement plan, an employee needs to have
the following information: (a)
clear performance expectations, (b) an understanding of his or
her own current performance,
and (c) suggestions from his or her manager on how to improve.
Performance appraisal, then, is really part of a larger
performance management system, which
includes not only the appraisal but also the setting of
performance expectations and continu-
ous feedback. Recent research has presented serious criticism of
conducting performance
appraisals without integrating them into an effective
performance management system. Com-
mon examples include annual performance ratings that are not
connected to daily operations,
retrofitted ratings to justify pay increases that are not
necessarily related to performance,
and appraisals that result in limited or no pay differentials
between high and low perform-
ers (Pulakos, Mueller Hanson, Arad, & Moye, 2015). In fact,
Pulakos et al. (2015) presented
an interesting and unique case study of Cargill, a very large
organization in the food-service
industry that has 149,000 employees in 70 countries. Cargill has
abandoned performance
ratings altogether, with minimal impact on the organization’s
operations and effectiveness.
When performance appraisals are integrated into a performance
management system, how-
ever, they offer important benefits that should not be
overlooked. Thus, it is critical that orga-
nizations do not consider performance ratings in isolation, but
in terms of their links to stra-
tegic decision making (Adler et al., 2016; Gorman,
Cunningham, Bergman, & Meriac, 2016;
Smither, 2015).
Of course, if an employee truly wishes to improve his or her
performance, the employee must
be willing to listen to and act upon constructive performance
feedback, something that will
not happen unless the manager and employee are able to
establish a strong, high-quality
relationship with a significant amount of trust. In this section,
we will review the three major
components of the performance management system: building
relationships and trust, pro-
viding continuous feedback, and setting expectations. We will
also discuss the effects of cross-
cultural similarities and differences on performance
management systems and practices.
Building Relationships and Trust
Think about the last time you worked with a peer or for a
manager you did not trust. How did
you feel? Chances are, at one time or another, you experienced
frustration, confusion, apa-
thy, or even anger toward that person. Now imagine that the
person took some time to offer
you constructive performance feedback. How would you
respond? Most of us would ques-
tion this untrustworthy person’s intentions, withdraw from the
situation, or even become
hostile, ignoring the feedback or interpreting it in a negative
way. Furthermore, most people
would stay quiet and not question the feedback if they feared
they might be punished for
disagreeing. Actual negative feedback is even more detrimental,
especially when little trust
exists between the manager and the employee. Employees can
become unmotivated and lose
confidence (Kluger & DeNisi, 1996). When trust exists,
however, workers are more likely to
accept criticism openly, believing their manager has their best
interests in mind.
As you can see, a positive manager–employee relationship is
necessary to have a positive per-
formance appraisal experience, and trust is a prerequisite for
executing an effective perfor-
mance management system (Peterson & Hicks, 1996). In fact,
research shows that the quality
of the relationship between manager and employee has a
stronger influence on reactions to
performance appraisals than the favorability of the ratings or
whether the employee par-
ticipates in the appraisal process (Pichler, 2012). Relationship
quality and perceived orga-
nizational justice are key factors in employees’ satisfaction
and buy-in (Dusterhoff, Cunningham, & MacGregor, 2014).
Therefore, it is important for organizations to train man-
agers how to build relationships and trust with their staff
by keeping their commitments, displaying integrity, pro-
viding timely feedback, and showing an interest in their
subordinates.
Providing Continuous Feedback
During a typical annual postappraisal meeting, a man-
ager will hold a one-sided conversation with an employee,
reviewing the employee’s successes and failures from the
most recent performance review cycle. As you may have
determined from the opening exercises of this chapter, the
meeting can quickly become tense, even hostile, if the feed-
back includes criticism. Employees tend to instinctively
deflect criticism, blaming it on forces outside their con-
trol. Just think for a moment, though: If this meeting were
a once-a-year occurrence, how desirable or useful would
criticism be if it referred to something that happened 10
months ago? Could the employee even do anything now
to fix the situation? Probably not. How might an employee
react differently if, instead of annually, she or he received
informal feedback on a more regular basis?
Both scientific researchers and in-the-field practitioners
advocate the habit of providing con-
tinuous informal feedback to employees (Corporate Executive
Board, 2011; Gregory, Levy,
& Jeffers, 2008). More specifically, continuous feedback should
be given in addition to, not
at the exclusion of, the annual performance appraisal, and it
must be provided immediately
after any instance of effective or ineffective employee
performance. Once-a-year feedback has
never been very effective and is especially unhelpful for today’s
users of instant-communica-
tion systems such as Facebook and Twitter. Increasing its
frequency makes feedback not only
less threatening but also more helpful, because employees can
actually use the information to
solve existing problems or adjust performance to meet current
demands.
In addition to increasing frequency, I/O psychologists have
identified a number of other con-
ditions under which employees will be more likely to adopt
managers’ feedback suggestions:
1. Feedback must address specific employee behaviors or
actions, not personal char-
acteristics. It is best to use facts, data, statistics, or direct
observations to support
positive or negative feedback.
2. Feedback should involve two-way communication between
managers and employ-
ees. If employees feel they can express their views, they are
more likely to be satis-
fied with the feedback process (Dipboye & de Pontbriand,
1981).
3. Managers should provide constructive feedback only on
behaviors the employee can
control.
4. If feedback is negative, it should be constructive.
Furthermore, managers should pro-
vide support for the employees’ self-improvement.
Setting Expectations
Most of us truly wish to do our best at our jobs, but it is hard to
do so if we do not understand
what is expected of us. Managers can facilitate effective goal-
setting sessions by following
these guidelines (Latham, 2009b):
1. Encourage employee participation. Managers should not
simply assign perfor-
mance goals. Rather, they should collaborate with employees to
create mutually
agreed-upon expectations. Participating in the goal-setting
process increases an
employee’s goal-aspiration motivation (his or her desire to meet
the goal) and leads
to the creation of more challenging goals.
2. Set clear, realistic goals. Goals that are specific and
challenging yet achievable
more successfully motivate workers to perform their best.
3. Determine specific achievement deadlines. Workers perceive
open-ended goals
as less urgent than those with deadlines and are therefore less
likely to achieve
them.
4. Link performance to rewards. Employees tend to follow the
adage “What gets
rewarded gets done.” Employees should always understand how
achieving (or not
achieving) goals will impact them and how accomplishments
will be rewarded.
5. Facilitate priority setting. Not all goals hold the same
importance. Managers should
help employees prioritize and order goals according to their
importance.
Cross-Cultural Perspectives
Cross-cultural differences have substantial effects on business
practices. Many organizational
practices do not transfer readily across cultures and thus require
adaptation (Hofstede, 1980;
House et al., 1999; House et al., 2004). Performance
management is particularly prone to
such differences and necessary adaptations. The United States is
largely an individualistic
culture, which leads to an emphasis on individual success and
accomplishment. By contrast,
collectivistic cultures, often found in Eastern and Middle
Eastern countries, place a high value
on saving face and maintaining harmony. Thus, negative
feedback can be perceived as too
confrontational and may have far more damaging effects on
relationships. Even positive feed-
back in the form of public recognition can be uncomfortable to
high-performing employees in
collectivistic cultures; they may prefer to be recognized
privately in order to avoid alienating
their colleagues (Sumelius, Björkman, Ehrnrooth, Mäkelä, &
Smale, 2014). That is why in col-
lectivistic cultures, it is usually recommended that recognition
occurs in a private, one-on-one
setting.
Another distinct difference can be seen in how cultures view
power distance. In the United
States there is a strong emphasis on equality and fairness. It is
acceptable for employees to
express their opinions freely, even when they disagree with
their managers. Power and status
are earned based on achievements. However, in some cultures,
power and status differences
are readily accepted and rarely challenged. Examples include
most Asian and South Ameri-
can countries. In such cultures, employees will likely conform
to the process and accept its
results, but they will be less likely to actively participate in
setting their ratings or disagree
with their managers. Similarly, in high power distance cultures,
ratings by peers or subordi-
nates are unlikely to be authentic and may not be accepted as
legitimate or taken seriously
because they are not assigned by a superior (Sumelius et al.,
2014). In such cultures, tradi-
tional top-down performance evaluations are usually more
effective.
Because in many countries pay, promotions, and status continue to be based largely on senior-
ity, performance appraisals may be perceived as an imposed
Western practice, and thus not
taken seriously or adopted wholeheartedly. This can at best
make the process a waste of time,
and at worst a counterproductive practice that can damage
relationships and cause divisive-
ness. This notion is particularly relevant to multinational
corporations when they attempt to
implement uniform practices across their global operations. It is
thus necessary and valuable
to adapt various features of performance management systems
to cross-cultural differences
(Sumelius et al., 2014).
Finally, it is important to note that cultural diversity can exist
even within one country or geo-
graphic location, and cultures can differ substantially from one
area of a country to another.
For example, in the United States cultural differences exist
between the West Coast, East Coast,
Midwest, and South, as well as from rural to urban to suburban
locations. Organizations often
find it necessary to adapt business practices across operations,
even within the same country.
Managers should also understand these differences when
dealing with employees from differ-
ent backgrounds, whether these involve local, regional,
national, or international differences.
Consider This: Delivering Compliments Across Cultures

Even simple gestures can mean different things across cultures. Consider how you would interpret the gestures in Figure 4.5.

In some cultures, these gestures would be considered positive and complimentary. In others, they could be impolite or even offensive. Managers should be sensitive to their employees’ cultural backgrounds. They should not assume that anything they say or do in their own culture would be readily accepted in other cultures. Similarly, organizations should keep cross-cultural differences front and center when making strategic decisions.

Figure 4.5: Cross-cultural interpretations of the same gestures

[Figure 4.5 pictured three common hand gestures with the following interpretations:]
OK sign: U.K. and U.S. = good job, okay; Japan = money; Russia and France = zero; Brazil = insult
Thumbs-up: U.S. = good job, approval; Australia, Greece, Middle East = insult; Germany, Hungary = 1; Japan = 5
V sign: U.S. (palm facing inward or outward) = peace, victory, 2; Australia, U.K., South Africa (palm facing inward) = insult

4.6 The R in ROI: Linking Performance Evaluations to
Financial Results—Friend or Foe?
Performance measurement is extremely important, for the
numerous reasons discussed ear-
lier. However, performance appraisals can also become a
laborious exercise that I/O psychol-
ogists or HR departments push on the rest of the organization to
serve other, less important
goals. For example, managing the performance appraisal system
can become a goal in itself
for those whose job is to ensure that the system is functional
and well maintained. It justifies
the jobs they are holding and the salaries they are paid. When
that is the case, the managers
who perform the evaluations start to view the performance
appraisal system as a formality
and do not take it very seriously. They may fill in the necessary
forms, but the accuracy of
their ratings will likely be questionable.
Another common but often less effective approach to performance appraisals is to limit their outcomes to determining annual salary increases. Especially in a tight economy, when
overall payroll allocations are frozen, relatively stable, or even
decreased, linking perfor-
mance appraisals solely to annual raises is unlikely to improve performance.
For example, when the best performers get a raise of only 4%
and the worst performers get
a raise of 2%, you can see why the difference is unlikely to
translate into any constructive
performance feedback for either employee. In cases where there
is a limited pool of resources
to distribute, high performers often feel guilty about getting a
raise at the expense of their
colleagues, rather than feeling appreciated or rewarded.
Conducting performance appraisals
only to determine annual raises also ignores their purpose of
providing continuous construc-
tive performance feedback. As discussed earlier, employees
need to receive feedback on their
performance more often. This feedback should be provided to
help them better align their per-
formance with the organization’s goals. When appraisals are
always linked to pay, appraisal
sessions become mostly about money, rather than about
performance improvement.
Performance appraisals are also often used in conjunction with
layoff decisions. Even though
this is a legitimate use, if they are only used to justify these
decisions, they come to be perceived
as a way for the organization to provide a legally defensible
paper trail. This diminishes the
appraisals’ value, and the process becomes resented and
distrusted by managers and employ-
ees alike. Again, although the above uses of performance
appraisals are important and legiti-
mate, their primary use should be as a tool with which to
objectively measure performance
and facilitate its improvement.
So how do organizations, managers, I/O psychologists, and HR
departments design truly
effective performance appraisals systems? It is critical to
choose the correct measures. As
discussed earlier, the correct measures should be readily linked
to the organization’s goals
and its success. This chapter offers numerous ways to enhance
the quality of performance
measures. However, in the words of sociologist William Bruce
Cameron (1963) in his book
Informal Sociology: A Casual Introduction to Sociological
Thinking: “Not everything that can be
counted counts, and not everything that counts can be counted”
(p. 13). The Pareto principle, also known as the 80–20 rule, posits that in most
situations, 80% of outcomes are
caused by 20% of the inputs. In performance appraisal, this
means that it would be most
effective for managers to focus on their employees’ most
critical behaviors, which constitute
about 20% of everything the employees do on a daily basis,
because these critical behaviors
cause 80% of the outcomes that truly impact the organization’s
success and effectiveness.
Although not the only approach, performance measures that are
directly linked to financial
results tend to be perceived as objective and fair because they
reflect an employee’s true value
to an organization. They are also critical for making resource
allocation decisions, because
they put HR decisions on a par with other investments. For
example, the financial value of
an employee’s performance can justify the costs of hiring or
retaining that employee over
outsourcing the position or investing in a piece of equipment
that would allow the job to be
automated.
Should an appraisal system, then, attempt to capture the
financial value of every aspect of
an employee’s performance? Absolutely not! The concept of
opportunity cost, introduced
in earlier chapters, implies that appraisals should only capture
those performance dimen-
sions where the benefits of measurement exceed the costs. For
example, even though it is
easy to quantify stationery consumption, the cost of policing
employees to be less wasteful in
their use of inexpensive stationery items can be higher than the
cost savings that may accrue
from these initiatives (such as saving paper). Similarly, many
organizations install expensive
equipment or have their managers spend numerous hours monitoring their employees’ attendance, even though the cost of that monitoring far exceeds the cost of the few minutes an employee may occasionally come in late or leave early.
What, then, should an appraisal system track? The Pareto principle implies that it
should focus on performance dimensions that would be of high
enough financial value to jus-
tify the cost. The key is not always the absolute cost or benefit
of a performance dimension but
rather the variation of that cost or benefit. To use our previous
example, the difference between
the most and least conservative use of stationery is not
substantial. In measuring the perfor-
mance dimensions that would have the most substantial effects
on financial outcomes, empha-
sis should be on the dimensions with the highest variability and
the ones that are “pivotal” to
performance (Cascio & Boudreau, 2011). For example, at an
upscale restaurant, the most piv-
otal performance dimension for cooks is their cooking skills. On
the other hand, the most piv-
otal skills of the wait staff are their social skills. Although
cooks need to have some social skills
to effectively deal with the wait staff and restaurant
management, social skills are not pivotal
for cooks. They can still be subjectively evaluated (e.g., using a
narrative or a rating scale), but
attempting to place a financial value on those skills is both
impractical and unnecessary.
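To make this concrete, the ranking logic can be sketched in a few lines of code. What follows is a minimal illustration, not anything from the chapter itself: the dimension names, the per-employee dollar estimates, and the measurement-cost threshold are all invented, and the spread (standard deviation) of each dimension’s financial impact is used as a rough stand-in for pivotalness.

```python
# A minimal sketch of the "measure the pivotal dimensions" logic described above.
# All dimension names, dollar figures, and the cost threshold are hypothetical.
from statistics import pstdev

# Estimated annual financial impact of each performance dimension,
# one number per employee (e.g., five cooks at the restaurant example).
dimension_impacts = {
    "cooking skill": [52_000, 78_000, 61_000, 95_000, 43_000],  # wide spread -> pivotal
    "social skill": [8_000, 9_500, 8_700, 9_100, 8_300],        # narrow spread
    "stationery use": [120, 140, 110, 135, 125],                # trivial spread
}

MEASUREMENT_COST = 1_000  # assumed cost of formally measuring one dimension

def pivotalness(values):
    """Spread of financial impact across employees; a wider spread means
    measuring this dimension does more to differentiate performers."""
    return pstdev(values)

# Rank dimensions by spread, then keep only those worth their measurement cost.
ranked = sorted(dimension_impacts, key=lambda d: pivotalness(dimension_impacts[d]), reverse=True)
for dim in ranked:
    spread = pivotalness(dimension_impacts[dim])
    verdict = "measure" if spread > MEASUREMENT_COST else "evaluate subjectively or skip"
    print(f"{dim}: spread = ${spread:,.0f} -> {verdict}")
```

Under these invented numbers, only cooking skill clears the measurement-cost bar for cooks, which mirrors the chapter’s point: cooking skill is pivotal and worth tying to financial results, while social skills and stationery use are better evaluated subjectively or left unmeasured.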
Consider This: Appraising Performance Dimensions
That Really Matter
Choose a job that would be of interest to you. It can be your
current job, a job you held in the
past, a job you hope to have in the future, or just a job that you
have come across in a job-
opening announcement or advertisement. Describe the job in
detail. For more information,
you may search for job descriptions of similar jobs.
Questions to Consider
1. How would you measure the performance of the incumbent of
this job? What are the
most important dimensions to measure, according to the Pareto principle?
2. What are the performance dimensions that will likely exhibit
the most variability (the
most pivotal dimensions)?
3. Which dimensions will be most readily linked to financial
results? Explain.
4. Which dimensions should be subjectively evaluated?
5. Which dimensions should be ignored and not evaluated? Note
that this is an important
decision because it can significantly affect the efficiency and
effectiveness of a perfor-
mance appraisal system. It is also a decision that is often
neglected or inadequately
addressed.
4.7 Toward More Positive Appraisal Sessions
As humans, we have a tendency to overemphasize and amplify
the magnitude of negativ-
ity in our lives (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001). Negative stimuli tend to
receive more of our attention and energy. For example,
threatening personal relationships
have been shown to receive more of our thought time than
supportive ones, and blocked
goals tend to receive more thought time than those with open
and available options (Klinger,
Barta, & Maxeiner, 1980). Performance appraisal
is no exception. It is much easier to dwell on our
own or others’ faults than to acknowledge talents,
strengths, and positive performance attributes.
Doing the latter requires intention.
So why do humans in general tend to focus on neg-
ativity? The tendency to overemphasize negativity
has been attributed to primitive survival mecha-
nisms in reaction to perceived physical danger. In
civilized societies, overemphasis on negativity has
been attributed to four psychological factors that
are comparable to these survival mechanisms:
intensity, urgency, novelty, and singularity (Cam-
eron, 2008). The first factor is the intensity of negative stimuli.
Because negative events are
perceived as threatening, they are experienced more intensely.
Second is the sense of urgency
that negative stimuli place on our perceptions and action
tendencies, because something is
wrong and needs to be fixed. Positive stimuli do not pose the
same sense of urgency, because
ignoring positive stimuli does not pose as much risk as ignoring
negative stimuli. Third is the
perceived novelty of negative events. Believe it or not, a lot of
what is going on in most peo-
ple’s lives is positive. That’s why it tends to go unnoticed.
Negativity is the exception. That’s
why it gets more attention.
Fourth, one of the unique characteristics of negativity is what is
referred to as singularity.
Imagine a system with one defective component, a body with
one ailing organ, a team with one
counterproductive employee, or a family with one dysfunctional
member. A single negative
component is capable of tainting the performance of the
collective, which causes that single
negative component to really stand out and alert the rest to the
need to somehow remedy
the problem. On the other hand, positivity tends to be more
general and global. One positive
component alone does not necessarily make a system better.
One good employee alone usually
cannot make an organization successful. One healthy organ
alone cannot make the whole body
healthy. This singularity makes the effect of negativity more
pronounced and far reaching.
Paradoxically, humans also have a natural tendency, referred to
as the heliotropic tendency,
to gravitate toward what is pleasurable (i.e., positive) and away
from painful or uncomfort-
able stimuli. However, this tendency is often overwhelmed
by the intensity, urgency, nov-
elty, and singularity of negativity and needs to be encouraged
through intentional decisions
and actions. That is why although most managers recognize
their tendency to overemphasize
their employees’ weaknesses, faults, and mistakes and wish they
could be more positive, they often cannot. For example, they may get overwhelmed by the urgent
need to address the dysfunc-
tional behaviors of their worst employees and end up with no
time to interact with and praise
their better ones for their consistent positive behaviors.
Moreover, those consistent positive
behaviors may no longer stand out; they may be taken for
granted. A manager may even for-
get to recognize them when appraising these employees’
performances.
So how can managers overcome their negative tendencies and
conduct more positive perfor-
mance appraisal sessions? First, a manager needs to recognize
the importance of positivity,
which was introduced in Chapter 3. Although extreme positivity
is unnecessary and can even
be dysfunctional, research supports the idea that humans thrive
and flourish in a positive
environment (Keyes, 2002). So managers need to intentionally
create positive interactions
with their employees, especially when they need to
counterbalance a negative interaction,
such as when it is necessary to give negative feedback. In
performance appraisal sessions,
managers should put in the effort to find and comment on
positive aspects of their employ-
ees’ performance. This requires the art of catching employees
doing something right instead
of the common practice of focusing on problems and mistakes.
You might think that this
hand-holding is more necessary for new or inexperienced
employees and that more mature
employees or more established relationships can tolerate less
positivity. However, research
shows that even more positivity is needed in more complex
settings such as top management
teams and marital relationships (Fredrickson, 2013; Gottman,
1994).
Consider This: Positive Performance Appraisal Sessions
In order to conduct more positive performance appraisal
sessions, managers need to be more
positive when collecting and sharing performance information.
What follows are two exam-
ples of positively oriented practices that can be used to replace
the negative practices often
used by managers.
Example 1
Negative: Reprimand workers when late.
Positive: Praise and reward workers who are consistently on
time.
Rationale: Workers who are on time will know that their
positive behavior is noticed and
appreciated instead of ignored. Late workers will start coming
on time to get the manager’s
attention and receive rewards.
Example 2
Negative: Criticize an employee for weaknesses (e.g., having
poor people or leadership skills).
Positive: Find and acknowledge the employee’s strengths that
parallel those weaknesses (e.g.,
being an independent thinker or being willing and able to follow
directions). Suggest changes
in role to better fit employee strengths and/or training
opportunities to develop lacking skills.
Rationale: Weaknesses may be based on stable personality traits
that cannot be readily
changed (e.g., introversion). In these cases, role changes are more likely to lead to improvements in performance. In other cases, training and development are more likely to be perceived as
opportunities. Criticism is more likely to be perceived as a
threat.
It is important to note that positive feedback and recognition
have been found to positively
influence performance almost as much as financial rewards
(Luthans & Stajkovic, 1999;
Stajkovic & Luthans, 2003). Since positive feedback and
recognition hardly cost managers and
organizations anything, it may seem surprising that they are not
used more often or more effec-
tively. Furthermore, millennials rate development opportunities
as more important than pay
(McCarthy, 2015). When feedback is framed positively and
offered as a way to develop, it can
have important motivational effects that facilitate the attraction
and retention of employees.
Summary and Conclusion
Organizations that excel in their ability to measure their
employees’ performance are at a
significant competitive advantage. They place themselves in a
favorable position to select
the best employees, capitalize on their talents and skills,
promote them into their areas of
excellence, adequately reward them, and retain them for enough
time to reap significant
returns on their investment in them. They also have an edge in
accurately and promptly
identifying performance deficiencies and pursuing corrective
measures. Accurate perfor-
mance appraisals can be a source of information for the
organization, a communication
vehicle for managers, and a much needed feedback process for
employees.
However, in many instances, performance appraisal systems can
be plagued with so much
subjectivity that they not only defeat their purpose but become
counterproductive. They can
be perceived as inequitable, which can damage morale and
performance. They can also be
discriminatory, which may result in legal costs and damage to
the organization’s reputation.
Thus, a poorly designed performance appraisal system may be
worse than not having one
at all!
Organizations should view the process of designing and
maintaining a well-functioning
performance appraisal system as a worthwhile investment,
rather than just an expense,
a formality, or a necessary evil. Accurate measures should be
designed to assess the most
pivotal dimensions of performance. They should be integrated
with other HR systems, such
as recruitment, selection, training, and compensation, as well as
with overall organizational
goals and strategies in order to ensure full utilization and
impact on the organization’s bot-
tom line.
Key Terms

behaviorally anchored rating scale
(BARS) A method for rating worker perfor-
mance in which performance dimensions
and critical incidents that exemplify both
effective and ineffective job performance
are identified, transformed into behavioral
statements that describe different levels of
performance, assigned numerical values,
and used as anchors in a typical graphic rat-
ing scale.
behavioral observation scale (BOS) A
method for rating worker performance that
is similar to BARS, except that instead of
performance quality, it rates the frequency
with which a worker is observed to perform
a critical job behavior.
forced distribution A subjective perfor-
mance measure in which managers place
employees into performance categories
based on preestablished proportions.
graphic rating scale The most commonly
used method for rating worker perfor-
mance, in which managers observe specific
employee behaviors or job duties along a
number of predetermined performance
dimensions and rate the quality of perfor-
mance for each dimension on a scale ranging
from high to low.
objective performance measures Quanti-
tative measures of performance that can be
found in unbiased organizational records.
paired comparison A subjective perfor-
mance measure in which managers system-
atically evaluate each worker’s performance
compared to the other workers on the team
by comparing one pair of workers at a time,
judging which of the two demonstrates
superior performance, and then creating a
rank ordering based on the number of times
each worker is considered the better per-
former of a pair.
performance appraisal The formal pro-
cess in which employee performance is
assessed, feedback is provided, and correc-
tive action plans are designed.
rank ordering A subjective performance
measure in which managers rank their
employees from best to worst.
subjective performance measures Per-
formance measures that rely on human
judgments.
360° appraisal A multisource evaluative
process that utilizes performance feed-
back from a variety of sources, such as an
employee’s supervisor, peers, subordinates,
and internal or external customers, as well
as self-evaluation.
written narrative A subjective perfor-
mance measure in which the manager writes
a paragraph or two summarizing a specific
employee’s performance over a certain
length of time.
Multimedia
Dimoff, D. (Producer). (2011). Performance evaluation [Video segment]. In D. S. Walko & B. Kloza (Executive Producers), Managing your business: Prices, finances, and staffing. Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=42251&wID=100753&loid=116118&plt=FOD&w=420&h=315
· The full version of this video is available through the Films On Demand database in the Ashford University Library. The video discusses the role of performance reviews and provides guidance on conducting them. It has closed captioning. It may assist you in this week’s discussions (Assessment in the Workplace; Diversity and the Organizational Process).
Marofsky, M., & Grote, K. (Writers), Christiansen, L., & Dean, W. (Directors), Christiansen, L., & Hommeyer, T. (Producers). (1991). Understanding our biases and assumptions [Video file]. Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=2574&wID=100753&plt=FOD&loid=0&w=640&h=480&fWidth=660&fHeight=530
· The full version of this video is available through the Films On Demand database in the Ashford University Library. The video discusses the nature of biases and preconceptions and stresses the need to examine one’s own thinking about “us” and “them.” It has closed captioning. It may assist you in this week’s discussions (Assessment in the Workplace; Diversity and the Organizational Process).
Preparing for my appraisal: Cutting edge communication comedy series [Video file]. (2016). Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=111702&wID=100753&plt=FOD&loid=0&w=640&h=360&fWidth=660&fHeight=410
· The full version of this video is available through the Films On Demand database in the Ashford University Library. The video shows several examples of workplace appraisals, illustrating the dos and don’ts and providing helpful tips. It has closed captioning. It may assist you in this week’s discussions (Assessment in the Workplace; Diversity and the Organizational Process).
What your boss wants: Business [Video file]. (2013). Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=94142&wID=100753&plt=FOD&loid=0&w=640&h=360&fWidth=660&fHeight=410
· The full version of this video is available through the Films On Demand database in the Ashford University Library. The video gives an insider’s perspective on what makes a good job application, how to succeed in an interview, what to expect in the induction process, and the types of assessments given at the end of the probationary period. It has closed captioning. It may assist you in this week’s discussions (Assessment in the Workplace; Diversity and the Organizational Process).
Supplemental Material
Rosser-Majors, M. (2017). Week Two Study Guide. Ashford
University.
Recommended Resource
Multimedia
Bandura, A., & Jordan, D. S. (Writers), & Davidson, F. W. (Producer). (2003). Modeling and observational learning – 4 processes [Video segment]. In Bandura’s social cognitive theory: An introduction. Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=44898&wID=100753&loid=114202&plt=FOD&w=420&h=315&fWidth=440&fHeight=365
· The full version of this video is available through the Films On Demand database in the Ashford University Library. In this video, Albert Bandura explains the four processes of observational learning and describes the Bobo doll experiment on the social modeling of aggression. It has closed captioning. It may assist you in this week’s discussions (Assessment in the Workplace; Diversity and the Organizational Process).
673Foundations of Psychological TestingNoel Hendrick.docx

673Foundations of Psychological TestingNoel Hendrick.docx

  • 1.
    67 3Foundations of Psychological Testing NoelHendrickson/Photodisc/Thinkstock Learning Outcomes After reading this chapter, you should be able to • Identify the purpose and uses of psychological testing. • Describe the characteristics of a high-quality psychological assessment tool or selection method. • Explain the importance of reliability and validity. • Identify commonly used psychological test design formats. • Recognize the types of tests used to assess individual differences. • List the steps needed to develop and administer tests most effectively. • Discuss special issues in testing, including applicants’ reactions to testing and online administration. • Summarize the importance of testing for positivity. you83701_03_c03_067-102.indd 67 4/20/17 4:22 PM
  • 2.
    © 2017 BridgepointEducation, Inc. All rights reserved. Not for resale or redistribution. 68 Section 3.2 What Are Tests? 3.1 The Importance of Testing When you hear the word test, what comes to mind? For many people, tests are not a pleasant activity. Most of us can remember wishing, as children, to grow up and finish school so we would never have to take a test again. Of course, as adults, we discover that tests affect our lives long after we earn our diplomas. Tests determine whether we can drive a car, get into a graduate or job-training program, or earn a professional certification. They influence our career choices and, quite often, our career advancement. What profession do you plan to pursue? Do you want to be a doctor or a lawyer? How about a police officer or firefighter? Perhaps you would like to earn your MBA and start your own business? Each of these examples, along with most professions, require many years of tests, demand high levels of knowledge and skills, and require continued education and recertifica- tion testing. Businesses use tests to help determine whether job applicants possess the skills and abilities needed to perform a job. After an applicant is hired, tests will help determine placement in an
  • 3.
    appropriate training anddevelopment program. Throughout an employee’s career, the orga- nization may require testing for new job placements or promotions. As you can see, tests can have a significant influence on people’s lives; they can help identify talent and promote deserving candidates. But they can also be misused. Unfortunately, there are many poorly designed psychological tests on the market. They seduce organizations with promises of fantastic results but do little to identify quality employees. I/O psychologists pos- sess the knowledge, skills, and education to design, implement, and score measures that meet the legal and ethical standards for an effective psychological test. The goal of this chapter is not to teach you how to design quality psychological tests, but rather to acquaint you with the requirements, challenges, and advantages of doing so. Fur- thermore, understanding the test-making methods that I/O psychologists use will make you a more informed consumer of tests for your own personal and professional goals. 3.2 What Are Tests? In general, a test is an instrument or procedure that measures samples of behavior or perfor- mance. In an employment situation, tests measure an individual’s employment and career- related qualifications and characteristics. The Uniform Guidelines on Employee Selection Procedures (1978) defines a test as any method used to make a decision about whether to
  • 4.
    hire, retain, promote,place, demote, or dismiss an employee or potential employee. By this definition, then, any procedure that eliminates an applicant from the selection process would be defined as a test. As discussed in Chapter 2, examples include application forms that eval- uate education and experience; résumé screening processes; interviews; reference checks; performance in training programs; and psychological, physical ability, cognitive, or knowl- edge-based tests. you83701_03_c03_067-102.indd 68 4/20/17 4:22 PM © 2017 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution. 69 Section 3.2 What Are Tests? I/O psychologists are concerned with design- ing and implementing selection systems that identify quality job candidates. Clearly, orga- nizations want to hire the best workers, but it is a real challenge to screen candidates who vary widely in their KSAOs, behaviors, traits, and attitudes—or what is known as their individual differences. This is especially true when hiring decisions are made with only basic tools such as application forms and short interviews. To help organizations better measure applicants’ personal charac- teristics, I/O psychologists have developed
  • 5.
    psychological measurements toidentify how and to what extent people vary regarding individual differences (Berry, 2003). What Is the Purpose of Psychological Testing and Selection Methods? In employment, tests and other selection tools are used to predict job performance. Keep in mind that job performance can never be predicted with 100% accuracy. The only way employ- ers could reach such a high level of accuracy would be to hire all the applicants for a particu- lar job, have them perform the job, and then choose those with the highest performance. Of course, this approach is neither practical nor cost effective. Moreover, even if an organiza- tion could afford to hire a large number of applicants and retain only those who performed best, performance prediction is still not perfectly accurate. For example, many organizations have probationary periods in which the employer and the employee try each other out before a more permanent arrangement is established. Employees may be motivated to perform at a much higher level during the probationary period in order to secure permanent employ- ment, but performance levels may drop once the probationary period is over. Moreover, job performance may be influenced over time by a myriad of factors that cannot be predicted or managed. Although it is impossible to perfectly predict job performance, psychological testing and selection methods can provide reasonable levels of prediction if they accurately and consis-
  • 6.
    tently assess predictorsthat are related to specific performance criteria. As briefly introduced in Chapter 2, accurately predicting performance criteria— usually referred to as validity— ensures that test results indicate performance outcomes, so that those who score favorably on the test are more likely to be high performers than those who do not. Simply put, valid- ity reflects the correlation between applicants’ scores on the test or selection tool and their actual performance. A high correlation indicates that test scores can accurately predict per- formance. A low correlation indicates that test scores are poorly related to performance. In order to assess the validity of a selection tool, job performance must be quantifiable, mean- ing that there is a set of numbers associated with applicants’ test scores so one can calculate a correlation. Performance scores are usually obtained from performance appraisal systems, which will be discussed in more detail in Chapter 4. Poorly designed performance appraisal Fuse/Thinkstock Students take tests to demonstrate their knowledge of a particular subject. Similarly, employers administer exams to job applicants to measure employment and career-related qualifications and characteristics. you83701_03_c03_067-102.indd 69 4/20/17 4:22 PM © 2017 Bridgepoint Education, Inc. All rights reserved. Not for
  • 7.
    resale or redistribution. 70 Section3.2 What Are Tests? systems can hinder an organization’s ability to assess the validity of its selection methods. For example, performance evaluations are sometimes highly subjective. Some managers tend to score all of their employees similarly regardless of performance in order to ensure they all receive a raise. Some do so to sidestep confrontation or to avoid having to justify the decision. This lack of variability in scores can bias the results of the statistical analysis’s underlying validity, preventing it from adequately calculating or comparing the validity of various selec- tion methods. Another important determining factor of tests and other selection tools is reliability. Also referred to as consistency, reliability is the extent to which test scores can be replicated over time and across situations. A reliable test will reflect an applicant’s aptitude rather than the influence of other factors such as the interviewer, room temperature, or noise level. For exam- ple, many organizations use multiple interviews or panel interviews to evaluate applicants so that there are multiple raters scoring each applicant. Such processes have scorers assign both an absolute score, which measures how the applicant did in relation to the highest possible
  • 8.
    score, and arelative score, which measures how the applicant did in relation to the rest of the interviewees. When the scores assigned by these multiple raters are comparable in terms of absolute scores for each applicant, as well as relative scores and rankings across applicants, the interview process is considered reliable. On the other hand, if different raters score the same applicant very differently, and if the interview process yields different rankings across applicants and thus different hiring recommendations, then the process is unreliable. Similar to validity, no test or selection method has perfect reliability, but the more reliable and con- sistent a selection tool is, the more accurate it will be in determining quality candidates, and the more legally defensible it will be if the organization is sued for discriminatory hiring. An objective and systematic selection process that leads to consistent results across candidates and raters is an organization’s first line of defense against such accusations. Ensuring that tests are both valid and reliable is an important part of the assessment process. Of course, the more accurate the testing process, the more likely the best candidates will be selected, promoted, or matched with the right job. However, that is not the only reason to do so: invalid or unreliable tests can be costly. Many tests need to be purchased or require a license of use to be obtained. Testing also takes time, both for the candidate and the organiza- tion. Tests need to be administered, be rated, and have the results reported, which requires managers’ and HR professionals’ time and effort. Tests that are
  • 9.
    not valid orreliable also have opportunity costs, such as the time spent using them as well as the lower productivity of those who were hired or promoted using the wrong test. Finally, there are legal implications for ineffective testing. An invalid test may not be related to performance, but it may be discriminatory. It may favor certain protected classes over oth- ers. For example, if younger job applicants consistently score higher than older ones on a test, and these scores are not related to job performance, then that test may be found discrimina- tory. Similarly, if a particular test favors men over women or places minority applicants at a disadvantage, it can be considered discriminatory and thus illegal. For example, complaints were filed against the pharmacy chain CVS Caremark for using a discriminatory personality test. The test included questions about the applicant’s propensity to get angry, trust others, and build friendships. These questions were found to be potentially discriminatory against applicants with mental disabilities or emotional disorders (Tahmincioglu, 2011). Although the organization may have had no intent to discriminate, using invalid discriminatory tests can result in what was referred to in Chapter 2 as “disparate impact,” which is also illegal. you83701_03_c03_067-102.indd 70 4/20/17 4:22 PM © 2017 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution.
  • 10.
    71 Section 3.2 WhatAre Tests? Uses of Tests Companies of all sizes are integrating tests into their employment practices. In 2001 a study by the American Management Association found that 68% of large U.S. companies used job- skill testing as part of their employment process; psychological (29%) and cognitive (20%) measurements were used less frequently. More recent studies, however, show that the test- ing trend is on the rise, with nearly 80% of Fortune 500 organizations using assessments of Consider This: Tests and Testing Make a list of as many tests as you can remember having taken during times when you were up for selection from a pool of potential candidates (e.g., jobs, volunteering opportunities, college admissions, scholarships, military service). Remember that a test can be any instrument or procedure that measures samples of behavior or performance. It does not have to be a written, proctored exam. Questions to Consider 1. What do you think each of those tests was attempting to measure? 2. What is your opinion of each of those tests? 3. Did the tests adequately measure what they were trying to measure? 4. What were some strengths and weaknesses of each test?
  • 11.
    Find Out forYourself: The Validity and Reliability of Commonly Used Selection Tools Visit the following websites to read about the different types of validity and reliability, as well as how they are measured. Validity & Reliability of Methods Psychometric Assessment Validity Psychometric Test Reliability What Did You Learn? 1. Compare the validity of various selection methods such as interviews, reference checks, and others. If you were a recruiter, which ones would you use? Why? 2. If you were to use an actual test, how long would you design the test to be? Why? 3. If you were to design an interview process, how many interviewers would you use for each candidate? What are the benefits of using more than one interviewer (rater)? 4. What are the key factors in increasing the validity of the selection process? 5. What are the key factors in increasing the reliability of the selection process? you83701_03_c03_067-102.indd 71 4/20/17 4:22 PM
  • 12.
The growing popularity of testing within organizations has resulted in the use of tests not only for selection but also for a number of other HR functions. One of the most important ways in which organizations use tests is to evaluate job applicant or employee fit. Organizations often administer tests to accurately evaluate an applicant's job-related characteristics or determine an employee's suitability for promotion or placement in a new position within the company. Because promotions and training are expensive, organizations place high importance on being able to determine which employees possess the higher level abilities and skills needed to assume advanced job positions. Similarly, during
job reorganizations, companies must be able to place individuals into new jobs that align with their skills and abilities. Keep in mind that selection, promotion, and job-placement processes all involve employment decisions and thus must be well designed in order to meet the requisite legal and professional standards.

HR professionals make use of tests in areas outside the realm of employment selection. Training and development is one such example. Trainees are often tested on their job knowledge and skills to determine the level of training that will fit their proficiency. At the end of training, they may take tests to assess their mastery of the training materials or to identify areas where they need to be retrained. Other types of tests help individuals identify areas for self-improvement, and sometimes job teams take tests to help facilitate team-building activities.

Finally, tests can help individuals make educational or vocational choices. People who work at jobs that utilize their skills and interests are more likely to be successful and satisfied, so it is important that vocational and educational tests make accurate matches and predictions. Note that tests used solely for career exploration or counseling need not meet the same strict legal standards as tests used for employee selection.

3.3 Requirements of Psychological Measurement
Tests designed by I/O psychologists possess a number of important characteristics that set them apart from other tests you may have taken. Scientifically designed tests differ from magazine quizzes or informal tests that you find online in that they
are more than a set of questions related to a specific topic. Instead, they must meet standards related to administration (the way in which the test is given), scoring methods, score interpretation, reliability, and validity. Unfortunately, many employers and consultants think they can simply put together a test or an interview protocol that they believe measures what they feel is necessary and start using the selection tool without any statistical analysis to assess its quality. In addition to the fact that this approach does not effectively distinguish applicants with higher performance potential, which makes it a waste of time and resources, it can also yield inconsistent and discriminatory results, which makes it illegal.

Standardized Administration
To administer a selection test properly, the conditions under which applicants complete the test must be standard—that is, they must be the same every time the test is given. These conditions include the test materials, instructions, testing facilities, time allowed for testing, and allowed resources and materials. To ensure standardization,
organizations put instructions into written form or administer the test to large groups so that all applicants hear the same instructions. Additionally, applicants will all complete the test in the same location using well-functioning equipment and comfortable seating. Test administrators are also careful to keep the testing environment comfortable in terms of temperature and humidity as well as free from extraneous noise or other distractions.

Variations in testing conditions can significantly interfere with results, making it impossible to create accurate comparisons between different applicants based on the conditions in which they were tested. Consider how changing even one aspect of testing conditions can affect test performance. What would happen if, on a cold day in the middle of winter, the heat stopped working partway through a series of applicant evaluations? Applicants might not perform well on a typing test because their hands were cold and stiff, or they might not complete a written test because they were shivering and could not concentrate. Now think about how differently two groups of test takers would perform if one group accidentally received incomplete instructions from an inexperienced administrator, while a second group received the proper instructions from an experienced administrator. You can easily see how unfair it would be to try to compare test results of applicants who were not all tested under equal conditions!
Of course, it is sometimes not only appropriate but also necessary to alter the testing conditions. Applicants with disabilities may need specific accommodations, such as a sign language interpreter for a person with a hearing impairment or a reader or Braille version of a written test for a person with a visual impairment. For applicants with disabilities, then, not allowing for changes in the testing conditions would make it difficult, if not impossible, for them to perform their best.

A real-life example of this occurred when an on-campus recruiter for a highly coveted internship program noticed that, despite performing as well as other students when interviewed on campus, minority students from very low-income areas fared poorly when invited to an on-site interview. The recruiter suspected that the organization's luxurious office building and extravagant furnishings may have intimidated those students and caused their poor performance. To further investigate his point, he changed the location of the interview to a local community youth center—everything else about the interview process was kept identical. Under these conditions, the students' performance improved significantly and was no different from the overall applicant pool. In other words, the change was necessary to neutralize the distracting effects of an otherwise unrelated testing condition.

[Photo: It is important that all applicants take the same test under the same conditions. Lisa F. Young/iStock/Thinkstock]
Objective Scoring
Just as performance on a test can be affected by testing conditions, results can be affected by the way in which the test is scored. To eliminate the possibility of bias, test scores must be standardized, which means that a test must be scored in the same way for everyone who takes it. Establishing a scoring key or a clear scoring criterion before administering a test will reduce an evaluator's subjective judgments and produce the same or similar test scores no matter who evaluates the test.

Organizations can utilize both objective tests and subjective tests. Objective tests, such as achievement tests and some cognitive ability tests, have one clearly correct answer—multiple-choice tests are an example. As long as a scoring key is available, scoring an objective test should be free from bias. In contrast, subjective tests, such as résumé evaluations, interviews, some personality tests, and work simulations, have
no definitive right or wrong answers; their scores rely on the interpretation and judgment of the evaluator. To minimize the influence of personal biases and increase scoring accuracy, the evaluator uses a predetermined scoring guide or template, also sometimes referred to as a rubric, which establishes a very specific set of scoring criteria, usually with examples of what to look for in order to assign a particular score. For example, if a rater is scoring an applicant on professionalism using a scale of 1 to 5, with 3 being "average" or "meeting expectations," this score can also be accompanied by a description of what is considered "meeting expectations" so the assessment is not subject to the rater's interpretation of what is expected. Although subjective tests are commonly used to make employment decisions, objective tests are preferred for making fair, accurate evaluations of and comparisons between job candidates. In the majority of situations, not everything can be objectively measured, so both subjective and objective tests are used to combine the specificity of objective tests and the richness of subjective tests.

Score Interpretation
After a test has been scored, the score needs to be interpreted. Once again, this process must be standardized. Additionally, to be interpreted properly, a person's score on a test must be compared to other people's scores on the same test. I/O psychologists use a standardization sample, which is a large group of people who have taken the test against whose scores an individual's score can be compared. The comparison scores
provided by the standardization sample are called test norms. The demographics of the standardization sample can be used to establish test norms for various racial and ethnic groups, men and women, and groups of different ages and various education levels.

Let's look at an example of how test-score interpretation works. As part of a selection process, an applicant might answer 30 out of 40 questions correctly on a multiple-choice cognitive ability test. By itself, this score provides little information about the applicant's level of cognitive ability. However, if we can compare it to how others performed on the same test—specifically, the test norms established by a standardization sample—we will be able to ascribe some meaning to the score.

Often, raw scores, the number of points a person scores on a test, are transformed into percentile scores, which tell the evaluator the percentage of people in the standardized sample who scored below an individual's raw score. Continuing the example above, if the score of 30 falls in the 90th percentile, 90% of the people in the standardized sample scored lower than our fictional applicant.
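To make the raw-score-to-percentile conversion concrete, here is a minimal sketch in Python (not part of the original text). The standardization sample and the applicant's raw score are invented for illustration; a real norm group would be far larger and matched to the relevant population.

    import numpy as np

    # Hypothetical standardization sample: raw scores of 1,000 earlier
    # test takers on the same 40-item test (invented data).
    rng = np.random.default_rng(seed=1)
    norm_scores = rng.binomial(n=40, p=0.6, size=1000)

    def percentile_rank(raw_score, norms):
        # Percentage of the standardization sample scoring below raw_score.
        return 100.0 * np.mean(np.asarray(norms) < raw_score)

    applicant_raw = 30  # 30 of 40 items correct
    print(f"A raw score of {applicant_raw} falls at roughly the "
          f"{percentile_rank(applicant_raw, norm_scores):.0f}th percentile.")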
Test Reliability
As introduced earlier, test reliability refers to the dependability and consistency of a test's measurements. If a person takes a test several times and scores similarly on it each time, that test is said to measure consistently. If a test measures inconsistently, then outside factors must be influencing the results. For example, if an applicant takes a mechanical ability test one week and correctly answers 90 out of 100 items, but then takes another form of the test the following week and gets only 50 out of 100 items correct, the test evaluator must ask whether the tests are actually doing what they're supposed to be doing—measuring mechanical ability—or if something else is influencing the scores.

Examples of common but often unreliable interview questions include "tell me about yourself" and "discuss your strengths and weaknesses." Such questions are unreliable because the answer may vary widely depending on the applicant's mood or recollection of recent events. Moreover, interpretation of the answers is subjective and depends on whether the interviewer likes or dislikes what the applicant says. There is also a very limited scope for comparing answers across applicants to determine which answers are higher or lower quality. On the other hand, more targeted and job-related questions—such as "tell me about a situation where you had to . . ." or "what would you do if you were faced with this situation"—are more likely to yield
consistent and comparable responses. Thus, before trusting scores from any test that measures inconsistently, I/O psychologists must discover what is influencing the scores.

The test taker's emotional and physical state can influence his or her score. A person's mood, state of anxiety, and level of fatigue may change from one test-taking time to another, and these factors can have a profound effect on test performance. Illness can also impact performance. If a person is healthy the first time she takes a test, but then has a cold when she takes the test again, her score will likely be lower the second time around.

Changing environmental factors can also make a test measure inconsistently. Differences from one testing environment to another, such as room lighting, noise level, temperature, humidity, and equipment, will affect a person's performance, as will the relative completeness or incompleteness of instructions and manner in which they are given.

Differences between versions of the same test also influence reliability. Many tests have more than one version or form (the written portion of a driver's license test is an example). Although the test versions aim to measure the same knowledge, the test items or questions vary. If the questions in one version are more difficult than those in another, the test taker may perform better on one version of the test. Finally, some inconsistency in test scores stems from real
changes in the test taker's KSAOs. These changes often appear if a significant period of time passes between tests. High school students taking the SAT, for example, may show vast increases in scores from their junior year to their senior year because they have improved their cognitive ability, subject knowledge, and/or test-taking skills.

Measures of Reliability
Typically, reliability is measured by gathering scores from two sets of tests and then determining their association. The reliability coefficient states the correlation—or relationship—between the two score sets, and ranges from 0 to +1.00. Although we won't go into mathematical details of calculating correlations here, it is important to understand that the closer the score sets approach a perfect +1.00 correlation, the more reliable the test. For employment tests, reliability coefficients above +.90 are considered excellent, and those above +.70 are considered adequate. Tests with reliability estimates lower than +.70 may possess sufficient errors to make them unusable in employment situations. I/O psychologists measure test reliability with the internal consistency, test–retest, interrater, alternate-form (or parallel-form), and split-halves methods described below.
Internal Consistency Reliability
Internal consistency reliability assesses the extent to which different test items or questions are correlated and thus consistently measure the same trait or characteristic. For example, if a 20-question test is used to measure extroversion, then scores on these 20 items should be highly correlated. If they are not, then some of the items may be measuring a different concept. The most common measure of internal consistency reliability is Cronbach's alpha, which is an overall statistical measure of all the intercorrelations across items in a particular test. It can also pinpoint which items are poorly performing and which should be removed to improve the test's internal consistency.
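The chapter does not give the formula, but Cronbach's alpha is conventionally computed as alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. Here is a minimal sketch, with a hypothetical 4-item extroversion scale and invented responses:

    import numpy as np

    def cronbach_alpha(item_scores):
        # item_scores: 2-D array; rows are respondents, columns are items.
        items = np.asarray(item_scores, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)      # variance of each item
        total_variance = items.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Five respondents rating four extroversion items on a 1-5 scale (invented).
    responses = [[4, 5, 4, 5],
                 [2, 2, 3, 2],
                 [3, 3, 3, 4],
                 [5, 4, 5, 5],
                 [1, 2, 1, 2]]
    print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")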
Test–Retest Reliability
Test–retest reliability involves administering the same test to the same group of people at two different times and then correlating the two sets of scores. To the extent that the scores are consistent over time, the reliability coefficient shows the test's stability (see Figure 3.1). This method has a few limitations. First, it can be uneconomical, because it requires considerable time for employees to complete the tests on two or more occasions. Second, it can be difficult to determine the optimal length of time that should pass between one test-taking session and the next. If the interval is short, say, a few hours, test takers may remember all the questions and simply answer everything the same way on the retest, which could artificially inflate the reliability coefficient. Conversely, waiting too long between tests, say, 6 months to 1 year, could cause retesting scores to be affected by changes that result from outside learning. This can artificially reduce the test's reliability. The best time interval is therefore relatively short, such as a few weeks or up to a few months.

[Figure 3.1: Test–retest reliability. One way to determine a test's reliability is to conduct a test–retest. The more consistent test scores are over time, the more reliable the test is thought to be. The figure plots mechanical comprehension at the first test administration against the second; one panel illustrates high reliability, the other low reliability. From Levy, P.E. (2016). Industrial/organizational psychology: Understanding the workplace (5th ed.), p. 26, Fig. 2.3. Copyright 2017 by Worth Publishers. All rights reserved. Reprinted by permission of Worth Publishers.]
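Computationally, the test–retest reliability coefficient is just the correlation between the two administrations. A minimal sketch with invented scores for eight applicants tested a few weeks apart:

    import numpy as np

    # Hypothetical mechanical ability scores, two administrations (invented).
    first_administration = np.array([90, 75, 82, 60, 95, 70, 88, 55])
    second_administration = np.array([88, 78, 80, 62, 93, 74, 85, 58])

    # The Pearson correlation between the two score sets is the
    # test-retest reliability coefficient.
    r = np.corrcoef(first_administration, second_administration)[0, 1]
    print(f"Test-retest reliability = {r:+.2f}")  # above +.90 is excellent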
Interrater Reliability
As the name implies, and as introduced in earlier examples, interrater reliability involves allowing more than one evaluator (rater) to assess each candidate's performance, and then correlating the scores of the raters across candidates. To the extent that the scores are consistent across raters, the test is considered more reliable. This method is particularly relevant for subjective tests that are prone to personal interpretation. Even with well-designed rubrics and valid questions, these methods introduce some variability in assessment across candidates and raters. Interrater reliability ensures that such variability is under control and insufficient to bias the results of the process or alter hiring decisions. Various advanced statistical methods are available to take into account both the consistency of the absolute scores of the candidates across raters and their relative scores and rankings compared to each other. Both types of consistency are important. For example, absolute scores can affect selection decisions in situations where there is a cutoff score, such as in the case of college admissions and certifications. Relative scores and rankings can affect promotion decisions or come into play when only a predetermined number of candidates (e.g., top five) can be selected.
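As a sketch of the two kinds of consistency just described (ratings invented for illustration), a Pearson correlation captures agreement on absolute scores, while a Spearman rank correlation captures agreement on relative rankings; the advanced methods the text alludes to, such as intraclass correlations, build on the same idea.

    from scipy.stats import pearsonr, spearmanr

    # Two raters scoring the same six candidates on a 1-10 rubric (invented).
    rater_a = [7, 4, 9, 5, 8, 6]
    rater_b = [6, 4, 9, 4, 9, 5]

    r_scores, _ = pearsonr(rater_a, rater_b)  # consistency of absolute scores
    r_ranks, _ = spearmanr(rater_a, rater_b)  # consistency of relative rankings
    print(f"Agreement on scores:   {r_scores:+.2f}")
    print(f"Agreement on rankings: {r_ranks:+.2f}")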
Alternate-Form Reliability
Alternate- (or parallel-) form reliability has to do with how consistent test scores are likely to be if a person takes two similar, but not identical, forms of a test. As with test–retest reliability, this measure requires two sets of scores from a group of people who have taken two variations of a test. If the score sets yield a high parallel-form reliability coefficient, then the tests are not materially different. If the reverse happens, the tests are probably not equivalent and therefore cannot be used interchangeably.
The major limitations of parallel-form reliability are that it is time consuming and costly. Furthermore, it requires the test developer to design two versions of a test that both cover the same subject matter and are equivalent in difficulty and reading level.
Split-Halves Reliability
Split-halves reliability tests are more cost effective than either test–retest or parallel-form reliability because they can be assessed with scores from one test administered just one time. After the test has been given, it is split in half, and scores from the two halves of the test are correlated. A high reliability coefficient indicates that each section of the test is consistently measuring similar content, whereas the reverse is true with a low reliability coefficient. The tricky part of split-halves reliability is determining how best to split the test. For example, if test items increase in difficulty as the test progresses, or if the first half of the test contains fewer difficult questions than the second, it won't work to simply split the test down the middle and compare the scores from each half. To solve this dilemma, tests are often split by odd- and even-numbered questions.
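Here is a minimal sketch of the odd/even split just described, using invented item scores (1 = correct, 0 = incorrect) for six test takers on a 10-item test:

    import numpy as np

    # Each row holds one test taker's item scores on a 10-item test (invented).
    items = np.array([[1, 1, 1, 0, 1, 1, 0, 1, 1, 0],
                      [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
                      [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
                      [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
                      [1, 0, 1, 1, 1, 0, 1, 1, 1, 1],
                      [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]])

    odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7, 9
    even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8, 10
    r = np.corrcoef(odd_half, even_half)[0, 1]
    print(f"Split-halves reliability = {r:+.2f}")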
Find Out for Yourself: Test Reliability
This exercise is designed to help you grasp the challenges involved in designing a reliable test. To begin, choose a trait or skill in which you are interested. For example, consider an academic subject, mastery of an online game, or a personality trait that you admire. Then, write 10 statements you believe to be good measures of the selected trait or skill. Ask three friends or family members to rate themselves on each statement on a scale of 1–5 (1 = strongly disagree, 5 = strongly agree). Finally, without looking at their scores, rate each individual on the same 10 statements based on your own perceptions of that individual.
What Did You Learn?
1. Assess interrater reliability: Add up each individual's scores
based on his or her own assessment. Rank order the scores. Add up each individual's scores based on your assessment of that individual. Rank order the scores. Did the rankings change?
2. Assess split-halves reliability: Add up each individual's scores on the first five questions. Add up each individual's scores on the second set of five questions. Are the scores of each individual on the two test halves similar? Rank order the three individuals based on their first five questions. Rank order them again based on the second set of five questions. Did the rankings change?
3. Ask the same three individuals to rate the same statements again a week later. Assess test–retest reliability: Add up each individual's scores on the first time he or she took the test. Add up each individual's scores on the second time he or she took the test. Are the scores similar? Rank order the three individuals based on their first test. Rank order them again based on their second test. Did the rankings change?
As you can probably appreciate from this exercise, anyone can "whip up" a test, but the test may be highly subjective and unreliable if the scores are not consistent across raters, questions, and times of test administration. You probably now have some idea as to how to improve your test. Similarly, scientific test design requires numerous iterations of writing and rewriting items and statistically examining results with multiple samples to ensure reliability before
the test can be used.

Test Validity
Validity is the most important aspect of an employment test. Although a test is reliable if it is able to make consistent measurements, a test is valid if it truly measures the characteristics it is supposed to measure. An employment test may yield reliable scores, but if it doesn't measure the skills that are needed to perform a job successfully, it is not very useful. I/O psychologists use three methods to establish test validity: criterion-related validity, content validity, and construct validity.

Criterion-Related Validity
The purpose of criterion-related validity is to establish a predictive, empirical (number-based) link between test scores and actual job performance (see Figure 3.2). To do this, I/O psychologists compare applicants' employment test scores with their subsequent job performance. This correlation is called the validity coefficient, and it ranges from −1.00 to +1.00. Tests that yield validity coefficients ranging from +.35 to +.45 are considered useful for making employment decisions, whereas those with validity coefficients of less than +.10 probably have little relationship to job performance.
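As a sketch of the computation (all numbers invented), the validity coefficient is the correlation between preemployment test scores and a later criterion such as supervisor performance ratings:

    import numpy as np

    # Hypothetical preemployment test scores for ten new hires (invented)...
    test_scores = np.array([72, 85, 60, 90, 55, 78, 82, 65, 70, 88])
    # ...and their performance ratings six months later, on a 1-5 scale.
    performance = np.array([3.1, 4.2, 2.8, 4.5, 2.5, 3.6, 3.9, 3.0, 3.2, 4.4])

    validity = np.corrcoef(test_scores, performance)[0, 1]
    print(f"Validity coefficient = {validity:+.2f}")
    if validity >= 0.35:
        print("In the useful range for employment decisions (per the text).")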
I/O psychologists use two different methods to establish criterion-related validity: predictive validity and concurrent validity. Predictive validity involves administering a new test to all job applicants but not using the test scores to try to predict the applicants' job readiness. Instead, the scores are filed away to be analyzed later. After a time, managers will have accumulated performance ratings or other information that indicates how new hires are performing on the job. At that point, the new hires' preemployment test scores will be correlated with their performance, and the organization can look at how successfully the test predicted performance. If the test proves to be a valid predictor of performance, it can be used in future hiring decisions. Although this approach is considered the gold standard of test validation, many organizations are unwilling to use the predictive validity method because filing away employment test scores lets some unqualified applicants slip through the preemployment screening process. However, scientifically developed tests regularly use this method to validate them prior to their use.

The concurrent validation approach is more popular because, instead of job applicants, on-the-job employees are used to establish the test's validity. With this method, both the current employees' test scores and their job performance ratings are
gathered at the same time, and test validity is established by correlating these measures. Organizations appreciate the cost-effectiveness offered by concurrent validation. Tests that are found to be of high concurrent validity based on the results from current employees are then used to assess job applicants.

Other forms of concurrent validation include convergent and divergent validity. Convergent validity refers to the correlation between test scores and scores on other related tests. For example, SAT and ACT tests are expected to correlate, so the validation of new SAT questions may involve examining their correlation with ACT questions. Divergent validity refers to the correlation between test scores and scores on other tests or factors that should not be related. For example, test scores should not be related to gender, race, religion, or other protected classes. To ensure that a test is not discriminatory, the correlation between its scores and each of these factors can be examined. Divergent validity is supported when these correlations are low or statistically insignificant. Taken together, convergent and divergent validity examine the extent to which a test relates to what it should be related to and does not relate to what it should not be related to, respectively.
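A minimal sketch of a divergent validity check like the one just described, with invented data. One common choice for correlating scores with a binary group indicator is the point-biserial correlation (a detail this chapter does not specify); divergent validity is supported when the coefficient is near zero and nonsignificant.

    from scipy.stats import pointbiserialr

    # Hypothetical test scores and a binary group indicator (invented data).
    scores = [72, 85, 60, 90, 55, 78, 82, 65, 70, 88, 74, 81]
    group = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]

    r, p_value = pointbiserialr(group, scores)
    print(f"Correlation with group membership: {r:+.2f} (p = {p_value:.2f})")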
[Figure 3.2: Criterion-related validity. In order to establish a connection between test scores and actual job performance, I/O psychologists use criterion-related validity. Tests with high correlations between scores and job performance are considered to be high-validity tests and can be useful for making employment decisions. Tests with low correlations between scores and job performance are considered low-validity tests and would not be ideal for assessing job performance. The figure plots cognitive ability score against job performance; one panel illustrates high validity, the other low validity. From Levy, P.E. (2016). Industrial/organizational psychology: Understanding the workplace (5th ed.), p. 29, Fig. 2.4. Copyright 2017 by Worth Publishers. All rights reserved. Reprinted by permission of Worth Publishers.]
Despite the time- and cost-saving advantages, concurrent validation does have a number of drawbacks. First, the group of employees who validate a test could be very different from the
group of applicants who actually make use of the test. The former would likely skew the validation because lower performing employees (and their low scores) would have already been
removed from their positions and thus not be part of a test-validating process. This can cause the validity of the test to appear higher than it really is. On the other hand, employees may also skew the validation by not trying as hard as job applicants. Employees who already have jobs might not be as motivated to do their best as applicants eager for a job would be. This can cause the test's validity to appear lower than it really is.

Content-Related Validity
Content-related validity is the rational link between the test content and the critical job-related behaviors. In other words, test items should be directly related to the important requirements and qualifications for the job. The rationale behind content-related validation is that if a test samples actual job behaviors, then individuals who perform well on it will also perform well on the job. Remember that the goal of testing is to predict performance.

As you can probably guess, content-related validation studies rely heavily on information gathered from a job analysis. If test questions are directly related to the specific skills needed to perform a job, the test will have high content-related validity. For example, a test for administrative professionals might ask questions related to effective filing methods, schedule management, and typing techniques. Because these skills are important for the administrative professional position, this test would have high content-related validity. Content validity cannot be evaluated numerically or statistically as readily as criterion validity. It is often qualitatively evaluated by subject matter experts. However, qualitative evaluations can be quantified using well-designed rubrics and assessed for reliability and consistency.
Construct Validity
Construct validity is the extent to which a test accurately assesses the abstract personal attributes, or constructs, that it intends to measure. Although there are valid measures of many personality traits, numerous invalid measures of the same traits can be found in less scientific sources such as magazines or the Internet. Because constructs are intangible, it can be challenging to design tests that measure them. How do we know if tests of personality, reasoning, or motivation actually measure these intangible, unobservable characteristics? One way to establish construct validity is to correlate a new test with an established test that is known to measure the construct in question. If the new test correlates highly with the established test, the new test is likely measuring the construct it is intended to measure.

Validity Generalization
Initially, I/O psychologists thought that validity evidence was situation specific; that is, a test that was validated and used for applicants for one job could not be used for applicants for a different job unless an additional job-specific validation study was performed. Further, I/O psychologists believed that tests that had been validated for a position at one company could not be used for the same position at a different company—again, unless it was validated for the second company, which is a tedious and costly process.
However, research over the past few decades has shown that validity specificity is unfounded (Lubinski & Dawis, 1992). I/O psychologists now believe that validity evidence transfers across situations, a notion that is referred to as validity generalization. Researchers discovered that the studies that supported validity specificity were flawed (Schmidt & Hunter, 1981).

Validity generalization has been a huge breakthrough for organizations. Establishing validity evidence for every employment test in every situation was, for most organizations, both cost and time prohibitive. Because of its cost- and time-saving benefits, the advent of validity generalization has meant that organizations are more willing to integrate quality tests into their employment practices.

Face Validity
Face validity is not a form of validity in a technical sense; rather, it is the subjective impression
of how job relevant an applicant perceives a test to be. For example, a bank teller would find nothing strange about taking an employment test that dealt with numerical ability or money-counting skills, because these skills are obviously related to job performance. On the other hand, the applicant may not see the relevance of a personality test that asks questions about personal relationships. This test would thus have low face validity for this job.

Organizations need to pay close attention to their applicants' perceptions of a test's face validity, because low face validity can cause an applicant to feel negatively about the organization (Chan, Schmitt, DeShon, Clause, & Delbridge, 1997; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993). If organizations have the opportunity to pick between two tests that are otherwise equally valid, they should use the test with the greater level of face validity.

Find Out for Yourself: Your Personality Type
Visit the 16 Personalities website and take the personality-type assessment provided. Read your results, then visit the Encyclopedia Britannica entry on personality assessment and read about the reliability and validity of assessment methods.
16 Personalities: https://www.16personalities.com/free-personality-test
Reliability and Validity of Assessment Methods: https://www.britannica.com/science/personality-assessment/Reliability-and-validity-of-assessment-methods
What Did You Learn?
1. What have you learned about yourself through this assessment?
2. Is this assessment accurate? Which types of validity and reliability apply to it?
3. Based on this assessment, what are some examples of jobs that would fit your type?
Qualitative Methods
Most of the methods discussed in this chapter are quantitative in nature. However, employers often find it necessary to take into consideration other factors that are not necessarily quantifiable but can be very important in employee selection. Qualitative selection methods may include observation, unstructured interview questions, consultation with references, or solicitation of opinions from past or future managers and coworkers. Qualitative methods can yield a much richer and broader array of information about an applicant that may be impossible to capture using quantitative methods. Unfortunately, however, qualitative methods are much harder to
assess in terms of validity and reliability, and thus they may lead to erroneous decisions. Their subjectivity can also lead to discriminatory decisions. Qualitative methods can be particularly problematic in ranking applicants. Without a predetermined set of evaluation criteria and a rating scale, interrater reliability can be very low. That is why psychologists attempt to quantify even what would be considered qualitative criteria—such as person–organization fit and person–job fit—by creating survey measures of these factors. With the help of I/O psychologists, employers can also create quantitative scoring themes for qualitative data to increase the integrity and legality of these methods.

3.4 Test Formats
Thousands of employment tests are on the market today. Naturally, no two tests are exactly alike; they may differ in their construction or administration. Tests vary in their quality depending on the rigor of their validation processes. They also vary in cost depending on their extensiveness and popularity. However, it is important to note that quality and cost do not always go hand in hand. Some of the most valid and reliable tests are available in the scientific literature free of charge, but they are not very popular among practitioners, who are often unfamiliar with the scholarly literature. On the other hand, some of the popular and expensive tests marketed by well-known consulting companies have questionable validity and reliability. In some cases these tests are simply accepted at face value and are never statistically analyzed.
In other cases the test developers make sure their tests are valid and reliable but are reluctant to publicly share their analyses for proprietary reasons. In all cases prudent employers should demand evidence of validity and reliability in order to ensure that they are (a) investing their time and resources in the right selection tools that will yield the most qualified workforce and (b) using legally defensible and nondiscriminatory methods. Commonly used test-design formats include assessment centers, computer-adaptive tests, speed and power tests, situational judgment tests, and work-sample tests.

Assessment Centers
An assessment center is one of the most comprehensive tests available and is often used to select management and sales personnel because of its ability to assess interpersonal, communication, and managerial skills. A typical assessment center includes personality inventories, cognitive assessments, and interviews, as well as simulated activities that mimic the types of activities performed on the job. Common types of simulated activities include in-basket tasks, leaderless group discussions, and role-play exercises.
Although assessment centers can predict the level of success both in training and on the job, they have a number of limitations. First, assessment centers are expensive to design and administer, because administrators must be specifically trained to evaluate discussions and perform role plays. Assessment centers for senior management positions can cost more than $10,000, a price tag that is prohibitive for many organizations. Second, because scoring an assessment center relies on the judgment of its assessors, it can be difficult to standardize scores across time and location. This issue can be mitigated by training assessors to evaluate behaviors against an established set of scoring criteria.

Computer-Adaptive Tests
Typical tests include items that sample all levels of a candidate's ability. In other words, they contain some questions that are easy and will be answered correctly by almost all test takers, some that are difficult and will be answered correctly by only a few, and some that are in-between. A computer-adaptive test, however, tailors the test to each test taker's individual ability.

In this type of test, the candidate begins by answering a question that has an average level of difficulty. If he or she answers correctly, the next question will be more difficult; if he or she answers incorrectly, the next question will be easier. This process continues until the candidate's proficiency level is determined.
The fact that candidates do not waste time answering questions of inappropriate difficulty is a clear advantage of the computer-adaptive test. Additionally, because each test is tailored to the individual, test security (i.e., cheating) is less of a concern.
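The adaptive logic is essentially a feedback loop. The sketch below is a deliberately simplified illustration, not any vendor's actual algorithm (operational computer-adaptive tests typically rely on item response theory): difficulty steps up after a correct answer and down after an incorrect one.

    import random

    # Hypothetical item bank: five difficulty levels, three items per level.
    item_bank = {level: [f"q{level}-{i}" for i in range(1, 4)]
                 for level in range(1, 6)}

    def administer_adaptive_test(answer_correctly, n_items=10):
        # answer_correctly(item, level) -> bool. Returns estimated level.
        level = 3  # start at average difficulty
        for _ in range(n_items):
            item = random.choice(item_bank[level])
            if answer_correctly(item, level):
                level = min(level + 1, 5)  # harder item next
            else:
                level = max(level - 1, 1)  # easier item next
        return level

    # Simulated candidate whose true proficiency is around level 4.
    def simulated_candidate(item, level):
        return random.random() < (0.9 if level <= 4 else 0.2)

    print("Estimated proficiency level:",
          administer_adaptive_test(simulated_candidate))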
Speed and Power Tests
Tests can be designed to assess either an individual's depth of knowledge or rate of response. The first type of test is called a power test. Power tests are designed to be difficult, and very few individuals are able to answer all of the items correctly. Test takers receive either a generous time limit or no time limit at all. The overall purpose of the power test is to evaluate depth of knowledge in a particular domain. Therefore, response accuracy is the focus, instead of response speed.

Speed tests contain a homogenous content set, and test takers receive a limited amount of time to complete the test. These tests are well suited to jobs in which tasks must be performed both quickly and accurately, such as bookkeeping or word processing. For these jobs, a data-entry test would be an appropriate speed test for measuring an applicant's potential for success.

Situational Judgment Tests
A situational judgment test is a type of job simulation that is composed of a number of job-related situations designed to assess the applicant's judgment. Each situation includes multiple options for how to respond. The applicant must select the options that will produce the most and least effective outcomes. Statistically, situational judgment tests have validities comparable to structured interviews, biographical data, and assessment centers (Schmidt & Hunter, 1998).

Situational judgment tests are frequently administered to candidates for management positions, although research shows them to be predictive of performance for a wide variety of jobs. Studies have found validity evidence for situational judgment tests' ability to predict supervisory performance (Motowidlo, Hanson, & Craft, 1997) and to predict job performance for sales professionals (Phillips, 1992), insurance agents (Dalessio, 1994), and retail store employees (Weekley & Jones, 1997).

[Photo: Situational judgment tests present applicants with job-related scenarios in order to assess their decision-making skills. Goodshot/Thinkstock]

Work-Sample Tests
Work-sample tests evaluate an applicant's level of performance when demonstrating a small sample set of a job's specific tasks. The two general areas for work-sample tests are motor skills and verbal abilities. A test that requires a machinist applicant to
properly operate a drill press is an example of a motor skills work-sample test; a test that asks a training applicant to present a portion of the organization's training program is a verbal ability work-sample test.

One advantage of work-sample tests is that they generally show a high degree of job relatedness, so applicants perceive that they have a high degree of face validity. Additionally, these tests provide applicants with a realistic job preview. The disadvantage is that they can be expensive to develop and administer.

Consider This: The Costs of Testing
1. For most of the tests discussed in this section, expense is a major drawback. Why do you think an organization would go to the trouble of developing and using employment tests?
2. How might the expense of a test be justified, offset, or overcome?

3.5 Testing for Individual Differences
People differ in psychological and physical characteristics, and identifying and categorizing people in respect to these differences is important for successfully predicting both job performance and job satisfaction. The most commonly tested
categories of individual differences are cognitive ability, physical ability, personality, integrity, and vocational interests. Each of these categories has an important theoretical foundation as well as its own set of advantages and disadvantages.

Cognitive Abilities
The past hundred years of study have produced two distinct concepts of cognitive ability. Beginning with Spearman's seminal research in 1904 on general intelligence, one concept is based on the belief that cognitive ability is a single, unitary construct (called the g, or general, factor) along with multiple subfactors (called s factors). According to this two-factor, or hierarchical, theory of cognitive ability, the g factor is important to all cognitive performance, whereas s factors influence specific intelligence domains. For example, your performance on a math test will be influenced by both your general, overall intelligence (the g factor) and your knowledge of the specific math topic being tested (an s factor). From test to test, your scores
will be strongly correlated, because all performance is influenced by the g factor; however, the influence of s factors will keep the correlation from being perfect. So, although a high g factor of overall intelligence might mean that you would score higher than most on both a math test and a verbal reasoning test, your math test scores could be lower than your verbal reasoning scores simply because you never took a class that covered the specific topics on the math test (an s factor).

Led by Thurstone's research, begun in 1938, scientists challenged Spearman's theories by proposing that cognitive ability was a combination of multiple distinct factors, with no overarching factor. Using a statistical technique called factor analysis, Thurstone and his colleagues identified seven primary mental abilities: spatial visualization, number facility, verbal comprehension, word fluency, associative memory, perceptual speed, and reasoning (Thurstone, 1947). This theory suggests that employment tests should evaluate the primary mental abilities that are most closely linked to a specific job. For example, a test for engineering applicants would focus on spatial and numerical abilities.

Although there is no consensus, research has supported Spearman's hierarchical model of cognitive ability (Carroll, 1993; Schmid & Leiman, 1957). Consequently, employers tend to use tests to measure both general intelligence and specific mental domains. Those that focus on the g factor are called general cognitive ability tests. They measure one or more broad
mental abilities, such as verbal, mathematical, or reasoning skills. General cognitive ability tests can be used to evaluate candidates for almost any job, especially those in which cognitive abilities such as reading, computing, analyzing, or communicating are involved. Specific cognitive ability tests measure the s factors and focus on discrete mental abilities such as reaction time, written comprehension, and mathematical reasoning. These tests must be closely linked to the job's specific functions.

Cognitive ability tests are among the most widely used tests because they are highly effective at predicting job and training success across many occupations. In their landmark study, Schmidt and Hunter (1998) examined validity evidence for 19 different selection processes from thousands of studies over an 85-year period. After compiling a meta-analysis (which is a combination of the results of several studies that address a set of related research hypotheses), Schmidt and Hunter found that cognitive ability test scores correlated with job performance at .51 and training success at .53, which were the highest validity coefficients among all the types of tests they examined. Other researchers found similar validities using data from European countries (Bertua, Anderson, & Salgado, 2005; Salgado, Anderson, Moscoso, Bertua, & Fruyt, 2003). Interestingly, additional research has found that a job's complexity positively affects the validity of cognitive ability tests. In other words, the more complex the job, the better the test is at predicting future job performance. For jobs with low complexity,
on the other hand, high cognitive ability test scores are less important for predicting successful job performance (Hunter, 1980; Schmidt & Hunter, 2004).

The most popular general cognitive ability test is the Wonderlic Cognitive Ability Test. Developed by Eldon F. Wonderlic in the 1930s, this test contains 50 items, has a 12-minute time limit, and is used by both private businesses and government agencies. Test norms have been set by more than 450,000 working adults and are available for over 140 different jobs. Test content covers numerical reasoning, verbal comprehension, and spatial ability. The test begins with very easy questions and progresses to very difficult ones; due to this range and the short time limit, few people are able to answer all 50 questions correctly. The test is offered in a variety of versions, including computer- and paper-based formats, and is now available in nine languages, including French, Spanish, British English, and American English. In all, more than 130 million job applicants have taken the Wonderlic.
The Wechsler Adult Intelligence Scale-Revised (WAIS-R), developed by David Wechsler in 1955 and currently in its fourth edition, is another commonly used general cognitive ability test. It differs from the Wonderlic in both length and scope and is composed of 11 different tests (6 verbal and 5 performance), requiring 75 minutes to complete. The 6 verbal tests are comprehension, information, digit span, vocabulary, arithmetic, and similarities. The performance tests are picture completion, picture arrangement, object assembly, digital symbol, and block design. Naturally, this complex psychological assessment requires well-trained administrators to ensure accurate scoring and score interpretation. The WAIS-R is typically used when selecting for senior management or other positions that require complex cognitive thinking.

Outside the world of work, a person's cognitive ability also predicts his or her academic success. Using meta-analytic research, psychologists examined the relationship between the Miller Analogies cognitive ability test (a test commonly used to select both graduate students and professional-level employees) and student and professional success. Interestingly, results showed that there was no significant difference between the cognitive abilities required for academic and business success (Kuncel, Hezlett, & Ones, 2004).

Although they are valid performance predictors for many jobs, cognitive ability tests can produce different selection rates for individuals in select classes. Whites typically score higher
than African Americans, and there is much concern that these differences are due to bias within the test. If a test is biased to favor one ethnic group or population over others, then any employment selection or promotion program that utilizes it will be inherently flawed. Discovering bias within a test can be extremely difficult, but I/O psychologists have been able to reduce the impact of potentially biased cognitive ability tests by adding noncognitive tests, such as personality tests, to selection processes (Olson-Buchanan et al., 1998).

Physical Abilities
Many jobs require a significant amount of physical effort. Firefighters and police officers, for example, may need physical strength, and mechanics may need finger dexterity. Other examples of hazardous or physically demanding work environments include factories, power plants, and hospitals. Organizations must be careful to use information from a job analysis to
understand the specific physical requirements of a job before purchasing or developing any work-sample tests to use in the selection process. Fleishman (1967) identifies nine physical ability characteristics present in many jobs (see Table 3.1). Measures of each physical ability are not strongly correlated, however, which suggests that there is no overall measure of general physical ability.

Table 3.1: Fleishman's physical ability dimensions
Static strength: Maximum muscle force exerted by muscle groups (e.g., legs, arms, hands) over a continuous period of time
Explosive strength: Explosive bursts of muscle energy over a very short duration to move the individual or an object
Dynamic strength: Repeated use of a single muscle over an extended period of time
Trunk strength: Ability of back and core muscles to support the body over repeated lifting movements
Extent flexibility: Depth of flexibility of arms, legs, and body
Dynamic flexibility: Speed of flexibility of arms, legs, and body
Gross body coordination: Ability to coordinate arms, legs, and body to perform activities requiring whole-body movements
Gross body equilibrium: Ability to coordinate arms, legs, and body to maintain balance and remain upright in unstable positions
Stamina: Ability to exert oneself physically over a long duration
Research consistently demonstrates that physical ability tests predict job performance for physically demanding jobs (Hogan, 1991). Identifying individuals who cannot perform the essential physical functions of a job—especially in hazardous positions such as police officer, firefighter, and military personnel—can minimize the risk of physical harm to the job candidate, other employees, and civilians. Another positive feature of physical ability tests is that they are not strongly correlated with cognitive ability tests, which as mentioned earlier tend to be biased. Thus, using physical ability tests in conjunction with cognitive ability tests can help decrease potential bias and make job performance predictions more accurate (Carroll, 1993).

I/O psychologists must be careful to design and measure physical ability tests so they do not discriminate against minority groups. Unfortunately, although a job may legitimately need candidates who possess specific physical skills, the standards and measures of those skills are often arbitrarily or inaccurately made. For example, height and weight are often used as a proxy for physical strength. Even though these measurements are quick and easy to make, they are not always the most accurate, and they have resulted in the underselection
of women for physically demanding jobs (Meadows v. Ford Motor Co., 1975). Companies that fail to implement accurate, careful measurements of an applicant's ability to perform a job's necessary, specific physical requirements run the risk of discriminating against protected classes—something that will likely land them in court, where judges and juries consistently rule in favor of the plaintiff and award large settlements.

One example of a valid, commercially available physical ability test is the Crawford Small Parts Dexterity Test. This test is used to assess the fine motor skills and manual dexterity required for industrial jobs. It examines applicants' ability to place small objects into small holes on a board. For their first task, test takers use a pair of tweezers to place 36 pins into holes and position a collar around each pin. In their second task, test takers use a screwdriver to insert 36 screws into threaded holes. The test is scored in two ways: by measuring the amount of time taken to complete both parts of the test and by measuring the number of items completed during a set time limit (3 minutes for part 1 and 5 minutes for part 2).
Personality
I/O psychologists have studied how personality affects the prediction of job performance since the early 20th century. After examining 113 studies published from 1913 to 1953, Ghiselli and Barthol (1953) found positive but small correlations between personality and the prediction of job performance. The researchers were surprised that the correlation was not stronger and suggested the companies had used personality tests with weak validity evidence. Developing and using tests with stronger validity evidence, they concluded, would facilitate better predictions of applicants' future job performance.

Guion and Gottier (1965) disagreed with this notion, suggesting instead that personality measures were unrelated to job performance—even though the two noted that their data came from studies that used poorly designed or theoretically unfounded personality measures. In fact, most researchers at the time developed their own concepts of personality and created tests to match them, which naturally led to considerable inconsistency in the ways they measured personality constructs. Thus, although organizations continued to use personality tests to select candidates for management and sales positions, academic research in this area waned for the next 20 years.

In the early 1990s Barrick and Mount's (1991) landmark meta-analysis established the five-factor model of personality, now commonly referred to as the Big Five personality factors—extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience—which are described in detail in Table 3.2.
Big Five personality factors—extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience—which are described in detail in Table 3.2. This model is broader than earlier theories and therefore lends itself more readily to a useful classification for the interpretation of personality constructs (Digman, 1990).

Table 3.2: Big Five personality factors

Factor                    Also referred to as          Description
Neuroticism               Adjustment                   Insecure, untrusting, worried, guilty
Extraversion              Sociability                  Sociable, gregarious, fun, people person
Openness to experience    Inquisitiveness              Risk taking, creative, independent
Agreeableness             Interpersonal sensitivity    Empathetic, approachable, courteous
Conscientiousness         Mindfulness                  Dependable, organized, hardworking
The most important advantage of the five-factor model is the functional structure it provides for predicting the relationships between personality and job performance. Barrick and Mount (1991) reviewed 117 criterion-related validation studies published between 1952 and 1988 that measured at least one of the five personality factors. Results showed that conscientiousness (a measure of dependability, planfulness, and persistence) was predictive of job performance for all types of jobs. Further, extraversion (a measure of energy, enthusiasm, and gregariousness) predicted performance in sales and management jobs. The other three factors were found to be valid but were weaker predictors of some dimensions of performance in some occupations. That same year, Tett, Jackson, and Rothstein (1991) found strong positive predictive evidence not only for conscientiousness and extraversion but also for agreeableness and openness to experience. On the other hand, they found neuroticism to be negatively related to job performance. These researchers also discovered that validity was higher among studies that referenced job analysis information to create tests that linked specific personality traits with job requirements. In summary, then, measures of the Big Five personality factors can significantly predict job performance, but to do so, they must be carefully aligned with critical job functions.

In addition to strong criterion-related validity, measures of the Big Five factors also
generally show very little bias. Across a number of personality factors, score differences across racial groups and/or genders are minor and would be unlikely to adversely impact employment decisions; two areas that fall outside this generalization are agreeableness, in which women score higher than men, and dominance (an element of extraversion), in which men score higher (Feingold, 1994; Foldes, Duehr, & Ones, 2008). Because they produce almost no adverse impact, personality tests can be used in conjunction with cognitive ability tests during selection processes to increase validity while reducing the threat of potential bias (Hough, Oswald, & Ployhart, 2001).

The first test to examine the Big Five personality factors was the NEO Personality Inventory, named for the first three factors of neuroticism, extraversion, and openness to experience. Composed of 240 items, the test can be completed in 30 to 40 minutes and is available in a number of languages, including Spanish, German, and British English. However, research now supports much shorter versions, as short as 10 items (Gosling, Rentfrow, & Swann, 2003), which are ideal for use in work settings, either separately or in combination with other tests and survey measures.

Types of Personality Tests

There are two basic types of personality tests: projective tests and self-report inventories. The former presents test takers with an ambiguous image, such as an inkblot, and asks them to describe what they see. Trained psychologists then interpret the
descriptions. The rationale for this type of test is that test takers will project their unconscious personalities into their descriptions of the image. Two examples of projective tests are the Rorschach (inkblot) test and the Thematic Apperception Test. Projective tests are most often used by clinical psychologists and are rarely used in employee selection processes because they are expensive, are time consuming, and require professional interpretation.

Instead, employers make use of self-report personality inventories, which ask individuals to identify how a situation, behavior, activity, or feeling is related to them. The rationale for this type of test is that test takers know themselves well enough to make an accurate report of their own personality. Some advantages of the self-report inventories over projective tests are cost-effectiveness, standardization of administration practices, and ease in scoring and interpreting results.
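That ease of scoring comes from the fact that self-report inventories are keyed arithmetically. The sketch below shows, in Python, how a very short Big Five measure might be scored, including reverse-keyed items; the items, the 1-to-5 response scale, and the keying are invented for illustration and are not the content of any published inventory.

```python
# Hypothetical scoring sketch for a brief Big Five self-report measure.
# Items and keying are invented; real inventories such as the NEO PI or
# the 10-item measure cited in the text define their own keys.

RESPONSE_MAX = 5  # assume a 1-5 agreement scale

# Map each item to its factor and whether it is reverse-keyed.
KEY = {
    "I am the life of the party":       ("extraversion", False),
    "I tend to be quiet around others": ("extraversion", True),
    "I complete tasks on schedule":     ("conscientiousness", False),
    "I often leave work unfinished":    ("conscientiousness", True),
}

def score(responses: dict) -> dict:
    """Average the (reverse-keyed where needed) responses per factor."""
    totals = {}
    for item, raw in responses.items():
        factor, is_reversed = KEY[item]
        value = (RESPONSE_MAX + 1 - raw) if is_reversed else raw
        totals.setdefault(factor, []).append(value)
    return {f: sum(v) / len(v) for f, v in totals.items()}

answers = {
    "I am the life of the party": 4,
    "I tend to be quiet around others": 2,   # reversed -> counts as 4
    "I complete tasks on schedule": 5,
    "I often leave work unfinished": 1,      # reversed -> counts as 5
}
print(score(answers))  # {'extraversion': 4.0, 'conscientiousness': 5.0}
```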
For example, the website below can help you assess your Big Five personality traits.

Find Out for Yourself: Your Big Five Personality Traits

Visit the following website to find extensive information and research about the Big Five personality traits, take the Big Five personality test, and get instant feedback.

The Big Five Project Personality Test
http://www.outofservice.com/bigfive/

Unfortunately, self-report inventories have drawbacks. A major one is the tendency of test takers to distort or fake their responses. Because test items usually have no right or wrong answers, test takers can easily choose to provide socially acceptable responses instead of true answers in order to make themselves look better to the hiring organization. Indeed, in one controlled study, researchers instructed test takers to try to respond to a personality inventory in a way they felt would create the best impression, which resulted in more positive scores (Hough, Eaton, Dunnette, Kamp, & McCloy, 1990). Furthermore, a significant number of actual applicants who took personality inventories as part of a selection process were found to have distorted their responses to appear more attractive, even without having been told to do so (Stark, Chernyshenko, Chan, Lee, & Drasgow, 2001).

The real question for I/O psychologists is whether response distortion significantly affects the validity of personality inventories. Ones, Viswesvaran, and Reiss (1996) conducted a meta-analysis that examined the effects of social desirability on the relationship between measures
of Big Five factors and both job performance and counterproductive behaviors. They found that test takers’ attempts to provide socially acceptable—not necessarily truthful—answers did affect their scores in the areas of emotional stability and conscientiousness but did not seriously influence test validity. However, faking answers can influence hiring decisions by changing the rank ordering of job candidates (Christiansen, Goffin, Johnston, & Rothstein, 1994). One interesting and paradoxical finding of personality test research is that people who are able to recognize the socially acceptable answers, whether or not those answers accurately represent the truth about that person, tend to perform better on the job than people who are unable to do so (Ones et al., 1996). How can this be? One explanation is that people in the former group are better at reading a situation’s subtle social cues and are therefore more able to extrapolate what they need to do to fulfill coworkers’ and managers’ expectations.

To balance the advantages and disadvantages of projective and self-report tests, a new type of test, called implicit measures, has emerged. Implicit measures are self-report tests in which the questions are intentionally designed to make the purpose of the test less obvious and thus less amenable to faking and social desirability biases. For example, the test taker may be given a few seemingly neutral situations and a list of thoughts, feelings, and actions and be directed to select the ones that most closely represent him or her in each situation. This intentional vagueness allows implicit measures to assess a construct more
accurately and comprehensively (Bing, LeBreton, Davison, Migetz, & James, 2007; LeBel & Paunonen, 2011).

Honesty and Integrity

Organizations need to be able to identify individuals who are likely to engage in dishonest behaviors. Employee misconduct is more serious than distortion of answers on a personality test and can have a significant impact on both coworkers and the organization as a whole. Employee theft, embezzlement, and other forms of dishonesty may cost American businesses billions of dollars annually. According to the National White Collar Crime Center, embezzlement alone is estimated to cost companies as much as $90 billion each year (Bressler, 2009).

In the past, organizations used polygraph tests, but polygraph test results are not always accurate, and applicants may find them to be an invasion of privacy. When the Employee Polygraph Protection Act was passed in 1988, most private employers became unable to use these tests, thus requiring them to find another way to identify this
trait. A more valid way to measure employee dishonesty is with an integrity test. Integrity tests fall into two categories: overt integrity tests and personality-based integrity tests. The first type assesses an individual’s direct attitudes and actions toward theft and employment dishonesty. Test items typically ask individuals to consider their opinions about theft behaviors or to think of their own dishonest behaviors. Sample questions include “Is it OK to take money from someone who is rich?” and “Have you taken illegal drugs in the past year?”

Personality-based integrity tests typically contain disguised-purpose, or covert, questions that measure various personality factors—such as responsibility, virtue, rule following, excitement seeking, anger, hostility, and social conformity—that are related to both productive and counterproductive employee behaviors. Although overt integrity tests can predict theft and other glaring forms of dishonesty, personality-based integrity tests are able to predict behaviors that are more subtly or secretly dishonest, such as absenteeism, insubordination, and substance abuse.

Vocational Interests

Unlike most of the tests we have discussed so far, vocational interest inventories are designed for career counseling and should not be used for employee selection. In these inventories, test takers respond to a series of statements pertaining to various interests and preferences.
In theory, people who share the preferences and interests of successful workers in a given occupation should experience a high level of job satisfaction if they pursue that line of work. Vocational interest scores predict future occupational choices reasonably well, with between 50% and 60% of test takers subsequently working in jobs consistent with their vocational interests (Hansen & Dik, 2005). However, even though people are likely to get a job doing something in which they are interested, research does not support the notion that vocational interest will always lead to high job performance or satisfaction. In fact, interest congruence is only weakly related to job satisfaction and does not effectively predict either job or training performance (Tranberg, Slane, & Ekeberg, 1993; Schmidt & Hunter, 1998). Keep in mind that just because someone is interested in a certain job does not mean he or she will be able to perform it well.

One frequently used measure of vocational interest is the Strong Interest Inventory (SII), previously known as the Strong Vocational Interest Blank. The SII is a self-report inventory composed of 291 items divided into six themes: artistic, conventional, social, realistic, enterprising, and investigative. The test requires 25 minutes to complete and is administered and
scored by computer. The results help test takers identify occupations, leisure activities, and work preferences that match their interests. It possesses norms for 211 different occupations. Because interests tend to remain stable throughout a person’s life, it makes sense for high school and college students to take an interest inventory like the SII as they begin the process of developing their professional careers.

Another example is the Armed Services Vocational Aptitude Battery (ASVAB). This test assesses a wide range of abilities that predict future success in the military. More than 1 million military applicants, high school students, and postsecondary students take this test every year. Test subcategories include reading comprehension, word knowledge, science, math, electronics, and mechanics.

Find Out for Yourself: Occupational Interests and Personal Abilities

Complete the Interest Profiler available on the O*NET website.

O*NET Interest Profiler
http://www.mynextmove.org/explore/ip

What Did You Learn?

1. What did you find out about your occupational interests and
personal abilities?
2. Are you currently working in a job that aligns with your occupational interests? Why or why not?

3.6 Developing a Testing Program

Although creating, identifying, and using valid tests are essential for any quality testing program, organizations also face a number of administrative decisions that can affect the program’s overall success.

Deciding When Not to Test

Most important, an organization must decide whether to use testing in the selection process. Time and cost are extremely important considerations, and they include test development and design, necessary equipment, facility usage, and administrator/evaluator training and pay. Naturally, organizations will need to ensure that the benefits of their testing program outweigh the costs.

Sometimes, the level of employee productivity that a test is able to identify is not high enough to warrant the expense of a testing program. In these cases other measures, such as improving employee training and development, can help advance new hires’ performance. Alternately, conducting a more careful review of applicants’ educational backgrounds or asking more in-depth interview questions can provide greater insight into the job-related skills and abilities of potential employees, without having to add tests to the preemployment process.
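One way to make the benefit-versus-cost comparison concrete is to estimate the productivity gain a test adds per hire and weigh it against the full cost of testing. The short Python sketch below illustrates the arithmetic; every figure in it is a hypothetical assumption, not data from any study cited in this chapter, and a real organization would substitute its own estimates.

```python
# Hypothetical break-even sketch for a testing program.
# All numbers are invented assumptions for illustration.

applicants_tested = 200
hires = 20
cost_per_applicant = 75.0        # test purchase, administration, scoring
annual_gain_per_hire = 1200.0    # estimated extra productivity from
                                 # better selection, per hire per year

total_cost = applicants_tested * cost_per_applicant
total_gain = hires * annual_gain_per_hire

print(f"Testing cost:  ${total_cost:,.0f}")   # $15,000
print(f"Expected gain: ${total_gain:,.0f}")   # $24,000
if total_gain > total_cost:
    print("Expected benefits exceed costs; testing may be justified.")
else:
    print("Costs exceed expected benefits; consider alternatives such as "
          "improved training or more in-depth interviews.")
```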
In summary, then, it is important for an organization both to establish its employment needs and to determine the potential benefits and expected costs of testing programs before implementing these useful selection tools.

Finding Quality Tests

Over the past few decades, the volume and variety of selection tests have increased dramatically. Unfortunately, the test publishers and consulting companies that design them use varying levels of rigor and expertise. How, then, is an organization to know which tests have met acceptable standards of validity, reliability, and test norming? Two periodicals, the Mental Measurements Yearbook and Tests in Print, are excellent reference sources that provide descriptions and expert reviews of a large number of tests, including those used for selection. As you learned in Chapter 1, you can also consult the original scientific research that was conducted to develop and validate various tests in order to assess their quality and rigor.
Find Out for Yourself: Quality of Selection Methods

Research various selection methods with which you are familiar or that you have undergone in the past. Examples may include job applications, interviews, reference checks, medical exams, and referrals. Try to find validity and reliability scores for each method. Which ones are more valid? Which ones are more reliable? Why do you think that is the case?

Test Administrators

A test’s usefulness depends in part on its proper administration, scoring, and interpretation. Organizations must not only train their testing administrators on these key functions but also establish quality controls and retrain administrators when necessary. The requirements for administrator qualifications and abilities vary from test to test, so it is important for organizations to be aware of the requirements outlined in each test’s manual when selecting and administering tests.

Addressing Ethical and Privacy Issues

Test security is a major concern for I/O psychologists and organizations in order to maintain high ethical testing practices. Tests and scores must remain confidential. Questions should never be published or distributed to the public, and tests should only be administered to qualified individuals.

Some applicants may view tests as an invasion of privacy, particularly ones that assess personality and integrity or screen for drugs. As we have noted,
fear or mistrust in the selection process can have adverse consequences. Organizations can alleviate some of these concerns by communicating to applicants the reasons for the test, how test results will be used, and how confidentiality will be maintained.

Testing People With Cultural and Language Differences

Differences in cultural backgrounds can shape how test takers interpret and respond to test questions. As the American workforce becomes more racially and ethnically diverse, it is critical that organizations emphasize the use of culturally unbiased tests. Moreover, English is no longer the primary language for a growing number of applicants. Naturally, applicants who cannot fluently read or speak the language used in a test will likely return artificially low scores, not because they lack skills or knowledge but simply because they cannot comprehend the instructions or understand the test questions. To overcome language barriers, a test can be translated into a variety of languages. However, a common problem with this approach is that expressions and phrases used in the test items
may be lost in translation, which decreases the test’s validity generalization. Thus, additional validation is necessary to assess validity generalization whenever a test will be used for a different racial or ethnic group or translated into a different language, and the test may need to be adapted accordingly.

Testing People With Disabilities

The ADA protects qualified individuals with disabilities from discrimination in all areas of employment, including employment testing. It can be challenging for organizations to accommodate individuals with disabilities; they must aim to be sensitive to the needs of the individual while also maintaining the integrity of the testing process and avoiding undue hardship. Test administrators require training to understand and properly respond to accommodation requests. Examples of reasonable accommodations include modifying test equipment or seating, ensuring accessibility to the testing facility, and providing a Braille or large-print version of a test to visually impaired candidates.

Establishing Appeals and Retest Processes

Every applicant should have the opportunity to perform at his or her best on a test. Despite every intention to create this opportunity, sometimes it is simply not possible. Equipment can malfunction, the testing environment could be poor (noise, temperature, bad odors, or even disasters such as fire or flood), and candidates can be affected by outside stressors (illness or hospitalization of a family member, among others). With each of these situations, candidates
could perform significantly better if given the opportunity to retake the test under conditions in which the negative influences are not present. Test administrators should be trained to identify situations that produce invalid results and then implement a specific process to retest the candidate based on information and guidance provided by the test publisher. The organization should also establish policies for handling complaints regarding testing in order to resolve concerns fairly and consistently.

3.7 Psychological Testing: Special Issues

Over the past decade, I/O psychologists have become interested in a number of questions related to employment testing. How do applicants feel about being tested? Do these feelings affect their perceptions of the company? Do online tests show the same validity as paper-and-pencil tests, and how can organizations keep applicants from cheating on them? Recent research findings shed some light on these interesting questions.
Applicants’ Reactions to Tests

Most research about testing has focused on technical aspects—content, type, statistical measures, scoring, and interpretation—and not on social aspects. The fact is, no matter how useful and important tests are, applicants generally do not like taking them. According to a study by Schmit and Ryan (1997), 1 out of 3 Americans has a negative perception of employment testing. Another study found that students, after completing a number of different selection measures as part of a simulated application process, preferred hiring processes that excluded testing (Rosse, Ringer, & Miller, 1996). Additional research has shown that applicants’ negative perceptions about tests significantly affect their opinions about the organization giving the test. It is important for I/O psychologists to understand how and why this occurs so they can adopt testing procedures that are more agreeable to test takers and that reflect a more positive organizational image.

Typically, negative reactions to tests lower applicants’ perceptions of several organizational outcome variables, including organizational attraction (how much they like a company), job acceptance intentions (whether they will accept a job offer), recommendation intentions (whether they will tell others to patronize or apply for a job at the company), and purchasing intentions (whether they will shop at or do business with the company). A study conducted in 2006 found that “[e]mployment tests provide organizations with a serious dilemma [: . . .]
how can [they] administer assessments in order to take advantage of their predictive capabilities without offending the applicants they are trying to attract?” (Noon, 2006, p. 2). With the increasing war for top-quality talent, organizations must develop recruitment and selection tools that attract, rather than drive away, highly qualified personnel.

What Can Be Done?

I/O psychologists have addressed this dilemma by identifying several ways to improve testing perceptions. One is to increase the test’s face validity. For example, applicants typically view cognitive ability tests negatively, but their reactions change if the test items are rewritten to reflect a business-situation perspective. Similarly, organizations can use test formats that already tend to be viewed positively, such as assessment centers and work samples, because they are easily relatable to the job (Smither et al., 1993).

Providing applicants with information about the test is another, less costly way to improve perceptions. Applicants can be told what a test is intended to measure, why it is necessary, who will see the results, and how these will be used (Ployhart & Hayes, 2003). Doing so should lead applicants to view both the organization and its treatment of them during the selection process more favorably (Gilliland, 1993). Noon’s 2006 study investigated applicants’ reactions to completing cognitive ability and personality tests as part of the selection process for a data-processing position. Half of the applicants received detailed information explaining
the testing process, while the other half received the standard information normally provided by the company. Applicants who received the detailed information found the company more attractive and were more likely to recommend it to others, even if they did not receive a job offer. Although this tactic is not always used during the testing process, providing detailed information about a test is a quick, cost-effective, and practical way for organizations to improve applicant test perceptions (Lounsbury, Bobrow, & Jensen, 1989).

Online Administration

Over the past decade, online testing has increased dramatically. Also referred to as unproctored Internet testing, the online test has replaced many traditional paper-and-pencil alternatives, and almost every type of test is now available through online administration. Online tests have a number of advantages over paper-and-pencil tests. They can be taken by anyone from nearly anywhere in the world, increasing the pool of applicants for a job. Brick-and-mortar testing
facilities and proctors no longer need to be a part of the testing program, because applicants can complete the test from home or a public library. The amount of administrative support also decreases, further lowering costs. Hiring decisions are made faster using online administration, because the testing software provides immediate scoring and feedback on test performance. Finally, online tests often take advantage of new interactive technology, such as video clips, simulations, and gaming platforms, and thus test takers find them more engaging.

Despite the ease and mass marketability of Internet tests, however, I/O psychologists struggle to reach a consensus on their efficacy and ethicality as well as the validity of the test scores they yield. Online tests present a number of challenges, including applicant cheating, potential for subgroup differences, and the inability to identify the test taker (Tippins et al., 2006). Current research suggests that unproctored Internet tests are not compromised by cheating (Nye, Do, Drasgow, & Fine, 2008), although undoubtedly there will be occasions when an applicant will feel the stakes are high enough to rationalize cheating. One solution to this potential problem is to require candidates to complete a portion of the test under proctored conditions at the organization prior to receiving a job offer.

Regardless of the challenges, online tests are here to stay. The advantages far outweigh the disadvantages, leaving I/O psychologists with the “delicate task of balancing best ‘academic’ practice with often conflicting practical and organizational
needs to implement valid and reliable online assessments” (Kaminski & Hemingway, 2009, p. 26).

3.8 Testing for Positivity

Selecting for KSAOs is important, but so is selecting for positivity. Psychological testing can be an effective way to assess applicants’ positivity levels. Although valid and reliable tests are available to assess many physical, cognitive, social, and psychological traits and abilities, tests are just starting to emerge that adequately assess positivity in general as well as specific positive psychological qualities in an applicant or an employee.

This is not to say that established psychological assessments are predominantly negative. To the
contrary, many of the most recognized psychological tests are positive in nature. For example, the Big Five personality traits include four clearly positive traits—conscientiousness, extroversion, agreeableness, and openness to experience—and only one negative trait, neuroticism. However, many of the existing psychological tests are primarily based on problem-oriented models and processes, which mainly focus on detecting what an applicant may be lacking rather than on assessing what makes an applicant flourish and thrive in the workplace.

Inherently, an applicant’s positivity and negativity are often considered to be opposite sides of the same coin. For example, an employer might assume that an employee who tests high on optimism will be low on pessimism, or that an employee who is high on positive affect should also be low on negative affect. These assumptions have gone unchallenged for years, and many organizations and consultants have readily made extrapolations from positive to negative and negative to positive psychological characteristics, as if they are two ends of the same continuum. However, recent studies show that these assumptions may not always hold true. For example, Schaufeli and Bakker’s (2004) study showed strong evidence that experiencing burnout and engagement at work are two distinct psychological constructs rather than polar opposites of the same continuum, and each is affected by different job characteristics. Another example is work behavior. Positive and negative work behaviors have been shown to be distinct and to yield different, not just opposite, performance outcomes (Dunlop & Lee, 2004; Sackett,
Berry, Wiemann, & Laczo, 2006).

Several resources are available for those employers searching for valid and reliable tests of positivity. The reference Positive Psychological Assessment: A Handbook of Models and Measures (Lopez & Snyder, 2003), published by the American Psychological Association, evaluates the validity and reliability of numerous tests of positive psychological constructs such as optimism, hope, confidence, creativity, and courage. Of course, for each psychological construct, there are several alternative tests. Some offer higher validity or reliability than others, so employers should carefully select not only which constructs they should test and evaluate but also which specific tests to use.
Three specific positivity tests have been found particularly relevant and predictive of work performance and other desirable outcomes such as job satisfaction, organizational commitment, and overall well-being. The first of these is Gallup’s StrengthsFinder (Rath, 2007), which assesses test takers on 34 different “talents” and is widely used in the United States and around the world to select, place, and fit job candidates in the right jobs, usually based on the test taker’s top five talents. The second is the Psychological Capital Questionnaire (PCQ-24), which was recently developed and validated by Fred Luthans and colleagues (Luthans, Avolio, Avey, & Norman, 2007; Luthans, Youssef-Morgan, & Avolio, 2015) at the University of Nebraska. It is a 24-item measure that assesses the test taker’s levels of hope, confidence, resilience, and optimism and combines these four psychological resources into one higher order positive construct. There is also a short, 12-item version of this test (PCQ-12; Avey, Avolio, & Luthans, 2011; Luthans, Youssef, Sweetman, & Harms, 2013) that has been translated into a number of languages and tested in numerous cultures (Wernsing, 2014), as well as an implicit version that uses adaptable positive, negative, and neutral situations to assess test takers’ reactions (Harms & Luthans, 2012).

The third is Barbara Fredrickson’s (2009) positivity ratio assessment, which measures positivity and negativity as two independent constructs and then calculates the ratio of positive-
to-negative responses. See the feature box Find Out for Yourself: How Positive Are You? to learn more about this assessment.

Find Out for Yourself: How Positive Are You?

Visit the Positivity website to complete Barbara Fredrickson’s positivity ratio assessment and instantly obtain your own positivity ratio. Keep in mind that this assessment is somewhat volatile and will change depending on the situations you encountered the day before. To get a more accurate assessment, it is recommended that you complete this test multiple times over several days and take an average of your scores. Also keep in mind that some of the statistical analysis behind positivity ratios has been criticized, so be sure to review the 2013 update posted by Fredrickson on the same website.

Positivity Ratio Assessment
http://www.positivityratio.com/single.php
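The ratio itself is simple arithmetic, and the averaging advice from the feature box is equally simple to apply. Below is a minimal Python sketch under the assumption that each day’s responses have already been tallied into positive and negative counts; the counts shown are invented for illustration, not output from the published instrument.

```python
# Hypothetical positivity ratio calculation. Daily (positive, negative)
# counts are invented; a real administration would use the published items.

daily_counts = [
    (9, 3),   # day 1: 9 positive responses, 3 negative -> ratio 3.0
    (8, 2),   # day 2 -> ratio 4.0
    (10, 5),  # day 3 -> ratio 2.0
]

ratios = [pos / neg for pos, neg in daily_counts if neg > 0]
average_ratio = sum(ratios) / len(ratios)

print([round(r, 2) for r in ratios])                    # [3.0, 4.0, 2.0]
print(f"Average positivity ratio: {average_ratio:.2f}")  # 3.00
```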
Summary and Conclusion

Selecting the right candidates from the available pool of job applicants is critical for employee and organizational success. When the right employee is selected and placed in the right job, performance is higher, which can translate into higher productivity and financial returns for the organization. For example, well-placed employees tend to go above and beyond the immediate job requirements, which can positively influence coworkers and promote a positive organizational culture. Properly selected employees are also likely to stay with the organization longer and be absent less often, which can translate into enormous cost savings. Equally important, well-placed employees will likely experience more satisfaction with their jobs and the organization, have higher work engagement levels, and perceive their jobs as more meaningful, all of which contribute to higher employee well-being.

Because effective employee selection relies on predicting subsequent performance on the job, it is beneficial for managers to use the most accurate and consistent predictors available. Psychological testing affords managers the opportunity to use valid and reliable tests that can fulfill this important role. However, many managers continue to use highly subjective approaches to selection, which often ends up wasting their time and their organization’s resources on selecting, training, and managing the wrong job applicants, or worse, exposing their organization to discrimination-based lawsuits that can be time consuming and costly and can compromise its reputation.
The role an I/O psychologist plays in the test-selection process is threefold. First, I/O psychologists can educate managers and organizational decision makers on the importance of finding evidence for a test’s validity and reliability before attempting to use it. Second, they can use available evidence to help organizations discern among multiple tests, a process that managers often perceive as intimidating or difficult to understand. Third, I/O psychologists contribute to the development of more valid and reliable tests and selection tools in areas where none currently exist, helping create the most appropriate and efficient methods for selecting the right candidates in an ever-changing workplace.

Key Terms

constructs  Abstract, intangible personal attributes.

Cronbach’s alpha  A statistical measure of the intercorrelations across items in a test.

meta-analysis  A combination of the results of several studies that address a set of related research hypotheses.

objective tests  Tests that have one clearly correct answer.

percentile scores  The percentage of people in the standardized sample who scored below an individual’s raw score.

raw scores  The number of points a person scores on a test.
reliability  The extent to which the results from a predictor such as a selection tool, method, or procedure can be consistently replicated over time and across situations.

standardization sample  A large group of people who take a test and act as a comparison group against whose scores an individual applicant’s scores can be compared.

subjective tests  Tests that have no definitive right or wrong answer; thus, scores rely heavily on the evaluator’s interpretation and judgment.

test  A method used to make an employment decision about whether to hire, retain, promote, place, demote, or dismiss someone; an instrument or procedure that measures an individual’s employment and career-related qualifications and characteristics or measures samples of behavior or performance.
test norms  The comparison scores provided by a standardization sample.

validity  The extent to which a selection tool or procedure can accurately predict subsequent performance.

validity generalization  The notion that validity evidence transfers across situations.

4 Performance Appraisal

Learning Outcomes

After reading this chapter, you should be able to
• Recognize the importance of performance measurement for organizational success.
• Identify and critique common approaches to measuring employee performance in organizations.
• Explain the differences between objective and subjective performance appraisals.
• Describe the different performance appraisal formats.
• Apply the concepts of validity and reliability to performance appraisal tools and processes.
• Assess the effects of rating errors on performance appraisal accuracy.
• Implement an effective performance management system.
• Apply positive psychology to performance appraisal processes.
• Link employee performance and performance appraisal results to financial outcomes.
4.1 The Importance of Performance Appraisals

Throughout your life, people will evaluate your performance in ways that shape who you become and where you will go. From elementary school through college, on the athletic field and in your community, from your first part-time job to your adult career, others will test and evaluate and compare your performance, the results of which will determine whether you advance to the next phase of life.

Within organizations, assessing employees’ performance tends to be perceived as a necessary evil that neither managers nor staff particularly enjoy. Many employees fear that even one low performance rating could affect their pay or damage their career. Even more concerning is the prospect of receiving low ratings from a manager who doesn’t ever directly observe or work with you but uses secondhand information or personal biases to make his or her evaluations. Sadly, this is frequently the case.

Consider This: How Do You Feel About Being Evaluated?

Think about one or more occasions in which you were being evaluated. It could be at work, at school, while playing a sport, or elsewhere.

Questions to Consider

1. Describe your feelings and thoughts before you received these evaluations. Were you anxious? Were you looking forward to the evaluation?
2. Describe your feelings and thoughts while receiving these evaluations. Were you
surprised? Upbeat? Interested in receiving feedback? Actively involved? Did you passively receive the information? Feel under attack?
3. Describe your feelings and thoughts immediately after these evaluations. Were you excited? Flattered? Humiliated? Angry? Defensive?
4. What effects did these evaluations have on your personal, social, or professional life? Did they make you a better person in any way? Explain your answer.

Managers also experience anxiety when completing performance appraisals. Most often, they worry that criticisms, no matter how small, might provoke negative reactions, ranging from disappointment and frustration to anger and hostility. These emotions can strain the manager–employee relationship or cause the employee to become less motivated or even to quit. As a result, managers tend to shy away from providing negative performance feedback, which of course negates accuracy.
If everyone dislikes performance appraisals, why keep doing them? For one thing, unmanaged performance is chaotic and random. Employees’ work needs to be aligned with the organization’s overall goals, and clear performance feedback helps everyone know if this is indeed happening. In fact, a well-designed performance appraisal system should not only provide employees with rich feedback but should also communicate clear performance expectations and include information that will help them perform at their highest level possible (Pulakos & O’Leary, 2011). Appraisals that meet each of these concerns will enable the organization to further its mission to succeed. The question, then, is not whether to keep doing performance appraisals but how to make them most effective.

Using Performance Appraisals

A performance appraisal is the formal process through which employee performance is assessed. It involves providing feedback to the employee and designing corrective action plans if necessary. Organizations conduct performance appraisals for the following reasons:

1. To evaluate performance objectively. Organizations need some sort of system to measure the value of each employee’s performance. These measures must be objective and allow managers to consistently compare the performance of people who have the same job function.

Consider This: How Do You Feel About Evaluating Others?

Think about one or more occasions in which you had to evaluate
or give feedback to someone. Again, it can be at work, at school, or in the context of a sport. Personal and social settings can also be used for this exercise.

Questions to Consider

1. Describe your feelings and thoughts before you gave your evaluation or feedback. Were you anxious? Hesitant? Excited?
2. What were your primary concerns? The fairness of your evaluations? The reactions of the people you were evaluating? The repercussions of your evaluation for yourself and/or the person you were evaluating?
3. Describe the settings in which you had to communicate your evaluations. Was it face-to-face? On the phone? Via e-mail? In a written report?
4. Describe the content of your feedback. Was it positive, neutral, or negative?
5. How did the person you were evaluating react? Was your feedback appreciated? Tolerated? Rejected?
6. How did you manage or leverage the reaction of the person you were evaluating? Did you involve him or her? Did you ask for his or her input?
7. Describe your feelings and thoughts after giving your evaluation. Were you stressed? Drained? Relieved? Did you feel more or less confident about your feedback
communication skills and your ability to accurately assess others’ performance?
8. What effects did your evaluation have on others’ personal, social, or professional lives? Did it make them better? Did it help them advance their careers? How did it affect your relationships with the individual you evaluated? Explain.

2. To increase worker motivation. Appraisals provide employees with specific feedback regarding their strengths and weaknesses. When workers know what they should be doing, how they actually are doing, and how they can improve, they are often motivated to perform better.

3. To make administrative decisions. Managers rely heavily on data from performance appraisals when making decisions about employee raises and bonuses, promotions, demotions, or even terminations. Employees must perceive these decisions as fair and
free from bias; a good performance appraisal will facilitate those favorable perceptions.

4. To improve organizational performance. Performance appraisals are essential to improving organizational performance (DeNisi & Sonesh, 2010). They pinpoint skill deficiencies in specific parts of the organization, helping managers focus their training and selection efforts. Appraisals also enhance an organization’s opportunities for success by identifying poor performers. This not only helps weed out subpar personnel but also motivates top performers to keep their performance levels high. It is important to note that the link between performance appraisals and organizational performance is not always direct. As discussed throughout this chapter, performance appraisal needs to be an integrated part of an effective set of human resource practices in order to have a positive impact on organizational performance (DeNisi & Smith, 2014).

5. To establish training requirements. Appraisal data provides insight into workers’ knowledge, skill, and deficiency levels. This information helps managers establish specific training objectives, update or redesign training programs, and provide appropriate retraining for specific employees.

6. To enhance selection and testing processes and outcomes. An important use of performance
appraisal data is to establish the criterion-related validity of selection tests. Recall from Chapter 3 that criterion-related validity establishes a predictive, empirical (number-based) link between test scores and actual job performance by correlating applicants’ employment test scores with their subsequent performance on the job. How can the predictive capacity of a test be determined if the organization does not design and implement objective performance measures and procedures? It would have no accurate data with which to correlate test scores, which would hinder its ability to design and use accurate tests and implement effective selection processes.

7. To provide a level playing field. Performance appraisals help clarify expectations. This is especially important in large organizations that feature many managers who oversee similar positions. Both managers and employees should ensure that they are pursuing a unified set of goals and expectations and that there are no discrepancies.

[Photo caption: Performance appraisals have several benefits, including increased worker motivation. Employees are more productive and satisfied when they know what is expected of them and how they can achieve these results.]
A formal performance appraisal process can help with that standardization and calibration process. Without performance appraisals, employees may be evaluated unfairly because some managers may be intentionally or unintentionally more lenient than others. You will learn about some of these common biases in this chapter.

4.2 Approaches to Measuring Performance

I/O psychologists have identified a number of techniques for measuring employee performance. These measures can be either objective or subjective. Generally, what is measured and how it is done depends on the type of work an employee performs. Some jobs, such as sales and assembly-line work, have objective outcome measures (sales revenue, number of pieces assembled), whereas others are more subjective (wait staff performance, art design work).

Objective Performance Measures

Objective performance measures are quantitative measures of performance that can be found in unbiased organizational records. They are generally
easy to gather and include two types of data: production measures (such as sales volume, number of units made, number of error occurrences) and personnel measures (such as absenteeism, tardiness, theft, and safety behaviors). Both measures are also usually evaluated according to the quality of performance.

However, objective measures can be deceivingly simple. Consider the performance of two sales professionals in an insurance company. Over the course of a year, Salesperson A sold 500 policies and Salesperson B sold 1,000. According to this data, Salesperson B appears to be the better salesperson—twice as good, in fact. However, if we examine the quality of each worker’s performance, we might learn that Salesperson B sold unprofitable policies, resulting in a $1 million loss for the company. Salesperson A, on the other hand, sold very profitable policies, resulting in a $1 million profit. Alternatively, Salesperson B may have focused on selling more policies while cutting corners on after-sale service and follow-up, which could have resulted in dissatisfied customers. On the other hand, Salesperson A may have invested more time per sold policy on such interactions.

Although the organization may not directly measure or reward after-sale service and follow-up, these customer interactions can help build its reputation and are known to result in more satisfied customers returning for additional products and referring others. Repeat business from and referrals by satisfied customers are significantly less costly for an organization to
generate than is building new clientele. However, these additional sales may not be easily attributable to Salesperson A if the returning or referred customers are assigned to another salesperson. As you can see, evaluating worker performance by quantity alone is not a wise course of action.

[Photo caption: Recording how many cars this man sold over the past year is one way to objectively measure his performance, although it does not address other aspects of his performance, such as customer service.]

Unfortunately, even after accounting for performance quality, objective data may not provide an accurate or complete picture of an employee’s performance. Many factors beyond workers’ control can limit their ability to perform their best. Looking more closely at our two insurance salespeople, we might discover that the difference in sales volume could be attributable to the
location of each employee’s branch office. Salesperson A could work in a small town, while Salesperson B works in a large metropolitan region. Thus, Salesperson A could have captured a larger market share of his designated region than Salesperson B, even though he sold fewer policies. Alternatively, perhaps Salesperson B was assigned an easy-to-sell policy because she was a new employee, while Salesperson A, as a veteran employee, was assigned a hard-to-sell but very lucrative policy. As you can see, accurate performance evaluations require more than a cursory look at sales and production numbers, although adding manager interpretation into the mix does make objective performance measures more subjective.

Personnel data, another objective measure, includes components such as theft, tardiness, absenteeism, safety behaviors, and rate of advancement. Though not typically related to a worker’s ability to do the job, these elements do indicate job success or failure. Many jobs, such as teachers, customer service representatives, and bank tellers, require consistent and timely daily attendance. Thus, absenteeism and tardiness are often used to evaluate these workers’ performance. Other jobs, such as machine operators, assembly-line workers, and truck drivers, have serious safety risks. With jobs such as these, it makes sense to keep count of employees’ accidents and safety incidents and use them as objective measures of performance.
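The insurance example can be made concrete with a few lines of code. Below is a minimal Python sketch comparing the two salespeople on volume alone versus profit; the figures simply mirror the hypothetical numbers in the text.

```python
# The two hypothetical salespeople from the text: quantity alone points one
# way, but adding a quality measure (profit) reverses the conclusion.

sales = {
    "Salesperson A": {"policies_sold": 500,  "profit": 1_000_000},
    "Salesperson B": {"policies_sold": 1000, "profit": -1_000_000},
}

best_by_volume = max(sales, key=lambda s: sales[s]["policies_sold"])
best_by_profit = max(sales, key=lambda s: sales[s]["profit"])

print(f"Best by volume alone: {best_by_volume}")  # Salesperson B
print(f"Best by profit:       {best_by_profit}")  # Salesperson A
```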
As with any type of objective data, however, taken on its own, personnel data can be misleading. Once again, circumstances outside a worker’s control could affect performance. Sick children, a death in the family, or transportation troubles may affect an otherwise superior employee’s ability to come to work. Similarly, a workplace accident could have been caused by faulty company equipment, not worker error. Because you now understand the limits of objective data, let’s turn to subjective performance measures, their limitations, and ways to keep this data fair and free from bias.

Subjective Performance Measures

The allure of objective performance measures has to do with their ability to provide bias-free information on all workers across a specific job. Of course, we now know that objective data can still be misleading. Further, most jobs require much more than simply looking at sales or production numbers, because most jobs are composed of a complex web of tasks, not all of which can be measured objectively. For example, a teacher’s performance must be made up of more than his or her students’ test scores, just as a police officer cannot be evaluated solely on the number of arrests he or she makes each month. While it is in most respects easier to measure things objectively, this can exclude important aspects of a situation that may not be objectively quantifiable but should still be taken into consideration. Examples in the context of performance appraisal include “friendliness” of a salesperson, “helpfulness” of a customer service representative, or “leadership potential” of a frontline employee. To account for these hard-to-quantify characteristics, I/O psychologists created
subjective performance measures, which rely on human judgment for evaluation and are thus exposed to some degree of subjectivity and personal judgment. Of course, with personal judgment comes some personal bias. To reduce such bias, evaluators must base their ratings on observations of worker behaviors that are critical for successful job performance. Furthermore, these behaviors must be identified by conducting an accurate job analysis.

Interestingly, research shows only a small correlation between objective and subjective performance measures, which suggests that they measure different aspects of worker performance (Bommer, Johnson, Rich, Podsakoff, & McKenzie, 1995). Thus, the two sets of measures are complementary and should be used in conjunction whenever possible.

Organizations use many types of subjective performance measures, ranging from manager-composed performance narratives to numerically oriented rating scales. Each method differs
in complexity as well as the amount of time required to create and implement it. The next section provides a brief review of some common subjective performance measures.

Written Narratives

With the written narrative, one of the easiest performance measures to develop, the manager writes a paragraph or two summarizing an employee’s performance over a certain length of time. An example of a written narrative is a reference letter written by a supervisor for an intern at the end of an internship. When used to measure performance, managers often share specific examples of the worker’s strengths and weaknesses, which the worker can then use to help improve his or her performance during the next appraisal cycle.

Although written narratives are quick and easy, they have a number of drawbacks. First, every manager will set different evaluation standards, making it impossible to compare workers with different managers. As a result, the written narrative should not be used to make decisions about compensation, promotions, or layoffs. Second, managers vary in the quality of their written communication skills. Some may use ambiguous, incomplete, or misleading language, which can lead the employee to misinterpret or not understand the feedback. Finally, managers are often reluctant to address poor performance in a straightforward manner and sometimes deliberately write the narrative to cast negative behavior in a positive light.
The drawbacks of the written narrative have prompted I/O psychologists to develop a number of techniques to both improve the objectivity of subjective performance measures and reduce managerial biases.

Rank Ordering
Rank ordering requires no forms or instruments and is the easiest way to evaluate workers: Managers simply rank their employees from best to worst. Because some managers tend to evaluate all employees similarly, rank ordering provides much-needed differentiation, although the small differences between median employees still make it a challenge for managers to rank them. Rankings also do not provide workers with performance feedback, which means they are not useful for self-improvement or training guidance. Because of these limitations, rankings should be used only during periods of promotion, downsizing, reorganization, or any other situation in which it would be valuable to understand a worker's standing relative to other workers.
Paired Comparison
As with rank ordering, the paired comparison technique requires the manager to evaluate a worker's performance relative to the other workers on the team. In a systematic fashion, the manager compares one pair of workers at a time and then judges which of the two demonstrates superior performance. After comparing all the workers in all possible pairings, the manager creates a rank ordering based on the number of times each worker is the better performer of a pair. The number of discrete pairings in a group of N workers is N(N - 1)/2, so a manager with 10 employees would need to make 45 paired comparisons, and a manager with a team of 20 employees would need to make 190. As you can see, the number of pairs grows quickly as the size of the team increases. For this reason, paired comparisons are only advantageous for smaller groups.

Like general rank orderings, paired comparisons do not provide performance feedback. However, they are generally simpler to produce because managers need only compare one employee pair at a time, instead of the entire work team. Organizational leaders should keep in mind that rankings are not standard across the entire workplace. The lowest ranked member of a high-performing team might, for example, actually perform better than the highest ranked member of a poorly performing team.
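As a minimal sketch of the bookkeeping this method requires (not a substitute for the manager's judgments), the Python below enumerates the N(N - 1)/2 pairings and builds a rank order from pairwise wins. The names and the stand-in scores are invented.

# Sketch: paired-comparison ranking. A human rater normally judges each
# pairing; here invented scores stand in for those judgments.
from itertools import combinations

employees = {"Ana": 82, "Ben": 75, "Cal": 91, "Dee": 68}  # invented

pairs = list(combinations(employees, 2))
print(f"{len(employees)} employees -> {len(pairs)} pairings")  # N(N-1)/2 = 6

wins = {name: 0 for name in employees}
for a, b in pairs:
    winner = a if employees[a] > employees[b] else b  # the manager's judgment
    wins[winner] += 1

rank_order = sorted(wins, key=wins.get, reverse=True)
print("Rank order:", rank_order)  # ['Cal', 'Ana', 'Ben', 'Dee']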
Forced Distribution
When an organization needs to evaluate a large number of employees, forced distribution is a viable option. With this technique, managers place employees into categories based on preestablished proportions. A typical performance distribution uses the following categories and proportions:

Superior: 10%
Above average: 20%
Average: 40%
Below average: 20%
Poor: 10%

Using this distribution for a team of 100 workers, a manager would identify the top 10 employees (10%) and the bottom 10 employees (10%) and place them in the superior and poor categories, respectively. From the remaining 80 workers, the manager would then select the next 20 highest performers (20%) for the above-average category and the next 20 lowest performers (20%) for the below-average category. The final 40 workers (40% of the original 100) would fall into the average category. The lowest performance group would then be assigned to additional training, reprimanded, put on probation, or terminated. This approach is most commonly associated with Jack Welch, former CEO of General Electric; the company eliminated the lowest 10% of performers every year using this method.
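A minimal sketch of the category assignment follows, assuming employees can be strictly ordered by score. The forced_distribution helper and the scores are invented for illustration.

# Sketch: assigning a team to the 10/20/40/20/10 forced distribution.
# Scores are invented; for a team size that does not divide evenly,
# the rounding below would need an explicit tie-breaking policy.

BANDS = [("Superior", 0.10), ("Above average", 0.20), ("Average", 0.40),
         ("Below average", 0.20), ("Poor", 0.10)]

def forced_distribution(scores):
    ordered = sorted(scores, key=scores.get, reverse=True)  # best first
    labels, start = {}, 0
    for band, share in BANDS:
        count = round(share * len(ordered))
        for name in ordered[start:start + count]:
            labels[name] = band
        start += count
    return labels

scores = {f"emp{i:03d}": 100 - i for i in range(100)}  # invented scores
labels = forced_distribution(scores)
print(labels["emp004"], labels["emp049"], labels["emp099"])
# -> Superior Average Poor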
Obviously, one of the major drawbacks of forced distribution is that it assumes that worker performance follows a normal distribution (some high performers, some low, most somewhere in the middle). This method makes no concessions for teams that are filled with superior performers or, conversely, teams fraught with poor performers. Furthermore, it makes no distinctions among workers in a category; all average workers, for example, are simply considered average. Finally, as with rank ordering and paired comparisons, forced distribution
can add artificial luster to "superior" members of poor-performing teams or unfairly tarnish "poor" members of a high-performing team. However, many organizations find forced distribution methods necessary to maintain a high-quality workforce in a competitive market. In order for forced distribution to realize its benefits, an organization should consider factors such as how low performers are treated and how high performers are differentiated from low performers in terms of rewards (Blume, Baldwin, & Rubin, 2009).

Consider This: Leadership
In this video, former General Electric CEO Jack Welch discusses his leadership strategies, including the value of the forced distribution approach.
Jack Welch on Leadership: https://www.youtube.com/watch?v=l5GryYk5hV8

Graphic Rating Scale
Graphic rating scales are the most commonly used method for rating worker performance. Managers observe specific employee behaviors or job duties along a number of predetermined performance dimensions such as quality of work, teamwork, initiative, leadership ability, and judgment. Then the manager rates the quality of performance for each dimension on a scale ranging from high to low performance. Looking at Figure 4.1, you can see that this employee received a below-average rating on teamwork.

Figure 4.1: General graphic rating scale
[The figure shows a 5-point scale for the Teamwork dimension, anchored from Poor (1) through Below average (2), Average (3), and Above average (4) to Superior (5), with the employee's rating marked at 2, below average.]
Each point on a graphic rating scale, called an anchor, is defined along the continuum. Anchors can vary in number, description, and depth of detail and can be stated in numbers, words, longer phrases, or a combination of these forms. Typically, the manager rates workers' performance using a 5-point rating scale, although 7- or even 9-point scales are not uncommon.
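As a minimal sketch, the Python below represents a 5-point scale with the verbal anchors named above and rejects out-of-range ratings. The report helper and the employee's ratings are invented.

# Sketch: representing a graphic rating scale with defined anchors.
# Dimensions follow the text; the employee's ratings are invented.

ANCHORS = {1: "Poor", 2: "Below average", 3: "Average",
           4: "Above average", 5: "Superior"}

DIMENSIONS = ("Quality of work", "Teamwork", "Initiative",
              "Leadership ability", "Judgment")

def report(ratings):
    """Print each dimension's numeric rating with its verbal anchor."""
    for dim in DIMENSIONS:
        score = ratings[dim]
        if score not in ANCHORS:
            raise ValueError(f"{dim}: expected a rating of 1-5, got {score}")
        print(f"{dim:<18} {score} ({ANCHORS[score]})")

report({"Quality of work": 4, "Teamwork": 2, "Initiative": 3,
        "Leadership ability": 3, "Judgment": 4})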
Graphic rating scales are versatile, inexpensive, and quickly made. However, in order for the manager to make clear, accurate distinctions in worker performance across different dimensions, care must be taken to create specific and unambiguous anchor descriptions. For example, in Figure 4.2, Scale A uses only qualitative anchors and requires the rater to place a check mark at the point that represents the worker's current performance level. This is a poorly designed rating scale because the rating anchors are left undefined. Similarly, Scales B, C, and D include both verbal and numerical anchors, but ratings rely solely on manager judgment.

Figure 4.2: Examples of different graphic rating scales
[The figure shows four sample scales (A-D) for rating the Judgment dimension. Scale A pairs "Judgment: Makes clear, logical decisions" with undefined qualitative anchors (Outstanding, Excellent, Satisfactory, Unsatisfactory) and a comments space. Scale B lists numbered verbal anchors (1. Outstanding through 5. Unsatisfactory). Scale C rates "Makes sound decisions that affect his/her work" on a 1-5 scale from Unsatisfactory to Exceeds expectations. Scale D defines the dimension in detail (evaluates situations and people effectively, makes ethical decisions, identifies relevant and irrelevant issues, applies the correct standards and policies) and rates it on a 1-5 scale from Poor to Superior.]
Of course, this reliance on manager judgment can be problematic, because one manager might judge his or her employees more or less stringently than another. Standard measures allow for clear comparisons across workers, even if they have different managers. Some organizations ask managers to provide written examples that support their ratings of employees for each performance dimension and/or for the employee's overall performance level. By combining both their rating scores and written feedback, employees learn how their performance compares to the company's expectations and what
their current strengths and weaknesses are. This allows the company to set goals or devise training strategies accordingly, and it allows the employee to seek out self-improvement or educational resources.

Behaviorally Anchored Rating Scale
First proposed by Smith and Kendall in 1963, the behaviorally anchored rating scale (BARS) attempts to evaluate workers' performance of very specific behaviors critical for job success. These behaviors are established using the critical incidents job analysis technique discussed in Chapter 2.

Developing a BARS can be a long and difficult process. To begin, a group of supervisors familiar with the job both identifies the performance dimensions (quality of work, teamwork, initiative, etc.) that should be measured and observes critical incidents that exemplify both effective and ineffective job performance. Another group of subject matter experts transforms the list of critical incidents into behavioral statements that describe different levels of performance. A final group evaluates the behavioral statements and assigns a numerical value scale to each. See Figure 4.3 for an example.
Figure 4.3: Example of a BARS
[The figure shows a 7-point behaviorally anchored scale for the Responsibility dimension. "Ratee can be expected to…" anchors range from "Never meet deadlines" (1) and "Sometimes accept as little responsibility as possible and often miss deadlines" (2), through "Accept ownership of projects only when assigned" and "Regularly meet deadlines and accept responsibility for only his/her duties" at the middle levels, up to "Always meet deadlines and require only minimal supervision," "Completes work at a high standard and by the deadline assigned," and "Take on major projects beyond his or her duties with little or no supervision" (7).]
One positive feature of the BARS approach is that the behaviorally defined anchors are very explicit as to what performance criteria are being measured. This makes it much easier for managers to distinguish between high and low performers. Additionally, because the rating scale is standardized, managers can compare BARS performance ratings across individuals
and teams. Furthermore, workers perceive BARS to have high face validity, which reduces negative reactions to low ratings.

Despite the advantages, the significant time investment needed to develop BARS means that most organizations do not employ this technique. Additionally, this method's overall rating quality still depends on each manager's observational skills (or lack thereof). Finally, research shows that the BARS is no more valid or reliable than any other rating method, nor is it more successful at decreasing rater error (Landy & Farr, 1980).

Behavioral Observation Scale
The behavioral observation scale (BOS) is similar to BARS in that both use critical incidents to identify worker behaviors observed on the job. The biggest difference between BOS and BARS is the rating format. Instead of quality, BOS rates the frequency with which a worker is observed to perform a critical job behavior (see Figure 4.4 for an example). Frequency is typically measured on a 5-point rating scale, using numerical anchors (0%-20% of the time, 21%-40% of the time, etc.), verbal anchors (never, sometimes, always, etc.), or a combination. The ratings are aggregated across all behavioral statements to establish a total score. Some researchers have tried to determine which method—BARS or BOS—is superior, but the research has been mixed and inconclusive.

Figure 4.4: Example of a BOS for a bank teller
[The figure lists four teller behaviors, each rated on a 1-5 frequency scale from Never (1) through Sometimes (3) to Always (5): greets customers upon arrival; correctly counts money; accurately describes product characteristics to customers; quickly addresses customer problems.]
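A minimal sketch of how BOS frequency ratings are aggregated into a total score, using the behaviors from Figure 4.4; the ratings themselves are invented.

# Sketch: totaling a BOS. Behaviors follow Figure 4.4; the frequency
# ratings (1 = never, 3 = sometimes, 5 = always) are invented.

bos_ratings = {
    "Greets customers upon arrival": 4,
    "Correctly counts money": 5,
    "Accurately describes product characteristics to customers": 3,
    "Quickly addresses customer problems": 4,
}

total = sum(bos_ratings.values())  # aggregate across all statements
print(f"BOS total: {total} of {5 * len(bos_ratings)}")  # -> BOS total: 16 of 20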
4.3 Sources of Performance Appraisal
The goal of performance appraisal is to accurately measure employees' performance. To do so, raters must directly observe an employee's actions and behaviors. In many cases managers do not get to directly observe their employees. A police chief, for example, cannot accompany all of his or her officers as they perform their daily patrols, nor can a school principal sit in classrooms all day long. Who, then, should evaluate such workers? For many jobs, input from
a variety of sources helps create a more accurate, well-rounded performance appraisal—for a police officer, these sources might include his or her partner or members of the community; for a professor, they might include students or fellow faculty members. The following section describes the most common sources of input.

Supervisor Evaluation
Supervisors are the most common source of input for a performance appraisal, and rightly so. After all, managers are in the best position to evaluate employees' performance as it relates to the organization's objectives. Furthermore, because managers are responsible for recommending rewards and punishments, they must be able to tie their evaluations to employees' performance. Without this link between performance and rewards, employees can become less motivated, resulting in poorer performance. Indeed, research shows that supervisors' performance ratings are more strongly correlated with employees' actual performance than any other rating source (Becker & Klimoski, 1989).

Peer Evaluations
Peer evaluations, or those made by one worker about a coworker, are common in jobs that require employees to work as a team. Peer feedback can be especially insightful. Coworkers often understand the job in greater depth than managers and can analyze how team members' behaviors affect each other and contribute to the team's success. Similarly, peer ratings on the dimension of leadership effectiveness provide a valuable perspective into a worker's leadership skills and abilities.

How do employees respond to peer evaluations? Generally, reactions are mixed. In some situations, workers are appreciative because their peers are the only ones who ever directly observe their performance and are therefore the only ones who can accurately evaluate them. Furthermore, because peer ratings are nearly as accurate as supervisory ratings, they are excellent guides for self-improvement (Harris & Schaubroeck, 1988). On the other hand, workers may question the validity of a negative peer review, which can detrimentally affect their future performance. DeNisi, Randolph, and Blencoe (1983) found that workers who received negative peer-rating feedback went on to hold more negative perceptions of group performance, group cohesion, and overall satisfaction during the subsequent team task. Conversely, positive peer feedback did not significantly affect any of these variables on the next task. Peer ratings, therefore, should serve as a supplemental—not sole—source of a worker's performance evaluation.
Subordinate Evaluations
Subordinates are uniquely capable of assessing their manager's actions and effectiveness across a broad range of dimensions, including delegation, coaching, communication, leadership, and goal setting. The process of subordinate evaluation, also called upward feedback, involves allowing subordinates to evaluate their superior's performance.

Some research supports the notion that upward feedback can improve management's performance. In one study, subordinates rated their managers on a number of different dimensions (quality, fairness, support, and communication). Superiors who received low to moderate ratings showed significantly more rating improvements 6 months later than those who received high ratings (Smither et al., 1995). Further research shows that managers can improve their ratings even more by discussing upward feedback with their subordinates (Walker & Smither, 1999).
Confidentiality is critical to ensure that subordinate evaluations are accurate. Many employees fear there will be repercussions for giving managers negative feedback and will artificially inflate their evaluations if they know the manager will be able to identify them (Antonioni, 1994).

Self-Evaluation
Employees who complete self-evaluations, or evaluations of their own performance, feel as though they have a voice in the appraisal process, which in turn increases their acceptance of and decreases their potential defensiveness about the final ratings. Although typically used as supplemental evaluative data, self-ratings are especially useful with workers who work alone or independently.

Generally, self-ratings show more leniency, less variability, greater bias, and less agreement with those provided by supervisors, peers, or subordinates (Harris & Schaubroeck, 1988). These differences could stem from the worker's use of a different evaluative standard (Schrader & Steiner, 1996), but research has identified several ways to make self-ratings more accurate. First, workers can be told that their self-ratings will be compared against objective performance criteria. Second, the organization can make it clear that the self-evaluation will be used only for self-developmental purposes (Meyer, 1991). Third, educating both superiors and subordinates on the rating criteria and appraisal process leads to greater agreement between supervisor ratings and self-ratings (Williams & Levy, 1992).
360° Appraisals
The 360° appraisal is a multisource evaluative process; it utilizes performance input from many different viewpoints. In this process, a manager might, for example, receive feedback from his or her supervisor, peers, subordinates, and internal or external customers, as well as conduct a self-evaluation. Normally, each source uses a rating scale to evaluate the manager's current proficiency level for a predetermined set of job duties and/or leadership dimensions (coaching, delegating, communicating, etc.). After all ratings are complete, they are compiled in a report and shared with the manager.
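A minimal sketch of the compilation step follows, assuming each source group's ratings have already been averaged into one 1-5 score per dimension; the sources, dimensions, and numbers are invented. Reports often contrast self-ratings with the average of the other sources, which is what the gap figure shows here.

# Sketch: compiling a 360-degree report. Each source rates the same
# leadership dimensions on a 1-5 scale; all numbers are invented.
from statistics import mean

ratings = {
    "self":         {"Coaching": 4, "Delegating": 4, "Communicating": 5},
    "supervisor":   {"Coaching": 3, "Delegating": 4, "Communicating": 4},
    "peers":        {"Coaching": 3, "Delegating": 3, "Communicating": 4},
    "subordinates": {"Coaching": 2, "Delegating": 3, "Communicating": 3},
}

for dim in ("Coaching", "Delegating", "Communicating"):
    others = [src_ratings[dim] for src, src_ratings in ratings.items()
              if src != "self"]
    gap = ratings["self"][dim] - mean(others)
    print(f"{dim:<13} self={ratings['self'][dim]} "
          f"others_avg={mean(others):.1f} gap={gap:+.1f}")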
Over the past 30 years, 360° appraisals have become significantly more popular. In the mid-1980s, fewer than 10% of companies used this method to evaluate managers (Bernardin, 1986). Today, even though there is no exact percentage, it is likely that every Fortune 1000 company has some experience with conducting 360° appraisals. Interestingly, there is no consensus on how exactly to use these evaluations. Some believe it is appropriate to use them to make administrative decisions (Church & Bracken, 1997), but most disagree, suggesting they be used only for management development purposes (Antonioni, 1996; Coates, 1996). Some of this debate stems from concerns that employees may feel uncomfortable with rating their superiors and may fear the repercussions of providing an unfavorable evaluation. On the other hand, managers may feel that their peers or employees are unqualified to rate their performance because they may not have the full picture. However, despite differing opinions on how 360° appraisals should be used, research shows that consulting various sources provides unique and distinct perspectives about an employee's competencies, performance, and effectiveness (Semeijn, Van Der Heijden, & Van Der Lee, 2014).

I/O psychologists recommend a number of practices to increase the effectiveness of 360° appraisals. First, both raters and the manager should receive instructions on how to interpret the different performance dimensions and the rating scale. Second, all participants must be
explicitly told that feedback will only be used for the purpose of manager development. Furthermore, maintaining rater anonymity tends to prompt subordinates to view 360° appraisals more positively, although, interestingly, managers tend to prefer that employees be held accountable for their ratings (Antonioni, 1994). Finally, to ensure the highest quality, a 360° appraisal program must include skilled coaches to help managers interpret and use their feedback to create goals and courses of developmental action (Coates, 1996; Crystal, 1994). Absence of such coaching can compromise the desired behavioral changes and performance improvements and can lead to disengagement (Nowack & Mashihi, 2012).

Consider This: Seeking Feedback
1. Most of us often seek feedback on our performance in various life domains, whether at work, at school, at home, in sports, or at church. In which domains of your life do you normally seek or get feedback on your performance?
2. What are the advantages and disadvantages of each source of feedback?

Find Out for Yourself: Performance Measures and Sources
Review the following templates from the HR offices at University of California–Berkeley and University of California–Davis for examples of performance measures that utilize feedback from different sources as well as performance evaluations that use these measures.
UC Berkeley Annual Performance Planning and Review: http://hr.berkeley.edu/sites/default/files/attachments/FY16_Perf_Mgt_Form-Director_of_Finance_and_Strategic_Planning_RP.docx
UC Davis Employee Performance Appraisal Report: http://www.ucdmc.ucdavis.edu/hr/hrdepts/forms/MSP_EPAR.doc

4.4 Sources of Rating Error and Bias in Performance Evaluation
Performance appraisal relies on the assumption that human judgment is capable of some degree of accuracy. However, humans are not objective observers. We can never be completely certain that our judgments are free from error or personal biases. Often, of course, errors are unintentional. In the workplace, rating errors can occur if managers do not observe workers' performance or if they do not use the rating scale correctly. More insidiously, managers can also harbor unacknowledged biases against certain types of workers. At other times, rating errors are intentional. Managers can deliberately inflate a
    poorly performing employee’srat- ings because they don’t want to jeopardize their relationship with the employee or because they don’t want to cause negative reactions. I/O psychologists have worked hard to identify, understand, and correct sources of rating error. Types of Rating Error In order to improve accuracy, one must first identify and understand error. With performance appraisals, rater errors typically fall into three major categories: observational errors, distribu- tional errors, and rating-scale errors. In this section, we review the specific rating errors associ- ated with each category and discuss how these errors reduce performance appraisal accuracy. Observational Errors As stressed throughout this chapter, appraisals must be based on thorough observation of an employee’s performance in order to be accurate. Without direct observation, managers may rate employees based on unreliable sources of information such as general impressions, observations from past ratings, or hearsay. Even if managers are able to observe workers’ performance, they are often unable to remem- ber more than an employee’s most memorable performance accomplishments (or failures). An average appraisal cycle lasts up to 12 months, and as you might expect, a manager will remember and thus more strongly emphasize recent performance, an effect called recency error. Of course, when a recency error occurs, a worker’s ratings do not accurately represent
his or her overall performance throughout the entire appraisal cycle.

The best way to overcome recency error is simple: Reduce the amount of performance information the manager needs to remember. One practical way for managers to do this is to shorten the appraisal cycle by conducting more frequent appraisals throughout the year instead of once annually. Additionally, managers can improve recall by keeping a detailed performance log for each employee, especially at the beginning of the rating cycle; a sketch of such a log follows. Regular feedback is important for motivation and development. Thus, performance appraisal should not be viewed as a once-a-year event but as an ongoing process. The timing and frequency of performance feedback should be determined by employee needs, situations that necessitate such feedback, and logical milestones in the employee's performance goals.
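A minimal sketch of such a log, assuming each entry records a dated, observable behavior rather than a general impression; the LogEntry structure and the entries are invented.

# Sketch: a dated performance log a manager might keep to counter
# recency error. Names, dates, and entries are invented.
from dataclasses import dataclass
from datetime import date

@dataclass
class LogEntry:
    when: date
    behavior: str    # the observed behavior, not a general impression
    effective: bool  # was this an effective or ineffective instance?

log = [
    LogEntry(date(2024, 2, 6), "Resolved an escalated client complaint same day", True),
    LogEntry(date(2024, 7, 19), "Missed the quarterly report deadline", False),
    LogEntry(date(2024, 11, 30), "Mentored a new hire through first release", True),
]

# At appraisal time, review the whole cycle rather than recent months only.
for entry in sorted(log, key=lambda e: e.when):
    mark = "+" if entry.effective else "-"
    print(f"{entry.when}  [{mark}] {entry.behavior}")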
Distributional Errors
Within an organization, evaluation standards tend to differ from manager to manager. Significant error occurs if managers inaccurately distribute rating scores along the rating scale. For example, some managers may clump everyone together with average scores. Others may be overly positive or afraid to give anyone negative scores. A third group of managers may be too stringent and thus give most of their employees low scores. These faulty judgments are called distributional errors.

The first, and one of the most common forms of rating error in general, is leniency error. In this situation, managers have a low performance standard and rate their employees higher than their performance deserves. A graphic representation of a lenient manager's rating scores will show that they tend to cluster on the positive end of the distribution. Even though the high scores could be a reflection of a truly high-performing team, research tells us that such a result is unlikely. Normal worker performance distribution tends to follow a bell-curve pattern, with some workers falling at the high and low ends but most clustering somewhere in the middle. Leniency error is very obvious when it occurs. For example, one study found that out of 12,000 federal employees, 85% received ratings at the superior level on their performance appraisals, whereas only 1% received ratings below the fully successful level (Marrelli & Tsugawa, 2009). It is extremely unlikely that this distribution is accurate.

Research shows that rater personality characteristics such as
agreeableness and conscientiousness could be linked to leniency error. In an experimental lab study, people with low conscientiousness and high agreeableness rated their peers more positively, regardless of actual performance (Bernardin, Cooke, & Villanova, 2000). As previously discussed, a manager's reluctance to give negative feedback is another common impetus for rating leniency. Most managers want to develop and maintain positive relationships with their subordinates, and offering positive feedback during a performance discussion is certainly more enjoyable than the alternative. Unfortunately, being lenient not only fails to challenge and improve workers' future performance, it also makes it legally difficult to terminate a poor performer.

The second form of distributional error is central tendency. Central tendency error occurs when a manager is reluctant to rate employees as either superior or inferior. As a result, rating scores cluster around the middle of the performance scale. Third is severity error, in which a manager holds excessively high standards and rates employee performance as lower than it actually is. A severe manager's rating scores will cluster around the low end of the performance scale. Although these two distributional errors are less common than leniency error, they are just as problematic. Because they fail to address the real differences among employees' performance, scores tainted by severity and central tendency error are worthless for making employee decisions.
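A minimal sketch of how these three patterns might be screened for, using only the mean and standard deviation of a manager's ratings; the screen helper and its cutoffs are invented illustrations, not psychometric standards.

# Sketch: screening one manager's ratings (1-5 scale) for distributional
# errors. The cutoffs are arbitrary illustrations, not accepted standards.
from statistics import mean, pstdev

def screen(ratings, high=4.0, low=2.0, min_spread=0.6):
    m, s = mean(ratings), pstdev(ratings)
    flags = []
    if m >= high:
        flags.append("possible leniency (scores cluster near the top)")
    if m <= low:
        flags.append("possible severity (scores cluster near the bottom)")
    if s < min_spread:
        flags.append("possible central tendency (little variability)")
    return m, s, flags or ["no distributional flags"]

managers = {"A": [5, 5, 4, 5, 4, 5], "B": [3, 3, 3, 3, 3, 3]}  # invented
for name, scores in managers.items():
    m, s, flags = screen(scores)
    print(f"Manager {name}: mean={m:.2f}, sd={s:.2f} -> {'; '.join(flags)}")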
Find Out for Yourself: Distributional Errors and Biases
The next time you participate in a group activity or team project, rate each of the group members (excluding yourself) on his or her performance and contribution to the project on a scale of 1–10. Then ask each team member to evaluate each of the other members (excluding himself or herself).
What Did You Learn?
1. Are your evaluations consistently more lenient than, more stringent than, or comparable to your team members' evaluations?

Rating-Scale Error
Sometimes performance appraisal errors occur because the rater does not know how to use the rating scale correctly. In other cases, a manager's general opinion about a specific employee can color his or her ratings of all performance dimensions for that employee (Lance, LaPointe, & Stewart, 1994). This tendency, called the halo effect, is the most common form of
rating-scale error and can either artificially inflate or deflate ratings. For example, if a manager believes one of his or her subordinates is extremely smart, he or she might transfer that positive opinion to evaluations of other performance areas, such as collaboration, ethics, and loyalty. Basically, then, managers who rate workers high (or low) on one significant dimension will go on to score them high (or low) on all other dimensions on the appraisal, especially if the other dimensions are not well defined or directly observed.

One way to counteract the halo effect is for managers to rate all employees on the same dimension before moving on to the next one. This helps managers keep employee performance in perspective. Another option is to use more than one source to rate employees.

Although most researchers believe that the halo effect is present in almost all ratings and settings, some studies suggest that it is less prevalent—and less of a concern—than previously thought. In a review of past research, Murphy, Jako, and Anhalt (1993) concluded that, in the studies they examined, halo was not nearly as common as traditionally believed and, even when it did occur, did not negatively affect rating accuracy. Surprisingly, when organizations consciously try to control halo, they end up with less accurate ratings (Murphy & Reynolds, 1988). Organizations must therefore not become overzealous in their attempts to eliminate halo. Indeed, some employees really are very strong (or very weak) across all performance dimensions, and their consistent ratings reflect an accurate evaluation of their performance.
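A minimal sketch of one such diagnostic, averaging the correlations between rating dimensions across a manager's employees; the data are invented, and as the text cautions, a high value is only a prompt to look closer, not proof of halo.

# Sketch: a rough halo check. Near-perfect correlations between dimensions
# across one manager's ratings may signal halo, although, as noted above,
# they can also reflect genuinely consistent performers. Data are invented.
from itertools import combinations
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Each list holds five employees' ratings from the same manager.
ratings = {"quality":    [5, 2, 4, 3, 5],
           "teamwork":   [5, 2, 4, 3, 4],
           "initiative": [4, 2, 5, 3, 5]}

pair_rs = [pearson_r(ratings[a], ratings[b])
           for a, b in combinations(ratings, 2)]
print(f"mean inter-dimension r = {mean(pair_rs):.2f}")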
Rater Biases
Ratings can also be influenced by a worker's personal relationship with the evaluator. Similar-to-me error, for example, occurs when evaluators give higher ratings to workers whom they perceive to be like them (Wexley, Alexander, Greenawalt, & Couch, 1980). A study of 104 air force officers showed that familiarity between the officers and the aviators they debriefed, if it existed, positively affected the aviators' ratings of the officers (Scotter, Moustafa, Burnett, & Michael, 2007). Other research has examined personal characteristics such as attractiveness and demographic characteristics, each of which influences performance ratings.

Another source of rater bias is implicit person theory. People tend to hold implicit theories about the extent to which a person can change. Those who adopt an incremental implicit theory believe in people's malleability and ability to change, while those who adopt an entity implicit theory believe that people's characteristics are inherently fixed and difficult to change. Research shows that managers who adopt an incremental theory are more likely to observe changes in employee behaviors from one performance appraisal to the next and are less likely to be
bogged down by prior impressions in judging current performance than their entity theory counterparts. Thus, these managers are more likely to provide more accurate and objective performance evaluations. Managers can be trained to use an incremental theory when evaluating their employees and thus become more effective performance evaluators (Heslin, Latham, & VandeWalle, 2005). In other words, when managers believe in their employees' ability to change and develop, they are more likely to observe changes in their behaviors over time and to evaluate them more accurately and objectively. Thus, effective performance management systems should emphasize supervisor training in order for raters to have an appropriate mind-set for yielding accurate and beneficial results.

Three demographic characteristics require particular attention:
race, gender, and age. As you recall from Chapter 2, any tool organizations use to make decisions about employees (hiring, promotion, placement, compensation, termination, etc.) must not discriminate against protected classes. In most organizations, performance appraisals provide important data used to make those decisions. You can easily understand, then, that any personal biases based on race, gender, and age are especially problematic for an organization if they significantly influence performance appraisal ratings.

Research on race, gender, and age biases has produced mixed results. In the category of race, research shows that overall, Black employees receive slightly lower ratings than White employees (McKay & McDaniel, 2006). However, because closer examination shows that all managers tend to give higher ratings to individuals of their own race, the overall results seem to be due to a combination of similar-to-me error and the higher proportion of Whites in managerial roles.

Likewise, gender bias occurs in some situations, but it tends to be limited to situations of gender role incongruence. Specifically, male employees' performance tends to be rated higher than female employees' performance in traditionally masculine roles but lower in traditionally feminine roles, and vice versa for women (Pulakos, White, Oppler, & Borman, 1989). A more recent study of 448 upper level managers examined the equity of promotions for men and women and found that women needed to show significantly
higher performance appraisal ratings than men in order to be considered for promotion (Lyness & Heilman, 2006). Finally, research shows that managers may favor younger workers. In general, older workers receive lower ratings than their younger coworkers, and the bias increases when older employees work for younger managers (Shore, Cleveland, & Goldberg, 2003).

Improving Rater Accuracy
Clearly, performance appraisals are susceptible to numerous types of rating errors, which affect the appraisals' usefulness in employee decision making and self-improvement. To thwart the negative effects of error, I/O psychologists have identified several ways to increase rating accuracy. Let's take a moment to review some of these techniques.

Rating-Error Training
One way to deal with rating errors is to train raters not to make them. Rating-error training involves teaching raters about the different types of rating errors along with practical strategies for avoiding them. When dealing with leniency error, for example, raters will first learn
about the type of error (in this case, inflated ratings with little variability), then its possible causes (e.g., raters wish to maintain a positive relationship with the employee), and finally strategies to overcome it (e.g., rank order employees before rating each one). This process assumes that educating raters about potential errors and possible solutions can help them overcome their biases. Unfortunately, the perceptual biases that lead to rating errors (such as stereotypes, the need to please, and high expectations) are ingrained in our cognitive processes and are difficult to change. Rating-error training thus requires significant time and effort, with sessions typically lasting from 6 to 8 hours (Latham, Wexley, & Pursell, 1975). Yet even with extensive training, improved rating accuracy is short lived (Fay & Latham, 1982).

Rating-Accuracy Training
Like rating-error training, frame-of-reference (FOR) training aims to help raters improve their accuracy. With this method, raters learn about the type of employee performance their organization wishes to see. They are given a frame of reference, or model, against which to compare their employees. If raters know the organization's performance standard for a specific job position, they will be less likely to use their own idiosyncratic standards to judge a worker
in that position. Designing FOR training involves a number of important steps. First, the raters receive a clear and concrete definition of the various performance dimensions and the rating scale included on the appraisal they will be using. Next, the trainer discusses work behavior that illustrates appropriate performance at each rating-scale level. After the raters have a basic understanding of the different performance expectations, they practice using the rating scale, usually by watching video scenarios of people working at various levels of performance and then attempting to appropriately rate them. Finally, the trainer goes over each scenario, discussing at which performance level each sample employee should have been rated.

Research has found that FOR training does indeed lead to more accurate performance appraisal ratings (Woehr & Huffcutt, 1994). As an added bonus, research by Davis and Mount (1984) found that managers who received FOR training became more effective at creating development plans for their direct reports because they more clearly understood what the company expected of its employees.

Rater Accountability
Raters need to be motivated to make accurate ratings, but organizations often do not provide much incentive to do so. If your boss never underwent formal performance appraisal training, or if he or she was never really held accountable for the quality of his or her performance ratings, how
motivated would he or she be to take performance appraisal seriously? In turn, how likely will you be to take it seriously when it is time for you to evaluate your own staff? Unfortunately, the poor behavior modeled by some executives often provokes similarly cavalier attitudes among the company's entire management team. If, however, managers are held accountable, they will take the appraisal process seriously, and their ratings will be more accurate (Harris, Ipsas, & Schmidt, 2008; Mero & Motowidlo, 1995). One way to increase accountability is for organizations to hold calibration meetings, at which managers discuss the performance and ratings of workers in similar jobs. As you might surmise,
because they force managers to justify their ratings to their peers, these meetings increase rating consistency across employees (McIntyre, Smith, & Hassett, 1984).

4.5 Performance Management Systems
So far in this chapter, we have focused on the need for and strategies used to improve the accuracy of performance appraisals. However, even the most accurate appraisal can fuel negative manager and employee reactions if it does not include quality feedback. This knowledge has prompted some I/O psychologists to suggest that worker self-improvement, rather than rating accuracy, should be the focus of a performance appraisal (Ilgen, 1993). In order to build a successful self-improvement plan, an employee needs to have the following information: (a) clear performance expectations, (b) an understanding of his or her own current performance, and (c) suggestions from his or her manager on how to improve.

Performance appraisal, then, is really part of a larger performance management system, which includes not only the appraisal but also the setting of performance expectations and continuous feedback. Recent research has presented serious criticism of conducting performance appraisals without integrating them into an effective performance management system. Common examples include annual performance ratings that are not connected to daily operations, retrofitted ratings to justify pay increases that are not necessarily related to performance, and appraisals that result in limited or no pay differentials between high and low perform-
ers (Pulakos, Mueller-Hanson, Arad, & Moye, 2015). In fact, Pulakos et al. (2015) presented an interesting and unique case study of Cargill, a very large organization in the food-service industry that has 149,000 employees in 70 countries. Cargill has abandoned performance ratings altogether, with minimal impact on the organization's operations and effectiveness. When performance appraisals are integrated into a performance management system, however, they offer important benefits that should not be overlooked. Thus, it is critical that organizations do not consider performance ratings in isolation, but in terms of their links to strategic decision making (Adler et al., 2016; Gorman, Cunningham, Bergman, & Meriac, 2016; Smither, 2015).

Of course, if an employee truly wishes to improve his or her performance, the employee must be willing to listen to and act upon constructive performance feedback, something that will not happen unless the manager and employee are able to establish a strong, high-quality relationship with a significant amount of trust. In this section, we will review the three major components of the performance management system: building relationships and trust, providing continuous feedback, and setting expectations. We will also discuss the effects of cross-cultural similarities and differences on performance management systems and practices.
Building Relationships and Trust
Think about the last time you worked with a peer or for a manager you did not trust. How did you feel? Chances are, at one time or another, you experienced frustration, confusion, apathy, or even anger toward that person. Now imagine that the person took some time to offer you constructive performance feedback. How would you respond? Most of us would question this untrustworthy person's intentions, withdraw from the situation, or even become hostile, ignoring the feedback or interpreting it in a negative way. Furthermore, most people would stay quiet and not question the feedback if they feared they might be punished for disagreeing. Actual negative feedback is even more detrimental, especially when little trust exists between the manager and the employee. Employees can become unmotivated and lose confidence (Kluger & DeNisi, 1996). When trust exists, however, workers are more likely to accept criticism openly, believing their manager has their best interests in mind.

As you can see, a positive manager–employee relationship is necessary to have a positive performance appraisal experience, and trust is a prerequisite for executing an effective perfor-
mance management system (Peterson & Hicks, 1996). In fact, research shows that the quality of the relationship between manager and employee has a stronger influence on reactions to performance appraisals than the favorability of the ratings or whether the employee participates in the appraisal process (Pichler, 2012). Relationship quality and perceived organizational justice are key factors in employees' satisfaction and buy-in (Dusterhoff, Cunningham, & MacGregor, 2014). Therefore, it is important for organizations to train managers how to build relationships and trust with their staff by keeping their commitments, displaying integrity, providing timely feedback, and showing an interest in their subordinates.

Providing Continuous Feedback
During a typical annual postappraisal meeting, a manager will hold a one-sided conversation with an employee, reviewing the employee's successes and failures from the most recent performance review cycle. As you may have determined from the opening exercises of this chapter, the meeting can quickly become tense, even hostile, if the feedback includes criticism. Employees tend to instinctively deflect criticism, blaming it on forces outside their control. Just think for a moment, though: If this meeting were a once-a-year occurrence, how desirable or useful would criticism be if it referred to something that happened 10 months ago? Could the employee even do anything now to fix the situation? Probably not. How might an employee react differently if, instead of annually, she or he received informal feedback on a more regular basis?
Both scientific researchers and in-the-field practitioners advocate the habit of providing continuous informal feedback to employees (Corporate Executive Board, 2011; Gregory, Levy, & Jeffers, 2008). More specifically, continuous feedback should be given in addition to, not at the exclusion of, the annual performance appraisal, and it must be provided immediately after any instance of effective or ineffective employee performance. Once-a-year feedback has never been very effective and is especially unhelpful for today's users of instant-communication systems such as Facebook and Twitter. Increasing its frequency makes feedback not only less threatening but also more helpful, because employees can actually use the information to solve existing problems or adjust performance to meet current
demands.

In addition to increasing frequency, I/O psychologists have identified a number of other conditions under which employees will be more likely to adopt managers' feedback suggestions:

1. Feedback must address specific employee behaviors or actions, not personal characteristics. It is best to use facts, data, statistics, or direct observations to support positive or negative feedback.
2. Feedback should involve two-way communication between managers and employees. If employees feel they can express their views, they are more likely to be satisfied with the feedback process (Dipboye & de Pontbriand, 1981).
3. Managers should provide constructive feedback only on behaviors the employee can control.
4. If feedback is negative, it should be constructive. Furthermore, managers should provide support for the employees' self-improvement.

Setting Expectations
Most of us truly wish to do our best at our jobs, but it is hard to do so if we do not understand what is expected of us. Managers can facilitate effective goal-setting sessions by following these guidelines (Latham, 2009b):

1. Encourage employee participation. Managers should not
simply assign performance goals. Rather, they should collaborate with employees to create mutually agreed-upon expectations. Participating in the goal-setting process increases an employee's goal-aspiration motivation (his or her desire to meet the goal) and leads to the creation of more challenging goals.
2. Set clear, realistic goals. Goals that are specific and challenging yet achievable more successfully motivate workers to perform their best.
3. Determine specific achievement deadlines. Workers perceive open-ended goals as less urgent than those with deadlines and are therefore less likely to achieve them.
4. Link performance to rewards. Employees tend to follow the adage "What gets rewarded gets done." Employees should always understand how achieving (or not achieving) goals will impact them and how accomplishments will be rewarded.
5. Facilitate priority setting. Not all goals hold the same importance. Managers should help employees prioritize and order goals according to their importance.
Cross-Cultural Perspectives
Cross-cultural differences have substantial effects on business practices. Many organizational practices do not transfer readily across cultures and thus require adaptation (Hofstede, 1980; House et al., 1999; House et al., 2004). Performance management is particularly prone to such differences and necessary adaptations. The United States is largely an individualistic culture, which leads to an emphasis on individual success and accomplishment. By contrast, collectivistic cultures, often found in Eastern and Middle Eastern countries, place a high value on saving face and maintaining harmony. Thus, negative feedback can be perceived as too confrontational and may have far more damaging effects on relationships. Even positive feedback in the form of public recognition can be uncomfortable for high-performing employees in collectivistic cultures; they may prefer to be recognized privately in order to avoid alienating their colleagues (Sumelius, Björkman, Ehrnrooth, Mäkelä, & Smale, 2014). That is why in collectivistic cultures, it is usually recommended that recognition occur in a private, one-on-one setting.

Another distinct difference can be seen in how cultures view power distance. In the United States there is a strong emphasis on equality and fairness. It is
acceptable for employees to express their opinions freely, even when they disagree with their managers. Power and status are earned based on achievements. However, in some cultures, power and status differences are readily accepted and rarely challenged; examples include most Asian and South American countries. In such cultures, employees will likely conform to the process and accept its results, but they will be less likely to actively participate in setting their ratings or to disagree with their managers. Similarly, in high power distance cultures, ratings by peers or subordinates are unlikely to be authentic and may not be accepted as legitimate or taken seriously because they are not assigned by a superior (Sumelius et al., 2014). In such cultures, traditional top-down performance evaluations are usually more effective.

Because in many countries pay, promotions, and status continue to be based largely on seniority, performance appraisals may be perceived as an imposed Western practice, and thus not taken seriously or adopted wholeheartedly. This can at best make the process a waste of time, and at worst a counterproductive practice that can damage relationships and cause divisiveness. This notion is particularly relevant to multinational corporations when they attempt to implement uniform practices across their global operations. It is thus necessary and valuable to adapt various features of performance management systems to cross-cultural differences (Sumelius et al., 2014).
Finally, it is important to note that cultural diversity can exist even within one country or geographic location, and cultures can differ substantially from one area of a country to another. For example, in the United States cultural differences exist between the West Coast, East Coast, Midwest, and South, as well as among rural, urban, and suburban locations. Organizations often find it necessary to adapt business practices across operations, even within the same country. Managers should also understand these differences when dealing with employees from different backgrounds, whether these involve local, regional, national, or international differences.

Consider This: Delivering Compliments Across Cultures
Even simple gestures can mean different things across cultures. Consider how you would interpret the gestures in Figure 4.5. In some cultures, these gestures would be considered positive and complimentary. In others, they could be impolite or even offensive.
Managers should be sensitive to their employees' cultural backgrounds. They should not assume that anything they say or do in their own culture would be readily accepted in other cultures. Similarly, organizations should keep cross-cultural differences front and center when making strategic decisions.

Figure 4.5: Cross-cultural interpretations of the same gestures
[The figure compares three common hand gestures. One is read as "good job" or "okay" in the U.K. and U.S., "money" in Japan, "zero" in Russia and France, and an insult in Brazil. Another signals "good job" or approval in the U.S., is an insult in Australia, Greece, and the Middle East, and means 1 in Germany and Hungary but 5 in Japan. A third means "peace," "victory," or 2 in the U.S. (palm facing inward or outward) but is an insult in Australia, the U.K., and South Africa when the palm faces inward.]
4.6 The R in ROI: Linking Performance Evaluations to Financial Results—Friend or Foe?
Performance measurement is extremely important, for the numerous reasons discussed earlier. However, performance appraisals can also become a laborious exercise that I/O psychologists or HR departments push on the rest of the organization to serve other, less important goals. For example, managing the performance appraisal system can become a goal in itself for those whose job is to ensure that the system is functional and well maintained; it justifies the jobs they hold and the salaries they are paid. When that is the case, the managers who perform the evaluations start to view the performance appraisal system as a formality and do not take it very seriously. They may fill in the necessary forms, but the accuracy of their ratings will likely be questionable.
Especially in a tight economy, when
overall payroll allocations are frozen, relatively stable, or even decreased, linking performance appraisals solely to annual raises is unlikely to be conducive to performance outcomes. For example, when the best performers get a raise of only 4% and the worst performers get a raise of 2%, you can see why the difference is unlikely to translate into any constructive performance feedback for either employee. In cases where there is a limited pool of resources to distribute, high performers often feel guilty about getting a raise at the expense of their colleagues, rather than feeling appreciated or rewarded. Conducting performance appraisals only to determine annual raises also ignores their purpose of providing continuous constructive performance feedback.
As discussed earlier, employees need to receive feedback on their performance more often, and this feedback should help them better align their performance with the organization’s goals. When appraisals are always linked to pay, appraisal sessions become mostly about money rather than about performance improvement.

Performance appraisals are also often used in conjunction with layoff decisions. Even though this is a legitimate use, if appraisals are used only to justify these decisions, they become perceived as a way for the organization to create a legally defensible paper trail. This diminishes the appraisals’ value, and the process becomes resented and distrusted by managers and employees alike. Again, although the above uses of performance appraisals are important and legitimate, their primary use should be as a tool with which to objectively measure performance and facilitate its improvement.

So how do organizations, managers, I/O psychologists, and HR departments design truly effective performance appraisal systems? It is critical to choose the correct measures. As discussed earlier, the correct measures should be readily linked to the organization’s goals and its success. This chapter offers numerous ways to enhance the quality of performance measures. However, in the words of sociologist William Bruce Cameron (1963) in his book Informal Sociology: A Casual Introduction to Sociological Thinking: “Not everything that can be counted counts, and not everything that counts can be counted” (p. 13).
The Pareto principle, also known as the 80–20 rule, posits that in most situations, 80% of outcomes are caused by 20% of the inputs. In performance appraisal, this means that it is most effective for managers to focus on their employees’ most critical behaviors, which constitute about 20% of everything the employees do on a daily basis, because these critical behaviors cause 80% of the outcomes that truly affect the organization’s success and effectiveness.

Although not the only approach, performance measures that are directly linked to financial results tend to be perceived as objective and fair because they reflect an employee’s true value to an organization. They are also critical for making resource allocation decisions, because they put HR decisions on a par with other investments. For example, the financial value of an employee’s performance can justify the costs of hiring or retaining that employee over outsourcing the position or investing in a piece of equipment that would allow the job to be automated.

Should an appraisal system, then, attempt to capture the financial value of every aspect of an employee’s performance? Absolutely not! The concept of opportunity cost, introduced in earlier chapters, implies that appraisals should capture only those performance dimensions where the benefits of measurement exceed the costs. For example, even though it is easy to quantify stationery consumption, the cost of policing employees to be less wasteful with inexpensive stationery items can be higher than the cost savings that may accrue from these initiatives (such as saving paper).
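To make the 80–20 logic concrete, the following minimal sketch shows one way an analyst might pick the smallest set of performance dimensions that accounts for roughly 80% of estimated impact. It is written in Python, and the dimension names and impact shares are entirely hypothetical, chosen only to illustrate the selection logic.

# Hypothetical estimates of each dimension's share of organizational
# impact (e.g., from job analysis or expert judgment).
impact_estimates = {
    "customer problem resolution": 0.35,
    "sales conversions": 0.30,
    "cross-team collaboration": 0.15,
    "report formatting": 0.08,
    "meeting punctuality": 0.07,
    "stationery usage": 0.05,
}

def pareto_subset(estimates, threshold=0.80):
    """Return the smallest set of dimensions whose cumulative share
    of total impact reaches the threshold (80% by default)."""
    total = sum(estimates.values())
    selected, cumulative = [], 0.0
    # Walk dimensions from largest to smallest estimated impact.
    for name, share in sorted(estimates.items(), key=lambda kv: -kv[1]):
        selected.append(name)
        cumulative += share / total
        if cumulative >= threshold:
            break
    return selected

print(pareto_subset(impact_estimates))
# ['customer problem resolution', 'sales conversions', 'cross-team collaboration']

Under these made-up numbers, only three of the six dimensions would be measured closely; low-stakes dimensions such as stationery usage fall below the cutoff, which is consistent with the opportunity-cost argument above.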
Similarly, many organizations install expensive equipment or have their managers spend numerous hours managing their employees’ attendance, even though the costs of monitoring far exceed the costs of the few minutes that an employee may come in late or leave early.

What, then, should an appraisal system track? The Pareto principle implies that it should focus on performance dimensions that are of high enough financial value to justify the cost of measurement. The key is not always the absolute cost or benefit of a performance dimension but rather the variation in that cost or benefit. To use our previous example, the difference between the most and least conservative use of stationery is not substantial. In measuring the performance dimensions that would have the most substantial effects on financial outcomes, emphasis should be on the dimensions with the highest variability and the ones that are “pivotal” to performance (Cascio & Boudreau, 2011).
For example, at an upscale restaurant, the most pivotal performance dimension for cooks is their cooking skills. On the other hand, the most pivotal skills of the wait staff are their social skills. Although cooks need some social skills to deal effectively with the wait staff and restaurant management, social skills are not pivotal for cooks. They can still be subjectively evaluated (e.g., using a narrative or a rating scale), but attempting to place a financial value on those skills is both impractical and unnecessary.
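The “highest variability” criterion can also be illustrated with a small sketch. The ratings below are invented for the restaurant example, but they show how a simple spread statistic separates a pivotal dimension (where employees differ widely) from a non-pivotal one.

from statistics import pstdev

# Hypothetical 1-10 ratings for five cooks on two dimensions.
ratings = {
    "cooking skill": [9, 4, 8, 3, 7],
    "social skill":  [6, 6, 7, 6, 6],
}

for dimension, scores in ratings.items():
    # Population standard deviation as a rough measure of spread.
    print(f"{dimension}: spread = {pstdev(scores):.2f}")

# cooking skill: spread = 2.32
# social skill: spread = 0.40

Because cooking skill varies far more across cooks than social skill does, it is the dimension where careful (and possibly financially linked) measurement pays off, while social skill can be left to a quick subjective rating.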
Consider This: Appraising Performance Dimensions That Really Matter

Choose a job that would be of interest to you. It can be your current job, a job you held in the past, a job you hope to have in the future, or just a job that you have come across in a job-opening announcement or advertisement. Describe the job in detail. For more information, you may search for job descriptions of similar jobs.

Questions to Consider
1. How would you measure the performance of the incumbent of this job? What are the most important dimensions to measure, according to the Pareto principle?
2. What are the performance dimensions that will likely exhibit the most variability (the most pivotal dimensions)?
3. Which dimensions will be most readily linked to financial results? Explain.
4. Which dimensions should be subjectively evaluated?
5. Which dimensions should be ignored and not evaluated? Note that this is an important decision because it can significantly affect the efficiency and effectiveness of a performance appraisal system. It is also a decision that is often neglected or inadequately addressed.

4.7 Toward More Positive Appraisal Sessions

As humans, we have a tendency to overemphasize and amplify the magnitude of negativity in our lives (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001). Negative stimuli tend to receive more of our attention and energy. For example, threatening personal relationships have been shown to receive more of our thought time than supportive ones, and blocked goals tend to receive more thought time than those with open and available options (Klinger, Barta, & Maxeiner, 1980). Performance appraisal is no exception. It is much easier to dwell on our
    own or others’faults than to acknowledge talents, strengths, and positive performance attributes. Doing the latter requires intention. So why do humans in general tend to focus on neg- ativity? The tendency to overemphasize negativity has been attributed to primitive survival mecha- nisms in reaction to perceived physical danger. In civilized societies, overemphasis on negativity has been attributed to four psychological factors that are comparable to these survival mechanisms: intensity, urgency, novelty, and singularity (Cam- eron, 2008). The first factor is the intensity of negative stimuli. Because negative events are perceived as threatening, they are experienced more intensely. Second is the sense of urgency that negative stimuli place on our perceptions and action tendencies, because something is wrong and needs to be fixed. Positive stimuli do not pose the same sense of urgency, because ignoring positive stimuli does not pose as much risk as ignoring negative stimuli. Third is the perceived novelty of negative events. Believe it or not, a lot of what is going on in most peo- ple’s lives is positive. That’s why it tends to go unnoticed. Negativity is the exception. That’s why it gets more attention. Fourth, one of the unique characteristics of negativity is what is referred to as singularity. Imagine a system with one defective component, a body with one ailing organ, a team with one counterproductive employee, or a family with one dysfunctional member. A single negative component is capable of tainting the performance of the
collective, which causes that single negative component to stand out and alert the rest to the need to somehow remedy the problem. On the other hand, positivity tends to be more general and global. One positive component alone does not necessarily make a system better. One good employee alone usually cannot make an organization successful. One healthy organ alone cannot make the whole body healthy. This singularity makes the effect of negativity more pronounced and far-reaching.

Paradoxically, humans also have a natural tendency, referred to as the heliotropic tendency, to gravitate toward what is pleasurable (i.e., positive) and away from painful or uncomfortable stimuli. However, this tendency tends to be overwhelmed by the intensity, urgency, novelty, and singularity of negativity and needs to be encouraged through intentional decisions and actions. That is why, although most managers recognize their tendency to overemphasize their employees’ weaknesses, faults, and mistakes and wish they could be more positive, they often cannot. For example, they may get overwhelmed by the urgent need to address the dysfunctional behaviors of their worst employees and end up with no time to interact with their better employees and praise them for their consistently positive behaviors. Moreover, those consistent positive behaviors may no longer stand out; they may be taken for granted. A manager may even forget to recognize them when appraising these employees’ performance.

So how can managers overcome their negative tendencies and
conduct more positive performance appraisal sessions? First, a manager needs to recognize the importance of positivity, which was introduced in Chapter 3. Although extreme positivity is unnecessary and can even be dysfunctional, research supports the idea that humans thrive and flourish in a positive environment (Keyes, 2002). So managers need to intentionally create positive interactions with their employees, especially when they need to counterbalance a negative interaction, such as when it is necessary to give negative feedback. In performance appraisal sessions, managers should put in the effort to find and comment on positive aspects of their employees’ performance. This requires the art of catching employees doing something right instead of the common practice of focusing on problems and mistakes. You might think that this hand-holding is more necessary for new or inexperienced employees and that more mature employees or more established relationships can tolerate less
positivity. However, research shows that even more positivity is needed in more complex settings such as top management teams and marital relationships (Fredrickson, 2013; Gottman, 1994).

Consider This: Positive Performance Appraisal Sessions

In order to conduct more positive performance appraisal sessions, managers need to be more positive when collecting and sharing performance information. What follows are two examples of positively oriented practices that can be used to replace the negative practices often used by managers.

Example 1
Negative: Reprimand workers when late.
Positive: Praise and reward workers who are consistently on time.
Rationale: Workers who are on time will know that their positive behavior is noticed and appreciated instead of ignored. Late workers will start coming on time to get the manager’s attention and receive rewards.

Example 2
Negative: Criticize an employee for weaknesses (e.g., having poor people or leadership skills).
Positive: Find and acknowledge the employee’s strengths that parallel those weaknesses (e.g., being an independent thinker or being willing and able to follow
directions). Suggest changes in role to better fit the employee’s strengths and/or training opportunities to develop lacking skills.
Rationale: Weaknesses may be based on stable personality traits that cannot be readily changed (e.g., introversion); in these cases, role changes are more likely to lead to improvements in performance. In other cases, training and development are more likely to be perceived as opportunities, whereas criticism is more likely to be perceived as a threat.

It is important to note that positive feedback and recognition have been found to influence performance positively almost as much as financial rewards do (Luthans & Stajkovic, 1999; Stajkovic & Luthans, 2003). Since positive feedback and recognition cost managers and organizations hardly anything, it may seem surprising that they are not used more often or more effectively. Furthermore, millennials rate development opportunities as more important than pay (McCarthy, 2015). When feedback is framed positively and offered as a way to develop, it can have important motivational effects that facilitate the attraction and retention of employees.
Summary and Conclusion

Organizations that excel in their ability to measure their employees’ performance are at a significant competitive advantage. They place themselves in a favorable position to select the best employees, capitalize on their talents and skills, promote them into their areas of excellence, adequately reward them, and retain them long enough to reap significant returns on their investment in them. They also have an edge in accurately and promptly identifying performance deficiencies and pursuing corrective measures. Accurate performance appraisals can be a source of information for the organization, a communication vehicle for managers, and a much needed feedback process for employees.

However, in many instances, performance appraisal systems can be plagued with so much subjectivity that they not only defeat their purpose but become counterproductive. They can be perceived as inequitable, which can damage morale and performance. They can also be discriminatory, which may result in legal costs and damage to the organization’s reputation. Thus, a poorly designed performance appraisal system may be worse than not having one at all!

Organizations should view the process of designing and maintaining a well-functioning
performance appraisal system as a worthwhile investment, rather than just an expense, a formality, or a necessary evil. Accurate measures should be designed to assess the most pivotal dimensions of performance. They should be integrated with other HR systems, such as recruitment, selection, training, and compensation, as well as with overall organizational goals and strategies, in order to ensure full utilization and impact on the organization’s bottom line.

Key Terms

behaviorally anchored rating scale (BARS) A method for rating worker performance in which performance dimensions and critical incidents that exemplify both effective and ineffective job performance are identified, transformed into behavioral statements that describe different levels of performance, assigned numerical values, and used as anchors in a typical graphic rating scale.

behavioral observation scale (BOS) A method for rating worker performance that is similar to BARS, except that instead of performance quality, it rates the frequency with which a worker is observed to perform a critical job behavior.

forced distribution A subjective performance measure in which managers place employees into performance categories based on preestablished proportions.

graphic rating scale The most commonly
used method for rating worker performance, in which managers observe specific employee behaviors or job duties along a number of predetermined performance dimensions and rate the quality of performance for each dimension on a scale ranging from high to low.

objective performance measures Quantitative measures of performance that can be found in unbiased organizational records.

paired comparison A subjective performance measure in which managers systematically evaluate each worker’s performance compared to the other workers on the team by comparing one pair of workers at a time, judging which of the two demonstrates superior performance, and then creating a rank ordering based on the number of times each worker is considered the better performer of a pair (a brief sketch of this procedure appears after this list).
performance appraisal The formal process in which employee performance is assessed, feedback is provided, and corrective action plans are designed.

rank ordering A subjective performance measure in which managers rank their employees from best to worst.

subjective performance measures Performance measures that rely on human judgments.

360° appraisal A multisource evaluative process that utilizes performance feedback from a variety of sources, such as an employee’s supervisor, peers, subordinates, and internal or external customers, as well as self-evaluation.

written narrative A subjective performance measure in which the manager writes a paragraph or two summarizing a specific employee’s performance over a certain length of time.
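As an illustration of the paired comparison entry above, the following minimal sketch ranks a small team by counting pairwise wins. The worker names are hypothetical, and the better() function stands in for a manager’s judgment, which in practice would come from the manager rather than a fixed ordering.

from itertools import combinations

workers = ["Ana", "Ben", "Carla", "Dev"]

# Stand-in for the manager's judgment of which of two workers
# performs better; here driven by an arbitrary hypothetical ordering.
hypothetical_strength = {"Carla": 4, "Ana": 3, "Dev": 2, "Ben": 1}

def better(a, b):
    return a if hypothetical_strength[a] > hypothetical_strength[b] else b

# Judge each pair exactly once and tally wins.
wins = {w: 0 for w in workers}
for a, b in combinations(workers, 2):
    wins[better(a, b)] += 1

# Rank ordering: most pairwise wins first.
ranking = sorted(workers, key=lambda w: -wins[w])
print(ranking)  # ['Carla', 'Ana', 'Dev', 'Ben']

Note that with n workers this procedure requires n(n - 1)/2 comparisons, which is why paired comparison becomes impractical for large teams.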
Multimedia

Dimoff, D. (Producer). (2011). Performance evaluation [Video segment]. In D. S. Walko & B. Kloza (Executive Producers), Managing your business: Prices, finances, and staffing. Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=42251&wID=100753&loid=116118&plt=FOD&w=420&h=315
· The full version of this video is available through the Films On Demand database in the Ashford University Library. This video discusses the role of performance reviews and provides guidance. This video has closed captioning. It may assist you in this week’s discussions, Assessment in the Workplace and Diversity and the Organizational Process.

Marofsky, M., Grote, K. (Writers), Christiansen, L., Dean, W. (Directors), Christiansen, L., & Hommeyer, T. (Producers). (1991). Understanding our biases and assumptions [Video file]. Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=2574&wID=100753&plt=FOD&loid=0&w=640&h=480&fWidth=660&fHeight=530
· The full version of this video is available through the Films On Demand database in the Ashford University Library. This video discusses the nature of biases and preconceptions, and it stresses the need to examine one’s own thinking about “us” and “them.” This video has closed captioning. It may assist you in this week’s discussions, Assessment in the Workplace and Diversity and the Organizational Process.
Preparing for my appraisal: Cutting edge communication comedy series [Video file]. (2016). Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=111702&wID=100753&plt=FOD&loid=0&w=640&h=360&fWidth=660&fHeight=410
· The full version of this video is available through the Films On Demand database in the Ashford University Library. This video shows several examples of different work appraisals, demonstrating the dos and don’ts and providing helpful tips. This video has closed captioning. It may assist you in this week’s discussions, Assessment in the Workplace and Diversity and the Organizational Process.

What your boss wants: Business [Video file]. (2013). Retrieved from
https://fod.infobase.com/OnDemandEmbed.aspx?token=94142&wID=100753&plt=FOD&loid=0&w=640&h=360&fWidth=660&fHeight=410
· The full version of this video is available through the Films On Demand database in the Ashford University Library. This video gives an insider’s perspective on what makes a good job application, a successful interview, what to expect in the induction process, and the types of assessments at the end of the probationary period. This video has closed captioning. It may assist you in this week’s discussions, Assessment in the Workplace and Diversity and the Organizational Process.

Supplemental Material

Rosser-Majors, M. (2017). Week Two Study Guide. Ashford University.

Recommended Resource

Multimedia

Bandura, A., Jordan, D. S. (Writers), & Davidson, F. W. (Producer). (2003). Modeling and observational learning – 4 processes [Video segment]. In Bandura’s social cognitive theory: An introduction. Retrieved from https://fod.infobase.com/OnDemandEmbed.aspx?token=44898&wID=100753&loid=114202&plt=FOD&w=420&h=315&fWidth=440&fHeight=365
· The full version of this video is available through the Films On Demand database in the Ashford University Library. In this video, Albert Bandura explains the four processes of observational learning. He also describes the Bobo doll experiment on the social modeling of aggression. This video has closed captioning. It may assist you in this week’s discussions, Assessment in the Workplace and Diversity and the Organizational Process.