Rater Issues in Performance Management
Michael Rose
Psychology 601
Overview
Possible sources of performance information
Rater Motivation
Rater Training Programs
Case Study 6-4
Frame-of-Reference training
Article 1
Article 2
Article 3
Summary
Possible Sources of Performance Information (Raters)
Possible Sources
Supervisors
Peers
Subordinates
Self
Customers
Disagreements among raters
Not necessarily a problem
Behavioral indicators may vary across sources.
Important to define the target behavior clearly for all raters.
If disagreements are found, the importance of each source must
be determined.
Rater Error Motivation
Raters may intentionally or unintentionally distort ratings.
Raters may be motivated to inflate or deflate ratings.
Motivation to provide accurate ratings:
The rater expects certain positive or negative consequences from accurate ratings.
The rater believes the probability of experiencing those consequences is high if ratings are accurate.
Motivation to distort ratings:
The rater expects certain positive or negative consequences from distorted ratings.
The rater believes the probability of experiencing those consequences is high if ratings are distorted.
Motivations to Inflate or Deflate Ratings
Motivations for inflated ratings:
Maximize the merit raise/rewards
Encourage employees
Avoid creating a written record
Avoid confrontation with employees
Promote undesired employees out of the unit
Make the manager look good to his/her supervisor
Motivations for deflated ratings:
Shock an employee
Teach a rebellious employee a lesson
Send a message to the employee that he/she should consider leaving
Build a written record of the employee's poor performance
Preventing Conscious Distortion
Convince raters that they have more to gain by providing
accurate ratings.
Increase accountability.
Have raters justify their ratings
Have raters justify their ratings face-to-face
Provide rater training
Rater Training programs
May cover the following topics:
Reasons for implementing the performance management system.
How to identify and rank job activities.
How to observe, record, and measure performance.
Information on the appraisal form and system mechanics.
How to minimize rating errors.
How to conduct an appraisal interview.
How to train, counsel, and coach.
Case Study 6-4
Provide a detailed discussion of the intentional and
unintentional rating distortion factors that may come into play
in this situation.
Evaluate the kinds of training programs that could minimize the
factors you have described. What do you recommend and why?
Frame-of-Reference Training
Improves rater accuracy by familiarizing raters with the
performance dimensions to be assessed.
Typically involves:
Discussion of the job description for the individual being rated.
Review of the definition for each dimension to be rated.
Discussion of examples of good, average, and poor
performance.
Trainees rate fictitious employees.
Trainees informed of correct ratings for each dimension.
Article 1
Ratings of counterproductive performance: the effect of source
and rater behavior.
Mann, S. L., Budworth, M., & Ismaila, A. S. (2012)
Purpose
To examine whether there is inter-rater agreement on
counterproductive performance between self and peer-ratings.
To examine factors that moderate inter-rater agreement.
Factors examined include: self-reported levels of
counterproductive behaviors, conscientiousness, and integrity.
Hypotheses
Hypothesis 1: Peer-ratings of counterproductive performance
are significantly higher than self-ratings of counterproductive
performance.
Hypothesis 2: Conscientiousness moderates the relationship
between rating source and rater agreement such that individuals
with similar levels of conscientiousness demonstrate agreement
for self and peer-ratings of counterproductive behaviors.
Hypothesis 3: Values toward integrity moderate the
relationship between rating sources and rater agreement such
that individuals with similar levels of integrity demonstrate
agreement for self and peer-ratings of counterproductive
behavior.
Hypothesis 4: Individuals who exhibit similar levels of
counterproductive performance, as rated by their peers,
demonstrate agreement for self and peer-ratings of
counterproductive behaviors.
Results
Hypothesis 1: Supported. Peer-ratings (M = 2.1) were significantly higher than self-ratings (M = 1.4); an illustrative computation of this kind of comparison appears below.
Hypothesis 2: Not supported. Conscientiousness was not a
significant moderator of the relationship between rating source
and rater agreement.
Hypothesis 3: Not supported. Integrity was not a significant
moderator of the relationship between rating sources and rater
agreement.
Hypothesis 4: Supported. Individuals who exhibit similar levels
of counterproductive performance, as rated by their peers, are
more likely to agree on ratings of counterproductive
performance. Estimated effect = 0.39 (p < 0.001).
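A minimal Python sketch of the kind of self-peer comparison behind Hypotheses 1 and 4 is shown below. The ratings are invented for illustration, and the snippet is not the authors' analysis (which estimated the 0.39 moderation effect in a more elaborate model); it only shows how a mean-level difference and a simple agreement index between the two sources can be computed.

```python
# Illustrative only: hypothetical ratings, not the study's data or analysis.
import numpy as np
from scipy import stats

# Hypothetical 1-5 ratings of counterproductive performance for ten employees.
self_ratings = np.array([1.2, 1.5, 1.0, 1.8, 1.4, 1.3, 1.6, 1.1, 1.7, 1.4])
peer_ratings = np.array([2.0, 2.4, 1.8, 2.5, 2.1, 1.9, 2.3, 1.7, 2.6, 2.0])

# Mean-level difference between sources (Hypothesis 1 concerns this gap).
t, p = stats.ttest_rel(peer_ratings, self_ratings)
print(f"self M = {self_ratings.mean():.2f}, peer M = {peer_ratings.mean():.2f}")
print(f"paired t = {t:.2f}, p = {p:.4f}")

# Rank-order agreement between sources (one simple index of self-peer agreement).
r, p_r = stats.pearsonr(self_ratings, peer_ratings)
print(f"self-peer correlation r = {r:.2f} (p = {p_r:.4f})")
```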
Practical Implications
Individual differences between the rater and the individual
being rated may have a significant impact in organizational
settings.
Provides support for 360 feedback on counterproductive
performance, as sources were shown to provide unique
feedback.
Understanding peer ratings is important due to the increased
number of teams in the workplace.
Article 2
Rater personality and performance dimension weighting in
making overall performance judgments.
Ogunfowora, B., Bourdage, J., & Lee, K. (2010).
Purpose
Examined the effects of rater personality on the performance
appraisal process.
Specifically, the influence of rater personality on the relative
weights raters placed on different performance dimensions was investigated.
Traits examined: honesty-humility and openness.
Hypotheses
Hypothesis 1: Rater honesty-humility will positively relate to
weights which are placed on items associated with maintaining
personal discipline.
Hypothesis 2: Rater openness will positively relate to weights
which are placed on items associated with adaptive
performance.
Results
Hypothesis 1: Not directly supported; higher levels of
honesty-humility did not relate to increased weights on
personal discipline (p > .05). However, modesty was positively
related to weights on personal discipline (p < .01).
Hypothesis 2: Supported. Raters higher in openness weighted
adaptive performance significantly more heavily than those
lower in openness (p < .01).
Practical Implications
Indicates that organizations must communicate a standard
theory of performance to their employees.
Organizations must account for systematic differences in raters
(e.g., supervisors are likely to differ systematically from other
sources in openness, so expect systematically different ratings).
Supports the use of frame-of-reference training.
Article 3
Rater training revisited: An updated meta-analytic review of
frame of reference training.
Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U.
(2012).
Purpose
To demonstrate that not all measures of accuracy are equally
improved by frame-of-reference training (FOR).
To investigate how much FOR training protocols differ.
Findings/Implications
FOR does not impact all measures of accuracy equally.
Best for training raters to recognize patterns of performance.
Therefore, it improved raters' ability to rank-order the
employees they were rating.
Provides support for FOR training as an effective rater training
method.
Summary
Presented possible sources of performance information
Identified various rater motivations.
Made suggestions on how to overcome intentional or
unintentional distortions.
Completed case study 6-4
Introduced Frame-of-Reference training
Article 1: Difference between sources.
Article 2: Supported a standard theory of performance (FOR
training).
Article 3: Identified situations which are most impacted by FOR
training.
References
Aguinis, H. (2013). Performance management. Upper Saddle River, NJ: Pearson Prentice Hall.
Mann, S. L., Budworth, M., & Ismaila, A. S. (2012). Ratings of
counterproductive performance: the effect of source and rater
behavior. International Journal of Productivity and
Performance Management, 61, 142-156.
Ogunfowora, B., Bourdage, J., & Lee, K. (2010). Rater
personality and performance dimension weighting in making
overall performance judgments. Journal of Business and
Psychology, 25, 465-476.
Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U.
(2012). Rater training revisited: An updated meta-analytic
review of frame-of-reference training. Journal of Occupational
and Organizational Psychology, 85, 370-395.
OL 324: Milestone Two Guidelines and Checklist
Prompt: Your Milestone Two submission will be a detailed
outline that must incorporate each critical element of the final
project. Use the checklist below as a guideline for each critical
element that must be covered in this outline. For the purpose of
this milestone, copy and paste the critical elements into an
outline and add one to two bullet items per critical element.
Each bullet item should serve as the basis of your information
for that critical element. The intent of this outline is to ensure
that you are on the right track for your final project. A
comprehensive outline will provide a solid foundation as you
develop and complete your final project.
This assignment will be graded pass/fail. You must address all
of the critical elements to receive credit for this assignment.
Requirements of Submission: Written components of this
project must follow these guidelines: double spacing, 12-point
Times New Roman font, one-inch margins, and APA-style
citations.
Research: Include at least two of the four required sources of
research that you will incorporate into your final project. The
source of the research is all that is required for this outline.
Instructor Feedback: Students can find their feedback in the
Grade Center.
Checklist for Outline
For each critical element below, indicate whether it is covered, note what is missing if it is not, and record instructor feedback:
Company Background and History
Description of Quality Issue
Quality Culture
Voice of the Customer
Change Management Plan
Quality Theories
Quality Tools and Techniques
Implementing Change
Resistance to Change
Expected Outcomes
Research
RATER ISSUES IN PERFORMANCE MANAGEMENT
Vanessa Beckles
Psychology 601
Performance Assessment
Overview
Who Should Provide Performance Information
Rater Error Motivation
Rater Motivation to Inflate and Deflate Ratings
Prevent Rater Distortion
Reasons for Rater Training Programs
Rater Error Training
Case Study
Criteria for Evaluating Errors
Article 1
Article 2
Article 3
Summary and Conclusion
Who Should Provide Performance Information
Supervisors
Peers
Subordinates
Self
Customers
Handling disagreement across sources
Ratings may not be similar due to different levels of
engagement with the employee
Rater Error Motivation
Performance ratings may be intentionally or unintentionally
distorted or inaccurate
Raters' behavior is influenced by:
The motivation to provide accurate ratings
The rater expects certain positive or negative consequences
The probability of experiencing these consequences is high if accurate ratings are provided
The motivation to distort ratings
The rater expects certain positive or negative consequences
The probability of experiencing these consequences is high if ratings are distorted
Rater Motivation to Inflate and Deflate Ratings
Reasons supervisors may inflate ratings
Maximize the merit raise/rewards
Encourage employees
Avoid creating a written record
Avoid confrontation with employees or jeopardize relationship
Uncomfortable talking about weaknesses
Make the manager look good to his/her supervisor
Reasons to provide artificially deflated ratings
Shock employee
Teach a rebellious employee a lesson
Responsible for too many people to evaluate them accurately
Build a strongly documented, written record of poor
performance
Failure to remember accurate performance behaviors
Prevent Rater Distortion
How to prevent conscious distortion of ratings
Provide incentives so raters have more to gain than to lose by
giving accurate ratings
Accountability in the system of rating
Overly lenient appraisals are challenged
Raters justify their ratings
Raters justify their ratings in face-to-face meeting
Training Programs
Improve skills in evaluating performance
Reasons for Rater Training Programs
Training programs to address intentional and unintentional rater
errors in appraisals should include:
Reasons for implementing the PM system
How to identify and rank job activities
How to observe, record, and measure performance
How to minimize rating errors
How to conduct an appraisal interview
How to train, counsel and coach
Rater Error Training
Primary Purpose
Conscious awareness of possible errors; discriminating among
raters (differential elevation)
Content Training
Focus on leniency, halo, range of restriction
Liability of Approach
Raters may adopt an inappropriate response set; training can
reduce the "rating effect" but may also lower accuracy
Primary Value
Use to make decisions that require distinguishing accurately
among employees
Rater Error Training (RET) most useful when the primary goal
is to distinguish accurately among workers
London, M., Mone, E. M., & Scott, J. C. (2004)
Case Study: Our Civil Service
At the State Employment Service, a number of employment
counselors were hired together during a special recruiting effort
12 years ago in 2000. They formed a cohort, went through
training together, and received graduate hours in vocational
counseling together.
About a year ago, Jane Midland, the first member of the
cohort to get promoted, tested into a supervisory position at one
of the Job Service Centers. Two of the eleven employees who
report to her are members of the 2000 cohort. Barb Rick and
George Malloy deeply respect her abilities and have a strong
affection for her. In fact, Barb Rick has spent time at Jane’s
home watching their children play together and helping with the
remodel of Jane’s house. George, Jane, and Barb get together
for lunch regularly. Recently, they have considered attending
evening classes together to get a master’s degree in Human
Resource Management.
Yesterday, Jane received a memo from management
reminding her that it is time to complete the annual appraisal
forms for her staff.
Discuss the factors that may cause Jane to intentionally
and unintentionally distort her ratings of Barb and George.
Evaluate the kinds of training programs that could help
minimize the factors you have described. What do you
recommend and why?
Aguinis (2013)
Criteria for Evaluating Errors
Psychometric criteria: indirect measures
Rater error measures: most common; indirect measures
Rating accuracy measures: direct measures; rare; usually obtained in a laboratory
Murphy, K. R., & Cleveland, J. N. (1995)
Article One
Explaining the Weak Relationship Between Job Performance
and Ratings of Job Performance
Murphy, K. R. (2008)
Overview
Ways of Improving Performance Appraisals
Researchers and Practitioners regard performance ratings as the
Rodney Dangerfield of HRM
“Rarely do they get much respect”
Some argue that they should be banned entirely
Survival of performance appraisals is primarily due to limited
alternatives
Three General Models of the Relationship Between Job
Performance and Performance Ratings
Improving Performance Appraisals
Researchers/Practitioners
Behavioral anchors
Identifying specific types of rating errors
Leniency error, halo error
Rater Training
Focus on improving quality of ratings versus avoiding specific
errors
360 degree evaluations
Peer, supervisor, subordinates, and clients
Organizational Strategies
Forced Distribution Systems
Identify weakest performers
Called “rank and yank”
Group and Discussion Review systems
Require raters to compare, discuss, and justify evaluations
Both help calibrate raters and discourage unrealistically lenient
or harsh ratings
Raters vulnerable to social influences
Three General Models
Numerous models of the performance–performance rating
relationship in organizations
One Factor Models
Multi-Factor Models
Mediated Models
One Factor Models
Most Popular
Offer few explanations for improving raters’ limited ability to
evaluate subordinates.
Multi-Factor Models
Useful starting point for improving appraisals
The weak relationship between performance and ratings is not
entirely the fault of the rater
Situational constraints are a factor
Influence of nonperformance factors
Mediated Models
Engage raters as willing and motivated partners
Article Two
The Impact of Non-Performance Information on Ratings of Job
Performance: A Policy-Capturing Approach
Spence, J. R., & Keeping, L. M. (2010)
Overview
Policy–capturing approach to performance appraisals
Background
Motives for Rating Distortion
Personality
Policy-Capturing Approach
“Scenario based research methodology that allows researchers
to determine how individuals utilize various pieces of
information to arrive at judgments”
Spence, J. R., & Keeping, L. M. (2010)
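As a concrete illustration of the methodology: policy-capturing studies typically regress each rater's judgments on the cues manipulated across scenarios, so the regression weights recover that rater's implicit judgment "policy". The sketch below is a generic, hypothetical example; the cue names and numbers are invented and are not taken from Spence and Keeping's study.

```python
# Minimal policy-capturing sketch: hypothetical cues and ratings, not the study's data.
import numpy as np

# Each row is a scenario: [task performance cue, likeability cue, tenure cue], coded 1-5.
scenarios = np.array([
    [5, 1, 2], [4, 4, 1], [2, 5, 3], [1, 2, 5],
    [3, 3, 3], [5, 5, 4], [2, 1, 1], [4, 2, 5],
])
# One rater's overall performance ratings of the same scenarios (1-7 scale).
ratings = np.array([6, 6, 4, 2, 4, 7, 3, 5])

# Ordinary least squares: the cue weights describe this rater's judgment policy.
X = np.column_stack([np.ones(len(scenarios)), scenarios])
weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)
intercept, b_perf, b_like, b_tenure = weights
print(f"performance weight = {b_perf:.2f}, likeability weight = {b_like:.2f}, "
      f"tenure weight = {b_tenure:.2f}")
# A large likeability or tenure weight would indicate non-performance information
# influencing this rater's judgments.
```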
Background
People have difficulty rating other human beings
Challenge for managers to transition from being leaders to
being judicial evaluators
Fear of repercussions
Three most commonly discussed reasons for non-performance
rating distortion
Avoidance of negative consequences of ratings
Organizational norms
Opportunity to advance self-interest
Motives for Rater Distortion
Avoidance of negative consequences
Raters may intentionally manipulate ratings to avoid
uncomfortable situations
Connection with lenient appraisals
Avoid negative feedback
More lenient when giving face-to-face feedback
Potentially damaging interpersonal relationships
Alter performance ratings as preventive behavior for conflict
Motives for Rater Distortion
Organizational Norms
Reflect behavior that is acceptable in the workplace
Raters are influenced by the norms within an organization
Ratings may be altered by what managers perceive to be
permissible behaviors
Relationship between rating accuracy and rating error
Situational factors such as organizational norms can influence
raters' motivations to rate accurately or inaccurately
Motives for Rater Distortion
Self-Interest
Managers distort ratings to make themselves look competent or
receive incentives
Inflate employees ratings to gain resources or to gain favor with
the leaders
Some managers use employees' ratings to enhance perceptions of their department
Desire to manage impressions
Personality
Two areas based on the Five Factor Model of personality traits
may provide some explanation for rater accuracy and inaccuracy
Conscientiousness
Less elevated ratings
Strongly controlled by long term performance
More focused on the big-picture goal, producing more accurate ratings
Agreeableness
Managers high in agreeableness may produce more lenient
appraisals
Produce more elevated ratings
Article Three
Using Frame-of-Reference Training to Understand the
Implications of Rater Idiosyncrasy for Rating Accuracy
Uggerslev, K. L. & Sulsky, L. M. (2008)
Overview
What is Frame-of-Reference (FOR) Training?
Two levels of theories
Performance Theories and Rater Idiosyncrasy
Performance Theories Idiosyncrasy and Rating Accuracy
Frame-of-Reference Training
Primary Purpose
Develop shared performance schema with organization;
discriminating among dimensions of performance (stereotype
accuracy) and discriminating among ratees within organizations
(differential accuracy)
Content of Training
Examples of normative (poor, average, good) behaviors for
behavioral dimensions
Liability of Approach
Does not reduce rating effect
Primary Value
Used to make decisions that require comparing employees on
different performance dimensions—e.g. job assignments and
placements, and feedback for development and goal setting
London, M., Mone, E. M., & Scott, J. C. (2004)
Performance Theory and
Rater Idiosyncrasy
Levels of Theory Idiosyncrasy
The differences between raters’ implicitly held theories, and the
normative performance theory imparted during FOR training
Two forms of rater idiosyncrasy
Performance Dimensions
Performance-Related Behaviors
Both contain errors of omission and commission
Performance Theory Idiosyncrasy and Rating Accuracy
Rater accuracy before FOR training:
Hypothesis 1: Prior to training, rating accuracy will be negatively
related to rater idiosyncrasy (the more idiosyncratic the rater's
performance theory, the less accurate the appraisals).
Hypothesis 2: Accuracy improvements following FOR training should be
greater for raters whose theories had fewer dimensions than the
organization's (omission) than for raters whose theories had additional
dimensions (commission).
Hypothesis 3: Training will improve rating accuracy for all FOR
trainees, such that trainees with relatively high performance
theory idiosyncrasy will improve significantly more than
trainees with lower idiosyncrasy.
Results
Hypothesis 1
Supported: There was a negative relationship between idiosyncrasy
and rating accuracy prior to raters receiving training; idiosyncratic
raters have the most to gain from training.
Hypothesis 2
Somewhat supported: Relationship between idiosyncrasy and
rating accuracy may depend on both the degree of idiosyncrasy
and the form of idiosyncrasy
Hypothesis 3
Somewhat supported: Most trainees were at least mildly
idiosyncratic with respect to each aspect of the performance theory
Idiosyncrasy: holding extra or fewer dimensions than the
organization's theory for evaluating performance; differing from
what is perceived to be the normative evaluative or organizational guidelines
Summary
Who should provide performance information: supervisors, peers,
subordinates, customers, and the employees themselves
Discussed reasons behind rater error motivation
Several reasons were presented for rater motivation to inflate and
deflate ratings
Presented ways to prevent rater distortion
Provided several reasons for rater training programs
One example of rater error training
Analyzed a case study
Covered different types of criteria for evaluating errors
Article 1
Article 2
Article 3
References
Aguinis, H. (2013). Performance management. Upper Saddle River, NJ: Pearson Prentice Hall.
London, M., Mone, E.M., & Scott, J.C. (2004). Performance
management and assessment: Methods for improved rater
accuracy and employee goal setting. Human Resource
Management, 43, 319-336.
Murphy, K.R. (2008). Explaining the weak relationship between
job performance and ratings of job performance. Industrial and
Organizational Psychology, 1, 148-160.
Murphy, K.R., & Cleveland, J.N. (1995). Understanding
Performance Appraisal: Social, Organizational, and Goal-Based
Perspectives. Chapter 10: Error and accuracy measures.
Thousand Oaks, CA: Sage.
Spence, J. R., & Keeping, L. M. (2010). The impact of non-
performance information on ratings of job performance: A
policy-capturing approach. Journal of Organizational Behavior,
31, 587-608.
Uggerslev, K. L., & Sulsky, L. M. (2008). Using frame-of-
reference training to understand the implications of rater
idiosyncrasy for rating accuracy. Journal of Applied
Psychology, 93(3), 711-719.
Rater Issues & Performance Management
Ashley Durrant
PSYC 601 – Performance Assessment
Fall 2015
Overview
Who can provide performance feedback?
What influences rater behavior?
Types of Bias
Self-Serving, Leniency, Centrality…
A unique example of bias in the workplace
How can we prevent rater distortion?
Case Study 6-4
Importance/Implications of PM Ratings
There are many benefits to a well implemented PM system
Increased motivation and self-esteem
Job criteria are clarified
Employees become more competent; misconduct is minimized
Org. change is facilitated
Employee engagement enhanced
When one link is broken, the whole system fails; if raters do not
provide accurate information, the performance management system breaks down.
Who should provide performance information?
Employees should be involved in deciding which of the above
sources will rate the various dimensions of their performance.
Why do we care?
"Regardless of who rates performance, the rater is likely to be
affected by biases that distort the resulting ratings" (Aguinis, 2013, p. 150)
Rater Behaviors Influenced by:
Motivation to provide accurate ratings
Are there consequences for being inaccurate?
Are there rewards for having accurate ratings?
Motivation to distort the ratings
As a means toward achieving other goals
Supervisor May Inflate Ratings:
Maximize the merit raise/reward
Encourage employees
Avoid creating a written record
Avoid confrontation with employees
Promote undesired employees out of unit
Make manager look good to his/her supv.
(Aguinis, 2013)
Supervisor May Deflate Ratings:
To shock an employee into action
Teach a rebellious employee a lesson
Imply that an employee should leave
Build a strongly documented, written record of poor
performance
(Aguinis, 2013)
Types of Rater Bias
Self-Serving Bias
Leniency Bias
Centrality Bias
Self-Serving Bias
Do you think a supervisor would be more likely to rate a
specific trait as important if they themselves also possessed that
trait?
Workers were asked to self report their level of competency on
a given measure and also report the value of that competency to
the job through a worker-oriented job analysis survey.
Analysis of this historical data of government employed clerical
workers (N = 26,682) found statistically significant positive
correlations across all competencies between self-rated
performance and importance ratings of competencies.
(Cucina et al., 2012)
While this study did not examine performance appraisal
specifically, the findings illustrate the human tendency to rate
qualities one possesses as more important than those one does not.
Correlations at each competency, standardized within each
occupation
Additional analysis was run to rule out common source
variance.
Standardizing across occupations also helped to account for
false strong correlations due to a competency not applying to a
certain job title.
The results show that importance ratings could be somewhat
biased upward or downward depending on the incumbent's standing on
a given competency (a toy version of the within-occupation computation appears below).
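The standardization step can be illustrated with a small, hypothetical pandas sketch; the column names and simulated values below are invented and are not the study's 26,682-case dataset. Z-scoring within each occupation before correlating removes between-occupation differences that could otherwise inflate the competency-importance relationship.

```python
# Hypothetical illustration of within-occupation standardization; not the study's data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "occupation": np.repeat(["clerk", "analyst", "technician"], 50),
    "self_competency": rng.normal(3.5, 0.8, 150),
})
# Importance ratings loosely tied to self-rated competency, mimicking a self-serving effect.
df["importance"] = 0.4 * df["self_competency"] + rng.normal(3.0, 0.7, 150)

# Z-score both variables within each occupation so between-occupation differences
# do not masquerade as a competency-importance relationship.
z = lambda s: (s - s.mean()) / s.std()
grouped = df.groupby("occupation")
df["self_z"] = grouped["self_competency"].transform(z)
df["imp_z"] = grouped["importance"].transform(z)

print(f"within-occupation correlation r = {df['self_z'].corr(df['imp_z']):.2f}")
```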
Leniency Bias
The tendency to rate performance as better than it actually is;
the bias is most noticeable for poor-performing employees.
Bol (2011) supported the claim that managers tend to display
both leniency and centrality bias. Leniency was related to
managers' desire to avoid confrontation with poor performers.
When the employee-manager relationship is weaker, raters tend
to display less leniency bias (Bol, 2011).
Centrality Bias
The clustering of ratings toward a middle value score; not
providing any extreme value ratings.
Bol (2011) supported the claim that managers tend to display
both leniency and centrality bias. Centrality was related to the
opportunity-cost of obtaining relevant performance information.
Bias & Opportunity Cost
Collecting performance information can be costly for a
manager; the cost varies with the location of employees, how
similar job duties are between supervisor and incumbent, and
how often the supervisor works directly with the incumbent.
When there are limited natural opportunities to collect this data,
supervisors lack the complete picture. Bol (2011) also found
evidence to support that this lack of complete information will
lead to centrality bias.
Bias & Impact on Future Performance
Even if an objective rating is not a valid benchmark of
performance, employees will use this benchmark for comparison
and deriving perceptions (Bol, 2011).
Centrality bias has a negative influence on performance
improvement for below-average performers
Centrality bias has a negative effect on all employees’
incentives
Employee performance over time increases when leniency bias
is present
There is also likely a stronger manager-employee relationship
when leniency bias is present.
Another example of Bias
Employee start times, supervisor beliefs, and impact on
performance appraisal.
Start times: a stereotypic belief held by many supervisors that
employees who start work early are better employees and more
conscientious.
This belief did have an impact on performance ratings, but only
when the supervisor held the stereotype that late-starting
employees are less conscientious.
When supervisors hold to this stereotype, employees are more
likely to be rated poorly
Even though flexible work schedules are established to help
retain good employees through increasing work/life balance,
employees with late start times are putting themselves at risk of
receiving lower performance scores (all other factors being
equal).
(Yam, Fehr, & Barnes, 2014)
Methods for preventing rater distortion
Managers should understand the reasons for implementing the
performance management system.
Training On:
How to Identify & Rank Job Activities
How to Observe, Record, & Measure Performance
The appraisal form and system mechanics
How to minimize rating errors (be aware of bias)
How to conduct an appraisal interview
How to train, counsel, & coach
(Aguinis, 2013)
Case Study / Activity
6-4, p. 165
Provide a detailed discussion of the unintentional rating
distortion factors that may come into play in this situation.
Evaluate the kinds of training programs that could minimize the
factors you have described. What do you recommend and why?
(Aguinis, 2013)
Summary & Review
What we covered:
Motivation factors for providing accurate or inaccurate performance ratings
Three types of bias: self-serving, centrality, and leniency
A unique example of bias in the workplace
What we learned:
Various individuals can provide performance information
Methods for preventing rater distortion
Ideas for reducing bias in organizations, developed through the case study activity
References
Aguinis, H. (2013). Performance management. Upper Saddle
River, NJ: Pearson Prentice Hall.
Bol, J. C. (2011). The determinants and performance effects of
managers’ performance evaluation bias. The Accounting
Review, 86(5), 1549-1575.
Cucina, J. M., Martin, N. R., Vasilopoulos, N. L., & Thibodeaux,
H. F. (2012). Self-serving bias effects on job analysis ratings.
The Journal of Psychology, 146(5), 511-531.
Yam, K.C., Fehr, R., & Barnes, C.M. (2014). Morning
employees are perceived as better employees: Employees’ start
times influence supervisor performance ratings. Journal of
Applied Psychology, 99(6), 1288-1299.
Rater Issues in Performance Management
By Amanda Deane
Ch. 6 p. 134-143 and Ch. 7 p. 161- 168
Overview
Rater error overview
FOR training and rater idiosyncrasies
The effect of rater's goals on performance ratings
Rater errors in 360-degree feedback
Conclusion
FOR Training and Rater Idiosyncrasies
Using Frame of Reference Training to Understand the Implications of Rater Idiosyncrasy for Rating Accuracy
Uggerslev & Sulsky (2008)
What is FOR training?
Define theory idiosyncrasy
Why are raters idiosyncratic? Performance dimensions and performance-related behaviors
Error of omission vs. error of commission
FOR- enhances rater accuracy by calibrating raters such that
they have a common conceptualization of what constitutes
performance effectiveness across the performance continuum.
Levels of theory idiosyncrasy = the difference between raters'
implicitly held theories and the performance theory imparted
during FOR training (the normative training theory)
Raters are idiosyncratic because of the dimensions they use to
evaluate performance; these might be different from the
dimensions the organization uses, and raters can consider various
aspects of performance when making an overall conclusion
Error of omission- when the normative theory contains
performance dimensions not included in the rater’s implicit
performance theory/ rater’s dimensional schema does not
contain behaviors included in the normative training theory
Error of commission- when a rater’s implicit theory contains
additional dimensions to those comprising the normative theory
Types of Rater Error
Similar to me, Contrast, Leniency*, Severity*, Central tendency*, Halo, Primacy, Recency, Negativity, First impression, Spillover, Stereotype, Attribution
* intentional
Types of Rater Error Training Programs
Frame-of-Reference Training
Behavioral Observation Training
Self-Leadership Training
Hypotheses
H1: Prior to training, rating accuracy will be negatively correlated with rater idiosyncrasy.
H2: Rating accuracy improvements following FOR training will be greater for trainees with higher omission idiosyncrasy than higher commission idiosyncrasy.
H3: Training will improve rating accuracy for all FOR trainees, such that trainees with relatively high performance theory idiosyncrasy will improve significantly more than trainees with lower idiosyncrasy.
Rating accuracy depends on conforming to the normative theory
that informs the development of the comparison scores used in
the computation of accuracy
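To make "comparison scores" concrete, here is a generic sketch of how accuracy can be scored against expert-derived target ratings. The numbers are hypothetical, and the indices shown (mean absolute deviation and a profile correlation) are simpler than the Cronbach-style accuracy components FOR studies typically report; this illustrates the idea rather than the study's actual scoring procedure.

```python
# Generic illustration of accuracy scored against expert comparison scores;
# hypothetical numbers, not the scoring procedure used in the study.
import numpy as np

# Rows = ratees, columns = performance dimensions (e.g., organization, delivery, rapport).
trainee_ratings = np.array([[4, 3, 5],
                            [2, 2, 3],
                            [5, 4, 4]])
expert_scores   = np.array([[4, 4, 5],
                            [2, 1, 3],
                            [4, 4, 5]])

# A simple distance-based index: smaller mean absolute deviation = more accurate.
mad = np.abs(trainee_ratings - expert_scores).mean()

# A simple pattern index: correlation between the trainee's and experts' full profiles.
r = np.corrcoef(trainee_ratings.ravel(), expert_scores.ravel())[0, 1]
print(f"mean absolute deviation = {mad:.2f}, profile correlation = {r:.2f}")
```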
Perseverance effect: people are less willing to accept
information that is counter to information already in their schemas;
being confronted with contrary information can even make them more
accepting of their original schema, increasing resistance to new
information
Articulating the performance theory may be a benefit for rating
accuracy in and of itself
Results
The FOR training program was effective in teaching raters to evaluate professor lecturing performance more accurately than control training.
Found a negative relationship between idiosyncrasy and rating accuracy prior to raters receiving training.
The relationship between idiosyncrasy and rating accuracy may depend on both the degree and the form of idiosyncrasy (omission vs. commission).
For three of the four idiosyncrasy measures, raters with low idiosyncrasy still improved following FOR training, just not as much as raters with high idiosyncrasy.
Continuous measure of rater idiosyncrasy developed for
performance dimensions by comparing the dimensions students
identified with the dimensions from the normative training
theory created for the study
First study to address the relationship between rater’s
performance theory idiosyncrasy and rating accuracy
Study showed that mere exposure or practice to rating forms
was not sufficient to increase accuracy, the FOR training itself
is the mechanism for accuracy improvement.
Individual idiosyncrasy measures did not reliably predict rating
accuracy, however when the measures are combined the
expected relationship emerges
At dimension level training was more effective in improving
accuracy for trainees with initially higher levels of omission
when combined with lower commission than with higher
commission- perseverance effect may extend only to situations
where an individual is asked to remove something from a
preexisting schema (error of commission)
At behavior level- trainees with higher levels of both forms of
idiosyncrasy had the highest accuracy improvements- perhaps
there is a provision in an individual’s performance schema that
there is no absolute or limited set of behaviors that link to a
dimension- rater schemas at the behavioral level may include
the provision that there are multiple paths to the same end
No trainee held a performance theory that was not at least
mildly idiosyncratic to the normative theory- there was also
considerable variability among trainees in terms of the
dimensions and behaviors they identified as part of their
implicit performance theories – all trainees may be
idiosyncratic!
FOR training was initially intended only for raters who hold
performance theories that are highly idiosyncratic to the
organizationally adopted theory.
How Rater Goals Affect Rater Error
Raters Who Pursue Different Goals Give Different Ratings
Murphy, Cleveland, Skattebo, & Kinney (2004)
Rater goals influence the ratings they give; raters provide ratings consistent with their goals.
Goal of study: to provide an empirical test of the proposition that the goals raters claim to emphasize when evaluating performance are related to the ratings they give.
Rater errors and other psychometric deficiencies in ratings
might not be the result of errors or limitations in the rater’s
capacity, but rather might reflect the effects of strategic
decisions on the part of raters about the sorts of ratings they
should record.
Raters pursue a number of goals when completing performance
appraisals (e.g., using ratings to maintain harmony or to motivate
subordinates to perform better)
The goals raters actually pursue are not always the same as the
goals the organization would like them to pursue and conflicts
between the official purpose of performance appraisal systems
and the ways raters actually use these systems can substantially
affect the utility of performance appraisal.
If a rater's goal is to motivate employees, he or she will give higher
ratings that encourage them, even if the ratings aren't accurate
Many of the support systems and interventions in performance
appraisal appear to be based on the questionable assumption
that raters are trying their best to provide accurate ratings but
lack the skill and knowledge to do the job; it is more likely that
they are able but choose to give ratings that advance their goals
Hypotheses
H1: Measures of the rating goals most strongly emphasized by raters will account for a substantial portion of the variance in the performance ratings they assign.
H2: Measures of goal importance obtained after the rater has observed the ratee will account for variance in performance ratings not accounted for by ratings of goal importance collected before observing the ratee's performance.
University teacher evaluation- conservative test of link between
goals and evaluations because the ratings don’t have immediate
consequences for the rater, they are anonymous and the rater
has little to lose or gain by giving overly positive or negative
ratings. Produces weaker goal effects- even stronger links when
raters have strong incentives to give positive or negative ratings
In this context can partially control for ratee performance on
raters goals and on the relationship between goals and ratings
Info about raters goals obtained before they observed
performance and after
Rater goals would be influenced by the ratee's level of performance,
so it was hypothesized that the importance of rating goals would change
over time and that these changes would be related to the rater's final
evaluation of the ratee.
Results
Ratings of goal importance obtained at the beginning of the semester, before students observed teacher performance, predicted ratings of teacher performance collected at the end of the semester, in both the individual and the pooled samples.
A correlational analysis across classes found that rater goals might in part reflect stable orientations on the part of the rater, even without external incentives.
Changes in ratings of goal importance might in part reflect the rater's evaluation of the ratee's performance (for the strengths goal only).
Pilot study identified 4 goals- identifying strength and
weaknesses, giving fair evaluations and motivating the ratee.
Pilot study- students filled out goal questionnaire at same time
of evaluation ½ before, ½ after- goals were related to ratings
given however because both were done at the same time it is
possible that the links were spurious due to priming effects or
common method effects
All had high ratings, differences across teachers accounted for
14% of variance in teacher ratings.
Goal importance ratings collected at the end of the semester
were better predictors of performance than goal ratings
collected at the beginning of the semester
Only modest stability in the goals raters pursue over time
Sample was pooled to control for teacher differences
Because of how the study was set up, neither reverse causation nor
priming effects are likely explanations
Raters who evaluated the ratees performance more favorably
were more likely to rate the goal of conveying info about
strengths more important at end of semester than at the
beginning
Are raters at different organizational levels rating the same constructs?
Measurement Equivalence of 360-Degree Assessment Data: Are Different Raters Rating the Same Constructs?
Hannum (2007)
This study used data collected with a 360-degree assessment instrument to investigate the structural equivalence of ratings according to rater type, controlling for organizational level.
Raters were inappropriately classified in previous research.
Rater group differences can be due to three things: construct, scaling, and reliability differences.
Concerning ranks- researchers use the evaluator’s relationship
to the individual being rated (peer, boss/supervisor) as a proxy
for organizational level- not good because rater type alone is
not a measure of organizational level with respect to either
member of the dyad.
When organizational level is proxied by rater type, researchers
have found little evidence that rating source matters beyond
individual rater effects; rating differences could still reflect
organizational level, but when rater type is used as a proxy, the
effect of organizational level is spread out across rater types.
Unless the same constructs are being evaluated it would be
inappropriate to compare mean scores
Rater group differences may be attributable to real differences
in the behavior of the ratees, different understanding of
constructs, and differences associated with a difference of
application of measurement ratings
For performance ratings to be directly comparable we must rule
out that differences are attributable to construct, scaling or
reliability differences across groups.
Method and Results
Rater types included in the sample: boss, peer, and direct report.
Only ratings of upper-middle managers were used (in order to control variance associated with organizational level).
Multi-group SEM models demonstrated marginally adequate fit across rater types at different organizational levels.
Information from the various rating sources can be combined.
The majority of the sample was white males
360 degree assessment instrument was the Prospector
Boss ratings tended to be slightly higher than the other two
Rated on seven scales: learns from people and events, acts with
integrity, adapts to cultural differences, is committed to making
a difference, seeks broad business knowledge, is insightful: sees
things from new angles, and has the courage to take risks.
SEM was used to test the equivalence of the proposed models
for each rater group, providing a systematic plan for investigating
the structure of variance across groups
They analyzed model fit
The multi-group SEM models were employed to determine the
degree to which data from the rater groups were structurally
similar
They created a hypothesized rater for each rater type for each
manager in order to control for sample size – the sample size
for each rater group was equal to the number of target managers
Why was there marginal evidence of invariance?
Complexity theory
Contingency theory
Jacques and Clement (1994) complexity theory- as leaders rise
through the ranks of an organization they obtain a broader and
more complex understanding of the organization and thus what
it takes to be successful within the organization. The impact of
this altered perspective may be a slightly different
interpretation of rating schemes employed in 360 assessment,-
reinforces organizational level as a variable of interest separate
from rater type because level is a closer approximation of work
complexity
Fiedler and Chemers (1982)- managers need to behave
differently in different situations in order to be effective-
because leaders may act and therefore may be perceived
differently in different settings ratings from different sources
may be based on incongruent evidence
Rater group effects are marginal
Case Study
Macaroni Grill
IFCO
Life Wire
The most important goal is to make the process transparent so that employees understand what they are being asked to perform. Particularly when filling out performance appraisals, ratees should understand exactly what they are being rated on and how ratings were arrived at. Their perception of fairness and accuracy means everything to their acceptance and subsequent action to improve performance (or maintain high performance).
What type of error is most common in your experience?
The most common error I've experienced/witnessed is raters' tendency to rate everyone favorably. This eliminates the chance of distinguishing between truly excellent performance and average or poor performance.
And how do ratings at different levels of the organization differ?
Ratings naturally change at different levels of the organization. Ratings of front-line employees are likely to focus predominantly on specific job tasks and roles as well as adherence to organizational values. As you ascend the organizational ranks, the focus tends to shift from specific tasks to more general performance standards like management, leadership, and strategic thinking skills. In practice, there is a large amount of variability in how this manifests itself in different organizations.
Summary
FOR training improves rating accuracy even for raters who were not highly idiosyncratic.
Rater goals predict the performance ratings they give; goals can change over time.
Information from raters at different organizational levels can be combined.
PSYC601 – Week 6
Gathering Information and Implementation
Tong’s Presentation – Rater Issues
Gathering Information in PM (Ch 6)
Case Study 6-3
BREAK
Arturo’s Presentation – Rating Issues
Implementation of the PM Process (Ch 7)
Modified Case Studies 7-2 and 7-3
Things to come
Appraisal Forms:
8 Desirable Features
Simplicity
Relevancy
Descriptiveness
Adaptability
Comprehensiveness
Definitional Clarity
Communication
Time Orientation
Determining Overall Rating
Judgmental strategy
Holistic judgments – with defensible summary
Mechanical strategy
Weighted summary based on relative importance (see the sketch below)
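A minimal sketch of the mechanical strategy, assuming hypothetical dimensions, weights, and ratings (none of these values come from the text): each dimension rating is multiplied by its relative-importance weight and the products are summed to give the overall rating.

```python
# Hypothetical dimensions, weights, and ratings for one employee; illustrative only.
ratings = {"quality of work": 4.0, "teamwork": 3.0, "punctuality": 5.0}
weights = {"quality of work": 0.5, "teamwork": 0.3, "punctuality": 0.2}  # sum to 1.0

# Weighted summary: overall rating = sum of (weight x dimension rating).
overall = sum(weights[d] * ratings[d] for d in ratings)
print(f"overall rating = {overall:.2f}")  # 0.5*4.0 + 0.3*3.0 + 0.2*5.0 = 3.90
```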
Appraisal Period and Timing
Number of Meetings
Annual, Semi-annual, or Quarterly
Anniversary Date
Supervisor doesn’t have to fill out forms at same time, but
Can’t tie rewards to fiscal year
Fiscal Year
Rewards tied to fiscal year
Goals tied to corporate goals, but
May be burden to supervisor, depending on implementation
Who Should Provide
Performance Information?
Employees should be involved in selecting
Which sources evaluate
Which performance dimensions
When employees are actively involved
Higher acceptance of results
Perception that system is fair
Those with direct knowledge of employee performance should
be used
Supervisors, Peers, Subordinates, Self, Customers (both internal
and external)
Disagreement Across Sources
Expect disagreement
Ensure employee receives feedback by source
Assign differential weights to scores by source, depending on
importance
Ensure employees take active role in selecting which sources
will rate which dimensions
Types of Rating Errors
Intentional errors
Rating inflation
Rating deflation
Unintentional errors
Due to complexity of task
A Model of Rater Motivation (figure): the expected positive and negative consequences of rating accuracy, and the probability of experiencing those consequences, determine the motivation to provide accurate ratings; the expected positive and negative consequences of rating distortion, and the probability of experiencing those consequences, determine the motivation to distort ratings; together, these two motivations determine rating behavior.
Rater Training Programs
Should Cover
Information on how the system works
Motivation – What’s in it for me?
Identifying, observing, recording and evaluating performance
How to interact with employees when they receive performance
information
Case Study 6-3
Based on what we now know about rater training programs, classify
each content area in terms of whether it addresses intentional or
unintentional errors.
Break
Implementing a Performance Management System: Overview
Preparation
Communication Plan
Appeals Process
Training Programs
Pilot Testing
Ongoing Monitoring and Evaluation
Preparation
Need to gain system buy-in through:
Communication plan regarding Performance Management
system
Including appeals process
Training programs for raters
Pilot testing system
Ongoing monitoring and evaluation
Communication Plan Answers
What is Performance Management (PM)?
How does PM fit in our strategy?
What’s in it for me?
How does it work?
What are our roles and responsibilities?
How does PM relate to other initiatives?
Cognitive Biases that Affect
Communications Effectiveness
Selective exposure
What do you see?
Selective perception
What do you perceive?
Selective retention
What do you retain?
To minimize effects
of cognitive biases
A. Consider employees
Involve employees in system design
Show how employee needs are met
B. Emphasize the positive
Use credible communicators
Strike first – create positive attitude
Provide facts and conclusions
C. Repeat, document, be consistent
Put it in writing
Use multiple channels of communication
Say it, and then – say it again
Appeals Process
Promote Employee buy-in to PM system
Amicable/Non-retaliatory
Resolution of disagreements
Employees can question two types of issues
Judgmental – validity of the evaluation
Administrative – whether policies and procedures were followed
Appeals Process
Level 1
HR reviews facts, policies, procedures
HR reports to supervisor/employee
HR attempts to negotiate settlement
Level 2
Arbitrator (panel of peers and managers) and/or
High-level manager – final decision
Rater Training Programs
Content Areas to include
Information
Identifying, Observing, Recording, Evaluating
How to Interact with Employees
Choices of Training Programs to implement
Rater Error Training
Frame of Reference Training
Behavioral Observation
Self-leadership Training
Content
A. Information - how the system works
Reasons for implementing the performance management system
Information
the appraisal form
system mechanics
B. Identifying, observing, recording, and evaluating
performance
How to identify and rank job activities
How to observe, record, and measure performance
How to minimize rating errors
C. How to interact with employees when they receive
performance information
How to conduct an appraisal interview
How to train, counsel, and coach
Choices of Training Programs
Rater Error Training (RET)
Frame of Reference Training (FOR)
Behavioral Observation Training (BO)
Self-leadership Training (SL)
Intentional Rating Errors
Leniency (inflation)
Severity (deflation)
Central tendency (see the sketch below)
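These three intentional errors leave recognizable footprints in the distribution of a rater's scores. The sketch below is a rough heuristic screen (the thresholds and data are invented, not a validated diagnostic from the text): a very high or very low mean suggests leniency or severity, while very little spread suggests central tendency.

```python
# Heuristic screen for leniency, severity, and central tendency; thresholds are arbitrary.
import numpy as np

def screen_rater(ratings, scale_min=1, scale_max=5):
    midpoint = (scale_min + scale_max) / 2
    mean, spread = np.mean(ratings), np.std(ratings)
    if mean > midpoint + 1:
        return "possible leniency (ratings clustered near the top of the scale)"
    if mean < midpoint - 1:
        return "possible severity (ratings clustered near the bottom of the scale)"
    if spread < 0.5:
        return "possible central tendency (almost no spread around the midpoint)"
    return "no obvious distributional distortion"

print(screen_rater([5, 5, 4, 5, 5, 4]))  # lenient-looking pattern
print(screen_rater([3, 3, 3, 3, 3, 3]))  # central-tendency-looking pattern
print(screen_rater([1, 4, 3, 5, 2, 4]))  # spread-out pattern
```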
Unintentional Rating Errors
Similar to Me
Halo
Primacy
First Impression
Contrast
Stereotype
Negativity
Recency
Spillover
Attribution
Frame of Reference Training (FOR)
Goal of FOR*
Raters develop common frame of reference
Observing performance
Evaluating performance
*Most appropriate when PM appraisal system focuses on
behaviors
Behavioral Observation Training (BO)
Goals of BO
Minimize unintentional rating errors
Improve rater skills by focusing on how raters:
Observe performance
Store information about performance
Recall information about performance
Use information about performance
Self-leadership Training (SL)
Goals of SL
Improve rater confidence in ability to manage performance
Enhance mental processes
Increase self-efficacy
Pilot Testing
Provides ability to
Discover potential problems
Fix them
Benefits
Gain information from potential participants
Learn about difficulties/obstacles
Collect recommendations on how to improve
Understand personal reactions
Get early buy-in
Get higher rate of acceptance
Implementing a Pilot Test
Roll out test version with sample group
Staff and jobs generalizable to organization
Fully implement planned system
All participants keep records of issues encountered
Do not record appraisal scores
Collect input from all participants
Ongoing Monitoring and Evaluation
When system is implemented, decide:
How to evaluate system effectiveness
How to measure implementation
How to measure results
Evaluation data to collect
Reactions to the system
Assessments of requirements
Operational
Technical
Effectiveness of performance ratings
Indicators to Consider
Number of individuals evaluated
Distribution of performance ratings
Quality of information
Quality of performance discussion meetings
System satisfaction
Cost/benefit ratio
Unit-level and organization-level performance
Case Study 7-2 and 7-3
After implementing the PM process (via Exercise 7-1)
Setting up an appeals process (Exercise 7-2)
Evaluating the process (Exercise 7-3)
Summary – Chapter 6
Several keys to good and useful PA forms
Can combine information via mechanical or holistic approaches
Several practical issues to work out (e.g., appraisal period and
timing, who should rate)
Many potential motivators for raters
Several options to reduce rater distortion
Summary – Chapter 7
Implementation of a solid PM process requires lots of
preparation
Rater training a key component
Many options here
Pilot testing and ongoing monitoring keys to success
Next Time
Discussion of Chapter 8 in Aguinis – PM and Employee
Development
Presentations by:
Zytlaly (360 degree feedback), and
Jamie (personal development plans)
Journal of Occupational and Organizational Psychology (2015),
88, 387–414
© 2014 The British Psychological Society
www.wileyonlinelibrary.com
Does rater personality matter? A meta-analysis of
rater Big Five–performance rating relationships
Michael B. Harari1*, Cort W. Rudolph2 and Andrew J.
Laginess3
1Florida Atlantic University, Boca Raton, Florida, USA
2Saint Louis University, Saint Louis, Missouri, USA
3Florida International University, Miami, Florida, USA
We examined rater personality traits consistent with the Five-Factor Model as sources of systematic non-performance variance in job performance ratings using meta-analysis (k = 28). Several personality factors, including agreeableness, extraversion, and emotional stability, were related to performance ratings (ρ = .25, .12, and .12, respectively), and features of the rating context (e.g., study setting, appraisal purpose, accountability) moderated these relationships. Cumulatively, the Big Five accounted for between 6% and 22% of the variance in performance ratings. Implications for performance appraisal research and practice are discussed.
Practitioner points
- Performance ratings serve a number of important functions in organizations, and their construct validity is a central issue.
- We identified rater personality traits, consistent with the Five-Factor Model, as sources of non-performance variance in performance ratings.
- To the extent that job performance ratings are contaminated by rater personality traits, requiring raters to justify their ratings may result in criterion scores that reflect greater levels of criterion relevance.
Performance ratings serve central functions in organizations,
with implications for
performance management and employee development (den
Hartog, Boselie, & Paauwe,
2004; Smither, London, & Reilly, 2005), administrative
decision-making (Cleveland,
Murphy, & Williams, 1989), and other human resource
functions (e.g., strategic,
informational, maintenance, and documentation, see Aguinis,
2013, p. 18). Considering
the importance of performance ratings, their construct validity
has been a crucial issue
and a large body of research on the subject exists (Austin &
Villanova, 1992; Levy &
Williams, 2004; Murphy & Cleveland, 1995). While it is
desirable for ratings of job
performance to be saturated by the ratee’s job-relevant
behaviours, research suggests that
other sources contribute variance to performance ratings
(Murphy, 2008). To identify and
account for performance irrelevant sources of variance, research
has focused on the role
of the organization’s social context (Levy & Williams, 2004).
*Correspondence should be addressed to Michael B. Harari, Department of Management, Florida Atlantic University, 777 Glades Rd., Boca Raton, FL 33431, USA (email: [email protected]).
An earlier version of this paper was presented at the 2014 Society for Industrial and Organizational Psychology Conference, Honolulu, HI.
DOI: 10.1111/joop.12086
At present, this research
indicates that features of the evaluation context can influence
performance rating
processes and outcomes (Ferris, Munyon, Basik, & Buckley,
2008). By examining
performance evaluation in situ, this research has provided
useful avenues for
understanding why performance ratings may be inaccurate and
means by which the
quality of ratings can be improved.
While features of the rating context have been highlighted as
determinants of
performance ratings in several systematic reviews (e.g., Jawahar
& Williams, 1997; Levy &
Williams, 2004), much less research has emphasized how rater
personality traits impact
performance assessments. Personality traits reflect one’s
propensities to behave in
characteristic ways in response to situational demands (e.g.,
Mischel & Shoda, 1995;
Roberts, 2009), and research suggests that personality traits
guide behaviours in contexts
that are relevant for trait expression (Tett & Guterman, 2000).
We argue that the
situational demands posed by the performance rating context
provide ample opportunity
for raters to behave in trait-relevant ways (Ferris et al., 2008;
Tett & Burnett, 2003; Tziner,
Murphy, & Cleveland, 2005), and as a result, rater traits will
influence performance rating
processes and outcomes. This issue bears on the validity of
performance ratings, as rater
personality traits may act as a source of performance irrelevant
variance. Thus, research
into rater personality–performance rating relationships can have
important implications
for performance evaluation in organizations.
A number of researchers and theorists have speculated that rater
traits play a role in the
performance appraisal process. For instance, Landy and Farr
(1980) proposed a model of
job performance ratings where rater individual differences
influenced rater performance
appraisal strategies, which influenced performance ratings.
Kane, Bernardin, Villanova,
and Peyrefitte (1995) suggested that performance ratings could
‘reflect personality or
information-processing differences among raters’ (p. 1047). In a
review article, Tziner
et al. (2005) noted that personality traits ‘likely play a part in
shaping rating behavior’ (p.
94). Furthermore, a number of empirical investigations along
these lines have taken place
(e.g., Bernardin & Orban, 1990; Fried, Levi, Ben-David, Tiegs,
& Avital, 2000; Randall &
Sharples, 2012; Yun, Donahue, Dudley, & McFarland, 2005).
However, despite the
research conducted thus far, there are several issues in this
literature that preclude the
ability to draw meaningful conclusions.
One issue is that primary studies have examined the effects of a
wide variety of rater
personality traits on performance ratings with little coherency
across studies. For
instance, research has examined the effects of rater hostility
(Phillips, 1960), need for
achievement (Kovacs & Kapel, 1976), cognitive complexity
(Bernardin & Orban, 1990;
Schneider, 1977), self-esteem (Guven, 2007; Wexley & Youtz,
1985), and ego concern
(Chambers, 2003), among others. Thus, this literature has
proceeded in a fragmented
fashion and has lacked a systematic framework for organizing
personality traits.
Another issue is that there are many inconsistent findings in this
literature. For
instance, some research indicates a positive relationship
between conscientiousness and
performance ratings (Roch, Ayman, Newhouse, & Harris, 2005),
while other research
indicates a negative relationship (Bernardin, Cooke, &
Villanova, 2000; Bernardin, Tyler,
& Villanova, 2009), and still other research indicates virtually
no relationship (Tziner,
Murphy, & Cleveland, 2002). The same state of affairs is evident
for other personality traits
as well (e.g., extraversion; Bernardin et al., 2000, 2009;
Strauss, Barrick, & Connerley,
2001).
Compounding this issue is that features of the rating context
vary across studies. It is
well recognized that personality traits influence one’s
characteristic manner of respond-
ing to certain situations, rather than a tendency to behave
similarly across situations
(Mischel & Shoda, 1995; Roberts, 2009; Stewart & Barrick, 2004; Tett & Guterman, 2000).
Thus, features of the rating context should influence the effect
of rater personality on
performance ratings and variation in contextual features likely
contributes to the
inconsistent findings that are evident in this literature. As such,
contextual features should
be examined as moderators of the rater personality–performance
rating relationships.
The purpose of this study is to address these issues by
conducting a quantitative
literature review. Meta-analytic methods allow us to assess the
extent to which mixed
findings are due to sampling error as opposed to the presence of
moderators (Hunter &
Schmidt, 2004). Furthermore, our analytic approach allows us to
test the effects of
theoretically relevant features of the rating context
as moderators of the rater personality–
performance rating relationships. Therefore, our approach will
not only clarify the
existing literature, but may also result in a number of novel
conclusions concerning the
conditional influences of rater personality traits on performance
ratings.
To classify the personality traits examined in prior studies
consistent with a well-
accepted taxonomy, we draw upon the Five-Factor Model of
personality (cf. Costa &
McCrae, 1992; Digman, 1990). This approach has been useful
for quantifying general-
izable relationships between personality traits and workplace
outcomes (e.g., Barrick &
Mount, 1991; Judge & Ilies, 2002). Note that we are not
suggesting that traits that fall
outside of the Five-Factor Model are not useful for
understanding rating behaviour.
Indeed, individual differences such as political skill and
emotional intelligence may very
well play a role as determinants of rating behaviour and
outcomes (cf. Ferris et al., 2008).
However, the Five-Factor Model is a useful typology for
organizing the existing
fragmented literature.
In the following paragraphs, we discuss the social context
features of organizations
that are relevant to performance evaluation. Following this, we
review the Five-Factor
Model of personality, discuss the relevance of these traits as
determinants of performance
ratings in the light of the social context of performance
appraisal, and form hypotheses
concerning rater personality–performance rating relationships.
Finally, we discuss
potential moderators of these relationships.
The social context of performance ratings
Research has recognized that the construct validity of
performance ratings might be
problematic, and several approaches to dealing with this issue
have subsequently
emerged (Murphy, 2008). Research first focused on
measurement issues, suggesting that
the construct validity of performance ratings could be improved
by, for example,
developing better rating scales (cf. Austin & Villanova, 1992).
However, this line of
research had limited success in improving performance ratings
(Landy & Farr, 1980).
Following this, research began to focus on the cognitive
processes involved in
performance appraisal, noting that raters were imperfect
information processors and
that this could lead to errors and biases in the rating process and
thus poor construct
validity of performance ratings (Feldman, 1981). However, this
approach also had only
limited success in improving performance ratings in
organizations (Ilgen, Barnes-Farrell,
& McKellin, 1993).
In response to the limited utility of these approaches, research
began to shift focus
towards the performance evaluation context. That is, raters were
perceived as actors in a
rich and complex organizational context that could influence
their motivations such that
accuracy may be only a subsidiary goal (Murphy, 2008; Murphy
& Cleveland, 1995).
By recognizing the rich context in which performance ratings
occur, research has
developed a more clear understanding of the non-performance
factors that influence
performance ratings in organizations (Ferris et al., 2008; Levy
& Williams, 2004).
Murphy and Cleveland (1995) brought light to the role of the
organizational
context in performance evaluation by noting that the rating
context influences
performance judgments, performance ratings, and the evaluation
process. Murphy and
Cleveland’s model distinguished between distal variables (i.e.,
those that influence
performance evaluation indirectly) and proximal variables (i.e.,
those that influence
performance evaluation directly). Distal variables include, for
example, the organiza-
tional structure, the legal climate, and workforce composition.
Proximal variables
include, for example, the rater–ratee relationship, consequences
of performance
ratings, and rater accountability.
Levy and Williams (2004) conducted a cumulative review of the
literature into the role
of the organizational social context on performance ratings. In
doing so, they expanded
upon Murphy and Cleveland’s (1995) typology to build a
framework for summarizing the
existing literature. Similar to Murphy and Cleveland’s
framework, Levy and Williams
distinguished between distal and proximal contextual
determinants of performance
rating behaviour. However, the research also distinguished
between two types of
proximal variables – process proximal variables (i.e., how the
appraisal process is
conducted) and structural proximal variables (i.e., formal design
characteristics). Levy
and Williams’ review highlighted evidence suggesting that
process and structural
proximal variables influenced performance evaluation processes
and rating behaviours.
More recently, Ferris et al. (2008) provided an integrative
framework for understand-
ing the contextual backdrop in organizations as it pertains to
performance evaluation.
Specifically, Ferris et al.’s review noted that the context in which
performance evaluation
occurs encompasses ‘cognitive, social and relationship,
affective and emotional, and
political and relationship context features’ (p. 150). Rather than
acting in isolation, each of
these context features interacts with one another continuously
and dynamically in
shaping the evaluation context in an organization. Cognitive
context refers to the rater’s
cognitive processes involved in observing, encoding, storing,
retrieving, and integrating
observations of a ratee’s performance. Social and relationship
context concerns features
of the rater–ratee dyadic work relationship. Social influence and
politics concerns the
influence of deliberate manipulation by raters and ratees on
performance ratings. Finally,
affect and emotion concerns the role of affective regard for a
ratee (i.e., liking) on
performance ratings. Ferris et al. reviewed evidence suggesting
that each of these
contextual features influenced performance ratings in
organizations (see also Fletcher,
2001; Levy & Williams, 2004).
Personality and performance ratings
Consistent with Roberts (2009), we define personality as
‘relatively enduring patterns of
thoughts, feelings, and behaviors that reflect the tendency to
respond in certain ways
under certain circumstances’ (p. 140). While personality traits
themselves remain
relatively stable within person across appreciable amounts of
time (Roberts &
DelVecchio, 2000), their influence on thoughts, feelings, and
behaviours is shaped by
environmental influences – features of the context in which the
actor is embedded
(Roberts & Jackson, 2008; Roberts, Lejuez, Krueger, Richards,
& Hill, 2014). For example,
Tett and Guterman (2000) discussed the principle of trait
activation. Trait activation
holds that ‘the behavioral expression of a trait requires arousal
of that trait by trait-relevant
situational cues’ (Tett & Guterman, 2000, p. 398). Similarly,
Mischel and Shoda (1995)
characterized personality as ‘a system of mediating processes,
conscious and uncon-
scious, whose interactions are manifested in predictable patterns
of situation-behavior
relations’ (p. 247).
We argue that the contextual features that serve as a backdrop
to performance
evaluation provide a landscape for rater personality traits to
manifest themselves in the
rating process (Tett & Guterman, 2000; Tziner et al., 2005). That
is, in response to certain
contextual features, rater personality traits would influence
their thoughts, feelings, and
rating behaviours, and thus, rater personality traits should
influence performance rating
scores (Tett & Burnett, 2003; Tett & Guterman, 2000). This in
situ theoretical perspective
helps to explain the dynamic interplay between rating context
and rater personality, and
the joint influence of context and personality in the performance
rating process.
Considering this, rater personality traits have the potential to
influence the validity of
performance ratings, which bears on critical decisions that are
guided by performance
rating scores. Examining and understanding these relationships
can ultimately improve
the construct validity of performance ratings by spurring
research into interventions that
can reduce their influence. Thus, a focus on rater personality
will enrich our
understanding of the determinants of performance ratings in
organizations beyond the
influence of the rating context alone (Ferris et al., 2008).
As already noted, the Five-Factor Model of personality (also
referred to as the Big
Five) has been proposed as an integrative framework for
studying individual differences
in personality and is among the most well-accepted taxonomies
of personality in the
literature (Costa & McCrae, 1992; Digman, 1990). According to
this perspective, the
latent structure of personality is hierarchical with five factors at
the highest level
(Goldberg, 1992; Hough & Ones, 2001). The five factors are as
follows: Agreeableness,
extraversion, emotional stability, conscientiousness, and
openness. We review each of
these factors in the following sections and discuss features of
the rating context that are
relevant for the expression of each of the five factors in a
performance evaluation setting.
Note that positive personality–performance rating relationships
are interpreted as
reflecting rating elevation (i.e., higher scores on the personality
trait are associated with
higher, or more elevated, performance ratings), while negative
personality–performance
rating relationships are interpreted as reflecting rating
stringency (i.e., higher scores on
the personality trait are associated with lower, or more
stringent, performance ratings;
Bernardin et al., 2000).
Agreeableness
Agreeableness reflects traits such as trust, altruism, and
cooperation (Costa & McCrae,
1992). Agreeableness is related to a tendency to favour positive
social relationships and to
avoid conflict (Jensen-Campbell & Graziano, 2001).
Considering this, we suggest that the
social and relationship context features as reviewed by Ferris et
al. (2008) provide
opportunities for rater agreeableness to influence performance
ratings. For example,
delivering negative feedback to an employee regarding his or
her performance can have a
negative influence on the rater–ratee work relationship (Murphy
& Cleveland, 1995) and
raters who are high in agreeableness may be motivated to avoid
disturbing this social
relationship. Along these lines, research suggests that raters
may respond to a motivation
to avoid negative exchanges with ratees by inflating
performance rating scores (Shore &
Tashchian, 2002).
Agreeableness is also associated with a tendency to be
sympathetic and concerned
for the welfare of others (Costa & McCrae, 1992). Such
tendencies are relevant for
performance evaluation behaviours. Performance appraisal has
both short- and long-
term consequences for employees. In terms of the former, Ferris
et al. (2008) argued
that poor performance evaluations could negatively influence
ratee affect, which in
turn could result in withdrawal and decreased relationship
satisfaction. In terms of the
latter, poor performance ratings could influence an employee’s
career (e.g., oppor-
tunities for promotions or salary increases; Aguinis, 2013;
Cleveland et al., 1989).
Raters who are high in agreeableness may inflate performance
ratings to protect ratees
from the consequences associated with poor performance
evaluations. Considering
this, we propose that agreeableness is positively related to
performance ratings.
Hypothesis 1: Rater agreeableness is positively related to
performance ratings.
Extraversion
Extraversion reflects traits such as sociability, assertiveness,
and gregariousness and is
associated with a tendency to be friendly and to prefer the
company of others (Costa &
McCrae, 1992). Research suggests that extraversion is
positively related to networking
behaviours in organizations as well as social network size
(Forret & Dougherty, 2001).
Such tendencies are relevant when considering the social and
relationship context
features of performance evaluation (Ferris et al., 2008), as
raters who are high in
extraversion would be likely to form favourable dyadic work
relationships with their
ratees (John, Naumann, & Soto, 2008). Along these lines,
research suggests that rater–
ratee relationship quality is positively related to performance
ratings, beyond the
influence of objective performance levels (Duarte, Goodson, &
Klich, 1994). Thus, as
extraverted raters are likely to form favourable relationships
with their ratees, and
favourable rater–ratee relationships are associated with elevated
performance ratings,
raters who are high in extraversion may be likely to provide
elevated performance ratings.
This line of reasoning suggests a positive rater extraversion–
performance rating
relationship.
Hypothesis 2: Rater extraversion is positively related to
performance ratings.
Emotional stability
Emotional stability reflects traits such as calmness and even-
temperedness (Costa &
McCrae, 1992). In discussing the proposed relationship between
rater emotional
stability and performance ratings, it is useful to consider low
emotional stability –
neuroticism. Neuroticism encompasses traits such as anxiety
and depression (John
et al., 2008) and is associated with a tendency to become easily
angered and frustrated
by others (Costa & McCrae, 1992). Along these lines, research
suggests that neuroticism
inhibits cooperation with others and may result in poor working
relationships (George,
1990). This is relevant for performance rating behaviours, as
neurotic raters may be
likely to have poor relationships with their ratees, which could
result in lower
performance ratings (Duarte et al., 1994). As this perspective
suggests that raters who
are low in emotional stability (i.e., high in neuroticism) should
rate performance low,
we propose that rater emotional stability (i.e., low neuroticism)
is positively related to
performance ratings.
Hypothesis 3: Rater emotional stability is positively related to
performance ratings.
Conscientiousness
Conscientiousness reflects traits such as dependability,
thoroughness, and achievement
orientation (Costa & McCrae, 1992). Ferris et al.’s (2008)
review noted that the
standards against which performance is evaluated could
influence performance ratings
(see also Murphy & Cleveland, 1995). Employees who are high
in conscientiousness
generally display superior job performance as compared to
employees who are lower in
this trait (Barrick & Mount, 1991; Hurtz & Donovan, 2000). As
such, raters who are high
in conscientiousness may have relatively high standards for
performance and may
therefore expect exceptional performance, which could result in
lower performance
ratings. As a result, rater conscientiousness should be
negatively related to performance
ratings.
Hypothesis 4: Rater conscientiousness is negatively related to
performance ratings.
Openness
Openness reflects traits such as imaginativeness, creativity, curiosity, and originality (Costa &
McCrae, 1992). Openness is relevant for performance appraisal
in organizations in the
light of the cognitive context features discussed in Ferris et al.
(2008). Cognitive models
of performance appraisal indicate that rating errors result from
the rater’s finite
information-processing capabilities. As openness is positively
associated with cognitive
functioning (DeYoung, Peterson, & Higgins, 2005), raters who
are high in openness may
be better able to integrate larger numbers of performance
episodes into their
performance judgments (Murphy & Cleveland, 1995). That is,
raters who are high in
openness may be less prone to cognitive biases that influence
performance evaluation
(Feldman, 1981).
Openness is also related to a tendency to form more complex
attributions for the
behaviours of others (Brookings, Zembar, & Hochstetler, 2003).
In their review of the
performance rating context literature, Levy and Williams (2004)
noted that ‘attributional
processing is an important element of the rating process and
these attributions, in part,
determine raters’ reactions and ratings’ (p. 887). Thus, raters
who are high in openness
may react to the performance rating context by deeply
considering the causes and
meaning of a wide range of observed performance episodes.
Considering this, rater
openness may facilitate accurate performance ratings that are
neither elevated nor
suppressed. Thus, there is little reason to predict a rater
openness–performance rating
relationship, and we therefore hypothesize the following:
Hypothesis 5: Rater openness is not related to performance
ratings.
Moderators
Study setting
We have noted that the contextual landscape that characterizes
the social system of
organizations provides the opportunity for personality traits to
influence performance
rating behaviours (Ferris et al., 2008; Levy & Williams, 2004;
Tett & Guterman, 2000).
While laboratory studies may be able to simulate some
naturalistic context features of
performance appraisal, it is likely that the full spectrum of
contextual features and
their dynamic interactions (cf. Ferris et al., 2008) would not be
well represented. To
the extent that rater personality–performance rating
relationships depend on such
features, we would expect the magnitude of these effects to be
larger in field studies
as compared to laboratory studies. For example, if a positive
rater agreeableness–
performance rating relationship occurs in response to
interpersonal consequences
associated with poor performance ratings (Jensen-Campbell &
Graziano, 2001;
Murphy & Cleveland, 1995), this effect may be reduced in
laboratory settings where
there is no continued interaction among participants and
therefore scant possibility
for long-term interpersonal consequences (see Bell, 2007 for an
application of this
logic to research involving personality and team performance).
As a result, we
propose that study setting will moderate the rater personality–
performance rating
relationships such that the effects are stronger in field studies as
compared to
laboratory studies.
Hypothesis 6: The rater personality–performance rating
relationships are moderated
by study setting such that the relationships are stronger in field
settings
and weaker in laboratory settings.
Situational strength
Beyond the relevance of social context features for personality
trait expression in
performance appraisal, it is also important to note that features
of the situation can act to
inhibit the effect of traits, regardless of the relevance of other
contextual features for trait
expression (Tett & Burnett, 2003). Situational strength refers to
the extent to which
demands placed on individuals by a situation act to constrain
the influence of one’s
personality on their behaviour (Cooper & Withey, 2009; Mischel, 1977). A strong situation
1977). A strong situation
is one in which situational demand characteristics pose pressure
to behave in a certain
way, inducing conformity and reducing the influence of
personality traits on behaviours
(Mischel, 1973). On the other hand, a weak situation is one in
which demand
characteristics posed by the environment are low, allowing the
individual to determine
their own course of action in a given situation. Thus, weak
situations better allow for
personality traits to influence behaviour. A number of features
of a performance appraisal
context may influence situational strength, and we focus on
two: Appraisal purpose and
accountability.
Appraisal purpose. Broadly speaking, research has considered
two classes of purposes
for which performance ratings are collected: Administrative
purposes (e.g., promotion,
raises) and research/developmental purposes (e.g., feedback,
training; Cleveland et al.,
1989; Levy & Williams, 2004). When rating performance for
administrative purposes,
raters are making decisions that will potentially have a vast
impact on the ratee’s career.
The results of the performance appraisal may be permanently
included in the ratee’s
records and could influence their chances at receiving a
promotion or raise and may
contribute towards their termination. Thus, when providing
administrative ratings,
raters face strong situational pressures to, for example, inflate
or distort performance
ratings in some way (Jawahar & Williams, 1997). Considering
this, a situation in which a
rater is evaluating performance for administrative purposes
could be characterized as
strong. In such a situation, the rater personality–performance
rating relationships may
be reduced. On the other hand, when ratings are collected for
research or develop-
mental purposes, many of the aforementioned situational
pressures are alleviated. Thus,
performance appraisals conducted under research or
developmental conditions
represent a weak situation, allowing rater personality traits to
influence performance
ratings.
Hypothesis 7: The rater personality–performance rating
relationships are moderated
by appraisal purpose such that the relationships are attenuated
when
ratings are collected for administrative purposes and
strengthened when
ratings are collected for research or developmental purposes.
Accountability. Accountability refers to the extent to which
raters are held responsible
for their ratings of an employee (Levy & Williams, 2004). For
instance, when raters must
justify their ratings to others, accountability would be high
(Wherry, 1952). Research
suggests that holding raters accountable for their ratings
represents a strong situation that
influences performance rating outcomes (Klimoski & Inks,
1990; Mero & Motowidlo,
1995). Indeed, as raters are pressed to explain their performance
ratings, they face
stronger situational pressures to rate performance accurately. As
a result, high
accountability could be characterized as representing a strong
situation that reduces
the influence of rater traits on performance ratings. On the other
hand, when
accountability is low, raters are free from situational pressures
that necessitate accurate
ratings. Therefore, we consider low accountability as reflecting
a weak situation that
would allow for rater personality traits to drive rating
behaviour. Thus, we suggest that the
rater personality–performance rating relationships will be
attenuated when accountabil-
ity is high and strengthened when accountability is low.
Hypothesis 8: The rater personality–performance rating
relationships are moderated
by accountability such that the relationships are attenuated
when
accountability is high and strengthened when accountability is
low.
In summary, the purpose of this study is to use meta-analytic
methods to assess the
relationships between rater personality traits and performance
ratings. In doing so, we
invoke the Five-Factor Model of personality as an organizing
framework for classifying
personality traits examined in the existing literature. We also
examine the influence of a
number of theoretically relevant moderators on these
relationships, including study
setting, appraisal purpose, and accountability.
Method
Literature search
To identify relevant studies, we first conducted a literature
search using ProQuest,
PsycINFO, Google Scholar, and ABI-Inform. We searched for
articles that included any
of the following: Rater individual differences or rater
personality. We also searched
these databases for articles that contained various combinations
of rater, rating,
individual differences, Big Five, five-factor model, personality,
openness, conscien-
tiousness, extraversion, agreeableness, neuroticism, and
emotional stability.
Furthermore, we identified additional studies by reviewing the
reference sections of all
relevant articles returned by the literature search. This process
yielded a total of 304
articles.
Criteria for inclusion
To determine which of these 304 articles would be included in our analyses, we employed
the following decision rules. First, the dependent variable in the
study had to be a rating of
another person’s performance, including performance on the
job, work samples or
simulations (either in a field or laboratory setting), and
vignettes describing a hypothetical
employee’s performance. Studies that reported ratings of traits,
clinical assessments,
problematic behaviour, leadership style, and other non-
performance variables were
excluded, as were any studies that examined self-ratings of
performance. Second, the
study had to include as an independent variable a personality
measure that could be
classified as openness, conscientiousness, extraversion,
agreeableness, or emotional
stability. Third, the study had to report a zero-order correlation
between the personality
variable and mean performance ratings across ratees or enough
statistical information so
that it could be computed. A total of 21 studies (and 28
independent samples) met these inclusion criteria and were included in our analyses.
Coding scheme
Prior to coding, the first and third authors independently
classified each of the personality
measures used in each study as an indicator of openness,
conscientiousness, extraversion,
agreeableness, or emotional stability. Many studies included
explicit measures of the Big
Five personality factors (e.g., the NEO-PI-R; Costa & McCrae, 1992), and in these cases, no
judgment calls were made on the part of the authors. Where explicit measures of the Five-
Factor Model were not used, the authors consulted the taxons proposed by Hough and
Ones (2001). The Hough and Ones taxons were developed to guide meta-analytic research
that involves personality traits by explicating the Big Five
personality factor assessed by a
variety of commercial personality scales. Since the introduction
of the taxons, they have
been drawn upon in a number of meta-analyses involving
personality (e.g., Dudley, Orvis,
Lebiecki, & Cortina, 2006; Foldes, Duehr, & Ones, 2008).
When a measure was not
explicitly developed to assess one of the Big Five factors and
was not included in the
Hough and Ones taxons, the two authors (1) reviewed the scale
development papers and
scale items to determine what (if any) Big Five factor the
measure was assessing and (2)
searched the literature for statistical evidence that supported a
classification of the
measure within the Big Five framework (e.g., factor analytic
studies that included the
measure and established measures of the Big Five, correlations
between the measure and
established measures of the Big Five). Agreement between the
two authors was initially
94%, and the discrepancies were resolved between the two
authors and the second
author. Ultimately, 100% agreement for all variables was
reached.
Next, the first and third authors independently coded each study
for sample size, effect
size, measurement reliability, and moderators. Measurement
reliability for both the
predictor and criterion measures was operationalized as internal
consistency (i.e.,
coefficient alpha). Where studies reported effect sizes between
multiple measures of the
same predictor and criterion, composites were formed according to the methods outlined
by Hunter and Schmidt (2004). If the information needed to
compute composites was
unavailable, we computed composites by averaging the effect
sizes. We computed
composite reliabilities using the Spearman–Brown formula or,
where the needed
information was not available, by averaging the coefficient
alpha reliability estimates
reported for each measure.
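Where it helps to see the arithmetic, the following minimal Python sketch shows one common way of applying the Spearman–Brown logic to a unit-weighted composite, assuming m roughly parallel components with a known mean intercorrelation; the function name and the example values are hypothetical and are not taken from the studies coded here.

def spearman_brown_composite(mean_intercorr, m):
    # Reliability of a unit-weighted composite of m standardized components
    # whose average intercorrelation is mean_intercorr (equivalent to
    # standardized coefficient alpha). Hypothetical illustration only.
    return m * mean_intercorr / (1 + (m - 1) * mean_intercorr)

# e.g., three measures of the same trait correlating .55 on average:
print(round(spearman_brown_composite(0.55, 3), 3))  # ≈ 0.786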
In terms of the moderators, performance appraisal purpose and
accountability
were coded only if the primary study was explicit about these
features. Agreement
between the two authors was initially 97%, and all
discrepancies were resolved between
the two authors and the second author.1
Analyses
We conducted the meta-analysis according to the procedures
outlined by Hunter and
Schmidt (2004). This method allowed us to estimate construct-
level relationships by
correcting observed relationships for statistical artefacts,
specifically sampling error and
measurement reliability. First, we calculated the sample size-
weighted mean correlation
for performance ratings with each of the personality traits. We
then estimated the
population correlation (i.e., ρ) by correcting each mean
correlation for measurement
reliability in both the predictor and criterion using artefact
distributions, as reliability
information was only sporadically available. Direct range
restriction was not evident in
any studies, and therefore, we made no such corrections.
Finally, we estimated 95%
confidence intervals (using the methods outlined by
Viswesvaran, Schmidt, & Ones,
2002) and 80% credibility intervals around each estimate of ρ,
as well as the percentage
variance accounted for by statistical artefacts (i.e., sampling
error and unreliability in the
predictor and criterion measures) for each estimate of ρ. To
evaluate the presence of
moderators, we employed the 75% rule (i.e., a moderator is
likely to be present when the
percentage variance accounted for by statistical artefacts is
<75%; Hunter & Schmidt,
2004). The same analytic procedure was repeated for each level
of each moderator in the
moderator analyses.
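As a rough illustration of these steps, the Python sketch below implements a simplified artefact-distribution version of the procedure (sample-size-weighted mean r, expected sampling-error variance, the "% Var" quantity used for the 75% rule, a reliability-corrected ρ, and an 80% credibility interval). The function and the input correlations, sample sizes, and mean reliabilities are illustrative assumptions, not the study's coded data, and the full Hunter and Schmidt procedure involves additional refinements.

import numpy as np

def bare_bones_meta(rs, ns, mean_rxx, mean_ryy):
    # Simplified Hunter & Schmidt-style meta-analysis of correlations.
    # rs: observed correlations; ns: sample sizes;
    # mean_rxx / mean_ryy: mean predictor / criterion reliabilities
    # (artefact-distribution means). Illustrative sketch only.
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    r_bar = np.sum(ns * rs) / np.sum(ns)                    # sample-size-weighted mean r
    var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)   # observed variance of r
    var_err = (1 - r_bar ** 2) ** 2 / (ns.mean() - 1)       # expected sampling-error variance
    pct_artefact = 100 * min(var_err / var_obs, 1.0)        # "% Var" used for the 75% rule
    a = np.sqrt(mean_rxx * mean_ryy)                        # attenuation factor
    rho = r_bar / a                                         # corrected mean correlation
    sd_rho = np.sqrt(max(var_obs - var_err, 0.0)) / a       # SD of rho
    cred_80 = (rho - 1.28 * sd_rho, rho + 1.28 * sd_rho)    # 80% credibility interval
    return r_bar, rho, sd_rho, pct_artefact, cred_80

# Hypothetical effect sizes, not those coded in this study:
print(bare_bones_meta(rs=[.26, .14, .22, .18], ns=[220, 180, 310, 150],
                      mean_rxx=.80, mean_ryy=.85))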
While the moderators examined in our study were conceptually
distinct, it is
important to establish that they are empirically distinct as well. Therefore, we assessed the
relationships between the moderators using the phi coefficient.
The phi coefficient is
used to test the relationship between variables in a two-by-two
contingency table and can
be interpreted by the same standards as a correlation
coefficient. The study setting–
accountability and appraisal purpose–accountability
relationships were minimal (Φ = .22
and .18, respectively). The study setting–appraisal purpose
relationship was stronger
(Φ = .40), but still did not suggest that these two moderators
were redundant. Thus, we
concluded that the moderators examined here were sufficiently distinct to merit inclusion
in our analyses.
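For concreteness, a phi coefficient for two dichotomous moderators can be computed from a 2 × 2 table of study counts, as in the brief sketch below; the counts shown are hypothetical rather than the study's actual cross-classification.

import numpy as np

def phi_coefficient(table_2x2):
    # Phi coefficient for a 2x2 contingency table [[a, b], [c, d]];
    # interpretable on the same scale as a Pearson correlation.
    (a, b), (c, d) = np.asarray(table_2x2, float)
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical cross-classification of samples: study setting (field/lab)
# by accountability (high/low):
print(round(phi_coefficient([[6, 2], [3, 5]]), 2))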
An important consideration for our moderator analyses is the
minimum number of
studies that must be included in any given analysis to provide
informative results. For
instance, several meta-analytic reviews in the organizational
sciences have required k = 3
effect sizes to proceed with analyses (e.g., Eby, Allen, Evans,
Ng, & DuBois, 2008;
Viswesvaran et al., 2002). However, Valentine, Pigott, and
Rothstein (2010) noted that
meta-analytic methods can be applied effectively to as few as
two effect sizes and that this
approach is superior to other means by which findings from a
small number of studies can
be interpreted (e.g., so-called cognitive algebra whereby one
tries to mentally integrate
findings across studies). Therefore, we determined that we would proceed with our
moderator analyses so long as at least two effect sizes were available (however, all of our
analyses involved at least k = 3 effect sizes).
1 The complete code sheet is available upon request from the first author.
While estimating bivariate rater personality–performance rating
relationships is
informative, we also used our bivariate meta-analytic estimates
to assess the multiple
correlation of the Five-Factor Model of personality as a set on
performance ratings.
Doing so required us to build a correlation matrix involving the
relationships estimated in
this study as well as the intercorrelations among the Big Five
factors of personality.
Consistent with other meta-analyses (e.g., Munyon, Summers,
Thompson, & Ferris, 2014;
Ones, Dilchert, Viswesvaran, & Judge, 2007), we used the
estimates derived by Ones
(1993; also reported in Ones, Viswesvaran, & Reiss, 1996) for
these analyses. Consistent
with recommendations (Viswesvaran & Ones, 1995), we used
the harmonic mean across
cells as the sample size in our analyses. We conducted these
analyses by regressing
performance ratings onto the five personality factors. We
repeated these analyses for each
level of each moderator.
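The sketch below illustrates this step: standardized regression weights and the model R² are obtained directly from a correlation matrix, with the harmonic mean of the per-cell sample sizes serving as the analysis N. The Big Five intercorrelations, trait–rating correlations, and cell sizes shown are hypothetical placeholders (the study drew its intercorrelations from Ones, 1993), so the output is only an illustration of the approach.

import numpy as np

# Hypothetical Big Five intercorrelations (rows/columns: A, E, ES, C, O):
R_xx = np.array([
    [1.00, 0.25, 0.25, 0.27, 0.16],
    [0.25, 1.00, 0.19, 0.17, 0.22],
    [0.25, 0.19, 1.00, 0.25, 0.16],
    [0.27, 0.17, 0.25, 1.00, 0.14],
    [0.16, 0.22, 0.16, 0.14, 1.00]])

# Illustrative meta-analytic trait-rating correlations (same trait order):
r_xy = np.array([0.25, 0.12, 0.12, -0.08, 0.02])

beta = np.linalg.solve(R_xx, r_xy)   # standardized regression weights
R2 = float(r_xy @ beta)              # variance in ratings explained by the set

# Harmonic mean of (hypothetical) per-cell sample sizes, used as the analysis N:
cell_ns = np.array([1899.0, 1500.0, 1700.0, 2100.0, 1300.0])
n_harmonic = len(cell_ns) / np.sum(1.0 / cell_ns)

print(np.round(beta, 3), round(R2, 3), round(n_harmonic, 1))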
Note that when the predictors included in a regression model
are correlated, the
relative contribution of each predictor to the model R2 cannot
be accurately determined
by examining the beta weights alone (LeBreton, Ployhart, &
Ladd, 2004). Therefore, we
conducted relative importance analyses to determine the relative
contribution of each of
the Big Five factors to the prediction of performance ratings
(Tonidandel & LeBreton,
2011). This approach has recently been adopted in meta-
analyses to provide a more
nuanced perspective of how predictors in a regression model
operate in concert with one
another in influencing the criterion (e.g., Munyon et al., 2014;
O’Boyle, Humphrey,
Pollack, Hawver, & Story, 2010). We conducted these analyses
using a relative weight
analysis framework (Johnson, 2000) and repeated these analyses
for each level of
each moderator examined. Relative weight analysis produces
two types of coefficients –
relative weights and rescaled relative weights. The former
reflects the proportion of
variance in the outcome (i.e., performance ratings) that is
attributed to each of the
predictor variables, while the latter reflects the percentage of
predicted variance that is
accounted for by each predictor variable (calculated by dividing
the relative weights by
the model R2; LeBreton, Hargis, Griepentrog, Oswald, &
Ployhart, 2007).
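A compact sketch of Johnson's (2000) relative weight computation from correlation-level input is given below; the three-predictor matrix and criterion correlations are hypothetical, so the resulting raw and rescaled weights are illustrative and do not reproduce the values reported in Table 3.

import numpy as np

def relative_weights(R_xx, r_xy):
    # Johnson's (2000) relative weight analysis computed from correlations.
    # Returns raw weights (which sum to the model R^2) and rescaled weights
    # (each predictor's share of the predicted variance, in %).
    vals, vecs = np.linalg.eigh(R_xx)
    Lam = vecs @ np.diag(np.sqrt(vals)) @ vecs.T   # symmetric square root of R_xx
    beta_z = np.linalg.solve(Lam, r_xy)            # regression of y on orthogonalized predictors
    raw = (Lam ** 2) @ (beta_z ** 2)               # raw relative weights
    return raw, 100 * raw / raw.sum()

# Hypothetical three-predictor example (e.g., agreeableness, extraversion,
# emotional stability); values are illustrative, not the study's estimates.
R_xx = np.array([[1.00, 0.25, 0.25],
                 [0.25, 1.00, 0.19],
                 [0.25, 0.19, 1.00]])
r_xy = np.array([0.25, 0.12, 0.12])
raw, rescaled = relative_weights(R_xx, r_xy)
print(np.round(raw, 4), np.round(rescaled, 1), round(raw.sum(), 4))  # raw weights sum to R^2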
Results
The bivariate meta-analytic results are reported in Table 1, the
results of our multiple
regression analyses are reported in Table 2, and the results of
our relative weight analyses
are reported in Table 3. The results of our overall multiple
regression analysis indicated
that the Big Five personality factors accounted for 8% of the
variance in performance
ratings. Hypothesis 1 predicted a positive relationship between
rater agreeableness and
performance ratings. The results of our bivariate analysis
indicated a corrected correlation
of ρ = .25. Furthermore, the results of our multiple regression
analysis indicated a
statistically significant beta weight of .22. Finally, the results of our relative weight analysis
indicated that agreeableness accounted for 65.9% of the
variance in performance ratings
explained by the Big Five personality factors. Cumulatively,
these findings support
hypothesis 1.
Hypothesis 2 predicted a positive rater extraversion–
performance rating relationship.
The results of our bivariate analysis indicated a corrected rater
extraversion–performance
rating correlation of ρ = .12. Furthermore, the results of our
multiple regression analysis
indicated a statistically significant beta weight of .08, and the
results of our relative weight
Table 1. Meta-analysis of rater personality–performance rating correlations

Moderator                    k      N      r     ρ   SDρ   % Var    CVL    CVU    CIL    CIU
Agreeableness
  Overall                   12  1,899    .20   .25   .17   25.65    .03    .46    .15    .35
  Study setting
    Field                    6    874    .26   .33   .00     100    .33    .33    .33    .33
    Laboratory               6  1,025    .14   .17   .21   17.05   −.09    .44    .00    .34
  Appraisal purpose
    Administrative           4    378    .28   .36   .00     100    .36    .36    .36    .36
    Research/developmental   4    582    .18   .22   .23   15.93   −.08    .52   −.01    .45
  Accountability
    Low                      5    864    .22   .27   .09   50.78    .15    .39    .19    .35
    High                     5    792    .16   .20   .24   14.65   −.10    .50   −.01    .41
  Publication status
    Published                9  1,015    .22   .27   .17   30.85    .05    .49    .16    .38
    Unpublished              3    884    .17   .22   .15   17.91    .02    .42    .05    .39
Extraversion

Note. k = number of independent samples; N = total sample size; r = sample-size-weighted mean observed correlation; ρ = correlation corrected for unreliability; SDρ = standard deviation of ρ; % Var = percentage of observed variance attributable to statistical artefacts; CVL/CVU = lower/upper bounds of the 80% credibility interval; CIL/CIU = lower/upper bounds of the 95% confidence interval.
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx
Rater Issues in Performance ManagementMichael RosePsychology.docx


  • 1. Rater Issues in Performance Management Michael Rose Psychology 601 Overview Possible sources of performance information Rater Motivation Rater Training Programs Case Study 6-4 Frame-of-Reference training Article 1 Article 2 Article 3 Summary Possible Sources of Performance Information (Raters) Possible Sources Supervisors Peers Subordinates Self Customers Disagreements among raters Not necessarily a problem Behavioral indicators may vary across sources. Important to define the target behavior clearly for all raters. If disagreements are found, the importance of each source must
  • 2. be determined. Rater Error Motivation Raters may intentionally or unintentionally distort ratings. Raters may be motivated to inflate or deflate ratings. Motivation to provide accurate ratings. Rater expects certain positive or negative consequences. Probability of receiving rewards will be high if they provide accurate ratings. Motivation to distort ratings. Rater expects certain positive or negative consequences. Probability of receiving rewards will be high if they distort ratings. Motivations to Inflate or Deflate Ratings Motivations for inflated ratings Motivations for deflated Ratings Maximize the merit raise/rewards Encourage Employees Avoid creating a written record Avoid confrontation with employees Promote undesired employees out of unit Make the manager look good to his/her supervisor Shock an employee Teach a rebellious employee a lesson Send a message to the employee that he/she should consider leaving Build written record of employees poor performance
  • 3. Preventing Conscious Distortion Convince raters that they have more to gain by providing accurate ratings. Increase accountability. Have raters justify their ratings Have raters justify their ratings face-to-face Provide rater training Rater Training programs May cover the following topics: Reasons for implementing the performance management system. How to identify and rank job activities. How to observe, record, and measure performance. Information on the appraisal form and system mechanics. How to minimize rating errors. How to conduct an appraisal interview. How to train, counsel, and coach. Case Study 6-4 Provide a detailed discussion of the intentional and unintentional rating distortion factors that may come into play in this situation. Evaluate the kinds of training programs that could minimize the factors you have described. What do you recommend and why? Frame-of-Reference Training
  • 4. Improves rater accuracy by familiarizing raters with the performance dimensions to be assessed. Typically involves: Discussion of the job description for the individual being rated. Review of the definition for each dimension to be rated. Discussion of examples of good, average, and poor performance. Trainees rate fictitious employees. Trainees informed of correct ratings for each dimension. Article 1 Ratings of counterproductive performance: the effect of source and rater behavior. Mann, S. L., Budworth, M., & Ismaila, A. S., (2012) Purpose To examine whether there is inter-rater agreement on counterproductive performance between self and peer-ratings. To examine factors that moderate inter-rater agreement. Factors examined include: self reported levels of counterproductive behaviors, conscientiousness, and integrity. Hypotheses Hypothesis 1: Peer-ratings of counterproductive performance are significantly higher than self-ratings of counterproductive performance. Hypothesis 2 : Conscientiousness moderates the relationship between rating source and rater agreement such that individuals with similar levels of conscientiousness demonstrate agreement
for self- and peer-ratings of counterproductive behaviors.
Hypothesis 3: Values toward integrity moderate the relationship between rating source and rater agreement, such that individuals with similar levels of integrity demonstrate agreement between self- and peer-ratings of counterproductive behavior.
Hypothesis 4: Individuals who exhibit similar levels of counterproductive performance, as rated by their peers, demonstrate agreement between self- and peer-ratings of counterproductive behaviors.
Results
Hypothesis 1: Supported. Peer-ratings (M = 2.1) were significantly higher than self-ratings (M = 1.4).
Hypothesis 2: Not supported. Conscientiousness was not a significant moderator of the relationship between rating source and rater agreement.
Hypothesis 3: Not supported. Integrity was not a significant moderator of the relationship between rating source and rater agreement.
Hypothesis 4: Supported. Individuals who exhibit similar levels of counterproductive performance, as rated by their peers, are more likely to agree on ratings of counterproductive performance. Estimated effect = 0.39 (p < .001).
Practical Implications
Individual differences between the rater and the person being rated may have a significant impact in organizational settings.
Provides support for 360-degree feedback on counterproductive performance, as sources were shown to provide unique feedback.
Understanding peer ratings is important due to the increasing number of teams in the workplace.
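The paired comparison and moderation tests summarized above can be illustrated with a short analysis sketch. Everything below is hypothetical: the data, sample size, and column names are invented for illustration, since Mann, Budworth, and Ismaila (2012) do not publish their dataset or code; this only shows the generic form such tests usually take.

```python
# Illustrative only: hypothetical data and column names, not the article's dataset or analysis code.
import pandas as pd
from scipy.stats import ttest_rel
import statsmodels.formula.api as smf

ratings = pd.DataFrame({
    "self_rating":       [1.2, 1.5, 1.1, 2.0, 1.4, 1.6, 1.3, 1.8],
    "peer_rating":       [2.0, 2.3, 1.8, 2.6, 2.1, 2.2, 1.9, 2.5],
    "conscientiousness": [3.9, 3.1, 4.2, 2.8, 3.5, 3.3, 4.0, 2.9],
})

# Hypothesis 1 analogue: are peer-ratings higher than self-ratings on average?
t_stat, p_value = ttest_rel(ratings["peer_rating"], ratings["self_rating"])
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")

# Hypothesis 2 analogue: a moderator is usually tested as an interaction term;
# a significant self_rating:conscientiousness coefficient would indicate moderation.
model = smf.ols("peer_rating ~ self_rating * conscientiousness", data=ratings).fit()
print(model.params)
```

In the study itself the interaction-style moderation tests for conscientiousness and integrity came out non-significant, which is why Hypotheses 2 and 3 are reported as not supported.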
Article 2
Rater personality and performance dimension weighting in making overall performance judgments.
Ogunfowora, B., Bourdage, J., & Lee, K. (2010)
Purpose
Examined the effects of rater personality on the performance appraisal process.
Specifically, investigated the influence of rater personality (honesty-humility and openness) on the relative weights raters placed on different performance dimensions.
Hypotheses
Hypothesis 1: Rater honesty-humility will positively relate to the weights placed on items associated with maintaining personal discipline.
Hypothesis 2: Rater openness will positively relate to the weights placed on items associated with adaptive performance.
Results
Hypothesis 1: Not directly supported; higher levels of honesty-humility did not relate to increased weights on personal discipline (p > .05).
Modesty, however, was positively related to personal discipline (p < .01).
Hypothesis 2: Supported. Raters higher in openness weighted adaptive performance significantly higher than those lower in openness (p < .01).
Practical Implications
Indicates that organizations must communicate a standard theory of performance to their employees.
Organizations must account for systematic differences among raters (e.g., supervisors are likely to differ systematically from other sources in openness, so expect systematically different ratings).
Supports the use of frame-of-reference training.
Article 3
Rater training revisited: An updated meta-analytic review of frame-of-reference training.
Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U. (2012)
Purpose
To demonstrate that not all measures of accuracy are equally improved by frame-of-reference (FOR) training.
To investigate how much FOR training protocols differ.
Findings/Implications
FOR training does not impact all measures of accuracy equally.
FOR training is best for training raters to recognize patterns of performance; it therefore improved raters' ability to rank-order the employees they were rating.
Provides support for FOR training as an effective rater training method.
Summary
Presented possible sources of performance information.
Identified various rater motivations.
Made suggestions on how to overcome intentional or unintentional distortions.
Completed Case Study 6-4.
Introduced frame-of-reference training.
Article 1: Differences between sources.
Article 2: Supported a standard theory of performance (FOR training).
Article 3: Identified the situations most impacted by FOR training.
References
Aguinis, H. (2013). Performance management. Indiana: Pearson.
Mann, S. L., Budworth, M., & Ismaila, A. S. (2012). Ratings of counterproductive performance: The effect of source and rater behavior. International Journal of Productivity and Performance Management, 61, 142-156.
Ogunfowora, B., Bourdage, J., & Lee, K. (2010). Rater personality and performance dimension weighting in making overall performance judgments. Journal of Business and Psychology, 25, 465-476.
Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U. (2012). Rater training revisited: An updated meta-analytic review of frame-of-reference training. Journal of Occupational and Organizational Psychology, 85, 370-395.

OL 324: Milestone Two Guidelines and Checklist
Prompt: Your Milestone Two submission will be a detailed outline that must incorporate each critical element of the final project. Use the checklist below as a guideline for each critical element that must be covered in this outline. For the purpose of this milestone, copy and paste the critical elements into an outline and add one to two bullet items per critical element. Each bullet item should serve as the basis of your information for that critical element. The intent of this outline is to ensure that you are on the right track for your final project. A comprehensive outline will provide a solid foundation as you develop and complete your final project. This assignment will be graded pass/fail. You must address all of the critical elements to receive credit for this assignment.
Requirements of Submission: Written components of this project must follow these guidelines: double spacing, 12-point Times New Roman font, one-inch margins, and APA-style citations.
Research: Include at least two of the four required sources of research that you will incorporate into your final project. The source of the research is all that is required for this outline.
Instructor Feedback: Students can find their feedback in the Grade Center.
Checklist for Outline
For each critical element, note whether it is covered, what (if anything) is missing, and the instructor feedback:
Company Background and History
Description of Quality Issue
Quality Culture
Voice of the Customer
Change Management Plan
Quality Theories
Quality Tools and Techniques
Implementing Change
Resistance to Change
Expected Outcomes
Research

Rater Issues in Performance Management
Vanessa Beckles
Psychology 601
Performance Assessment
Overview
Who Should Provide Performance Information
Rater Error Motivation
Rater Motivation to Inflate and Deflate Ratings
Prevent Rater Distortion
Reasons for Rater Training Programs
Rater Error Training
Case Study
Criteria for Evaluating Errors
Article 1
Article 2
Article 3
Summary
Conclusion
Who Should Provide Performance Information
Supervisors
Peers
Subordinates
Self
Customers
Handling disagreement across sources
Ratings may not be similar due to different levels of engagement with the employee.
Rater Error Motivation
Performance ratings may be intentionally or unintentionally distorted or inaccurate.
Rater behavior is influenced by:
The motivation to provide accurate ratings
The rater expects positive or negative consequences.
The probability of receiving these rewards and punishments will be high if accurate ratings are provided.
The motivation to distort ratings
The rater expects positive or negative consequences.
The probability of experiencing such consequences will be high if ratings are indeed distorted.
Rater Motivation to Inflate and Deflate Ratings
Reasons supervisors may inflate ratings:
Maximize the merit raise/rewards
Encourage employees
Avoid creating a written record
Avoid confrontation with employees or jeopardizing the relationship
Uncomfortable talking about weaknesses
Make the manager look good to his/her supervisor
Reasons to provide artificially deflated ratings:
Shock an employee
Teach a rebellious employee a lesson
Responsible for too many people to evaluate them accurately
Build a strongly documented, written record of poor performance
Failure to remember accurate performance behaviors
Prevent Rater Distortion
How to prevent conscious distortion of ratings:
Provide incentives so raters have more to gain than to lose by giving accurate ratings
Accountability in the rating system
Overly lenient appraisals are challenged
Raters justify their ratings
Raters justify their ratings in a face-to-face meeting
Training programs
Improve skills in evaluating performance
Reasons for Rater Training Programs
Training programs to address intentional and unintentional rater errors in appraisals should include:
Reasons for implementing the PM system
How to identify and rank job activities
How to observe, record, and measure performance
How to minimize rating errors
How to conduct an appraisal interview
How to train, counsel, and coach
Rater Error Training
Primary Purpose
Conscious awareness of possible errors; discriminating among raters (differential elevation)
Content of Training
Focus on leniency, halo, and range restriction
Liability of Approach
May encourage inappropriate response sets; there may be a "rating effect"; training can reduce the rating effect but may also lower accuracy
Primary Value
Used to make decisions that require distinguishing accurately among employees
Rater error training (RET) is most useful when the primary goal is to distinguish accurately among workers.
London, M., Mone, E. M., & Scott, J. C. (2004)
Case Study: Our Civil Service
At the State Employment Service, a number of employment counselors were hired together during a special recruiting effort 12 years ago, in 2000. They formed a cohort, went through training together, and received graduate hours in vocational counseling together. About a year ago, Jane Midland, the first member of the cohort to get promoted, tested into a supervisory position at one of the Job Service Centers. Two of the eleven employees who report to her are members of the 2000 cohort. Barb Rick and George Malloy deeply respect her abilities and have a strong affection for her. In fact, Barb Rick has spent time at Jane's home watching their children play together and helping with the remodel of Jane's house. George, Jane, and Barb get together for lunch regularly. Recently, they have considered attending evening classes together to get a master's degree in Human Resource Management. Yesterday, Jane received a memo from management reminding her that it is time to complete the annual appraisal forms for her staff.
Discuss the factors that may cause Jane to intentionally and unintentionally distort her ratings of Barb and George.
Evaluate the kinds of training programs that could help minimize the factors you have described. What do you recommend and why?
Aguinis (2013)
Criteria for Evaluating Errors
Rater error criteria: psychometric, indirect measures; the most common.
Rating accuracy criteria: direct measures; rare, and usually used in a laboratory.
Murphy, K. R., & Cleveland, J. N. (1995)
Article One
Explaining the Weak Relationship Between Job Performance and Ratings of Job Performance
Murphy, K. R. (2008)
Overview
Ways of improving performance appraisals
Researchers and practitioners regard performance ratings as the Rodney Dangerfield of HRM: "Rarely do they get much respect."
Some argue that they should be banned entirely.
The survival of performance appraisals is primarily due to limited alternatives.
Three general models of the relationship between job performance and performance ratings
Improving Performance Appraisals
Researcher/practitioner approaches:
Behavioral anchors
Identifying specific types of rating errors (leniency error, halo error)
Rater training: focus on improving the quality of ratings versus avoiding specific errors
360-degree evaluations: peers, supervisor, subordinates, and clients
Organizational strategies:
Forced distribution systems: identify the weakest performers; called "rank and yank"
Group and discussion review systems: require raters to compare, discuss, and justify evaluations
Both help calibrate raters and discourage unrealistically lenient or harsh ratings.
Raters are vulnerable to social influences.
Three General Models
Numerous models of the performance-performance rating relationship exist in organizations:
One-factor models
Multi-factor models
Mediated models
One-Factor Models
Most popular.
Offer few explanations for improving raters' limited ability to evaluate subordinates.
Multi-Factor Models
Useful starting point for improving appraisals.
The weak relationship between performance and ratings is not entirely the fault of the rater.
Situational constraints are a factor.
Influence of nonperformance information.
Mediated Models
Engage raters as willing and motivated partners.
Article Two
The Impact of Non-Performance Information on Ratings of Job Performance: A Policy-Capturing Approach
Spence, J. R., & Keeping, L. M. (2010)
Overview
Policy-capturing approach to performance appraisals
Background
Motives for rating distortion
Personality
Policy-Capturing Approach
"Scenario based research methodology that allows researchers to determine how individuals utilize various pieces of information to arrive at judgments" (Spence & Keeping, 2010).
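Policy capturing is usually analyzed by regressing each rater's overall judgments on the cues manipulated across scenarios, so the regression weights recover how heavily each piece of information (including non-performance information) was used. The sketch below is purely illustrative: the cue names and values are hypothetical and are not taken from Spence and Keeping's (2010) materials.

```python
# Illustrative sketch of a policy-capturing analysis; cue names and data are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Each row is one scenario (a ratee profile) judged by the same rater.
scenarios = pd.DataFrame({
    "task_performance": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "punctuality":      [1, 5, 2, 4, 3, 1, 5, 2, 4, 3],
    "likeability":      [3, 2, 5, 1, 4, 5, 2, 3, 1, 4],
    "overall_rating":   [2, 2, 3, 3, 3, 3, 4, 4, 5, 5],
})

# Standardized regression weights estimate the rater's "policy":
# how much each cue, performance or not, drives the overall judgment.
zscored = (scenarios - scenarios.mean()) / scenarios.std()
policy = smf.ols(
    "overall_rating ~ task_performance + punctuality + likeability",
    data=zscored,
).fit()
print(policy.params)
```

Comparing the weight on a non-performance cue (such as the hypothetical likeability column here) with the weight on task performance is the policy-capturing way of asking how much non-performance information enters the rating.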
Background
People have difficulty rating other human beings.
It is a challenge for managers to transition from being leaders to being judicial evaluators.
Fear of repercussions.
Three most commonly discussed reasons for non-performance rating distortion:
Avoidance of negative consequences of ratings
Organizational norms
Opportunity to advance self-interest
Motives for Rater Distortion: Avoidance of Negative Consequences
Raters may intentionally manipulate ratings to avoid uncomfortable situations.
Connection with lenient appraisals.
Avoid negative feedback.
More lenient when giving face-to-face feedback.
Potentially damaging interpersonal relationships.
Alter performance ratings as preventive behavior against conflict.
Motives for Rater Distortion: Organizational Norms
Norms reflect behavior that is acceptable in the workplace.
Raters are influenced by the norms within an organization.
Ratings may be altered by what managers perceive to be permissible behaviors.
Relationship between rating accuracy and rating error.
Situational factors such as organizational norms can influence raters' motivation to rate accurately or inaccurately.
Motives for Rater Distortion: Self-Interest
Managers distort ratings to make themselves look competent or to receive incentives.
Inflate employees' ratings to gain resources or to gain favor with leaders.
Some managers use employees' ratings to self-enhance the perception of the department.
Desire to manage impressions.
Personality
Two traits from the Five Factor Model of personality may provide some explanation for rater accuracy and inaccuracy.
Conscientiousness
Less elevated ratings.
Strongly guided by long-term performance.
More focused on the big-picture goal, so more accurate ratings.
Agreeableness
Managers high in agreeableness may produce more lenient appraisals.
Produce more elevated ratings.
Article Three
Using Frame-of-Reference Training to Understand the Implications of Rater Idiosyncrasy for Rating Accuracy
Uggerslev, K. L., & Sulsky, L. M. (2008)
Overview
What is frame-of-reference (FOR) training?
Two levels of theories
Performance theories and rater idiosyncrasy
Performance theory idiosyncrasy and rating accuracy
Frame-of-Reference Training
Primary Purpose
Develop a shared performance schema with the organization; discriminating among dimensions of performance (stereotype accuracy) and discriminating among ratees within organizations (differential accuracy)
Content of Training
Examples of normative (poor, average, good) behaviors for behavioral dimensions
Liability of Approach
Does not reduce the rating effect
Primary Value
Used to make decisions that require comparing employees on different performance dimensions, e.g., job assignments and placements, and feedback for development and goal setting
London, M., Mone, E. M., & Scott, J. C. (2004)
Performance Theory and Rater Idiosyncrasy
Levels-of-theory idiosyncrasy
The difference between raters' implicitly held theories and the normative performance theory imparted during FOR training
Two forms of rater idiosyncrasy:
Performance dimensions
Performance-related behaviors
Both can contain errors of omission and errors of commission.
Performance Theory Idiosyncrasy and Rating Accuracy
Rater accuracy before FOR training:
Hypothesis 1: Prior to training, the more dimensions used in performance evaluation, the less accurate appraisals will be.
Hypothesis 2: Results should be better for raters who used fewer dimensions than the organization's evaluation theory (omission) than for raters who used more dimensions than the organization's (commission).
Hypothesis 3: Training will improve rating accuracy for all FOR trainees, such that trainees with relatively high performance theory idiosyncrasy will improve significantly more than trainees with lower idiosyncrasy.
Results
Hypothesis 1: Supported. Idiosyncratic raters have the most to gain, and there was a negative relationship between idiosyncrasy and rating accuracy prior to raters receiving training.
Hypothesis 2: Somewhat supported. The relationship between idiosyncrasy and rating accuracy may depend on both the degree of idiosyncrasy and the form of idiosyncrasy.
Hypothesis 3: Somewhat supported. Most trainees were at least mildly idiosyncratic with respect to each aspect of the performance theory.
Idiosyncrasy: using extra or fewer dimensions than the organization uses for evaluating performance; different from what is perceived to be the normative evaluative or organizational guidelines.
Summary
Who should provide performance information: peers, customers, subordinates, and you.
Discussed the reasons behind rater error motivation.
Several reasons were presented for rater motivation to inflate and deflate ratings.
Presented ways to prevent rater distortion.
Provided several reasons for rater training programs.
Gave one example of rater error training.
Analyzed a case study.
Covered different types of criteria for evaluating errors.
Article 1
Article 2
Article 3
References
Aguinis, H. (2013). Performance management. Indiana: Pearson.
London, M., Mone, E. M., & Scott, J. C. (2004). Performance management and assessment: Methods for improved rater accuracy and employee goal setting. Human Resource Management, 43, 319-336.
Murphy, K. R. (2008). Explaining the weak relationship between job performance and ratings of job performance. Industrial and Organizational Psychology, 1, 148-160.
Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Chapter 10: Error and accuracy measures. Thousand Oaks, CA: Sage.
Spence, J. R., & Keeping, L. M. (2010). The impact of non-performance information on ratings of job performance: A policy-capturing approach. Journal of Organizational Behavior, 31, 587-608.
Uggerslev, K. L., & Sulsky, L. M. (2008). Using frame-of-reference training to understand the implications of rater idiosyncrasy for rating accuracy. Journal of Applied Psychology, 3, 711-719.
Rater Issues & Performance Management
Ashley Durrant
PSYC 601 - Performance Assessment
Fall 2015
Overview
Who can provide performance feedback?
What influences rater behavior?
Types of bias: self-serving, leniency, centrality
A unique example of bias in the workplace
How can we prevent rater distortion?
Case Study 6-4
Importance/Implications of PM Ratings
There are many benefits to a well-implemented PM system:
Increased motivation and self-esteem
Job criteria are clarified
Employees become more competent; misconduct is minimized
Organizational change is facilitated
Employee engagement is enhanced
When one link is broken, the whole system fails. If raters do not provide accurate information, this is a failure in the system.
Who should provide performance information?
Employees should be involved in deciding which of the above sources will rate the various dimensions of their performance.
Without going into too much detail, just sum this up.
Why do we care?
"Regardless of who rates performance, the rater is likely to be affected by biases that distort the resulting ratings" (Aguinis, 2013, p. 150).
Rater behavior is influenced by:
Motivation to provide accurate ratings
Are there consequences for being inaccurate?
Are there rewards for having accurate ratings?
Motivation to distort the ratings
As a means toward achieving other goals
A supervisor may inflate ratings to:
Maximize the merit raise/reward
Encourage employees
Avoid creating a written record
Avoid confrontation with employees
Promote undesired employees out of the unit
Make the manager look good to his/her supervisor
(Aguinis, 2013)
A supervisor may deflate ratings to:
Shock an employee into action
Teach a rebellious employee a lesson
Imply that an employee should leave
Build a strongly documented, written record of poor performance
(Aguinis, 2013)
Types of Rater Bias
Self-serving bias
Leniency bias
Centrality bias
Self-Serving Bias
Do you think a supervisor would be more likely to rate a specific trait as important if they themselves also possessed that trait?
Workers were asked to self-report their level of competency on a given measure and also to report the value of that competency to the job through a worker-oriented job analysis survey.
Analysis of this historical data on government-employed clerical workers (N = 26,682) found statistically significant positive correlations across all competencies between self-rated performance and importance ratings of competencies (Cucina et al., 2012).
While this study did not examine performance appraisal specifically, the findings are still applicable in making the point that it is human nature to rate the qualities inherent in oneself as more important than those one does not possess.
Correlations were computed for each competency, standardized within each occupation.
Additional analysis was run to rule out common-source variance.
Standardizing across occupations also helped to account for falsely strong correlations due to a competency not applying to a certain job title.
The results show that importance ratings could be somewhat biased upward or downward depending on the incumbent's standing on a certain competency.
Leniency Bias
The tendency to rate performance as better than it actually is; this effect is most noticeable when looking at poor-performing employees.
Bol (2011) supported the claim that managers tend to display both leniency and centrality bias.
Leniency was related to managers' desire to avoid confrontation with poor performers.
When the employee-manager relationship is weaker, raters tend to display less leniency bias (Bol, 2011).
Centrality Bias
The clustering of ratings toward a middle value; not providing any extreme ratings.
Bol (2011) supported the claim that managers tend to display both leniency and centrality bias.
Centrality was related to the opportunity cost of obtaining relevant performance information.
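Descriptively, leniency shows up as a mean rating sitting well above the scale midpoint, and centrality as an unusually compressed spread of ratings around the middle of the scale. The sketch below uses made-up ratings on a 1-to-5 scale to illustrate two simple screening indices; it is not the archival, compensation-based measure Bol (2011) used, just a generic way such patterns are often spotted.

```python
# Simple descriptive screens for leniency and centrality; data and scale are hypothetical.
import numpy as np

SCALE_MIN, SCALE_MAX = 1, 5
midpoint = (SCALE_MIN + SCALE_MAX) / 2  # 3 on a 1-5 scale

ratings_by_manager = {
    "manager_a": np.array([4, 5, 4, 5, 4, 5, 4]),   # clustered near the top: leniency
    "manager_b": np.array([3, 3, 3, 3, 3, 4, 3]),   # clustered near the middle: centrality
    "manager_c": np.array([1, 3, 5, 2, 4, 5, 2]),   # uses the full scale
}

for manager, ratings in ratings_by_manager.items():
    leniency = ratings.mean() - midpoint            # > 0 suggests lenient ratings
    centrality = 1 / (ratings.std(ddof=1) + 1e-9)   # large when ratings barely vary
    print(f"{manager}: leniency index = {leniency:+.2f}, centrality index = {centrality:.2f}")
```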
Bias & Opportunity Cost
Collecting performance information can be costly for a manager; the cost varies based on the location of employees, how similar the job duties of supervisor and incumbent are, and how often the supervisor works directly with the incumbent.
When there are limited natural opportunities to collect these data, supervisors lack the complete picture.
Bol (2011) also found evidence that this lack of complete information leads to centrality bias.
Bias & Impact on Future Performance
Even if an objective rating is not a valid benchmark of performance, employees will use this benchmark for comparison and for deriving perceptions (Bol, 2011).
Centrality bias has a negative influence on performance improvement for below-average performers.
Centrality bias has a negative effect on all employees' incentives.
Employee performance over time increases when leniency bias is present.
There is also likely a stronger employee-supervisor relationship when leniency bias is present.
Another Example of Bias
Employee start times, supervisor beliefs, and the impact on performance appraisal.
Start times: a stereotypic belief held by many supervisors is that employees who start early are better employees and more conscientious.
This did have an impact on performance ratings, but only when the supervisor believed the stereotype that late-starting employees are less conscientious.
When supervisors hold to this stereotype, late-starting employees are more likely to be rated poorly.
Even though flexible work schedules are established to help retain good employees by increasing work/life balance, employees with late start times put themselves at risk of receiving lower performance scores, all other factors being equal (Yam, Fehr, & Barnes, 2014).
Methods for Preventing Rater Distortion
Managers should understand the reasons for implementing the performance management system.
Training on:
How to identify and rank job activities
How to observe, record, and measure performance
The appraisal form and system mechanics
How to minimize rating errors (be aware of bias)
How to conduct an appraisal interview
How to train, counsel, and coach
(Aguinis, 2013)
Case Study / Activity 6-4, p. 165
Provide a detailed discussion of the unintentional rating distortion factors that may come into play in this situation.
Evaluate the kinds of training programs that could minimize the factors you have described.
What do you recommend and why?
(Aguinis, 2013)
Summary & Review
We covered motivation factors for providing accurate or inaccurate performance ratings.
We reviewed three types of bias: self-serving, centrality, and leniency.
We also saw a unique example of bias.
We learned that various individuals can provide performance information.
We learned methods for preventing rater distortion.
We came up with ideas to trim down bias in organizations through the case study activity.
References
Aguinis, H. (2013). Performance management. Upper Saddle River, NJ: Pearson Prentice Hall.
Bol, J. C. (2011). The determinants and performance effects of managers' performance evaluation bias. The Accounting Review, 86(5), 1549-1675.
Cucina, J. M., Martin, N. R., Vasilopoulos, N. L., & Thibodeaux, H. F. (2012). Self-serving bias effects on job analysis ratings. The Journal of Psychology, 146(5), 511-531.
Yam, K. C., Fehr, R., & Barnes, C. M. (2014). Morning employees are perceived as better employees: Employees' start times influence supervisor performance ratings. Journal of Applied Psychology, 99(6), 1288-1299.

Rater Issues in Performance Management
By Amanda Deane
Ch. 6, pp. 134-143 and Ch. 7, pp. 161-168
Overview
Rater error overview
FOR training and rater idiosyncrasies
The effect of raters' goals on performance ratings
Rater errors in 360-degree feedback
Conclusion
FOR Training and Rater Idiosyncrasies
Using Frame-of-Reference Training to Understand the Implications of Rater Idiosyncrasy for Rating Accuracy
Uggerslev & Sulsky (2008)
What is FOR training?
Define theory idiosyncrasy
Why are raters idiosyncratic? Performance dimensions and performance-related behaviors
Error of omission vs. error of commission
FOR training enhances rater accuracy by calibrating raters such that they have a common conceptualization of what constitutes performance effectiveness across the performance continuum.
Levels-of-theory idiosyncrasy: the difference between raters' implicitly held theories and the performance theory imparted during FOR training (the normative training theory).
Raters are idiosyncratic because of the dimensions they use to evaluate performance; these may differ from the dimensions the organization uses, and raters can consider various aspects of performance when reaching an overall conclusion.
Error of omission: the normative theory contains performance dimensions not included in the rater's implicit performance theory, or the rater's dimensional schema does not contain behaviors included in the normative training theory.
Error of commission: a rater's implicit theory contains additional dimensions beyond those comprising the normative theory.
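The omission/commission distinction can be made concrete by comparing a rater's implicit set of dimensions with the organization's normative set: normative dimensions the rater misses are omissions, and extra dimensions the rater adds are commissions. The dimension names below are invented for illustration; Uggerslev and Sulsky (2008) derived their idiosyncrasy measures from trainees' free-response performance theories rather than from a simple set comparison like this.

```python
# Hypothetical dimension sets; a rough illustration of omission vs. commission idiosyncrasy.
normative_dimensions = {"organization", "communication", "technical skill", "adaptability"}
rater_dimensions     = {"communication", "technical skill", "friendliness", "punctuality"}

omissions   = normative_dimensions - rater_dimensions   # in the normative theory, missed by the rater
commissions = rater_dimensions - normative_dimensions   # used by the rater, not in the normative theory

print("errors of omission:  ", sorted(omissions))
print("errors of commission:", sorted(commissions))
print(f"idiosyncrasy count: {len(omissions) + len(commissions)}")
```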
  • 48. Types of Rater ErrorSimilar to me Contrast Leniency *Severity*Central tendency*HaloPrimacyRecencyNegativityFirst impressionSpilloverStereotypeAttribution * * intentional Types of Rater Error Training ProgramsFrame of Reference TrainingBehavioral Observation TrainingSelf Leadership Training HypothesisH1- Prior to training, rating accuracy will be negatively correlated with rater idiosyncrasyH2- Rating accuracy improvements following FOR training will be greater for trainees with higher omission idiosyncrasy than higher commission idiosyncrasyH3- Training will improve rating accuracy for all FOR trainees, such that trainees with relatively high performance theory idiosyncrasy will improve significantly more than trainees with lower idiosyncrasy * Rating accuracy depends on conforming to the normative theory that informs the development of the comparison scores used in the computation of accuracy Perseverance Effect- People are less willing to accept information that is counter to info already in their schemas, being confronted with contrary info can even make them more
  • 49. accepting of their original schema- increasing resistance to new info Articulating the performance theory may be a benefit for rating accuracy in and of itself ResultsFOR training program was effective in teaching raters to evaluate professor lecturing performance more accurately than control training.Found a negative relationship between idiosyncrasy and rating accuracy prior to rater’s receiving training.Relationship between idiosyncrasy and rating accuracy may depend on both the degree of idiosyncrasy and the form of idiosyncrasy- omission v. commissionFor ¾ idiosyncrasy measures raters with low idiosyncrasy still improved, just not as much as raters with high idiosyncrasy following FOR training * Continuous measure of rater idiosyncrasy developed for performance dimensions by comparing the dimensions students identified with the dimensions from the normative training theory created for the study First study to address the relationship between rater’s performance theory idiosyncrasy and rating accuracy Study showed that mere exposure or practice to rating forms was not sufficient to increase accuracy, the FOR training itself is the mechanism for accuracy improvement. Individual idiosyncrasy measures did not reliably predict rating accuracy, however when the measures are combined the expected relationship emerges At dimension level training was more effective in improving accuracy for trainees with initially higher levels of omission when combined with lower commission than with higher commission- perseverance effect may extend only to situations where an individual is asked to remove something from a
• 50. At the behavior level, trainees with higher levels of both forms of idiosyncrasy had the highest accuracy improvements. Perhaps an individual's performance schema includes the provision that there is no absolute or limited set of behaviors linked to a dimension; rater schemas at the behavioral level may allow for multiple paths to the same end.
No trainee held a performance theory that was not at least mildly idiosyncratic relative to the normative theory, and there was considerable variability among trainees in the dimensions and behaviors they identified as part of their implicit performance theories. All trainees may be idiosyncratic! Yet FOR training was initially intended only for raters who hold performance theories that are highly idiosyncratic relative to the organizationally adopted theory.
How rater goals affect rater error: Raters Who Pursue Different Goals Give Different Ratings, Murphy, Cleveland, Skattebo & Kinney (2004). Rater goals influence the ratings raters give; raters provide ratings consistent with their goals. Goal of the study: to provide an empirical test of the proposition that the goals raters claim to emphasize when evaluating performance are related to the ratings they give.
Rater errors and other psychometric deficiencies in ratings might not be the result of errors or limitations in the rater's capacity, but rather might reflect strategic decisions by raters about the sorts of ratings they should record.
• 51. Raters pursue a number of goals when completing performance appraisals (e.g., using ratings to maintain harmony or to motivate subordinates to perform better). The goals raters actually pursue are not always the same as the goals the organization would like them to pursue, and conflicts between the official purpose of performance appraisal systems and the ways raters actually use these systems can substantially affect the utility of performance appraisal. If a rater's goal is to motivate employees, he or she will give higher ratings that encourage them, even if the ratings aren't accurate. Many of the support systems and interventions in performance appraisal appear to rest on the questionable assumption that raters are trying their best to provide accurate ratings but lack the skill and knowledge to do the job; more likely, they are able but choose to give ratings that advance their goals.
Hypotheses: H1 – measures of the rating goals most strongly emphasized by raters will account for a substantial portion of the variance in the performance ratings they assign. H2 – measures of goal importance obtained after the rater has observed the ratee will account for variance in performance ratings not accounted for by ratings of goal importance collected before observing the ratee's performance.
University teacher evaluation is a conservative test of the link between goals and evaluations because the ratings have no immediate consequences for the rater, they are anonymous, and the rater has little to lose or gain by giving overly positive or negative ratings. This produces weaker goal effects; even stronger links are expected when raters have strong incentives to give positive or negative ratings. In this context the authors could partially control for the effect of ratee performance on raters' goals and on the relationship between goals and ratings. Information about raters' goals was obtained both before and after they observed performance.
• 52. Rater goals would be influenced by the ratee's level of performance, so the authors hypothesized that the importance of rating goals would change over time and that these changes would be related to the rater's final evaluation of the ratee.
Results: Ratings of goal importance obtained at the beginning of the semester, before students observed teacher performance, predicted ratings of teacher performance collected at the end of the semester, both in the individual classes and in the pooled sample. Correlations across classes suggested that rater goals might in part reflect stable orientations on the part of the rater, even without external incentives. Changes in ratings of goal importance might in part reflect the rater's evaluation of the ratee's performance (for the strengths goal only).
A pilot study identified four goals: identifying strengths, identifying weaknesses, giving fair evaluations, and motivating the ratee. In the pilot study, students filled out the goal questionnaire at the same time as the evaluation (half before, half after); goals were related to the ratings given, but because both were completed at the same time the links could have been spurious due to priming or common-method effects. All teachers received high ratings; differences across teachers accounted for 14% of the variance in teacher ratings. Goal-importance ratings collected at the end of the semester were better predictors of performance ratings than goal ratings collected at the beginning of the semester, and there was only modest stability in the goals raters pursued over time. The sample was pooled to control for teacher differences. Because of how the study was set up, neither reverse causation nor priming effects are likely explanations.
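Hypothesis 2 above is an incremental-variance claim. As a hedged illustration of how such a claim is typically tested (synthetic data and hypothetical variable names, not the study's data or measures), a hierarchical regression compares R-squared before and after adding the post-observation goal-importance measures:

# Illustrative only: synthetic data and hypothetical variable names, not the study's data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
goals_t1 = rng.normal(size=(n, 2))                          # goal importance rated before observing the ratee
goals_t2 = goals_t1 + rng.normal(scale=0.8, size=(n, 2))    # goal importance rated after observation
rating = 0.3 * goals_t1[:, 0] + 0.4 * goals_t2[:, 1] + rng.normal(size=n)

# Step 1: ratings regressed on pre-observation goal importance only.
m1 = sm.OLS(rating, sm.add_constant(goals_t1)).fit()

# Step 2: add post-observation goal importance; the increase in R-squared
# is the incremental variance that H2 refers to.
m2 = sm.OLS(rating, sm.add_constant(np.column_stack([goals_t1, goals_t2]))).fit()

print(f"R2 step 1 = {m1.rsquared:.3f}, R2 step 2 = {m2.rsquared:.3f}, "
      f"delta R2 = {m2.rsquared - m1.rsquared:.3f}")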
• 53. Raters who evaluated the ratee's performance more favorably were more likely to rate the goal of conveying information about strengths as more important at the end of the semester than at the beginning.
Are raters at different organizational levels rating the same constructs? Measurement Equivalence of 360-Degree Assessment Data: Are Different Raters Rating the Same Constructs?, Hannum (2007). This study used data collected with a 360-degree assessment instrument to investigate the structural equivalence of ratings according to rater type, controlling for organizational level. Raters have been inappropriately classified in previous research. Rater group differences can be due to three things: construct, scaling, and reliability differences.
Concerning ranks: researchers use the evaluator's relationship to the individual being rated (peer, boss/supervisor) as a proxy for organizational level. This is problematic because rater type alone is not a measure of organizational level with respect to either member of the dyad. When organizational level is proxied by rater type, researchers have found little evidence that rating source matters beyond individual rater effects; rating differences could still reflect organizational level, but when rater type is used as a proxy, the effect of organizational level is spread out across rater types. Unless the same constructs are being evaluated, it would be inappropriate to compare mean scores. Rater group differences may be attributable to real differences in the behavior of the ratees, to different understandings of the constructs, and to differences in how the measurement scales are applied.
• 54. For performance ratings to be directly comparable, we must rule out the possibility that differences are attributable to construct, scaling, or reliability differences across groups.
Method and results: Rater types included in the sample were boss, peer, and direct report. Only ratings of upper-middle managers were used, in order to control variance associated with organizational level. The multi-group SEM model demonstrated marginally adequate fit across rater types at different organizational levels, so information from the various rating sources can be combined.
The majority of the sample was white males. The 360-degree assessment instrument was the Prospector. Boss ratings tended to be slightly higher than those of the other two sources. Managers were rated on seven scales: learns from people and events, acts with integrity, adapts to cultural differences, is committed to making a difference, seeks broad business knowledge, is insightful (sees things from new angles), and has the courage to take risks. SEM was used to test the equivalence of the proposed models for each rater group, providing a systematic plan for investigating the structure of variance across groups; the authors analyzed model fit, and multi-group SEM models were employed to determine the degree to which data from the rater groups were structurally similar. They created a hypothetical rater for each rater type for each manager in order to control for sample size; the sample size for each rater group was therefore equal to the number of target managers.
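As a hedged aside (standard multi-group factor-analysis notation, not taken from Hannum's article), "rating the same constructs" is usually formalized by fitting the same measurement model in every rater group g and testing increasingly strict equality constraints:

\[
\mathbf{x}_{ig} \;=\; \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g\, \boldsymbol{\xi}_{ig} + \boldsymbol{\delta}_{ig}
\]

Configural invariance requires only the same factor pattern across groups; metric (weak) invariance adds \(\boldsymbol{\Lambda}_1 = \cdots = \boldsymbol{\Lambda}_G\); scalar (strong) invariance adds \(\boldsymbol{\tau}_1 = \cdots = \boldsymbol{\tau}_G\), the level generally required before mean scores can be compared across rater groups.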
• 55. Why was there only marginal evidence of invariance? Complexity theory and contingency theory.
Jaques and Clement (1994), complexity theory: as leaders rise through the ranks of an organization they gain a broader and more complex understanding of the organization and of what it takes to be successful within it. The impact of this altered perspective may be a slightly different interpretation of the rating schemes employed in 360-degree assessment. This reinforces organizational level as a variable of interest separate from rater type, because level is a closer approximation of work complexity.
Fiedler and Chemers (1982), contingency theory: managers need to behave differently in different situations in order to be effective. Because leaders may act, and therefore may be perceived, differently in different settings, ratings from different sources may be based on incongruent evidence. Rater group effects are marginal.
Case study: Macaroni Grill, IFCO, Life Wire. The most important goal is to make the process transparent so that employees understand what they are being asked to perform.
• 56. Particularly when filling out performance appraisals, ratees should understand exactly what they are being rated on and how ratings were arrived at. Their perception of fairness and accuracy means everything to their acceptance of the ratings and their subsequent action to improve performance (or maintain high performance).
What type of error is most common in your experience? The most common error I've experienced or witnessed is raters' tendency to rate everyone favorably. This eliminates the chance of distinguishing between truly excellent performance and average or poor performance.
How do ratings at different levels of the organization differ? Ratings naturally change at different levels of the organization.
• 57. Ratings of front-line employees are likely to focus predominantly on specific job tasks and roles, as well as adherence to organizational values. As you ascend the organizational ranks, the focus tends to shift from specific tasks to more general performance standards such as management, leadership, and strategic-thinking skills. In practice, there is a large amount of variability in how this manifests itself in different organizations.
Summary: FOR training improves rating accuracy even for raters who are relatively low in idiosyncrasy. Rater goals predict the performance ratings raters give, and goals can change over time. Information from raters at different organizational levels can be combined. Psyc601 - K. Shultz, Week 6 1
  • 58. PSYC601 – Week 6 Gathering Information and Implementation Tong’s Presentation – Rater Issues Gathering Information in PM (Ch 6) Case Study 6-3 BREAK Arturo’s Presentation – Rating Issues Implementation of the PM Process (Ch 7) Modified Case Studies 7-2 and 7-3 Things to come 1 Psyc601 - K. Shultz, Week 6 2 Appraisal Forms: 8 Desirable Features Simplicity Relevancy Descriptiveness Adaptability Comprehensiveness Definitional Clarity Communication Time Orientation Psyc601 - K. Shultz, Week 6 3 Determining Overall Rating
  • 59. Judgmental strategy Holistic judgments – with defensible summary Mechanical strategy Weighted summary based on relative importance Psyc601 - K. Shultz, Week 6 4 Appraisal Period and Timing Number of Meetings Annual, Semi-annual, or Quarterly Anniversary Date Supervisor doesn’t have to fill out forms at same time, but Can’t tie rewards to fiscal year Fiscal Year Rewards tied to fiscal year Goals tied to corporate goals, but May be burden to supervisor, depending on implementation Psyc601 - K. Shultz, Week 6 5 Who Should Provide Performance Information? Employees should be involved in selecting Which sources evaluate Which performance dimensions When employees are actively involved Higher acceptance of results Perception that system is fair
  • 60. Those with direct knowledge of employee performance should be used Supervisors, Peers, Subordinates, Self, Customers (both internal and external) Psyc601 - K. Shultz, Week 6 6 Disagreement Across Sources Expect disagreement Ensure employee receives feedback by source Assign differential weights to scores by source, depending on importance Ensure employees take active role in selecting which sources will rate which dimensions Psyc601 - K. Shultz, Week 6 7 Types of Rating Errors Intentional errors Rating inflation Rating deflation Unintentional errors Due to complexity of task Psyc601 - K. Shultz, Week 6 8
• 61. [Figure: A Model of Rater Motivation — the expected positive and negative consequences of rating accuracy, weighted by the probability of experiencing those consequences, determine the motivation to provide accurate ratings; the expected positive and negative consequences of rating distortion, weighted by their probability, determine the motivation to distort ratings; together, these two motivations drive rating behavior.] Psyc601 - K. Shultz, Week 6 9
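One illustrative way to formalize this model (an expectancy-style sketch of my own, not notation from the slides or the textbook) is as a comparison of expected consequences:

\[
M_{\text{accurate}} \;=\; \sum_{c} p(c \mid \text{accurate})\, v(c),
\qquad
M_{\text{distort}} \;=\; \sum_{c} p(c \mid \text{distort})\, v(c)
\]

Here c indexes the consequences the rater anticipates, \(p(c \mid \cdot)\) is the rater's subjective probability of experiencing consequence c under each rating strategy, and \(v(c)\) is the value of that consequence to the rater; distortion becomes likely when \(M_{\text{distort}} > M_{\text{accurate}}\).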
• 62. Rater Training Programs Should Cover Information on how the system works Motivation – What's in it for me? Identifying, observing, recording and evaluating performance How to interact with employees when they receive performance information Psyc601 - K. Shultz, Week 6 10 Case Study – 6.3 Based on what we now know about rater training programs, classify each content area in terms of whether it addresses intentional or unintentional errors. Psyc601 - K. Shultz, Week 6 11 Break Psyc601 - K. Shultz, Week 6 12
  • 63. Implementing a Performance Management System: Overview Preparation Communication Plan Appeals Process Training Programs Pilot Testing Ongoing Monitoring and Evaluation Psyc601 - K. Shultz, Week 6 13 Preparation Need to gain system buy-in through: Communication plan regarding Performance Management system Including appeals process Training programs for raters Pilot testing system Ongoing monitoring and evaluation Psyc601 - K. Shultz, Week 6 14 Communication Plan Answers What is Performance Management (PM)? How does PM fit in our strategy? What’s in it for me? How does it work? What are our roles and responsibilities? How does PM relate to other initiatives?
  • 64. Psyc601 - K. Shultz, Week 6 15 Cognitive Biases that Affect Communications Effectiveness Selective exposure What you see? Selective perception What you perceive? Selective retention What you retain? Psyc601 - K. Shultz, Week 6 16 To minimize effects of cognitive biases A. Consider employees Involve employees in system design Show how employee needs are met B. Emphasize the positive Use credible communicators Strike first – create positive attitude Provide facts and conclusions C. Repeat, document, be consistent Put it in writing Use multiple channels of communication Say it, and then – say it again
  • 65. Psyc601 - K. Shultz, Week 6 17 Appeals Process Promote Employee buy-in to PM system Amicable/Non-retaliatory Resolution of disagreements Employees can question two types of issues Judgmental -validity of evaluation Administrative-whether policies and procedures were followed Psyc601 - K. Shultz, Week 6 18 Appeals Process Level 1 HR reviews facts, policies, procedures HR reports to supervisor/employee HR attempts to negotiate settlement Level 2 Arbitrator (panel of peers and managers) and/or High-level manager – final decision Psyc601 - K. Shultz, Week 6 19 Rater Training Programs Content Areas to include Information Identifying, Observing, Recording, Evaluating How to Interact with Employees
• 66. Choices of Training Programs to implement Rater Error Training Frame of Reference Training Behavioral Observation Self-leadership Training Psyc601 - K. Shultz, Week 6 20 Content A. Information - how the system works Reasons for implementing the performance management system Information on the appraisal form and system mechanics B. Identifying, observing, recording, and evaluating performance How to identify and rank job activities How to observe, record, and measure performance How to minimize rating errors C. How to interact with employees when they receive performance information How to conduct an appraisal interview How to train, counsel, and coach Psyc601 - K. Shultz, Week 6 21 Choices of Training Programs Rater Error Training (RET) Frame of Reference Training (FOR) Behavioral Observation Training (BO)
  • 67. Self-leadership Training (SL) Psyc601 - K. Shultz, Week 6 22 Intentional Rating Errors Leniency (inflation) Severity (deflation) Central tendency Psyc601 - K. Shultz, Week 6 23 Unintentional Rating Errors Similar to Me Halo Primacy First Impression Contrast Stereotype Negativity Recency Spillover Attribution Psyc601 - K. Shultz, Week 6 24
  • 68. Frame of Reference Training (FOR) Goal of FOR* Raters develop common frame of reference Observing performance Evaluating performance *Most appropriate when PM appraisal system focuses on behaviors Psyc601 - K. Shultz, Week 6 25 Behavioral Observation Training (BO) Goals of BO Minimize unintentional rating errors Improve rater skills by focusing on how raters: Observe performance Store information about performance Recall information about performance Use information about performance Psyc601 - K. Shultz, Week 6 26 Self-leadership Training (SL) Goals of SL Improve rater confidence in ability to manage performance Enhance mental processes Increase self-efficacy
  • 69. Psyc601 - K. Shultz, Week 6 27 Pilot Testing Provides ability to Discover potential problems Fix them Benefits Gain information from potential participants Learn about difficulties/obstacles Collect recommendations on how to improve Understand personal reactions Get early buy-in Get higher rate of acceptance Psyc601 - K. Shultz, Week 6 28 Implementing a Pilot Test Roll out test version with sample group Staff and jobs generalizable to organization Fully implement planned system All participants keep records of issues encountered Do not record appraisal scores Collect input from all participants Psyc601 - K. Shultz, Week 6 29 Ongoing Monitoring and Evaluation
  • 70. When system is implemented, decide: How to evaluate system effectiveness How to measure implementation How to measure results Psyc601 - K. Shultz, Week 6 30 Evaluation data to collect Reactions to the system Assessments of requirements Operational Technical Effectiveness of performance ratings Psyc601 - K. Shultz, Week 6 31 Indicators to Consider Number of individuals evaluated Distribution of performance ratings Quality of information Quality of performance discussion meetings System satisfaction Cost/benefit ratio Unit-level and organization-level performance Psyc601 - K. Shultz, Week 6 32
• 71. Case Study 7-2 and 7-3 After implementing the PM process (via Exercise 7-1) Setting up an appeals process (Exercise 7-2) Evaluating the process (Exercise 7-3) Psyc601 - K. Shultz, Week 6 33 Summary – Chapter 6 Several keys to good and useful PA forms Can combine information via mechanical or holistic approaches Several practical issues to work out (e.g., appraisal period, who should rate) Many potential motivators for raters Several options to reduce rater distortion Psyc601 - K. Shultz, Week 6 34 Summary – Chapter 7 Implementation of a solid PM process requires lots of preparation Rater training a key component Many options here
• 72. Pilot testing and ongoing monitoring keys to success Psyc601 - K. Shultz, Week 6 35 Next Time Discussion of Chapter 8 in Aguinis – PM and Employee Development Presentations by: Zytlaly (360 degree feedback), and Jamie (personal development plans)
Journal of Occupational and Organizational Psychology (2015), 88, 387–414 © 2014 The British Psychological Society
• 73. www.wileyonlinelibrary.com
Does rater personality matter? A meta-analysis of rater Big Five–performance rating relationships
Michael B. Harari1*, Cort W. Rudolph2 and Andrew J. Laginess3
1Florida Atlantic University, Boca Raton, Florida, USA; 2Saint Louis University, Saint Louis, Missouri, USA; 3Florida International University, Miami, Florida, USA
We examined rater personality traits consistent with the Five-Factor Model as sources of systematic non-performance variance in job performance ratings using meta-analysis (k = 28). Several personality factors, including agreeableness, extraversion, and emotional stability, were related to performance ratings (ρ = .25, .12, and .12, respectively), and features of the rating context (e.g., study setting, appraisal purpose, accountability) moderated these relationships. Cumulatively, the Big Five accounted for between 6% and 22% of the variance in performance ratings. Implications for performance appraisal research and practice are discussed.
• 74. Practitioner points
- Performance ratings serve a number of important functions in organizations, and their construct validity is a central issue.
- We identified rater personality traits, consistent with the Five-Factor Model, as sources of non-performance variance in performance ratings.
- To the extent that job performance ratings are contaminated by rater personality traits, requiring raters to justify their ratings may result in criterion scores that reflect greater levels of criterion relevance.
Performance ratings serve central functions in organizations, with implications for performance management and employee development (den Hartog, Boselie, & Paauwe, 2004; Smither, London, & Reilly, 2005), administrative decision-making (Cleveland, Murphy, & Williams, 1989), and other human resource functions (e.g., strategic, informational, maintenance, and documentation; see Aguinis, 2013, p. 18). Considering the importance of performance ratings, their construct validity has been a crucial issue and a large body of research on the subject exists (Austin & Villanova, 1992; Levy & Williams, 2004; Murphy & Cleveland, 1995).
• 75. While it is desirable for ratings of job performance to be saturated by the ratee's job-relevant behaviours, research suggests that other sources contribute variance to performance ratings (Murphy, 2008). To identify and account for performance-irrelevant sources of variance, research has focused on the role of the organization's social context (Levy & Williams, 2004).
*Correspondence should be addressed to Michael B. Harari, Department of Management, Florida Atlantic University, 777 Glades Rd., Boca Raton, FL 33431, USA (email: [email protected]). An earlier version of this paper was presented at the 2014 Society for Industrial and Organizational Psychology Conference, Honolulu, HI. DOI: 10.1111/joop.12086
At present, this research
  • 76. understanding why performance ratings may be inaccurate and means by which the quality of ratings can be improved. While features of the rating context have been highlighted as determinants of performance ratings in several systematic reviews (e.g., Jawahar &Williams, 1997; Levy & Williams, 2004), much less research has emphasized how rater personality traits impact performance assessments. Personality traits reflect one’s propensities to behave in characteristic ways in response to situational demands (e.g., Mischel & Shoda, 1995; Roberts, 2009), and research suggests that personality traits guide behaviours in contexts that are relevant for trait expression (Tett & Guterman, 2000). We argue that the situational demands posed by the performance rating context provide ample opportunity for raters to behave in trait-relevantways (Ferris et al., 2008; Tett & Burnett, 2003; Tziner, Murphy, & Cleveland, 2005), and as a result, rater traits will influence performance rating processes and outcomes. This issue bears on the validity of performance ratings, as rater
  • 77. personality traits may act as a source of performance irrelevant variance. Thus, research into rater personality–performance rating relationships can have important implications for performance evaluation in organizations. A number of researchers and theorists have speculated that rater traits play a role in the performance appraisal process. For instance, Landy and Farr (1980) proposed a model of job performance ratings where rater individual differences influenced rater performance appraisal strategies, which influenced performance ratings. Kane, Bernardin, Villanova, and Peyrefitte (1995) suggested that performance ratings could ‘reflect personality or information-processing differences among raters’ (p. 1047). In a review article, Tziner et al. (2005) noted that personality traits ‘likely play a part in shaping rating behavior’ (p. 94). Furthermore, a number of empirical investigations along these lines have taken place (e.g., Bernardin & Orban, 1990; Fried, Levi, Ben-David, Tiegs, & Avital, 2000; Randall & Sharples, 2012; Yun, Donahue, Dudley, & McFarland, 2005). However, despite the research conducted thus far, there are several issues in this
  • 78. literature that preclude the ability to draw meaningful conclusions. One issue is that primary studies have examined the effects of a wide variety of rater personality traits on performance ratings with little coherency across studies. For instance, research has examined the effects of rater hostility (Phillips, 1960), need for achievement (Kovacs & Kapel, 1976), cognitive complexity (Bernardin & Orban, 1990; Schneider, 1977), self-esteem (Guven, 2007; Wexley & Youtz, 1985), and ego concern (Chambers, 2003), among others. Thus, this literature has proceeded in a fragmented fashion and has lacked a systematic framework for organizing personality traits. Another issue is that there are many inconsistent findings in this literature. For instance, some research indicates a positive relationship between conscientiousness and performance ratings (Roch, Ayman, Newhouse, & Harris, 2005), while other research indicates a negative relationship (Bernardin, Cooke, & Villanova, 2000; Bernardin, Tyler,
  • 79. & Villanova, 2009), and still other research indicates virtually no relationship (Tziner, Murphy, &Cleveland, 2002). The same state of affairs is evident for other personality traits as well (e.g., extraversion; Bernardin et al., 2000, 2009; Strauss, Barrick, & Connerley, 2001). Compounding this issue is that features of the rating context vary across studies. It is well recognized that personality traits influence one’s characteristic manner of respond- ing to certain situations, rather than a tendency to behave similarly across situations 388 Michael B. Harari et al. (Mischel & Shoda, 1995; Roberts, 2009; Stewart & Barrick, 2004; Tett &Guterman, 2000). Thus, features of the rating context should influence the effect of rater personality on performance ratings and variation in contextual features likely contributes to the inconsistent findings that are evident in this literature. As such, contextual features should be examined as moderators of the rater personality–performance rating relationships.
  • 80. The purpose of this study is to address these issues by conducting a quantitative literature review. Meta-analytic methods allow us to assess the extent to which mixed findings are due to sampling error as opposed to the presence of moderators (Hunter & Schmidt, 2004). Furthermore, our analytic approach allows us to test the effects of theoretically relevant features of the rating context asmoderators of the rater personality– performance rating relationships. Therefore, our approach will not only clarify the existing literature, but may also result in a number of novel conclusions concerning the conditional influences of rater personality traits on performance ratings. To classify the personality traits examined in prior studies consistent with a well- accepted taxonomy, we draw upon the Five-Factor Model of personality (cf. Costa & McCrae, 1992; Digman, 1990). This approach has been useful for quantifying general- izable relationships between personality traits and workplace outcomes (e.g., Barrick & Mount, 1991; Judge & Ilies, 2002). Note that we are not
  • 81. suggesting that traits that fall outside of the Five-Factor Model are not useful for understanding rating behaviour. Indeed, individual differences such as political skill and emotional intelligence may very well play a role as determinants of rating behaviour and outcomes (cf. Ferris et al., 2008). However, the Five-Factor Model is a useful typology for organizing the existing fragmented literature. In the following paragraphs, we discuss the social context features of organizations that are relevant to performance evaluation. Following this, we review the Five-Factor Model of personality, discuss the relevance of these traits as determinants of performance ratings in the light of the social context of performance appraisal, and form hypotheses concerning rater personality–performance rating relationships. Finally, we discuss potential moderators of these relationships. The social context of performance ratings Research has recognized that the construct validity of performance ratings might be
  • 82. problematic, and several approaches to dealing with this issue have subsequently emerged (Murphy, 2008). Research first focused on measurement issues, suggesting that the construct validity of performance ratings could be improved by, for example, developing better rating scales (cf. Austin & Villanova, 1992). However, this line of research had limited success in improving performance ratings (Landy & Farr, 1980). Following this, research began to focus on the cognitive processes involved in performance appraisal, noting that raters were imperfect information processors and that this could lead to errors and biases in the rating process and thus poor construct validity of performance ratings (Feldman, 1981). However, this approach also had only limited success in improving performance ratings in organizations (Ilgen, Barnes-Farrell, & McKellin, 1993). In response to the limited utility of these approaches, research began to shift focus towards the performance evaluation context. That is, raters were perceived as actors in a
  • 83. rich and complex organizational context that could influence their motivations such that accuracy may be only a subsidiary goal (Murphy, 2008; Murphy & Cleveland, 1995). By recognizing the rich context in which performance ratings occur, research has Does rater personality matter? 389 developed a more clear understanding of the non-performance factors that influence performance ratings in organizations (Ferris et al., 2008; Levy & Williams, 2004). Murphy and Cleveland (1995) brought light to the role of the organizational context in performance evaluation by noting that the rating context influences performance judgments, performance ratings, and the evaluation process. Murphy and Cleveland’s model distinguished between distal variables (i.e., those that influence performance evaluation indirectly) and proximal variables (i.e., those that influence performance evaluation directly). Distal variables include, for example, the organiza-
  • 84. tional structure, the legal climate, and workforce composition. Proximal variables include, for example, the rater–ratee relationship, consequences of performance ratings, and rater accountability. Levy andWilliams (2004) conducted a cumulative review of the literature into the role of the organizational social context on performance ratings. In doing so, they expanded upon Murphy and Cleveland’s (1995) typology to build a framework for summarizing the existing literature. Similar to Murphy and Cleveland’s framework, Levy and Williams distinguished between distal and proximal contextual determinants of performance rating behaviour. However, the research also distinguished between two types of proximal variables – process proximal variables (i.e., how the appraisal process is conducted) and structural proximal variables (i.e., formal design characteristics). Levy and Williams’ review highlighted evidence suggesting that process and structural proximal variables influenced performance evaluation processes and rating behaviours. More recently, Ferris et al. (2008) provided an integrative framework for understand-
  • 85. ing the contextual backdrop in organizations as it pertains to performance evaluation. Specifically, Ferris et al.’s reviewnoted that the context inwhich performance evaluation occurs encompasses ‘cognitive, social and relationship, affective and emotional, and political and relationship context features’ (p. 150). Rather than acting in isolation, each of these context features interacts with one another continuously and dynamically in shaping the evaluation context in an organization. Cognitive context refers to the rater’s cognitive processes involved in observing, encoding, storing, retrieving, and integrating observations of a ratee’s performance. Social and relationship context concerns features of the rater–ratee dyadic work relationship. Social influence and politics concerns the influence of deliberate manipulation by raters and ratees on performance ratings. Finally, affect and emotion concerns the role of affective regard for a ratee (i.e., liking) on performance ratings. Ferris et al. reviewed evidence suggesting that each of these contextual features influenced performance ratings in
  • 86. organizations (see also Fletcher, 2001; Levy & Williams, 2004). Personality and performance ratings Consistent with Roberts (2009), we define personality as ‘relatively enduring patterns of thoughts, feelings, and behaviors that reflect the tendency to respond in certain ways under certain circumstances’ (p. 140). While personality traits themselves remain relatively stable within person across appreciable amounts of time (Roberts & DelVecchio, 2000), their influence on thoughts, feelings, and behaviours is shaped by environmental influences – features of the context in which the actor is embedded (Roberts & Jackson, 2008; Roberts, Lejuez, Krueger, Richards, &Hill, 2014). For example, Tett and Guterman (2000) discussed the principle of trait activation. Trait activation holds that ‘the behavioral expression of a trait requires arousal of that trait by trait-relevant situational cues’ (Tett & Guterman, 2000, p. 398). Similarly, Mischel and Shoda (1995) 390 Michael B. Harari et al.
  • 87. characterized personality as ‘a system of mediating processes, conscious and uncon- scious, whose interactions are manifested in predictable patterns of situation-behavior relations’ (p. 247). We argue that the contextual features that serve as a backdrop to performance evaluation provide a landscape for rater personality traits to manifest themselves in the rating process (Tett &Guterman, 2000; Tziner et al., 2005). That is, in response to certain contextual features, rater personality traits would influence their thoughts, feelings, and rating behaviours, and thus, rater personality traits should influence performance rating scores (Tett & Burnett, 2003; Tett &Guterman, 2000). This in situ theoretical perspective helps to explain the dynamic interplay between rating context and rater personality, and the joint influence of context and personality in the performance rating process. Considering this, rater personality traits have the potential to influence the validity of performance ratings, which bears on critical decisions that are
  • 88. guided by performance rating scores. Examining and understanding these relationships can ultimately improve the construct validity of performance ratings by spurring research into interventions that can reduce their influence. Thus, a focus on rater personality will enrich our understanding of the determinants of performance ratings in organizations beyond the influence of the rating context alone (Ferris et al., 2008). As already noted, the Five-Factor Model of personality (also referred to as the Big Five) has been proposed as an integrative framework for studying individual differences in personality and is among the most well-accepted taxonomies of personality in the literature (Costa & McCrae, 1992; Digman, 1990). According to this perspective, the latent structure of personality is hierarchical with five factors at the highest level (Goldberg, 1992; Hough & Ones, 2001). The five factors are as follows: Agreeableness, extraversion, emotional stability, conscientiousness, and openness. We review each of
  • 89. these factors in the following sections and discuss features of the rating context that are relevant for the expression of each of the five factors in a performance evaluation setting. Note that positive personality–performance rating relationships are interpreted as reflecting rating elevation (i.e., higher scores on the personality trait are associated with higher, or more elevated, performance ratings), while negative personality–performance rating relationships are interpreted as reflecting rating stringency (i.e., higher scores on the personality trait are associated with lower, or more stringent, performance ratings; Bernardin et al., 2000). Agreeableness Agreeableness reflects traits such as trust, altruism, and cooperation (Costa & McCrae, 1992). Agreeableness is related to a tendency to favour positive social relationships and to avoid conflict (Jensen-Campbell & Graziano, 2001). Considering this, we suggest that the social and relationship context features as reviewed by Ferris et al. (2008) provide opportunities for rater agreeableness to influence performance
  • 90. ratings. For example, delivering negative feedback to an employee regarding his or her performance can have a negative influence on the rater–ratee work relationship (Murphy & Cleveland, 1995) and raters who are high in agreeableness may be motivated to avoid disturbing this social relationship. Along these lines, research suggests that raters may respond to a motivation to avoid negative exchanges with ratees by inflating performance rating scores (Shore & Tashchian, 2002). Agreeableness is also associated with a tendency to be sympathetic and concerned for the welfare of others (Costa & McCrae, 1992). Such tendencies are relevant for Does rater personality matter? 391 performance evaluation behaviours. Performance appraisal has both short- and long- term consequences for employees. In terms of the former, Ferris et al. (2008) argued that poor performance evaluations could negatively influence ratee affect, which in
  • 91. turn could result in withdrawal and decreased relationship satisfaction. In terms of the later, poor performance ratings could influence an employee’s career (e.g., oppor- tunities for promotions or salary increases; Aguinis, 2013; Cleveland et al., 1989). Raters who are high in agreeableness may inflate performance ratings to protect ratees from the consequences associated with poor performance evaluations. Considering this, we propose that agreeableness is positively related to performance ratings. Hypothesis 1: Rater agreeableness is positively related to performance ratings. Extraversion Extraversion reflects traits such as sociability, assertiveness, and gregariousness and is associated with a tendency to be friendly and to prefer the company of others (Costa & McCrae, 1992). Research suggests that extraversion is positively related to networking behaviours in organizations as well as social network size (Forret & Dougherty, 2001). Such tendencies are relevant when considering the social and relationship context
  • 92. features of performance evaluation (Ferris et al., 2008), as raters who are high in extraversion would be likely to form favourable dyadic work relationships with their ratees (John, Naumann, & Soto, 2008). Along these lines, research suggests that rater– ratee relationship quality is positively related to performance ratings, beyond the influence of objective performance levels (Duarte, Goodson, & Klich, 1994). Thus, as extraverted raters are likely to form favourable relationships with their ratees, and favourable rater–ratee relationships are associated with elevated performance ratings, raters who are high in extraversionmay be likely to provide elevated performance ratings. This line of reasoning suggests a positive rater extraversion– performance rating relationship. Hypothesis 2: Rater extraversion is positively related to performance ratings. Emotional stability Emotional stability reflects traits such as calmness and even- temperedness (Costa & McCrae, 1992). In discussing the proposed relationship between rater emotional
  • 93. stability and performance ratings, it is useful to consider low emotional stability – neuroticism. Neuroticism encompasses traits such as anxiety and depression (John et al., 2008) and is associated with a tendency to become easily angered and frustrated by others (Costa & McCrae, 1992). Along these lines, research suggests that neuroticism inhibits cooperation with others and may result in poor working relationships (George, 1990). This is relevant for performance rating behaviours, as neurotic raters may be likely to have poor relationships with their ratees, which could result in lower performance ratings (Duarte et al., 1994). As this perspective suggests that raters who are low in emotional stability (i.e., high in neuroticism) should rate performance low, we propose that rater emotional stability (i.e., low neuroticism) is positively related to performance ratings. 392 Michael B. Harari et al. Hypothesis 3: Rater emotional stability is positively related to
  • 94. performance ratings. Conscientiousness Conscientiousness reflects traits such as dependability, thoroughness, and achievement orientation (Costa & McCrae, 1992). Ferris et al.’s (2008) review noted that the standards against which performance is evaluated could influence performance ratings (see also Murphy & Cleveland, 1995). Employees who are high in conscientiousness generally display superior job performance as compared to employees who are lower in this trait (Barrick & Mount, 1991; Hurtz & Donovan, 2000). As such, raters who are high in conscientiousness may have relatively high standards for performance and may therefore expect exceptional performance, which could result in lower performance ratings. As a result, rater conscientiousness should be negatively related to performance ratings. Hypothesis 4: Rater conscientiousness is negatively related to performance ratings. Openness
  • 95. Openness reflects traits such as imaginative, creative, curious, and original (Costa & McCrae, 1992). Openness is relevant for performance appraisal in organizations in the light of the cognitive context features discussed in Ferris et al. (2008). Cognitive models of performance appraisal indicate that rating errors result from the rater’s finite information-processing capabilities. As openness is positively associated with cognitive functioning (DeYoung, Peterson, & Higgins, 2005), raters who are high in openness may be better able to integrate larger numbers of performance episodes into their performance judgments (Murphy & Cleveland, 1995). That is, raters who are high in openness may be less prone to cognitive biases that influence performance evaluation (Feldman, 1981). Openness is also related to a tendency to form more complex attributions for the behaviours of others (Brookings, Zembar, & Hochstetler, 2003). In their review of the performance rating context literature, Levy and Williams (2004) noted that ‘attributional
  • 96. processing is an important element of the rating process and these attributions, in part, determine raters’ reactions and ratings’ (p. 887). Thus, raters who are high in openness may react to the performance rating context by deeply considering the causes and meaning of a wide range of observed performance episodes. Considering this, rater openness may facilitate accurate performance ratings that are neither elevated nor suppressed. Thus, there is little reason to predict a rater openness–performance rating relationship, and we therefore hypothesize the following: Hypothesis 5: Rater openness is not related to performance ratings. Moderators Study setting We have noted that the contextual landscape that characterizes the social system of organizations provides the opportunity for personality traits to influence performance rating behaviours (Ferris et al., 2008; Levy & Williams, 2004; Tett & Guterman, 2000). Does rater personality matter? 393
  • 97. While laboratory studies may be able to simulate some naturalistic context features of performance appraisal, it is likely that the full spectrum of contextual features and their dynamic interactions (cf. Ferris et al., 2008) would not be well represented. To the extent that rater personality–performance rating relationships depend on such features, we would expect the magnitude of these effects to be larger in field studies as compared to laboratory studies. For example, if a positive rater agreeableness– performance rating relationship occurs in response to interpersonal consequences associated with poor performance ratings (Jensen-Campbell & Graziano, 2001; Murphy & Cleveland, 1995), this effect may be reduced in laboratory settings where there is no continued interaction among participants and therefore scant possibility for long-term interpersonal consequences (see Bell, 2007 for an application of this logic to research involving personality and team performance). As a result, we
  • 98. propose that study setting will moderate the rater personality– performance rating relationships such that the effects are stronger in field studies as compared to laboratory studies. Hypothesis 6: The rater personality–performance rating relationships are moderated by study setting such that the relationships are stronger in field settings and weaker in laboratory settings. Situational strength Beyond the relevance of social context features for personality trait expression in performance appraisal, it is also important to note that features of the situation can act to inhibit the effect of traits, regardless of the relevance of other contextual features for trait expression (Tett & Burnett, 2003). Situational strength refers to the extent to which demands placed on individuals by a situation act to constrain the influence of one’s personality on their behaviour (Cooper&Withey, 2009;Mischel, 1977). A strong situation is one in which situational demand characteristics pose pressure to behave in a certain
  • 99. way, inducing conformity and reducing the influence of personality traits on behaviours (Mischel, 1973). On the other hand, a weak situation is one in which demand characteristics posed by the environment are low, allowing the individual to determine their own course of action in a given situation. Thus, weak situations better allow for personality traits to influence behaviour. A number of features of a performance appraisal context may influence situational strength, and we focus on two: Appraisal purpose and accountability. Appraisal purpose. Broadly speaking, research has considered two classes of purposes for which performance ratings are collected: Administrative purposes (e.g., promotion, raises) and research/developmental purposes (e.g., feedback, training; Cleveland et al., 1989; Levy & Williams, 2004). When rating performance for administrative purposes, raters are making decisions that will potentially have a vast impact on the ratee’s career. The results of the performance appraisal may be permanently included in the ratee’s
  • 100. records and could influence their chances at receiving a promotion or raise and may contribute towards their termination. Thus, when providing administrative ratings, raters face strong situational pressures to, for example, inflate or distort performance ratings in some way (Jawahar & Williams, 1997). Considering this, a situation in which a 394 Michael B. Harari et al. rater is evaluating performance for administrative purposes could be characterized as strong. In such a situation, the rater personality–performance rating relationships may be reduced. On the other hand, when ratings are collected for research or develop- mental purposes, many of the aforementioned situational pressures are alleviated. Thus, performance appraisals conducted under research or developmental conditions represent a weak situation, allowing rater personality traits to influence performance ratings. Hypothesis 7: The rater personality–performance rating relationships are moderated
  • 101. by appraisal purpose such that the relationships are attenuated when ratings are collected for administrative purposes and strengthened when ratings are collected for research or developmental purposes. Accountability. Accountability refers to the extent to which raters are held responsible for their ratings of an employee (Levy & Williams, 2004). For instance, when raters must justify their ratings to others, accountability would be high (Wherry, 1952). Research suggests that holding raters accountable for their ratings represents a strong situation that influences performance rating outcomes (Klimoski & Inks, 1990; Mero & Motowidlo, 1995). Indeed, as raters are pressed to explain their performance ratings, they face stronger situational pressures to rate performance accurately. As a result, high accountability could be characterized as representing a strong situation that reduces the influence of rater traits on performance ratings. On the other hand, when accountability is low, raters are free from situational pressures that necessitate accurate
  • 102. ratings. Therefore, we consider low accountability as reflecting a weak situation that would allow for rater personality traits to drive rating behaviour. Thus,we suggest that the rater personality–performance rating relationships will be attenuated when accountabil- ity is high and strengthened when accountability is low. Hypothesis 8: The rater personality–performance rating relationships are moderated by accountability such that the relationships are attenuated when accountability is high and strengthened when accountability is low. In summary, the purpose of this study is to use meta-analytic methods to assess the relationships between rater personality traits and performance ratings. In doing so, we invoke the Five-Factor Model of personality as an organizing framework for classifying personality traits examined in the existing literature. We also examine the influence of a number of theoretically relevant moderators on these relationships, including study setting, appraisal purpose, and accountability. Method
• 103. Literature search
To identify relevant studies, we first conducted a literature search using ProQuest, PsycINFO, Google Scholar, and ABI-Inform. We searched for articles that included any of the following: rater individual differences or rater personality. We also searched these databases for articles that contained various combinations of rater, rating, individual differences, Big Five, five-factor model, personality, openness, conscientiousness, extraversion, agreeableness, neuroticism, and emotional stability.
  • 104. analyses,we employed the following decision rules. First, the dependent variable in the study had to be a rating of another person’s performance, including performance on the job, work samples or simulations (either in a field or laboratory setting), and vignettes describing a hypothetical employee’s performance. Studies that reported ratings of traits, clinical assessments, problematic behaviour, leadership style, and other non- performance variables were excluded, as were any studies that examined self-ratings of performance. Second, the study had to include as an independent variable a personality measure that could be classified as openness, conscientiousness, extraversion, agreeableness, or emotional stability. Third, the study had to report a zero-order correlation between the personality variable and mean performance ratings across ratees or enough statistical information so that it could be computed. A total of 21 studies (and 28 independent samples) met this inclusion criteria and were included in our analyses.
  • 105. Coding scheme Prior to coding, the first and third authors independently classified each of the personality measures used in each study as an indicator of openness, conscientiousness, extraversion, agreeableness, or emotional stability. Many studies included explicit measures of the Big Five personality factors (e.g., the NEO-PI-R; Costa &McCrae, 1992), and in these cases, no judgment calls weremade on the part of the authors.Where explicit measures of the Five- Factor Model were not used, the authors consulted the taxons proposed by Hough and Ones (2001). TheHough andOnes taxonsweredeveloped to guidemeta-analytic research that involves personality traits by explicating the Big Five personality factor assessed by a variety of commercial personality scales. Since the introduction of the taxons, they have been drawn upon in a number of meta-analyses involving personality (e.g., Dudley, Orvis, Lebiecki, & Cortina, 2006; Foldes, Duehr, & Ones, 2008). When a measure was not explicitly developed to assess one of the Big Five factors and was not included in the
  • 106. Hough and Ones taxons, the two authors (1) reviewed the scale development papers and scale items to determine what (if any) Big Five factor the measure was assessing and (2) searched the literature for statistical evidence that supported a classification of the measure within the Big Five framework (e.g., factor analytic studies that included the measure and established measures of the Big Five, correlations between the measure and established measures of the Big Five). Agreement between the two authors was initially 94%, and the discrepancies were resolved between the two authors and the second author. Ultimately, 100% agreement for all variables was reached. Next, the first and third authors independently coded each study for sample size, effect size, measurement reliability, and moderators. Measurement reliability for both the predictor and criterion measures was operationalized as internal consistency (i.e., coefficient alpha). Where studies reported effect sizes between multiple measures of the same predictor and criterion, compositeswere formed according
  • 107. to themethods outlined by Hunter and Schmidt (2004). If the information needed to compute composites was unavailable, we computed composites by averaging the effect sizes. We computed 396 Michael B. Harari et al. composite reliabilities using the Spearman–Brown formula or, where the needed information was not available, by averaging the coefficient alpha reliability estimates reported for each measure. In terms of the moderators, performance appraisal purpose and accountability were coded only if the primary study was explicit about these features. Agreement between the two authors was initially 97%, and all discrepancies were resolved between the two authors and the second author.1 Analyses We conducted the meta-analysis according to the procedures outlined by Hunter and Schmidt (2004). This method allowed us to estimate construct- level relationships by
  • 108. correcting observed relationships for statistical artefacts, specifically sampling error and measurement reliability. First, we calculated the sample size- weighted mean correlation for performance ratings with each of the personality traits. We then estimated the population correlation (i.e., q) by correcting each mean correlation for measurement reliability in both the predictor and criterion using artefact distributions, as reliability information was only sporadically available. Direct range restriction was not evident in any studies, and therefore, we made no such corrections. Finally, we estimated 95% confidence intervals (using the methods outlined by Viswesvaran, Schmidt, & Ones, 2002) and 80% credibility intervals around each estimate of q, as well as the percentage variance accounted for by statistical artefacts (i.e., sampling error and unreliability in the predictor and criterion measures) for each estimate of q. To evaluate the presence of moderators, we employed the 75% rule (i.e., a moderator is likely to be present when the percentage variance accounted for by statistical artefacts is <75%; Hunter & Schmidt, 2004). The same analytic procedure was repeated for each level of each moderator in the
  • 109. While the moderators examined in our study were conceptually distinct, it is important to establish that they are empirically distinct as well. Therefore, we assessed the relationships between the moderators using the phi coefficient. The phi coefficient is used to test the relationship between variables in a two-by-two contingency table and can be interpreted by the same standards as a correlation coefficient. The study setting–accountability and appraisal purpose–accountability relationships were minimal (Φ = .22 and .18, respectively). The study setting–appraisal purpose relationship was stronger (Φ = .40), but still did not suggest that these two moderators were redundant. Thus, we concluded that the moderators examined here were sufficiently distinct to merit inclusion in our analyses.
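For reference, the phi coefficient for a two-by-two table of study counts can be computed as in the sketch below; the cell counts are hypothetical (for example, studies cross-classified as field vs. laboratory and low vs. high accountability), not the frequencies underlying the values reported above.

```python
# Hypothetical example of the phi coefficient for a 2x2 table of study counts.
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]; interpreted like a correlation."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# e.g., 4 field/low, 2 field/high, 1 laboratory/low, 5 laboratory/high studies
print(round(phi_coefficient(4, 2, 1, 5), 2))  # 0.51 for this made-up table
```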
  • 110. An important consideration for our moderator analyses is the minimum number of studies that must be included in any given analysis to provide informative results. For instance, several meta-analytic reviews in the organizational sciences have required k = 3 effect sizes to proceed with analyses (e.g., Eby, Allen, Evans, Ng, & DuBois, 2008; Viswesvaran et al., 2002). However, Valentine, Pigott, and Rothstein (2010) noted that meta-analytic methods can be applied effectively to as few as two effect sizes and that this approach is superior to other means by which findings from a small number of studies can be interpreted (e.g., so-called cognitive algebra, whereby one tries to mentally integrate findings across studies). Therefore, we determined that we would proceed with our moderator analyses so long as at least two effect sizes were available (however, all of our analyses involved at least k = 3 effect sizes). While estimating bivariate rater personality–performance rating relationships is informative, we also used our bivariate meta-analytic estimates to assess the multiple correlation of the Five-Factor Model of personality, as a set, with performance ratings. Doing so required us to build a correlation matrix involving the relationships estimated in
  • 111. this study as well as the intercorrelations among the Big Five factors of personality. Consistent with other meta-analyses (e.g., Munyon, Summers, Thompson, & Ferris, 2014; Ones, Dilchert, Viswesvaran, & Judge, 2007), we used the estimates derived by Ones (1993; also reported in Ones, Viswesvaran, & Reiss, 1996) for these analyses. Consistent with recommendations (Viswesvaran & Ones, 1995), we used the harmonic mean across cells as the sample size in our analyses. We conducted these analyses by regressing performance ratings onto the five personality factors. We repeated these analyses for each level of each moderator.
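The mechanics of this step can be illustrated directly from a meta-analytic correlation matrix: standardized weights come from the inverse of the predictor intercorrelation matrix, the model R² from those weights and the trait–criterion correlations, and the harmonic mean of the cell sample sizes supplies the N. The sketch below uses placeholder correlations and sample sizes, not the Ones (1993) intercorrelations or the estimates reported in this study.

```python
# Placeholder correlation matrix and criterion correlations; illustrative only,
# not the Ones (1993) values or the meta-analytic estimates from this study.
import numpy as np

# Big Five intercorrelations (order: O, C, E, A, ES).
R_xx = np.array([
    [1.00, 0.20, 0.17, 0.11, 0.16],
    [0.20, 1.00, 0.00, 0.27, 0.26],
    [0.17, 0.00, 1.00, 0.17, 0.19],
    [0.11, 0.27, 0.17, 1.00, 0.25],
    [0.16, 0.26, 0.19, 0.25, 1.00],
])
# Trait-criterion (performance rating) correlations.
r_xy = np.array([0.05, 0.02, 0.12, 0.25, 0.08])
# Sample sizes underlying each trait-criterion cell of the matrix.
cell_n = np.array([1899, 1723, 2150, 1460, 1988])

beta = np.linalg.solve(R_xx, r_xy)                 # standardized regression weights
r_squared = float(beta @ r_xy)                     # model R-squared
harmonic_n = len(cell_n) / np.sum(1.0 / cell_n)    # harmonic mean N for significance tests

print(np.round(beta, 3), round(r_squared, 3), round(harmonic_n, 1))
```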
  • 112. Note that when the predictors included in a regression model are correlated, the relative contribution of each predictor to the model R² cannot be accurately determined by examining the beta weights alone (LeBreton, Ployhart, & Ladd, 2004). Therefore, we conducted relative importance analyses to determine the relative contribution of each of the Big Five factors to the prediction of performance ratings (Tonidandel & LeBreton, 2011). This approach has recently been adopted in meta-analyses to provide a more nuanced perspective of how predictors in a regression model operate in concert with one another in influencing the criterion (e.g., Munyon et al., 2014; O’Boyle, Humphrey, Pollack, Hawver, & Story, 2010). We conducted these analyses using a relative weight analysis framework (Johnson, 2000) and repeated these analyses for each level of each moderator examined. Relative weight analysis produces two types of coefficients – relative weights and rescaled relative weights. The former reflects the proportion of variance in the outcome (i.e., performance ratings) that is attributed to each of the predictor variables, while the latter reflects the percentage of predicted variance that is accounted for by each predictor variable (calculated by dividing the relative weights by the model R²; LeBreton, Hargis, Griepentrog, Oswald, & Ployhart, 2007).
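Johnson's (2000) relative weights can likewise be computed from such a correlation matrix: the predictors are passed through the symmetric square root of their intercorrelation matrix to form orthogonal counterparts, the criterion is regressed on those, and the resulting weights sum to the model R²; dividing by R² gives the rescaled weights. The sketch below reuses the placeholder correlations from the previous block and illustrates the general technique only, not the authors' code.

```python
# Johnson's (2000) relative weight analysis, illustrated with the same
# placeholder correlations as above (not values from this meta-analysis).
import numpy as np

def johnson_relative_weights(R_xx, r_xy):
    evals, evecs = np.linalg.eigh(R_xx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T   # symmetric square root of R_xx
    beta_z = np.linalg.solve(lam, r_xy)               # weights for the orthogonalized predictors
    raw = (lam ** 2) @ (beta_z ** 2)                  # raw relative weights; they sum to R-squared
    return raw, raw / raw.sum()                       # raw and rescaled (share of R-squared) weights

R_xx = np.array([
    [1.00, 0.20, 0.17, 0.11, 0.16],
    [0.20, 1.00, 0.00, 0.27, 0.26],
    [0.17, 0.00, 1.00, 0.17, 0.19],
    [0.11, 0.27, 0.17, 1.00, 0.25],
    [0.16, 0.26, 0.19, 0.25, 1.00],
])
r_xy = np.array([0.05, 0.02, 0.12, 0.25, 0.08])

raw, rescaled = johnson_relative_weights(R_xx, r_xy)
print(np.round(raw, 4), np.round(rescaled, 3))  # rescaled weights sum to 1.0
```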
  • 113. Results
The bivariate meta-analytic results are reported in Table 1, the results of our multiple regression analyses are reported in Table 2, and the results of our relative weight analyses are reported in Table 3. The results of our overall multiple regression analysis indicated that the Big Five personality factors accounted for 8% of the variance in performance ratings. Hypothesis 1 predicted a positive relationship between rater agreeableness and performance ratings. The results of our bivariate analysis indicated a corrected correlation of ρ = .25. Furthermore, the results of our multiple regression analysis indicated a statistically significant beta weight of .22. Finally, the results of our relative weight analysis indicated that agreeableness accounted for 65.9% of the variance in performance ratings explained by the Big Five personality factors. Cumulatively, these findings support Hypothesis 1. Hypothesis 2 predicted a positive rater extraversion–performance rating relationship. The results of our bivariate analysis indicated a corrected rater extraversion–performance rating correlation of ρ = .12. Furthermore, the results of our multiple regression analysis indicated a statistically significant beta weight of .08, and the results of our relative weight
  • 114. Table 1. Meta-analysis of rater personality–performance rating correlations

Moderator                   k    N      r     ρ     SDρ   % Var   CVL    CVU    CIL    CIU
Agreeableness
  Overall                  12  1,899   .20   .25   .17   25.65    .03    .46    .15    .35
  Study setting
    Field                   6    874   .26   .33   .00     100    .33    .33    .33    .33
    Laboratory              6  1,025   .14   .17   .21   17.05   −.09    .44    .00    .34
  Appraisal purpose
    Administrative          4    378   .28   .36   .00     100    .36    .36    .36    .36
    Research/developmental  4    582   .18   .22   .23   15.93   −.08    .52   −.01    .45
  Accountability
    Low                     5    864   .22   .27   .09   50.78    .15    .39    .19    .35
    High                    5    792   .16   .20   .24   14.65   −.10    .50   −.01    .41
  Publication status
    Published               9  1,015   .22   .27   .17   30.85    .05    .49    .16    .38
    Unpublished             3    884   .17   .22   .15   17.91    .02    .42    .05    .39
Extraversion

Note. k = number of effect sizes; N = total sample size; r = sample-size-weighted mean observed correlation; ρ = correlation corrected for unreliability; SDρ = standard deviation of ρ; % Var = percentage of variance accounted for by statistical artefacts; CVL/CVU = lower/upper bounds of the 80% credibility interval; CIL/CIU = lower/upper bounds of the 95% confidence interval.