
2006 Kilburg EER Long Form Article 20pgs


2006 EER Study Short form article. Unofficial research study on State's EER. Unabridged article. Cleared on August 17, 2006, by PA/SCP, for publication.

Published in: Education, Technology, Business

The EER: IS IT TRANSFORMATIONAL? A PSYCHOLOGIST TURNED DIPLOMAT TAKES A CRITICAL LOOK AT THE CURRENT EER SYSTEM
Don Kilburg, Ph.D.

Transformational Diplomacy & Employee Evaluation Reports

In a key address at Georgetown University, Secretary of State Condoleezza Rice showcased the buzzwords “Transformational Diplomacy” (January 18, 2006). America needs a “diplomacy that not only reports about the world as it is, but seeks to change the world itself”, the Secretary advised. “We must transform old diplomatic institutions to serve new diplomatic purposes” and we must “prepare” and “challenge” our own diplomats with “new expertise” and “new expectations”, she insisted. As both a diplomat and a psychologist, it occurred to me that if we want to advance this sort of diplomacy, we might need to take a critical look at how we formally shape ourselves as a diplomatic force: the Employee Evaluation Report, or EER. Does the EER produce diplomats who are true “transformers”, as Secretary Rice has called for?

Of course, just like the answer to most questions in the Department, “it depends”. It depends who you ask. Anecdotal reports from many employees suggest “the system ain’t broke, so don’t fix it”. Other employees are openly critical of the EER system, referring to it as “a game” or a “kabuki dance”. Both tend to agree that the EER is essentially an exercise in “water-walking”, whereby everyone is made to look exceptional, regardless of his or her true performance.

Still others point out that when EERs do not paint a picture of water walking, they either “damn with faint praise” or they damn with the rare candid critique. Damning with faint praise is arguably disingenuous. Supervisors may knowingly code what they write in order to covertly damage an employee they will not overtly criticize. In contrast, damning with the rare candid critique is arguably an accident of who your supervisor happens to be. Such supervisors may pride themselves in breaking the high praise norm, which essentially results in a disproportionate amount of overt damage to their own employees vis-à-vis their water-walking peers.

In any case, it is not clear how exactly the EER could serve as a tool for transformation. A system wherein the luck of the draw on supervisors’ candor matters so much does not seem fair. A system that over-rewards might serve only to limit the advancement of promising employees by virtue of under-challenging them. Worse still, a system that reinforces bad habits through an inability to effectively categorize employees could actually result in bad employees getting promoted.

“So what?” you say. “The real transformation of the workforce occurs by way of so-called ‘corridor reputation’”, you suppose. In fact, you may be right, but then this in turn begs another question: why do we spend so much time and energy on EERs? By some estimates, progress on other work
at our missions comes to a grinding halt for one or two full months a year – simply to deal with the business of evaluating ourselves. Are EERs really worth it?

Context

To analyze the value of EERs, I first attempted to review the existing literature on the topic. I asked people in Human Resources if they knew of any systematic review of the EER system. They did not. I asked the American Foreign Service Association if they knew of any study or article that discussed the merits of the EER system. They did not. In fact, no one could point me in the direction of any sort of systematic appraisal of the employee evaluation process. “Could it be that no one had ever systematically evaluated a tool the Department uses so widely?” I asked myself. As a psychologist, the most common question you get when presenting research findings is: does your measure have demonstrated statistical validity and reliability? I began to wonder: does the EER have demonstrated statistical validity and reliability?

It was not until I started mentioning my idea to conduct a survey of employees’ experiences with the EER that anything resembling preexisting literature on the subject surfaced. A retired Foreign Service Officer of 26 years of service, named Eli Lauderdale, told me about a book called “The Foreign Affairs Fudge Factory”, written by John Franklin Campbell in 1971. As the title indicates, the book’s take on the Department’s bureaucratic culture of the early seventies is critical. The introduction to the book starts with a quote from Joseph Kraft: “The fact is that the Department has not been run primarily as a decision-making instrument. It has been run as a fudge factory. The aim has been to make everybody happy, to conciliate interests, to avoid giving offense and rocking the boat.” This quotation certainly resonates, yet I was inclined to dismiss a book written a full 35 years ago. It could not possibly apply to today’s employee evaluation system, I thought.

Then, however, I came across another opinion from the period, in the Campbell book, expressed clearly by renowned diplomat George Kennan: “Let me control personnel and I will ultimately control policy. For the part of the machine that recruits and hires and fires and promotes people can soon control the entire shape of the institution, and of our foreign policy.”

I also heard from Doug Ellice, another retired Foreign Service Officer, with 27 years of service under his belt. He sent me a copy of his letter of a few years ago, to Director General Ambassador Davis. Mr. Ellice’s commentary about EERs in the letter was, to say the least, scathing. He calls the EER “worthless”, stating that there are “no quantifiable performance measures to be weighed”. He goes on to state that the Rating Officer’s role is “totalitarian”. Ellice maintains in the letter that the Foreign Service is “poisoned by the need officers feel to please their Rating and Reviewing Officers”.

With these anecdotes, but no real research on the topic, I decided to conduct a survey of employees’ experiences with the EER. This would be a first step toward identifying how the
EER works or does not work as a tool for developing a workforce that could advance Transformational Diplomacy. The survey is distinct from the poll in that the researcher aims to identify not only people’s basic opinions, but also the relationships between various factors that underlie them, especially by using statistical analysis. The goal of the project was not simply to collect a smattering of complaints, but rather to systematically characterize the collective experience our people have of the EER. The idea was to hold the proverbial mirror to the EER, to evaluate it for what it is worth as a tool (special thanks to Brian Majewski and Isiah Parnell for their encouragement).

Basic Questions & Hypotheses

I wanted to know what factors go into getting a good EER, getting tenured, and getting promoted. I openly surmised that one’s experience with the EER was based on more than just his or her actual work performance in a vacuum, but also on the approaches of the Rater and Reviewer, the background of the employee, the circumstances of the EER process, et cetera. In short, I hypothesized the EER to be an imperfect tool, one that could be improved for the betterment of the Foreign Service and hence for the advancement of Transformational Diplomacy.

Research Method

Obtaining survey participants for the on-line “EER Experiences Survey” I would undertake was challenging. With institutional help denied, the survey had to be distributed almost entirely through private email and website channels, with the exception of limited email forwarding that individual employees undertook through Department email to their own colleagues (special thanks to John Dinkelman for providing website contact names). Fortunately, when employees did come across my survey, the response rate was truly outstanding – employees clearly had something to get off their proverbial chests. As frustrating as it was to have official channels (HR/PE, CDA, AFSA) decline to participate in a survey designed to make the Foreign Service better, I respect their traditionalist reaction and, above all, the fact that they did not ask me to stop the survey as a private channel endeavor.

The survey was entirely web-based, utilizing a survey tool called “Zoomerang”, found at (please contact me through email if you would like a copy). It consisted of 80 questions, most of which were multiple-choice. It asked about basic demographic information, such as age, sex, and education. It also asked about what sort of EER the respondent got last, for example whether he or she was satisfied with the evaluation and if not, why not. Indeed, by the time the respondent was finished with all 80 questions, he or she had been asked about a full range of items hypothetically related to the EER process and its outcomes. The amount of data generated by the survey was remarkable and cannot easily be summarized here. Key findings will be discussed and additional ones will be reserved for future papers or for responses to specific inquiries.

The survey was admittedly designed primarily for Entry Level Officers (ELOs). It was developed using
feedback from a focus group consisting of several ELOs from the Consular Section at US Embassy Mexico City. The survey was later expanded so that Specialists and Generalists of all ranks could answer it if they came across it. Though it clearly does not apply as well to these other employees, the goal was to make the survey as inclusive as possible. Some questions are more useful than others in comparing experiences across employment type and pay grade.

The survey asked respondents about their most recent EER exclusively. The point of limiting the survey to the most recent EER was to hold constant as best as possible the extraneous effects of memory revision and disproportionate complaining. If each respondent answered the questions with his/her most recent EER in mind, we would have a clearer snap-shot of the overall experiences of the respondents as a group, rather than a biased snap-shot of simply the bad or good experiences over the years the respondents chose to voice.

Participants

Participation in the survey was remarkably high and drew heavily on hires from the Diplomatic Readiness Initiative (DRI). In a month and a half, 644 Foreign Service employees had completed the survey. We cannot know how many employees had access to the survey, since it was passed around electronically. However, the response rate can be considered exceptionally high, given both the relatively small size of the Foreign Service and especially the number of respondents by orientation class (A-100), as a percentage, which was roughly 25%. That is, at least among ELOs, roughly 1 in 4 answered the survey, per incoming class.

After cleaning the dataset of incomplete cases, there were 446 Generalists and 189 Specialists whose data could be used in analyses. Because of the relatively low response rate among the Specialists and their inherent variability in job type, I chose to focus on the Generalists for now.

Among the 446 Generalists who answered the survey completely, roughly 90% (389) were entry-level or lower echelon Officers with less than nine years in the Foreign Service. The remaining 10% were too varied in length of service, so I separated them out and focused on the remaining 389 Generalists. The vast majority (71%) of this group was hired during the three-year (2002-2004) DRI, i.e. 278 out of 389. One can appreciate the successful response rate of the survey when considering that the Diplomatic Readiness Initiative was to hire 623 Generalists across 2002-2004 (according to GAO reports) and the survey had 278 DRI hires. That is a sizeable 45% response rate within the DRI cohort.

Most of the Generalists in the subset of 389 came from the 99th A-100 through the 120th. Most were not tenured at the time of the survey, but 150 out of 384 of them (39%) were tenured (5 people did not disclose). The Generalists ranged in age from 24 to 63, with an average of 34 and a mode (most common value) of 31. There were 188 female respondents and 199 male, a roughly even split. The 5 job “cones” were relatively evenly represented in the dataset, in terms of the Generalists’ chosen career tracks. However, the largest share of the Generalists (48%) in the dataset
reported their most recent EER was received while doing a Consular job, 14% a Management job, 14% a Political job, 10% a Public Diplomacy job, and only 8% an Economic job. All the following results refer to the subset of the 389 newer Generalists unless otherwise specified.

Results

Most Satisfied with Own EER and Much is Good about the Current System.

In many ways, much is good about the current EER system. Most Generalists surveyed were very satisfied with their own EERs. A full 86% of the Generalists were either somewhat satisfied (29%) or very satisfied (57%) with the final outcomes of their own EER Ratings. A full 82% of these Generalists were either somewhat satisfied (25%) or very satisfied (57%) with the final outcomes of their own EER Reviews.

You might say these satisfaction levels are relatively high, given the fact that most of the Generalists received the given EERs for doing jobs they by and large did not join the Foreign Service to do. Indeed, according to this group’s data, it is not until one’s fifth year in the Foreign Service that the odds of getting a non-Consular job become greater than “50-50”. Among those in the FS up to two years, 60% had their last EER in a Consular job; up to three years, 54%; up to five years, 52%. At the fifth-year mark, 34% of the Generalists in the dataset had a Consular job, the largest of any cone group.

Satisfaction with Raters ran relatively high among the 389 Generalists. Roughly 79% of the Generalists liked their Raters as bosses at least “somewhat” (28%) and most (51%) liked them “very much”. Only 21% reported liking their Raters as bosses “very little”. As many as 92% liked their Raters as people at least “somewhat” (26%) and most (66%) liked them “very much”. Only 8% reported liking their bosses as people “very little”. The figures were very similar for Reviewers.

We can also say that the current EER system is a decent means of getting positive feedback from one’s supervisors and of having something for which to strive. According to the 389 Generalists, most supervisors raise the EER with their subordinates as either a procedural matter or a positive opportunity to earn a reward for a good performance. Very few supervisors raise the EER threateningly with their subordinates. In short, the EER probably has some solid merits as a good incentive to perform well.

It may further speak well of the current EER system that satisfaction with it would not seem to be largely affected by many things that should arguably not affect it. In my analyses, no significant differences were found in levels of satisfaction with the EER system on the basis of the employee’s age, sex, cone, years of pre-FS work experience, or educational level.

Most Dissatisfied with the Current System and Much is Bad about the System.

Here is the catch. According to this dataset, there is an enormous amount wrong with the current EER system. For starters, most of the Generalists surveyed are quite
dissatisfied with the EER system, despite being quite satisfied with their own Ratings and Reviews. A whopping 71% of the 389 Generalists were either neutral about or dissatisfied with the current EER system. The bulk of the Generalists (46%) were either somewhat dissatisfied (27%) or very dissatisfied (18%) with the current EER system. Only 29% of the Generalists were satisfied with the current EER system (24% were somewhat satisfied and 5% were very satisfied).

The big issue is that employees feel the EER system is not good for the Service, owing to its perceived inability to effectively evaluate and sort the good from the bad. Even a moment’s glance at the open-ended comments from the Generalists reveals deep criticism of the current EER system. One respondent called the system a “Kabuki dance”. Another called it a “game”. Still another called it a “joke” and “waste of time”. Only a small minority of comments expressed support for the current EER system, hedging their opinions with comments like, “it’s not perfect, but it’s the best we have”.

The single most common complaint appeared to be that “everyone is a water walker”. Many respondents indicated in one way or another that such a “water walking” system of over inflated praise is powerless to remove poor performers and does not discern between average and exceptional work. As one Officer summarized succinctly: “EERs don’t seem to be about management, growth, or performance – and they certainly aren’t reviews or evaluations. They are the way the Foreign Service exemplifies its ‘go along to get along’ personality”.

Another apparent problem is that the procedural regulations for completing EERs are very poorly followed. Raters and Reviewers are not seen as being proactive. Only about half (52%) of the Generalists reported their Raters/Reviewers to be proactive in getting their EERs completed. Only 27% reported that the counseling dates on their EERs were accurate and another 27% reported their counseling dates did “not at all” correspond to any actual counseling dates. Only 44% reported getting a written counseling statement, something that is in theory supposed to document good performance as much as bad. Lastly, one in four of the Generalists reported not getting a Work Requirements Statement on time.

Employees also took issue with having no real outlet to disagree with their Raters and Reviewers. Though most saw their own EERs as being quite good, many were disgruntled over the notion that the EER form and culture left no means of dissent. The personal statement could not be effectively used as a tool for dissent, without fear of downgrading, no matter what the true quality of the employee’s performance. Most employees reported writing rosy statements and entertaining no viable option for EER grievances.

Some employees were more dissatisfied with the current EER system than others, as a function of noticeable factors. Those who had a lower level of interest in the FS as a career had a higher level of dissatisfaction with the current EER system (mean of 1.4 versus 1.9, on a satisfaction scale of 0 to 4; p<.05). Both groups were relatively dissatisfied with the EER system, but the group with lower interest in the FS as a career was
less satisfied with the EER system. Possibly these individuals are less interested in FS careers precisely because they are dissatisfied with the EER system.

In any event, employees higher on proactiveness in completing their EERs were also less likely to be satisfied with the current EER system (mean of 1.59 versus 1.84, on a satisfaction scale of 0 to 4; p<.05). In both cases, employees were overall dissatisfied with the current EER system. It may be that proactive people are simply more irritated by a perceived sluggish nature to the EER system.

Certainly the Generalists were much less satisfied with the EER system and their own EERs as well, if they had reported their counseling dates to be inaccurate and/or that they never received a written counseling session. While the Generalists were quite satisfied with their own EER Ratings and Reviews overall, the more they reported their counseling dates as inaccurate and/or their written counseling sessions as absent, the less likely they were to be as satisfied with their Ratings, Reviews, and the EER system (p<.01 in all cases).

These highly significant findings fly in the face of those who would argue that counseling dates and formats do not matter. It appears that they matter to the extent that employees probably do not feel they were treated as fairly or effectively when such structures are lacking. It is somewhat surprising that absence of a written counseling session did not correlate with greater satisfaction with one’s own EER Ratings and Reviews, given the presumption that written counseling sessions may typically be used punitively. Possibly this presumption is incorrect and/or possibly employees are more satisfied across the board when they are routinely informed of their performance, whether it is good or bad.

In analyses of EER system dissatisfaction in the larger dataset of 446 Officers, including more senior officers with as many as 33 years of experience in the FS, there were also noticeable patterns. Those who were relatively new to the FS (with less than nine years in) were less satisfied with the current EER system than their more experienced counterparts (with more than nine years in); p<.05. That said, neither group was very satisfied with the system. The mean level of satisfaction among the new Officers was 1.7 on a satisfaction scale of 0 to 4 (where 2 is neutral), i.e. the new Officers were mainly dissatisfied. The mean level among more experienced Officers was 2.12; i.e. they were neutral.

Interestingly, when comparing Specialists on EER system satisfaction, it is clear that they are the least satisfied. Analyzing the 175 usable surveys from the Specialists, the mean level of satisfaction for the newer employees was a largely dissatisfied 1.59 (on a scale of 0 to 4). In contrast to the pattern with the Officers, Specialists with more time in the FS were even less satisfied with the EER system compared with their newer counterparts. In fact, the least satisfied of all FS employees were the Specialists with greater than nine years of time in the FS (mean of 1.57).

None of these low levels of satisfaction with our EER system may come as any surprise. As in any survey about a controversial topic, the
disaffected may be more likely to respond by virtue of the discontent they wish to vocalize. That said, the sample size in this study is substantial and the survey was worded in quite neutral terms, allowing for the widest range of expression. Most importantly, the bulk of the employees surveyed reported high levels of satisfaction with their own EER Ratings and Reviews – hence their discontent appears to rest mainly on the EER system, as opposed to any general demoralized state.

Many arbitrary factors affect EERs, besides performance.

What may be most disconcerting about the current EER system is that many seemingly arbitrary factors have significant effects on the outcome of one’s EER – things that are not directly related to performance. For one, in my research I found strong evidence that people cannot easily separate their fondness (or lack thereof) for other people as people, versus as fellow employees. If this is true, EERs may be more about winning friendly favor (with bosses exclusively) than performing well on the job per se.

Employees were asked to report how much they liked their Raters and Reviewers and how much they perceived their Raters and Reviewers liked them. Ratings were solicited in all cases in two different categories: “as a person” and “as a boss” (e.g. “how much did you like your Rater as a person? As a boss?”). Results showed that employees’ ratings of their bosses as people strongly correlated with their ratings of their bosses as bosses. The same was true for how employees perceived their bosses to perceive them as people and as subordinates. In short, there is strong evidence that liking and being liked as both a person and as a boss/subordinate are tightly linked. These findings were highly significant using a variety of statistical tests (independent samples t-tests: p<.001; paired samples correlations: p<.001).

To illustrate, let us look at the relationship between two variables: perception of being liked as a person by one’s Rater and perception of being liked as a subordinate by one’s Rater. I divided each of these variables into “high” and “low” groups, wherein the survey respondent was put in the “high” group if he/she reported a 2 on the likeability scale of 0 to 2 and in the “low” group if he/she reported a 0 or 1. The results were highly significant: employees in the “low” group on being liked as people perceived that their Raters liked them as subordinates significantly less (mean of 1.26) than employees in the “high” group (mean of 1.87).

The same result was found with the inverse model. That is, those who perceived their Raters liked them as subordinates to a “low” extent also perceived their Raters liked them as people to a lower extent (mean of 1.11), compared with their counterparts who perceived their Raters liked them as subordinates to a “high” extent (mean of 1.78).

Though we cannot say for sure statistically whether there is a causal relationship between liking and being liked as a person and liking and being liked as a boss or subordinate, we can say that probabilistically speaking, one’s likelihood of getting a good EER
decreases rapidly if one perceives he/she is not liked on a personal level. Conversely, one’s likelihood of getting a good EER increases rapidly if one perceives he/she is liked on a personal level. In this case “good” is defined by the level of satisfaction one has. One’s likelihood of being more satisfied with one’s own EER as a function of perceiving being liked by one’s bosses as a person also contributed to greater satisfaction with the EER system. This can be reasonably said because in all statistical analyses of EER satisfaction in the present study, satisfaction increased as a function of increased perception of being liked as a person. In all cases these findings were highly significant at p<.001. We would not expect to find these statistical relationships if how one was perceived as a person bore little relationship to how one was perceived as a subordinate in the job or a supervisor in the job.

In short, it does not appear from these findings that people can easily separate their feelings toward one another as people and as bosses or subordinates. This may reflect the age-old wisdom of throwing cocktail parties for your bosses and subordinates. It would be fascinating to compare the EER and career successes of employees on the basis of how much they involved themselves in socially ingratiating their bosses and subordinates over the years. The effect may be more powerful than we can imagine. The mechanism is two-fold: (1) people rate people as more competent if they like them on a personal level and (2) people are disinclined to rate people as incompetent if they like them on a personal level.

One of the more controversial elements of the EER is the Area for Improvement box. I asked employees to tell me what was written in their boxes and how they chose to respond. The most commonly reported area was Managerial skills (135, or 35%), then Substantive Knowledge skills (107, or 28%), then Communication/Foreign Language skills (90, or 23%), then Interpersonal skills (49, or 13%), then Leadership skills (41, or 11%), and finally Intellectual skills (38, or 10%). These percentages do not total 100 because some respondents had more than one core competency group cited in their Area for Improvement boxes.

Interestingly, respondents varied greatly in the extent to which they thought their Areas for Improvement were accurate reflections of their performance. Taken as a whole, it appears that over a third of the Generalists (35%) did not think that their Areas for Improvement were very germane to their actual performance. This would seem to imply there is a relatively large gap between Raters’ and their subordinates’ perceptions of the subordinates’ Areas for Improvement.

Yet what employees had in their Area for Improvement boxes on their last EERs appeared to be related to tenure. The tenured subset of 150 Generalists from the 389 with fewer than nine years in the FS was examined for relationships between Area for Improvement boxes and tenure. The comments in the improvement boxes were coded by “core competency”: Leadership skills, Managerial skills, Interpersonal skills, Communication/Foreign Language skills, Intellectual skills, and Substantive
Knowledge. Not one tenured Officer in the dataset of 150 had Leadership skills and not one Officer had Intellectual skills cited in his/her last EER Area for Improvement. Thirty-eight percent (57) had Managerial skills cited, 29% (44) Substantive Knowledge, 19% (29) Communication/Foreign Language skills, and 11% (17) Interpersonal skills. The numbers do not add up to 100% because some employees reported multiple Areas for Improvement.

I wanted to see if one’s odds of being tenured on the first review changed as a function of the core competency cited in one’s Area for Improvement box, so I calculated the average numbers of tenure reviews officers had prior to getting tenure in the subset of 150 tenured officers. There were too few cases to perform legitimate statistical analysis; however, trends emerged. Officers with Managerial skills cited for improvement were reviewed on average the most: 1.44 times; then those with Communication/Foreign Language skills, 1.31; then those with Substantive Knowledge, 1.25; and finally those with Interpersonal skills, 1.00. This may not be a strong finding, but it does run counter to the popular expectation that having Interpersonal skills cited in your Area for Improvement is the “kiss of death”. These data do not support that.

How did the 389 Generalists choose to respond to their Area for Improvement boxes? Few chose to explicitly disagree with their Rater’s assessments: less than 2%. The bulk (40%) of the Generalists surveyed reported that they chose not to respond to the Area for Improvement at all in their EER statement, preferring to ignore it or to wait until the next EER to address it. Another 34% chose to agree with the comments in their Areas for Improvement and to grant the items as “something to work on”. The remaining 25% or so chose to interpret their Area for Improvement positively, with a “spin” or reframe of the item.

These different employee responses to the Area for Improvement appear to have different consequences. We can see evidence of different tenure rates for different responses to the Area for Improvement box by comparing the average tenure review numbers. When asked about the Area for Improvement, those who reported they “interpreted it positively, with a ‘spin’ or reframing of it” were tenured in the lowest average number of reviews (1.24), then those who “agreed with it explicitly, granting it as something to work on” (1.33), then those who “did not address it, preferring to ignore it or wait until next EER” (1.42), and finally those who “disagreed with it explicitly, offering a counterargument” (1.50). It is likely that the more one overtly disagrees with one’s Area for Improvement, the less likely one is to be well received by tenure panels (and probably promotion panels too).

The extent to which the 389 Generalists “haggled” or negotiated with their Raters/Reviewers was examined, along with the number of rewrite requests the Generalists made of their Raters/Reviewers. Hypothetically one could get a better Rating/Review if one endeavored to elicit changes for the better from his/her Rater/Reviewer. I did not find evidence of that, however. The results I found in these analyses indicate that dissatisfaction with EER
Ratings, Reviews, and with the EER system, goes up as the amount of haggling/negotiating goes up and as the number of rewrite requests goes up. Most likely haggling/negotiating and rewrite requests go up rather as a function of dissatisfaction with one’s EER Ratings and Reviews, and this in turn decreases one’s satisfaction with the EER system. For the record, most employees did not haggle/negotiate very much, nor did they request many rewrites. From the larger dataset, some 70% reported not haggling/negotiating at all and the average number of rewrite requests was only one.

It is tempting to conclude from the above that employees do not influence very much the outcomes of their Ratings and Reviews through haggling/negotiating or through multiple rewrite requests. However, it is clear from comparing EER complaints before and after requests for changes that employees have significantly fewer complaints about EERs in the end – roughly 30% fewer.

The survey respondents collectively reported 492 various complaints about their own EER Ratings/Reviews before requesting changes. After requesting changes, the number of total complaints reported dropped to 349, a difference of 143, or about 29%. In sum, it is safe to say that requesting changes of your Rater/Reviewer can dramatically reduce complaints you have about your EER, and in turn probably influence your own competitiveness vis-à-vis your peers.

Among seemingly arbitrary factors affecting one’s experience of the EER, I found that sex/gender of the employee comes into play. Though males and females did not differ significantly in their levels of satisfaction with the EER system, they did differ in their levels of satisfaction with their own EER Ratings and Reviews. In a sample of 377 subjects (12 subjects of 389 did not report their sex/gender), there were 188 females and 199 males. The mean satisfaction level for EER Ratings among males was 3.42; for females it was 3.22 (on a scale of 0 to 4). The mean satisfaction level for EER Reviews among males was 3.43; for females it was 3.18. In both cases the figures are significantly different at p<.05. In explaining this sex/gender difference, we might consider general differences in male/female willingness to express satisfaction with one’s performance (or to hide dissatisfaction). Alternatively, perhaps there is a differential in how males are rewarded compared to females, in terms of either what Raters/Reviewers value and/or what they perceive as their employees’ strengths.

Sex/gender differences in EER satisfaction might not be worth speculating about were it not for tenure review patterns. Analyzing the subset of 150 tenured officers from within the 389 Generalists with less than nine years in the FS, I found that females were reviewed for tenure an average of 1.40 times before getting tenured. In contrast, males were reviewed an average of 1.28 times before getting tenured. It would seem that males are slightly more likely than females to get tenured on the first review. However, it is difficult to tell whether this was a chance finding given that the sample size of tenured officers was too small to legitimately run
inferential statistics on sex/gender and average number of tenure reviews.

One major complaint employees had about the EER system is that their evaluations are subject entirely to the approaches of the individuals who serve as their Raters and Reviewers. I have already noted that most employees surveyed saw their Raters/Reviewers as rather low on proactive-ness when it came to completing the EER process. To investigate the effects of differing levels of perceived proactive-ness among Raters/Reviewers on the part of employees, I divided the data into high and low categories and compared them. In the case of both Raters and Reviewers, there were highly significant effects on employee satisfaction with one’s own EER Rating, EER Review, and the EER system (p<.01). That is, Generalists viewing their Raters and Reviewers as low in proactive-ness were less satisfied all around. This finding would appear to run counter to the notion many Raters/Reviewers have that “so long as it gets done, that is all that matters.” There are clearly deleterious effects to procrastinating on EERs, some of which employees themselves may not have been aware of.

The impact of nurturant leadership on employees is also an important factor in levels of satisfaction with EERs and the EER system. I looked at the extent to which employees perceived their Raters as mentors or coaches “proactively nurturing professional development throughout the rating period” (on a scale of 0 to 4, where 0 is “not at all” and 4 is “all the time”). In all cases, if employees viewed their Raters as low in this nurturing factor, they also reported lower levels of satisfaction with their EER Ratings, Reviews, and with the EER system as a whole; p<.001.

That is, those reporting low nurturing Raters had a mean satisfaction level of 3.14 for their Ratings, compared with 3.68 for those reporting high nurturing Raters; they had a mean satisfaction level of 3.12 for their Reviews, compared with 3.66 for those reporting high nurturing Raters; and they had a mean satisfaction level of 1.42 for the EER system, compared with 2.23 for those reporting high nurturing Raters. In sum, it is very clear that the level of nurturance of the Rater has major effects on how satisfied the employee is with his or her EER and with the EER system in general.

Another common complaint employees had about the EER system in their open-ended responses to the survey was that the quality of EERs is largely driven by the writing styles (or lack thereof) of the Raters and Reviewers writing them. As one way of exploring this hypothesis, I compared satisfaction levels between groups of employees whose Raters and Reviewers had written their EERs in “list” versus “story” format. It has been said that the story format is more powerful in that it provides a chronological narrative of events, as opposed to a mere list or inventory of accomplishments.

The story format was much rarer in the employee sample than the list format. Among the subset of 150 tenured officers, taken from the larger group of 389 who had served fewer than nine years in the FS, there were 130 who had list format Ratings and 17 who had story format. Statistical conclusions are difficult with
too few data points; however, suspending the statistical prerequisites, the difference in mean tenure rates between the two groups was significant (p<.05). Those tenured officers who had the story format Rating had an average number of tenure reviews of 1.12, compared with their list format counterparts who were tenured on average in 1.38 reviews.

A similar pattern was found for Reviews. One hundred three officers had list format Reviews and 47 had story format Reviews. Officers with the story format Review had a slightly faster tenure rate: 1.32 versus 1.35 reviews. Taken as a whole, it seems that the story format is more advantageous to the employee being rated. Possibly it conveys a more compelling endorsement. The fact that it is less common than the list format might also have implications. Perhaps Raters/Reviewers are less inclined to take the time to write the presumably more labor-intensive story format unless they already have a compelling endorsement in mind for the employee. In any case, it seems clear that the writing style of the Rater/Reviewer does matter and does have important implications for the employee being rated. If the list/story format distinction has implications for the employee’s success, presumably more subtle forces such as sentence construction, vocabulary, and grammar most certainly do as well. The concern for the employee is that these forces are considerably beyond his/her control.

Not surprisingly, the most common complaint that employees had about the EER system is that their EERs were “not well-written in style/form” (32%). The second most common complaint was that the EER was “not done as quickly as it could/should have been” (28%). Only 2% complained there was “too much criticism.”

Surprisingly, 250 people out of 389 (64%) wrote lengthy comments in the open-ended box for this item. There were many different kinds of complaints, most about style and form. Some wrote that their EERs were done too hastily, after much procrastination on the part of the Rater or Reviewer. Interestingly, many respondents complained that their Rater or Reviewer did not write the Rating or Review; rather, the employee him or herself wrote it. A few employees reported that their EERs had Reviewers from the Civil Service or did not have Reviewers at all.

There were a number of variables that I looked at that did not generate any significant findings, and that is perhaps itself significant. For instance, I could not find any evidence that the number of times one took the Foreign Service Written Exam and/or the Oral Assessment made any difference in one’s likelihood of getting tenured on the first review.

Some other factors that showed no relationship to the rate at which one gains tenure were: marital/relationship status, having dependent children or not, doing hardship posts (with the exception of having served in Iraq or Afghanistan, which made tenure on the first review very likely), number of months spent in language training (with the exception of those who had only one EER – no one in the sample got tenured with only one EER), having the box “Candidate is recommended for tenure…” checked, level of promotion aspiration, employee’s pay grade, Rater’s pay grade, and Reviewer’s pay grade.
None of the above factors were statistically related to tenure rate. I do suspect some of them would be related to promotion rate. I undertook to compare promotion rates (pay grade divided by time in the FS, controlling for various factors like starting pay grade); however, this was not possible given the lack of survey participation on the part of mid-level and senior-level employees. Perhaps a future study could look at promotion rates.

Because there was concern from employees taking lengthy language courses that they might not be tenured as fast, I took an additional look at the effects of language on tenure rate, with cross-tabulations of the 150 tenured officers. Firstly, not one tenured Officer was still on language probation. Secondly, calculations of means revealed a pattern wherein Officers off probation in more difficult languages were reviewed on average fewer times before being tenured. That is, Officers off probation in “Superhard” languages were reviewed on average 1.25 times, those off probation in “Hard” languages were reviewed on average 1.32 times, and those off probation in “World” languages were reviewed on average 1.34 times. Hence, provided one has at least two EERs, it is differentially easier to gain tenure the harder the language one speaks.

Conclusions

From one psychologist to another: learning is the detection and correction of error.

In researching for this paper, I ultimately came across the work of another psychologist, Chris Argyris, who wrote about the Department of State over 30 years ago. Now a semi-retired Harvard University professor, Argyris was hired by the Department as a consultant in the late 1960s to analyze the Department’s bureaucratic culture. I emailed with Argyris, as a fellow psychologist, and was fascinated to read the papers he sent me. He works for an organizational consulting firm now and uses the Department in his examples of problem cultures!

What Argyris pointed out about the Department over three decades ago was that Department leadership “espouses learning yet acts in ways that inhibit learning”. He defined learning as “the detection and correction of error” and said it was “key to effective organizational change and development”. Argyris pointed out that the Department also has a problem of culture in that it “rewards spinning and cover-up”, something that also precludes effective learning and development.

Even though I have been in the Foreign Service for less than four years, I was amazed at the poignancy of Argyris’ description of Foreign Service culture, derived from his work back in the 1960s: “As a result of a powerful feedback loop, a process within the Foreign Service culture tends to reinforce the participants to minimize interpersonal threat by minimizing risk-taking, being open and being forthright, as well as minimizing their feelings of responsibility and their willingness to confront conflict openly. This, in turn, tends to reinforce those who have decided to withdraw, play it safe, not make waves, and to do both in their behavior and writing. Under these conditions people soon learn the survival quotient of ‘checking with everyone,’ of
developing policies that upset no one, of establishing policies in such a way that the superior takes responsibility for them. It also coerces ‘layering,’ because (1) subordinates staff to be ready for a crisis, (2) more people are needed to make a decision, and (3) protection of one’s bureaucratic skin becomes critical for survival.”

So what does Argyris propose we do then? Argyris cites the now historical issue of the “slam dunk” case for weapons of mass destruction in Iraq as an error in organizational cultures that we can only move beyond through the enabling of learning as a key objective. He does not specify how that should take place, but maintains that “productive reasoning” must be encouraged such that claims in an organization can be tested for validity. Science knows this. Information Technology also knows this. Real progress does not happen without actual, empirical testing of truth claims. As I read Argyris’ work, I believe reforming our EER system is one such way to better enable learning and in turn transformational diplomacy.

The EER is probably not transformative; much could be improved.

Though the current EER system has much going for it that is positive, I am hard pressed to find much about it that lends itself to transformational diplomacy. The evidence from my research strongly confirms the hypothesis that one’s experience with the EER is based on much more than just his or her actual work performance in a vacuum. Rather it is based on the circumstances of the approaches of the Rater and Reviewer, the background of the employee, the circumstances of the EER process, and more. Certainly getting an EER one is satisfied with and being satisfied with the EER system are based on a variety of seemingly arbitrary factors. Collectively this evidence suggests that the EER is an imperfect tool that could be significantly improved for the betterment of the Foreign Service and hence for the advancement of Transformational Diplomacy.

The current EER system has some basic flaws. One key theme through the research findings is that the EER is in large part an exercise in “water walking”, or as it has been referred to before, “apple polishing”. That is, EERs suffer from over-inflated praise and the complicities in it, which may have the effect of undermining true organizational learning. Though there is a critical Area for Improvement box that strives to convey real information, it may primarily be filled with soft items that either do not convey real needs or that serve only to covertly assassinate the employee’s character. Another issue with the Area for Improvement is that it is generated from only one perspective, the Rater’s/Reviewer’s. Some survey participants noted that the Area for Improvement is also not supposed to repeat across EERs, which is questionable itself, given that employees can have enduring issues that take time to correct. The tenure check box section also does not appear to convey much useful information, since it almost invariably recommends the employee for tenure. Besides, nearly all employees get tenure sooner rather than later, regardless of the tenure check box. Finally, the most profound evidence that the current EER system is flawed comes
from the many employees who are happy with their own glowing EERs but disenchanted with the system for the same reason, that everyone gets glowing EERs.

I have several concrete recommendations for improving the EER system, some of which come directly from doing this research, others of which come from basic psychometrics. Consider briefly the allegory of the blind men and the elephant, taken from Eastern lore. A group of blind men are asked to each feel part of the same elephant and then to characterize it. Naturally they each describe vastly different things: one describes an elongated trunk-like part, another a floppy ear-like part, another a flat and furry hide, etc. Each is convinced he clearly knows the true qualities of an elephant. The irony is that none of them knows the full reality. In this regard, evaluating our employees using exclusively qualitative, one-dimensional procedures is fundamentally flawed. You might say that the Rater-Reviewer system provides at least two dimensions, even if no quantitative measure. Yet the Reviewer is merely another top-down perspective, and one that has a power-differential over the Rater at that. It is hardly an added, independent dimension.

We need an EER system that has both qualitative and quantitative components, as well as multi-dimensional perspectives on the employee. Many support “360-degree” type evaluations (and the Department has to its credit begun implementing some such reforms). Given that an employee cannot please all the people all the time in an organization, especially when some of their interests run counter to one another, a system that takes into account pleasing the bosses while also pleasing coworkers and subordinates would be better. One would think that Raters/Reviewers would want to know these other dimensions in their subordinates as well – otherwise they could be erroneously recommending for tenure and promotion subordinates who are dysfunctional in these other, arguably just as important, dimensions.

In my survey, I asked employees some questions about the so-called “360-degree”, multi-dimensional evaluation concept and its components. A whopping 92% of the Generalists reported that they would like the Department to consider, at least a “little bit” (11%), changing the EER system to incorporate 360-degree evaluations. The largest group of respondents (36%) “absolutely” wanted the Department to consider 360-degree evaluations, 25% “very much”, and 20% “a moderate amount”. Only 8% reported that they do “not at all” want the Department to consider incorporating 360-degree evaluations.

The Generalists also had solid ideas about what components of 360-degree evaluations they would endorse. A remarkable 76% wanted to have evaluations of supervisors by subordinates. That was the only solid agreement among the Generalists on 360-degree components. Fifty percent wanted to have evaluations of Americans by FSNs and/or LES employees. Forty-five percent wanted to have evaluations of same-level peers by same-level peers. Only 9% reported they wanted no additional types of evaluations within the 360-degree
concept. It seems clear that when the Generalists report being in favor of 360-degree evaluations, they are mainly advocating the bottom-up component wherein subordinates evaluate their supervisors.

Modern psychology has long held that qualitative components, even multi-dimensional ones, are not sufficient on their own – they need to be accompanied by quantitative components to effectively evaluate human behavior. Qualitative and quantitative components of evaluation each contribute critical pieces of information that should be viewed jointly. In the context of EERs, complaints of too much emphasis on qualitative components abound, in the form of complaints about the over-importance of writing skills in the EER process. One might therefore think Generalists would be supportive of quantitative measures of performance evaluation. Such measures would clearly not involve writing skills.

However, when the Generalists in the survey were asked the extent to which they would like the Department to consider quantitative measures of performance, the largest single group (45%) was “not at all” in favor. Still, a majority of respondents (55%) were at least “a little bit” (13%) in favor of adding some sort of quantitative measure of performance to the EER (“a moderate amount”, 20%; “very much”, 12%; “absolutely”, 10%).

The Generalists were asked about several basic types of quantitative measures to gauge their views on them. They were quite divided in their responses. A majority of 58% wanted some type of quantitative measure added, but the remaining 42% did not want any quantitative measure added. Twenty-nine percent supported scaled “grades” for employees along each of the six core competencies (e.g., John Doe gets a 4.0 in the Leadership skills competency, a 3.0 in the Management skills competency, etcetera). Twenty-seven percent supported percentile rankings (e.g., Suzie Smith is in the top 20% of subordinates I have supervised). Fourteen percent supported “within-the-person” rankings (e.g., Fred Johnson is best in Leadership skills, then Intellectual skills, then Substantive knowledge, etcetera). Lastly, in an open-ended question about quantitative measures, some Generalists expressed concern that quantitative measures were “too subjective”. Still others expressed that they would like to have some sort of scale quantifying employees vis-à-vis their peers, to avoid the so-called “Lake Wobegon effect”, wherein most people claim to be above average, despite the fact that most people cannot be above average, by definition.

In my own interpretation of the Generalists’ less than full support for quantitative measures, I imagine that they may not have all known about the great benefit of quantitative measures that are used within context. Some stated in the open-ended comment boxes that they feared quantitative measures because they suspected they would suffer from “grade inflation” just as the qualitative narratives have. Let me just point out for now that there are ways to mitigate that, including collecting data on average quantitative evaluations given by raters, in order to provide context.
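As a purely illustrative sketch of that kind of context (not part of the survey analysis), the short Python example below reports each score alongside the average score its rater gives to everyone they rate. The function names, the rater and employee labels, and the scores are all invented for illustration, and the 0-to-4 scale is an assumption borrowed from the survey's own response scale.

```python
from collections import defaultdict
from statistics import mean


def rater_averages(ratings):
    """Return each rater's mean score across everyone they rated."""
    by_rater = defaultdict(list)
    for rater, _employee, score in ratings:
        by_rater[rater].append(score)
    return {rater: mean(scores) for rater, scores in by_rater.items()}


def scores_in_context(ratings):
    """Pair each raw score with the rater's own average and the difference."""
    averages = rater_averages(ratings)
    return [
        {
            "rater": rater,
            "employee": employee,
            "score": score,
            "rater_average": averages[rater],
            "relative": score - averages[rater],
        }
        for rater, employee, score in ratings
    ]


# Invented example data on an assumed 0-to-4 scale (hypothetical names).
sample_ratings = [
    ("Rater A", "Employee 1", 4.0),
    ("Rater A", "Employee 2", 4.0),
    ("Rater A", "Employee 3", 3.5),
    ("Rater B", "Employee 4", 3.0),
    ("Rater B", "Employee 5", 2.0),
    ("Rater B", "Employee 6", 2.5),
]

for row in scores_in_context(sample_ratings):
    print(
        f"{row['employee']}: {row['score']:.1f} "
        f"(rater average {row['rater_average']:.2f}, "
        f"relative {row['relative']:+.2f})"
    )
```

In this invented data, a 3.5 from the generous Rater A falls below that rater's own average, while a 3.0 from the stricter Rater B sits above it. That is the sort of context a raw score alone would hide.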
We should consider a wide range of possibilities for implementing changes to the current EER system. We certainly do not want to make it any longer or more complicated than it currently is. There should be ample means of both improving the EER system and streamlining it. We should consider reducing the amount of extraneous work that goes into EERs, while making the reduced efforts count more fully. Some means of doing this include not only adding additional dimensions and quantitative input, but also reducing overall the depth of these measures, shifting instead to more frequent, perhaps quarterly, evaluations of a smaller yet wider scale.

One way to both enhance and streamline the EER system could be to implement a new computer program which could be utilized by randomly selected members of a 360-degree rating panel whose members privately (and possibly anonymously) enter both qualitative and quantitative information into secured, on-line employee profiles, in a systematic fashion, orchestrated by Human Resources sections. Such a computer program would be easy to design and could advance our outdated, analog system significantly. It could also easily track contextual factors of evaluations, like the given evaluators’ average ratings.

Regardless of whether a new system should be more computerized, additional rating components from other angles can be added, while tracking the average ratings of those doing the rating in order to provide context. We might consider invoking the well-established core precepts as a foundation for this, to highlight employee strengths within the individual. Prompting questions could be taken directly from the six core precepts to stimulate quantitative, evaluative responses from 360-degree panel members in order to arrive at scaled ratings for employees, within the core precepts and overall.

We might even systematically derive Areas for Improvement from the output the new scales could generate. The supervisor’s task would then not be so complicated in crafting the perfect Area for Improvement for the subordinate. It would be a matter of saying, “I see that you got your lowest 360-degree score in X core precept, here is what I propose you do to raise that score.” Employees could then of course have the same, system-generated Areas for Improvement across EERs, as they worked to address pervasive problems and hence to better themselves.

If the above initiatives proved to be too risky for the Department to adopt as part of its formal personnel evaluation system, we might consider phasing them in gradually, as part of an informal process which would in time come to be normative and ultimately institutionalized as a formal evaluation process. The point is that the proposals can be adopted incrementally to manage risks.

We have seen other institutions such as the military and private sector companies updating their performance appraisal systems for the better. Indeed, many of my survey respondents pointed out that they had experienced much better systems of employee evaluation in their previous careers, including in the military. To the extent that Secretary Rice has advised that State will be cooperating much more with the military
under Transformational Diplomacy, one could argue that our EER system should be at least as good as theirs. In doing this research I did review a number of military evaluation forms. I noted that they were in general more conducive to multi-dimensional and quantitative assessment than State’s forms. I wondered if we might learn something from both the military and the private sector.

To further develop the performance appraisal system at State, some lines of future research could also be pursued. We might wish to check promotion rates across a wide range of employee grades, making use of data on when individuals entered State and what ranks they have attained in how much time and why. We would need greater participation from senior level employees, but if we gained it we could learn much about what we are reinforcing and developing in our human resources as an institution. The corollary to the present study would of course be a study on Raters and Reviewers and their experiences evaluating employees. We might at some point wish to collect data on perceptions of and from FSNs and peer-level coworkers too. Certainly “what we measure, we improve in” and hence we should not expect to easily improve in important factors without first measuring them.

Ultimately deciding what if anything we should do to improve the EER system depends on the answer to the question: what is our goal? Do we really want to transform our workforce to carry out the work of Transformational Diplomacy? Or do we want to continue shaping and reinforcing a “go along to get along” workforce? I take Secretary Rice at face value when she says: “We must transform old diplomatic institutions to serve new diplomatic purposes” and I submit to you that transforming an old EER system is a key component to reaching that goal.

About the Author

Don Kilburg has been an FSO since 2003. He served in Mexico City and is moving onward to Santo Domingo with his wife Keely. He holds a doctorate in Experimental Social Psychology from DePaul University and a bachelor’s degree in Research Psychology from the University of Illinois. Before coming into the Foreign Service, he was a professor at Eastern Washington University and more recently at Saint Olaf College.
To contact the author: