Slides for my internship presentation during my graduate study at San Jose State University, 2010. This project was aimed at evaluating a spoken-language assessment, which prompted certain considerations for test design (i.e., competence versus performance and subjective versus objective scoring criteria). Data collected from job applicants revealed measurement redundancy via strong positive relationships between dimensions of the model. Recommendations to revise the model are provided.
1. Evaluation and Revision of Evolv On-Demand’s English Proficiency Grading Criteria
San José State University
I/O Psychology Internship
Michael Cilla
2. Agenda:
Background info on the company
What is Selection Science?
Details of the project
Results
Recommendations
3. The Company
Evolv is the leader in science-based, on-demand talent matching and
intelligence solutions, enabling enterprises around the world to
systematically improve their operations, brands and bottom line through
superior talent.
Employees and employers enjoy a better job match, improving day-to-day
satisfaction.
Clients benefit from identifying more productive talent that stays longer.
Employees benefit from the satisfaction of excelling at jobs well suited to
them.
Evolv delivers data-driven certainty and custom-configured solutions that
easily integrate with existing workforce solutions such as applicant tracking
systems.
4. Selection Science
Evolv’s Selection Science collects and analyzes millions of data points to provide a more objective and data-driven decision engine.
It uses proprietary assessment algorithms to evaluate existing employees and new job applicants for required traits, characteristics, motivations and skills specific to particular job roles.
Rigorously developed by a team of Ph.D. I/O psychologists and market-proven with clients, the Selection Science decision engine starts smart and gets smarter over time as it systematically correlates on-the-job performance to the selection criteria that placed them.
5. Initial Assignment:
The Project: Rating Job Applicants’ Proficiency in
English for a Call/Contact Center in Puerto Rico
The initial grading criteria were developed by the Evolv Selection
Scientists in collaboration with linguists to define quantifiable grading
for voice quality and English language proficiency in job applicants
for a new client.
6. Questions used for voice auditions:
Using a minimum of 5 complete sentences, please answer the following questions in
English. Call center agents often have to take initiative to reach their goals. Describe a
time when you took some initiative or solved a problem. First describe the situation and
then describe the action you took.
Using a minimum of 5 complete sentences, please answer the following questions in
English. Establishing good relationships with customers is very important as a call center
agent. Please describe how you establish relationships with people you’ve just met. What
are some of the things you do or say? What challenges have you faced and how have you
overcome them?
Using your own words, please summarize what happened in the following situation. John
was already running late. John got out of his car even though he was running late. John
walked back to his garage to close the garage door. John’s garage door remote control
did not work.
Using a minimum of 5 complete sentences, please answer the following questions in
English. Working in a call center can be a stressful job. Tell me about a time when you
had to deal with a very stressful situation at work or at school. First describe the situation
in detail and then describe how you handled it and what you learned from it?
Using your own words, please summarize what happened in the following situation: Valerie
was thinking about calling the customer service help line because her computer was
having problems, but after it started giving off smoke, she decided to call the fire
department instead.
7.
8. The Project: Rating Job Applicants’ Proficiency in
English for a Call/Contact Center in Puerto Rico
Preliminary Calibration Process:
To establish consistency in scoring applicants, a team of five
employees calibrated their assessments over approximately 30-60
trials.
In each trial, the employees shared their individual assessments
with the group and discussed why they had assigned the scores they
chose. Based on these discussions, the group reached a consensus
score for each applicant, guided by the qualities that accounted for
the differences among their individual assessments.
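One way to picture the calibration step is as a check on how far the raters' scores for one applicant spread apart before the group discusses them. The sketch below is illustrative only: the ratings are hypothetical, and the function names and the one-point tolerance (`max_spread`) are assumptions, not part of the process the team actually followed.

```python
from statistics import mean


def needs_discussion(scores, max_spread=1):
    """True when raters' scores (1-5 scale) differ by more than max_spread points."""
    return max(scores) - min(scores) > max_spread


def calibration_report(ratings, max_spread=1):
    """Summarize each applicant's ratings: mean, spread, and a discussion flag."""
    report = {}
    for applicant, scores in ratings.items():
        report[applicant] = {
            "mean": round(mean(scores), 2),
            "spread": max(scores) - min(scores),
            "discuss": needs_discussion(scores, max_spread),
        }
    return report


# Hypothetical ratings from five raters; these are not data from the project.
example_ratings = {
    "applicant_A": [3, 3, 4, 3, 3],
    "applicant_B": [2, 4, 5, 3, 2],
}

report = calibration_report(example_ratings)
for applicant, summary in report.items():
    print(applicant, summary)
```

In this sketch, applicants whose scores fall within the tolerance would need little discussion, while wider spreads mark the cases where raters should explain their reasoning before settling on a consensus score.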
9. The Project: Rating Job Applicants’ Proficiency in
English for a Call/Contact Center in Puerto Rico
Scoring Applicants & Gaining Expertise:
This calibration process continued for 60 days; however, the team
of employees dropped to three, and finally to one after 90
days (>100 applicants).
After scoring over 200 applicants, the primary applicant scorer
asked the supervisory staff of Selection Science for the
opportunity to evaluate and possibly redesign the scoring system
for school credit at the employee’s university.
10. The Project: Rating Job Applicants’ Proficiency in
English for a Call/Contact Center in Puerto Rico
Evaluation & Proposal for Revision
The supervisory staff gave their approval and encouraged the
employee to draw on three sources: a rigorous statistical analysis
of the available data, the subjective expertise the employee had
gained as the primary applicant scorer for this particular
model, and additional research on speech assessment models
previously developed by linguists.
11. Results
Descriptive Statistics
Dimension                  M      SD
Pronunciation              2.62   1.01
Semantic Content           2.80   1.14
Fluency                    2.76   1.19
Vocabulary                 2.96   1.14
Tone & Pace                2.93   1.13
Total Aggregated Score     2.93   1.23

Dimension Correlation Matrix
Dimension                     1      2      3      4      5      6
1. Pronunciation              --
2. Semantic Content           .71*   --
3. Fluency                    .70*   .90*   --
4. Vocabulary                 .64*   .86*   .83*   --
5. Tone & Pace                .59*   .73*   .71*   .72*   --
6. Total Aggregated Score     .79*   .94*   .92*   .87*   .80*   --
N = 212. * Correlation is significant at the .01 level.
Of particular note are the strong relationships among semantic content, fluency, and
vocabulary (r > .80), as well as the strong relationships between the applicants’ total scores and
the dimensions of semantic content, fluency, and vocabulary (r > .85).
The Evolv English Proficiency Grading Criteria appear to contain some redundancy, given the
overlap between two dimensions: semantic content and fluency (r = .90).
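The redundancy argument can be made concrete with a short sketch that scans the reported dimension correlations and lists the pairs above the r > .80 mark noted on the slide, along with the variance each pair shares (r squared). The correlations are the values reported above; the function name and the use of .80 as a flagging cutoff are assumptions made for illustration, not a procedure described in the project.

```python
# Inter-dimension correlations as reported on the Results slide (N = 212).
REPORTED_R = {
    ("Pronunciation", "Semantic Content"): 0.71,
    ("Pronunciation", "Fluency"): 0.70,
    ("Semantic Content", "Fluency"): 0.90,
    ("Pronunciation", "Vocabulary"): 0.64,
    ("Semantic Content", "Vocabulary"): 0.86,
    ("Fluency", "Vocabulary"): 0.83,
    ("Pronunciation", "Tone & Pace"): 0.59,
    ("Semantic Content", "Tone & Pace"): 0.73,
    ("Fluency", "Tone & Pace"): 0.71,
    ("Vocabulary", "Tone & Pace"): 0.72,
}


def flag_redundant_pairs(correlations, threshold=0.80):
    """Return (pair, r, r squared) for pairs with r above threshold, strongest first.

    The threshold is an illustrative cutoff (assumption), not a formal rule.
    """
    flagged = [
        (pair, r, round(r * r, 2))
        for pair, r in correlations.items()
        if r > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


for (first, second), r, shared in flag_redundant_pairs(REPORTED_R):
    print(f"{first} / {second}: r = {r:.2f}, shared variance = {shared:.2f}")
```

Under these assumptions, semantic content and fluency head the list, sharing roughly 81% of their variance, which is the overlap the recommendations address.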
12. Recommendations
Delineate the performance component from the other
competency-based components of the Grading Criteria by
adopting the broader definition of fluency into the overall
Grading Criteria.
Define the phenomenon that the fluency dimension is
currently assessing as encompassing the dimensions of
vocabulary and tone and pace.
Adjust those dimensions (Vocabulary and Tone and Pace)
so that they also assess whether applicants speak without
hesitation and without staggered phrasing.
Replace the fluency dimension with a new dimension
assessing applicants’ comprehension.
13. Recommendations
Protocol for rating the dimension of comprehension:
1) Include a new voice audition prompt to allow for the specific assessment of comprehension,
possibly a “passing on a message” format.
Some examples:
• “Please tell Bill that this room is being used for a conference call this afternoon, and apologize
for any inconvenience. Also, ask if he would mind finding another room for today only.
Finally, thank him sincerely for making this accommodation.”
• “Please tell John that I have to cancel our plans to have dinner tomorrow night. I could not find
another babysitter to watch the children after my usual one cancelled. Please also tell him that
I’m sorry for canceling at the last minute, and to call my cellular phone to reschedule for another
time whenever he’d like.”
• “Please explain to your customer, Sheila, that her wireless service is being discontinued because
she did not make her last monthly payment. Kindly ask if it is possible for her to make a payment
today so that she does not experience an interruption in service. Finally, thank her for her time
and wish her a pleasant day.”
2) The criteria for assessment on a comprehension item should include checking for the applicants’
use of their own words in the “message”, using the correct temporal sequence, and looking for
quality over quantity.
14. Revised Dimensions & Detailed Criteria
Pronunciation
5 = Perfectly Clear
1 = Garbled/Very Poor
• How easily is he/she understood?
• Special focus on ‘yuh’ vs. ‘juh’ as native Spanish speakers often substitute these phonemes.
• Special focus on /v/ and /p/ as both will sound like /b/ to English speakers
• Special focus on /s/ vs. /th/ as they are often interchanged
Semantic Content
5 = Perfectly sensible and complete
1 = Completely mixed
• Are responses coherent, sufficient, and conveying complete thoughts?
• Are thoughts and ideas linked sensibly?
• Are responses flowing naturally, without any significant hesitation?
Comprehension
5 = Consistently demonstrates understanding of concepts
1 = Confused/responses off topic
• Did the applicant use the majority of his or her own words in the message?
• Was the sequence of events temporally accurate?
• Does the message qualitatively contain all information necessary?
Vocabulary
5 = Extensive vocabulary
1 = Very limited use of vocabulary/Use of non-English words
• Are speech, vocabulary, and grammar appropriate for an adult speaking on the phone?
• Is vocabulary noticeably limited (e.g., only simple, short words with a relatively small vocabulary)?
• Is there a significant amount of repetition of words?
• Are words used appropriately?
Tone & Pace
5 = Very friendly, warm tone with a sense of energy and enthusiasm
1 = Very slow or tired speech
• Is the speech too slow or rushed?
• Is the tone of voice friendly?
• Does their voice have energy or sound sluggish?
• Are the responses staggering and/or short and broken phrases?
• Does the applicant sound like a “natural” fluent speaker?