David Graus
Dec 6, 2024
recommender systems,
bias, and bias mitigation
in hiring
with a focus on NLP challenges and solutions*
* adapted from a talk at NLP4HR Workshop @ EACL
󰘆 whoami
🎓 academia
● BA Media Studies @ UvA (2008)
● MSc Media Technology @ Leiden University (2012)
● PhD in Information Retrieval @ ILPS/IRlab, UvA (2017)
🏢 industry
● Editor radio/online (science) news @ NTR (2008-2010)
● (Lead) data scientist @ FD Mediagroep (2017-2019)
● Lead data scientist @ Randstad (2020-today)
recommender systems
at randstad
recommending jobs to job seekers
approach
hybrid model combining collaborative filtering with embedding-based capability matching (sketched below)
results
+27% application rate
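As a rough illustration only (not the production system): a hybrid score of this kind can be sketched as a weighted blend of a collaborative-filtering score and an embedding-based similarity between CV and vacancy text. The model name, the weight alpha, and the toy data below are all assumptions.

```python
# Hypothetical sketch of a hybrid job-recommendation score:
# blend a collaborative-filtering score with an embedding-based
# capability match. Not the production implementation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model choice

def capability_match(cv_text: str, vacancy_text: str) -> float:
    """Cosine similarity between CV and vacancy embeddings."""
    cv_emb, vac_emb = model.encode([cv_text, vacancy_text], convert_to_tensor=True)
    return float(util.cos_sim(cv_emb, vac_emb))

def hybrid_score(cf_score: float, cv_text: str, vacancy_text: str, alpha: float = 0.5) -> float:
    """Weighted blend of a collaborative-filtering score (assumed in [0, 1])
    and the embedding-based capability match."""
    return alpha * cf_score + (1 - alpha) * capability_match(cv_text, vacancy_text)

# Rank vacancies for one job seeker (toy example).
vacancies = ["We are looking for a talented logistic worker",
             "We are looking for a talented teacher"]
cf_scores = [0.2, 0.6]  # e.g., from an item-based or matrix-factorization CF model
cv = "I worked in a warehouse for 10 years"
ranking = sorted(zip(vacancies, cf_scores),
                 key=lambda v: hybrid_score(v[1], cv, v[0]), reverse=True)
print(ranking)
```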
ranking job seekers to vacancies
method
point-wise learning-to-rank model trained on historic placements, using structured data (i.e., tabular features); a minimal sketch follows below
result
14.4% of search queries yield a matching talent, vs. 28.1% of recommendations
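For illustration, a minimal point-wise learning-to-rank sketch on tabular features: every (job seeker, vacancy) pair is scored independently and candidates are ranked by predicted placement probability. Feature names, labels, and the gradient-boosting model are placeholders, not the production setup.

```python
# Hypothetical sketch of point-wise learning to rank on tabular features:
# each (job seeker, vacancy) pair is one row; the label is whether the
# pair led to a historic placement.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1_000
X = np.column_stack([
    rng.integers(0, 200, n),   # e.g., commuting distance (km)
    rng.integers(0, 30, n),    # e.g., years of experience
    rng.integers(0, 2, n),     # e.g., holds required certificate (0/1)
])
y = rng.integers(0, 2, n)      # 1 = placement, 0 = no placement

model = GradientBoostingClassifier().fit(X, y)

# At inference time: score all candidates for one vacancy and rank them
# by predicted placement probability (point-wise: one score per pair).
candidates = X[:10]
scores = model.predict_proba(candidates)[:, 1]
ranking = np.argsort(-scores)
print(ranking, scores[ranking])
```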
incorporating
textual features
joint work with Dor Lavi and Volodymyr Medentsiy
(industry talk @ RecSys 2021 & paper at RecSys in HR)
[Architecture diagram (raw data → features → prediction): the job seeker's CV (PDF) is parsed into CV text and structured data (e.g., location); the vacancy side contributes a job description and structured data. CV text and job description are encoded as embeddings, the structured data becomes tabular features, and all features feed a ranker that produces the ranking.]
Nils Reimers — Extending Neural Retrieval Models to New Domains and Languages (Transformers at Work 2021 @ Startup Village)
opportunity: training data = talent funnel
[Funnel diagram with stages such as: candidate found by recruiter, approached by recruiter, candidate applies, in screening process, proposed to client, in interview process, rejected by recruiter, candidate retracts, hired! 🤝]
opportunity: data availability
from our talent funnel, we collect:
• 👍 positive pairs: any point of contact between applicant and recruiter for a given vacancy
• 👎 negative pairs: applicants who are rejected for a job without any interaction
dataset:
• ✅ 270K (expert-)labeled job seeker/job pairs for training a Sentence-BERT model to match resumes to job descriptions (a fine-tuning sketch follows below)
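A minimal sketch of what fine-tuning a Sentence-BERT bi-encoder on such labeled pairs could look like, assuming a public multilingual base model and a contrastive loss; the actual base model, loss, and hyperparameters are not specified in the slides.

```python
# Hypothetical sketch: fine-tune a Sentence-BERT bi-encoder on labeled
# (resume, job description) pairs collected from the talent funnel.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed base model

train_examples = [
    # label 1 = positive pair (contact between applicant and recruiter),
    # label 0 = negative pair (rejected without interaction)
    InputExample(texts=["I worked in a warehouse for 10 years",
                        "We are looking for a talented logistic worker"], label=1),
    InputExample(texts=["I worked in a warehouse for 10 years",
                        "We are looking for a talented teacher"], label=0),
    # ... ~270K labeled pairs in the real dataset
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.ContrastiveLoss(model)  # pull positives together, push negatives apart

model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
model.save("sbert-resume-vacancy-matcher")  # hypothetical output path
```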
results
multilinguality results
[Figure (shown on three slides, highlighting first the "Talent side" and then the "Job side"): a matrix pairing talent-side sentences with job-side sentences.
Talent side (rows):
• "I worked in a warehouse for 10 years"
• "I've been an educator for the last 10 years"
• "Ik heb 10 jaar in een magazijn gewerkt" ("I worked in a warehouse for 10 years")
• "Ik ben de afgelopen 10 jaar leraar geweest" ("I have been a teacher for the last 10 years")
Job side (columns):
• "we zijn op zoek naar een getalenteerde logistiek medewerker" ("we are looking for a talented logistics worker")
• "We zijn op zoek naar een getalenteerde docent" ("We are looking for a talented teacher")
• "We are looking for a talented logistic worker"
• "We are looking for a talented teacher"]
multilinguality results - BoW
[Similarity matrix (rows = talent side, columns = job side, in the order listed above): all 16 values are 0. With a bag-of-words representation, the CV and vacancy sentences share essentially no surface vocabulary, within or across languages.]
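For illustration, a small bag-of-words computation in the same spirit (the slide's exact preprocessing is not stated, so the toy values below may not be exactly zero everywhere):

```python
# Sketch: bag-of-words gives (near-)zero similarity between CV and vacancy
# sentences, especially across languages, since they share almost no tokens.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

talent = ["I worked in a warehouse for 10 years",
          "Ik heb 10 jaar in een magazijn gewerkt"]
jobs = ["we zijn op zoek naar een getalenteerde logistiek medewerker",
        "We are looking for a talented logistic worker"]

vectorizer = CountVectorizer().fit(talent + jobs)
sims = cosine_similarity(vectorizer.transform(talent), vectorizer.transform(jobs))
print(sims.round(2))  # low/zero values: hardly any shared content words
```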
multilinguality results - BERT
[Similarity matrix (rows = talent side; columns = job side: logistiek medewerker NL, docent NL, logistic worker EN, teacher EN):
warehouse, 10 years (EN):   0.57  0.49  0.46  0.44
educator, 10 years (EN):    0.65  0.66  0.52  0.53
magazijn, 10 jaar (NL):     0.56  0.47  0.7   0.67
leraar, 10 jaar (NL):       0.55  0.51  0.65  0.66
Scores sit in a narrow band: vanilla BERT barely separates matching from non-matching CV/vacancy pairs.]
multilinguality results - SBERT
[Similarity matrix (rows = talent side; columns = job side: logistiek medewerker NL, docent NL, logistic worker EN, teacher EN):
warehouse, 10 years (EN):   0.91  0.41  0.76  0.083
educator, 10 years (EN):    0.3   0.86  0.33  0.62
magazijn, 10 jaar (NL):     0.89  0.62  0.91  0.23
leraar, 10 jaar (NL):       0.26  0.82  0.37  0.91
The fine-tuned Sentence-BERT scores semantically matching CV/vacancy pairs high and mismatches low, across languages.]
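A minimal sketch of how such a cross-lingual similarity matrix can be computed with a multilingual Sentence-BERT model; the actual fine-tuned model behind the slide's numbers is not public, so a generic off-the-shelf checkpoint is assumed here:

```python
# Sketch: a multilingual Sentence-BERT model scores semantically matching
# CV / vacancy sentences highly across languages (cf. the SBERT matrix above).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint

talent = ["I worked in a warehouse for 10 years",
          "I've been an educator for the last 10 years",
          "Ik heb 10 jaar in een magazijn gewerkt",
          "Ik ben de afgelopen 10 jaar leraar geweest"]
jobs = ["we zijn op zoek naar een getalenteerde logistiek medewerker",
        "We zijn op zoek naar een getalenteerde docent",
        "We are looking for a talented logistic worker",
        "We are looking for a talented teacher"]

sims = util.cos_sim(model.encode(talent, convert_to_tensor=True),
                    model.encode(jobs, convert_to_tensor=True))
print(sims)  # 4x4 matrix: rows = talent side, columns = job side
```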
bias (in hiring)
challenge: bias in hiring is everywhere
Peng et al., What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring (HCOMP 2019)
in humans...
“Given CVs from real-life scientists [...] with names changed to traditional male and female names”
It turns out that both men and women were more likely to hire male job applicants than female applicants with an identical record.
in humans...
“White” names received 50%
more callbacks for interviews
[than “African-American”
names]
…in “AI tools”...
Amazon’s system [...] penalized résumés that included the word “women’s”, as in “women’s chess club captain”. And it downgraded graduates of two all-women’s colleges.
…and LLMs
bias in algorithmic hiring
a survey
joint work with Alessandro Fabris, Nina Baranowska,
Matthew J. Dennis, Philipp Hacker, Jorge Saldivar, Frederik
Zuiderveen Borgesius, Asia J. Biega
bias in algorithmic hiring is well-studied
Fairness and Bias in Algorithmic Hiring: A Multidisciplinary Survey (ACM TOIS)
the survey covers:
● bias-conducive factors
● bias metrics (used in algorithmic hiring)
● bias mitigation methods (in algorithmic hiring)
● law & regulations
● industry practices
● …
bias-conducive factors: factors that contribute to bias (in hiring)
1. institutional biases: practices, habits, and norms shared at institutions, such as companies and societies
e.g., job segregation;
1) horizontal: men over-represented in IT jobs
2) vertical: “glass ceiling” effect
bias-conducive factors: factors that contribute to bias (in hiring)
2. individual preferences: choices that appear to be individual preferences, but that reflect generalized patterns across protected groups
e.g., culture-based avoidance/attraction: stereotypically “male” language in job descriptions can make a position less attractive to women
bias-conducive factors: factors that contribute to bias (in hiring)
3. technology blind spots: bias introduced by biased components integrated into larger algorithmic hiring pipelines
e.g., disparate performance of text algorithms w.r.t. gender, race, and other protected attributes
bias metrics (used in algorithmic hiring)
bias mitigation methods (used in alg. hiring)
3 examples of research projects around bias mitigation
bias mitigation (1): using synthetic data and re-ranking
joint work with Adam Arafan, Fernando P. Santos and Emma Beauxis-Aussalet (presented at RecSys in HR 2022)
● Case study demonstrating bias mitigation in our job seeker recommender system
● Introduces “Fairness Gates”:
1. pre-processing: generate (synthetic) re-balanced training data
2. post-processing: apply greedy re-ranking on the model’s output [1]
● pre-processing yields improved accuracy in imbalanced scenarios
● post-processing yields better fairness with minimal impact on accuracy (a re-ranking sketch follows below)
● combining pre- & post-processing bias mitigation offers complementary solutions
[1] Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search
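As an illustration of the post-processing gate, a simplified greedy re-ranking sketch that enforces a minimum share for a protected group in every ranking prefix; the group labels, target share, and fallback behavior are assumptions, not the exact method of the paper or of [1].

```python
# Hypothetical sketch of greedy fairness-aware re-ranking (post-processing):
# walk down the score-sorted list and, whenever the next prefix would
# under-represent group "A" relative to a target share, promote the
# best-scoring remaining candidate from group "A".
def greedy_rerank(candidates, target_share=0.4):
    """candidates: list of (candidate_id, score, group) with group in {"A", "B"}."""
    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    reranked = []
    while remaining:
        n_a = sum(1 for c in reranked if c[2] == "A")
        # Would appending a non-"A" candidate drop the prefix share below target?
        needs_a = n_a / (len(reranked) + 1) < target_share
        pool = [c for c in remaining if c[2] == "A"] if needs_a else remaining
        pick = pool[0] if pool else remaining[0]  # fall back when group "A" is exhausted
        reranked.append(pick)
        remaining.remove(pick)
    return reranked

ranking = [("c1", 0.9, "B"), ("c2", 0.8, "B"), ("c3", 0.7, "A"),
           ("c4", 0.6, "B"), ("c5", 0.5, "A")]
print(greedy_rerank(ranking))
```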
bias mitigation (2): by removing sensitive (proxy) features
joint work with Yuxin Luo, Feng Lu, Vaishali Pal (presented at RecSys in HR 2023)
approach
• given a rich parallel dataset of structured & textual data representing job seekers
• Q: can we use QA methods to learn to extract structured information from resumes?
• (this started out as a PII-removal project)
[The earlier ranking-pipeline diagram, shown again with a zoom-in on the job seeker side: CV (PDF) → CV text → structured data (e.g., location).]
[Example: target structured attributes age 32, gender M, phone +311234567890, to be extracted from CV text such as:
John Doe
+311234567890
Work History:
- Company XYZ - Senior Software Engineer - 2018-2022 Developed software solutions for clients. Collaborated with team members on project planning and execution.
- Company ABC - Software Developer - 2015-2018 Designed and implemented software features. Conducted code reviews and provided feedback to team members [...].]
2-stage approach:
1. (para)phrase questions
2. fine-tune mT5 (mT5: A massively multilingual pre-trained text-to-text transformer)
Given the CV text above and questions such as “what is your age?”, “what is your gender?”, and “what is your phone number?”, the model generates the attribute values: age → 32, gender → M, phone → +311234567890. (A minimal sketch follows below.)
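A minimal sketch of the QA-style extraction setup: a question plus the CV text goes into a text-to-text model, which generates the attribute value. The checkpoint below is the generic google/mt5-small (in practice this would be a model fine-tuned on the paraphrased questions), and the prompt format is an assumption.

```python
# Hypothetical sketch: frame attribute extraction from a resume as
# question answering with a text-to-text model (mT5).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")  # would be a fine-tuned checkpoint

resume = ("John Doe +311234567890 Work History: Company XYZ - "
          "Senior Software Engineer - 2018-2022 ...")
questions = ["what is your phone number?",
             "what is your age?",
             "what is your gender?"]

for q in questions:
    prompt = f"question: {q} context: {resume}"   # assumed prompt format
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    out = model.generate(**inputs, max_new_tokens=16)
    print(q, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```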
findings
• mT5 outperforms rule-based & NER baselines
• adding more (paraphrased) questions improves performance
• promising results for basic extraction tasks (e.g., name, address, email), but less so for more complex fields (e.g., skills, education)
bias mitigation (3): extracting relevant features
joint work with Ninande Vermeer, Vera Provatorova, Thilina Rajapakse, Sepideh Mesbah (presented at CompJobs@WSDM 2022)
motivation
• Skills are strong features for our
recommender system
• Explicit skills are typically
extracted with rule-based
methods, NER, or similar;
• Implicit skills are typically
extracted using ontologies
• Can we train a single model to
extract both?
data
• 20K vacancies labeled with explicit skills + occupation (linked to a taxonomy)
• enriched with implicit skills: skills linked to the occupation in the taxonomy but not mentioned in the text
approach
• Inspired by similar work*
• Encode vacancies using RobBERT, with a classification output layer on top
• Learn skill extraction as a binary classification task across 3,789 unique skills (extreme multi-label classification, XMLC); a minimal sketch follows below
* Bhola et al., Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework (COLING 2020)
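A minimal sketch of the extreme multi-label setup: a RobBERT encoder with a 3,789-way sigmoid classification head trained with per-skill binary cross-entropy. The checkpoint, skill indices, example vacancy, and threshold are placeholders.

```python
# Hypothetical sketch: extreme multi-label skill classification on top of
# RobBERT. Each vacancy text gets a 3,789-dimensional multi-hot label vector.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_SKILLS = 3789
tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "pdelobelle/robbert-v2-dutch-base",
    num_labels=NUM_SKILLS,
    problem_type="multi_label_classification",  # sigmoid + BCE loss per skill
)

vacancy = "Wij zoeken een getalenteerde logistiek medewerker met heftruckcertificaat."  # toy example
labels = torch.zeros(1, NUM_SKILLS)   # multi-hot vector of explicit + implicit skills
labels[0, 42] = 1.0                   # hypothetical index of one relevant skill

inputs = tokenizer(vacancy, return_tensors="pt", truncation=True)
outputs = model(**inputs, labels=labels)          # BCE-with-logits loss for training
predicted = torch.sigmoid(outputs.logits) > 0.5   # thresholded skill predictions
print(outputs.loss.item(), int(predicted.sum()))
```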
findings
• across all metrics, RobBERT > multilingual BERT models (and monolingual EN models)
• limitations: noisy labels, class imbalance, and the limits of the implicit skill annotations
wrapping up
a few examples; there’s more
• L. Rink, J. Meijdam, and D. Graus,
Aspect-based sentiment analysis for
open-ended HR survey responses, NLP4HR
Workshop @ EACL, 2024
• J. Vrolijk and D. Graus, Enhancing PLM
performance on labour market tasks via
instruction-based finetuning and
prompt-tuning with rules, RecSys in HR @
RecSys, 2023.
• A. Lőrincz, D. Graus, D. Lavi, and J. L. M.
Pereira, Transfer learning for multilingual
vacancy text generation, GEM Workshop @
EMNLP, 2022
Interesting research opportunities emerge in real-world contexts
• Advances in SotA: innovations in NLP, from multilingual transformers to prompt-tuning, unlock new potential applications
• Data availability: both the lack and the availability of data can inspire new approaches to solving real-world problems
• Business cases: working on ‘larger’ projects like recommender systems reveals opportunities for information extraction, PII removal, and document representation
• Narrow focus: narrowing the focus of a vast topic like bias to a specific domain (e.g., hiring) makes the topic more manageable
thank you
happy to take questions or hear your thoughts
David Graus
🦋 graus.nu