Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models

Changmao Li, Elaine Fisher, Rebecca S. Thomas, Stephen Pittard, Vicki Hertzberg, and Jinho Choi
Emory NLP

Outline
● Dataset
● Tasks
● Approaches
● Experiments
● Error Analysis
● Contributions

Dataset
Source: Resumes of Clinical Research Coordinator (CRC) applicants
Each resume has two kinds of annotation:
1. The levels the applicant applied for (an applicant can apply for multiple levels).
2. The level the applicant is qualified for, annotated by human experts with
measured inter-annotator agreement. There are four levels: CRC1, CRC2, CRC3, and
CRC4. A resume that does not match any level is
annotated as Not Qualified (NQ).
In addition, there is a job description for each level.

Dataset
Preprocessing:
The original resume files are in DOC or PDF format. They are parsed with existing tools,
split into 6 sections, and stored in a JSON file for convenient use.
The existence ratio of each section in the CRC levels

Dataset
Annotation:
Two experts with experience in recruiting applicants for CRC positions of all
levels designed the annotation guidelines over 5 rounds by labeling each resume.
Kappa scores measured for inter-annotator agreement (ITA) during the five rounds of guideline development
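
The slides do not say which kappa statistic was used. Assuming Cohen's kappa for the two experts, agreement in one round could be computed roughly as in the sketch below. The function name and the label lists are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for six resumes (illustration only).
expert_1 = ["CRC1", "CRC2", "CRC2", "NQ", "CRC3", "CRC1"]
expert_2 = ["CRC1", "CRC2", "CRC3", "NQ", "CRC3", "CRC2"]
kappa = cohen_kappa(expert_1, expert_2)
```

A kappa of 1 means perfect agreement, and a value near 0 means agreement no better than chance.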

Tasks
Two novel tasks are proposed for this new dataset:
1. Multiclass classification (5 classes): Given a resume, decide which level of
CRC position the applicant is suitable for. (The resume is the input and
annotation 2 is the gold output.)
2. Binary classification: Given a resume and a CRC-level job description,
decide whether the applicant is suitable for that particular level. (The
resume and the job description for each level applied for are the input;
annotations 1 and 2 are combined to produce the binary gold output.)
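
The slides do not spell out how the two annotations are combined. One plausible reading, shown here purely as an assumption, is that a (resume, applied level) pair is positive only when the applied level equals the expert-annotated level. The function name is hypothetical.

```python
def build_matching_examples(applied_levels, expert_level):
    """Return (applied_level, label) pairs for one resume.

    ASSUMPTION: the label is 1 only when the applied level equals the
    expert-annotated level; the slides do not state the actual rule.
    """
    return [(level, int(level == expert_level)) for level in applied_levels]

# An applicant who applied for two levels and was annotated as CRC2.
pairs = build_matching_examples(["CRC1", "CRC2"], "CRC2")
# An applicant annotated as Not Qualified never yields a positive pair.
nq_pairs = build_matching_examples(["CRC3"], "NQ")
```

Under this reading, one resume contributes one binary example per level applied for.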

Approaches
Baseline Approaches for both tasks

Approaches
Strategies when applying baseline models
● Section trimming for baseline models, due to the input length limit of
transformer encoders
(Figure panels: Task 1, Task 2)
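
The exact trimming procedure is not given on the slide. As a rough sketch, assuming trimming means shortening the sections so that their concatenation fits one encoder input, the token budget could be shared as below. The even-share policy, the function name, and the section names are all assumptions made for illustration.

```python
def trim_sections(sections, max_tokens):
    """Trim tokenized sections so their total length is at most max_tokens.

    ASSUMPTION: the budget is shared evenly, and budget that short
    sections do not need is passed on to longer ones.
    """
    budget = max_tokens
    remaining = len(sections)
    keep = {}
    # Visit the shortest sections first so their unused share carries over.
    for name in sorted(sections, key=lambda s: len(sections[s])):
        keep[name] = min(len(sections[name]), budget // remaining)
        budget -= keep[name]
        remaining -= 1
    # Return the sections in their original order.
    return {name: tokens[:keep[name]] for name, tokens in sections.items()}

# Hypothetical resume with three tokenized sections.
resume = {
    "education": ["t"] * 3,
    "experience": ["t"] * 20,
    "skills": ["t"] * 9,
}
trimmed = trim_sections(resume, max_tokens=24)
lengths = {name: len(tokens) for name, tokens in trimmed.items()}
```

Here the two short sections are kept whole and only the longest one is cut, which keeps as much of the resume as the budget allows.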

Approaches
Proposed Models for the multiclass classification task
The context-aware model using section pruning and section encoding

Approaches
Proposed Models for the multiclass classification task
The context-aware model using chunk segmenting and section encoding
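
Chunk segmenting is only named on the slide. Assuming it means cutting a long token sequence into consecutive pieces that each fit the encoder, a minimal sketch could look like this. The chunk size, the lack of overlap between chunks, and the function name are assumptions.

```python
def segment_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# Ten placeholder tokens split into chunks of four.
tokens = [f"tok{i}" for i in range(10)]
chunks = segment_into_chunks(tokens, chunk_size=4)
```

Each chunk would then be encoded separately, so no token is discarded, unlike trimming or pruning.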

Approaches
Proposed Models for the binary classification task
The context-aware models using chunk segmenting + section encoding + job description embedding
and multi-head attention between the resume and the job description

Approaches
Strategies when applying models
● Section pruning for the proposed “encoding by sections” models, in case
a section exceeds the input length of transformer encoders
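
As a minimal sketch of section pruning, assuming it simply cuts each section independently to the encoder's input limit, the step could be written as follows. Dropping tokens from the end, the function name, and the section names are assumptions; the slides do not describe which tokens are removed.

```python
def prune_sections(sections, max_len):
    """Cut every tokenized section to at most max_len tokens.

    ASSUMPTION: tokens past the limit are dropped from the end.
    """
    return {name: tokens[:max_len] for name, tokens in sections.items()}

# Hypothetical resume: one short section and one overlong section.
resume = {"education": ["t"] * 5, "experience": ["t"] * 40}
pruned = prune_sections(resume, max_len=16)
```

Because each section is encoded on its own, the limit applies per section and short sections are left untouched.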

Analysis on Section Pruning (in Appendix)
Section lengths before section pruning
Section lengths after section pruning

Experiments
Data split for the multiclass classification task (label distributions are preserved):
Data statistics for the competence-level classification task

Experiments
Data split for the binary classification task (label and CRC-level distributions are
preserved, with no resumes shared between the training set and the dev or test sets):
Data statistics for the resume-to-job description matching task

Algorithm to split the dataset while avoiding overlaps
between the training and evaluation sets (in Appendix)
The key idea is:
1. Split the data by the targeted label distributions, but with a smaller initial training
set ratio than the targeted one.
2. If any applicants overlap, move all of the overlaps
into the training set, so that the training set ratio grows
close to the targeted ratio while the label distributions are still
largely preserved.
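
The two steps can be sketched as follows. This is a simplified illustration rather than the appendix algorithm itself: it merges the dev and test sets into one evaluation set, stratifies by (CRC level, label), and uses made-up examples. The function name and the tuple layout are assumptions.

```python
import random
from collections import defaultdict

def split_without_overlap(examples, initial_train_ratio, seed=0):
    """Split (applicant_id, crc_level, label) examples into train and eval.

    Step 1: stratified split per (crc_level, label) with a reduced ratio.
    Step 2: move every eval example whose applicant also appears in
    train into train, so no applicant ends up on both sides.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for example in examples:
        strata[(example[1], example[2])].append(example)

    train, evaluation = [], []
    for key in sorted(strata):
        group = strata[key]
        rng.shuffle(group)
        cut = int(len(group) * initial_train_ratio)
        train.extend(group[:cut])
        evaluation.extend(group[cut:])

    train_ids = {example[0] for example in train}
    overlaps = [ex for ex in evaluation if ex[0] in train_ids]
    evaluation = [ex for ex in evaluation if ex[0] not in train_ids]
    train.extend(overlaps)
    return train, evaluation

# Hypothetical data: 20 applicants, some applying for two levels.
examples = []
for i in range(20):
    examples.append((f"a{i}", "CRC1", i % 2))
    if i % 3 == 0:
        examples.append((f"a{i}", "CRC2", (i + 1) % 2))

train, evaluation = split_without_overlap(examples, initial_train_ratio=0.6)
```

Starting from a ratio below the target leaves room for the moved overlaps, which is why the final training share ends up above `initial_train_ratio`.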

Experiments
Experimented Models
W!: Whole context model + section trimming
P: Context-aware model + section pruning
P⊕I: P + section encoding
C: Context-aware model + chunk segmenting
C⊕I: C + section encoding
Models for the competence-level classification task
W!": Whole context + sec./job_desc. trimming
P⊕I⊕J: P⊕I + job_desc. embedding
P⊕I⊕J⊕A: P⊕I⊕J + multi-head attention
P⊕I⊕J⊕AE: P⊕I⊕J - E#
C⊕I⊕J: C⊕I + job_desc. embedding
C⊕I⊕J⊕A: C⊕I⊕J + multi-head attention
C⊕I⊕J⊕AE: C⊕I⊕J - E#
Models for the resume-to-job description matching task

Experiments
Results for the competence-level classification task.

Experiments
Results for the resume-to-job description matching task.

Experiments
Analysis for the competence-level classification task.
Confusion matrix for the best model of the competence-level classification task
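
For readers who want to reproduce this kind of analysis, the sketch below builds a confusion matrix and counts errors between adjacent levels. The gold and predicted labels are invented for illustration and are not results from the experiments. Placing NQ directly below CRC1 in the ordering is also an assumption.

```python
LEVELS = ["NQ", "CRC1", "CRC2", "CRC3", "CRC4"]

def confusion_matrix(gold, predicted, labels=LEVELS):
    """Rows are gold labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, predicted):
        matrix[index[g]][index[p]] += 1
    return matrix

def adjacent_error_count(matrix):
    """Count errors where the prediction is exactly one level from gold."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) == 1)

# Made-up labels for six resumes (illustration only).
gold = ["CRC1", "CRC2", "CRC2", "CRC3", "NQ", "CRC4"]
pred = ["CRC1", "CRC1", "CRC2", "CRC4", "NQ", "CRC2"]
matrix = confusion_matrix(gold, pred)
adjacent = adjacent_error_count(matrix)
```

Errors just off the diagonal correspond to confusing neighboring CRC levels.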

Experiments
Analysis for the resume-to-job description matching task.
Confusion matrix for the best model of the resume-to-job description matching task

Error Analysis
• The model is unable to identify clinical research experience.
• The model cannot identify dates of experience.
• Adjacent CRC positions are hard to distinguish.

Contributions
● Introduced a new resume classification dataset.
● Proposed two new tasks for this new dataset.
● Proposed novel context-aware transformer approaches for the two tasks.
● Conducted experiments with several proposed models.
● Conducted both quantitative and qualitative analyses to guide future improvements.
