Resume Text Generation Using Language Models
Hana Ba-Sabaa, Aadil Islam, Joe Zhang
Dartmouth College
hana.h.ba-sabaa.22@dartmouth.edu,
{aadil.islam,joe.zhang}.gr@dartmouth.edu
Abstract
Text generation is a challenging task in
Natural Language Processing in which the
goal is to produce text given some input.
The release of pre-trained text generation
models such as GPT-1, GPT-2, GPT-3, and
GPT-Neo has attracted considerable attention
owing to these algorithms' purported ability
to generate humanlike text. Yet evaluating
whether Natural Language Generation (NLG)
algorithms produce "natural" text, in the
sense that human evaluators cannot
distinguish human-written from
machine-generated text, remains an
understudied area. We conducted an
experiment assessing behavioral reactions to
the NLG algorithms GPT-2 and GPT-Neo and
compared their outputs to human-written
text. In our study, the algorithm-generated
text did not outperform human-written text
as rated by our human participants. These
results do not align with prior work
suggesting that participants fail to reliably
detect algorithmically generated texts under
a human-in-the-loop treatment. We discuss
how these results reflect shortcomings of our
sample size, which, if addressed in follow-up
research, could better establish how well
NLG algorithms produce human-like text.
1 Introduction
Each year, many students and young professionals
apply for jobs and compose or edit their resumes.
Our project is an English-language resume text
generator that assists applicants through the
arduous process of applying for jobs. Our tool
outputs bullet points that elaborate on the
requirements or job description given a topic or
a job title. For instance, when the
user inputs the job title “School Counselor”
followed by the seed phrase “Worked at Smith
High School as a school counselor. Helped
students…”, an example output would be:
“...understand and overcome social or behavioral
problems.” Since it is difficult to write resume
text that properly describes a user’s prior job or
internship experiences without knowing their
specifics, our tool provides a template that can
be personalized based on the user’s own
experiences. The output can be adjusted and
modified manually by the user in any text editor.
Similar work was done in a research study by
Köbis and Mossink (2021) that explored people’s ability
to discern artificial content from human content.
In this experiment, humans directly competed
with an AI agent in the form of the natural
language generation (NLG) algorithm GPT-2
(Radford et al., 2019). The results showed that
human participants were incapable of reliably
detecting algorithm-generated poetry, even when
incentivized to do so. Additionally, the study
showed significantly higher preference for
algorithm-generated texts when humans were
involved in the selection process, versus when the
selection was random. We drew inspiration from
the study by introducing human-in-the-loop
factors into our evaluation techniques when
selecting the machine-generated text.
The work of Lee et al. (2020) leveraged GPT-2
355M to generate coherent patent claims. Their
training dataset comprised more than 500,000
U.S. utility patents granted in 2013, preprocessed
by splitting claim texts into smaller units of
inventive thought called ‘claim spans,’
circumventing the need to fine-tune GPT-2 on
entire claim texts that may be too coarse-grained
to model. This inspired us to reconsider
generating full resume texts and instead focus on
modeling smaller phrases such as bullet points
for given job titles. The authors also noted that
generating the first reasonably coherent patent
claims required relatively few fine-tuning steps,
suggesting that future deep learning models may
likewise have potential in similar domains.
2 Methodology
2.1 Dataset
The dataset leveraged in this project was obtained
from Kaggle
(https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset)
and comprises 2,484 resumes scraped from a
database of example resumes hosted by LiveCareer
(https://www.livecareer.com/resume-search/).
Our resumes were those scored 85 (out of 100) or
above by LiveCareer agents, i.e., those found by
former recruiters to be fairly well-written. This
was to ensure our system is fine-tuned on the
highest-quality resumes possible. The dataset
offers the following attributes for each resume:
● ID: Unique identifier for the resume PDF.
● Resume_str: Resume text in string format.
● Resume_html: Resume content in HTML
format as present from web scraping.
● Category: Type of job that the resume was
used to apply for: HR, Designer,
Information-Technology, Teacher,
Advocate, Business-Development,
Healthcare, Fitness, Agriculture, BPO,
Sales, Consultant, Digital-Media,
Automobile, Chef, Finance, Apparel,
Engineering, Accountant, Construction,
Public-Relations, Banking, Arts, Aviation.
2.2 Preprocessing
Figure 1: Resume #2 of dataset after preprocessing. Extracted job
titles and bullet points are colored in red and green, respectively.
We parsed the HTML contents of each resume and
extracted the tags containing job titles and their
corresponding bullet points. We then performed
preprocessing steps to ensure that the data is
formatted suitably to be fed as input to our
language models. For instance, after extracting
the necessary text from each resume, we removed
all characters that are neither alphanumeric nor
punctuation. We kept punctuation in order to
retain proper formatting for dependent clauses
and lists. We kept numbers simply as
placeholders; even though the precise values may
not apply to each user, they can easily copy and
paste our generated bullet points and edit them to
incorporate their individual experiences. Figure 1
visualizes the content we handle from a random
resume’s HTML content.
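The sketch below illustrates this extraction and cleaning step. It assumes BeautifulSoup for HTML parsing, and the tag and class names used as selectors are placeholders, since the exact LiveCareer markup is not documented here.

import re
from bs4 import BeautifulSoup

def clean_text(text):
    # Keep alphanumerics, whitespace, and common punctuation; drop everything else.
    text = re.sub(r"[^A-Za-z0-9\s.,;:'()/%$-]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def extract_pairs(resume_html):
    # Return (job title, bullet point) pairs from one resume's HTML.
    # The selectors below are illustrative assumptions about the markup.
    soup = BeautifulSoup(resume_html, "html.parser")
    pairs = []
    for section in soup.find_all("div", class_="job"):        # assumed job container
        title_tag = section.find("span", class_="jobtitle")   # assumed job-title tag
        if title_tag is None:
            continue
        title = clean_text(title_tag.get_text())
        for li in section.find_all("li"):                      # bullet points as list items
            bullet = clean_text(li.get_text())
            if bullet:
                pairs.append((title, bullet))
    return pairs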
In order to enable users to customize the
information they want to generate, down to the
desired job title to be written about, we
transformed our original dataset of 2,484 resumes
into a dataset of 64,441 tuples of the form (job
title, bullet point), comprising all observed pairs
of job titles and bullet points. This yields text samples of
the form “<JOB_TITLE>: <BULLET_POINT>”
such as:
Assistant Head Teller:
Consistently met or exceeded
quarterly sales goals.
By fine-tuning our language models on such
text samples, we allow users to pass in a prompt
of the form “<JOB_TITLE>”, optionally
followed by a colon and one or more seed words,
and receive synthetic job-specific bullet points.
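As a minimal sketch, the (job title, bullet point) tuples can be serialized into a one-sample-per-line training file of this form; the file name is an arbitrary choice.

def build_training_file(pairs, out_path="resume_bullets.txt"):
    # Write one "<JOB_TITLE>: <BULLET_POINT>" sample per line.
    with open(out_path, "w", encoding="utf-8") as f:
        for title, bullet in pairs:
            f.write(f"{title}: {bullet}\n")

# Example line written to the file:
# Assistant Head Teller: Consistently met or exceeded quarterly sales goals.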
2.3 Algorithm
Language generation models that we
experimented with include GPT-2 124M and
GPT-Neo 125M (Black et al., 2021). Though we
originally intended to use GPT-Neo 355M in
order to examine the effect of additional neural
network layers on the quality of GPT-Neo text
generations, we were held back by the GPU
memory limitations of our Google Colab coding
environment (Bisong, 2019). We chose the
aforementioned models over the state-of-the-art
GPT-3 architecture proposed by OpenAI (Brown
et al., 2020) because the latter is not open source
and because of the heavy monetary cost of using
OpenAI’s API to fine-tune on our sizable dataset.
In contrast, the GPT-Neo models developed by
EleutherAI (https://www.eleuther.ai/) are open
source and offer performance competitive with
that of GPT-2 models
(https://towardsdatascience.com/guide-to-fine-tuning-text-generation-models-gpt-2-gpt-neo-and-t5-dc5de6b3bc5e).
Compared to GPT-2 models, GPT-Neo models
are said to be more suitable for longer texts and
are pre-trained on more recent data.
To fine-tune and evaluate our language models,
we used the aitextgen library
(https://github.com/minimaxir/aitextgen), a
Python toolkit for text-based AI training and
generation across numerous deep learning
architectures. We fine-tuned each language model
on a single GPU for 10 epochs with a learning
rate of 1e-3.
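The sketch below outlines this fine-tuning setup with aitextgen. Parameter names should be checked against the library's documentation, and since aitextgen trains in steps rather than epochs, the step count shown is a placeholder whose epoch equivalent depends on dataset size and batch size.

from aitextgen import aitextgen

# Load a pre-trained GPT-Neo 125M checkpoint from the Hugging Face hub;
# GPT-2 124M can be loaded with aitextgen(tf_gpt2="124M") instead.
ai = aitextgen(model="EleutherAI/gpt-neo-125M", to_gpu=True)

# Fine-tune on the one-sample-per-line file built in Section 2.2.
ai.train(
    "resume_bullets.txt",
    line_by_line=True,
    learning_rate=1e-3,
    batch_size=1,
    num_steps=5000,   # placeholder; chosen to approximate several epochs
)

# Prompt with a job title (optionally followed by seed words) to get bullets.
ai.generate(n=3, prompt="Executive Chef:", max_length=40, temperature=0.7)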
2.4 Evaluation
2.4.1 Human Evaluators
We created a Google Form containing 48 text
samples of both synthetic and human-written
bullet points and presented them to 10 people to
rate how ‘human-sounding’ each one is. We first
chose the six most frequent job titles from our
dataset: Accountant, Sales Associate,
Consultant, Teacher, Administrative Assistant,
and Executive Chef. For each job title, we then
chose two ground truth bullet points from the
actual dataset, three bullet points generated by
GPT-2, and three bullet points generated by GPT-
Neo. We randomized the order of the bullet points
within each job title and asked our human
evaluators to rate them. The options were based
on a 5-point Likert scale denoting (5) Strongly
Agree, (4) Agree, (3) Neutral, (2) Disagree, and
(1) Strongly Disagree. Evaluators were current
undergraduate students.
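The following sketch shows how such a survey set could be assembled: per job title, two ground-truth bullets plus three generations from each model, shuffled within the title. The gpt2_gen and gptneo_gen dictionaries are hypothetical containers for the model generations.

import random
from collections import Counter

def build_survey_items(pairs, gpt2_gen, gptneo_gen, n_titles=6, seed=0):
    rng = random.Random(seed)
    # Six most frequent job titles among the (job title, bullet point) pairs.
    top_titles = [t for t, _ in Counter(t for t, _ in pairs).most_common(n_titles)]
    items = []
    for title in top_titles:
        truth = [b for t, b in pairs if t == title]
        batch = (
            [(title, b, "ground_truth") for b in rng.sample(truth, 2)]
            + [(title, b, "gpt2") for b in gpt2_gen[title][:3]]
            + [(title, b, "gpt_neo") for b in gptneo_gen[title][:3]]
        )
        rng.shuffle(batch)  # randomize presentation order within each job title
        items.extend(batch)
    return items  # 6 titles x 8 bullets = 48 survey items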
2.4.2 BLEU Score
Bilingual evaluation understudy (BLEU) is an
algorithm for evaluating the quality of generated
text (Papineni et al., 2002). Originally designed
for evaluating machine translation, BLEU has
been widely used as a metric for natural language
generation tasks, although its correlation with
human judgment has been debated
(Callison-Burch et al., 2006). It uses n-grams to
calculate the similarity between generated text
and a reference corpus, which is especially
suitable for short sentences. We decided to use
BLEU as an initial metric to gauge text
generation.
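As a minimal illustration, BLEU for a single generated bullet point against its ground-truth counterpart can be computed with NLTK (the paper does not specify which implementation or smoothing was used, and the strings below are hypothetical):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_for_generation(reference, generated):
    # Smoothing helps because resume bullets are short and
    # higher-order n-gram matches are often absent.
    ref_tokens = reference.lower().split()
    gen_tokens = generated.lower().split()
    return sentence_bleu(
        [ref_tokens],
        gen_tokens,
        smoothing_function=SmoothingFunction().method1,
    )

score = bleu_for_generation(
    "Developed lesson plans aligned with state standards.",
    "Developed lesson plans for students in all grade levels.",
)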
2.4.3 Preliminary Results
We calculated the BLEU score for six phrases
generated by GPT-Neo. The input job titles were
chosen based on how many data points each had:
two with many data points, two with an average
amount, and two with few. Since the BLEU score
requires a ground-truth phrase, each input needed
to be unique, so that only a single data point
started with that input. Some inputs are as
follows: “Teacher: Developed lesson”, “Project
Coordinator: Collected”, and “Finance and Sales
Consultant: Planned”. The BLEU score for these
initial results was 0.19.
3 Results
Figure 2: Loss curves for GPT-2 124M and GPT-Neo 125M after
fine-tuning for 10 epochs with learning rate of 1e-3.
Bullet Point Source   Accountant     Sales Associate   Consultant     Teacher        Administrative Assistant   Executive Chef
Ground Truth          2.80 (1.36)    3.55 (1.10)       3.35 (1.27)*   3.15 (1.27)    4.05 (1.10)*               2.95 (1.28)*
GPT-2 124M            4.13 (1.20)*   3.13 (1.63)       3.33 (1.27)    3.47 (1.57)    3.90 (1.21)                2.77 (1.52)
GPT-Neo 125M          2.67 (1.54)    4.40 (0.97)*      2.23 (1.04)    3.77 (1.14)*   2.20 (1.52)                2.23 (1.38)

Table 1: Average human-likeness scores (ranging from 1.0 to 5.0,
where higher is better) across job types for resume bullet points
taken from ground-truth resumes versus generated by language
models. Values marked with an asterisk indicate the most
natural-sounding bullet points for each job type. Parentheses
indicate standard deviations.
Figure 2 shows the convergence of both language
models during fine-tuning, indicating that both
GPT models are adjusting to subtleties of the
English language found across resume bullet
points. Table 1 compares human-likeness scores
for ground-truth resume bullet points and for
those generated by our models.
4 Discussion
4.1 BLEU Score
BLEU scores are between 0 and 1, with 0.6
considered the best one could realistically achieve
and 0.3 as decent. We got a relatively low score of
0.19 for our preliminary results. However, a low
BLEU score does not necessarily mean the
generated text is bad. There are many possibilities
for a ‘good’ sentence on a resume and there is not
one correct answer. The model may have
generated a good text that was very different from
the originals. We decided human evaluation
would be better suited for our task.
4.2 Human Evaluation
For only two of the six most frequent job titles do
GPT-Neo text generations appear to be (on average) more
humanlike than ground truth bullet points and
GPT-2 text generations, whereas they fare poorly
across all other job titles, making it a potentially
less versatile model for clients seeking
customization across a variety of job titles. Upon
closer examination, the following GPT-Neo text
generation for an ‘Administrative Assistant’ job
title received the poorest Likert score of Strongly
Disagree from 90% of evaluators:
Ensuring that certifications and
coordinates the billing and updates
Although the proposed bullet point cleverly
begins without a subject (which, in most resumes,
is assumed to refer to the job applicant), it fails
to complete the initial verb phrase describing how
the applicant supposedly handles certifications,
transitions unnaturally to how the applicant
supposedly coordinates billing processes, and
forms a run-on sentence bearing the conjunction
‘and’ twice. It seems that although GPT-Neo has
learned the responsibilities of the job title over
the course of fine-tuning, further fine-tuning on
text samples guaranteed to be grammatically
correct might help expose the model to the
syntactic features of language present in
resumes. Another GPT-Neo
text generation given a score of Strongly
Disagree, this time by 80% of evaluators and for
the ‘Executive Chef’ job title, was the following:
Responsible for the daily operation
of the food items for the operation
of the food items
The proposed bullet point repeats the
prepositional phrase “for the (daily) operation of
the food items,” prompting the need to study what
balance to strike between forbidding repetition
outright and allowing repeated cycles of identical
n-grams. This balance could be
implicitly learned by fine-tuning our language
models further and by manually experimenting
with the repetition penalty parameter (Keskar et
al., 2019) featured in the generate method of
our aitextgen pipeline.
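For illustration, a repetition penalty can be passed through the same generate call used in Section 2.3; values above 1.0 discourage reusing tokens already present in the output, and the value 1.2 follows the setting suggested by Keskar et al. (2019) rather than any tuning of our own.

ai.generate(
    n=3,
    prompt="Executive Chef:",
    max_length=40,
    temperature=0.7,
    repetition_penalty=1.2,  # >1.0 penalizes repeated tokens
)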
Regarding ethical concerns, recall that the
resumes we used in this project were specifically
those evaluated by LiveCareer agents to be fairly
well-written. Consequently, our dataset is biased
in terms of the quality of writing that real-world
resumes actually bear. Another concern is the
usage of the text generations. Our generations are
meant to serve as a high-quality template. Clients
should not misuse our system by taking its text
generations verbatim, as neither GPT-2 nor GPT-Neo
was fine-tuned on the client’s personal
experiences. Rather, clients should incorporate
their personal experiences into the writing styles
present in our text generations.
5 Conclusion
Algorithms that generate human-sounding text
are becoming ever more widely
accessible. Models like GPT-2 and GPT-Neo can
guide users through daunting writing processes,
such as resume building. Understanding humans’
behavioral reactions to these algorithms helps
shape future breakthroughs in the field that will
address the shortcomings of the existing models.
As a step in that direction, our study adopts a
behavioral science approach to examine a balance
of formulaic and creative artificial intelligence in
the form of resumes. As with the majority of
studies, the design of our current study is subject
to limitations. Due to the small sample size of our
human evaluators, further research on a larger
sample size is critical for reinforcing the accuracy
of our results. We hope more studies follow suit
to provide new behavioral insights into human
writing versus that of innovative artificial intelligence.
References
Bisong, Ekaba. Building machine learning and deep
learning models on Google cloud platform: A
comprehensive guide for beginners. Apress, 2019.
Black, Sid, et al. "GPT-Neo: Large scale
autoregressive language modeling with
Mesh-Tensorflow." Zenodo (2021).
Brown, Tom, et al. "Language models are few-shot
learners." Advances in neural information
processing systems 33 (2020): 1877-1901.
Callison-Burch, Chris, Miles Osborne, and Philipp
Koehn. "Re-evaluating the role of BLEU in machine
translation research." 11th Conference of the
European Chapter of the Association for
Computational Linguistics. 2006.
Keskar, Nitish Shirish, et al. "Ctrl: A conditional
transformer language model for controllable
generation." arXiv preprint arXiv:1909.05858
(2019).
Köbis, Nils, and Luca D. Mossink. "Artificial
intelligence versus Maya Angelou: Experimental
evidence that people cannot differentiate AI-
generated from human-written poetry." Computers
in human behavior 114 (2021): 106553.
Lee, Jieh-Sheng, and Jieh Hsiang. "Patent claim
generation by fine-tuning OpenAI GPT-2." World
Patent Information 62 (2020): 101983.
Papineni, Kishore, et al. "Bleu: a method for automatic
evaluation of machine translation." Proceedings of
the 40th annual meeting of the Association for
Computational Linguistics. 2002.
Radford, Alec, et al. "Language models are
unsupervised multitask learners." OpenAI blog 1.8
(2019): 9.