Master tutorial on resume modeling given at SIOP 2016 in California. Please let me know if you have any questions on this topic. Using NLP can be very powerful for predicting candidate performance but it can also be dangerous if adverse impact is not considered from the beginning.
Predicting Candidate Performance From Text (NLP) - Benjamin Taylor
This is a talk I gave at PACON, on using text to predict candidate/applicant performance based on historical data. It is an introduction to natural language processing and deep learning. The same methods can also be used for social media profiling (Facebook, Twitter), assessments, essays, and resumes. Text analytics is much easier than most people think.
Two hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc... - Zainul Sayed
Using Natural Language Processing (NLP) and Machine Learning (ML) to rank resumes against a given set of constraints, this intelligent system ranks resumes of any format according to the constraints and requirements provided by the client company. The client company supplies the bulk input of resumes along with the requirements and constraints by which our system ranks them. Beyond the details extracted from the resumes, the system also reads candidates' social profiles (LinkedIn, GitHub, etc.), which provide more genuine information about each candidate.
Watch the companion webinar at: http://embt.co/1hjDU8s
Many DBAs may only know enough about data modeling to be dangerous. There are a number of challenges that DBAs face when trying to do data modeling, as well as some preconceived notions of what they think data modeling can (or can’t) do for them, such as generating useful DDL code.
This 90-minute session will provide specific insights and examples to show DBAs how a data modeling tool can help them improve database performance. Data modeling can simplify routine tasks and provide valuable context for a database implementation. Karen Lopez and John Sterrett will debunk seven dangerous myths that DBAs believe about data modeling, and also discuss and demonstrate:
+ Challenges DBAs encounter with data modeling
+ What data modeling really means and how it adds value
+ Why data modeling is key to successful agile projects
+ How data model-driven development saves time and money
+ Why data modeling should be done throughout the development lifecycle
This SolidWorks World 2006 presentation from Paul Gimbel of Razorleaf Corporation focuses on how to redesign your engineering design processes to leverage the use of 3D CAD tools like SolidWorks.
Excel Power-ups for Going Beast-mode in Local SEO - David Minchala
The local SEO's workflow is a bit different from “regular” SEO, and getting the tooling together to handle that workflow can be pricey or require coding skills mere mortals generally don't possess. Luckily, with a little know-how, any mortal can go BEAST MODE in Excel.
In this session, I’ll show just how much can be handled in Microsoft Excel. And don’t worry if you’re not an Excel wizard – this session is meant for anyone who’s used even just the basic functions of Excel. From citation auditing, performance monitoring, competitive analysis, and even producing visualizations clients can understand, there’s a lot you can do with formulas I’ll share and plugins you can get for free.
Currently, hundreds of tools promise to make artificial intelligence accessible to the masses: tools like DataRobot, H2O Driverless AI, Amazon SageMaker, and Microsoft Azure Machine Learning Studio.
These tools promise to accelerate the time-to-value of data science projects by simplifying model building.
In the workshop we will approach the AI Topic head on!
What is AI? What can AI do today? What do I need to start my own project?
We do all this using Microsoft's Machine Learning Studio.
Trainer: Philipp von Loringhoven - Chef, Designer, Developer, Marketer - Data Nerd!
He has acquired a lot of expertise in marketing, business intelligence and product development during his time at the Rocket Internet startups (Wimdu, Lamudi) and Projekt-A (Tirendo).
Today he supports customers of the Austrian digitisation agency TOWA as Director of Data Consulting, helping them generate added value from their data.
My talk at the Scandinavian Developer Conference 2010 about following the wrong principles and getting too excited about shiny demos rather than building things that work and proving our technologies as professional tools.
Mehar Singh, CEO of ProCogia, and Jason Grahn, Senior Business Analyst at Apptio, co-present on the journey from Excel to R at the second Bellevue chapter useR Group Meetup.
If we’re producing analysis that drives business decision making, that’s production-grade code! This talk addresses that question and shows why R is the way to go: assumptions are built into the code, which enables the analyst to automate and reproduce their efforts.
This presentation includes:
- Data importing (opening a CSV or connecting to a SQL in both tools)
- Filtering, grouping, summarizing (pivot tables in Excel vs. tidy code in R)
- Visualizations (charts in excel vs ggplot in R)
Nagios Conference 2014 - David Josephsen - Graphing Nagios (Nagios)
David Josephsen's presentation on Graphing Nagios.
The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
Machine Learning in Marketing - Jim Sterne @ Digital Analytics Forum 2018 - AT Internet
How are AI and Machine Learning reinventing marketing strategies? And how can marketers tangibly make the most of the latest innovations in Data Science?
In his keynote on "Machine Learning in Marketing", Jim shows how emerging technologies are impacting marketing strategies. He provides the keys for both marketers and analysts to make the most of the latest innovations in data science.
Jim Sterne is a prominent expert and true pioneer in digital analytics. He founded international events like the eMetrics Summit and the Media Analytics Summit, and was named one of the 50 most influential people in digital marketing by Revolution (the UK’s premier interactive marketing magazine).
More info https://www.digital-analytics-forum.com/en/ and https://www.atinternet.com
Real world design patterns - a history of creating and using design patterns at eBay. Presented by James Reffell and Micah Alpern at the 2006 IA Summit in Vancouver.
Gave this talk on Python genetics at HireVue as a flash presentation. What does this have to do with SaaS? Data science? Machine learning? Nothing.... :) HireVue.com has a fun work culture.
More Related Content
Similar to Using Deep Learning And NLP To Predict Performance From Resumes
#SIOP15 Presentation On Performance Sorting Using Video Interviews - Benjamin Taylor
This is a presentation I gave at SIOP 2015 in Philadelphia. The presentation shows how you can predict performance from a video interview using unstructured feature extraction and supervised learning. It also discusses k-fold cross-validation, which is less commonly known within the I/O community but preferred within the data science community.
In this talk I talk about how to model text. I presented it at the spring 2015 big mountain data conference in Utah. The talk had a lengthy python notebook with it, so it may be less useful without that content.
This presentation covers data science buzz words, big data introduction, predictive analytics, and model building methods. Structured vs unstructured. Supervised learning vs unsupervised learning.
How to simulate semiconductor die yield in a fab environment. A wafer never travels through the fab the same way twice, because multiple tools exist for identical steps and some tools have multiple chambers for processing. These are all called contexts, and each context has a different impact on yield. The challenge is to reverse-engineer the sources causing yield fallout with as few observations as possible.
This is a simple text analytics intro I put together for people with traditional numeric backgrounds that want to venture into text prediction. Some of this work came out of a competition that Skullcandy helped facilitate.
Utah, the greatest SMOG on earth. Harvesting data for air quality prediction - Benjamin Taylor
Utah, the greatest SMOG on earth. Harvesting data for air quality prediction. The presentation walks through simple data sources, data sources that required JavaScript packet gathering and scraping, and finally data sources that require reverse map-to-data conversions.
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to evolve alongside changing supply, facilitated through institutional investment rotating out of offices and into work from home (“WFH”), while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes compare several primitives:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
6. GRIT MOTIVATION ENGAGEMENT PERFORMANCE
1 55 80 95%
0 75 10 22%
0 50 20 57%
1 20 90 91%
0 40 60 11%
Basic Tutorial On How To Build A Numeric Feature Model
BUILDING A MODEL
7. ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE
I want to work here 1 55 80 95%
I have great teamwork 0 75 10 22%
Synergy 0 50 20 57%
I have so much grit 1 20 90 91%
They fired that individual 0 40 60 11%
Now what?!?
BUILDING A MODEL
8. ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
There are really two different options: mapping or tokenizing
BUILDING A MODEL
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize:
Female = 1
Male = 1
Female Male
1 0
0 1
9. I want to work here have great PERF.
1 1 1 1 1 0 0 95%
1 0 0 0 0 1 1 22%
0 0 0 0 0 0 0 57%
1 0 0 0 0 1 0 91%
0 0 0 0 0 0 0 11%
Tokenize the text into unique word columns
BUILDING A MODEL
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
10. I want to work here have great PERF.
1 1 1 1 1 0 0 95%
1 0 0 0 0 1 1 22%
0 0 0 0 0 0 0 57%
1 0 0 0 0 1 0 91%
0 0 0 0 0 0 0 11%
Bag of words modeling, sequence and ordering is lost
BUILDING A MODEL
11. Bag of words modeling, sequence and ordering is lost
BUILDING A MODEL
12. I want Want to to go work here PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
Band-Aid: Concept of n-grams
BUILDING A MODEL
14. We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS
label text
neg @Christian_Rocha i miss u!!!!!
pos @llanitos there's still some St Werburghs hone...
pos @Ashley96 it's me
neg @Phillykidd we use to be like bestfriends
neg Just got back from Manchester. I went to the T...
pos @LauraDark thnks x el rt
neg "Ughh it's so hot & the singing lady is st...
neg @hnprashanth @dkris I was out to my native for...
pos Girls night with the bests Wish you were here J!
neg Just watched @paulkehler rock the crap out of ...
pos i got the gurl! i got the ride! now im just on...
pos @ninthspace how is the table building going?
pos by d way guyz I must log out na see u again to...
neg @dreday11 its only 20 mins...
Sentiment140
cs.stanford.edu
:( :)
15. Before we can process this we need to do the proper formatting to get it ready
SENTIMENT DATASET - FORMATTING
text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...
Python list
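A minimal loading sketch for this step, assuming the standard Sentiment140 CSV layout (headerless columns: polarity, id, date, query, user, text) and an illustrative local file name:

```python
import csv

# Sentiment140 download is a headerless CSV: polarity, id, date, query, user, text.
# Polarity is 0 for negative and 4 for positive in the original file.
texts, labels = [], []
with open("training.1600000.processed.noemoticon.csv", encoding="latin-1") as f:
    for row in csv.reader(f):
        labels.append("pos" if row[0] == "4" else "neg")
        texts.append(row[-1])

print(len(texts), labels[0], texts[0])
```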
16. Now we can go all the way to model training and prediction
SENTIMENT DATASET – UNIGRAM
y
[0,1,0,1,1]
text_data
[[‘this is a tweet’]
[‘sounds good’]
[‘not really’]]
I want to work here have great
1 1 1 1 1 0 0
1 0 0 0 0 1 1
0 0 0 0 0 0 0
1 0 0 0 0 1 0
0 0 0 0 0 0 0
17. Now we can go all the way to model training and prediction
SENTIMENT DATASET – BIGRAM
I want Want to to go work here
1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
text_data
[[‘this is a tweet’]
[‘sounds good’]
[‘not really’]]
y
[0,1,0,1,1]
20. Convert labels to integers
SENTIMENT DATASET - FORMATTING
model.fit(X,Y)
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
21. Now we can go all the way to model training and prediction
SENTIMENT DATASET – BUILD A MODEL
y
[0,1,0,1,1]
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
PERFORMANCE?
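A hedged end-to-end sketch of what the model.fit(X, Y) step looks like with scikit-learn; the toy tweets and labels are illustrative stand-ins for the real 1.5M-tweet dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

text_data = ["this is a tweet", "sounds good", "not really", "love this song", "never again"]
y = [0, 1, 0, 1, 0]                       # labels already converted to integers (0 = neg, 1 = pos)

vectorizer = CountVectorizer()            # unigram bag of words
X = vectorizer.fit_transform(text_data)   # sparse document-term count matrix

model = LogisticRegression()
model.fit(X, y)

# Score new text using the same vocabulary
print(model.predict(vectorizer.transform(["sounds really good"])))
```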
39. EMAIL MULTICLASS DATASET (20 classes)
From: lerxst@wam.umd.edu (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw the other day. It was a 2-door sports car, looked to be from the late 60s/early 70s. It was called a Bricklin. The doors were really small. In addition, the front bumper was separate from the rest of the body. This is all I know. If anyone can tellme a model name, engine specs, years of production, where this car is made, history, or whatever info you have on this funky looking car, please e-mail. Thanks, - IL ---- brought to you by your neighborhood Lerxst ----
rec.autos
40. EMAIL MULTICLASS DATASET (20 classes)
From: guykuo@carson.u.washington.edu (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have shared their experiences for this poll. Please send a brief message detailing your experiences with the procedure. Top speed attained, CPU rated speed, add on cards and adapters, heat sinks, hour of usage per day, floppy disk functionality with 800 and 1.4 m floppies are especially requested. I will be summarizing in the next two days, so please add to the network knowledge base if you have done the clock upgrade and haven't answered this poll. Thanks. Guy Kuo <guykuo@u.washington.edu>
comp.sys.mac.hardware
41. EMAIL MULTICLASS DATASET (20 classes)
From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000 ?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote: > abraxis@iastate.edu writes in article <abraxis.734340159@class1.iastate.edu>: > > Anyone know about the Weitek P9000 graphics chip? > As far as the low-level stuff goes, it looks pretty nice. It's got this > quadrilateral fill command that requires just the four points. Do you have Weitek's address/phone number? I'd like to get some information about this chip. -- Joe Green, Harris Corporation, jgreen@csd.harris.com, Computer Systems Division. "The only thing that really scares me is a person with no sense of humor." -- Jonathan Winters
comp.graphics
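These examples are from the classic 20 Newsgroups corpus; assuming that is the dataset behind these slides, a minimal multiclass sketch with scikit-learn could look like this:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Downloads the corpus on first call; 20 classes such as rec.autos, comp.graphics, ...
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

vectorizer = TfidfVectorizer(max_features=50000)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train.target)
print("accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```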
52. Unstructured
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
Structured
I want Want to to go work here PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
53. ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ESSAY
3 2 1 4 5
3 7 67 345
54
3 7 99 10234
78 203 501 14
[One-hot encoding of the word indices, one column per vocabulary entry]
RAW TEXT → WORD SEQUENCE → ENCODING → LSTM
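A minimal sketch of this raw text → word sequence → encoding → LSTM pipeline in Keras; the vocabulary size, sequence length, and layer sizes are placeholders, not values from the talk:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

essays = ["I want to work here", "I have great teamwork", "Synergy",
          "I have so much grit", "They fired that individual"]
performance = np.array([0.95, 0.22, 0.57, 0.91, 0.11])

# Raw text -> integer word sequences -> fixed-length encoding
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(essays)
X = pad_sequences(tokenizer.texts_to_sequences(essays), maxlen=20)

# Embedding + LSTM learn the features automatically, no hand-built n-grams needed
model = Sequential([
    Embedding(input_dim=5000, output_dim=32),
    LSTM(16),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, performance, epochs=10, verbose=0)
print(model.predict(X).ravel())
```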
My name is Ben Taylor, I’m the Chief Data Scientist for a great startup called HireVue. Today I will be talking about NLP as well as deep learning.
This talk is meant to be an introduction for those who are less familiar with NLP.
HR has seen great cross-pollination from other industries; I am an example of that.
I studied chemical engineering through both undergrad and graduate programs.
I then went and worked as a quant for a Manhattan hedge fund manager on a 600 GPU cluster.
And… now I’m in HR.
Oh, and I LOVE love love backcountry snowboarding. I took this photo last week and I go 2-3 times a week before work. I will never work anywhere besides Utah because of this.
What is HireVue?
We are a digital interviewing & interaction company
We are backed by Sequoia Capital
In 2014 we were #10 on Forbes' list of most promising companies
Global, supporting digital interviews in 189 countries
Building predictive models from competencies or other numeric features is straightforward.
You take the columns or features of interest on the left, and the performance labels on the right and you pass them through a type of regression.
Excel will do this, many programs will do this just fine.
If you are MORE advanced you can use tools like R or Python to run more advanced regressions like random forest, gradient boosting regression, or others…
Raise your hand if you know how to build a model from this data?
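As a concrete illustration, a hedged sketch of fitting one of those more advanced regressions on the toy grit/motivation/engagement table with scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Columns: grit (0/1), motivation, engagement; target: performance
X = np.array([[1, 55, 80],
              [0, 75, 10],
              [0, 50, 20],
              [1, 20, 90],
              [0, 40, 60]])
y = np.array([0.95, 0.22, 0.57, 0.91, 0.11])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict([[1, 60, 85]]))   # score a new candidate
```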
Now, to throw a wrench in your process, I have decided to inject open-ended essay responses into my feature set.
Raise your hand again if you know how/what to build a predictive model with this?
Most classical statisticians/mathematicians/analysts are justifiably confused by this.
Like most data science or machine learning tricks, once they are explained at a 5th grade level, we tend to be underwhelmed.
The computer can’t understand the raw text in its native format, it must convert them to numbers. One way to accomplish this is to map the text to numeric replacements.
Good, better, best, can become 1,2, and 3.
What would you do if you had something like male or female? You can't simply map these, because if you made male 2 and female 1, are you being sexist?
They are completely different; they can't be directly compared. Therefore they must be tokenized, where each value gets its own column, so new columns are created.
In the case of text you can have a LOT of columns. In some cases you may exceed 10,000, 100,000, or even 10M columns.
Imagine attempting to open a dataset like this in excel, with over 1M columns. You have to use special software in R or python that can handle these types of data objects in a compressed sparse format.
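A hedged sketch of both options, mapping ordinal values versus tokenizing categorical ones, plus the sparse matrix you end up with for free text; the column names are illustrative:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df = pd.DataFrame({"rating": ["bad", "good", "best", "better"],
                   "gender": ["female", "male", "male", "female"]})

# Map: ordered categories become ordered integers
df["rating_num"] = df["rating"].map({"bad": 0, "good": 1, "better": 2, "best": 3})

# Tokenize: each category gets its own 0/1 column, so nothing is ranked
df = pd.concat([df, pd.get_dummies(df["gender"])], axis=1)

# Free text: one column per unique word can mean 10k to 10M columns,
# so the result is kept as a compressed sparse matrix, not a dense spreadsheet
essays = ["I want to work here", "I have great teamwork"]
X = CountVectorizer().fit_transform(essays)
print(X.shape, type(X))
```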
Can anyone see what the problems are with this approach? There is a major drawback. [sequence loss]
Bag of words! This is called bag of words because you can visualize the words as if they are picked up by a paper bag. All sequence and ordering is lost.
Is that a problem? Maybe.
I analyzed some twitter data for Skullcandy a few years ago. When we presented our results to the engineering team we asked them “If someone says the F word in a tweet and tags your company… is that a bad thing?”.
Think about it, for most of us in the room, with the companies we work for and represent does that give you anxiety thinking about that? The reason that gives us anxiety is because we know that would be a terrible thing and it would be really bad.
Skullcandy knew their customer base well enough; they said they were sure. And sure enough, the data showed that half of the people saying the F word on Twitter and tagging Skullcandy said nice things, and the other half said mean things. So a word that is typically polarizing had no impact.
Bad.... Bad is a bad word
Ass.... Ass is a bad word
But... if I say “bad ass”, my bag of words method is going to see that as a very very bad thing, when in fact it is a very nice thing. How do we fix that?
Introduction to n-gram tuples. Not only can we create unique placeholders for words, but we can also do it for word pairs.
We can do it for single words, two words, three words, as many as we want. But as we do that, the column count explodes exponentially and… the recurrence of each observation goes down... Both of these are bad, so you have rapidly diminishing returns.
Also, if you throw a single adjective or word in between your expected bi-gram, it won't be found.
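A minimal sketch of the n-gram Band-Aid with scikit-learn; ngram_range=(1, 2) keeps the single words and adds word pairs, so "bad ass" gets its own column:

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["these headphones are bad", "these headphones are bad ass"]

unigrams = CountVectorizer(ngram_range=(1, 1))
bigrams = CountVectorizer(ngram_range=(1, 2))   # unigrams plus bigrams

print(sorted(unigrams.fit(reviews).vocabulary_))
print(sorted(bigrams.fit(reviews).vocabulary_))
# The bigram vocabulary includes "bad ass", which plain bag of words cannot see;
# the cost is a column count that explodes as n grows.
```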
One point to bring up is proper model validation. I used to be confused when someone said train on 70, test on 30.
Or train on 80 test on 20. Who was right?
The answer I have settled on now is that neither of them is right on its own.
Explain the conflict.
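A hedged sketch of the k-fold answer to the 70/30 versus 80/20 debate: rather than committing to one split, rotate the held-out fold so every row is tested exactly once; the random data here is purely illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((100, 5))                                            # stand-in features
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + 0.1 * rng.standard_normal(100)

# 5 folds is roughly "train on 80, test on 20", repeated 5 times
scores = cross_val_score(Ridge(), X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores, scores.mean())
```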
Now that we have some basic NLP background we will change gears to RESUME modeling. Who hates looking through stacks of resumes?
Not sure, sometimes it can be fun, but depending on the stack size you might be spending 30 seconds on a resume, 7 seconds? How quickly can you screen a resume?
Think about what you are doing when you screen a resume. Where are your eyes looking?
School
GPA
Skills
Work history
Name? Hopefully you don’t look at name.
To review a possible flow using NLP: first we have an unstructured resume, we are forced to structure it somehow, and then we tokenize or munge the data into numeric features.
Sometimes we can predict things without opening up the resume.
Check out these file extensions. It is hard to see, but statistically someone who uploads a DOC resume is more likely to interview well than someone who uploads RTF.
Likewise DOCX beats DOC
And PDF beats DOCX
What do we do with ALL of these formats? DOCX, txt, pdf?
This is actually a big problem, we can’t do anything cool until we standardize the formats.
Luckily there is a free open source office platform that can do the conversion for us. I recommend converting it to either txt or html.
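The talk doesn't name the tool, but LibreOffice in headless mode is one such free, open-source converter; a hedged batch-conversion sketch (the paths are illustrative):

```python
import subprocess
from pathlib import Path

resume_dir = Path("resumes")       # mixed .doc, .docx, .rtf, .pdf
out_dir = Path("resumes_txt")
out_dir.mkdir(exist_ok=True)

for resume in resume_dir.iterdir():
    # Headless LibreOffice conversion; use "html" instead of "txt" to keep layout hints
    subprocess.run(["libreoffice", "--headless", "--convert-to", "txt",
                    "--outdir", str(out_dir), str(resume)])
```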
Now that we have text we can write specific feature grabbers, like GPA. For the resumes we analyzed, we noticed that GPAs were only included on about 1 in 5 resumes.
Also, this is where the distribution fell; not many people with a GPA below 3.00 report it.
What do you do if someone does not include a GPA? When a feature is missing you MUST replace.
Do you replace the GPA with a 0? That’s harsh, a 2.0? 4.0? Average? It depends
Testing prediction quality, we found that the optimum comes when we replace a missing GPA with 3.6.
What does that mean?
That means if you have less than a 3.6 GPA, as far as the computer is concerned, including it doesn’t help you.
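A hedged sketch of a GPA feature grabber: a simple regex over the converted text, with the missing-value replacement the talk settled on (3.6); the pattern is illustrative and will miss plenty of real-world formats:

```python
import re

GPA_PATTERN = re.compile(r"GPA[:\s]*([0-4]\.\d{1,2})", re.IGNORECASE)

def extract_gpa(resume_text, fill_value=3.6):
    """Return the reported GPA, or fill_value when none is found (roughly 4 in 5 resumes)."""
    match = GPA_PATTERN.search(resume_text)
    return float(match.group(1)) if match else fill_value

print(extract_gpa("B.S. Chemical Engineering, GPA: 3.85"))  # 3.85
print(extract_gpa("B.S. Chemical Engineering"))             # 3.6 (imputed)
```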
There are so many features to create in the case of a resume model, you can save yourself a lot of time using a resume parsing service.
The majority of the value comes from BOW (bag of words).
You quickly hit diminishing returns, where a LOT of additional effort results in marginal gain.
Malicious resume
The biggest value that deep learning offers is automatic feature value discovery. This has been incredibly valuable with image, hitting new high points.
It can also be valuable for text, allowing you to forget the concept of a tuple or n-gram.
In the end the computer always needs a number, but in this case it is looking at very large sequences of numbers (100-300 word windows).
Run it on entire resume:
What is the prediction?
Fun tangent, does resume formatting matter? Margins? Font size, layout?
Would you ever hire from just a resume? Why not?
For interview modeling we use spoken text, which is more difficult because of transcription inaccuracies.
Raw audio (utterance, repetition)
Video, micro-expressions (Lie to Me)