6. • Sequoia Capital
• Largest Video Interviewing Platform
• #10 on Forbes' most promising companies list
• Global: 189 countries
• Nearly half of the Fortune 100 companies
12. BUILDING A MODEL
ESSAY PERFORMANCE
There are really two different options: mapping or tokenizing.
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize (one 0/1 column per category):
        Female  Male
Female    1      0
Male      0      1
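The two options above can be sketched with pandas; the column and category names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"rating": ["Bad", "Good", "Better", "Best"],
                   "gender": ["Female", "Male", "Female", "Male"]})

# Map: ordered categories become a single integer column
rating_map = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}
df["rating_num"] = df["rating"].map(rating_map)

# Tokenize / one-hot: unordered categories become one 0/1 column each
onehot = pd.get_dummies(df["gender"])
print(df["rating_num"].tolist())  # [0, 1, 2, 3]
print(onehot.columns.tolist())    # ['Female', 'Male']
```

Mapping preserves the order Bad < Good < Better < Best; one-hot avoids imposing a false order on Female/Male.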
13. BUILDING A MODEL
ESSAY PERFORMANCE
Tokenize the text into unique word columns (one column per unique word, one row per essay):
I   want  to  work  here  have  great    PERF.
1   1     1   1     1
1   1     1
1   1
14. BUILDING A MODEL
I   want  to  work  here  have  great    PERF.
1   1     1   1     1
1   1     1
1   1
Bag-of-words modeling: sequence and ordering are lost.
16. BUILDING A MODEL
Band-Aid: the concept of n-grams (bigram columns instead of single words):
I want   want to   to go   work here    PERF.
1        1         1       1            1
1
1
19. We need a labeled dataset; sometimes getting the labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS

label  text
neg    @Christian_Rocha i miss u!!!!!
pos    @llanitos there's still some St Werburghs hone...
pos    @Ashley96 it's me
neg    @Phillykidd we use to be like bestfriends
neg    Just got back from Manchester. I went to the T...
pos    @LauraDark thnks x el rt
neg    "Ughh it's so hot & the singing lady is st...
neg    @hnprashanth @dkris I was out to my native for...
pos    Girls night with the bests Wish you were here J!
neg    Just watched @paulkehler rock the crap out of ...
pos    i got the gurl! i got the ride! now im just on...
pos    @ninthspace how is the table building going?
pos    by d way guyz I must log out na see u again to...
neg    @dreday11 its only 20 mins...

Sentiment140 (cs.stanford.edu); labels were derived from the :) and :( emoticons.
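Loading the data can be sketched like this; the filename and column layout match the public Sentiment140 download, but verify yours (a tiny in-memory stand-in keeps the sketch runnable):

```python
import pandas as pd

# Sentiment140 ships as a headerless CSV; in the published dataset the
# polarity column uses 0 = negative, 4 = positive.
cols = ["polarity", "id", "date", "query", "user", "text"]
# df = pd.read_csv("training.1600000.processed.noemoticon.csv",
#                  encoding="latin-1", names=cols)

# tiny stand-in so the sketch runs without the download
df = pd.DataFrame({"polarity": [0, 4, 0],
                   "text": ["i miss u!!!!!", "it's me", "its only 20 mins..."]})
df["label"] = df["polarity"].map({0: "neg", 4: "pos"})
print(df["label"].tolist())  # ['neg', 'pos', 'neg']
```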
20. SENTIMENT DATASET – BIGRAM
Now we can go all the way to model training and prediction.

I want   want to   to go   work here
1        1         1       1
1
1

text_data
[['this is a tweet'],
 ['sounds good'],
 ['not really']]
y
[0, 1, 0]
21. SENTIMENT DATASET – BUILD A MODEL
Now we can go all the way to model training and prediction.

X (bigram count matrix, one row per tweet)
[[4, 0, 0, 0, 0, 7, 0, 0, 1],
 [0, 0, 0, 0, 9, 0, 0, 0, 2]]
y
[0, 1]
PERFORMANCE?
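The train-and-predict step can be sketched end to end with scikit-learn; the texts and labels below are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["this is a great tweet", "sounds good", "not really",
         "really not good", "great stuff"]
y = [1, 1, 0, 0, 1]  # 1 = pos, 0 = neg (illustrative labels)

vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)           # the count matrix X from the slide
clf = LogisticRegression().fit(X, y)   # train on (X, y)

print(clf.predict(vec.transform(["sounds great"])))  # predict on new text
```

On a real dataset, performance would be measured on held-out tweets, not the training set.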
35. Mimicking the human recruiter
Feature Hunt
ONE FEATURE AT A TIME
INCREMENTAL GAINS
36. FOCUS ON FIXING WORD SCARCITY IN YOUR MODELING APPROACH
WORD SCARCITY: A MAJOR PROBLEM (BOW & LSTM)

Run, Running, Runner, Runs  ->  Stemming Algorithm (NLTK)
Joy, Happiness, Smile, Friendly  ->  ?
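The stemming step can be sketched with NLTK's PorterStemmer; it collapses inflections but cannot relate words by meaning:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

for w in ["run", "running", "runner", "runs"]:
    print(w, "->", stemmer.stem(w))
# 'running' and 'runs' collapse to 'run', though 'runner' keeps its own stem

for w in ["joy", "happiness", "smile", "friendly"]:
    print(w, "->", stemmer.stem(w))
# all four stems stay distinct: stemming sees spelling, not meaning
```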
37. FOCUS ON FIXING WORD SCARCITY IN YOUR MODELING APPROACH
WORD SCARCITY: A MAJOR PROBLEM (BOW & LSTM)

Similar-word lists (word, similarity score) relate the words that stemming cannot:

[('recognitions', 0.7759417295455933),
 ('award', 0.736858606338501),
 ('scholarships', 0.7121902704238892),
 ('commendations', 0.7114571332931519),
 ('recipient', 0.6931612491607666),
 ('accolades', 0.6901562213897705),
 ('recognition', 0.6832661628723145),
 ('presidential', 0.6819321513175964),
 ('bronze', 0.6669197678565979),
 ('distinguished', 0.666782021522522)]

[('cfo', 0.8624443411827087),
 ('coo', 0.8411941528320312),
 ('vp', 0.7637340426445007),
 ('vice', 0.7591078281402588),
 ('directors', 0.6882436275482178),
 ('vps', 0.6827613711357117),
 ('president', 0.6824430227279663),
 ('cmo', 0.6671531200408936),
 ('svp', 0.655689001083374),
 ('cto', 0.6270800828933716)]
38. What did they do?
34 interns, 8 left standing
AUC: 0.803
39. ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION

Raw text -> word sequence -> encoding -> LSTM (Keras)

ESSAY encoded one word per row (vocabulary indices 1-5):
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
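The encoding step can be sketched in plain numpy (the index assignment is illustrative): raw text becomes a sequence of word indices, optionally expanded to one-hot rows like the table above, which a Keras Embedding + LSTM stack would then consume.

```python
import numpy as np

essay = "i want to work here"
vocab = {w: i for i, w in enumerate(sorted(set(essay.split())))}

seq = [vocab[w] for w in essay.split()]      # word-index sequence
onehot = np.eye(len(vocab), dtype=int)[seq]  # one one-hot row per word
print(seq)
print(onehot)

# In Keras, the index sequence would feed roughly:
#   Embedding(input_dim=len(vocab), output_dim=64) -> LSTM(64) -> Dense(1)
# so the network learns its own features from the raw word order.
```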