6. • Sequoia Capital
• Largest Video Interviewing Platform
• #10 on Forbes' most promising companies list
• Global: 189 countries
• Nearly half of the Fortune 100 companies
12. BUILDING A MODEL
ESSAY PERFORMANCE
There are really two different options: mapping or tokenizing.
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize (one 0/1 column per category):
        Female  Male
Female    1      0
Male      0      1
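The two options above can be sketched with pandas; the column and category names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"rating": ["Bad", "Good", "Better", "Best"],
                   "gender": ["Female", "Male", "Female", "Male"]})

# Map: ordered categories become a single integer column
rating_map = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}
df["rating_num"] = df["rating"].map(rating_map)

# Tokenize / one-hot: unordered categories become one 0/1 column each
onehot = pd.get_dummies(df["gender"])
print(df["rating_num"].tolist())  # [0, 1, 2, 3]
print(onehot.columns.tolist())    # ['Female', 'Male']
```

Mapping preserves the order Bad < Good < Better < Best; one-hot avoids imposing a false order on Female/Male.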
13. BUILDING A MODEL
ESSAY PERFORMANCE
Tokenize the text into unique word columns (one column per unique word, one row per essay):
I   want  to  work  here  have  great    PERF.
1   1     1   1     1
1   1     1
1   1
14. BUILDING A MODEL
I   want  to  work  here  have  great    PERF.
1   1     1   1     1
1   1     1
1   1
Bag-of-words modeling: sequence and ordering are lost.
16. BUILDING A MODEL
Band-Aid: the concept of n-grams (bigram columns instead of single words):
I want   want to   to go   work here    PERF.
1        1         1       1            1
1
1
19. We need a labeled dataset; sometimes getting the labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS

label  text
neg    @Christian_Rocha i miss u!!!!!
pos    @llanitos there's still some St Werburghs hone...
pos    @Ashley96 it's me
neg    @Phillykidd we use to be like bestfriends
neg    Just got back from Manchester. I went to the T...
pos    @LauraDark thnks x el rt
neg    "Ughh it's so hot & the singing lady is st...
neg    @hnprashanth @dkris I was out to my native for...
pos    Girls night with the bests Wish you were here J!
neg    Just watched @paulkehler rock the crap out of ...
pos    i got the gurl! i got the ride! now im just on...
pos    @ninthspace how is the table building going?
pos    by d way guyz I must log out na see u again to...
neg    @dreday11 its only 20 mins...

Sentiment140 (cs.stanford.edu); labels were derived from the :) and :( emoticons.
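Loading the data can be sketched like this; the filename and column layout match the public Sentiment140 download, but verify yours (a tiny in-memory stand-in keeps the sketch runnable):

```python
import pandas as pd

# Sentiment140 ships as a headerless CSV; in the published dataset the
# polarity column uses 0 = negative, 4 = positive.
cols = ["polarity", "id", "date", "query", "user", "text"]
# df = pd.read_csv("training.1600000.processed.noemoticon.csv",
#                  encoding="latin-1", names=cols)

# tiny stand-in so the sketch runs without the download
df = pd.DataFrame({"polarity": [0, 4, 0],
                   "text": ["i miss u!!!!!", "it's me", "its only 20 mins..."]})
df["label"] = df["polarity"].map({0: "neg", 4: "pos"})
print(df["label"].tolist())  # ['neg', 'pos', 'neg']
```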
20. SENTIMENT DATASET – BIGRAM
Now we can go all the way to model training and prediction.

I want   want to   to go   work here
1        1         1       1
1
1

text_data
[['this is a tweet'],
 ['sounds good'],
 ['not really']]
y
[0, 1, 0]
21. SENTIMENT DATASET – BUILD A MODEL
Now we can go all the way to model training and prediction.

X (bigram count matrix, one row per tweet)
[[4, 0, 0, 0, 0, 7, 0, 0, 1],
 [0, 0, 0, 0, 9, 0, 0, 0, 2]]
y
[0, 1]
PERFORMANCE?
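The train-and-predict step can be sketched end to end with scikit-learn; the texts and labels below are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["this is a great tweet", "sounds good", "not really",
         "really not good", "great stuff"]
y = [1, 1, 0, 0, 1]  # 1 = pos, 0 = neg (illustrative labels)

vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)           # the count matrix X from the slide
clf = LogisticRegression().fit(X, y)   # train on (X, y)

print(clf.predict(vec.transform(["sounds great"])))  # predict on new text
```

On a real dataset, performance would be measured on held-out tweets, not the training set.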
35. Mimicking the human recruiter
Feature Hunt
ONE FEATURE AT A TIME
INCREMENTAL GAINS
36. FOCUS ON FIXING WORD SCARCITY IN YOUR MODELING APPROACH
WORD SCARCITY: A MAJOR PROBLEM (BOW & LSTM)

Run, Running, Runner, Runs  ->  Stemming Algorithm (NLTK)
Joy, Happiness, Smile, Friendly  ->  ?
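The stemming step can be sketched with NLTK's PorterStemmer; it collapses inflections but cannot relate words by meaning:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

for w in ["run", "running", "runner", "runs"]:
    print(w, "->", stemmer.stem(w))
# 'running' and 'runs' collapse to 'run', though 'runner' keeps its own stem

for w in ["joy", "happiness", "smile", "friendly"]:
    print(w, "->", stemmer.stem(w))
# all four stems stay distinct: stemming sees spelling, not meaning
```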
37. FOCUS ON FIXING WORD SCARCITY IN YOUR MODELING APPROACH
WORD SCARCITY: A MAJOR PROBLEM (BOW & LSTM)

Similar-word lists (word, similarity score) relate the words that stemming cannot:

[('recognitions', 0.7759417295455933),
 ('award', 0.736858606338501),
 ('scholarships', 0.7121902704238892),
 ('commendations', 0.7114571332931519),
 ('recipient', 0.6931612491607666),
 ('accolades', 0.6901562213897705),
 ('recognition', 0.6832661628723145),
 ('presidential', 0.6819321513175964),
 ('bronze', 0.6669197678565979),
 ('distinguished', 0.666782021522522)]

[('cfo', 0.8624443411827087),
 ('coo', 0.8411941528320312),
 ('vp', 0.7637340426445007),
 ('vice', 0.7591078281402588),
 ('directors', 0.6882436275482178),
 ('vps', 0.6827613711357117),
 ('president', 0.6824430227279663),
 ('cmo', 0.6671531200408936),
 ('svp', 0.655689001083374),
 ('cto', 0.6270800828933716)]
38. What did they do?
34 interns, 8 left standing
AUC: 0.803
39. ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION

Raw text -> word sequence -> encoding -> LSTM (Keras)

ESSAY encoded one word per row (vocabulary indices 1-5):
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
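The encoding step can be sketched in plain numpy (the index assignment is illustrative): raw text becomes a sequence of word indices, optionally expanded to one-hot rows like the table above, which a Keras Embedding + LSTM stack would then consume.

```python
import numpy as np

essay = "i want to work here"
vocab = {w: i for i, w in enumerate(sorted(set(essay.split())))}

seq = [vocab[w] for w in essay.split()]      # word-index sequence
onehot = np.eye(len(vocab), dtype=int)[seq]  # one one-hot row per word
print(seq)
print(onehot)

# In Keras, the index sequence would feed roughly:
#   Embedding(input_dim=len(vocab), output_dim=64) -> LSTM(64) -> Dense(1)
# so the network learns its own features from the raw word order.
```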