Getting the most out of NLP for cover letter ranking, resume screening, interview conversations and more.
Ben Taylor
Chief Data Scientist
PERSONAL INTRODUCTION
A Career Less Traveled
• Sequoia Capital
• Largest Video Interviewing Platform
• Forbes #10 most promising companies
• Global: 189 countries
• Nearly half of the Fortune 100 companies
Audio Video Text
Feature Creation
Model Selection
Feature Reduction
NATURAL LANGUAGE PROCESSING (NLP)
GRIT MOTIVATION ENGAGEMENT PERFORMANCE
Basic Tutorial On How To Build A Numeric Feature Model
BUILDING A MODEL
ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE
Now what?!?
BUILDING A MODEL
ESSAY PERFORMANCE
There are really two different options: mapping or tokenizing
BUILDING A MODEL
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize:
Female = 1
Male = 1
Female Male
1 0
0 1
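The two options above can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the helper names `map_ordinal` and `one_hot` are hypothetical, and the category values are taken from the slide.

```python
# Ordinal mapping: categories with a natural order become integers.
RATING_MAP = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}


def map_ordinal(values, mapping=RATING_MAP):
    """Replace each ordered category with its integer rank."""
    return [mapping[v] for v in values]


def one_hot(values):
    """Give every unordered category its own 0/1 column."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


ratings = map_ordinal(["Good", "Best", "Bad"])
columns, rows = one_hot(["Female", "Male"])
print(ratings)  # [1, 3, 0]
print(columns)  # ['Female', 'Male']
print(rows)     # [[1, 0], [0, 1]]
```

Mapping suits values that have an order (Bad < Good < Better < Best); tokenizing into separate columns avoids implying an order where none exists.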
[Table: one column per unique word (I, want, to, work, here, have, great) plus the PERF. column; each essay row has a 1 under every word it contains]
Tokenize the text into unique word columns
BUILDING A MODEL
ESSAY PERFORMANCE
Bag-of-words modeling: word sequence and ordering are lost
BUILDING A MODEL
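A small sketch of the bag-of-words idea, assuming simple lowercase whitespace tokenization (the slides do not say how the text was split). The function names and example essays are hypothetical. It shows that two essays with the same words in a different order get the same row.

```python
def build_vocabulary(texts):
    """Collect the unique lowercase words across all texts, sorted."""
    return sorted({w for t in texts for w in t.lower().split()})


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocab]


essays = [
    "I want to work here",
    "here I want to work",   # same words, different order
    "I have great work",
]
vocab = build_vocabulary(essays)
X = [bag_of_words(e, vocab) for e in essays]

print(vocab)         # ['great', 'have', 'here', 'i', 'to', 'want', 'work']
print(X[0] == X[1])  # True: the ordering information is gone
```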
[Table: bigram columns "I want", "want to", "to go", "work here", plus the PERF. column]
Band-Aid: Concept of n-grams
BUILDING A MODEL
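As a rough sketch of the n-gram band-aid, the hypothetical helper below joins each run of n adjacent words into one token, so some local word order survives. Whitespace tokenization is an assumption.

```python
def ngrams(text, n=2):
    """Return the list of n-word tokens (bigrams by default) in reading order."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


first = ngrams("I want to work here")
second = ngrams("here I want to work")

print(first)   # ['i want', 'want to', 'to work', 'work here']
print(second)  # ['here i', 'i want', 'want to', 'to work']
```

Unlike plain word counts, the two reordered sentences now produce different token sets, which is why bigram columns can recover part of the lost sequence information.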
SENTIMENT EXAMPLE
(binary)
We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS
label text
neg @Christian_Rocha i miss u!!!!!
pos @llanitos there's still some St Werburghs hone...
pos @Ashley96 it's me
neg @Phillykidd we use to be like bestfriends
neg Just got back from Manchester. I went to the T...
pos @LauraDark thnks x el rt
neg "Ughh it's so hot & the singing lady is st...
neg @hnprashanth @dkris I was out to my native for...
pos Girls night with the bests Wish you were here J!
neg Just watched @paulkehler rock the crap out of ...
pos i got the gurl! i got the ride! now im just on...
pos @ninthspace how is the table building going?
pos by d way guyz I must log out na see u again to...
neg @dreday11 its only 20 mins...
Sentiment140
cs.stanford.edu
:( :)
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BIGRAM
[Table: bigram columns "I want", "want to", "to go", "work here"]
text_data
[['this is a tweet'],
 ['sounds good'],
 ['not really']]
y
[0,1,0,1,1]
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BUILD A MODEL
y
[0,1,0,1,1]
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
PERFORMANCE?
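The slides do not name the classifier that was trained, so the following is only one possible sketch: a tiny multinomial Naive Bayes with add-one smoothing, written in plain Python. The training texts, labels (0 = neg, 1 = pos) and function names are invented for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(texts, labels):
    """Count words per class on whitespace-tokenized, lowercased texts."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class (0 or 1) with the higher log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


texts = ["i miss you", "so sad today", "great night out", "love this great ride"]
labels = [0, 0, 1, 1]
model = train_naive_bayes(texts, labels)

print(predict(model, "great ride"))  # 1
print(predict(model, "so sad"))      # 0
```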
PROPER MODEL VALIDATION
Best score yet
SENTIMENT DATASET – VALIDATION
60% 40%
Best score yet
SENTIMENT DATASET – VALIDATION
80% 20%
SENTIMENT DATASET – VALIDATION
10 folds
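A minimal sketch of how 10-fold validation splits the data, assuming the rows have already been shuffled. The generator name `k_fold_indices` and the sample count are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(25, k=10))
for train_idx, test_idx in folds[:2]:
    print(len(train_idx), test_idx)
```

Compared with a single 60/40 or 80/20 split, each score is averaged over ten held-out slices, so one lucky split is less likely to produce a misleading "best score yet".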
Better Than Anyone Else
(benchmarks)
Time, your most valuable resource
Time (effort)
Accuracy
ACC: 77.65%
AUC: 85.49%
7min
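For reference, the two metrics reported above can be computed as sketched below. The labels reuse the y vector from the earlier slide; the scores are made up, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration on small inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 1, 0, 1, 1]
scores = [0.2, 0.9, 0.6, 0.4, 0.7]              # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

acc_value = accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
print(acc_value)  # 0.6
print(auc_value)
```

Accuracy depends on the chosen threshold, while AUC only depends on how well the scores rank positives above negatives, which is why the two numbers can differ as they do on the slide.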
BIGRAM BOOST
acc: 0.8015
r: 0.2061
AUROC: 0.8738
acc: 0.7809
r: 0.1238
AUROC: 0.8554
BETTER MODELS
acc: 0.8208
r: 0.2832
AUROC: 0.8939
acc: 0.8015
r: 0.2061
AUROC: 0.8739
Was:
Now: (+10x average)
RESUME MODELING
(binary)
Upload Your Resume
Now painstakingly fill out this form containing all of the exact same information
GPA Inclusion (18%)
GPA Replacement
Mimicking the human recruiter
Feature Hunt
ONE FEATURE AT A TIME
INCREMENTAL GAINS
FOCUS ON FIXING WORD SCARCITY IN YOUR MODELING APPROACH
WORD SCARCITY A MAJOR PROBLEM (BOW & LSTM)
Run
Running
Runner
Runs
Joy
Happiness
Smile
Friendly
Stemming Algorithm (NLTK)
?
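The slide points to NLTK's stemming algorithm. The sketch below is not NLTK and not the Porter algorithm; it is a toy suffix stripper with made-up rules, meant only to show what stemming can and cannot merge.

```python
SUFFIXES = ("ing", "er", "s")  # toy rule set, far simpler than a real stemmer


def toy_stem(word):
    """Strip one known suffix, then undo consonant doubling ("runn" -> "run")."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and not word.endswith("ss") and len(stem) >= 3:
            if len(stem) > 3 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


run_family = [toy_stem(w) for w in ["Run", "Running", "Runner", "Runs"]]
joy_family = [toy_stem(w) for w in ["Joy", "Happiness", "Smile", "Friendly"]]

print(run_family)  # ['run', 'run', 'run', 'run']
print(joy_family)  # ['joy', 'happiness', 'smile', 'friendly']
```

The first group collapses to one token, which eases word scarcity. The second group shares meaning but not spelling, so no suffix rule can merge it; that is the gap the "?" on the slide marks.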
[('recognitions', 0.7759417295455933),
('award', 0.736858606338501),
('scholarships', 0.7121902704238892),
('commendations', 0.7114571332931519),
('recipient', 0.6931612491607666),
('accolades', 0.6901562213897705),
('recognition', 0.6832661628723145),
('presidential', 0.6819321513175964),
('bronze', 0.6669197678565979),
('distinguished', 0.666782021522522)]
[('cfo', 0.8624443411827087),
('coo', 0.8411941528320312),
('vp', 0.7637340426445007),
('vice', 0.7591078281402588),
('directors', 0.6882436275482178),
('vps', 0.6827613711357117),
('president', 0.6824430227279663),
('cmo', 0.6671531200408936),
('svp', 0.655689001083374),
('cto', 0.6270800828933716)]
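Lists of (word, score) pairs like the two above are what a nearest-neighbour lookup over word vectors typically returns, with cosine similarity as the score. The slides name neither the tool nor the query words, so the sketch below uses invented 3-dimensional vectors and a hypothetical query, "ceo".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to the query word."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical vectors, invented for illustration only.
vectors = {
    "ceo": [0.9, 0.1, 0.0],
    "cfo": [0.8, 0.2, 0.1],
    "vp": [0.7, 0.4, 0.1],
    "award": [0.1, 0.9, 0.2],
    "bronze": [0.0, 0.7, 0.6],
}
neighbours = most_similar("ceo", vectors)
for word, score in neighbours:
    print(word, round(score, 3))
```

Grouping words by vector similarity rather than by spelling is one way to address the scarcity that stemming cannot: related titles or related honors can share signal even when they share no letters.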
What did they do?
34 interns, 8 left standing
AUC: 0.803
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USE DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
ESSAY ESSAY
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
LSTM
RAW TEXT WORD SEQUENCE
ENCODING
Keras
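The word-sequence encoding fed to the LSTM can be sketched without Keras. In this hypothetical example each word becomes a one-hot row and the rows stay in reading order, which is exactly the information a bag-of-words row discards. The helper names and the essay are assumptions.

```python
def build_index(words):
    """Assign each distinct word a column, in order of first appearance."""
    index = {}
    for w in words:
        index.setdefault(w, len(index))
    return index


def one_hot_sequence(text, index):
    """Encode a text as a list of one-hot rows, one row per word, in order."""
    rows = []
    for w in text.lower().split():
        row = [0] * len(index)
        row[index[w]] = 1
        rows.append(row)
    return rows


essay = "i want to work here"
index = build_index(essay.split())
sequence = one_hot_sequence(essay, index)
reordered = one_hot_sequence("here i want to work", index)

for row in sequence:
    print(row)
print(sequence != reordered)  # True: word order changes the encoding
```

A sequence model such as an LSTM consumes these rows one step at a time, so it can learn features from raw text instead of relying on hand-built ones.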
QUESTIONS
