Model Driven Candidate Sorting
Based On Video Interview Cues
Benjamin Taylor
Chief Data Scientist
Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata
Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years Intel/Micron
– Photolithography, process control, yield modeling
• AIQ Hedge fund
– 600 GPU chip cluster, algorithmic stock modeling, distributed metaheuristic algorithms
• HireVue, Chief Data Scientist
– HR analytics, interview modeling
Case Study Objective
• Given 400 recorded video interviews for sales positions and post-hire performance data, can improved sorting efficiency be demonstrated out-of-sample?
Input Data Set: V=400
Target Data Set: n=400
Personal Email           Perf
rich.taylor@gmail.com    Exceeds
wasatch@aol.com          Meets
tradmonkey@mx.com        Below
hsommer@gmail.com        Meets
[Image text: big data, Hadoop]
Big data landscape
• Big data platforms have motivated innovations in unstructured data handling. These innovations include new algorithms and better methods for wrangling unstructured data.
Big data landscape
• Unstructured data
– Data that does not have a predefined data model or schema, e.g. tool logs, resumes, cover letters, images, audio, video, Twitter, LinkedIn
• Structured data
– Data that fits within a predefined data model. The most common structured data formats use a column/row architecture. The most familiar examples are spreadsheets, such as those in Excel.
Problem setup
• Unstructured data challenge
– How do we convert the video into a manageable, machine-ready format? AKA unstructured > structured data.
0.23,0.15,0.98,0.63,0.45,0.36…
1D Vector representation
Method?
UNSTRUCTURED
STRUCTURED
TOKENIZED
Problem Setup
• What is done for text modeling?
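One common answer for text, sketched here as an assumption rather than as the method used in this study, is a bag-of-words representation: each document is tokenized, and the token counts become one row of a structured matrix. The function names and the two sample sentences below are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, rows) with one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Hypothetical transcript snippets, invented for illustration only.
docs = ["I love closing sales", "Sales targets motivate me"]
vocab, X = bag_of_words(docs)
```

Each row of X has the same length as the vocabulary, so documents of any length end up as fixed-width vectors that a model can consume.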
Problem Setup
• Piecemeal the structuring: final outputs are scalars
Diagram labels: Audio, Video, Text; Signal Processing, Personality, Expression Signal Processing
us = unstructured data
ts = time series data
s = scalar data
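The last step of that reduction, from a time series (ts) to scalars (s), can be sketched as follows. This is a minimal illustration under assumptions: the choice of summary statistics (mean, standard deviation, min, max, linear trend) and the name summarize_series are hypothetical, not the features used in the study.

```python
import math


def summarize_series(series):
    """Collapse a time series into a few scalar summary features."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n  # population variance
    # Least-squares slope of the values against their sample index.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom:
        slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / denom
    else:
        slope = 0.0  # a single-point series has no trend
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "min": min(series),
        "max": max(series),
        "slope": slope,
    }


# Hypothetical per-frame signal values, invented for illustration.
features = summarize_series([1.0, 2.0, 3.0, 4.0])
```

Whatever the length of the recording, the output is a fixed set of scalars, which is what allows every interview to occupy one row of a feature matrix.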
Feature Gen
Raw Audio Indicators
• Engagement
• Motivation
• Distress
• Aggression
Model
Personality Models
Feature Gen
Video Indicators
Signal Processing
F989 F990 F991
scalar
Combining All Features
Feature Mapping: As the features are produced they are stored in a matrix X where each column represents a feature and each row represents an interview.
X =
56.341 -200.45 0 1
2 4 60.71 12 52.15 -350.12 1 1
2 4 60.71 12 52.15 -350.12 1 0
2 3 16.16 21 25.51 -105.21 0 0
NA NA NA NA NA
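A minimal sketch of how such a matrix might be assembled, including the NA entries that appear when a feature could not be produced for an interview. The feature names and values are hypothetical, and filling gaps with the column mean is an assumption made for illustration; the slides do not say how missing values were handled.

```python
def build_matrix(feature_dicts, feature_names):
    """One row per interview, one column per feature; None marks NA."""
    return [[d.get(name) for name in feature_names] for d in feature_dicts]


def impute_column_means(matrix):
    """Replace each None with the mean of the values present in its column."""
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in matrix if row[j] is not None]
        # A column with no observed values falls back to 0.0.
        means.append(sum(present) / len(present) if present else 0.0)
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in matrix
    ]


# Hypothetical per-interview features; the second interview lacks one.
interviews = [
    {"audio_energy": 60.0, "smile_rate": 12.0},
    {"audio_energy": 20.0},
]
names = ["audio_energy", "smile_rate"]
X_raw = build_matrix(interviews, names)
X_filled = impute_column_means(X_raw)
```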
How To Build A Model
Model
Best
Fitness?
A Lesson On K-folding
Folds = 9
Cut your data up into fixed folds
A Lesson On K-folding
Folds = 9 Fold = 1 Fold = 2… Y_pred
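The K-folding procedure can be sketched as follows: every interview is assigned to one of 9 fixed folds, each fold is held out in turn, and the predictions for the held-out rows are collected into Y_pred so that every prediction is out-of-sample. The round-robin fold assignment and the base-rate stand-in model are assumptions made to keep the sketch self-contained; they are not the models from the study.

```python
def kfold_indices(n_samples, n_folds):
    """Assign each sample index to a fixed fold in round-robin order."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_samples):
        folds[i % n_folds].append(i)
    return folds


def out_of_sample_predictions(X, y, n_folds, fit, predict):
    """Hold out each fold in turn and predict it with a model fit on the rest."""
    y_pred = [None] * len(y)
    for held_out in kfold_indices(len(y), n_folds):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_pred[i] = predict(model, X[i])
    return y_pred


# Hypothetical stand-in model: it ignores the features and predicts the
# fraction of positive labels seen in training.
def fit_base_rate(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_base_rate(model, x):
    return model


# Toy data, invented for illustration: 18 interviews, alternating labels.
X = [[float(i)] for i in range(18)]
y = [i % 2 for i in range(18)]
y_pred = out_of_sample_predictions(X, y, 9, fit_base_rate, predict_base_rate)
```

In practice the rows would usually be shuffled before the folds are fixed, so that any ordering in the data does not leak into the fold structure.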
Fitness Metric?
Top Performer Accuracy AUC
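AUC can be read as the probability that a randomly chosen top performer receives a higher score than a randomly chosen non-top performer, with ties counted as half. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative assumptions.

```python
def auc_score(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels (1 = top performer) and model scores, invented for illustration.
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A score of 0.5 corresponds to random sorting and 1.0 to a perfect ranking. The pairwise loop is quadratic in the number of interviews, which is not a concern at n=400.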
Results:
Conclusion: Using structured features from audio and video, we are able to show predictive sorting value in our out-of-sample interviews.
Model           AUC score
Bernoulli NB    0.75
Other           0.79
67.50% reduction in interview evaluation
>300% increase in concentration
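One possible reading of these two figures, stated here as an assumption because the slides do not define them, is that reviewers evaluate only the top-ranked share of interviews, and that "concentration" is the fraction of top performers in that reviewed share compared with the fraction in the full pool. The sketch below shows how such numbers could be computed; the data is invented and does not reproduce the study's results.

```python
def review_reduction(n_total, n_reviewed):
    """Fraction of interviews that no longer need to be evaluated."""
    return 1.0 - n_reviewed / n_total


def concentration_increase(y_true, y_score, n_reviewed):
    """Relative increase in the top-performer rate among the top-ranked rows."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:n_reviewed]]
    base_rate = sum(y_true) / len(y_true)
    top_rate = sum(top_labels) / len(top_labels)
    return top_rate / base_rate - 1.0


# Hypothetical pool of 10 interviews with 2 top performers (label 1).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.1, 0.3, 0.8, 0.4, 0.25, 0.15, 0.05, 0.35]
reduction = review_reduction(len(labels), 2)        # review only the top 2
increase = concentration_increase(labels, scores, 2)
```

In this toy pool, reviewing only the top 2 of 10 interviews is an 80% reduction, and both reviewed interviews are top performers against a 20% base rate, which is a 400% increase in concentration.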
Feature Engineering
Auto Feature Engineering
Future Work:
Future work involves offloading the feature engineering tasks to a more automated process, such as deep learning or more advanced ensemble modeling methods.
My Contact Info:
Twitter: @bentaylordata
Email: btaylor@hirevue.com
LinkedIn: bentaylordata

#SIOP15 Presentation on


Editor's Notes

  1. Hadoop story: Why is it called Hadoop? Google paper?
  2. Hadoop story: Why is it called Hadoop? Google paper?
  3. Hadoop story: Why is it called Hadoop? Google paper?
  4. <expand… categorical > tokenizing [assume dependent or independent] Discuss <> Gender: [Name modification >> ]
  5. <expand… categorical > tokenizing [assume dependent or independent] Discuss <> Gender: [Name modification >> ]