Talent Search and Recommendation
Systems at LinkedIn
Practical Challenges and Lessons Learned
Qi Guo, Sahin Cem Geyik, Bo Hu, Cagri Ozcaglar,
Ketan Thakkar, Xianren Wu, Krishnaram Kenthapadi
AI @ LinkedIn
SIGIR 2018
The Team
Qi Guo Sahin Cem Geyik Bo Hu Cagri Ozcaglar
Ketan Thakkar Xianren Wu Krishnaram Kenthapadi
Contents
• Introduction
• Ranking Models for Talent Search
• Personalization
• Talent Search Architecture
• Summary
Introduction
LinkedIn Talent Solution:
~65% of LinkedIn’s Annual Revenue
A HIRING ECOSYSTEM
LinkedIn Recruiter
MAJOR PRODUCT
A Talent Search and
Recommendation System
Recruiter Search
• Criteria-Based Search
• A recruiter has specific requisitions to fill
• Candidate Recommendation System
• A recruiter may want many qualified candidates and pages through the results
• Considers Both Sides of the Talent Marketplace
• Talent is a limited resource
OPTIMIZATION OBJECTIVE: # of InMail Accepts
1. Search (Recruiter)
2. Send InMail (Recruiter to Candidate)
3. Accept (Candidate)
Ranking Models for Talent Search
Number of InMail Accepts Per Seat: 30% YoY
OVERALL IMPROVEMENT
Go Non-Linear with Tree Model
• Before: Linear Model optimized for NDCG with Coordinate Ascent
• After: XGBoost Tree Model
• Captures feature interactions
• XGBoost: gradient boosting tree models for richer model complexity
• Online Results:
METRIC    PRECISION@5   PRECISION@25   OVERALL ACCEPT
Lift      +7.5%         +7.4%          +5.1%
P-Value   2.1e-4        4.8e-4         0.01
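The result tables in this deck report Precision@k. The deck does not spell out the definition, so the following is a minimal sketch that assumes Precision@k is the fraction of the top-k ranked candidates whose InMail was accepted. The function name and the sample labels are hypothetical.

```python
from typing import Sequence


def precision_at_k(accepted: Sequence[int], k: int) -> float:
    """Fraction of the top-k results with a positive label.

    `accepted` holds 0/1 labels in ranked order (index 0 is the top result).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = accepted[:k]
    # Divide by k, so a result list shorter than k is penalized
    # (one common convention, assumed here).
    return sum(top_k) / k


# Hypothetical ranked list: 1 = InMail accepted, 0 = not accepted.
ranked_labels = [1, 0, 1, 1, 0, 0, 1, 0]
p_at_5 = precision_at_k(ranked_labels, 5)  # 3 accepts in the top 5
```

A lift such as +7.5% would then be the relative change of this value between the treatment and control rankers.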
Search for “Dentist”, a Software Engineer ranks high
PROBLEM OBSERVED
• The model focused too much on promoting active job-seeking candidates
• We want our ranking to be more context-aware
f(Recruiter Context, Query Context, Candidate) => Accept? Reject?
Context-Aware Ranking – Pairwise Training
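• Pair up two candidates from the same search request:
f(Recruiter Context, Query Context, Candidate 1) - f(Recruiter Context, Query Context, Candidate 2) => Accept 1 > Accept 2?
• Recruiter Context and Query Context are shared by both candidates in a pair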
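The pairing step can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: the impression rows, the `build_pairs` helper, and the choice of a pairwise logistic loss are all hypothetical, since the deck does not name the loss it uses.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical impressions: (search_request_id, candidate_id, model_score, accepted)
impressions = [
    ("req1", "a", 2.0, 1),
    ("req1", "b", 0.5, 0),
    ("req1", "c", 1.0, 0),
    ("req2", "d", 0.2, 1),
    ("req2", "e", 0.9, 0),
]


def build_pairs(rows):
    """Pair an accepted with a non-accepted candidate from the same request."""
    by_request = defaultdict(list)
    for request_id, candidate_id, score, accepted in rows:
        by_request[request_id].append((candidate_id, score, accepted))
    pairs = []
    for request_id, group in by_request.items():
        for left, right in combinations(group, 2):
            if left[2] == right[2]:
                continue  # same label: no preference to learn from this pair
            winner, loser = (left, right) if left[2] > right[2] else (right, left)
            pairs.append((request_id, winner, loser))
    return pairs


def pairwise_logistic_loss(winner_score, loser_score):
    """log(1 + exp(-(s_winner - s_loser))): small when the winner scores higher."""
    return math.log1p(math.exp(-(winner_score - loser_score)))


pairs = build_pairs(impressions)
mean_loss = sum(pairwise_logistic_loss(w[1], l[1]) for _, w, l in pairs) / len(pairs)
```

Because both candidates in a pair come from the same search request, the recruiter and query context are identical on both sides, so the loss only rewards separating the two candidates.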
Context-Aware Ranking
• Before: Pointwise XGBoost
• After: Pairwise XGBoost with Context-Aware Features
• Recruiter Context: Personalization features
• Query Context: Query-Candidate matching features
• Online Results:
METRIC    PRECISION@5   PRECISION@25   OVERALL ACCEPT
Lift      +18.2%        +13.7%         +8%
P-Value   1e-16         1.1e-11        9.6e-4
Search for “Machine Learning Engineer”,
desirable to include some Data Scientists
PROBLEM OBSERVED
Representation Learning
• Fuzzy semantic match on title IDs, skill IDs, company IDs, etc.
• Unsupervised Graph Embedding
• Co-Occurrence Graph based on profile data
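To make the co-occurrence idea concrete, here is a small stand-in sketch. It is not the graph embedding method used in the talk (the deck does not specify one). It only builds a title co-occurrence graph from hypothetical profiles and compares titles by the cosine similarity of their neighbor weights. All data and helper names are assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical profiles: the set of titles each member has held.
profiles = [
    {"machine learning engineer", "data scientist"},
    {"machine learning engineer", "data scientist", "software engineer"},
    {"data scientist", "statistician"},
    {"dentist", "dental hygienist"},
]


def build_cooccurrence_graph(rows):
    """Edge weight = number of profiles on which two titles co-occur."""
    graph = defaultdict(lambda: defaultdict(int))
    for titles in rows:
        for a, b in combinations(sorted(titles), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def title_vector(graph, title):
    """Neighbor weights plus a self-weight, so adjacent titles overlap."""
    vec = dict(graph.get(title, {}))
    vec[title] = max(vec.values(), default=1)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


graph = build_cooccurrence_graph(profiles)
mle = title_vector(graph, "machine learning engineer")
sim_mle_ds = cosine(mle, title_vector(graph, "data scientist"))
sim_mle_dentist = cosine(mle, title_vector(graph, "dentist"))
```

In this toy data "machine learning engineer" comes out close to "data scientist" and unrelated to "dentist". A learned embedding would replace the raw neighbor-weight vectors with dense ones.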
Representation Learning
• Before: XGBoost
• After: XGBoost with Title Similarity Feature
• Based on unsupervised graph embedding
• Online Results:
METRIC    PRECISION@5   PRECISION@25   OVERALL ACCEPT
Lift      +2%           +1.8%          +3%
P-Value   0.2           0.25           0.11
Deep Learning?
• Differentiable Programming with TensorFlow
• Flexible for model engineering
• Offline result does not justify the effort yet.
• Offline Results (Pairwise NN vs. Pointwise XGBoost):
METRIC    PRECISION@1   PRECISION@5   PRECISION@25
Lift      +5.3%         +2.8%         +1.7%
Personalization for
Talent Search
Entity-Level Personalization with GLMix
• GLMix: Generalized Linear Mixed Models
• GLMix: global model + per-entity models
• We added per-recruiter model and per-contract/company model
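The "global model + per-entity models" structure can be sketched as a sum of linear terms passed through a sigmoid. This is a minimal illustration, assuming a logistic link and sparse dict-based weights. The feature names, entity IDs, and weight values are all hypothetical.

```python
import math


def dot(weights, features):
    """Sparse dot product over feature names."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def glmix_score(features, recruiter_id, contract_id,
                global_w, per_recruiter_w, per_contract_w):
    """Global logit plus per-recruiter and per-contract offsets, then a sigmoid.

    Entities without a dedicated model contribute nothing, so the score
    falls back to the global model.
    """
    logit = dot(global_w, features)
    logit += dot(per_recruiter_w.get(recruiter_id, {}), features)
    logit += dot(per_contract_w.get(contract_id, {}), features)
    return 1.0 / (1.0 + math.exp(-logit))


# All weights and feature names below are hypothetical.
global_w = {"title_match": 1.2, "skill_overlap": 0.8}
per_recruiter_w = {"recruiter_42": {"skill_overlap": 0.5}}
per_contract_w = {"contract_7": {"title_match": -0.2}}

candidate = {"title_match": 1.0, "skill_overlap": 0.6}
personalized = glmix_score(candidate, "recruiter_42", "contract_7",
                           global_w, per_recruiter_w, per_contract_w)
cold_start = glmix_score(candidate, "new_recruiter", "new_contract",
                         global_w, per_recruiter_w, per_contract_w)
```

The same candidate receives different scores for different recruiters or contracts, while a recruiter with no history still gets the global prediction.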
Entity-Level Personalization with GLMix
• Model Ensemble
• Nonlinearity via tree interaction features
• Each leaf node is a feature
• Offline Results (GLMix vs. Pairwise XGBoost):
METRIC    PRECISION@1   PRECISION@5   PRECISION@25
Lift      +8.5%         +4.7%         +2.0%
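"Each leaf node is a feature" can be illustrated with two tiny hand-written trees: a candidate is routed to one leaf per tree, and the reached leaves are one-hot encoded as inputs for a linear model. The trees, thresholds, and feature names are hypothetical. In practice the trees would come from a trained boosted model.

```python
# Each hypothetical tree is a nested dict: internal nodes test
# `feature < threshold`; leaves carry a leaf index.
tree_a = {
    "feature": "skill_overlap", "threshold": 0.5,
    "left": {"leaf": 0},
    "right": {
        "feature": "title_match", "threshold": 0.5,
        "left": {"leaf": 1},
        "right": {"leaf": 2},
    },
}
tree_b = {
    "feature": "years_experience", "threshold": 5.0,
    "left": {"leaf": 0},
    "right": {"leaf": 1},
}


def leaf_index(tree, features):
    """Walk from the root to a leaf and return that leaf's index."""
    node = tree
    while "leaf" not in node:
        branch = "left" if features[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def count_leaves(tree):
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


def leaf_features(trees, features):
    """Concatenate one one-hot block per tree; the set bit marks the reached leaf."""
    encoded = []
    for tree in trees:
        block = [0] * count_leaves(tree)
        block[leaf_index(tree, features)] = 1
        encoded.extend(block)
    return encoded


example = {"skill_overlap": 0.8, "title_match": 1.0, "years_experience": 3.0}
encoded = leaf_features([tree_a, tree_b], example)  # [0, 0, 1, 1, 0]
```

Each bit stands for a conjunction of threshold tests, which is how a linear model on top of these features picks up nonlinear interactions.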
Using Recruiter Search requires a lot of skill.
PROBLEM OBSERVED
A Stream of
Recommended Candidates
Recommended Matches
SIMPLIFIED EXPERIENCE
In-Session Personalization
• Step 1: Segment the Space
• Query Intent Clustering
• Step 2: Evaluate each segment
• Multi-Armed Bandits
• Step 3: Modify each segment
• Term Weight Updates
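Step 2 can be sketched with a standard bandit policy. The deck says only "Multi-Armed Bandits", so the use of UCB1 here is an assumption, as are the segment names, the simulated acceptance rates, and the 0/1 reward from recruiter feedback.

```python
import math
import random


class UCB1:
    """UCB1 policy: pick the arm with the highest mean plus exploration bonus."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}
        self.totals = {arm: 0.0 for arm in arms}

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, count in self.counts.items():
            if count == 0:
                return arm
        total_plays = sum(self.counts.values())

        def upper_bound(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total_plays) / self.counts[arm])
            return mean + bonus

        return max(self.counts, key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical segments (query-intent clusters) and the chance that the
# recruiter reacts positively to a candidate drawn from each one.
true_rates = {"ml_engineer": 0.7, "data_scientist": 0.4, "data_analyst": 0.1}
rng = random.Random(0)
bandit = UCB1(true_rates)
for _ in range(500):
    segment = bandit.select()
    reward = 1.0 if rng.random() < true_rates[segment] else 0.0
    bandit.update(segment, reward)
```

Over a session the policy tends to draw more candidates from segments that earn positive feedback while still occasionally probing the others. Step 3 (term weight updates within a segment) is not covered by this sketch.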
In-Session Personalization: Results
Talent Search Architecture
Search and Retrieval Architecture
• LinkedIn’s Galene is built on top of Lucene.
• Three main components:
• Searcher: hosts the search index
• Broker: fans queries out to the searchers
• Live-updater: applies live updates to the index
• Query language is similar to Lucene with OR, AND, NOT.
• The search index contains two types of fields:
• Inverted Fields
• Forward Fields
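The two field types and the boolean query language can be illustrated with plain Python sets. This is a toy sketch of the general idea (inverted fields map terms to member IDs for retrieval, forward fields map member IDs to stored values). It is not Galene's API, and the member documents and term syntax are hypothetical.

```python
# Hypothetical member documents.
members = {
    1: {"title": "software engineer", "skills": ["java", "python"], "region": "us"},
    2: {"title": "data scientist", "skills": ["python", "statistics"], "region": "uk"},
    3: {"title": "software engineer", "skills": ["go"], "region": "uk"},
}

# Inverted fields: term -> set of member ids, used for retrieval.
inverted = {}
for member_id, doc in members.items():
    terms = [f"title:{doc['title']}"] + [f"skill:{s}" for s in doc["skills"]]
    for term in terms:
        inverted.setdefault(term, set()).add(member_id)

# Forward fields: member id -> stored values, used when scoring or faceting.
forward = {member_id: {"region": doc["region"]} for member_id, doc in members.items()}


def postings(term):
    """Member ids that contain the term (empty set if the term is unknown)."""
    return inverted.get(term, set())


# title:"software engineer" AND (skill:python OR skill:go) AND NOT skill:java
matched = (
    postings("title:software engineer")
    & (postings("skill:python") | postings("skill:go"))
) - postings("skill:java")

# Forward-field lookup for the retrieved members.
regions = {member_id: forward[member_id]["region"] for member_id in matched}
```

AND, OR, and NOT map to set intersection, union, and difference over posting lists. The forward lookup happens only for members that survive retrieval.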
Search and Retrieval Architecture
• Static Rank
• An auxiliary rank for members to help with retrieving at scale
• Based on member profile and activity
• Early termination
• Index is partitioned into N shards; each shard retrieves and scores candidates
• Not all members in a shard can be retrieved, so the query is terminated early based on static rank.
• Galene Facet Counting:
• Galene supports facet counting (such as region, titles, etc.) for any given query.
• Uses a statistical counting approximation based on a sample from each shard
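Early termination on static rank can be sketched as follows: each shard visits its members in descending static-rank order and stops once it has collected enough matches. The shard contents, the `limit` parameter, and the match predicate are hypothetical. A real index would store members pre-sorted rather than sorting per query.

```python
def retrieve_from_shard(shard_docs, matches, limit):
    """Scan one shard in descending static-rank order, stopping after `limit` hits.

    Returns the retrieved ids and the number of documents scanned.
    """
    # Sorting here is for illustration only; an index would keep this order.
    ordered = sorted(shard_docs, key=lambda doc: doc["static_rank"], reverse=True)
    retrieved, scanned = [], 0
    for doc in ordered:
        scanned += 1
        if matches(doc):
            retrieved.append(doc["id"])
            if len(retrieved) == limit:
                break  # early termination: lower static ranks are never visited
    return retrieved, scanned


# Hypothetical two-shard index.
shards = [
    [
        {"id": 1, "static_rank": 0.9, "title": "dentist"},
        {"id": 2, "static_rank": 0.8, "title": "engineer"},
        {"id": 3, "static_rank": 0.7, "title": "dentist"},
        {"id": 4, "static_rank": 0.2, "title": "dentist"},
    ],
    [
        {"id": 5, "static_rank": 0.6, "title": "dentist"},
        {"id": 6, "static_rank": 0.5, "title": "dentist"},
        {"id": 7, "static_rank": 0.1, "title": "dentist"},
    ],
]


def is_dentist(doc):
    return doc["title"] == "dentist"


per_shard = [retrieve_from_shard(shard, is_dentist, limit=2) for shard in shards]
merged_ids = [member_id for ids, _ in per_shard for member_id in ids]
```

Members 4 and 7 match the query but are never scanned, which is the trade-off: static rank decides who is reachable when a shard cannot be searched exhaustively.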
Layered Ranking Architecture
• L1: Scoops into the talent pool and scores/ranks more candidates.
• L2: Refines the short-listed candidates by applying more dynamic features from an external cache.
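The two layers can be sketched as a cheap first pass over the whole retrieved pool followed by a costlier re-rank of a shortlist. The scoring functions, feature names, weights, and cache contents below are hypothetical placeholders for the real L1 and L2 models.

```python
def l1_score(candidate):
    """Cheap score from features stored with the index (hypothetical)."""
    return candidate["title_match"] + 0.5 * candidate["static_rank"]


def l2_score(candidate, dynamic_cache):
    """Costlier score that adds dynamic features looked up in an external cache."""
    dynamic = dynamic_cache.get(candidate["id"], {})
    return l1_score(candidate) + 2.0 * dynamic.get("recent_reply_rate", 0.0)


def layered_rank(candidates, dynamic_cache, shortlist_size, page_size):
    # L1: score the whole retrieved pool and keep a shortlist.
    shortlist = sorted(candidates, key=l1_score, reverse=True)[:shortlist_size]
    # L2: re-rank only the shortlist with the richer model.
    reranked = sorted(shortlist, key=lambda c: l2_score(c, dynamic_cache), reverse=True)
    return [c["id"] for c in reranked[:page_size]]


pool = [
    {"id": "a", "title_match": 1.0, "static_rank": 0.9},
    {"id": "b", "title_match": 1.0, "static_rank": 0.4},
    {"id": "c", "title_match": 0.0, "static_rank": 0.8},
    {"id": "d", "title_match": 0.5, "static_rank": 0.2},
]
cache = {"b": {"recent_reply_rate": 0.6}, "c": {"recent_reply_rate": 0.9}}
top = layered_rank(pool, cache, shortlist_size=3, page_size=2)  # ["b", "a"]
```

Candidate "c" has the strongest dynamic signal but never reaches L2 because it misses the L1 shortlist, which shows why L1 needs to score a wide enough pool.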
Summary
• Talent Search
• Criteria Search, Recommendation System, Marketplace
• Talent Search Ranking
• Context-Aware Pairwise Training
• Representation Learning & Deep Learning
• GLMix Personalization
• In-Session Personalization
Thank You
Qi Guo Sahin Cem Geyik Bo Hu Cagri Ozcaglar
Ketan Thakkar Xianren Wu Krishnaram Kenthapadi
