A Scalable, High-performance Algorithm for Hybrid Job Recommendations
Toon De Pessemier, Kris Vanhecke, Luc Martens,
September, 2016
iMinds – Ghent University, Belgium
toon.depessemier@ugent.be
Introduction: Job recommendations
Not a classic recommender story
Not a classic solution
Specific metadata characteristics
Discipline, industry, career level, …
Detailed user profile
Experience, education (university degree), employment
Limited availability in time (active_during_test)
Various user-item interactions
Click, bookmark, reply, delete
Specific meaning of delete (click on “X” → load new item)
Impressions
Recommendations generated by XING’s recommender → bias
Our goals
XING’s evaluation measure
Reflects typical XING use case
Scalable
Number of users and items
Dataset = subset of XING users
Incremental updates
Continuous stream of new job items
Updating models instead of recalculating
Fast score calculation
New job items → fast distribution to target users
Limited computational resources
Findings
Challenge = Prediction task
≠ Recommendation task
No influence on user behavior
Recommendations are not evaluated by the user
Important quality metrics are not evaluated
Usefulness
Risk: Items already discovered by the user
Items that the user has already interacted with can be recommended
Diversity
Risk: Too much of the same
Serendipity
Risk: Items that are difficult to find but interesting are unfairly evaluated as “poor recommendations”
Findings
The information value of impressions is limited
Recommendations of the existing job recommender
Bias toward XING’s algorithm
Less diverse
Subset of recommendations
No guarantee that the user has seen the item
For non-cold-start users → better results if only the interactions are used
Penalty for items with a limited visibility
Low visibility → low probability of interaction
Low visibility penalty → better results
Item visibility estimated by number of interactions in training set
Findings
Influence of the user’s region
Expected: interest in jobs located in the user’s home region or in adjacent regions
Observed: many interactions with jobs located in non-adjacent or far-away regions
E.g. users in Lower Saxony → jobs in Baden-Württemberg
Many cold-start users
No interactions, no impressions (9.7%)
CB recommendation based on explicit profile
Risk: too general or too specific profile
Risk: not updated by the user
Findings
Traditional classification does not work
Positive class: click, bookmark, reply
Negative class: delete
Recommendations: items most typical for the positive class
Poor score
Reasoning: meaning of delete action
Click on X button in recommendation list
New recommendation will be loaded and displayed
Deletes are not sampled from the complete set of job offers but from recommendations (bias: items more similar to the user’s interests than random items)
Not necessarily a sign of disinterest from the user
Intention of the click: new recommendation
Content-Based Recommender
Based on feature matching
Explicit user profile
Interactions counter for each feature
Interaction weight
Updating counters
Delete=0, click=1, bookmark=10, reply=10 (no significant effect of deletes)
Positive counters (pos_{f,u}): item has feature
Negative counters (neg_{f,u}): item does not have feature
Score calculation
α = 0.5 (positive counters are more important than negative counters)
IDF = inverse document frequency: feature frequency across all jobs
N = total number of items
n_f = number of items with feature f
w_f = weight per feature type (tag, discipline, industry, …)
u = user
i = item
$$\mathrm{score}(u,i) = \frac{1}{|\{f \in i\}|} \sum_{f \in i} w_f \left( pos_{f,u} - \alpha \, neg_{f,u} \right) \log\frac{N}{n_f}$$
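The counters and the score above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `UserCounters`, the function `cb_score`, the representation of a feature as a `(type, value)` tuple, and the rule that negative counters are only updated for features the user already has counters for are all assumptions made for illustration. The interaction weights and α = 0.5 are taken from the slide.

```python
import math
from collections import defaultdict

# Interaction weights and alpha as listed on the slide.
INTERACTION_WEIGHT = {"delete": 0, "click": 1, "bookmark": 10, "reply": 10}
ALPHA = 0.5


class UserCounters:
    """Per-user feature counters (hypothetical structure)."""

    def __init__(self):
        self.pos = defaultdict(float)  # item had the feature
        self.neg = defaultdict(float)  # item did not have the feature

    def update(self, item_features, interaction):
        """Incrementally update the counters after one interaction."""
        weight = INTERACTION_WEIGHT[interaction]
        # Assumption: negative counters only apply to features already tracked.
        tracked = set(self.pos) | set(self.neg)
        for f in item_features:
            self.pos[f] += weight
        for f in tracked - set(item_features):
            self.neg[f] += weight


def cb_score(counters, item_features, n_items, feature_count, type_weight):
    """Average of w_f * (pos - alpha * neg) * log(N / n_f) over the item's features."""
    if not item_features:
        return 0.0
    total = 0.0
    for f in item_features:
        feature_type, _ = f
        idf = math.log(n_items / feature_count[f])
        balance = counters.pos.get(f, 0.0) - ALPHA * counters.neg.get(f, 0.0)
        total += type_weight.get(feature_type, 1.0) * balance * idf
    return total / len(item_features)
```

Because `update` only touches counters, a new interaction can be absorbed without recomputing the profile from scratch, which matches the incremental-update goal stated earlier.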
Content-based calculation
Profile
Offline calculation
Incremental updates of counters
IDF
Slightly varying over time
Periodic updates
Target items
Active items
Minimum matching threshold (positive counters and item have X features in common)
Algorithm running in parallel for different users
Fast calculation of the recommendations
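The target-item selection and per-user parallelism described on this slide could be sketched as follows. The function names, the set-based data layout, and the default threshold of 2 are hypothetical; the slide only states that an item must share X features with the user's positive counters. A thread pool is used to keep the example short; a CPU-bound workload in Python would more likely use a process pool.

```python
from concurrent.futures import ThreadPoolExecutor


def candidate_items(user_pos_features, active_items, min_common):
    """Ids of active items sharing at least min_common features with the user."""
    return [
        item_id
        for item_id, features in active_items.items()
        if len(user_pos_features & features) >= min_common
    ]


def candidates_for_all_users(users, active_items, min_common=2, workers=4):
    """Run the candidate filter independently for every user."""
    user_ids = list(users)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda u: candidate_items(users[u], active_items, min_common),
            user_ids,
        )
        return dict(zip(user_ids, results))
```

Filtering before scoring keeps the number of score evaluations per user small, and since users do not share state, the work splits cleanly across workers.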
Collaborative filtering: KNN
Traditional KNN
Distance based on interactions
Our KNN solution
Distance based on interactions and metadata
2 items are similar if users have interacted with both
2 items are similar if they have metadata features in common
Feature distance: factor $\log\frac{N}{n_f}$
Fine-grained distance function
Risk of ties is reduced
Method:
For each candidate item:
Calculate distance to k-nearest items that the user has positively interacted with
Select items with shortest distance
$$\mathrm{score}(u,i) = \frac{1}{k} \sum_{k} \frac{Dist_{max} - Dist_{i,k}}{Dist_{max}}$$
Based on Weka Framework
BallTree implementation of NearestNeighbourSearch package
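The KNN score above can be sketched in Python under stated assumptions. The slide does not give the exact distance function, so `item_distance` below is a hypothetical stand-in: it adds a Jaccard distance over the users who interacted with each item to an IDF-weighted metadata mismatch, giving a value in [0, 2]. The talk's implementation is built on Weka's BallTree; this sketch uses a plain sort instead. Only `knn_score` follows the formula on the slide.

```python
import math


def idf(feature, n_items, feature_count):
    """log(N / n_f), the factor named on the slide."""
    return math.log(n_items / feature_count[feature])


def item_distance(a, b, n_items, feature_count):
    """Hypothetical distance: interaction part plus metadata part, in [0, 2]."""
    union = a["users"] | b["users"]
    shared_users = a["users"] & b["users"]
    inter_dist = 1.0 - len(shared_users) / len(union) if union else 1.0

    all_feats = a["features"] | b["features"]
    shared_feats = a["features"] & b["features"]
    total = sum(idf(f, n_items, feature_count) for f in all_feats)
    common = sum(idf(f, n_items, feature_count) for f in shared_feats)
    meta_dist = 1.0 - common / total if total > 0 else 1.0
    return inter_dist + meta_dist


def knn_score(candidate, liked_items, k, dist_max, n_items, feature_count):
    """Mean of (dist_max - dist) / dist_max over the k nearest liked items."""
    dists = sorted(
        item_distance(candidate, j, n_items, feature_count) for j in liked_items
    )[:k]
    if not dists:
        return 0.0
    # len(dists) equals k when the user has at least k positive items.
    return sum((dist_max - d) / dist_max for d in dists) / len(dists)
```

Weighting shared metadata by IDF yields a continuous distance, which is one way to obtain the fine-grained function with fewer ties that the slide mentions.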
KNN calculation
Item distances
Offline calculation
Slightly varying over time
If partially computed distance > threshold → stop calculation
Score calculation
Fast if distances are precomputed
Algorithm running in parallel for different users
Fast calculation of the recommendations
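The early-stopping rule for partially computed distances can be illustrated with a short sketch. It assumes the distance is a sum of non-negative per-feature contributions, so the partial sum can only grow; `bounded_distance` and its signature are hypothetical names, not part of the original system.

```python
def bounded_distance(contributions, threshold):
    """Sum non-negative contributions, giving up once the threshold is exceeded.

    Returns the full distance, or None if the pair is already too far apart
    to be a nearest-neighbour candidate.
    """
    partial = 0.0
    for c in contributions:
        partial += c
        if partial > threshold:
            return None  # remaining contributions cannot reduce the distance
    return partial
```

Passing a generator as `contributions` means the remaining per-feature terms are never computed once the threshold is crossed.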
Results and fallback
CB: 286,041.10
KNN: 298,316.85
Hybrid: 344,264.37
Fallback cold start users:
No interactions:
KNN based on interactions is not possible (26.5% of users)
No interactions → use impressions (16.8% of users)
Solution without fallback to impressions (only based on profile): 292,909.26
No interactions and no impressions (9.7% of the users):
Hybrid → CB
CB cannot generate recommendations:
For 1485 users
Recommend the 30 most popular items (most positive interactions)
Without fallback to most popular recommender: 344,241.51
Most popular recommender as the only solution: 73,298.13
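The fallback chain on this slide can be summarised as a small decision function. The strategy labels, function names, and input format below are hypothetical; the slide does not say how impressions are fed into the recommender, so the sketch only records which data source is selected.

```python
from collections import Counter


def choose_strategy(n_interactions, n_impressions, cb_has_results):
    """Pick a recommender for one user, following the fallback order on the slide."""
    if n_interactions > 0:
        return "hybrid"                    # CB + KNN on interactions
    if n_impressions > 0:
        return "hybrid_using_impressions"  # impressions replace interactions
    if cb_has_results:
        return "content_based_profile"     # explicit profile only
    return "most_popular"                  # last resort


def most_popular(positive_interactions, n=30):
    """Top-n items by number of positive interactions.

    positive_interactions is an iterable of (user_id, item_id) pairs.
    """
    counts = Counter(item for _, item in positive_interactions)
    return [item for item, _ in counts.most_common(n)]
```

For example, a user with no interactions, no impressions, and a profile that matches no active item would be routed to `most_popular`, which corresponds to the 1485 users mentioned above.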
Questions?