
A Scalable, High-performance Algorithm for Hybrid Job Recommendations

RecSys2016 Challenge

Published in: Internet

  1. A Scalable, High-performance Algorithm for Hybrid Job Recommendations
     Toon De Pessemier, Kris Vanhecke, Luc Martens
     September 2016
     iMinds – Ghent University, Belgium
     toon.depessemier@ugent.be
  2. Introduction: Job recommendations
     Not a classic recommender story, not a classic solution
     • Specific metadata characteristics
        • Discipline, industry, career level, …
     • Detailed user profile
        • Experience, education (university degree), employment
     • Limited availability in time (active_during_test)
     • Various user-item interactions
        • Click, bookmark, reply, delete
        • Specific meaning of delete (click on “X” → load new item)
     • Impressions
        • Recommendations generated by XING’s recommender
        • Bias
  3. Our goals
     • XING’s evaluation measure
        • Reflects typical XING use case
     • Scalable
        • Number of users and items
        • Dataset = subset of XING users
     • Incremental updates
        • Continuous stream of new job items
        • Updating models instead of recalculating
     • Fast score calculation
        • New job items → fast distribution to target users
        • Limited computational resources
  4. Findings
     • Challenge = prediction task ≠ recommendation task
        • No influence on user behavior
        • Recommendations are not evaluated by the user
     • Important quality metrics are not evaluated
        • Usefulness. Risk: items already discovered by the user; items that the user has already interacted with can be recommended
        • Diversity. Risk: too much of the same
        • Serendipity. Risk: items that are difficult to find but interesting are unfairly evaluated as “poor recommendations”
  5. Findings
     • The information value of impressions is limited
        • Recommendations of the existing job recommender
           • Bias towards XING’s algorithm
           • Less diverse
        • Subset of recommendations
        • No guarantee that the user has seen the item
        • No cold-start users
        • Better results if only the interactions are used
     • Penalty for items with limited visibility
        • Low visibility → low probability of interaction
        • Low visibility → penalty → better results
        • Item visibility estimated by the number of interactions in the training set
  6. Findings
     • Influence of the user’s region
        • Expected: interest in jobs located in the user’s home region or in adjacent regions
        • Observed: many interactions with jobs located in non-adjacent or faraway regions
        • E.g. users in Lower Saxony → jobs in Baden-Württemberg
     • Many cold-start users
        • No interactions, no impressions (9.7%)
        • CB recommendation based on explicit profile
        • Risk: profile too general or too specific
        • Risk: profile not updated by the user
  7. Findings
     • Traditional classification does not work
        • Positive class: click, bookmark, reply
        • Negative class: delete
        • Recommendations: items most typical for the positive class
        • Poor score
     • Reasoning: meaning of the delete action
        • Click on the X button in the recommendation list
        • A new recommendation is loaded and displayed
        • Deletes are not sampled from the complete set of job offers but from recommendations (bias: items more similar to the user’s interests than random items)
        • Not necessarily disinterest of the user
        • Intention of the click: a new recommendation
  8. Content-Based Recommender
     • Based on feature matching
        • Explicit user profile
        • Interactions → counter for each feature
     • Interaction weight
        • Updating counters
        • Delete=0, click=1, bookmark=10, reply=10 (no significant effect of deletes)
        • Positive counters ($pos_{f,u}$) → item has feature
        • Negative counters ($neg_{f,u}$) → item does not have feature
     • Score calculation
        • $\alpha = 0.5$ (positive counters are more important than negative counters)
        • IDF = inverse document frequency: feature frequency across all jobs
        • $N$ = total number of items
        • $n_f$ = number of items with feature $f$
        • $w_f$ = weight per feature type (tag, discipline, industry, …)
        • $u$ = user, $i$ = item
        • $score(u,i) = \frac{1}{|\{f \in i\}|} \sum_{f \in i} w_f \left( pos_{f,u} - \alpha \, neg_{f,u} \right) \log \frac{N}{n_f}$
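The counter updates and the score on the Content-Based Recommender slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions: features are represented as `(type, value)` tuples, negative counters are only incremented for features already tracked in the user's profile, and the leading factor of the score is read as a normalization by the number of features of the item. The names `UserProfile` and `cb_score` are hypothetical.

```python
import math
from collections import defaultdict

# Interaction weights as listed on the slide.
INTERACTION_WEIGHT = {"delete": 0, "click": 1, "bookmark": 10, "reply": 10}
ALPHA = 0.5  # down-weights negative counters relative to positive ones


class UserProfile:
    """Per-user feature counters; features are (type, value) tuples (assumed)."""

    def __init__(self):
        self.pos = defaultdict(float)  # pos[f]: weighted count of items having f
        self.neg = defaultdict(float)  # neg[f]: weighted count of items lacking f

    def update(self, interaction, item_features):
        """Incrementally update the counters for one user-item interaction."""
        w = INTERACTION_WEIGHT[interaction]
        # Assumption: only features already tracked for this user can
        # receive a negative count when the item lacks them.
        tracked = set(self.pos) | set(self.neg)
        for f in tracked - set(item_features):
            self.neg[f] += w
        for f in item_features:
            self.pos[f] += w


def cb_score(profile, item_features, n_items, items_with_feature, type_weight):
    """Average over the item's features of w_f * (pos - ALPHA * neg) * log(N / n_f)."""
    if not item_features:
        return 0.0
    total = 0.0
    for f in item_features:
        idf = math.log(n_items / items_with_feature[f])
        w_f = type_weight.get(f[0], 1.0)  # weight per feature type, default 1
        total += w_f * (profile.pos.get(f, 0.0) - ALPHA * profile.neg.get(f, 0.0)) * idf
    return total / len(item_features)
```

Because `update` only touches counters, a new interaction can be folded into a profile without recomputing it from the full history, which matches the incremental-update goal stated earlier in the deck.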
  9. Content-based calculation
     • Profile
        • Offline calculation
        • Incremental updates of counters
     • IDF
        • Slightly varying over time
        • Periodic updates
     • Target items
        • Active items
        • Minimum matching threshold (positive counters and item have X features in common)
     • Algorithm running in parallel for different users
        • Fast calculation of the recommendations
  10. Collaborative filtering: KNN
     • Traditional KNN
        • Distance based on interactions
     • Our KNN solution
        • Distance based on interactions and metadata
        • 2 items are similar if users have interacted with both
        • 2 items are similar if they have metadata features in common; feature distance: factor $\log \frac{N}{n_f}$
        • Fine-grained distance function
        • Risk of ties is reduced
     • Method:
        • For each candidate item:
           • Calculate the distance to the k nearest items that the user has positively interacted with
           • Select items with the shortest distance
        • $score(u,i) = \frac{1}{k} \sum_{k} \frac{Dist_{max} - Dist_{i,k}}{Dist_{max}}$
     • Based on the Weka framework
        • BallTree implementation of NearestNeighbourSearch package
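The KNN score on this slide averages a normalized closeness over the k nearest positively rated items. A small Python sketch of that formula follows. It is an illustration under assumptions, not the Weka-based implementation: the `distance` function is supplied by the caller (the slide's combined interaction and metadata distance is not specified in enough detail to reproduce), distances are clamped at `dist_max`, and missing neighbours count as being at `dist_max`. The name `knn_score` is hypothetical.

```python
import heapq


def knn_score(candidate, positive_items, distance, k, dist_max):
    """Mean of (dist_max - d) / dist_max over the k nearest positive items.

    `distance(a, b)` is a caller-supplied, non-negative item distance.
    If the user has fewer than k positive items, the missing neighbours
    contribute 0, as if they were at distance dist_max (assumption).
    """
    nearest = heapq.nsmallest(k, (distance(candidate, p) for p in positive_items))
    # Clamp so that far-away neighbours contribute 0 rather than a negative value.
    return sum((dist_max - min(d, dist_max)) / dist_max for d in nearest) / k
```

For example, with items on a number line and `distance = lambda a, b: abs(a - b)`, a candidate at 5 and positive items at 4, 7 and 20 has its two nearest neighbours at distances 1 and 2, so with `dist_max = 10` the score is the mean of 0.9 and 0.8.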
  11. KNN calculation
     • Item distances
        • Offline calculation
        • Slightly varying over time
        • If partially computed distance > threshold → stop calculation
     • Score calculation
        • Fast if distances are precomputed
     • Algorithm running in parallel for different users
        • Fast calculation of the recommendations
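The early-stopping rule for item distances (stop once the partially computed distance exceeds a threshold) can be illustrated with a short Python sketch. It assumes the distance is a sum of non-negative per-dimension terms, so a partial sum can only grow; squared Euclidean distance over numeric feature vectors is used here purely as a stand-in for the deck's own distance function, and `bounded_distance` is a hypothetical name.

```python
def bounded_distance(a, b, threshold):
    """Squared Euclidean distance between equal-length vectors a and b.

    Returns None as soon as the partial sum exceeds `threshold`, because
    the remaining non-negative terms cannot bring it back below it.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > threshold:
            return None  # item pair is too far apart; skip the rest
    return total
```

A caller would typically pass the current k-th smallest distance as `threshold`, so that pairs which cannot enter the neighbour list are abandoned early.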
  12. Results and fallback
     • CB: 286,041.10
     • KNN: 298,316.85
     • Hybrid: 344,264.37
     • Fallback for cold-start users:
        • No interactions:
           • KNN based on interactions is not possible (26.5% of users)
           • No interactions → use impressions (16.8% of users)
           • Solution without fallback to impressions (only based on profile): 292,909.26
        • No interactions and no impressions (9.7% of the users):
           • Hybrid → CB
        • CB cannot generate recommendations:
           • For 1485 users
           • Recommend the 30 most popular items (most positive interactions)
           • Without fallback to most popular recommender: 344,241.51
           • Most popular recommender as the only solution: 73,298.13
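The fallback order described on the Results and fallback slide can be written as a small dispatch function. The Python sketch below is one possible reading: the three recommenders are passed in as callables with hypothetical signatures, impressions are fed to the hybrid recommender in place of interactions, and the most-popular fallback is applied whenever the chosen recommender returns nothing (the slide only states this for CB).

```python
def recommend(user, interactions, impressions, hybrid, content_based, most_popular, n=30):
    """Pick a recommender for `user` following the slide's fallback order.

    interactions / impressions: dicts mapping user -> list of item ids.
    hybrid(user, history), content_based(user), most_popular(n): callables
    returning ranked item lists (hypothetical signatures).
    """
    if interactions.get(user):
        recs = hybrid(user, interactions[user])
    elif impressions.get(user):
        # No interactions: use impressions as a substitute history.
        recs = hybrid(user, impressions[user])
    else:
        # No interactions and no impressions: explicit profile only.
        recs = content_based(user)
    if not recs:
        # Last resort: the n most popular items.
        recs = most_popular(n)
    return recs[:n]
```

The reported scores suggest why the order matters: dropping the most-popular fallback costs little (344,241.51 versus 344,264.37), while dropping the impression fallback costs considerably more (292,909.26).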
  13. Questions?
