A Scalable, High-performance Algorithm for Hybrid Job Recommendations

  1. A Scalable, High-performance Algorithm for Hybrid Job Recommendations
     Toon De Pessemier, Kris Vanhecke, Luc Martens
     iMinds – Ghent University, Belgium, September 2016
     toon.depessemier@ugent.be
  2. Introduction: Job recommendations
     Not a classic recommender story, not a classic solution
     - Specific metadata characteristics
       - Discipline, industry, career level, …
     - Detailed user profile
       - Experience, education (university degree), employment
     - Limited availability in time (active_during_test)
     - Various user-item interactions
       - Click, bookmark, reply, delete
       - Specific meaning of delete (click on “X” → load new item)
     - Impressions
       - Recommendations generated by XING’s recommender
       - Bias
  3. Our goals
     - XING’s evaluation measure
       - Reflects typical XING use case
     - Scalable
       - Number of users and items → dataset = subset of XING users
     - Incremental updates
       - Continuous stream of new job items
       - Updating models instead of recalculating
     - Fast score calculation
       - New job items → fast distribution to target users
       - Limited computational resources
  4. Findings
     - Challenge = prediction task
       - ≠ recommendation task
       - No influence on user behavior
       - Recommendations are not evaluated by the user
     - Important quality metrics are not evaluated
       - Usefulness. Risk: items already discovered by the user (items that the user has already interacted with can be recommended)
       - Diversity. Risk: too much of the same
       - Serendipity. Risk: items that are difficult to find but interesting are unfairly evaluated as “poor recommendations”
  5. Findings
     - The information value of impressions is limited
       - Recommendations of the existing job recommender → bias toward XING’s algorithm → less diverse
       - Subset of recommendations
       - No guarantee that the user has seen the item
       - No cold start user
       - Better results if only the interactions are used
     - Penalty for items with limited visibility
       - Low visibility → low probability of interaction
       - Low visibility → penalty → better results
       - Item visibility estimated by the number of interactions in the training set
  6. Findings
     - Influence of the user’s region
       - Expected: interest in jobs located in the user’s home region or in adjacent regions
       - Observed: many interactions with jobs located in non-adjacent or far-away regions
       - E.g. users from Lower Saxony → jobs in Baden-Württemberg
     - Many cold-start users
       - No interactions, no impressions (9.7%)
       - CB recommendation based on explicit profile
       - Risk: profile too general or too specific
       - Risk: profile not updated by the user
  7. Findings
     - Traditional classification does not work
       - Positive class: click, bookmark, reply
       - Negative class: delete
       - Recommendations: items most typical for the positive class
       - Poor score
     - Reasoning: meaning of the delete action
       - Click on the X button in the recommendation list
       - A new recommendation will be loaded and displayed
       - Deletes are not sampled from the complete job offer but from recommendations (bias: items more similar to the user’s interests than random items)
       - Not necessarily a sign of disinterest from the user
       - Intention of the click: get a new recommendation
  8. Content-Based Recommender
     - Based on feature matching
       - Explicit user profile
       - Interactions → counter for each feature
     - Interaction weight
       - Updating counters
       - Delete=0, click=1, bookmark=10, reply=10 (no significant effect of deletes)
       - Positive counters ($pos_{f,u}$) → item has feature
       - Negative counters ($neg_{f,u}$) → item does not have feature
     - Score calculation
       - $\alpha = 0.5$ (positive counters are more important than negative counters)
       - IDF = inverse document frequency: feature frequency across all jobs
       - $N$ = total number of items
       - $n_f$ = number of items with feature $f$
       - $w_f$ = weight per feature type (tag, discipline, industry, …)
       - $u$ = user
       - $i$ = item

     $$\mathrm{score}(u,i) = \frac{1}{|\{f \in i\}|} \sum_{f \in i} w_f \left(pos_{f,u} - \alpha\, neg_{f,u}\right) \log\frac{N}{n_f}$$
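The score calculation on slide 8 could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that the leading factor of the formula normalizes by the number of features of the item, that a feature is represented as a hypothetical `(feature_type, value)` tuple, and that feature types without an explicit weight default to 1.0. The names `cb_score`, `feature_counts`, and `type_weight` are invented for this example.

```python
import math

ALPHA = 0.5  # weight of the negative counters, as stated on the slide


def idf(n_items, n_with_feature):
    """Inverse document frequency log(N / n_f) of one feature."""
    return math.log(n_items / n_with_feature)


def cb_score(item_features, pos, neg, feature_counts, n_items, type_weight):
    """Content-based score of one item for one user.

    item_features  : set of (feature_type, value) tuples of the item
    pos, neg       : the user's positive / negative counters per feature
    feature_counts : number of items carrying each feature (n_f)
    n_items        : total number of items (N)
    type_weight    : weight per feature type (w_f); missing types default
                     to 1.0, which is an assumption of this sketch
    """
    if not item_features:
        return 0.0
    total = 0.0
    for f in item_features:
        w = type_weight.get(f[0], 1.0)
        evidence = pos.get(f, 0.0) - ALPHA * neg.get(f, 0.0)
        total += w * evidence * idf(n_items, feature_counts[f])
    # Assumed normalization: average over the item's features.
    return total / len(item_features)
```

Rare features get a large IDF factor, so a match on an uncommon tag contributes more than a match on a feature that nearly every job carries.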
  9. Content-based calculation
     - Profile
       - Offline calculation
       - Incremental updates of counters
     - IDF
       - Slightly varying over time
       - Periodic updates
     - Target items
       - Active items
       - Minimum matching threshold (positive counters and item have X features in common)
     - Algorithm running in parallel for different users
       - Fast calculation of the recommendations
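The incremental counter updates and the minimum matching threshold might look something like the sketch below. The interaction weights are taken from the slides. Everything else is an assumption made for illustration: the `UserProfile` class, the `candidates` helper, and in particular the rule that a feature already known to the profile but absent from the new item increments the negative counter.

```python
from collections import defaultdict

# Interaction weights as listed on the slides.
INTERACTION_WEIGHTS = {"delete": 0, "click": 1, "bookmark": 10, "reply": 10}


class UserProfile:
    """Per-user feature counters that are updated one interaction at a time."""

    def __init__(self):
        self.pos = defaultdict(float)
        self.neg = defaultdict(float)

    def update(self, item_features, interaction):
        weight = INTERACTION_WEIGHTS[interaction]
        if weight == 0:
            return  # deletes carry no weight
        item_features = set(item_features)
        # Assumed rule: known features that this item lacks count as
        # negative evidence.
        for f in set(self.pos) - item_features:
            self.neg[f] += weight
        for f in item_features:
            self.pos[f] += weight


def candidates(profile, active_items, min_matches):
    """Yield ids of active items that share at least `min_matches`
    features with the user's positive counters."""
    known = {f for f, count in profile.pos.items() if count > 0}
    for item_id, features in active_items.items():
        if len(known & set(features)) >= min_matches:
            yield item_id
```

Because each interaction only touches the counters of one user, a new interaction never forces a recomputation of the whole profile, and profiles of different users can be processed in parallel.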
  10. Collaborative filtering: KNN
     - Traditional KNN
       - Distance based on interactions
     - Our KNN solution
       - Distance based on interactions and metadata
       - 2 items are similar if users have interacted with both
       - 2 items are similar if they have metadata features in common (feature distance: factor $\log\frac{N}{n_f}$)
     - Fine-grained distance function
       - Risk of ties is reduced
     - Method:
       - For each candidate item: → calculate the distance to the k nearest items that the user has positively interacted with
       - Select the items with the shortest distance
       - Score, where $j$ runs over the $k$ nearest items:

     $$\mathrm{score}(u,i) = \frac{1}{k} \sum_{j=1}^{k} \frac{Dist_{max} - Dist_{i,j}}{Dist_{max}}$$

     - Based on the Weka framework
       - BallTree implementation of the NearestNeighbourSearch package
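The KNN score on slide 10 averages a normalized closeness over the k nearest positively rated items. The sketch below shows that step in plain Python rather than with Weka's BallTree. The slides do not give the exact distance function, so `hybrid_distance` is purely hypothetical: it mixes a Jaccard distance on the sets of interacting users with an IDF-weighted feature distance in equal parts. Treating missing neighbours as lying at `dist_max` is also an assumption.

```python
import heapq


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as maximally distant."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 1.0


def idf_feature_distance(fa, fb, idf):
    """Hypothetical feature distance: shared features count by their IDF."""
    union = fa | fb
    if not union:
        return 1.0
    shared = sum(idf[f] for f in fa & fb)
    return 1.0 - shared / sum(idf[f] for f in union)


def hybrid_distance(a, b, users_of, features_of, idf):
    """Assumed equal mix of interaction-based and metadata-based distance."""
    return (0.5 * jaccard_distance(users_of[a], users_of[b])
            + 0.5 * idf_feature_distance(features_of[a], features_of[b], idf))


def knn_score(candidate, liked_items, dist, k, dist_max):
    """Mean of (dist_max - d) / dist_max over the k nearest liked items.

    If the user has fewer than k liked items, the missing neighbours
    contribute 0, i.e. they are treated as lying at dist_max.
    """
    nearest = heapq.nsmallest(k, (dist(candidate, j) for j in liked_items))
    return sum((dist_max - min(d, dist_max)) / dist_max for d in nearest) / k
```

Since the metadata term varies continuously with the IDF of the shared features, two candidates with identical interaction overlap rarely end up at exactly the same distance, which is the tie-reducing effect the slide mentions.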
  11. KNN calculation
     - Item distances
       - Offline calculation
       - Slightly varying over time
       - If partially computed distance > threshold → stop calculation
     - Score calculation
       - Fast if distances are precomputed
     - Algorithm running in parallel for different users
       - Fast calculation of the recommendations
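The early-stopping rule on slide 11 works for any distance that is built up as a sum of non-negative terms: once the partial sum exceeds the threshold, the pair can no longer be a near neighbour and the remaining terms can be skipped. The sketch below illustrates this with a hypothetical distance, the summed IDF of the features present in only one of the two items. Both that distance and the name `bounded_distance` are assumptions of this example, not the authors' definition.

```python
import math


def bounded_distance(fa, fb, idf, threshold):
    """Sum the IDF of features that only one of the two items has.

    Returns math.inf as soon as the partial sum exceeds `threshold`,
    so far-apart pairs are abandoned without visiting every feature.
    """
    partial = 0.0
    for f in fa ^ fb:  # symmetric difference: mismatching features
        partial += idf[f]
        if partial > threshold:
            return math.inf
    return partial
```

Pairs that return `math.inf` need not be stored, which keeps the precomputed distance table small.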
  12. Results and fallback
     - CB: 286,041.10
     - KNN: 298,316.85
     - Hybrid: 344,264.37
     - Fallback for cold-start users:
       - No interactions: → KNN based on interactions is not possible (26.5% of users)
       - No interactions → use impressions (16.8% of users)
       - Solution without fallback to impressions (only based on profile): 292,909.26
       - No interactions and no impressions (9.7% of the users): → Hybrid → CB
     - CB cannot generate recommendations:
       - For 1485 users → recommend the 30 most popular items (most positive interactions)
       - Without fallback to most popular recommender: 344,241.51
       - Most popular recommender as the only solution: 73,298.13
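The fallback chain on slide 12 can be read as a simple dispatcher, sketched below under some assumptions. The `hybrid` and `content_based` arguments are hypothetical callables standing in for the two recommenders, whose combination rule the slides do not spell out. The sketch also falls back to the most popular items whenever the selected recommender returns nothing, which is slightly broader than the slide's "CB cannot generate recommendations" case.

```python
from collections import Counter


def most_popular(positive_interactions, n=30):
    """Items with the most positive interactions.

    positive_interactions: iterable of (user, item) pairs.
    """
    counts = Counter(item for _, item in positive_interactions)
    return [item for item, _ in counts.most_common(n)]


def recommend(user, interactions, impressions, hybrid, content_based,
              popular, n=30):
    """Pick a recommender according to the data available for `user`."""
    # Interactions first; impressions only when there are no interactions.
    history = interactions.get(user) or impressions.get(user)
    if history:
        recs = hybrid(user, history)
    else:
        # No interactions and no impressions: profile-based CB only.
        recs = content_based(user)
    if not recs:
        recs = popular  # last resort: most popular items
    return recs[:n]
```

A usage example would pass `most_popular(training_pairs)` as `popular`, computed once and shared across all users.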
  13. Questions?