RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting

RecSys Challenge 2016 solution, which placed 2nd: https://recsys.xing.com/leaders
About the authors: http://mim-solutions.pl/

Published in: Data & Analytics

  1. RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting. Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Adam Witkowski, Piotr Wygocki. Contact: apacuk@mimuw.edu.pl. University of Warsaw. mim-solutions.pl, RecSys Challenge 2016.
  2. Outline. 1: Challenge statement. 2: Our Solution (Candidate items selection, Learning probabilities, Features). 3: What could we do better?
  3. Problem. Xing.com dataset: user profiles (experience, education, current job roles, etc.), job (item) offer descriptions (title, tags, employment type, etc.), past recommendations (impressions), and users' positive (clicking, bookmarking, replying) and negative (deleting) interactions with items. Task: predict each user's positive interactions.
  4. Evaluation. Secret ground truth (GT): positive interactions from the test week. Mean average precision-like (MAP) measure. Online evaluation. Finished 2nd!
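
The slide only says the measure is "MAP-like" and does not give its formula. As a rough illustration, here is a minimal sketch of the textbook average precision at k, which is not necessarily the challenge's official metric. The function names and the cutoff `k=30` (borrowed from the "take top 30" step on the next slide) are assumptions made for this example.

```python
def average_precision_at_k(ranked_items, relevant_items, k=30):
    """Textbook AP@k: mean of precision values at the ranks of relevant items.

    Assumes ranked_items contains no duplicates.
    """
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalise by the best achievable number of hits within the cutoff.
    return precision_sum / min(len(relevant), k)


def mean_average_precision(recommendations, ground_truth, k=30):
    """Mean AP@k over all users present in ground_truth."""
    if not ground_truth:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(user, []), items, k)
        for user, items in ground_truth.items()
    )
    return total / len(ground_truth)
```

Here `recommendations` maps each user to a ranked list of items and `ground_truth` maps each user to the set of items they interacted with positively; users without recommendations contribute a score of zero.
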
  5. Solution's schema. For each user: select candidates from all jobs (job #1, job #2, job #3, ..., job #N); predict a probability for each candidate (e.g. job #1: 0.3, job #2: 0.7, job #3: 0.4, ..., job #N: 0.5); sort by probability (e.g. job #15: 0.9, job #34: 0.89, ..., job #124: 0.75); take the top 30.
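
The schema on this slide (select candidates, predict probabilities, sort, take the top 30) can be sketched as a single function. This is only an outline under stated assumptions: `recommend`, `select_candidates` and `predict_probability` are hypothetical names, and the two callables stand in for the candidate selection and the trained model described on the later slides.

```python
def recommend(user, select_candidates, predict_probability, top_n=30):
    """Return the top_n candidate items for a user, best first.

    select_candidates(user) -> iterable of items        (hypothetical callable)
    predict_probability(user, item) -> float in [0, 1]  (hypothetical callable)
    """
    candidates = list(select_candidates(user))
    scored = [(item, predict_probability(user, item)) for item in candidates]
    # Highest predicted probability first; Python's sort is stable,
    # so ties keep the order in which candidates were produced.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:top_n]]
```

For instance, with the three example probabilities from the slide (0.3, 0.7, 0.4) and `top_n=2`, the function returns job #2 followed by job #3.
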
  6. Training set. Training GT: positive interactions from the last week. Local score. Candidates and features are computed separately for the training set and for the full dataset!
  7. Candidates. Candidate: an item i with high P[i ∈ GT(u)]. 20 categories. Ranking: e.g. sort interactions by timestamp. ∼300 candidates per user (0.1% of all items). 37% coverage of training GT.
  8. Candidate categories. User's interactions Int(u), sorted by week and by event count within the week. Similarly for impressions Imp(u). Int(u') for other users u', sorted by Jaccard(Int(u), Int(u')).
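
A minimal sketch of the last category on this slide: items taken from other users' interaction sets, with those users ordered by Jaccard similarity to the target user. The function names, the `limit` parameter, the tie-breaking by user id, and the decision to skip items the user already interacted with are assumptions made for the example, not details from the slides.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidates_from_similar_users(user, interactions, limit=50):
    """interactions: dict mapping each user to the set of items they interacted with."""
    own = interactions.get(user, set())
    scored = [
        (jaccard(own, items), other)
        for other, items in interactions.items()
        if other != user
    ]
    # Most similar users first; ties broken by user id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    candidates, seen = [], set(own)
    for similarity, other in scored:
        if similarity == 0.0:
            break  # the remaining users share no items with `user`
        for item in sorted(interactions[other]):
            if item not in seen:
                seen.add(item)
                candidates.append(item)
                if len(candidates) == limit:
                    return candidates
    return candidates
```

Computing the similarity against every other user is quadratic overall, so a real dataset would likely need an inverted index from items to users; that optimisation is left out here.
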
  9. Candidates (cold start). Items i sorted by max_{i' ∈ Int(u)} |tags(i) ∩ tags(i')|. Items i sorted by |jobroles(u) ∩ tags(i)|. Globally most popular items.
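
The two tag-based cold-start rankings on this slide can be sketched as follows. All names are hypothetical, and dropping zero-overlap items, excluding items the user already interacted with, and breaking ties by item id are assumptions of this example rather than choices confirmed by the slides.

```python
def rank_by_tag_overlap(user_items, tags, limit=50):
    """Rank items i by max over i' in Int(u) of |tags(i) ∩ tags(i')|.

    user_items: set of items the user interacted with, Int(u)
    tags: dict mapping each item to its set of tags
    """
    interacted_tags = [tags[i] for i in user_items if i in tags]
    scored = []
    for item, item_tags in tags.items():
        if item in user_items:
            continue  # assumption: already-seen items come from other categories
        score = max((len(item_tags & t) for t in interacted_tags), default=0)
        if score > 0:
            scored.append((score, item))
    # Largest overlap first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:limit]]


def rank_by_jobroles(jobroles, tags, limit=50):
    """Rank items i by |jobroles(u) ∩ tags(i)|."""
    scored = [
        (len(jobroles & item_tags), item)
        for item, item_tags in tags.items()
        if jobroles & item_tags
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:limit]]
```

The second function needs no interaction history at all, which is what makes it usable for users who have only a profile.
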
  10. Candidate ranking. XGBoost (Gradient Boosting Decision Trees). Optimizing logloss. Training file built from preselected candidates: all positives, sampled negatives. Reaches 77.5% of the score of a perfect ranking of the candidates.
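
A minimal sketch of how a training file with "all positive, sampled negative" rows might be assembled from the preselected candidates. The default of 5 negatives per user follows the "random 5" mentioned on the Possible improvements slide; the function name, the row layout `(user, item, label)` and the fixed seed are assumptions for illustration.

```python
import random


def build_training_rows(candidates, ground_truth, negatives_per_user=5, seed=0):
    """Return (user, item, label) rows: every positive candidate, a sample of negatives.

    candidates: dict mapping each user to a list of preselected items
    ground_truth: dict mapping each user to the set of positively interacted items
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    rows = []
    for user in sorted(candidates):
        positive_set = ground_truth.get(user, set())
        positives = [i for i in candidates[user] if i in positive_set]
        negatives = [i for i in candidates[user] if i not in positive_set]
        sampled = rng.sample(negatives, min(negatives_per_user, len(negatives)))
        rows.extend((user, item, 1) for item in positives)
        rows.extend((user, item, 0) for item in sampled)
    return rows
```

Note that ground-truth items missed by candidate selection never appear as rows. Each row would then be expanded into a feature vector and passed to a gradient boosting library such as XGBoost with a log-loss objective; that step is omitted here.
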
  11. Features. A feature maps a (user, item) pair to a real number. 12 groups, 273 features in total. Worked well with: highly correlated features, null values, no scaling/normalization.
  12. Feature definitions (sample). Event based: percentage of Int(u) having the same property (e.g., employment) as item i. Most similar user who clicked item: max_{u' ∈ Users(i)} Jaccard(Int(u), Int(u')). Most similar item clicked by user: max_{i' ∈ Int(u)} Jaccard(Users(i), Users(i')).
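
The three sample features on this slide might be computed along these lines. This is a sketch under assumptions: the function names are hypothetical, the first feature is returned as a fraction in [0, 1] rather than a percentage, and the target user and target item are excluded from the maxima so that a pair cannot be compared with itself.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_property_share(user_items, item, prop):
    """Share of Int(u) whose property value (e.g. employment type) equals the item's."""
    known = [i for i in user_items if i in prop]
    if not known or item not in prop:
        return 0.0
    return sum(prop[i] == prop[item] for i in known) / len(known)


def most_similar_user_who_clicked(user, item, interactions):
    """max over u' in Users(i) of Jaccard(Int(u), Int(u'))."""
    own = interactions.get(user, set())
    return max(
        (jaccard(own, items)
         for other, items in interactions.items()
         if other != user and item in items),
        default=0.0,
    )


def most_similar_item_clicked(user, item, interactions):
    """max over i' in Int(u) of Jaccard(Users(i), Users(i'))."""
    users_of = {}  # inverted index: item -> set of users who interacted with it
    for u, items in interactions.items():
        for i in items:
            users_of.setdefault(i, set()).add(u)
    target = users_of.get(item, set())
    return max(
        (jaccard(target, users_of[i])
         for i in interactions.get(user, set())
         if i != item),
        default=0.0,
    )
```

Each function returns 0.0 when there is nothing to compare against.
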
  13. Top feature groups (feature group: fscore). Event based user (item) profile: 41% (tags + title: 7%). Item global popularity: 22% (trend: 10%, weekday: 4%). Most similar: 10% (item clicked by user: 6%, user who clicked item: 4%). User total events: 8% (in last week: 4%). Seconds from last user activity: 7%. Max common tags with clicked item: 4%.
  14. Possible improvements. Training file: 8x bigger, sampling 1/4 of the negative candidates (instead of 5 random ones) per user; score: +6.5k. Ensembling models. Layer scores: candidate selection 37%, candidate ranking 77.5%.
  15. Thank you. apacuk@mimuw.edu.pl, mim-solutions.pl
