A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

Data Scientist
Sep. 16, 2016

  1. A COMBINATION OF SIMPLE MODELS BY FORWARD PREDICTOR SELECTION FOR JOB RECOMMENDATION
     Dávid Zibriczky, PhD (DaveXster)
     Budapest University of Technology and Economics, Budapest, Hungary
  2. The Dataset – Data preparation
     • Events (interactions, impressions)
       › Target format: (time,user_id,item_id,type,value)
       › Interactions → Format OK
       › Impressions:
         • Generating unique (time,user_id,item_id) triples
         • Value → count of their occurrence
         • Time → 12pm on Thursday of the week
         • Type → 5
     • Catalog (items, users)
       › Target format: (id,key1,key2,…,keyN)
       › Items and users → Format OK
       › Unknown "0" values → empty values
       › Inconsistency: Geo-location vs. country/region → Metadata enhancement based on geo-location
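The impression preprocessing described on this slide can be sketched as follows. This is a minimal illustration, not the author's code: the input layout `(year, iso_week, user_id, item_id)` and the function name `impressions_to_events` are assumptions made for the example; only the target format, the count-as-value rule, the Thursday 12pm timestamp, and type 5 come from the slide.

```python
from collections import Counter
from datetime import date, datetime, time

IMPRESSION_TYPE = 5  # event type assigned to impressions on the slide


def impressions_to_events(rows):
    """Aggregate raw impressions into (time, user_id, item_id, type, value) events.

    `rows` is an iterable of (year, iso_week, user_id, item_id) tuples.
    This input layout is an assumption made for the example.
    """
    # One counter entry per unique (week, user, item) triple.
    counts = Counter(rows)
    events = []
    for (year, week, user_id, item_id), n in sorted(counts.items()):
        # ISO weekday 4 is Thursday; the event time is fixed to 12:00 that day.
        thursday = date.fromisocalendar(year, week, 4)
        timestamp = datetime.combine(thursday, time(12, 0))
        events.append((timestamp, user_id, item_id, IMPRESSION_TYPE, n))
    return events
```

Repeated impressions of the same item for the same user in the same week collapse into one event whose value is the number of occurrences.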
  3. The Dataset – Basic statistics
     Size of training set
     • 211M events, 2.8M users, 1.3M items
     • Effect: huge and very sparse matrix
     Distribution
     • 95% of events are impressions
     • 72% of the users have impressions only
     • Item support for interactions is low (~9)
     • Effect: weak collaboration using interactions
     Target users
     • 150K users
     • 73% active, 16% inactive, 12% new
     • Effect: user cold start and warm-up problem

     Data source         #events       #users      #items
     Interactions        8,826,678     784,687     1,029,480
     Impressions         201,872,093   2,755,167   846,814
     All events          210,698,777   2,792,405   1,257,422
     Catalog             -             1,367,057   1,358,098
     Catalog OR Events   -             2,829,563   1,362,890
  4. Methods – Concept
     Terminology
     • Method: A technique for estimating the relevance of an item for a user (p-Value)
     • Predictor/model: An instance of a method with a specified parameter setting
     • Combination: Linear combination of the prediction values for a user-item pair
     Approach
     1. Exploring the properties of the data set
     2. Definition of "simple" methods with different functionality (time-decay is commonly used)*
     3. Finding a set of relevant predictors and their optimal combination
     4. Top-N ranking of the available event-supported items with non-zero p-Values (~200K)
     * Equations of the methods can be found in the paper
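The combination and ranking steps on this slide can be illustrated with a short sketch. The data layout (one dictionary of p-values per predictor) and the function names `combine` and `top_n` are assumptions for the example; the slide only states that the combination is linear and that items with non-zero p-values are ranked.

```python
from collections import defaultdict


def combine(predictions, weights):
    """Linearly combine the p-values of several predictors for one user.

    predictions: {predictor_name: {item_id: p_value}}
    weights:     {predictor_name: weight}
    """
    combined = defaultdict(float)
    for name, scores in predictions.items():
        w = weights.get(name, 0.0)
        for item_id, p in scores.items():
            combined[item_id] += w * p
    return dict(combined)


def top_n(scores, n):
    """Return the n best item ids, skipping items whose combined p-value is zero."""
    candidates = [(item_id, p) for item_id, p in scores.items() if p != 0.0]
    # Sort by descending score; ties are broken by item id for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item_id for item_id, _ in candidates[:n]]
```

For example, with `predictions = {"IKNN": {"a": 0.5, "b": 0.2}, "AS": {"b": 1.0}}` and `weights = {"IKNN": 1.0, "AS": 0.5}`, item `b` is ranked ahead of item `a`.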
  5. Methods – Item-kNN
     • Observation: Very sparse user-item matrix (0.005%), 211M events
     • Goal: Next best items to click, estimating the recommendations of Xing
     • Method: Standard item-based kNN with special features
       › Input-output event types
       › Controlling popularity factor
       › Similarity of the same item is 0
       › Efficient implementation
     • Notation: IKNN(I,O)
       › I: input event type
       › O: output event type
     • Comment: No improvement from combining with other CF algorithms (MF, FM, User-kNN)
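A rough sketch of an item-based kNN with separate input and output event types is given below. The exact equations are in the paper and are not reproduced here: the co-occurrence similarity, the `alpha` exponent used as the popularity factor, and the function names are all assumptions for illustration, and time decay is left out for brevity.

```python
from collections import defaultdict


def train_iknn(events, in_type, out_type, alpha=0.5, k=100):
    """Build {item_i: [(item_j, similarity), ...]} from (user, item, type) events.

    Assumed similarity: co-occurrence count divided by
    support_in(i)**alpha * support_out(j)**(1 - alpha).
    """
    users_in = defaultdict(set)    # item -> users with an input-type event
    users_out = defaultdict(set)   # item -> users with an output-type event
    for user, item, etype in events:
        if etype == in_type:
            users_in[item].add(user)
        if etype == out_type:
            users_out[item].add(user)

    out_items_of = defaultdict(set)  # user -> items with an output-type event
    for item, users in users_out.items():
        for user in users:
            out_items_of[user].add(item)

    model = {}
    for i, users in users_in.items():
        co = defaultdict(int)
        for user in users:
            for j in out_items_of[user]:
                if j != i:  # similarity of the same item is 0
                    co[j] += 1
        sims = [
            (j, c / (len(users) ** alpha * len(users_out[j]) ** (1 - alpha)))
            for j, c in co.items()
        ]
        sims.sort(key=lambda pair: (-pair[1], pair[0]))
        model[i] = sims[:k]  # keep only the k nearest neighbours
    return model


def predict_iknn(model, user_items):
    """Score items by summing similarities to the items the user interacted with."""
    scores = defaultdict(float)
    for i in user_items:
        for j, sim in model.get(i, []):
            scores[j] += sim
    return dict(scores)
```

Under this assumed formula, a larger `alpha` penalizes popular input items more strongly and a smaller one penalizes popular output items.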
  6. Methods – Recalling recommendations
     • Chart: The distribution of impression events by the number of weeks in which the same item has already been shown
     • Observation: 38% of recommendations are recurring items
     • Goal: Reverse engineering, recalling recommendations
     • Method:
       › Recommendation of already shown items
       › Weighted by expected CTR
     • Notation: RCTR
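One possible reading of "already shown items weighted by expected CTR" is sketched below. It assumes that the expected CTR is estimated per number of weeks in which the item was previously shown to the user; this bucketing, the data layout, and the function names are assumptions, since the slide does not give the formula.

```python
from collections import defaultdict


def fit_ctr_by_recurrence(shown, clicked):
    """Estimate click-through rate per number of prior weeks an item was shown.

    shown:   {(user, item): list of week numbers in which the item was shown}
    clicked: set of (user, item, week) triples where the impression led to a click
    """
    shows = defaultdict(int)
    clicks = defaultdict(int)
    for (user, item), weeks in shown.items():
        for prior_weeks, week in enumerate(sorted(weeks)):
            shows[prior_weeks] += 1
            if (user, item, week) in clicked:
                clicks[prior_weeks] += 1
    return {n: clicks[n] / shows[n] for n in shows}


def predict_rctr(shown, ctr, user):
    """Score each item already shown to `user` by the CTR expected at its next showing."""
    scores = {}
    for (u, item), weeks in shown.items():
        if u == user:
            # The next showing would have len(weeks) prior weeks.
            scores[item] = ctr.get(len(weeks), 0.0)
    return scores
```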
  7. Methods – Already seen items
     • Chart: The probability of returning to an already seen item after interacting with other items
     • Observation: Significant probability of re-clicking on an already clicked item
     • Goal: Capturing the re-clicking phenomenon
     • Method: Recommendation of already clicked items
     • Notation: AS(I)
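The already-seen predictor can be sketched as a time-decayed sum over the user's past events of the chosen input type. The exponential half-life decay is an assumption: the slides only say that time decay is commonly used and leave the equations to the paper. The function name and the use of day numbers as timestamps are likewise illustrative.

```python
from collections import defaultdict


def predict_already_seen(events, user, event_type, now, half_life_days=7.0):
    """Score items the user has already interacted with, favouring recent events.

    events: iterable of (time_in_days, user_id, item_id, type) tuples
    Each matching event contributes 0.5 ** (age / half_life_days).
    """
    scores = defaultdict(float)
    for t, u, item, etype in events:
        if u == user and etype == event_type and t <= now:
            age = now - t
            scores[item] += 0.5 ** (age / half_life_days)
    return dict(scores)
```

With a half-life of 7 days, an item clicked a week ago receives half the score of an item clicked today.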
  8. Methods – User metadata-based popularity
     • Observation:
       › Significant amount of passive and new users
       › All target users have metadata
     • Goal:
       › Semi-personalized recommendations for new users
       › Improving accuracy on inactive users
     • Method:
       1. Item model: Expected popularity of an item in each user group
       2. Prediction: Average popularity of an item for a user
       › Applied keys: jobroles, edu_fieldofstudies
     • Notation: UPOP
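The two-step method above can be sketched as follows. A user group is taken to be one (key, value) pair of user metadata, and popularity is taken to be an item's share of the group's events; both definitions, and the function names, are assumptions made for the example.

```python
from collections import defaultdict


def fit_upop(events, user_meta):
    """Item model: popularity of each item within each user group.

    events:    iterable of (user_id, item_id) pairs
    user_meta: {user_id: {key: set of values}}, e.g. {"jobroles": {"dev"}}
    Returns {(key, value): {item_id: share of the group's events}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for user, item in events:
        for key, values in user_meta.get(user, {}).items():
            for value in values:
                counts[(key, value)][item] += 1
                totals[(key, value)] += 1
    return {
        group: {item: c / totals[group] for item, c in items.items()}
        for group, items in counts.items()
    }


def predict_upop(model, meta):
    """Prediction: average an item's popularity over the groups the user belongs to."""
    groups = [(k, v) for k, values in meta.items() for v in values if (k, v) in model]
    scores = defaultdict(float)
    for group in groups:
        for item, popularity in model[group].items():
            scores[item] += popularity / len(groups)
    return dict(scores)
```

Because the prediction needs only the user's metadata, it also produces scores for users who have no events at all.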
  9. Methods – MS: Meta cosine similarity
     • Observation:
       › Item cold-start problem, many low-supported items
       › Almost all items have metadata
     • Goal:
       › Model building for new items
       › Improving the model of low-supported items
     • Method:
       1. Item model: Metadata representation, tf-idf
       2. User model: Meta-words of items seen by the user
       3. Prediction: Average cosine similarity between user-item models
       › Keys: tags, title, industry_id, geo_country, geo_region, discipline_id
     • Notation: MS
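A minimal sketch of tf-idf item vectors and an averaged cosine similarity is shown below. It interprets the prediction as the mean cosine similarity between a candidate item and each item the user has seen; that interpretation, the smoothed idf formula, the "key:value" meta-word encoding, and the function names are assumptions, not details taken from the slides.

```python
from collections import Counter
from math import log, sqrt


def fit_tfidf(item_words):
    """Build unit-length tf-idf vectors from {item_id: [meta-word, ...]}.

    Meta-words are strings such as "title:java" or "geo_country:de".
    """
    n = len(item_words)
    df = Counter(w for words in item_words.values() for w in set(words))
    idf = {w: log(n / d) + 1.0 for w, d in df.items()}
    vectors = {}
    for item, words in item_words.items():
        vec = {w: c * idf[w] for w, c in Counter(words).items()}
        norm = sqrt(sum(x * x for x in vec.values()))
        vectors[item] = {w: x / norm for w, x in vec.items()} if norm else {}
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def predict_ms(vectors, seen_items, candidate):
    """Average similarity between the candidate and the items seen by the user."""
    sims = [cosine(vectors[s], vectors[candidate]) for s in seen_items if s in vectors]
    return sum(sims) / len(sims) if sims else 0.0
```

Since the score depends only on metadata, a brand-new item with no events can still be ranked.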
  10. Methods – AP: Age-based popularity change
     • Observation: Significant drop in the popularity of items at an age of ~30 and ~60 days
     • Goal: Under-scoring these items
     • Method: Expected ratio of the popularity in the next week
     • Notation: AP
  11. Methods – OM: The omit method
     • Observation: Unwanted items in recommendation lists
     • Goal: Omitting poorly modelled items of a predictor or combination
     • Method:
       1. Sub-train-test split
       2. Retraining a new combination
       3. Generating top-N recommendations
       4. Measuring how the total evaluation would change by omitting each item
       5. Omitting the worst K items from the original combination
     • Notation: OM
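Steps 4 and 5 of the omit method can be sketched with a brute-force loop. The evaluation function here is a plain hit count, a stand-in for the challenge metric, which is not reproduced in the slides; the function names and data layout are also assumptions. In the procedure on the slide, the ranked lists would come from the combination retrained on the sub-train split and `relevant` from the sub-test split.

```python
def hits(ranked, relevant, n):
    """Stand-in evaluation: number of relevant items in the users' top-n lists."""
    return sum(
        len(set(items[:n]) & relevant.get(user, set()))
        for user, items in ranked.items()
    )


def worst_items(ranked, relevant, n, k):
    """Return up to k items whose removal from all lists improves the evaluation most.

    ranked:   {user: full ranked candidate list, longer than n}
    relevant: {user: set of items the user actually interacted with}
    """
    base = hits(ranked, relevant, n)
    recommended = {item for items in ranked.values() for item in items[:n]}
    gain = {}
    for item in recommended:
        # Removing an item lets the next candidate move up into the top-n.
        filtered = {u: [i for i in items if i != item] for u, items in ranked.items()}
        gain[item] = hits(filtered, relevant, n) - base
    ordered = sorted(gain, key=lambda item: (-gain[item], item))
    return [item for item in ordered[:k] if gain[item] > 0]
```

Recomputing the evaluation once per item is quadratic in practice and only suitable for small examples; a real implementation would update the evaluation incrementally.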
  12. Methods – Optimization
     1. Time-based train-test split (test set: last week)
     2. Coordinate gradient descent optimization of various methods → candidate predictor set
     3. Support-based distinct user groups (new users, inactive users, 10 equal-sized groups of active users)
     4. Forward Predictor Selection
        1. Initialization:
           1. Predictors that are selected from the candidate set for the final combination → selected predictor set
           2. The selected predictor set is empty in the beginning
        2. Loop:
           1. Calculate the accuracy of the selected predictor set
           2. For each remaining candidate predictor, calculate the gain in accuracy that the predictor would give if it were moved to the selected set
           3. Move the best one to the selected set and recalculate the combination weights
           4. Repeat the loop while there is improvement and a candidate predictor remains
        3. Return: the set of the predictors and the corresponding weights
     5. Retrain the selected predictors on the full data set
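The Forward Predictor Selection loop in step 4 can be sketched as a greedy search. The weight fitting below is a simple coordinate search over a fixed grid, and the evaluation is a hit-at-1 count; both are simplified stand-ins, and all names (`GRID`, `fit_weights`, `make_hit_at_1`, `forward_selection`) are assumptions for illustration rather than the author's implementation.

```python
GRID = (0.0, 0.25, 0.5, 1.0, 2.0, 4.0)  # assumed weight grid


def combined(candidates, weights):
    """Linear combination of predictions keyed by (user, item)."""
    scores = {}
    for name, w in weights.items():
        for key, p in candidates[name].items():
            scores[key] = scores.get(key, 0.0) + w * p
    return scores


def make_hit_at_1(relevant):
    """Stand-in accuracy: users whose top-scored item is relevant."""
    def evaluate(scores):
        best = {}
        for (user, item), p in scores.items():
            if p > 0 and (user not in best or p > best[user][0]):
                best[user] = (p, item)
        return sum(1 for u, (_, item) in best.items() if item in relevant.get(u, ()))
    return evaluate


def fit_weights(candidates, names, evaluate, sweeps=2):
    """Coordinate search: tune one weight at a time over GRID."""
    weights = {name: 1.0 for name in names}
    for _ in range(sweeps):
        for name in names:
            weights[name] = max(
                GRID,
                key=lambda w: evaluate(combined(candidates, {**weights, name: w})),
            )
    return weights


def forward_selection(candidates, evaluate):
    """Greedily move the predictor with the largest gain into the selected set."""
    selected, weights, best = [], {}, evaluate({})
    remaining = set(candidates)
    while remaining:
        trials = {}
        for name in sorted(remaining):
            w = fit_weights(candidates, selected + [name], evaluate)
            trials[name] = (evaluate(combined(candidates, w)), w)
        name = max(sorted(trials), key=lambda n: trials[n][0])
        score, w = trials[name]
        if score <= best:  # no improvement: stop
            break
        selected.append(name)
        remaining.remove(name)
        weights, best = w, score
    return selected, weights, best
```

Each round refits the weights of the whole trial combination, which mirrors the "recalculate the combination weights" step; predictors that never improve the accuracy are left out of the result.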
  13. … let’s put it together and see how it performs!
  19. Evaluation – Forward Predictor Selection
     • Best single model
       › Item-kNN trained on positive interactions
       › 2.5 min training time
       › 7 ms prediction time
     • Sub-combinations
       › 4 models: 600K+ score (w/o item metadata)
       › 5 models: 3rd place
       › 6 models: 95% of final score
       › 10 models: 650K+ score (<30 mins. training time)
     • Final combination
       › 3rd place
       › ~666K leaderboard score
       › 11 instances
       › user-support-based weighting
       › 3h+ training time, 200 ms prediction time

     #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
     1    IKNN(C,C)      148       7          450,046   24
     2    +RCTR          208       15         548,338   9
     3    +AS(1)         237       17         590,526   6
     4    +UPOP          247       50         614,674   5
     5    +MS            364       122        623,909   3
     6    +IKNN(R,R)     1,150     168        635,278   3
     7    +AS(3)         1,205     178        636,498   3
     8    +IKNN(R,C)     1,557     197        643,145   3
     9    +AS(4)         1,582     202        644,710   3
     10   +AP            1,621     207        652,802   3
          SUPP_C(1-10)   1,639     194        661,359   3
     11   +OM            11,790    199        665,592   3
     tTR: training time, tPR: prediction time
     * Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
  20. Evaluation – Timeline
     [Chart: leaderboard rank and leaderboard score (thousands) by date, Apr-25 to Jun-27, across three phases: initial setup, model design and implementation, final sprint. The plotted scores run from 115.4 to 665.6.]
  21. Lessons learnt
     • Exploiting the specificity of the dataset
     • Using Item-kNN over factorization in a very sparse dataset
     • Paying attention to recurrence
     • Forward Predictor Selection is effective
     • Different optimization for different user groups
     • Under-scoring/omitting weak items
     • Ranking 200K items is slow
     • Keep it simple and transparent!
  22. Thank you for your attention!
     Presenter contact: Dávid Zibriczky, PhD, david.zibriczky@gmail.com