A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

1
A COMBINATION OF SIMPLE MODELS BY FORWARD
PREDICTOR SELECTION FOR JOB RECOMMENDATION
Dávid Zibriczky, PhD (DaveXster)
Budapest University of Technology and Economics,
Budapest, Hungary

2
The Dataset – Data preparation
• Events (interactions, impressions)
› Target format: (time,user_id,item_id,type,value)
› Interactions  Format OK
› Impressions:
• Generating unique (time,user_id,item_id) triples
• Value  count of their occurrence
• Time  12pm on Thursday of the week
• Type  5
• Catalog (items, users)
› Target format:(id,key1,key2,…,keyN)
› Items and users  Format OK
› Unknown „0” values  empty values
› Inconsistency: Geo-location vs. country/region  Metadata enhancement based on geo-location

3
The Dataset – Basic statistics
Size of training set
• 211M events, 2.8M users, 1.3M items
• Effect: huge and very sparse matrix
Distribution
• 95% of events are impressions
• 72% of the users have impressions only
• Item support for interactions is low (~9)
• Effect: weak collaboration using interactions
Target users
• 150K users
• 73% active, 16% inactive, 12% new
• Effect: user cold start and warm-up problem
Data source #events #users #items
Interactions 8,826,678 784,687 1,029,480
Impressions 201,872,093 2,755,167 846,814
All events 210,698,777 2,792,405 1,257,422
Catalog - 1,367,057 1,358,098
Catalog OR Events - 2,829,563 1,362,890

4
Methods – Concept
Terminology
• Method: A technique of estimating the relevance of an item for a user (p-Value)
• Predictor/model: An instance of a method with a specified parameter setting
• Combination: Linear combination of prediction values for a user-item pairs
Approach
1. Exploring the properties of the data set
2. Definition of „simple” methods with different functionality (time-decay is commonly used)*
3. Finding a set of relevant predictors and optimal combination of them
4. Top-N ranking of available event supported items with non-zero p-Values (~200K)
* Equations of the methods can be found in the paper

5
Methods – Item-kNN
• Observation: Very sparse user-item matrix (0.005%), 211M events
• Goal: Next best items to click, estimating recommendations of Xing
• Method: Standard Item-based kNN with special fetures
› Input-output event types
› Controlling popularity factor
› Similarity of the same item is 0
› Efficient implementation
• Notation: IKNN(I,O)
› I: input event type
› O: output event type
• Comment: No improvement combining other CF algorithms (MF, FM, User-kNN)

6
Methods – Recalling recommendations
• Chart: The distribution of impression
events by the number of weeks on that the
same item has already been shown
• Observation: 38% of recommendations
are recurring items
• Goal: Reverse engineering, recalling
recommendations
• Method:
› Recommendation of already shown items
› Weighted by expected CTR
• Notation: RCTR

7
Methods – Already seen items
• Chart: The probability of returning to an already
seen item after interacting on other items
• Observation: Significant probability of re-
clicking on an already clicked item
• Goal: Capturing re-clicking phenomena
• Method: Recommendation of already clicked
items
• Notation: AS(I)

8
Methods – User metadata-based popularity
• Observation:
› Significant amount of passive and new users
› All target users have metadata
• Goal:
› Semi-personalized recommendations for new users
› Improving accuracy on inactive users
• Method:
1. Item model: Expected popularity of an item in each user group
2. Prediction: Average popularity of an item for a user
› Applied keys: jobroles, edu_fieldofstudies
• Notation: UPOP

9
Methods – MS: Meta cosine similarity
• Observation:
› Item-cold start problem, many low-supported items
› Almost all items has metadata
• Goal:
› Model building for new items
› Improving the model of low-supported items
• Method:
1. Item model: Meta-data representation, tf-idf
2. User model: Meta-words of items seen by the user
3. Prediction: Average cosine similarity between user-item models
› Keys: tags, title, industry_id, geo_country, geo_region,
discipline_id
• Notation: MS

10
Methods – AP: Age-based popularity change
• Observation: Significant drop in popularity of
items with ~30 and ~60 days
• Goal: Underscoring these items
• Method: Expected ratio of the popularity in the
next week
• Notation: AP

11
Methods – OM: The omit method
• Observation: Unwanted items in recommendation lists
• Goal: Omitting poorly modelled items of a predictor or combination
• Method:
1. Sub-train-test split
2. Retrain a new combination
3. Generating top-N recommendations
4. Measuring how the total evaluation would change by omitting items
5. Omitting worst K items on the original combination
• Notation: OM

12
Methods – Optimization
1. Time-based train-test split (test set: last week)
2. Coordinate gradient descent optimization of various methods  candidate predictor set
3. Support-based distinct user groups (new users, inactive users, 10 equal sized group of active users)
4. Forward Predictor Selection
1. Initialization:
1. Predictors that are selected from the candidate set for final combination  selected predictor set
2. Selected predictor set is empty in the beginning
2. Loop:
1. Calculate the accuracy of selected predictor set
2. For all remained candidate predictor, calculate the gain in accuracy that would give the predictor if it
would be moved to the selected set
3. Move the best one to the selected set and recalculate combination weights
4. Repeat the loop until there is improvement or reamining candidate preditor
3. Return: the set of the predictors and corresponding weights
5. Retrain selected predictors on the full data set

13
… let’s put it together and see how it performs!

14
Evaluation – Forward Predictor Selection
• Best single model
› Item-kNN trained on positive interactions
› 2.5 min training time
› 7 ms prediction time
# Predictor tTR(s)* tPR(ms)* Score Rank
1 IKNN(C,C) 148 7 450,046 24
* Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory

15
• Sub-combinations
› 4 models: 600K+ score (w/o item metadata)
1 IKNN(C,C) 148 7 450,046 24
2 +RCTR 208 15 548,338 9
3 +AS(1) 237 17 590,526 6
4 +UPOP 247 50 614,674 5

16
› 5 models: 3rd place
1 IKNN(C,C) 148 7 450,046 24
2 +RCTR 208 15 548,338 9
3 +AS(1) 237 17 590,526 6
4 +UPOP 247 50 614,674 5
5 +MS 364 122 623,909 3

17
› 6 models: 95% of final score
1 IKNN(C,C) 148 7 450,046 24
2 +RCTR 208 15 548,338 9
3 +AS(1) 237 17 590,526 6
4 +UPOP 247 50 614,674 5
5 +MS 364 122 623,909 3
6 +IKNN(R,R) 1,150 168 635,278 3

18
› 10 models: 650K+ score (<30 mins. training time)
1 IKNN(C,C) 148 7 450,046 24
2 +RCTR 208 15 548,338 9
3 +AS(1) 237 17 590,526 6
4 +UPOP 247 50 614,674 5
5 +MS 364 122 623,909 3
6 +IKNN(R,R) 1,150 168 635,278 3
7 +AS(3) 1,205 178 636,498 3
8 +IKNN(R,C) 1,557 197 643,145 3
9 +AS(4) 1,582 202 644,710 3
10 +AP 1,621 207 652,802 3

19
› 10 models: 650K+ score (<30 mins. training time)
• Final combination
› 3rd place
› ~666K leaderboard score
› 11 instances
› user-support-based weighting
› 3h+ training time, 200 ms prediction time
1 IKNN(C,C) 148 7 450,046 24
2 +RCTR 208 15 548,338 9
3 +AS(1) 237 17 590,526 6
4 +UPOP 247 50 614,674 5
5 +MS 364 122 623,909 3
6 +IKNN(R,R) 1,150 168 635,278 3
7 +AS(3) 1,205 178 636,498 3
8 +IKNN(R,C) 1,557 197 643,145 3
9 +AS(4) 1,582 202 644,710 3
10 +AP 1,621 207 652,802 3
SUPP_C(1-10) 1,639 194 661,359 3
11 +OM 11,790 199 665,592 3
* Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory

20
Evaluation – Timeline
39
1514141415
121110
2 3
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3
115.4
366.9
418.7
438.3
454.2
468.4
481.9
513.4
569.6
596.5
600.2
603.2
610.0
611.3
611.6
625.2
627.2
627.5
628.9
633.1
637.6
638.1
639.7
640.4
643.5
644.7
652.8
653.2
653.7
665.6
0
5
10
15
20
25
30
35
40
45
0.0
100.0
200.0
300.0
400.0
500.0
600.0
700.0
800.0
Apr-25
May-02
May-09
May-16
May-23
May-30
Jun-06
Jun-13
Jun-20
Jun-27
Leaderboardrank
Leaderboardscore(thousands)
Date
Timeline
Initial setup Model design and implementation Final sprint

21
Lessons learnt
• Exploiting the specificity of the dataset
• Using Item-kNN over factorization in a very sparse dataset
• Paying attention to recurrence
• Forward Predictor Selection is effective
• Different optimization for different user groups
• Underscoring/omitting weak items
• Ranking 200K items is slow
• Keep it simple and transparent!

22
Presenter
Contact
Thank you for your attention!
Dávid Zibriczky, PhD
david.zibriczky@gmail.com

A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

Similar to A Combination of Simple Models by Forward Predictor Selection for Job Recommendation (20)

More from David Zibriczky

More from David Zibriczky (10)

Recently uploaded

Recently uploaded (20)

A Combination of Simple Models by Forward Predictor Selection for Job Recommendation