1
A COMBINATION OF SIMPLE MODELS BY FORWARD
PREDICTOR SELECTION FOR JOB RECOMMENDATION
Dávid Zibriczky, PhD (DaveXster)
Budapest University of Technology and Economics,
Budapest, Hungary
2
The Dataset – Data preparation
• Events (interactions, impressions)
› Target format: (time,user_id,item_id,type,value)
› Interactions → Format OK
› Impressions:
• Generating unique (time,user_id,item_id) triples
• Value → count of their occurrence
• Time → 12pm on Thursday of the week
• Type → 5
• Catalog (items, users)
› Target format: (id,key1,key2,…,keyN)
› Items and users → Format OK
› Unknown "0" values → empty values
› Inconsistency: Geo-location vs. country/region → Metadata enhancement based on geo-location
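The impression conversion described above can be sketched as follows. This is a minimal illustration, not the author's code: the input layout `(user_id, item_id, year, week)`, ISO week numbering, UTC timestamps and the function names are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

IMPRESSION_TYPE = 5  # event type assigned to impressions on the slide


def thursday_noon(year, week):
    """Unix timestamp of 12:00 on Thursday of the given ISO week (UTC assumed)."""
    day = datetime.fromisocalendar(year, week, 4)  # 4 = Thursday
    noon = (day + timedelta(hours=12)).replace(tzinfo=timezone.utc)
    return int(noon.timestamp())


def aggregate_impressions(rows):
    """rows: iterable of (user_id, item_id, year, week), one per shown item.

    Returns events in the target format (time, user_id, item_id, type, value),
    where value is the number of times the triple occurred.
    """
    counts = Counter(
        (thursday_noon(year, week), user_id, item_id)
        for user_id, item_id, year, week in rows
    )
    return [
        (time, user_id, item_id, IMPRESSION_TYPE, value)
        for (time, user_id, item_id), value in sorted(counts.items())
    ]


# Hypothetical input: item 10 shown twice to user 1 and once to user 2.
rows = [(1, 10, 2016, 1), (1, 10, 2016, 1), (2, 10, 2016, 1)]
events = aggregate_impressions(rows)
```

Collapsing repeated impressions into one event with a count keeps the event list compact while preserving how often an item was shown.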
3
The Dataset – Basic statistics
Size of training set
• 211M events, 2.8M users, 1.3M items
• Effect: huge and very sparse matrix
Distribution
• 95% of events are impressions
• 72% of the users have impressions only
• Item support for interactions is low (~9)
• Effect: weak collaboration using interactions
Target users
• 150K users
• 73% active, 16% inactive, 12% new
• Effect: user cold start and warm-up problem
| Data source | #events | #users | #items |
|---|---|---|---|
| Interactions | 8,826,678 | 784,687 | 1,029,480 |
| Impressions | 201,872,093 | 2,755,167 | 846,814 |
| All events | 210,698,777 | 2,792,405 | 1,257,422 |
| Catalog | - | 1,367,057 | 1,358,098 |
| Catalog OR Events | - | 2,829,563 | 1,362,890 |
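The headline figures on this slide can be re-derived from the table. The short check below uses illustrative variable names; the density value is only an upper bound, because it counts every event as a distinct user-item cell.

```python
# Figures copied from the statistics table.
interaction_events = 8_826_678
interaction_items = 1_029_480
impression_events = 201_872_093
all_events = 210_698_777
all_users = 2_792_405
all_items = 1_257_422

# Average number of interaction events per item ("item support").
item_support = interaction_events / interaction_items

# Share of impressions among all events.
impression_share = impression_events / all_events

# Upper bound on the density of the user-item matrix: repeated events on
# the same (user, item) pair are not merged, so the true density is lower.
density_upper_bound = all_events / (all_users * all_items)

print(f"item support: {item_support:.1f}")
print(f"impression share: {impression_share:.1%}")
print(f"density upper bound: {density_upper_bound:.4%}")
```

The item support comes out at about 8.6 events per item, which the slide rounds to ~9.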
4
Methods – Concept
Terminology
• Method: A technique of estimating the relevance of an item for a user (p-Value)
• Predictor/model: An instance of a method with a specified parameter setting
• Combination: Linear combination of the prediction values for a user-item pair
Approach
1. Exploring the properties of the data set
2. Defining "simple" methods with different functionality (time decay is commonly used)*
3. Finding a set of relevant predictors and an optimal combination of them
4. Top-N ranking of the available event-supported items with non-zero p-Values (~200K)
* Equations of the methods can be found in the paper
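A minimal sketch of the combination and ranking steps, assuming each predictor returns a dictionary of p-Values per item for one user. The data layout and the names `combine` and `top_n` are illustrative and not taken from the paper.

```python
import heapq


def combine(predictions, weights):
    """Linear combination of per-predictor p-values for one user.

    predictions: {predictor_name: {item_id: p_value}}
    weights:     {predictor_name: weight}
    """
    combined = {}
    for name, scores in predictions.items():
        w = weights.get(name, 0.0)
        for item_id, p in scores.items():
            combined[item_id] = combined.get(item_id, 0.0) + w * p
    return combined


def top_n(combined, n):
    """Top-N items by combined p-value, skipping items with a zero score."""
    candidates = ((p, item) for item, p in combined.items() if p != 0.0)
    return [item for p, item in heapq.nlargest(n, candidates)]


# Hypothetical predictor outputs for a single user.
predictions = {"IKNN": {1: 0.5, 2: 0.2}, "AS": {2: 1.0, 3: 0.0}}
weights = {"IKNN": 1.0, "AS": 0.5}
ranking = top_n(combine(predictions, weights), 2)
```

Using a heap for the top-N step avoids sorting all candidate items for every user, which matters when roughly 200K items are ranked.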
5
Methods – Item-kNN
• Observation: Very sparse user-item matrix (0.005%), 211M events
• Goal: Next best items to click, estimating recommendations of Xing
• Method: Standard Item-based kNN with special features
› Input-output event types
› Controlling popularity factor
› Similarity of the same item is 0
› Efficient implementation
• Notation: IKNN(I,O)
› I: input event type
› O: output event type
• Comment: No improvement combining other CF algorithms (MF, FM, User-kNN)
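One possible shape of an item-based kNN with separate input and output event types is sketched below. The exact similarity used by the author is given in the paper, not on the slide; the co-occurrence count, the `alpha` popularity exponent and the function names here are assumptions chosen for illustration.

```python
from collections import defaultdict


def train_item_knn(events, input_type, output_type, alpha=0.5, k=100):
    """events: iterable of (user_id, item_id, event_type).

    Returns {input_item: [(output_item, similarity), ...]} with at most k
    neighbours per item.
    """
    in_items = defaultdict(set)   # user -> items with an input-type event
    out_items = defaultdict(set)  # user -> items with an output-type event
    for user, item, etype in events:
        if etype == input_type:
            in_items[user].add(item)
        if etype == output_type:
            out_items[user].add(item)

    out_support = defaultdict(int)  # number of users per output item
    for items in out_items.values():
        for j in items:
            out_support[j] += 1

    cooc = defaultdict(lambda: defaultdict(int))
    for user, items in in_items.items():
        for i in items:
            for j in out_items.get(user, ()):
                if i != j:  # similarity of an item to itself is 0
                    cooc[i][j] += 1

    model = {}
    for i, row in cooc.items():
        # alpha controls how strongly popular output items are penalised.
        sims = [(j, c / out_support[j] ** alpha) for j, c in row.items()]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        model[i] = sims[:k]
    return model


def predict(model, user_input_items):
    """Sum the similarities of the neighbours of the user's input items."""
    scores = defaultdict(float)
    for i in user_input_items:
        for j, sim in model.get(i, ()):
            scores[j] += sim
    return dict(scores)


# Hypothetical events; "C" stands for one event type.
events = [(1, "a", "C"), (1, "b", "C"), (2, "a", "C"), (2, "b", "C"), (2, "c", "C")]
model = train_item_knn(events, "C", "C")
scores = predict(model, ["a"])
```

Training on one event type and predicting another corresponds to the IKNN(I,O) notation, for example IKNN(R,C) in the evaluation tables.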
6
Methods – Recalling recommendations
• Chart: The distribution of impression events by the number of weeks in which the same item has already been shown
• Observation: 38% of recommendations are recurring items
• Goal: Reverse engineering, recalling recommendations
• Method:
› Recommendation of already shown items
› Weighted by expected CTR
• Notation: RCTR
7
Methods – Already seen items
• Chart: The probability of returning to an already seen item after interacting with other items
• Observation: Significant probability of re-clicking on an already clicked item
• Goal: Capturing the re-clicking phenomenon
• Method: Recommendation of already clicked items
• Notation: AS(I)
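A sketch of the already-seen predictor, assuming events in the target format `(time, user_id, item_id, type, value)` with time in seconds. The exponential time decay and the `half_life_days` parameter are assumptions; the slides only state that time decay is commonly used.

```python
import math


def already_seen(events, user_id, event_type, now, half_life_days=7.0):
    """Score items the user has already interacted with via the given event type.

    Each past event contributes its value, discounted by its age.
    """
    decay = math.log(2) / (half_life_days * 86400)
    scores = {}
    for time, user, item, etype, value in events:
        if user == user_id and etype == event_type and time <= now:
            weight = math.exp(-decay * (now - time))
            scores[item] = scores.get(item, 0.0) + value * weight
    return scores


# Hypothetical events: user 1 has two type-1 events, one week apart.
week = 7 * 86400
events = [(0, 1, 10, 1, 1), (week, 1, 11, 1, 1), (0, 2, 12, 1, 1), (0, 1, 13, 3, 1)]
scores = already_seen(events, user_id=1, event_type=1, now=week)
```

With a seven-day half-life, the item touched a week ago receives half the score of the item touched just now.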
8
Methods – User metadata-based popularity
• Observation:
› Significant amount of passive and new users
› All target users have metadata
• Goal:
› Semi-personalized recommendations for new users
› Improving accuracy on inactive users
• Method:
1. Item model: Expected popularity of an item in each user group
2. Prediction: Average popularity of an item for a user
› Applied keys: jobroles, edu_fieldofstudies
• Notation: UPOP
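The two steps above can be sketched as follows. Measuring popularity as an item's share of a group's events is an assumption, as are the `key=value` group labels and the function names; the paper holds the actual definition.

```python
from collections import defaultdict


def train_upop(events, user_meta):
    """events: iterable of (user_id, item_id); user_meta: {user_id: set of groups}.

    Returns {group: {item_id: share of the group's events on that item}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for user, item in events:
        for group in user_meta.get(user, ()):
            counts[group][item] += 1
            totals[group] += 1
    return {
        g: {item: c / totals[g] for item, c in items.items()}
        for g, items in counts.items()
    }


def predict_upop(model, groups):
    """Average popularity of each item over the groups the user belongs to."""
    known = [g for g in groups if g in model]
    scores = defaultdict(float)
    for g in known:
        for item, pop in model[g].items():
            scores[item] += pop / len(known)
    return dict(scores)


# Hypothetical metadata: groups are "key=value" strings.
user_meta = {1: {"jobroles=dev"}, 2: {"jobroles=dev", "edu_fieldofstudies=cs"}}
events = [(1, "a"), (2, "a"), (2, "b")]
model = train_upop(events, user_meta)
new_user_scores = predict_upop(model, {"jobroles=dev"})
```

Because the prediction needs only the user's metadata, it also works for users without any events.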
9
Methods – MS: Meta cosine similarity
• Observation:
› Item-cold start problem, many low-supported items
› Almost all items have metadata
• Goal:
› Model building for new items
› Improving the model of low-supported items
• Method:
1. Item model: Meta-data representation, tf-idf
2. User model: Meta-words of items seen by the user
3. Prediction: Average cosine similarity between user-item models
› Keys: tags, title, industry_id, geo_country, geo_region, discipline_id
• Notation: MS
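A sketch of the three steps in plain Python. The idf variant `log(1 + n/df)`, the reading of the prediction step as an average over the items seen by the user, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def build_item_vectors(item_tokens):
    """item_tokens: {item_id: list of meta-words}. Returns L2-normalised tf-idf vectors."""
    n = len(item_tokens)
    df = Counter(tok for toks in item_tokens.values() for tok in set(toks))
    vectors = {}
    for item, toks in item_tokens.items():
        tf = Counter(toks)
        vec = {t: c * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[item] = {t: w / norm for t, w in vec.items()}
    return vectors


def cosine(u, v):
    """Dot product of two normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def predict_ms(vectors, seen_items, candidate):
    """Average cosine similarity between the candidate and the items seen by the user."""
    seen = [i for i in seen_items if i in vectors]
    if not seen or candidate not in vectors:
        return 0.0
    return sum(cosine(vectors[i], vectors[candidate]) for i in seen) / len(seen)


# Hypothetical meta-words per item.
item_tokens = {
    1: ["java", "berlin"],
    2: ["java", "munich"],
    3: ["nurse", "berlin"],
    4: ["chef", "paris"],
}
vectors = build_item_vectors(item_tokens)
```

Since the score depends only on item metadata, a new item with no events can still be ranked.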
10
Methods – AP: Age-based popularity change
• Observation: Significant drop in the popularity of items at ages of ~30 and ~60 days
• Goal: Under-scoring (down-weighting) these items
• Method: Expected ratio of the popularity in the next week
• Notation: AP
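One way to estimate such a ratio is sketched below, assuming historical observations of the form `(age_days, events_this_week, events_next_week)` per item and week. The observation layout, the per-day bucketing and the default of 1.0 for unseen ages are assumptions.

```python
from collections import defaultdict


def train_age_ratio(observations):
    """Expected next-week popularity relative to the current week, per item age."""
    this_week = defaultdict(int)
    next_week = defaultdict(int)
    for age, now_count, next_count in observations:
        this_week[age] += now_count
        next_week[age] += next_count
    return {
        age: next_week[age] / total
        for age, total in this_week.items()
        if total > 0
    }


def predict_ap(ratios, age_days, default=1.0):
    """Ratio for an item of the given age; ages never observed get the default."""
    return ratios.get(age_days, default)


# Hypothetical history: popularity holds at 29 days and drops at 30 days.
observations = [(29, 10, 10), (30, 10, 2), (30, 10, 4)]
ratios = train_age_ratio(observations)
```

A ratio well below 1 at a given age signals that items of that age are about to lose popularity.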
11
Methods – OM: The omit method
• Observation: Unwanted items in recommendation lists
• Goal: Omitting poorly modelled items of a predictor or combination
• Method:
1. Sub-train-test split
2. Retrain a new combination
3. Generating top-N recommendations
4. Measuring how the total evaluation would change by omitting items
5. Omitting worst K items on the original combination
• Notation: OM
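Steps 3 to 5 can be illustrated with a brute-force sketch: re-evaluate the sub-test set with each item removed and omit the items whose removal helps most. The `evaluate` callback, the simple hit-count metric and the function names are assumptions; the challenge used its own scoring function.

```python
def omit_worst_items(rec_lists, truth, evaluate, k, top_n):
    """Return up to k items whose removal improves the evaluation most.

    rec_lists: {user: ranked item list, ideally longer than top_n}
    truth:     {user: set of relevant items in the sub-test set}
    evaluate:  function(cut_lists, truth) -> score, higher is better
    """
    def cut(banned):
        return {
            user: [i for i in items if i not in banned][:top_n]
            for user, items in rec_lists.items()
        }

    baseline_lists = cut(set())
    base = evaluate(baseline_lists, truth)
    candidates = {i for items in baseline_lists.values() for i in items}
    # Gain in total evaluation if a single item is omitted everywhere.
    gains = {i: evaluate(cut({i}), truth) - base for i in candidates}
    harmful = [i for i in gains if gains[i] > 0]
    harmful.sort(key=lambda i: gains[i], reverse=True)
    return set(harmful[:k])


def hits(lists, truth):
    """Toy metric: number of recommended items that are relevant."""
    return sum(len(set(items) & truth.get(u, set())) for u, items in lists.items())


# Hypothetical lists: item "x" takes a slot for both users but is never relevant.
rec_lists = {1: ["x", "a", "b"], 2: ["x", "c", "d"]}
truth = {1: {"b"}, 2: {"d"}}
omitted = omit_worst_items(rec_lists, truth, hits, k=1, top_n=2)
```

This version costs one evaluation per candidate item, which may explain why the OM step dominates the training time reported in the evaluation.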
12
Methods – Optimization
1. Time-based train-test split (test set: last week)
2. Coordinate gradient descent optimization of various methods → candidate predictor set
3. Support-based distinct user groups (new users, inactive users, 10 equal-sized groups of active users)
4. Forward Predictor Selection
   1. Initialization:
      1. Selected predictor set: the predictors moved from the candidate set into the final combination
      2. The selected predictor set is empty at the beginning
   2. Loop:
      1. Calculate the accuracy of the selected predictor set
      2. For each remaining candidate predictor, calculate the gain in accuracy it would give if it were moved to the selected set
      3. Move the best one to the selected set and recalculate the combination weights
      4. Repeat the loop while there is an improvement and candidate predictors remain
   3. Return: the set of selected predictors and the corresponding weights
5. Retrain selected predictors on the full data set
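The selection loop in step 4 can be sketched as below. The `evaluate` callback (accuracy of a weighted combination on the test week), the grid-based coordinate search used to refit weights, and the names `fit_weights` and `forward_predictor_selection` are assumptions made for illustration.

```python
def fit_weights(names, evaluate, grid=(0.0, 0.25, 0.5, 1.0, 2.0, 4.0), sweeps=3):
    """Coordinate search: tune one weight at a time, keeping the others fixed."""
    weights = {name: 1.0 for name in names}
    best = evaluate(weights)
    for _ in range(sweeps):
        for name in names:
            for value in grid:
                trial = {**weights, name: value}
                score = evaluate(trial)
                if score > best:
                    best, weights = score, trial
    return weights, best


def forward_predictor_selection(candidates, evaluate, min_gain=0.0):
    """Greedily move the most helpful candidate into the selected set."""
    selected, weights, best = [], {}, evaluate({})
    remaining = list(candidates)
    while remaining:
        # Refit the combination once for each remaining candidate.
        trials = {c: fit_weights(selected + [c], evaluate) for c in remaining}
        winner = max(trials, key=lambda c: trials[c][1])
        new_weights, score = trials[winner]
        if score - best <= min_gain:
            break  # no remaining candidate improves the combination
        selected.append(winner)
        remaining.remove(winner)
        weights, best = new_weights, score
    return selected, weights, best


# Toy accuracy function: optimum at A=1.0, B=0.5; predictor C is irrelevant.
def toy_accuracy(w):
    return -((w.get("A", 0.0) - 1.0) ** 2 + (w.get("B", 0.0) - 0.5) ** 2)


selected, weights, best = forward_predictor_selection(["A", "B", "C"], toy_accuracy)
```

In the toy run, A is selected first, then B, and the loop stops because adding C brings no gain.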
13
… let’s put it together and see how it performs!
Evaluation – Forward Predictor Selection
• Best single model
› Item-kNN trained on positive interactions
› 2.5 min training time
› 7 ms prediction time
• Sub-combinations
› 4 models: 600K+ score (w/o item metadata)
› 5 models: 3rd place
› 6 models: 95% of final score
› 10 models: 650K+ score (<30 mins. training time)
• Final combination
› 3rd place
› ~666K leaderboard score
› 11 instances
› user-support-based weighting
› 3h+ training time, 200 ms prediction time
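| # | Predictor | Training time tTR (s)* | Prediction time tPR (ms)* | Score | Rank |
|---|---|---|---|---|---|
| 1 | IKNN(C,C) | 148 | 7 | 450,046 | 24 |
| 2 | +RCTR | 208 | 15 | 548,338 | 9 |
| 3 | +AS(1) | 237 | 17 | 590,526 | 6 |
| 4 | +UPOP | 247 | 50 | 614,674 | 5 |
| 5 | +MS | 364 | 122 | 623,909 | 3 |
| 6 | +IKNN(R,R) | 1,150 | 168 | 635,278 | 3 |
| 7 | +AS(3) | 1,205 | 178 | 636,498 | 3 |
| 8 | +IKNN(R,C) | 1,557 | 197 | 643,145 | 3 |
| 9 | +AS(4) | 1,582 | 202 | 644,710 | 3 |
| 10 | +AP | 1,621 | 207 | 652,802 | 3 |
| | SUPP_C(1-10) | 1,639 | 194 | 661,359 | 3 |
| 11 | +OM | 11,790 | 199 | 665,592 | 3 |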
* Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
20
Evaluation – Timeline
[Chart: leaderboard rank and leaderboard score (thousands) by date, from Apr-25 to Jun-27, across three phases: Initial setup, Model design and implementation, Final sprint. The score labels rise from 115.4 to 665.6; the rank labels start at 39.]
21
Lessons learnt
• Exploiting the specificity of the dataset
• Using Item-kNN over factorization in a very sparse dataset
• Paying attention to recurrence
• Forward Predictor Selection is effective
• Different optimization for different user groups
• Under-scoring/omitting weak items
• Ranking 200K items is slow
• Keep it simple and transparent!
22
Presenter
Contact
Thank you for your attention!
Dávid Zibriczky, PhD
david.zibriczky@gmail.com

More Related Content

What's hot

Collaborative filtering at scale
Collaborative filtering at scaleCollaborative filtering at scale
Collaborative filtering at scale
huguk
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
GTC 2021: Counterfactual Learning to Rank in E-commerce
GTC 2021: Counterfactual Learning to Rank in E-commerceGTC 2021: Counterfactual Learning to Rank in E-commerce
GTC 2021: Counterfactual Learning to Rank in E-commerce
GrubhubTech
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
youalab
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Lior Rokach
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
Ernesto Mislej
 

What's hot (20)

Recsys2021_slides_sato
Recsys2021_slides_satoRecsys2021_slides_sato
Recsys2021_slides_sato
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Collaborative filtering at scale
Collaborative filtering at scaleCollaborative filtering at scale
Collaborative filtering at scale
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
GTC 2021: Counterfactual Learning to Rank in E-commerce
GTC 2021: Counterfactual Learning to Rank in E-commerceGTC 2021: Counterfactual Learning to Rank in E-commerce
GTC 2021: Counterfactual Learning to Rank in E-commerce
 
Facebook Talk at Netflix ML Platform meetup Sep 2019
Facebook Talk at Netflix ML Platform meetup Sep 2019Facebook Talk at Netflix ML Platform meetup Sep 2019
Facebook Talk at Netflix ML Platform meetup Sep 2019
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
 
Movie Recommendation engine
Movie Recommendation engineMovie Recommendation engine
Movie Recommendation engine
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CF
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
 
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
 

Similar to A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
Khadija Atiya
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
khairulhuda242
 
Cikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business ValueCikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business Value
Xavier Amatriain
 

Similar to A Combination of Simple Models by Forward Predictor Selection for Job Recommendation (20)

[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)
 
Multi-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender SystemsMulti-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender Systems
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...
 
An introduction to variable and feature selection
An introduction to variable and feature selectionAn introduction to variable and feature selection
An introduction to variable and feature selection
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
 
A Machine learning approach to classify a pair of sentence as duplicate or not.
A Machine learning approach to classify a pair of sentence as duplicate or not.A Machine learning approach to classify a pair of sentence as duplicate or not.
A Machine learning approach to classify a pair of sentence as duplicate or not.
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
 
Recommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningRecommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learning
 
Concept Location using Information Retrieval and Relevance Feedback
Concept Location using Information Retrieval and Relevance FeedbackConcept Location using Information Retrieval and Relevance Feedback
Concept Location using Information Retrieval and Relevance Feedback
 
Irrf Presentation
Irrf PresentationIrrf Presentation
Irrf Presentation
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 
kdd2015
kdd2015kdd2015
kdd2015
 
Sbst2018 contest2018
Sbst2018 contest2018Sbst2018 contest2018
Sbst2018 contest2018
 
Cikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business ValueCikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business Value
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
 

More from David Zibriczky

More from David Zibriczky (10)

Highlights from the 8th ACM Conference on Recommender Systems (RecSys 2014)
Highlights from the 8th ACM Conference on Recommender Systems (RecSys 2014)Highlights from the 8th ACM Conference on Recommender Systems (RecSys 2014)
Highlights from the 8th ACM Conference on Recommender Systems (RecSys 2014)
 
Predictive Solutions and Analytics for TV & Entertainment Businesses
Predictive Solutions and Analytics for TV & Entertainment BusinessesPredictive Solutions and Analytics for TV & Entertainment Businesses
Predictive Solutions and Analytics for TV & Entertainment Businesses
 
Improving the TV User Experience by Algorithms: Personalized Content Recommen...
Improving the TV User Experience by Algorithms: Personalized Content Recommen...Improving the TV User Experience by Algorithms: Personalized Content Recommen...
Improving the TV User Experience by Algorithms: Personalized Content Recommen...
 
Recommender Systems meet Finance - A literature review
Recommender Systems meet Finance - A literature reviewRecommender Systems meet Finance - A literature review
Recommender Systems meet Finance - A literature review
 
Fast ALS-Based Matrix Factorization for Recommender Systems
Fast ALS-Based Matrix Factorization for Recommender SystemsFast ALS-Based Matrix Factorization for Recommender Systems
Fast ALS-Based Matrix Factorization for Recommender Systems
 
EPG content recommendation in large scale: a case study on interactive TV pla...
EPG content recommendation in large scale: a case study on interactive TV pla...EPG content recommendation in large scale: a case study on interactive TV pla...
EPG content recommendation in large scale: a case study on interactive TV pla...
 
Personalized recommendation of linear content on interactive TV platforms
Personalized recommendation of linear content on interactive TV platformsPersonalized recommendation of linear content on interactive TV platforms
Personalized recommendation of linear content on interactive TV platforms
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Data Modeling in IPTV and OTT Recommender Systems
Data Modeling in IPTV and OTT Recommender SystemsData Modeling in IPTV and OTT Recommender Systems
Data Modeling in IPTV and OTT Recommender Systems
 
Entropy based asset pricing
Entropy based asset pricingEntropy based asset pricing
Entropy based asset pricing
 

Recently uploaded

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

  • 1. 1 A COMBINATION OF SIMPLE MODELS BY FORWARD PREDICTOR SELECTION FOR JOB RECOMMENDATION Dávid Zibriczky, PhD (DaveXster) Budapest University of Technology and Economics, Budapest, Hungary
  • 2. 2 The Dataset – Data preparation • Events (interactions, impressions) › Target format: (time,user_id,item_id,type,value) › Interactions → Format OK › Impressions: • Generating unique (time,user_id,item_id) triples • Value → count of their occurrence • Time → 12pm on Thursday of the week • Type → 5 • Catalog (items, users) › Target format: (id,key1,key2,…,keyN) › Items and users → Format OK › Unknown „0” values → empty values › Inconsistency: Geo-location vs. country/region → Metadata enhancement based on geo-location
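The impression conversion on this slide can be sketched as follows. This is only an illustration under assumptions that the slides do not confirm: the raw rows are taken to be hypothetical `(user_id, item_id, iso_year, iso_week)` tuples, "week" is read as an ISO calendar week, and the function name is made up for the example.

```python
from collections import Counter
from datetime import date, datetime, time

IMPRESSION_TYPE = 5  # event type assigned to impressions on the slide


def impressions_to_events(rows):
    """Turn raw impression rows into (time, user_id, item_id, type, value) events.

    rows: iterable of hypothetical (user_id, item_id, iso_year, iso_week) tuples.
    """
    counts = Counter(rows)  # value = number of occurrences of each unique row
    events = []
    for (user_id, item_id, year, week), n in counts.items():
        # ISO weekday 4 is Thursday; requires Python 3.8+
        thursday = date.fromisocalendar(year, week, 4)
        ts = datetime.combine(thursday, time(12, 0))  # 12pm on that Thursday
        events.append((ts, user_id, item_id, IMPRESSION_TYPE, n))
    return sorted(events)
```

Repeated impressions of the same item to the same user within a week collapse into one event whose value is the repeat count.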
  • 3. 3 The Dataset – Basic statistics Size of training set • 211M events, 2.8M users, 1.3M items • Effect: huge and very sparse matrix Distribution • 95% of events are impressions • 72% of the users have impressions only • Item support for interactions is low (~9) • Effect: weak collaboration using interactions Target users • 150K users • 73% active, 16% inactive, 12% new • Effect: user cold start and warm-up problem
    Data source        #events      #users     #items
    Interactions       8,826,678    784,687    1,029,480
    Impressions        201,872,093  2,755,167  846,814
    All events         210,698,777  2,792,405  1,257,422
    Catalog            -            1,367,057  1,358,098
    Catalog OR Events  -            2,829,563  1,362,890
  • 4. 4 Methods – Concept Terminology • Method: A technique of estimating the relevance of an item for a user (p-Value) • Predictor/model: An instance of a method with a specified parameter setting • Combination: Linear combination of prediction values for user-item pairs Approach 1. Exploring the properties of the data set 2. Definition of „simple” methods with different functionality (time-decay is commonly used)* 3. Finding a set of relevant predictors and an optimal combination of them 4. Top-N ranking of available event-supported items with non-zero p-Values (~200K) * Equations of the methods can be found in the paper
  • 5. 5 Methods – Item-kNN • Observation: Very sparse user-item matrix (0.005%), 211M events • Goal: Next best items to click, estimating recommendations of Xing • Method: Standard Item-based kNN with special features › Input-output event types › Controlling popularity factor › Similarity of the same item is 0 › Efficient implementation • Notation: IKNN(I,O) › I: input event type › O: output event type • Comment: No improvement from combining with other CF algorithms (MF, FM, User-kNN)
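A minimal sketch of an item-based kNN with separate input and output event types may make the IKNN(I,O) notation more concrete. The exact similarity used by the author is in the paper, not on the slide, so the formula below (co-occurrence count normalised by item supports, with a hypothetical `alpha` exponent as the popularity control) is an assumption, as are the function name and the event tuple layout.

```python
from collections import defaultdict


def item_knn(events, in_type, out_type, k=2, alpha=0.5):
    """Build neighbour lists: item -> [(neighbour, similarity), ...].

    events: iterable of (user_id, item_id, event_type) tuples (assumed layout).
    alpha:  hypothetical popularity control; 0.5 gives plain cosine similarity,
            larger values penalise popular output items more strongly.
    """
    users_in = defaultdict(set)   # item -> users with an input-type event
    users_out = defaultdict(set)  # item -> users with an output-type event
    for user, item, etype in events:
        if etype == in_type:
            users_in[item].add(user)
        if etype == out_type:
            users_out[item].add(user)

    neighbours = {}
    for i, ui in users_in.items():
        sims = []
        for j, uj in users_out.items():
            if i == j:
                continue  # similarity of the same item is 0
            co = len(ui & uj)  # users shared by the two items
            if co:
                sims.append((j, co / (len(ui) ** (1 - alpha) * len(uj) ** alpha)))
        neighbours[i] = sorted(sims, key=lambda s: -s[1])[:k]
    return neighbours
```

This brute-force loop compares every item pair, so it only suits toy data; at the scale on the slide (1.3M items) an efficient implementation would iterate over co-occurring items per user instead.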
  • 6. 6 Methods – Recalling recommendations • Chart: The distribution of impression events by the number of weeks in which the same item had already been shown • Observation: 38% of recommendations are recurring items • Goal: Reverse engineering, recalling recommendations • Method: › Recommendation of already shown items › Weighted by expected CTR • Notation: RCTR
  • 7. 7 Methods – Already seen items • Chart: The probability of returning to an already seen item after interacting with other items • Observation: Significant probability of re-clicking on an already clicked item • Goal: Capturing the re-clicking phenomenon • Method: Recommendation of already clicked items • Notation: AS(I)
  • 8. 8 Methods – User metadata-based popularity • Observation: › Significant number of passive and new users › All target users have metadata • Goal: › Semi-personalized recommendations for new users › Improving accuracy on inactive users • Method: 1. Item model: Expected popularity of an item in each user group 2. Prediction: Average popularity of an item for a user › Applied keys: jobroles, edu_fieldofstudies • Notation: UPOP
  • 9. 9 Methods – MS: Meta cosine similarity • Observation: › Item cold-start problem, many low-supported items › Almost all items have metadata • Goal: › Model building for new items › Improving the model of low-supported items • Method: 1. Item model: Meta-data representation, tf-idf 2. User model: Meta-words of items seen by the user 3. Prediction: Average cosine similarity between user-item models › Keys: tags, title, industry_id, geo_country, geo_region, discipline_id • Notation: MS
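The three steps of the MS method can be illustrated with a small sketch. It handles a single metadata key only; averaging the cosine over several keys is one possible reading of "average cosine similarity" and is not shown. The tf-idf variant (raw term frequency times `log(n / df)`) and all function names are assumptions, not taken from the paper.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(docs):
    """Item model. docs: item_id -> list of meta-words; returns item_id -> {word: weight}."""
    df = Counter(w for words in docs.values() for w in set(words))  # document frequency
    n = len(docs)
    return {
        item: {w: tf * log(n / df[w]) for w, tf in Counter(words).items()}
        for item, words in docs.items()
    }


def user_model(item_vecs, seen_items):
    """User model: sum of the meta-word vectors of the items the user has seen."""
    model = Counter()
    for item in seen_items:
        model.update(item_vecs[item])  # Counter.update adds the weights
    return dict(model)


def cosine(u, v):
    """Prediction: cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because the item model depends only on metadata, a brand-new item with no events still receives a non-zero score for users whose seen items share meta-words with it.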
  • 10. 10 Methods – AP: Age-based popularity change • Observation: Significant drop in the popularity of items at ~30 and ~60 days of age • Goal: Down-weighting these items • Method: Expected ratio of the popularity in the next week • Notation: AP
  • 11. 11 Methods – OM: The omit method • Observation: Unwanted items in recommendation lists • Goal: Omitting poorly modelled items of a predictor or combination • Method: 1. Sub-train-test split 2. Retrain a new combination 3. Generating top-N recommendations 4. Measuring how the total evaluation would change by omitting items 5. Omitting worst K items on the original combination • Notation: OM
  • 12. 12 Methods – Optimization 1. Time-based train-test split (test set: last week) 2. Coordinate gradient descent optimization of various methods → candidate predictor set 3. Support-based distinct user groups (new users, inactive users, 10 equal-sized groups of active users) 4. Forward Predictor Selection 4.1. Initialization: 4.1.1. Predictors that are selected from the candidate set for the final combination → selected predictor set 4.1.2. The selected predictor set is empty in the beginning 4.2. Loop: 4.2.1. Calculate the accuracy of the selected predictor set 4.2.2. For each remaining candidate predictor, calculate the gain in accuracy that the predictor would give if it were moved to the selected set 4.2.3. Move the best one to the selected set and recalculate the combination weights 4.2.4. Repeat the loop while there is improvement and a remaining candidate predictor 4.3. Return: the set of the predictors and the corresponding weights 5. Retrain the selected predictors on the full data set
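The Forward Predictor Selection loop above can be sketched in a few lines. Two parts are stand-ins, since the slide does not specify them: accuracy is measured here as negative mean squared error against hold-out relevance labels (the competition used its own score), and the combination weights are refit by coordinate descent on the squared error. All function and parameter names are hypothetical.

```python
def fit_weights(columns, y, sweeps=100):
    """Refit linear combination weights by coordinate descent on squared error."""
    w = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual of the combination without predictor j
            resid = [
                t - sum(w[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i, t in enumerate(y)
            ]
            denom = sum(v * v for v in col)
            w[j] = sum(v * r for v, r in zip(col, resid)) / denom if denom else 0.0
    return w


def accuracy(columns, y, w):
    """Stand-in accuracy: negative mean squared error, so larger is better."""
    err = 0.0
    for i, t in enumerate(y):
        p = sum(wk * col[i] for wk, col in zip(w, columns))
        err += (p - t) ** 2
    return -err / len(y)


def forward_predictor_selection(candidates, y, min_gain=1e-9):
    """candidates: name -> list of p-Values for the same user-item pairs as y."""
    selected, weights = [], []            # the selected set starts empty
    best = accuracy([], y, [])            # accuracy of the empty combination
    remaining = dict(candidates)
    while remaining:
        trials = []
        for name, col in remaining.items():
            cols = [candidates[n] for n in selected] + [col]
            w = fit_weights(cols, y)
            trials.append((accuracy(cols, y, w), name, w))
        score, name, w = max(trials, key=lambda t: t[0])
        if score - best <= min_gain:      # no improvement: stop
            break
        selected.append(name)             # move the best candidate over
        weights, best = w, score          # keep the recalculated weights
        del remaining[name]
    return selected, weights
```

Each round refits every trial combination from scratch, so the cost grows with the number of candidates times the number of selected predictors; that is acceptable for a handful of simple models, as in the deck.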
  • 13. 13 … let’s put it together and see how it performs!
  • 14. 14 Evaluation – Forward Predictor Selection • Best single model › Item-kNN trained on positive interactions › 2.5 min training time › 7 ms prediction time # Predictor tTR(s)* tPR(ms)* Score Rank 1 IKNN(C,C) 148 7 450,046 24 * Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
  • 15. 15 Evaluation – Forward Predictor Selection • Best single model › Item-kNN trained on positive interactions › 2.5 min training time › 7 ms prediction time • Sub-combinations › 4 models: 600K+ score (w/o item metadata) # Predictor tTR(s)* tPR(ms)* Score Rank 1 IKNN(C,C) 148 7 450,046 24 2 +RCTR 208 15 548,338 9 3 +AS(1) 237 17 590,526 6 4 +UPOP 247 50 614,674 5
  • 16. 16 Evaluation – Forward Predictor Selection • Best single model › Item-kNN trained on positive interactions › 2.5 min training time › 7 ms prediction time • Sub-combinations › 4 models: 600K+ score (w/o item metadata) › 5 models: 3rd place # Predictor tTR(s)* tPR(ms)* Score Rank 1 IKNN(C,C) 148 7 450,046 24 2 +RCTR 208 15 548,338 9 3 +AS(1) 237 17 590,526 6 4 +UPOP 247 50 614,674 5 5 +MS 364 122 623,909 3
  • 17. 17 Evaluation – Forward Predictor Selection • Best single model › Item-kNN trained on positive interactions › 2.5 min training time › 7 ms prediction time • Sub-combinations › 4 models: 600K+ score (w/o item metadata) › 5 models: 3rd place › 6 models: 95% of final score # Predictor tTR(s)* tPR(ms)* Score Rank 1 IKNN(C,C) 148 7 450,046 24 2 +RCTR 208 15 548,338 9 3 +AS(1) 237 17 590,526 6 4 +UPOP 247 50 614,674 5 5 +MS 364 122 623,909 3 6 +IKNN(R,R) 1,150 168 635,278 3
  • 18. 18 Evaluation – Forward Predictor Selection • Best single model › Item-kNN trained on positive interactions › 2.5 min training time › 7 ms prediction time • Sub-combinations › 4 models: 600K+ score (w/o item metadata) › 5 models: 3rd place › 6 models: 95% of final score › 10 models: 650K+ score (<30 mins. training time) # Predictor tTR(s)* tPR(ms)* Score Rank 1 IKNN(C,C) 148 7 450,046 24 2 +RCTR 208 15 548,338 9 3 +AS(1) 237 17 590,526 6 4 +UPOP 247 50 614,674 5 5 +MS 364 122 623,909 3 6 +IKNN(R,R) 1,150 168 635,278 3 7 +AS(3) 1,205 178 636,498 3 8 +IKNN(R,C) 1,557 197 643,145 3 9 +AS(4) 1,582 202 644,710 3 10 +AP 1,621 207 652,802 3
  • 19. 19 Evaluation – Forward Predictor Selection • Best single model › Item-kNN trained on positive interactions › 2.5 min training time › 7 ms prediction time • Sub-combinations › 4 models: 600K+ score (w/o item metadata) › 5 models: 3rd place › 6 models: 95% of final score › 10 models: 650K+ score (<30 mins. training time) • Final combination › 3rd place › ~666K leaderboard score › 11 instances › user-support-based weighting › 3h+ training time, 200 ms prediction time
    #   Predictor     tTR(s)*  tPR(ms)*  Score    Rank
    1   IKNN(C,C)     148      7         450,046  24
    2   +RCTR         208      15        548,338  9
    3   +AS(1)        237      17        590,526  6
    4   +UPOP         247      50        614,674  5
    5   +MS           364      122       623,909  3
    6   +IKNN(R,R)    1,150    168       635,278  3
    7   +AS(3)        1,205    178       636,498  3
    8   +IKNN(R,C)    1,557    197       643,145  3
    9   +AS(4)        1,582    202       644,710  3
    10  +AP           1,621    207       652,802  3
        SUPP_C(1-10)  1,639    194       661,359  3
    11  +OM           11,790   199       665,592  3
    * Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
  • 20. 20 Evaluation – Timeline [Chart: leaderboard rank and leaderboard score (thousands) by date, Apr-25 to Jun-27, across three phases: Initial setup, Model design and implementation, Final sprint]
  • 21. 21 Lessons learnt • Exploiting the specificity of the dataset • Using Item-kNN over factorization in a very sparse dataset • Paying attention to recurrence • Forward Predictor Selection is effective • Different optimization for different user groups • Down-weighting/omitting weak items • Ranking 200K items is slow • Keep it simple and transparent!
  • 22. 22 Presenter Contact Thank you for your attention! Dávid Zibriczky, PhD david.zibriczky@gmail.com