1
A COMBINATION OF SIMPLE MODELS BY FORWARD
PREDICTOR SELECTION FOR JOB RECOMMENDATION
Dávid Zibriczky, PhD (DaveXster)
Budapest University of Technology and Economics,
Budapest, Hungary
2
The Dataset – Data preparation
• Events (interactions, impressions)
› Target format: (time,user_id,item_id,type,value)
› Interactions → Format OK
› Impressions:
• Generating unique (time,user_id,item_id) triples
• Value → count of their occurrence
• Time → 12pm on Thursday of the week
• Type → 5
• Catalog (items, users)
› Target format: (id,key1,key2,…,keyN)
› Items and users → Format OK
› Unknown "0" values → empty values
› Inconsistency: Geo-location vs. country/region → Metadata enhancement based on geo-location
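The impression preparation above can be sketched as follows. This is a minimal illustration, not the author's code: the input layout `(user_id, item_id, year, week)` and the function name `impressions_to_events` are assumptions made for the example.

```python
from collections import Counter
from datetime import date, datetime, time

IMPRESSION_TYPE = 5  # event type assigned to impressions on the slide


def impressions_to_events(impressions):
    """Turn raw impression rows into (time, user_id, item_id, type, value) events.

    `impressions` is an iterable of (user_id, item_id, year, week) tuples;
    this input layout is an assumption, not taken from the slides.
    """
    # Count how often each (user, item, week) combination occurs.
    counts = Counter(impressions)
    events = []
    for (user_id, item_id, year, week), n in sorted(counts.items()):
        thursday = date.fromisocalendar(year, week, 4)  # ISO weekday 4 = Thursday
        ts = datetime.combine(thursday, time(12, 0))    # 12pm on that Thursday
        events.append((ts, user_id, item_id, IMPRESSION_TYPE, n))
    return events
```

`date.fromisocalendar` requires Python 3.8 or later.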
3
The Dataset – Basic statistics
Size of training set
• 211M events, 2.8M users, 1.3M items
• Effect: huge and very sparse matrix
Distribution
• 95% of events are impressions
• 72% of the users have impressions only
• Item support for interactions is low (~9)
• Effect: weak collaboration using interactions
Target users
• 150K users
• 73% active, 16% inactive, 12% new
• Effect: user cold start and warm-up problem
Data source        #events      #users     #items
Interactions       8,826,678    784,687    1,029,480
Impressions        201,872,093  2,755,167  846,814
All events         210,698,777  2,792,405  1,257,422
Catalog            -            1,367,057  1,358,098
Catalog OR Events  -            2,829,563  1,362,890
4
Methods – Concept
Terminology
• Method: A technique of estimating the relevance of an item for a user (p-Value)
• Predictor/model: An instance of a method with a specified parameter setting
• Combination: Linear combination of prediction values for a user-item pair
Approach
1. Exploring the properties of the data set
2. Definition of "simple" methods with different functionality (time-decay is commonly used)*
3. Finding a set of relevant predictors and an optimal combination of them
4. Top-N ranking of available event-supported items with non-zero p-Values (~200K)
* Equations of the methods can be found in the paper
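The combination and ranking steps can be illustrated with a short sketch. The dictionary layout and the function names `combine` and `top_n` are assumptions for illustration; the slides only state that the combination is linear and that items with non-zero p-Values are ranked.

```python
import heapq


def combine(predictions, weights):
    """Linear combination of per-predictor p-values for one user.

    predictions: {predictor_name: {item_id: p_value}}
    weights:     {predictor_name: weight}
    """
    combined = {}
    for name, scores in predictions.items():
        w = weights.get(name, 0.0)
        for item_id, p in scores.items():
            combined[item_id] = combined.get(item_id, 0.0) + w * p
    return combined


def top_n(combined, n):
    """Rank items with a non-zero combined p-value and keep the best n."""
    candidates = [(p, item) for item, p in combined.items() if p != 0.0]
    return [item for p, item in heapq.nlargest(n, candidates)]
```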
5
Methods – Item-kNN
• Observation: Very sparse user-item matrix (0.005%), 211M events
• Goal: Next best items to click, estimating recommendations of Xing
• Method: Standard Item-based kNN with special features
› Input-output event types
› Controlling popularity factor
› Similarity of the same item is 0
› Efficient implementation
• Notation: IKNN(I,O)
› I: input event type
› O: output event type
• Comment: No improvement combining other CF algorithms (MF, FM, User-kNN)
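A possible shape of IKNN(I,O) is sketched below. The exact equations are in the paper and are not reproduced here; the asymmetric normalisation with exponent `alpha` is an assumed way of "controlling the popularity factor", time-decay is left out, and all names (`train_iknn`, `predict_iknn`, `alpha`, `k`) are hypothetical.

```python
from collections import defaultdict
from itertools import product


def train_iknn(events, in_type, out_type, alpha=0.5, k=100):
    """Build item-to-item similarities from (user_id, item_id, type) events."""
    in_items, out_items = defaultdict(set), defaultdict(set)
    for user, item, etype in events:
        if etype == in_type:
            in_items[user].add(item)
        if etype == out_type:
            out_items[user].add(item)

    # Support: number of users per item, separately for input and output types.
    supp_in, supp_out = defaultdict(int), defaultdict(int)
    for items in in_items.values():
        for i in items:
            supp_in[i] += 1
    for items in out_items.values():
        for j in items:
            supp_out[j] += 1

    # co[(i, j)]: users having i as input event and j as output event.
    co = defaultdict(float)
    for user, items in in_items.items():
        for i, j in product(items, out_items.get(user, ())):
            if i != j:  # similarity of an item to itself stays 0
                co[(i, j)] += 1.0

    neighbours = defaultdict(list)
    for (i, j), c in co.items():
        # Assumed popularity control: alpha shifts the penalty between i and j.
        sim = c / (supp_in[i] ** alpha * supp_out[j] ** (1.0 - alpha))
        neighbours[i].append((sim, j))
    return {i: sorted(n, reverse=True)[:k] for i, n in neighbours.items()}


def predict_iknn(model, user_items):
    """Sum the similarities of the neighbours of the user's input items."""
    scores = defaultdict(float)
    for i in user_items:
        for sim, j in model.get(i, ()):
            scores[j] += sim
    return dict(scores)
```

With `alpha=0.5` and identical input and output types this reduces to plain cosine similarity on binary data.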
6
Methods – Recalling recommendations
• Chart: The distribution of impression events by the number of weeks in which the same item has already been shown
• Observation: 38% of recommendations are recurring items
• Goal: Reverse engineering, recalling recommendations
• Method:
› Recommendation of already shown items
› Weighted by expected CTR
• Notation: RCTR
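One way to read RCTR is sketched below. How the expected CTR is estimated is not stated on the slide; bucketing by the number of weeks an item has been shown is an assumption suggested by the chart, and the names `fit_expected_ctr` and `predict_rctr` are hypothetical.

```python
from collections import defaultdict


def fit_expected_ctr(rows):
    """Estimate the click-through rate per 'weeks shown' bucket.

    rows: iterable of (weeks_shown, clicked) pairs, one per shown item.
    """
    shown, clicked = defaultdict(int), defaultdict(int)
    for weeks, was_clicked in rows:
        shown[weeks] += 1
        clicked[weeks] += int(was_clicked)
    return {w: clicked[w] / shown[w] for w in shown}


def predict_rctr(user_weeks_shown, ctr_by_weeks):
    """Score already shown items by the expected CTR of their bucket.

    user_weeks_shown: {item_id: number of weeks the item was shown to the user}
    """
    return {item: ctr_by_weeks.get(w, 0.0) for item, w in user_weeks_shown.items()}
```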
7
Methods – Already seen items
• Chart: The probability of returning to an already seen item after interacting with other items
• Observation: Significant probability of re-clicking on an already clicked item
• Goal: Capturing the re-clicking phenomenon
• Method: Recommendation of already clicked items
• Notation: AS(I)
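A minimal sketch of AS(I), assuming the score of an already seen item is the empirical return probability given how many other interactions followed it. The lookup table `return_prob` and the function name `predict_as` are hypothetical; the slide does not give the formula.

```python
def predict_as(history, return_prob):
    """Score items the user has already interacted with.

    history:     chronological list of item ids for one event type
    return_prob: {number of later interactions: probability of returning}
    """
    # Position of the most recent interaction with each item.
    last_pos = {item: pos for pos, item in enumerate(history)}
    scores = {}
    for item, pos in last_pos.items():
        later = len(history) - 1 - pos  # interactions with other items since then
        scores[item] = return_prob.get(later, 0.0)
    return scores
```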
8
Methods – User metadata-based popularity
• Observation:
› Significant amount of passive and new users
› All target users have metadata
• Goal:
› Semi-personalized recommendations for new users
› Improving accuracy on inactive users
• Method:
1. Item model: Expected popularity of an item in each user group
2. Prediction: Average popularity of an item for a user
› Applied keys: jobroles, edu_fieldofstudies
• Notation: UPOP
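The two UPOP steps can be sketched as follows. Treating each metadata key/value pair (for example a single job role) as a user group, and measuring popularity as events per group member, are assumptions; `train_upop` and `predict_upop` are hypothetical names.

```python
from collections import defaultdict


def train_upop(events, user_groups):
    """Item model: popularity of each item within each user group.

    events:      iterable of (user_id, item_id)
    user_groups: {user_id: set of group labels}
    """
    group_size = defaultdict(int)
    for groups in user_groups.values():
        for g in groups:
            group_size[g] += 1
    hits = defaultdict(int)  # hits[(group, item)]: events on the item by group members
    for user, item in events:
        for g in user_groups.get(user, ()):
            hits[(g, item)] += 1
    # Assumed popularity measure: events per group member.
    return {key: n / group_size[key[0]] for key, n in hits.items()}


def predict_upop(model, groups, item):
    """Prediction: average popularity of the item over the user's groups."""
    if not groups:
        return 0.0
    return sum(model.get((g, item), 0.0) for g in groups) / len(groups)
```

Because the prediction needs only the user's metadata, it also works for users without any events.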
9
Methods – MS: Meta cosine similarity
• Observation:
› Item-cold start problem, many low-supported items
› Almost all items have metadata
• Goal:
› Model building for new items
› Improving the model of low-supported items
• Method:
1. Item model: Meta-data representation, tf-idf
2. User model: Meta-words of items seen by the user
3. Prediction: Average cosine similarity between user-item models
› Keys: tags, title, industry_id, geo_country, geo_region, discipline_id
• Notation: MS
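The MS steps can be sketched in plain Python. This is one interpretation, under the assumption that the prediction averages the cosine similarity between the candidate item and each item the user has seen; the tf-idf variant (raw term count times log inverse document frequency) and all function names are also assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(item_words):
    """item_words: {item_id: list of metadata tokens} -> unit-length tf-idf vectors."""
    n_items = len(item_words)
    df = Counter(w for words in item_words.values() for w in set(words))
    vectors = {}
    for item, words in item_words.items():
        tf = Counter(words)
        vec = {w: c * math.log(n_items / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors[item] = {w: v / norm for w, v in vec.items()} if norm > 0 else {}
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, which equals their cosine."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def predict_ms(vectors, seen_items, candidate):
    """Average cosine similarity between the candidate and the user's seen items."""
    seen = [vectors[i] for i in seen_items if i in vectors]
    if not seen or candidate not in vectors:
        return 0.0
    return sum(cosine(v, vectors[candidate]) for v in seen) / len(seen)
```

Since the score depends only on metadata, a new item without events still receives a non-zero prediction when it shares tokens with the user's history.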
10
Methods – AP: Age-based popularity change
• Observation: Significant drop in the popularity of items at ~30 and ~60 days of age
• Goal: Under-scoring (down-weighting) these items
• Method: Expected ratio of the popularity in the next week
• Notation: AP
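A rough sketch of AP, assuming the expected ratio is estimated per item-age bucket from historical week-over-week event counts. The weekly bucket size, the default ratio of 1.0 for unseen ages, and the names `fit_age_ratio` and `predict_ap` are assumptions.

```python
from collections import defaultdict


def fit_age_ratio(observations, bucket_days=7):
    """Estimate next-week/this-week popularity ratio per item-age bucket.

    observations: iterable of (age_in_days, events_this_week, events_next_week)
    """
    this_week, next_week = defaultdict(float), defaultdict(float)
    for age, now, nxt in observations:
        bucket = age // bucket_days
        this_week[bucket] += now
        next_week[bucket] += nxt
    return {b: next_week[b] / this_week[b] for b in this_week if this_week[b] > 0}


def predict_ap(ratios, age_in_days, bucket_days=7, default=1.0):
    """Expected popularity ratio for an item of the given age."""
    return ratios.get(age_in_days // bucket_days, default)
```

A ratio well below 1 for the buckets around 30 and 60 days would lower the combined score of items about to lose popularity.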
11
Methods – OM: The omit method
• Observation: Unwanted items in recommendation lists
• Goal: Omitting poorly modelled items of a predictor or combination
• Method:
1. Sub-train-test split
2. Retrain a new combination
3. Generating top-N recommendations
4. Measuring how the total evaluation would change by omitting items
5. Omitting worst K items on the original combination
• Notation: OM
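Steps 3 to 5 of the omit method can be sketched as below. The hit count in the top N is a stand-in for the challenge metric, each item's effect is measured in isolation, and the names `total_score` and `worst_k_items` are hypothetical.

```python
def hits_at_n(ranked, relevant, n):
    """Number of relevant items among the first n recommendations."""
    return sum(1 for item in ranked[:n] if item in relevant)


def total_score(recs, truth, n, omitted=frozenset()):
    """Total evaluation over all users after dropping the omitted items."""
    total = 0
    for user, ranked in recs.items():
        kept = [i for i in ranked if i not in omitted]
        total += hits_at_n(kept, truth.get(user, set()), n)
    return total


def worst_k_items(recs, truth, n, k):
    """Items whose omission would raise the total evaluation the most."""
    base = total_score(recs, truth, n)
    items = {i for ranked in recs.values() for i in ranked}
    gains = {i: total_score(recs, truth, n, frozenset([i])) - base for i in items}
    harmful = [(g, i) for i, g in gains.items() if g > 0]
    return [i for g, i in sorted(harmful, reverse=True)[:k]]
```

Here `recs` holds ranked lists longer than N, so that omitting an item lets the next candidate move into the top N.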
12
Methods – Optimization
1. Time-based train-test split (test set: last week)
2. Coordinate gradient descent optimization of various methods → candidate predictor set
3. Support-based distinct user groups (new users, inactive users, 10 equal-sized groups of active users)
4. Forward Predictor Selection
   1. Initialization:
      1. Predictors that are selected from the candidate set for the final combination → selected predictor set
      2. The selected predictor set is empty in the beginning
   2. Loop:
      1. Calculate the accuracy of the selected predictor set
      2. For each remaining candidate predictor, calculate the gain in accuracy that the predictor would give if it were moved to the selected set
      3. Move the best one to the selected set and recalculate the combination weights
      4. Repeat the loop while there is an improvement and there are remaining candidate predictors
   3. Return: the set of selected predictors and the corresponding weights
5. Retrain selected predictors on the full data set
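The Forward Predictor Selection loop can be sketched as a greedy search. The callback `fit_and_score`, which would fit the combination weights for a given predictor subset and return the accuracy on the test split, is a hypothetical placeholder for the optimization described in step 2; it is not part of the slides.

```python
def forward_predictor_selection(candidates, fit_and_score):
    """Greedily grow the selected predictor set.

    candidates:    list of predictor names
    fit_and_score: callable(names) -> (score, weights); assumed to fit the
                   combination weights and evaluate them on the test split
    """
    selected, weights, best = [], {}, float("-inf")
    remaining = list(candidates)
    while remaining:
        # Try each remaining candidate on top of the current selection.
        trials = [(fit_and_score(selected + [c]), c) for c in remaining]
        (score, w), choice = max(trials, key=lambda t: t[0][0])
        if score <= best:
            break  # no candidate improves the combination
        selected.append(choice)
        remaining.remove(choice)
        weights, best = w, score
    return selected, weights, best
```

Each round costs one weight fit per remaining candidate, so the number of fits grows quadratically with the size of the candidate set.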
13
… let’s put it together and see how it performs!
14
Evaluation – Forward Predictor Selection
• Best single model
› Item-kNN trained on positive interactions
› 2.5 min training time
› 7 ms prediction time
• Sub-combinations
› 4 models: 600K+ score (w/o item metadata)
› 5 models: 3rd place
› 6 models: 95% of final score
› 10 models: 650K+ score (<30 mins. training time)
• Final combination
› 3rd place
› ~666K leaderboard score
› 11 instances
› user-support-based weighting
› 3h+ training time, 200 ms prediction time
#   Predictor       tTR(s)*  tPR(ms)*  Score    Rank
1   IKNN(C,C)           148         7  450,046    24
2   +RCTR               208        15  548,338     9
3   +AS(1)              237        17  590,526     6
4   +UPOP               247        50  614,674     5
5   +MS                 364       122  623,909     3
6   +IKNN(R,R)        1,150       168  635,278     3
7   +AS(3)            1,205       178  636,498     3
8   +IKNN(R,C)        1,557       197  643,145     3
9   +AS(4)            1,582       202  644,710     3
10  +AP               1,621       207  652,802     3
    SUPP_C(1-10)      1,639       194  661,359     3
11  +OM              11,790       199  665,592     3
* tTR: training time, tPR: prediction time; Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
20
Evaluation – Timeline
• Chart: Leaderboard score (thousands) and leaderboard rank by date, Apr-25 to Jun-27, across three phases: Initial setup, Model design and implementation, Final sprint. The plotted scores rise from 115.4 to 665.6.
21
Lessons learnt
• Exploiting the specificity of the dataset
• Using Item-kNN over factorization in a very sparse dataset
• Paying attention to recurrence
• Forward Predictor Selection is effective
• Different optimization for different user groups
• Under-scoring (down-weighting) or omitting weak items
• Ranking 200K items is slow
• Keep it simple and transparent!
22
Thank you for your attention!
Presenter: Dávid Zibriczky, PhD
Contact: david.zibriczky@gmail.com

Palo Alto Cortex XDR presentation .......
Sachin Paul
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 

Recently uploaded (20)

一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 

A Combination of Simple Models by Forward Predictor Selection for Job Recommendation

  • 1. A COMBINATION OF SIMPLE MODELS BY FORWARD PREDICTOR SELECTION FOR JOB RECOMMENDATION
      Dávid Zibriczky, PhD (DaveXster)
      Budapest University of Technology and Economics, Budapest, Hungary
  • 2. The Dataset – Data preparation
      • Events (interactions, impressions)
          › Target format: (time,user_id,item_id,type,value)
          › Interactions → Format OK
          › Impressions:
              • Generating unique (time,user_id,item_id) triples
              • Value → count of their occurrence
              • Time → 12pm on Thursday of the week
              • Type → 5
      • Catalog (items, users)
          › Target format: (id,key1,key2,…,keyN)
          › Items and users → Format OK
          › Unknown "0" values → empty values
          › Inconsistency: Geo-location vs. country/region → Metadata enhancement based on geo-location
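The impression conversion on this slide can be sketched as follows. This is a minimal illustration, not the author's code: the input layout `(user_id, year, week, item_ids)`, the use of ISO week numbering, the UTC timezone, and the function name `impressions_to_events` are all assumptions made for the example.

```python
from collections import Counter
from datetime import date, datetime, time, timezone

IMPRESSION_TYPE = 5  # event type assigned to impressions on the slide


def impressions_to_events(rows):
    """Turn weekly impression rows into (time, user_id, item_id, type, value) events.

    rows: iterable of (user_id, year, week, item_ids); this layout is assumed.
    """
    counts = Counter()
    for user_id, year, week, item_ids in rows:
        for item_id in item_ids:
            # One unique (week, user, item) triple; value = number of occurrences.
            counts[(year, week, user_id, item_id)] += 1

    events = []
    for (year, week, user_id, item_id), n in sorted(counts.items()):
        thursday = date.fromisocalendar(year, week, 4)  # ISO weekday 4 = Thursday
        noon = datetime.combine(thursday, time(12, 0), tzinfo=timezone.utc)
        events.append((int(noon.timestamp()), user_id, item_id, IMPRESSION_TYPE, n))
    return events
```

Placing every impression of a week at a single mid-week timestamp keeps the events comparable with time-stamped interactions while admitting that the exact display time is unknown.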
  • 3. The Dataset – Basic statistics
      Size of training set
      • 211M events, 2.8M users, 1.3M items
      • Effect: huge and very sparse matrix
      Distribution
      • 95% of events are impressions
      • 72% of the users have impressions only
      • Item support for interactions is low (~9)
      • Effect: weak collaboration using interactions
      Target users
      • 150K users
      • 73% active, 16% inactive, 12% new
      • Effect: user cold start and warm-up problem

      Data source         #events       #users      #items
      Interactions        8,826,678     784,687     1,029,480
      Impressions         201,872,093   2,755,167   846,814
      All events          210,698,777   2,792,405   1,257,422
      Catalog             -             1,367,057   1,358,098
      Catalog OR Events   -             2,829,563   1,362,890
  • 4. Methods – Concept
      Terminology
      • Method: A technique for estimating the relevance of an item for a user (p-Value)
      • Predictor/model: An instance of a method with a specified parameter setting
      • Combination: Linear combination of prediction values for a user-item pair
      Approach
      1. Exploring the properties of the data set
      2. Definition of "simple" methods with different functionality (time-decay is commonly used)*
      3. Finding a set of relevant predictors and an optimal combination of them
      4. Top-N ranking of available event-supported items with non-zero p-Values (~200K)
      * Equations of the methods can be found in the paper
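The terminology above (linear combination of p-Values, then top-N ranking of items with non-zero p-Values) can be illustrated with a short sketch. The dictionary-based layout and the names `combine` and `top_n` are hypothetical, chosen only for this example.

```python
import heapq


def combine(predictor_scores, weights):
    """Linear combination of p-Values for one user.

    predictor_scores: list of dicts item_id -> p-Value, one dict per predictor.
    weights: list of combination weights, aligned with predictor_scores.
    """
    combined = {}
    for scores, w in zip(predictor_scores, weights):
        for item, p in scores.items():
            combined[item] = combined.get(item, 0.0) + w * p
    return combined


def top_n(combined, n):
    """Rank items by combined p-Value, skipping items whose p-Value is zero."""
    nonzero = [(item, p) for item, p in combined.items() if p != 0.0]
    return [item for item, _ in heapq.nlargest(n, nonzero, key=lambda pair: pair[1])]
```

For example, with two predictors scoring `{"a": 1.0, "b": 0.5}` and `{"b": 1.0, "c": 0.0}` and weights `[0.5, 1.0]`, item `b` is ranked ahead of `a`, and `c` is dropped because its combined p-Value is zero.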
  • 5. Methods – Item-kNN
      • Observation: Very sparse user-item matrix (0.005%), 211M events
      • Goal: Next best items to click, estimating recommendations of Xing
      • Method: Standard item-based kNN with special features
          › Input-output event types
          › Controlling popularity factor
          › Similarity of the same item is 0
          › Efficient implementation
      • Notation: IKNN(I,O)
          › I: input event type
          › O: output event type
      • Comment: No improvement from combining with other CF algorithms (MF, FM, User-kNN)
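A minimal sketch of an item-based kNN with separate input and output event types is given below. The exact similarity formula is in the paper and is not reproduced here; the popularity exponent `alpha`, the neighbourhood size `k`, the in-memory layout, and both function names are assumptions, and time-decay is left out for brevity.

```python
from collections import defaultdict


def train_item_knn(input_events, output_events, alpha=0.5, k=50):
    """Build an IKNN(I,O)-style model: the k most similar output items per input item.

    input_events / output_events: dict user_id -> set of item_ids, one dict per
    event type (hypothetical layout).
    alpha: assumed popularity exponent; 0.5 yields plain cosine similarity.
    """
    in_support = defaultdict(int)
    out_support = defaultdict(int)
    for items in input_events.values():
        for i in items:
            in_support[i] += 1
    for items in output_events.values():
        for j in items:
            out_support[j] += 1

    co = defaultdict(lambda: defaultdict(int))  # co-occurrence counts
    for user, in_items in input_events.items():
        for i in in_items:
            for j in output_events.get(user, ()):
                if i != j:  # similarity of an item to itself is kept at 0
                    co[i][j] += 1

    model = {}
    for i, row in co.items():
        sims = {j: c / (in_support[i] * out_support[j]) ** alpha for j, c in row.items()}
        top = sorted(sims.items(), key=lambda kv: (-kv[1], str(kv[0])))[:k]
        model[i] = dict(top)
    return model


def score_items(user_input_items, model):
    """Sum neighbour similarities over the items the user has input events on."""
    scores = defaultdict(float)
    for i in user_input_items:
        for j, s in model.get(i, {}).items():
            scores[j] += s
    return dict(scores)
```

Raising `alpha` above 0.5 penalises popular items more strongly, which is one simple way to control the popularity factor mentioned on the slide.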
  • 6. Methods – Recalling recommendations
      • Chart: The distribution of impression events by the number of weeks in which the same item had already been shown
      • Observation: 38% of recommendations are recurring items
      • Goal: Reverse engineering, recalling recommendations
      • Method:
          › Recommendation of already shown items
          › Weighted by expected CTR
      • Notation: RCTR
  • 7. Methods – Already seen items
      • Chart: The probability of returning to an already seen item after interacting with other items
      • Observation: Significant probability of re-clicking on an already clicked item
      • Goal: Capturing the re-clicking phenomenon
      • Method: Recommendation of already clicked items
      • Notation: AS(I)
  • 8. Methods – User metadata-based popularity
      • Observation:
          › Significant amount of passive and new users
          › All target users have metadata
      • Goal:
          › Semi-personalized recommendations for new users
          › Improving accuracy on inactive users
      • Method:
          1. Item model: Expected popularity of an item in each user group
          2. Prediction: Average popularity of an item for a user
          › Applied keys: jobroles, edu_fieldofstudies
      • Notation: UPOP
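The two UPOP steps can be sketched as below, assuming each user belongs to one or more groups defined by metadata values (for instance the values of `jobroles`). Normalising popularity by group size, the data layout, and the names `build_upop_model` and `upop_predict` are assumptions of this example; the paper's equation may differ.

```python
from collections import defaultdict


def build_upop_model(events, user_groups):
    """Item model: popularity of each item within each user group.

    events: iterable of (user_id, item_id) pairs.
    user_groups: dict user_id -> set of metadata values (hypothetical layout).
    Popularity is the event count divided by group size (an assumed normalisation).
    """
    group_size = defaultdict(int)
    for groups in user_groups.values():
        for g in groups:
            group_size[g] += 1

    counts = defaultdict(lambda: defaultdict(int))
    for user, item in events:
        for g in user_groups.get(user, ()):
            counts[g][item] += 1

    return {
        g: {item: c / group_size[g] for item, c in items.items()}
        for g, items in counts.items()
    }


def upop_predict(groups_of_user, item, model):
    """Prediction: average popularity of the item over the user's groups."""
    if not groups_of_user:
        return 0.0
    total = sum(model.get(g, {}).get(item, 0.0) for g in groups_of_user)
    return total / len(groups_of_user)
```

Because the prediction needs only the user's metadata, it also produces scores for new users who have no events at all.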
  • 9. Methods – MS: Meta cosine similarity
      • Observation:
          › Item cold-start problem, many low-supported items
          › Almost all items have metadata
      • Goal:
          › Model building for new items
          › Improving the model of low-supported items
      • Method:
          1. Item model: Metadata representation, tf-idf
          2. User model: Meta-words of items seen by the user
          3. Prediction: Average cosine similarity between user-item models
          › Keys: tags, title, industry_id, geo_country, geo_region, discipline_id
      • Notation: MS
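The three MS steps can be sketched in plain Python as follows. This is one possible reading, not the paper's formulation: the user model is taken to be the set of items the user has seen, and the prediction averages the cosine similarity between each seen item and the candidate. The tf-idf variant (raw term count times `log(N/df)`) and all function names are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(item_tokens):
    """Item model: sparse tf-idf vector per item.

    item_tokens: dict item_id -> list of metadata tokens (e.g. tags, title words).
    """
    n_items = len(item_tokens)
    df = Counter(t for tokens in item_tokens.values() for t in set(tokens))
    return {
        item: {t: c * math.log(n_items / df[t]) for t, c in Counter(tokens).items()}
        for item, tokens in item_tokens.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def meta_similarity(seen_items, candidate, vectors):
    """Prediction: average cosine similarity between seen items and the candidate."""
    if not seen_items:
        return 0.0
    total = sum(cosine(vectors[s], vectors[candidate]) for s in seen_items)
    return total / len(seen_items)
```

Since the score depends only on metadata, a candidate item with no events at all can still be ranked, which is the item cold-start case the slide targets.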
  • 10. Methods – AP: Age-based popularity change
      • Observation: Significant drop in the popularity of items aged ~30 and ~60 days
      • Goal: Down-weighting these items
      • Method: Expected ratio of the popularity in the next week
      • Notation: AP
  • 11. Methods – OM: The omit method
      • Observation: Unwanted items in recommendation lists
      • Goal: Omitting poorly modelled items of a predictor or combination
      • Method:
          1. Sub-train-test split
          2. Retrain a new combination
          3. Generating top-N recommendations
          4. Measuring how the total evaluation would change by omitting items
          5. Omitting worst K items on the original combination
      • Notation: OM
  • 12. Methods – Optimization
      1. Time-based train-test split (test set: last week)
      2. Coordinate gradient descent optimization of various methods → candidate predictor set
      3. Support-based distinct user groups (new users, inactive users, 10 equal-sized groups of active users)
      4. Forward Predictor Selection
          1. Initialization:
              1. Predictors selected from the candidate set for the final combination → selected predictor set
              2. The selected predictor set is empty in the beginning
          2. Loop:
              1. Calculate the accuracy of the selected predictor set
              2. For each remaining candidate predictor, calculate the gain in accuracy that would result from moving it to the selected set
              3. Move the best one to the selected set and recalculate the combination weights
              4. Repeat the loop while there is an improvement and candidate predictors remain
          3. Return: the set of predictors and the corresponding weights
      5. Retrain selected predictors on the full data set
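The Forward Predictor Selection loop in step 4 can be sketched as a greedy search. In this illustration `score_fn` stands in for the evaluation on the held-out week, and the weight recalculation is a simple coordinate search over a fixed grid; the grid values, the number of sweeps, `min_gain`, and the toy data are assumptions rather than the author's settings.

```python
def optimize_weights(predictors, score_fn, grid=(0.0, 0.25, 0.5, 1.0, 2.0), sweeps=2):
    """Coordinate search: adjust one weight at a time, keep changes that raise the score."""
    weights = {p: 1.0 for p in predictors}
    best = score_fn(weights)
    for _ in range(sweeps):
        for p in predictors:
            for w in grid:
                trial = {**weights, p: w}
                score = score_fn(trial)
                if score > best:
                    best, weights = score, trial
    return weights, best


def forward_predictor_selection(candidates, score_fn, min_gain=1e-9):
    """Greedily move the candidate with the largest gain into the selected set."""
    selected, weights = [], {}
    best_score = score_fn({})  # accuracy of the (initially empty) selected set
    remaining = list(candidates)
    while remaining:
        trials = []
        for p in remaining:
            w, s = optimize_weights(selected + [p], score_fn)
            trials.append((s, p, w))
        score, best_p, best_w = max(trials, key=lambda t: t[0])
        if score - best_score <= min_gain:
            break  # no candidate improves the combination any further
        selected.append(best_p)
        remaining.remove(best_p)
        weights, best_score = best_w, score
    return selected, weights, best_score


# Toy data (hypothetical): three predictors scored on three user-item pairs.
PREDICTIONS = {"A": [1.0, 0.0, 0.0], "B": [0.0, 0.0, 1.0], "C": [0.0, 1.0, 0.0]}
TARGET = [1.0, 0.0, 1.0]


def toy_score(weights):
    """Negative squared error of the weighted combination against TARGET."""
    combined = [
        sum(w * PREDICTIONS[p][k] for p, w in weights.items())
        for k in range(len(TARGET))
    ]
    return -sum((c - t) ** 2 for c, t in zip(combined, TARGET))
```

On the toy data, `forward_predictor_selection(["A", "B", "C"], toy_score)` selects `A` and `B` and stops, because adding `C` brings no gain at any weight in the grid.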
  • 13. … let’s put it together and see how it performs!
  • 14. Evaluation – Forward Predictor Selection
      • Best single model
          › Item-kNN trained on positive interactions
          › 2.5 min training time
          › 7 ms prediction time

      #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
      1    IKNN(C,C)      148       7          450,046   24
      * Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
  • 15. Evaluation – Forward Predictor Selection
      • Best single model
          › Item-kNN trained on positive interactions
          › 2.5 min training time
          › 7 ms prediction time
      • Sub-combinations
          › 4 models: 600K+ score (w/o item metadata)

      #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
      1    IKNN(C,C)      148       7          450,046   24
      2    +RCTR          208       15         548,338   9
      3    +AS(1)         237       17         590,526   6
      4    +UPOP          247       50         614,674   5
  • 16. Evaluation – Forward Predictor Selection
      • Best single model
          › Item-kNN trained on positive interactions
          › 2.5 min training time
          › 7 ms prediction time
      • Sub-combinations
          › 4 models: 600K+ score (w/o item metadata)
          › 5 models: 3rd place

      #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
      1    IKNN(C,C)      148       7          450,046   24
      2    +RCTR          208       15         548,338   9
      3    +AS(1)         237       17         590,526   6
      4    +UPOP          247       50         614,674   5
      5    +MS            364       122        623,909   3
  • 17. Evaluation – Forward Predictor Selection
      • Best single model
          › Item-kNN trained on positive interactions
          › 2.5 min training time
          › 7 ms prediction time
      • Sub-combinations
          › 4 models: 600K+ score (w/o item metadata)
          › 5 models: 3rd place
          › 6 models: 95% of final score

      #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
      1    IKNN(C,C)      148       7          450,046   24
      2    +RCTR          208       15         548,338   9
      3    +AS(1)         237       17         590,526   6
      4    +UPOP          247       50         614,674   5
      5    +MS            364       122        623,909   3
      6    +IKNN(R,R)     1,150     168        635,278   3
  • 18. Evaluation – Forward Predictor Selection
      • Best single model
          › Item-kNN trained on positive interactions
          › 2.5 min training time
          › 7 ms prediction time
      • Sub-combinations
          › 4 models: 600K+ score (w/o item metadata)
          › 5 models: 3rd place
          › 6 models: 95% of final score
          › 10 models: 650K+ score (<30 mins. training time)

      #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
      1    IKNN(C,C)      148       7          450,046   24
      2    +RCTR          208       15         548,338   9
      3    +AS(1)         237       17         590,526   6
      4    +UPOP          247       50         614,674   5
      5    +MS            364       122        623,909   3
      6    +IKNN(R,R)     1,150     168        635,278   3
      7    +AS(3)         1,205     178        636,498   3
      8    +IKNN(R,C)     1,557     197        643,145   3
      9    +AS(4)         1,582     202        644,710   3
      10   +AP            1,621     207        652,802   3
  • 19. Evaluation – Forward Predictor Selection
      • Best single model
          › Item-kNN trained on positive interactions
          › 2.5 min training time
          › 7 ms prediction time
      • Sub-combinations
          › 4 models: 600K+ score (w/o item metadata)
          › 5 models: 3rd place
          › 6 models: 95% of final score
          › 10 models: 650K+ score (<30 mins. training time)
      • Final combination
          › 3rd place
          › ~666K leaderboard score
          › 11 instances
          › user-support-based weighting
          › 3h+ training time, 200 ms prediction time

      #    Predictor      tTR(s)*   tPR(ms)*   Score     Rank
      1    IKNN(C,C)      148       7          450,046   24
      2    +RCTR          208       15         548,338   9
      3    +AS(1)         237       17         590,526   6
      4    +UPOP          247       50         614,674   5
      5    +MS            364       122        623,909   3
      6    +IKNN(R,R)     1,150     168        635,278   3
      7    +AS(3)         1,205     178        636,498   3
      8    +IKNN(R,C)     1,557     197        643,145   3
      9    +AS(4)         1,582     202        644,710   3
      10   +AP            1,621     207        652,802   3
           SUPP_C(1-10)   1,639     194        661,359   3
      11   +OM            11,790    199        665,592   3
      * Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory
  • 20. Evaluation – Timeline
      [Chart: leaderboard rank and leaderboard score (thousands) by date, Apr-25 to Jun-27, across three phases: initial setup, model design and implementation, final sprint. The plotted score rises from 115.4K to 665.6K.]
  • 21. Lessons learnt
      • Exploiting the specificity of the dataset
      • Using Item-kNN over factorization in a very sparse dataset
      • Paying attention to recurrence
      • Forward Predictor Selection is effective
      • Different optimization for different user groups
      • Down-weighting/omitting weak items
      • Ranking 200K items is slow
      • Keep it simple and transparent!
  • 22. Presenter Contact
      Thank you for your attention!
      Dávid Zibriczky, PhD
      david.zibriczky@gmail.com