Challenge statement
Our Solution
What could we do better?
RecSys Challenge 2016
job recommendations based on preselection of offers and gradient boosting
Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Adam Witkowski, Piotr Wygocki
apacuk@mimuw.edu.pl
University of Warsaw
RecSys Challenge 2016
mim-solutions.pl RecSys Challenge 2016
Outline
1 Challenge statement
2 Our Solution
   Candidate items selection
   Learning probabilities
   Features
3 What could we do better?
Problem
Xing.com dataset:
user profiles (experience, education, current job’s roles, etc.),
job (item) offer description (title, tags, employment type, etc.),
past recommendations (impressions),
user positive (clicking, bookmarking, replying) and negative (deleting) interactions with items.
Task: predict users' positive interactions.
Evaluation
Secret ground truth (GT): positive interactions from test week.
A mean average precision-like (MAP-like) measure.
Online evaluation.
Finished 2nd!
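The slides only say the measure is MAP-like and do not give the challenge's exact formula. For reference, here is a minimal sketch of the textbook MAP@k, with k = 30 to match the 30 recommendations per user. It is not the official challenge score, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def average_precision_at_k(recommended: List[int], relevant: Set[int], k: int = 30) -> float:
    """Standard AP@k: mean of precision@rank over the ranks where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(
    recs: Dict[int, List[int]], gt: Dict[int, Set[int]], k: int = 30
) -> float:
    """MAP@k averaged over the users that have at least one ground-truth item."""
    users = [u for u in gt if gt[u]]
    if not users:
        return 0.0
    return sum(average_precision_at_k(recs.get(u, []), gt[u], k) for u in users) / len(users)
```

For example, recommending `[10, 20, 30]` to a user whose GT is `{20, 40}` gives one hit at rank 2, so AP@30 = (1/2) / 2 = 0.25.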
Candidate items selection
Learning probabilities
Features
Solution’s schema
user → select candidates → job #1, job #2, job #3, ..., job #N
→ predict probabilities → job #1: 0.3, job #2: 0.7, job #3: 0.4, ..., job #N: 0.5
→ sort → job #15: 0.9, job #34: 0.89, ..., job #124: 0.75
→ take top 30
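The schema can be sketched as one small function. `select_candidates` and `predict_proba` are hypothetical stand-ins for the candidate-selection step and the trained model. Only the select / predict / sort / take-top-30 flow comes from the slide.

```python
from typing import Callable, Iterable, List, Tuple


def recommend(
    user: int,
    select_candidates: Callable[[int], Iterable[int]],  # hypothetical candidate selector
    predict_proba: Callable[[int, int], float],  # hypothetical model: (user, job) -> probability
    top_n: int = 30,
) -> List[int]:
    """Score every candidate job, sort by predicted probability, keep the top_n."""
    scored: List[Tuple[float, int]] = [
        (predict_proba(user, job), job) for job in select_candidates(user)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest probability first
    return [job for _, job in scored[:top_n]]
```

With the probabilities from the figure (job #15: 0.9, job #34: 0.89, job #124: 0.75, ...), `top_n=3` would return jobs 15, 34 and 124.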
Training set
Training GT: positive interactions from the last week of the data.
Local score computed against this training GT.
Candidates and features computed separately for the training set and for the full dataset!
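Here is a minimal sketch of building the local training set. It assumes a hypothetical record layout `(user_id, item_id, interaction_type, week)` with string interaction types, which is not how the dataset actually encodes them. Events before the last week are the input for candidates and features, and the positive interactions of the last week form the training GT. For the final submission the same steps would be rerun on the full dataset. That is why candidates and features exist twice.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (user_id, item_id, interaction_type, week)
Interaction = Tuple[int, int, str, int]
POSITIVE_TYPES = {"click", "bookmark", "reply"}  # "delete" is the negative interaction


def split_last_week(
    interactions: List[Interaction],
) -> Tuple[List[Interaction], Dict[int, Set[int]]]:
    """Hold out the last week: earlier events become training input,
    positive events of the last week become the local training GT."""
    last_week = max(week for _, _, _, week in interactions)
    history = [x for x in interactions if x[3] < last_week]
    gt: Dict[int, Set[int]] = defaultdict(set)
    for user, item, kind, week in interactions:
        if week == last_week and kind in POSITIVE_TYPES:
            gt[user].add(item)
    return history, dict(gt)
```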
Candidates
Candidate: an item i with a high P[i ∈ GT(u)].
20 candidate categories.
Each category has its own ranking, e.g. interactions sorted by timestamp.
∼300 candidates per user (0.1% of all items).
Candidates cover 37% of the training GT.
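The slide does not say how the 20 per-category rankings are merged into about 300 candidates. The sketch below is one plausible approach under two assumptions. First, each category contributes its top `per_category` items (20 × 15 = 300). Second, "cover" means the fraction of ground-truth (user, item) pairs that appear among the user's candidates. Both parameters and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def merge_categories(category_rankings: Iterable[List[int]], per_category: int = 15) -> List[int]:
    """Take the top items of each category's ranking, dropping duplicates, in category order."""
    seen: Set[int] = set()
    merged: List[int] = []
    for ranking in category_rankings:
        for item in ranking[:per_category]:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def gt_coverage(candidates: Dict[int, List[int]], gt: Dict[int, Set[int]]) -> float:
    """Fraction of ground-truth (user, item) pairs present among the user's candidates."""
    total = sum(len(items) for items in gt.values())
    if total == 0:
        return 0.0
    covered = sum(len(items & set(candidates.get(u, []))) for u, items in gt.items())
    return covered / total
```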
Candidates categories
User's interactions Int(u), sorted by week and by event count within the week,
similarly for impressions Imp(u),
Int(u′) of other users u′, sorted by Jaccard(Int(u), Int(u′)).
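This is a sketch of two of these categories under assumed data shapes: a user's events are hypothetical `(item, week)` pairs, and `Int(u)` is a dict from user id to a set of item ids. `rank_own_interactions` applies unchanged to impression events Imp(u). The neighbour cutoff `max_users` is illustrative, not a value from the slides.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_own_interactions(events: List[Tuple[int, int]]) -> List[int]:
    """Rank items from (item, week) events: latest week first,
    then by the number of events on the item within that week."""
    counts = Counter(events)  # (item, week) -> event count
    latest: Dict[int, Tuple[int, int]] = {}
    for (item, week), n in counts.items():
        key = (week, n)
        if item not in latest or key > latest[item]:
            latest[item] = key
    return sorted(latest, key=lambda item: latest[item], reverse=True)


def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_user_candidates(user: int, interactions: Dict[int, Set[int]], max_users: int = 10) -> List[int]:
    """Items from Int(u') of other users u', most Jaccard-similar users first."""
    own = interactions.get(user, set())
    neighbours = sorted(
        ((jaccard(own, items), other) for other, items in interactions.items() if other != user),
        key=lambda pair: (-pair[0], pair[1]),  # similarity descending, then user id
    )
    candidates: List[int] = []
    seen: Set[int] = set()
    for similarity, other in neighbours[:max_users]:
        if similarity == 0.0:
            break  # the remaining users share no items with u
        for item in sorted(interactions[other]):
            if item not in seen:
                seen.add(item)
                candidates.append(item)
    return candidates
```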
Candidates (cold start)
items i sorted by max_{i′ ∈ Int(u)} |tags(i) ∩ tags(i′)|,
items i sorted by |jobroles(u) ∩ tags(i)|,
globally most popular items.
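The three cold-start categories can be sketched as scoring functions over a hypothetical dict of tag ids per item. The sketch assumes job roles share an id space with item tags, so the intersection makes sense. Skipping already-clicked items and the cutoff `k` are illustrative choices, not taken from the slides.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def _top_k(scores: Dict[int, int], k: int) -> List[int]:
    """Items with a positive score, highest first (ties broken by item id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0][:k]


def tag_overlap_candidates(clicked: Set[int], item_tags: Dict[int, Set[int]], k: int = 15) -> List[int]:
    # score(i) = max over clicked i' of |tags(i) ∩ tags(i')|; clicked items themselves are skipped
    scores = {
        item: max((len(tags & item_tags.get(c, set())) for c in clicked), default=0)
        for item, tags in item_tags.items()
        if item not in clicked
    }
    return _top_k(scores, k)


def jobrole_candidates(jobroles: Set[int], item_tags: Dict[int, Set[int]], k: int = 15) -> List[int]:
    # score(i) = |jobroles(u) ∩ tags(i)|
    return _top_k({item: len(jobroles & tags) for item, tags in item_tags.items()}, k)


def popular_candidates(interacted_items: Iterable[int], k: int = 15) -> List[int]:
    # globally most popular items, counted by number of interaction events
    return [item for item, _ in Counter(interacted_items).most_common(k)]
```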
Candidate ranking
XGBoost (Gradient Boosting Decision Trees).
Optimizing logloss.
Training file from preselected candidates:
all positive,
sampled negative.
Score: 77.5% of the score of a perfect ranking of the candidates.
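Here is a sketch of building the training file and of the loss being optimized. The default of 5 random negatives per user comes from the "instead of random 5" remark on the improvements slide. The data layout and function names are hypothetical. The rows would then be turned into feature vectors and passed to XGBoost. Its binary logistic objective (`binary:logistic`) minimizes the logloss computed below.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def build_training_rows(
    candidates: Dict[int, List[int]],
    gt: Dict[int, Set[int]],
    negatives_per_user: int = 5,
    seed: int = 0,
) -> List[Tuple[int, int, int]]:
    """(user, item, label) rows: all positive candidates plus randomly sampled negatives."""
    rng = random.Random(seed)
    rows: List[Tuple[int, int, int]] = []
    for user in sorted(candidates):
        relevant = gt.get(user, set())
        positives = [i for i in candidates[user] if i in relevant]
        negatives = [i for i in candidates[user] if i not in relevant]
        rows.extend((user, i, 1) for i in positives)
        sampled = rng.sample(negatives, min(negatives_per_user, len(negatives)))
        rows.extend((user, i, 0) for i in sampled)
    return rows


def logloss(labels: List[int], probs: List[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```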
Features
Each feature maps a (user, item) pair to a real number.
273 features in 12 groups.
XGBoost worked well with:
highly correlated features,
null values,
no scaling/normalization.
Feature definitions (sample)
Event-based item: percentage of items in Int(u) that share a property (e.g., employment type) with item i.
Most similar user who clicked the item: max_{u′ ∈ Users(i)} Jaccard(Int(u), Int(u′)).
Most similar item clicked by the user: max_{i′ ∈ Int(u)} Jaccard(Users(i), Users(i′)).
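Below is a sketch of the three sample features. It assumes `Int(u)` and `Users(i)` are dicts of sets and item properties a dict of strings, all hypothetical layouts. Excluding u′ = u and i′ = i from the maxima is also an assumption, since otherwise a user who already clicked i would always score 1. Missing values come back as NaN, which XGBoost treats as missing.

```python
import math
from typing import Dict, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_property_share(user_items: Set[int], item: int, prop: Dict[int, str]) -> float:
    """Event-based item feature: fraction of Int(u) with the same property value as item i."""
    target = prop.get(item)
    if not user_items or target is None:
        return math.nan  # left as a null value for the model
    return sum(1 for j in user_items if prop.get(j) == target) / len(user_items)


def most_similar_user_who_clicked(
    user: int, item: int, int_u: Dict[int, Set[int]], users_i: Dict[int, Set[int]]
) -> float:
    """max over u' in Users(i), u' != u, of Jaccard(Int(u), Int(u'))."""
    own = int_u.get(user, set())
    others = [v for v in users_i.get(item, set()) if v != user]
    return max((jaccard(own, int_u.get(v, set())) for v in others), default=math.nan)


def most_similar_item_clicked_by_user(
    user: int, item: int, int_u: Dict[int, Set[int]], users_i: Dict[int, Set[int]]
) -> float:
    """max over i' in Int(u), i' != i, of Jaccard(Users(i), Users(i'))."""
    item_users = users_i.get(item, set())
    others = [j for j in int_u.get(user, set()) if j != item]
    return max((jaccard(item_users, users_i.get(j, set())) for j in others), default=math.nan)
```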
Top feature groups
feature group                           fscore
event based user (item) profile            41%
  tags + title                              7%
item global popularity                     22%
  trend                                    10%
  weekday                                   4%
most similar                               10%
  item clicked by user                      6%
  user who clicked item                     4%
user total events                           8%
  in last week                              4%
seconds from last user activity             7%
max common tags with clicked item           4%
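This sketch assumes "fscore" means XGBoost's split-count importance. That is what `Booster.get_fscore()` returns: a dict mapping each feature name to the number of splits that use it. Under that assumption, per-group shares can be computed as below. The feature names and the feature-to-group mapping are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def group_fscore_shares(fscore: Dict[str, float], feature_group: Dict[str, str]) -> Dict[str, float]:
    """Sum per-feature fscores into their groups and express each group as a percentage."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, score in fscore.items():
        totals[feature_group.get(feature, "other")] += score  # unmapped features go to "other"
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: 100.0 * s / grand_total for group, s in totals.items()}
```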
Possible improvements
Training file:
8x bigger,
sample 1/4 of each user's negative candidates (instead of 5 random ones),
score: +6.5k.
Ensembling models.
Layer scores:
Candidate selection: 37% (coverage of the training GT).
Candidate ranking: 77.5% (of a perfect ranking's score).
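Here is a sketch of the two proposed changes. The first samples a fixed fraction (1/4) of each user's negative candidates instead of a fixed 5. The second ensembles by averaging the probabilities predicted by several models. Plain averaging is an assumption, since the slide only says "Ensembling models". The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_negatives(
    negatives: Sequence[int],
    rng: random.Random,
    fraction: Optional[float] = 0.25,
    count: int = 5,
) -> List[int]:
    """Sample a fraction of a user's negative candidates, or `count` of them when fraction is None."""
    k = round(len(negatives) * fraction) if fraction is not None else count
    return rng.sample(list(negatives), min(k, len(negatives)))


def average_ensemble(model_predictions: List[List[float]]) -> List[float]:
    """Unweighted mean of the probabilities several models assign to the same rows."""
    return [sum(ps) / len(ps) for ps in zip(*model_predictions)]
```

With ∼300 candidates per user, the fractional scheme keeps about 75 negatives per user instead of 5.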
Thank you
apacuk@mimuw.edu.pl
mim-solutions.pl

Viewers also liked

Recruit recsys-review-magambo by Elie Magambo Gatete
Thesis_Nazarova_Final(1) by Sardana Nazarova
allegrotech - Data science meetup #1 Intro by Bartlomiej Twardowski
Warsaw Data Science - Factorization Machines Introduction by Bartlomiej Twardowski
Recommender systems, Top-N recommendation ranking algorithms based on ... by Bartlomiej Twardowski
We recommend - A quick introduction to recommender systems and a bit ... by Bartlomiej Twardowski
Warsaw Data Science - Recsys2016 Quick Review by Bartlomiej Twardowski
Presentation from Big Data Tech 2016: Machine Learning vs Big Data by Bartlomiej Twardowski
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys... by Bartlomiej Twardowski
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys... by Xavier Amatriain
Recommender Systems by T212
Recommender Systems (Machine Learning Summer School 2014 @ CMU) by Xavier Amatriain