Content-Based approaches for Cold-Start Job Recommendations
ACM RecSys Challenge 2017
Lunatic Goats @PoliMi
M. Bianchi, F. Cesaro, F. Ciceri, M. Dagrada, A. Gasparin, D. Grattarola, I. Inajjar, A. M. Metelli, L. Cella
2. Lunatic Goats @PoliMi
Task Outline
● Cold Start recommendation scenario:
○ job posting recommendations;
○ focus on getting positive interactions;
○ penalized for negative interactions;
○ rewarded for recruiter interest.
● Two phases:
○ Offline - predictions for fixed sets of items and users.
○ Online - daily recommendation to variable sets of users.
3. Lunatic Goats @PoliMi
Data Analysis - Impressions vs Interactions
● Impressions: ~97% of the data; they carry little to no
information and are discarded.
● Interactions: ~3% of the data.
● Interactions divided in:
○ positive interactions (types 1, 2 and 3);
○ negative interactions (type 4);
○ recruiter interest (type 5).
● Interactions treated with implicit approach.
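The implicit treatment above can be sketched as follows. This is a minimal illustration, assuming interactions arrive as (user, item, type) tuples; that layout and the function name are hypothetical. Each interaction becomes a binary signal in one of three groups, and repeated events collapse into one.

```python
from collections import defaultdict

# Interaction types as listed on the slide.
POSITIVE_TYPES = {1, 2, 3}
NEGATIVE_TYPE = 4
RECRUITER_TYPE = 5

def split_implicit(interactions):
    """Group (user, item, type) tuples into binary, implicit signals.

    The tuple layout is an assumption made for this sketch.
    """
    signals = defaultdict(set)
    for user, item, kind in interactions:
        if kind in POSITIVE_TYPES:
            signals["positive"].add((user, item))
        elif kind == NEGATIVE_TYPE:
            signals["negative"].add((user, item))
        elif kind == RECRUITER_TYPE:
            signals["recruiter"].add((user, item))
        # Any other type is ignored.
    return signals
```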
4. Lunatic Goats @PoliMi
Local Validation
● Split the dataset into training and validation sets.
● Random sampling procedure:
○ randomly select target items from dataset;
○ remove all interactions with these items;
○ pick target users as a subset of those who have
interactions with these items.
● Preserve the user-item ratio.
● No cross-validation: the dataset is too large.
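A minimal sketch of the sampling procedure above, assuming interactions are given as (user, item) pairs. The function name and parameters are hypothetical; the caller is expected to choose the two sizes so that the user-item ratio of the real target sets is preserved.

```python
import random

def make_validation_split(interactions, n_target_items, n_target_users, seed=0):
    """Hold out all interactions of randomly chosen target items.

    interactions: list of (user, item) pairs (assumed layout).
    Returns (train, held_out, target_items, target_users).
    """
    rng = random.Random(seed)
    items = sorted({item for _, item in interactions})
    target_items = set(rng.sample(items, n_target_items))

    # Removing every interaction with a target item makes it cold-start.
    train = [(u, i) for u, i in interactions if i not in target_items]
    held_out = [(u, i) for u, i in interactions if i in target_items]

    # Target users are a subset of those who interacted with target items.
    candidates = sorted({u for u, _ in held_out})
    n_users = min(n_target_users, len(candidates))
    target_users = set(rng.sample(candidates, n_users))
    return train, held_out, target_items, target_users
```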
5. Lunatic Goats @PoliMi
Solution - Preprocessing
● One-hot encoding of both user and item features.
● Feature aggregation:
● TF-IDF application.
● Negative User Filtering: removing heavy deleters.
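The first and third preprocessing steps can be sketched as below. The slides do not state which TF-IDF variant was used, so this assumes binary term frequency and a plain log(N / df) inverse document frequency; the record layout and function names are also hypothetical.

```python
import math
from collections import Counter

def one_hot(records):
    """Turn {attribute: value} dicts into sets of 'attribute=value' tokens."""
    return [{f"{k}={v}" for k, v in record.items()} for record in records]

def tfidf(encoded):
    """Weight each one-hot token by log(N / document frequency).

    Binary term frequency is an assumption of this sketch.
    """
    n = len(encoded)
    df = Counter(tok for feats in encoded for tok in feats)
    idf = {tok: math.log(n / count) for tok, count in df.items()}
    return [{tok: idf[tok] for tok in feats} for feats in encoded]
```

A feature shared by every item gets weight zero, so only discriminative features contribute to later similarity computations.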
7. Lunatic Goats @PoliMi
Solution - Negative Recommendation
● The scoring function heavily penalizes negative (type 4)
interactions.
● Using a CBF approach, predict type 4 interactions.
● Add these predictions to the ensemble with a negative weight.
8. Lunatic Goats @PoliMi
Solution – Content Based Filtering algorithms (CBF)
Recommend to a user items similar to the ones he/she likes.
● Run separately on positive (CBF+) and negative (CBF-)
interactions.
● Tanimoto similarity between items:
● Recommendation performed for filtered users only.
● Penalize heavy clickers.
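A minimal sketch of item-item Tanimoto similarity and its use for scoring, assuming items are sparse feature vectors stored as {feature: weight} dicts. Summing similarities over the user's history is an assumption of this sketch; user filtering and the heavy-clicker penalty are not modeled.

```python
def tanimoto(a, b):
    """Tanimoto similarity: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0

def cbf_scores(target_items, user_history, item_features):
    """Score each target item by its similarity to items in the user's history.

    Run on positive interactions for CBF+ and on negative ones for CBF-.
    """
    return {
        t: sum(tanimoto(item_features[t], item_features[h]) for h in user_history)
        for t in target_items
    }
```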
9. Lunatic Goats @PoliMi
Solution – Profile Matching (PM)
Recommend to a user items matching his/her profile.
● Cosine similarity between user and item:
● Items’ tags and titles compared with users’ jobroles.
● Recommendation performed for filtered users only.
● Unlike CBF, PM can also produce recommendations for
cold-start users.
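Profile matching can be sketched as a cosine similarity between a user's jobroles and an item's tags and titles. The sketch assumes both sides are sets of tokens from a shared vocabulary and uses binary weights; the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def profile_match(user_jobroles, item_tokens):
    """Compare a user's jobroles with an item's tags and title tokens.

    Binary weights are an assumption of this sketch.
    """
    user_vec = {tok: 1.0 for tok in user_jobroles}
    item_vec = {tok: 1.0 for tok in item_tokens}
    return cosine(user_vec, item_vec)
```

Because the score depends only on profile and item content, it needs no interaction history on either side.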
10. Lunatic Goats @PoliMi
Solution – Collaborative Filtering algorithms
● CF cannot be run directly in a cold-start scenario.
● Content-based microclustering approach:
○ for each cold-start item associate the interactions of the
top 5 CBF-similar non-cold-start items;
○ run standard CF algorithms.
● CF algorithms:
○ CF with item cosine similarity;
○ iALS (Implicit Alternating Least Squares).
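The microclustering step above can be sketched as follows, assuming interactions are stored as a dict from item to the set of users who interacted with it, and that a content-based similarity function is available. Taking the plain union of the neighbours' users is an assumption; the slides do not say whether inherited interactions are weighted.

```python
def inherit_interactions(cold_items, warm_items, similarity, interactions, k=5):
    """Give each cold-start item the interactions of its k most similar warm items.

    similarity(cold, warm) -> float is a hypothetical content-based scorer.
    interactions: dict mapping item -> set of users.
    """
    augmented = {item: set(users) for item, users in interactions.items()}
    for cold in cold_items:
        top = sorted(warm_items, key=lambda w: similarity(cold, w), reverse=True)[:k]
        augmented[cold] = set().union(*(interactions.get(w, set()) for w in top))
    return augmented
```

The augmented dict can then be fed to a standard CF algorithm, since every cold-start item now has an interaction profile.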
11. Lunatic Goats @PoliMi
Solution - Ensemble Structure
● Divide algorithms by nature.
● Normalize and weight each
layer.
● Generate upper layers by
adding lower layers.
● Output the 100 highest-scoring recommendations.
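One layer of such an ensemble can be sketched as below. The sketch assumes each algorithm returns a {candidate: score} dict and uses max-absolute-value normalization, which is an assumption: the slides do not state the normalization type. Stacking several layers would repeat the same step on the outputs of lower layers.

```python
import heapq

def normalize(scores):
    """Scale scores so the largest absolute value is 1 (assumed scheme)."""
    m = max((abs(v) for v in scores.values()), default=0.0)
    return {k: v / m for k, v in scores.items()} if m else dict(scores)

def ensemble(layers, weights, top_n=100):
    """Weighted sum of normalized score dicts, keeping the top_n candidates."""
    total = {}
    for scores, weight in zip(layers, weights):
        for key, value in normalize(scores).items():
            total[key] = total.get(key, 0.0) + weight * value
    return heapq.nlargest(top_n, total.items(), key=lambda kv: kv[1])
```

A negative weight, as used for the CBF- predictions, pushes candidates with a high predicted deletion score down the ranking.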
12. Lunatic Goats @PoliMi
Solution - Parameter Tuning
● Ensemble tuning:
○ 9 weights (one for each block), reduced to 6 due to
normalization;
○ non-differentiable scoring function;
○ gradient-free optimization methods:
■ Genetic Algorithms - quick and acceptable results;
■ Powell’s Conjugate Direction method - slower but
superior results.
● Individual algorithms tuning:
○ greedy search on local test.
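To illustrate why gradient-free methods fit a non-differentiable score, here is a minimal coordinate search. It is a simpler stand-in, not Powell's method and not a genetic algorithm; the function name, step sizes and the `score` callback are hypothetical.

```python
def coordinate_search(score, weights, step=0.5, min_step=1e-3):
    """Maximize score(weights) by probing one weight at a time.

    Only score evaluations are used, so no gradient is required.
    """
    best = list(weights)
    best_score = score(best)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            step /= 2  # Refine the search once no move helps.
    return best, best_score
```

In the setting of the slides, `score` would evaluate the ensemble on the local validation split for a given weight vector.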
13. Lunatic Goats @PoliMi
Online - Changes to ensemble
● Normalization type.
● Cutting for each user
before items.
● Excluding slower algorithms:
pushing recommendations promptly
gives more exposure → better
scores.
14. Lunatic Goats @PoliMi
Architecture & Runtime
● The recommender runs on VMs with 8 cores and 16 GB RAM.
● The only exceptions are content-based microclustering and
iALS, which run on a VM with 8 cores and 64 GB RAM.
● The code is heavily optimized to use memory efficiently
(sparse matrix representations, efficient matrix operations).
● This results in short runtimes.
15. Lunatic Goats @PoliMi
Scores - Local vs Offline
Algorithm       Local score   Leaderboard score   Execution time
CBF+                  57852               60257           13 min
CBF-                  -1330               -8529            4 min
PM                    17260               16777            7 min
CF                    42213               39250           12 min
iALS                  48081               52411          150 min
XING Baseline         14742               14395           40 min
Ensemble              60625               71372            2 min
16. Lunatic Goats @PoliMi
Results and Conclusions
● 2nd place in the online phase;
● 1st place in the offline phase.
● Points of strength:
○ speed (in particular offline ~20 min);
○ ease of implementation.
● Extensions:
○ feature weighting (user personalized, feature interaction);
○ time decay models.