Content-Based approaches for
Cold-Start Job Recommendations
ACM RecSys Challenge 2017
Lunatic Goats @PoliMi
M. Bianchi, F. Cesaro, F. Ciceri, M. Dagrada, A. Gasparin, D. Grattarola,
I. Inajjar, A. M. Metelli, L. Cella
Task Outline
● Cold Start recommendation scenario:
○ job posting recommendations;
○ focus on getting positive interactions;
○ negative interactions are penalized;
○ recruiter interest is rewarded.
● Two phases:
○ Offline - predictions for fixed sets of items and users.
○ Online - daily recommendation to variable sets of users.
Data Analysis - Impressions vs Interactions
● Impressions: ~97% of the data, little to no information
contained (discarded).
● Interactions: ~3% of the data.
● Interactions are divided into:
○ positive interactions (types 1, 2 and 3);
○ negative interactions (type 4);
○ recruiter interest (type 5).
● Interactions are treated as implicit feedback.
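A minimal sketch of the grouping above, assuming interactions arrive as (user, item, type) triples. The function name and the input layout are hypothetical; only the type groups (1-3 positive, 4 negative, 5 recruiter interest) come from the slide. Any other type code, such as impressions, is dropped, and repeated events collapse to a single binary signal.

```python
from collections import defaultdict

POSITIVE_TYPES = {1, 2, 3}   # positive interactions (from the slide)
NEGATIVE_TYPES = {4}         # negative interactions (from the slide)
RECRUITER_TYPES = {5}        # recruiter interest (from the slide)


def build_implicit_sets(interactions):
    """Group (user, item, type) triples into binary per-user item sets.

    Implicit treatment: only the presence of an interaction is kept,
    not how many times it occurred.
    """
    positive = defaultdict(set)
    negative = defaultdict(set)
    recruiter = defaultdict(set)
    for user, item, itype in interactions:
        if itype in POSITIVE_TYPES:
            positive[user].add(item)
        elif itype in NEGATIVE_TYPES:
            negative[user].add(item)
        elif itype in RECRUITER_TYPES:
            recruiter[user].add(item)
        # Any other type code (e.g. impressions) is discarded.
    return dict(positive), dict(negative), dict(recruiter)
```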
Local Validation
● Split the dataset into training and validation sets.
● Random sampling procedure:
○ randomly select target items from dataset;
○ remove all interactions with these items;
○ pick target users as a subset of those who have
interactions with these items.
● Preserve the user-item ratio.
● No cross-validation: the dataset is too large.
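The sampling procedure can be sketched as follows. This is an illustration under assumptions, not the team's code: the function name, the (user, item) pair layout and the `user_fraction` parameter are hypothetical. The number of target items and the user fraction would be chosen so that the split preserves the user-item ratio of the challenge data.

```python
import random


def cold_start_split(interactions, n_target_items, user_fraction, seed=0):
    """Hold out all interactions of randomly chosen items to simulate cold start."""
    rng = random.Random(seed)
    items = sorted({item for _, item in interactions})
    # 1. Randomly select the target (cold-start) items.
    target_items = set(rng.sample(items, n_target_items))
    # 2. Remove every interaction with these items from the training set.
    train = [(u, i) for u, i in interactions if i not in target_items]
    held_out = [(u, i) for u, i in interactions if i in target_items]
    # 3. Pick target users among those who interacted with the target items.
    candidates = sorted({u for u, _ in held_out})
    n_users = max(1, round(user_fraction * len(candidates))) if candidates else 0
    target_users = set(rng.sample(candidates, n_users))
    validation = [(u, i) for u, i in held_out if u in target_users]
    return train, validation, target_items, target_users
```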
Solution - Preprocessing
● One-hot encoding of both user and item features.
● Feature aggregation.
● TF-IDF application.
● Negative User Filtering: removing heavy deleters.
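As a rough illustration of the TF-IDF step on one-hot features: each feature present in an item has term frequency 1, so its weight reduces to the inverse document frequency. The plain `log(N / df)` variant and the function name are assumptions; the slide does not say which TF-IDF formulation was used.

```python
import math
from collections import Counter


def tfidf_weights(item_features):
    """Weight one-hot item features by inverse document frequency.

    item_features maps an item id to an iterable of feature tokens.
    With one-hot features the term frequency is 1, so weight == idf.
    """
    n_items = len(item_features)
    # Document frequency: in how many items each feature appears.
    df = Counter(f for feats in item_features.values() for f in set(feats))
    idf = {f: math.log(n_items / count) for f, count in df.items()}
    return {
        item: {f: idf[f] for f in set(feats)}
        for item, feats in item_features.items()
    }
```

A feature shared by every item gets weight 0, so it no longer contributes to item-item similarity, while rare features dominate.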
Solution Overview
Solution - Negative Recommendation
● The scoring function heavily penalizes negative (type 4) interactions.
● Predict type 4 interactions with the CBF approach.
● Add these predictions to the ensemble with a negative weight.
Solution – Content Based Filtering algorithms (CBF)
Recommend to a user items similar to the ones he/she likes.
● Run separately on positive (CBF+) and negative (CBF-)
interactions.
● Tanimoto similarity between items.
● Recommendation performed for filtered users only.
● Penalize heavy clickers.
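The formula on the slide did not survive extraction. A common definition of Tanimoto similarity for weighted vectors is a·b / (|a|² + |b|² − a·b), which for binary vectors is the Jaccard index. The sketch below uses that definition on sparse dict vectors; the scoring rule (sum of similarities to the items the user interacted with) and all names are assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sparse vectors (feature -> weight)."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0


def cbf_scores(candidates, interacted, features):
    """Score each candidate item by its total similarity to the user's items.

    Run on positive interactions this gives a CBF+ style score; run on
    negative interactions it gives a CBF- style score.
    """
    return {
        c: sum(tanimoto(features[c], features[i]) for i in interacted)
        for c in candidates
    }
```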
Solution – Profile Matching (PM)
Recommend to a user items matching his/her profile.
● Cosine similarity between user and item.
● Items’ tags and titles are compared with users’ jobroles.
● Recommendation performed for filtered users only.
● Unlike CBF, PM can also recommend to cold-start users.
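A minimal sketch of profile matching, assuming jobroles, tags and titles are sets of tokens from a shared vocabulary and are turned into binary vectors. The function names and the unweighted vectors are assumptions. Because only the user's profile is needed, this works for users with no interaction history.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors (token -> weight)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_match(user_jobroles, item_tokens, top_k):
    """Rank items whose title/tag tokens overlap the user's jobroles."""
    user_vec = {t: 1.0 for t in user_jobroles}
    scored = []
    for item_id, tokens in item_tokens.items():
        item_vec = {t: 1.0 for t in tokens}
        scored.append((cosine(user_vec, item_vec), item_id))
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:top_k] if score > 0]
```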
Solution – Collaborative Filtering algorithms
● CF cannot be run directly in a cold-start scenario.
● Content-based microclustering approach:
○ for each cold-start item, borrow the interactions of its
5 most CBF-similar non-cold-start items;
○ run standard CF algorithms.
● CF algorithms:
○ CF with item cosine similarity;
○ iALS (Implicit Alternating Least Squares).
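The microclustering step can be sketched like this, assuming a precomputed content similarity function and an item-to-users interaction map. Both the signature and the union-based merge are assumptions; the slide only states that the top 5 CBF-similar warm items lend their interactions.

```python
def inherit_interactions(cold_items, warm_interactions, similarity, k=5):
    """Give each cold-start item the users of its k most similar warm items.

    warm_interactions maps a warm item id to the set of users who
    interacted with it; similarity(cold, warm) returns a content score.
    The result can be fed to a standard CF algorithm.
    """
    augmented = {item: set(users) for item, users in warm_interactions.items()}
    for cold in cold_items:
        ranked = sorted(warm_interactions, key=lambda w: (-similarity(cold, w), w))
        neighbours = [w for w in ranked[:k] if similarity(cold, w) > 0]
        # Union of the neighbours' users becomes the pseudo-history.
        augmented[cold] = set().union(*(warm_interactions[w] for w in neighbours))
    return augmented
```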
Solution - Ensemble Structure
● Divide algorithms by nature.
● Normalize and weight each layer.
● Generate upper layers by adding lower layers.
● Output 100 best scores.
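One layer of such an ensemble might look like the sketch below. The max-absolute-value normalization, the block names and the helper functions are assumptions; the slide does not specify the normalization used. A negative weight, as for CBF-, pushes predicted deletions down the ranking.

```python
def normalize(scores):
    """Scale a block's scores so the largest absolute value is 1."""
    peak = max((abs(s) for s in scores.values()), default=0.0)
    return {k: s / peak for k, s in scores.items()} if peak else dict(scores)


def combine_layer(blocks, weights):
    """Weighted sum of normalized score blocks (one layer of the ensemble)."""
    total = {}
    for name, scores in blocks.items():
        for key, s in normalize(scores).items():
            total[key] = total.get(key, 0.0) + weights[name] * s
    return total


def top_n(scores, n=100):
    """Return the n best-scoring keys, highest score first."""
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [key for key, _ in ranked[:n]]
```

The output of `combine_layer` is itself a score block, so upper layers can be built by calling it again on the results of lower layers.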
Solution - Parameter Tuning
● Ensemble tuning:
○ 9 weights (one for each block), reduced to 6 due to
normalization;
○ non-differentiable scoring function;
○ gradient-free optimization methods:
■ Genetic Algorithms - quick and acceptable results;
■ Powell’s Conjugate Direction method - slower but
superior results.
● Individual algorithms tuning:
○ greedy search on local test.
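To illustrate why gradient-free methods fit a non-differentiable score, here is a plain coordinate search with a shrinking step. It is a simpler stand-in, not Powell's Conjugate Direction method and not a genetic algorithm, and the `score` callable is hypothetical: in practice it would run the ensemble with the given weights and evaluate it on the local validation set.

```python
def coordinate_search(score, weights, step=0.5, min_step=1e-3):
    """Maximize score(weights) by trying +/- step on one weight at a time.

    Needs only function evaluations, so it works when the scoring
    function has no usable gradient.
    """
    best = list(weights)
    best_score = score(best)
    while step >= min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            step /= 2  # refine the search once no move helps
    return best, best_score
```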
Online - Changes to ensemble
● Normalization type.
● Cutting for each user before items.
● Excluding slower algorithms: pushing recommendations promptly
gives more exposure, hence better scores.
Architecture & Runtime
● The recommender runs on VMs with 8 cores and 16 GB RAM.
● The only exceptions are content-based microclustering and iALS,
which run on a VM with 8 cores and 64 GB RAM.
● Code is heavily optimized for low memory use
(sparse matrix representations, efficient matrix operations).
● This keeps the runtime short.
Scores - Local vs Offline
Algorithm       Local score   Leaderboard score   Execution time
CBF+                  57852               60257           13 min
CBF-                  -1330               -8529            4 min
PM                    17260               16777            7 min
CF                    42213               39250           12 min
iALS                  48081               52411          150 min
XING Baseline         14742               14395           40 min
Ensemble              60625               71372            2 min
Results and Conclusions
● 2nd place in the online phase;
● 1st place in the offline phase.
● Points of strength:
○ speed (in particular offline ~20 min);
○ ease of implementation.
● Extensions:
○ feature weighting (user personalized, feature interaction);
○ time decay models.
