Content-Based approaches for Cold-Start Job Recommendations

ACM RecSys Challenge 2017
Lunatic Goats @PoliMi
M. Bianchi, F. Cesaro, F. Ciceri, M. Dagrada, A. Gasparin, D. Grattarola,
I. Inajjar, A. M. Metelli, L. Cella
Task Outline
● Cold-start recommendation scenario:
○ job posting recommendations;
○ focus on getting positive interactions;
○ penalized for negative interactions;
○ rewarded for recruiter interest.
● Two phases:
○ Offline - predictions for fixed sets of items and users.
○ Online - daily recommendation to variable sets of users.
Data Analysis - Impressions vs Interactions
● Impressions: ~97% of the data; they carry little to no information and were discarded.
● Interactions: ~3% of the data.
● Interactions are divided into:
○ positive interactions (types 1, 2 and 3);
○ negative interactions (type 4);
○ recruiter interest (type 5).
● Interactions are treated as implicit feedback.
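
A minimal sketch of the grouping above. It assumes interactions arrive as (user, item, type) tuples and that any type code outside 1 to 5 (for example an impression) is dropped; the tuple layout and the name group_interactions are illustrative assumptions, not the challenge's actual schema.

```python
from collections import defaultdict

# Interaction type codes as grouped on the slide.
POSITIVE_TYPES = {1, 2, 3}
NEGATIVE_TYPE = 4
RECRUITER_TYPE = 5


def group_interactions(rows):
    """Group (user, item, type) rows into implicit-feedback sets.

    Implicit approach: only whether a pair was observed matters,
    so repeated rows for the same pair collapse into one entry.
    Rows with any other type code (e.g. impressions) are discarded.
    """
    groups = defaultdict(set)
    for user, item, kind in rows:
        if kind in POSITIVE_TYPES:
            groups["positive"].add((user, item))
        elif kind == NEGATIVE_TYPE:
            groups["negative"].add((user, item))
        elif kind == RECRUITER_TYPE:
            groups["recruiter"].add((user, item))
    return dict(groups)


# Hypothetical rows: the last one (type 0) stands for an impression.
rows = [("u1", "i1", 1), ("u1", "i1", 3), ("u2", "i1", 4),
        ("u3", "i2", 5), ("u1", "i2", 0)]
groups = group_interactions(rows)
```

Here the two positive rows for ("u1", "i1") collapse into a single entry, which is what treating the data implicitly amounts to.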
Local Validation
● Split the dataset into training and validation sets.
● Random sampling procedure:
○ randomly select target items from dataset;
○ remove all interactions with these items;
○ pick target users as a subset of those who have
interactions with these items.
● Preserve the user-item ratio.
● No cross-validation: the dataset is too large.
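
The sampling procedure above can be sketched as follows. This is an illustration under assumptions, not the team's code: interactions are plain (user, item) pairs, and cold_start_split, n_target_items and user_fraction are hypothetical names. Choosing user_fraction so that the target users and target items keep the proportion of the full dataset is one way to preserve the user-item ratio.

```python
import random


def cold_start_split(interactions, n_target_items, user_fraction, seed=0):
    """Hold out every interaction of randomly chosen items.

    interactions: list of (user, item) pairs.
    Returns (train, validation, target_items, target_users).
    """
    rng = random.Random(seed)
    items = sorted({item for _, item in interactions})
    target_items = set(rng.sample(items, n_target_items))

    # Held-out items keep no interaction in train, so they are cold-start.
    train = [(u, i) for u, i in interactions if i not in target_items]
    validation = [(u, i) for u, i in interactions if i in target_items]

    # Target users: a random subset of users who touched a held-out item.
    candidates = sorted({u for u, _ in validation})
    n_users = max(1, round(user_fraction * len(candidates)))
    target_users = set(rng.sample(candidates, n_users))
    return train, validation, target_items, target_users


interactions = [("u1", "a"), ("u2", "a"), ("u2", "b"),
                ("u3", "c"), ("u4", "c"), ("u1", "d")]
train, val, t_items, t_users = cold_start_split(interactions, 2, 0.5)
```

A single split like this is evaluated once, in line with the choice to skip cross-validation.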
Solution - Preprocessing
● One-hot encoding of both user and item features.
● Feature aggregation:
● TF-IDF application.
● Negative User Filtering: removing heavy deleters.
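
A rough sketch of the encoding and weighting steps, assuming each record is a dict of categorical fields. The slide does not state which TF-IDF variant was used, so the formula here (binary term frequency, idf = log(N / df)) is an assumption, as are the field names and the helper names one_hot and tfidf.

```python
import math
from collections import Counter


def one_hot(record):
    """Encode a dict of categorical fields as a set of 'field=value' tokens.

    List-valued fields (e.g. tags) contribute one token per value.
    """
    tokens = set()
    for field, value in record.items():
        values = value if isinstance(value, (list, tuple, set)) else [value]
        tokens.update(f"{field}={v}" for v in values)
    return tokens


def tfidf(encoded):
    """Weight binary tokens by inverse document frequency.

    With one-hot tokens the term frequency is 1, so each weight is
    idf = log(N / df): tokens shared by every record get weight 0.
    """
    n = len(encoded)
    df = Counter(tok for tokens in encoded for tok in tokens)
    return [{tok: math.log(n / df[tok]) for tok in tokens} for tokens in encoded]


# Hypothetical item records.
items = [
    {"industry": 3, "tags": [10, 11]},
    {"industry": 3, "tags": [11, 12]},
    {"industry": 7, "tags": [11]},
]
weighted = tfidf([one_hot(item) for item in items])
```

The token "tags=11" appears in every item and therefore ends up with weight 0, while rarer tokens such as "industry=7" weigh the most.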
Solution Overview
Solution - Negative Recommendation
● The scoring function heavily penalizes negative (type 4) interactions.
● Using the CBF approach, predict type 4 interactions.
● Ensemble these predictions with a negative weight.
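
As an illustration of the last point, the sketch below subtracts weighted deletion predictions from the positive scores of one user, which is the same as adding them with a negative weight. The function name combine, the item names and the weight 0.5 are made up for the example.

```python
def combine(positive_scores, negative_scores, negative_weight):
    """Subtract weighted predicted-deletion scores from positive scores.

    Both inputs map item -> score for a single user. Items missing
    from one of the maps count as 0 there.
    """
    items = set(positive_scores) | set(negative_scores)
    return {
        item: positive_scores.get(item, 0.0)
        - negative_weight * negative_scores.get(item, 0.0)
        for item in items
    }


cbf_plus = {"job_a": 0.9, "job_b": 0.8}
cbf_minus = {"job_a": 0.7}  # job_a resembles items the user deleted
final = combine(cbf_plus, cbf_minus, negative_weight=0.5)
ranking = sorted(final, key=final.get, reverse=True)
```

In this toy case job_a starts ahead of job_b but drops below it once its predicted deletion risk is taken into account.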
Solution – Content Based Filtering algorithms (CBF)
Recommend to a user items similar to the ones he/she likes.
● Run separately on positive (CBF+) and negative (CBF-)
interactions.
● Tanimoto similarity between items:
● Recommendation performed for filtered users only:
● Penalize heavy clickers.
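
A small sketch of item-to-item Tanimoto scoring, assuming item features are stored as sparse dicts of weights. The formula below is the standard Tanimoto coefficient; the slide's own formula did not survive extraction, so any shrinkage or the heavy-clicker penalty the authors applied is not reproduced. The helper cbf_scores and the feature names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two sparse vectors stored as dicts.

    T(a, b) = a.b / (|a|^2 + |b|^2 - a.b); for 0/1 vectors this is
    the Jaccard index: shared features over all distinct features.
    """
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0


def cbf_scores(liked_items, candidates, features):
    """Score each candidate by its summed similarity to the user's items."""
    return {
        c: sum(tanimoto(features[c], features[i]) for i in liked_items)
        for c in candidates
    }


features = {
    "old_1": {"tag=java": 1.0, "tag=sql": 1.0},
    "old_2": {"tag=design": 1.0},
    "new_1": {"tag=java": 1.0, "tag=sql": 1.0, "tag=aws": 1.0},
    "new_2": {"tag=sales": 1.0},
}
scores = cbf_scores(["old_1", "old_2"], ["new_1", "new_2"], features)
```

Running the same scoring on a user's positive items gives CBF+, and on the items the user deleted gives CBF-.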
Solution – Profile Matching (PM)
Recommend to a user items matching his/her profile.
● Cosine similarity between user and item:
● Items’ tags and titles compared with users’ jobroles.
● Recommendation performed for filtered users only.
● Unlike CBF, PM can also produce recommendations for cold-start users.
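
A minimal sketch of profile matching, assuming jobroles, titles and tags are lists of concept identifiers drawn from a shared vocabulary and that all features are binary. The function names and the identifiers in the example are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_match(user_jobroles, item_title, item_tags):
    """Compare a user's jobroles with an item's title and tag concepts."""
    user_vec = {c: 1.0 for c in user_jobroles}
    item_vec = {c: 1.0 for c in set(item_title) | set(item_tags)}
    return cosine(user_vec, item_vec)


# Hypothetical concept ids: the user shares 101 and 102 with the item.
score = profile_match([101, 102], item_title=[101, 300], item_tags=[102, 400])
```

Because the score needs only the user's profile and the item's content, it is available even when the user has no past interactions.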
Solution – Collaborative Filtering algorithms
● CF cannot be run directly in a cold-start scenario.
● Content-based microclustering approach:
○ for each cold-start item associate the interactions of the
top 5 CBF-similar non-cold-start items;
○ run standard CF algorithms.
● CF algorithms:
○ CF with item cosine similarity;
○ iALS (Implicit Alternating Least Squares).
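
The microclustering step can be sketched as below, assuming interactions of non-cold-start ("warm") items are kept as a dict from item to the set of users who interacted with it. The function name microcluster, the toy Jaccard similarity and the data are assumptions for illustration; the slide uses the top 5 neighbours, while the example uses k=2 to stay small.

```python
def microcluster(cold_items, warm_interactions, similarity, k=5):
    """Give each cold-start item the users of its k most similar warm items.

    warm_interactions: dict warm item -> set of users.
    similarity: function (item, item) -> float, e.g. a CBF similarity.
    Returns a dict cold item -> set of borrowed users.
    """
    borrowed = {}
    for cold in cold_items:
        neighbours = sorted(
            warm_interactions,
            key=lambda warm: similarity(cold, warm),
            reverse=True,
        )[:k]
        borrowed[cold] = set().union(*(warm_interactions[w] for w in neighbours))
    return borrowed


# Toy content similarity: Jaccard index over tag sets.
tags = {"new": {"java", "sql"}, "w1": {"java", "sql"},
        "w2": {"java"}, "w3": {"sales"}}


def jaccard(x, y):
    return len(tags[x] & tags[y]) / len(tags[x] | tags[y])


warm = {"w1": {"u1"}, "w2": {"u2"}, "w3": {"u3"}}
augmented = microcluster(["new"], warm, jaccard, k=2)
```

Once the borrowed interactions are merged with the real ones, the cold-start items have non-empty columns and a standard CF algorithm can be run unchanged.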
Solution - Ensemble Structure
● Divide algorithms by nature.
● Normalize and weight each layer.
● Generate upper layers by adding lower layers.
● Output 100 best scores.
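
A sketch of the layered combination for a single user, under assumptions: scores are dicts from item to score, normalization divides by the largest absolute score (the slides do not say which normalization was used), and the weights and block contents are invented.

```python
import heapq


def normalize(scores):
    """Scale scores so the largest absolute value is 1 (one possible choice)."""
    peak = max((abs(s) for s in scores.values()), default=0.0)
    return {k: s / peak for k, s in scores.items()} if peak else dict(scores)


def merge_layer(blocks, weights):
    """Build an upper layer as the weighted sum of normalized lower blocks."""
    merged = {}
    for block, weight in zip(blocks, weights):
        for item, score in normalize(block).items():
            merged[item] = merged.get(item, 0.0) + weight * score
    return merged


def top_n(scores, n=100):
    """Return the n best-scoring items, best first."""
    return heapq.nlargest(n, scores, key=scores.get)


# Lower layer: two content-based blocks merged into one.
content = merge_layer([{"a": 4.0, "b": 2.0}, {"b": 0.5, "c": 0.25}], [0.5, 0.5])
# Upper layer: the content layer merged with a collaborative block.
collaborative = {"a": 10.0, "c": 30.0}
final = merge_layer([content, collaborative], [0.7, 0.3])
best = top_n(final, n=2)
```

Normalizing before weighting keeps blocks with very different score ranges comparable, so the weights express relative trust rather than compensating for scale.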
Solution - Parameter Tuning
● Ensemble tuning:
○ 9 weights (one for each block), reduced to 6 due to
normalization;
○ non-differentiable scoring function;
○ gradient-free optimization methods:
■ Genetic Algorithms - quick and acceptable results;
■ Powell’s Conjugate Direction method - slower but
superior results.
● Individual algorithms tuning:
○ greedy search on local test.
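
The greedy search used for individual algorithms can be illustrated with the sketch below. It is not a Genetic Algorithm or Powell's method: it tunes one parameter at a time on a grid and keeps any value that improves a black-box score, which is enough to show why no gradient is required. The names greedy_search and toy_score, the parameters k and shrink, and the grids are all hypothetical.

```python
def greedy_search(score, start, grids, sweeps=2):
    """Tune parameters one at a time, keeping each best value found.

    score: black-box function params(dict) -> float, higher is better.
    grids: dict name -> list of candidate values.
    """
    best = dict(start)
    best_score = score(best)
    for _ in range(sweeps):
        for name, candidates in grids.items():
            for value in candidates:
                trial = {**best, name: value}
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score = trial, trial_score
    return best, best_score


# Toy stand-in for the local validation score: a step function,
# so no useful gradient exists.
def toy_score(p):
    return -abs(round(p["k"] / 10) - 5) - abs(round(p["shrink"]) - 3)


best, value = greedy_search(
    toy_score,
    start={"k": 10, "shrink": 0},
    grids={"k": [10, 30, 50, 70], "shrink": [0, 1, 3, 5]},
)
```

In practice score would run the recommender on the local validation split, so each evaluation is expensive and the grids need to stay small.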
Online - Changes to ensemble
● Normalization type.
● Cutting for each user before items.
● Excluding slower algorithms: a prompt push gives more exposure, hence better scores.
Architecture & Runtime
● The recommender runs on VMs with 8 cores and 16GB RAM.
● The only exceptions are content-based microclustering and iALS, which run on 8 cores and 64GB RAM.
● Code is heavily optimized to use memory efficiently (sparse matrix representations, efficient matrix operations).
● This results in short runtimes.
Scores - Local vs Offline
Algorithm       Local score   Leaderboard score   Execution time
CBF+                  57852               60257           13 min
CBF-                  -1330               -8529            4 min
PM                    17260               16777            7 min
CF                    42213               39250           12 min
iALS                  48081               52411          150 min
XING Baseline         14742               14395           40 min
Ensemble              60625               71372            2 min
Results and Conclusions
● 2nd place in the online phase;
● 1st place in the offline phase.
● Points of strength:
○ speed (in particular offline ~20 min);
○ ease of implementation.
● Extensions:
○ feature weighting (user personalized, feature interaction);
○ time decay models.