Recommending Job Ads to People 
Fabian Abel, http://xing.com 
December 2014 @wisdelft & @RecSysNL
Job recommendations
Recommender REST Service 
Returns a ranked list of items 
Request: 
GET /rest/recommendations/jobs/user/4320245 
(4320245 = User ID) 
Response: 
[ 
{"item": "urn:x-xing:job:5463323", "score": 0.87, "reasons": […]}, 
{"item": "urn:x-xing:job:5463267", "score": 0.87, "reasons": […]}, 
{"item": "urn:x-xing:job:5464812", "score": 0.87, "reasons": […]}, 
{"item": "urn:x-xing:job:5462781", "score": 0.87, "reasons": […]} 
]
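A client might consume such a response roughly as sketched below. This is only an illustration: the payload is a hand-written sample shaped like the response above, the "reasons" lists are left empty because the slide elides them, and the helper names top_items and job_id are hypothetical.

```python
import json

# Sample payload shaped like the response above. "reasons" is elided on the
# slide, so empty lists stand in for it here (an assumption).
SAMPLE_RESPONSE = """
[
  {"item": "urn:x-xing:job:5463323", "score": 0.87, "reasons": []},
  {"item": "urn:x-xing:job:5463267", "score": 0.87, "reasons": []},
  {"item": "urn:x-xing:job:5464812", "score": 0.87, "reasons": []},
  {"item": "urn:x-xing:job:5462781", "score": 0.87, "reasons": []}
]
"""


def top_items(raw_json: str, k: int) -> list[str]:
    """Return the URNs of the k highest-scored recommendations."""
    recos = json.loads(raw_json)
    # sorted() is stable, so items with equal scores keep the service's order.
    ranked = sorted(recos, key=lambda r: r["score"], reverse=True)
    return [r["item"] for r in ranked[:k]]


def job_id(urn: str) -> int:
    """Extract the numeric job ID from a URN such as urn:x-xing:job:5463323."""
    return int(urn.rsplit(":", 1)[1])
```

Re-sorting on the client is defensive only; the service already returns a ranked list.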
Deployment infrastructure 
Infrastructure on which we deploy recommender services 
[Diagram with the components: XING sources / XING services (MySQL, NoSQL); live updates; batch processing with batch updates; infrastructure for recommenders (search indices, recommender REST service); XING products.]
Pointers 
Technologies 
• Scala, Play, Akka: https://playframework.com/ - https://typesafe.com/ (→ start e.g. with some Activator template: https://typesafe.com/activator) 
• Elasticsearch: http://www.elasticsearch.org/ (→ start e.g. with “Getting started” on https://github.com/elasticsearch/elasticsearch) 
• Hadoop & Co., e.g. Hortonworks distribution (→ e.g. https://github.com/hortonworks/hadoop-tutorials): 
  • Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF 
  • Spark: https://spark.apache.org/ 
• Deployment, e.g.: https://www.docker.com/
Building/enhancing a recommender service 
Typical steps within the development cycle 
1. Analyze (click) data to identify potential features. 
Technologies: Hive, R, Excel 
2. Implement batch jobs for creating “indexes” that build the basis for the chosen features. 
Technologies: Hive, MapReduce, Oozie, Elasticsearch, Cassandra, MySQL 
3. Implement service logic that queries the “indexes” and implements the actual recommender interface (→ def getRecos(userId: Long): Future[Seq[Reco[T]]]). 
Technologies: Scala, Play framework 
4. Analysis and offline evaluation of the recommender: quantitative statistics, manual assessment, leave-n-out evaluation. 
Technologies: R, Scala 
5. Deployment and A/B test phase with in-depth analyses. 
Technologies: R, Hive, Scala, Excel
Job Recommendation Framework
Challenge 
Identifying job postings that match the demands of the user and company 
[Diagram: companies provide job postings; the job recommender presents postings to the user, ranked by score (e.g. 0.92, 0.8, 0.76, …).]
Key properties of a job posting 
Job postings on XING 
• Title 
• Company 
• Employment type and career level 
• Full-text description 
• Employees of the company 
• Social benefits 
• Location
Key sources for understanding user demands 
Exploiting patterns that are found in the data (graph) 
• Social Network: explicit and implicit connections 
• Careerpath: next career options (roles shown in the diagram: Data Scientist, Senior Data Scientist, Data Strategist, Manager, Engineer, Researcher) 
• Profile: haves, interests, skills & co. (example profile: Fabian Abel, Data Scientist) 
• Interactions: clicks, shares, ratings 
Tags shown in the diagram: web science, big data, hadoop, data, web, social media, kununu
Relevance Estimation 
Final relevance score of an item is obtained by combining the scores coming from the “sub-recommenders” (= features) 
Feature groups that feed the relevance estimation (regression model): 
• Content-based features 
• Knowledge graph features 
• Collaborative features 
• Usage behavior features 
Logistic Regression 
$$P(\text{relevant} \mid x) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)}}$$ 
x = feature vector, b_i = impact of feature x_i
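As a minimal sketch, the combination step can be written as a single function that takes the sub-recommender scores x and the learnt parameters b0 and b. The weights and feature values in the example are hypothetical and only serve to exercise the formula.

```python
import math


def relevance(x: list[float], b0: float, b: list[float]) -> float:
    """P(relevant | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    if len(x) != len(b):
        raise ValueError("feature vector and weights must have equal length")
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and sub-recommender scores, for illustration only.
example_b0 = -1.0
example_b = [2.0, 0.5, 1.5]
example_x = [0.9, 0.0, 0.4]  # e.g. profile match, graph match, cf score
score = relevance(example_x, example_b0, example_b)
```

A larger weight b_i means that feature x_i moves the final score more strongly; a feature with weight 0 has no influence at all.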
Learning the model for relevance estimation 
$$P(\text{relevant} \mid x) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)}}$$ 
Each sample: user i was confronted with item x at time t. 

Training data: 
user (time) | x1: profile match | x2: interest graph match | x3: cf score | x4: location match | x5: career level match | ... | relevant? 
u1 (t1)     | 0.87              | 0                        | 0.81         | 1.0                | 0.75                   | ... | 0 
u2 (t2)     | 0.0               | 0.76                     | 0.61         | 0.15               | 1.0                    | ... | 1 
u3 (t3)     | 0.13              | 0.42                     | 0            | 0                  | 1.0                    | ... | 0 
u4 (t4)     | 0.5               | 0                        | 0            | 0.0                | 0.0                    | ... | 1 
...         | ...               | ...                      | ...          | ...                | ...                    | ... | ... 

Model: b1 = 0.12, b2 = 0.05, b3 = 0.5, b4 = 0.8, b5 = 0.65, ..., b0 = 1.43 

Test & validation data: 
user (time) | x1: profile match | x2: interest graph match | x3: cf score | x4: location match | x5: career level match | ... | relevant? 
u2 (t5)     | 0.0               | 0.76                     | 0.61         | 0.15               | 1.0                    | ... | 0 
u3 (t6)     | 0.15              | 0.67                     | 0.9          | 1.0                | 1.0                    | ... | 1 
...         | ...               | ...                      | ...          | ...                | ...                    | ... | ... 

Are the predictions of the model correct on the test & validation data? 
→ Model with highest prediction accuracy wins the game
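The learning step can be sketched as follows: fit the parameters on training samples, then measure prediction accuracy on held-out samples. This is an illustration under assumptions, not the method used at XING: the samples are hypothetical (the rows on the slide contain elided columns), and plain batch gradient descent on the log-loss stands in for whatever fitting procedure was actually used.

```python
import math

# Hypothetical samples: (feature vector, relevant?). Not taken from the slide.
TRAIN = [
    ([0.9, 0.1], 1), ([0.8, 0.3], 1), ([0.7, 0.9], 1),
    ([0.2, 0.7], 0), ([0.1, 0.4], 0), ([0.3, 0.2], 0),
]
TEST = [([0.85, 0.5], 1), ([0.15, 0.6], 0)]


def predict(x, b0, b):
    """Logistic model: P(relevant | x)."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit(samples, epochs=2000, lr=0.5):
    """Fit b0 and b by batch gradient descent on the log-loss."""
    n_features = len(samples[0][0])
    b0, b = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * n_features
        for x, y in samples:
            err = predict(x, b0, b) - y  # derivative of the log-loss w.r.t. z
            grad0 += err
            for i, xi in enumerate(x):
                grad[i] += err * xi
        b0 -= lr * grad0 / len(samples)
        b = [bi - lr * gi / len(samples) for bi, gi in zip(b, grad)]
    return b0, b


def accuracy(samples, b0, b):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum((predict(x, b0, b) >= 0.5) == bool(y) for x, y in samples)
    return hits / len(samples)


b0, b = fit(TRAIN)
test_accuracy = accuracy(TEST, b0, b)
```

Competing models (for example with different feature sets) would each be fitted on the training data and compared by their accuracy on the same held-out data.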
Relevance Estimation 
The learnt model (= the parameters of the function) is used to compute the combined relevance scores 
[Diagram: the four feature groups (content-based, knowledge graph, collaborative, usage behavior) feed the relevance estimation (regression model), which scores the job postings for the user (e.g. 0.92, 0.8, 0.76, …).]
Relevance Estimation + Additional Filters 
Filtering (rules) may dampen the relevance scores or filter out items 
Job Reco. Service: 
1. Features: content-based, knowledge graph, collaborative, usage behavior 
2. Relevance estimation (regression model) 
3. Filters: 
• Location-based filtering 
• Social filtering 
• Filter out declined/known 
• Career level filtering
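The two kinds of rules, removing items and dampening scores, can be sketched as below. Item names, the dampening factor and the choice of which rule removes and which one dampens are assumptions for illustration; the slide only states that filters may do either.

```python
from dataclasses import dataclass


@dataclass
class Reco:
    item: str
    score: float


def drop_known(recos, known_items):
    """Hard filter: remove items the user already knows or has declined."""
    return [r for r in recos if r.item not in known_items]


def dampen(recos, penalised_items, factor):
    """Soft filter: scale down the score of items that violate a rule."""
    return [
        Reco(r.item, r.score * factor) if r.item in penalised_items else r
        for r in recos
    ]


def apply_filters(recos, known_items, far_away_items, factor=0.5):
    """Apply both filters, then re-rank by the adjusted scores."""
    kept = drop_known(recos, known_items)
    adjusted = dampen(kept, far_away_items, factor)
    return sorted(adjusted, key=lambda r: r.score, reverse=True)


# Hypothetical input: three scored postings, one already known, one far away.
scored = [Reco("job:1", 0.92), Reco("job:2", 0.8), Reco("job:3", 0.76)]
filtered = apply_filters(scored, known_items={"job:2"}, far_away_items={"job:1"})
```

Dampening keeps an item available as a fallback when few candidates remain, whereas a hard filter removes it regardless of how good its score is.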
Mining for features 
Example: career path mining
Properties of career path steps 
Analyzing the CV (= professional experience + education) 
• Professional experience: job title, time range, company, career level, industry 
• Educational background: time range, university, study program, degree
Career path transitions 
Understanding transitions in the career path graph 
Example CVs (entries in the order shown on the slide): 
• CV: Data Scientist, Machine Learning Expert, J2EE Developer, Web Developer, MSc Computer Science 
• CV: Data Scientist, Machine Learning Expert, PhD Data Mining 
• CV: J2EE Developer, MSc Computer Science 
Nodes of the resulting career path graph: Web Developer, MSc Computer Science, Data Scientist, J2EE Developer, Machine Learning Expert, PhD Data Mining
Career path graph 
Weighted directed graph with different types of nodes (job roles, education) 
Association rule mining for constructing the career path graph: 
• Association rules (= edges): 
  Job role A → Job role B 
  Education X → Job role Y 
  ... 
• Minimum support (e.g. at least k transitions with A and B have to occur in the data) 
• Minimum confidence (= probability(B | A) = weights of edges) 
Similarly, graphs are constructed for: 
  Industry X → Industry Y 
  Job role A → Company C 
  ... 
Thresholds for min-support and min-confidence need to be learned (ideally individually per “industry segment”)
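A minimal sketch of how such edges could be mined from CVs. Several points are assumptions: the CVs are hypothetical and written in chronological order, only directly consecutive steps count as a transition, and confidence is estimated as the share of transitions leaving A that lead to B.

```python
from collections import Counter

# Hypothetical CVs, each in chronological order (oldest step first).
CVS = [
    ["MSc Computer Science", "Web Developer", "J2EE Developer",
     "Machine Learning Expert", "Data Scientist"],
    ["PhD Data Mining", "Machine Learning Expert", "Data Scientist"],
    ["MSc Computer Science", "J2EE Developer"],
]


def mine_edges(cvs, min_support=1, min_confidence=0.0):
    """Return {(a, b): confidence} for rules a -> b between consecutive steps."""
    transitions = Counter()  # how often a is directly followed by b
    outgoing = Counter()     # how often a is followed by any step
    for cv in cvs:
        for a, b in zip(cv, cv[1:]):
            transitions[(a, b)] += 1
            outgoing[a] += 1
    edges = {}
    for (a, b), support in transitions.items():
        confidence = support / outgoing[a]  # estimate of P(b | a)
        if support >= min_support and confidence >= min_confidence:
            edges[(a, b)] = confidence
    return edges


all_edges = mine_edges(CVS)
frequent_edges = mine_edges(CVS, min_support=2)
```

Raising min_support removes edges that rest on very few CVs, while min_confidence removes transitions that are rare relative to the other options from the same node.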
Career path hypergraph 
Mining association rules with multiple premises 
Association rule mining for constructing the career path graph: 
• Association rules with more than one premise: 
  Job role A, Job role B → Job role C 
  Education X, Job role A → Job role B 
• Minimum support (e.g. at least k transitions with A and B and C have to occur in the data) 
• Minimum confidence (= probability(C | A,B)) 
Example nodes: Data Scientist, Machine Learning Expert, PhD Data Mining, J2EE Developer
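Rules with two premises can be sketched in the same spirit. The counting scheme is an assumption made for this illustration: a rule "A, B → C" is counted once per CV in which both A and B occur before C, and its confidence is that count divided by the number of CVs that contain both A and B. The CVs are hypothetical and in chronological order.

```python
from collections import Counter
from itertools import combinations

# Hypothetical CVs, each in chronological order (oldest step first).
CVS = [
    ["MSc Computer Science", "Web Developer", "J2EE Developer",
     "Machine Learning Expert", "Data Scientist"],
    ["PhD Data Mining", "Machine Learning Expert", "Data Scientist"],
    ["MSc Computer Science", "J2EE Developer"],
]


def mine_hyperedges(cvs, min_support=1):
    """Return {((a, b), c): confidence} for rules 'a and b -> c'."""
    premise_count = Counter()  # CVs that contain both a and b
    rule_count = Counter()     # CVs where a and b both occur before c
    for cv in cvs:
        for pair in combinations(sorted(set(cv)), 2):
            premise_count[pair] += 1
        seen_rules = set()  # count each rule at most once per CV
        for j, c in enumerate(cv):
            for pair in combinations(sorted(set(cv[:j])), 2):
                if c not in pair:
                    seen_rules.add((pair, c))
        rule_count.update(seen_rules)
    return {
        rule: n / premise_count[rule[0]]
        for rule, n in rule_count.items()
        if n >= min_support
    }


rules = mine_hyperedges(CVS)
```

Premise pairs are stored in sorted order, so ("Machine Learning Expert", "PhD Data Mining") is the lookup key regardless of which of the two steps came first in a CV.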
Inferring Features from Career path graph(s) 
Probabilities that the job role is appropriate for the user 
Example: job posting “Data Scientist”; user: Machine Learning Expert, PhD Data Mining 
Features (from the career path graph): 
F1: P(Data Scientist | Machine Learning Expert) = 0.52 
F2: P(Data Scientist | Machine Learning Expert, PhD Data Mining) = 0.79 
F3: P(Data Scientist | Machine Learning Expert, 5 years experience) = 0.6 
…
futureme.xing.com 
Spin-off project which allows for browsing the career-path graph
Evaluation of recommender systems
Evaluation of Recommender Systems 
Metrics & Methods 
Key Metrics: Precision@k, “Success-Rates@k” (e.g. CTR) 
Evaluation methods: 
• Quantitative statistics: count total number of recommendations per user, measure overlap between old and new recommender, … 
• Ad-hoc assessments: UI for assessing the quality of individual recommendations and comparing two lists 
• Leave-n-out cross validation: hide parts of the historic data, run the recommender and try to predict the “hidden data” 
• A/B testing: x% of users are served with strategy A, (100-x)% with strategy B 
[Diagram, leave-n-out cross validation: the tracked history of interactions (user X performs action Y) is split into training data and validation data; the predictions of strategy X and strategy Y are measured and compared.] 
[Diagram, A/B testing: users are split into group A and group B, served by strategy X and strategy Y; the KPIs of X and Y are compared.]
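The metrics and the leave-n-out procedure can be sketched as below. The exact definitions are assumptions: precision@k divides the number of hits by k, success@k is 1 if at least one of the top-k items is a hit, and the leave-n-out variant hides the n most recent interactions of one user. The recommend argument is a hypothetical callable that maps the visible history to a ranked list.

```python
def precision_at_k(recommended, relevant, k):
    """Number of relevant items among the top k, divided by k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(item in relevant for item in recommended[:k]) / k


def success_at_k(recommended, relevant, k):
    """1.0 if at least one of the top-k items is relevant, else 0.0."""
    return float(any(item in relevant for item in recommended[:k]))


def leave_n_out(history, n, recommend, k):
    """Hide the last n interactions, recommend from the rest, score the hits."""
    if not 0 < n < len(history):
        raise ValueError("n must be between 1 and len(history) - 1")
    visible, hidden = history[:-n], set(history[-n:])
    return precision_at_k(recommend(visible), hidden, k)


# Hypothetical usage: a toy recommender that ignores its input.
history = ["job:1", "job:2", "job:3", "job:4"]
toy_recommender = lambda visible: ["job:4", "job:9", "job:3"]
hidden_precision = leave_n_out(history, n=2, recommend=toy_recommender, k=3)
```

Averaging these per-user values over all users gives the rate that is then compared between two strategies.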
Is Strategy A better than strategy B? 
[Bar chart: Success@4 and Success@20 for strategies A and B (y-axis from 0 to 0.8), shown once based on 1 million samples and once based on 1000 samples.]
Significance tests 
Example: t-tests 
Null Hypothesis H0: performance of strategy A = performance of strategy B 
Alternative H1: performance of A > performance of B 
t-test computes p-value: probability that the statistical result is, under the given null hypothesis, at least as extreme as the one that was observed 
→ reject H0 if p-value < α (= significance level) 
Beware of p-hacking, e.g.: “I found a metric where we have a significant improvement (with p < 0.05)”
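The effect of the sample size on such a test can be illustrated with a small sketch. Note the swap: instead of a t-test, it uses a one-sided two-proportion z-test with a normal approximation, which needs only the Python standard library and behaves similarly for large samples. The success counts are hypothetical and not read from the chart.

```python
import math
from statistics import NormalDist


def one_sided_p_value(successes_a, n_a, successes_b, n_b):
    """p-value for H0: rate A = rate B against H1: rate A > rate B."""
    rate_a, rate_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Probability, under H0, of a z value at least as large as the observed one.
    return 1.0 - NormalDist().cdf(z)


ALPHA = 0.05  # significance level

# Same hypothetical success rates (52% vs. 50%), different sample sizes.
p_small = one_sided_p_value(260, 500, 250, 500)                  # 1000 samples
p_large = one_sided_p_value(260_000, 500_000, 250_000, 500_000)  # 1 million samples

reject_small = p_small < ALPHA
reject_large = p_large < ALPHA
```

With 1000 samples the same two-point difference does not allow rejecting H0, while with 1 million samples it does, which is why the sample size has to be stated alongside any comparison.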
Challenges regarding the evaluation 
Understanding the performance of a recommender system is not easy 
Challenges: 
• Tracking clicks/interactions: not all recommendation clicks/interactions can easily be tracked (e.g. email and third-party apps are not properly tracked) 
• Changes on the platform, e.g. A/B tests in the UI, marketing campaigns, etc. may impact the performance of the recommender 
• “Novelty” effect: if recommendations change strongly, users are curious and interact with the recommendations, but after a while the curiosity may drop, and click-through rates decrease again 
• “Position bias”: top-ranked recommendations are more likely to be clicked “by default” 
• “p-hacking”: http://en.wikipedia.org/wiki/P-hacking
The professional network 
www.xing.com 
Thank you 
@fabianabel 
xing.com 
futureme.xing.com
