This document summarizes Simon Hughes' presentation on personalized search and job recommendations. Hughes is the Chief Data Scientist at Dice.com, where he works on recommender engines, skills pages, and other projects. The presentation discusses relevancy feedback algorithms like Rocchio that can be used to improve search results based on user interactions. It also describes how content-based and collaborative filtering recommendations can be provided in real-time using Solr plugins. Finally, it shows how personalized search can be achieved by boosting results matching a user's profile or search history.
2. Who Am I?
• Chief Data Scientist at Dice.com and DHI under Yuri Bykov
• Dice.com – leading US job board for IT professionals
• Twitter handle: https://twitter.com/hughes_meister
• Key projects
- Dice recommender engines
- Dice market value tool (salary predictor)
- Dice career path tool
- Dice skills pages
• PhD
- PhD candidate at DePaul – ML and NLP
- Thesis topic – detecting causality in scientific explanatory essays
6. Outline
• Relevancy Feedback
• The Rocchio Algorithm
• Implementation Details
- Open source Solr plugins
- Naïve entity extraction
• Use Cases
- Conceptual / Semantic Search
- Real-time recommendations
- Personalized Search
- Query and Filter Suggestions
7. Motivation
2 Main Problems In Achieving Relevancy
• Polysemy – words/phrases can have multiple meanings
- Causes problems with Precision
- Need to disambiguate the query – determine query intent
• Synonymy – multiple words/phrases with the same meaning
- Causes problems with Recall
2 types of solution
• Global Methods – adjust query based on analysis of entire index
- Improve Precision – LTR, Probabilistic Query Parsing, Reinforcement learning, etc.
- Improve Recall – Synonyms, Conceptual Search, Thesaurus/Ontology Learning
• Local Methods – adjust a query relative to the documents that match
- Improve Precision – Relevancy feedback
- Improve Recall – Blind feedback
9. Relevancy Feedback
‘Supervised’ relevancy feedback uses information from the user’s
profile or search behavior to adjust their search results to improve
relevancy.
‘Unsupervised Relevancy Feedback’ or ‘Blind Feedback’ instead uses
co-occurrence information in the search index to improve relevancy.
Both of these mechanisms can be implemented as simple Solr plugins
using the Rocchio Algorithm
10. Relevancy Feedback Process
1. User issues a short query
2. System returns an initial set of results
3. The results are annotated as relevant / non-relevant
4. The system computes a better query using this feedback
5. Second (improved) result set is returned to the user
12. The Rocchio Algorithm
The Rocchio Algorithm is typically used for both forms of
relevancy feedback
• A set of relevant documents is chosen for a given query
• From these documents, a query vector is computed that represents the relevant documents
• This vector is used to formulate a new query, which is then executed to produce a more relevant result set
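The steps above are the standard Rocchio formula, q_m = α·q + β·centroid(relevant) − γ·centroid(non-relevant). A minimal sketch using sparse term→weight dicts (the α/β/γ defaults below are the commonly cited textbook values, not necessarily what the Dice plugin uses):

```python
def rocchio(query_vec, relevant, non_relevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update: alpha*q + beta*centroid(relevant)
    - gamma*centroid(non_relevant). Vectors are sparse term->weight
    dicts; negative weights are dropped from the final query."""
    new_q = {t: alpha * w for t, w in query_vec.items()}
    for docs, sign, coeff in ((relevant, 1.0, beta),
                              (non_relevant, -1.0, gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                new_q[term] = new_q.get(term, 0.0) + sign * coeff * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}

query = {"java": 1.0}
relevant = [{"java": 1.0, "spring": 0.8},
            {"java": 0.5, "hibernate": 0.6}]
modified = rocchio(query, relevant, [])
# terms from the relevant documents ("spring", "hibernate") now appear
# in the expanded query, weighted below the original query term
```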
14. Supervised Feedback – Implicit vs Explicit
Explicit Feedback
• User explicitly tells you what they want through some action
- E.g. buys a product, applies for a job, rates a movie.
Implicit Feedback
• User preferences can be inferred from user behavior
- E.g. User views a web page, clicks on a search result, hovers their
mouse over an item, etc
• Weak signal - additional data can be gathered to strengthen signal
- E.g. Time spent on page, depth of navigation before clicking, etc.
15. Implementation Details
Two Solr plugins, similar to the MoreLikeThis (MLT) handler
• Configure a number of fields in Solr to do naïve entity extraction (see later slides)
• For a given query, extract all entities from these fields for each document, scored by tf.idf
- TF – number of occurrences of each term in the relevant documents
• Pick the top k terms per field, weighted by tf.idf score
• Normalize each field to have unit length
• Weight each term by its normalized weighting, multiplied by the field boost
• In a number of internal tests, this improved the Mean Average
Precision over the standard MLT for job recommendations
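The per-field extraction and normalization steps above can be sketched as follows; the function signature, default constants, and simplified tf.idf formula are illustrative, not the plugin's actual code:

```python
import math
from collections import Counter

def top_terms_per_field(docs, field_boosts, k=10,
                        num_docs_in_index=100_000, doc_freq=None):
    """Pick the top-k terms per field by tf.idf, normalize each field's
    weight vector to unit length, then scale by that field's boost.

    docs        : relevant documents as {field: [tokens]} dicts
    field_boosts: {field: boost} – which fields to use and their boosts
    doc_freq    : {term: df} – docs in the whole index containing the term
    """
    doc_freq = doc_freq or {}
    weighted = {}
    for field, boost in field_boosts.items():
        # TF – number of occurrences of each term in the relevant docs
        tf = Counter(tok for d in docs for tok in d.get(field, []))
        tfidf = {t: c * math.log(num_docs_in_index / (1 + doc_freq.get(t, 0)))
                 for t, c in tf.items()}
        top = sorted(tfidf.items(), key=lambda kv: kv[1], reverse=True)[:k]
        # unit-length normalization so smaller fields get equal weighting
        norm = math.sqrt(sum(w * w for _, w in top)) or 1.0
        weighted[field] = {t: boost * w / norm for t, w in top}
    return weighted
```

Normalizing per field before applying the boost is what stops a long field (e.g. job description) from drowning out a short one (e.g. job title).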
16. Naïve Entity Extraction
Most documents are long, yet contain few useful content words
• Rocchio algorithm works much better if it uses fields containing only
the most important entities or keywords in your domain
- E.g. for Dice, we extract job titles and skills
• Naïve entity extraction – configure a set of keywords and phrases to extract
• Solutions
1. SolrTextTagger – a good 3rd party tool for entity extraction
2. Use a sequence of synonym filters, followed by a type filter or keep-word filter. Very fast – the synonym filter uses an FST
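Option 2 can be simulated in a few lines of Python; the skill list and synonym map below are made-up examples (in Solr these would live in synonym and keep-word filter files, not code):

```python
# Illustrative skill list and synonym map – the real plugin configures
# these as Solr synonym and keep-word filter files, not Python dicts.
SYNONYMS = {"js": "javascript", "ecmascript": "javascript",
            "apache hadoop": "hadoop"}
KEEP = {"java", "javascript", "hadoop", "spring", "big data"}

def extract_entities(text, keep=KEEP, synonyms=SYNONYMS):
    """Simulate a synonym filter (phrase -> canonical form) followed by
    a keep-word filter that discards everything not in the keep list.
    Tries two-word phrases before single tokens, longest match first."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        for n in (2, 1):
            phrase = " ".join(tokens[i:i + n])
            canon = synonyms.get(phrase, phrase)
            if canon in keep:
                out.append(canon)
                i += n
                break
        else:
            i += 1
    return out
```

The payoff is that a long job posting collapses to a handful of canonical domain terms, which is exactly what Rocchio needs to work well.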
20. Supervised Feedback – Use Cases
• Allow users to search from examples instead of just from a query
- E.g. image search. Show them images matching initial query,
allow them to select which are more relevant
- Users often find it easier to show you examples of what they want than to formulate a complex query
• Personal recommendations based on browse history
- E.g. Recommend jobs based on the last 5 jobs viewed
21. Blind Feedback
Uses the top-ranked results from the search to expand the query and improve recall
1. Execute the query and take the results from the top n documents
(10-50)
2. Extract the top k terms by tf.idf score (10-30)
3. Use these terms to do query expansion, and re-execute the query
• Also called “Pseudo-Relevancy Feedback”
• Has been shown to be highly effective in some situations (see notes)
• Has a performance penalty – query is executed twice
- Can be partly mitigated by intelligent caching
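The three steps can be sketched as a small loop. `search` below is a toy stand-in for the Solr request, and term selection uses raw counts rather than tf.idf for brevity:

```python
from collections import Counter

def blind_feedback(search, query, n=10, k=20):
    """Pseudo-relevance feedback sketch: execute the query, treat the
    top-n hits as relevant, extract the k most frequent terms from
    them, and re-execute an expanded query. Real term selection would
    use tf.idf, as on the earlier slides."""
    initial = search(query)[:n]                       # first execution
    counts = Counter(tok for doc in initial for tok in doc.split())
    expansion = [t for t, _ in counts.most_common(k) if t not in query.split()]
    return search(query + " " + " ".join(expansion))  # second execution

# Toy corpus and search function: rank documents by query-term overlap
corpus = ["hadoop big data engineer", "java spring developer hadoop",
          "big data hadoop spark", "accountant finance gaap"]

def search(q):
    terms = set(q.split())
    return sorted(corpus, key=lambda d: -len(terms & set(d.split())))
```

Note the performance penalty from the slide is visible here: `search` runs twice per user query.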
24. Recommendations
Three Main Types of Recommender
• Content Based
- Uses information from the user's profile to generate recommendations
- E.g. use a resume to find matching jobs
• Collaborative Filtering
- Find documents similar to those a user has liked previously
- E.g. Find jobs similar to jobs they have applied to
• Hybrid Recommender
- Combines both approaches
All of these can be achieved in real-time using our plugin
25. Content Based Recommendations
Plugin is sent a content stream via a POST call
• Entity extraction is performed by Solr in real-time
- Extracts job titles
- Extracts skills
• Query is formulated using the top k terms, as before
• A location-based boost is applied, using a boost query to boost documents close to the user's location
Dice has a batch recommender algorithm that powers most of our
recommendations.
This plugin powers our real-time recommendations (new documents)
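A content-based request can be sketched as a set of Solr parameters. The `rf.fl` parameter name and the `location_geo` field are illustrative assumptions, not the plugin's documented API; the `recip(geodist(...))` boost is standard Solr function-query syntax:

```python
# Hypothetical parameters for a real-time content-based recommendation
# request. The resume text is sent as a content stream in the POST body;
# "rf.fl" (fields to extract entities from) and "location_geo" are
# illustrative names, not the plugin's documented API.
resume_text = "Java developer, 5 years of Spring, Hibernate and SQL"
params = {
    "stream.body": resume_text,       # content stream via POST
    "rf.fl": "title,skills",          # fields used for entity extraction
    "rows": 10,
    # boost jobs close to the user's location (Chicago, as an example)
    "boost": "recip(geodist(location_geo,41.88,-87.63),1,1000,1000)",
}
```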
27. Collaborative Filtering Recommendations
The plugin is sent a query listing the IDs of the documents to match on
• Top k terms are extracted across all documents
• Recommendations are generated using the Rocchio algorithm
Use Cases
• Recommendations from browse history (implicit)
- Can work off cookies if the user is not logged in
• Recommendations from past purchases or applied jobs (explicit)
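A multi-document ("More Like These") request can be sketched the same way; the parameter names are again illustrative assumptions, not the plugin's documented API:

```python
# Hypothetical "More Like These" request: recommendations generated
# from several source documents at once, e.g. the last jobs viewed.
viewed_job_ids = ["123", "456", "789"]   # e.g. read from a cookie
id_clause = " OR ".join(viewed_job_ids)
params = {
    "q": f"id:({id_clause})",            # the source documents
    "rf.fl": "title,skills",             # fields to extract top terms from
    "fq": f"-id:({id_clause})",          # exclude jobs already viewed
    "rows": 10,
}
```

The negative filter query is the usual guard against recommending the very documents the user just looked at.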
29. From Relevancy Feedback to Personalized Search
• We can use the query generated by the Relevancy Feedback
handler to personalize the search results using a boost query
• Problem - user may be searching for documents that differ from
their apply history or their profile (e.g. looking for a career change)
• We want to personalize results only if the user’s query is related to
the personalization data we have for them
30. From Relevancy Feedback to Personalized Search
[Diagram: the user's query "Java Developer" expands to terms such as Java, JVM, Spring, Eclipse, IntelliJ, Hibernate, SQL and Oracle. A related query, "Hadoop Developer" (Big Data, Hadoop, HBase, MapReduce, HDFS), shares many of these terms; an unrelated query, "Accountant" (Auditor, Finance, Accounts Payable, GAAP, Taxes), shares none – only related queries should trigger personalization]
31. Boost Query + High MM Threshold
• Use the relevancy feedback query as a boost query
• Set a high mm threshold on the boost query – it will only boost documents that match most of the top k terms from the plugin
q=+(Java Developer)^10 OR ((title:"Hadoop Developer" skills:"Cassandra" skills:"Big Data" skills:"Hadoop")~3)

q="Java Developer"^10&bq={!edismax mm=-25% v='title:"Hadoop Developer" skills:"Cassandra" skills:"Big Data" skills:"Hadoop"'}
32. Demo – 2 Unrelated Queries = No Personalization
34. Personalized Search - Use Cases
• Content Based
- Use the user's profile to generate the boost query
• Collaborative Filtering (behavior based)
- Use previously viewed documents
• Hybrid
- Do both
• Based on previous search(es)
- Use the blind feedback handler to generate boost query
35. Other Use Cases
• Relevancy feedback
- Use query expansion terms to produce filter suggestions
• Blind feedback
- Faceting terms are often dominated by common terms from the least relevant documents (especially with an OR/should query)
- Use query expansion terms from most relevant matches to
produce better terms to facet on
Enhancements
• Relevancy feedback – use negative terms from negative examples
• Blind feedback – only extract terms close to the query terms in the document
- Has been shown to improve accuracy in some domains
- Called the “positional relevance model” – see this paper
36. GitHub Repo
https://github.com/DiceTechJobs/RelevancyFeedback
• Supports content streams and URLs
• More Like ‘These’
- Can generate recommendations from multiple documents
• Algorithm improvements from core MLT handler
- Top terms by field – prevents one field from dominating top terms
- Normalizes terms within a field – smaller fields (e.g. job title) have equal weighting
• Supports boost functions to boost recommendations
- E.g. boost recommended jobs by distance from the user
• Can add filter queries to both the resulting MLT query as well as the source query
• Supports the mm parameter for MLT query
- Ensures that all recommendations match at least x% of the top terms
• Supports boosting individual terms using payloads
37. Useful References
• “Modern Information Retrieval”, Chapter 10 – Baeza-Yates and Ribeiro-Neto
- From Berkeley
- Free online version
• "Introduction to Information Retrieval”, Chapter 9 – Manning, Raghavan and Schütze
- From Stanford NLP group
- Free online version
- Amazon (hardcover)
Other Related Ideas
Attribute pivots
• Uses decision trees and rule based learning to suggest query
refinements to users
• University of Texas has done some good work on attribute pivots
- Has been shown to improve accuracy in some domains