Search plays an important role in online social networks as it provides an essential mechanism for discovering members and content on the network. Related search recommendation is one of several mechanisms used for improving members’ search experience in finding relevant results to their queries. This paper describes the design, implementation, and deployment of Metaphor, the related search recommendation system on LinkedIn, a professional social networking site with over 175 million members worldwide. Metaphor builds on a number of signals and filters that capture several dimensions of relatedness across member search activity.
The system, which has been in live operation for over a year, has gone through multiple iterations and evaluation cycles. This paper makes three contributions. First, we provide a discussion of a large-scale related search recommendation system. Second, we describe a mechanism for effectively combining several signals in building a unified dataset for related search recommendations. Third, we introduce a query length model for capturing bias in recommendation click behavior. We also discuss some of the practical concerns in deploying related search recommendations.
1. Related Searches at LinkedIn
Mitul Tiwari
Joint work with Azarias Reda, Yubin Park, Christian Posse, and Sam Shah
LinkedIn
SIGIR Industrial Track 2013
4. LinkedIn by the numbers
• 175M+ members
• 2+ new user registrations per second
• 4.2 Billion people searches in 2011
• 9.3 Billion page views in Q2 2012
• 100+ million monthly active users in Q2 2012
5. Broad Range of Products
Profile Search Hiring Solutions
People You May Know
Skills
News
6. Related Searches at LinkedIn
• Millions of searches every day
• Goal: build a related searches system at LinkedIn
• Help users explore and refine their queries
11. Design: Collaborative Filtering
• Searches correlated by time
‣ Searches done in the same session by the same user
‣ Collaborative filtering: implicit feedback
‣ TF-IDF scoring to discount popular queries (e.g. 'Obama')
[Figure: queries Q1, Q2, Q3, Q4 placed along a time axis]
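The session-based signal above can be sketched in a few lines. This is a minimal illustration, not the production scoring: the slides do not give the exact formula, so the TF-IDF variant used here (co-occurrence count times the log inverse session frequency of the suggested query) and the names `related_by_session`, `sessions`, and `top_k` are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def related_by_session(sessions, top_k=10):
    """Score query pairs that co-occur in a session, damping popular queries.

    sessions: list of lists of query strings, one list per (user, session).
    Returns {query: [(related_query, score), ...]} sorted by descending score.
    """
    n_sessions = len(sessions)
    session_freq = Counter()        # number of sessions containing each query
    cooccur = defaultdict(Counter)  # cooccur[q][r] = sessions containing both

    for session in sessions:
        unique = {q.strip().lower() for q in session if q.strip()}
        session_freq.update(unique)
        for q, r in permutations(unique, 2):
            cooccur[q][r] += 1

    related = {}
    for q, neighbours in cooccur.items():
        scored = []
        for r, count in neighbours.items():
            # TF part: co-occurrence count. IDF part (assumed form): queries
            # that appear in many sessions get a weight close to zero.
            idf = math.log(n_sessions / session_freq[r])
            scored.append((r, count * idf))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        related[q] = scored[:top_k]
    return related


# Toy sessions (invented for illustration): 'obama' is popular everywhere,
# while 'html' co-occurs specifically with 'web developer'.
sessions = [
    ["web developer", "html"],
    ["web developer", "html", "obama"],
    ["mechanical engineer", "obama"],
    ["hadoop", "obama"],
    ["web developer", "obama"],
]
related = related_by_session(sessions)
print(related["web developer"])
```

In this toy data 'html' and 'obama' each co-occur with 'web developer' twice, but the IDF weight ranks 'html' first because 'obama' appears in most sessions.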
15. Design: Length Bias
• Insight: users tend to click suggestions that are one term longer than their query
• This corresponds to refining the initial query
• A statistical biasing model scores longer queries higher
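One way to picture a length-bias model is sketched below. The slides do not specify the statistical model, so this is an assumed stand-in: it estimates a smoothed click-through rate per length difference from logged impressions and multiplies each candidate's base score by it. The function names, the `smoothing` parameter, and the toy log are hypothetical.

```python
from collections import Counter


def length_delta(query, suggestion):
    """Difference in whitespace-separated term count (suggestion minus query)."""
    return len(suggestion.split()) - len(query.split())


def fit_length_bias(impressions, smoothing=1.0):
    """Estimate a click-through rate per length delta from logged impressions.

    impressions: iterable of (query, suggestion, clicked) tuples.
    Returns {delta: smoothed CTR}; additive smoothing keeps rare deltas stable.
    """
    shown, clicked = Counter(), Counter()
    for query, suggestion, was_clicked in impressions:
        d = length_delta(query, suggestion)
        shown[d] += 1
        clicked[d] += int(was_clicked)
    return {d: (clicked[d] + smoothing) / (shown[d] + 2 * smoothing) for d in shown}


def rescore(query, candidates, bias):
    """Multiply each base score by the CTR learned for its length delta.

    candidates: list of (suggestion, base_score). Unseen deltas fall back to
    the lowest learned CTR (an assumption made for this sketch).
    """
    default = min(bias.values()) if bias else 1.0
    rescored = [
        (s, score * bias.get(length_delta(query, s), default))
        for s, score in candidates
    ]
    return sorted(rescored, key=lambda pair: (-pair[1], pair[0]))


# Toy click log (invented): suggestions one term longer get clicked more often.
log = [
    ("java", "java developer", True),
    ("java", "java developer", True),
    ("java", "java developer", False),
    ("java", "python", False),
    ("java", "python", False),
    ("java", "python", True),
    ("java", "senior java software engineer", False),
]
bias = fit_length_bias(log)
ranked = rescore("hadoop", [("spark", 1.0), ("hadoop developer", 0.8)], bias)
print(ranked)
```

With this toy log the one-term-longer suggestion overtakes a candidate with a higher base score, which is the refinement behaviour the slide describes.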
16. Design: Ensemble Approach
• Need to generate unified recommendation dataset
• Analysis to figure out engagement of each signal
• Attempted ML approach
‣ Minimal overlap across different signals
17. Design: Ensemble Approach
• Step-wise unionization
• Importance based on individual signal performance
‣ First, collaborative filter
‣ Second, queries correlated by query-result clicks
‣ Third, queries with overlapping terms
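The step-wise union can be read as filling a fixed number of recommendation slots from the signals in priority order. The sketch below assumes each signal is already a ranked list per query; `stepwise_union`, the `limit` parameter, and the sample data are illustrative names and values, not taken from the system.

```python
def stepwise_union(query, signals, limit=10):
    """Fill up to `limit` slots, taking suggestions from signals in priority order.

    signals: list of dicts {query: [suggestion, ...]}, most trusted signal first.
    A suggestion already supplied by an earlier signal is not repeated.
    """
    if limit <= 0:
        return []
    merged, seen = [], {query}  # never recommend the query itself
    for signal in signals:
        for suggestion in signal.get(query, []):
            if suggestion not in seen:
                seen.add(suggestion)
                merged.append(suggestion)
                if len(merged) == limit:
                    return merged
    return merged


# Invented sample data, ordered as on the slide: collaborative filtering,
# then query-result clicks, then overlapping terms.
collaborative = {"hadoop": ["mapreduce", "pig"]}
click_based = {"hadoop": ["pig", "hive"]}
term_overlap = {"hadoop": ["hadoop developer", "hadoop admin"]}

merged = stepwise_union("hadoop", [collaborative, click_based, term_overlap], limit=4)
print(merged)
```

Because lower-priority signals only fill the slots left over, this ordering works even when the signals overlap very little, as the previous slide notes.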
18. Design: Practical Considerations
• System designed for public consumption
‣ Strong profanity filters
‣ Need to deal with misspellings
‣ Languages
‣ Remove spammy search queries
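A candidate filter for the considerations above might look like the following sketch. Everything here is an assumption for illustration: the slides do not describe how filtering is done, the blocklist is a placeholder, and requiring a minimum number of distinct users is only one possible proxy for spam. Misspelling correction and language handling are left out.

```python
import re


def normalize(query):
    """Lowercase and collapse whitespace so variants map to one candidate."""
    return re.sub(r"\s+", " ", query).strip().lower()


def is_publishable(query, blocklist, user_counts, min_users=3):
    """Decide whether a query may be shown as a public recommendation.

    blocklist:   set of disallowed terms (hypothetical placeholder list).
    user_counts: {normalized query: number of distinct users who issued it}.
    min_users:   assumed spam heuristic; rarely issued queries are dropped.
    """
    q = normalize(query)
    if not q:
        return False
    if any(term in blocklist for term in q.split()):
        return False
    return user_counts.get(q, 0) >= min_users


# Invented sample data.
blocklist = {"badword"}
user_counts = {"web developer": 12, "buy cheap pills": 1, "badword jobs": 40}

for candidate in [" Web  Developer ", "badword jobs", "buy cheap pills"]:
    print(repr(candidate), is_publishable(candidate, blocklist, user_counts))
```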
30. Details
• Metaphor: A System for Related Search Recommendations. Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse, and Sam Shah. In Proceedings of CIKM, 2012.
Editor's Notes
first context of related searches at LinkedIn
then design, implementation and evaluation of our related searches system
Slow down
Searches per second: ~130; per minute: ~8,000; per hour: ~480,000; per day: ~11.5M
Cut down
Research problems
discovery, exploration, refine
a screenshot of search result page
explore more
candidates and scoring
For example, web developer -> HTML
why collaborative filtering
elaborate
session replace to within a time window
Elaborate - across individual
put a real example
importance of each click
query fanout, popular result
mechanical engineer
across individual
highlight - next
show evaluation/analysis result about clicks on queries one term longer
skip the second equation
describe signals
mention evaluation later
high level design
Kafka, Voldemort citations, url to Azkaban
more time here
scientific and easily repeatable
fast, iterative way for tuning parameters and performance
Precision decreases as the time window K grows; recall increases with K
P/R low: predicting future searches, conservative measure; judging whether a signal can predict future behavior
CF has advantage in this measure
top-10 recommendations
why CTR is not the only metric: hadoop->mapreduce