
Metaphor: A system for related searches recommendations

Search plays an important role in online social networks as it provides an essential mechanism for discovering members and content on the network. Related search recommendation is one of several mechanisms used for improving members’ search experience in finding relevant results to their queries. This paper describes the design, implementation, and deployment of Metaphor, the related search recommendation system on LinkedIn, a professional social networking site with over 175 million members worldwide. Metaphor builds on a number of signals and filters that capture several dimensions of relatedness across member search activity. The system, which has been in live operation for over a year, has gone through multiple iterations and evaluation cycles. This paper makes three contributions. First, we provide a discussion of a large-scale related search recommendation system. Second, we describe a mechanism for effectively combining several signals in building a unified dataset for related search recommendations. Third, we introduce a query length model for capturing bias in recommendation click behavior. We also discuss some of the practical concerns in deploying related search recommendations.

  1. Metaphor: A System for Related Searches Recommendations
     Mitul Tiwari, joint work with Azarias Reda, Yubin Park, Christian Posse, and Sam Shah (LinkedIn)
  2. Who am I
  3. Outline
     • Related Searches at LinkedIn
     • Metaphor: A System for Related Searches Recommendations
     • Design
     • Implementation
     • Evaluation
  4. LinkedIn by the numbers
     • 175M+ members
     • 2+ new user registrations per second
     • 4.2 billion people searches in 2011
     • 9.3 billion page views in Q2 2012
     • 100+ million monthly active users in Q2 2012
  5. Related Searches at LinkedIn
     • Millions of searches every day
     • Goal: build a related searches system at LinkedIn
     • Help users explore and refine their queries
     Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse, and Sam Shah
  6. Related Searches at LinkedIn (screenshot of a search results page)
  7. Metaphor: A Related Searches System
     • Design
     • Implementation
     • Evaluation
  8. Metaphor: A Related Searches System
     • Design
     • Implementation
     • Evaluation
  9. Design
     • Signals
       – Collaborative filtering
       – Query-result click graph
       – Overlapping terms
       – Length bias
     • Ensemble approach for unified recommendation
     • Practical considerations
  10. Design: Collaborative Filtering
      • Searches correlated by time
      • Searches done in the same session by the same user
      • Collaborative filtering with implicit feedback
      • TF-IDF scoring to handle popular queries (e.g., "Obama")
      (Figure: queries Q1, Q2, Q3 on a timeline)
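
A minimal sketch of the session-based collaborative-filtering signal from slide 10. The toy SEARCH_LOG, the 10-minute SESSION_WINDOW, and the exact IDF-style weighting formula are illustrative assumptions, not the production Metaphor code, which runs this kind of computation over the full search logs on Hadoop.

    from collections import Counter, defaultdict
    from itertools import combinations
    import math

    # Toy search log: (member_id, timestamp_seconds, query). Purely illustrative.
    SEARCH_LOG = [
        ("m1", 0,    "software engineer"),
        ("m1", 60,   "software developer"),
        ("m1", 90,   "web developer"),
        ("m2", 10,   "software engineer"),
        ("m2", 2000, "data scientist"),
    ]

    SESSION_WINDOW = 600  # assumed session gap of 10 minutes


    def sessionize(log):
        """Group each member's queries into time-windowed sessions."""
        by_member = defaultdict(list)
        for member, ts, query in sorted(log):
            by_member[member].append((ts, query))
        sessions = []
        for events in by_member.values():
            current = [events[0][1]]
            for (prev_ts, _), (ts, query) in zip(events, events[1:]):
                if ts - prev_ts <= SESSION_WINDOW:
                    current.append(query)
                else:
                    sessions.append(current)
                    current = [query]
            sessions.append(current)
        return sessions


    def cooccurrence_scores(sessions):
        """Count query pairs that co-occur in a session and down-weight globally
        popular queries with an IDF-style factor (the TF-IDF idea on the slide;
        the exact formula here is an assumption)."""
        n = len(sessions)
        query_freq = Counter(q for s in sessions for q in set(s))
        scores = defaultdict(Counter)
        for session in sessions:
            for q1, q2 in combinations(sorted(set(session)), 2):
                w = math.log(1 + n / query_freq[q1]) * math.log(1 + n / query_freq[q2])
                scores[q1][q2] += w
                scores[q2][q1] += w
        return scores


    for query, related in cooccurrence_scores(sessionize(SEARCH_LOG)).items():
        print(query, related.most_common(3))

Down-weighting popular queries keeps very frequent searches such as "Obama" from dominating every co-occurrence list.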
  11. Design: Query-Result Clicks
      • Searches correlated by result clicks
      (Figure: bipartite graph linking queries Q1…Qn to clicked results R1…Rm)
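
A hedged sketch of the query-result click signal on slide 11: two queries are treated as related when they lead to clicks on the same results. The CLICK_LOG contents and the 1/fan-out weighting (suggested by the speaker notes about popular results, but not the paper's exact formula) are assumptions.

    from collections import Counter, defaultdict
    from itertools import combinations

    # Hypothetical click log: (query, clicked_result_id) pairs.
    CLICK_LOG = [
        ("software engineer", "profile:42"),
        ("software developer", "profile:42"),
        ("software developer", "profile:99"),
        ("java developer", "profile:99"),
        ("data scientist", "profile:7"),
    ]


    def query_result_scores(click_log):
        """Relate two queries when they lead to clicks on the same result,
        down-weighting results that are clicked from many different queries."""
        queries_by_result = defaultdict(set)
        for query, result in click_log:
            queries_by_result[result].add(query)

        scores = Counter()
        for result, queries in queries_by_result.items():
            # A result clicked from many queries is less informative, so each
            # shared click contributes 1 / fan-out (an assumption).
            weight = 1.0 / len(queries)
            for q1, q2 in combinations(sorted(queries), 2):
                scores[(q1, q2)] += weight
        return scores


    for (q1, q2), score in query_result_scores(CLICK_LOG).most_common():
        print(f"{q1!r} <-> {q2!r}: {score:.2f}")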
  12. Design: Overlapping Terms
      • Searches with overlapping terms
      • TF-IDF scoring to give importance to terms
      (Figure: queries Q1 "Software Developer" and Q2 "Software Engineer" share the term "Software")
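
An illustrative sketch of the term-overlap signal on slide 12, scoring query pairs by cosine similarity of TF-IDF term vectors. The query list and the cosine formulation are assumptions; the slide itself only states that TF-IDF is used to weight terms.

    import math
    from collections import Counter

    # Hypothetical set of distinct queries drawn from the search logs.
    QUERIES = [
        "software developer",
        "software engineer",
        "mechanical engineer",
        "data scientist",
    ]


    def tfidf_vectors(queries):
        """Represent each query as a TF-IDF vector over its terms, so that rare
        terms (e.g. "mechanical") matter more than common ones (e.g. "engineer")."""
        doc_freq = Counter(term for q in queries for term in set(q.split()))
        n = len(queries)
        vectors = {}
        for q in queries:
            tf = Counter(q.split())
            vectors[q] = {t: c * math.log(n / doc_freq[t]) for t, c in tf.items()}
        return vectors


    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u if t in v)
        norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0


    vectors = tfidf_vectors(QUERIES)
    print(cosine(vectors["software developer"], vectors["software engineer"]))
    print(cosine(vectors["software engineer"], vectors["mechanical engineer"]))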
  13. Design: Length Bias
      • Insight: clicks tend to land on suggestions one term longer than the original query
  14. Design: Length Bias
      • Insight: clicks tend to land on suggestions one term longer than the original query
      • This corresponds to refining the initial query
      • A statistical biasing model scores longer queries higher
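
Slides 13-14 describe a statistical model that favors suggestions one term longer than the query; the sketch below stands in for that model with a simple constant boost, just to show where such a bias plugs into ranking. The candidate scores and the boost value are hypothetical.

    # Hypothetical candidate scores for the query "software engineer",
    # e.g. produced by one of the signals above.
    CANDIDATES = {
        "senior software engineer": 0.40,
        "software developer": 0.45,
        "engineer": 0.30,
    }


    def length_bias(query, candidate, boost=1.5):
        """Multiplicative boost for candidates exactly one term longer than the
        query; the paper fits a statistical model from click data, so this
        constant boost is only an illustrative stand-in."""
        delta = len(candidate.split()) - len(query.split())
        return boost if delta == 1 else 1.0


    def rerank(query, candidates):
        return sorted(
            ((c, s * length_bias(query, c)) for c, s in candidates.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )


    print(rerank("software engineer", CANDIDATES))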
  15. Design: Ensemble Approach
      • Need to generate a unified recommendation dataset
      • Analyzed the engagement contributed by each signal
      • Attempted a machine-learning approach
      • Minimal overlap across the different signals
  16. Design: Ensemble Approach
      • Step-wise unionization
      • Importance based on each signal's individual performance
      • First, collaborative filtering
      • Second, queries correlated by query-result clicks
      • Third, queries with overlapping terms
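
A minimal sketch of the step-wise unionization on slide 16: walk the signals in priority order and fill the recommendation slots, skipping duplicates. The signal names, the per-signal lists, and the slot budget k are assumptions.

    def stepwise_union(per_signal_recs, k=10):
        """Fill up to k recommendation slots by walking the signals in priority
        order (collaborative filtering, then query-result clicks, then
        overlapping terms, as on the slide) and appending suggestions that are
        not already present."""
        unified, seen = [], set()
        for signal in ("collaborative_filtering", "query_result_clicks", "overlapping_terms"):
            for suggestion in per_signal_recs.get(signal, []):
                if suggestion not in seen:
                    unified.append(suggestion)
                    seen.add(suggestion)
                if len(unified) == k:
                    return unified
        return unified


    # Hypothetical per-signal suggestion lists for one query.
    recs = {
        "collaborative_filtering": ["software developer", "web developer"],
        "query_result_clicks": ["software developer", "java developer"],
        "overlapping_terms": ["senior software engineer"],
    }
    print(stepwise_union(recs, k=4))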
  17. Design: Practical Considerations
      • System designed for public consumption
      • Strong profanity filters
      • Need to deal with misspellings
      • Languages
      • Remove spammy search queries
  18. Metaphor: A Related Searches System
      • Design
      • Implementation
      • Evaluation
  19. Implementation Challenge
      • Scale
        – 175M+ members
        – Billions of searches
        – Terabytes of data to process
  20. Implementation
      • Kafka: publish-subscribe messaging system
      • Hadoop: MapReduce data processing system
      • Azkaban: Hadoop workflow management tool
      • Voldemort: key-value store
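
Slides 20-21 describe the offline pipeline (Kafka for collecting search events, Hadoop jobs orchestrated by Azkaban, Voldemort for serving). As a shape-only illustration, the sketch below mimics the Hadoop Streaming contract (tab-separated key/value lines, reducer input grouped by key) for a first job that groups raw search events by member; the field layout and data are hypothetical, and this is not the actual Metaphor workflow code.

    from itertools import groupby

    # Raw search events as they might arrive from the tracking pipeline:
    # member_id \t epoch_seconds \t query. Field layout is an assumption.
    RAW_EVENTS = [
        "m1\t60\tsoftware developer",
        "m2\t10\tsoftware engineer",
        "m1\t0\tsoftware engineer",
    ]


    def mapper(lines):
        """Emit member_id as the shuffle key so one member's searches end up
        in the same reducer call (Hadoop Streaming style: key\\tvalue)."""
        for line in lines:
            member, ts, query = line.rstrip("\n").split("\t")
            yield f"{member}\t{ts}\t{query}"


    def reducer(sorted_lines):
        """Collect each member's searches, ordered by time, ready for the
        session-based co-occurrence counting shown earlier."""
        parsed = (line.split("\t") for line in sorted_lines)
        for member, group in groupby(parsed, key=lambda fields: fields[0]):
            events = sorted((int(ts), query) for _, ts, query in group)
            yield member, events


    shuffled = sorted(mapper(RAW_EVENTS))  # stand-in for Hadoop's shuffle/sort
    for member, events in reducer(shuffled):
        print(member, events)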
  21. Implementation: Workflow
  22. Metaphor: A Related Searches System
      • Design
      • Implementation
      • Evaluation
  23. Evaluation
      • Performance of each signal and combination
      • How does the system scale?
  24. Evaluation Cont'd
      • Offline evaluation
        – Precision-recall
      • Online evaluation
        – A/B testing to measure engagement
      • Performance evaluation
  25. Offline Evaluation
      • Correct set: the set of searches performed by a user in the following K minutes, here K = 10
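
A small sketch of the offline metric on slide 25: precision and recall of the recommendations against the "correct set", i.e. the searches the member actually performed in the following K = 10 minutes. The example lists are hypothetical.

    def precision_recall(recommended, future_searches):
        """Precision and recall of the recommended queries against the correct
        set: queries the member actually searched in the following K minutes."""
        recommended, future_searches = set(recommended), set(future_searches)
        hits = recommended & future_searches
        precision = len(hits) / len(recommended) if recommended else 0.0
        recall = len(hits) / len(future_searches) if future_searches else 0.0
        return precision, recall


    # Hypothetical example: top-3 recommendations for "software engineer"
    # versus what the member actually searched in the next 10 minutes.
    recs = ["software developer", "senior software engineer", "java developer"]
    future = ["software developer", "web developer"]
    print(precision_recall(recs, future))  # (1/3, 1/2)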
  26. Online Evaluation
      • Used A/B testing
      • Metrics
        – Coverage: queries with recommendations
        – Impressions: number of recommendations shown
        – Clicks: clicks on recommendations
        – Click-through rate (CTR): clicks per impression
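
The online metrics on slide 26 are simple ratios; the sketch below computes coverage and CTR per A/B-test arm. All counts are made up, only the metric definitions come from the slide.

    def ab_summary(arms):
        """Per-arm coverage and CTR for an A/B test; the numbers passed in are
        hypothetical aggregates from test logs."""
        summary = {}
        for name, a in arms.items():
            summary[name] = {
                "coverage": a["queries_with_recs"] / a["queries"],  # queries that got recommendations
                "ctr": a["clicks"] / a["impressions"],              # clicks per impression
            }
        return summary


    arms = {
        "control":   {"queries": 500_000, "queries_with_recs": 300_000,
                      "impressions": 2_400_000, "clicks": 36_000},
        "treatment": {"queries": 500_000, "queries_with_recs": 340_000,
                      "impressions": 2_700_000, "clicks": 48_600},
    }
    print(ab_summary(arms))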
  27. Online Evaluation
  28. Evaluation: System Runtime
  29. Selected References
      • R. Baeza-Yates. Graphs from search engine queries. LNCS, 4362:1–8, 2007.
      • P. Boldi, F. Bonchi, C. Castillo, D. Donato, and S. Vigna. Query suggestions using query-flow graphs. In Proceedings of WSDM, 2009.
      • M. A. Hasan, N. Parikh, B. Singh, and N. Sundaresan. Query suggestion for E-commerce sites. In Proceedings of WSDM, 2011.
      • Q. Mei, D. Zhou, and K. Church. Query suggestion using hitting time. In Proceedings of CIKM, 2008.
      • Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation. In Proceedings of WWW, 2006.
  30. Questions?

Editor's Notes

  • first context of related searches at LinkedIn
    then design, implementation and evaluation of our related searches system
  • Slow down
    Searches: ~130 per second, ~8,000 per minute, ~480,000 per hour, ~11.5M per day
    Cut down
  • discovery, exploration, refine
  • a screenshot of search result page
  • explore more
    candidates and scoring
  • For example, web developer -> HTML
    why collaborative filtering
    elaborate
    session replace to within a time window
  • Elaborate - across individual
    put a real example
    importance of each click
    query fanout, popular result
  • mechanical engineer
    across individual
  • highlight - next
  • show evaluation/analysis result about clicks on queries one term longer
    skip the second equation
  • describe signals
  • mention evaluation later
  • high level design
    Kafka, Voldemort citations, url to Azkaban
  • high level design
    Kafka, Voldemort citations, url to Azkaban
  • scientific and easily repeatable
    fast, iterative way for tuning parameters and performance
    P-decreases with the size of window, R-increases with time window K
    P/R low: predicting future searches, conservative measure; judging whether a signal can predict future behavior
    CF has advantage in this measure
    top-10 recommendations
  • why CTR is not the only metric: hadoop->mapreduce
  • normalized
    legends: elaborate CF, QRQ, partial
    break down figures: bigger chart
  • for all possible queries
    quadratic why?
    80 nodes 2 quad-core cpus: 640 cores
  • questions, details, hiring
