
Search Quality at LinkedIn



Presented to the Bay Area Search Meetup on February 26, 2014

http://www.meetup.com/Bay-Area-Search/events/136150622/

At LinkedIn, we face a number of challenges in delivering high-quality search results to 277M+ members. Our results are highly personalized, requiring us to build machine-learned relevance models that combine document, query, and user features. And our emphasis on entities (names, companies, job titles, etc.) affects how we process and understand queries. In this talk, we'll discuss these challenges in detail and describe some of the solutions we are building to address them.

Speakers:

Satya Kanduri has worked on LinkedIn search relevance since 2011. Most recently he led the development of LinkedIn's machine-learned ranking platform. He previously worked at Microsoft, improving relevance for Bing Product Search. He has an MS in Computer Science from the University of Nebraska - Lincoln, and a BE in Computer Science from the Osmania University College of Engineering.

Abhimanyu Lad has worked at LinkedIn as a software engineer and data scientist since 2011. He has worked on a variety of relevance and query understanding problems, including query intent prediction, query suggestion, and spelling correction. He has a PhD in Computer Science from CMU, where he worked on developing machine learning techniques for diversifying search results.



  1. Search Quality at LinkedIn. Abhimanyu Lad, Senior Software Engineer, Recruiting Solutions; Satya Kanduri, Senior Software Engineer
  2. Example of an annotated query (screenshot callouts): verticals: people, jobs; intent: exploratory; tag: skill OR title (related skills: search, ranking, …); tag: company (id: 1337, industry: internet) 2
  3. SEARCH USE CASES How do people use LinkedIn’s search? 3
  4. PEOPLE SEARCH Search for people by name 4
  5. PEOPLE SEARCH Search for people by other attributes 5
  6. EXPLORATORY PEOPLE SEARCH 6
  7. JOB SEARCH 7
  8. COMPANY SEARCH 8
  9. AND MUCH MORE… 9
  10. OUR GOAL • Universal Search – Single search box • High Recall – Spelling correction, synonym expansion, … • High Precision – Entity-oriented search: match things, not strings 10
  11. QUERY UNDERSTANDING PIPELINE 11
  12. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 12
  13. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 13
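To make the flow concrete, here is a minimal sketch of such a pipeline as a composition of pluggable stages. The stage and field names (spellcheck, tag_query, QueryContext, …) are illustrative assumptions, not LinkedIn's actual interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    """Accumulates annotations as the raw query moves through the pipeline."""
    raw: str
    corrected: str = ""
    tags: list = field(default_factory=list)             # e.g. [("software engineer", "TITLE")]
    vertical_probs: dict = field(default_factory=dict)   # e.g. {"people": 0.7, "jobs": 0.2}
    expansions: dict = field(default_factory=dict)        # term -> [synonyms]

def run_pipeline(raw_query, spellcheck, tag_query, predict_vertical, expand):
    """Each stage is a pluggable function; the output is a structured query plus annotations."""
    ctx = QueryContext(raw=raw_query)
    ctx.corrected = spellcheck(ctx.raw)
    ctx.tags = tag_query(ctx.corrected)
    ctx.vertical_probs = predict_vertical(ctx.corrected, ctx.tags)
    ctx.expansions = expand(ctx.corrected, ctx.tags)
    return ctx
```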
  14. SPELLING CORRECTION Fix obvious typos Help users spell names 14
  15. SPELLING OUT THE DETAILS Signals: character n-grams and Metaphone codes built over PEOPLE NAMES, COMPANIES, and TITLES (marissa => ma ar ri is ss sa; mark/marc => MRK), plus co-occurrence counts from PAST QUERIES (marissa:mayer = 1000), which help correct variants like [marisa meyer yahoo], [marissa meyer], [marisa yahoo mayer] 15
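A small sketch of two of the candidate-generation signals on this slide: character n-grams over a name dictionary, with co-occurrence counts from past queries breaking ties. The index layout and overlap threshold are assumptions, and a production system would also consult a phonetic key such as Metaphone:

```python
from collections import defaultdict

def char_ngrams(word, n=2):
    """'marissa' -> {'ma', 'ar', 'ri', 'is', 'ss', 'sa'}"""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def build_ngram_index(known_names):
    """Map each character bigram to the known names containing it."""
    index = defaultdict(set)
    for name in known_names:
        for gram in char_ngrams(name):
            index[gram].add(name)
    return index

def spelling_candidates(query_term, index, cooccurrence, min_overlap=0.5):
    """Rank candidate corrections by n-gram overlap, then by past-query co-occurrence counts."""
    grams = char_ngrams(query_term)
    hits = defaultdict(int)
    for gram in grams:
        for name in index.get(gram, ()):
            hits[name] += 1
    candidates = [(name, count / len(grams)) for name, count in hits.items()
                  if count / len(grams) >= min_overlap]
    # cooccurrence holds counts mined from past queries, e.g. {"marissa": {"mayer": 1000}}.
    return sorted(candidates,
                  key=lambda c: (c[1], sum(cooccurrence.get(c[0], {}).values())),
                  reverse=True)
```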
  16. SPELLING OUT THE DETAILS PROBLEM: The corpus as well as the query logs contain many spelling errors; certain spelling errors are quite frequent, while genuine words (especially names) may be infrequent 16
  17. SPELLING OUT THE DETAILS PROBLEM: The corpus as well as the query logs contain many spelling errors. SOLUTION: Use query chains to infer the correct spelling, e.g. [product manger] → [product manager] → CLICK, or a misspelled name reformulated to [marissa mayer] → CLICK 17
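One plausible way to mine such query chains from logs, assuming a simple per-session (timestamp, query, clicked) event format; the gap threshold and log schema are illustrative:

```python
from collections import Counter

def mine_correction_pairs(session_logs, max_gap_seconds=60):
    """
    session_logs: {session_id: [(timestamp, query, clicked), ...]} sorted by timestamp.
    Counts (original -> reformulation) pairs where only the reformulation earned a click,
    which suggests the second query is the intended spelling of the first.
    A real system would also require the pair to be close in edit distance or phonetic key.
    """
    pair_counts = Counter()
    for events in session_logs.values():
        for (t1, q1, c1), (t2, q2, c2) in zip(events, events[1:]):
            if q1 != q2 and not c1 and c2 and (t2 - t1) <= max_gap_seconds:
                pair_counts[(q1, q2)] += 1
    return pair_counts

# A correction table keeps only pairs seen often enough,
# e.g. {"product manger": "product manager"} once its count exceeds a threshold.
```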
  18. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 18
  19. QUERY TAGGING IDENTIFYING ENTITIES IN THE QUERY Example annotations: TITLE → TITLE-237 (software engineer, software developer, programmer, …); CO → CO-1441 (Google Inc., Industry: Internet); GEO → GEO-7583 (Country: US, Lat: 42.3482 N, Long: 75.1890 W). (RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL) 19
  20. QUERY TAGGING IDENTIFYING ENTITIES IN THE QUERY TITLE CO GEO MORE PRECISE MATCHING WITH DOCUMENTS 20
  21. ENTITY-BASED FILTERING BEFORE 21
  22. ENTITY-BASED FILTERING BEFORE AFTER 22
  23. ENTITY-BASED FILTERING BEFORE 23
  24. ENTITY-BASED FILTERING BEFORE AFTER 24
  25. ENTITY-BASED SUGGESTIONS 25
  26. ENTITY-BASED SUGGESTIONS 26
  27. QUERY TAGGING : SEQUENTIAL MODEL TRAINING EMISSION PROBABILITIES (Learned from user profiles) TRANSITION PROBABILITIES (Learned from query logs) 27
  28. QUERY TAGGING : SEQUENTIAL MODEL INFERENCE Given a query, find the most likely sequence of tags 28
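With emission probabilities learned from profiles and transition probabilities learned from query logs, inference amounts to a standard Viterbi decode over the query tokens. Below is a generic sketch with toy smoothing constants; it makes no claim about the production tagger:

```python
import math

def viterbi(tokens, tags, start, transition, emission):
    """
    start[tag]            ~ P(tag at position 0)
    transition[prev][tag] ~ P(tag | prev), learned from query logs
    emission[tag][token]  ~ P(token | tag), learned from user profiles
    Returns the most likely tag sequence for the query tokens.
    """
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # trellis[i][tag] = (log-prob of best path ending in tag at position i, previous tag)
    trellis = [{t: (logp(start.get(t, 0.0)) + logp(emission[t].get(tokens[0], 1e-9)), None)
                for t in tags}]
    for i, token in enumerate(tokens[1:], start=1):
        column = {}
        for tag in tags:
            prev_tag, score = max(
                ((p, trellis[i - 1][p][0] + logp(transition[p].get(tag, 1e-9))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[tag] = (score + logp(emission[tag].get(token, 1e-9)), prev_tag)
        trellis.append(column)

    # Backtrack from the highest-scoring final tag.
    tag = max(tags, key=lambda t: trellis[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = trellis[i][tag][1]
        path.append(tag)
    return list(reversed(path))
```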
  29. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 29
  30. VERTICAL INTENT PREDICTION JOBS PEOPLE COMPANIES (Probability distribution over verticals) 30
  31. VERTICAL INTENT PREDICTION : SIGNALS 1. Past query counts in each vertical, combined with query tags (e.g. TAG:COMPANY → [Company], [Employees], [Jobs]; TAG:NAME → [Name Search]) 2. Personalization: the user's search history 31
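A hedged sketch of how signals like these could be combined into a distribution over verticals; the weighting scheme, smoothing, and signal names are assumptions for illustration only:

```python
def vertical_intent(query_counts, tag_prior, user_history, alpha=1.0):
    """
    query_counts: past counts of this query in each vertical, e.g. {"people": 900, "jobs": 50}
    tag_prior:    multiplicative boost from query tags, e.g. a COMPANY tag boosts jobs/companies
    user_history: counts of the verticals this user searched recently
    Returns a normalized probability distribution over verticals.
    """
    verticals = set(query_counts) | set(tag_prior) | set(user_history)
    scores = {
        v: (query_counts.get(v, 0) + alpha)           # add-alpha smoothed query history
           * tag_prior.get(v, 1.0)                    # boost from query tags
           * (1.0 + 0.1 * user_history.get(v, 0))     # mild personalization from user history
        for v in verticals
    }
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}
```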
  32. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 32
  33. QUERY EXPANSION GOAL: Improve recall through synonym expansion 33
  34. QUERY EXPANSION : NAME SYNONYMS 34
  35. QUERY EXPANSION : JOB TITLE SYNONYMS 35
  36. QUERY EXPANSION : SIGNALS Trained using query chains: [jon] → [jonathan] → CLICK; [programmer] → [developer] → CLICK; [software engineer] → [software developer] → CLICK. Synonymy is symmetric but not transitive: [francis] ⇔ [frank] and [franklin] ⇔ [frank], but [francis] ≠ [franklin]. It is also context based: [software engineer] => [software developer], yet [civil engineer] ≠ [civil developer] 36
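A sketch of applying these two constraints at expansion time: synonym sets are keyed by tag context so expansions stay context sensitive, and each phrase gets only a single symmetric lookup so synonymy is never chained transitively. The table contents are made up for illustration:

```python
# Symmetric synonym sets mined from query chains, keyed by (tag, full phrase) so that
# context is preserved: "software engineer" expands, "civil engineer" does not.
SYNONYMS = {
    ("TITLE", "software engineer"): {"software developer", "programmer"},
    ("NAME",  "jon"):               {"jonathan"},
    ("NAME",  "frank"):             {"francis", "franklin"},  # frank<->francis, frank<->franklin ...
    ("NAME",  "francis"):           {"frank"},                # ... but NOT francis<->franklin
}

def expand_query(tagged_terms):
    """tagged_terms: [(tag, phrase), ...] -> {phrase: sorted list of phrase plus direct synonyms}"""
    expanded = {}
    for tag, phrase in tagged_terms:
        # One symmetric lookup only: we never chain through a synonym's own synonyms.
        expanded[phrase] = sorted({phrase} | SYNONYMS.get((tag, phrase), set()))
    return expanded
```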
  37. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 37
  38. QUERY UNDERSTANDING: SUMMARY • High degree of structure in queries as well as the corpus (user profiles, job postings, companies, …) • Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search • Query tagging and query log analysis play a big role in query understanding 38
  39. ranking 39
  40. WHAT’S IN A NAME QUERY?
  41. BUT NAMES CAN BE AMBIGUOUS kevin scott ≠
  42. SEARCHING FOR A COMPANY’S EMPLOYEES
  43. SEARCHING FOR PEOPLE WITH A SKILL
  44. RANKING IS COMPLICATED • Seemingly similar queries require dissimilar scoring functions • Personalization matters – Multiple dimensions to personalize on – Dimensions vary with query class
  45. TRAINING (diagram): documents for training supply features; human evaluation supplies labels; both feed the machine learning model
  46. TRAINING (diagram): documents for training supply features; human evaluation supplies labels; both feed the machine learning model
  47. ASSESSING RELEVANCE
  48. RELEVANCE DEPENDS ON WHO’S SEARCHING What if the searcher is a job seeker? Or a recruiter? Or…
  49. THE QUERY IS NOT ENOUGH
  50. WE NEED USER FEATURES • Non-personalized relevance model: score = f(Document | Query) • Personalized relevance model: score = f(Document | Query, User)
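In code, the difference is simply which features the scoring function is allowed to see; a toy linear model with invented feature names:

```python
def linear_score(weights, features):
    """A relevance model as a weighted sum of features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Non-personalized: f(Document | Query) sees only query-document features.
doc_query_features = {"title_match": 1.0, "keyword_match": 0.6}

# Personalized: f(Document | Query, User) adds user-document features such as
# network distance or industry overlap (feature names are illustrative).
user_doc_features = {"network_distance": 2.0, "industry_overlap": 1.0, "same_company": 0.0}

weights = {"title_match": 1.2, "keyword_match": 0.8,
           "network_distance": -0.3, "industry_overlap": 0.5, "same_company": 0.7}

print(linear_score(weights, doc_query_features))                           # non-personalized
print(linear_score(weights, {**doc_query_features, **user_doc_features}))  # personalized
```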
  51. COLLECTING RELEVANCE JUDGMENTS WON’T SCALE
  52. TRAINING (diagram): documents for training supply features; labels now come from search logs as well as human evaluation; both feed the machine learning model
  53. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant
  54. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant
  55. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant
  56. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant (diagram: user eye scan direction) • Good results not seen are marked Not Relevant. Unfairly penalized?
  57. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Skipped = Not Relevant • Only penalize results that the user has seen but ignored
  58. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Skipped = Not Relevant • Only penalize results that the user has seen but ignored • Risks inverting the model by overweighting low-ranked results
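A sketch of the skip-above heuristic implied here: clicked results become positives, un-clicked results ranked above the lowest click become negatives, and everything below the last click is left unlabeled rather than unfairly penalized. The log format is an assumption:

```python
def label_impression(ranked_doc_ids, clicked_ids):
    """
    ranked_doc_ids: documents in the order they were shown.
    clicked_ids:    the subset the user clicked.
    Returns (doc_id, label) pairs: clicked -> 1, skipped-above-a-click -> 0.
    Results below the last click are assumed unseen and get no label.
    """
    if not clicked_ids:
        return []
    last_click_pos = max(i for i, d in enumerate(ranked_doc_ids) if d in clicked_ids)
    return [(doc_id, 1 if doc_id in clicked_ids else 0)
            for doc_id in ranked_doc_ids[:last_click_pos + 1]]

# Example: the user clicked results 1 and 4, so results 2-3 become "skipped" negatives
# and results 5-6 are left unlabeled.
print(label_impression(["a", "b", "c", "d", "e", "f"], {"a", "d"}))
```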
  59. FAIR PAIRS • Fair Pairs: • Randomize, Clicked= R, Skipped= NR [Radlinski and Joachims, AAAI’06]
  60. FAIR PAIRS • Fair Pairs: • Randomize, Clicked= R, Skipped= NR Flipped [Radlinski and Joachims, AAAI’06]
  61. FAIR PAIRS • Fair Pairs: • Randomize, Clicked= R, Skipped= NR • Great at dealing with position bias • Does not invert models Flipped [Radlinski and Joachims, AAAI’06]
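A hedged sketch of the Fair Pairs idea: adjacent result pairs are randomly flipped at serve time, and a click on the lower-shown member of a pair while the upper-shown member is skipped yields a preference that is largely free of position bias. This simplifies the algorithm in Radlinski and Joachims (AAAI'06):

```python
import random

def fair_pairs_presentation(ranking, rng=random):
    """Group the ranking into adjacent pairs and flip each pair with probability 0.5."""
    presented, flips = [], []
    for i in range(0, len(ranking) - 1, 2):
        pair = list(ranking[i:i + 2])
        flipped = rng.random() < 0.5
        if flipped:
            pair.reverse()
        presented.extend(pair)
        flips.append(flipped)
    if len(ranking) % 2:                 # odd-length ranking: last result is unpaired
        presented.append(ranking[-1])
    return presented, flips

def fair_pairs_preferences(presented, flips, clicked_ids):
    """A click on the second shown item of a pair, with the first skipped, gives clicked > skipped."""
    prefs = []
    for k in range(len(flips)):
        top, bottom = presented[2 * k], presented[2 * k + 1]
        if bottom in clicked_ids and top not in clicked_ids:
            prefs.append((bottom, top))  # (preferred, non-preferred)
    return prefs
```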
  62. EASY NEGATIVES • Assumption: A decent current model would push out bad results to the very end. • Easy Negatives: Some of the results at the end are picked up as negative examples
  63. EASY NEGATIVES Result sets vary widely in size (from 2 pages to 90+ pages) • Use strategies that sample across the feature space • Searches with fewer results are preferred • Always sample from a given page, say page 10
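A minimal sketch of the easy-negatives sampling described here; the page number and sample size are illustrative parameters:

```python
import random

def easy_negatives(ranked_doc_ids, page=10, page_size=10, sample_size=3, rng=random):
    """
    Sample negatives from a fixed deep page (e.g. page 10) of a search's results,
    assuming a decent current model has already pushed bad results that far down.
    Returns [] for searches that don't have that many results.
    """
    start = (page - 1) * page_size
    deep_results = ranked_doc_ids[start:start + page_size]
    if not deep_results:
        return []
    return rng.sample(deep_results, min(sample_size, len(deep_results)))
```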
  64. PUTTING IT ALL TOGETHER • Human evaluation is not practical for personalized searches • Learn from user behavior – Multiple heuristics depending on the need – Different pros and cons
  65. EFFICIENCY VS EXPRESSIVENESS • Build a tree with logistic regression leaves. • By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated per document. (Diagram: decision nodes such as x2 = ? and x4 = ? route each query to a leaf model of the form β0 + β1·x1 + … + βn·xn, with different coefficients per leaf.) 66
  66. SCORING (diagram): each new document → features → machine learning model → score; the scores produce the ordered list
  67. A SIMPLIFIED EXAMPLE (diagram): Name query? → leaf model β0 + 0.85·(IndustryOverlap) + … + βn·xn; Skill query? → leaf model α0 + 0·(IndustryOverlap) + …; i.e. the weight on industry overlap depends on the query segment 68
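A toy version of the segmented model from the last few slides: decision nodes test only query (and user) segment features, so one logistic-regression leaf is chosen per query and each document is scored by exactly one model. Segments and coefficients mirror the simplified example above and are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One logistic regression per (Query, User) segment; coefficients are illustrative.
LEAF_MODELS = {
    "name_query":  {"bias": -1.0, "name_match": 2.5, "industry_overlap": 0.85, "network_distance": -0.2},
    "skill_query": {"bias": -0.5, "skill_match": 2.0, "industry_overlap": 0.0,  "network_distance": -0.4},
    "default":     {"bias": 0.0,  "keyword_match": 1.0},
}

def pick_segment(query_tags):
    """Decision nodes depend only on the query (and user), never on the document."""
    if "NAME" in query_tags:
        return "name_query"
    if "SKILL" in query_tags:
        return "skill_query"
    return "default"

def score_documents(query_tags, documents):
    """Evaluate exactly one leaf model per document for this query."""
    model = LEAF_MODELS[pick_segment(query_tags)]
    return sorted(
        ((sigmoid(model["bias"] + sum(model.get(f, 0.0) * v for f, v in doc["features"].items())),
          doc["id"])
         for doc in documents),
        reverse=True,
    )
```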
  68. TEST, TEST, TEST Interleaving: Model 1 = (a, b, c, d, g, h), Model 2 = (b, e, a, f, g, h), Interleaved = (a, b, c, e, d, f) [Radlinski et al., CIKM 2008] 69
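Interleaving merges the rankings of two models into one result list and credits clicks to whichever model contributed each clicked result. Below is a team-draft-style sketch; the slide's diagram may illustrate a different variant from Radlinski et al., CIKM 2008:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=None, rng=random):
    """Each round, a coin flip decides which model picks first; each model then adds
    its highest-ranked result not already shown. Returns the list and per-result credit."""
    length = length or (len(ranking_a) + len(ranking_b))
    interleaved, credit = [], {}

    def pick(ranking, team):
        for doc in ranking:
            if doc not in credit:
                credit[doc] = team
                interleaved.append(doc)
                return True
        return False

    while len(interleaved) < length:
        first_is_a = rng.random() < 0.5
        order = [("A", ranking_a), ("B", ranking_b)] if first_is_a else [("B", ranking_b), ("A", ranking_a)]
        progressed = False
        for team, ranking in order:
            progressed |= pick(ranking, team)
        if not progressed:
            break
    return interleaved, credit

def interleaving_winner(credit, clicked_ids):
    """The model whose contributed results received more clicks wins this impression."""
    a = sum(1 for d in clicked_ids if credit.get(d) == "A")
    b = sum(1 for d in clicked_ids if credit.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"
```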
  69. SUMMARY • Query understanding leverages the rich structure of LinkedIn’s content and information needs. • Query tagging and rewriting allows us to deliver precision and recall. • For ranking, personalization is both the biggest challenge and the core of our solution. • Segmenting relevance models by query type helps us efficiently address the diversity of search needs.
  70. Abhimanyu Lad alad@linkedin.com https://linkedin.com/in/abhilad Satya Kanduri skanduri@linkedin.com https://linkedin.com/in/skanduri 71
