Search Quality at LinkedIn

Presented to the Bay Area Search Meetup on February 26, 2014

http://www.meetup.com/Bay-Area-Search/events/136150622/

At LinkedIn, we face a number of challenges in delivering high-quality search results to 277M+ members. Our results are highly personalized, requiring us to build machine-learned relevance models that combine document, query, and user features. And our emphasis on entities (names, companies, job titles, etc.) affects how we process and understand queries. In this talk, we'll discuss these challenges in detail and describe some of the solutions we are building to address them.

Speakers:

Satya Kanduri has worked on LinkedIn search relevance since 2011. Most recently he led the development of LinkedIn's machine-learned ranking platform. He previously worked at Microsoft, improving relevance for Bing Product Search. He has an MS in Computer Science from the University of Nebraska - Lincoln, and a BE in Computer Science from the Osmania University College of Engineering.

Abhimanyu Lad has worked at LinkedIn as a software engineer and data scientist since 2011. He has worked on a variety of relevance and query understanding problems, including query intent prediction, query suggestion, and spelling correction. He has a PhD in Computer Science from CMU, where he worked on developing machine learning techniques for diversifying search results.

Slide notes:

  • There's a high degree of structure in our users' queries as well as our corpus (i.e., user profiles, job postings, companies, etc.). Query understanding allows us to take advantage of this structure to do entity-oriented search and optimally balance recall and precision. Finally, our ability to understand and intelligently rewrite queries depends heavily on two things: query tagging (the ability to identify entities in the query) and query log analysis (analyzing how users reformulate their queries).
  • Thanks, Abhi. Today I'll be talking about some of the ranking challenges we face at LinkedIn. Throughout this talk I'll focus on people search, but the challenges apply to all the search problems we work on. To get a sense of the ranking problem, let's look at some examples.
  • One of the more frequent types of queries we see in people search is the name query. In this example, the query is [richard branson]. While there are other Richard Bransons on LinkedIn, the searcher was most likely looking for the founder of the Virgin Group. To get this search right, we only need two things: the name has to match the query terms, and the ranking should be based on global popularity. That is pretty straightforward. Now let's look at another example.
  • This is also a name query, but there are multiple Kevin Scotts on LinkedIn. Which of these two result sets is relevant? It's hard to say. Looking at the result sets, the left one is mostly clustered around the San Francisco Bay Area and the right one around the Atlanta area. We could be looking for any Kevin Scott local to us, but given the global prior we put the respective SVPs on top. If I were issuing this search, I'd say the left result set is better, since I work at LinkedIn and live in the SF Bay Area. On the other hand, for someone who works at Home Depot and is located in the Atlanta area, the right result set is probably better. This example shows two dimensions of personalization: company and location. Are there more factors we could personalize on?
  • There is no query here; I chose to use facets in this case to select the results precisely. We saw in the previous slide that personalization involves more than one dimension, so let's look for others. Say I am looking for someone working at NetApp. Apart from the location personalization visible in the second result, there are two other important dimensions: network distance for the first result and industry overlap for the third. So the question is: by personalizing on all these dimensions (company, location, network distance, industry, etc.) for every query, can we obtain the best set of results? Let's find out with another example.
  • One of the unique value propositions of LinkedIn is the ability to search for people possessing a skill. Not all features are useful for every query class: for skill searches, industry overlap did not turn out to be a significant feature, whereas for name searches it is one of the most significant. For the query [ballet], most of the top-ranked results are from the performing arts industry. As you can probably guess, I do not work in performing arts and have no skills related to it, so personalization based on industry is not applicable here. However, if you look carefully, the results are still personalized based on my network distance.
  • To recap what we saw in the examples:
  • Considering all these factors we have to take into account, how do we train a personalized machine-learned ranking model?
  • Of course I have severely simplified the process, but this is just to give those of you who aren't familiar with machine learning an idea of how machine-learned models are trained for ranking.
  • Most of the work typically revolves around sampling documents, engineering features, and obtaining ground-truth labels. The more important part is the data (the unreasonable effectiveness of data): with personalization, it is as if we had to train a model for each of 270+ million members. In the next few slides, we'll explore different ways in which we can get labels.
  • Let's say a recruiter is looking for someone with the skill [oracle database]. Is this still the right result?
  • A conventional, non-personalized model is a function of the document and the query. In LinkedIn's case we are always personalized: there is an additional "user" dimension, so our score is a function of document, query, and user.
  • We cannot use human labels: collecting relevance judgments for personalized results won't scale.
  • Lower-ranked results are not labeled negative: doing so would throw our own ranking function under the bus. There might be a good reason they are ranked lower, but there might also be genuinely good results among them.
  • With this approach, all the results we didn't evaluate effectively look better than the skipped results above the click; if the original model was pretty good, that gives a lot of credit to those unseen results. (A small labeling sketch appears right after these notes.)
  • Sampling bias: the data is concentrated in the top results, so the model never learns how to differentiate really poor results.
  • Why is this okay, given such an unrepresentative sample?
  • The best models for learning to rank are generally complex, like ensembles of trees. These models are expensive to evaluate, especially in first-pass rankers, which often need to score hundreds of thousands of results for every query. The approach we use is to first train complex models, then use insights from them to train simpler models, trading some expressiveness for efficiency.
  • We potentially score hundreds of thousands of documents per query, so there is a trade-off between expressiveness and evaluation cost.
  • The decision nodes can also be based on user segments, such as whether the user is a recruiter or a regular member. Industry overlap is not needed for skill queries, so, as can be seen, we can avoid computing IndustryOverlap for a skill query. (A sketch of this segment-routed scoring appears after these notes.)
  • We test offline as much as possible, but online evaluation is the litmus test. Conventional ways to measure are CTR, MRR, precision/recall, etc. We also use interleaving, which blends two result sets and compares them side by side. (A sketch of interleaving appears after these notes.)
  • Platform supports fast iteration
  • Search Quality at LinkedIn
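
The labeling heuristics discussed in the notes above (and on slides 53 to 58) can be made concrete with a small sketch. This is a minimal illustration, not LinkedIn's implementation; the impression structure and field names are assumptions. It labels the clicked result as relevant and only the results ranked above the click (seen but skipped) as not relevant, leaving unseen results below the click unlabeled.

    # Minimal sketch: Clicked = Relevant, Skipped (above the click) = Not Relevant.
    # The SearchImpression structure and its fields are hypothetical.
    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    @dataclass
    class SearchImpression:
        query: str
        results: List[str]                  # document ids in ranked order
        clicked_position: Optional[int]     # 0-based rank of the clicked result, if any

    def labels_from_clicks(imp: SearchImpression) -> List[Tuple[str, int]]:
        """Return (doc_id, label) pairs: 1 = relevant, 0 = not relevant.

        Only the clicked result and the results ranked above it (seen but
        skipped) get labels; everything below the click stays unlabeled,
        so good results the user never saw are not unfairly penalized.
        """
        if imp.clicked_position is None:
            return []                        # no click, no labels from this impression
        labeled = []
        for rank, doc_id in enumerate(imp.results):
            if rank < imp.clicked_position:
                labeled.append((doc_id, 0))  # skipped above the click
            elif rank == imp.clicked_position:
                labeled.append((doc_id, 1))  # clicked
            else:
                break                        # below the click: unseen, leave unlabeled
        return labeled

    # Example: a click at rank 2 labels ranks 0 and 1 as skipped and rank 2 as relevant.
    imp = SearchImpression("kevin scott", ["d1", "d2", "d3", "d4"], clicked_position=2)
    print(labels_from_clicks(imp))   # [('d1', 0), ('d2', 0), ('d3', 1)]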
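
The efficiency vs. expressiveness idea from the notes above (and slides 65 to 67): a shallow tree whose decision nodes test only (query, user) segments, with a logistic-regression scorer at each leaf, so that a skill query never pays for features such as IndustryOverlap. A minimal sketch; the segments, feature names, and weights are invented for illustration.

    # Sketch of a segment-routed scorer: decision nodes test only the (query, user)
    # segment, so exactly one linear model is evaluated per document, and features
    # a segment does not use (e.g. industry_overlap for skill queries) never need
    # to be computed. Segments, features, and weights here are illustrative only.
    import math

    LEAF_MODELS = {
        # segment -> (bias, {feature_name: weight})
        "name_query":  (0.0, {"name_match": 2.0, "industry_overlap": 0.85, "connection_strength": 1.2}),
        "skill_query": (0.0, {"skill_match": 1.8, "connection_strength": 1.5}),   # no industry_overlap
        "recruiter":   (0.0, {"title_match": 1.6, "seniority": 0.9}),
    }

    def segment_of(query_tags, user):
        """Decision nodes route on query/user segments, never on document features."""
        if user.get("is_recruiter"):
            return "recruiter"
        if "SKILL" in query_tags:
            return "skill_query"
        return "name_query"

    def score(doc_features, query_tags, user):
        """Evaluate only the one logistic-regression leaf that the segment selects."""
        bias, weights = LEAF_MODELS[segment_of(query_tags, user)]
        z = bias + sum(w * doc_features.get(f, 0.0) for f, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))

    # A skill query only touches the features its leaf actually uses.
    print(score({"skill_match": 1.0, "connection_strength": 0.5}, {"SKILL"}, {"is_recruiter": False}))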
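
Online testing with interleaving (the last note above, and slide 68, which cites Radlinski et al., CIKM 2008) can be sketched as team-draft interleaving: the two rankers take turns drafting their best not-yet-shown result, and clicks are credited to whichever ranker contributed the clicked slot. This is a generic sketch of the published idea, not LinkedIn's production system.

    import random

    def team_draft_interleave(ranking_a, ranking_b, rng=None):
        """Merge two rankings, remembering which model contributed each slot."""
        rng = rng or random.Random(0)
        rankings = {"A": ranking_a, "B": ranking_b}
        interleaved, team, used = [], [], set()
        count = {"A": 0, "B": 0}
        while any(doc not in used for r in rankings.values() for doc in r):
            # The side with fewer picks drafts next; break ties with a coin flip.
            if count["A"] != count["B"]:
                side = "A" if count["A"] < count["B"] else "B"
            else:
                side = rng.choice(["A", "B"])
            candidate = next((d for d in rankings[side] if d not in used), None)
            if candidate is None:                       # this side is exhausted
                side = "B" if side == "A" else "A"
                candidate = next(d for d in rankings[side] if d not in used)
            interleaved.append(candidate)
            team.append(side)
            used.add(candidate)
            count[side] += 1
        return interleaved, team

    def credit_clicks(team, clicked_positions):
        """Attribute each click to the model that supplied that slot."""
        wins = {"A": 0, "B": 0}
        for pos in clicked_positions:
            wins[team[pos]] += 1
        return wins

    merged, team = team_draft_interleave(["a", "b", "c", "d"], ["b", "e", "a", "f"])
    print(merged, team)
    print(credit_clicks(team, clicked_positions=[0, 2]))   # which model won this session?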

    1. Search Quality at LinkedIn Abhimanyu Lad Senior Software Engineer Recruiting Solutions Satya Kanduri Senior Software Engineer
    2. verticals: people, jobs; intent: exploratory; tag: skill OR title; related skills: search, ranking, …; tag: company, id: 1337, industry: internet 2
    3. SEARCH USE CASES How do people use LinkedIn’s search? 3
    4. PEOPLE SEARCH Search for people by name 4
    5. PEOPLE SEARCH Search for people by other attributes 5
    6. EXPLORATORY PEOPLE SEARCH 6
    7. JOB SEARCH 7
    8. COMPANY SEARCH 8
    9. AND MUCH MORE… 9
    10. OUR GOAL  Universal Search – Single search box  High Recall – Spelling correction, synonym expansion, …  High Precision – Entity-oriented search: match things, not strings 10
    11. QUERY UNDERSTANDING PIPELINE 11
    12. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 12
    13. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 13
    14. SPELLING CORRECTION Fix obvious typos Help users spell names 14
    15. SPELLING OUT THE DETAILS Signals: character n-grams (marissa => ma ar ri is ss sa) and Metaphone codes (mark/marc => MRK), built over PEOPLE NAMES, COMPANIES, and TITLES, plus co-occurrence counts from PAST QUERIES (marissa:mayer = 1000; marisa meyer, yahoo marissa meyer, marisa yahoo mayer); a candidate-generation sketch follows the transcript. 15
    16. SPELLING OUT THE DETAILS PROBLEM: Corpus as well as query logs contain many spelling errors Certain spelling errors are quite frequent While genuine words (especially names) might be infrequent 16
    17. SPELLING OUT THE DETAILS PROBLEM: Corpus as well as query logs contain many spelling errors SOLUTION: Use query chains to infer correct spelling [product manger] [marissa mayer] [product manager] CLICK CLICK 17
    18. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 18
    19. QUERY TAGGING IDENTIFYING ENTITIES IN THE QUERY Tags: TITLE, CO, GEO; TITLE-237 (software engineer, software developer, programmer, …); CO-1441 (Google Inc., Industry: Internet); GEO-7583 (Country: US, Lat: 42.3482 N, Long: 75.1890 W) (RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL) 19
    20. QUERY TAGGING IDENTIFYING ENTITIES IN THE QUERY TITLE CO GEO MORE PRECISE MATCHING WITH DOCUMENTS 20
    21. ENTITY-BASED FILTERING BEFORE 21
    22. ENTITY-BASED FILTERING BEFORE AFTER 22
    23. ENTITY-BASED FILTERING BEFORE 23
    24. ENTITY-BASED FILTERING BEFORE AFTER 24
    25. ENTITY-BASED SUGGESTIONS 25
    26. ENTITY-BASED SUGGESTIONS 26
    27. QUERY TAGGING : SEQUENTIAL MODEL TRAINING EMISSION PROBABILITIES (Learned from user profiles) TRANSITION PROBABILITIES (Learned from query logs) 27
    28. QUERY TAGGING : SEQUENTIAL MODEL INFERENCE Given a query, find the most likely sequence of tags (an inference sketch follows the transcript) 28
    29. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 29
    30. VERTICAL INTENT PREDICTION JOBS PEOPLE COMPANIES (Probability distribution over verticals) 30
    31. VERTICAL INTENT PREDICTION : SIGNALS 1. Past query counts in each vertical + Query tags (TAG:COMPANY) [Company] (TAG:NAME) [Name Search] [Employees] [Jobs] 2. Personalization: User's search history (a small scoring sketch follows the transcript) 31
    32. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 32
    33. QUERY EXPANSION GOAL: Improve recall through synonym expansion 33
    34. QUERY EXPANSION : NAME SYNONYMS 34
    35. QUERY EXPANSION : JOB TITLE SYNONYMS 35
    36. QUERY EXPANSION : SIGNALS Trained using query chains: [jon] [jonathan] CLICK [programmer] [developer] CLICK [software engineer] [software developer] CLICK Symmetric but not transitive! Context based! [francis] ⇔ [frank] [franklin] ⇔ [frank] [software engineer] => [software developer] [civil engineer] ≠ [civil developer] [francis] ≠ [franklin] (a query-chain mining sketch follows the transcript) 36
    37. QUERY UNDERSTANDING PIPELINE Raw query Spellcheck Query Tagging Vertical Intent Prediction Query Expansion Structured query + Annotations 37
    38. QUERY UNDERSTANDING: SUMMARY  High degree of structure in queries as well as corpus (user profiles, job postings, companies, …)  Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search  Query tagging and query log analysis play a big role in query understanding 38
    39. ranking 39
    40. WHAT’S IN A NAME QUERY?
    41. BUT NAMES CAN BE AMBIGUOUS kevin scott ≠
    42. SEARCHING FOR A COMPANY’S EMPLOYEES
    43. SEARCHING FOR PEOPLE WITH A SKILL
    44. RANKING IS COMPLICATED  Seemingly similar queries require dissimilar scoring functions  Personalization matters – Multiple dimensions to personalize on – Dimensions vary with query class
    45. TRAINING Documents for training Features Machine learning model Human evaluation Labels
    46. TRAINING Documents for training Features Machine learning model Human evaluation Labels
    47. ASSESSING RELEVANCE
    48. RELEVANCE DEPENDS ON WHO’S SEARCHING What if the searcher is a job seeker? Or a recruiter? Or…
    49. THE QUERY IS NOT ENOUGH
    50. WE NEED USER FEATURES  Non-personalized relevance model: score = f(Document | Query)  Personalized relevance model: score = f(Document | Query, User)
    51. COLLECTING RELEVANCE JUDGMENTS WON’T SCALE
    52. TRAINING Documents for training Features Machine learning model Human evaluation Search logs Labels
    53. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant
    54. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant
    55. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant
    56. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Not-Clicked = Not Relevant User eye scan direction  Good results not seen are marked Not Relevant. Unfairly penalized?
    57. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Skipped = Not Relevant • Only penalize results that the user has seen but ignored
    58. CLICKS AS TRAINING DATA Approach: Clicked = Relevant, Skipped = Not Relevant • Only penalize results that the user has seen but ignored • Risks inverting model by overweighing low-ranked results
    59. FAIR PAIRS • Fair Pairs: • Randomize, Clicked= R, Skipped= NR [Radlinski and Joachims, AAAI’06]
    60. FAIR PAIRS • Fair Pairs: • Randomize, Clicked= R, Skipped= NR Flipped [Radlinski and Joachims, AAAI’06]
    61. FAIR PAIRS • Fair Pairs: • Randomize, Clicked = R, Skipped = NR • Great at dealing with position bias • Does not invert models Flipped [Radlinski and Joachims, AAAI'06] (a Fair Pairs sketch follows the transcript)
    62. EASY NEGATIVES • Assumption: A decent current model would push out bad results to the very end. • Easy Negatives: Some of the results at the end are picked up as negative examples
    63. EASY NEGATIVES 2 pages • 90+ pages Use strategies that sample across the feature space • Searches with fewer results are preferred • Always sample from a given page, say page 10 (a sampling sketch follows the transcript)
    64. PUTTING IT ALL TOGETHER  Human evaluation is not practical for personalized searches  Learn from user behavior – Multiple heuristics depending on the need – Different pros and cons
    65. EFFICIENCY VS EXPRESSIVENESS  Build tree with logistic regression leaves.  By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document. Decision nodes: x2?, x4?; leaf models: β0 + β1 T(x1) + ... + βn xn, α0 + α1 P(x1) + ... + αn Q(xn), γ0 + γ1 R(x1) + ... + γn Q(xn) 66
    66. SCORING New document -> Features -> Machine learning model -> score -> Ordered list (repeated for each new document)
    67. A SIMPLIFIED EXAMPLE Name Query? β0 + 0.85*(IndustryOverlap) + ... + βn xn Skill Query? α0 + 0*(IndustryOverlap) + ... + αn Q(xn) γ0 + γ1 R(x1) + ... + γn Q(xn) 68
    68. TEST, TEST, TEST Interleaving: Model 1 = (a, b, c, d, g, h), Model 2 = (b, e, a, f, g, h), Interleaved = (a, b, c, e, d, f) [Radlinski et al., CIKM 2008] 69
    69. SUMMARY  Query understanding leverages the rich structure of LinkedIn's content and information needs.  Query tagging and rewriting allow us to deliver precision and recall.  For ranking, personalization is both the biggest challenge and the core of our solution.  Segmenting relevance models by query type helps us efficiently address the diversity of search needs.
    70. Abhimanyu Lad alad@linkedin.com https://linkedin.com/in/abhilad Satya Kanduri skanduri@linkedin.com https://linkedin.com/in/skanduri 71
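
A toy version of the spelling signals on slide 15: character n-grams plus a phonetic key for fuzzy matching against known names, with past-query co-occurrence counts used to rank the candidates. The vocabulary, counts, and the simplified phonetic key (a crude stand-in for Metaphone) are all invented for the example.

    # Toy candidate generation for name spelling correction (cf. slide 15).
    # The vocabulary, counts, and the simplified phonetic key are made up;
    # a production system would use Metaphone and real query-log statistics.

    def char_ngrams(word, n=2):
        return {word[i:i + n] for i in range(len(word) - n + 1)}

    def phonetic_key(word):
        # Crude stand-in for Metaphone: keep the first letter, drop later vowels,
        # collapse doubled consonants. Assumes a non-empty lowercase word.
        out = [word[0]]
        for ch in word[1:]:
            if ch in "aeiou" or ch == out[-1]:
                continue
            out.append(ch)
        return "".join(out).upper()

    # Known spellings mined from profiles, weighted by co-occurrence counts from past queries.
    NAME_COUNTS = {"marissa": 1000, "marisa": 120, "mark": 800, "marc": 450}

    def suggest(query_term, max_suggestions=3):
        q_grams, q_key = char_ngrams(query_term), phonetic_key(query_term)
        scored = []
        for name, count in NAME_COUNTS.items():
            grams = char_ngrams(name)
            overlap = len(q_grams & grams) / max(len(q_grams | grams), 1)   # Jaccard on bigrams
            if phonetic_key(name) == q_key:
                overlap += 0.5                       # phonetic match bonus
            scored.append((overlap * count, name))   # weight by query popularity
        return [name for _, name in sorted(scored, reverse=True)[:max_suggestions]]

    print(suggest("merissa"))   # likely ['marissa', 'marisa', ...]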
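
Slides 27 and 28 describe query tagging as a sequential model: emission probabilities learned from user profiles, transition probabilities learned from query logs, and inference that finds the most likely tag sequence for a query. Below is a small Viterbi-style sketch over hypothetical probability tables; the tags and numbers are illustrative, not LinkedIn's actual model.

    import math

    TAGS = ["NAME", "TITLE", "COMPANY", "GEO", "SKILL"]

    # Hypothetical emission probabilities P(word | tag), as if learned from profiles,
    # and transition probabilities P(tag_i | tag_{i-1}), as if learned from query logs.
    EMISSION = {
        ("software", "TITLE"): 0.20, ("engineer", "TITLE"): 0.25,
        ("software", "SKILL"): 0.05, ("engineer", "SKILL"): 0.02,
        ("google", "COMPANY"): 0.30, ("google", "NAME"): 0.001,
    }
    TRANSITION = {
        ("<START>", "TITLE"): 0.4, ("<START>", "NAME"): 0.3, ("<START>", "COMPANY"): 0.2,
        ("TITLE", "TITLE"): 0.5, ("TITLE", "COMPANY"): 0.3, ("COMPANY", "TITLE"): 0.2,
    }
    SMOOTH = 1e-6   # floor for unseen (word, tag) and (tag, tag) pairs

    def viterbi(words):
        """Find the most likely tag sequence for the query words."""
        # best[i][tag] = (log-prob of the best path ending in `tag` at position i, previous tag)
        best = [{} for _ in words]
        for t in TAGS:
            p = TRANSITION.get(("<START>", t), SMOOTH) * EMISSION.get((words[0], t), SMOOTH)
            best[0][t] = (math.log(p), None)
        for i in range(1, len(words)):
            for t in TAGS:
                emit = math.log(EMISSION.get((words[i], t), SMOOTH))
                score, prev = max(
                    (best[i - 1][pt][0] + math.log(TRANSITION.get((pt, t), SMOOTH)) + emit, pt)
                    for pt in TAGS
                )
                best[i][t] = (score, prev)
        tag = max(TAGS, key=lambda t: best[-1][t][0])   # best final tag, then backtrack
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        return list(reversed(path))

    print(viterbi(["software", "engineer", "google"]))   # e.g. ['TITLE', 'TITLE', 'COMPANY']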
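
Slides 30 and 31 frame vertical intent prediction as producing a probability distribution over verticals from signals such as past result clicks per vertical for the query, the query tags, and the searcher's own history. A minimal sketch with invented counts, boosts, and combination logic.

    from collections import Counter

    VERTICALS = ["PEOPLE", "JOBS", "COMPANIES"]

    def vertical_intent(past_clicks_by_vertical, query_tags, user_history,
                        tag_boost=2.0, history_boost=1.5):
        """Combine signals into scores, then normalize to a distribution over verticals.

        past_clicks_by_vertical: Counter of historical clicks on this query per vertical
        query_tags: tags produced by the query tagger (e.g. {'COMPANY'})
        user_history: Counter of the searcher's own recent vertical choices
        The boosts and combination rule are made up for illustration.
        """
        scores = {}
        for v in VERTICALS:
            score = past_clicks_by_vertical.get(v, 0) + 1.0           # add-one smoothing
            if "COMPANY" in query_tags and v != "PEOPLE":
                score *= tag_boost            # a company tag hints at company or job intent
            if "NAME" in query_tags and v == "PEOPLE":
                score *= tag_boost            # a person name hints at people-search intent
            share = user_history.get(v, 0) / max(sum(user_history.values()), 1)
            score *= 1.0 + history_boost * share                      # personalization
            scores[v] = score
        total = sum(scores.values())
        return {v: s / total for v, s in scores.items()}

    print(vertical_intent(Counter({"PEOPLE": 50, "JOBS": 10}), {"COMPANY"},
                          Counter({"JOBS": 8, "PEOPLE": 2})))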
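
Slide 36 (and slide 17 for spelling) mine expansion and correction signals from query chains: when a user rewrites a term and the follow-up query gets a click, that reformulation is evidence for the rewrite. A small sketch of mining single-term rewrites from session logs; the session format and support threshold are assumptions. As the slide stresses, the resulting pairs are symmetric but not transitive, and they are applied in context rather than blindly.

    from collections import Counter

    def mine_reformulations(sessions, min_support=2):
        """Count (original_term, rewritten_term) pairs where the user reformulated
        exactly one term and then clicked a result.

        sessions: list of (query_before, query_after, clicked_after) tuples.
        Returns pairs seen at least `min_support` times.
        """
        pair_counts = Counter()
        for before, after, clicked in sessions:
            if not clicked:
                continue                     # no click: the rewrite is not evidence
            b, a = before.split(), after.split()
            if len(b) != len(a):
                continue
            diff = [(x, y) for x, y in zip(b, a) if x != y]
            if len(diff) == 1:               # exactly one term was rewritten
                pair_counts[diff[0]] += 1
        return {pair: c for pair, c in pair_counts.items() if c >= min_support}

    sessions = [
        ("product manger", "product manager", True),
        ("product manger", "product manager", True),
        ("jon smith", "jonathan smith", True),
        ("jon smith", "jonathan smith", True),
        ("civil engineer", "civil developer", False),   # no click: not evidence
    ]
    print(mine_reformulations(sessions))
    # {('manger', 'manager'): 2, ('jon', 'jonathan'): 2}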
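
Slides 59 to 61 use Fair Pairs (Radlinski and Joachims, AAAI 2006): adjacent results are paired up and each pair is randomly flipped before presentation, so a click on the lower member of a pair is position-bias-free evidence that it beats the result shown just above it. The sketch below illustrates the flipping and the label extraction; it simplifies the published algorithm, so treat it as illustrative.

    import random

    def fair_pairs_present(ranking, rng=None):
        """Randomize adjacent pairs before presentation (cf. Radlinski and Joachims 2006).

        Returns the presented ranking and the list of position pairs that were formed,
        so clicks can later be turned into within-pair preferences.
        """
        rng = rng or random.Random(0)
        presented = list(ranking)
        start = rng.choice([0, 1])          # pair up (0,1),(2,3),... or (1,2),(3,4),...
        pairs = []
        for i in range(start, len(presented) - 1, 2):
            if rng.random() < 0.5:
                presented[i], presented[i + 1] = presented[i + 1], presented[i]
            pairs.append((i, i + 1))
        return presented, pairs

    def preferences_from_clicks(presented, pairs, clicked_positions):
        """Within a randomized pair, a click on the lower result is unbiased evidence
        that it is preferred over the result shown just above it."""
        prefs = []
        clicked = set(clicked_positions)
        for top, bottom in pairs:
            if bottom in clicked and top not in clicked:
                prefs.append((presented[bottom], presented[top]))   # (preferred, not preferred)
        return prefs

    shown, pairs = fair_pairs_present(["d1", "d2", "d3", "d4", "d5"])
    print(shown, pairs)
    print(preferences_from_clicks(shown, pairs, clicked_positions=[1]))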
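
Slides 62 and 63 add easy negatives: assuming the current model already pushes bad results toward the tail, sample some tail results as negative examples so the model also sees clearly poor documents, preferring searches with fewer result pages and otherwise sampling from a fixed deep page. A small sketch; the page size, thresholds, and sampling rule are assumptions.

    import random

    PAGE_SIZE = 10

    def easy_negatives(results, max_pages_preferred=2, sample_page=10, k=2, rng=None):
        """Sample negative examples from the tail of the current ranking.

        results: ranked document ids from the current model.
        Prefer searches with few result pages (their tail is easier to trust);
        otherwise sample from a fixed deep page (e.g. page 10). Thresholds are
        illustrative only.
        """
        rng = rng or random.Random(0)
        n_pages = (len(results) + PAGE_SIZE - 1) // PAGE_SIZE
        if n_pages <= max_pages_preferred:
            tail = results[-PAGE_SIZE:]                     # short result set: use the last page
        elif n_pages >= sample_page:
            start = (sample_page - 1) * PAGE_SIZE
            tail = results[start:start + PAGE_SIZE]         # deep page of a long result set
        else:
            tail = results[-PAGE_SIZE:]
        return [(doc, 0) for doc in rng.sample(tail, min(k, len(tail)))]

    print(easy_negatives([f"d{i}" for i in range(200)]))    # e.g. two negatives from page 10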
