There’s a high degree of structure in our users’ queries as well as our corpus (i.e., user profiles, job postings, companies, etc.). Query understanding allows us to take advantage of this structure to do entity-oriented search and optimally balance recall and precision. Finally, our ability to understand and intelligently rewrite queries depends heavily on two things: query tagging (the ability to identify entities in the query) and query log analysis (analyzing how users reformulate their queries).
Thanks, Abhi. Today I’ll be talking about some of the ranking challenges we face here at LinkedIn. Throughout this talk I’ll focus on People Search, but the challenges apply to all the search problems we strive to solve. To get a sense of the ranking problem, let’s take a look at some examples.
One of the more frequent types of queries we see in people search is name queries. In this example, the query happens to be "richard branson". While there are other Richard Bransons on LinkedIn, the searcher was most likely looking for the founder of the Virgin Group. To get this search right, we only need two things: the name has to match the query terms, and the rank should be based on global popularity. That is pretty straightforward. Now, let’s take a look at another example.
This is also a name query, "kevin scott", but there are multiple Kevin Scotts on LinkedIn. Which of these two result sets is relevant? It’s hard to say. If I issued this search, I’d say the left result set is better: I work at LinkedIn and live in the San Francisco Bay Area, and the left results are mostly clustered around the Bay Area while the right ones are clustered around the Atlanta area. On the other hand, for someone who works at Home Depot and is located in the Atlanta area, the right result set is probably better. We could be looking for any Kevin Scott local to us, but considering the global prior we put the respective SVPs on top. This example shows us two dimensions of personalization: (1) company and (2) location. Are there more factors we could personalize on?
There is no query here; I chose to use facets in this case to select the results precisely. We saw in the previous slide that personalization involves more than one dimension. Let’s look at another example to see whether there are more. Say I am looking for someone working at NetApp. Apart from the location personalization that you can see in the second result, there are two other important dimensions: network distance in the first result and industry overlap in the third. So the question now is: by personalizing on all these dimensions (company, location, network distance, industry, etc.) for every query, can we obtain the best set of results? Let’s find out with another example.
One of the unique value propositions of LinkedIn is the ability to search for people possessing a skill, for example "ballet". Most of these results are from the performing arts industry. As you can probably guess, I do not work in performing arts and I possess no skills related to it, so personalization based on industry is not applicable here. However, if you look carefully, the results are still personalized based on my network distance. The lesson is that not all features are useful for every query class: for skill searches, industry overlap did not turn out to be a significant feature, whereas for name searches it is one of the most significant.
To recap what we saw in the examples,
Considering all these factors we have to take into account, how do we train a personalized machine learning model?
Of course I have severely simplified the process, but this gives those of you who aren’t familiar with machine learning an idea of how machine-learned models are trained for ranking.
Most of the work revolves around sampling documents, engineering features, and obtaining truth data. The most important part is the data — the "unreasonable effectiveness of data" — and we obviously cannot train a separate model for each of our 270+ million members. In the next few slides, we’ll explore different ways in which we can get labels.
Let’s say a recruiter is looking for someone with the skill "oracle database". Is this still the right result?
A conventional, non-personalized model is a function of the document and the query. In LinkedIn’s case we have an additional user dimension due to personalization: our score is a function of document, query, and user. We are always personalized.
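A minimal sketch of this idea, with invented feature names and weights: the personalized score combines conventional (query, document) features with (user, document) personalization features, so the same query can rank documents differently for different users.

```python
# Hypothetical sketch: score(document, query, user) instead of score(document, query).
# All feature names and weights below are illustrative, not LinkedIn's actual model.

def score(query_doc_features, user_doc_features, weights):
    """Linear score over both conventional and personalization features."""
    total = 0.0
    for name, value in {**query_doc_features, **user_doc_features}.items():
        total += weights.get(name, 0.0) * value
    return total

weights = {
    "name_match": 2.0,        # conventional (query, doc) features
    "global_popularity": 1.0,
    "same_company": 1.5,      # personalization (user, doc) features
    "geo_overlap": 1.0,
    "network_distance": -0.5, # farther connections score lower
}

s = score({"name_match": 1.0, "global_popularity": 0.8},
          {"same_company": 1.0, "geo_overlap": 1.0, "network_distance": 1.0},
          weights)
```

With a different user (different personalization features), the same document gets a different score — which is exactly why third-party relevance labels are hard to obtain for us.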
We cannot use human labels: the right ranking depends on who is searching, so a third-party judge cannot rate it.
Lower-ranked results should not simply be labeled negative — that would throw our own ranking function under the bus. There might be a good reason they are ranked lower, but some of them might still be good results.
All the results the user didn’t evaluate may be as good as the ones ranked above the clicked result; if the original model was pretty good, that lends a lot of credit to the unseen results.
Sampling bias: the data is concentrated in the top results, so the model does not learn how to differentiate really poor results.
Why is this okay, given the unrepresentative sample?
The best models for learning to rank are generally complex, like ensembles of trees. These models are expensive to evaluate, especially in first-pass rankers, which often need to score hundreds of thousands of results for every query. The approach we use is to first train complex models, then use insights from those to train simpler models — trading expressiveness and complexity for efficiency.
We potentially score hundreds of thousands of documents per query, so there is a trade-off between expressiveness and evaluation cost.
The decision nodes can also be based on user segments, such as whether the user is a recruiter or a regular member. Industry overlap is not required in skill queries; as can be seen, we can avoid computing IndustryOverlap for a skill query.
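A toy sketch of that feature-gating idea, with an invented feature registry and query classes: features that are not predictive for a query class are simply never computed for it.

```python
# Illustrative sketch: skip computing features that are not useful for a given
# query class (e.g. industry overlap for skill queries). The registry, query
# classes, and extractors below are invented for illustration.

FEATURES_BY_QUERY_CLASS = {
    "name":  ["name_match", "global_popularity", "industry_overlap"],
    "skill": ["skill_match", "network_distance"],  # no industry_overlap here
}

def compute_features(query_class, extractors, query, doc):
    """Only evaluate the extractors relevant to this query class."""
    return {name: extractors[name](query, doc)
            for name in FEATURES_BY_QUERY_CLASS[query_class]}

extractors = {
    "name_match": lambda q, d: 1.0 if q in d["name"].lower() else 0.0,
    "global_popularity": lambda q, d: d["popularity"],
    "industry_overlap": lambda q, d: d["industry_overlap"],
    "skill_match": lambda q, d: 1.0 if q in d["skills"] else 0.0,
    "network_distance": lambda q, d: d["network_distance"],
}

doc = {"name": "Jane Doe", "popularity": 0.7, "industry_overlap": 1.0,
       "skills": {"ballet"}, "network_distance": 2}
feats = compute_features("skill", extractors, "ballet", doc)
# industry_overlap is never evaluated for a skill query
```

At first-pass-ranker scale, pruning even one feature per query class adds up.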
We test offline as much as possible, but online evaluation is the litmus test. Conventional ways to measure are CTR, MRR, precision/recall, etc. We also use interleaving: a side-by-side comparison of two result sets.
SPELLING OUT THE DETAILS
• N-grams: marissa => ma ar ri is ss sa
• Metaphone (people names, companies, titles): mark/marc => MRK
• Co-occurrence counts (past queries): marissa:mayer = 1000
• Example query variants: marisa meyer yahoo, marissa meyer, marisa yahoo mayer
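A minimal sketch of the character-bigram expansion shown above ("marissa" => ma ar ri is ss sa). A real speller would combine this fuzzy-match signal with phonetic keys (e.g. Metaphone) and query-log co-occurrence counts; the Dice-coefficient similarity below is just one common way to use the bigrams.

```python
# Character bigrams as a fuzzy-match signal for misspelled names.

def char_bigrams(word):
    """ "marissa" -> ["ma", "ar", "ri", "is", "ss", "sa"] """
    return [word[i:i + 2] for i in range(len(word) - 1)]

def bigram_similarity(a, b):
    """Dice coefficient over character-bigram sets (1.0 = identical sets)."""
    ba, bb = set(char_bigrams(a)), set(char_bigrams(b))
    return 2 * len(ba & bb) / (len(ba) + len(bb))

sim = bigram_similarity("marissa", "marisa")  # high despite the dropped 's'
```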
SPELLING OUT THE DETAILS
PROBLEM: The corpus as well as the query logs contain many spelling errors. Certain spelling errors are quite frequent, while genuine words (especially names) may be infrequent.
SPELLING OUT THE DETAILS
PROBLEM: The corpus as well as the query logs contain many spelling errors.
SOLUTION: Use query chains to infer the correct spelling:
[product manger] → [product manager] → CLICK
[marissa mayer] → CLICK
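A hedged sketch of the query-chain idea: if users who type one query frequently reformulate to another and then click, the reformulation is a likely correction. The threshold and data shape below are assumptions for illustration.

```python
# Infer spelling corrections from query chains in the logs.
from collections import Counter, defaultdict

def build_corrections(query_chains, min_support=2):
    """query_chains: (typed_query, reformulation_that_got_a_click) pairs.
    Returns typo -> most frequent clicked reformulation, if seen often enough."""
    by_typo = defaultdict(Counter)
    for typo, fixed in query_chains:
        by_typo[typo][fixed] += 1
    corrections = {}
    for typo, candidates in by_typo.items():
        fixed, count = candidates.most_common(1)[0]
        if count >= min_support:          # ignore one-off reformulations
            corrections[typo] = fixed
    return corrections

chains = ([("product manger", "product manager")] * 3
          + [("marisa meyer", "marissa mayer")] * 2
          + [("product manger", "product manger jobs")])
corr = build_corrections(chains)
```

Because the evidence comes from real reformulations, this handles names the dictionary has never seen — the key weakness of corpus-based spellers.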
QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY
• TITLE-237: software engineer, software developer, programmer, …
• CO-1441: Google Inc. (Industry: Internet)
• GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W
(Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
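A toy sketch of dictionary-based tagging with greedy longest-match; production taggers typically use sequence models plus entity dictionaries. The dictionary entries reuse the IDs from the slide but are otherwise invented.

```python
# Greedy longest-match query tagger over an entity dictionary.

ENTITY_DICT = {
    "software engineer": ("TITLE", "TITLE-237"),
    "software developer": ("TITLE", "TITLE-237"),
    "google": ("CO", "CO-1441"),
}

def tag_query(query, entity_dict, max_len=3):
    """Return (phrase, tag, entity_id) triples; untagged tokens get tag 'O'."""
    tokens = query.lower().split()
    tags, i = [], 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + span])
            if phrase in entity_dict:       # prefer the longest match
                tags.append((phrase,) + entity_dict[phrase])
                i += span
                break
        else:
            tags.append((tokens[i], "O", None))
            i += 1
    return tags

tagged = tag_query("software engineer google", ENTITY_DICT)
```

Resolving phrases to entity IDs (rather than raw terms) is what enables the entity-oriented retrieval described in the summary.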
QUERY UNDERSTANDING: SUMMARY
• High degree of structure in queries as well as the corpus (user profiles, job postings, companies, …)
• Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search
• Query tagging and query log analysis play a big role in query understanding
FAIR PAIRS [Radlinski and Joachims, AAAI’06]
• Randomize ("flip") adjacent result pairs; Clicked = Relevant, Skipped = Not Relevant
• Great at dealing with position bias
• Does not invert models
EASY NEGATIVES
• Assumption: a decent current model will push bad results to the very end
• Easy negatives: some of the results at the end are picked up as negative examples
EASY NEGATIVES
• Result lists vary widely in depth, from 2 pages to 90+ pages
• Use strategies that sample across the feature space
• Searches with fewer results are preferred
• Always sample from a given page, say page 10
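The sampling rule above can be sketched as follows; the page size, page number, and sample count are illustrative assumptions.

```python
# Illustrative "easy negatives" sampler: assuming the current ranker pushes bad
# results toward the tail, draw negatives from a fixed deep page (say page 10).
import random

PAGE_SIZE = 10  # assumed results per page

def easy_negatives(ranked_results, page=10, k=2, rng=random.Random(42)):
    """Sample k negative examples from the given page of a ranked list.
    Skips searches that are too shallow to reach that page."""
    start = (page - 1) * PAGE_SIZE
    candidates = ranked_results[start:start + PAGE_SIZE]
    if len(candidates) < k:
        return []
    return rng.sample(candidates, k)

negatives = easy_negatives(list(range(200)))  # doc ids 0..199, ranked
```

Always sampling from the same page keeps the negatives comparable across queries, even though result-list depth varies wildly.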
PUTTING IT ALL TOGETHER
• Human evaluation is not practical for personalized searches
• Learn from user behavior
  – Multiple heuristics depending on the need
  – Different pros and cons
EFFICIENCY VS EXPRESSIVENESS
Build a tree with logistic regression leaves. By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document, e.g.:

x2 = ? → β0 + β1·T(x1) + … + βn·xn
       → α0 + α1·P(x1) + … + αn·Q(xn)
x4 = ? → γ0 + γ1·R(x1) + … + γn·Q(xn)
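A small sketch of why this is cheap, under my reading of the slide: because the internal nodes test only (query, user) segment features, the leaf is selected once per query, and every document in the result set is scored by a single logistic model. Segments and weights are invented.

```python
# Shallow decision "tree" keyed on (query class, user type) with logistic leaves.
import math

LEAVES = {
    ("name",  "recruiter"): {"bias": 0.1, "name_match": 2.0, "popularity": 0.5},
    ("name",  "member"):    {"bias": 0.0, "name_match": 1.5, "popularity": 1.0},
    ("skill", "recruiter"): {"bias": 0.2, "skill_match": 2.5},
    ("skill", "member"):    {"bias": 0.0, "skill_match": 2.0},
}

def pick_leaf(query_class, user_type):
    """Decision nodes depend only on the (query, user) segment, so the
    leaf model is chosen once per query — not once per document."""
    return LEAVES[(query_class, user_type)]

def leaf_score(weights, doc_features):
    """Logistic regression over the leaf's own (small) feature set."""
    z = weights["bias"] + sum(w * doc_features.get(f, 0.0)
                              for f, w in weights.items() if f != "bias")
    return 1.0 / (1.0 + math.exp(-z))

model = pick_leaf("skill", "member")
s = leaf_score(model, {"skill_match": 1.0})  # sigmoid(2.0)
```

Note that the skill-query leaves carry no industry-overlap weight at all, so that feature is never computed for skill queries.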
SCORING
New document → Features → Machine learning model → Score → Ordered list
(Each new document is converted into features, scored by the learned model, and the scored documents are assembled into an ordered result list.)
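The pipeline on this slide can be sketched end to end as follows; the feature extractor and weights are placeholders, not the production model.

```python
# Minimal scoring pipeline: document -> features -> model score -> ordered list.

def extract_features(doc):
    """Placeholder featurizer for a document."""
    return {"popularity": doc["popularity"], "name_match": doc["name_match"]}

def model_score(features, weights={"popularity": 1.0, "name_match": 2.0}):
    """Placeholder linear model standing in for the learned ranker."""
    return sum(weights[f] * v for f, v in features.items())

def rank(docs):
    """Score every document and return them ordered best-first."""
    return sorted(docs, key=lambda d: model_score(extract_features(d)),
                  reverse=True)

docs = [{"id": "a", "popularity": 0.9, "name_match": 0.0},
        {"id": "b", "popularity": 0.2, "name_match": 1.0}]
ordered = rank(docs)
```

In a first-pass ranker this loop runs over very large candidate sets, which is why the cheap segment-specific leaf models above matter.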
SUMMARY
Query understanding leverages the rich structure of LinkedIn’s content and information needs. Query tagging and rewriting allow us to deliver both precision and recall. For ranking, personalization is both the biggest challenge and the core of our solution. Segmenting relevance models by query type helps us efficiently address the diversity of search needs.