Learn to Rank search results

The presentation details some machine learning techniques used to learn to rank search results.


  1. Learning-to-Rank Search Results
      Ganesh Venkataraman (http://www.linkedin.com/in/npcomplete, @gvenkataraman)
  2. Audience Classification
      • Search background
      • ML background
      • Search + ML background
      [Bar chart of audience poll results: Search+ML, ML, Search]
  3. Outline
      • Search overview
      • Why Learning to Rank (LTR)?
      • Biases when collecting training data from click logs
        – Sampling bias
        – Presentation bias
      • Three basic approaches
        – Pointwise
        – Pairwise
        – Listwise
      • Key takeaways/summary
  4. tl;dr
      • Ranking interacts heavily with retrieval and query understanding
      • Ground truth > features > model*
      • Listwise > pairwise > pointwise
      * Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
  5. Primer on Search
  6. Bird's-eye view of how a search engine works: the user turns an information need into a query, the system ranks documents using an IR model, and the user selects from the results.
  7. Pre-retrieval / Retrieval / Post-retrieval
      • Pre-retrieval
        – Process the input query, rewrite it, check spelling, etc.
        – Hit (potentially several) search nodes with the appropriate query
      • Retrieval
        – Given a query, retrieve all documents matching the query along with a score
      • Post-retrieval
        – Merge-sort results from different search nodes (see the sketch below)
        – Add relevant information to the search results used by the front end
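A minimal sketch of the post-retrieval merge step, assuming each node returns its hits already sorted by descending score. The node responses and the (doc_id, score) tuples are made up for illustration:

```python
import heapq

# Hypothetical per-node results, each already sorted by descending score.
node_results = [
    [("doc7", 9.1), ("doc23", 7.4), ("doc101", 5.0)],   # node A
    [("doc151", 8.2), ("doc12", 6.3)],                  # node B
]

def merge_node_results(node_results, k=10):
    """Merge per-node ranked lists into a single top-k ranked list."""
    merged = heapq.merge(*node_results, key=lambda hit: hit[1], reverse=True)
    return list(merged)[:k]

print(merge_node_results(node_results))
# [('doc7', 9.1), ('doc151', 8.2), ('doc23', 7.4), ('doc12', 6.3), ('doc101', 5.0)]
```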
  9. Claim #1: Search is about understanding the query/user intent
  10. Understanding intent
      Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL
      Example annotations:
        – TITLE-237: software engineer, software developer, programmer, …
        – CO-1441: Google Inc. (Industry: Internet)
        – GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W
  11. Fixing user errors: correct typos, help users spell names
  12. Claim #2: Search is about understanding systems
  13. The Search Index
      • Inverted index: mapping from (search) terms to the list of documents they appear in
      • Forward index: mapping from documents to metadata about them
  14. Posting List
      Documents: D0 = "it is what it is", D1 = "what is it", D2 = "it is a banana"
      Posting lists (DocId: frequency):
        – a → D2:1
        – banana → D2:1
        – is → D0:2, D1:1, D2:1
        – it → D0:2, D1:1, D2:1
        – what → D0:1, D1:1
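A minimal sketch (plain Python, no real tokenization or compression) of how the posting lists on slide 14 could be built:

```python
from collections import defaultdict, Counter

# The toy documents from slide 14.
docs = {
    "D0": "it is what it is",
    "D1": "what is it",
    "D2": "it is a banana",
}

# term -> {doc_id: term frequency}
inverted_index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, freq in Counter(text.split()).items():
        inverted_index[term][doc_id] = freq

print(inverted_index["is"])    # {'D0': 2, 'D1': 1, 'D2': 1}
print(inverted_index["what"])  # {'D0': 1, 'D1': 1}
```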
  15. Candidate selection for "abraham lincoln"
      • Posting lists
        – "abraham" => {5, 7, 8, 23, 47, 101}
        – "lincoln" => {7, 23, 101, 151}
      • Query = "abraham AND lincoln"
        – Retrieved set => {7, 23, 101}
      • Some systems-level issues
        – How do we represent posting lists efficiently?
        – How do we traverse a very long posting list (for words like "the", "an", etc.)?
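A sketch of the AND retrieval on slide 15 as a two-pointer intersection of sorted posting lists (production systems typically add skip pointers or galloping search for very long lists):

```python
def intersect(p1, p2):
    """Intersect two posting lists of sorted doc IDs via a two-pointer merge."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

abraham = [5, 7, 8, 23, 47, 101]
lincoln = [7, 23, 101, 151]
print(intersect(abraham, lincoln))  # [7, 23, 101]
```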
  16. Claim #3: Search is a ranking problem
  17. What is search ranking?
      • Ranking: find an ordered list of documents according to the relevance between documents and the query
      • Traditional search: f(query, document) => score
      • Social network context: f(query, document, user) => score
        – Find an ordered list of documents according to the relevance between documents, query, and user
  18. Why LTR?
      • Manual models become hard to tune with a very large number of features and non-convex interactions
      • LTR leverages large volumes of click-through data in an automated way
      • Crowdsourcing judgments for personalized ranking poses unique challenges
      • Key issues
        – How do we collect training data?
        – How do we avoid biases?
        – How do we train the model?
  19. Training (diagram, slides 19-20): documents for training -> features; human evaluation -> labels; features + labels -> machine learning model
  21. Training options: crowdsourcing judgments
      • (query, user, document) -> label
      • Labels in {1, 2, 3, 4, 5}; higher label => better
      • Issues
        – Personalized world
        – Difficult to scale
  22. Mining the click stream
      • Approach: clicked = relevant, not clicked = not relevant
      [Diagram: user eye-scan direction down the result page; are lower results unfairly penalized?]
  23. Position Bias
      • "Accurately interpreting clickthrough data as implicit feedback", Joachims et al., ACM SIGIR, 2005
      • Experiment #1
        – Present users with normal Google search results
        – 55.56% of users clicked the first result
        – 5.56% clicked the second result
      • Experiment #2
        – Same result page as the first experiment, but the 1st and 2nd results were flipped
        – 57.14% of users clicked the first result
        – 7.14% clicked the second result
  24. Fair Pairs [Radlinski and Joachims, AAAI'06] (slides 24-26)
      • Randomize the order of adjacent result pairs; within a pair, clicked = relevant, skipped = not relevant
      • Great at dealing with position bias
      • Does not invert models
      A sketch of the pair randomization follows below.
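A simplified sketch of the Fair Pairs idea, not the authors' exact algorithm: adjacent pairs are randomly flipped before presentation, and a click on the lower member of a pair while the upper member is skipped yields a preference that is not confounded by position.

```python
import random

def fair_pairs_presentation(ranked_docs, rng=None):
    """Group the ranking into adjacent pairs and flip each pair with probability 0.5."""
    rng = rng or random.Random()
    presented = []
    for i in range(0, len(ranked_docs) - 1, 2):
        pair = [ranked_docs[i], ranked_docs[i + 1]]
        if rng.random() < 0.5:
            pair.reverse()                 # flipped pair
        presented.extend(pair)
    if len(ranked_docs) % 2:
        presented.append(ranked_docs[-1])  # odd leftover document
    return presented

def preferences_from_clicks(presented, clicks):
    """Within each presented pair: clicked lower doc > skipped upper doc."""
    prefs = []
    for i in range(0, len(presented) - 1, 2):
        top, bottom = presented[i], presented[i + 1]
        if bottom in clicks and top not in clicks:
            prefs.append((bottom, top))    # bottom preferred over skipped top
    return prefs

shown = fair_pairs_presentation(["d1", "d2", "d3", "d4"], random.Random(0))
print(preferences_from_clicks(shown, clicks={shown[1]}))
```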
  27. Issue #2: Sampling Bias
      • Users click or skip only what is shown
      • What about low-scoring results from the existing model?
      • Add low-scoring results as "easy negatives" (label 0) so the model also learns about bad results never presented to the user
      [Diagram: result pages 1 through n, with unseen tail results labeled 0]
  29. Avoiding Sampling Bias: Easy Negatives
      • Invasive way
        – For a small sample of users, add bad results to the SERP to verify that they are indeed bad
        – Not really recommended, since it hurts the user experience
      • Non-invasive way
        – Assume we have a decent model
        – Take tail results and add them to the training data as "easy negatives"
        – A similar approach can be used for "easy positives", depending on the application
  30. How to collect training data
      • Implicit relevance judgments from click logs, including clicked and unclicked results from the SERP (avoids position bias)
      • Add easy negatives (avoids sampling bias)
      A sketch of assembling such a training set follows below.
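A hedged sketch of turning click logs plus sampled tail results into labeled rows. The field names and the particular label values are assumptions for illustration, not the presenter's exact setup:

```python
from dataclasses import dataclass

@dataclass
class TrainingRow:
    query: str
    member_id: int
    doc_id: str
    label: int  # 0 = easy negative ... higher = more relevant (assumed scheme)

def build_training_rows(serp_impressions, clicked_docs, tail_docs):
    """serp_impressions / tail_docs: iterables of (query, member_id, doc_id)."""
    rows = []
    for query, member_id, doc_id in serp_impressions:
        # Clicked results get a higher label than shown-but-skipped ones.
        label = 2 if doc_id in clicked_docs else 1
        rows.append(TrainingRow(query, member_id, doc_id, label))
    for query, member_id, doc_id in tail_docs:
        # Never shown to the user: add as easy negatives.
        rows.append(TrainingRow(query, member_id, doc_id, 0))
    return rows
```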
  31. Mining the click stream: graded relevance labels
      [Diagram: results assigned graded labels, e.g. label = 5 (most relevant), label = 2, label = 0 (least relevant)]
  32. Learning to Rank: Pointwise (slides 32-34)
      • Reduce ranking to binary classification
      [Diagram: per-query documents marked + (relevant) / - (not relevant), e.g. Q1: + + + -, Q2: + - - -, Q3: + + - -]
      • Limitations
        – Assumes relevance is absolute
        – Relevant documents associated with different queries are put into the same class
      A pointwise sketch follows below.
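A pointwise sketch that treats each (query, document) pair as an independent binary classification example. It assumes scikit-learn is available; the feature vectors and labels are toy values, not real features from the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature vectors for (query, doc) pairs; y: 1 = relevant, 0 = not relevant.
X = np.array([[0.9, 1.0], [0.7, 0.0], [0.2, 1.0], [0.1, 0.0]])
y = np.array([1, 1, 0, 0])

model = LogisticRegression().fit(X, y)

# At serving time, score each candidate document and sort by score.
candidates = np.array([[0.8, 1.0], [0.3, 0.0]])
scores = model.predict_proba(candidates)[:, 1]
ranking = np.argsort(-scores)
print(ranking, scores)
```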
  35. Learning to Rank: Pairwise (slides 35-36)
      • Reduce ranking to classification of document pairs with respect to the same query
        – {(Q1, A > B), (Q2, C > D), (Q3, E > F)}
  37. Learning to Rank: Pairwise (continued)
      • No longer assumes absolute relevance
      • Limitation: does not differentiate between inversions at top vs. bottom positions
      A pairwise sketch follows below.
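A pairwise sketch that trains on feature differences between the preferred and the non-preferred document of each same-query pair, a RankNet-style simplification rather than the presenter's production model. The data is toy and scikit-learn is assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Preference pairs per query: (features of preferred doc, features of the other doc).
pairs = [
    (np.array([0.9, 1.0]), np.array([0.2, 1.0])),   # Q1: A > B
    (np.array([0.7, 0.0]), np.array([0.1, 0.0])),   # Q2: C > D
]

# Each pair yields two examples: (x_pref - x_other, 1) and (x_other - x_pref, 0).
X = np.array([a - b for a, b in pairs] + [b - a for a, b in pairs])
y = np.array([1] * len(pairs) + [0] * len(pairs))

pair_model = LogisticRegression().fit(X, y)

def score(x):
    """Linear scoring function f(x) = w . x; sorting by it respects the training preferences."""
    return float(pair_model.coef_[0] @ x)

print(score(np.array([0.9, 1.0])), score(np.array([0.2, 1.0])))
```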
  38. Listwise approach: DCG
      • Objective: come up with a function that converts an entire ranked result set, each result with a relevance label, into a single score
      • Characteristics of such a function
        – Higher relevance in the ranked set => higher score
        – Higher relevance at higher positions => higher score
      • With p documents in the search results and relevance rel_i for document i:
        DCG_p = sum over i = 1..p of (2^rel_i - 1) / log(i + 1)
  39. DCG example, discounted gain = (2^relevance - 1) / log(1 + rank):
      – Rank 1: 3.0
      – Rank 2: 4.4
      – Rank 3: 0.5
      – DCG = 7.9
  40. NDCG-based optimization
      • NDCG@k = normalized DCG@k (DCG@k divided by the DCG@k of the ideal ordering)
      • Ensures the value is between 0.0 and 1.0
      • Since NDCG directly represents the "value" of a particular ranking given the relevance labels, one can formulate ranking as maximizing NDCG@k (say k = 5)
      • Directly pluggable into a variety of algorithms, including coordinate ascent
      A DCG/NDCG sketch follows below.
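A small sketch of DCG@k and NDCG@k. It uses log base 2, which is the common convention and what the slide-39 numbers imply; the relevance labels below were chosen so the DCG reproduces that 7.9 example:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with gain (2^rel - 1) and log2 position discount."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)   # i is 0-based, so rank = i + 1
        for i, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

labels = [2, 3, 1]  # graded relevances assumed to match the slide-39 example
print(round(dcg_at_k(labels, 3), 1))   # 7.9
print(round(ndcg_at_k(labels, 3), 2))  # < 1.0, since the ideal order is [3, 2, 1]
```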
  41. Learning to Rank: comparison
      • Pointwise
        ✓ Simple to understand and debug
        ✓ Straightforward to use
        ✕ Query independent
        ✕ Assumes relevance is absolute
      • Pairwise
        ✓ Assumes relevance is relative
        ✓ Depends on the query
        ✕ Loss function agnostic to position
      • Listwise
        ✓ Directly operates on ranked lists
        ✓ Loss function aware of position
        ✕ More complicated, non-convex loss functions, higher training time
  42. Search ranking pipeline: click logs -> training data -> model -> offline evaluation -> online A/B test/debug; score = f(query, user, document)
  43. tl;dr revisited
      • Ranking interacts heavily with retrieval and query understanding
        – Query understanding affects intent detection, fixing user errors, etc.
        – Retrieval affects candidate selection, speed, etc.
      • Ground truth > features > model*
        – Truth data is affected by biases
      • Listwise > pairwise > pointwise
        – Listwise, while more complicated, avoids some model-level issues of the pairwise and pointwise methods
      * Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
  44. Useful references
      • "From RankNet to LambdaRank to LambdaMART: An Overview", Christopher Burges
      • "Learning to Rank for Information Retrieval", Tie-Yan Liu
      • RankLib: implementations of several LTR approaches
  45. LinkedIn search is powered by …
      We are hiring! careers.linkedin.com
