Adapting Rankers Online, Maarten de Rijke
Adapting Rankers Online, Maarten de Rijke Presentation Transcript

  • 1. Adapting Rankers Online. Maarten de Rijke
  • 2. Joint work with Katja Hofmann and Shimon Whiteson.
  • 3. Growing complexity of search engines. Current methods for optimizing search engines mostly work offline.
  • 4. Online learning to rank: no distinction between training and operating. The search engine observes users’ natural interactions with the search interface, infers information from them, and improves its ranking function automatically. Expensive data collection is not required; the collected data matches the target users and the target setting.
  • 5. Users’ natural interactions with the search interface. Classification of observable behavior (Oard and Kim, 2001; Kelly and Teevan, 2004). The behavior category refers to the purpose of the observed behavior; the minimum scope refers to the smallest possible scope of the item being acted upon.

        Behavior category   Segment (minimum scope)             Object                                    Class
        Examine             View, Listen, Scroll, Find, Query   Select
        Retain              Print                               Bookmark, Save, Delete, Purchase, Email   Subscribe
        Reference           Copy-and-paste, Quote               Forward, Reply, Link, Cite
        Annotate            Mark up                             Rate, Publish                             Organize
        Create              Type, Edit                          Author
  • 6. Users’ interactions. Relevance feedback: the history goes back close to forty years; typically used for query expansion and user profiling. Explicit feedback: users explicitly give feedback through keywords, selecting or marking documents, or answering questions; natural explicit feedback can be difficult to obtain; “unnatural” explicit feedback is obtained through TREC assessors and crowdsourcing.
  • 7. Users’ interactions (2). Implicit feedback for learning, query expansion and user profiling: observe users’ natural interactions with the system (reading time, saving, printing, bookmarking, selecting, clicking, …). Thought to be less accurate than explicit measures, but available in very large quantities at no cost.
  • 8. Learning to rank online. Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running. Algorithms need to explore new solutions to obtain feedback for effective learning, and exploit what has been learned to produce results acceptable to users. Interleaved comparison methods can use implicit feedback to detect small differences between rankers and can be used to learn ranking functions online.
  • 9. Agenda: balancing exploration and exploitation; inferring preferences from clicks.
  • 10. Balancing Exploitation and Exploration (recent work). K. Hofmann et al. (2011), Balancing exploration and exploitation. In: ECIR ’11.
  • 11. Challenges: generalize over queries and documents; learn from implicit feedback that is noisy, relative, and rank-biased; keep users happy while learning.
  • 12. Learning document pair-wise preferences. Insight: infer preferences from clicks. Joachims, T. (2002). Optimizing search engines using clickthrough data. In: KDD ’02, pages 133-142.
  • 13. Learning document pair-wise preferences. Input: feature vectors constructed from document pairs, (x(q, d_i), x(q, d_j)) ∈ R^n × R^n. Output: y ∈ {−1, +1}, the correct / incorrect order. Learning method: supervised learning, e.g., an SVM. Joachims, T. (2002). Optimizing search engines using clickthrough data. In: KDD ’02, pages 133-142.
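To make the pair-wise setup concrete, here is a minimal sketch (not the authors' code) of how click data could be turned into pairwise training examples using the common "clicked beats skipped-above" heuristic and fed to a linear SVM on feature-vector differences; the helper names and the scikit-learn dependency are illustrative assumptions.

```python
# Minimal sketch: construct pairwise examples from clicks and fit a linear model on
# feature-vector differences, in the spirit of Joachims' ranking SVM. Helper names
# and the use of scikit-learn are illustrative assumptions, not the paper's code.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_examples(feature_vectors, clicked_ranks):
    """feature_vectors: x(q, d) for each result in ranked order; clicked_ranks: clicked positions."""
    X, y = [], []
    for i in clicked_ranks:
        for j in range(i):                   # documents ranked above a clicked one ...
            if j not in clicked_ranks:       # ... but skipped are assumed less relevant
                diff = feature_vectors[i] - feature_vectors[j]
                X.append(diff);  y.append(+1)   # clicked document should be ranked higher
                X.append(-diff); y.append(-1)   # mirrored example keeps the classes balanced
    return np.array(X), np.array(y)

# Toy usage: four results with three features each, clicks at ranks 1 and 3 (0-based).
rng = np.random.default_rng(0)
docs = [rng.normal(size=3) for _ in range(4)]
X, y = pairwise_examples(docs, clicked_ranks={1, 3})
w = LinearSVC().fit(X, y).coef_[0]           # weight vector of the learned linear ranker
scores = [float(w @ d) for d in docs]        # rank documents by w . x(q, d)
```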
  • 14. Challenges: generalize over queries and documents; learn from implicit feedback that is noisy, relative, and rank-biased; keep users happy while learning.
  • 15. Dueling bandit gradient descent. Learns a ranking function, consisting of a weight vector for a linear weighted combination of feature vectors, from feedback about the relative quality of rankings. Outcome: weights for the ranking score S = w · x(q, d). Approach: maintain a current “best” ranking function candidate w; on each incoming query, generate a new candidate ranking function, compare it to the current “best”, and if the candidate is better, update the “best” ranking function. Yue, Y. and Joachims, T. (2009). Interactively optimizing information retrieval systems as a dueling bandits problem. In: ICML ’09.
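As a rough illustration of the dueling bandit gradient descent loop (Yue and Joachims, 2009), the sketch below shows one update step; `interleaved_comparison` is an assumed helper that returns True when the candidate's ranking beats the current best on the observed clicks, and the step sizes are illustrative values, not the ones used in the talk.

```python
# Sketch of one DBGD update step. `interleaved_comparison(w_best, w_cand, query)` is an
# assumed callback that presents an interleaved result list for the query and returns
# True if the candidate ranker wins the click-based comparison. delta (exploration step)
# and gamma (learning step) are illustrative values.
import numpy as np

def dbgd_step(w, query, interleaved_comparison, delta=1.0, gamma=0.1,
              rng=np.random.default_rng()):
    u = rng.normal(size=w.shape)
    u /= np.linalg.norm(u)                    # random unit vector in weight space
    w_candidate = w + delta * u               # propose a perturbed candidate ranker
    if interleaved_comparison(w, w_candidate, query):
        w = w + gamma * u                     # move toward the winning candidate
    return w                                  # documents are then scored by S = w . x(q, d)
```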
  • 16. Challenges: generalize over queries and documents; learn from implicit feedback that is noisy, relative, and rank-biased; keep users happy while learning.
  • 17. Exploration and exploitation. Exploration: the need to learn effectively from rank-biased feedback. Exploitation: the need to present high-quality results while learning. Previous approaches are either purely exploratory or purely exploitative.
  • 18. Questions: can we improve online performance by balancing exploration and exploitation? How much exploration is needed for effective learning?
  • 19. Problem formulation. Reinforcement learning: no explicit labels; learn from feedback from the environment in response to actions (document lists). A contextual bandit problem: the retrieval system tries something (presents documents) and gets feedback (clicks) from the environment (the user).
  • 20. Our method. Learning based on Dueling Bandit Gradient Descent: relative evaluations of the quality of two document lists; such comparisons are inferred from implicit feedback. Exploration and exploitation are balanced with k-greedy comparison of document lists.
  • 21. k-greedy exploration. To compare document lists, interleave them; an exploration rate k influences the relative number of documents from each list. Illustration with exploration rate k = 0.5: blue wins the comparison.
  • 22. k-greedy exploration, illustrated for exploration rates k = 0.5 and k = 0.2.
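A small sketch of the k-greedy interleaving idea on these slides: each slot of the interleaved list is filled from the exploratory list with probability k and from the exploitative list otherwise, skipping documents that are already included. Function and variable names are illustrative, not the authors' implementation.

```python
# Sketch of k-greedy interleaving: with probability k the next slot is taken from the
# exploratory ranking, otherwise from the exploitative one; duplicates are skipped.
import random

def k_greedy_interleave(exploit_list, explore_list, k=0.5, length=10,
                        rng=random.Random(42)):
    interleaved, origins = [], []
    i = j = 0
    while len(interleaved) < length and (i < len(exploit_list) or j < len(explore_list)):
        take_explore = (rng.random() < k and j < len(explore_list)) or i >= len(exploit_list)
        if take_explore:
            doc, j = explore_list[j], j + 1
        else:
            doc, i = exploit_list[i], i + 1
        if doc not in interleaved:
            interleaved.append(doc)
            origins.append("explore" if take_explore else "exploit")
    return interleaved, origins

# With k = 0.2 roughly two of ten documents come from the exploratory list.
docs, origins = k_greedy_interleave([f"e{n}" for n in range(10)],
                                    [f"x{n}" for n in range(10)], k=0.2)
```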
  • 23. Evaluation: simulated interactions. We need to observe clicks on arbitrary result lists and measure online performance. Simulate clicks and measure online performance: a probabilistic click model (assume the dependent click model and define click and stop probabilities) based on standard learning to rank data sets; measure the cumulative reward of the rankings displayed to the user.
  • 24. Experiments: vary the exploration rate k; three click models (“perfect”, “navigational”, “informational”); evaluate on nine data sets (LETOR 3.0 and 4.0).
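The click simulation can be sketched as follows, in the dependent-click-model style described on the preceding slide: the simulated user scans the result list top-down, clicks with a relevance-dependent probability, and stops with a stop probability after each click. The probability values for the three instantiations are taken from the click-model slides that follow; the function and dictionary names are illustrative.

```python
# Sketch of the simulated user. The (P(c|R), P(c|NR), P(s|R), P(s|NR)) values are taken
# from the click-model slides below; names are illustrative, not the authors' code.
import random

CLICK_MODELS = {
    "perfect":       (1.0, 0.0, 0.0, 0.0),
    "navigational":  (0.95, 0.05, 0.9, 0.2),
    "informational": (0.9, 0.4, 0.5, 0.1),
}

def simulate_clicks(relevance_labels, model="navigational", rng=random.Random(0)):
    p_click_r, p_click_nr, p_stop_r, p_stop_nr = CLICK_MODELS[model]
    clicks = []
    for rank, relevant in enumerate(relevance_labels):        # examine top-down
        if rng.random() < (p_click_r if relevant else p_click_nr):
            clicks.append(rank)
            if rng.random() < (p_stop_r if relevant else p_stop_nr):
                break                                         # user is satisfied and stops
    return clicks

# Example: relevance labels of an interleaved top-5 result list.
print(simulate_clicks([True, False, False, True, False], model="informational"))
```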
  • 25. “Perfect” click model: P(c|R) = 1.0, P(c|NR) = 0.0, P(s|R) = 0.0, P(s|NR) = 0.0. [Plot: final performance over time (0–1000 queries) for data set NP2003 and the perfect click model.] Provides an upper bound.
  • 26. “Perfect” online performance. Best performance with only two exploratory documents for top-10 result lists (k = 0.2). Darker shades indicate higher performance; dark borders indicate significant improvements over the k = 0.5 baseline.

                   k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
        HP2003      119.91    125.71    129.99    130.55    128.50
        HP2004      109.21    111.57    118.54    119.86    116.46
        NP2003      108.74    113.61    117.44    120.46    119.06
        NP2004      112.33    119.34    124.47    126.20    123.70
        TD2003       82.00     84.24     88.20     89.36     86.20
        TD2004       85.67     90.23     91.71     91.00     88.98
        OHSUMED     128.12    130.40    131.16    133.37    131.93
        MQ2007       96.02     97.48     98.54    100.28     98.32
        MQ2008       90.97     92.99     94.03     95.59     95.14
  • 27. “Navigational” click model: P(c|R) = 0.95, P(c|NR) = 0.05, P(s|R) = 0.9, P(s|NR) = 0.2. [Plot: final performance over time (0–1000 queries) for data set NP2003 and the navigational click model.] Simulates realistic but reliable interaction.
  • 28. “Navigational” online performance. Best performance with little exploration and lots of exploitation. Darker shades indicate higher performance; dark borders indicate significant improvements over the k = 0.5 baseline.

                   k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
        HP2003      102.58    109.78    118.84    116.38    117.52
        HP2004       89.61     97.08     99.03    103.36    105.69
        NP2003       90.32    100.94    105.03    108.15    110.12
        NP2004       99.14    104.34    110.16    112.05    116.00
        TD2003       70.93     75.20     77.64     77.54     75.70
        TD2004       78.83     80.17     82.40     83.54     80.98
        OHSUMED     125.35    126.92    127.37    127.94    127.21
        MQ2007       95.50     94.99     95.70     96.02     94.94
        MQ2008       89.39     90.55     91.24     92.36     92.25
  • 29. “Informational” click model: P(c|R) = 0.9, P(c|NR) = 0.4, P(s|R) = 0.5, P(s|NR) = 0.1. [Plot: final performance over time (0–1000 queries) for data set NP2003 and the informational click model, for k = 0.5, k = 0.2 and k = 0.1.] Simulates very noisy interaction.
  • 30. “Informational” online performance. Highest improvements with low exploration rates: interaction between noise and data sets. Darker shades indicate higher performance; dark borders indicate significant improvements over the k = 0.5 baseline.

                   k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
        HP2003       59.53     63.91     61.43     70.11     71.19
        HP2004       41.12     52.88     55.88     58.40     48.54
        NP2003       53.63     53.64     57.60     69.90     63.38
        NP2004       60.59     64.17     69.96     55.76     51.58
        TD2003       52.78     55.16     52.95     57.30     59.75
        TD2004       58.49     61.43     63.23     62.88     63.37
        OHSUMED     121.39    123.26    124.01    126.76    125.40
        MQ2007       91.57     92.00     91.66     90.79     90.19
        MQ2008       86.06     87.26     85.83     87.62     86.29
  • 31. Summary. What? Developed the first method for balancing exploration and exploitation in online learning to rank, and devised an experimental framework for simulating user interactions and measuring online performance. And so? Balancing exploration and exploitation improves online performance for all click models and all data sets; the best results are achieved with 2 exploratory documents per result list.
  • 32. What’s next here? Validate the simulation assumptions; evaluate using click logs; develop new algorithms for online learning to rank for IR that can balance exploration and exploitation.
  • 33. Inferring Preferences from Clicks (ongoing work).
  • 34. Interleaved ranker comparison methods. Use implicit feedback (“clicks”) not to infer absolute judgments, but to compare two rankers by observing clicks on an interleaved result list: interleave two ranked lists (the outputs of two rankers) and use click data to detect even very small differences between rankers. We examine three existing methods for interleaving, identify issues with them, and propose a new one.
  • 35. Three methods (1): balanced interleave method. An interleaved list is generated for each query based on the two rankers; the user’s clicks on the interleaved list are attributed to each ranker based on how it ranked the clicked documents; the ranker that obtains more clicks is deemed superior. Joachims, Evaluating retrieval performance using clickthrough data. In: Text Mining, 2003.
  • 36. Balanced interleave example. List l1 = (d1, d2, d3, d4); list l2 = (d2, d3, d4, d1). Two possible interleaved lists: (d1, d2, d3, d4) and (d2, d1, d3, d4). First case: observed clicks on d2 and d4; k = min(4, 3) = 3; click counts c1 = 1, c2 = 2, so l2 wins. Second case: observed clicks on d1 and d4; k = min(4, 4) = 4; click counts c1 = 2, c2 = 2, a tie. l2 wins the first comparison, and the lists tie for the second; in expectation l2 wins.
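A compact sketch of the balanced-interleave comparison just illustrated; the cutoff k is the smaller of the two ranks of the lowest-ranked clicked document in each original list, and the function names are illustrative rather than the authors' code.

```python
# Sketch of balanced interleaving and its click-based comparison. The interleaved list
# alternates over the two rankings (here l1 always contributes first; the method
# randomizes which list starts).
def balanced_interleave(l1, l2):
    interleaved = []
    for a, b in zip(l1, l2):
        for doc in (a, b):
            if doc not in interleaved:
                interleaved.append(doc)
    return interleaved

def balanced_compare(l1, l2, clicked_docs):
    if not clicked_docs:
        return 0                                            # no clicks: tie
    # k = min over both lists of the rank of their lowest-ranked clicked document
    k = min(max(lst.index(d) + 1 for d in clicked_docs) for lst in (l1, l2))
    c1 = sum(1 for d in clicked_docs if d in l1[:k])
    c2 = sum(1 for d in clicked_docs if d in l2[:k])
    return (c1 > c2) - (c1 < c2)                            # 1: l1 wins, -1: l2 wins, 0: tie

# First case from the slide: clicks on d2 and d4 give k = 3, c1 = 1, c2 = 2 (l2 wins).
l1, l2 = ["d1", "d2", "d3", "d4"], ["d2", "d3", "d4", "d1"]
print(balanced_interleave(l1, l2))                          # ['d1', 'd2', 'd3', 'd4']
print(balanced_compare(l1, l2, clicked_docs=["d2", "d4"]))  # -1
```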
  • 37. Three methods (2): team draft method. Create an interleaved list following the model of “team captains” selecting their teams from a set of players: for each pair of documents to be placed in the interleaved list, a coin flip determines which list gets to select a document first; record which ranker contributed which document. Radlinski et al., How does click-through data reflect retrieval quality? 2008.
  • 38. Team draft example. List l1 = (d1, d2, d3, d4); list l2 = (d2, d3, d4, d1). Four possible interleaved lists with assignments (the clicked document is d3): a) d1:1, d2:2, d3:1, d4:2; b) d2:2, d1:1, d3:1, d4:2; c) d2:2, d1:1, d3:2, d4:1; d) d1:1, d2:2, d3:2, d4:1. For interleaved lists a) and b), l1 wins the comparison; l2 wins in the other two cases.
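The team-draft construction and comparison could be sketched as follows; this is an illustrative implementation under the description above, not the authors' code.

```python
# Sketch of team-draft interleaving: a coin flip decides which ranker drafts first in
# each round, each ranker then contributes its highest-ranked document not yet picked,
# and the comparison counts clicks per team.
import random

def team_draft_interleave(l1, l2, length=10, rng=random.Random(1)):
    interleaved, team = [], {}
    while len(interleaved) < length and (set(l1) | set(l2)) - set(interleaved):
        order = [(1, l1), (2, l2)] if rng.random() < 0.5 else [(2, l2), (1, l1)]
        for ranker, lst in order:
            doc = next((d for d in lst if d not in interleaved), None)
            if doc is not None and len(interleaved) < length:
                interleaved.append(doc)
                team[doc] = ranker                  # remember which ranker contributed it
    return interleaved, team

def team_draft_compare(team, clicked_docs):
    c1 = sum(1 for d in clicked_docs if team.get(d) == 1)
    c2 = sum(1 for d in clicked_docs if team.get(d) == 2)
    return (c1 > c2) - (c1 < c2)

# Slide example: with a click on d3, the winner depends on which team drafted d3.
interleaved, team = team_draft_interleave(["d1", "d2", "d3", "d4"],
                                          ["d2", "d3", "d4", "d1"], length=4)
print(team_draft_compare(team, clicked_docs=["d3"]))
```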
  • 39. Three methods (3): document-constraint method. Result lists are interleaved and clicks observed as for the balanced interleave method; constraints on pairs of individual documents are inferred from clicks and ranks: for each pair of a clicked document and a higher-ranked non-clicked document, a constraint is inferred that requires the former to be ranked higher than the latter; the original list that violates fewer constraints is deemed superior. He et al., Evaluation of methods for relative comparison of retrieval systems based on clickthroughs, 2009.
  • 40. Document-constraint example. List l1 = (d1, d2, d3, d4); list l2 = (d2, d3, d4, d1). Two possible interleaved lists: (d1, d2, d3, d4) and (d2, d1, d3, d4). First case: clicks on d2 and d3; inferred constraints d2 ≻ d1 (violated by l1) and d3 ≻ d1 (violated by l1), so l2 wins. Second case: clicks on d1 and d3; inferred constraints d1 ≻ d2 (violated by l2) and d3 ≻ d2 (violated by l1 and l2), so l2 loses. l2 wins the first comparison and loses the second; in expectation l2 wins.
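A sketch of the constraint-based comparison just described; the function names are illustrative.

```python
# Sketch of the document-constraint comparison: each click yields constraints of the
# form "clicked document should outrank every non-clicked document shown above it";
# the original ranking that violates fewer constraints wins.
def constraint_compare(l1, l2, interleaved, clicked_docs):
    clicked = set(clicked_docs)
    constraints = []                                   # pairs (preferred, dispreferred)
    for rank, doc in enumerate(interleaved):
        if doc in clicked:
            for above in interleaved[:rank]:
                if above not in clicked:
                    constraints.append((doc, above))
    def violations(lst):
        return sum(1 for good, bad in constraints
                   if good in lst and bad in lst and lst.index(bad) < lst.index(good))
    v1, v2 = violations(l1), violations(l2)
    return (v1 < v2) - (v1 > v2)                       # 1: l1 wins, -1: l2 wins, 0: tie

# First case from the slide: l1 violates both inferred constraints, so l2 wins.
l1, l2 = ["d1", "d2", "d3", "d4"], ["d2", "d3", "d4", "d1"]
print(constraint_compare(l1, l2, interleaved=l1, clicked_docs=["d2", "d3"]))   # -1
```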
  • 41. Assessing comparison methods. Bias: don’t prefer either ranker when clicks are random. Sensitivity: the ability of a comparison method to detect differences in the quality of rankings. Balanced interleave and document constraint are biased; team draft may suffer from insensitivity.
  • 42. A new proposal. Briefly: based on team draft; instead of interleaving deterministically, model the interleaving process as random sampling from softmax functions that define probability distributions over documents; derive an estimator that is unbiased and sensitive to small ranking changes; marginalize over all possible assignments to make estimates more reliable.
  • 43. Probabilistic interleave and probabilistic comparison. Each ranker is turned into a softmax distribution over documents, e.g., P(d at rank 1) = 0.85, P(d at rank 2) = 0.10, P(d at rank 3) = 0.03, P(d at rank 4) = 0.02. For each rank of the interleaved list, draw one of {s1, s2} and sample a document; all permutations of the documents in D are possible. For the comparison, marginalize over all possible assignments: for an incoming query, the system generates an interleaved list and observes clicks; all possible assignments are generated, the probability of each is computed, and the probability of each possible outcome is computed (expensive, but only needed down to the lowest observed click). In the example shown, P(c1 > c2) = 0.108 and P(c1 < c2) = 0.144, so s2 (based on l2) wins the comparison; s1 and s2 tie in expectation.
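The sampling step can be sketched as below; with a softmax proportional to 1 / rank^3, the document probabilities come out close to the 0.85 / 0.10 / 0.03 / 0.02 shown on the slide. The marginalization over all possible assignments used for the comparison is omitted; the exponent and all names are illustrative assumptions.

```python
# Sketch of the probabilistic interleaving step: each ranker defines a softmax over its
# ranking (here proportional to 1 / rank^3), and each slot of the interleaved list is
# filled by picking one of the two softmaxes at random and sampling a not-yet-used
# document from it. The assignment-marginalized comparison is omitted.
import random

def softmax_over_ranks(ranking, tau=3.0):
    weights = [1.0 / (r + 1) ** tau for r in range(len(ranking))]
    z = sum(weights)
    return {doc: w / z for doc, w in zip(ranking, weights)}

def probabilistic_interleave(l1, l2, length=4, rng=random.Random(7)):
    s1, s2 = softmax_over_ranks(l1), softmax_over_ranks(l2)
    interleaved, assignment = [], []
    while len(interleaved) < length and (set(l1) | set(l2)) - set(interleaved):
        ranker, dist = rng.choice([(1, s1), (2, s2)])       # draw one of the two softmaxes
        remaining = {d: p for d, p in dist.items() if d not in interleaved}
        if not remaining:                                   # this ranker has no unused documents
            continue
        docs, probs = zip(*remaining.items())
        interleaved.append(rng.choices(docs, weights=probs, k=1)[0])
        assignment.append(ranker)
    return interleaved, assignment

print(probabilistic_interleave(["d1", "d2", "d3", "d4"], ["d2", "d3", "d4", "d1"]))
```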
  • 44. Question: do the analytical differences between the methods translate into performance differences?
  • 45. Evaluation. Set-up: simulation based on the dependent click model, with perfect and realistic instantiations; not binary, but with relevance levels; MSLR-WEB30k Microsoft learning to rank data set; 136 document features (i.e., rankers). Three experiments: exhaustive comparison of all distinct ranker pairs (9,180 distinct pairs); selection of small subsets for detailed analysis; add noise.
  • 46. Results (1), experiment 1. Accuracy: the percentage of ranker pairs for which a comparison method identified the better ranker after 1000 queries.

        Method                 Accuracy
        balanced interleave    0.881
        team draft             0.898
        document constraint    0.857
        new                    0.914
  • 47. Results (2): overview. “Problematic” pairs: pairs of rankers for which all methods correctly identified the better one; three achieved perfect accuracy within 1000 queries; for each method, the incorrectly judged pair with the highest difference in NDCG.
  • 48. Results (3): perfect model. [Plots: accuracy over the number of queries (1 to 10k, log scale) for balanced interleave, team draft, document constraint, and marginalized probabilities.]
  • 49. Results (4): realistic model. [Plot: accuracy over the number of queries (1 to 10k, log scale).]
  • 50. Summary. What? Methods for evaluating rankers using implicit feedback; an analysis of interleaved comparison methods in terms of bias and sensitivity. And so? Introduced a new probabilistic interleaved comparison method that is unbiased and sensitive; experimental analysis shows it is more accurate, needs substantially fewer observed queries, and is more robust.
  • 51. What’s next here? Evaluate in a real-life setting in the future. With more reliable and faster convergence, our approach can pave the way for online learning to rank methods that require many comparisons.
  • 52. Wrap-up
  • 53. Online learning to rank: emphasis on implicit feedback collected during normal operation of the search engine; balancing exploration and exploitation; a probabilistic method for inferring preferences from clicks.
  • 54. Information retrieval observatory. Academic experiments on online learning and implicit feedback used simulators; we need to validate the simulators. What’s really needed: move away from artificial explicit feedback to natural implicit feedback; a shared experimental environment for observing users in the wild as they interact with systems.
  • 55. Adapting Rankers Online. Maarten de Rijke, derijke@uva.nl
  • 56. (Intentionally left blank)
  • 57. Bias (backup slide): repeats the balanced interleave example from slide 36.
  • 58. Sensitivity (backup slide): repeats the team draft example from slide 38.