
Online Learning to Rank

A survey on online learning to rank.



  1. Online Learning to Rank. By Edward W Huang (ewhuang3) and Yerzhan Suleimenov (suleime1). Prepared as an assignment for CS410: Text Information Systems in Spring 2016.
  2. Introduction
  3. What is learning to rank?
     • Many information retrieval problems are ranking problems
     • Also known as machine-learned ranking
       – Uses machine learning techniques to create ranking models
     • Training data: queries and documents matched with relevance judgements
       – The model sorts objects by relevance, preference, or importance
       – Finds an optimal combination of features
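To make "an optimal combination of features" concrete, here is a minimal sketch of a linear ranking model in Python. The features (e.g., a BM25 score and a PageRank value) and the weights are invented for illustration; real models learn the weights from training data.

```python
import numpy as np

def rank(weights: np.ndarray, doc_features: np.ndarray) -> np.ndarray:
    """Return document indices ordered from most to least relevant."""
    scores = doc_features @ weights          # one score per candidate document
    return np.argsort(-scores)               # sort descending by score

# Three candidate documents, two features each (e.g. BM25 score, PageRank).
docs = np.array([[1.2, 0.3],
                 [0.4, 0.9],
                 [2.0, 0.1]])
w = np.array([0.7, 0.3])                     # learned feature weights
print(rank(w, docs))                         # [2 0 1]
```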
  4. Applications of learning to rank
     • Ranking problems in information retrieval
       – Document retrieval
       – Sentiment analysis
       – Product rating
       – Anti-spam measures
       – Search engines
     • Many more applications, not just in information retrieval!
       – Machine translation
       – Computational biology
  5. Online vs. offline learning to rank
     • Offline: the training set is produced by human assessors
       – Time-consuming and expensive to produce
       – Not always in line with actual user preferences
     • Online: data comes from users interacting with the system
       – Users leave a trace of interaction data: query reformulations, mouse movements, clicks, etc.
       – Clicks are especially valuable when interpreted as preferences
  6. A big issue with online learning to rank
     • The exploration-exploitation dilemma
       – The system must obtain feedback to improve, while also exploiting past models to keep result quality high
       – Solutions are discussed later
  7. Creating Ranking Models
  8. Ranking model training framework
     • Discriminative training attributes
       – Input space
       – Output space
       – Hypothesis space
       – Loss function
     • The ranking model is trained to predict the ground-truth labels in the training set, as measured by the loss function
     • Test phase: when a new query arrives, the trained ranking model sorts documents according to their relevance to that query
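As one concrete instance of this framework, the sketch below assumes a linear hypothesis space and a pointwise squared loss: each training step nudges the weights so that predicted scores move toward the ground-truth relevance labels. The learning rate, features, and labels are toy values, not from the cited survey.

```python
import numpy as np

def training_step(w, features, labels, lr=0.01):
    """One gradient step on the squared loss between predicted scores and labels."""
    preds = features @ w                                   # model output per document
    grad = features.T @ (preds - labels) / len(labels)     # gradient of the mean squared loss
    return w - lr * grad

# Toy training data: 4 documents, 2 features each, graded relevance labels.
X = np.array([[1.0, 0.2], [0.3, 0.8], [0.9, 0.9], [0.1, 0.1]])
y = np.array([2.0, 1.0, 3.0, 0.0])
w = np.zeros(2)
for _ in range(200):
    w = training_step(w, X, y)
print(w)   # learned feature weights, used to score unseen documents at test time
```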
  9. Algorithms for learning to rank problems
     • Categorized into three groups by their framework (input representation and loss function)
       – Pointwise
       – Pairwise
       – Listwise
     T.-Y. Liu. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval, 3(3): 225–331, 2009.
  10. Limitations of the pointwise approach
      • Does not consider interdependency among documents
      • Does not make use of the fact that some documents are associated with the same query
      • Most IR evaluation measures are query-level and position-based
  11. Pairwise and listwise
      • Potential solutions to the previously mentioned exploration-exploitation dilemma
      • Pairwise approach
        – Input: pairs of documents with labels identifying which one is preferred
        – Learns a classifier to predict these labels
      • Listwise approach
        – Input: the entire document list associated with a query
        – Directly optimizes evaluation measures (e.g., Normalized Discounted Cumulative Gain)
      Hofmann, Katja, Shimon Whiteson, and Maarten de Rijke. "Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval." Information Retrieval 16.1 (2012): 63-90.
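To illustrate the difference in objectives, here is a small sketch with a standard pairwise hinge loss on one preferred/non-preferred document pair, and a textbook NDCG@k implementation of the kind listwise methods optimize. It follows common definitions rather than the specific formulations in the cited paper.

```python
import numpy as np

def pairwise_hinge_loss(w, x_preferred, x_other, margin=1.0):
    """Penalize the model when the preferred document does not outscore the other by a margin."""
    return max(0.0, margin - (x_preferred @ w - x_other @ w))

def dcg_at_k(relevances, k):
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))        # log2(rank + 1)
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg_at_k(relevances, k):
    """NDCG: DCG of the produced ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades of the documents, listed in the order the ranker returned them.
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))   # ~0.96: quality relative to the ideal ordering
```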
  12. Absolute and relative feedback approaches
      • Use feedback to learn personalized rankings
      • Absolute feedback: contextual bandit learning
      • Relative feedback: gradient methods and inferred preferences between complete result rankings
      • Relative feedback usually performs better
        – Robust to noisy feedback
        – Handles larger document spaces
      Chen, Yiwei, and Katja Hofmann. "Online Learning to Rank: Absolute vs. Relative." Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion (2015).
  13. State of the Art Learning
  14. Improving learning performance
      • Search engine clicks are useful, but may be biased
        – Bias can come from attractive titles, snippets, or captions
      • A method to detect and compensate for caption bias
        – Enables reweighting of clicks based on the likelihood that each click is caption-biased
        – Clicks on attractively captioned links are treated as weaker evidence of relevance
      K. Hofmann, F. Behr, and F. Radlinski. On caption bias in interleaving experiments. In Proc. of CIKM, 2012.
  15. Handling caption bias
      • Allow weighting of clicks based on the likelihood that each click is caption-biased
      • Model click probability as a function of position, relevance, and caption bias
        – Visual characteristics of individual documents
        – Pairwise features that capture relationships with neighboring documents
      • Learn model weights from past user behavior
      • Remove caption bias to obtain an evaluation that better reflects relevance
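The sketch below is only a rough illustration of the reweighting idea, not the actual model from Hofmann et al.: click probability is modelled as a logistic function of position, estimated relevance, and caption-attractiveness features, and each observed click is down-weighted according to how much the caption terms inflate that probability. All feature names and weights here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def click_weight(rank, relevance, caption_features, w_pos, w_rel, w_cap):
    """Weight in (0, 1]: clicks that the caption terms largely explain count less."""
    position_feature = 1.0 / rank                      # higher positions attract more clicks
    full = sigmoid(w_pos * position_feature + w_rel * relevance
                   + caption_features @ w_cap)         # click model including caption-bias terms
    no_caption = sigmoid(w_pos * position_feature + w_rel * relevance)
    return min(1.0, no_caption / full)                 # < 1 when the caption inflates P(click)

# Hypothetical caption features: bold query terms in title, snippet length, URL match.
w_cap = np.array([0.8, 0.1, 0.4])
print(click_weight(rank=2, relevance=0.6,
                   caption_features=np.array([1.0, 0.3, 1.0]),
                   w_pos=0.5, w_rel=1.2, w_cap=w_cap))   # about 0.8: this click is discounted
```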
  16. Improving learning speed
      • Search engine clicks can be interpreted using interleaved comparison methods
        – Reliably infer preferences between pairs of rankers; two main learning methods build on them
      • Dueling bandit gradient descent learns from these comparisons
        – Requires pairwise comparisons, each involving a user interaction, between exploratory rankers
      • Multileave gradient descent learns from comparisons of multiple rankers at once
        – Uses a single user interaction
        – Fast
      Schuth, Anne, Harrie Oosterhuis, Shimon Whiteson, and Maarten de Rijke. "Multileave Gradient Descent for Fast Online Learning to Rank." Proceedings of the Ninth ACM International Conference on Web Search and Data Mining - WSDM '16 (2016).
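A condensed sketch of the dueling bandit gradient descent loop described above. The interleaved comparison with a real user is replaced by a simulated oracle that noisily prefers the ranker closer to a hidden ideal weight vector; that oracle, and all the constants, are assumptions made only so the example runs.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.8, 0.2])          # hidden "ideal" ranker, used only by the simulation

def simulated_comparison(w_a, w_b):
    """Stand-in for an interleaved comparison: prefer the ranker closer to w_true, with noise."""
    better = np.linalg.norm(w_b - w_true) < np.linalg.norm(w_a - w_true)
    return better if rng.random() > 0.1 else not better

def dbgd_step(w, delta=1.0, alpha=0.1):
    u = rng.normal(size=w.shape)
    u /= np.linalg.norm(u)                  # random unit direction to explore
    candidate = w + delta * u               # exploratory ranker
    if simulated_comparison(w, candidate):  # clicks favour the exploratory ranker
        w = w + alpha * u                   # move the production ranker a small step that way
    return w

w = np.zeros(2)
for _ in range(500):
    w = dbgd_step(w)
print(w)   # should drift toward w_true under this simulation
```

Multileave gradient descent replaces the single candidate with several exploratory rankers compared in one multileaved result list, which is what lets it learn from a single user interaction.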
  17. Evaluating Rankers
  18. How do we evaluate rankers?
      • After training a ranker, we need to find out how effective it is
      • Offline evaluation methods
        – Depend on explicit expert judgements
        – Often not feasible in practice
      • Online evaluation methods
        – Leverage online data that reflects ranker quality
        – Click-based ranker evaluation (discussed next)
      • State-of-the-art software: Lerot
        – Evaluates different algorithms
        – Can simulate user clicking behaviour with user models
      Schuth, Anne, Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. "Lerot." Proceedings of the 2013 Workshop on Living Labs for Information Retrieval Evaluation - LivingLab '13 (2013).
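Toolkits like Lerot typically generate synthetic feedback with simple click models. Below is a minimal cascade-style simulator with made-up click and stop probabilities per relevance grade; it is meant only to show the general shape of such a user model, not Lerot's actual implementation.

```python
import random

CLICK_PROB = {0: 0.05, 1: 0.5, 2: 0.95}   # P(click | relevance grade), illustrative values
STOP_PROB = {0: 0.0, 1: 0.5, 2: 0.9}      # P(stop scanning | clicked, grade)

def simulate_clicks(ranking_grades, seed=42):
    """Scan the ranked list top-down, click probabilistically, maybe stop after a click."""
    rng = random.Random(seed)
    clicks = []
    for rank, grade in enumerate(ranking_grades):
        if rng.random() < CLICK_PROB[grade]:
            clicks.append(rank)
            if rng.random() < STOP_PROB[grade]:
                break
    return clicks

print(simulate_clicks([2, 0, 1, 0, 2]))   # [0]: the simulated user clicks the top result and stops
```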
  19. Click-based ranker evaluation
      • An online evaluation strategy based on clickthrough data
      • Independent of expert judgments, unlike conventional evaluation methods
        – The measure reflects the interest of an actual user rather than that of an expert providing relevance judgements
  20. Challenges of using clickthrough data
      • Handling presentation bias
        – Design the user interface with three properties:
          • Blind test: the user is kept blind to the experimental condition (which ranker produced which result)
          • Click-to-preference: a user's click should reflect their actual judgment
          • Low usability impact: an interactive, user-friendly interface
      • Identifying the better of two rankers
        – A unified interface sends the user's query to both rankers
        – The two ranking results are mixed (discussed next)
        – The combined ranking is shown to the user and the clicks are recorded
      T. Joachims. Evaluating Retrieval Performance Using Clickthrough Data. In J. Franke, G. Nakhaeizadeh, and I. Renz (eds.), Text Mining, Physica/Springer Verlag, pp. 79-96, 2003.
  21. Mixing two ranking results
      • Also known as interleaving
      • The key is to balance contributions from both rankers in the top n positions
      • Algorithms vary in their mixing strategy
        – Balanced Interleaving
        – Team-Draft Interleaving
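A compact sketch of Team-Draft Interleaving as it is usually described: in each round the two rankers take turns, in random order, contributing their highest-ranked document not yet in the mixed list, and every contributed document is tagged with the "team" that supplied it so later clicks can be credited. Balanced Interleaving differs mainly in how it constrains the mix and attributes clicks.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=0):
    """Mix two rankings; return the interleaved list plus the team behind each slot."""
    rng = random.Random(seed)
    pool = set(ranking_a) | set(ranking_b)
    interleaved, teams = [], []
    while len(interleaved) < len(pool):
        order = [("A", ranking_a), ("B", ranking_b)]
        rng.shuffle(order)                       # coin flip: who contributes first this round
        for team, ranking in order:
            pick = next((d for d in ranking if d not in interleaved), None)
            if pick is not None:
                interleaved.append(pick)
                teams.append(team)
    return interleaved, teams

mixed, teams = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(list(zip(mixed, teams)))   # each document tagged with the ranker that contributed it
```

Clicks on documents tagged "A" then count as wins for ranker A, and vice versa; aggregating those wins over many queries is the subject of the next slides.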
  22. Leveraging click responses from mixed rankings
      • Each click represents the user's preference for the ranker that contributed the clicked link
      • Proper leveraging of clicks is therefore critical
        – Formalized as a test statistic
        – Essential for reliable evaluation of rankers
      • A basic approach assigns equal weight to all clicks
        – Suboptimal, since not all clicks are equally significant
        – Caption bias!
      • More advanced test statistics are discussed next
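The equal-weight baseline mentioned above can be written down directly: credit each click to the team that contributed the clicked document, score each query as the difference in click credit, and test whether the mean difference across queries departs from zero. The per-query numbers below are invented, and this is the plain one-sample z-test, not the learned statistics on the next slide.

```python
import numpy as np

def z_score(per_query_diffs):
    """One-sample z statistic on per-query (clicks for A - clicks for B) differences."""
    d = np.asarray(per_query_diffs, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Hypothetical click-credit differences from 10 interleaved queries.
diffs = [1, 0, 2, -1, 1, 1, 0, 2, 1, -1]
print(z_score(diffs))   # ~1.8: suggestive but not decisive preference for ranker A,
                        # which is why the learned test statistics on the next slide help
```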
  23. Test statistics for evaluation
      • Max mean difference: learn click weights that maximize the mean score difference between the best and worst rankers
      • Inverse z-test: optimize the statistical power of a z-test by maximizing the z-score (i.e., minimizing the p-value)
        – Removes the assumption of equal variance of weights
      • Inverse Wilcoxon: learns to invert the Wilcoxon signed-rank test
        – Produces a scoring function that optimizes the Wilcoxon test
      • Max mean difference performs the worst; the inverse z-test performs the best
      Yisong Yue, Yue Gao, O. Chapelle, Ya Zhang, and T. Joachims. Learning more powerful test statistics for click-based retrieval evaluation. Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2010.
  24. How good are interleaving methods?
      • Interleaving methods are compared against a baseline: conventional evaluation based on absolute metrics
      • Conventional evaluation based on absolute metrics
        – Absolute usage statistics are expected to change monotonically with ranker quality
      • Interleaving methods
        – More user clicks are expected for the better ranker
  25. Relative performance of interleaving methods
      • Experimental results on pairs of rankers whose relative quality is known by construction
      • Conventional evaluation methods based on absolute metrics
        – Did not reliably identify the higher-quality ranker
        – Absolute usage statistics did not change monotonically with ranker quality
      • Balanced Interleaving and Team-Draft Interleaving
        – Reliably identified the higher-quality ranker
        – The number of preferences for the better ranker was significantly larger
      F. Radlinski, M. Kurup, and T. Joachims. How Does Clickthrough Data Reflect Retrieval Quality? Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2008.
  26. How reliable are interleaving methods, and why choose them?
      • The results of interleaving agree with conventional evaluation methods
      • Achieve statistically reliable preferences compared to absolute metrics
      • Economical: the statistical evaluation power of 10 interleaved clicks is approximately equal to that of 1 manually judged query
      • Not sensitive to different click aggregation schemes
      • Can complement or even replace standard evaluation methods based on manual judgments or absolute metrics
      O. Chapelle, T. Joachims, F. Radlinski, and Yisong Yue. Large-Scale Validation and Analysis of Interleaved Search Evaluation. ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012.
  27. Future directions
      • Extend current linear approaches with online learning to rank algorithms that can effectively learn more complex models
      • Design and re-run experiments with more complex models of click behavior to better understand the various click biases
      • Learn from distinctive interaction properties, such as click dwell time and use of the back button, to filter raw clicks
      • Understand the range of domains in which interleaving methods are highly effective
      • Improve gradient-descent-based rankers by covering all search directions, to speed up learning
  28. References
      1. Chen, Yiwei, and Katja Hofmann. "Online Learning to Rank: Absolute vs. Relative." Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion, 2015.
      2. Radlinski, F., M. Kurup, and T. Joachims. "How Does Clickthrough Data Reflect Retrieval Quality?" Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2008.
      3. Hofmann, Katja, Shimon Whiteson, and Maarten de Rijke. "Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval." Information Retrieval 16.1 (2012): 63-90.
      4. Joachims, T. "Evaluating Retrieval Performance Using Clickthrough Data." In J. Franke, G. Nakhaeizadeh, and I. Renz (eds.), Text Mining, Physica/Springer Verlag, pp. 79-96, 2003.
      5. Hofmann, K., F. Behr, and F. Radlinski. "On Caption Bias in Interleaving Experiments." Proceedings of CIKM, 2012.
      6. Chapelle, O., T. Joachims, F. Radlinski, and Yisong Yue. "Large-Scale Validation and Analysis of Interleaved Search Evaluation." ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012.
      7. Schuth, Anne, Harrie Oosterhuis, Shimon Whiteson, and Maarten de Rijke. "Multileave Gradient Descent for Fast Online Learning to Rank." Proceedings of the Ninth ACM International Conference on Web Search and Data Mining - WSDM '16, 2016.
      8. Schuth, Anne, Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. "Lerot." Proceedings of the 2013 Workshop on Living Labs for Information Retrieval Evaluation - LivingLab '13, 2013.
      9. Liu, T.-Y. "Learning to Rank for Information Retrieval." Foundations and Trends in Information Retrieval, 3(3): 225–331, 2009.
      10. Yue, Yisong, Yue Gao, O. Chapelle, Ya Zhang, and T. Joachims. "Learning More Powerful Test Statistics for Click-Based Retrieval Evaluation." Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2010.
