This tutorial covers machine learning approaches for learning to rank documents in information retrieval systems. It discusses how early IR methods did not incorporate machine learning. It then covers: (1) ordinal regression approaches that learn multiple thresholds to account for the ordered nature of relevance labels; (2) optimizing pairwise preferences between documents, which decomposes the problem and allows for efficient algorithms; (3) directly optimizing rank-based evaluation measures like MAP and NDCG using structural SVMs, boosting, or smooth approximations to allow for gradient descent optimization of discontinuous objectives. The goal is to outperform traditional IR methods by applying machine learning techniques to learn good ranking functions.
References: V. Bush. “As We May Think.” The Atlantic Monthly, July 1945. C. Manning, P. Raghavan, H. Schütze. “Introduction to Information Retrieval.” Cambridge University Press, 2008.
References: S. Robertson. “The Probability Ranking Principle in IR.” Journal of Documentation, 33(4), 294-304, December 1977.
References: S. Robertson. “The Probability Ranking Principle in IR.” Journal of Documentation, 33(4), 294-304, December 1977.
References: -G. Salton, C. Buckley. “Term-Weighting Approaches in Automatic Text Retrieval.” Information Processing and Management, 24(5), 513-523, 1988. -S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford. “Okapi at TREC-3.” In Proceedings of TREC-3, 1995. -J. Ponte, W. B. Croft. “A Language Modeling Approach to Information Retrieval.” In Proceedings of SIGIR 1998. C. Zhai, J. Lafferty. “A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval.” In Proceedings of SIGIR 2001.
References: W. Chu, S. Keerthi. “New Approaches to Support Vector Ordinal Regression.” In Proceedings of ICML 2005. (http://www.gatsby.ucl.ac.uk/%7Echuwei/svor.htm)
References: W. Chu, S. Keerthi. “New Approaches to Support Vector Ordinal Regression.” In Proceedings of ICML 2005. (http://www.gatsby.ucl.ac.uk/%7Echuwei/svor.htm)
References: W. Chu, Z. Ghahramani. “Gaussian Processes for Ordinal Regression.” Journal of Machine Learning Research, 6, 1019-1041, 2005 S. Kramer, G. Widmer, B. Pfahringer, M. De Groeve. “Prediction of Ordinal Classes Using Regression Trees.” Fundamenta Informaticae, XXI, 1001-1013, 2001 R. Caruana, S. Baluja, T. Mitchell. “Using the Future to `Sort Out’ the Present: Rankprop and Multitask Learning for Medical Risk Evaluation.” In Proceedings of NIPS 1996. K. Crammer, Y. Singer. “PRanking with Ranking.” In Proceedings of NIPS 2001. W. Chu, S. Keerthi. “New Approaches to Support Vector Ordinal Regression.” In Proceedings of ICML 2005. (http://www.gatsby.ucl.ac.uk/%7Echuwei/svor.htm)
References: P. Li, C. Burges, Q. Wu. “Learning to Rank Using Classification and Gradient Boosting.” In Proceedings of NIPS 2007 T. Qin, X. Zhang, D. Wang, T. Liu, W. Lai, H. Li. “Ranking with Multiple Hyperplanes.” In Proceedings of SIGIR 2007.
References: R. Herbrich, T. Graepel, K. Obermayer. “Support Vector Learning for Ordinal Regression.” In Proceedings of ICANN 1999. T. Joachims, “A Support Vector Method for Multivariate Performance Measures.” In Proceedings of ICML 2005. (http://www.cs.cornell.edu/People/tj/svm_light/svm_perf.html)
References: C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender. “Learning to Rank Using Gradient Descent.” In Proceedings of ICML 2005. W. Cohen, R. Schapire, Y. Singer. “Learning to Order Things.” In Proceedings of NIPS 1998. Y. Freund, R. Iyer, R. Schapire, Y. Singer. “An Efficient Boosting Algorithm for Combining Preferences.” Journal of Machine Learning Research, 4, 933-969, 2003. P. Long, R. Servedio. “Boosting the Area Under the ROC Curve.” In Proceedings of NIPS 2007. R. Herbrich, T. Graepel, K. Obermayer. “Support Vector Learning for Ordinal Regression.” In Proceedings of ICANN 1999. T. Joachims, “A Support Vector Method for Multivariate Performance Measures.” In Proceedings of ICML 2005. (http://www.cs.cornell.edu/People/tj/svm_light/svm_perf.html) Y. Cao, J. Xu, T. Liu, H. Li, Y. Huang, H. Hon. “Adapting Ranking SVM to Document Retrieval.” In Proceedings of SIGIR 2006.
References: Y. Yue, C. Burges. “On Using Simultaneous Perturbation Stochastic Approximation on Learning to Rank; and, the Empirical Optimality of LambdaRank.” Microsoft Research Technical Report, MSR-TR-2007-115, 2007.
References: Y. Yue, T. Finley, F. Radlinski, T. Joachims. “A Support Vector Method for Optimizing Average Precision.” In Proceedings of SIGIR 2007. (http://projects.yisongyue.com/svmmap/) O. Chapelle, Q. Le, A. Smola. “Large Margin Optimization of Ranking Measures.” NIPS Workshop on Machine Learning for Web Search, 2007. Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen, G. Sun. “A General Boosting Method and its Application to Learning Ranking Functions for Web Search.” In Proceedings of NIPS 2007. J. Xu and H. Li, “AdaRank: A Boosting Algorithm for Information Retrieval.” In Proceedings of SIGIR 2007. C. Burges, R. Ragno, Q. Le. “Learning to Rank with Non-Smooth Cost Functions.” In Proceedings of NIPS 2006. E. Snelson, J. Guiver. “SoftRank with Gaussian Processes.” NIPS Workshop on Machine Learning for Web Search, 2007.
References: I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun. “Large Margin Methods for Structured and Interdependent Output Variables.” Journal of Machine Learning Research, 6, 1453-1484, 2005.
References: Y. Yue, T. Finley, F. Radlinski, T. Joachims. “A Support Vector Method for Optimizing Average Precision.” In Proceedings of SIGIR 2007. (http://projects.yisongyue.com/svmmap/)
References: I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun. “Large Margin Methods for Structured and Interdependent Output Variables.” Journal of Machine Learning Research, 6, 1453-1484, 2005.
References: Y. Yue, T. Finley, F. Radlinski, T. Joachims. “A Support Vector Method for Optimizing Average Precision.” In Proceedings of SIGIR 2007. (http://projects.yisongyue.com/svmmap/) O. Chapelle, Q. Le, A. Smola. “Large Margin Optimization of Ranking Measures.” NIPS Workshop on Machine Learning for Web Search, 2007.
References: C. Burges, R. Ragno, Q. Le. “Learning to Rank with Non-Smooth Cost Functions.” In Proceedings of NIPS 2006
References: C. Burges. “Learning to Rank for Web Search: Some New Directions.” Keynote Speech, SIGIR 2007 Workshop on Learning to Rank for Information Retrieval.
References: C. Burges, R. Ragno, Q. Le. “Learning to Rank with Non-Smooth Cost Functions.” In Proceedings of NIPS 2006