The document discusses learning to rank, the use of machine learning techniques to produce rankings from training data. It describes ranking as a supervised learning problem with ordinal labels rather than categorical or real-valued ones. The document outlines the main approaches to learning to rank - pointwise, pairwise, and listwise - and gives application examples such as search engine result ordering and personalized ad recommendations.
1. Learning to Rank
Stefan Kühn
Join me on XING
data2day Heidelberg - September 28th, 2017
2. Contents
1 Rankings and Humans
2 Ranking and Machine Learning
3 Formalizing Ranking Problems
4 Rankings and Recommender Systems
3. Section 1: Rankings and Humans
4. Rankings in Everyday Life
TODO Lists
Prioritized Backlogs
Top X songs/movies/...
You get the idea...
7. Rankings, Heuristics, Decisions
Rankings are about comparisons
Rankings are about decision-making
Some heuristics are about both
Recognition Heuristic
If one of two objects is recognized and the other is not, then infer that the
recognized object has the higher value with respect to the criterion.
Proposed by Gigerenzer and Goldstein, building upon the great works of Kahneman and Tversky
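As a decision rule, the heuristic is easy to state in code. A minimal sketch (the function name and the recognition set are invented for illustration; in practice, recognition is a property of the decision-maker, not a lookup table):

```python
def recognition_heuristic(a, b, recognized):
    """If exactly one of two objects is recognized, infer it has the higher value."""
    if a in recognized and b not in recognized:
        return a
    if b in recognized and a not in recognized:
        return b
    return None  # heuristic does not apply: both or neither are recognized

# Classic example: which city has the larger population?
print(recognition_heuristic("Munich", "Paderborn", recognized={"Munich"}))
```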
8. Section 2: Ranking and Machine Learning
9. Learning
Is Ranking a Machine Learning Problem?
10. Machine Learning Concepts
Supervised - Learning from Labels
Figure out how to generate correct labels using the given data
Classification
Regression
Unsupervised - Learning from Data
Identify hidden/inherent structure using the given data
Clustering
Dimensionality Reduction / Manifold Learning
Outlier Detection
11. Supervised versus Unsupervised
Learning to Rank
Figure out how to generate a good ranking using the given data
What about Learning to Rank = Machine-Learned Ranking or MLR?
1 Supervised, because ranks are like labels?
2 Unsupervised, because ranks are typically based on implicit feedback, i.e. latent/hidden/inherent structure?
3 Mixed/intermediate/something else?
4 An ill-posed question?
Could you please rank these options according to whatever you think is appropriate?
And by the way, how did you do it?
13. Example: XING Stream
How to order News?
By time?
By content/topic?
By popularity?
By clicking probability?
Every choice changes the problem to solve, while the result set is always the same - a ranked list of items. Every choice represents a different distance measure / objective function to minimize.
14. Section 3: Formalizing Ranking Problems
15. Ranking - Problem Formulation
Items x ∈ X
Ordered Labels or Ranks 1 > 2 > ... > k > ...
Ranking rule f that allows us to do the following:
Input: Unordered subset {x, y, z, ...} ⊆ X
Output: Ordered list, i.e. y > z > x > ...
Example: Text search
Items: Set of Documents
Ranking rule f : Similarity measure for documents and search terms
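A minimal Python sketch of this setup (the toy similarity and the documents are invented): a scoring function induces the ranking rule f simply by sorting.

```python
def score(query, document):
    # Toy similarity: fraction of query terms that also appear in the document.
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms) / len(query_terms)

def rank(query, documents):
    # Ranking rule f: unordered subset in, ordered list out (best first).
    return sorted(documents, key=lambda d: score(query, d), reverse=True)

docs = ["learning to rank tutorial", "cooking recipes", "rank aggregation methods"]
print(rank("learning to rank", docs))
```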
16. Ranking and Level of Measurement
Supervised Learning Problems
Classification - Nominal Scale - Class Labels
Ranking - Ordinal Scale - Ranks
Regression - Interval Scale - Real Values
Ranking is the task of predicting labels on an ordinal scale.
Informally: Learn ordering from labeled training data - typically ordered lists of items - and try to predict ordering for new sets of items.
What is special about this?
Ordering is context-dependent. One additional item (or one item fewer) can change all other ranks. This is clearly different from regression and classification.
17. Ranking in Information Retrieval
Image: CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=518546
18. Ranking - Pointwise
Approach Characteristics
Input: Single items
Evaluation: Scoring function evaluated for each point/item
Optimization: Loss function derived from individual scores
Reduces Ranking Problem to either
Regression
Classification
Ordinal Regression
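A hedged sketch of the pointwise reduction to regression (features, labels, and data are all invented): fit a per-item scoring model on graded relevance labels, then rank by predicted score.

```python
import numpy as np

# Toy training data: one feature vector and one graded relevance label per item.
X_train = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
y_train = np.array([2.0, 0.0, 1.0])  # ordinal relevance treated as real values

# Pointwise reduction: least-squares regression of relevance on item features.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Rank new items individually by predicted score, best first.
X_new = np.array([[0.8, 0.3], [0.1, 0.9], [0.4, 0.6]])
print(np.argsort(X_new @ w)[::-1])
```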
19. Ranking - Pointwise
Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank: http://wwwconference.org/www2009/pdf/T7A-LEARNING%20TO%20RANK%20TUTORIAL.pdf
20. Ranking - Pointwise
Problems with the Pointwise Approach
Length of item lists can differ significantly
Example: there are more websites related to the search term Online (ca. 10 billion) than to Offline (ca. 666 million)
Position of items in the list is not taken into account
Example: incorrect ordering of the top 10 results will have a slightly bigger impact than errors/inversions below position 123456789
Consequence
Longer lists will dominate the optimization, while the shorter lists are actually more important for humans/customers.
Advantages
If all individual scores are known, all possible rankings are determined.
21. Ranking - Pairwise
Approach Characteristics
Input: Pairs of Items
Evaluation: Preference function evaluated for each pair - binary classification
Optimization: Pairwise classification loss derived from all pairings, weighted majority voting
Reduces Ranking Problem to
Binary (or pairwise) Classification
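A minimal sketch of the pairwise reduction (toy data; a linear RankNet-style model is an assumption, not from the slides): each pair of items with different relevance becomes a binary classification example on the feature difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy items with features and graded relevance (higher = better).
X = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
rel = np.array([2, 0, 1])

# Build the pairwise training set: difference vectors labeled by preference.
diffs, prefs = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if rel[i] > rel[j]:
            diffs.append(X[i] - X[j]); prefs.append(1.0)
            diffs.append(X[j] - X[i]); prefs.append(0.0)
diffs, prefs = np.array(diffs), np.array(prefs)

# Logistic regression on pairs via plain gradient descent (cross-entropy loss).
w = np.zeros(X.shape[1])
for _ in range(1000):
    grad = diffs.T @ (sigmoid(diffs @ w) - prefs) / len(prefs)
    w -= 0.5 * grad

# A full ranking follows by sorting items by the learned utility w^T x.
print(np.argsort(X @ w)[::-1])
```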
22. Ranking - Pairwise
Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank: http://wwwconference.org/www2009/pdf/T7A-LEARNING%20TO%20RANK%20TUTORIAL.pdf
23. Ranking - Pairwise
Problems with the Pairwise Approach
Length of item lists can differ significantly
Number of pairs depends quadratically on the length of the list
Even bigger imbalance w.r.t. list length
Advantages
Comparing pairs of elements is a much more natural approach to Ranking than Regression or Classification.
24. Ranking - Listwise
Approach Characteristics
Input: Set of Items
Evaluation: Some Evaluation Metric
Optimization:
Either: Directly minimize Evaluation Metric
Or: Loss function defined for permutations of the given input
Reduces Ranking Problem to either
Direct Optimization of Evaluation Metric
Listwise Loss Optimization (Distance between lists is non-trivial)
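A short sketch of one such listwise loss (a ListNet-style top-one probability loss; the data is invented): the loss compares the distribution implied by the predicted scores with the distribution implied by the ground-truth relevances, so list positions enter the objective directly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def listwise_loss(scores, relevances):
    """ListNet-style cross-entropy between top-one probability distributions."""
    p_true = softmax(relevances.astype(float))
    p_pred = softmax(scores)
    return -np.sum(p_true * np.log(p_pred))

# Toy list: predicted scores and ground-truth relevances for the same items.
scores = np.array([2.1, 0.3, 1.0])
relevances = np.array([2, 0, 1])
print(listwise_loss(scores, relevances))
```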
25. Ranking - Listwise
Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank: http://wwwconference.org/www2009/pdf/T7A-LEARNING%20TO%20RANK%20TUTORIAL.pdf
26. Ranking - Listwise
Problems with the Listwise Approach
Huge complexity issue
Direct Optimization: Non-smooth functions
Often only incomplete knowledge about ground truth for lists (only a tiny subset available for learning)
Advantages
Positions on lists are visible to the algorithms.
27. Important Contributions
Natural Language Processing
tf-idf
Okapi BM25
Link to Information Theory
Interesting Nonlinear Evaluation Metrics
P@k = Precision restricted to the best k items
MAP = Mean Average Precision
Discounted Cumulative Gain = DCG (implemented in the sketch below)
Interesting Non-Standard Objective Functions
(N)DCG as optimization objective
non-continuous and non-smooth
Interesting Rankers
Pointwise: Subset Ranking; McRank; PRanking (Ordinal Regression)
Pairwise: RankNet; FRank; RankBoost; Ranking SVM
Listwise: SoftRank; SoftNDCG; SVM-MAP; Structural SVM; AdaRank
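For reference, DCG and NDCG in a few lines. This uses the common 2^rel - 1 gain variant from the learning-to-rank literature; other gain definitions exist:

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain: graded gains discounted by log2 of position."""
    rel = np.asarray(relevances, dtype=float)[:k]
    positions = np.arange(1, len(rel) + 1)
    return np.sum((2.0 ** rel - 1.0) / np.log2(positions + 1))

def ndcg_at_k(relevances, k):
    """Normalized DCG: divide by the DCG of the ideal (sorted) ordering."""
    ideal = sorted(relevances, reverse=True)
    denom = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / denom if denom > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))
```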
28. Section 4: Rankings and Recommender Systems
29. Example: Personalized Ad Recommendations
Standard Approaches
Contextual Bandits
Policies based on classifiers for each ad
Collaborative Filtering
Based on Latent Features, e.g. when using Matrix Factorization
Main Problem
Extreme sparsity of positive feedback
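To make the classifier-based bandit policy concrete, here is a toy epsilon-greedy sketch. The class name, the linear click model, and the learning rate are all assumptions for illustration, not from the talk:

```python
import random

class EpsilonGreedyAdPolicy:
    """Toy contextual bandit: one linear click-probability score per ad."""
    def __init__(self, num_ads, num_features, epsilon=0.1):
        self.weights = [[0.0] * num_features for _ in range(num_ads)]
        self.epsilon = epsilon

    def score(self, ad, context):
        return sum(w * x for w, x in zip(self.weights[ad], context))

    def choose(self, context):
        if random.random() < self.epsilon:        # explore a random ad
            return random.randrange(len(self.weights))
        return max(range(len(self.weights)),       # exploit: highest predicted score
                   key=lambda ad: self.score(ad, context))

    def update(self, ad, context, clicked, lr=0.1):
        # One SGD step on squared error between score and observed click (0/1).
        err = self.score(ad, context) - clicked
        self.weights[ad] = [w - lr * err * x
                            for w, x in zip(self.weights[ad], context)]

policy = EpsilonGreedyAdPolicy(num_ads=3, num_features=2)
ad = policy.choose([1.0, 0.5])
policy.update(ad, [1.0, 0.5], clicked=1)
```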
30. Example: Personalized Ad Recommendations
New Approaches
Still Contextual Bandits
Policies based on rankers instead of classifiers
Recent paper by Chaudhuri et al.: Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem
Works best in the case of extreme sparsity