The document discusses learning to rank, the use of machine learning techniques to produce rankings from training data. It describes ranking as a supervised learning problem with ordinal labels rather than categorical or real-valued ones. The document outlines the main approaches to learning to rank - pointwise, pairwise, and listwise - and gives application examples such as search engine result ordering and personalized ad recommendations.
1. Learning to Rank
Stefan Kühn
Join me on XING
data2day Heidelberg - September 28th, 2017
2. Contents
1 Rankings and Humans
2 Ranking and Machine Learning
3 Formalizing Ranking Problems
4 Rankings and Recommender Systems
3. Section 1: Rankings and Humans
4. Rankings in Everyday Life
TODO Lists
Prioritized Backlogs
Top X songs/movies/...
You get the idea...
7. Rankings, Heuristics, Decisions
Rankings are about comparisons
Rankings are about decision-making
Some heuristics are about both
Recognition Heuristic
If one of two objects is recognized and the other is not, then infer that the
recognized object has the higher value with respect to the criterion.
Proposed by Gigerenzer and Goldstein, building upon the great works of Kahneman and Tversky
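As a decision rule, the heuristic is easy to state in code. A minimal sketch (the function name and the recognition set are invented for illustration; in practice, recognition is a property of the decision-maker, not a lookup table):

```python
def recognition_heuristic(a, b, recognized):
    """If exactly one of two objects is recognized, infer it has the higher value."""
    if a in recognized and b not in recognized:
        return a
    if b in recognized and a not in recognized:
        return b
    return None  # heuristic does not apply: both or neither are recognized

# Classic example: which city has the larger population?
print(recognition_heuristic("Munich", "Paderborn", recognized={"Munich"}))
```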
8. Section 2: Ranking and Machine Learning
9. Learning
Is Ranking a Machine Learning Problem?
10. Machine Learning Concepts
Supervised - Learning from Labels
Figure out how to generate correct labels using the given data
Classification
Regression
Unsupervised - Learning from Data
Identify hidden/inherent structure using the given data
Clustering
Dimensionality Reduction / Manifold Learning
Outlier Detection
11. Supervised versus Unsupervised
Learning to Rank
Figure out how to generate a good ranking using the given data
What about Learning to Rank = Machine-Learned Ranking or MLR?
1 Supervised, because ranks are like labels?
2 Unsupervised, because ranks are typically based on implicit feedback, i.e. latent/hidden/inherent structure?
3 Mixed/intermediate/something else?
4 An ill-posed question?
Could you please rank these options according to whatever you think is appropriate?
And by the way, how did you do it?
13. Example: XING Stream
How to order News?
By time?
By content/topic?
By popularity?
By clicking probability?
Every choice changes the problem to solve, while the result set is always the same - a ranked list of items. Every choice represents a different distance measure / objective function to minimize.
14. Section 3: Formalizing Ranking Problems
15. Ranking - Problem Formulation
Items x ∈ X
Ordered Labels or Ranks 1 > 2 > ... > k > ...
Ranking rule f that allows us to do the following:
Input: Unordered subset {x, y, z, ...} ⊆ X
Output: Ordered list, i.e. y > z > x > ...
Example: Text search
Items: Set of Documents
Ranking rule f : Similarity measure for documents and search terms
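A minimal Python sketch of this setup (the toy similarity and the documents are invented): a scoring function induces the ranking rule f simply by sorting.

```python
def score(query, document):
    # Toy similarity: fraction of query terms that also appear in the document.
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms) / len(query_terms)

def rank(query, documents):
    # Ranking rule f: unordered subset in, ordered list out (best first).
    return sorted(documents, key=lambda d: score(query, d), reverse=True)

docs = ["learning to rank tutorial", "cooking recipes", "rank aggregation methods"]
print(rank("learning to rank", docs))
```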
16. Ranking and Level of Measurement
Supervised Learning Problems
Classification - Nominal Scale - Class Labels
Ranking - Ordinal Scale - Ranks
Regression - Interval Scale - Real Values
Ranking is the task of predicting labels on an ordinal scale.
Informally: Learn ordering from labeled training data - typically ordered lists of items - and try to predict ordering for new sets of items.
What is special about this?
Ordering is context-dependent. One additional item (or one item fewer) can change all other ranks. This is clearly different from regression and classification.
17. Ranking in Information Retrieval
Image: CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=518546
18. Ranking - Pointwise
Approach Characteristics
Input: Single items
Evaluation: Scoring function evaluated for each point/item
Optimization: Loss function derived from individual scores
Reduces Ranking Problem to either
Regression
Classification
Ordinal Regression
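A hedged sketch of the pointwise reduction to regression (features, labels, and data are all invented): fit a per-item scoring model on graded relevance labels, then rank by predicted score.

```python
import numpy as np

# Toy training data: one feature vector and one graded relevance label per item.
X_train = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
y_train = np.array([2.0, 0.0, 1.0])  # ordinal relevance treated as real values

# Pointwise reduction: least-squares regression of relevance on item features.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Rank new items individually by predicted score, best first.
X_new = np.array([[0.8, 0.3], [0.1, 0.9], [0.4, 0.6]])
print(np.argsort(X_new @ w)[::-1])
```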
19. Ranking - Pointwise
Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank: http://wwwconference.org/www2009/pdf/T7A-LEARNING%20TO%20RANK%20TUTORIAL.pdf
20. Ranking - Pointwise
Problems with the Pointwise Approach
Length of item lists can differ significantly
Example: there are more websites related to the search term Online (ca. 10 billion) than to Offline (ca. 666 million)
Position of items in the list is not taken into account
Example: incorrect ordering of the top 10 results will have a slightly bigger impact than errors/inversions below position 123456789
Consequence
Longer lists will dominate the optimization, while the shorter lists are actually more important for humans/customers.
Advantages
If all individual scores are known, all possible rankings are determined.
21. Ranking - Pairwise
Approach Characteristics
Input: Pairs of Items
Evaluation: Preference function evaluated for each pair - binary classification
Optimization: Pairwise classification loss derived from all pairings, weighted majority voting
Reduces Ranking Problem to
Binary (or pairwise) Classification
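A minimal sketch of the pairwise reduction (toy data; a linear RankNet-style model is an assumption, not from the slides): each pair of items with different relevance becomes a binary classification example on the feature difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy items with features and graded relevance (higher = better).
X = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
rel = np.array([2, 0, 1])

# Build the pairwise training set: difference vectors labeled by preference.
diffs, prefs = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if rel[i] > rel[j]:
            diffs.append(X[i] - X[j]); prefs.append(1.0)
            diffs.append(X[j] - X[i]); prefs.append(0.0)
diffs, prefs = np.array(diffs), np.array(prefs)

# Logistic regression on pairs via plain gradient descent (cross-entropy loss).
w = np.zeros(X.shape[1])
for _ in range(1000):
    grad = diffs.T @ (sigmoid(diffs @ w) - prefs) / len(prefs)
    w -= 0.5 * grad

# A full ranking follows by sorting items by the learned utility w^T x.
print(np.argsort(X @ w)[::-1])
```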
22. Ranking - Pairwise
Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank: http://wwwconference.org/www2009/pdf/T7A-LEARNING%20TO%20RANK%20TUTORIAL.pdf
23. Ranking - Pairwise
Problems with the Pairwise Approach
Length of item lists can differ significantly
Number of pairs depends quadratically on the length of the list
Even bigger imbalance w.r.t. list length
Advantages
Comparing pairs of elements is a much more natural approach to Ranking than Regression or Classification.
24. Ranking - Listwise
Approach Characteristics
Input: Set of Items
Evaluation: Some Evaluation Metric
Optimization:
Either: Directly minimize Evaluation Metric
Or: Loss function defined for permutations of the given input
Reduces Ranking Problem to either
Direct Optimization of Evaluation Metric
Listwise Loss Optimization (Distance between lists is non-trivial)
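A short sketch of one such listwise loss (a ListNet-style top-one probability loss; the data is invented): the loss compares the distribution implied by the predicted scores with the distribution implied by the ground-truth relevances, so list positions enter the objective directly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def listwise_loss(scores, relevances):
    """ListNet-style cross-entropy between top-one probability distributions."""
    p_true = softmax(relevances.astype(float))
    p_pred = softmax(scores)
    return -np.sum(p_true * np.log(p_pred))

# Toy list: predicted scores and ground-truth relevances for the same items.
scores = np.array([2.1, 0.3, 1.0])
relevances = np.array([2, 0, 1])
print(listwise_loss(scores, relevances))
```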
25. Ranking - Listwise
Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank: http://wwwconference.org/www2009/pdf/T7A-LEARNING%20TO%20RANK%20TUTORIAL.pdf
26. Ranking - Listwise
Problems with the Listwise Approach
Huge complexity issue
Direct Optimization: Non-smooth functions
Often only incomplete knowledge about ground truth for lists (only a tiny subset available for learning)
Advantages
Positions on lists are visible to the algorithms.
27. Important Contributions
Natural Language Processing
tf-idf
Okapi BM25
Link to Information Theory
Interesting Nonlinear Evaluation Metrics
P@k = Precision restricted to the best k items
MAP = Mean Average Precision
Discounted Cumulative Gain = DCG (implemented in the sketch below)
Interesting Non-Standard Objective Functions
(N)DCG as optimization objective
non-continuous and non-smooth
Interesting Rankers
Pointwise: Subset Ranking; McRank; PRanking (Ordinal Regression)
Pairwise: RankNet; FRank; RankBoost; Ranking SVM
Listwise: SoftRank; SoftNDCG; SVM-MAP; Structural SVM; AdaRank
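For reference, DCG and NDCG in a few lines. This uses the common 2^rel - 1 gain variant from the learning-to-rank literature; other gain definitions exist:

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain: graded gains discounted by log2 of position."""
    rel = np.asarray(relevances, dtype=float)[:k]
    positions = np.arange(1, len(rel) + 1)
    return np.sum((2.0 ** rel - 1.0) / np.log2(positions + 1))

def ndcg_at_k(relevances, k):
    """Normalized DCG: divide by the DCG of the ideal (sorted) ordering."""
    ideal = sorted(relevances, reverse=True)
    denom = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / denom if denom > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))
```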
28. Section 4: Rankings and Recommender Systems
29. Example: Personalized Ad Recommendations
Standard Approaches
Contextual Bandits
Policies based on classifiers for each ad
Collaborative Filtering
Based on Latent Features, e.g. when using Matrix Factorization
Main Problem
Extreme sparsity of positive feedback
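To make the classifier-based bandit policy concrete, here is a toy epsilon-greedy sketch. The class name, the linear click model, and the learning rate are all assumptions for illustration, not from the talk:

```python
import random

class EpsilonGreedyAdPolicy:
    """Toy contextual bandit: one linear click-probability score per ad."""
    def __init__(self, num_ads, num_features, epsilon=0.1):
        self.weights = [[0.0] * num_features for _ in range(num_ads)]
        self.epsilon = epsilon

    def score(self, ad, context):
        return sum(w * x for w, x in zip(self.weights[ad], context))

    def choose(self, context):
        if random.random() < self.epsilon:        # explore a random ad
            return random.randrange(len(self.weights))
        return max(range(len(self.weights)),       # exploit: highest predicted score
                   key=lambda ad: self.score(ad, context))

    def update(self, ad, context, clicked, lr=0.1):
        # One SGD step on squared error between score and observed click (0/1).
        err = self.score(ad, context) - clicked
        self.weights[ad] = [w - lr * err * x
                            for w, x in zip(self.weights[ad], context)]

policy = EpsilonGreedyAdPolicy(num_ads=3, num_features=2)
ad = policy.choose([1.0, 0.5])
policy.update(ad, [1.0, 0.5], clicked=1)
```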
30. Example: Personalized Ad Recommendations
New Approaches
Still Contextual Bandits
Policies based on rankers instead of classifiers
Recent paper by Chaudhuri et al.: Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem
Works best in the case of extreme sparsity