[SIGIR17] Learning to Rank Using Localized Geometric Mean Metrics
1. Learning to Rank Using Localized Geometric Mean Metrics
Yuxin Su, Irwin King and Michael Lyu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
The 40th International ACM SIGIR Conference on Research and Development
in Information Retrieval, 2017
Tokyo, Japan
3. Candidate Documents: Query Independent
[Figure: a ranking system sits between the World Wide Web and the user. Documents are crawled into a document repository; each query q (Query 1, Query 2, Query 3, …) is associated with its own candidate document set 𝒟_q (𝒟_1, 𝒟_2, 𝒟_3, …).]
8/8/2017 Su, King and Lyu (The Chinese University of HK) 3
4. Learning to Rank for Query-document Pairs
[Figure: the training data consists of queries 1…4, each with its own documents; a learning system produces a model h, which the ranking system applies to a test query's documents to output a ranked list.]
5. Feature List of Microsoft Learning to Rank Datasets
  Feature ID  Description                Stream
  3           Covered query term number  Title
  4           Covered query term number  URL
  5           Covered query term number  Whole document
  71          Sum of TF-IDF              Title
  72          Sum of TF-IDF              URL
  73          Sum of TF-IDF              Whole document
  106         BM25                       Title
  107         BM25                       URL
  108         BM25                       Whole document
  128         In-link number
  129         Out-link number
  130         PageRank
  131         SiteRank
  134         URL click count
https://www.microsoft.com/en-us/research/project/mslr
Observation
• The data contains local structures
6. The State-of-the-art Methods
• Gradient-Boosted Regression Tree (GBRT) [Friedman, 2004]
• 𝜆-MART [Burges et al., 2011]
[Figure: an example decision tree that splits on BM25 > 0.5, LM > 0.1, and PageRank > 0.3, with leaf relevance labels r = 1, 3, 3, 4.]
Drawbacks of decision-tree-like methods
• Sensitive to noise
• No structural information (weak for theoretical analysis)
7. Ideas from Manifold Learning
Motivation
• Find a better similarity measure on the feature space of query-document pairs
[Figure: documents on a manifold cluster around Ideal Documents 1–4.]
8. Contributions
• First to apply metric learning to learning to rank for query-document pairs
• First to localize Geometric Mean Metric Learning
• Experiments: outperforms the state-of-the-art query-dependent metric-learning-to-rank algorithms and the state-of-the-art learning-to-rank methods for query-document pairs, in terms of both accuracy and computational complexity
9. Metric Learning
• Each document is represented by a vector p ∈ ℝ^d
• Mahalanobis distance: d_M(p1, p2) = (p1 − p2)^T M (p1 − p2)
Similar and Dissimilar Pairs
• When p_i is more similar to p_j than to p_k: (p_i, p_j) ∈ S and (p_i, p_k) ∈ D
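The Mahalanobis form is easy to sanity-check in code. The following is a minimal sketch, not the paper's implementation; the matrix M and the sample points are made-up examples:

```python
# Squared Mahalanobis distance d_M(p1, p2) = (p1 - p2)^T M (p1 - p2),
# written in plain Python for small vectors.

def mahalanobis_sq(p1, p2, M):
    """(p1 - p2)^T M (p1 - p2) for lists p1, p2 and matrix M (list of rows)."""
    diff = [a - b for a, b in zip(p1, p2)]
    # M @ diff
    Md = [sum(row[j] * diff[j] for j in range(len(diff))) for row in M]
    return sum(diff[i] * Md[i] for i in range(len(diff)))

# With M = identity the measure reduces to the squared Euclidean distance.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_sq([0.0, 0.0], [3.0, 4.0], I2))  # 25.0

# A non-identity M reweights directions: here the second axis counts 4x.
M = [[1.0, 0.0], [0.0, 4.0]]
print(mahalanobis_sq([0.0, 0.0], [3.0, 4.0], M))   # 9 + 4*16 = 73.0
```

Learning a good M (rather than fixing it by hand as here) is exactly what the following slides address.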
10. Geometric Mean Metric Learning [Zadeh et al., ICML16]
• Similarity and dissimilarity matrices:
  S = Σ_{(p_i, p_j) ∈ S} (p_i − p_j)(p_i − p_j)^T
  D = Σ_{(p_i, p_k) ∈ D} (p_i − p_k)(p_i − p_k)^T
• Compute the metric in closed form:
  M = S^{−1/2} (S^{1/2} D S^{1/2})^{1/2} S^{−1/2}
• The fastest metric learning algorithm
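The GMML closed form can be sketched in a few lines of NumPy. This is an illustrative implementation under the assumption that S and D are symmetric positive definite; the synthetic matrices below are not from the paper:

```python
# Sketch of GMML's closed-form metric M = S^{-1/2} (S^{1/2} D S^{1/2})^{1/2} S^{-1/2}.
import numpy as np

def psd_power(A, alpha):
    """Fractional power of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** alpha) @ V.T

def gmml_metric(S, D):
    """Geometric mean of S^{-1} and D: the GMML metric."""
    S_half = psd_power(S, 0.5)
    S_neg_half = psd_power(S, -0.5)
    return S_neg_half @ psd_power(S_half @ D @ S_half, 0.5) @ S_neg_half

# Synthetic positive-definite S and D built from random data.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)); S = A.T @ A + 0.1 * np.eye(3)
B = rng.standard_normal((5, 3)); D = B.T @ B + 0.1 * np.eye(3)
M = gmml_metric(S, D)
# Sanity check: the geometric mean satisfies the Riccati equation M S M = D.
assert np.allclose(M @ S @ M, D)
```

The Riccati identity M S M = D is a convenient way to verify an implementation, since it characterizes the geometric mean uniquely among positive-definite solutions.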
11. Localized Metric Learning
• A single (global) metric is only a linear transformation:
  d_M(p1, p2) = (p1 − p2)^T M (p1 − p2) = (L(p1 − p2))^T (L(p1 − p2)), where M = L^T L
• A local metric combines multiple basis metrics:
  d(p_i, p_j) = d_{M(p_i)}(p_i, p_j), with M(p_i) = Σ_{r=1}^{m} w_r(p_i) M_r
12. Proposed Framework
[Pipeline: from each query's training data, sample an anchor document together with its relevant and irrelevant sets; learn one Mahalanobis metric per anchor; combine the basis metrics into a weighted local metric model; tune the weights with an NDCG scorer; apply the model to a test query's documents to produce the ranked list.]
13. Smooth Interpretation
Theorem
A Riemannian metric M(p) is a smoothly varying inner product:
  ⟨x_i, x_j⟩_p = x_i^T M(p) x_j
• For each anchor document p_r, we propose a smoothing weight function:
  w_r(p) = exp(−(ρ/2) ‖p − p_r‖_{M_r})
• A well-studied Riemannian metric [Hauberg et al., NIPS 2012]
• Easy to compute
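The smoothing weights and the resulting local metric can be combined as in the following sketch; the anchors, basis metrics, and ρ below are arbitrary illustrations, not learned values:

```python
# Localized metric: smoothing weights w_r(p) = exp(-(rho/2) * ||p - p_r||_{M_r})
# and the combined metric M(p) = sum_r w_r(p) * M_r.
import numpy as np

def weight(p, p_r, M_r, rho=1.0):
    dist = np.sqrt((p - p_r) @ M_r @ (p - p_r))  # Mahalanobis norm ||p - p_r||_{M_r}
    return np.exp(-0.5 * rho * dist)

def local_metric(p, anchors, metrics, rho=1.0):
    """Weighted combination of basis metrics, evaluated at point p."""
    w = [weight(p, p_r, M_r, rho) for p_r, M_r in zip(anchors, metrics)]
    return sum(w_r * M_r for w_r, M_r in zip(w, metrics))

# Two anchors with different basis metrics.
anchors = [np.zeros(2), np.array([5.0, 0.0])]
metrics = [np.eye(2), 2.0 * np.eye(2)]
M0 = local_metric(np.zeros(2), anchors, metrics)
# At the first anchor its weight is exactly 1, so its basis metric dominates,
# with only a small exponentially-damped contribution from the distant anchor.
```

Because the weights decay exponentially with Mahalanobis distance, M(p) varies smoothly in p, which is what the Riemannian-metric interpretation requires.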
14. Ideal Candidate Document
Evaluation Function
• Relevance of document p to query q:
  f_q(p, Φ_q) = − Σ_{r=1}^{m} Φ_q^{(r)} · exp(−‖p − p_r‖_{M_r}) · ‖p − p_r‖_{M_r}
• Φ_q needs to be learned
• The weighted combination of the anchors p_r is treated as an ideal candidate document
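A direct transcription of the evaluation function; the single anchor, its metric, and the Φ_q value below are illustrative, not learned. With positive Φ_q the score peaks at zero when p coincides with an anchor:

```python
# Evaluation function f_q(p, Phi_q) = -sum_r Phi_q^(r) * exp(-d_r) * d_r,
# where d_r = ||p - p_r||_{M_r} is the Mahalanobis norm to anchor r.
import numpy as np

def f_q(p, phi, anchors, metrics):
    score = 0.0
    for phi_r, p_r, M_r in zip(phi, anchors, metrics):
        d = np.sqrt((p - p_r) @ M_r @ (p - p_r))
        score -= phi_r * np.exp(-d) * d
    return score

# One anchor at the origin with the identity metric and Phi_q = [1].
anchors = [np.zeros(2)]
metrics = [np.eye(2)]
phi = [1.0]
s_at_anchor = f_q(np.zeros(2), phi, anchors, metrics)        # d = 0 -> score 0
s_nearby = f_q(np.array([1.0, 0.0]), phi, anchors, metrics)  # d = 1 -> -exp(-1)
```

A document sitting on the anchor scores 0, while one a unit away scores −e⁻¹ ≈ −0.37, matching the "shorter distance, higher relevance" reading near the anchors.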
15. Weighted Approximate Rank Pairwise [Weston, ML10]
Weighted Approximate Rank Pairwise (WARP) loss
• For the candidate document set 𝒟_q of query q:
  ℒ_q = (1 / |𝒟_q^+|) Σ_{p^+ ∈ 𝒟_q^+} L(v_q(p^+))
• v_q(p^+) = Σ_{p^− ∈ 𝒟_q^−} I[f_q(p^−, Φ_q) ≥ f_q(p^+, Φ_q)] is the number of violators in 𝒟_q for the positive document p^+
• NDCG-style rank discount:
  L(k) = Σ_{i=1}^{k} 1 / log2(i + 1)
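The WARP loss can be sketched as follows; `pos_scores` and `neg_scores` stand for hypothetical evaluation-function values over 𝒟_q^+ and 𝒟_q^−:

```python
# WARP loss: average rank discount L(v) over the violator counts v of each positive.
import math

def rank_discount(k):
    """L(k) = sum_{i=1}^{k} 1/log2(i + 1), the NDCG-style discount for rank k."""
    return sum(1.0 / math.log2(i + 1) for i in range(1, k + 1))

def warp_loss(pos_scores, neg_scores):
    total = 0.0
    for f_pos in pos_scores:
        # v_q(p+): negatives scored at least as high as the positive
        violators = sum(1 for f_neg in neg_scores if f_neg >= f_pos)
        total += rank_discount(violators)
    return total / len(pos_scores)

# A positive with no violators contributes L(0) = 0; one outranked by two
# negatives contributes L(2) = 1/log2(2) + 1/log2(3).
loss = warp_loss([0.9, 0.1], [0.5, 0.4])
```

Because L grows sub-linearly in the violator count, WARP penalizes mistakes near the top of the ranking more heavily than mistakes further down, which is what NDCG-oriented training wants.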
16. Update of Φ_q
Stochastic gradient descent to minimize the WARP loss:
• ∂f_q(p, Φ_q) / ∂Φ_q = [∂f_q(p, Φ_q) / ∂Φ_q^{(r)}]_{r=1…m}
• ∂f_q(p, Φ_q) / ∂Φ_q^{(r)} = −exp(−‖p − p_r‖_{M_r}) · ‖p − p_r‖_{M_r}
Update rule:
  Φ_q(t+1) = Φ_q(t) − μ ∂l(q, p^+, p^−) / ∂Φ_q(t)
           = Φ_q(t) − μ L(⌊|𝒟_q^−| / N_q⌋) · (∂f_q(p^−, Φ_q(t)) / ∂Φ_q(t) − ∂f_q(p^+, Φ_q(t)) / ∂Φ_q(t))
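One SGD step on Φ_q might look like the sketch below; the `discount` argument stands in for the sampled rank-discount factor, and the anchor, metric, and documents are synthetic illustrations:

```python
# One stochastic gradient step on Phi_q for a sampled (p+, p-) pair.
import numpy as np

def grad_f(p, anchors, metrics):
    """Per-anchor gradient d f_q / d Phi_q^(r) = -exp(-d_r) * d_r, d_r = ||p - p_r||_{M_r}."""
    g = np.empty(len(anchors))
    for r, (p_r, M_r) in enumerate(zip(anchors, metrics)):
        d = np.sqrt((p - p_r) @ M_r @ (p - p_r))
        g[r] = -np.exp(-d) * d
    return g

def sgd_step(phi, p_pos, p_neg, anchors, metrics, discount, mu=0.1):
    """Phi_q(t+1) = Phi_q(t) - mu * discount * (grad at p- minus grad at p+)."""
    grad = discount * (grad_f(p_neg, anchors, metrics) - grad_f(p_pos, anchors, metrics))
    return phi - mu * grad

anchors = [np.zeros(2)]
metrics = [np.eye(2)]
phi = np.array([1.0])
# Positive document sits on the anchor, negative one unit away: the step
# increases Phi, which pushes the negative's score f_q(p-) further down.
phi_new = sgd_step(phi, np.zeros(2), np.array([1.0, 0.0]), anchors, metrics, discount=1.0)
```

The sign works out as expected: raising Φ^{(r)} lowers the score of documents at moderate distance from anchor r while leaving a document exactly on the anchor at score 0, so the update separates the sampled pair.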
19. Experiments: Questions
Correctness
• Is the proposed method a correct extension of the global GMML?
Accuracy
• Does our solution outperform the state-of-the-art LtR algorithms on accuracy?
Scalability
• Does our solution enjoy high computational efficiency and good scalability on large-scale datasets?
20. Datasets
Query-dependent dataset: CAL10K
  Subset      # of Features  # of Songs
  Audio       1,024          5,419
  Lyrics-128  128            2,000
  Lyrics-256  256            2,000
Query-independent datasets (query-document pairs)
  Name           # of Queries (Train / Test)  # of Doc. (Train / Test)  Rel. Levels  # of Features
  Yahoo! Set I   19,944 / 6,983               473,134 / 165,660         5            519
  Yahoo! Set II  1,266 / 3,798                34,815 / 103,174          5            596
  MSLR-WEB10K    6,000 / 2,000                723,412 / 235,259         5            136
  MSLR-WEB30K    31,531 / 6,306               3,771k / 753k             5            136
21. Global GMML vs Local GMML
22. Number of Local Metrics
23. Scalability: Training Time of L-GMML on Different
Scaled Synthetic Datasets
24. Comparison with R-MLR [Lim et al., ICML13]
  Training time (s)  R-MLR  L-GMML
  Audio              N/A    38
  Audio with PCA     607    4.7
  Lyrics-128         302    2.6
  Lyrics-256         1,241  7.8
25. Comparison between GBRT and Proposed L-GMML
26. Comparison between 𝜆-MART and Proposed L-GMML
27. Conclusions
• Developed a localized GMML algorithm for the query-independent ranking framework
[Diagram: query-dependent learning to rank works directly on Query, Doc ∈ ℝ^d via metric learning; query-independent learning to rank is abstracted as query-document pairs and bridged to metric learning through the ideal candidate document.]
• Experiments show the proposed method outperforms the state of the art
28. Source Code
https://github.com/yxsu/LtR.jl
Today I will present our work on the classical learning-to-rank task from a different perspective.
When we use a search engine, we expect it to return the most important or most relevant documents in the top-k positions.
In this situation, when we type a query phrase as input, the search engine returns a set of candidate documents.
So the problem is how to measure these documents' relevance to the input query.
In a modern search engine, queries and documents are preprocessed before users arrive.
For each query, the search engine stores a large set of candidate documents drawn from its document repository.
When a user inputs a query, for example query 2, the search engine takes the candidate set as input, applies a ranking system, and finally returns the ranked list of candidate documents to the user.
Therefore, the goal of "learning to rank for query-document pairs" is to rank the candidate documents for each query and return the top-k most relevant documents to the user.
We also call this task query-independent learning to rank, because the query instance does not belong to the same domain as the candidate documents. We focus on ranking documents instead of directly computing the similarity between the query phrase and the candidate documents.
This slide displays an example of document features.
In the Microsoft Learning to Rank datasets, each document is represented by a 136-dimensional vector.
I list just a few of the features as examples. Our task is to rank these documents according to these features.
For this task, we have an important observation: there are local structures in the data.
Learning to rank is not a new task; there are several excellent methods for it.
For example, these are the two most popular methods, and they also deliver state-of-the-art performance.
Both kinds of methods are extensions of decision trees; the only difference is how they compute the thresholds.
The drawbacks of decision trees are that they are sensitive to noise and that they carry no structural information, which matters greatly for theoretical analysis.
To overcome these drawbacks, we borrowed some ideas from manifold learning, because learning to rank is, at heart, a task of measuring similarity to certain ideal documents in a smart way.
When a document has a shorter distance to an ideal document, it should be labeled as highly relevant.
In our work, we employ metric learning to model this true distance.
The contributions of this work are two-fold.
First, we are the first to apply a metric learning algorithm to learning to rank for query-document pairs.
We are not the first to define a metric-learning framework for ranking problems, but the existing methods only compute the similarity between the query instance and the candidate documents, which is not suitable for this more popular learning-to-rank task.
To achieve this goal, we developed a local geometric mean metric learning algorithm from the original global version.
Our experiments show that the approach outperforms the state of the art in both accuracy and computational complexity.
In the following slides, I will introduce the proposed method.
Suppose there are three points on an unknown manifold. In this example, we can see that p_i is more similar to p_j than to p_k.
We attempt to define a new distance measurement, in place of the Euclidean distance, to model the actual distance between points.
So, in metric learning, we treat p_i and p_j as a similar pair, and p_i and p_k as a dissimilar pair.
Most metric learning frameworks, including this geometric mean metric learning method, take the similar and dissimilar sets as input and compute the metric.
We chose to extend this method because it is currently the fastest metric learning algorithm.
But the original GMML is not enough, because it is equivalent to a single linear transformation.
We extend the method with several basis metrics to describe complex structure.
This picture shows that the more basis metrics we use, the better we can describe a complex data structure.
This slide shows our proposed method.
We sample from the training data to learn several basis metrics.
Then we propose an evaluation function that combines these basis metrics.
Finally, to optimize the evaluation function, we use the popular NDCG scorer to learn the best weighting function.
To obtain a proper manifold, we need a smooth interpretation of the weighting function, because of this theorem.
So, for each anchor document, we define the smoothing weight function shown on the slide.
Then we can formulate our evaluation function for the learning-to-rank task as shown:
this function gives the relevance of document p to query q.
We can view this evaluation function as the distance between the query and an ideal candidate document, because of the form of the combination:
the shorter the distance, the higher the relevance.
We apply the popular WARP loss to optimize our evaluation function,
using the NDCG score here.
Then we optimize the evaluation function by computing the gradient with respect to Phi_q, the most important parameter of the evaluation function.
This is our overall algorithmic framework:
the first part learns the basis metrics,
and the second part optimizes the weighting function.
Before going deeper, we would like to pose the following questions for our experiments.
Since our method is an extension of the original global GMML, we care about the correctness of this extension.
We employ the following two types of datasets:
one for the query-dependent task,
and one for the query-independent task. These are all popular datasets for learning to rank.
This slide tackles the first question.
We synthesized datasets with different numbers of classes and class centers. We can see that when the number of basis metrics matches the true number of classes, our method performs best.
This shows that our method can correctly reveal the actual number of classes in the data structure.
This slide shows that the number of local metrics has a great impact on performance across different real-world datasets.
This slide addresses the third question.
The figure shows that our method has good scalability in terms of computational complexity.
This slide addresses the second question for the query-dependent task.
We can see that our method achieves similar accuracy but runs faster than the other methods.
This slide shows the comparison with the state-of-the-art methods on the query-independent task. The first is GBRT.
We can see that our method achieves a large improvement in accuracy over GBRT.
The situation is similar for the comparison between λ-MART and our proposed method.
In conclusion,
we developed a localized GMML algorithm for the query-independent ranking framework.
We are not the first to introduce metric learning to query-dependent learning to rank, which is based on similarity measurement.
But the existing work cannot be applied to the query-independent task, because the query and the documents do not lie in the same feature space.
So we are the first to solve this problem, by proposing the concept of an ideal candidate document.
Experiments also show that our method achieves a great improvement over the state of the art.