Makoto P. Kato
(University of Tsukuba)
Wiradee Imrattanatrai (Kyoto University), Takehiro Yamamoto,
Hiroaki Ohshima (University of Hyogo), and Katsumi Tanaka (Kyoto University)
Context-guided Learning to Rank Entities
= 0.9(Context-guided Learning) + 0.1(Learning to Rank Entities)
Learn to rank entities from their numerical attributes
and a subset of already-ranked entities
Goal
Ranking of Popular Countries (rows = entities; columns = numerical attributes):

Rank | Country     | GDP ($) | Military | Land area | … | Min. Temp.
1st  | Sweden      |     493 |        5 |       450 | … |        −50
2nd  | Canada      |   1,550 |       15 |     9,987 | … |        −40
3rd  | Switzerland |     664 |        3 |        41 | … |        −30
4th  | Australia   |   1,225 |       21 |     7,692 | … |        −20
5th  | Norway      |     388 |        5 |       323 | … |        −10

Learn: Popularity = +0.5 (Happiness) − 0.3 (# Suicides)
Attractiveness of cities =
+ 0.035 (Avg. life expectancy of women) − 0.032 (# Traffic accidents)
− 0.031 (Population / # Households)
Popularity of countries =
+ 0.058 (Happiness) − 0.057 (# Refugees) − 0.045 (# Suicides)
Peacefulness of countries =
+ 0.170 (Grain harvest) + 0.166 (GDP growth rate) − 0.126 (# Suicides)
Usability of cameras =
− 0.240 (Weight) − 0.213 (Height) + 0.133 (Max. shutter speed)
Real Examples from Experiments
If the ranking of entities can be learned,
the following applications become possible
Motivation
Ranking entities in a specified order:
given the query "safe countries", return
1. Iceland  2. New Zealand  3. Portugal  4. Austria  5. Denmark

Understanding rankings:
given the "Safe country ranking 2020"
(1. Iceland  2. New Zealand  3. Portugal  4. Austria  5. Denmark),
learn Safety = +0.5 (Police budget) − 0.8 (Crime rate)
Too many attributes for a small amount of training data
(known as over-fitting)
Challenge
A spurious model fits the training data:
Popularity = −1.0 (Min. Temp.)
This should hold only for these five countries:

Rank | Country     | GDP ($) | Military | Land area | … | Min. Temp.
1st  | Sweden      |     493 |        5 |       450 | … |        −50
2nd  | Canada      |   1,550 |       15 |     9,987 | … |        −40
3rd  | Switzerland |     664 |        3 |        41 | … |        −30
4th  | Australia   |   1,225 |       21 |     7,692 | … |        −20
5th  | Norway      |     388 |        5 |       323 | … |        −10
• Over-fitting
  – The learned model is highly accurate on seen data (training data)
    but not on unseen data (test data)
  – In general, it occurs when the number of features is large
    compared to the number of training instances
• Is it a serious problem?
  – If the number of attributes for an entity class is fixed,
    the only solution is to increase the size of the training data
Over-fitting?
Can you increase the number of entities?
e.g. the number of countries (max. ~200)
Sometimes yes, and sometimes no
Why not help rankers understand attributes through their contexts?
→ Context-guided Learning (CGL)
Observation
• It seems obvious to us that
  Popularity = −1.0 (Min. Temp.)
  is a wrong model for the popularity ranking
• Why?
  – We know the meaning of the minimum temperature,
    and that it (probably) has nothing to do with a country's popularity
  – We probably learned this by reading/listening to many sentences about
    "popularity" and "minimum temperature"
Key Idea
1. Introduced the problem of learning to rank entities
   using attributes as features
   – For ranking entities by various criteria and precisely understanding
     ranking criteria
2. Proposed Context-guided Learning (CGL)
   – A general ML method that uses contexts of labeling criteria and
     features to prevent over-fitting
3. Conducted experiments with a wide variety of orders,
   and demonstrated the effectiveness of CGL
   in the task of learning to rank entities
Contributions
Learn the weights of a linear model from training instances,
as well as from contexts of the labeling criteria and attributes
Context-guided Learning (CGL)
– Labeling criterion: the language expression used to explain how labels
  are given (e.g. "popularity")
– Context of x: sentences mentioning x

A linear model for "popularity":
Popularity = w1 (GDP) + w2 (# Suicides) + w3 (Min. Temp.)

Contexts suggesting negative correlation → w2 estimated as non-zero and negative:
  "A large # suicides affects the popularity of countries."
  "# suicides may indicate low popularity of the country."
Contexts suggesting no correlation → w3 estimated as zero:
  "While the min. temp. is low, the country is popular."
  "The country is cold but popular."
• Suppose we try to learn a linear model f(x) = wᵀx + b
• One weight vector fitting the training data is
  w = (1, 0), meaning that "warm countries are rich"
  – (0, 1) is another candidate for w, but there is no evidence
    on which is better
Example: Learning without CGL
Attributes and Labels of Entities
(l = labeling criterion; a1, a2 = attributes):

Entity | Rich (l) | Temp. (a1) | GDP (a2)
x1     |       +1 |         14 |        9
x2     |       +1 |         13 |        4
x3     |       −1 |          3 |        1

[Figure: entities x1, x2, x3 plotted in the (Temp., GDP) plane with a
decision boundary defined by w, the weights of the linear function]
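The ambiguity above can be checked in a few lines with the slide's toy data; both candidate weight vectors separate the three training points perfectly (the bias values −8 and −2.5 are hand-picked for illustration):

```python
import numpy as np

# Toy data from the slide: columns are (Temp., GDP); labels are "rich?" (+1/-1).
X = np.array([[14.0, 9.0],   # x1: rich
              [13.0, 4.0],   # x2: rich
              [ 3.0, 1.0]])  # x3: not rich
y = np.array([+1, +1, -1])

def separates(w, b):
    """True if sign(w.x + b) matches every label."""
    return bool(np.all(np.sign(X @ w + b) == y))

# Two very different models fit the training data equally well:
print(separates(np.array([1.0, 0.0]), -8.0))   # "warm countries are rich"
print(separates(np.array([0.0, 1.0]), -2.5))   # "high-GDP countries are rich"
```

With only three instances, the data alone cannot distinguish the two hypotheses; CGL supplies the missing evidence through contexts.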
g is a weight vector "roughly" estimated from the contexts;
g is expected to be somewhat close to the ideal weight
Example: Learning with CGL 1/2
Contexts of l (usually derived from a Web corpus):
  For "temp.":
    "… The average temp. of the lobster-rich waters …"
    "… The effect of rich air/fuel ratios and temp. …"
    "… Culturally-rich country has moderate temp. …"
  For "GDP":
    "… GDP is a key factor for richness. …"
    "… Rich countries have high GDP. …"
    "… Rich regions, where GDP was above the EU-28 …"

[Figure: the contexts c1, c2 are used to predict g in the (Temp., GDP) plane,
alongside the labeled entities x1 (+1), x2 (+1), x3 (−1)]
CGL estimates w as w = g + v,
where the difference v is expected to be small
The contexts provide evidence supporting w = (0, 1),
meaning that "a high GDP indicates richness"
Example: Learning with CGL 2/2
[Figure: starting from the context-predicted g, the learned w = g + v defines
a decision boundary separating x1, x2 (+1) from x3 (−1) in the (Temp., GDP)
plane; the table of attributes and labels is the same as in the previous slide]
• Linear function f_k to rank entities in order k
  (we assume there are several orders to be learned):

  f_k(x_i) = Σ_{j=1}^{M} w_{k,j} x_{i,j}

  where x_{i,j} is the j-th attribute value of the i-th entity
  and w_{k,j} is the weight for the j-th attribute
• Weight model:

  w_{k,j} = uᵀ c_{k,j} + v_{k,j}

  where u is a weight vector for context vectors, c_{k,j} is the context
  vector for order k and the j-th attribute, and v_{k,j} is the weight
  that cannot be explained by contexts alone
Formalization
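The two formulas above can be sketched directly. A minimal example reusing the 3-dimensional context vectors and u = (0, 0.5, 1) from the Context Model slide (the residuals v_{k,j} are set to zero here for simplicity; the entity's attribute values are made up):

```python
import numpy as np

u   = np.array([0.0, 0.5, 1.0])      # shared weight vector for context vectors
C_k = np.array([[1.2, 0.0, 0.1],     # c_{k,1}: context vector of attribute 1
                [0.0, 2.2, 1.7]])    # c_{k,2}: context vector of attribute 2
v_k = np.array([0.0, 0.0])           # residual weights v_{k,j} (zero here)

def weights(u, C_k, v_k):
    # Weight model: w_{k,j} = u^T c_{k,j} + v_{k,j} for every attribute j
    return C_k @ u + v_k

def f_k(x, u, C_k, v_k):
    # Ranking function: f_k(x_i) = sum_j w_{k,j} x_{i,j}
    return float(weights(u, C_k, v_k) @ x)

x = np.array([3.0, 1.0])             # a (made-up) entity's attribute values
print(weights(u, C_k, v_k))          # per-attribute weights w_{k,j}
print(f_k(x, u, C_k, v_k))           # the entity's ranking score
```

Because u is shared across all orders while v_{k,j} is per order and attribute, contexts steer every order's weights while the residuals absorb what contexts cannot explain.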
Any model such as TF-IDF, doc2vec, or Sentence-BERT can be applied
to the contexts to generate context vectors
Context Model

Contexts of l (usually derived from a Web corpus):
  For "temp." (→ c_{1,1}):
    "… The average temp. of the lobster-rich waters …"
    "… The effect of rich air/fuel ratios and temp. …"
    "… Culturally-rich country has moderate temp. …"
  For "GDP" (→ c_{1,2}):
    "… GDP is a key factor for richness. …"
    "… Rich countries have high GDP. …"
    "… Rich regions, where GDP was above the EU-28 …"

Example: if c_{1,1} = (1.2, 0, 0.1), c_{1,2} = (0, 2.2, 1.7), and u = (0, 0.5, 1),
then uᵀc_{1,1} = 0.1 and uᵀc_{1,2} = 2.8;
u determines how to estimate the weight from the context vector
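One way such context vectors could be produced is TF-IDF over the pooled context sentences of each attribute. A from-scratch sketch (the tiny vocabulary, the toy "documents", and the smoothed-IDF variant are illustrative choices, not the paper's exact weighting scheme):

```python
import math
from collections import Counter

def tfidf(docs, vocab):
    """One TF-IDF vector per document over a fixed vocabulary (smoothed IDF)."""
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1 for t in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d)                       # term frequencies for this doc
        vecs.append([tf[t] * idf[t] for t in vocab])
    return vecs

# One "document" per attribute: its context sentences, tokenized and pooled.
ctx_temp = "the average temp of the lobster rich waters".split()
ctx_gdp  = "rich countries have high gdp".split()
vocab = ["temp", "gdp", "rich"]

c_temp, c_gdp = tfidf([ctx_temp, ctx_gdp], vocab)
print(c_temp)   # nonzero on "temp", zero on "gdp"
print(c_gdp)    # nonzero on "gdp", zero on "temp"
```

Terms shared by many attributes' contexts (here "rich", the labeling criterion itself) get down-weighted by IDF, so the vectors emphasize what distinguishes each attribute.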
• Find the solution of this optimization problem:

  min_{u, v_k, ξ_{k,i}}  ‖u‖² + (c/K) Σ_{k=1}^{K} ‖v_k‖² + C Σ_{k=1}^{K} Σ_{i=1}^{N_k} ξ_{k,i}

  – subject, for k = 1, …, K and i = 1, …, N_k, to the constraints

    f_k(x_i^sup) − f_k(x_i^inf) ≥ 1 − ξ_{k,i}

  The first two terms are regularization terms similar to SVM, and the
  ξ_{k,i} are slack variables similar to SVM. Each constraint encodes that
  the rank of x_i^sup is higher than that of x_i^inf in the training data,
  similar to RankingSVM.
• Can be solved by SVM solvers with a special kernel
Learning of CGL
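A rough, self-contained analogue of this optimization replaces the slack variables with a hinge loss and minimizes it by subgradient descent (a simplified sketch on synthetic data, not the paper's SVM solver; all dimensions, constants, and the pair-generation step are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, D, N = 2, 3, 4, 30              # orders, attributes, context dims, entities
Cvec = rng.normal(size=(K, M, D))     # made-up context vectors c_{k,j}
X = rng.normal(size=(N, M))           # entity attribute values

# Ground-truth weights, used only to label training pairs (x_sup, x_inf).
w_true = rng.normal(size=(K, M))
pairs = [[(i, j) for i in range(N) for j in range(N)
          if X[i] @ w_true[k] > X[j] @ w_true[k] + 0.5][:50]
         for k in range(K)]

def weights(u, V):
    # w_{k,j} = u . c_{k,j} + v_{k,j}, computed for all k, j at once
    return Cvec @ u + V               # shape (K, M)

def grad(u, V, c_reg=0.1, C_hinge=1.0):
    # Subgradient of ||u||^2 + (c/K) sum_k ||v_k||^2 + C * sum hinge(margin)
    W = weights(u, V)
    gu, gV = 2.0 * u, (2.0 * c_reg / K) * V
    for k in range(K):
        for i, j in pairs[k]:
            if (X[i] - X[j]) @ W[k] < 1.0:      # hinge (slack) is active
                g_w = -C_hinge * (X[i] - X[j])  # gradient w.r.t. w_k
                gu += Cvec[k].T @ g_w           # chain rule through u
                gV[k] += g_w                    # chain rule through v_k
    return gu, gV

u, V = np.zeros(D), np.zeros((K, M))
for _ in range(300):
    gu, gV = grad(u, V)
    u -= 0.01 * gu
    V -= 0.01 * gV

W = weights(u, V)
acc = np.mean([(X[i] - X[j]) @ W[k] > 0 for k in range(K) for i, j in pairs[k]])
print(f"pairwise training accuracy: {acc:.2f}")
```

Minimizing the hinge on each (x_sup, x_inf) pair plays the role of the slack variables: a pair incurs loss exactly when its score margin falls below 1, mirroring the constraint f_k(x_i^sup) − f_k(x_i^inf) ≥ 1 − ξ_{k,i} with ξ_{k,i} ≥ 0.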
Experiments
Experiments were conducted with three entity classes:

                       | Cities          | Countries   | Cameras
# Entities             | 47              | 138         | 149
# Orders               | 64              | 40          | 54
# Attributes           | 137             | 83          | 16
Examples of Orders     | Attractiveness, | Livability, | Portability,
                       | Richness        | Safety      | Usability
Examples of Attributes | Population,     | # Visitors, | Resolution,
                       | Crime rate      | # Suicides  | Weight

Half of the ranked entities were used as training data, and we examined
whether the remaining entities could be ranked correctly.
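The evaluation protocol can be sketched as follows: split the ranked entities in half and measure pairwise accuracy on the held-out half (synthetic data; a scorer with the true weights is used only to exercise the metric, not as the learned model):

```python
import numpy as np

rng = np.random.default_rng(1)
entities = rng.normal(size=(20, 5))           # 20 entities, 5 attributes
true_w = np.array([1.0, -0.5, 0.0, 0.0, 0.2]) # hypothetical ranking criterion
order = np.argsort(-(entities @ true_w))      # ground-truth ranking (descending)

train, test = order[::2], order[1::2]         # half for training, half held out

def pairwise_accuracy(scores, ranking):
    """Fraction of held-out pairs ranked in the correct relative order."""
    ok = total = 0
    for a in range(len(ranking)):
        for b in range(a + 1, len(ranking)):
            hi, lo = ranking[a], ranking[b]   # hi is ranked above lo
            ok += scores[hi] > scores[lo]
            total += 1
    return ok / total

# A scorer that reproduces the true criterion gets every held-out pair right.
print(pairwise_accuracy(entities @ true_w, test))  # 1.0
```

A learned model would replace `entities @ true_w` with its own scores; accuracy below 1.0 then reflects exactly the pairs it orders incorrectly.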
• Baselines
  – RankNet
  – RankBoost
  – Linear-Feature
    (a linear feature-based model optimized by coordinate ascent)
  – LambdaMART
  – ListNet
• Proposed Methods
  – CGL (TF-IDF)
    • The TF-IDF weighting scheme was used as the context model
  – CGL (Distributed)
    • The paragraph vector was used as the context model
Comparative Methods
Context-guided Learning (CGL) worked well (+16%) for every class of entities;
there was no significant difference between the two context models
Experimental Results
[Bar chart: accuracy (0–0.9) of RankNet, RankBoost, Linear-Feature,
LambdaMART, ListNet, CGL (TF-IDF), and CGL (Distributed)
for City, Country, Camera, and Total]
Attractiveness of cities =
+ 0.035 (Avg. life expectancy of women) − 0.032 (# Traffic accidents)
− 0.031 (Population / # Households)
Popularity of countries =
+ 0.058 (Happiness) − 0.057 (# Refugees) − 0.045 (# Suicides)
Peacefulness of countries =
+ 0.170 (Grain harvest) + 0.166 (GDP growth rate) − 0.126 (# Suicides)
Usability of cameras =
− 0.240 (Weight) − 0.213 (Height) + 0.133 (Max. shutter speed)
Real Examples from Experiments
User Study
• Evaluated the learned models by crowdsourcing
  – "If you agree that there is a correlation between <labeling criterion>
    and <attribute>, please assign a score of +2. If you disagree, please
    assign a score of −2. If you can neither agree nor disagree, please
    assign a score of 0."
• Compared CGL and Linear-Feature
• CGL was slightly better
1. Introduced the problem of learning to rank entities
   using attributes as features
2. Proposed Context-guided Learning (CGL)
3. Conducted experiments with a wide variety of orders,
   and demonstrated the effectiveness of CGL
   in the task of learning to rank entities
Summary
Questions are welcome at
https://www.mpkato.net/