Learning to Rank Entity Relatedness Through Embedding-Based Features

Pierpaolo Basile, Annalina Caputo, Gaetano Rossiello, Giovanni Semeraro
gaetano.rossiello@uniba.it
Department of Computer Science
University of Bari - Aldo Moro, Italy
23 June 2016
NLDB 2016 - 21st International Conference on Applications of Natural Language to Information Systems

Entity Relatedness
Entity Relatedness tries to capture the strength of the relationship
between named entities or concepts
P. Basile, A. Caputo, G. Rossiello, G. Semeraro Learning to Rank Entity Relatedness

Entity Relatedness - Applications
Entity relatedness, as a semantic measure, plays a key role in:
Natural Language Processing
Information Retrieval
Question Answering
Entity Linking
Entity Recommendation

Entity Relatedness - Example
Is Stanley Kubrick more related to Beethoven or Mozart?

Stanley Kubrick is more related to Beethoven

Why?

A Clockwork Orange is a film directed by Stanley Kubrick whose
soundtrack features music by Beethoven

Where can we get this knowledge?

From Wikipedia: each article describes a real-world entity or concept

Entity Relatedness - State-of-the-art
Most methods that exploit Wikipedia for entity relatedness focus
on a single aspect at a time
Previously proposed measures rely on statistics computed over
Wikipedia content:
Joint probability
Conditional probability
Entropy
Kullback-Leibler divergence
Co-citation
Jaccard similarity
Chi-square (χ²) test
...
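To make a few of these statistics concrete, the sketch below computes them from in-link sets. It assumes one common set of definitions: P(a, b) as the fraction of all pages linking to both entities, P(b | a) normalized by the in-links of a, Jaccard over in-link sets, and co-citation as the raw overlap count. The exact formulations behind the state-of-the-art features may differ, and the function names are illustrative only.

```python
from typing import Set

# ASSUMPTION: each entity is represented by the set of pages linking to it
# (its in-links); total_pages is the size of the page collection.


def joint_probability(in_a: Set[str], in_b: Set[str], total_pages: int) -> float:
    """P(a, b): fraction of all pages that link to both a and b."""
    return len(in_a & in_b) / total_pages


def conditional_probability(in_a: Set[str], in_b: Set[str]) -> float:
    """P(b | a): fraction of the pages linking to a that also link to b."""
    return len(in_a & in_b) / len(in_a) if in_a else 0.0


def jaccard(in_a: Set[str], in_b: Set[str]) -> float:
    """Jaccard similarity of the two in-link sets."""
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 0.0


def co_citation(in_a: Set[str], in_b: Set[str]) -> int:
    """Number of pages that cite (link to) both a and b."""
    return len(in_a & in_b)


if __name__ == "__main__":
    # Toy in-link sets with hypothetical page ids, not real Wikipedia data.
    in_a = {"p1", "p2", "p3", "p4"}
    in_b = {"p3", "p4", "p5"}
    print(joint_probability(in_a, in_b, total_pages=10))  # 0.2
    print(conditional_probability(in_a, in_b))  # 0.5
    print(jaccard(in_a, in_b))  # 0.4
    print(co_citation(in_a, in_b))  # 2
```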

The Idea
Wikipedia provides evidence of different kinds of relatedness:
Textual content of articles
Hyperlink graph structure
Hierarchical organization of categories
The idea is to combine different measures into a unified framework
to obtain a more effective relatedness measure
Combining such measures has already proved very effective within a
Learning to Rank framework [1]
[1] Ceccarelli, Diego, et al. "Learning relatedness measures for entity linking." Proceedings of the 22nd ACM
international conference on Conference on information & knowledge management. ACM, 2013.

Our contributions
We define a new set of features based on word/link
embeddings
We test these features within a learning to rank framework
We evaluate the contribution of each of these features through
a feature selection algorithm

Distributional Space Models
Three different Distributional Space Models are built from
Wikipedia content using the Word2Vec tool:
Entity (e) space: built only on the entities occurring in
Wikipedia pages
Entity&Word (e&w) space: built on both the entities and the words
occurring in Wikipedia pages
Abstract (a) space: built only on the abstracts of Wikipedia pages
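The slides do not detail how each training corpus is derived. A minimal sketch, assuming each page has already been parsed into segments of plain text and link anchors annotated with their target entity, shows one way to produce training sentences for the e and e&w spaces. The `Segment` format, the `ENTITY/` prefix and both function names are hypothetical.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: a page is a list of (surface_text, linked_entity_or_None) segments.
Segment = Tuple[str, Optional[str]]


def entity_sequence(page: List[Segment]) -> List[str]:
    """Sentence for the entity (e) space: keep only the linked entities, in order."""
    return [entity for _, entity in page if entity is not None]


def entity_word_sequence(page: List[Segment]) -> List[str]:
    """Sentence for the entity&word (e&w) space: plain words are kept
    (lowercased), while each linked mention is collapsed into one entity token.
    The ENTITY/ prefix keeps entity tokens distinct from ordinary words."""
    tokens: List[str] = []
    for text, entity in page:
        if entity is not None:
            tokens.append("ENTITY/" + entity.replace(" ", "_"))
        else:
            tokens.extend(text.lower().split())
    return tokens


if __name__ == "__main__":
    page = [
        ("A Clockwork Orange", "A_Clockwork_Orange"),
        ("is a film directed by", None),
        ("Stanley Kubrick", "Stanley_Kubrick"),
    ]
    print(entity_sequence(page))
    print(entity_word_sequence(page))
```

Each token list can then serve as one training sentence for a skip-gram model, for instance gensim 4's `Word2Vec(sentences, vector_size=200, sg=1)`; gensim is only an example here, not necessarily the configuration the authors used.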

Embedding-Based Features
Given two entities $e_i$ and $e_j$, we define a new set of features:
$W2V_{e}(e_i, e_j)$: cosine similarity between the vectors of $e_i$ and $e_j$
built in the space e
$W2V_{e\&w}(e_i, e_j)$: cosine similarity between the vectors of $e_i$ and $e_j$
built in the space e&w
$W2V_{a}(e_i, e_j)$: cosine similarity between the vectors of $e_i$ and $e_j$
built in the space a
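A minimal sketch of the three features, assuming each space is available as a plain mapping from entity id to vector. The `spaces` dictionary, its keys and the 0.0 fallback for an entity missing from a space are assumptions, not details given in the slides.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_features(
    ei: str, ej: str, spaces: Dict[str, Dict[str, Vector]]
) -> Dict[str, float]:
    """One cosine feature per space, e.g. spaces keyed by 'e', 'e&w' and 'a'.
    ASSUMPTION: if an entity has no vector in a space, the feature is 0.0."""
    features: Dict[str, float] = {}
    for name, vectors in spaces.items():
        u = vectors.get(ei)
        v = vectors.get(ej)
        features["W2V_" + name] = cosine(u, v) if u is not None and v is not None else 0.0
    return features


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real spaces have 200 or 300 dimensions.
    spaces = {"e": {"A": [1.0, 0.0], "B": [1.0, 1.0]}, "a": {"A": [0.0, 1.0]}}
    print(embedding_features("A", "B", spaces))
```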

Vector Space Features
As a baseline for the proposed embedding-based features, we
define two additional measures in a standard vector space of links:
$vsm_{in}(e_i, e_j)$: cosine similarity between vectors built on the in-links
of pages $e_i$ and $e_j$
$vsm_{out}(e_i, e_j)$: cosine similarity between vectors built on the
out-links of pages $e_i$ and $e_j$
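The slides do not state how link vectors are weighted. Assuming binary vectors (1 if the link exists, 0 otherwise), cosine similarity reduces to the set formula |A ∩ B| / sqrt(|A| · |B|), sketched below; `in_links` and `out_links` are hypothetical mappings from a page to its link set.

```python
import math
from typing import Dict, Set

LinkIndex = Dict[str, Set[str]]  # page -> set of linked pages (hypothetical format)


def link_cosine(links_a: Set[str], links_b: Set[str]) -> float:
    """Cosine between binary link vectors: |A & B| / sqrt(|A| * |B|)."""
    if not links_a or not links_b:
        return 0.0
    return len(links_a & links_b) / math.sqrt(len(links_a) * len(links_b))


def vsm_in(ei: str, ej: str, in_links: LinkIndex) -> float:
    """Cosine over the pages that link *to* ei and ej."""
    return link_cosine(in_links.get(ei, set()), in_links.get(ej, set()))


def vsm_out(ei: str, ej: str, out_links: LinkIndex) -> float:
    """Cosine over the pages linked *from* ei and ej."""
    return link_cosine(out_links.get(ei, set()), out_links.get(ej, set()))


if __name__ == "__main__":
    in_links = {"A": {"x", "y"}, "B": {"y", "z"}}
    print(vsm_in("A", "B", in_links))  # 0.5
```

With tf-idf or other weights, the set formula no longer applies and the full weighted cosine would be needed instead.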

Evaluation Goal
The goal of the evaluation is twofold:
Prove the effectiveness of the proposed embedding-based relatedness
measures
Provide an in-depth feature analysis based on the feature
selection algorithm

Evaluation Setup
Dataset: subset of the CoNLL 2003 named entity recognition task
Training: 957,622 pairs
Validation: 361,984 pairs
Test: 295,886 pairs
Word2Vec: Skip-gram model
$W2V_{e}$: 200 dimensions
$W2V_{e\&w}$: 300 dimensions
$W2V_{a}$: 200 dimensions
L2R: LambdaMART algorithm with nDCG@10
Feature selection: Kendall τ measure for ranking
Learning to Rank runs:
SOA: 27 state-of-the-art features, nDCG@10 = 0.8050
ALL: SOA + our 5 features, nDCG@10 = 0.8187 (+1.702%)
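The runs are compared by nDCG@10 and by relative change. The slides do not give the nDCG formula, so the sketch below assumes the common exponential-gain formulation with log2 rank discounts; with binary YES/NO labels, 2^rel - 1 equals rel, so the choice of gain makes no difference. It also reproduces the +1.702% gain of ALL over SOA.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k with gain (2^rel - 1) and discount log2(position + 1), positions from 1."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based here
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k normalized by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_change(new: float, base: float) -> float:
    """Percentage change of `new` with respect to `base`."""
    return 100.0 * (new - base) / base


if __name__ == "__main__":
    # Relevance labels of one query's candidates, in the order a model ranked them.
    print(ndcg_at_k([1, 0, 1, 0]))
    print(round(relative_change(0.8187, 0.8050), 3))  # 1.702 (ALL vs SOA)
```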

Evaluation Results
Legend: n@10 S = nDCG@10 of the feature used alone; n@10 C = nDCG@10 when the features ranked 1 to # are combined;
%∆ALL / %∆SOA = relative change of n@10 C with respect to the ALL and SOA runs
#     Id  Feature                       n@10 S  n@10 C  %∆ALL   %∆SOA
1     21  Joint probability             0.7215  0.6443  -21.30  -20.04
2     14  KL divergence                 0.2844  0.6657  -18.69  -17.39
3     2   Probability of e2             0.4622  0.6855  -16.27  -14.93
4     29  W2V_e                         0.5471  0.7595  -7.23   -5.75
5     4   Entropy of e2                 0.4622  0.7672  -6.29   -4.79
6     26  χ² on out-links               0.6046  0.7790  -4.85   -3.33
7     30  W2V_e&w                       0.5879  0.7860  -3.99   -2.46
8     24  χ² on in-links                0.6884  0.7913  -3.35   -1.80
9     28  W2V_a                         0.4916  0.7927  -3.18   -1.63
10    25  χ² on in-out links            0.6668  0.7929  -3.15   -1.60
11    16  Co-citation on in-out links   0.5974  0.8079  -1.32   0.26
...   ... ...                           ...     ...     ...     ...
17    32  VSM_out                       0.5938  0.8158  -0.35   1.24
25    31  VSM_in                        0.5028  0.8183  -0.05   1.55
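The slides state that feature selection relies on Kendall's τ. One common use of τ is to measure how similarly two features rank the same candidates, so that a value close to 1 flags redundant features. The sketch below computes Kendall's tau-a between two score lists; how the authors combine τ with feature importance is not specified in the slides, so the selection step itself is left out.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists over the same items.
    Pairs tied in either list count as neither concordant nor discordant."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    n = len(x)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Scores two hypothetical features assign to the same three candidate pairs.
    print(kendall_tau([0.9, 0.5, 0.1], [0.8, 0.2, 0.4]))
```

For large candidate sets, an O(n log n) implementation such as `scipy.stats.kendalltau` is preferable to this quadratic loop.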

Evaluation Results
Learning curve (figure): NDCG@10 on the y axis (0.65 to 0.80), x axis from 0 to 30, with series labelled ALL and SOA

Learning to Rank for Entity Relatedness
Given two entities $e_i$ and $e_j$, we want to learn a function $r(e_i, e_j)$
that predicts their degree of relatedness
A learning to rank model is trained over a set of features describing
the relatedness between entity pairs
Entity 1           Entity 2                       Relevant
Germany            United Kingdom                 YES
Germany            British Empire                 NO
Germany            Brussels                       YES
Germany            Commissioner of Baseball       NO
Cleveland Indians  Mark Acre                      YES
Cleveland Indians  Art Howe                       YES
Cleveland Indians  Athletics                      NO
Cleveland Indians  1972 Chicago White Sox season  NO
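The example table can be read as query groups, the usual input format for learning to rank: each Entity 1 acts as a query and the entities paired with it are the candidates to rank. The sketch below assumes that reading; `group_by_query`, `rank_candidates` and the scorer `r` are hypothetical names, and `r` stands in for the trained model applied to the feature vector of a pair.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (entity 1, entity 2, relevant?) rows from the example table.
PAIRS: List[Tuple[str, str, bool]] = [
    ("Germany", "United Kingdom", True),
    ("Germany", "British Empire", False),
    ("Germany", "Brussels", True),
    ("Germany", "Commissioner of Baseball", False),
    ("Cleveland Indians", "Mark Acre", True),
    ("Cleveland Indians", "Art Howe", True),
    ("Cleveland Indians", "Athletics", False),
    ("Cleveland Indians", "1972 Chicago White Sox season", False),
]


def group_by_query(pairs: List[Tuple[str, str, bool]]) -> Dict[str, List[Tuple[str, bool]]]:
    """Treat each entity 1 as a query and collect its labelled candidates."""
    groups: Dict[str, List[Tuple[str, bool]]] = defaultdict(list)
    for e1, e2, relevant in pairs:
        groups[e1].append((e2, relevant))
    return dict(groups)


def rank_candidates(
    query: str, candidates: List[str], r: Callable[[str, str], float]
) -> List[str]:
    """Order candidates by the relatedness score r(query, candidate), highest first."""
    return sorted(candidates, key=lambda c: r(query, c), reverse=True)


if __name__ == "__main__":
    for query, cands in group_by_query(PAIRS).items():
        print(query, [e2 for e2, _ in cands])
```

During training, the model sees every candidate of a query together, so a listwise learner such as LambdaMART can optimize the ordering within each group rather than scoring pairs in isolation.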
