Learning to Rank Entity Relatedness Through
Embedding-Based Features
Pierpaolo Basile, Annalina Caputo, Gaetano Rossiello, Giovanni Semeraro
gaetano.rossiello@uniba.it
Department of Computer Science
University of Bari - Aldo Moro, Italy
23 June 2016
NLDB 2016 - 21st International Conference on Applications of Natural Language to Information Systems
Entity Relatedness
Entity Relatedness tries to capture the strength of the relationship
between named entities or concepts
P. Basile, A. Caputo, G. Rossiello, G. Semeraro Learning to Rank Entity Relatedness
Entity Relatedness - Applications
Entity relatedness, as a semantic measure, plays a key role in:
Natural Language Processing
Information Retrieval
Question Answering
Entity Linking
Entity Recommendation
Entity Relatedness - Example
Is Stanley Kubrick more related to Beethoven or Mozart?
Entity Relatedness - Example
Stanley Kubrick is more related to Beethoven
Why?
Entity Relatedness - Example
A Clockwork Orange is a movie directed by Stanley Kubrick whose
soundtrack contains music by Beethoven
Where can we get this knowledge?
Entity Relatedness - Example
Each Wikipedia article describes a real-world entity or concept
Entity Relatedness - State-of-the-art
Most methods that exploit Wikipedia for entity
relatedness focus on a single aspect at a time
Previously proposed measures rely on statistics computed
from the Wikipedia content:
Joint probability
Conditional probability
Entropy
Kullback-Leibler divergence
Co-citation
Jaccard similarity
Chi-square statistic test
...
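
A few of the measures listed above can be sketched over link sets. This is a minimal illustration, assuming each entity is represented by the set of pages that link to it; the function names and the toy page identifiers (p1, p2, ...) are hypothetical and do not come from the slides.

```python
def joint_probability(in_a, in_b, n_pages):
    """P(a, b): fraction of all pages that link to both entities."""
    return len(in_a & in_b) / n_pages


def conditional_probability(in_a, in_b):
    """P(a | b): share of the pages linking to b that also link to a."""
    return len(in_a & in_b) / len(in_b) if in_b else 0.0


def jaccard(in_a, in_b):
    """Size of the intersection over size of the union of the two link sets."""
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 0.0


# Toy in-link sets (hypothetical page ids) in a collection of 10 pages.
in_a = {"p1", "p2", "p3", "p4"}
in_b = {"p3", "p4", "p5"}

joint = joint_probability(in_a, in_b, n_pages=10)
cond = conditional_probability(in_a, in_b)
jac = jaccard(in_a, in_b)
```

Each function looks at one statistical aspect of the same two link sets, which is the single-aspect limitation this slide points out.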
The Idea
Wikipedia provides evidence of different kinds of relatedness:
Textual content of articles
Hyperlink graph structure
Hierarchical organization of categories
The idea is to combine different measures in a unified framework
to estimate relatedness more effectively
Combining such measures has proved very effective in a
Learning to Rank framework [1]
[1] Ceccarelli, Diego, et al. "Learning relatedness measures for entity linking." Proceedings of the 22nd ACM
International Conference on Information & Knowledge Management. ACM, 2013.
Our contributions
We define a new set of features based on word/link
embeddings
We test these features within a learning to rank framework
We evaluate the contribution of each of these features through
a feature selection algorithm
Distributional Space Models
Three different Distributional Space Models are built on
Wikipedia content using the Word2Vec tool:
Entity (e) Space built only on the entities occurring in the
Wikipedia pages
Entity&Word (e&w) Space built on both entities and words
that occur in the Wikipedia pages
Abstract (a) Space built only on the Wikipedia page abstracts
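
One way the training sequences for the Entity and Entity&Word spaces could be derived is sketched below. It assumes, purely for illustration, that entities are marked in the page text with wiki-style links such as [[Target|anchor]]; the ENTITY/ prefix, the function names, and the tokenization are hypothetical choices, not the authors' pipeline. The Abstract space is not covered here.

```python
import re

# Matches [[Target]] and [[Target|anchor text]]; group 1 is the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def entity_id(match):
    """Turn a link target into a single token (hypothetical naming scheme)."""
    return "ENTITY/" + match.group(1).strip().replace(" ", "_")


def entity_tokens(wikitext):
    """Entity space (e): keep only the linked entities, in order of occurrence."""
    return [entity_id(m) for m in LINK.finditer(wikitext)]


def entity_word_tokens(wikitext):
    """Entity&Word space (e&w): entities and surrounding words in one sequence."""
    tokens, pos = [], 0
    for m in LINK.finditer(wikitext):
        tokens.extend(wikitext[pos:m.start()].lower().split())
        tokens.append(entity_id(m))
        pos = m.end()
    tokens.extend(wikitext[pos:].lower().split())
    return tokens


text = "[[Stanley Kubrick]] directed [[A Clockwork Orange (film)|A Clockwork Orange]]"
e_seq = entity_tokens(text)
ew_seq = entity_word_tokens(text)
```

Sequences like these could then be passed as sentences to a Word2Vec implementation, so that entities occurring in similar contexts end up with similar vectors.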
Embedding-Based Features
Given two entities e_i and e_j we define a new set of features:
W2V_e(e_i, e_j): Cosine similarity computed between vectors built in
the space e
W2V_e&w(e_i, e_j): Cosine similarity computed between vectors built
in the space e&w
W2V_a(e_i, e_j): Cosine similarity computed between vectors built in
the space a
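
The three features share the same computation and differ only in the space the vectors come from. The sketch below assumes each space is available as a dictionary from entity name to vector; the toy 2-dimensional vectors, the entity names A and B, and the choice of 0.0 for entities missing from a space are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_features(e_i, e_j, spaces):
    """Return one cosine feature per space, keyed as W2V_<space name>."""
    features = {}
    for name, vectors in spaces.items():
        if e_i in vectors and e_j in vectors:
            features["W2V_" + name] = cosine(vectors[e_i], vectors[e_j])
        else:
            # Assumption: an entity missing from a space contributes 0.0.
            features["W2V_" + name] = 0.0
    return features


# Toy spaces with made-up vectors (the real spaces use 200 or 300 dimensions).
spaces = {
    "e": {"A": [1.0, 0.0], "B": [1.0, 1.0]},
    "e&w": {"A": [0.0, 2.0], "B": [0.0, 3.0]},
    "a": {"A": [1.0, 0.0]},  # B has no vector in this space
}
feats = embedding_features("A", "B", spaces)
```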
Vector Space Features
As a baseline for comparison with the proposed embedding-based features, we
define two additional measures in a standard vector space of links:
VSM_in(e_i, e_j): Cosine similarity between vectors built on the in-links
of pages e_i and e_j
VSM_out(e_i, e_j): Cosine similarity between vectors built on the
out-links of pages e_i and e_j
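
For link vectors, the cosine can be computed directly from link sets. The sketch assumes binary weights (a link is either present or absent), which the slides do not specify; with that assumption the cosine reduces to the number of shared links divided by the geometric mean of the two set sizes. Page identifiers are hypothetical.

```python
import math


def link_cosine(links_i, links_j):
    """Cosine between binary link vectors, computed from the two link sets."""
    if not links_i or not links_j:
        return 0.0
    shared = len(links_i & links_j)
    return shared / math.sqrt(len(links_i) * len(links_j))


# Toy in-link sets: two of four links are shared.
in_links_i = {"p1", "p2", "p3", "p4"}
in_links_j = {"p3", "p4", "p5", "p6"}
vsm_in = link_cosine(in_links_i, in_links_j)
```

The same function applies to out-link sets; only the input sets change.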
Evaluation Goal
The goal of the evaluation is twofold:
Demonstrate the effectiveness of the proposed relatedness measures
based on embeddings
Provide an in-depth feature analysis using the feature
selection algorithm
Evaluation Setup
Dataset: Subset of the CoNLL 2003 entity recognition task
  Training: 957,622 pairs
  Validation: 361,984 pairs
  Test: 295,886 pairs
Word2Vec: Skip-gram model
  W2V_e: 200 dim
  W2V_e&w: 300 dim
  W2V_a: 200 dim
L2R: LambdaMART algorithm with nDCG@10
Feature Selection: Kendall τ measure for ranking
Learning to Rank runs:
  SOA: 27 state-of-the-art features, nDCG@10 0.8050
  ALL: SOA + our 5 features, nDCG@10 0.8187 (+1.702%)
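
Since the runs are scored with nDCG@10, a short sketch of that metric may help. It uses one common formulation (gain 2^rel - 1 with a log2 rank discount); the slides do not say which variant was used, and with binary labels this gain is simply 0 or 1. Function names are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items of a ranked list."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the candidates in the order the model ranked them.
ranked_labels = [1, 0, 1, 0]
score = ndcg_at_k(ranked_labels, k=10)
```

A ranking that places every relevant candidate first scores 1.0; the toy ranking above scores lower because a non-relevant candidate sits at rank 2.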
Evaluation Results
#    Id   Description           nDCG@10 S   nDCG@10 C   %∆ALL    %∆SOA
1    21   Joint probability     0.7215      0.6443      -21.30   -20.04
2    14   KL divergence         0.2844      0.6657      -18.69   -17.39
3    2    Probability of e2     0.4622      0.6855      -16.27   -14.93
4    29   W2V_e                 0.5471      0.7595      -7.23    -5.75
5    4    Entropy of e2         0.4622      0.7672      -6.29    -4.79
6    26   χ2 on out links       0.6046      0.779       -4.85    -3.33
7    30   W2V_e&w               0.5879      0.786       -3.99    -2.46
8    24   χ2 on in links        0.6884      0.7913      -3.35    -1.80
9    28   W2V_a                 0.4916      0.7927      -3.18    -1.63
10   25   χ2 on in-out links    0.6668      0.7929      -3.15    -1.60
11   16   Co-cit on in-out      0.5974      0.8079      -1.32    0.26
...  ...  ...                   ...         ...         ...      ...
17   32   VSM_out               0.5938      0.8158      -0.35    1.24
25   31   VSM_in                0.5028      0.8183      -0.05    1.55
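
The %∆ALL column appears to be the relative difference between a row's n@10 C score and the ALL run's score of 0.8187. The sketch below checks that reading against the first two rows and against the reported gain of ALL over SOA; the interpretation and the function name are assumptions, not statements from the slides.

```python
def percent_change(value, reference):
    """Relative difference of value with respect to reference, in percent."""
    return (value - reference) / reference * 100.0


ALL_SCORE = 0.8187
SOA_SCORE = 0.8050

# Rows 1 and 2 of the table: n@10 C compared with the ALL run.
delta_row1 = round(percent_change(0.6443, ALL_SCORE), 2)
delta_row2 = round(percent_change(0.6657, ALL_SCORE), 2)

# Improvement of the ALL run over the SOA run.
gain_all_over_soa = round(percent_change(ALL_SCORE, SOA_SCORE), 3)
```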
Evaluation Results
[Figure: learning curve of NDCG@10 (y-axis, 0.65 to 0.80; x-axis, 0 to 30) for the ALL and SOA runs]
Learning to Rank for Entity Relatedness
Given two entities e_i and e_j we want to learn a function r(e_i, e_j)
able to predict their degree of relatedness
A learning to rank model is trained over a set of features describing
the relatedness between entity pairs
Entity 1            Entity 2                         Relevant
Germany             United Kingdom                   YES
Germany             British Empire                   NO
Germany             Brussels                         YES
Germany             Commissioner of Baseball         NO
Cleveland Indians   Mark Acre                        YES
Cleveland Indians   Art Howe                         YES
Cleveland Indians   Athletics                        NO
Cleveland Indians   1972 Chicago White Sox season    NO
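
Learning to rank implementations typically expect the training pairs grouped by the entity whose candidates are being ranked, with a numeric label per candidate. The sketch below arranges the example pairs from the table that way; treating Entity 1 as the query and mapping YES/NO to 1/0 are assumptions made for illustration, as are the function and variable names.

```python
# Example pairs copied from the table: (Entity 1, Entity 2, Relevant).
pairs = [
    ("Germany", "United Kingdom", "YES"),
    ("Germany", "British Empire", "NO"),
    ("Germany", "Brussels", "YES"),
    ("Germany", "Commissioner of Baseball", "NO"),
    ("Cleveland Indians", "Mark Acre", "YES"),
    ("Cleveland Indians", "Art Howe", "YES"),
    ("Cleveland Indians", "Athletics", "NO"),
    ("Cleveland Indians", "1972 Chicago White Sox season", "NO"),
]


def to_ranking_groups(pairs):
    """Group candidates by source entity and convert YES/NO to 1/0 labels."""
    groups = {}
    for source, candidate, relevant in pairs:
        label = 1 if relevant == "YES" else 0
        groups.setdefault(source, []).append((candidate, label))
    return groups


groups = to_ranking_groups(pairs)

# Flat label list plus group sizes, the layout many rankers take as input.
labels = [label for cands in groups.values() for _, label in cands]
group_sizes = [len(cands) for cands in groups.values()]
```

In a full pipeline each (Entity 1, Entity 2) pair would also carry its feature vector, and the learned function r(e_i, e_j) would be used to order the candidates within each group.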
