First Runner-up for Best Paper at the 21st International Conference on Web Information Systems Engineering (WISE 2020).
Abstract : Collaborative knowledge graphs such as Wikidata rely extensively on the crowd to author their information. Since the crowd is not bound to a standard protocol for assigning entity titles, the knowledge graph is populated with non-standard, noisy, long, or sometimes awkward titles. The issue of long, implicit, and non-standard entity representations is a challenge for Entity Linking (EL) approaches aiming at high precision and recall. The underlying KG is, in general, the source of target entities for EL approaches; however, it often contains other relevant information, such as aliases of entities (e.g., Obama and Barack Hussein Obama are aliases for the entity Barack Obama). EL models usually ignore such readily available entity attributes. In this paper, we examine the role of knowledge graph context in an attentive neural network approach for entity linking on Wikidata. Our approach exploits sufficient context from the KG as a source of background knowledge, which is then fed into the neural network. This helps address the challenges associated with entity titles (multi-word, long, implicit, case-sensitive). Our experimental study shows an approximately 8% improvement over the baseline approach, and our approach significantly outperforms an end-to-end approach for Wikidata entity linking.
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Network for Entity Linking
1. Encoding Knowledge Graph Entity Aliases in Attentive
Neural Network for Wikidata Entity Linking
1
Authors : Isaiah Onando Mulang’, Kuldeep Singh, Akhilesh Vyas, Saeedeh Shekarpour, Maria-Esther Vidal,
Jens Lehmann, and Sören Auer
Web Information Systems Engineering (WISE) Conference 2020
Amsterdam and Leiden, The Netherlands, October 20-24, 2020
2. Know the presenter, know the audience: let's get to know each other
Outline:
● Introduction : Knowledge Graphs, background information (what a KG is and how it is represented)
● Motivation & Problem Statement : Augmenting entities with KG aliases
● Research Question : Impact of KG context in EL, can attentive neural networks encode KG context to improve performance?
● State of the art : Models on Wikidata (Cetoli et al.)
● Approach : A peek into creating a local KG from entities and relations in KGs
● Evaluation and Results : The richness of KG context and its impact on attentive neural networks; SotA comparison
● Conclusion : Open questions, future directions
3. Introduction KGs - Representation
3
[Figure: example knowledge graph. T-Box: classes human, Person, President, Politician, and Sovereign State, connected by subclass and same_as edges. A-Box: entities ℯ1 and ℯ2 (Barack Obama and USA), linked to the T-Box classes via is_a and occupation edges and to each other via head_of_state.]
4. Entity Q76 - an entity from Wikidata
4
● Alias (retrieval from the Wikidata endpoint sketched after this slide):
○ Barack Hussein Obama, Barack Obama II, Obama, Barak
Obama, Barry Obama, President Obama, President Barack Obama, BHO, Barack
● Description
○ 44th president of the United States
● Instance of
○ Human
● Label
○ Barack Obama
● Triples
○ <BHO, spouse, Q13113>
○ <Q76, spouse, Michelle Obama>
○ <Q76, wed, Q13113>
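The label, aliases, and description above are exactly the kind of KG context Arjun consumes. As an illustration (not part of the original slides), they can be fetched for Q76 from the public Wikidata SPARQL endpoint; the query and the user-agent string are assumptions made for this sketch.

```python
# Minimal sketch: pulling the label, aliases, and description of Q76
# from the public Wikidata SPARQL endpoint (illustrative, not the authors' code).
import requests

QUERY = """
SELECT ?label ?alias ?description WHERE {
  wd:Q76 rdfs:label ?label ;
         skos:altLabel ?alias ;
         schema:description ?description .
  FILTER(LANG(?label) = "en" && LANG(?alias) = "en" && LANG(?description) = "en")
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-context-demo/0.1"},  # hypothetical identifier
)
rows = resp.json()["results"]["bindings"]
aliases = sorted({r["alias"]["value"] for r in rows})
print(rows[0]["label"]["value"], "-", rows[0]["description"]["value"])
print("Aliases:", aliases)
```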
5. Introduction KGs
5
“Knowledge without application is simply
knowledge. Applying the knowledge to one’s life is
wisdom — and that is the ultimate virtue”
― Kasi Kaye Iliopoulos
6. Problem Entity Linking
6
Task : Entity Linking
Input : Given a sentence, and a Knowledge Graph (KG)
Output : Identified mentions of entities in the text, and links to the counterpart entities within
the KG that match each mention (surface form)
7. Problem Entity Linking
7
Sentence 𝓓 is an ordered sequence of natural language tokens of length n:
𝓓 = {𝔀1, 𝔀2, 𝔀3, …, 𝔀n}
Mention 𝑚 is a token or combination of tokens drawn from 𝓓:
𝓜 = {𝑚1, 𝑚2, 𝑚3, …, 𝑚t}, where each 𝑚i is composed of tokens 𝔀' ∈ 𝓓
Knowledge Graph: KG = (𝓔, 𝓗+, 𝓡), where
e ∈ 𝓔 is an entity, r ∈ 𝓡 is a relation, and
𝓗+ ⊆ (𝓔 ⨉ 𝓡 ⨉ 𝓔) is the set of all triples in the KG
Output : {(𝑚i, 𝑒i)}, i ∈ 1…T, linking each mention to its KG entity
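To make the formalization concrete, here is a small illustrative sketch of the task signature; the sentence, mention boundaries, and QIDs are examples only (Q76 is Barack Obama, as on the earlier slide, and Q30 is the United States).

```python
# Illustrative example of the EL task signature (hypothetical values).
sentence = "Barack Obama was the 44th president of the United States"
tokens = sentence.split()                      # D = {w1, ..., wn}
mentions = ["Barack Obama", "United States"]   # M = {m1, ..., mt}
links = [("Barack Obama", "Q76"),              # output pairs (mi, ei)
         ("United States", "Q30")]
```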
13. Research Question
13
How well does an attentive neural network perform on the
entity linking task when leveraging background knowledge,
particularly for a challenging KG such as Wikidata?
14. SOTA - Wikidata Entity Linking
14
● Major datasets in EL are based on Wikipedia and news articles:
○ CoNLL-AIDA, MSNBC, ACQUAINT, ACE-2004
○ Target of end-to-end EL models such as Kolitsas et al., 2018
● Other datasets are based on DBpedia or YAGO:
○ RSS-500 (AGDISTIS)
○ CoNLL-AIDA-YAGO
● First datasets on Wikidata (Antonin Delpeuch, 2019; used to evaluate OpenTapioca):
○ RSS-500-Wikidata
○ ISTEX
● Second dataset on Wikidata:
○ T-REx (Elsahar H. et al., 2018): 4.65 million Wikipedia extracts (documents), 6.2 million sentences
18. Approach - Local infused KG
18
Wikidata contains over 52 million
entities and 3.9 billion facts
DBpedia contains over 5.6 million
entities and 111 million facts
Alias Triples (expansion sketched below)
● <Barack Obama, spouse, Michelle Obama>
● <Obama, spouse, Michelle>
● <BH, partner, Michelle Obama>
1. Sakor et al., NAACL 2019. Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text.
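A minimal sketch of how such alias triples can be derived by expanding a KG triple with the aliases of its subject and object. The `aliases` dictionary and the `alias_triples` helper are illustrative assumptions, not the authors' code (the local KG is actually built and indexed with Apache Lucene, as described on the implementation slide).

```python
# Sketch (assumed helper data): expanding a KG triple with entity aliases
# to obtain the surface-form "alias triples" used as background context.
from itertools import product

aliases = {
    "Q76": ["Barack Obama", "Obama", "BHO", "Barack Hussein Obama"],
    "Q13113": ["Michelle Obama", "Michelle"],
}

def alias_triples(subj_id, pred_label, obj_id):
    """Yield one surface-form triple per combination of subject/object aliases."""
    for s, o in product(aliases[subj_id], aliases[obj_id]):
        yield (s, pred_label, o)

for t in alias_triples("Q76", "spouse", "Q13113"):
    print(t)
```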
19. Approach - Encoder-Decoder Attentive NN
19
● LSTM Encoder/Decoder + Attention layer (Luong et al., 2015); a minimal sketch follows below
○ Allows the flexibility to break open the NN architecture and add extra context
○ Simple, for explainability
○ Not novel in this work
● Induces context at the interchange of NER & NED
○ Enhances the performance of the model
○ Novel for the community
● Attention allows choosing which context is relevant
○ Among several entity aliases, the attention mechanism helps encode focused signals
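For readers who want to see the moving parts, below is a minimal PyTorch sketch of an LSTM encoder-decoder with Luong-style dot-product attention. The hidden sizes, and the idea of simply concatenating the serialized KG context (entity aliases) to the input token sequence, are illustrative assumptions rather than the exact Arjun architecture.

```python
# Minimal sketch of an LSTM encoder-decoder with Luong (dot-product) attention.
# Hyper-parameters are illustrative; this is not the exact Arjun model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, tokens):                     # tokens: (B, T) sentence + KG context
        out, state = self.lstm(self.emb(tokens))   # out: (B, T, H)
        return out, state

class LuongAttention(nn.Module):
    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (B, H), enc_outputs: (B, T, H)
        scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)  # (B, T)
        weights = F.softmax(scores, dim=1)                                   # attention over context
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)    # (B, H)
        return context, weights

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.attn = LuongAttention()
        self.out = nn.Linear(2 * hidden, vocab_size)   # [decoder state; context] -> output vocab

    def forward(self, prev_token, state, enc_outputs):
        dec_out, state = self.lstm(self.emb(prev_token), state)       # (B, 1, H)
        context, _ = self.attn(dec_out.squeeze(1), enc_outputs)
        logits = self.out(torch.cat([dec_out.squeeze(1), context], dim=1))
        return logits, state
```

In this setup, the attention weights over the encoder outputs are what allow the decoder to focus on the relevant alias tokens when several entity aliases are present in the encoded context.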
20. Approach - Implementation and Training Details
20
● Apache Lucene
○ Local KG and indexing
○ Semantic search
○ Reuses the FALCON index (Sakor et al., 2019)
○ Candidate selection threshold: 0.85
● PyTorch framework
○ Neural network implementation
● Word embeddings
○ Pre-trained 300-d GloVe word embeddings (loading sketched below)
● Hardware
○ Two Nvidia GeForce GTX 1080 Ti GPUs with 11 GB memory each
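As a small illustration of the embedding setup listed above, the following sketch initialises a PyTorch embedding layer from a 300-d GloVe text file; the file name and vocabulary are hypothetical.

```python
# Sketch (assumed file name and vocabulary): initialising an nn.Embedding layer
# with pre-trained 300-d GloVe vectors.
import numpy as np
import torch
import torch.nn as nn

def load_glove(path, vocab, dim=300):
    """Build an embedding matrix for `vocab` (word -> index) from a GloVe text file."""
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vec = line.rstrip().split(" ")
            if word in vocab and len(vec) == dim:
                matrix[vocab[word]] = np.asarray(vec, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)

# Usage (hypothetical vocabulary):
# vocab = {"obama": 0, "asic": 1}
# emb = load_glove("glove.6B.300d.txt", vocab)
```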
21. Baselines : Systems for comparison
21
● Limited work on Wikidata entity linking
○ No comparison with models based on other KBs
● Baseline system
○ The same attention architecture without the added background KG
● OpenTapioca (Delpeuch, 2019)
○ Tested on the T-REx dataset
22. Results
22
Results on the T-REx dataset:
APPROACH       F1
Baseline       0.630
OpenTapioca    0.579
Arjun          0.713
RQ - How does encoding KG entity
context impact the performance of
Attentive Neural Nets over Wikidata?
23. Results : Success and Failure cases of Arjun
23
Success Cases :
● Arjun achieves 0.77 F-score for mention detection.
● Linking implicit entities
○ Running example: ASIC - wdt:Q217302 (Application Specific Integrated Circuit)
○ The encoded aliases contain the surface form "ASIC"
○ The attention layer focuses the model on the relevant information
● Subsuming entities
● Long entity titles
Failure Cases
● Sample sentence: "Two vessels have borne the name HMS Heureux, both of them captured from the French"
○ Caused by the candidate generation step: the candidate list does not contain wdt:Q3134963
○ Relevant entities: wdt:Q3134963 (French ship) vs. wdt:Q56539239 (surname "L'Heureux")
● Identifying and linking other entities not in the gold standard
26. Conclusions and Open Questions
26
● Focus : introduced the limitations of EL on Wikidata
● Presented the novel approach Arjun : encoding KG context in the neural network
Open Questions
● Enhancing the neural network with multiple layers : requires better resources
● Alternative models : enhancing NER models
● Replacing semantic search
● Cross-KG entity linking / KG-agnostic EL
27. Acknowledgement & Contacts
27
Acknowledgements
● QualiChain Project (EU Horizon 2020), Grant Agreement No. 822404
● IASIS Project (EU Horizon 2020), Grant Agreement No. 27658
Contact
Isaiah Mulang' Onando
Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS
Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Email: isaiah.mulang.onando@iais.fraunhofer.de
Website: www.mulangonando.com