First Runner-up for Best Paper at the 21st International Conference on Web Information Systems Engineering (WISE 2020).
Abstract : Collaborative knowledge graphs such as Wikidata rely extensively on the crowd to author their information. Since the crowd is not bound to a standard protocol for assigning entity titles, the knowledge graph is populated with non-standard, noisy, long, or sometimes awkward titles. The issue of long, implicit, and non-standard entity representations is a challenge for Entity Linking (EL) approaches aiming at high precision and recall. The underlying KG is, in general, the source of target entities for EL approaches; however, it often contains other relevant information, such as aliases of entities (e.g., Obama and Barack Hussein Obama are aliases for the entity Barack Obama). EL models usually ignore such readily available entity attributes. In this paper, we examine the role of knowledge graph context in an attentive neural network approach for entity linking on Wikidata. Our approach exploits sufficient context from the KG as a source of background knowledge, which is then fed into the neural network. This helps address the challenges associated with entity titles (multi-word, long, implicit, case-sensitive). Our experimental study shows an approximately 8% improvement over the baseline approach, and our approach significantly outperforms an end-to-end approach for Wikidata entity linking.
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Network for Entity Linking
1. Encoding Knowledge Graph Entity Aliases in Attentive
Neural Network for Wikidata Entity Linking
1
Authors : Isaiah Onando Mulang’, Kuldeep Singh, Akhilesh Vyas, Saeedeh Shekarpour, Maria-Esther Vidal,
Jens Lehmann, and Sören Auer
Web Information Systems Engineering (WISE) Conference 2020
Amsterdam and Leiden, The Netherlands, October 20-24, 2020
2. Know the presenter, know the audience: let's get to know each other
Outline:
● Introduction : Knowledge Graphs, background information (what a KG is and how it is represented)
● Motivation & Problem Statement : Augmenting entities with KG aliases
● Research Question : Impact of KG context in EL, can attentive neural networks encode KG context to improve performance?
● State of the art : Models on Wikidata (Cetoli et al.)
● Approach : A peek into creating a local KG from entities and relations in KGs
● Evaluation and Results : The richness of KG context and its impact on attentive neural networks; SotA comparison
● Conclusion : Open questions, future directions
3. Introduction KGs - Representation
3
[Figure: example knowledge graph. T-Box: classes human, Person, President, Politician, and Sovereign State, connected by subclass and same_as edges. A-Box: entities ℯ1 and ℯ2 (Barack Obama and USA), linked to the T-Box classes via is_a and occupation edges and to each other via head_of_state.]
4. Entity Q76 - an entity from Wikidata
4
● Alias (retrieval from the Wikidata endpoint sketched after this slide):
○ Barack Hussein Obama, Barack Obama II, Obama, Barak
Obama, Barry Obama, President Obama, President Barack Obama, BHO, Barack
● Description
○ 44th president of the United States
● Instance of
○ Human
● Label
○ Barack Obama
● Triples
○ <BHO, spouse, Q13113>
○ <Q76, spouse, Michelle Obama>
○ <Q76, wed, Q13113>
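The label, aliases, and description above are exactly the kind of KG context Arjun consumes. As an illustration (not part of the original slides), they can be fetched for Q76 from the public Wikidata SPARQL endpoint; the query and the user-agent string are assumptions made for this sketch.

```python
# Minimal sketch: pulling the label, aliases, and description of Q76
# from the public Wikidata SPARQL endpoint (illustrative, not the authors' code).
import requests

QUERY = """
SELECT ?label ?alias ?description WHERE {
  wd:Q76 rdfs:label ?label ;
         skos:altLabel ?alias ;
         schema:description ?description .
  FILTER(LANG(?label) = "en" && LANG(?alias) = "en" && LANG(?description) = "en")
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-context-demo/0.1"},  # hypothetical identifier
)
rows = resp.json()["results"]["bindings"]
aliases = sorted({r["alias"]["value"] for r in rows})
print(rows[0]["label"]["value"], "-", rows[0]["description"]["value"])
print("Aliases:", aliases)
```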
5. Introduction KGs
5
“Knowledge without application is simply
knowledge. Applying the knowledge to one’s life is
wisdom — and that is the ultimate virtue”
― Kasi Kaye Iliopoulos
6. Problem Entity Linking
6
Task : Entity Linking
Input : Given a sentence, and a Knowledge Graph (KG)
Output : Identified mentions of entities in the text, and links to the counterpart entities within
the KG that match each mention (surface form)
7. Problem Entity Linking
7
Sentence 𝓓 is an ordered sequence of natural language tokens of length n:
𝓓 = {𝔀1, 𝔀2, 𝔀3, …, 𝔀n}
Mention 𝑚 is a token or combination of tokens drawn from 𝓓:
𝓜 = {𝑚1, 𝑚2, 𝑚3, …, 𝑚t}, where each 𝑚i is composed of tokens 𝔀' ∈ 𝓓
Knowledge Graph: KG = (𝓔, 𝓗+, 𝓡), where
e ∈ 𝓔 is an entity, r ∈ 𝓡 is a relation, and
𝓗+ ⊆ (𝓔 ⨉ 𝓡 ⨉ 𝓔) is the set of all triples in the KG
Output : {(𝑚i, 𝑒i)}, i ∈ 1…T, linking each mention to its KG entity
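To make the formalization concrete, here is a small illustrative sketch of the task signature; the sentence, mention boundaries, and QIDs are examples only (Q76 is Barack Obama, as on the earlier slide, and Q30 is the United States).

```python
# Illustrative example of the EL task signature (hypothetical values).
sentence = "Barack Obama was the 44th president of the United States"
tokens = sentence.split()                      # D = {w1, ..., wn}
mentions = ["Barack Obama", "United States"]   # M = {m1, ..., mt}
links = [("Barack Obama", "Q76"),              # output pairs (mi, ei)
         ("United States", "Q30")]
```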
13. Research Question
13
How well does an attentive neural network perform on the
entity linking task when leveraging background knowledge,
particularly for a challenging KG such as Wikidata?
14. SOTA - Wikidata Entity Linking
14
● Major datasets in EL are based on Wikipedia and news articles:
○ CoNLL-AIDA, MSNBC, ACQUAINT, ACE-2004
○ Target of end-to-end EL models such as Kolitsas et al., 2018
● Other datasets are based on DBpedia or YAGO:
○ RSS-500 (AGDISTIS)
○ CoNLL-AIDA-YAGO
● First datasets on Wikidata (Antonin Delpeuch, 2019; used to evaluate OpenTapioca):
○ RSS-500-Wikidata
○ ISTEX
● Second dataset on Wikidata:
○ T-REx (Elsahar H. et al., 2018): 4.65 million Wikipedia extracts (documents), 6.2 million sentences
18. Approach - Local infused KG
18
Wikidata contains over 52 million
entities and 3.9 billion facts
DBpedia contains over 5.6 million
entities and 111 million facts
Alias Triples (expansion sketched below)
● <Barack Obama, spouse, Michelle Obama>
● <Obama, spouse, Michelle>
● <BH, partner, Michelle Obama>
1. Sakor et al., NAACL 2019. Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text.
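A minimal sketch of how such alias triples can be derived by expanding a KG triple with the aliases of its subject and object. The `aliases` dictionary and the `alias_triples` helper are illustrative assumptions, not the authors' code (the local KG is actually built and indexed with Apache Lucene, as described on the implementation slide).

```python
# Sketch (assumed helper data): expanding a KG triple with entity aliases
# to obtain the surface-form "alias triples" used as background context.
from itertools import product

aliases = {
    "Q76": ["Barack Obama", "Obama", "BHO", "Barack Hussein Obama"],
    "Q13113": ["Michelle Obama", "Michelle"],
}

def alias_triples(subj_id, pred_label, obj_id):
    """Yield one surface-form triple per combination of subject/object aliases."""
    for s, o in product(aliases[subj_id], aliases[obj_id]):
        yield (s, pred_label, o)

for t in alias_triples("Q76", "spouse", "Q13113"):
    print(t)
```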
19. Approach - Encoder-Decoder Attentive NN
19
● LSTM Encoder/Decoder + Attention layer (Luong et al., 2015); a minimal sketch follows below
○ Allows the flexibility to break open the NN architecture and add extra context
○ Simple, for explainability
○ Not novel in this work
● Induces context at the interchange of NER & NED
○ Enhances the performance of the model
○ Novel for the community
● Attention allows choosing which context is relevant
○ Among several entity aliases, the attention mechanism helps encode focused signals
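For readers who want to see the moving parts, below is a minimal PyTorch sketch of an LSTM encoder-decoder with Luong-style dot-product attention. The hidden sizes, and the idea of simply concatenating the serialized KG context (entity aliases) to the input token sequence, are illustrative assumptions rather than the exact Arjun architecture.

```python
# Minimal sketch of an LSTM encoder-decoder with Luong (dot-product) attention.
# Hyper-parameters are illustrative; this is not the exact Arjun model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, tokens):                     # tokens: (B, T) sentence + KG context
        out, state = self.lstm(self.emb(tokens))   # out: (B, T, H)
        return out, state

class LuongAttention(nn.Module):
    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (B, H), enc_outputs: (B, T, H)
        scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)  # (B, T)
        weights = F.softmax(scores, dim=1)                                   # attention over context
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)    # (B, H)
        return context, weights

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.attn = LuongAttention()
        self.out = nn.Linear(2 * hidden, vocab_size)   # [decoder state; context] -> output vocab

    def forward(self, prev_token, state, enc_outputs):
        dec_out, state = self.lstm(self.emb(prev_token), state)       # (B, 1, H)
        context, _ = self.attn(dec_out.squeeze(1), enc_outputs)
        logits = self.out(torch.cat([dec_out.squeeze(1), context], dim=1))
        return logits, state
```

In this setup, the attention weights over the encoder outputs are what allow the decoder to focus on the relevant alias tokens when several entity aliases are present in the encoded context.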
20. Approach - Implementation and Training Details
20
● Apache Lucene
○ Local KG and indexing
○ Semantic search
○ Reuses the FALCON index (Sakor et al., 2019)
○ Candidate selection threshold: 0.85
● PyTorch framework
○ Neural network implementation
● Word embeddings
○ Pre-trained 300-d GloVe word embeddings (loading sketched below)
● Hardware
○ Two Nvidia GeForce GTX 1080 Ti GPUs with 11 GB memory each
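As a small illustration of the embedding setup listed above, the following sketch initialises a PyTorch embedding layer from a 300-d GloVe text file; the file name and vocabulary are hypothetical.

```python
# Sketch (assumed file name and vocabulary): initialising an nn.Embedding layer
# with pre-trained 300-d GloVe vectors.
import numpy as np
import torch
import torch.nn as nn

def load_glove(path, vocab, dim=300):
    """Build an embedding matrix for `vocab` (word -> index) from a GloVe text file."""
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vec = line.rstrip().split(" ")
            if word in vocab and len(vec) == dim:
                matrix[vocab[word]] = np.asarray(vec, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)

# Usage (hypothetical vocabulary):
# vocab = {"obama": 0, "asic": 1}
# emb = load_glove("glove.6B.300d.txt", vocab)
```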
21. Baselines : Systems for comparison
21
● Limited work on Wikidata entity linking
○ No comparison with models based on other KBs
● Baseline system
○ The same attention architecture without the added background KG
● OpenTapioca (Delpeuch, 2019)
○ Tested on the T-REx dataset
22. Results
22
Results on the T-REx dataset:
APPROACH       F1
Baseline       0.630
OpenTapioca    0.579
Arjun          0.713
RQ - How does encoding KG entity
context impact the performance of
Attentive Neural Nets over Wikidata?
23. Results : Success and Failure cases of Arjun
23
Success Cases :
● Arjun achieves 0.77 F-score for mention detection.
● Linking implicit entities
○ Running example: ASIC - wdt:Q217302 (Application Specific Integrated Circuit)
○ The encoded aliases contain the surface form "ASIC"
○ The attention layer focuses the model on the relevant information
● Subsuming entities
● Long entity titles
Failure Cases
● Sample sentence: "Two vessels have borne the name HMS Heureux, both of them captured from the French"
○ Caused by the candidate generation step: the candidate list does not contain wdt:Q3134963
○ Relevant entities: wdt:Q3134963 (French ship) vs. wdt:Q56539239 (surname "L'Heureux")
● Identifying and linking other entities not in the gold standard
26. Conclusions and Open Questions
26
● Focus : introduced the limitations of EL on Wikidata
● Presented the novel approach Arjun : encoding KG context in the neural network
Open Questions
● Enhancing the neural network with multiple layers : requires better resources
● Alternative models : enhancing NER models
● Replacing semantic search
● Cross-KG entity linking / KG-agnostic EL
27. Acknowledgement & Contacts
27
Acknowledgements
● QualiChain Project (EU Horizon 2020), Grant Agreement No. 822404
● IASIS Project (EU Horizon 2020), Grant Agreement No. 27658
Contact
Isaiah Mulang' Onando
Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS
Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Email: isaiah.mulang.onando@iais.fraunhofer.de
Website: www.mulangonando.com