Encoding Knowledge Graph Entity Aliases in Attentive Neural Network for Wikidata Entity Linking
Authors: Isaiah Onando Mulang’, Kuldeep Singh, Akhilesh Vyas, Saeedeh Shekarpour, Maria-Esther Vidal, Jens Lehmann, and Sören Auer
Web Information Systems Engineering (WISE) Conference 2020
Amsterdam and Leiden, The Netherlands, October 20-24, 2020
● Introduction: know the presenter, know the audience; Knowledge Graphs: background information (what a KG is and its representation)
● Motivation & Problem Statement
● Research Question: impact of KG context in EL; can attentive neural networks encode KG context to improve performance?
● State of the art: models on Wikidata (Cetoli et al.)
● Approach: augmenting entities with KG aliases; a peek into creating a local KG from entities and relations in KGs
● Evaluation and Results: the richness of KG context and its impact on attentive neural networks; SotA comparison
● Conclusion: open questions, future directions
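Introduction KGs - Representation
[Figure: example KG split into a T-Box and an A-Box. T-Box classes: human, Sovereign State, Person, President, Politician, connected by is_a, subclass, same_as and occupation/is-a edges. A-Box: entities ℯ1 and ℯ2 (USA, Barack Obama) and the relation head_of_state.]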
Entity Q76 - an entity from Wikidata
● Aliases:
○ Barack Hussein Obama, Barack Obama II, Obama, Barak Obama, Barry Obama, President Obama, President Barack Obama, BHO, Barack
● Description
○ 44th president of the United States
● Instance of
○ Human
● Label
○ Barack Obama
● Triples
○ <BHO, spouse, Q13113>
○ <Q76, spouse, Michelle Obama>
○ <Q76, wed, Q13113>
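To make the role of aliases concrete, here is a minimal Python sketch that turns the label and aliases of Q76 into a lookup table from surface form to Wikidata ID. The record layout and the helpers `normalize` and `build_alias_index` are hypothetical illustrations, not Wikidata's JSON data model or the authors' code.

```python
from collections import defaultdict

# Hypothetical, simplified record for Wikidata item Q76; the values are copied
# from the slide, but the dictionary layout is an assumption.
ENTITIES = {
    "Q76": {
        "label": "Barack Obama",
        "description": "44th president of the United States",
        "aliases": [
            "Barack Hussein Obama", "Barack Obama II", "Obama", "Barak Obama",
            "Barry Obama", "President Obama", "President Barack Obama", "BHO", "Barack",
        ],
    },
}


def normalize(surface_form: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore trivial variation."""
    return " ".join(surface_form.lower().split())


def build_alias_index(entities: dict) -> dict:
    """Map every normalized label or alias to the set of item IDs it can denote."""
    index = defaultdict(set)
    for qid, record in entities.items():
        for form in [record["label"], *record["aliases"]]:
            index[normalize(form)].add(qid)
    return dict(index)


alias_index = build_alias_index(ENTITIES)
print(sorted(alias_index["bho"]))  # ['Q76']
```

Every alias on the slide resolves to Q76, which is the property an alias-aware linker can exploit when a sentence says "BHO" rather than "Barack Obama".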
Introduction KGs
“Knowledge without application is simply
knowledge. Applying the knowledge to one’s life is
wisdom — and that is the ultimate virtue”
― Kasi Kaye Iliopoulos
Problem: Entity Linking
Task: Entity Linking
Input: a sentence and a Knowledge Graph (KG)
Output: the mentions (surface forms) of entities identified in the text, each linked to its counterpart entity in the KG
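Problem: Entity Linking
● Sentence: $\mathcal{D} = \{w_1, w_2, w_3, \dots, w_n\}$, an ordered set of $n$ natural-language tokens.
● Mentions: $\mathcal{M} = \{m_1, m_2, m_3, \dots, m_t\}$, where each mention $m_i$ is a token or combination of tokens drawn from $\mathcal{D}$, i.e. every token $w' \in m_i$ satisfies $w' \in \mathcal{D}$.
● Knowledge graph: $KG = (\mathcal{E}, \mathcal{H}^{+}, \mathcal{R})$, where $e \in \mathcal{E}$ is an entity, $r \in \mathcal{R}$ is a relation, and $\mathcal{H}^{+} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ is the set of all triples in the KG.
● Output: $\{(m_i, e_i)\}_{i=1}^{t}$, with $e_i \in \mathcal{E}$.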
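The definition above can be sketched as plain Python data structures. This is an illustration under stated assumptions: mentions are modeled as contiguous token spans (the slide only says "a token or combination of tokens"), the sentence is a hypothetical example, and `Mention` and `is_valid_output` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A mention m_i, assumed here to be a contiguous token span of D."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)

    def tokens(self, sentence):
        return sentence[self.start:self.end]


# H+: the set of (entity, relation, entity) triples; a one-triple toy KG
# using the IDs shown on the Q76 slide. E and R are derived from it.
TRIPLES = {("Q76", "spouse", "Q13113")}
ENTITIES = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
RELATIONS = {r for _, r, _ in TRIPLES}


def is_valid_output(sentence, links, entities):
    """Check that every (m_i, e_i) pair uses a span inside D and an entity in E."""
    return all(
        0 <= m.start < m.end <= len(sentence) and e in entities
        for m, e in links
    )


# Hypothetical sentence D (n = 6 tokens) and a hypothetical linker output.
sentence = "Obama is married to Michelle Obama".split()
links = [(Mention(0, 1), "Q76"), (Mention(4, 6), "Q13113")]
print(is_valid_output(sentence, links, ENTITIES))  # True
print(links[1][0].tokens(sentence))  # ['Michelle', 'Obama']
```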
Challenges - Wikidata Representation
[Figure: a sentence whose mentions are linked to Wikidata entities ℯ1, ℯ2 and ℯ3, annotated with the challenges below]
● Implicit entities
● Subsuming entities
● Long entity titles, e.g. "Andhra Pradesh High Court" (annotated AP: Q1159, HC: Q3128536)
● Complex entity titles
Research Question
How well does an attentive neural network perform on the entity linking task when it leverages background knowledge, particularly for a challenging KG such as Wikidata?
SOTA - Wikidata Entity Linking
● Major datasets in EL are based on Wikipedia and news articles:
○ CoNLL-AIDA, MSNBC, ACQUAINT, ACE-2004
● Other datasets have been based on DBpedia or YAGO:
○ RSS-500 (AGDISTIS)
○ CoNLL-AIDA-YAGO
● First datasets on Wikidata:
○ RSS-500-Wikidata - Antonin Delpeuch, 2019
○ ISTEX - Antonin Delpeuch, 2019
● Second datasets on Wikidata:
○ T-REx dataset - Elsahar H. et al., 2018
○ Contains 4.65 million Wikipedia extracts (documents)
○ 6.2 million sentences
● Related systems: Kolitsas et al., 2018; OpenTapioca - Delpeuch, 2019
Approach - Encoding Entity Aliases in Attentive NN
Approach - Local Infused KG
● Wikidata contains over 52 million entities and 3.9 billion facts
● DBpedia contains over 5.6 million entities and 111 million facts
● Alias triples (triples restated with entity and relation aliases), e.g.:
○ <Barack Obama, spouse, Michelle Obama>
○ <Obama, spouse, Michelle>
○ <BH, partner, Michelle Obama>
1. Sakor et al., 2019, NAACL. Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text.
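One way to read the alias triples above is as a cartesian product over the surface forms of each triple element. The Python sketch below assumes small hypothetical alias tables (including "partner" as a relation form, modeled on the third example) and is not the authors' construction procedure; `alias_triples` is an invented helper.

```python
from itertools import product

# Hypothetical surface-form tables modeled on the slide's alias triples.
ENTITY_FORMS = {
    "Q76": ["Barack Obama", "Obama"],
    "Q13113": ["Michelle Obama", "Michelle"],
}
RELATION_FORMS = {"spouse": ["spouse", "partner"]}


def alias_triples(triple, entity_forms, relation_forms):
    """Restate one ID-level triple with every combination of label/alias forms."""
    subj, rel, obj = triple
    return list(product(
        entity_forms.get(subj, [subj]),
        relation_forms.get(rel, [rel]),
        entity_forms.get(obj, [obj]),
    ))


expanded = alias_triples(("Q76", "spouse", "Q13113"), ENTITY_FORMS, RELATION_FORMS)
print(len(expanded))  # 8
print(expanded[0])    # ('Barack Obama', 'spouse', 'Michelle Obama')
```

Because the number of alias triples grows multiplicatively with the number of forms per element, a practical local KG would likely have to restrict which aliases are kept.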
Approach - Encoder-Decoder Attentive NN
● LSTM encoder/decoder + attention layer (Luong et al., 2015)
○ Allows flexibility to break open the NN architecture so that extra context can be added
○ Simple, which aids explainability
○ Not novel in this work
● Induces context at the interchange of NER & NED
○ Enhances the performance of the model
○ Novel for the community
● Attention allows choosing which context is relevant
○ Among several entity aliases, the attention mechanism helps encode focused signals
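The attention step can be illustrated with Luong et al.'s global attention using the "dot" score: the decoder state is scored against each encoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of encoder states. This plain-Python sketch uses toy 2-D vectors as stand-ins for encoded aliases; it omits the learned "general"/"concat" scores and the attentional hidden state $\tanh(W_c[c_t; h_t])$, and it is not the authors' PyTorch model.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(decoder_state, encoder_states):
    """Global attention with Luong et al.'s 'dot' score.

    weights[s] = softmax over s of (h_t . h_s); context = sum_s weights[s] * h_s.
    """
    weights = softmax([dot(decoder_state, h_s) for h_s in encoder_states])
    dim = len(encoder_states[0])
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy 2-D vectors standing in for three encoded aliases (hypothetical values).
h_t = [1.0, 0.0]
aliases = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
weights, context = luong_dot_attention(h_t, aliases)
print([round(w, 3) for w in weights])  # [0.787, 0.107, 0.107]
```

The alias whose encoding aligns best with the decoder state receives most of the weight, which is the "focused signal" idea on the slide.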
Approach - Implementation and Training Details
● Apache Lucene
○ Local KG and indexing
○ Semantic search
○ Reuses FALCON indexing (Sakor et al., 2019)
○ Candidate selection threshold: 0.85
● PyTorch framework
○ Neural network
● Word embeddings
○ Pre-trained GloVe word embeddings
○ 300 dimensions
● Hardware
○ Two Nvidia GeForce GTX 1080 Ti GPUs, 11 GB memory each
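The candidate-selection threshold can be sketched as a filter over similarity scores. This Python sketch does not reproduce Lucene or FALCON: it swaps in `difflib.SequenceMatcher.ratio` (a string similarity in [0, 1]) as a stand-in score, and the index contents and the helpers `similarity` and `select_candidates` are hypothetical. Only the 0.85 value comes from the slide, and its meaning here differs from a Lucene relevance score.

```python
from difflib import SequenceMatcher

# Threshold from the slide; here it is applied to a difflib ratio in [0, 1],
# which is only a stand-in for the Lucene score used by the authors.
THRESHOLD = 0.85

# Hypothetical surface-form index (normalized form -> Wikidata ID).
SURFACE_INDEX = {
    "barack obama": "Q76",
    "barak obama": "Q76",
    "obama": "Q76",
    "michelle obama": "Q13113",
}


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_candidates(mention, index, threshold=THRESHOLD):
    """Return (entity ID, best score) pairs whose best matching form clears the threshold."""
    best = {}
    for form, qid in index.items():
        score = similarity(mention, form)
        if score >= threshold:
            best[qid] = max(score, best.get(qid, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


print(select_candidates("Barack Obama", SURFACE_INDEX))  # [('Q76', 1.0)]
print(select_candidates("HMS Heureux", SURFACE_INDEX))   # []
```

The empty result for "HMS Heureux" mirrors the failure mode discussed in the results: if the correct entity never enters the candidate list, the downstream network cannot recover it.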
Baselines: Systems for Comparison
● Limited work on Wikidata entity linking
○ No comparison with models based on other KBs
● Baseline system
○ Attention architecture without added background KG
● OpenTapioca (Delpeuch)
○ Tested on the T-REx dataset
Results
T-REx dataset
Approach       F1
Baseline       0.630
OpenTapioca    0.579
Arjun          0.713
RQ: How does encoding KG entity context impact the performance of attentive neural networks over Wikidata?
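The slides report F1 without stating the matching criterion. The sketch below assumes a strict, micro-averaged criterion in which a prediction counts only if both the mention span and the entity ID match the gold annotation; the gold and predicted sets, including the placeholder ID "Q_hypothetical", are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Micro-averaged scores over exact (start, end, entity ID) matches."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and system output as (start, end, entity) triples.
gold = {(0, 1, "Q76"), (4, 6, "Q13113")}
predicted = {(0, 1, "Q76"), (4, 6, "Q76"), (2, 3, "Q_hypothetical")}
p, r, f1 = precision_recall_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.333 0.5 0.4
```

Under this criterion a correctly detected mention linked to the wrong entity (the second prediction) counts against both precision and recall.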
Results: Success and Failure Cases of Arjun
Success cases:
● Arjun achieves an F-score of 0.77 for mention detection.
● Linking implicit entities
○ Running example: ASIC - wd:Q217302 (Application Specific Integrated Circuit)
○ The encoded aliases contain the form "ASIC"
○ The attention layer is able to focus the model on the relevant information
● Subsuming entities
● Long entity titles
Failure cases:
● Sample sentence: "Two vessels have borne the name HMS Heureux, both of them captured from the French"
○ Caused by the candidate generation step: the candidate list does not contain wd:Q3134963
● Identifying and linking other entities that are not in the gold standard
[Figure: entities shown for the failure case: wd:Q3134963, wd:Q56539239; labels "Surname", "L’Heureux", "French ship"]
Conclusions and Open Questions
● Focus: introducing the limitations of EL on Wikidata
● Presented the novel approach Arjun: encoding KG context in a neural network
Open Questions
● Enhancing the neural network with multiple layers: requires more computational resources
● Alternative models: enhance NER models and
● Replacing semantic search
● Cross-KG entity linking / KG-agnostic EL
Acknowledgement & Contacts
Isaiah Mulang’ Onando
Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS
Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Email: isaiah.mulang.onando@iais.fraunhofer.de
Website: www.mulangonando.com
Acknowledgements:
● IASIS Project (EU Horizon 2020)
● QualiChain Project (EU Horizon 2020)
● Grant Agreement No. 822404
● Grant Agreement No. 27658
Thanks for Listening
Questions?

More Related Content

Similar to Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Network for Entity Linking

Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...
Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...
Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...IntelliSemantic
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics EnvironmentIan Foster
 
IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...
IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...
IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...YuningJiang4
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021hala Skaf
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsHugh McCamphill
 
A modified k means algorithm for big data clustering
A modified k means algorithm for big data clusteringA modified k means algorithm for big data clustering
A modified k means algorithm for big data clusteringSK Ahammad Fahad
 
Ontology of citizen science @ Siena 2016 11 24
Ontology of citizen science @ Siena 2016 11 24Ontology of citizen science @ Siena 2016 11 24
Ontology of citizen science @ Siena 2016 11 24Luigi Ceccaroni
 
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By  Kabul KurniawanKnowledge Graph for Cybersecurity: An Introduction By  Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By Kabul KurniawanKabul Kurniawan
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
CauseVCare - A Blockchain based Charity DApp
CauseVCare - A Blockchain based Charity DAppCauseVCare - A Blockchain based Charity DApp
CauseVCare - A Blockchain based Charity DAppIRJET Journal
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataEUCLID project
 
Data Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit JaokarData Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit JaokarJessica Willis
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Claudio Greco
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Alessandro Suglia
 
Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler Kwame Porter Robinson
 
Glasswall Wardley Maps & Services
Glasswall Wardley Maps & ServicesGlasswall Wardley Maps & Services
Glasswall Wardley Maps & ServicesSteve Purkis
 
Test trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsTest trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsHugh McCamphill
 
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...Dr. Haxel Consult
 

Similar to Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Network for Entity Linking (20)

Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...
Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...
Alberto Ciaramella: "Linked patent data: opportunities and challenges for pat...
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...
IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...
IEEE_EDOC_2018 Presentation | A Language and Repository for Cyber Security of...
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely tests
 
A modified k means algorithm for big data clustering
A modified k means algorithm for big data clusteringA modified k means algorithm for big data clustering
A modified k means algorithm for big data clustering
 
Ontology of citizen science @ Siena 2016 11 24
Ontology of citizen science @ Siena 2016 11 24Ontology of citizen science @ Siena 2016 11 24
Ontology of citizen science @ Siena 2016 11 24
 
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By  Kabul KurniawanKnowledge Graph for Cybersecurity: An Introduction By  Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
CauseVCare - A Blockchain based Charity DApp
CauseVCare - A Blockchain based Charity DAppCauseVCare - A Blockchain based Charity DApp
CauseVCare - A Blockchain based Charity DApp
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
 
Data Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit JaokarData Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit Jaokar
 
Ajit jaokar slides
Ajit jaokar slidesAjit jaokar slides
Ajit jaokar slides
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
 
Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler
 
Glasswall Wardley Maps & Services
Glasswall Wardley Maps & ServicesGlasswall Wardley Maps & Services
Glasswall Wardley Maps & Services
 
Test trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsTest trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely tests
 
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
 

Recently uploaded

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 

Recently uploaded (20)

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 

Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Network for Entity Linking

  • 1. Encoding Knowledge Graph Entity Aliases in Attentive Neural Network for Wikidata Entity Linking 1 Authors : Isaiah Onando Mulang’, Kuldeep Singh, Akhilesh Vyas, Saeedeh Shekarpour, Maria-Esther Vidal, Jens Lehmann, and Sören Auer Web Information Systems Engineering (WISE) Conference 2020 Amsterdam and Leiden, The Netherlands, October 20-24, 2020
  • 2. Know the presenter, know audience. Let’s know each other Knowledge Graphs : Background Information [ What’s a KG and it’s Rep. ] Introduction Models on Wikidata Cetoli, et al. State of the art Impact of KG Context in EL: Can attentive Neural Networks encode KG Context to Improve Performance. A peek into creating Local KG from entities and relations in KGs Approach The richness of KG context and its impact on attentive neural networks, SotA comparison Evaluation and Results Open Questions, future directions Conclusion Augmenting Entities with KG Aliases Research Question Motivation & Problem Statement
  • 3. Introduction KGs - Representation 3 ℯ1 ℯ2 human Sovereign State Person President Politician is_a subclass subclass same_as occupation isa-a T-BOX A-BOX USABarack Obama head_of_state
  • 4. Entity Q76 - an entity from Wikidata 4 ● Alias: ○ Barack Hussein Obama, Barack Obama II, Barack Hussein Obama, Obama, Barak Obama, Barry Obama, President Obama, President Barack Obama, BHO, Barack ● Description ○ 44th president of the United States ● Instance of ○ Human ● Label ○ Barack Obama ● Triples ○ <BHO, spouse, Q13113> ○ <Q76, spouse, Michelle Obama> ○ <Q76, wed, Q13113>
  • 5. Introduction KGs 5 “Knowledge without application is simply knowledge. Applying the knowledge to one’s life is wisdom — and that is the ultimate virtue” ― Kasi Kaye Iliopoulos
  • 6. Problem Entity Linking 6 Task : Entity Linking Input : Given a sentence, and a Knowledge Graph (KG) Output : Identified mentions of entities in the text. Links of counterpart entities within the KG matching the mention (Surface forms)
  • 7. Problem Entity Linking 7 Sentence 𝓓 is a set of ordered natural language Tokens of length n 𝓓 = {𝔀1 ,𝔀2 ,𝔀3 , …, 𝔀n } Mention 𝑚 is a token or combination of tokens drawn from 𝓓 𝓜 = {𝑚1 ,𝑚2 ,𝑚3 , …, 𝑚t } : 𝑚i ∋𝔀’ ∈ 𝓓 KG =(𝓔,𝓗+ ,𝓡) Output {(𝑚i , 𝑒i )}i∈1,T Set of all triples in KG e ∈ 𝓔 - an entity r ∈ 𝓡 - a relation 𝓗+ ⊆ (𝓔 ⨉ 𝓡 ⨉ 𝓔)
  • 10. Challenges Wikidata Rep. 10 ℯ1 ℯ3 ℯ2 implicit entities subsuming entities long entity titles Andhra Pradesh High Court
  • 11. Challenges Wikidata Rep. 11 ℯ1 ℯ3 ℯ2 implicit entities long entity titlessubsuming entities AP : Q1159 HC: Q3128536
  • 12. Challenges Wikidata Rep. 12 ℯ1 ℯ3 ℯ2 implicit entities long entity titlessubsuming entities AP : Q1159 HC: Q3128536 complex entity titles
  • 13. Research Question 13 implicit entities implicit entities How well does the attentive neural network perform for entity linking task leveraging background knowledge particularly for a challenging KG such as Wikidata?
  • 14. SOTA - Wikidata Entity Linking
    ● Major datasets in EL are based on Wikipedia and news articles:
      ○ CoNLL-AIDA, MSNBC, AQUAINT, ACE-2004
    ● Other datasets are based on DBpedia or YAGO:
      ○ RSS-500 (AGDISTIS)
      ○ CoNLL-AIDA-YAGO
    ● First datasets on Wikidata:
      ○ RSS-500-Wikidata (Antonin Delpeuch, 2019)
      ○ ISTEX (Antonin Delpeuch, 2019)
    ● Second datasets on Wikidata:
      ○ T-REx dataset (Elsahar H. et al., 2018), highlighted on the slide as the first Wikidata dataset
      ○ Contains 4.65 million Wikipedia extracts (documents) and 6.2 million sentences
    ● Approaches highlighted: Kolitsas et al., 2018; OpenTapioca (Delpeuch, 2019)
  • 17. Approach - Encoding Entity Aliases in Attentive NN
  • 18. Approach - Local Infused KG
    ● Wikidata contains over 52 million entities and 3.9 billion facts
    ● DBpedia contains over 5.6 million entities and 111 million facts
    ● Alias triples:
      ○ <Barack Obama, spouse, Michelle Obama>
      ○ <Obama, spouse, Michelle>
      ○ <BH, partner, Michelle Obama>
    1. Sakor et al., 2019, NAACL. Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text.
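The alias triples on this slide can be derived from a canonical triple by swapping in surface forms. The sketch below is one hedged reading of that idea: it assumes every combination of head, relation and tail alias is generated (a full Cartesian product), which may be broader than what Arjun actually stores; `expand_with_aliases` and the alias dictionaries are hypothetical.

```python
from itertools import product


def expand_with_aliases(triples, entity_aliases, relation_aliases):
    """Return every triple obtainable by swapping heads, relations and tails for their aliases."""
    alias_triples = set()
    for head, relation, tail in triples:
        heads = [head] + entity_aliases.get(head, [])
        relations = [relation] + relation_aliases.get(relation, [])
        tails = [tail] + entity_aliases.get(tail, [])
        # Assumption: keep the full Cartesian product of surface forms
        alias_triples.update(product(heads, relations, tails))
    return alias_triples


triples = {("Barack Obama", "spouse", "Michelle Obama")}
entity_aliases = {"Barack Obama": ["Obama", "BH"], "Michelle Obama": ["Michelle"]}
relation_aliases = {"spouse": ["partner"]}

local_kg = expand_with_aliases(triples, entity_aliases, relation_aliases)
print(len(local_kg))                                   # 12
print(("Obama", "spouse", "Michelle") in local_kg)     # True
print(("BH", "partner", "Michelle Obama") in local_kg) # True
```

The product grows quickly (3 head forms × 2 relation forms × 2 tail forms = 12 triples here), which is one reason a local, indexed KG is useful at Wikidata scale.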
  • 19. Approach - Encoder-Decoder Attentive NN
    ● LSTM encoder/decoder + attention layer (Luong et al., 2015)
      ○ Allows the NN architecture to be opened up so that extra context can be added
      ○ Simple, for explainability
      ○ Not novel in this work
    ● Induces context at the interface between NER and NED
      ○ Enhances the performance of the model
      ○ Novel for the community
    ● Attention allows choosing which context is relevant
      ○ Among several entity aliases, the attention mechanism helps encode focused signals
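To illustrate how attention can pick among several alias encodings, here is a minimal sketch of Luong et al. (2015) global attention with the "dot" score. It assumes the dot variant (Luong also defines "general" and "concat" scores, and the slide does not say which one Arjun uses), computes only the context vector, and uses hypothetical 2-dimensional vectors in place of real LSTM states.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def luong_dot_attention(decoder_state, encoder_states):
    """score(h_t, h_s) = h_t . h_s;  a_t = softmax(scores);  c_t = sum_s a_t(s) * h_s."""
    scores = [dot(decoder_state, h_s) for h_s in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h_s[k] for w, h_s in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


# Hypothetical encodings of three aliases of one candidate entity
decoder_state = [1.0, 0.0]
alias_states = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

weights, context = luong_dot_attention(decoder_state, alias_states)
print(weights)  # the first alias, best aligned with the decoder state, gets the largest weight
print(context)
```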
  • 20. Approach - Implementation and Training Details
    ● Apache Lucene
      ○ Local KG and indexing
      ○ Semantic search
      ○ Reuses FALCON indexing (Sakor et al., 2019)
      ○ Candidate selection threshold: 0.85
    ● PyTorch framework
      ○ Neural network
    ● Word embeddings
      ○ Pre-trained GloVe word embeddings
      ○ 300 dimensions
    ● Hardware
      ○ Two Nvidia GeForce GTX 1080 Ti GPUs with 11 GB of memory each
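The candidate selection step can be pictured as a thresholded lookup over an alias index. This sketch does not reproduce Lucene or FALCON: it swaps in Python's `difflib.SequenceMatcher.ratio()` as a stand-in similarity score, applies the 0.85 threshold from the slide to that score, and uses a small hypothetical `ALIAS_INDEX`.

```python
from difflib import SequenceMatcher

# Hypothetical alias index: (surface form, Wikidata ID) pairs
ALIAS_INDEX = [
    ("ASIC", "Q217302"),
    ("Application Specific Integrated Circuit", "Q217302"),
    ("Barack Obama", "Q76"),
    ("Obama", "Q76"),
    ("President Obama", "Q76"),
]


def select_candidates(mention, alias_index, threshold=0.85):
    """Keep entities whose best alias similarity to the mention reaches the threshold."""
    best = {}
    for alias, entity_id in alias_index:
        # SequenceMatcher.ratio() stands in for Lucene's relevance score
        score = SequenceMatcher(None, mention.lower(), alias.lower()).ratio()
        if score >= threshold:
            best[entity_id] = max(score, best.get(entity_id, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


print(select_candidates("ASIC", ALIAS_INDEX))         # [('Q217302', 1.0)]
print(select_candidates("Barak Obama", ALIAS_INDEX))  # Q76, matched via the alias "Barack Obama"
print(select_candidates("HMS Heureux", ALIAS_INDEX))  # []
```

Indexing aliases, not just labels, is what lets a short form such as "ASIC" reach its entity at all.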
  • 21. Baselines: Systems for Comparison
    ● Limited work on Wikidata entity linking
      ○ No comparison with models based on other KBs
    ● Baseline system
      ○ The attention architecture without the added background KG
    ● OpenTapioca (Delpeuch)
      ○ Tested on the T-REx dataset
  • 22. Results
    ● F1 on the T-REx dataset:
      ○ Baseline: 0.630
      ○ OpenTapioca: 0.579
      ○ Arjun: 0.713
    ● RQ: How does encoding KG entity context impact the performance of attentive neural networks over Wikidata?
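For reference, an F1 score over linked (mention, entity) pairs is commonly computed as below. This is a sketch assuming exact-match, micro-averaged scoring; the slide does not state the exact evaluation protocol, and the toy pairs are hypothetical. The last lines only recompute the gaps between the reported scores.

```python
def micro_f1(predicted, gold):
    """Exact-match micro F1 over (mention, entity) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {("m1", "e1"), ("m2", "e2"), ("m3", "e3")}
predicted = {("m1", "e1"), ("m2", "e9")}
print(round(micro_f1(predicted, gold), 3))  # 0.4  (precision 0.5, recall 1/3)

# F1 scores reported on the slide (T-REx dataset)
REPORTED_F1 = {"Baseline": 0.630, "OpenTapioca": 0.579, "Arjun": 0.713}
for system in ("Baseline", "OpenTapioca"):
    print(system, round(REPORTED_F1["Arjun"] - REPORTED_F1[system], 3))
# Baseline 0.083
# OpenTapioca 0.134
```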
  • 23. Results: Success and Failure Cases of Arjun
    Success cases:
    ● Arjun achieves a 0.77 F-score for mention detection
    ● Linking implicit entities
      ○ Running example: ASIC (wdt:Q217302, Application Specific Integrated Circuit)
      ○ The encoded aliases contain the form "ASIC"
      ○ The attention layer focuses the model on the relevant information
    ● Subsuming entities
    ● Long entity titles
    Failure cases:
    ● Sample sentence: "Two vessels have borne the name HMS Heureux, both of them captured from the French"
      ○ Caused by the candidate generation step: the candidate list does not contain wdt:Q3134963
      ○ Entities shown on the slide: wdt:Q3134963, wdt:Q56539239 (labels: Surname L'Heureux, French ship)
    ● Identifying and linking entities that are not in the gold standard
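The HMS Heureux failure shows that the linker can only choose among the generated candidates, so candidate recall caps end-to-end accuracy. The sketch below measures that ceiling; `candidate_recall`, `unreachable` and the candidate lists are hypothetical (the empty list for "HMS Heureux" simply encodes "the gold entity is missing").

```python
def candidate_recall(gold_links, candidates_by_mention):
    """Share of gold (mention, entity) pairs whose entity survives candidate generation."""
    if not gold_links:
        return 0.0
    hits = sum(1 for mention, entity in gold_links
               if entity in candidates_by_mention.get(mention, []))
    return hits / len(gold_links)


def unreachable(gold_links, candidates_by_mention):
    """Gold pairs that no downstream ranker can recover."""
    return [(m, e) for m, e in gold_links if e not in candidates_by_mention.get(m, [])]


gold_links = [("ASIC", "Q217302"), ("HMS Heureux", "Q3134963")]
candidates = {"ASIC": ["Q217302"], "HMS Heureux": []}  # hypothetical candidate lists

print(candidate_recall(gold_links, candidates))  # 0.5
print(unreachable(gold_links, candidates))       # [('HMS Heureux', 'Q3134963')]
```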
  • 26. Conclusions and Open Questions
    ● Focus: introducing the limitations of EL on Wikidata
    ● Presented the novel approach Arjun: encoding KG context in a NN
    ● Open questions
      ○ Enhancing the neural network with multiple layers: requires better resources
      ○ Alternative models: enhance NER models and
      ○ Replacing semantic search
      ○ Cross-KG entity linking / KG-agnostic EL
  • 27. Acknowledgements & Contacts
    ● Acknowledgements
      ○ IASIS Project (EU Horizon 2020)
      ○ QualiChain Project (EU Horizon 2020)
      ○ Grant Agreement No. 822404; Grant Agreement No. 27658
    ● Contact: Isaiah Mulang' Onando, Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
      ○ Email: isaiah.mulang.onando@iais.fraunhofer.de
      ○ Website: www.mulangonando.com