Presented by: Jennifer D’Souza, Postdoc in the ORKG team
http://orkg.org | @orkg_org
Technische Informationsbibliothek (TIB)
Welfengarten 1B // 30167 Hannover
Perspectives on Mining Knowledge Graphs
from Text
● A critical scientific document digitalization initiative in this
digital age
○ Capturing scholarly article contributions in machine-interpretable Knowledge Graphs
● The ORKG is hosted at TIB
○ https://www.orkg.org/
○ @orkg_org
● Led by TIB director Prof. (Dr.) Sören Auer
The Open Research Knowledge Graph (ORKG) is ...
Perspectives on Mining Knowledge Graphs from Text
(Overview figure: a taxonomy of knowledge graph research, after Ji et al., with the Knowledge Graph branching into Knowledge Representation Learning and Knowledge Acquisition)
● Knowledge Representation Learning
○ Representation Space: point-wise, manifold, complex, Gaussian, discrete
○ Scoring Function: distance-based, similarity matching
○ Encoding Models: linear/bilinear, factorization, neural nets (CNN, RNN, Transformers, GCN)
● Knowledge Acquisition
○ Entity Discovery: recognition, typing, linking, alignment
○ Relation Extraction: neural nets, attention, GCN, GAN, RL, others
○ Knowledge Graph Completion: embedding-based ranking, path-based reasoning, rule-based reasoning, meta relational learning, triple classification
References
Ji, Shaoxiong, et al. "A survey on knowledge graphs: Representation, acquisition, and applications." IEEE Transactions on Neural Networks and Learning Systems (2021).
(I) Entity Linking and
(II) KG Completion
Jennifer D’Souza
Technische Informationsbibliothek (TIB)
Welfengarten 1B // 30167 Hannover
● Given an entity mention in a text document and a
knowledge base (KB) of entities,
○ find the entity in the KB the entity mention refers to
or
○ determine that such entity does not exist in the KB
Entity Linking
Entity Linking
What is the birthdate of the famous basketball player
Michael Jordan?
Entity Linking
What is the birthdate of the famous basketball player
Michael Jordan?
Entity Linking
What is the birthdate of the famous basketball player
Michael Jordan?
Knowledge Bases
Entity Linking
What is the birthdate of the famous basketball player
Michael Jordan?
Knowledge Bases
Entity Linking
What is the birthdate of the famous basketball player
Michael Jordan?
Knowledge Bases
Entity Linking
● challenging because
○ entity ambiguity: mentions with the same word/phrase can have various entity
candidates
■ E.g., Michael Jordan: Basketball player or Berkeley professor?
○ name variations: mentions with different words/phrases can refer to the same
entity
■ E.g., New York City or Big Apple
Entity Linking
● challenging because
○ entity ambiguity: mentions with the same word/phrase can have various entity
candidates
■ E.g., Michael Jordan: Basketball player or Berkeley professor?
○ name variations: mentions with different words/phrases can refer to the same entity
■ E.g., New York City or Big Apple
● Aside 1: alternatively called Named Entity Disambiguation
○ However, Named Entity Disambiguation (NED) and Entity Linking (EL) can
sometimes be treated as separate tasks.
■ NED: determine which named entity a mention refers to.
● E.g., the mention “Trump” can refer to either a person, a corporation or a building;
■ EL: provide a standard IRI for each disambiguated entity.
● IRIs (Internationalized Resource Identifiers) used as subjects, predicates, and objects can be taken from well-defined vocabularies or ontologies in the Linked Open Data (LOD) cloud
Entity Linking
● challenging because
○ entity ambiguity: mentions with the same word/phrase can have various entity
candidates
■ E.g., Michael Jordan: Basketball player or Berkeley professor?
○ name variations: mentions with different words/phrases can refer to the same entity
■ E.g., New York City or Big Apple
● Aside 1: alternatively called Named Entity Disambiguation
○ However, Named Entity Disambiguation (NED) and Entity Linking (EL) can
sometimes be treated as separate tasks.
■ NED: determine which named entity a mention refers to.
● E.g., the mention “Trump” can refer to either a person, a corporation or a building;
■ EL: provide a standard IRI for each disambiguated entity.
● E.g., Trump-the-president can be linked to the IRI that represents him in Wikidata:
https://www.wikidata.org/entity/Q22686
Entity Linking
● challenging because
○ entity ambiguity: mentions with the same word/phrase can have various entity
candidates
■ E.g., Michael Jordan: Basketball player or Berkeley professor?
○ name variations: mentions with different words/phrases can refer to the same entity
■ E.g., New York City or Big Apple
● Aside 1: alternatively called Named Entity Disambiguation
○ In this talk, NED and EL are treated as the same task, i.e. NED, which finds the entity a mention like “Trump” refers to, and EL, which provides the LOD IRI for that entity, are considered one step
Entity Linking
● challenging because
○ entity ambiguity: mentions with the same word/phrase can have various entity
candidates
■ E.g., Michael Jordan: Basketball player or Berkeley professor?
○ name variations: mentions with different words/phrases can refer to the same entity
■ E.g., New York City or Big Apple
● Aside 2: commonly known as normalization for the biomedical domain
○ Map a word/phrase in a document to a concept in an ontology after disambiguating potentially ambiguous words/phrases
Entity Linking
● challenging because
○ entity ambiguity: mentions with the same word/phrase can have various entity
candidates
■ E.g., Michael Jordan: Basketball player or Berkeley professor?
○ name variations: mentions with different words/phrases can refer to the same entity
■ E.g., New York City or Big Apple
● Aside 2: commonly known as normalization for the biomedical domain
○ This talk will focus on the open-domain EL task, i.e. involving data from newswire or the
Web.
■ While the approaches for open-domain EL can be imported to biomedical entity
normalization, the latter task may be amenable to strong rule-based resolution1,2
as well
References
1 D’Souza, J., & Ng, V. (2015, July). Sieve-based entity linking for the biomedical domain. In Proceedings of the 53rd Annual Meeting of the Association for
Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 297-302)
2. D. Kim et al., "A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining," in IEEE Access, vol. 7, pp. 73729-73740,
2019, doi: 10.1109/ACCESS.2019.2920708.
Plan for Part I of II of the Talk
● Datasets & Knowledge Bases
● (Neural) Approaches
○ since 2015
● Evaluations
Datasets
● Open-domain Evaluation datasets from various genres
○ News, Tweets, Web pages, Blog, Encyclopedia
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
AIDA is the largest human-annotated dataset; each of its 34,587 mentions was checked against entities in the YAGO knowledge base.
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
The Knowledge Base Population (KBP) track, conducted as part of the NIST Text Analysis Conference (TAC), is an international entity linking competition held every year since 2009. Entity linking is regarded as one of the two subtasks in this track. These public entity linking competitions have provided benchmark datasets to evaluate and compare different entity linking systems.
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
Then there is a dataset from the MSNBC news source.
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
The WNED datasets (WNED stands for Walking Named Entity Disambiguation, the name of the algorithm developed for EL) are the largest automatically created datasets.
Others: NEEL (tweets; 8,665 mentions; DBpedia); OKE-2015 (encyclopedia; DBpedia); WES2015
(blog; DBpedia); WikiNews (news; DBpedia); OKE2016 (Web pages; 1,043 mentions; DBpedia)
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
Others: NEEL (tweets; 8,665 mentions; DBpedia); OKE-2015 (encyclopedia; DBpedia); WES2015
(blog; DBpedia); WikiNews (news; DBpedia); OKE2016 (Web pages; 1,043 mentions; DBpedia)
Datasets: Details & Statistics
Dataset Name        Genre   Mentions   KB
AIDA                news    34,587     YAGO/Freebase/Wikipedia
KBP’2010            news    4,338      Wikipedia
MSNBC               news    656        Wikipedia
AQUAINT             news    449        Wikipedia
ACE-2004            news    257        Wikipedia
WNED-CWEB (CWEB)    news    11,154     Wikipedia
WNED-WIKI (WW)      news    6,821      Wikipedia
● A fundamental component for Entity Linking
● Knowledge bases provide the information about the world’s entities (e.g., the entities of
Albert Einstein and Ulm), their semantic categories (e.g., Albert Einstein has a type of
Scientist and Ulm has a type of City), and the mutual relationships between entities (e.g.,
Albert Einstein has a relation named bornIn with Ulm).
● Examples:
○ Wikipedia (6,195,675 English articles)1
■ a free online multilingual encyclopedia created through decentralized, collective efforts of thousands of volunteers around
the world.
■ The basic entry in Wikipedia is an article, which defines and describes an entity or a topic, and each article in Wikipedia is
uniquely referenced by an identifier.
■ Wikipedia provides a set of useful features for entity linking, such as entity pages, article categories, redirect pages,
disambiguation pages, and hyperlinks in Wikipedia articles.
○ DBpedia (4.58 million things in English version)2
■ multilingual knowledge base constructed by extracting structured information from Wikipedia such as infobox templates,
categorization information, geo-coordinates, and links to external Web pages
References
1 https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
2 https://wiki.dbpedia.org/about/facts-figures
Knowledge Bases for Entities
● A fundamental component for Entity Linking
● Knowledge bases provide the information about the world’s entities (e.g., the entities of
Albert Einstein and Ulm), their semantic categories (e.g., Albert Einstein has a type of
Scientist and Ulm has a type of City), and the mutual relationships between entities (e.g.,
Albert Einstein has a relation named bornIn with Ulm).
● Examples:
○ YAGO (50 million entities and 2 billion facts)1
■ YAGO combines Wikidata and the schema.org ontology as the top level ontology for information organization, thus getting
the best from both worlds: a huge repository of facts, together with an ontology that is simple and used as a standard by a
large community.
References
1 https://yago-knowledge.org/getting-started
Knowledge Bases for Entities
We have:
1. released a novel multidisciplinary corpus of scholarly abstracts annotated for scientific
entities under a generic conceptual formalism that bridges 10 different STEM scientific
disciplines
a. The STEM domains we consider are Agriculture, Astronomy, Biology, Chemistry, Computer Science,
Earth Science, Engineering, Materials Science, and Mathematics.
b. The generic conceptual formalism involves four entity types
i. Process, Method, Material, and Data
c. The terms underlying the Process, Method, Material, and Data entities are linked to Wikipedia; thereby, our entities are disambiguated for their scientific sense and grounded in the real world.
d. The STEM-ECR v1.0 corpus is publicly available: https://doi.org/10.25835/0017546 (ISLRN
749-555-840-571-2)
References
● Brack, Arthur, Jennifer D’Souza, Anett Hoppe, Sören Auer, and Ralph Ewerth. "Domain-independent extraction of scientific concepts from
research articles." In European Conference on Information Retrieval, pp. 251-266. Springer, Cham, 2020.
● D’Souza, Jennifer, Anett Hoppe, Arthur Brack, Mohmad Yaser Jaradeh, Sören Auer, and Ralph Ewerth. "The STEM-ECR Dataset: Grounding
Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources." In Proceedings of The 12th
Language Resources and Evaluation Conference, pp. 2192-2203. 2020.
Datasets & Knowledge Bases
New Resource Highlight: Scholarly Knowledge Linked Entities across STEM
Disciplines
Plan for Part I of II of the Talk
● Datasets & Knowledge Bases
● (Neural) Approaches
○ since 2015
● Evaluations
(Neural) Approaches to Entity Linking
● EL has three main subtasks:
○ candidate-entity generation;
■ aims to retrieve all possible entities in the KB that may refer to an entity mention
○ candidate-entity ranking or disambiguation;
■ aims to rank the candidate entities and return the most likely one for each targeted mention
○ NIL clustering
■ handles those mentions that cannot be matched with an entity in the KB
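To make the division of labour concrete, below is a minimal Python sketch of how the three subtasks could be composed into one pipeline. The data structures, the toy word-overlap scorer, and the threshold are illustrative assumptions, not the design of any particular system.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Mention:
    text: str      # surface form, e.g. "Michael Jordan"
    context: str   # surrounding sentence or document

def generate_candidates(mention: Mention, kb_aliases: dict) -> List[str]:
    """Candidate-entity generation: retrieve KB entities that may match the mention."""
    return kb_aliases.get(mention.text.lower(), [])

def rank_candidates(mention: Mention, candidates: List[str]) -> List[Tuple[str, float]]:
    """Candidate ranking/disambiguation with a toy word-overlap scorer."""
    ctx = set(mention.context.lower().split())
    def score(c: str) -> float:
        tokens = {t.strip("().") for t in c.lower().split("_")}
        return float(len(ctx & tokens))
    return sorted(((c, score(c)) for c in candidates), key=lambda x: x[1], reverse=True)

def link(mention: Mention, kb_aliases: dict, nil_threshold: float = 1.0) -> Optional[str]:
    """Generation -> ranking -> NIL decision, mirroring the three subtasks above."""
    candidates = generate_candidates(mention, kb_aliases)
    if not candidates:
        return None                               # NIL: no matching entity in the KB
    best, score = rank_candidates(mention, candidates)[0]
    return best if score >= nil_threshold else None

kb = {"michael jordan": ["Michael_Jordan_(basketball)", "Michael_I._Jordan"]}
m = Mention("Michael Jordan", "the famous basketball player Michael Jordan")
print(link(m, kb))   # Michael_Jordan_(basketball)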
Approaches to Entity Linking: From the 3 subtasks perspective
Reference: Figure 7 in T. Al-Moslmi, M. Gallofré Ocaña, A. L. Opdahl and C. Veres, "Named Entity Extraction for Knowledge
Graphs: A Literature Overview," in IEEE Access, vol. 8, pp. 32862-32881, 2020, doi: 10.1109/ACCESS.2020.2973928.
Approaches to Entity Linking: From the systems perspective
Reference: Part of Figure 8 in T. Al-Moslmi, M. Gallofré Ocaña, A. L. Opdahl and C. Veres, "Named Entity Extraction for
Knowledge Graphs: A Literature Overview," in IEEE Access, vol. 8, pp. 32862-32881, 2020, doi:
10.1109/ACCESS.2020.2973928.
Three Non-Neural Approaches
● AIDA1
○ Mention Detection using Stanford NER Tagger
○ Linking as a graph-based technique with weighted edges computed as degree of links
between pages
● DBpedia Spotlight2
○ Mention Detection as a lightweight heuristics-based model with syntactic parsers to generate
mention candidates
○ Linking as a generative probabilistic model using maximum likelihood estimates
● Babelfy3
○ Mention Detection as named entities (e.g., Major League Soccer) and overlapping nominals
(e.g., major league, soccer)
○ A unified graph-based approach relying on encyclopedic and lexicographic knowledge
References
1. AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables
2. Daiber, J., Jakob, M., Hokamp, C., & Mendes, P. N. (2013, September). Improving efficiency and accuracy in multilingual entity extraction. In Proceedings of the
9th International Conference on Semantic Systems (pp. 121-124).
3. A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational
Linguistics (TACL), 2, pp. 231-244, 2014
Approaches to Entity Linking: From the systems perspective
prior to 2015
(Neural) Approaches to Entity Linking: General Architecture
since 2015
Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
(Neural) Approaches to Entity Linking: General Architecture
since 2015
Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
(Figure callouts) Mention detection: mentions in plain text are distinguished; entity disambiguation: the corresponding entity is predicted for the given mention.
(Neural) Approaches to Entity Linking: General Architecture
since 2015
Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
(Figure callouts) Candidate generation: possible entities are produced for the mention; entity ranking: a context/mention - candidate similarity score is computed from the representations.
Three prominent methods:
● surface form matching
○ a candidate list is composed of entities that simply match the surface forms of mentions in the text; does not work well if the referent entity does not contain the mention string
● dictionary lookup
○ a dictionary of additional aliases is constructed using KB metadata such as disambiguation/redirect pages of Wikipedia or lexical synonymy relations
● prior probability computation
○ candidates are generated based on precalculated prior probabilities of correspondence between certain mentions and entities, derived from mention-entity hyperlink count statistics [1,2,3,4,5, etc.] (see the sketch below)
References
1. Stefan Zwicklbauer, Christin Seifert, and Michael Granitzer. 2016. Robust and collective entity disambiguation through semantic embeddings. In Proceedings of
the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, pages 425–434, New York, NY, USA. ACM.
2. Chen-Tse Tsai and Dan Roth. 2016. Cross-lingual Wikification using multilingual embeddings. In Proceedings of the 2016 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 589–598, San Diego, California, USA. ACL.
3. Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on
Empirical Methods in Natural Language Processing, pages 2619–2629, Copenhagen, Denmark. ACL.
4. Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In Proceedings of the 22nd Conference on
Computational Natural Language Learning, pages 519–529, Brussels, Belgium. Association for Computational Linguistics.
5. Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In The 32 AAAI, New Orleans, Louisiana, USA. AAAI Press.
Candidate Generation
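Below is a small sketch of the dictionary-lookup and prior-probability ideas above: a lookup table and priors p(entity | mention) computed from mention-entity hyperlink counts. The anchor/entity pairs and entity names are made up for illustration; real systems harvest them from, e.g., Wikipedia anchors, redirect and disambiguation pages.

from collections import defaultdict

# Made-up (anchor text, target entity) pairs standing in for Wikipedia hyperlink statistics.
anchor_entity_pairs = [
    ("michael jordan", "Michael_Jordan_(basketball)"),
    ("michael jordan", "Michael_Jordan_(basketball)"),
    ("michael jordan", "Michael_I._Jordan"),
    ("big apple", "New_York_City"),
]

counts = defaultdict(lambda: defaultdict(int))   # mention -> entity -> hyperlink count
for anchor, entity in anchor_entity_pairs:
    counts[anchor][entity] += 1

def candidates_with_priors(mention: str, top_k: int = 30):
    """Dictionary lookup plus prior probabilities p(entity | mention) from link counts."""
    stats = counts.get(mention.lower(), {})
    total = sum(stats.values())
    if total == 0:
        return []   # unseen surface form: fall back to fuzzy matching or predict NIL
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    return [(entity, count / total) for entity, count in ranked[:top_k]]

print(candidates_with_priors("Michael Jordan"))
# [('Michael_Jordan_(basketball)', 0.66...), ('Michael_I._Jordan', 0.33...)]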
Given a list of candidate entities from a KB and a context containing a mention, the goal of this stage is to rank these entities by assigning a score to each of them.
Entity Ranking
General Architecture of a Neural Entity Ranking Component
Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
General Architecture of a Neural Entity Ranking Component
Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
Three parts 1.Encoding the mention
General Architecture of a Neural Entity Ranking Component
Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
● To correctly disambiguate an
entity mention, it is crucial to
thoroughly capture the
information from its context.
● A contextualized vector
representation of a mention
is generated by an encoder
network.
Two approaches prevail:
● recurrent networks with LSTM cells
● self-attention
Mention Encoding Subcomponent
Two approaches prevail:
● recurrent networks with LSTM cells
○ concatenating the outputs of two LSTM networks that independently encode the left and right contexts of a mention (including the mention itself) [1] (see the sketch below);
○ encode left and right local contexts via LSTMs but also pool the results across all mentions in
a coreference chain and postprocess left and right representations with a tensor network [2];
○ modification of LSTM–GRU in conjunction with an attention mechanism to encode left and
right context of a mention [3];
○ run a bidirectional LSTM network on words complemented with embeddings of word positions
relative to a target mention [4]
References
1 Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In 2017 EMNLP, pages 2681–2690,
Copenhagen, Denmark. ACL.
2 Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In The 32 AAAI, New Orleans, Louisiana, USA. AAAI Press.
3. Yotam Eshel, Noam Cohen, Kira Radinsky, Shaul Markovitch, Ikuya Yamada, and Omer Levy. 2017. Named entity disambiguation for noisy text. In CoNLL 2017,
pages 58–68, Vancouver, Canada. ACL.
4. Phong Le and Ivan Titov. 2019b. Distant learning for entity linking with automatic noise detection. In Proceedings of the 57th Annual Meeting of the Association for
Computational Linguistics, pages 4081–4090, Florence, Italy, July. ACL.
Mention Encoding Subcomponent
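A minimal PyTorch sketch of the first LSTM-based variant above: two LSTMs encode the left and right contexts (each including the mention itself) and their final states are concatenated into the mention representation. Vocabulary size, dimensions, and the randomly initialised embeddings are illustrative assumptions.

import torch
import torch.nn as nn

class MentionEncoder(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.left_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.right_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, left_ids, right_ids):
        # left_ids: tokens of the left context up to (and including) the mention
        # right_ids: tokens of the right context back to (and including) the mention
        _, (h_left, _) = self.left_lstm(self.emb(left_ids))
        _, (h_right, _) = self.right_lstm(self.emb(right_ids))
        return torch.cat([h_left[-1], h_right[-1]], dim=-1)   # mention representation

encoder = MentionEncoder()
left = torch.randint(0, 10_000, (1, 12))    # dummy token ids
right = torch.randint(0, 10_000, (1, 9))
print(encoder(left, right).shape)           # torch.Size([1, 256])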
Two approaches prevail:
● recurrent networks with LSTM cells
● self-attention: encoding methods based on self-attention rely on the outputs from pre-trained
BERT layers for context and mention encoding.
Mention Encoding Subcomponent
Two approaches prevail:
● recurrent networks with LSTM cells
● self-attention: encoding methods based on self-attention rely on the outputs from pre-trained
BERT layers for context and mention encoding.
○ a mention representation is modeled by pooling over word pieces in a mention span. The
authors also put an additional self-attention block over all mention representations that
encode interactions between several entities in a sentence [1].
○ reduce a sequence by keeping the representation of the special pooling symbol ‘[CLS]’
inserted at the beginning of a sequence [2].
○ mark positions of a mention span by summing embeddings of words within the span with a
special vector [3] and use the same reduction strategy as [2].
○ concatenate text with all mentions in it and jointly encode this sequence via a self-attention
model based on pre-trained BERT [4].
References
1 Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word
representations. In Proceedings of the 2019 EMNLP-IJCNLP, pages 43–54, Hong Kong, China. ACL.
2 Ledell Yu Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. ArXiv,
abs/1911.03814.
3 Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, and Honglak Lee. 2019. Zero-shot entity linking by reading entity
descriptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3449–3460, Florence, Italy. ACL.
4 Ikuya Yamada, Koki Washio, Hiroyuki Shindo, and Yuji Matsumoto. 2020. Global entity disambiguation with pretrained contextualized embeddings of words and
entities. arXiv preprint arXiv:1909.00426v2.
Mention Encoding Subcomponent
General Architecture of a Neural Entity Ranking Component
Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
2.Encoding the candidate entities
Three parts
General Architecture of a Neural Entity Ranking Component
Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
● The linking decision is based on how well candidate entities match the corresponding mention or context, using the entities' structured or textual information.
● Low-dimensional semantic
representations of entities
account for this in such a
way that spatial proximity of
entities in a vector space
correlates with their
semantic similarity.
Three parts
Aim to obtain vector representations for entities:
● capture different kinds of entity information, including entity type, description page, linked mention, and contextual information, and therefore build a larger encoder that uses a CNN for the entity description and alignment functions for the others [1].
● encode entities based on their title, description page, and category information; whereas all the previously mentioned models rely on annotated data, a few studies of this kind aim for less resource dependence [2].
● derive entity embeddings using pre-trained word2vec word vectors through description-page words, surface-form words, and entity category words [3,4] (see the sketch below).
● depend on the BERT architecture to create representations from the description pages [5,6].
References
1 Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 EMNLP, pages
2681–2690, Copenhagen, Denmark. ACL.
2 Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano. 2019. Learning dense representations for
entity retrieval. In Proceedings of the 23rd CoNLL, pages 528–537, Hong Kong, China. ACL.
3 Yaming Sun, Lei Lin, Duyu Tang, Nan Yang, Zhenzhou Ji, and XiaolongWang. 2015. Modeling mention, context and entity with neural networks for entity
disambiguation. In Proceedings of the 24th, IJCAI’15, pages 1333–1339. AAAI Press.
4 Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In 32 AAAI, New Orleans, Louisiana, USA. AAAI Press.
5 Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, and Honglak Lee. 2019. Zero-shot entity linking by reading entity
descriptions. In Proceedings of the 57th ACL, pages 3449–3460, Florence, Italy. ACL.
6 Ledell Yu Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. ArXiv,
abs/1911.03814.
Entity Encoding Subcomponent
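A small sketch of the word-vector averaging idea in [3,4]: an entity embedding is derived by pooling pre-trained word vectors over its description-page words (surface-form and category words could be added analogously). The tiny random 5-d "word vectors" below stand in for real word2vec vectors.

import numpy as np

# Illustrative stand-ins for pre-trained word2vec vectors.
word_vectors = {w: np.random.rand(5) for w in
                ["professor", "machine", "learning", "basketball", "player", "nba"]}

def entity_embedding(description: str) -> np.ndarray:
    """Average the word vectors of the entity's description words."""
    vecs = [word_vectors[w] for w in description.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(5)

e_scientist = entity_embedding("Professor of machine learning")   # e.g., Michael I. Jordan
e_athlete = entity_embedding("NBA basketball player")             # e.g., Michael Jordan (basketball)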
General Architecture of a Neural Entity Ranking Component
Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
3. Comparing Mention and Candidate Entity Representations
Three parts
● Most of the state-of-the-art studies compare mention and entity representations using a dot product [1,2,3,4] or cosine similarity [5,6,7].
● The calculated similarity score is often combined with mention-entity priors obtained during the candidate generation phase [1,3,6] or other features including various similarities, a string-matching indicator, and entity types [6,8,9,10].
● Commonly, an additional one- or two-layer feedforward network [1,6,9] is used. The final disambiguation decision is inferred via a probability distribution, usually a softmax function over the candidates (see the sketch below). The local similarity score or probability distribution can be further utilized for global scoring.
References
1 Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, pages 2619–2629, Copenhagen, Denmark. ACL.
2 Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 EMNLP, pages
2681–2690, Copenhagen, Denmark. ACL.
3 Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In 22nd CoNLL, pages 519–529, Brussels, Belgium. ACL.
4 Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word
representations. In Proceedings of the 2019 EMNLP-IJCNLP, pages 43–54, Hong Kong, China. ACL.
5 Yaming Sun, Lei Lin, Duyu Tang, Nan Yang, Zhenzhou Ji, and XiaolongWang. 2015. Modeling mention, context and entity with neural networks for entity disambiguation. In
Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pages 1333–1339. AAAI Press.
6 Matthew Francis-Landau, Greg Durrett, and Dan Klein. 2016. Capturing semantic similarity for entity linking with convolutional neural networks. In Proceedings of the 2016
NAACL: Human Language Technologies, pages 1256–1261, San Diego, California, USA.
7 Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano. 2019. Learning dense representations for entity
retrieval. In Proceedings of the 23rd CoNLL, pages 528–537, Hong Kong, China. ACL.
8 Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In 32 AAAI, New Orleans, Louisiana, USA. AAAI Press.
9 Hamed Shahbazi, Xiaoli Z Fern, Reza Ghaeini, Rasha Obeidat, and Prasad Tadepalli. 2019. Entity-aware elmo:Learning contextual entity representation for entity
disambiguation. arXiv preprint arXiv:1908.05762.
Comparing Mention and Candidate Entity Representations
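A hedged PyTorch sketch of the comparison step described above: dot-product similarity between the mention and each candidate is combined with the candidate prior, passed through a small feedforward layer, and turned into a probability distribution with softmax. All dimensions and feature choices are illustrative.

import torch
import torch.nn as nn

class LocalScorer(nn.Module):
    def __init__(self):
        super().__init__()
        # two features per candidate: dot-product similarity and the p(e|m) prior
        self.ffn = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, mention_vec, entity_vecs, priors):
        sim = entity_vecs @ mention_vec                 # dot product per candidate
        feats = torch.stack([sim, priors], dim=-1)      # combine similarity with the prior
        scores = self.ffn(feats).squeeze(-1)
        return torch.softmax(scores, dim=-1)            # probability distribution over candidates

scorer = LocalScorer()
probs = scorer(torch.randn(256),                        # mention/context representation
               torch.randn(4, 256),                     # 4 candidate entity embeddings
               torch.tensor([0.60, 0.30, 0.05, 0.05]))  # candidate priors p(e|m)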
● Optionally addressed in some systems
● Aim to equip EL systems to recognize cases in which the referent entities of some mentions are absent from the KB. This is known as NIL prediction.
Unlinkable Mention Prediction
● Optionally addressed in some systems
● Aim to equip EL systems to recognize cases in which the referent entities of some mentions are absent from the KB. This is known as NIL prediction.
● Four common ways to perform NIL prediction:
○ a candidate generator does not yield any corresponding entities for a mention [1,2]
○ set a threshold on the (best) linking probability, below which the mention is left unlinked [1,2] (see the sketch below)
○ introduce an additional special ‘NIL’ entity in the ranking phase, so some models predict it as the best match for the mention [3]
○ train an additional binary classifier that accepts mention-entity pairs after the ranking phase, as well as several additional features (best linking score, whether mentions are also detected by a dedicated NER system, etc.), and makes the final decision about whether a mention is linkable or not [4,5].
References
1 Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word
representations. In Proceedings of the 2019 EMNLP-IJCNLP, pages 43–54, Hong Kong, China. Association for Computational Linguistics.
2 Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, and Fernando Pereira. 2015. Plato: A selective context model for entity resolution. TACL, 3:503–515.
3 Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In 22nd CoNLL, pages 519–529, Brussels, Belgium. ACL.
4 Jose G. Moreno, Romaric Besanc¸on, Romain Beaumont, Eva D’hondt, Anne-Laure Ligozat, Sophie Rosset, Xavier Tannier, and Brigitte Grau. 2017. Combining word
and entity embeddings for entity linking. In Extended Semantic Web Conference (1), volume 10249 of Lecture Notes in Computer Science, pages 337–352.
5 Pedro Henrique Martins, Zita Marinho, and Andr´e F. T. Martins. 2019. Joint learning of named entity recognition and entity linking. In Proceedings of the 57th Annual
Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 190–196, Florence, Italy. Association for Computational Linguistics.
Unlinkable Mention Prediction
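A minimal sketch of the threshold-based NIL strategy from the list above; the threshold value and the candidate names are illustrative, and the threshold would in practice be tuned on validation data.

def link_or_nil(probs, candidates, threshold=0.5):
    """Return the best candidate, or None (NIL) if its probability stays below the threshold."""
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best] if probs[best] >= threshold else None

print(link_or_nil([0.42, 0.38, 0.20],
                  ["Michael_Jordan_(basketball)", "Michael_I._Jordan", "Michael_Jordan_(footballer)"]))
# None: no candidate clears the 0.5 threshold, so the mention is left unlinked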
(Neural) Approaches to Entity Linking: General Architecture
since 2015
Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey
of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
(Figure callouts) Candidate generation: possible entities are produced for the mention; entity ranking: a context/mention - candidate similarity score is computed from the representations.
Modifications of the General Architecture:
● Joint Entity Recognition and Disambiguation Architectures
○ observe that the interaction between recognition and disambiguation is beneficial to the overall model
■ E.g., a multi-task learning framework that integrates recognition and linking [1]
● Global Context Architectures
○ global EL seen as a sequential decision task where the disambiguation of new entities is based on the already disambiguated ones
■ E.g., apply an LSTM to maintain long-term memory of previous decisions [2]
● Cross-lingual Architectures
○ leverage supervision signals from multiple languages for training a model in a target language
■ E.g., the inter-lingual links in Wikipedia are utilized for alignment of entities in multiple languages. With this alignment, the annotated data from high-resource languages like English can help to improve the quality of text processing for the low-resource ones [3]
References
1 Pedro Henrique Martins, Zita Marinho, and Andr´e F. T. Martins. 2019. Joint learning of named entity recognition and entity linking. In 57th ACL: Student Research
Workshop, pages 190–196, Florence, Italy. ACL.
2 Zheng Fang, Yanan Cao, Qian Li, Dongjie Zhang, Zhenyu Zhang, and Yanbing Liu. 2019. Joint entity linking with deep reinforcement learning. In The World Wide
Web Conference, WWW ’19, pages 438–447, New York, NY, USA. ACM.
3 Heng Ji, Joel Nothman, Ben Hachey, and Radu Florian. 2015. Overview of TAC-KBP2015 tri-lingual entity discovery and linking. In Proceedings of the 2015 Text
Analysis Conference, TAC 2015, pages 16–17, Gaithersburg, Maryland, USA. NIST.
(Neural) Approaches to Entity Linking: General Architecture
Modifications
Plan for Part I of II of the Talk
● Datasets & Knowledge Bases
● (Neural) Approaches
○ since 2015
● Evaluations
Results are described in terms of accuracy and micro F1 scores
Evaluations: Metrics
References
Ikuya Yamada, Koki Washio, Hiroyuki Shindo, and Yuji Matsumoto. 2020. Global entity disambiguation with pretrained contextualized embeddings of words and entities.
arXiv preprint arXiv:1909.00426v2.
Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In 22nd CoNNL, pages 519–529, Brussels, Belgium. ACL.
Ledell Yu Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. ArXiv, abs/1911.03814.
Evaluations: Entity Linking Results
Dataset     Accuracy   Micro F1   System
AIDA        0.950      -          Yamada et al. (2020)
AIDA        -          0.824      Kolitsas et al. (2018)
KBP’10      0.940      -          Wu et al. (2020)
MSNBC       -          0.963      Yamada et al. (2020)
AQUAINT     -          0.935      Yamada et al. (2020)
ACE-2004    -          0.919      Yamada et al. (2020)
CWEB        -          0.789      Yamada et al. (2020)
WW          -          0.891      Yamada et al. (2020)
Open Source Tools and Resources for Entity Linking
Year System name NER NED URL
2010 Tagme Y Y https://tagme.d4science.org/tagme/
2011 DBpedia Spotlight Y Y https://www.dbpedia-spotlight.org/
2011 AIDA Y Y https://gate.d5.mpi-inf.mpg.de/webaida/
2013 TwitIE Y Y https://gate.ac.uk/wiki/twitie.html
2014 Babelfy Y Y http://babelfy.org/
2014 Stanford CoreNLP Y N https://stanfordnlp.github.io/CoreNLP/
2015 SpaCy Y N https://spacy.io/
since 2015 neural models - - https://paperswithcode.com/task/entity-linking
(I) Entity Linking and
(II) KG Completion
Jennifer D’Souza
Technische Informationsbibliothek (TIB)
Welfengarten 1B // 30167 Hannover
● Involves the construction of Knowledge Graphs (KG) from
unstructured text and other structured or semi-structured
sources.
○ Core tasks are relation and entity extraction
Knowledge Acquisition
A KG is typically a multi-relational graph containing entities as nodes and relations as edges. Each edge is
represented as a triplet (head entity, relation, tail entity) ((h; r; t) for short), indicating the relation between two
entities, e.g., (Albert Einstein, WinnerOf, Nobel Prize in Physics).
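For concreteness, below is a minimal sketch of the (h, r, t) triple representation just described, using the Albert Einstein example; the tiny in-memory "KG" is purely illustrative.

from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

kg = {
    Triple("Albert Einstein", "WinnerOf", "Nobel Prize in Physics"),
    Triple("Albert Einstein", "bornIn", "Ulm"),
}
entities = {t.head for t in kg} | {t.tail for t in kg}   # the nodes of the multi-relational graph
relations = {t.relation for t in kg}                     # the (directed) edge labels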
● Involves the construction of Knowledge Graphs (KG) from
unstructured text and other structured or semi-structured
sources.
○ Core tasks are relation and entity extraction
● Powered by KGs, many real-world applications such as recommendation systems and question answering have seen significant progress with their new capacity for commonsense understanding and reasoning.
○ Search powered by Google’s Knowledge Graph
Knowledge Acquisition
● Involves the construction of Knowledge Graphs (KG) from
unstructured text and other structured or semi-structured
sources.
○ Core tasks are relation and entity extraction
● Powered by KGs, many real-world applications such as recommendation systems and question answering have seen significant progress with their new capacity for commonsense understanding and reasoning.
○ Given knowledge: (Y,gender,Male) and (X,hasChild,Y)
Knowledge Acquisition
● Involves the construction of Knowledge Graphs (KG) from
unstructured text and other structured or semi-structured
sources.
○ Core tasks are relation and entity extraction
● Powered by KGs, many real-world applications such as recommendation systems and question answering have seen significant progress with their new capacity for commonsense understanding and reasoning.
○ Given knowledge: (Y,gender,Male) and (X,hasChild,Y)
Then, inferences such as (Y,sonOf,X) are possible.
Knowledge Acquisition
● Also involves completing an existing knowledge graph,
and other entity-oriented acquisition tasks such as entity
resolution and alignment.
● Thus, the main tasks of knowledge acquisition include
relation extraction to convert unstructured text to
structured knowledge, knowledge graph completion
(KGC), and other entity-oriented acquisition tasks such as
entity recognition and entity alignment.
○ KGC and relation extraction can be treated jointly. Han et
al. [1] proposed a joint learning framework with mutual
attention for data fusion between knowledge graphs and
text, which solves both KGC and relation extraction from
text.
References
1 X. Han, Z. Liu, and M. Sun, “Neural knowledge acquisition via mutual attention between knowledge
graph and text,” in AAAI, 2018, pp. 4832–4839.
Knowledge Acquisition
Knowledge Graph Completion
● Knowledge Graphs constructed from unstructured text or
acquired from other sources are by nature incomplete.
○ Why? Created at scale from millions of documents or at Web scale, they are easily prone to noise.
An example of an incomplete Knowledge Graph with a missing relation
Img src: https://towardsdatascience.com/embedding-models-for-knowledge-graph-completion-a66d4c01d588
Knowledge Graph Completion
● Knowledge Graphs constructed from unstructured text or
acquired from other sources are by nature incomplete.
○ Why? Created at scale from millions of documents or at Web scale, they are easily prone to noise.
is a university in
An example of an incomplete Knowledge Graph with a missing relation
Img src: https://towardsdatascience.com/embedding-models-for-knowledge-graph-completion-a66d4c01d588
The Knowledge Graph Completion Task
A KG has edges specified as triplets of elements (h, r, t) ∈ E × R × E, where the head (h) and tail (t) entities are elements of E and r is a relation type from R. Note that relations can be directed.
Formally, we define KGC as the task that tries to predict any
missing element of the triplet (h, r, t). In particular, we talk
about:
● link (entity) prediction when an element between h or t is
missing ((?, r, t) or (h, r, ?));
● relation prediction when r is missing (h, ?, t)
The Knowledge Graph Completion Task
Formally, we define KGC as the task that tries to predict any
missing element of the triplet (h, r, t). In particular, we talk
about:
● link (entity) prediction when an element between h or t is
missing ((?, r, t) or (h, r, ?));
● relation prediction when r is missing (h, ?, t)
The Knowledge Graph Completion Task
Formally, we define KGC as the task that tries to predict any
missing element of the triplet (h, r, t). In particular, we talk
about:
● link (entity) prediction when an element between h or t is
missing ((?, r, t) or (h, r, ?));
● relation prediction when r is missing (h, ?, t);
● Aside: triplet classification when an algorithm recognizes
whether a given triplet (h, r, t) is correct or not.
Knowledge Graph Completion
● challenging because:
○ it is not trivial to create a KG;
○ every entity could have a variable number of attributes
(non-unique specification);
○ R could contain different types of relation (multi-layer
network, hierarchical network);
○ a KG changes over time (evolution over time).
Plan for Part II of II of the Talk
● Approaches
● Datasets and Toolkits
1. Embedding-based (ranking) methods
○ involve learning low-dimensional embeddings, i.e. adopting Knowledge Graph Embedding (KGE) methods originally used for triple prediction
2. Relational path reasoning
○ Embedding-based methods, however, fail to capture multi-step relationships.
○ Relational path reasoning methods explore multi-step relation paths
○ The two approaches below follow the same multi-step paradigm but additionally incorporate logical rules
3. Logical rule reasoning
4. Meta relational learning
Approaches to Knowledge Graph Completion
1. Embedding-based (ranking) methods
○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing:
■ learn embedding vectors based on existing triples:
● during test, the missing h or t entity is predicted from the existing set E of entities in the KG;
● during training, triple instances are created by replacing h or t with each entity in E; scores are calculated for all candidate entities, and the top k entities are ranked.
■ all Knowledge Graph Embedding methods that represent inputs and candidates in a unified
embedding space are applicable. E.g., TransE [1], TransH [2], TransR [3], HolE [4].
References
1 A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in NIPS, 2013, pp. 2787–2795.
2 Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge graph embedding by translating on hyperplanes,” in AAAI, 2014, pp. 1112–1119.
3 Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” in AAAI, 2015, pp. 2181–2187.
4 M. Nickel, L. Rosasco, and T. Poggio, “Holographic embeddings of knowledge graphs,” in AAAI, 2016, pp. 1955–1961.
Approaches to Knowledge Graph Completion
1. Embedding-based (ranking) methods
○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing:
■ TransE model [Bordes et al., 2013]
References
1 A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in NIPS, 2013, pp. 2787–2795.
Approaches to Knowledge Graph Completion
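A minimal sketch of the TransE idea cited above: a valid triple should satisfy h + r ≈ t, so candidates for a missing tail are ranked by the negative distance -||h + r - t||. The random embeddings below stand in for trained ones; in a real model they would be learned with a margin-based ranking loss.

import torch

num_entities, num_relations, dim = 1000, 50, 64          # illustrative sizes
entity_emb = torch.nn.functional.normalize(torch.randn(num_entities, dim), dim=-1)
relation_emb = torch.randn(num_relations, dim)

def score_tails(h_id: int, r_id: int) -> torch.Tensor:
    """TransE score -||h + r - t|| for every entity t in E as a candidate tail."""
    translated = entity_emb[h_id] + relation_emb[r_id]
    return -torch.norm(translated - entity_emb, dim=-1)

def predict_tail(h_id: int, r_id: int, k: int = 5):
    """Link prediction for (h, r, ?): return the ids of the top-k ranked candidate tails."""
    return torch.topk(score_tails(h_id, r_id), k).indices.tolist()

print(predict_tail(h_id=0, r_id=3))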
1. Embedding-based (ranking) methods
○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing:
■ learn embedding vectors based on existing triples:
● during test, the missing h or t entity is predicted from the existing set E of entities in the KG;
● during training, triple instances are created by replacing h or t with each entity in E; scores are calculated for all candidate entities, and the top k entities are ranked.
■ Unlike representing inputs and candidates in a unified embedding space, ProjE [1] proposes a combined embedding by space projection of the known parts of input triples, i.e., (h, r, ?) or (?, r, t), and the candidate entities with a candidate-entity matrix Wc ∈ R^(s×d), where s is the number of candidate entities. Its embedding projection function includes a neural combination layer and an output projection layer.
References
1 Shi and T. Weninger, “ProjE: Embedding projection for knowledge graph completion,” in AAAI, 2017, pp. 1236–1242.
Approaches to Knowledge Graph Completion
1. Embedding-based (ranking) methods
○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing:
■ learn embedding vectors based on existing triples:
● during test, the missing h or t entity is predicted from the existing set E of entities in the KG;
● during training, triple instances are created by replacing h or t with each entity in E; scores are calculated for all candidate entities, and the top k entities are ranked.
■ ConMask [1] proposes relationship-dependent content masking over the entity description to
select relevant snippets of given relations, and CNN-based target fusion to complete the
knowledge graph. It can only make a prediction when query relations and entities are explicitly
expressed in the text description.
References
1 B. Shi and T. Weninger, “Open-world knowledge graph completion,” in AAAI, 2018, pp. 1957–1964.
Approaches to Knowledge Graph Completion
2. Relation path reasoning
○ A limitation of embedding-based methods is that they do not model complex relation paths, e.g. one-to-many or many-to-many relations.
■ Relation path reasoning leverages path information over the graph structure.
Approaches to Knowledge Graph Completion
2. Relation path reasoning
○ A limitation of embedding-based methods is that they do not model complex relation paths.
■ Relation path reasoning leverages path information over the graph structure.
○ Random walk inference has been investigated.
■ E.g., the Path-Ranking Algorithm (PRA) [1] chooses a relational path under a combination of
path constraints and conducts maximum-likelihood classification.
○ Neural multi-hop relational path modeling is also studied.
■ Neelakantan et al. [2] model complex relation paths by applying compositionality recursively over the relations in the path (see the sketch below).
References
1 N. Lao and W. W. Cohen, “Relational retrieval using a combination of path-constrained random walks,” Machine learning, vol. 81, no. 1, pp. 53–67, 2010.
2 A. Neelakantan, B. Roth, and A. McCallum, “Compositional vector space models for knowledge base completion,” in ACL-IJCNLP, vol. 1, 2015, pp. 156–166.
Approaches to Knowledge Graph Completion
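A small sketch, in the spirit of the recursive composition above: the relation embeddings along a path are fed through a recurrent cell, and the composed path vector is compared against the embedding of a single target relation. The GRU cell, dimensions, and relation ids are illustrative assumptions, not the exact model of [2].

import torch
import torch.nn as nn

num_relations, dim = 50, 32                      # illustrative sizes
relation_emb = nn.Embedding(num_relations, dim)  # stand-in for learned relation vectors
cell = nn.GRUCell(dim, dim)                      # recurrent composition cell

def compose_path(relation_ids):
    """Recursively compose the relation embeddings along a path into one vector."""
    state = torch.zeros(1, dim)
    for r in relation_ids:
        state = cell(relation_emb(torch.tensor([r])), state)
    return state.squeeze(0)

path_vec = compose_path([3, 17])                 # e.g., a two-hop path such as bornIn -> locatedIn
target_vec = relation_emb(torch.tensor(8))       # e.g., a single relation such as nationality
similarity = torch.dot(path_vec, target_vec)     # high similarity suggests the path implies the relation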
2. Relation path reasoning
○ Chains-of-Reasoning [1], a neural attention mechanism enabling reasoning over multiple paths, represents logical composition across all relations, entities, and text.
○ DIVA [2] proposes a unified variational inference framework that takes multi-hop reasoning as
two sub-steps of path-finding (a prior distribution for underlying path inference) and
path-reasoning (a likelihood for link classification).
References
1 R. Das, A. Neelakantan, D. Belanger, and A. McCallum, “Chains of reasoning over entities, relations, and text using recurrent neural networks,” in EACL, vol. 1, 2017,
pp. 132–141.
2 W. Chen, W. Xiong, X. Yan, and W. Y. Wang, “Variational knowledge graph reasoning,” in NAACL, 2018, pp. 1823–1832.
Approaches to Knowledge Graph Completion
2. Reinforcement-learning based path finding
○ Deep reinforcement learning (RL) is introduced for multi-hop reasoning by formulating path-finding between entity pairs as sequential decision making, specifically a Markov decision process (MDP). The policy-based RL agent learns to choose the next relation step that extends the reasoning path through interaction with the knowledge graph environment, and the policy gradient is utilized for training the agent.
■ KGC based on the RL concepts of State, Action, Reward, and Policy Network (see the sketch below)
○ DeepPath [1] firstly applies RL into relational path learning and develops a novel reward
function to improve accuracy, path diversity, and path efficiency. It encodes states in the
continuous space via a translational embedding method and takes the relation space as its
action space.
○ Similarly, MINERVA [2] takes path walking to the correct answer entity as a sequential
optimization problem by maximizing the expected reward. It excludes the target answer entity
and provides more capable inference.
References
1 W. Xiong, T. Hoang, and W. Y. Wang, “DeepPath: A reinforcement learning method for knowledge graph reasoning,” in EMNLP, 2017, pp. 564–573.
2 R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum, “Go for a walk and arrive at the answer: Reasoning over paths
in knowledge bases using reinforcement learning,” in ICLR, 2018, pp. 1–18.
Approaches to Knowledge Graph Completion
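A toy sketch of the MDP formulation above: the state tracks the current entity, the actions are the outgoing (relation, entity) edges, and a reward of 1 is given when the target entity is reached. The random action choice stands in for a trained policy network, and the miniature KG is made up.

import random

# Toy KG: entity -> list of outgoing (relation, target entity) edges.
kg_edges = {
    "Albert Einstein": [("bornIn", "Ulm"), ("WinnerOf", "Nobel Prize in Physics")],
    "Ulm": [("locatedIn", "Germany")],
    "Germany": [("partOf", "Europe")],
}

def rollout(start: str, target: str, max_hops: int = 3):
    """One episode: walk the KG from `start`; reward 1.0 if `target` is reached."""
    path, current = [], start
    for _ in range(max_hops):
        actions = kg_edges.get(current, [])     # action space = outgoing edges of the current state
        if not actions:
            break
        relation, nxt = random.choice(actions)  # stand-in for sampling from a policy network
        path.append((current, relation, nxt))
        current = nxt
        if current == target:
            return path, 1.0                    # terminal reward
    return path, 0.0

print(rollout("Albert Einstein", "Germany"))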
2. Reinforcement-learning based path finding
○ Instead of using a binary reward function, MultiHop [1] proposes a soft reward mechanism.
Action dropout is also adopted to mask some outgoing edges during training to enable more
effective path exploration.
○ M-Walk [2] applies an RNN controller to capture the historical trajectory and uses the Monte
Carlo Tree Search (MCTS) for effective path generation.
○ Leveraging a text corpus with the sentence bag of the current entity, denoted b_et, CPL [3] proposes collaborative policy learning for path finding and fact extraction from text.
○ For the policy networks, DeepPath uses a fully-connected network, the extractor of CPL employs a CNN, while the rest use recurrent networks.
References
1 X. V. Lin, R. Socher, and C. Xiong, “Multi-hop knowledge graph reasoning with reward shaping,” in EMNLP, 2018, pp. 3243–3253.
2 Y. Shen, J. Chen, P.-S. Huang, Y. Guo, and J. Gao, “M-Walk: Learning to walk over graphs using monte carlo tree search,” in NeurIPS, 2018, pp. 6786–6797.
3 C. Fu, T. Chen, M. Qu, W. Jin, and X. Ren, “Collaborative policy learning for open knowledge graph reasoning,” in EMNLP, 2019, pp. 2672–2681.
Approaches to Knowledge Graph Completion
3. Rule-based Reasoning
○ Another direction for Knowledge Graph Completion, one that makes use of the symbolic nature of knowledge, is logical rule learning.
○ E.g., the inference rule (Y; sonOf; X) <-- (X; hasChild; Y) ^ (Y; gender; Male) derives the relation ‘sonOf’, which did not exist in the KG earlier (see the sketch below).
■ Logical rules can be extracted by rule mining tools like AMIE [1]
○ RLvLR [2] proposes a scalable rule mining approach with efficient rule searching and pruning, and uses the extracted rules for relation prediction.
References
1 L. A. Gal´arraga, C. Teflioudi, K. Hose, and F. Suchanek, “AMIE: association rule mining under incomplete evidence in ontological knowledge bases,” in WWW, 2013,
pp. 413–422.
2 P. G. Omran, K. Wang, and Z. Wang, “An embedding-based approach to rule learning in knowledge graphs,” IEEE TKDE, pp. 1–12, 2019.
Approaches to Knowledge Graph Completion
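A minimal sketch of applying the sonOf rule quoted above to a tiny, made-up set of triples; in practice such rules would be mined automatically (e.g., with AMIE) rather than hard-coded.

# Rule: (Y, sonOf, X) <- (X, hasChild, Y) AND (Y, gender, Male)
kg = {
    ("Anna", "hasChild", "Tom"),
    ("Tom", "gender", "Male"),
    ("Anna", "hasChild", "Lena"),
    ("Lena", "gender", "Female"),
}

def apply_son_of_rule(triples):
    inferred = set()
    for (x, r, y) in triples:
        if r == "hasChild" and (y, "gender", "Male") in triples:
            inferred.add((y, "sonOf", x))   # a new fact not present in the KG before
    return inferred

print(apply_son_of_rule(kg))   # {('Tom', 'sonOf', 'Anna')}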
3. Rule-based Reasoning
○ A different research direction on this topic focuses on injecting logical rules into embeddings to improve reasoning, with joint learning, for example, applied to incorporate first-order logic rules.
■ E.g., KALE [1] proposes a unified joint model with t-norm fuzzy logical connectives defined for
compatible triples and logical rules embedding.
■ Specifically, three compositions of logical conjunction, disjunction, and negation are defined to
compose the truth value of a complex formula.
References
1 S. Guo, Q. Wang, L. Wang, B. Wang, and L. Guo, “Jointly embedding knowledge graphs and logical rules,” in EMNLP, 2016, pp. 192–202.
Approaches to Knowledge Graph Completion
4. Meta Relational Learning
○ Consider that real-world knowledge is, in fact, dynamic, and unseen triples are continually acquired.
○ This scenario is called meta relational learning or few-shot relational learning
■ requires models to predict new relational facts with only very few samples
○ GMatching [1] develops a metric based few-shot learning method with entity embeddings and
local graph structures.
■ It encodes one-hop neighbors to capture the structural information with R-GCN and then takes
the structural entity embedding for multistep matching guided by long short-term memory
(LSTM) networks to calculate the similarity scores.
References
1 W. Xiong, M. Yu, S. Chang, X. Guo, and W. Y. Wang, “One-shot relational learning for knowledge graphs,” in EMNLP, 2018, pp. 1980–1990.
Approaches to Knowledge Graph Completion
Plan for Part II of II of the Talk
● Approaches
● Datasets and Toolkits
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications.
arXiv preprint arXiv:2002.00388.
A popular way of generating task-specific datasets is to sample subsets from large general datasets.
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications.
arXiv preprint arXiv:2002.00388.
E.g., the WN-prefixed datasets are subsets of the WordNet knowledge base.
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
● WordNet is designed to produce an intuitively usable dictionary and thesaurus, and support automatic text
analysis.
● Its entities (termed synsets) correspond to word senses, and relationships define lexical relations between
them. Examples of triplets are (score_NN_1, hypernym, evaluation_NN_1) or (score_NN_2, has_part,
musical_notation_NN_1).
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications.
arXiv preprint arXiv:2002.00388.
On the other hand, the FB-prefixed datasets are subsets of the Freebase knowledge base.
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
● Freebase is a huge and growing KB of general facts; there are currently around 1.2 billion triplets and
more than 80 million entities.
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
● Freebase is a huge and growing KB of general facts; there are currently around 1.2 billion triplets and
more than 80 million entities.
● The smaller dataset (FB15K) was made by selecting the subset of entities that are also present in the Wikilinks database and that have at least 100 mentions in Freebase (for both entities and relationships).
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
● The large-scale dataset (FB5M) was created by selecting the 5 million most frequently occurring entities in Freebase.
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications.
arXiv preprint arXiv:2002.00388.
● The datasets WN18 and FB15k suffer from test set leakage through inverse relations, where a large
number of test triples could be obtained by inverting triples in the training set.
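A minimal sketch of how this leakage can be quantified, assuming triples are stored as (head, relation, tail) tuples; the function name and the toy demo triples are illustrative only.

```python
from collections import defaultdict

def inverse_leakage_rate(train, test):
    """Fraction of test triples (h, r, t) whose inverse form (t, r', h),
    for any relation r', already appears in the training set."""
    inverted = defaultdict(set)              # (tail, head) -> relations seen in training
    for h, r, t in train:
        inverted[(t, h)].add(r)
    leaked = sum(1 for h, r, t in test if (h, t) in inverted)
    return leaked / max(len(test), 1)

# Tiny demo: the test triple is just the inverse of a training triple, so the rate is 1.0.
train = [("Paris", "capitalOf", "France")]
test = [("France", "hasCapital", "Paris")]
print(inverse_leakage_rate(train, test))     # 1.0
# On WN18/FB15k this rate is high; on WN18RR/FB15k-237 it should be close to zero.
```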
Datasets
Dataset Original Data # Rel. # Ent. # Train # Valid. # Test
WN18 WordNet 18 40,943 141,442 5,000 5,000
FB15K Freebase 1,345 14,951 483,142 50,000 59,071
WN11 WordNet 11 38,696 112,581 2,609 10,544
FB13 Freebase 13 75,043 316,232 5,908 23,733
WN18RR WordNet 11 40,943 86,835 3,034 3,134
FB15k-237 Freebase 237 14,541 272,115 17,535 20,466
FB5M Freebase 1,192 5,385,322 19,193,556 50,000 59,071
FB40K Freebase 1,336 39,528 370,648 67,946 96,678
Datasets for Tasks on Knowledge Graphs
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications.
arXiv preprint arXiv:2002.00388.
● FB15k-237 was then introduced as a subset of FB15k from which such inverse relations were removed.
Toolkits
Task Library Language URL
General Grakn Python github.com/graknlabs/kglib
General AmpliGraph TensorFlow github.com/Accenture/AmpliGraph
General GraphVite Python graphvite.io
Database Akutan Go github.com/eBay/akutan
KRL OpenKE PyTorch github.com/thunlp/OpenKE
KRL Fast-TransX C++ github.com/thunlp/Fast-TransX
KRL scikit-kge Python github.com/mnick/scikit-kge
KRL LibKGE PyTorch github.com/uma-pi1/kge
KRL PyKEEN Python github.com/SmartDataAnalytics/PyKEEN
RE OpenNRE PyTorch github.com/thunlp/OpenNRE
Table: Summary of open-source libraries for knowledge graph construction
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications.
arXiv preprint arXiv:2002.00388.
Toolkits
Task Library Language URL
General Grakn Python github.com/graknlabs/kglib
General AmpliGraph TensorFlow github.com/Accenture/AmpliGraph
General GraphVite Python graphvite.io
Database Akutan Go github.com/eBay/akutan
KRL OpenKE PyTorch github.com/thunlp/OpenKE
KRL Fast-TransX C++ github.com/thunlp/Fast-TransX
KRL scikit-kge Python github.com/mnick/scikit-kge
KRL LibKGE PyTorch github.com/uma-pi1/kge
KRL PyKEEN Python github.com/SmartDataAnalytics/PyKEEN
RE OpenNRE PyTorch github.com/thunlp/OpenNRE
Table: Summary of open-source libraries for knowledge graph construction
● AmpliGraph for knowledge representation learning
Toolkits
Task Library Language URL
General Grakn Python github.com/graknlabs/kglib
General AmpliGraph TensorFlow github.com/Accenture/AmpliGraph
General GraphVite Python graphvite.io
Database Akutan Go github.com/eBay/akutan
KRL OpenKE PyTorch github.com/thunlp/OpenKE
KRL Fast-TransX C++ github.com/thunlp/Fast-TransX
KRL scikit-kge Python github.com/mnick/scikit-kge
KRL LibKGE PyTorch github.com/uma-pi1/kge
KRL PyKEEN Python github.com/SmartDataAnalytics/PyKEEN
RE OpenNRE PyTorch github.com/thunlp/OpenNRE
● Akutan for knowledge graph storage and querying
Toolkits
Task Library Language URL
General Grakn Python github.com/graknlabs/kglib
General AmpliGraph TensorFlow github.com/Accenture/AmpliGraph
General GraphVite Python graphvite.io
Database Akutan Go github.com/eBay/akutan
KRL OpenKE PyTorch github.com/thunlp/OpenKE
KRL Fast-TransX C++ github.com/thunlp/Fast-TransX
KRL scikit-kge Python github.com/mnick/scikit-kge
KRL LibKGE PyTorch github.com/uma-pi1/kge
KRL PyKEEN Python github.com/SmartDataAnalytics/PyKEEN
RE OpenNRE PyTorch github.com/thunlp/OpenNRE
● Three examples of useful toolkits released by the research community:
○ scikit-kge and OpenKE for knowledge graph embedding
Toolkits
Task Library Language URL
General Grakn Python github.com/graknlabs/kglib
General AmpliGraph TensorFlow github.com/Accenture/AmpliGraph
General GraphVite Python graphvite.io
Database Akutan Go github.com/eBay/akutan
KRL OpenKE PyTorch github.com/thunlp/OpenKE
KRL Fast-TransX C++ github.com/thunlp/Fast-TransX
KRL scikit-kge Python github.com/mnick/scikit-kge
KRL LibKGE PyTorch github.com/uma-pi1/kge
KRL PyKEEN Python github.com/SmartDataAnalytics/PyKEEN
RE OpenNRE PyTorch github.com/thunlp/OpenNRE
● Three examples of useful toolkits released by the research community (a usage sketch of one toolkit from the table follows):
○ OpenNRE for relation extraction
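As an illustration of how such a toolkit is typically used, the sketch below trains and evaluates a link-prediction model with PyKEEN's pipeline. The dataset key, keyword arguments, and output call are assumptions based on PyKEEN's documented API and may differ between versions.

```python
from pykeen.pipeline import pipeline

# Train TransE on the FB15k-237 benchmark split and evaluate it on link prediction.
result = pipeline(
    dataset="fb15k237",                      # assumed built-in dataset key
    model="TransE",
    training_kwargs=dict(num_epochs=5),      # deliberately tiny for a quick demo run
)
result.save_to_directory("transe_fb15k237")  # stores the trained model and evaluation metrics
```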
● Entity Linking is a long-researched topic in the NLP community
● Neural models have enabled systems to cross the 95% performance barrier for the task
● Knowledge Graph Completion is an active and relatively new research area for neural models
○ It uses machine learning and neural networks to ‘vectorize’ entities and relationships (a minimal sketch follows after this slide)
● Implementations can be slow, but this has recently started to change
Conclusion: Takeaways
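To make the ‘vectorize’ point concrete, here is a toy, self-contained sketch of a TransE-style plausibility score over randomly initialised vectors. Real systems learn these vectors from training triples; the entity and relation names below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
# Illustrative vocabulary only; in practice the embeddings are learned, not sampled.
entities = {e: rng.normal(size=dim) for e in ("Albert_Einstein", "Ulm", "Nobel_Prize_in_Physics")}
relations = {r: rng.normal(size=dim) for r in ("bornIn", "winnerOf")}

def transe_score(h, r, t):
    """TransE plausibility: the smaller ||h + r - t||, the more plausible the triple (h, r, t)."""
    return -float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

print(transe_score("Albert_Einstein", "bornIn", "Ulm"))
```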
Happy to take Questions
Thank you for your attention!

Perspectives on mining knowledge graphs from text

  • 1.
    Presented by: JenniferD’Souza, Postdoc in the ORKG team http://orkg.org | @orkg_org Technische Informationsbibliothek (TIB) Welfengarten 1B // 30167 Hannover Perspectives on Mining Knowledge Graphs from Text
  • 2.
    ● A criticalscientific document digitalization initiative in this digital age ○ Capturing scholarly article contributions in machine interpretable Knowledge Graphs ● The ORKG is hosted at TIB ○ https://www.orkg.org/ ○ @orkg_org ● Led by TIB director Prof. (Dr.) Sören Auer The Open Research Knowledge Graph (ORKG) is ...
  • 3.
    Perspectives on MiningKnowledge Graphs from Text Knowledge Graph Knowledge Representation Learning Representation Space Scoring Function Encoding Models Knowledge Acquisition Entity Discovery Relation Extraction Knowledge Graph Completion *Point-wise *Manifold *Complex *Gaussian *Discrete *Linear/Bilinear *RNN *Factorization *Transformers *Neural Nets *GCN *CNN *Recognition *Typing *Linking *Alignment *Neural Nets *Attention *GCN *GAN *RL *Others *Embedding-based Ranking *Path-based Reasoning *Rule-based Reasoning *Meta Relational Learning *Triple Classification References Ji, Shaoxiong, et al. "A survey on knowledge graphs: Representation, acquisition, and applications." IEEE Transactions on Neural Networks and Learning Systems (2021). *Distance-based *Similarity Matching
  • 4.
    (I) Entity Linkingand (II) KG Completion Jennifer D’Souza Technische Informationsbibliothek (TIB) Welfengarten 1B // 30167 Hannover
  • 5.
    ● Given anentity mention in a text document and a knowledge base (KB) of entities, ○ find the entity in the KB the entity mention refers to or ○ determine that such entity does not exist in the KB Entity Linking
  • 6.
    Entity Linking What isthe birthdate of the famous basketball player Michael Jordan?
  • 7.
    Entity Linking What isthe birthdate of the famous basketball player Michael Jordan?
  • 8.
    Entity Linking What isthe birthdate of the famous basketball player Michael Jordan? Knowledge Bases
  • 9.
    Entity Linking What isthe birthdate of the famous basketball player Michael Jordan? Knowledge Bases
  • 10.
    Entity Linking What isthe birthdate of the famous basketball player Michael Jordan? Knowledge Bases
  • 11.
    Entity Linking ● challengingbecause ○ entity ambiguity: mentions with the same word/phrase can have various entity candidates ■ E.g., Michael Jordan: Basketball player or Berkeley professor? ○ name variations: mentions with different words/phrases can refer to the same entity ■ E.g., New York City or Big Apple
  • 12.
    Entity Linking ● challengingbecause ○ entity ambiguity: mentions with the same word/phrase can have various entity candidates ■ E.g., Michael Jordan: Basketball player or Berkeley professor? ○ name variations: mentions with different words/phrases can refer to the same entity ■ E.g., New York City or Big Apple ● Aside 1: alternatively called Named Entity Disambiguation ○ However, Named Entity Disambiguation (NED) and Entity Linking (EL) can sometimes be treated as separate tasks. ■ NED: determine which named entity a mention refers to. ● E.g., the mention “Trump” can refer to either a person, a corporation or a building; ■ EL: provide a standard IRI for each disambiguated entity. ● IRIs (Internationalized Resource Identifier) used as subjects, predicates, and objects can be taken from well-defined vocabularies or ontologies in the Linked Open Data (LOD) cloud
  • 13.
    Entity Linking ● challengingbecause ○ entity ambiguity: mentions with the same word/phrase can have various entity candidates ■ E.g., Michael Jordan: Basketball player or Berkeley professor? ○ name variations: mentions with different words/phrases can refer to the same entity ■ E.g., New York City or Big Apple ● Aside 1: alternatively called Named Entity Disambiguation ○ However, Named Entity Disambiguation (NED) and Entity Linking (EL) can sometimes be treated as separate tasks. ■ NED: determine which named entity a mention refers to. ● E.g., the mention “Trump” can refer to either a person, a corporation or a building; ■ EL: provide a standard IRI for each disambiguated entity. ● E.g., Trump-the-president can be linked to the IRI that represents him in Wikidata: https://www.wikidata.org/entity/Q22686
  • 14.
    Entity Linking ● challengingbecause ○ entity ambiguity: mentions with the same word/phrase can have various entity candidates ■ E.g., Michael Jordan: Basketball player or Berkeley professor? ○ name variations: mentions with different words/phrases can refer to the same entity ■ E.g., New York City or Big Apple ● Aside 1: alternatively called Named Entity Disambiguation ○ In this talk, NED and EL are treated as the same task, i.e. NED that finds which entity a mention like “Trump” refers to, and the EL providing the LOD IRI for that entity, are considered as one step
  • 15.
    Entity Linking ● challengingbecause ○ entity ambiguity: mentions with the same word/phrase can have various entity candidates ■ E.g., Michael Jordan: Basketball player or Berkeley professor? ○ name variations: mentions with different words/phrases can refer to the same entity ■ E.g., New York City or Big Apple ● Aside 2: commonly known as normalization for the biomedical domain ○ Map a word/phrase in a document to a concept in an ontology after disambiguating potential ambiguous words/phrases
  • 16.
    Entity Linking ● challengingbecause ○ entity ambiguity: mentions with the same word/phrase can have various entity candidates ■ E.g., Michael Jordan: Basketball player or Berkeley professor? ○ name variations: mentions with different words/phrases can refer to the same entity ■ E.g., New York City or Big Apple ● Aside 2: commonly known as normalization for the biomedical domain ○ This talk will focus on the open-domain EL task, i.e. involving data from newswire or the Web. ■ While the approaches for open-domain EL can be imported to biomedical entity normalization, the latter task may be amenable to strong rule-based resolution1,2 as well References 1 D’Souza, J., & Ng, V. (2015, July). Sieve-based entity linking for the biomedical domain. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 297-302) 2. D. Kim et al., "A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining," in IEEE Access, vol. 7, pp. 73729-73740, 2019, doi: 10.1109/ACCESS.2019.2920708.
  • 17.
    Plan for PartI of II of the Talk ● Datasets & Knowledge Bases ● (Neural) Approaches ○ since 2015 ● Evaluations
  • 18.
    Datasets ● Open-domain Evaluationdatasets from various genres ○ News, Tweets, Web pages, Blog, Encyclopedia
  • 19.
    Datasets: Details &Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia
  • 20.
    Datasets: Details &Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia AIDA is the largest human-annotated dataset, where each of the 34,587 mentions were checked for entities in the YAGO knowledge base.
  • 21.
    Datasets: Details &Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia The Knowledge Base Population (KBP) track conducted as part of NIST Text Analysis Conference (TAC) is an international entity linking competition held every year since 2009. Entity linking is regarded as one of the two subtasks in this track. These public entity linking competitions provided some benchmark data sets to evaluate and compare different entity linking systems.
  • 22.
    Datasets: Details &Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia Then there is a dataset from the MSNBC news source.
  • 23.
    Datasets: Details &Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia WNED datasets where WNED stands for Walking Named Entity Disambiguation as a name of the algorithm developed for EL are the largest automatically created datasets.
  • 24.
    Others: NEEL (tweets;8,665 mentions; DBpedia); OKE-2015 (encyclopedia; DBpedia); WES2015 (blog; DBpedia); WikiNews (news; DBpedia); OKE2016 (Web pages; 1,043 mentions; DBpedia) Datasets: Details & Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia
  • 25.
    Others: NEEL (tweets;8,665 mentions; DBpedia); OKE-2015 (encyclopedia; DBpedia); WES2015 (blog; DBpedia); WikiNews (news; DBpedia); OKE2016 (Web pages; 1,043 mentions; DBpedia) Datasets: Details & Statistics Dataset Name Genre Mentions KB AIDA news 34,587 YAGO/Freebase/W ikipedia KBP’2010 news 4,338 Wikipedia MSNBC news 656 Wikipedia AQUAINT news 449 Wikipedia ACE-2004 news 257 Wikipedia WNED-CWEB (CWEB) news 11,154 Wikipedia WNED-WIKI (WW) news 6,821 Wikipedia
  • 26.
    ● A fundamentalcomponent for Entity Linking ● Knowledge bases provide the information about the world’s entities (e.g., the entities of Albert Einstein and Ulm), their semantic categories (e.g., Albert Einstein has a type of Scientist and Ulm has a type of City), and the mutual relationships between entities (e.g., Albert Einstein has a relation named bornIn with Ulm). ● Examples: ○ Wikipedia (6,195,675 English articles)1 ■ a free online multilingual encyclopedia created through decentralized, collective efforts of thousands of volunteers around the world. ■ The basic entry in Wikipedia is an article, which defines and describes an entity or a topic, and each article in Wikipedia is uniquely referenced by an identifier. ■ Wikipedia provides a set of useful features for entity linking, such as entity pages, article categories, redirect pages, disambiguation pages, and hyperlinks in Wikipedia articles. ○ DBpedia (4.58 million things in English version)2 ■ multilingual knowledge base constructed by extracting structured information from Wikipedia such as infobox templates, categorization information, geo-coordinates, and links to external Web pages References 1 https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia 2 https://wiki.dbpedia.org/about/facts-figures Knowledge Bases for Entities
  • 27.
    ● A fundamentalcomponent for Entity Linking ● Knowledge bases provide the information about the world’s entities (e.g., the entities of Albert Einstein and Ulm), their semantic categories (e.g., Albert Einstein has a type of Scientist and Ulm has a type of City), and the mutual relationships between entities (e.g., Albert Einstein has a relation named bornIn with Ulm). ● Examples: ○ YAGO (50 million entities and 2 billion facts)1 ■ YAGO combines Wikidata and the schema.org ontology as the top level ontology for information organization, thus getting the best from both worlds: a huge repository of facts, together with an ontology that is simple and used as a standard by a large community. References 1 https://yago-knowledge.org/getting-started Knowledge Bases for Entities
  • 28.
    We have: 1. releaseda novel multidisciplinary corpus of scholarly abstracts annotated for scientific entities under a generic conceptual formalism that bridges 10 different STEM scientific disciplines a. The STEM domains we consider are Agriculture, Astronomy, Biology, Chemistry, Computer Science, Earth Science, Engineering, Materials Science, and Mathematics. b. The generic conceptual formalism involves four entity types i. Process, Method, Material, and Data c. The terms underlying the Process, Method, Material, and Data entities are linked in Wikipedia, thereby, our entities are disambiguated for their scientific sense and grounded in the real world. d. The STEM-ECR v1.0 corpus is publicly available: https://doi.org/10.25835/0017546 (ISLRN 749-555-840-571-2) References ● Brack, Arthur, Jennifer D’Souza, Anett Hoppe, Sören Auer, and Ralph Ewerth. "Domain-independent extraction of scientific concepts from research articles." In European Conference on Information Retrieval, pp. 251-266. Springer, Cham, 2020. ● D’Souza, Jennifer, Anett Hoppe, Arthur Brack, Mohmad Yaser Jaradeh, Sören Auer, and Ralph Ewerth. "The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources." In Proceedings of The 12th Language Resources and Evaluation Conference, pp. 2192-2203. 2020. Datasets & Knowledge Bases New Resource Highlight: Scholarly Knowledge Linked Entities across STEM Disciplines
  • 29.
    Plan for PartI of II of the Talk ● Datasets & Knowledge Bases ● (Neural) Approaches ○ since 2015 ● Evaluations
  • 30.
    (Neural) Approaches toEntity Linking ● EL has three main subtasks: ○ candidate-entity generation; ■ aims to retrieve all possible entities in the KB that may refer to an entity mention ○ candidate-entity ranking or disambiguation; ■ aims to rank the candidate entities and return the most likely one for each targeted mention ○ NIL clustering ■ handles those mentions that cannot be matched with an entity in the KB
  • 31.
    Approaches to EntityLinking: From the 3 subtasks perspective Reference: Figure 7 in T. Al-Moslmi, M. Gallofré Ocaña, A. L. Opdahl and C. Veres, "Named Entity Extraction for Knowledge Graphs: A Literature Overview," in IEEE Access, vol. 8, pp. 32862-32881, 2020, doi: 10.1109/ACCESS.2020.2973928.
  • 32.
    Approaches to EntityLinking: From the systems perspective Reference: Part of Figure 8 in T. Al-Moslmi, M. Gallofré Ocaña, A. L. Opdahl and C. Veres, "Named Entity Extraction for Knowledge Graphs: A Literature Overview," in IEEE Access, vol. 8, pp. 32862-32881, 2020, doi: 10.1109/ACCESS.2020.2973928.
  • 33.
    Three Non-Neural Approaches ●AIDA1 ○ Mention Detection using Stanford NER Tagger ○ Linking as a graph-based technique with weighted edges computed as degree of links between pages ● DBpedia Spotlight2 ○ Mention Detection as a lightweight heuristics-based model with syntactic parsers to generate mention candidates ○ Linking as a generative probabilistic model using maximum likelihood estimates ● Babelfy3 ○ Mention Detection as named entities (e.g., Major League Soccer) and overlapping nominals (e.g., major league, soccer) ○ A unified graph-based approach relying on encyclopedic and lexicographic knowledge References 1. AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables 2. Daiber, J., Jakob, M., Hokamp, C., & Mendes, P. N. (2013, September). Improving efficiency and accuracy in multilingual entity extraction. In Proceedings of the 9th International Conference on Semantic Systems (pp. 121-124). 3. A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, pp. 231-244, 2014 Approaches to Entity Linking: From the systems perspective prior to 2015
  • 34.
    (Neural) Approaches toEntity Linking: General Architecture since 2015 Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
  • 35.
    (Neural) Approaches toEntity Linking: General Architecture since 2015 Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. mentions in a plain text are distinguished corresponding entity is predicted for the given mention
  • 36.
    (Neural) Approaches toEntity Linking: General Architecture since 2015 Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. possible entities are produced for the mention context/mention - candidate similarity score is computed through the representations
  • 37.
    Three prominent methods: ●a surface form matching ○ a candidate list is composed of entities, which simply match surface forms of mentions in the text; does not work well if referent entity does not contain mention string ● a dictionary lookup ○ a dictionary of additional aliases is constructed using KB metadata like disambiguation/redirect pages of Wikipedia or lexical synonymy relations ● and a prior probability computation ○ the candidates are generated based on precalculated prior probabilities of correspondence between certain mentions and entities; based on mention-entity hyperlink count statistics [1,2,3,4,5,etc.] ○ References 1. Stefan Zwicklbauer, Christin Seifert, and Michael Granitzer. 2016. Robust and collective entity disambiguation through semantic embeddings. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, pages 425–434, New York, NY, USA. ACM. 2. Chen-Tse Tsai and Dan Roth. 2016. Cross-lingual Wikification using multilingual embeddings. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 589–598, San Diego, California, USA. ACL. 3. Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2619–2629, Copenhagen, Denmark. ACL. 4. Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 519–529, Brussels, Belgium. Association for Computational Linguistics. 5. Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In The 32 AAAI, New Orleans, Louisiana, USA. AAAI Press. Candidate Generation
  • 38.
    The goal ofthis stage is given a list of entity candidates from a KB and a context with a mention to rank these entities assigning a score to each of them. Entity Ranking
  • 39.
    General Architecture ofa Neural Entity Ranking Component Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575.
  • 40.
    General Architecture ofa Neural Entity Ranking Component Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. Three parts 1.Encoding the mention
  • 41.
    General Architecture ofa Neural Entity Ranking Component Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. ● To correctly disambiguate an entity mention, it is crucial to thoroughly capture the information from its context. ● A contextualized vector representation of a mention is generated by an encoder network.
  • 42.
    Two approaches prevail: ●recurrent networks with LSTM cells ● self-attention Mention Encoding Subcomponent
  • 43.
    Two approaches prevail: ●recurrent networks with LSTM cells ○ concatenating outputs of two LSTM networks that independently encode left and right contexts of a mention (including the mention itself) [1]; ○ encode left and right local contexts via LSTMs but also pool the results across all mentions in a coreference chain and postprocess left and right representations with a tensor network [2]; ○ modification of LSTM–GRU in conjunction with an attention mechanism to encode left and right context of a mention [3]; ○ run a bidirectional LSTM network on words complemented with embeddings of word positions relative to a target mention [4] References 1 Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In 2017 EMNLP, pages 2681–2690, Copenhagen, Denmark. ACL. 2 Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In The 32 AAAI, New Orleans, Louisiana, USA. AAAI Press. 3. Yotam Eshel, Noam Cohen, Kira Radinsky, Shaul Markovitch, Ikuya Yamada, and Omer Levy. 2017. Named entity disambiguation for noisy text. In CoNLL 2017, pages 58–68, Vancouver, Canada. ACL. 4. Phong Le and Ivan Titov. 2019b. Distant learning for entity linking with automatic noise detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4081–4090, Florence, Italy, July. ACL. Mention Encoding Subcomponent
  • 44.
    Two approaches prevail: ●recurrent networks with LSTM cells ● self-attention: encoding methods based on self-attention rely on the outputs from pre-trained BERT layers for context and mention encoding. Mention Encoding Subcomponent
  • 45.
    Two approaches prevail: ●recurrent networks with LSTM cells ● self-attention: encoding methods based on self-attention rely on the outputs from pre-trained BERT layers for context and mention encoding. ○ a mention representation is modeled by pooling over word pieces in a mention span. The authors also put an additional self-attention block over all mention representations that encode interactions between several entities in a sentence [1]. ○ reduce a sequence by keeping the representation of the special pooling symbol ‘[CLS]’ inserted at the beginning of a sequence [2]. ○ mark positions of a mention span by summing embeddings of words within the span with a special vector [3] and use the same reduction strategy as [2]. ○ concatenate text with all mentions in it and jointly encode this sequence via a self-attention model based on pre-trained BERT [4]. References 1 Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word representations. In Proceedings of the 2019 EMNLP-IJCNLP, pages 43–54, Hong Kong, China. ACL. 2 Ledell Yu Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. ArXiv, abs/1911.03814. 3 Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, and Honglak Lee. 2019. Zero-shot entity linking by reading entity descriptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3449–3460, Florence, Italy. ACL. 4 Ikuya Yamada, Koki Washio, Hiroyuki Shindo, and Yuji Matsumoto. 2020. Global entity disambiguation with pretrained contextualized embeddings of words and entities. arXiv preprint arXiv:1909.00426v2. Mention Encoding Subcomponent
  • 46.
    General Architecture ofa Neural Entity Ranking Component Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. 2.Encoding the candidate entities Three parts
  • 47.
    General Architecture ofa Neural Entity Ranking Component Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. ● Linking decision will be based on how accurately candidate entities match a corresponding mention or context based on the entity structured or textual information. ● Low-dimensional semantic representations of entities account for this in such a way that spatial proximity of entities in a vector space correlates with their semantic similarity. Three parts
  • 48.
    Aim to obtainvector representations for entities: ● capture different kinds of entity information, including entity type, description page, linked mention, and contextual information, and therefore, generate a large encoder, which involves CNN for the entity description and alignment function for the others [1]. ● encode entities based on their title, description page, and category information. All previously mentioned models rely on the annotated data, and a few studies are challenged with less resource dependence [2]. ● derive entity embeddings using pre-trained word2vec word vectors through description page words, surface forms words, and entity category words [3,4]. ● depend on the BERT architecture to create representations through the description pages [5,6]. References 1 Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 EMNLP, pages 2681–2690, Copenhagen, Denmark. ACL. 2 Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano. 2019. Learning dense representations for entity retrieval. In Proceedings of the 23rd CoNLL, pages 528–537, Hong Kong, China. ACL. 3 Yaming Sun, Lei Lin, Duyu Tang, Nan Yang, Zhenzhou Ji, and XiaolongWang. 2015. Modeling mention, context and entity with neural networks for entity disambiguation. In Proceedings of the 24th, IJCAI’15, pages 1333–1339. AAAI Press. 4 Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In 32 AAAI, New Orleans, Louisiana, USA. AAAI Press. 5 Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, and Honglak Lee. 2019. Zero-shot entity linking by reading entity descriptions. In Proceedings of the 57th ACL, pages 3449–3460, Florence, Italy. ACL. 6 Ledell Yu Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. ArXiv, abs/1911.03814. Entity Encoding Subcomponent
  • 49.
    General Architecture ofa Neural Entity Ranking Component Reference: Figure 3 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. 3. Comparing Mention and Candidate Entity Representations Three parts
  • 50.
    ● Most ofthe state-of-the-art studies compare mention and entity representations using a dot product [1,2,3,4] or cosine similarity [5,6,7]. ● The calculated similarity score is often combined with mention-entity priors obtained during the candidate generation phase [1,3,6] or other features including various similarities, string matching indicator, and entity types [6,8,9,10]. ● Commonly an additional one or two-layer feedforward network [1,6,9] is used. The final disambiguation decision is inferred via a probability distribution, usually by a softmax function over the candidates. The local similarity score or a probability distribution can be further utilized for global scoring. References 1 Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2619–2629, Copenhagen, Denmark. ACL. 2 Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 EMNLP, pages 2681–2690, Copenhagen, Denmark. ACL. 3 Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In 22nd CoNLL, pages 519–529, Brussels, Belgium. ACL. 4 Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word representations. In Proceedings of the 2019 EMNLP-IJCNLP, pages 43–54, Hong Kong, China. ACL. 5 Yaming Sun, Lei Lin, Duyu Tang, Nan Yang, Zhenzhou Ji, and XiaolongWang. 2015. Modeling mention, context and entity with neural networks for entity disambiguation. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pages 1333–1339. AAAI Press. 6 Matthew Francis-Landau, Greg Durrett, and Dan Klein. 2016. Capturing semantic similarity for entity linking with convolutional neural networks. In Proceedings of the 2016 NAACL: Human Language Technologies, pages 1256–1261, San Diego, California, USA. 7 Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano. 2019. Learning dense representations for entity retrieval. In Proceedings of the 23rd CoNLL, pages 528–537, Hong Kong, China. ACL. 8 Avirup Sil, Gourab Kundu, Radu Florian, and Wael Hamza. 2018. Neural cross-lingual entity linking. In 32 AAAI, New Orleans, Louisiana, USA. AAAI Press. 9 Hamed Shahbazi, Xiaoli Z Fern, Reza Ghaeini, Rasha Obeidat, and Prasad Tadepalli. 2019. Entity-aware elmo:Learning contextual entity representation for entity disambiguation. arXiv preprint arXiv:1908.05762. Comparing Mention and Candidate Entity Representations
  • 51.
    ● Optionally addressedin some systems ● Aim to equip EL systems to recognize cases when referent entities of some mentions can be absent in the KBs. This is known as NIL prediction. Unlinkable Mention Prediction
  • 52.
    ● Optionally addressedin some systems ● Aim to equip EL systems to recognize cases when referent entities of some mentions can be absent in the KBs. This is known as NIL prediction. ● Four common ways to perform NIL prediction. ○ a candidate generator does not yield any corresponding entities for a mention by setting a threshold for linking probability [1,2] ○ introduce an additional special ‘NIL’ entity in the ranking phase, so some models predict it as the best match for the mention [3] ○ train an additional binary classifier that accepts mention-entity pairs after the ranking phase, as well as several additional features (best linking score, whether mentions are also detected by a dedicated NER system, etc.), and makes the final decision about whether a mention is linkable or not [4,5]. References 1 Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word representations. In Proceedings of the 2019 EMNLP-IJCNLP, pages 43–54, Hong Kong, China. Association for Computational Linguistics. 2 Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, and Fernando Pereira. 2015. Plato: A selective context model for entity resolution. TACL, 3:503–515. 3 Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In 22nd CoNLL, pages 519–529, Brussels, Belgium. ACL. 4 Jose G. Moreno, Romaric Besanc¸on, Romain Beaumont, Eva D’hondt, Anne-Laure Ligozat, Sophie Rosset, Xavier Tannier, and Brigitte Grau. 2017. Combining word and entity embeddings for entity linking. In Extended Semantic Web Conference (1), volume 10249 of Lecture Notes in Computer Science, pages 337–352. 5 Pedro Henrique Martins, Zita Marinho, and Andr´e F. T. Martins. 2019. Joint learning of named entity recognition and entity linking. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 190–196, Florence, Italy. Association for Computational Linguistics. Unlinkable Mention Prediction
  • 53.
    (Neural) Approaches toEntity Linking: General Architecture since 2015 Reference: Figure 2 in Sevgili, O., Shelmanov, A., Arkhipov, M., Panchenko, A., & Biemann, C. (2020). Neural Entity Linking: A Survey of Models based on Deep Learning. arXiv preprint arXiv:2006.00575. possible entities are produced for the mention context/mention - candidate similarity score is computed through the representations
  • 54.
    Modifications of theGeneral Architecture: ● Joint Entity Recognition and Disambiguation Architectures ○ Observe that interaction between recognition and disambiguation is beneficial to improve overall model ■ E.g., multi-task learning framework that integrates recognition and linking [1] ● Global Context Architectures ○ global EL seen as sequential decision task where disambiguation of new entities is based on the already disambiguated ones ■ E.g., apply LSTM to be able to maintain long term memory for previous decisions [2] ● Cross-lingual Architectures ○ leverage supervision signals from multiple languages for training a model in a target language ■ E.g., the inter-lingual links in Wikipedia utilized for alignment of entities in multiple languages. With this alignment, the annotated data from high-resource languages like English can help to improve the quality of text processing for the low-resource ones [3] References 1 Pedro Henrique Martins, Zita Marinho, and Andr´e F. T. Martins. 2019. Joint learning of named entity recognition and entity linking. In 57th ACL: Student Research Workshop, pages 190–196, Florence, Italy. ACL. 2 Zheng Fang, Yanan Cao, Qian Li, Dongjie Zhang, Zhenyu Zhang, and Yanbing Liu. 2019. Joint entity linking with deep reinforcement learning. In The World Wide Web Conference, WWW ’19, pages 438–447, New York, NY, USA. ACM. 3 Heng Ji, Joel Nothman, Ben Hachey, and Radu Florian. 2015. Overview of TAC-KBP2015 tri-lingual entity discovery and linking. In Proceedings of the 2015 Text Analysis Conference, TAC 2015, pages 16–17, Gaithersburg, Maryland, USA. NIST. (Neural) Approaches to Entity Linking: General Architecture Modifications
  • 55.
    Plan for PartI of II of the Talk ● Datasets & Knowledge Bases ● (Neural) Approaches ○ since 2015 ● Evaluations
  • 56.
    Results are describedin terms of accuracy and micro F1 scores Evaluations: Metrics
  • 57.
    References Ikuya Yamada, KokiWashio, Hiroyuki Shindo, and Yuji Matsumoto. 2020. Global entity disambiguation with pretrained contextualized embeddings of words and entities. arXiv preprint arXiv:1909.00426v2. Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In 22nd CoNNL, pages 519–529, Brussels, Belgium. ACL. Ledell Yu Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. ArXiv, abs/1911.03814. Evaluations: Entity Linking Results Dataset Accuracy Micro F1 System AIDA 0.950 - Yamada et al. (2020) - 0.824 Kolitsas et al. (2018) KBP’10 0.940 - Wu et al. (2020) MSNBC - 0.963 Yamada et al. (2020) AQUAINT - 0.935 Yamada et al. (2020) ACE-2004 - 0.919 Yamada et al. (2020) CWEB - 0.789 Yamada et al. (2020) WW - 0.891 Yamada et al. (2020)
  • 58.
    Open Source Toolsand Resources for Entity Linking Year System name NER NED URL 2010 Tagme Y Y https://tagme.d4science.org/tagme/ 2011 DBpedia Spotlight Y Y https://www.dbpedia-spotlight.org/ 2011 AIDA Y Y https://gate.d5.mpi-inf.mpg.de/webaida/ 2013 TwitIE Y Y https://gate.ac.uk/wiki/twitie.html 2014 Babelfy Y Y http://babelfy.org/ 2014 Stanford CoreNLP Y N https://stanfordnlp.github.io/CoreNLP/ 2015 SpaCy Y N https://spacy.io/ since 2015 neural models - - https://paperswithcode.com/task/entity-linking
  • 59.
    (I) Entity Linkingand (II) KG Completion Jennifer D’Souza Technische Informationsbibliothek (TIB) Welfengarten 1B // 30167 Hannover
  • 60.
    ● Involves theconstruction of Knowledge Graphs (KG) from unstructured text and other structured or semi-structured sources. ○ Core tasks are relation and entity extraction Knowledge Acquisition A KG is typically a multi-relational graph containing entities as nodes and relations as edges. Each edge is represented as a triplet (head entity, relation, tail entity) ((h; r; t) for short), indicating the relation between two entities, e.g., (Albert Einstein, WinnerOf, Nobel Prize in Physics).
  • 61.
    ● Involves theconstruction of Knowledge Graphs (KG) from unstructured text and other structured or semi-structured sources. ○ Core tasks are relation and entity extraction ● Powered by KGs, many real-world applications such as recommendation systems and question answering has seen significant progress with the their new capacity for commonsense understanding and reasoning. ○ Search powered by Google’s Knowledge Graph Knowledge Acquisition
  • 62.
    ● Involves theconstruction of Knowledge Graphs (KG) from unstructured text and other structured or semi-structured sources. ○ Core tasks are relation and entity extraction ● Powered by KGs, many real-world applications such as recommendation systems and question answering has seen significant progress with the their new capacity for commonsense understanding and reasoning. ○ Given knowledge: (Male,gender,Y) and (X,hasChild,Y) Knowledge Acquisition
  • 63.
    ● Involves theconstruction of Knowledge Graphs (KG) from unstructured text and other structured or semi-structured sources. ○ Core tasks are relation and entity extraction ● Powered by KGs, many real-world applications such as recommendation systems and question answering has seen significant progress with the their new capacity for commonsense understanding and reasoning. ○ Given knowledge: (Y,gender,Male) and (X,hasChild,Y) Then, inferences such as (Y,sonOf,X) are possible. Knowledge Acquisition
  • 64.
    ● Also involvescompleting an existing knowledge graph, and other entity-oriented acquisition tasks such as entity resolution and alignment. ● Thus, the main tasks of knowledge acquisition include relation extraction to convert unstructured text to structured knowledge, knowledge graph completion (KGC), and other entity-oriented acquisition tasks such as entity recognition and entity alignment. ○ KGC and relation extraction can be treated jointly. Han et al. [1] proposed a joint learning framework with mutual attention for data fusion between knowledge graphs and text, which solves both KGC and relation extraction from text. References 1 X. Han, Z. Liu, and M. Sun, “Neural knowledge acquisition via mutual attention between knowledge graph and text,” in AAAI, 2018, pp. 4832–4839. Knowledge Acquisition
  • 65.
    Knowledge Graph Completion ●Knowledge Graphs constructed from unstructured text or acquired from other sources are by nature incomplete. ○ Why? Created at scale from millions of documents or at Web scale they are easily amenable to noise. A example of an incomplete Knowledge Graph with a missing relation Img src: https://towardsdatascience.com/embedding-models-for-knowledge-graph-completion-a66d4c01d588
  • 66.
    Knowledge Graph Completion ●Knowledge Graphs constructed from unstructured text or acquired from other sources are by nature incomplete. ○ Why? Created at scale from millions of documents or at Web scale they are easily amenable to noise. is a university in A example of an incomplete Knowledge Graph with a missing relation Img src: https://towardsdatascience.com/embedding-models-for-knowledge-graph-completion-a66d4c01d588
  • 67.
    The Knowledge GraphCompletion Task Given a KG having edges specified with a triplet of elements (h, r, t) ∈ E × R × E where the head (h) and the tail (t) entities are elements of E and r is a type of relation of R. Note relations can be directed. Formally, we define KGC as the task that tries to predict any missing element of the triplet (h, r, t). In particular, we talk about: ● link (entity) prediction when an element between h or t is missing ((?, r, t) or (h, r, ?)); ● relation prediction when r is missing (h, ?, t)
  • 68.
    The Knowledge GraphCompletion Task Formally, we define KGC as the task that tries to predict any missing element of the triplet (h, r, t). In particular, we talk about: ● link (entity) prediction when an element between h or t is missing ((?, r, t) or (h, r, ?)); ● relation prediction when r is missing (h, ?, t)
  • 69.
    The Knowledge GraphCompletion Task Formally, we define KGC as the task that tries to predict any missing element of the triplet (h, r, t). In particular, we talk about: ● link (entity) prediction when an element between h or t is missing ((?, r, t) or (h, r, ?)); ● relation prediction when r is missing (h, ?, t); ● Aside: triplet classification when an algorithm recognizes whether a given triplet (h, r, t) is correct or not.
  • 70.
    Knowledge Graph Completion ●challenging because: ○ it is not trivial to create a KG; ○ every entity could have a variable number of attributes (non-unique specification); ○ R could contain different types of relation (multi-layer network, hierarchical network); ○ a KG changes over time (evolution over time).
  • 71.
    Plan for PartII of II of the Talk ● Approaches ● Datasets and Toolkits
  • 72.
    1. Embedding-based (ranking)methods ○ involves learning low-dimensional embeddings, i.e. adopting Knowledge Graph Embedding (KGE) method used originally for triple prediction 2. Relational path reasoning ○ Embedding-based methods however failed to capture multi-step relationships. ○ Relational path reasoning methods explore multi-step relation paths ○ The two approaches below also have the same information capture paradigm but incorporating logical rules 3. Logical rule reasoning 4. Meta relational learning Approaches to Knowledge Graph Completion
  • 73.
    1. Embedding-based (ranking)methods ○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing: ■ learn embedding vectors based on existing triples: ● during test, the missing h or t entity is predicted from the existing set E of entities in the KG; ● during training, triple instances are created by replacing h or t with each entity in E, scores are calculated of all candidate entities, and the top k entities are ranked. ■ all Knowledge Graph Embedding methods that represent inputs and candidates in a unified embedding space are applicable. E.g., TransE [1], TransH [2], TransR [3], HolE [4]. References 1 A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in NIPS, 2013, pp. 2787–2795. 2 Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge graph embedding by translating on hyperplanes,” in AAAI, 2014, pp. 1112–1119. 3 Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” in AAAI, 2015, pp. 2181–2187. 4 M. Nickel, L. Rosasco, and T. Poggio, “Holographic embeddings of knowledge graphs,” in AAAI, 2016, pp. 1955–1961. Approaches to Knowledge Graph Completion
  • 74.
    1. Embedding-based (ranking)methods ○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing: ■ TransE model [Bordes et al., 2013] References 1 A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in NIPS, 2013, pp. 2787–2795. Approaches to Knowledge Graph Completion
  • 75.
    1. Embedding-based (ranking)methods ○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing: ■ learn embedding vectors based on existing triples: ● during test, the missing h or t entity is predicted from the existing set E of entities in the KG; ● during training, triple instances are created by replacing h or t with each entity in E, scores are calculated of all candidate entities, and the top k entities are ranked. ■ Unlike representing inputs and candidates in a unified embedding space, ProjE [1] proposes a combined embedding by space projection of the known parts of input triples, i.e., (h; r; ?) or (?; r; t), and the candidate entities with the candidate-entity matrix Wc belongs to Rsxd, where s is the number of candidate entities. Their embedding projection function includes a neural combination layer and a output projection layer. References 1 Shi and T. Weninger, “ProjE: Embedding projection for knowledge graph completion,” in AAAI, 2017, pp. 1236–1242. Approaches to Knowledge Graph Completion
  • 76.
    1. Embedding-based (ranking)methods ○ For the link prediction KGC task, i.e. for the KGC task with triples (h, r, t) with h or t missing: ■ learn embedding vectors based on existing triples: ● during test, the missing h or t entity is predicted from the existing set E of entities in the KG; ● during training, triple instances are created by replacing h or t with each entity in E, scores are calculated of all candidate entities, and the top k entities are ranked. ■ ConMask [1] proposes relationship-dependent content masking over the entity description to select relevant snippets of given relations, and CNN-based target fusion to complete the knowledge graph. It can only make a prediction when query relations and entities are explicitly expressed in the text description. References 1 B. Shi and T. Weninger, “Open-world knowledge graph completion,” in AAAI, 2018, pp. 1957–1964. Approaches to Knowledge Graph Completion
  • 77.
    2. Relation pathreasoning ○ A limitation of the embedding based method is that they do not model complex relation paths. E.g. one-to-many, or many-to-many relations ■ Relation path reasoning leverages path information over the graph structure. Approaches to Knowledge Graph Completion
  • 78.
    2. Relation pathreasoning ○ A limitation of the embedding based method is that they do not model complex relation paths. ■ Relation path reasoning leverages path information over the graph structure. ○ Random walk inference has been investigated. ■ E.g., the Path-Ranking Algorithm (PRA) [1] chooses a relational path under a combination of path constraints and conducts maximum-likelihood classification. ○ Neural multi-hop relational path modeling is also studied. ■ Neelakantan et al. [2] models complex relation paths by applying compositionality recursively over the relations in the path as depicted in the figure below. References 1 N. Lao and W. W. Cohen, “Relational retrieval using a combination of path-constrained random walks,” Machine learning, vol. 81, no. 1, pp. 53–67, 2010. 2 A. Neelakantan, B. Roth, and A. McCallum, “Compositional vector space models for knowledge base completion,” in ACL-IJCNLP, vol. 1, 2015, pp. 156–166. Approaches to Knowledge Graph Completion
  • 79.
    2. Relation pathreasoning ○ Chains-of-Reasoning [1], a neural attention mechanism to enable multiple reasons, represents logical composition across all relations, entities, and text. ○ DIVA [2] proposes a unified variational inference framework that takes multi-hop reasoning as two sub-steps of path-finding (a prior distribution for underlying path inference) and path-reasoning (a likelihood for link classification). References 1 R. Das, A. Neelakantan, D. Belanger, and A. McCallum, “Chains of reasoning over entities, relations, and text using recurrent neural networks,” in EACL, vol. 1, 2017, pp. 132–141. 2 W. Chen, W. Xiong, X. Yan, and W. Y. Wang, “Variational knowledge graph reasoning,” in NAACL, 2018, pp. 1823–1832. Approaches to Knowledge Graph Completion
  • 80.
    2. Reinforcement-learning basedpath finding ○ Deep reinforcement learning (RL) is introduced for multi-hop reasoning by formulating path-finding between entity pairs as sequential decision making, specifically a Markov decision process (MDP). The policy-based RL agent learns to find a step of relation to extending the reasoning paths via the interaction between the knowledge graph environment, where the policy gradient is utilized for training RL agents. ■ KGC based on RL concepts of State, Action, Reward, and Policy Network ○ DeepPath [1] firstly applies RL into relational path learning and develops a novel reward function to improve accuracy, path diversity, and path efficiency. It encodes states in the continuous space via a translational embedding method and takes the relation space as its action space. ○ Similarly, MINERVA [2] takes path walking to the correct answer entity as a sequential optimization problem by maximizing the expected reward. It excludes the target answer entity and provides more capable inference. References 1 W. Xiong, T. Hoang, and W. Y. Wang, “DeepPath: A reinforcement learning method for knowledge graph reasoning,” in EMNLP, 2017, pp. 564–573. 2 R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum, “Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning,” in ICLR, 2018, pp. 1–18. Approaches to Knowledge Graph Completion
  • 81.
    2. Reinforcement-learning basedpath finding ○ Instead of using a binary reward function, MultiHop [1] proposes a soft reward mechanism. Action dropout is also adopted to mask some outgoing edges during training to enable more effective path exploration. ○ M-Walk [2] applies an RNN controller to capture the historical trajectory and uses the Monte Carlo Tree Search (MCTS) for effective path generation. ○ Leveraging text corpus with the sentence bag of current entity denoted as bet , CPL [3] proposes collaborative policy learning for pathfinding and fact extraction from text. ○ For the policy networks, DeepPath uses fully-connected network, the extractor of CPL employs CNN, while the rest uses recurrent networks. References 1 X. V. Lin, R. Socher, and C. Xiong, “Multi-hop knowledge graph reasoning with reward shaping,” in EMNLP, 2018, pp. 3243–3253. 2 Y. Shen, J. Chen, P.-S. Huang, Y. Guo, and J. Gao, “M-Walk: Learning to walk over graphs using monte carlo tree search,” in NeurIPS, 2018, pp. 6786–6797. 3 C. Fu, T. Chen, M. Qu, W. Jin, and X. Ren, “Collaborative policy learning for open knowledge graph reasoning,” in EMNLP, 2019, pp. 2672–2681. Approaches to Knowledge Graph Completion
  • 82.
    3. Rule-based Reasoning ○Another direction for Knowledge Graph Completion ■ making use of the symbolic nature of knowledge is logical rule learning ○ E.g., the inference rule: (Y; sonOf; X) <-- (X; hasChild; Y) ^ (Y; gender; Male), where the relation ‘sonOf’ did not exist earlier. ■ Logical rules can been extracted by rule mining tools like AMIE [1] ○ RLvLR [2] proposes a scalable rule mining approach with efficient rule searching and pruning, and uses the extracted rules for relation prediction. References 1 L. A. Gal´arraga, C. Teflioudi, K. Hose, and F. Suchanek, “AMIE: association rule mining under incomplete evidence in ontological knowledge bases,” in WWW, 2013, pp. 413–422. 2 P. G. Omran, K. Wang, and Z. Wang, “An embedding-based approach to rule learning in knowledge graphs,” IEEE TKDE, pp. 1–12, 2019. Approaches to Knowledge Graph Completion
3. Rule-based Reasoning
○ A different line of research on this topic focuses on injecting logical rules into embeddings to improve reasoning, for example via joint learning that incorporates first-order logic rules.
■ E.g., KALE [1] proposes a unified joint model in which t-norm fuzzy logical connectives are defined for embedding compatible triples and logical rules.
■ Specifically, compositions for logical conjunction, disjunction, and negation are defined to compute the truth value of a complex formula (see the sketch below).
References
1 S. Guo, Q. Wang, L. Wang, B. Wang, and L. Guo, "Jointly embedding knowledge graphs and logical rules," in EMNLP, 2016, pp. 192–202.
Approaches to Knowledge Graph Completion
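A minimal sketch of product t-norm fuzzy connectives in the spirit of KALE: each triple is assigned a soft truth value in [0, 1] (KALE derives it from a translational embedding score; the values below are placeholder assumptions), and the truth of a rule is composed from these connectives.

```python
def t_and(a, b):    # fuzzy conjunction (product t-norm)
    return a * b

def t_or(a, b):     # fuzzy disjunction
    return a + b - a * b

def t_not(a):       # fuzzy negation
    return 1.0 - a

# Truth of the rule body (X, hasChild, Y) ^ (Y, gender, Male), and of the implication
# body -> head, modeled as (not body) or head.
truth_hasChild, truth_gender, truth_sonOf = 0.9, 0.8, 0.4
body = t_and(truth_hasChild, truth_gender)
rule_truth = t_or(t_not(body), truth_sonOf)
print(f"body={body:.2f}, rule={rule_truth:.2f}")
# A rule truth below 1 penalizes the embedding during joint training,
# pushing the score of the head triple (Y, sonOf, X) upward.
```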
4. Meta Relational Learning
○ In the real world, knowledge is dynamic: previously unseen relations and triples are continually acquired.
○ This setting is called meta relational learning or few-shot relational learning.
■ It requires models to predict new relational facts from only very few samples.
○ GMatching [1] develops a metric-based few-shot learning method with entity embeddings and local graph structures (a simplified sketch follows below).
■ It encodes one-hop neighbors to capture the structural information with R-GCN and then takes the structural entity embedding for multi-step matching guided by long short-term memory (LSTM) networks to calculate the similarity scores.
References
1 W. Xiong, M. Yu, S. Chang, X. Guo, and W. Y. Wang, "One-shot relational learning for knowledge graphs," in EMNLP, 2018, pp. 1980–1990.
Approaches to Knowledge Graph Completion
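A highly simplified sketch of metric-based few-shot link prediction in the spirit of GMatching: an entity pair is represented by concatenating entity embeddings (GMatching additionally encodes one-hop neighbours and uses LSTM-guided multi-step matching), and query pairs are ranked by cosine similarity to the single support pair of the new relation. All embeddings below are random placeholders, i.e. assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
entities = ["e1", "e2", "e3", "e4"]
emb = {e: rng.normal(size=dim) for e in entities}   # placeholder entity embeddings

def pair_repr(h, t):
    """Represent a candidate (head, tail) pair as one vector."""
    return np.concatenate([emb[h], emb[t]])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

support = ("e1", "e2")                       # the single known fact for a new relation
queries = [("e3", "e4"), ("e1", "e3"), ("e2", "e4")]

scores = {q: cosine(pair_repr(*support), pair_repr(*q)) for q in queries}
for q, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(q, round(s, 3))                    # highest score = most plausible new fact
```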
Plan for Part II of II of the Talk ● Approaches ● Datasets and Toolkits
Datasets
Datasets for Tasks on Knowledge Graphs
Dataset    | Original Data | # Rel. | # Ent.    | # Train    | # Valid. | # Test
WN18       | WordNet       | 18     | 40,943    | 141,442    | 5,000    | 5,000
FB15K      | Freebase      | 1,345  | 14,951    | 483,142    | 50,000   | 59,071
WN11       | WordNet       | 11     | 38,696    | 112,581    | 2,609    | 10,544
FB13       | Freebase      | 13     | 75,043    | 316,232    | 5,908    | 23,733
WN18RR     | WordNet       | 11     | 40,943    | 86,835     | 3,034    | 3,134
FB15k-237  | Freebase      | 237    | 14,541    | 272,115    | 17,535   | 20,466
FB5M       | Freebase      | 1,192  | 5,385,322 | 19,193,556 | 50,000   | 59,071
FB40K      | Freebase      | 1,336  | 39,528    | 370,648    | 67,946   | 96,678
A popular way of generating task-specific datasets is to sample subsets from large general datasets (a minimal loading sketch follows below).
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications. arXiv preprint arXiv:2002.00388.
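These benchmarks are usually distributed as plain-text splits with one triple per line. The sketch below assumes a tab-separated file in (head, relation, tail) order; some releases use (head, tail, relation) instead, so check the specific download. The file path is hypothetical.

```python
def load_triples(path):
    """Load one split (train/valid/test) of a KG benchmark as a list of triples."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            head, relation, tail = line.rstrip("\n").split("\t")
            triples.append((head, relation, tail))
    return triples

# Example (hypothetical local path):
# train = load_triples("WN18RR/train.txt")
# print(len(train), train[0])
```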
Datasets (table as on the previous slide)
● E.g., the WN-prefixed datasets are subsets of the WordNet knowledge base.
Datasets (table as above)
● WordNet is designed to produce an intuitively usable dictionary and thesaurus, and to support automatic text analysis.
● Its entities (termed synsets) correspond to word senses, and relationships define lexical relations between them. Example triples are (score_NN_1, hypernym, evaluation_NN_1) and (score_NN_2, has_part, musical_notation_NN_1).
Datasets (table as above)
● On the other hand, the FB-prefixed datasets are subsets of the Freebase knowledge base.
Datasets (table as above)
● Freebase was a huge KB of general facts; when these benchmarks were built it contained around 1.2 billion triples and more than 80 million entities.
Datasets (table as above)
● Freebase was a huge KB of general facts; when these benchmarks were built it contained around 1.2 billion triples and more than 80 million entities.
● The small dataset (FB15K) was built by selecting the subset of entities that are also present in the Wikilinks database and that have at least 100 mentions in Freebase (for both entities and relationships).
Datasets (table as above)
● The large-scale dataset (FB5M) was created by selecting the 5 million most frequently occurring entities in Freebase.
Datasets (table as above)
● The datasets WN18 and FB15k suffer from test set leakage through inverse relations: a large number of test triples can be obtained simply by inverting triples from the training set.
Datasets (table as above)
● FB15k-237 was then introduced: a subset of FB15k from which inverse relations were removed (WN18RR plays the analogous role for WN18). A simple leakage check is sketched below.
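A minimal sketch for flagging possible inverse-relation leakage: a test triple (h, r, t) is suspicious if the training set contains some triple (t, r', h) over the reversed entity pair. This is only a crude proxy (the filtered benchmarks additionally check how systematically r and r' invert each other); the toy triples are assumptions.

```python
def inverse_leakage(train, test):
    """Return test triples whose reversed entity pair occurs in training, with the relations seen there."""
    reversed_pairs = {}
    for h, r, t in train:
        reversed_pairs.setdefault((t, h), set()).add(r)
    leaks = []
    for h, r, t in test:
        if (h, t) in reversed_pairs:
            leaks.append(((h, r, t), reversed_pairs[(h, t)]))
    return leaks

train = [("a", "hasChild", "b")]
test = [("b", "childOf", "a")]
print(inverse_leakage(train, test))   # flags ('b', 'childOf', 'a'), invertible via 'hasChild'
```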
Toolkits
Table: Summary of Knowledge Graph Building Technology as Open Source Libraries
Task     | Library     | Language   | URL
General  | Grakn       | Python     | github.com/graknlabs/kglib
General  | AmpliGraph  | TensorFlow | github.com/Accenture/AmpliGraph
General  | GraphVite   | Python     | graphvite.io
Database | Akutan      | Go         | github.com/eBay/akutan
KRL      | OpenKE      | PyTorch    | github.com/thunlp/OpenKE
KRL      | Fast-TransX | C++        | github.com/thunlp/Fast-TransX
KRL      | scikit-kge  | Python     | github.com/mnick/scikit-kge
KRL      | LibKGE      | PyTorch    | github.com/uma-pi1/kge
KRL      | PyKEEN      | Python     | github.com/SmartDataAnalytics/PyKEEN
RE       | OpenNRE     | PyTorch    | github.com/thunlp/OpenNRE
Reference: Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2020). A survey on knowledge graphs: Representation, acquisition and applications. arXiv preprint arXiv:2002.00388.
Toolkits (table as on the previous slide)
● AmpliGraph for knowledge representation learning
Toolkits (table as above)
● Akutan for knowledge graph storage and querying
Toolkits (table as above)
● Three examples of useful toolkits released by the research community:
○ scikit-kge and OpenKE for knowledge graph embedding
Toolkits (table as above)
● Three examples of useful toolkits released by the research community:
○ OpenNRE for relation extraction
● Getting started with one of these libraries can be quite compact; a minimal sketch follows below.
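A hedged sketch using PyKEEN's pipeline API (following its documented quickstart) to train a TransE model on FB15k-237. Exact argument names and dataset identifiers can differ across PyKEEN versions, so treat this as illustrative rather than a pinned recipe; the output directory is hypothetical.

```python
from pykeen.pipeline import pipeline

# Train a small TransE model; 'nations' is a tiny built-in dataset if fb15k237 is too large to download.
result = pipeline(
    model="TransE",
    dataset="fb15k237",
    training_kwargs=dict(num_epochs=5),   # deliberately short run just to demonstrate the API
)
result.save_to_directory("transe_fb15k237")   # hypothetical output directory with model and metrics
```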
Conclusion: Takeaways
● Entity Linking is a long-researched topic in the NLP community.
● Neural models have enabled systems to cross the 95% performance barrier for the task.
● Knowledge Graph Completion is an active research area and, in its neural formulations, relatively new.
○ It uses machine learning and neural networks to 'vectorize' entities and relationships.
● Implementations can be slow, but recently this has started to change.
Happy to take questions. Thank you for your attention!