Latent Relational Model for Relation Extraction
Gaetano Rossiello¹, Alfio Gliozzo², Nicolas Fauceglia², Giovanni Semeraro¹
¹ Department of Computer Science - University of Bari, Italy
² IBM Research AI - Yorktown Heights, NY, USA
gaetano.rossiello@uniba.it
github.com/gaetangate | /in/gaetano-rossiello | @tanoross
Goal: from Text to Knowledge
Unstructured Textual Data Structured Data Knowledge & Insights
● Information Extraction
○ Entity Recognition
○ Relation Extraction
● Frame Parsing
● Semantic Parsing
○ FOL
○ Lambda Calculus
○ AMR
● Deductive Reasoning
● Inductive Logic Programming
● Probabilistic (Logic) Programming
● Relational Embeddings
● ...
Why Relation Extraction?
● Automatic Knowledge Base Population (AKBP)
○ Lexical resources: add words to WordNet thesaurus
○ Fact bases: add facts to Wikidata or DBpedia
● Automatic Knowledge Base Construction (AKBC)
● Sample application: Question Answering (QA)
○ Who are the actors younger than Tom Hanks?
(isA ?x actor) (birthDate ?x ?y) (birthDate “Tom_Hanks” ?z) (> ?y ?z)
Relation Extraction Approaches
● Pattern-based [Hearst, 1992]
○ Hand-crafted rules
● Bootstrapping [Agichtein, 2000]
○ Semantic drift
● OpenIE [Banko, 2007; Fader, 2011; Mausam, 2012]
○ Lexicalized relations not in a canonical form
● Supervised [Jiang, 2007; Sun, 2014; Nguyen, 2015]
○ Manually annotated training examples
● Distant Supervision [Mintz, 2009; Lin, 2016; Glass, 2018]
○ An existing KB is used to generate training examples
○ Advantages from both bootstrapping and supervised RE
ISWC Semantic Web Challenge 2017
Glass, M., Gliozzo, A., Hassanzadeh, O., Mihindukulasooriya, N., Rossiello, G.
Inducing implicit relations from text using distantly supervised deep nets. ISWC 2018.
PCNN-KI: Piecewise Convolutional Neural Network for Distantly Supervised RE
PermID KG
Distantly Supervised RE: Limitations
● Distant Supervision does not fit well for vertical domains or long-tailed
relation types, where only a few seed examples are available
● The generalization capability is limited only to those relation types seen
during the training phase
Distantly supervised RE cannot be applied in other domains with new relation types
Use Case: Knowledge Base Population in Cold Start
Research Question:
How to design a method able to identify new
relation types in a (small) collection of
documents using a few examples?
Training examples
Relation Extraction as Analogy Problem
● Given a corpus D and an entity pair (a, b)
● Find the set R = {(x, y) ∊ D | a : b = x : y}
Watson : IBM = Pixel : Google
Query pair Result pair
Word Analogy using Distributional Semantic Models
Vector offset with Word Embeddings
man : king = woman : ?
vec(king) - vec(man) + vec(woman) ≈ vec(queen)
vec(king) - vec(man) ≈ vec(queen) - vec(woman)
Mikolov, T., Chen, K., Corrado, G., & Dean, J. Efficient estimation of word representations in vector space. ICLR 2013.
Pennington, J., Socher, R., & Manning, C. D. Glove: Global vectors for word representation. EMNLP 2014.
Levy, O., Goldberg, Y., & Dagan, I. Improving distributional similarity with lessons learned from word embeddings. TACL 2015.
Limitations:
● Handling multi-word (e.g. Tom Hanks) with pre-trained word embedding models
● Handling unseen words/entities
● Not effective on SAT Analogy Questions [Church, 2017]
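The vector-offset trick above can be sketched with NumPy. The embedding table below is a hypothetical toy (4-dimensional, hand-picked values); a real model such as word2vec or GloVe would supply vectors with hundreds of dimensions.

```python
import numpy as np

# Toy embedding table (hypothetical values, for illustration only).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "man":   np.array([0.1, 0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.8, 0.9, 0.1]),
    "queen": np.array([0.9, 0.8, 0.9, 0.2]),
    "apple": np.array([0.2, 0.1, 0.3, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vocab=emb):
    # Solve a : b = c : ?  via the offset  vec(b) - vec(a) + vec(c).
    target = vocab[b] - vocab[a] + vocab[c]
    # Exclude the query words, return the nearest neighbour by cosine.
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("man", "king", "woman"))  # → queen (on these toy vectors)
```

The multi-word and out-of-vocabulary limitations listed above hit exactly this lookup step: "Tom Hanks" has no single entry in `emb`.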
SAT Analogy Questions Dataset
● SAT = Scholastic Aptitude Test [Turney, 2003]
● 374 multiple-choice analogy questions; 5 choices per question
● Human performance: 81.5%
● SOTA - Latent Relational Analysis (LRA): 56.1%
Turney, P.D., and Littman, M.L. Corpus-based learning of analogies and semantic relations. Machine Learning. 2005.
Turney, P.D. Similarity of semantic relations. Computational Linguistics. 2006.
LRA
r1 = vec(mason:stone)
r2 = vec(carpenter:wood)
sim = cosine(r1, r2)
Latent Relational Model for RE
Entity-Entity Vocabulary
V = {(X1, Y1),..., (Xn, Yn)}
Entity-Entity Contexts
1. The entity types provided by the NER
2. The sequence of words between the two entities
3. The part-of-speech tags of these words
4. A flag indicating which entity came first
5. An n-gram to the left of the first entity
6. An n-gram to the right of the second entity
7. A dependency path between the two entities
Entity-entity matrix M (n rows = entity pairs, m columns = contexts), with binary entries, e.g.:

| 1 0 0 ... 1 |
| 0 1 1 ... 0 |
| 1 1 0 ... 0 |
| 0 0 0 ... 1 |

Singular Value Decomposition (SVD): M ≈ U_{n,k} Σ_{k,k} V_{k,m}
Relational Vector Space Model: LRM_{n,k} = (U_k Σ_k)_{n,k}
Rome is the capital of Italy.
David Gilmour was the guitarist of Pink Floyd.
Pac-Man is an arcade game developed by Namco.
...
(Rome, Italy)
(David Gilmour, Pink Floyd)
(Pac-Man, Namco)
...
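The LRM construction above (binary entity-pair × context matrix, truncated SVD, relational vectors U_k Σ_k) can be sketched as follows. The matrix entries here are a tiny hypothetical stand-in; a real run would extract the seven context feature types listed above from a corpus and use a much larger k (the experiments use k=2000).

```python
import numpy as np

# Hypothetical binary matrix M: rows = entity pairs, columns = contextual
# features (entity types, words between entities, POS tags, order flag,
# surrounding n-grams, dependency paths).
pairs = [("Rome", "Italy"), ("David Gilmour", "Pink Floyd"), ("Pac-Man", "Namco")]
M = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

# Truncated SVD: M ≈ U_k Σ_k V_k; relational vectors are rows of U_k Σ_k.
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
LRM = U[:, :k] * s[:k]           # shape (n_pairs, k)

rel_vec = dict(zip(pairs, LRM))  # entity pair -> latent relational vector
print(LRM.shape)                 # (3, 2)
```

On a large corpus the dense SVD above would be replaced by a randomized solver such as the one of [Halko, 2011] cited later in the deck.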
Use Case: Knowledge Base Population in Cold Start
Rossiello G., Gliozzo A., Fauceglia N. Relation extraction from a corpus using an
information retrieval based procedure. Patent ID P201706307
Use Case: Knowledge Base Population in Cold Start
Training examples
Geometric Interpretation of Relations
“A semantic relation R is a region in a relational vector space
LRMn,k
that outlines the boundaries among those entity-pair
vectors that are analogous to each other.”
Dataset: NYT-FB [Riedel, 2010]
New York, Brooklyn
Bill Gates, Microsoft
A : B = C : D ⇔ dist(r(A,B), r(C,D)) < t
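The threshold criterion above can be sketched directly. The relational vectors and the threshold t below are illustrative assumptions, not values from the NYT-FB experiments.

```python
import numpy as np

def is_analogous(r_ab, r_cd, t=0.5):
    # A : B = C : D  iff  dist(r(A,B), r(C,D)) < t.
    # Here dist = cosine distance; t = 0.5 is an illustrative threshold.
    cos = float(r_ab @ r_cd / (np.linalg.norm(r_ab) * np.linalg.norm(r_cd)))
    return (1.0 - cos) < t

# Hypothetical relational vectors in a 2-d LRM space.
r_ny_brooklyn = np.array([0.9, 0.1])   # (New York, Brooklyn)
r_gates_msft  = np.array([0.1, 0.9])   # (Bill Gates, Microsoft)
print(is_analogous(r_ny_brooklyn, r_ny_brooklyn))  # True: same region
print(is_analogous(r_ny_brooklyn, r_gates_msft))   # False: different relations
```

A relation is then the region of pair vectors within distance t of each other, matching the geometric interpretation quoted above.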
LRM for Distantly Supervised Relation Extraction
Dataset: NYT-FB [Riedel, 2010]
Corpus: New York Times (2005-2007)
KG: Freebase
Relations/classes: 51
Training positive: 4700
Training negative: 63569
Test positive: 1950
Test negative: 94917
LRM: SVD [Halko, 2011] k=2000
Classifier: SVM one-vs-rest
ARES (Ours) = LRM + SVM
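The ARES setup (one-vs-rest SVM over LRM vectors) can be sketched with scikit-learn. The data below is synthetic: two well-separated relation clusters plus a negative class, in 4 dimensions instead of the k=2000 used in the experiments.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for LRM relational vectors: classes 0 and 1 are
# relation types, class 2 plays the role of the negative "no relation" pairs.
X = np.vstack([
    rng.normal(loc=[3, 0, 0, 0], scale=0.3, size=(20, 4)),   # relation 0
    rng.normal(loc=[0, 3, 0, 0], scale=0.3, size=(20, 4)),   # relation 1
    rng.normal(loc=[0, 0, 0, 0], scale=0.3, size=(40, 4)),   # negatives
])
y = np.array([0] * 20 + [1] * 20 + [2] * 40)

# One-vs-rest linear SVM over the relational vector space.
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
pred = clf.predict([[3, 0, 0, 0], [0, 3, 0, 0]])
print(pred)  # expected: [0 1] on these well-separated clusters
```

In the distantly supervised setting, the labels y come from aligning entity pairs in the corpus against Freebase relations.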
Conclusion
● Relation Extraction (RE) as Analogy Problem
(two sides of the same coin)
● Latent Relational Model (LRM) for RE
● Geometric Interpretation of Relations
● LRM for Unsupervised RE
● LRM for Semi-supervised RE
● LRM for Supervised RE
Limitations of LRM / Future Work
● NLP pipeline and SVD do not scale on very large corpora
○ Learning Relational Representations by Analogy
using Hierarchical Siamese Networks [Rossiello et al, NAACL 2019]
○ Variational Autoencoders
● LRM is not able to model the directionality of relations
○ founder(Person, Company) - OK
○ competitor(Company, Company) - OK
○ supplyTo(Company, Company) - KO!
● One entity-entity embedding encodes many relations
○ Contextual Relational Embeddings, like ELMo [Peters, 2018], BERT [Devlin, 2018]
○ Lookup tensor: [entity-entity, mention, vector]
● Extract n-ary Relations
○ Towards Unsupervised Semantic/Frame Parsing
Thank you!