Toward a Better Understanding of Relational Knowledge in Language Models
Asahi Ushio
Ph.D. in Computer Science & Informatics, Cardiff University
Cardiff NLP
https://asahiushio.com, @asahiushio , https://github.com/asahi417
UCL NLP Meetup
3rd October 2022
About Me
Ph.D. at Cardiff University (2020-2023), supervised by Jose Camacho-Collados and Steven Schockaert
Internship at Amazon (summer 2021), with Danushka Bollegala
Internship at Snapchat (winter 2021), with Leonardo Neves and Francesco Barbieri
Research Interests:
- Relational Knowledge Representation: Analogy LM [ACL 2021], RelBERT [EMNLP 2021]
- Question Generation: AutoQG
- Twitter NLP: TweetNLP [EMNLP 2022], TweetTopic [COLING 2022], TweetNER7 [ACL 2022]
Fun: Art, Whisky, Dance, Jazz
Social: Twitter, LinkedIn, GitHub
Outline
What is relational knowledge, and how do we evaluate it?
- “BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?”, ACL 2021
Can we extract relational knowledge by fine-tuning?
- “Distilling Relation Embeddings from Pretrained Language Models”, EMNLP 2021
Resources:
- HuggingFace: https://huggingface.co/relbert
- GitHub: https://github.com/asahi417/relbert, https://github.com/asahi417/analogy-language-model
Relational Knowledge and Word Analogies
Relational Knowledge
Understanding the relation between terms:
→ (Washington D.C., U.S.) = capital of the country
→ (Obama, U.S.) = former president
→ (Word, Language) = component
→ (cool, cold) = synonym
→ (warm, warmer) = comparative
(Figure: word pairs linked by their relations — Obama–U.S., D.C.–U.S., Word–Language, cool–cold.)
Word Analogies
(Figure: word pairs grouped by a shared relation)
● Is capital of: (D.C., U.S.), (Tokyo, Japan), (London, U.K.)
● Former president: (Obama, U.S.), (Abe, Japan), (Johnson, U.K.)
● Component: (Word, Language), (Note, Music), (Barbell, Gym)
e.g., Word is to language as note is to music.
Analogical Reasoning
Question: What is a “note”?
● Case-based Reasoning
○ Nearest Neighbour → Note is a “memo”.
(Figure: embedding space containing “tone”, “record”, “note”, “memo”; the nearest neighbour of “note” is “memo”.)
Analogical Reasoning
Question: What is a “note”?
● Case-based Reasoning
○ Nearest Neighbour → Note is a “memo”.
● Analogical Reasoning [Prade 2017]
○ Transfer knowledge from analogies → Note is “a component of music”.
(Figure: from the analogies (Word, Language) and (Barbell, Gym), relation estimation infers “component”; applying it to (Note, Music) gives “Note is a component of Music”. The embedding space again shows tone, record, note, memo.)
Analogical Reasoning
Question: What is a “note”?
● Case-based Reasoning
○ Nearest Neighbour → Note is a “memo”.
● Analogical Reasoning [Prade 2017]
○ Transfer knowledge from analogies → Note is “a component of music”.
Analogical Proportion (AP), e.g.:
`A` is to `B` as `C` is to `D`. == A : B :: C : D
`<task 1>` is to `<solution 1>` as `<task 2>` is to `<solution 2>`.
→ Word is to Language as Note is to Music.
(Figure: the analogy pair (Word, Language) is transferred to the task pair (Note, Music) in the embedding space.)
Related Tasks in NLP
Tasks that require an explicit understanding of relational knowledge in NLP:
● Analogy Question
● Relation Mapping Problem
● Lexical Relation Classification
● etc.
(Figures: SAT analogy question [Turney 2003]; relation mapping problem [Turney 2008].)
Solving Analogy with LMs!
Analogy Question & Solution with Word Embedding
SAT analogy question [Turney 2003].
Word Embedding [Mikolov 2013] (figure: King, Queen, Man, Woman):
v(Man) - v(Woman) + v(Queen) = v(King)
Solving an analogy question with word embeddings: given a query pair hq : tq and candidates (1) h1 : t1, (2) h2 : t2, (3) h3 : t3, compute
rq = v(hq) - v(tq)
ri = v(hi) - v(ti)
and predict y = argmax{sim(rq, ri)}.
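To make the recipe concrete, here is a minimal sketch with toy NumPy vectors (the vector values are invented for illustration; a real run would load pretrained embeddings such as word2vec or fastText):

```python
import numpy as np

# Toy "pretrained" word embeddings (illustrative values only).
v = {
    "word":     np.array([0.2, 0.5, 0.1]),
    "language": np.array([0.3, 0.5, 0.9]),
    "note":     np.array([0.1, 0.4, 0.2]),
    "music":    np.array([0.2, 0.4, 0.9]),
    "paint":    np.array([0.6, 0.2, 0.2]),
    "portrait": np.array([0.5, 0.8, 0.4]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(query, candidates):
    """Pick the candidate whose difference vector is most similar to the query's."""
    hq, tq = query
    rq = v[hq] - v[tq]                               # relation vector of the query pair
    scores = [cos(rq, v[h] - v[t]) for h, t in candidates]
    return int(np.argmax(scores)), scores

idx, scores = solve_analogy(("word", "language"),
                            [("paint", "portrait"), ("note", "music")])
print(idx, scores)  # index of the predicted candidate and per-candidate similarities
```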
GPT3 to Solve Analogy
GPT3 [Brown 2020] uses the template “A” is to “B” as “C” is to “D” (analogical proportion).
Pipeline: analogy test (query hq : tq, candidates (1) h1 : t1, (2) h2 : t2, (3) h3 : t3)
→ Prompting → sentences x1, x2, x3
→ Perplexity → e.g., (1) 0.9, (2) 0.2, (3) 1.7
→ Model prediction: the candidate whose sentence has the lowest perplexity.
e.g., query word:language
(1) paint:portrait → word is to language as paint is to portrait
(2) note:music → word is to language as note is to music
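To make the scoring concrete, here is a minimal sketch that ranks candidates by sentence perplexity using GPT-2 from HuggingFace transformers (GPT-2 stands in for GPT-3, whose weights are not public; the prompt wording follows the slide):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    """Causal-LM perplexity: exp of the mean negative log-likelihood."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

query = ("word", "language")
candidates = [("paint", "portrait"), ("note", "music")]
sentences = [f"{query[0]} is to {query[1]} as {h} is to {t}" for h, t in candidates]
scores = [perplexity(s) for s in sentences]
print(min(range(len(scores)), key=scores.__getitem__))  # lowest perplexity wins
```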
Results on the SAT Analogy Question
Model accuracy on SAT [Turney 2003], [link]:
LRA [Turney 2005]              56.4
Word2Vec [Mikolov 2013]        42.8
GloVe [Pennington 2014]        48.9
FastText [Bojanowski 2017]     49.7
GPT3 (zero-shot) [Brown 2020]  53.7
GPT3 (few-shot) [Brown 2020]   65.2
Human [Turney 2005]            57.0
Remarks
● Human performance is not high.
● LRA (SVD on a PMI matrix over a large word-pair corpus) is competitive with recent methods.
● No zero-shot model, GPT3 included, outperforms the human baseline.
So, LMs are poor at understanding analogies… 😨
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
LM to Solve Analogy
Based on the GPT3 approach, but generalizing perplexity to LM scoring, with multiple template types.
Pipeline: analogy test (query hq : tq, candidates (1) h1 : t1, (2) h2 : t2, (3) h3 : t3)
→ Prompting (multiple template types) → sentences x1, x2, x3
→ LM scoring → e.g., (1) 0.2, (2) 0.1, (3) 0.7
→ Model prediction.
LM Scoring
Scoring functions (the exact equations are shown as images in the slides):
- Perplexity (or pseudo-perplexity for masked LMs [Wang 2019])
- Perplexity-based PMI, computed from the conditional and marginal likelihoods of the head and tail under the LM
- Marginal likelihood biased perplexity (mPPL): perplexity combined with the marginal likelihoods of the head and tail, each with a parameter controlling its effect [Feldman 2019]
The input text is converted from (hq : tq :: hi : ti) with a template t.
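The pseudo-perplexity for masked LMs can be sketched as follows: mask each token in turn and average the masked-token negative log-likelihoods (a minimal illustration following [Wang 2019], not the paper's exact implementation):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask one token at a time and average the masked-token NLLs."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, len(ids) - 1):              # skip <s> and </s>
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nlls.append(-log_probs[ids[i]].item())
    return float(torch.exp(torch.tensor(nlls).mean()))

print(pseudo_perplexity("word is to language as note is to music"))
```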
Permutation of Analogical Proportion
Analogical Proportion (AP) → “a” is to “b” as “c” is to “d”, or a : b :: c : d
Permutation Invariance [Barbot 2019]
● Positive permutation: the statement is still the same analogy.
● Negative permutation: the statement is not the same analogy.
Positive permutations of (a:b) and (c:d):
1. a : b :: c : d
2. a : c :: b : d
3. b : a :: d : c
4. b : d :: a : c
5. c : d :: a : b
6. c : a :: d : b
7. d : c :: b : a
8. d : b :: c : a
Negative permutations:
1. a : b :: d : c
2. a : c :: d : b
3. a : d :: b : c
4. a : d :: c : b
5. b : a :: c : d
6. b : c :: a : d
7. b : c :: d : a
8. b : d :: c : a
9. c : a :: b : d
10. c : b :: a : d
11. c : b :: d : a
12. c : d :: b : a
13. d : a :: b : c
14. d : a :: c : b
15. d : b :: a : c
16. d : c :: a : b
Example:
● “word is to language as note is to music” = “language is to word as music is to note”
● “word is to language as note is to music” ≠ “language is to word as note is to music”
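These permutation sets can be enumerated programmatically; the sketch below hard-codes the eight positive orderings from the list above and treats the remaining orderings of the four terms as negative:

```python
from itertools import permutations

# The eight positive permutations of a : b :: c : d, as listed above.
POSITIVE = {
    ("a", "b", "c", "d"), ("a", "c", "b", "d"), ("b", "a", "d", "c"),
    ("b", "d", "a", "c"), ("c", "d", "a", "b"), ("c", "a", "d", "b"),
    ("d", "c", "b", "a"), ("d", "b", "c", "a"),
}

def ap_permutations(a, b, c, d):
    """Split all 24 orderings of an analogical proportion into positive/negative."""
    names = {"a": a, "b": b, "c": c, "d": d}
    pos, neg = [], []
    for p in permutations("abcd"):
        quad = tuple(names[x] for x in p)
        (pos if p in POSITIVE else neg).append(quad)
    return pos, neg

pos, neg = ap_permutations("word", "language", "note", "music")
print(len(pos), len(neg))  # 8 16
```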
Analogical Proportion (AP) Score
AP score: the relative gain of the LM score on the positive permutations over the negative permutations. It combines a scoring function (perplexity, PMI, mPPL), an aggregation function over permutations (mean, max, n-th element), and a parameter controlling the bias of the negative permutations (equation shown as an image in the slides).
Key assumption: a better scoring function should give low scores to the negative permutations and high scores to the positive permutations.
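In code, the idea might be sketched as below; the paper's exact equation differs, so score() is just any "higher is better" scoring function, aggregation is a plain mean, and beta is an illustrative bias parameter:

```python
import numpy as np

def ap_score(score, positives, negatives, beta=1.0):
    """Relative gain of positive-permutation scores over negative ones.

    score: callable mapping a permutation (quadruple) to a 'higher is better' value.
    beta:  illustrative parameter weighting the negative permutations.
    """
    pos = np.mean([score(q) for q in positives])
    neg = np.mean([score(q) for q in negatives])
    return pos - beta * neg

positives = [("word", "language", "note", "music"),
             ("language", "word", "music", "note")]
negatives = [("language", "word", "note", "music")]

# Toy scorer: pretend the canonical ordering is most fluent under the LM.
toy = lambda q: 1.0 if q == ("word", "language", "note", "music") else 0.1
print(ap_score(toy, positives, negatives, beta=0.5))
```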
Analogy Question Datasets
SAT [Turney 2003]: US college admission test
UNIT2 [Boteanu 2015]: word analogy problems for US students in grades 4 to 12 (age 9 onwards)
UNIT4: same source as UNIT2, but organized by difficulty: high-beginning (SAT level), low-intermediate, high-intermediate, low-advanced, and high-advanced (GRE level)
Google [Mikolov 2013]: a mix of semantic and morphological relations (capital-of, singular-plural, etc.)
BATS [Gladkova 2016]: a mix of lexicographic, encyclopedic, and derivational and inflectional morphology relations
Results (zero-shot)
RoBERTa is the best on U2 & U4, but otherwise FastText owns it 🤔
Results (tuned on validation)
BERT is still worse, but RoBERTa & GPT2 achieve the best results 🤗
Results (SAT full)
Few-shot models outperform word embeddings and LRA.
Difficulty Level Breakdown (U2 & U4)
(Charts: accuracy breakdown by difficulty level for UNIT2 and UNIT4.)
Conclusion
The GPT3 result was disappointing given the model size, but we show that some smaller LMs (RoBERTa and GPT2) can solve analogies to some extent in a true zero-shot setting, with a carefully designed scoring function.
Language models are better than word embeddings at understanding abstract relations, but have ample room for improvement (BERT is still worse even when tuned).
Language models are very sensitive to hyperparameters in this task, and careful tuning leads to competitive results.
Assumption on Relational Knowledge of LMs
LMs do have relational knowledge, but it is not easy to exploit it in downstream tasks with a vanilla LM.
What if we could fine-tune the LM to explicitly extract relational knowledge? 🤔
Distilling Relation Embeddings from Pre-trained Language Models
Relation Embedding
● Word Embedding [Mikolov 2013]
● Pair2Vec [Joshi 2019]
● Relative [Camacho-Collados 2019]
(Figure: word-embedding space with King, Queen, Man, Woman.)
Relation Embedding with LM (RelBERT)
Pipeline: word pair (h, t) → Prompting → sentence (tokens x1, x2, x3, …) → LM → contextual embeddings → Aggregation → fixed-length relation embedding.
Prompt Generation
● Custom Template, AutoPrompt [Shin 2020], P-tuning [Liu 2021]
Embedding Aggregation
● Average Pooling / Mask Embedding
Custom templates used for prompting, for a word pair (h, t):
1. Today, I finally discovered the relation between `h` and `t` : `h` is the <mask> of `t`
2. Today, I finally discovered the relation between `h` and `t` : `t` is `h`’s <mask>
3. Today, I finally discovered the relation between `h` and `t` : <mask>
4. I wasn’t aware of this relationship, but I just read in the encyclopedia that `h` is the <mask> of `t`
5. I wasn’t aware of this relationship, but I just read in the encyclopedia that `t` is `h`’s <mask>
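As a sketch of how such an embedding can be extracted with off-the-shelf transformers: fill template 1 with a word pair and mean-pool the final hidden states. This is an approximation, not the released pipeline (the official relbert library wraps these steps, and the pooling choice here is an assumption); the checkpoint name is the one linked later in the talk:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "relbert/relbert-roberta-large"   # fine-tuned checkpoint from this talk
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def relation_embedding(h: str, t: str) -> torch.Tensor:
    """Fill template 1 with the word pair and mean-pool the contextual embeddings."""
    prompt = (f"Today, I finally discovered the relation between {h} and {t} : "
              f"{h} is the {tokenizer.mask_token} of {t}")
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    return hidden.mean(dim=0)                        # fixed-length embedding

e1 = relation_embedding("sunscreen", "sunburn")
e2 = relation_embedding("vaccine", "malaria")
print(torch.cosine_similarity(e1, e2, dim=0))        # relationally similar pairs
```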
Fine-tuning
Given a triple (anchor “sunscreen::sunburn”, positive “vaccine::malaria”, negative “dog::cat”), we want the embeddings of the anchor and the positive to be close to each other, and both far from the negative.
(Figure: “sunscreen::sunburn” and “vaccine::malaria” pulled together; “dog::cat” pushed away.)
Triplet and Classification Loss [Reimers 2019]
Given the embeddings xa, xp, xn of the anchor (e.g., “sunscreen”), the positive (e.g., “sunburn”), and the negative (e.g., “evil”), the triplet loss pushes the anchor-positive distance below the anchor-negative distance by a margin (the equations are shown as images in the slides; following [Reimers 2019], roughly max(0, ||xa - xp|| - ||xa - xn|| + ε)), and the classification loss applies a softmax classifier with a learnable weight matrix to the concatenation of the two embeddings and their element-wise difference.
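A minimal PyTorch sketch of the two objectives under these assumptions (the margin, the concatenation features, and the class setup are illustrative, not the paper's exact hyperparameters):

```python
import torch
import torch.nn.functional as F

def triplet_loss(xa, xp, xn, eps: float = 1.0) -> torch.Tensor:
    """Hinge on (anchor-positive dist) - (anchor-negative dist) + margin."""
    return F.relu(torch.norm(xa - xp) - torch.norm(xa - xn) + eps)

class PairClassifier(torch.nn.Module):
    """Softmax classifier over [xa; xb; |xa - xb|], in the style of [Reimers 2019]."""
    def __init__(self, dim: int, n_classes: int = 2):
        super().__init__()
        self.w = torch.nn.Linear(3 * dim, n_classes)  # learnable weight

    def forward(self, xa, xb):
        return self.w(torch.cat([xa, xb, (xa - xb).abs()], dim=-1))

dim = 8
xa, xp, xn = (torch.randn(dim) for _ in range(3))     # stand-in embeddings
clf = PairClassifier(dim)
loss = (triplet_loss(xa, xp, xn)
        + F.cross_entropy(clf(xa, xp).unsqueeze(0), torch.tensor([1]))
        + F.cross_entropy(clf(xa, xn).unsqueeze(0), torch.tensor([0])))
print(loss)
```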
Dataset
We create the dataset from the Relation Similarity Dataset (a relation taxonomy).
- Training: 1,580 positive pairs, 70,207 negative pairs
- Validation: 1,778 positive pairs, 78,820 negative pairs
- 10 parent relations
- 10 child relations
(Figure: the relation taxonomy. A parent relation such as Class Inclusion has child relations such as Taxonomic and Functional; each child relation provides positive pairs (h1,t1), …, (hk,tk) and negative pairs, from which triplets (xa, xp, xn) are built.)
EXPERIMENTS
Analogy Question
Setup
- Cosine similarity between embeddings (as with word embeddings).
- No additional training or validation.
- Accuracy as the metric.
- RoBERTa-large as the anchor language model for RelBERT.
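Under this setup, answer selection might look like the following sketch (it reuses the hypothetical relation_embedding() helper from the RelBERT sketch above; this is not the official evaluation script):

```python
import torch

def answer_analogy(query, candidates):
    """Pick the candidate pair whose relation embedding is closest to the query's."""
    q = relation_embedding(*query)  # defined in the earlier RelBERT sketch
    sims = [torch.cosine_similarity(q, relation_embedding(h, t), dim=0)
            for h, t in candidates]
    return max(range(len(sims)), key=lambda i: sims[i])

print(answer_analogy(("word", "language"),
                     [("paint", "portrait"), ("note", "music")]))
```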
Analogy Question: Results
SotA on 4 / 5 datasets 🎉
Even better than methods tuned on the validation set 🤗
Lexical Relation Classification
Setup
- Supervised task
- LMs are frozen
- Macro/micro F1
- Tuned on dev
(Table: data statistics.)
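A minimal sketch of this frozen-LM setup, assuming relation embeddings are precomputed and a small scikit-learn classifier is trained on top (the classifier choice and the stand-in data are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

# X: precomputed relation embeddings for word pairs (LM frozen), y: relation labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 16)), rng.integers(0, 3, 200)  # stand-in data
X_test, y_test = rng.normal(size=(50, 16)), rng.integers(0, 3, 50)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(f1_score(y_test, pred, average="macro"),
      f1_score(y_test, pred, average="micro"))
```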
Classification: Results
SotA on 4 / 5 datasets in macro F1 🎉
SotA on 3 / 5 datasets in micro F1 🎉
Memorization
Does RelBERT just memorize the relations in the training set?
Experiment: train RelBERT without the hypernymy relation.
Result: no significant decrease in hypernym prediction.
Relation Type Breakdown
Breakdown of BATS results per relation type:
● High accuracy on morphological analogies
● Poor accuracy on encyclopedic analogies
Note that the relation similarity dataset contains no morphological relations, only semantic (taxonomic) relations.
LM Variation
Train RelBERT on BERT and ALBERT in addition to RoBERTa → RoBERTa is the best.
Vanilla RoBERTa (no fine-tuning) → fine-tuning (distillation) is necessary.
Latent Space Visualization
2-D embeddings of ConceptNet (with UMAP dimensionality reduction); word2vec vs. RelBERT.
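A sketch of how such a plot could be produced with umap-learn (the embedding matrix and relation-type labels here are random placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
import umap  # pip install umap-learn

# Stand-in for a matrix of relation embeddings (e.g., one row per ConceptNet pair).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 32))
labels = rng.integers(0, 5, 300)          # placeholder relation-type labels

xy = umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)
plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=8, cmap="tab10")
plt.title("Relation embeddings (UMAP, 2-D)")
plt.show()
```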
Conclusion
We propose RelBERT, a framework for building a relation embedding model on top of a pretrained LM. RelBERT distils the LM's relational knowledge into high-quality relation embeddings. Experimental results show that RelBERT embeddings outperform existing baselines, establishing SotA on analogy questions and relation classification.
- Model: https://huggingface.co/relbert/relbert-roberta-large
- Demo: https://huggingface.co/spaces/relbert/Analogy
- GitHub: https://github.com/asahi417/relbert
Thank You!