2021-11, EMNLP, Distilling Relation Embeddings from Pre-trained Language Models

Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it possible to distill high-quality word vectors from pre-trained language models. However, it is currently unclear to what extent it is possible to distill relation embeddings, i.e. vectors that characterize the relationship between two words. Such relation embeddings are appealing because they can, in principle, encode relational knowledge in a more fine-grained way than is possible with knowledge graphs. To obtain relation embeddings from a pre-trained language model, we encode word pairs using a (manually or automatically generated) prompt, and we fine-tune the language model such that relationally similar word pairs yield similar output vectors. We find that the resulting relation embeddings are highly competitive on analogy (unsupervised) and relation classification (supervised) benchmarks, even without any task-specific fine-tuning. Source code to reproduce our experimental results and the model checkpoints are available in the following repository: https://github.com/asahi417/relbert

  1. Distilling Relation Embeddings from Pre-trained Language Models. Asahi Ushio, Jose Camacho-Collados, Steven Schockaert. https://github.com/asahi417/relbert
  2. Language Model Understanding. Syntactic knowledge: probing embeddings (Hewitt 2019, Tenney 2019); probing attention weights (Clark 2020). Factual knowledge, a.k.a. the language model as a commonsense KB (Petroni 2019, Kassner 2020, Jiang 2020, etc.). Relational knowledge, a.k.a. the language model as a lexical relation reasoner: LM fine-tuning on relation classification (Bouraoui 2019); vanilla LM evaluation (Ushio 2021).
  3. Language Model Understanding (same content as slide 2).
  4. Language Model Understanding (same content as slide 2), closing with the question: can we distill relational knowledge as relation embeddings?
  5. Relation Embeddings: from word embeddings (Mikolov 2013) to pair/relation embeddings such as Pair2Vec (Joshi 2019) and Relative (Camacho-Collados 2019). The slide illustrates the classic king/queen, man/woman word-embedding analogy.
  6. RelBERT
  7. Relation Embedding from LM: a word pair (h, t) is converted into a sentence by prompt generation (custom template, AutoPrompt (Shin 2020), or P-tuning (Liu 2021)), the sentence is encoded by the LM, and the contextual embeddings are averaged over the context to obtain a fixed-length relation embedding.
  8. Relation Embedding from LM, with the five prompts used for the pair (camera, photographer) (an illustrative extraction sketch is given after the slide list):
     1. Today, I finally discovered the relation between camera and photographer : camera is the <mask> of photographer
     2. Today, I finally discovered the relation between camera and photographer : photographer is camera’s <mask>
     3. Today, I finally discovered the relation between camera and photographer : <mask>
     4. I wasn’t aware of this relationship, but I just read in the encyclopedia that camera is the <mask> of photographer
     5. I wasn’t aware of this relationship, but I just read in the encyclopedia that photographer is camera’s <mask>
  9. Fine-tuning on Triples: given a triple with anchor “sunscreen::sunburn”, positive “vaccine::malaria”, and negative “dog::cat”, we want the embeddings of the anchor and the positive to be close, but far from the negative. Loss functions: a triplet loss and a classification loss, following SBERT (Reimers 2019). (A hedged sketch of this objective is given after the slide list.)
  10. Dataset: built from SemEval 2012 Task 2. Each parent relation (e.g. Class Inclusion) with its child relations (Taxonomic, Functional, ...) provides positive pairs (h1, t1), ..., (hk, tk) and negative pairs, from which triplets (x_a, x_p, x_n) are formed. A batch of m triplets is augmented to 2m(m-1) samples by reusing, for each (anchor, positive), the other anchors and positives in the batch as negatives (see the augmentation sketch after the slide list).
  11. EXPERIMENTS
  12. Experiment: Analogy. Setup: cosine similarity between embeddings; no training; accuracy as the metric; no validation. The slide shows a sample from the SAT analogy dataset and the data statistics. (A nearest-neighbour sketch is given after the slide list.)
  13. Experiment: Analogy, results: SotA on 4 of 5 datasets 🎉 and better than methods tuned on the dev set 🤗.
  14. Experiment: Classification. Setup: supervised task; LMs are frozen; macro/micro F1; tuned on the dev set. The slide shows the data statistics. (An illustrative classifier sketch is given after the slide list.)
  15. Experiment: Classification, results: SotA on 4 of 5 datasets in macro F1 🎉 and on 3 of 5 datasets in micro F1 🎉.
  16. ANALYSIS
  17. Relation Memorization: does RelBERT just memorize the relations in the training set? Experiment: train RelBERT without hypernymy. Result: no significant decrease in hypernymy prediction, so RelBERT does not rely on memorization.
  18. Fine-tuning? Other LMs? Training RelBERT on BERT and ALBERT in addition to RoBERTa shows that RoBERTa is the best; comparing against vanilla RoBERTa (no fine-tuning) shows that fine-tuning (distillation) is necessary.
  19. Conclusion: we propose RelBERT, a framework for obtaining a relation embedding model from a pre-trained LM; RelBERT distills the LM's relational knowledge into high-quality relation embeddings; experimental results show that RelBERT embeddings outperform existing baselines, establishing SotA on analogy and relation classification.
  20. Release of the RelBERT Library: we release the Python package relbert (install via pip install relbert) along with model checkpoints on the Hugging Face model hub. Please check our project page https://github.com/asahi417/relbert! Example usage from the slide (a small follow-up sketch appears after the slide list):
      from relbert import RelBERT
      model = RelBERT('asahi417/relbert-roberta-large')
      v_tokyo_japan = model.get_embedding(['Tokyo', 'Japan'])  # the vector has shape (1024,)
  21. 🌳Thank you!🌳
  22. Comparing to Word Embeddings: FastText is still better than RelBERT on the Google analogy questions. A breakdown per relation type shows that FastText is better on morphological relations but very poor on lexical relations.
  23. Nearest Neighbours.
  24. Fine-tuning on Triples: given a triple of an anchor (e.g. "sunscreen"), a positive (e.g. "sunburn"), and a negative (e.g. "evil"), the slide defines the triplet loss and the classification loss, where the classification loss uses a learnable weight; both loss functions are inspired by SBERT (Reimers 2019). (A hedged restatement of these objectives is given after the slide list.)
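
The following is a minimal, hedged sketch of the extraction step described on slides 7-8: encode a word pair with a prompt and average the contextual embeddings into a fixed-length vector. The model name (roberta-large), the use of a single hand-written template, and the attention-mask mean pooling are illustrative assumptions, not necessarily the paper's exact configuration.

    # Hedged sketch: prompt a masked LM with a word pair and mean-pool the
    # contextual embeddings into a fixed-length relation vector.
    import torch
    from transformers import AutoTokenizer, AutoModel

    MODEL_NAME = "roberta-large"  # assumption: RoBERTa-large as the base LM
    TEMPLATE = ("Today, I finally discovered the relation between {h} and {t} : "
                "{h} is the <mask> of {t}")  # prompt 1 from slide 8

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    def relation_embedding(head, tail):
        """Encode the pair (head, tail) with the prompt and average over the context."""
        inputs = tokenizer(TEMPLATE.format(h=head, t=tail), return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state            # (1, seq_len, dim)
        mask = inputs["attention_mask"].unsqueeze(-1).float()     # (1, seq_len, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).squeeze(0)  # (dim,) vector

    print(relation_embedding("camera", "photographer").shape)  # torch.Size([1024])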
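
Slide 9's contrastive objective can be sketched with PyTorch's built-in triplet margin loss. The relation_embedding encoder is the hypothetical helper from the previous sketch, the margin of 1.0 is an assumption, and the SBERT-style classification term mentioned on the slide is omitted for brevity.

    # Hedged sketch of the triplet objective from slide 9 (classification loss omitted).
    import torch.nn.functional as F

    def triplet_loss(encoder, anchor, positive, negative, margin=1.0):
        """Pull anchor and positive relation embeddings together, push the negative away."""
        e_a = encoder(*anchor)    # e.g. ("sunscreen", "sunburn")
        e_p = encoder(*positive)  # e.g. ("vaccine", "malaria")
        e_n = encoder(*negative)  # e.g. ("dog", "cat")
        return F.triplet_margin_loss(e_a.unsqueeze(0), e_p.unsqueeze(0),
                                     e_n.unsqueeze(0), margin=margin)

    # loss = triplet_loss(relation_embedding,
    #                     ("sunscreen", "sunburn"), ("vaccine", "malaria"), ("dog", "cat"))
    # In actual fine-tuning the encoder would be run with gradients enabled.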
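
The in-batch augmentation on slide 10, which grows a batch of m triplets into 2m(m-1) samples, can be reproduced with plain Python. The tuple layout (anchor pair, positive pair, negative pair) is an assumption about how the triplets are stored.

    # Sketch of the slide-10 augmentation: for each (anchor, positive), reuse every
    # other anchor and positive in the batch as a negative: m -> 2*m*(m-1) triplets.
    def augment_batch(triplets):
        augmented = []
        for i, (anchor, positive, _) in enumerate(triplets):
            for j, (other_anchor, other_positive, _) in enumerate(triplets):
                if i == j:
                    continue
                augmented.append((anchor, positive, other_anchor))
                augmented.append((anchor, positive, other_positive))
        return augmented

    batch = [(("a1", "b1"), ("p1", "q1"), ("n1", "m1")),
             (("a2", "b2"), ("p2", "q2"), ("n2", "m2")),
             (("a3", "b3"), ("p3", "q3"), ("n3", "m3"))]
    print(len(augment_batch(batch)))  # 2 * 3 * (3 - 1) = 12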
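
The unsupervised analogy evaluation on slide 12 amounts to a nearest-neighbour search: the candidate pair whose relation embedding has the highest cosine similarity to the query pair's embedding is chosen as the answer. The toy vectors below stand in for real RelBERT embeddings.

    # Sketch of answering an analogy question by cosine similarity (slide 12).
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def solve_analogy(query_vec, candidate_vecs):
        """Return the index of the candidate pair most similar to the query pair."""
        return int(np.argmax([cosine(query_vec, c) for c in candidate_vecs]))

    query = np.array([1.0, 0.0, 1.0])          # stand-in embedding of the query pair
    candidates = [np.array([0.9, 0.1, 1.1]),   # relationally similar candidate
                  np.array([-1.0, 0.5, 0.0])]  # relationally different candidate
    print(solve_analogy(query, candidates))  # 0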
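
For the supervised relation classification setup on slide 14 (LM frozen, a classifier trained on top of the relation embeddings, macro/micro F1), here is a hedged scikit-learn sketch with random stand-in data; the logistic-regression classifier is an illustrative choice, not necessarily the one used in the paper.

    # Sketch of relation classification on frozen relation embeddings (slide 14).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 1024))   # stand-in for frozen RelBERT embeddings
    y_train = rng.integers(0, 5, size=200)   # stand-in relation labels
    X_test = rng.normal(size=(50, 1024))
    y_test = rng.integers(0, 5, size=50)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = clf.predict(X_test)
    print("macro F1:", f1_score(y_test, pred, average="macro"))
    print("micro F1:", f1_score(y_test, pred, average="micro"))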
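
As a follow-up to the library example on slide 20, two relation embeddings can be compared directly; only RelBERT and get_embedding come from the slide, while the second word pair and the NumPy cosine similarity are illustrative additions.

    # Comparing two relation embeddings from the released relbert package.
    import numpy as np
    from relbert import RelBERT

    model = RelBERT('asahi417/relbert-roberta-large')
    v_tokyo_japan = np.asarray(model.get_embedding(['Tokyo', 'Japan']))    # shape (1024,)
    v_paris_france = np.asarray(model.get_embedding(['Paris', 'France']))  # shape (1024,)

    similarity = np.dot(v_tokyo_japan, v_paris_france) / (
        np.linalg.norm(v_tokyo_japan) * np.linalg.norm(v_paris_france))
    print(similarity)  # capital-of pairs are expected to be close in relation space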
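
The equations on slide 24 did not survive as text; the LaTeX below restates the objectives in the form SBERT-style losses commonly take (assuming a Euclidean margin loss and a softmax classifier over concatenated features with learnable weight W), so the exact constants and feature layout should be checked against the paper.

    % Hedged restatement of the slide-24 objectives (SBERT-style; details assumed).
    % Triplet loss with margin \epsilon over anchor, positive and negative embeddings:
    \[
      \mathcal{L}_{\mathrm{triplet}}
        = \max\bigl( \lVert \mathbf{x}_a - \mathbf{x}_p \rVert
                   - \lVert \mathbf{x}_a - \mathbf{x}_n \rVert + \epsilon,\; 0 \bigr)
    \]
    % Classification loss with learnable weight W: cross-entropy over the softmax
    % of the concatenated features, as in SBERT:
    \[
      \mathcal{L}_{\mathrm{class}}
        = \mathrm{CrossEntropy}\Bigl(
            \operatorname{softmax}\bigl( W [\, \mathbf{x}_a ;\ \mathbf{x}_p ;\ \lvert \mathbf{x}_a - \mathbf{x}_p \rvert \,] \bigr),\, y \Bigr)
    \]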
