2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?


Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as eye is to seeing what ear is to hearing, sometimes referred to as analogical proportions, shapes how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall, the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations. (Source code and data to reproduce our experimental results are available at: https://github.com/asahi417/analogy-language-model)


  1. BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, Jose Camacho-Collados. https://github.com/asahi417/analogy-language-model https://arxiv.org/abs/2105.04949
  2. Language Model Understanding. Model analysis: Hewitt 2019 and Tenney 2019 → the embeddings capture linguistic knowledge; Clark 2020 → the attention reflects dependency relations. Factual knowledge: Petroni 2019 → an LM can be used as a commonsense KB. Generalization capacity: Warstadt 2020 → LMs need large data to achieve linguistic generalization; Min 2020 → LMs' poor performance on adversarial data can be improved by data augmentation.
  3. Can LMs identify analogies?
  4. Why Analogies? A sample from the SAT analogy dataset.
  5. Why Analogies? A sample from the SAT analogy dataset; analogy in word embeddings (king - man + woman ≈ queen), as in the first sketch after this list.
  6. Why Analogies? A sample from the SAT analogy dataset; applications: Question Answering, Relation Classification, Passage Retrieval.
  7. Solving Analogies with LMs. The analogy test gives a query pair hq:tq and candidate pairs h1:t1, h2:t2, h3:t3. Prompting turns each candidate into a sentence x1, x2, x3; LM scoring assigns each sentence a score (e.g. 0.2, 0.1, 0.7); the highest-scoring candidate is the model prediction. E.g. for the query word:language: (1) paint:portrait → "word is to language as paint is to portrait" → compute perplexity; (2) note:music → "word is to language as note is to music" → compute perplexity. A code sketch follows this list.
  8. Scoring Functions: perplexity (PPL); approximated pointwise mutual information (PMI); marginal likelihood biased perplexity (mPPL). Definitions are sketched after this list.
  9. Datasets.
  10. Results (zero-shot): RoBERTa is the best on U2 & U4, but otherwise fastText owns it 🤔
  11. Results (tuned on validation): BERT is still worse 🧐 but RoBERTa & GPT-2 achieve the best results 🤗
  12. Results (SAT full).
  13. Conclusion: ● Some LMs can solve analogies in a true zero-shot setting to some extent. ● Language models are better than word embeddings at understanding abstract relations, but have ample room for improvement. ● Language models are very sensitive to hyperparameters in this task, and careful tuning leads to competitive results.
  14. 🌳 Thank you! 🌳
  15. Story: LMs are very good at all downstream tasks → recent studies have further confirmed the linguistic semantics encoded in LMs in various ways → factual-knowledge probing also shows the capacity of LMs → what about relational knowledge, as in word2vec? → we researched it. The result? Very bad zero-shot → with a validation set, some LMs outperform the baselines → CONCLUSION: ● some language models represent relational knowledge ● with a carefully tuned method, some LMs can achieve very high accuracy (SotA) → future work: prompt mining, supervision.
  16. Language Model Pretraining: BERT (Devlin, 2018), T5 (Raffel, 2020), GPT (Radford, 2018).
  17. Permutation Invariance. Permutations of (a:b) and (c:d) — Positive (8): a:b::c:d, a:c::b:d, b:a::d:c, b:d::a:c, c:d::a:b, c:a::d:b, d:c::b:a, d:b::c:a. Negative (16): a:b::d:c, a:c::d:b, a:d::b:c, a:d::c:b, b:a::c:d, b:c::a:d, b:c::d:a, b:d::c:a, c:a::b:d, c:b::a:d, c:b::d:a, c:d::b:a, d:a::b:c, d:a::c:b, d:b::a:c, d:c::a:b. These feed the analogical proportion score: e.g. "word is to language as note is to music" = "language is to word as music is to note", but "word is to language as note is to music" ≠ "language is to word as note is to music" (Barbot et al., 2019, "Analogy Between Concepts"). See the enumeration sketch after this list.
  18. Difficulty Level Breakdown (U2 & U4). [Charts: accuracy broken down by question difficulty level for Unit 2 and Unit 4.]
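
The vector-offset analogy behind slide 5 (king - man + woman ≈ queen) can be reproduced with any pretrained static embeddings. A minimal sketch, assuming gensim and one of its stock downloadable models (the model name is illustrative, not the paper's exact fastText setup):

```python
# Word2vec-style analogy via vector offsets (3CosAdd): king - man + woman ~= queen.
# The pretrained model below is one of gensim's stock downloads, chosen for
# illustration; the paper's fastText baseline is configured differently.
import gensim.downloader as api

vectors = api.load("fasttext-wiki-news-subwords-300")  # any KeyedVectors work
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```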
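
The prompting-plus-scoring pipeline from slide 7 can be sketched with off-the-shelf HuggingFace models. This is a hedged reconstruction, not the authors' implementation (that lives in the linked repository); the prompt template and candidates come from the slide's example:

```python
# Zero-shot analogy test: render each candidate pair into the "is to ... as"
# prompt, score it with GPT-2 perplexity, and predict the lowest-perplexity pair.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
    return torch.exp(loss).item()

query = ("word", "language")
candidates = [("paint", "portrait"), ("note", "music")]
scores = {
    (h, t): perplexity(f"{query[0]} is to {query[1]} as {h} is to {t}")
    for h, t in candidates
}
print(min(scores, key=scores.get))  # lowest perplexity -> predicted analogy
```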
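
Slide 8 names the scoring functions without defining them. The two standard quantities are below as a generic sketch; the paper's PMI approximation and mPPL add weighting hyperparameters tuned on the validation set, so this is not the exact parameterization:

```latex
% Perplexity of a candidate sentence x = (x_1, ..., x_m) under the LM:
\[ \mathrm{PPL}(x) = P(x_1, \dots, x_m)^{-\frac{1}{m}} \]
% Pointwise mutual information, approximated from the LM's conditionals:
\[ \mathrm{pmi}(a; b) = \log \frac{P(a \mid b)}{P(a)} \]
% mPPL biases PPL with marginal-likelihood terms whose exponents are tuned
% hyperparameters -- one source of the sensitivity noted on slide 11.
```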
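
The positive/negative split on slide 17 is mechanical: of the 24 arrangements of four terms into a proportion, the 8 listed positives are exactly those equivalent to a:b::c:d under the analogical-proportion axioms (Barbot et al., 2019), and the remaining 16 are negatives. A small self-check:

```python
# Enumerate all arrangements of (a, b, c, d) as h1:t1 :: h2:t2 and split them
# into the 8 positives from slide 17 and the 16 negatives.
from itertools import permutations

POSITIVE = {
    ("a", "b", "c", "d"), ("a", "c", "b", "d"),
    ("b", "a", "d", "c"), ("b", "d", "a", "c"),
    ("c", "d", "a", "b"), ("c", "a", "d", "b"),
    ("d", "c", "b", "a"), ("d", "b", "c", "a"),
}

positives, negatives = [], []
for p in permutations("abcd"):
    (positives if p in POSITIVE else negatives).append(p)

assert len(positives) == 8 and len(negatives) == 16
```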
