SlideShare a Scribd company logo
1 of 18
Download to read offline
BERT is to NLP what AlexNet is to CV:
Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio
Luis Espinosa-Anke
Steven Schockaert
Jose Camacho-Collados
https://github.com/asahi417/analogy-language-model
https://arxiv.org/abs/2105.04949
Language Model Understanding
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
Model Analysis
- Hewitt 2019, Tenney 2019 → The embeddings capture linguistics knowledge.
- Clark 2020 → The attention reflects dependency.
Factual Knowledge
- Petroni 2019 → LM can be used as a commonsense KB.
Generalization Capacity
- Warstadt 2020 → LMs need large data to achieve linguistic generalization.
- Min 2020 → LMs’ poor performance on adversarial data can be improved by DA.
2
Can LMs identify
analogies?
Why Analogies?
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
4
Sample from SAT analogy dataset.
Why Analogies?
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
5
Sample from SAT analogy dataset.
King
Queen
Woman
Man
Analogy in word embedding
Why Analogies?
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
6
Sample from SAT analogy dataset.
Question Answering
Relation Classification
Passage Retrieval
Solving Analogies with LMs
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
7
hq : tq
(1) h1 : t1
(2) h2 : t2
(3) h3 : t3
Prompting (1) x1
(2) x2
(3) x3
(1) 0.2
(2) 0.1
(3) 0.7
LM scoring
Analogy Test Sentence Score
Model prediction
Eg) word:language
(1) paint:portrait → word is to language as paint is to portrait → Compute perplexity
(2) note:music → word is to language as note is to music → Compute perplexity
Prompt types
Scoring Functions
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
8
- Perplexity (PPL)
- Approximated point-wise mutual information (PMI)
- Marginal likelihood biased perplexity (mPPL)
Datasets
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
9
Result
(zeroshot)
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
10
RoBERTa is the best
in U2 & U4 but
otherwise FastText
owns it 🤔
Result
(tune on val)
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
11
BERT still worse 🧐
but
RoBERTa & GPT2
achieve the best 🤗
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
12
Results
(SAT full)
Conclusion
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
13
● Some LMs can solve analogies in a true zero-shot setting to some extent.
● Language models are better than word embeddings at understanding
abstract relations, but have ample room for improvement.
● Language models are very sensitive to hyperparameter tuning in this task,
and careful tuning leads to competitive results.
🌳Thank you!🌳
Story
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
15
LM is very good at all downstream tasks
→ Recent studies have further confirmed the linguistic semantics encoded in LM in a
various way.
→ Also factual knowledge probing shows the capacity of LM
→ what about relational knowledge? Like w2v?
→ we did research on it! The result?
→ Very bad
→ With validation set, some LMs outperforms baseline
→ CONCLUSION
● Some language model represents relation knowledge
● With carefully tuned method, some LM can achieve very high accuracy (SoTa)
→ Future work: prompt mining, supervision
Language Model Pretraining
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
BERT (Devlin, 2018)
T5 (Raffel, 2020)
GPT (Radford, 2018)
16
Permutation Invariance
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
17
1. a : b :: c : d
2. a : c :: b : d
3. b : a :: d : c
4. b : d :: a : c
5. c : d :: a : b
6. c : a :: d : b
7. d : c :: b : a
8. d : b :: c : a
Permutations of (a:b) and (c:d)
Positive Negative
1. a : b :: d : c
2. a : c :: d : b
3. a : d :: b : c
4. a : d :: c : b
5. b : a :: c : d
6. b : c :: a : d
7. b : c :: d : a
8. b : d :: c : a
9. c : a :: b : d
10. c : b :: a : d
11. c : b :: d : a
12. c : d :: b : a
13. d : a :: b : c
14. d : a :: c : b
15. d : b :: a : c
16. d : c :: a : b
*Analogical Proportion Score
eg)
“word is to language as note is to music” = “language is to word as music is to note”
“word is to language as note is to music” ≠ “language is to word as note is to music”
Analogy Between Concepts (Barbot, et al., 2019)
Difficulty Level Breakdown (U2 & U4)
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados
18
UNIT 4 UNIT 2

More Related Content

Similar to 2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?

Processing Written English
Processing Written EnglishProcessing Written English
Processing Written EnglishRuel Montefolka
 
Processing of Written Language
Processing of Written LanguageProcessing of Written Language
Processing of Written LanguageHome and School
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
From Programming to Modeling And Back Again
From Programming to Modeling And Back AgainFrom Programming to Modeling And Back Again
From Programming to Modeling And Back AgainMarkus Voelter
 
Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020Basis Technology
 
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptxARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptxShivaprasad787526
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMassimo Schenone
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1Sara Hooker
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityIDES Editor
 
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probingasahiushio1
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)Sumit Raj
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowValeria de Paiva
 
Putting the science in computer science
Putting the science in computer sciencePutting the science in computer science
Putting the science in computer scienceFelienne Hermans
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfNohaGhoweil
 

Similar to 2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? (20)

Processing Written English
Processing Written EnglishProcessing Written English
Processing Written English
 
Processing of Written Language
Processing of Written LanguageProcessing of Written Language
Processing of Written Language
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
From Programming to Modeling And Back Again
From Programming to Modeling And Back AgainFrom Programming to Modeling And Back Again
From Programming to Modeling And Back Again
 
Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020
 
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptxARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Esa act
Esa actEsa act
Esa act
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
 
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)
 
lexicographic evidence
lexicographic evidencelexicographic evidence
lexicographic evidence
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and How
 
AINL 2016: Nikolenko
AINL 2016: NikolenkoAINL 2016: Nikolenko
AINL 2016: Nikolenko
 
Putting the science in computer science
Putting the science in computer sciencePutting the science in computer science
Putting the science in computer science
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 

Recently uploaded

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxfarhanvvdk
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxuniversity
 
projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and momentdonamiaquintan2
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书zdzoqco
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicAditi Jain
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 

Recently uploaded (20)

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and moment
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 

2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?

  • 1. BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio Luis Espinosa-Anke Steven Schockaert Jose Camacho-Collados https://github.com/asahi417/analogy-language-model https://arxiv.org/abs/2105.04949
  • 2. Language Model Understanding BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados Model Analysis - Hewitt 2019, Tenney 2019 → The embeddings capture linguistics knowledge. - Clark 2020 → The attention reflects dependency. Factual Knowledge - Petroni 2019 → LM can be used as a commonsense KB. Generalization Capacity - Warstadt 2020 → LMs need large data to achieve linguistic generalization. - Min 2020 → LMs’ poor performance on adversarial data can be improved by DA. 2
  • 4. Why Analogies? BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 4 Sample from SAT analogy dataset.
  • 5. Why Analogies? BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 5 Sample from SAT analogy dataset. King Queen Woman Man Analogy in word embedding
  • 6. Why Analogies? BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 6 Sample from SAT analogy dataset. Question Answering Relation Classification Passage Retrieval
  • 7. Solving Analogies with LMs BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 7 hq : tq (1) h1 : t1 (2) h2 : t2 (3) h3 : t3 Prompting (1) x1 (2) x2 (3) x3 (1) 0.2 (2) 0.1 (3) 0.7 LM scoring Analogy Test Sentence Score Model prediction Eg) word:language (1) paint:portrait → word is to language as paint is to portrait → Compute perplexity (2) note:music → word is to language as note is to music → Compute perplexity Prompt types
  • 8. Scoring Functions BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 8 - Perplexity (PPL) - Approximated point-wise mutual information (PMI) - Marginal likelihood biased perplexity (mPPL)
  • 9. Datasets BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 9
  • 10. Result (zeroshot) BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 10 RoBERTa is the best in U2 & U4 but otherwise FastText owns it 🤔
  • 11. Result (tune on val) BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 11 BERT still worse 🧐 but RoBERTa & GPT2 achieve the best 🤗
  • 12. BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 12 Results (SAT full)
  • 13. Conclusion BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 13 ● Some LMs can solve analogies in a true zero-shot setting to some extent. ● Language models are better than word embeddings at understanding abstract relations, but have ample room for improvement. ● Language models are very sensitive to hyperparameter tuning in this task, and careful tuning leads to competitive results.
  • 15. Story BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 15 LM is very good at all downstream tasks → Recent studies have further confirmed the linguistic semantics encoded in LM in a various way. → Also factual knowledge probing shows the capacity of LM → what about relational knowledge? Like w2v? → we did research on it! The result? → Very bad → With validation set, some LMs outperforms baseline → CONCLUSION ● Some language model represents relation knowledge ● With carefully tuned method, some LM can achieve very high accuracy (SoTa) → Future work: prompt mining, supervision
  • 16. Language Model Pretraining BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados BERT (Devlin, 2018) T5 (Raffel, 2020) GPT (Radford, 2018) 16
  • 17. Permutation Invariance BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 17 1. a : b :: c : d 2. a : c :: b : d 3. b : a :: d : c 4. b : d :: a : c 5. c : d :: a : b 6. c : a :: d : b 7. d : c :: b : a 8. d : b :: c : a Permutations of (a:b) and (c:d) Positive Negative 1. a : b :: d : c 2. a : c :: d : b 3. a : d :: b : c 4. a : d :: c : b 5. b : a :: c : d 6. b : c :: a : d 7. b : c :: d : a 8. b : d :: c : a 9. c : a :: b : d 10. c : b :: a : d 11. c : b :: d : a 12. c : d :: b : a 13. d : a :: b : c 14. d : a :: c : b 15. d : b :: a : c 16. d : c :: a : b *Analogical Proportion Score eg) “word is to language as note is to music” = “language is to word as music is to note” “word is to language as note is to music” ≠ “language is to word as note is to music” Analogy Between Concepts (Barbot, et al., 2019)
  • 18. Difficulty Level Breakdown (U2 & U4) BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, and Jose Camacho-Collados 18 UNIT 4 UNIT 2