Relational Knowledge and Language Models
Cardiff NLP Reading Group
Asahi Ushio
Cardiff University
Language Model Pretraining
● Large-scale language models are now a major trend
● Many architectures (seq2seq, uni/bi-directional, non-autoregressive, external knowledge, etc.)
● Growing data/model size
What does a language model actually know?
Agenda
An overview of recent studies on “relational knowledge
understanding in pretrained language models”
- Petroni, et al. "Language models as knowledge bases?" 2019
- Jiang, et al. "How can we know what language models know?" 2019
- Bouraoui, et al. "Inducing relational knowledge from BERT." 2019
What does the LM know?
Petroni, et al. "Language models as knowledge bases?"
The LM knows the fact
“Dante was born in Florence”
It’s a knowledge graph!
Petroni, et al. "Language models as knowledge bases?"
The LM knows the fact
“Dante was born in Florence”
The KG includes the fact
“(Dante, born-in, Florence)”
Dataset
Petroni, et al. "Language models as knowledge bases?"
● Dataset requirements:
○ Each entry has (prompt, answer)
■ eg) (Dante was born in, Florence)
○ The answer should be a single token
○ The query should express the head and relation of a KG triple
■ eg) Dante was born in = (Dante, born-in)
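The requirements above can be sketched in code. This is a minimal illustration, not the procedure from the paper: the `TEMPLATES` table and the `triple_to_entry` helper are hypothetical names, and whitespace splitting is only a crude stand-in for the LM's own tokenizer when checking the single-token constraint.

```python
from typing import Optional, Tuple

# Hypothetical relation templates; "[X]" marks where the head entity goes.
TEMPLATES = {"born-in": "[X] was born in"}


def triple_to_entry(head: str, relation: str, tail: str) -> Optional[Tuple[str, str]]:
    """Turn a KG triple into a (prompt, answer) entry, or None if it does not qualify."""
    template = TEMPLATES.get(relation)
    if template is None:
        return None  # no prompt is known for this relation
    # Assumption: whitespace tokens approximate the LM vocabulary tokens.
    if len(tail.split()) != 1:
        return None  # the answer must be a single token
    return template.replace("[X]", head), tail


print(triple_to_entry("Dante", "born-in", "Florence"))
print(triple_to_entry("Dante", "born-in", "New York"))
```

With a real model, the single-token check would use the model's tokenizer, since a word that looks like one token may be split into several subwords.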
Whole pipeline
Petroni, et al. "Language models as knowledge bases?"
● Dataset generation: KG → Prompting
○ ( head, relation, tail ) → ( query, tail )
■ eg) (Dante, born-in, Florence) → Dante was born-in
● Model evaluation: LM (BERT, etc)
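The evaluation step of this pipeline can be sketched as a precision-at-k check over (query, tail) pairs. The sketch below is an assumption-laden toy: `toy_rank` and `TOY_RANKINGS` are hypothetical stand-ins for a masked LM that ranks vocabulary items for the `[MASK]` position, so the code runs without downloading any model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical ranked predictions that a masked LM might return for each query.
TOY_RANKINGS: Dict[str, List[str]] = {
    "Dante was born in [MASK].": ["Florence", "Rome", "Italy"],
    "Paris is the capital of [MASK].": ["Europe", "France", "Paris"],
}


def toy_rank(query: str) -> List[str]:
    """Stand-in for an LM: return candidate tokens for the mask, best first."""
    return TOY_RANKINGS.get(query, [])


def precision_at_k(
    dataset: List[Tuple[str, str]],
    rank_fn: Callable[[str], List[str]],
    k: int = 1,
) -> float:
    """Fraction of queries whose gold tail appears among the top-k predictions."""
    if not dataset:
        return 0.0
    hits = sum(1 for query, tail in dataset if tail in rank_fn(query)[:k])
    return hits / len(dataset)


dataset = [
    ("Dante was born in [MASK].", "Florence"),
    ("Paris is the capital of [MASK].", "France"),
]
print(precision_at_k(dataset, toy_rank, k=1))
print(precision_at_k(dataset, toy_rank, k=2))
```

Swapping `toy_rank` for a function that queries a real masked LM would leave the metric code unchanged.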
Results
Petroni, et al. "Language models as knowledge bases?"
BERT Large
Effect of prompt type
What’s the best prompt? 🤔
KG
Prompting
( head, relation, tail ) ( query, tail )
Dataset generation
eg) (Dante, born-in, Florence)
● Dante was born-in Florence.
● Florence is where Dante was born
● Dante was born-in Florence, Italy.
● etc
Jiang, et al. "How can we know what language models know?"
Prompt ensembling/selection
● Prompting methods
○ Manual
○ Mined: Frequency in a large corpus
○ Paraphrased: Back-translation
● Prompt ensembling
○ Optimization over training set
● Data:
○ Test: T-REx
○ Training: Wikidata
Jiang, et al. "How can we know what language models know?"
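One way to picture prompt ensembling is to combine the log-probabilities that several prompts assign to each candidate answer. The sketch below assumes that formulation; the `ensemble` and `softmax` helpers, the toy probabilities, and the raw weight scores are all hypothetical. Uniform weights correspond to plain averaging, while the softmax-normalised weights stand in for weights that would be optimised on a training set.

```python
import math
from typing import Dict, List, Optional


def softmax(scores: List[float]) -> List[float]:
    """Normalise raw scores into positive weights that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble(
    log_probs: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Weighted sum of per-prompt log-probabilities for each shared candidate."""
    if weights is None:
        weights = [1.0 / len(log_probs)] * len(log_probs)  # plain averaging
    candidates = set.intersection(*(set(lp) for lp in log_probs))
    return {
        c: sum(w * lp[c] for w, lp in zip(weights, log_probs))
        for c in candidates
    }


# Toy scores for two hypothetical prompts of the same relation.
prompt_a = {"Florence": math.log(0.4), "Rome": math.log(0.5)}
prompt_b = {"Florence": math.log(0.6), "Rome": math.log(0.1)}

averaged = ensemble([prompt_a, prompt_b])
weighted = ensemble([prompt_a, prompt_b], softmax([0.0, 2.0]))
print(max(averaged, key=averaged.get))
print(max(weighted, key=weighted.get))
```

In this toy case the first prompt alone prefers the wrong answer, and averaging with the second prompt corrects it.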
Improve manual prompt
Jiang, et al. "How can we know what language models know?"
Ensembling
Averaging
Mined prompts
Jiang, et al. "How can we know what language models know?"
Paraphrased prompts
Jiang, et al. "How can we know what language models know?"
Ensembling weight
Jiang, et al. "How can we know what language models know?"
Issue with link prediction
● Spurious correlation between subject and object
○ Birds cannot [MASK] → fly, Kassner and Schütze, 2020
○ The capital of Macintosh is [MASK] → apple, Bouraoui, et al. 2019
● Heuristics on surface form
○ BERT is very good at IR, Petroni, et al. 2020
Relation classification
Bouraoui, et al. "Inducing relational knowledge from BERT."
● Dataset generation: KG → Prompting
○ ( head, relation, tail ) → ( sentence, relation )
■ eg) (Dante, born-in, Florence) → (Dante was born-in Florence, born-in)
● Model training: LM + Linear
● Model evaluation: LM + Linear
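The dataset generation step for relation classification can be sketched as follows. This is only an illustration under stated assumptions: the `TEMPLATES` table and `build_examples` helper are hypothetical, and creating negatives by filling a pair into the templates of other relations is one simple choice made here, not necessarily the scheme used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical templates; "[H]" and "[T]" mark the head and tail slots.
TEMPLATES: Dict[str, str] = {
    "born-in": "[H] was born in [T]",
    "capital-of": "[H] is the capital of [T]",
}


def build_examples(
    triples: List[Tuple[str, str, str]],
    templates: Dict[str, str],
) -> List[Tuple[str, str, bool]]:
    """Create (sentence, relation, label) examples for a binary classifier.

    Each word pair is filled into every relation template; the label is True
    only when the template's relation matches the relation in the KG triple.
    """
    examples = []
    for head, relation, tail in triples:
        for candidate, template in templates.items():
            sentence = template.replace("[H]", head).replace("[T]", tail)
            examples.append((sentence, candidate, candidate == relation))
    return examples


examples = build_examples([("Dante", "born-in", "Florence")], TEMPLATES)
for example in examples:
    print(example)
```

The resulting sentences would then be encoded by the LM, with a linear layer on top predicting the True/False label.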
Template search and prompt
Bouraoui, et al. "Inducing relational knowledge from BERT."
1) Extract sentences over all (head, tail) pairs
⇒ relation template candidates
2) BERT-based filtering
⇒ find the template for the relation
Wikipedia
KG (training set)
( relation, template )
Prompting
sent rel A: False
sent rel B: False
sent rel C: True
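Step 1 above can be sketched as a simple corpus scan. The code is a toy under stated assumptions: `mine_templates` is a hypothetical helper, entity mentions are found by plain substring matching, and the BERT-based filtering of step 2 is not implemented here.

```python
from collections import Counter
from typing import List, Tuple


def mine_templates(sentences: List[str], pairs: List[Tuple[str, str]]) -> Counter:
    """Count template candidates obtained by masking known (head, tail) pairs."""
    counts: Counter = Counter()
    for sentence in sentences:
        for head, tail in pairs:
            # Assumption: substring matching is enough to spot entity mentions.
            if head in sentence and tail in sentence:
                template = sentence.replace(head, "[H]").replace(tail, "[T]")
                counts[template] += 1
    return counts


# Toy corpus standing in for Wikipedia sentences.
sentences = [
    "Dante was born in Florence.",
    "Mozart was born in Salzburg.",
    "Dante left Florence in 1302.",
    "Florence is a city in Italy.",
]
pairs = [("Dante", "Florence"), ("Mozart", "Salzburg")]

candidates = mine_templates(sentences, pairs)
print(candidates.most_common())
```

Sentences that mention a pair without expressing the relation still yield candidates, which is why a filtering step is needed afterwards.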
Classification result
Bouraoui, et al. "Inducing relational knowledge from BERT."
word embedding
language model
Result breakdown: Google
Bouraoui, et al. "Inducing relational knowledge from BERT."
Result breakdown: DiffVec
Bouraoui, et al. "Inducing relational knowledge from BERT."
Result breakdown: BATS
Bouraoui, et al. "Inducing relational knowledge from BERT."
Recap
Link prediction: finetuning-free, but affected by other factors
- Petroni, et al. "Language models as knowledge bases?" 2019
- Jiang, et al. "How can we know what language models know?" 2019
Relation classification: evaluates the relation itself, but requires finetuning
- Bouraoui, et al. "Inducing relational knowledge from BERT." 2019
Limitation/Open issue
● Objects with multiple tokens
● Better templates
● More datasets
● Effect on other tasks
● etc.
Thanks for listening 🎉
Related Topics
1. KB augmented LM
○ Latent Relation LMs, KnowBERT, COMET
2. LM training with KB
○ RAG, REALM
3. LM inference with KB
○ kNN-LM, BERT-kNN, IR+BERT
4. KB completion
○ Commonsense KB completion, LMs are open KG
[Figure: for each topic, a schematic of the LM and KB components, marking which part is learnable]
Comments given on the talk
● We need to distinguish the LM as a language generator from the LM as a fact retriever
● BPE subword handling
● More complex reasoning

2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
