BERT
Hung-yi Lee (李宏毅)
How to represent a word?
1-of-N Encoding: each word is a one-hot vector, e.g.
apple    = [1 0 0 0 0]
bag      = [0 1 0 0 0]
cat      = [0 0 1 0 0]
dog      = [0 0 0 1 0]
elephant = [0 0 0 0 1]
Word Class: group words into hard classes, e.g. class 1 = {dog, cat, bird}, class 2 = {ran, jumped, walk}, class 3 = {flower, tree, apple}.
Word Embedding: [figure] each word is a point in a continuous space, where similar words (dog, cat, rabbit; jump, run; flower, tree) lie close together.
A word can have multiple senses.
Have you paid that money to the bank yet ?
It is safest to deposit your money in the bank .
The victim was found lying dead on the river bank .
They stood on the river bank to fish.
The hospital has its own blood bank.
Is “blood bank” a third sense of “bank”, or not?
https://arxiv.org/abs/1902.06006
More Examples
他是尼祿 (He is Nero.)
她也是尼祿 (She is also Nero.)
這是加賀號護衛艦 (This is the escort ship Kaga.)
這也是加賀號護衛艦 (This is also the escort ship Kaga.)
The same surface word can refer to different things depending on context.
Contextualized Word Embedding
[figure] Each token gets an embedding that depends on its whole sentence: the “bank” in “… money in the bank”, the “bank” in “… river bank”, and the “bank” in “… own blood bank” each receive a different contextualized word embedding.
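As a concrete check of this idea, here is a minimal sketch (assuming the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint, neither of which appears in the slides) that extracts the vector of “bank” from two of the sentences above and compares them:

```python
# A minimal sketch: the same word "bank" gets a different vector
# in different sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Encode the sentence, run the model, and take the hidden state of
    # the token "bank" from the last layer.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank"))
    return hidden[idx]

v1 = bank_vector("It is safest to deposit your money in the bank.")
v2 = bank_vector("They stood on the river bank to fish.")
# The two "bank" vectors differ; their cosine similarity is well below 1.
print(torch.cosine_similarity(v1, v2, dim=0))
```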
Embeddings from Language Models (ELMo)
• RNN-based language models (trained from lots of sentences)
https://arxiv.org/abs/1802.05365
e.g. given “潮水 退了 就 知道 誰 沒穿 褲子” (“Only when the tide goes out do you know who isn’t wearing pants”)
[figure] A forward RNN LM reads “<BOS> 潮水 退了 …” and learns to predict the next token (潮水, then 退了, then 就, …). A backward RNN LM reads the sentence in reverse and learns to predict the preceding token. Together they give each token a left-context and a right-context representation.
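The slides only show the diagram; as an illustration, here is a minimal PyTorch sketch of the same idea (a toy bidirectional RNN LM, not the original ELMo code; `BiLM` and all sizes are hypothetical):

```python
# A toy bidirectional RNN language model: a forward LSTM predicts the
# next token, a backward LSTM predicts the previous one.
import torch
import torch.nn as nn

class BiLM(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.fwd = nn.LSTM(dim, dim, batch_first=True)   # left-to-right
        self.bwd = nn.LSTM(dim, dim, batch_first=True)   # right-to-left
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):                    # ids: (batch, seq_len)
        x = self.embed(ids)
        h_f, _ = self.fwd(x)                   # states for predicting t+1
        h_b, _ = self.bwd(x.flip(1))           # states for predicting t-1
        # Flip the backward states back so both align with position t.
        return self.out(h_f), self.out(h_b.flip(1))

# Training: cross-entropy of the forward logits against targets shifted
# left by one, and of the backward logits against targets shifted right.
```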
ELMo
[figure] The forward and backward RNN LMs are stacked into a deep LSTM. Each layer of the deep LSTM can generate a latent representation of each token: h1 from the first layer, h2 from the second, and so on.
Which one should we use???
ELMo: use them all.
[figure] For each token, ELMo takes a weighted sum of the representations from all layers, e.g. with two layers:
ELMo embedding = α1 h1 + α2 h2
The weights α1, α2 (how large or small each layer's contribution is) are learned with the downstream tasks.
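A small sketch of this weighted sum (the `ScalarMix` name and the softmax normalization are assumptions; the slide only specifies that the α's are learned with the downstream task):

```python
# The alphas are parameters trained jointly with the downstream task,
# not fixed by the pre-trained LM.
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    def __init__(self, num_layers=2):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_layers))  # α1, α2, ...

    def forward(self, layers):        # list of (batch, seq, dim) tensors
        w = torch.softmax(self.alpha, dim=0)   # normalized mixing weights
        return sum(w[i] * h for i, h in enumerate(layers))

# elmo_embedding = ScalarMix(2)([h1, h2]), learned with the downstream task.
```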
Bidirectional Encoder Representations from Transformers (BERT)
• BERT = Encoder of Transformer
[figure] BERT takes a token sequence (潮水 退了 就 知道 ……) and outputs an embedding for each token.
Learned from a large amount of text without annotation.
Training of BERT
• Approach 1: Masked LM
[figure] Some input tokens (15% in BERT) are randomly replaced with [MASK]. The output vector at each masked position is fed to a linear multi-class classifier (output dimension = vocabulary size) that predicts the masked word.
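For illustration, masked-word prediction can be tried directly with a pre-trained model; a hedged sketch assuming the Hugging Face `fill-mask` pipeline and the `bert-base-chinese` checkpoint (neither named in the slides):

```python
# Mask one character of the example sentence and let BERT fill it in,
# using both left and right context.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-chinese")
# "退" is masked out of 潮水退了就知道誰沒穿褲子.
for cand in fill("潮水[MASK]了就知道誰沒穿褲子"):
    print(cand["token_str"], cand["score"])
```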
Training of BERT
• Approach 2: Next Sentence Prediction
[CLS]: the position whose output is used for classification
[SEP]: the boundary between two sentences
A linear binary classifier on the [CLS] output predicts whether the second sentence really follows the first.
[figure] “[CLS] 醒醒 吧 [SEP] 你 沒有 妹妹” (“Wake up [SEP] You don't have a little sister”) → yes.
[figure] “[CLS] 醒醒 吧 [SEP] 眼睛 業障 重” (“Wake up [SEP] Your eyes are clouded by karma”) → no.
Approaches 1 and 2 are used at the same time.
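A sketch of the same objective using `BertForNextSentencePrediction` from `transformers` (a tooling assumption; the slides describe only the task):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tok = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForNextSentencePrediction.from_pretrained("bert-base-chinese")

# The tokenizer inserts [CLS] and [SEP] automatically for a sentence pair.
enc = tok("醒醒吧", "你沒有妹妹", return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits      # (1, 2): [is-next, is-not-next]
print(logits.softmax(-1))
```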
How to use BERT – Case 1
Input: single sentence; output: class.
[figure] The sentence ([CLS] w1 w2 w3) goes into BERT, and a linear classifier on the [CLS] output predicts the class. The linear classifier is trained from scratch; BERT itself is fine-tuned.
Example: sentiment analysis (our HW), document classification.
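A minimal fine-tuning sketch under the same setup (assuming `transformers`; the checkpoint, labels, and toy batch are illustrative):

```python
# The linear classifier on top of [CLS] is new (trained from scratch),
# while the BERT body is fine-tuned with a small learning rate.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)       # e.g. positive / negative

batch = tok(["great movie", "terrible movie"],
            return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss    # cross-entropy on [CLS] head
loss.backward()
optimizer.step()
```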
How to use BERT – Case 2
Input: single sentence; output: a class for each word.
[figure] The output vector of every token (w1, w2, w3) goes through the same linear classifier, producing one class per token.
Example: slot filling.
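A sketch of the per-token variant (assuming `transformers`; the three slot labels and the example utterance are illustrative):

```python
# One shared linear classifier is applied to every token's output.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # e.g. O / destination / time

enc = tok("arrive Taipei on November 2nd", return_tensors="pt")
logits = model(**enc).logits            # (1, seq_len, 3): a class per token
print(logits.argmax(-1))
```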
How to use BERT – Case 3
Input: two sentences ([CLS] w1 w2 [SEP] w3 w4 w5); output: class.
[figure] A linear classifier on the [CLS] output predicts the class.
Example: Natural Language Inference — given a “premise”, determine whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral).
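Case 3 reuses the Case 1 head with a sentence pair as input; a sketch (the premise/hypothesis strings are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # entailment/contradiction/neutral

# The pair is packed as [CLS] premise [SEP] hypothesis [SEP].
enc = tok("A man is playing a guitar.",   # premise
          "A person is making music.",    # hypothesis
          return_tensors="pt")
print(model(**enc).logits)                # class from the [CLS] output
```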
How to use BERT – Case 4
• Extraction-based Question Answering (QA) (e.g. SQuAD)
Document: D = d1, d2, ⋯, dN
Query: Q = q1, q2, ⋯, qM
The QA model takes D and Q as input and outputs two integers (s, e), which mark a span of the document as the answer: A = ds, ⋯, de.
e.g. s = 17, e = 17 → the answer is d17;
s = 77, e = 79 → the answer is d77 d78 d79.
How to use BERT – Case 4
[figure] The question ([CLS] q1 q2 [SEP]) and the document (d1 d2 d3) are fed into BERT together. Two extra vectors are learned from scratch:
• a start vector is dot-producted with each document token’s output; after a softmax over positions (e.g. 0.3, 0.5, 0.2 for d1, d2, d3), the highest score gives s = 2;
• an end vector is applied the same way (e.g. 0.2, 0.1, 0.7), giving e = 3.
With s = 2 and e = 3, the answer is “d2 d3”.
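A sketch of just this span head (plain PyTorch; `extract_span`, and treating s > e as unanswerable, are assumptions beyond the slides):

```python
# Two vectors learned from scratch are dotted with every document token's
# BERT output; softmax over positions picks the start s and end e.
import torch

def extract_span(doc_hidden, start_vec, end_vec):
    # doc_hidden: (doc_len, dim); start_vec, end_vec: (dim,)
    s = torch.softmax(doc_hidden @ start_vec, dim=0).argmax().item()
    e = torch.softmax(doc_hidden @ end_vec, dim=0).argmax().item()
    return s, e          # answer = document tokens d_s ... d_e (if s <= e)

# If s > e, the question can be treated as unanswerable (SQuAD 2.0 style).
```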
BERT dominates the leaderboard …… (SQuAD 2.0)
Enhanced Representation through Knowledge Integration (ERNIE)
• Designed for Chinese
https://arxiv.org/abs/1904.09223
[image: the Sesame Street characters Bert and Ernie]
Source of image: https://zhuanlan.zhihu.com/p/59436589
What does BERT learn?
https://arxiv.org/abs/1905.05950
https://openreview.net/pdf?id=SJzSgnRcKX
Multilingual BERT
https://arxiv.org/abs/1904.09077
Trained on 104 languages.
[figure] Fine-tune with task-specific training data for English (En → Class 1, Class 2, Class 3), then test directly on task-specific data in Chinese (Zh → ?): the model transfers across languages without Chinese training data.
Generative Pre-Training (GPT)
https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Source of image: https://huaban.com/pins/1714071707/
• GPT = Decoder of Transformer
Model sizes: ELMo (94M), BERT (340M), GPT-2 (1542M).
Generative Pre-Training (GPT)
[figure] GPT stacks many layers of masked self-attention. Each input token ai (a1 = <BOS>, a2 = 潮水, …) produces a query qi, key ki, and value vi. To compute the output b2, q2 attends only to k1 and k2 (weights α2,1, α2,2), and b2 is used to predict the next token, 退了. At the next step, q3 attends to k1, k2, k3 (weights α3,1, α3,2, α3,3) to produce b3, which predicts 就.
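A sketch of the masked self-attention in these figures (plain PyTorch; a single head with random projection matrices, which are illustrative):

```python
# Token i may attend only to positions 1..i, so the model predicts
# strictly left-to-right.
import torch

dim = 8
Wq, Wk, Wv = (torch.randn(dim, dim) for _ in range(3))

def causal_attention(a):                   # a: (seq_len, dim) inputs a1..a4
    q, k, v = a @ Wq, a @ Wk, a @ Wv       # queries, keys, values
    scores = q @ k.T / k.shape[-1] ** 0.5  # α_{i,j} before normalization
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))   # hide the future
    return torch.softmax(scores, dim=-1) @ v           # outputs b_i

print(causal_attention(torch.randn(4, dim)).shape)     # (4, 8)
```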
Zero-shot Learning?
• Reading Comprehension: feed “d1, d2, ⋯, dN, ‘Q:’, q1, q2, ⋯, qM, ‘A:’” and let the model continue (evaluated on CoQA).
• Summarization: feed “d1, d2, ⋯, dN, ‘TL;DR:’”.
• Translation: feed “English sentence 1 = French sentence 1 / English sentence 2 = French sentence 2 / English sentence 3 =” and let the model produce the French.
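These behaviors can be probed with any GPT-2 checkpoint; a hedged sketch using the Hugging Face `text-generation` pipeline (the document string is illustrative):

```python
# Zero-shot "summarization" is just continuing the text after "TL;DR:".
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
document = "The tide receded and everyone could see who had no pants on."
out = generate(document + "\nTL;DR:", max_new_tokens=30)
print(out[0]["generated_text"])
```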
Visualization
https://arxiv.org/abs/1904.02679
(The results below are from GPT-2)
https://talktotransformer.com/
Credit: Greg Durrett
Can BERT speak?
• Unified Language Model Pre-training for Natural
Language Understanding and Generation
• https://arxiv.org/abs/1905.03197
• BERT has a Mouth, and It Must Speak: BERT as a
Markov Random Field Language Model
• https://arxiv.org/abs/1902.04094
• Insertion Transformer: Flexible Sequence Generation
via Insertion Operations
• https://arxiv.org/abs/1902.03249
• Insertion-based Decoding with automatically Inferred
Generation Order
• https://arxiv.org/abs/1902.01370
Editor's Notes
1. https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html ; Contextual Word Representations: Putting Words into Computers; a good analysis of BERT: https://zhuanlan.zhihu.com/p/58430637
2. Can we compare BERT and ELMo??
3. (https://arxiv.org/abs/1805.12471)
4. Determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
5. “屠榜” = dominating the leaderboards.
6. Image from https://zhuanlan.zhihu.com/p/59436589
7. BERT Rediscovers the Classical NLP Pipeline
8. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
9. Model sizes: 93.6M / 340M / 1542M; trained on 40 GB of text.
10. “展現神蹟” = showing off miracles (GPT-2's surprising zero-shot results).
11. There are more and more examples: https://www.inside.com.tw/article/16479-open-ai-edit-the-avengers https://towardsdatascience.com/openai-gpt-2-writes-alternate-endings-for-game-of-thrones-c9be75cd2425 https://hub.packtpub.com/openai-two-new-versions-and-the-output-dataset-of-gpt-2-out/ https://hackernoon.com/what-i-learned-using-gpt-2-to-write-a-novel-b74a6294c813