Transformer and BERT
Date: 11.11.2022
RNN vs Transformer
RNN:
Sequential, token-by-token processing
No parallel computation
Difficult to process longer input sequences; LSTM and GRU solve this problem to some extent
Slow processing
Transformer:
No sequential processing; the whole input sequence is seen at once
Parallel computation
No dependency between the time steps
Faster processing
Transformer components
Need for attention in the transformer: preserve the semantics of the input as well as of the output sequence.
English => French
red => rouge
dress => robe
“red dress” => “robe rouge”
Notice how “red” comes before “dress” in English, but “rouge” comes after “robe” in French.
Positional encoder: a vector that gives context based on the position of a word in the sentence.
Bob’s dog looks cute. — “dog” at position 2
Bob looks like a dog. — “dog” at position 5
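A minimal sketch of the sinusoidal positional encoding from the original transformer paper, just to make the idea concrete; the max_len and d_model values below are illustrative assumptions, and BERT itself learns its position embeddings rather than using this fixed formula.

import numpy as np

def positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) matrix of position vectors."""
    positions = np.arange(max_len)[:, None]                            # (max_len, 1)
    dims = np.arange(d_model)[None, :]                                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                                   # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                              # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=10, d_model=16)
print(pe[2][:4])   # position vector for 'dog' at position 2
print(pe[5][:4])   # position vector for 'dog' at position 5: different, even though the word is the same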
How multi-head attention works
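Since the original slide here is a diagram, the following is a minimal sketch of scaled dot-product attention, the operation at the core of each head; multi-head attention runs h copies of it on different learned projections of Q, K, and V and concatenates the results. The toy matrices are illustrative.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # how much each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of the values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))                           # 3 tokens, 4-dimensional representations
print(scaled_dot_product_attention(Q, K, V).shape)            # (3, 4)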
Transfer learning
Step 1: Pre-train a model
Step 2: Fine-tune for the specific task (be it an image or a language task)
Transfer learning became the default strategy for computer vision tasks around 2014. People often use models that are pre-trained for image classification on ImageNet.
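A hedged sketch of this two-step recipe for a vision task, using a torchvision ResNet-18 pre-trained on ImageNet (requires a recent torchvision; the 10-class head is an illustrative assumption):

import torch.nn as nn
from torchvision import models

# Step 1: load weights pre-trained on ImageNet (the expensive pre-training is already done).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Step 2: replace the classification head and fine-tune on the target task.
model.fc = nn.Linear(model.fc.in_features, 10)   # e.g. a 10-class downstream dataset
# ...then train the model on the new dataset with a standard optimizer and loss.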
Transfer learning in Natural language processing
Transfer learning is a powerful tool for natural language processing as well.
Models using transfer learning under the hood:
1. BERT
2. GPT
3. ELMo
BERT stands for Bidirectional Encoder Representations from Transformers.
It requires relatively little fine-tuning and is pre-trained in a self-supervised way.
After fine-tuning BERT can handle a range of tasks, including:
Sentiment analysis (see the sketch after this list):
“But believe me or not, it is one of the most beautiful and evocative works I have seen.” [very positive]
Identifying relevant documents
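As a hedged illustration (not part of the original slides), a fine-tuned BERT-family model can produce such a sentiment prediction through the Hugging Face pipeline API; the default checkpoint it downloads and the exact labels it returns may vary.

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier(
    "But believe me or not, it is one of the most beautiful and evocative works I have seen."
))
# Expected output: a label such as POSITIVE together with a confidence score.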
BERT Architecture
BERT is based on the transformer encoder.
Two versions:
- BERT-BASE: N=12, d=768, h=12, #parameters=110M
- BERT-LARGE: N=24, d=1024, h=16, #parameters=340M
N = number of encoder blocks, d = embedding dimension, h = number of self-attention heads
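These two configurations can be written down with Hugging Face's BertConfig class; this is only a sketch to relate N, d, and h to the library's parameter names (values follow the BERT paper).

from transformers import BertConfig

base = BertConfig(num_hidden_layers=12, hidden_size=768, num_attention_heads=12)
large = BertConfig(num_hidden_layers=24, hidden_size=1024, num_attention_heads=16)
print(base.num_hidden_layers, base.hidden_size, base.num_attention_heads)      # 12 768 12  (BERT-BASE)
print(large.num_hidden_layers, large.hidden_size, large.num_attention_heads)   # 24 1024 16 (BERT-LARGE)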
Input Representation
During training, the input contains two sequences/sentences, Sentence A and Sentence B.
Sentences are separated by the [SEP] token, which marks the end of one sentence and the start of the next.
The input sequence always starts with the classification token [CLS], which is used for classification tasks.
For all the other tokens of the input, the model computes more informative embeddings that take the surrounding context into account.
For the [CLS] token, the objective is to obtain an embedding that summarizes the entire sequence, so that it can be used to classify the sequence.
Input embeddings are the sum of token, position, and sentence (segment) embeddings.
The token embedding describes the word itself.
The position embedding describes where the word is located in the sequence.
The sentence embedding describes which sentence the word belongs to.
Input embedding = token embedding + position embedding + sentence embedding
Example: input embedding for ‘great’ = E_great + E_5 + E_A; for ‘food’ = E_food + E_9 + E_B
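A small sketch of how a BERT tokenizer assembles the [CLS] ... [SEP] ... [SEP] input for a sentence pair; the two sentences are illustrative, and the token, position, and sentence embeddings are summed inside the model itself.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("The food was great", "I would go again")

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'the', 'food', 'was', 'great', '[SEP]', 'i', 'would', 'go', 'again', '[SEP]']
print(enc["token_type_ids"])   # 0 for Sentence A tokens, 1 for Sentence B tokens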
BERT Pre-training
BERT has two pre-training tasks.
Task 1: Masked word prediction (a toy sketch follows after this list)
- 15% of the words are masked.
- The model predicts the masked words.
Task 2: Next sentence prediction
- Sentence B is the sentence that actually follows A with 50% probability.
- The [CLS] output is used to classify B as the next sentence or not.
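A toy sketch of the masked-word-prediction setup: roughly 15% of the tokens are replaced with [MASK] and the model is trained to recover them (BERT additionally keeps some masked positions unchanged or swaps in random words; that refinement is omitted here).

import random

tokens = "the quick brown fox jumps over the lazy dog".split()
random.seed(0)
masked = [t if random.random() > 0.15 else "[MASK]" for t in tokens]
print(masked)   # the model must predict the original word at every [MASK] position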
Fine-tuning BERT
Collect an annotated dataset.
For sequence-level classification tasks:
Use the output from the [CLS] token for classification.
Add a single-layer FFN (feed-forward network).
Fine-tune end to end with a cross-entropy loss.
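A minimal sketch of this sequence-level setup: take the [CLS] output, pass it through a single feed-forward layer, and train with cross-entropy. The hidden size of 768 matches BERT-BASE; the batch and labels are toy values, and in a real run gradients would also flow back through BERT.

import torch
import torch.nn as nn

cls_output = torch.randn(4, 768)          # [CLS] embeddings for a batch of 4 sequences (toy values)
labels = torch.tensor([0, 1, 1, 0])       # binary classification labels

classifier = nn.Linear(768, 2)            # the single-layer FFN added on top of BERT
logits = classifier(cls_output)
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()                           # end-to-end fine-tuning would propagate through BERT as well
print(loss.item())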
