SlideShare a Scribd company logo
1 of 14
Comparative Analysis of Transformer based
Pre-Trained NLP Models
Authors : Saurav Singla & Ramachandra N
Abstract:
❖ Transformer based self-supervised pre-trained models have transformed the concept of Transfer learning
in Natural language processing (NLP) using Deep learning approach.
❖ In this project we analyze the performance of self-supervised models for Multi-class Sentiment analysis
on a Non benchmarking dataset.
❖ We used BERT, RoBERTa, & ALBERT models for this study.
❖ We fine-tuned these models on Sentiment analysis with a proposed architecture.
❖ We used f1-score & AUC (Area under ROC curve) score for evaluating model performance.
❖ We found the BERT model with proposed architecture performed well with the highest f1-score of 0.85
followed by RoBERTa (f1-score=0.80), & ALBERT (f1-score=0.78).
❖ This analysis reveals that the BERT model with proposed architecture is best for multi-class sentiment
task on a Non-benchmarking dataset.
Related work:
❖ A concise overview on several large pre-trained language models provided with state-of-the-art results
on benchmark datasets viz. GLUE, RACE, & SQuAD[5].
❖ Cristóbal et al. presented a benchmark comparison of various deep learning architectures such as
Convolutional Neural Networks (CNN) , Long short-term memory (LSTM) recurrent neural networks
and BERT with a Bi-LSTM for the sentiment analysis of drug reviews[6].
❖ Horsuwan et al. systematically compared four modern language models: ULMFiT, ELMo with biLSTM,
OpenAI GPT, and BERT across different dimensions including speed of pretraining and fine-tuning,
perplexity, downstream classification benchmarks, and performance in limited pre training data on Thai
Social Text Categorization[7].
❖ Carlos Aspillaga et al. have performed Stress test evaluation of Transformer based models (RoBERTa,
XLNet, & BERT) in Natural Language Inference (NLI) & Question Answering (QA) tasks with
adversarial-examples[8].
Methodology:
In this section we will explain briefly about the model architecture & dataset used in this task.
Dataset
❖ We used Covid19 tweets dataset, publicly available on Kaggle[10].
❖ The train dataset contains 41157 tweets & test dataset contains 3798 tweets.
❖ There are 5 classes in the sentiment variable viz. Extremely Negative(0), Extremely positive(1),
Negative(2), Neutral(3), & Positive(4).
We used the Pytorch framework for building deep learning models with the help of Hugging face transformers.
Methodology:
Model Architecture
We have proposed architectures for BERT, RoBERTa, & ALBERT models for this study.
BERT
❖ It is a bidirectional transformer, meaning that it uses both left & right contexts in all layers as in Fig 1.
❖ This stands for Bidirectional Encoding Representations from Transformers.
❖ The Fig 2 shows the BERT input representations which includes Token, Segment, & Position
embeddings.
❖ In practice, input embeddings also contain input/attention masks used to differentiate between actual
tokens & padded tokens.
Methodology: BERT
Fig 1. BERT architecture Fig 2. BERT input representation
❖ In our task, we fine tuned the BERT model on preprocessed tweets data using a dropout layer, a hidden
layer, a fully connected layer & a softmax layer for classification on top of BERT embeddings which is
shown in Fig 3.
❖ We have considered bert-base uncased pre-trained model for this task, which has 12 layers, 768 hidden
size, 110 M parameters.
Methodology: Proposed architectures
Fig 3. BERT Fig 4. RoBERTa Fig 5. ALBERT
Methodology: RoBERTa
❖ It is Robustly optimized BERT pre-training approach. This replicates the BERT model by tweaking
hyper parameter settings & increasing training data size.
❖ For this task, we have fine tuned the model on preprocessed tweets data using a dropout layer, a hidden
layer, a fully connected layer & a softmax layer on top of RoBERTa embeddings as shown in Fig 4.
❖ We have chosen distil roberta base pre-trained model, which has 6 layers, 768 hidden size,12 heads, 82
M parameters.
ALBERT
❖ A Light BERT introduced to overcome TPU/GPU limitations & longer training times.
❖ We have fine tuned this model on preprocessed tweets data using a dropout, a fully connected layer &
finally a softmax on top of ALBERT embeddings which is shown in Fig 5.
❖ We have selected albert-base-v2 pre-trained model for this task, which has 12 layers, 768 hidden size,
11 M parameters.
Results & Discussions:
Table 1 shows the Sentiment analysis results for all models & corresponding hyperparameters.
Table 1. Comparison between models
We have kept a constant learning rate (lr) of 2e-5 & sentiment length (Sent len) of 120 for all models by
varying batch size & drop out.
Model f1-score lr dropout batch Sent length
BERT 0.85 2e-5 0.35 8 120
RoBERT
a
0.80 2e-5 0.32 32 120
ALBERT 0.78 2e-5 0.35 8 120
Results & Discussions: BERT
Fig 6. Precision-Recall curve Fig 7. ROC curve
We got the best results for BERT at batch size of 8 & drop out of 0.35.
Results & Discussions: RoBERTa
Fig 8. Precision-Recall curve Fig 9. ROC curve
We achieved best results at batch size of 32 & drop out of 0.32.
Results & Discussions: ALBERT
Fig 8. Precision-Recall curve Fig 9. ROC curve
We got good model performance at batch size of 8 & drop out of 0.35.
Conclusions & Future work:
❖ In this paper, we have fine tuned Transformer based pre-trained models viz., BERT, RoBERTa, &
ALBERT with proposed method for Multiclass Sentiment analysis task on Covid19 tweets dataset.
❖ We obtained the best results for BERT with a high training time (batch size=8). RoBERTa model
achieves acceptable results with less training time (batch size=32). We got reasonable results for
ALBERT with high training time (batch size=8).
❖ From the accuracy point of view the BERT model is the best for Multiclass Sentiment classification on
our dataset following the RoBERTa & ALBERT model. If speed is the main consideration, we
recommend using RoBERTa due to its speed of pretraining and fine-tuning with acceptable results.
❖ This study was conducted at specific batch size & drop out for 5 epochs. So model performance may be
different beyond 5 epochs & for different batch size & drop out.
❖ This work can be carried out in future to investigate how these models perform for different batch sizes &
drop out values.
❖ This work would help to choose the best pre-trained models for Sentiment analysis based on accuracy &
speed.
References:
1. Ashish et al, “Attention is all you need”, 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA pp.5998-
6008, 2017.
2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, “BERT: Pre-training of deep bidirectional transformers for language
understanding”, In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, , Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171-4186.
3. Yinhan Liu et al, “Roberta: A robustly optimized bert pre training approaches”, 2019, arXiv preprint arXiv:1907.11692.
4. Zhenzhong Lan et al, “Albert: A lite bert for self-supervised learning of language representations”, 2019, arXiv preprint
arXiv:1909.11942.
5. Matthias Aßenmacher, Christian Heumann, “On the comparability of pre-trained language models”, CEUR Workshop Proceedings,
Vol.2624.
6. Cristóbal Colón-Ruiz, Isabel Segura-Bedmar, "Comparing deep learning architectures for sentiment analysis on drug reviews", Journal
of Biomedical Informatics, Volume 110, 2020, 103539, ISSN 1532-0464.
7. Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul, "A Comparative Study of Pretrained Language
Models on Thai Social Text Categorization", 2019, arXiv:1912.01580v1.
8. Carlos Aspillaga, Andres Carvallo, Vladimir Araujo,"Stress Test Evaluation of Transformer-based Models in Natural Language
Understanding Tasks", Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), European Language
Resources Association (ELRA), Marseille , pp. 1882–1894, 2020.
9. Vishal Shirsat, Rajkumar Jagdale, Kanchan Shende, Sachin N. Deshmukh, Sunil Kawale, “Sentence Level Sentiment Analysis from
News Articles and Blogs using Machine Learning Techniques”, International Journal of Computer Sciences and Engineering, Vol.7,
Issue.5, 2019.
10. Avinash Kumar1 , Savita Sharma , Dinesh Singh, "Sentiment Analysis on Twitter Data using a Hybrid Approach", International Journal
of Computer Sciences and Engineering, Vol.-7, Issue-5, May 2019.
11. Dataset: https://www.kaggle.com/datatattle/covid-19-nlp-text-classification

More Related Content

What's hot

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingMinh Pham
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)H K Yoon
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentationbhavesh_physics
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringTraian Rebedea
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSubarno Pal
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understandinggohyunwoong
 
BERT Explained: What You Need to Know About Google’s New Algorithm
BERT Explained: What You Need to Know About Google’s New AlgorithmBERT Explained: What You Need to Know About Google’s New Algorithm
BERT Explained: What You Need to Know About Google’s New AlgorithmSearch Engine Journal
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Yuta Niki
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarSenthil Kumar M
 
Sentiment Analysis Using Twitter
Sentiment Analysis Using TwitterSentiment Analysis Using Twitter
Sentiment Analysis Using Twitterpiya chauhan
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTrilok Sharma
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarRavi Kumar
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)WarNik Chow
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...NILESH VERMA
 

What's hot (20)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
BERT
BERTBERT
BERT
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
BERT Explained: What You Need to Know About Google’s New Algorithm
BERT Explained: What You Need to Know About Google’s New AlgorithmBERT Explained: What You Need to Know About Google’s New Algorithm
BERT Explained: What You Need to Know About Google’s New Algorithm
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil Kumar
 
Sentiment Analysis Using Twitter
Sentiment Analysis Using TwitterSentiment Analysis Using Twitter
Sentiment Analysis Using Twitter
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 
NLP PPT.pptx
NLP PPT.pptxNLP PPT.pptx
NLP PPT.pptx
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 

Similar to Comparing BERT, RoBERTa & ALBERT for Sentiment Analysis

Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Modelgerogepatton
 
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELEXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELijaia
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
Local Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptxLocal Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptxlwz614595250
 
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONIMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONadeij1
 
Portofolio Muhammad Afrizal Septiansyah 2024
Portofolio Muhammad Afrizal Septiansyah 2024Portofolio Muhammad Afrizal Septiansyah 2024
Portofolio Muhammad Afrizal Septiansyah 2024MuhammadAfrizalSepti
 
Conversational transfer learning for emotion recognition
Conversational transfer learning for emotion recognitionConversational transfer learning for emotion recognition
Conversational transfer learning for emotion recognitionTakato Hayashi
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFJayavardhan Reddy Peddamail
 
ENSEMBLE MODEL FOR CHUNKING
ENSEMBLE MODEL FOR CHUNKINGENSEMBLE MODEL FOR CHUNKING
ENSEMBLE MODEL FOR CHUNKINGijasuc
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemHua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemAssociation for Computational Linguistics
 
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONTHE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONijscai
 
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONTHE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONijscai
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNAVER Engineering
 
Reference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural NetworkReference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural NetworkSaurav Jha
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classificationcsandit
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...cscpconf
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...kevig
 

Similar to Comparing BERT, RoBERTa & ALBERT for Sentiment Analysis (20)

Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Model
 
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELEXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Local Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptxLocal Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptx
 
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONIMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
 
Portofolio Muhammad Afrizal Septiansyah 2024
Portofolio Muhammad Afrizal Septiansyah 2024Portofolio Muhammad Afrizal Septiansyah 2024
Portofolio Muhammad Afrizal Septiansyah 2024
 
Conversational transfer learning for emotion recognition
Conversational transfer learning for emotion recognitionConversational transfer learning for emotion recognition
Conversational transfer learning for emotion recognition
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
ENSEMBLE MODEL FOR CHUNKING
ENSEMBLE MODEL FOR CHUNKINGENSEMBLE MODEL FOR CHUNKING
ENSEMBLE MODEL FOR CHUNKING
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemHua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
 
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONTHE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
 
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONTHE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
 
Reference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural NetworkReference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural Network
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 

Recently uploaded

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 

Comparing BERT, RoBERTa & ALBERT for Sentiment Analysis

  • 1. Comparative Analysis of Transformer based Pre-Trained NLP Models Authors : Saurav Singla & Ramachandra N
  • 2. Abstract: ❖ Transformer based self-supervised pre-trained models have transformed the concept of Transfer learning in Natural language processing (NLP) using Deep learning approach. ❖ In this project we analyze the performance of self-supervised models for Multi-class Sentiment analysis on a Non benchmarking dataset. ❖ We used BERT, RoBERTa, & ALBERT models for this study. ❖ We fine-tuned these models on Sentiment analysis with a proposed architecture. ❖ We used f1-score & AUC (Area under ROC curve) score for evaluating model performance. ❖ We found the BERT model with proposed architecture performed well with the highest f1-score of 0.85 followed by RoBERTa (f1-score=0.80), & ALBERT (f1-score=0.78). ❖ This analysis reveals that the BERT model with proposed architecture is best for multi-class sentiment task on a Non-benchmarking dataset.
  • 3. Related work: ❖ A concise overview on several large pre-trained language models provided with state-of-the-art results on benchmark datasets viz. GLUE, RACE, & SQuAD[5]. ❖ Cristóbal et al. presented a benchmark comparison of various deep learning architectures such as Convolutional Neural Networks (CNN) , Long short-term memory (LSTM) recurrent neural networks and BERT with a Bi-LSTM for the sentiment analysis of drug reviews[6]. ❖ Horsuwan et al. systematically compared four modern language models: ULMFiT, ELMo with biLSTM, OpenAI GPT, and BERT across different dimensions including speed of pretraining and fine-tuning, perplexity, downstream classification benchmarks, and performance in limited pre training data on Thai Social Text Categorization[7]. ❖ Carlos Aspillaga et al. have performed Stress test evaluation of Transformer based models (RoBERTa, XLNet, & BERT) in Natural Language Inference (NLI) & Question Answering (QA) tasks with adversarial-examples[8].
  • 4. Methodology: In this section we will explain briefly about the model architecture & dataset used in this task. Dataset ❖ We used Covid19 tweets dataset, publicly available on Kaggle[10]. ❖ The train dataset contains 41157 tweets & test dataset contains 3798 tweets. ❖ There are 5 classes in the sentiment variable viz. Extremely Negative(0), Extremely positive(1), Negative(2), Neutral(3), & Positive(4). We used the Pytorch framework for building deep learning models with the help of Hugging face transformers.
  • 5. Methodology: Model Architecture We have proposed architectures for BERT, RoBERTa, & ALBERT models for this study. BERT ❖ It is a bidirectional transformer, meaning that it uses both left & right contexts in all layers as in Fig 1. ❖ This stands for Bidirectional Encoding Representations from Transformers. ❖ The Fig 2 shows the BERT input representations which includes Token, Segment, & Position embeddings. ❖ In practice, input embeddings also contain input/attention masks used to differentiate between actual tokens & padded tokens.
  • 6. Methodology: BERT Fig 1. BERT architecture Fig 2. BERT input representation ❖ In our task, we fine tuned the BERT model on preprocessed tweets data using a dropout layer, a hidden layer, a fully connected layer & a softmax layer for classification on top of BERT embeddings which is shown in Fig 3. ❖ We have considered bert-base uncased pre-trained model for this task, which has 12 layers, 768 hidden size, 110 M parameters.
  • 7. Methodology: Proposed architectures Fig 3. BERT Fig 4. RoBERTa Fig 5. ALBERT
  • 8. Methodology: RoBERTa ❖ It is Robustly optimized BERT pre-training approach. This replicates the BERT model by tweaking hyper parameter settings & increasing training data size. ❖ For this task, we have fine tuned the model on preprocessed tweets data using a dropout layer, a hidden layer, a fully connected layer & a softmax layer on top of RoBERTa embeddings as shown in Fig 4. ❖ We have chosen distil roberta base pre-trained model, which has 6 layers, 768 hidden size,12 heads, 82 M parameters. ALBERT ❖ A Light BERT introduced to overcome TPU/GPU limitations & longer training times. ❖ We have fine tuned this model on preprocessed tweets data using a dropout, a fully connected layer & finally a softmax on top of ALBERT embeddings which is shown in Fig 5. ❖ We have selected albert-base-v2 pre-trained model for this task, which has 12 layers, 768 hidden size, 11 M parameters.
  • 9. Results & Discussions: Table 1 shows the Sentiment analysis results for all models & corresponding hyperparameters. Table 1. Comparison between models We have kept a constant learning rate (lr) of 2e-5 & sentiment length (Sent len) of 120 for all models by varying batch size & drop out. Model f1-score lr dropout batch Sent length BERT 0.85 2e-5 0.35 8 120 RoBERT a 0.80 2e-5 0.32 32 120 ALBERT 0.78 2e-5 0.35 8 120
  • 10. Results & Discussions: BERT Fig 6. Precision-Recall curve Fig 7. ROC curve We got the best results for BERT at batch size of 8 & drop out of 0.35.
  • 11. Results & Discussions: RoBERTa Fig 8. Precision-Recall curve Fig 9. ROC curve We achieved best results at batch size of 32 & drop out of 0.32.
  • 12. Results & Discussions: ALBERT Fig 8. Precision-Recall curve Fig 9. ROC curve We got good model performance at batch size of 8 & drop out of 0.35.
  • 13. Conclusions & Future work: ❖ In this paper, we have fine tuned Transformer based pre-trained models viz., BERT, RoBERTa, & ALBERT with proposed method for Multiclass Sentiment analysis task on Covid19 tweets dataset. ❖ We obtained the best results for BERT with a high training time (batch size=8). RoBERTa model achieves acceptable results with less training time (batch size=32). We got reasonable results for ALBERT with high training time (batch size=8). ❖ From the accuracy point of view the BERT model is the best for Multiclass Sentiment classification on our dataset following the RoBERTa & ALBERT model. If speed is the main consideration, we recommend using RoBERTa due to its speed of pretraining and fine-tuning with acceptable results. ❖ This study was conducted at specific batch size & drop out for 5 epochs. So model performance may be different beyond 5 epochs & for different batch size & drop out. ❖ This work can be carried out in future to investigate how these models perform for different batch sizes & drop out values. ❖ This work would help to choose the best pre-trained models for Sentiment analysis based on accuracy & speed.
  • 14. References: 1. Ashish et al, “Attention is all you need”, 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA pp.5998- 6008, 2017. 2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding”, In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, , Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171-4186. 3. Yinhan Liu et al, “Roberta: A robustly optimized bert pre training approaches”, 2019, arXiv preprint arXiv:1907.11692. 4. Zhenzhong Lan et al, “Albert: A lite bert for self-supervised learning of language representations”, 2019, arXiv preprint arXiv:1909.11942. 5. Matthias Aßenmacher, Christian Heumann, “On the comparability of pre-trained language models”, CEUR Workshop Proceedings, Vol.2624. 6. Cristóbal Colón-Ruiz, Isabel Segura-Bedmar, "Comparing deep learning architectures for sentiment analysis on drug reviews", Journal of Biomedical Informatics, Volume 110, 2020, 103539, ISSN 1532-0464. 7. Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul, "A Comparative Study of Pretrained Language Models on Thai Social Text Categorization", 2019, arXiv:1912.01580v1. 8. Carlos Aspillaga, Andres Carvallo, Vladimir Araujo,"Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks", Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), European Language Resources Association (ELRA), Marseille , pp. 1882–1894, 2020. 9. Vishal Shirsat, Rajkumar Jagdale, Kanchan Shende, Sachin N. Deshmukh, Sunil Kawale, “Sentence Level Sentiment Analysis from News Articles and Blogs using Machine Learning Techniques”, International Journal of Computer Sciences and Engineering, Vol.7, Issue.5, 2019. 10. Avinash Kumar1 , Savita Sharma , Dinesh Singh, "Sentiment Analysis on Twitter Data using a Hybrid Approach", International Journal of Computer Sciences and Engineering, Vol.-7, Issue-5, May 2019. 11. Dataset: https://www.kaggle.com/datatattle/covid-19-nlp-text-classification