Comparative Analysis of Transformer based
Pre-Trained NLP Models
Authors : Saurav Singla & Ramachandra N
Abstract:
❖ Transformer-based self-supervised pre-trained models have transformed transfer learning in
Natural language processing (NLP) with deep learning.
❖ In this project we analyze the performance of self-supervised models for multi-class sentiment analysis
on a non-benchmark dataset.
❖ We used the BERT, RoBERTa, & ALBERT models for this study.
❖ We fine-tuned these models for sentiment analysis with a proposed architecture.
❖ We used the f1-score & AUC (area under the ROC curve) to evaluate model performance.
❖ We found that the BERT model with the proposed architecture performed best, with the highest f1-score of 0.85,
followed by RoBERTa (f1-score = 0.80) & ALBERT (f1-score = 0.78).
❖ This analysis suggests that the BERT model with the proposed architecture is best suited for the multi-class
sentiment task on a non-benchmark dataset.
Related work:
❖ Aßenmacher & Heumann provided a concise overview of several large pre-trained language models that achieve
state-of-the-art results on benchmark datasets viz. GLUE, RACE, & SQuAD [5].
❖ Colón-Ruiz & Segura-Bedmar presented a benchmark comparison of deep learning architectures such as
Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) recurrent neural networks,
& BERT with a Bi-LSTM for the sentiment analysis of drug reviews [6].
❖ Horsuwan et al. systematically compared four modern language models, ULMFiT, ELMo with biLSTM,
OpenAI GPT, & BERT, across dimensions including speed of pretraining & fine-tuning,
perplexity, downstream classification benchmarks, & performance with limited pre-training data on Thai
social text categorization [7].
❖ Carlos Aspillaga et al. performed a stress-test evaluation of Transformer-based models (RoBERTa,
XLNet, & BERT) on Natural Language Inference (NLI) & Question Answering (QA) tasks with
adversarial examples [8].
Methodology:
In this section we briefly explain the model architectures & the dataset used in this task.
Dataset
❖ We used the COVID-19 tweets dataset, publicly available on Kaggle [11].
❖ The training set contains 41,157 tweets & the test set contains 3,798 tweets.
❖ The sentiment variable has 5 classes viz. Extremely Negative (0), Extremely Positive (1),
Negative (2), Neutral (3), & Positive (4).
We used the PyTorch framework for building the deep learning models with the help of the Hugging Face Transformers library; a minimal data-loading sketch is shown below.
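The following sketch loads the dataset & maps the labels to the integer ids above. The file & column names (Corona_NLP_train.csv, Corona_NLP_test.csv, Sentiment) are assumptions based on the public Kaggle dataset [11], not taken from the slides; adjust them to the actual files.

```python
import pandas as pd

# Assumed file names from the Kaggle dataset [11].
train_df = pd.read_csv("Corona_NLP_train.csv", encoding="latin-1")
test_df = pd.read_csv("Corona_NLP_test.csv", encoding="latin-1")

# Map the five sentiment labels to the integer ids used in this study.
label2id = {"Extremely Negative": 0, "Extremely Positive": 1,
            "Negative": 2, "Neutral": 3, "Positive": 4}
train_df["label"] = train_df["Sentiment"].map(label2id)
test_df["label"] = test_df["Sentiment"].map(label2id)

print(train_df.shape, test_df.shape)  # expected: 41,157 & 3,798 rows
```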
Methodology:
Model Architecture
We have proposed architectures for the BERT, RoBERTa, & ALBERT models used in this study.
BERT
❖ BERT stands for Bidirectional Encoder Representations from Transformers.
❖ It is a bidirectional transformer, meaning that it uses both left & right context in all layers, as in Fig 1.
❖ Fig 2 shows the BERT input representation, which combines Token, Segment, & Position
embeddings.
❖ In practice, the model inputs also include attention masks used to differentiate between actual
tokens & padded tokens (see the tokenizer sketch below).
Methodology: BERT
Fig 1. BERT architecture Fig 2. BERT input representation
❖ In our task, we fine-tuned the BERT model on the preprocessed tweets using a dropout layer, a hidden
layer, a fully connected layer & a softmax layer for classification on top of the BERT embeddings, as
shown in Fig 3 (a code sketch follows below).
❖ We used the bert-base-uncased pre-trained model for this task, which has 12 layers, a hidden size of
768, & 110M parameters.
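A minimal PyTorch sketch of this head, not the authors' exact code: the hidden size of 256 & the use of the pooled [CLS] output are assumptions; the dropout probability (0.35 for BERT, per Table 1) & the layer ordering come from the slides.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertSentimentClassifier(nn.Module):
    def __init__(self, n_classes=5, hidden=256, p_drop=0.35):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(p_drop)  # dropout layer
        self.hidden = nn.Linear(self.bert.config.hidden_size, hidden)  # hidden layer
        self.out = nn.Linear(hidden, n_classes)  # fully connected output layer

    def forward(self, input_ids, attention_mask):
        # Pooled [CLS] embedding from BERT (assumed pooling choice).
        pooled = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        x = torch.relu(self.hidden(self.dropout(pooled)))
        # Return raw logits; softmax is applied at inference time
        # (training typically uses nn.CrossEntropyLoss on the logits).
        return self.out(x)
```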
Methodology: Proposed architectures
Fig 3. BERT Fig 4. RoBERTa Fig 5. ALBERT
Methodology: RoBERTa
❖ RoBERTa (Robustly Optimized BERT Pre-training Approach) replicates the BERT model while tweaking the
hyperparameter settings & increasing the training data size.
❖ For this task, we fine-tuned the model on the preprocessed tweets using a dropout layer, a hidden
layer, a fully connected layer & a softmax layer on top of the RoBERTa embeddings, as shown in Fig 4.
❖ We chose the distilroberta-base pre-trained model, which has 6 layers, a hidden size of 768, 12 heads, &
82M parameters.
ALBERT
❖ ALBERT (A Lite BERT) was introduced to overcome TPU/GPU memory limitations & long training times.
❖ We fine-tuned this model on the preprocessed tweets using a dropout layer, a fully connected layer &
finally a softmax layer on top of the ALBERT embeddings, as shown in Fig 5.
❖ We selected the albert-base-v2 pre-trained model for this task, which has 12 layers, a hidden size of 768,
& 11M parameters (see the backbone-loading sketch below).
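A sketch of loading the three backbones with the Hugging Face Auto classes; the same classification head sketched earlier can wrap any of them, & the printed parameter counts should roughly match the figures quoted above.

```python
from transformers import AutoModel, AutoTokenizer

# The three backbones compared in this study.
for ckpt in ["bert-base-uncased", "distilroberta-base", "albert-base-v2"]:
    tok = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    n_params = sum(p.numel() for p in model.parameters()) // 10**6
    print(f"{ckpt}: ~{n_params}M parameters")
```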
Results & Discussions:
Table 1 shows the sentiment analysis results for all models & the corresponding hyperparameters.
Table 1. Comparison between models
We kept a constant learning rate (lr) of 2e-5 & sentence length (Sent len) of 120 for all models, varying the
batch size & dropout (a fine-tuning loop sketch follows Table 1).

Model   | f1-score | lr   | dropout | batch | Sent len
--------|----------|------|---------|-------|---------
BERT    | 0.85     | 2e-5 | 0.35    | 8     | 120
RoBERTa | 0.80     | 2e-5 | 0.32    | 32    | 120
ALBERT  | 0.78     | 2e-5 | 0.35    | 8     | 120
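A minimal fine-tuning loop sketch with the Table 1 hyperparameters (lr = 2e-5, 5 epochs, batch size 8 for BERT/ALBERT & 32 for RoBERTa). `train_dataset` is an assumed PyTorch dataset yielding (input_ids, attention_mask, label) tuples, & `BertSentimentClassifier` is the head sketched earlier.

```python
import torch
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = BertSentimentClassifier().to(device)  # head sketched above
# train_dataset: assumed Dataset of (input_ids, attention_mask, label) tuples.
loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # 5 epochs, as in this study
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        logits = model(input_ids.to(device), attention_mask.to(device))
        loss = loss_fn(logits, labels.to(device))
        loss.backward()
        optimizer.step()
```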
Results & Discussions: BERT
Fig 6. Precision-Recall curve Fig 7. ROC curve
We obtained the best results for BERT at a batch size of 8 & a dropout of 0.35.
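The reported f1 & AUC scores can be computed with scikit-learn; a minimal sketch, assuming `y_true` holds the integer labels & `y_prob` the per-class softmax probabilities. The macro averaging choice is an assumption, since the slides report a single score per model.

```python
from sklearn.metrics import f1_score, roc_auc_score

# y_true: shape (N,) integer labels; y_prob: shape (N, 5) probabilities.
f1 = f1_score(y_true, y_prob.argmax(axis=1), average="macro")
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print(f"f1 = {f1:.2f}, AUC = {auc:.2f}")
```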
Results & Discussions: RoBERTa
Fig 8. Precision-Recall curve Fig 9. ROC curve
We achieved the best results for RoBERTa at a batch size of 32 & a dropout of 0.32.
Results & Discussions: ALBERT
Fig 10. Precision-Recall curve Fig 11. ROC curve
We obtained good model performance for ALBERT at a batch size of 8 & a dropout of 0.35.
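One-vs-rest curves like those in Figs 6-11 can be plotted per class; a sketch using the same assumed `y_true`/`y_prob` arrays as above.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
from sklearn.preprocessing import label_binarize

# Binarize labels so each class gets its own one-vs-rest curve.
y_bin = label_binarize(y_true, classes=[0, 1, 2, 3, 4])
for c in range(5):
    prec, rec, _ = precision_recall_curve(y_bin[:, c], y_prob[:, c])
    plt.plot(rec, prec, label=f"class {c}")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend()
plt.show()
```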
Conclusions & Future work:
❖ In this paper, we fine-tuned Transformer-based pre-trained models viz. BERT, RoBERTa, &
ALBERT with the proposed method for the multiclass sentiment analysis task on the COVID-19 tweets dataset.
❖ We obtained the best results for BERT with a high training time (batch size = 8). The RoBERTa model
achieved acceptable results with less training time (batch size = 32). We got reasonable results for
ALBERT with a high training time (batch size = 8).
❖ From an accuracy point of view, the BERT model is the best for multiclass sentiment classification on
our dataset, followed by the RoBERTa & ALBERT models. If speed is the main consideration, we
recommend RoBERTa for its speed of pretraining & fine-tuning with acceptable results.
❖ This study was conducted at specific batch sizes & dropout values for 5 epochs, so model performance may
differ beyond 5 epochs & for other batch sizes & dropout values.
❖ Future work could investigate how these models perform across a wider range of batch sizes &
dropout values.
❖ This work should help practitioners choose the best pre-trained model for sentiment analysis based on accuracy &
speed.
References:
1. Ashish Vaswani et al., "Attention Is All You Need", 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, pp. 5998-6008, 2017.
2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language
Understanding", Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171-4186.
3. Yinhan Liu et al., "RoBERTa: A Robustly Optimized BERT Pretraining Approach", 2019, arXiv preprint arXiv:1907.11692.
4. Zhenzhong Lan et al., "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations", 2019, arXiv preprint
arXiv:1909.11942.
5. Matthias Aßenmacher, Christian Heumann, "On the Comparability of Pre-trained Language Models", CEUR Workshop Proceedings,
Vol. 2624.
6. Cristóbal Colón-Ruiz, Isabel Segura-Bedmar, "Comparing Deep Learning Architectures for Sentiment Analysis on Drug Reviews", Journal
of Biomedical Informatics, Volume 110, 2020, 103539, ISSN 1532-0464.
7. Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul, "A Comparative Study of Pretrained Language
Models on Thai Social Text Categorization", 2019, arXiv:1912.01580v1.
8. Carlos Aspillaga, Andres Carvallo, Vladimir Araujo, "Stress Test Evaluation of Transformer-based Models in Natural Language
Understanding Tasks", Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), European Language
Resources Association (ELRA), Marseille, pp. 1882-1894, 2020.
9. Vishal Shirsat, Rajkumar Jagdale, Kanchan Shende, Sachin N. Deshmukh, Sunil Kawale, "Sentence Level Sentiment Analysis from
News Articles and Blogs using Machine Learning Techniques", International Journal of Computer Sciences and Engineering, Vol. 7,
Issue 5, 2019.
10. Avinash Kumar, Savita Sharma, Dinesh Singh, "Sentiment Analysis on Twitter Data using a Hybrid Approach", International Journal
of Computer Sciences and Engineering, Vol. 7, Issue 5, May 2019.
11. Dataset: https://www.kaggle.com/datatattle/covid-19-nlp-text-classification