PRACTICAL TEXT
CLASSIFICATION WITH
LARGE PRE-TRAINED
LANGUAGE MODELS
Sravani Raparla
016656601
Abstract Introduction Background Methodology Results Analysis
Multi-emotion sentiment classification is a challenging problem in
natural language processing (NLP) with various real-world
applications. In this study, the authors demonstrate the effectiveness
of combining large-scale unsupervised language modeling with fine-
tuning to address this task, even on difficult datasets with label class
imbalance and domain-specific context.
The authors train an attention-based Transformer network on a substantial
amount of text data (specifically, Amazon reviews) and fine-tune the model on the
training set. Their approach achieves a competitive F1 score of 0.69 on the
SemEval Task 1:E-c multidimensional emotion classification problem, which is
based on the Plutchik wheel of emotions. Notably, the model performs well
on challenging emotion categories such as Fear (0.73), Disgust (0.77), and Anger
(0.78), as well as on rare categories like Anticipation (0.42) and Surprise (0.37).
 Language models, such as mLSTM and Transformer networks, have shown
impressive performance on academic datasets, surpassing previous state-of-
the-art approaches. However, their performance on practical text classification
tasks using real-world data remains uncertain. In this work, we train mLSTM
and Transformer language models on a large 40GB text dataset and apply
them to binary sentiment analysis and multidimensional emotion classification
tasks. We evaluate our models on both academic datasets and an original
social media dataset, showcasing their performance against state-of-the-art
approaches and commercially available APIs.
 Our models achieve state-of-the-art performance on academic datasets
without domain-specific training or excessive hyper-parameter tuning. On the
social media dataset, they outperform commercially available APIs, even
when re-calibrated to the test set. The Transformer model generally
outperforms mLSTM, especially in fine-tuning for multidimensional emotion
classification. Fine-tuning significantly improves performance on emotion
tasks for both models. We demonstrate that unsupervised language modeling
combined with fine-tuning offers practical solutions for text classification
problems, including those with large class imbalance and human label
disagreement.
 Text classification across domains is challenging due to unknown words,
specialized context, and colloquial language. Models trained on a diverse
text corpus learn to adapt to different contexts and select relevant
features for emotion classification. Our work shows the effectiveness of
language models in specialized text classification problems and unlocks
possibilities for real-world applications.
PLUTCHIK'S WHEEL OF
EMOTIONS
 Our multidimensional emotion classification
focuses on Plutchik's wheel of emotions, which
has been in use since 1979. This taxonomy aims
to classify human emotions based on four
dualities: Joy - Sadness, Anger - Fear, Trust -
Disgust, and Surprise - Anticipation. According
to the basic emotion model proposed by Ekman
in 2013, although humans experience numerous
emotions, some emotions are considered more
fundamental than others.
 In our comparison with IBM's Watson, a
commercial general-purpose emotion
classification API, we evaluate classification
scores for the emotions of Joy, Sadness, Fear,
Disgust, and Anger. These emotions align with
the categories present in Plutchik's wheel
(shown in Figure 1).
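The four dualities above can be written down as a small lookup structure. The following is only an illustrative sketch: the names `PLUTCHIK_DUALITIES`, `OPPOSITE`, `WATSON_COMPARED`, and `encode` are hypothetical and do not come from the original work, and the multi-hot encoding is one common way to represent multi-label targets, not necessarily the one the authors used.

```python
# Plutchik's four dualities as listed on this slide.
PLUTCHIK_DUALITIES = [
    ("joy", "sadness"),
    ("anger", "fear"),
    ("trust", "disgust"),
    ("surprise", "anticipation"),
]

# Flat label list and a lookup from each emotion to its opposite.
LABELS = [emotion for pair in PLUTCHIK_DUALITIES for emotion in pair]
OPPOSITE = {}
for first, second in PLUTCHIK_DUALITIES:
    OPPOSITE[first] = second
    OPPOSITE[second] = first

# The five emotions compared against Watson, per the slide.
WATSON_COMPARED = ["joy", "sadness", "fear", "disgust", "anger"]


def encode(emotions):
    """Return a multi-hot vector over LABELS (hypothetical helper)."""
    unknown = set(emotions) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    return [1 if label in emotions else 0 for label in LABELS]


vector = encode({"joy", "fear"})
```

A text can carry several emotions at once, which is why the target is a vector of independent 0/1 entries rather than a single class.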
 We used a larger batch size and shorter sequence length (global batch of 512, sequence length of 64 tokens) to train the models
efficiently on tweets, which are short text snippets.
 The language models were trained on the Amazon Reviews dataset, chosen for its rich emotional context.
 We compared two models: mLSTM and Transformer, both known for their state-of-the-art performance in academic NLP
benchmarks.
 Unsupervised pretraining was performed using an encoder-decoder framework to maximize the likelihood of token sequences.
 The models were fine-tuned for emotion classification tasks.
 Example formula: The language-modeling objective maximizes the log-likelihood of a token sequence, which factorizes by the chain rule as:
 $\log p(x_0, \ldots, x_n) = \sum_{t=0}^{n} \log p(x_t \mid x_{t-1}, \ldots, x_0)$
 This formula expresses the joint probability of a sequence as a product of next-token conditionals, so maximizing it trains the model to
predict each token accurately given the preceding ones. The training loss is the negative of this sum.
 In summary, our methodology included efficient training, leveraging emotional context, comparing different models, and
utilizing unsupervised pretraining to achieve effective text classification.
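The chain-rule view of the language-modeling objective can be checked numerically: the log-likelihood of a sequence is the sum of the log-probabilities of each token given its prefix, and the training loss is the negative of that sum. The sketch below is a toy illustration under stated assumptions. The `BIGRAM` table and its probabilities are invented, and it conditions on only one previous token, whereas the mLSTM and Transformer models condition on the full prefix.

```python
import math

# Hypothetical toy model: P(next token | previous token).
# All probabilities here are made up for illustration.
BIGRAM = {
    "<s>": {"good": 0.5, "bad": 0.5},
    "good": {"phone": 0.8, "</s>": 0.2},
    "bad": {"phone": 0.6, "</s>": 0.4},
    "phone": {"</s>": 1.0},
}


def sequence_log_likelihood(tokens):
    """Sum log p(x_t | x_{t-1}) over the sequence (chain rule)."""
    total = 0.0
    previous = "<s>"
    for token in tokens:
        total += math.log(BIGRAM[previous][token])
        previous = token
    return total


def negative_log_likelihood(tokens):
    """Loss that is minimized during training."""
    return -sequence_log_likelihood(tokens)


log_likelihood = sequence_log_likelihood(["good", "phone", "</s>"])
loss = negative_log_likelihood(["good", "phone", "</s>"])
```

Exponentiating `log_likelihood` recovers the product of the three conditionals (0.5 × 0.8 × 1.0), which is the joint probability of the sequence under the toy model.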
RESULTS
Binary Sentiment Tweets:
 On the academic SST dataset, the Transformer model performs close to the state-of-the-art but doesn't
exceed it.
 On the company tweets dataset, the Transformer model outperforms the mLSTM and ELMo baselines,
as well as both Watson and Google Sentiment APIs, even after optimal calibration of the API results on
the test set.
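Calibrating an API's output can be pictured as choosing the decision threshold on its score that maximizes F1 against a labeled set. The sketch below assumes the API returns a positive-sentiment score in [0, 1]. The function names, scores, and labels are hypothetical, and the source does not say exactly how the calibration was carried out.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(scores, labels):
    """Try every observed score as a threshold and keep the best F1."""
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in sorted(set(scores)):
        preds = [1 if s >= threshold else 0 for s in scores]
        current = f1_score(labels, preds)
        if current > best_f1:
            best_threshold, best_f1 = threshold, current
    return best_threshold, best_f1


# Invented API scores and gold labels for six tweets.
scores = [0.10, 0.35, 0.40, 0.55, 0.80, 0.90]
labels = [0, 0, 1, 1, 1, 1]
threshold, best = calibrate_threshold(scores, labels)
```

Choosing the threshold on the test set itself gives the API the most favorable setting possible, so a model that still scores higher is outperforming a best-case version of the API.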
Multi-Label Emotion Tweets:
 Comparing our models to Watson on the SemEval dataset and company tweets, our models outperform
Watson on every emotion category, including Anger, Disgust, Fear, Joy, and Sadness.
SemEval Tweets:
 Our finetuned Transformer model achieved the top macro-averaged F1 score among all submissions in
the SemEval Task1:E-C challenge.
 Our model's micro-averaged F1 score and Jaccard index accuracy are slightly lower than those of the leading
submissions. Together with the top macro-averaged F1 score, this indicates relatively stronger performance on rare
and difficult categories. The most common categories (Joy, Anger, Disgust, and Optimism) receive relatively
higher F1 scores across all models.
 In summary, our models demonstrate strong performance across various sentiment and emotion
classification tasks, outperforming baselines and commercial sentiment analysis APIs in multiple
scenarios.
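The three metrics named in these results weight rare categories differently. Macro-averaged F1 averages per-category F1 scores, so each category counts equally. Micro-averaged F1 pools all decisions, so frequent categories dominate. Jaccard index accuracy averages the overlap between predicted and true label sets per example. The sketch below uses invented labels and predictions, and it scores an example with empty true and predicted sets as 1.0, which is an assumption rather than something stated in the source.

```python
LABELS = ["joy", "anger", "surprise"]

# Invented multi-label data: one set of emotions per tweet.
y_true = [{"joy"}, {"joy"}, {"joy", "anger"}, {"anger"}, {"surprise"}, set()]
y_pred = [{"joy"}, {"joy"}, {"joy"}, {"anger"}, set(), set()]


def counts(label):
    """True positives, false positives, false negatives for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
    return tp, fp, fn


def f1(tp, fp, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Macro: average the per-label F1 scores.
per_label_f1 = {label: f1(*counts(label)) for label in LABELS}
macro_f1 = sum(per_label_f1.values()) / len(LABELS)

# Micro: pool the counts over all labels, then compute one F1.
totals = [sum(values) for values in zip(*(counts(label) for label in LABELS))]
micro_f1 = f1(*totals)


def jaccard(true_set, pred_set):
    """Overlap of two label sets; two empty sets count as a match (assumption)."""
    if not true_set and not pred_set:
        return 1.0
    return len(true_set & pred_set) / len(true_set | pred_set)


jaccard_accuracy = sum(jaccard(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy data the single missed Surprise example drives that category's F1 to zero and pulls the macro average well below the micro average, which is why a high macro-averaged score points to good performance on rare categories.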
ANALYSIS:
1. Classification Performance by Dataset Size:
 The experiment showed that the macro average F1 score is more sensitive to dataset size and
falls more quickly than the micro average F1 score.
 Categories with worse class imbalance benefit more from having a larger training dataset size,
suggesting that more data can substantially improve results for harder categories.
 The difference between single and multihead decoders becomes more pronounced for more
difficult categories and smaller dataset sizes.
2. Dataset Quality and Human Rater Agreement:
 The SemEval dataset and the company tweets dataset labeled using a similar technique
showed reasonably good results, validating the labeling approach.
 Plutchik category labels have large rater disagreement, even among vetted raters, which can
be attributed to the tendency to label "No Emotion" when unsure about a category.
 Datasets with more emotions tend to have higher Plutchik disagreement, possibly due to the
uncertainty of raters when assigning emotions.
3. Difficult Tweets and Challenging Contexts:
 General-purpose APIs may not work well on the company tweets dataset due to the context-
specific nature of emotion classification.
 Examples of disagreements between human raters and the Watson API in the video game
context highlight the challenge of ascribing negative sentiment to terms that are not inherently
negative in that context.
 Training a large unsupervised model and finetuning with a small amount of labeled data
specific to the dataset's context may yield better results.
4. Multiple Softmax Outputs and Transformer Features:
 Training models with multiple softmax outputs can improve performance on language
modeling by capturing a larger number of distinct contexts in the text.
 The Transformer model may also capture features relevant to a wide range of contexts, and
finetuning helps select the most significant features for a specific setting, while disregarding
irrelevant features.
 In summary, the analysis highlights the importance of dataset size, rater agreement, context-
specific challenges, and the potential benefits of using unsupervised models with finetuning to
improve classification performance in challenging text classification tasks.
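Rater disagreement of the kind discussed in the analysis can be quantified with a chance-corrected agreement measure. The sketch below uses Cohen's kappa for two raters on one emotion category. This is a standard statistic chosen here for illustration, not the measure the authors report, and the ratings are invented.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving binary (0/1) labels."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Agreement expected by chance, from each rater's marginal rates.
    rate_a = sum(rater_a) / n
    rate_b = sum(rater_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both raters always give the same single label
    return (observed - expected) / (1 - expected)


# Invented ratings for ten tweets: 1 = emotion present, 0 = absent.
# Rater B rarely marks the emotion, mimicking a default to "No Emotion".
rater_a = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

raw_agreement = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / len(rater_a)
kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on 8 of 10 tweets, yet kappa is only about 0.41, because most of that agreement comes from both raters marking the emotion as absent. A rare category can therefore look well agreed upon by raw agreement while the positive labels remain contested.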
CONCLUSION:
 This work demonstrates the effectiveness of unsupervised pretraining and finetuning in
tackling difficult text classification tasks. The Transformer network, in particular, showed
remarkable performance when adapting to downstream tasks with noisy labels and specialized
context.
 The framework presented here offers flexibility and ease of customization for text classification
on niche tasks. Unsupervised language modeling on general text datasets without labels allows
for pretraining, while finetuning with a small amount of domain-specific labeled data proves to
be effective in transferring to downstream tasks.
 This approach holds great potential for various practical text classification problems. Similar to
the successes of language modeling and transfer in academic text understanding tasks on the
GLUE Benchmark, it is anticipated that this framework can be applied to a wide range of real-
world text classification challenges.
 By leveraging the power of unsupervised pretraining and fine-tuning, this framework opens
doors for academics and small organizations to tackle text classification problems with limited
labeled data, enabling advancements in diverse domains of natural language processing.
 Al-Rfou, R.; Choe, D.; Constant, N.; Guo, M.; and Jones, L. 2018.
Character-level language modeling with deeper self-attention. CoRR
abs/1808.04444.
 Baziotis, C.; Athanasiou, N.; Chronopoulou, A.; Kolovou, A.;
Paraskevopoulos, G.; Ellinas, N.; Narayanan, S.; and Potamianos,
A. 2018. NTUA-SLP at SemEval-2018 Task 1: Predicting affective
content in tweets with deep attentive RNNs and transfer learning.
CoRR abs/1804.06658.
 Dai, A. M., and Le, Q. V. 2015. Semi-supervised sequence learning.
CoRR abs/1511.01432.
 Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT:
Pre-training of deep bidirectional transformers for language
understanding.
 Ekman, P. 2013. An argument for basic emotions.
 Gray, S.; Radford, A.; and Kingma, D. P. 2017. GPU kernels for
block-sparse weights.
 Howard, J., and Ruder, S. 2018. Fine-tuned language models for
text classification. CoRR abs/1801.06146.
 Khetan, A.; Lipton, Z. C.; and Anandkumar, A. 2017. Learning from
noisy singly-labeled data. CoRR abs/1712.04577.
 Krause, B.; Lu, L.; Murray, I.; and Renals, S. 2016. Multiplicative
LSTM for sequence modelling. CoRR abs/1609.07959.
  • 1. PRACTICAL TEXT CLASSIFICATION WITH LARGE PRE-TRAINED LANGUAGE MODELS Sravani Raparla 016656601
  • 2. Abstract Introduction Background Methodology Results Analysis
  • 3. Multi-emotion sentiment classification is a challenging problem in natural language processing (NLP) with various real-world applications. In this study, the authors demonstrate the effectiveness of combining large-scale unsupervised language modeling with fine- tuning to address this task, even on difficult datasets with label class imbalance and domain-specific context. The authors train an attention-based Transformer network on a substantial amount of text data (specifically, Amazon reviews) fine-tune the model on the training set. Their approach achieves a competitive F1 score of 0.69 on the. SemEval Task 1:E-c multidimensional emotion classification problem, which is based on the Plutchik wheel of emotions. Notably, the model performs well on challenging emotion categories such as Fear (0.73), Disgust (0.77), Anger (0.78), as well as on rare categories like Anticipation (0.42) Surprise (0.37).
  • 4.  Language models, such as mLSTM and Transformer networks, have shown impressive performance on academic datasets, surpassing previous state-of- the-art approaches. However, their performance on practical text classification tasks using real-world data remains uncertain. In this work, we train mLSTM and Transformer language models on a large 40GB text dataset and apply them to binary sentiment analysis and multidimensional emotion classification tasks. We evaluate our models on both academic datasets and an original social media dataset, showcasing their performance against state-of-the-art approaches and commercially available APIs.  Our models achieve state-of-the-art performance on academic datasets without domain-specific training or excessive hyper-parameter tuning. On the social media dataset, they outperform commercially available APIs, even when re-calibrated to the test set. The Transformer model generally outperforms mLSTM, especially in fine-tuning for multidimensional emotion classification. Fine-tuning significantly improves performance on emotion tasks for both models. We demonstrate that unsupervised language modeling combined with fine-tuning offers practical solutions for text classification problems, including those with large class imbalance and human label disagreement.  Text classification across domains is challenging due to unknown words, specialized context, and colloquial language. By training models on a diverse text corpus, they learn to adapt to different contexts and select relevant features for emotion classification. Our work shows the effectiveness of language models in specialized text classification problems and unlocks possibilities for real-world applications.
  • 5. PLUTCHIK'S WHEEL OF EMOTIONS  Our multidimensional emotion classification focuses on Plutchik's wheel of emotions, which has been in use since 1979. This taxonomy aims to classify human emotions based on four dualities: Joy - Sadness, Anger - Fear, Trust - Disgust, and Surprise - Anticipation. According to the basic emotion model proposed by Ekman in 2013, although humans experience numerous emotions, some emotions are considered more fundamental than others.  In our comparison with IBM's Watson, a commercial general-purpose emotion classification API, we evaluate classification scores for the emotions of Joy, Sadness, Fear, Disgust, and Anger. These emotions align with the categories present in Plutchik's wheel (shown in Figure 1).
  • 6. We used a larger batch size and shorter sequence length (global batch of 512, sequence length of 64 tokens) to train the models efficiently on tweets, which are short text snippets. The language models were trained on the Amazon Reviews dataset, chosen for its rich emotional context. We compared two models, mLSTM and Transformer, both known for their state-of-the-art performance on academic NLP benchmarks. Unsupervised pretraining was performed using an encoder-decoder framework to maximize the likelihood of token sequences. The models were then fine-tuned for emotion classification tasks. Example formula: the log-likelihood to be maximized can be written as $\log p(x_0, \dots, x_n) = \sum_{t=0}^{n} \log p(x_t \mid x_{t-1}, \dots, x_0)$. This formula factorizes the joint probability of a sequence into per-token conditionals, so that training teaches the model to predict each next token from the preceding ones. In summary, our methodology included efficient training, leveraging emotional context, comparing different models, and utilizing unsupervised pretraining to achieve effective text classification.
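The likelihood objective on this slide can be illustrated with a small sketch. It assumes a hypothetical toy next-token model (`TOY_BIGRAM`, `toy_cond_prob`) that stands in for the mLSTM or Transformer; only the chain-rule summation mirrors the formula, and none of the names or probabilities come from the original work.

```python
import math

# Hypothetical toy next-token model: p(token | previous token).
# The key None stands for "start of sequence" (empty history).
TOY_BIGRAM = {
    None: {"good": 0.5, "bad": 0.5},
    "good": {"game": 0.8, "bad": 0.2},
    "bad": {"game": 0.6, "good": 0.4},
}


def toy_cond_prob(token, history):
    """Return p(token | history); this toy model only looks at the last token."""
    prev = history[-1] if history else None
    return TOY_BIGRAM[prev][token]


def sequence_log_likelihood(tokens, cond_prob):
    """Chain rule: log p(x_0..x_n) = sum over t of log p(x_t | x_0..x_{t-1})."""
    return sum(math.log(cond_prob(tok, tokens[:t])) for t, tok in enumerate(tokens))


log_lik = sequence_log_likelihood(["good", "game"], toy_cond_prob)
nll = -log_lik  # negative log-likelihood, the quantity usually minimized in training
```

Maximizing `log_lik` and minimizing `nll` are the same objective; a real language model would replace `toy_cond_prob` with its softmax output over the vocabulary.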
  • 7. RESULTS Binary Sentiment Tweets: On the academic SST dataset, the Transformer model performs close to the state-of-the-art but doesn't exceed it. On the company tweets dataset, the Transformer model outperforms the mLSTM and ELMo baselines, as well as both Watson and Google Sentiment APIs, even after optimal calibration of the API results on the test set. Multi-Label Emotion Tweets: Comparing our models to Watson on the SemEval dataset and company tweets, our models outperform Watson on every emotion category, including Anger, Disgust, Fear, Joy, and Sadness. SemEval Tweets: Our finetuned Transformer model achieved the top macro-averaged F1 score among all submissions in the SemEval Task 1:E-c challenge. Its micro-averaged F1 score and Jaccard index accuracy rank slightly lower, which suggests that the model's advantage lies in rare and difficult categories. Across all models, the most common categories (Joy, Anger, Disgust, and Optimism) receive relatively higher F1 scores. In summary, our models demonstrate strong performance across various sentiment and emotion classification tasks, outperforming baselines and commercial sentiment analysis APIs in multiple scenarios.
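To make the three metrics on this slide concrete, here is a minimal sketch of macro-averaged F1, micro-averaged F1, and Jaccard index accuracy for multi-label predictions. The toy labels and the function names are hypothetical, and scoring a tweet as 1.0 when both the gold and predicted label sets are empty is an assumption, not something stated in the slides.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true, y_pred, labels):
    """y_true and y_pred are lists of label sets, one set per tweet."""
    counts = {lab: [0, 0, 0] for lab in labels}  # [tp, fp, fn] per label
    for true, pred in zip(y_true, y_pred):
        for lab in labels:
            if lab in pred and lab in true:
                counts[lab][0] += 1
            elif lab in pred:
                counts[lab][1] += 1
            elif lab in true:
                counts[lab][2] += 1
    # Macro: average the per-label F1 scores, so rare labels weigh as much as common ones.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels first, so common labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Jaccard accuracy: mean per-tweet overlap between gold and predicted label sets.
    overlaps = [
        len(t & p) / len(t | p) if (t | p) else 1.0  # assumption for empty sets
        for t, p in zip(y_true, y_pred)
    ]
    jaccard = sum(overlaps) / len(overlaps)
    return macro, micro, jaccard


# Hypothetical data: "joy" is common, "surprise" is rare and always missed.
gold = [{"joy"}, {"joy"}, {"joy"}, {"surprise"}]
pred = [{"joy"}, {"joy"}, {"joy"}, set()]
macro, micro, jaccard = multilabel_scores(gold, pred, ["joy", "surprise"])
```

In this toy case, missing the single rare label halves the macro score while the micro score and Jaccard accuracy stay much higher, which is why a top macro-averaged F1 points to strength on rare categories.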
  • 8. ANALYSIS: 1. Classification Performance by Dataset Size: The experiment showed that the macro average F1 score is more sensitive to dataset size and falls more quickly than the micro average F1 score. Categories with worse class imbalance benefit more from having a larger training dataset size, suggesting that more data can substantially improve results for harder categories. The difference between single and multihead decoders becomes more pronounced for more difficult categories and smaller dataset sizes. 2. Dataset Quality and Human Rater Agreement: The SemEval dataset and the company tweets dataset labeled using a similar technique showed reasonably good results, validating the labeling approach. Plutchik category labels have large rater disagreement, even among vetted raters, which can be attributed to the tendency to label "No Emotion" when unsure about a category. Datasets with more emotions tend to have higher Plutchik disagreement, possibly due to the uncertainty of raters when assigning emotions. 3. Difficult Tweets and Challenging Contexts: General-purpose APIs may not work well on the company tweets dataset due to the context-specific nature of emotion classification. Examples of disagreements between human raters and the Watson API in the video game context highlight the challenge of ascribing negative sentiment to terms that are not inherently negative in that context. Training a large unsupervised model and finetuning with a small amount of labeled data specific to the dataset's context may yield better results. 4. Multiple Softmax Outputs and Transformer Features: Training models with multiple softmax outputs can improve performance on language modeling by capturing a larger number of distinct contexts in the text. The Transformer model may also capture features relevant to a wide range of contexts, and finetuning helps select the most significant features for a specific setting, while disregarding irrelevant features. In summary, the analysis highlights the importance of dataset size, rater agreement, context-specific challenges, and the potential benefits of using unsupervised models with finetuning to improve classification performance in challenging text classification tasks.
  • 9. CONCLUSION: This work demonstrates the effectiveness of unsupervised pretraining and finetuning in tackling difficult text classification tasks. The Transformer network, in particular, showed remarkable performance when adapting to downstream tasks with noisy labels and specialized context. The framework presented here offers flexibility and ease of customization for text classification on niche tasks. Unsupervised language modeling on general text datasets without labels allows for pretraining, while finetuning with a small amount of domain-specific labeled data proves to be effective in transferring to downstream tasks. This approach holds great potential for various practical text classification problems. Similar to the successes of language modeling and transfer in academic text understanding tasks on the GLUE Benchmark, it is anticipated that this framework can be applied to a wide range of real-world text classification challenges. By leveraging the power of unsupervised pretraining and fine-tuning, this framework opens doors for academics and small organizations to tackle text classification problems with limited labeled data, enabling advancements in diverse domains of natural language processing.
  • 10. Al-Rfou, R.; Choe, D.; Constant, N.; Guo, M.; and Jones, L. 2018. Character-level language modeling with deeper self-attention. CoRR abs/1808.04444. Baziotis, C.; Athanasiou, N.; Chronopoulou, A.; Kolovou, A.; Paraskevopoulos, G.; Ellinas, N.; Narayanan, S.; and Potamianos, A. 2018. NTUA-SLP at SemEval-2018 Task 1: Predicting affective content in tweets with deep attentive RNNs and transfer learning. CoRR abs/1804.06658. Dai, A. M., and Le, Q. V. 2015. Semi-supervised sequence learning. CoRR abs/1511.01432. Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. Ekman, P. 2013. An argument for basic emotions. Gray, S.; Radford, A.; and Kingma, D. P. 2017. GPU kernels for block-sparse weights. Howard, J., and Ruder, S. 2018. Fine-tuned language models for text classification. CoRR abs/1801.06146. Khetan, A.; Lipton, Z. C.; and Anandkumar, A. 2017. Learning from noisy singly-labeled data. CoRR abs/1712.04577. Krause, B.; Lu, L.; Murray, I.; and Renals, S. 2016. Multiplicative LSTM for sequence modelling. CoRR abs/1609.07959.