EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING
VOTING CLASSIFIER (LR-SGD)
Abstract
The proliferation of user-generated content on social media has made opinion mining
an arduous job. As a microblogging platform, Twitter is being used to collect views about
products, trends, and politics. Sentiment analysis is a technique used to analyze the attitude,
emotions and opinions of different people towards anything, and it can be carried out on tweets
to analyze public opinion on news, policies, social movements, and personalities. By
employing Machine Learning models, opinion mining can be performed without reading tweets
manually. Their results could assist governments and businesses in rolling out policies,
products, and events. Seven Machine Learning models are implemented for emotion
recognition by classifying tweets as happy or unhappy. With an in-depth comparative
performance analysis, it was observed that proposed voting classifier (LR-SGD) with TF-IDF
produces the most optimal result with 79% accuracy and 81% F1 score. To further validate
stability of the proposed approach on two more datasets, one binary and other multi-class
dataset and achieved robust results.
Existing System
Automatic emotion recognition, pattern recognition and computer vision have become
significantly important in Artificial Intelligence lately with applications is a wide range of
areas. Recently, social media platforms such as Twitter have generated enormous amounts of
structured, unstructured and semi-structured data. One of the most recent example is COVID-
19 infodemic that shows misinformation in social media can be far more important and
devastating than a disaster such as a pendemic. There is a need to analyze to accurately assign
sentiment classes on a large scale. To perform such tasks, accurate NLP techniques and
machine learning (ML) models for text classification are required. Twitter provides an
opportunity to its users to analyze its data on a large and broader point of view. Efficient
methods are important to automatically label text data due to its noisy nature. In the past many
studies have been performed on Twitter sentiment classification [1]. As Twitter is very fast and
an efficient micro-blogging examination that facilitates the end users to transmit small posts
are said to be tweets. Twitter is a highly demanding app in the world and is a successful
platform in social media. Free account can be created by using Twitter that can provide an
enormous audience potential. With the purpose of business and marketing, Twitter can be
proved as the best platform, through which one can get in touch with very rich and famous
personalities like stars and celebrities, so their purchasing can be very charming for them as
well as for advertisers. Using Twitter, every celebrity is linked with fans as well as to grant a
communication to followers. Such a platform is one of the superlative approaches for lovers as
well. But, it has a short note range; only 140 letters for each post and it can type a post or link
on the website since it hasno cost and also open as the advertisements as well. There is no
problem with clusters of personal ads which are similar to other social networking sites. It is
quick because as a tweet is posted on Twitter, the public who is subsequent to respective
business will get it without delay.
Drawback in Existing System
 Complexity in Model Selection: The choice of models in the voting classifier might
not always be optimal. Selecting the right combination of models can be challenging,
as different models might perform better on different types of data or emotions. It
requires careful experimentation and tuning.
 Computational Complexity: Training multiple models and combining them in a
voting classifier can increase computational complexity and training time, especially if
dealing with large datasets or complex models. This could be a challenge in real-time
applications or systems with limited computational resources.
 Interpretability: Combining multiple models might sacrifice the interpretability of the
classification results. Understanding how the ensemble decision is made can be more
complex compared to a single model, making it harder to explain the predictions.
 Data Imbalance: If the dataset is imbalanced regarding the distribution of different
emotions in the tweets, certain emotions might dominate the classifier's predictions,
leading to biased results.
Proposed System
 Integration: Deploy the trained ensemble model into the application or system for real-
time emotion classification in tweets.
 Ensemble Diversity: Experiment with different combinations of models beyond LR
and SGD to improve diversity in the ensemble.
 Handling Imbalanced Data: Implement strategies to address imbalanced emotion
distributions in the dataset, such as oversampling, under sampling, or using different
evaluation metrics.
 Hyperparameter Tuning: Tune hyperparameters for each individual model within the
ensemble and the ensemble itself, considering regularization parameters, learning rates,
and ensemble weights.
Algorithm
 Monitoring and Updates: Monitor the model's performance over time and update it
periodically with new data to adapt to changing trends and emotions in tweets.
 Individual Model Training: Train Logistic Regression (LR) and Stochastic Gradient
Descent (SGD) classifiers separately using the processed features and labeled emotions.
 Feature Extraction: Convert text data into numerical features using techniques like
TF-IDF, word embeddings (Word2Vec, GloVe), or BERT embeddings.
Advantages
 Ensemble Diversity: Combining different models (LR and SGD) can capture diverse
patterns and improve overall performance. Each model might excel in recognizing
certain types of emotions or textual nuances, leading to a more robust classification.
 Reduction in Overfitting: Ensembles can reduce overfitting by aggregating
predictions from multiple models, minimizing the risk of making decisions based on
noise or biased patterns present in a single model.
 Better Generalization: Ensembles often generalize better to unseen data compared to
individual models. They can achieve higher accuracy by leveraging the strengths of
multiple classifiers, leading to improved predictions on new tweets or datasets.
Software Specification
 Processor : I3 core processor
 Ram : 4 GB
 Hard disk : 500 GB
Software Specification
 Operating System : Windows 10 /11
 Frond End : Java
 Back End : Mysql Server
 IDE Tools : Eclipse

EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER (LR-SGD)

  • 1.
    EMOTION RECOGNITION BYTEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER (LR-SGD) Abstract The proliferation of user-generated content on social media has made opinion mining an arduous job. As a microblogging platform, Twitter is being used to collect views about products, trends, and politics. Sentiment analysis is a technique used to analyze the attitude, emotions and opinions of different people towards anything, and it can be carried out on tweets to analyze public opinion on news, policies, social movements, and personalities. By employing Machine Learning models, opinion mining can be performed without reading tweets manually. Their results could assist governments and businesses in rolling out policies, products, and events. Seven Machine Learning models are implemented for emotion recognition by classifying tweets as happy or unhappy. With an in-depth comparative performance analysis, it was observed that proposed voting classifier (LR-SGD) with TF-IDF produces the most optimal result with 79% accuracy and 81% F1 score. To further validate stability of the proposed approach on two more datasets, one binary and other multi-class dataset and achieved robust results. Existing System Automatic emotion recognition, pattern recognition and computer vision have become significantly important in Artificial Intelligence lately with applications is a wide range of areas. Recently, social media platforms such as Twitter have generated enormous amounts of structured, unstructured and semi-structured data. One of the most recent example is COVID- 19 infodemic that shows misinformation in social media can be far more important and devastating than a disaster such as a pendemic. There is a need to analyze to accurately assign sentiment classes on a large scale. To perform such tasks, accurate NLP techniques and machine learning (ML) models for text classification are required. Twitter provides an opportunity to its users to analyze its data on a large and broader point of view. Efficient methods are important to automatically label text data due to its noisy nature. In the past many studies have been performed on Twitter sentiment classification [1]. As Twitter is very fast and an efficient micro-blogging examination that facilitates the end users to transmit small posts are said to be tweets. Twitter is a highly demanding app in the world and is a successful platform in social media. Free account can be created by using Twitter that can provide an
  • 2.
    enormous audience potential.With the purpose of business and marketing, Twitter can be proved as the best platform, through which one can get in touch with very rich and famous personalities like stars and celebrities, so their purchasing can be very charming for them as well as for advertisers. Using Twitter, every celebrity is linked with fans as well as to grant a communication to followers. Such a platform is one of the superlative approaches for lovers as well. But, it has a short note range; only 140 letters for each post and it can type a post or link on the website since it hasno cost and also open as the advertisements as well. There is no problem with clusters of personal ads which are similar to other social networking sites. It is quick because as a tweet is posted on Twitter, the public who is subsequent to respective business will get it without delay. Drawback in Existing System  Complexity in Model Selection: The choice of models in the voting classifier might not always be optimal. Selecting the right combination of models can be challenging, as different models might perform better on different types of data or emotions. It requires careful experimentation and tuning.  Computational Complexity: Training multiple models and combining them in a voting classifier can increase computational complexity and training time, especially if dealing with large datasets or complex models. This could be a challenge in real-time applications or systems with limited computational resources.  Interpretability: Combining multiple models might sacrifice the interpretability of the classification results. Understanding how the ensemble decision is made can be more complex compared to a single model, making it harder to explain the predictions.  Data Imbalance: If the dataset is imbalanced regarding the distribution of different emotions in the tweets, certain emotions might dominate the classifier's predictions, leading to biased results. Proposed System  Integration: Deploy the trained ensemble model into the application or system for real- time emotion classification in tweets.
  • 3.
     Ensemble Diversity:Experiment with different combinations of models beyond LR and SGD to improve diversity in the ensemble.  Handling Imbalanced Data: Implement strategies to address imbalanced emotion distributions in the dataset, such as oversampling, under sampling, or using different evaluation metrics.  Hyperparameter Tuning: Tune hyperparameters for each individual model within the ensemble and the ensemble itself, considering regularization parameters, learning rates, and ensemble weights. Algorithm  Monitoring and Updates: Monitor the model's performance over time and update it periodically with new data to adapt to changing trends and emotions in tweets.  Individual Model Training: Train Logistic Regression (LR) and Stochastic Gradient Descent (SGD) classifiers separately using the processed features and labeled emotions.  Feature Extraction: Convert text data into numerical features using techniques like TF-IDF, word embeddings (Word2Vec, GloVe), or BERT embeddings. Advantages  Ensemble Diversity: Combining different models (LR and SGD) can capture diverse patterns and improve overall performance. Each model might excel in recognizing certain types of emotions or textual nuances, leading to a more robust classification.  Reduction in Overfitting: Ensembles can reduce overfitting by aggregating predictions from multiple models, minimizing the risk of making decisions based on noise or biased patterns present in a single model.  Better Generalization: Ensembles often generalize better to unseen data compared to individual models. They can achieve higher accuracy by leveraging the strengths of multiple classifiers, leading to improved predictions on new tweets or datasets.
  • 4.
    Software Specification  Processor: I3 core processor  Ram : 4 GB  Hard disk : 500 GB Software Specification  Operating System : Windows 10 /11  Frond End : Java  Back End : Mysql Server  IDE Tools : Eclipse