We apply Random Forest, Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (Bi-LSTM), Bi-directional Gated Recurrent Units (GRUs), and Bidirectional Encoder Representations from Transformers (BERT) for Twitter sentiment classification.
2. Introduction
• Twitter is a real-time feedback platform for products, issues, and other topics: users' opinions on a
topic can be gathered from the tweets and comments they post about it.
• These tweets consist of text data that includes various emoticons and meaningless words.
• Extensive preprocessing and a vectorization method are applied to this text.
• Multi-class sentiment classification is performed so that a wider range of sentiments can be
identified.
3. Problem
• Finding the sentiment of text data is itself a challenging task: the volume of feedback/text data is
large, and its sentiment cannot be determined manually.
• Positive, Neutral, and Negative labels do not provide enough information about a subject (products,
text, data, etc.).
• In this case, the tweet data (text) covers a larger set of sentiments (Sadness, Boredom, Neutral, Worry,
Surprise, Love, Fun, Hate, Happiness, Anger, Relief), which makes classification challenging.
• Solving this multi-class problem and identifying the actual sentiment of each tweet gives us the
genuine sentiment toward the product.
4. Business Needs
• As companies bring more and more new products to market, they need real-time feedback on those
products.
• This feedback comprises a large amount of data, and finding its sentiment manually is a tough task.
• With Natural Language Processing (NLP), we process the opinions and find the sentiment of the data.
• Based on these sentiments, various business-centric decisions can be made, such as changing the
advertising program, geo-based marketing, scaling up the supply chain, and many more.
5. Approach
• We apply Random Forest along with several deep learning models, namely LSTM, Bi-LSTM, GRUs, and
BERT, to classify the text data into various sentiments.
• To convert the text data to vector form, vectorization techniques such as TF-IDF, word2vec, and
GloVe are applied after some essential text preprocessing steps, and the resulting vectors are fed into
the machine learning and deep learning models.
• Text preprocessing steps include:
• Lowercasing
• Removing punctuation, URLs, and handles
• Removing stop words
• Stemming
• Tokenizing sentences
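The preprocessing steps listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual pipeline: the stop-word list, the regular expressions, and the `crude_stem` helper are all assumptions made for the example (a real pipeline would more likely use a full stop-word list and an established stemmer such as Porter's). Tokenization is done before stop-word removal and stemming here, because both of those steps operate on individual tokens.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "to", "of", "and",
              "in", "on", "for", "i", "my", "this", "that", "it", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
HANDLE_RE = re.compile(r"@\w+")                 # Twitter handles such as @user
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def crude_stem(token):
    """Strip a few common English suffixes.

    Hypothetical stand-in for a real stemmer; it is deliberately simplistic.
    """
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Turn a raw tweet into a list of cleaned, stemmed tokens."""
    text = tweet.lower()                      # lowercase
    text = URL_RE.sub(" ", text)              # remove URLs first, while they are intact
    text = HANDLE_RE.sub(" ", text)           # remove handles
    text = text.translate(PUNCT_TABLE)        # remove ASCII punctuation (including '#')
    tokens = text.split()                     # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [crude_stem(t) for t in tokens]    # stem what is left


print(preprocess("@john I am LOVING the new phones!!! http://t.co/abc #happy"))
# → ['lov', 'new', 'phone', 'happy']
```

URLs and handles are stripped before punctuation so that the patterns still match. The resulting token lists are what a vectorizer (TF-IDF, word2vec, or GloVe) would then consume.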
7. Solution
• We adopt the final model, M5 (the BERT model), which gives the best sentiment classification among all
the approaches listed in the Approach section.
• This approach classifies the test data with an accuracy of 40% on the multi-class sentiment data,
higher than the other approaches.
8. Benefits
• The outcomes of this work are as follows:
• We no longer have to rely on polarity alone, i.e. Positive, Neutral, and Negative sentiment.
• It provides information about the various types of sentiments/opinions related to the product.
• Decision making becomes easier because we have wider knowledge of the users'
opinions.
• Analysis time decreases.
9. Results
• An overall accuracy of 40% was recorded when testing our model, M5. The model provides useful
knowledge of users' opinions, which can help in making various types of decisions.
• These decisions help grow the business smoothly and in a user-centric way.
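For reference, overall accuracy in a multi-class setting is the fraction of test tweets whose predicted sentiment matches the true one. The sketch below shows that calculation, plus a per-class breakdown that can reveal which sentiments are recognized well. The function names and the toy labels are assumptions made up for illustration; they are not the project's evaluation code or results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true label, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only; not results from the project.
y_true = ["love", "hate", "worry", "neutral", "fun"]
y_pred = ["love", "worry", "worry", "neutral", "sadness"]

print(accuracy(y_true, y_pred))          # 3 of 5 correct
print(per_class_recall(y_true, y_pred))  # recall for each true label
```

With many sentiment classes, a single accuracy figure can hide large differences between classes, so the per-class view is a useful companion to it.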