1. Multilingual Tweet Intimacy Analysis using Bidirectional LSTM
By Team 10
Matrim Pathak (22MAI0029)
Arkapriya Gupta (22MAI0039)
Samim Aktar (22MAI0047)
Ankita Biswas (22MAI0050)
Romak Das (22MAI0056)
Purbayan Pal (22MAI0064)
Under the guidance of
Dr. Sathyaraj R
SCOPE
VIT, Vellore
2. Abstract
Twitter is a widely used social media platform whose tweets form a large body of text data. Tweets are written in
many languages and carry positive, negative, or neutral sentiment. The intimacy score of a tweet reflects the
closeness or familiarity expressed in the message. This project, "Multilingual tweet intimacy analysis using
bidirectional LSTM", studies the intimacy levels of multilingual tweets. Its primary objective is an automated method
for determining the intimacy level of tweets across languages. The training data covers languages such as English,
Portuguese, Chinese, French, and Italian. Using a combination of linguistic and sentiment features, the proposed
approach aims to predict the intimacy level of tweets in English, Spanish, and Arabic. The study uses Long Short-Term
Memory (LSTM) networks, a type of recurrent neural network (RNN) designed to capture long-term dependencies in
sequential data, so that the model can use the context and relationships between the words of a tweet. The
multilingual tweet data is first preprocessed through tokenization, normalization, and removal of noise or irrelevant
information. Features covering both linguistic and sentiment aspects are then engineered to represent each tweet.
Finally, a bidirectional LSTM learns the patterns and dependencies in the tweet data. This architecture consists of
two LSTM layers, one reading the input sequence forward and the other reading it backward, so the representation of
each word draws on both the preceding and the following words, giving a fuller picture of the tweet's overall
sentiment and intimacy level.
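The preprocessing steps named in the abstract (tokenization, normalization, noise removal) could look roughly like the sketch here. It uses only the Python standard library; the function name `preprocess`, the regular expressions, and the decision to treat links and user handles as noise are assumptions made for illustration, not the project's actual pipeline.

```python
import re
import unicodedata

# Assumed notion of "noise": links and user handles.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# \w is Unicode-aware in Python 3, so words in any script are kept.
TOKEN_RE = re.compile(r"\w+")


def preprocess(tweet):
    """Normalize a tweet, strip noise, and split it into tokens."""
    text = unicodedata.normalize("NFKC", tweet)  # normalization of Unicode forms
    text = URL_RE.sub(" ", text)                 # noise removal: links
    text = MENTION_RE.sub(" ", text)             # noise removal: user handles
    text = text.replace("#", " ")                # keep the hashtag word, drop '#'
    text = text.casefold()                       # case normalization
    return TOKEN_RE.findall(text)                # tokenization


tokens = preprocess("Miss you SO much @anna!! https://t.co/xyz #bff")
print(tokens)
```

Splitting on `\w+` works for space-delimited languages; text without spaces between words, such as Chinese, would come out as long runs and would need a dedicated word segmenter.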
3. Introduction
● Social media has become a primary outlet for people to share their innermost
ideas and feelings.
● Multilingual tweet intimacy is a relatively new area of research.
● NLP techniques can be used to analyze tweet intimacy.
● Analyzing tweet intimacy can provide valuable insights into user behavior,
interpersonal relationships, and emotional states.
● The goal of this research is to develop a multilingual tweet intimacy analysis
system using NLP techniques.
● The system will automatically categorize tweets into different intimacy levels,
ranging from low to high.
● The system can be used in a variety of applications, such as marketing,
mental health, and social research.
4. Methodology
● Dataset Analysis
● Preprocessing of the data
● Vectorizing the data using
CountVectorizer
● Algorithm/Neural Network used:
Bidirectional LSTM
● Usefulness of LSTM in sentiment
analysis
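As a rough illustration of the vectorization step, the sketch here reproduces the core idea behind a count vectorizer (one column per vocabulary term, one row of term counts per document) using only the standard library. It is not scikit-learn's `CountVectorizer`, which also tokenizes and returns a sparse matrix; the helper names `build_vocabulary` and `count_vectorize` and the sample documents are hypothetical.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vectorize(token_lists, vocabulary):
    """Return one count vector per document; unknown tokens are ignored."""
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        # Iterating the dict follows insertion order, i.e. column order.
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows


# Hypothetical, already tokenized tweets.
docs = [["miss", "you", "miss"], ["love", "you"]]
vocabulary = build_vocabulary(docs)
vectors = count_vectorize(docs, vocabulary)
print(vocabulary)
print(vectors)
```

Count vectors discard word order, so a sequence model such as an LSTM would normally be fed the token index sequence rather than these counts; how the project combined the two is not stated in the slides.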
5. Proposed Model
● The model is a sequential neural network made up of 6
layers.
● The embedding layer transforms input words into
numerical representations known as embeddings.
● The Conv1D layer applies convolutional filters to the
embeddings.
● The MaxPooling1D layer downsamples the data by
selecting the highest value from each set of two
consecutive vectors.
● The bidirectional layer allows for both forward and
backward processing of the sequence.
● The dropout layer helps prevent overfitting by
randomly setting a portion of the input units to zero.
● The dense layer generates the model's final output
with three units, which correspond to the three classes
needed for the classification task.
● The model has a total of 179,939 parameters. The slide
reports 160,000 as trainable and 195 as non-trainable,
although these two figures do not sum to the total.
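The parameter total can be checked with a short calculation. The slides do not give the layer sizes, so every hyperparameter in this sketch (vocabulary of 5,000, embedding width 32, 32 convolution filters of width 3, 32 LSTM units per direction) is an assumption, chosen because this one configuration reproduces the reported total of 179,939. Under it, the embedding layer accounts for 160,000 parameters and the dense layer for 195, which suggests those two reported figures may be per-layer counts.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def conv1d_params(in_channels, filters, kernel_size):
    # One kernel per filter plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    return input_dim * units + units


# Assumed hyperparameters (not stated in the slides).
VOCAB_SIZE, EMBED_DIM = 5000, 32
FILTERS, KERNEL_SIZE = 32, 3
LSTM_UNITS, NUM_CLASSES = 32, 3

layers = {
    "embedding": embedding_params(VOCAB_SIZE, EMBED_DIM),
    "conv1d": conv1d_params(EMBED_DIM, FILTERS, KERNEL_SIZE),
    "max_pooling": 0,  # pooling has no weights
    # Forward and backward LSTMs each carry a full set of weights.
    "bidirectional_lstm": 2 * lstm_params(FILTERS, LSTM_UNITS),
    "dropout": 0,      # dropout has no weights
    # The dense layer sees the concatenated forward and backward states.
    "dense": dense_params(2 * LSTM_UNITS, NUM_CLASSES),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name:20s}{count:>10,}")
print(f"{'total':20s}{total:>10,}")
```

Other configurations could in principle give the same total, so this is a consistency check rather than a recovery of the actual model.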
6. Results
Figure: Graph of accuracy and loss vs. number of epochs
Figure: Sentiment analysis of a user-given tweet
7. Results
Figure: Confusion matrix with the testing data
Performance metrics of the model on the test dataset:
Accuracy   Precision   Recall   F1-Score
91.66%     91.83%      91.51%   91.67%
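For reference, metrics of this kind can be derived from a confusion matrix as in the sketch here. The matrix values are invented for illustration and are not the project's results, and the averaging scheme (unweighted mean over classes, with F1 as the harmonic mean of the averaged precision and recall) is an assumption. The reported F1-score is at least consistent with that last choice, since the harmonic mean of 91.83% and 91.51% rounds to 91.67%.

```python
def harmonic_mean(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0


def metrics_from_confusion(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precisions.append(matrix[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(matrix[k][k] / actual_k if actual_k else 0.0)
    precision = sum(precisions) / n  # unweighted (macro) average
    recall = sum(recalls) / n
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_mean(precision, recall),
    }


# Hypothetical three-class confusion matrix, for illustration only.
example = [
    [50, 3, 2],
    [4, 45, 1],
    [1, 2, 42],
]
scores = metrics_from_confusion(example)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")

# Consistency check on the reported figures.
reported_f1 = harmonic_mean(0.9183, 0.9151)
print(f"F1 from reported precision and recall: {reported_f1:.4f}")
```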
8. Conclusion
● Multilingual Tweet Intimacy Analysis (MTIA) is the study of personal
aspects of user-generated information published on social media
platforms in multiple languages.
● MTIA can help us understand the nature of intimacy in a variety of
cultural and linguistic contexts.
● MTIA can also help us develop better methods for natural language
processing.
● MTIA has practical applications in a variety of fields, including the social
sciences, psychology, marketing, and public opinion analysis.
● MTIA is a challenging field, but it has the potential to make significant
contributions to our understanding of social interactions.