SlideShare a Scribd company logo
1 of 9
Download to read offline
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 6, December 2020, pp. 5992~6000
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i6.pp5992-6000  5992
Journal homepage: http://ijece.iaescore.com/index.php/IJECE
Multi-lingual Twitter sentiment analysis using machine learning
K. Arun1
, A. Srinagesh2
1
Department of Computer Science and Engineering, Acharya Nagarjuna University, India
2
Department of Computer Science and Engineering, Rayapati Venkata Ranga Rao and Jagarlamudi Chandramouli
(R.V.R. and J.C.) College of Engineering, Acharya Nagarjuna University, India
Article Info ABSTRACT
Article history:
Received Aug 2, 2019
Revised May 2, 2020
Accepted May 17, 2020
Twitter Sentiment Analysis is one of the leading research fields nowadays.
Most of the researchers have contributed to the research in twitter sentiment
analysis in English tweets, but few researchers have focused on
the multilingual twitter sentiment analysis. Still, some more challenges are
present and not yet addressed in the domain of multilingual twitter sentiment
analysis (MLTSA). Research is highly warranted in these unexplored areas.
This study presents the implementation of sentiment analysis in multilingual
twitter data and improves the data classification up to the adequate level of
accuracy. Twitter is the sixth leading social networking site in the world.
Active users for twitter in a month are 330 million. People can tweet or
retweet in their languages and allow users to use emoji’s, abbreviations,
contraction words, misspellings, and shortcut words. The best platform for
sentiment analysis is twitter. Multilingual tweets and data sparsity are
the two main challenges. In this paper, the MLTSA algorithm gives
the solution for these two challenges. MLTSA algorithm is divided into two
parts. One for detecting and translating non-English tweets into English using
natural language processing (NLP) and the second one is an appropriate
pre-processing method with NLP support that can reduce the data sparsity.
The result of the MLTSA with SVM achieves good accuracy by up to 95%.
Keywords:
Machine learning algorithms
Multilingual
NLP
Twitter sentiment analysis
Copyright © 2020 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
K. Arun,
Department of Computer Science Engineering,
Acharya Nagarjuna University,
NH16, Nagarjuna Nagar, Guntur, Andhra Pradesh, India.
Email: karun014@gmail.com
1. INTRODUCTION
Twitter is microblogging, which allows a maximum of 280 characters per tweet; twitter is one of
the best social networks. Twitter has 330 million active users per month [1]. It provides a good platform to
share user's opinions and views about the trending topics. Twitter has no local groups; any user can post
tweets and has open access.
Twitter provides a good environment for sentiment analysis [2]. Sentiment classification can be
done on twitter data like positive, negative and neutral. Sentiment analysis can be applied to movie reviews,
product reviews, market review analysis, twitter trending news [3], restaurant review analysis [4], age,
gender sentiment classification [5], future predictions like election prediction, forecast applications [6, 7],
and forecasting stock market movements [8]. Multilingual tweets can be seen in trending news in the twitter
data. Extending the sentiment analysis into multilingual [9] tweets is very essential to improve the accuracy
in sentiment analysis without losing any user tweets. The efficiency of this analysis was improved by the best
pre-processing techniques [10].
Increasing the accuracy in sentiment analysis is a research challenge in big data. By reducing
the data sparsity and translating the multilingual tweets accuracy can be increased [7]. In the sentiment
analysis, pre-processing plays an important role [10, 11]. Mostly sentiment analysis is applied only on
Int J Elec & Comp Eng ISSN: 2088-8708 
Multi-lingual twitter sentiment analysis using machine learning (K. Arun)
5993
English tweets. In the pre-processing level non-English tweets are avoided, i.e. multilingual tweets are not
considered in this analysis. On this occasion, the accuracy of the sentiment analysis gets to drop.
Very valuable tweets and opinions of the public are not considered here. In twitter sentiment analysis,
multilingual tweets must be translated into English tweets [12], and noise data like misspellings, emphasized
words, code-mixed tweets need to be solved [13].
Multilingual twitter sentiment analysis is still prominent in reaserch area [14, 15]. In this context,
analysis of multilingual tweets is not a direct process. The first step in this sequence is that it requires
a translation from non-English tweets to the English language and the complementary steps are applied for
the sentiment classification. Spelling mistakes, emphasized words, contraction words are called as data
sparsity. Data sparsity can be reduced in pre-processing using NLP. In this way, the accuracy of
the classification can be improved drastically through multilingual twitter sentiment analysis [16, 17].
In this paper, MLTSA (Algorithm-1) is a new proposed algorithm, which is used to define and solve
the data sparsity by using efficient preprocessing methods and translation from non-English to English is by
using Google translator. Finally, the sentiment analysis process has been applied in this algorithm. In this
proposed system accuracy is increased up to 95%.
Sentiment analysis for Swiss politician tweets; in this study tweets are available in multiple national
languages like Swiss. The language was translated from Swiss to English with the help of machine learning
and a lexicon-based approach [18]. Norah Fahad Alshammari [19] Sentiment analysis classification for
twitter data from Arabic to English with deep learning model. A complete review of Arabic sentiment
analysis is given by Abdullatif Ghallab [20].
Md. Al-Amin improves the accuracy of the Bengali comments up to 75.5%, with the combination of
word2vec and sentiment extraction of words. This approach performance is not compromising in increasing
the size of the dataset [21]. Narr, Presents multilingual tweets sentiment analysis with the language
independent classification approach. Here four languages were translated using the Amazon Mechanical Turk
such as English, German, French, and Portuguese. Language tweets are independently translated and
classified on human-annotated tweets with good accuracy [22]. Nankani H., Dutta H focuses on sentiment
analysis of various low resources multi-languages [23] having limited sentiment analysis resources such as
annotated datasets, word embeddings and sentiment lexicons, along with the English language.
2. RESEARCH METHOD
Multilingual twitter sentiment analysis (MLTSA) contains the following steps.
- Pre-processing.
- Language translation for each multi-language tweets.
These two methods are very clearly defined in the flowchart Figure 1 and algorithm as follows.
Figure 1. Flowchart for multilingual twitter sentiment analysis
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000
5994
Algorithm 1. Multilingual twitter sentiment analysis (MLTSA)
ALGORITHM: - “Multilingual Twitter Sentiment Analysis (MLTSA)”
Purpose: - “Prediction of sentiment Analysis from the Multilingual and sparsity of tweets”.
Input: - Tweets ᴦ, Tweet_language T_L, Sentiment_Lexicon L
Output: - Tweet_Sentiment TS {P, Neg, NT}, Sentiment_Score SS
Where P: Positive, Neg: Negative, and Nt: Neutral.
Initialize: - POS,NEG, and NT=0,E=”English Language”
1. While Tokenize Tweets ᴦ into Tweet(ti)
2. Remove RT, Numbers, URL, Hyperlinks from ti
3. if T_L(ti) ≠ English then
4. Each ti∈ ᴦ is Tokenize into word set Wi
5. //translating into English using Google Translator(GNMT)//
6. ti=P(Ei|Wi)= ∏ 𝑃𝑛
𝑖=1 (Ei|E0,E1,E3…Ei-1;W1,W2,W3…Wi)
7. Tokenize ti ∈ ᴦ into word set Wi
8. Remove stop words, punctuation symbols and special symbols from Wi
9. if contraction words in Wi then
10. Replace with the complete word.
11. else if emphasized words in Wi then
12. Replace with the proper word.
13. //Lexicon values from the Lexicon Dictionary for each word//
14. Search for Wi in L
15. if Wi ∈ L.POS then
16. POSPOS+L.val
17. else if Wi∈ L.NEG then
18. NEGNEG+L.val
19. else
20. NTNT+1
21. // Tweets Sentiments and Sentiments scores calculation //
22. if POS>|NEG| then
23. TS=P, SS=POS|(POS+NEG)
24. else if POS<|NEG| then
25. TS=Neg, SS=NEG|(POS+NEG)
26. else
27. TS=Nt
28. End
2.1. Pre-processing
In multi-lingual twitter sentiment analysis, pre-processing plays an important key role.
The pre-processing step is used to prepare the raw data into suitable data for analysis. The basic method of
this process is cleaning and replacing the required data. The pre-processing method directly affects
the accuracy and efficiency of sentiment analysis. Here some of the pre-processing steps are used for the best
analysis. In the Figure 1 and Algorithm 1 defines many steps in pre-processing.
- Remove “RT”: In the twitter data set “RT” re-tweets are common, but it is not required for
the processing, so it is removed from the data.
- Remove numeric values: Numbers are not required in this analysis part, so numbers are also removed
from the tweets in the data set.
- Remove Isolated words: These words have no significant impact on sentiment analysis. Such as
{i, a, y} which are single characters.
- Remove undefined language tweets from the data set: In the twitter data set, every tweet is written in
a specific language. But some of the tweets are posted in the undefined language, this type of tweets will
not support for the language translations and sentiment analysis. So it is better to avoid undefined
language tweets.
- Remove punctuations and special symbols: Punctuation symbols do not affect the analysis; punctuations
are removed from each and every tweet.The symbols are [!"#$%&'()*+,-./:;<=>?@[]^_`{|}~],
not only these symbols some more special symbols are available regularly that are used in the tweets such
as [:“ ‘ ’…], these are also removed from the tweets.
- Remove hyperlinks: Hyperlinks do not affect the analysis; from each tweet, hyperlinks are removed.
- Remove Tags: there are different types of tags used in the tweets such as @, and &. This will not show
any effect on the analysis.
- Remove Stop words: There is One Hundred and Seventy-Nine (179) list of English language stop words
that were supported in the NLTK. All these are used in regular English sentences, but there is no effect in
the sentiment analysis so, it is better to remove the stop words. In the stop word list, some of
the negative words such as { 'nor', 'not', }has an effect in the sentiment classification, so these words must
be excluded from the stop word list, otherwise negative tweets will be classified into positive.
Int J Elec & Comp Eng ISSN: 2088-8708 
Multi-lingual twitter sentiment analysis using machine learning (K. Arun)
5995
- Replace the contraction words: Contraction words are commonly used in the tweets. The contraction
words are {I’ll, don’t, I’m, won’t, this’ll, etc}. But this type of contraction words are not directly
processed, it needs to be replaced with the proper words instead of contraction words.
Like {I’llI will, don’tdo not, I’m I am, won’t would not, this’llthis will, ..etc}
- Spelling corrections: More than 10% of tweets contain misspellings, which decreases the efficiency of
classification [24]. Here every tweet and every word in the tweet is verified. If any spelling mistakes
occur, that misspellings are corrected by using dictionary-based which is proposed by Peter Norvig [9].
- Replace the emphasized words: Emphasized words are used for expressive words and it is very
common to express the feelings in the text form. These words are not supported from any dictionary,
with the help of NLTK, regular expressions and spelling corrections can be replaced with proper words,
{‘cooooooool’’cool’,’perrrrrrrrrrrfffffect’’perfect’,’gooooood’’good’,’hooooot’’hot’,’toooooo’
’too’}.
2.2. Language translation
Language translation is used to translate non-English language tweets into English tweets.
The language of the tweets is specified in the "lang" attribute along with the "text". Three types of labels are
defining the language of tweets. The labels are 1) English, 2) Non-English, 3) Undefined. Example "en",
"hi", "und" in the Figure 1 and Algorithm 1 explains the translation procedure. If the ’lang' type is 'und',
These types of tweets provide hyperlinks, videos, stickers, or any graphics but not the textual information and
there is no use in the sentiment analysis process.
If the language is in the 'Non-English' type, it requires English translation. According to the given
algorithm (MLTSA), Algorithm 1 uses one most Google translator that is GNMT(google neural machine
translator). This GNMT [25-28] uses the LSTM RNN (long-short term memory recurrent neural
networks) [29]. This GNMT architecture divides into four parts 1) Decoder LSTMs, 2) Encoder LSTMs, 3)
Attention, 4) Softmax. Encoder LSTMs encodes the given input set of words from the source sentence into
fixed-sized vectors in its hidden layers, Decoder LSTMs and Softmax can decode into target language words
symbols, and attention provides the generated target word set and communication between encoder and
decoder hidden layers.
In this process, Twitter dataset ᴦ is tokenized into single tweets ti, and each tweet has its language,
the Language of ti is 'eng' or 'und' no need of translation, otherwise, translation is needed for any source
language into English only. In this process ti tokenized into words Wi.
V1, V2, V3, ..... Vn = EncoderLSTM(W1, W2, W3, ..... Wn) (1)
where Vn is the word symbol vector is the output of the encoder LSTM hidden layers.
Then decoderLMST and Softmax modules can make it into English word vector by using
the conditional probability model.
P(E|V) = P(E| V1, V2, V3, ..... Vn ) (2)
= ∏ Pn
i=1 (Ei|E0,E1,E3…Ei-1; V1, V2, V3, ..... Vn ) (3)
where E0 is the special symbol as the beginning of the sentence in any target language. Ei translated
word vector.
This system can do the translation task for the valid inputs. If Wi is a non-English word and which is
defined in a dictionary of a language, that word(s) will go for translation. But in other cases where Wi is
non-English but lexically English, translation is not possible.
3. RESULTS
3.1. Data set
The data set is collected from the Twitter API. The topic that has been accessed from Twitter live
stream is the honourable CM, he is the Andhra Pradesh honourable CM, and 200 tweets were collected.
Tweets are in local languages, and mostly in English. The rest of the tweets are in Indian local languages
such as Hindi, and Telugu. which is in the Figure 2.
Initially, each tweet contains 69 attributes, such as used_id,user_str_id, text, language, country,
out of this process it requires only two attributes {text,lang}. Here text contains tweets, re-tweets and lang is
referred to as the language of each tweet. The dimensionality of the data set was reduced up to two features.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000
5996
One of the new feature added to the data set is "sentiment" as a third feature. This sentiment
attribute contains three labels i.e. "positive, negative, neutral". An auto label is assigned with the support of
a lexicon-based approach. Translation of different languages into English is done using a python
programming language with NLP support. Finally data set contains three attributes. In Figure 3, 65-80% of
tweets are in the English language (en), and remaining are in the local language [te: Telugu, hi: Hindi, be:
Bengali, ar: Arabic, ur: Urdu] along with the undefined(und) language. Undefined language tweets are not
considered in this process.
Figure 2. Sample data set from twitter live
Figure 3. Multilingual tweets [English, Telugu, Hindi, und]
Int J Elec & Comp Eng ISSN: 2088-8708 
Multi-lingual twitter sentiment analysis using machine learning (K. Arun)
5997
3.2. Pre-processing
The preprocessing method contains 11 steps as mentioned above. The data set contains a huge
amount of noise data, even though the attributes were reduced up to three. The noise data must be cleared
before analyzing it. Python language completely supports the pre-processing because of its rich set of
packages. URL's, numbers, HTML tags, stop words, punctuation symbols, special symbols, and emoticons
were removed from each and every tweet. Additionally, the pre-processing module replaces contraction
words with actual words, Negation words with complete word and misspelling words were rectified.
Finally, the two data sets were pre-processed very effectively.
Sample pre-processing steps:
Input-1: RT@ravi: Apple phone it is verrrry costlllllllllllllllllllllllly
Output-1: apple phone costly.
Emphasized words and spelling corrections in the tweets are verrrrry, costllllllllllllllly.
These words can reduce the length like very, costly by using some of the regular expressions.
The remaining corrections depend on the spelling corrections.
3.3. Language translation
According to the specified algorithm MLTSA, different language (non-English) tweets are
translated. This translation is supported by one of the python modules, and that module depends on
the Google translator. In this algorithm, it contains 3 cases.
3.3.1. Translating non-English into English
Tweet in Hindi: पुलवामा हमले पर बोले भारतीय बैडममिंटन टीम के राष्ट्रीय कोच पुलेला गोपीचिंद पाक के साथ
ककसी तरह के
Translated to English: Pullela Gopichand, national coach of the Indian badminton team, spoke on
the Pulwama attack.
Tweet in Bengali: RT @IamSourav_b: যখন আজকালকার জীবনমুখী বাাংলা চলিচিত্র বা গানে
গালিগালাজ
ব্যবহার করা হয় তখন তো @MyAnandaBazar এর গায়ে লাগেনা দেখি| কিন্তু…
Translated to English: When we are now using the lyrics of the life-changing Bangla film or song,
we do not see it
Tweet in Telugu and Translation:
3.3.2. Translating mixed language words into English
Translating into English: Country universal election noise there farming attached mine the problems
judiciary his majestically light losing Varukirata.
3.3.3. Code-mixed words
A tweet is typed in one language but the meaning is in other languages, this type of tweet is not
getting translated(Code mixed). Tweet: “Antha Rajanna mahima Appatlo annagarini devudu ni ela chesaro
ipudu ysr ni devudu cheyadam chusthunte ardamaipoth”. This tweet is written in English, but each and every
meaning of the word is related to Telugu language. So this type of tweet is not converted.
3.4. Training and testing
Bag-of-words vector is created after the pre-processing and translations by using the counter
vectorization. Data set splits into 7:3 ratio out of that 70% is for the train data, and 30% is for the test data.
By using machine learning algorithms [30], the training and testing process is implemented. The ML
algorithms are
- Multinomial Naïve Bayes (MNB),
- Logistic Regression (LR),
- Support Vector Machines (SVM),
- Decision Tree (DT),
- k-Nearest Neighbour (kNN),
- Random Forest (RF).
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000
5998
Performance measures are computed for all the algorithms. Performance measures such as precision,
recall, f1-scores. Figure 4 to Figure 6 are used to represent the sentiment classification with pre-processes and
MLTSA algorithm.
Figure 4. Precision in pre-process vs. MLTSA classification
Figure 5. Recall in pre-process vs. MLTSA classification
Figure 6. F1-Score in pre-process vs. MLTSA classification
Int J Elec & Comp Eng ISSN: 2088-8708 
Multi-lingual twitter sentiment analysis using machine learning (K. Arun)
5999
In the Figures 4-6 the results show the comparison between pre-processed tweets and MLTSA
algorithm. In all the cases, MLTSA leads to good and accurate results. Precision, Recall and F1-scores are
the averages for the positive, negative, neutrals. MLTSA line is increased above the line of pre-processing.
Training and testing score accuracy in machine learning algorithms for SVM (support vector machine) is
95% and for RF (random forest), it is 93%.
4. CONCLUSION AND FUTURE WORK
In this paper, sentiments are extracted from the multilingual tweets. This algorithm provides
sufficient pre-processing techniques that were applied to set the dataset. Here totally 11 pre-processing
techniques were implemented to improve the accuracy in the sentiment analysis. Multi-lingual tweets deal
with multiple numbers of international or local languages. Python language translator module, which is
supported by the Google translator, is used to translate the no English tweets into English, and then
sentiments were extracted from the English data. MLTSA algorithm is a better pre-processing and language
translation technique. Machine learning algorithms are applied in the trained and test data. Sentiment
classifications are improved on the translated data. Machine learning algorithms are Multinomial naive bayes
(MNB), logistic regression (LR), support vector machine (SVM), decision tree (DT), K-nearest neighbor
(KNN), and random forest (RF) applied. The performance measures in machine learning are precision, recall,
and f1-score. Acceptable improvements were recorded after the translation and SVM is the best classifier in
multilingual twitter sentiment analysis. Accuracy is improved by up to 95%. In this work, some challenges
need to be solved in situations like for example, accuracy drops when the tweets contain code-mixed and
code-switched words and sentences.
REFERENCES
[1] J. Clement, “Twitter: number of active users 2010-2019,” Statista, Aug. 2019. [Online], Available:
https://www.statista.com/statistics/282087/number-of-monthly-activetwitter-users.
[2] E. Kouloumpis, et al., “Twitter Sentiment Analysis: The Good the Bad and the OMG!” in Proceedings of Fifth
International AAAI Conference on Weblogs and Social Media, pp. 538-541, 2011.
[3] K. Arun, et al., “Twitter Sentiment Analysis on Demonetization tweets in India Using R language,” International
Journal of Computer Engineering in Research Trends, vol. 4, no. 6, pp. 252-258, 2017.
[4] S. A. El Rahman, et al., “Sentiment Analysis of Twitter Data,” International Conference on Computer and
Information Sciences (ICCIS), Saudi Arabia, pp. 1-4, 2019.
[5] S. Kumar, et al., “Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning,”
Electronics, vol. 9, no. 2, pp. 1-14, 2020.
[6] H. Saif, et al., “Semantic sentiment analysis of twitter,” International Semantic Web Conference, pp. 508-524,
2012.
[7] S. Shayaa, et al., “Sentiment Analysis of Big Data: Methods, Applications, and Open Challenges,” in IEEE Access,
vol. 6, pp. 37807-37827, 2018.
[8] T. M. Nisar and M. Yeung, “Twitter as a Tool for Forecasting Stock Market Movements: A Short-window Event
Study,” The Journal of Finance and Data Science, vol. 4, no. 2, pp. 101-119, 2018.
[9] A. K. Soni, “Multi-Lingual Sentiment Analysis of Twitter data by using classification algorithms,” 2007 Second
International Conference on Electrical, Computer and Communication Technologies, pp. 1-5, 2017.
[10] Z. Jianqiang and G. Xiaolin, “Comparison Research on Text Pre-processing Methods on Twitter Sentiment
Analysis,” in IEEE Access, vol. 5, pp. 2870-2879, 2017.
[11] A. Deshwal and S. K. Sharma, “Twitter Sentiment Analysis using Various Classification Algorithms,” 2016 5th
International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)
(ICRITO), Noida, pp. 251-257, 2016.
[12] A. Balahur and M. Turchi, “Improving sentiment analysis in twitter using multilingual machine translated data,”
Proceedings of the International Conference Recent Advances in Natural Language Processing, pp. 49-55, 2013.
[13] S. K. Mahata, et al., “Analyzing Code-Switching Rules for English–Hindi Code-Mixed Text,” Emerging
Technology in Modelling and Graphics, pp. 137-145, 2020.
[14] K. Dashtipour, et al., “Multilingual Sentiment Analysis: State of the Art and Independent Comparison of
Techniques,” Cognitive Computation, vol. 8, pp. 757-771, 2016.
[15] I. Mozetič, et al., “Multilingual Twitter Sentiment Classification: The Role of Human Annotators,” PLOSONE,
vol. 11, no. 5, 2016.
[16] M. E. Basiri and A. Kabiri, “Uninorm operators for sentence-level score aggregation in sentiment analysis,”
4th International Conference on Web Research (ICWR), Tehran, pp. 97-102, 2018.
[17] S. Sahu, S. K. Rout and D. Mohanty, “Twitter Sentiment Analysis - A More Enhanced Way of Classification and
Scoring,” 2015 IEEE International Symposium on Nanoelectronic and Information Systems, pp. 67-72, 2015.
[18] L. Brönnimann, “Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians,”
M.Sc. Thesis, University of Applied Sciences Northwestern Switzerland, 2014.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000
6000
[19] N. F. Alshammari and A. A. Al Mansour, “State-of-the-art review on Twitter Sentiment Analysis,” 2nd
International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia,
pp. 1-8, 2019.
[20] A. Ghallab, et al., “Arabic Sentiment Analysis: A Systematic Literature Review,” Applied Computational
Intelligence and Soft Computing, pp. 1-21, 2020.
[21] M. Al-Amin, et al., “Sentiment Analysis of Bengali Comments with words2VCE and Sentiment Information of
Words,” 2017 International Conference on Electrical, computer and Communication Engineering (ECCE),
pp. 186-190, 2017.
[22] S. Narr, et al., “Language-Independent Twitter Sentiment Analysis,” Workshop on Knowledge Discovery, Data
Mining and Machine Learning (KDML 2012), 2012.
[23] Nankani H., et al., “Multilingual Sentiment Analysis,” in Agarwal B., et al. (eds), “Deep Learning-Based
Approaches for Sentiment Analysis,” Algorithms for Intelligent Systems, pp. 193-236, 2020.
[24] H. Saif, et al., “Alleviating data sparsity for Twitter sentiment analysis,” 2nd Workshop on Making Sense of
Microposts, pp. 2-9, 2012.
[25] Y. Wu, et al., “Google's Neural Machine Translation System: Bridging the Gap between Human and Machine
Translation,” arXiv: 1609.08144, 2016.
[26] Q. V. Le and M. Schuster, “A Neural Network for Machine Translation, at Production Scale,” Google AI Blog, Sep
2016. [Online], Available: https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html.
[27] S. Strassel, et al., “Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous
Language Exploitation,” Springer-Verlag, 2011.
[28] L. Specia, et al., “Quality Estimation for Machine Translation,” Morgan & Claypool Publishers, 2018.
[29] Christopher Olah, “Understanding LSTM Networks,” 2015. [Online], Available: https://colah.github.io/posts/2015-08-
Understanding-LSTMs/.
[30] Jesus Rogel-Salazar, “Data Science and Analytics with Python,” 1st.
ed., Chapman & Hall/CRC, 2017

More Related Content

What's hot

The sarcasm detection with the method of logistic regression
The sarcasm detection with the method of logistic regressionThe sarcasm detection with the method of logistic regression
The sarcasm detection with the method of logistic regressionEditorIJAERD
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATAParvathy Devaraj
 
Sarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysisSarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysisAnuj Gupta
 
Twitter sentiment analysis ppt
Twitter sentiment analysis pptTwitter sentiment analysis ppt
Twitter sentiment analysis pptSonuCreation
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...Geetika Gautam
 
Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Anil Shrestha
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysissneha penmetsa
 
Sentiment analyzer and opinion mining
Sentiment analyzer and opinion miningSentiment analyzer and opinion mining
Sentiment analyzer and opinion miningAnkush Mehta
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysisAshish Mundra
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter dataBhagyashree Deokar
 
Tweet sentiment analysis
Tweet sentiment analysisTweet sentiment analysis
Tweet sentiment analysisAnil Shrestha
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET Journal
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
 
295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysisZahid Azam
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276IJMER
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 
Sentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsSentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsRaimon Bosch
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmKhushboo Gupta
 
Pareto Type II Based Software Reliability Growth Model
Pareto Type II Based Software Reliability Growth ModelPareto Type II Based Software Reliability Growth Model
Pareto Type II Based Software Reliability Growth ModelWaqas Tariq
 

What's hot (20)

The sarcasm detection with the method of logistic regression
The sarcasm detection with the method of logistic regressionThe sarcasm detection with the method of logistic regression
The sarcasm detection with the method of logistic regression
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
Sarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysisSarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysis
 
Twitter sentiment analysis ppt
Twitter sentiment analysis pptTwitter sentiment analysis ppt
Twitter sentiment analysis ppt
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
 
Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
 
Sentiment analyzer and opinion mining
Sentiment analyzer and opinion miningSentiment analyzer and opinion mining
Sentiment analyzer and opinion mining
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysis
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter data
 
Tweet sentiment analysis
Tweet sentiment analysisTweet sentiment analysis
Tweet sentiment analysis
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment Analysis
 
295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Sentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsSentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-models
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
 
Pareto Type II Based Software Reliability Growth Model
Pareto Type II Based Software Reliability Growth ModelPareto Type II Based Software Reliability Growth Model
Pareto Type II Based Software Reliability Growth Model
 

Similar to Multi-lingual Twitter sentiment analysis using machine learning

A STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNING
A STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNINGA STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNING
A STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNINGIRJET Journal
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
A Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
A Survey on Analysis of Twitter Opinion Mining using Sentiment AnalysisA Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
A Survey on Analysis of Twitter Opinion Mining using Sentiment AnalysisIRJET Journal
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment AnalysisIRJET Journal
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...eSAT Journals
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...eSAT Journals
 
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...IRJET Journal
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental AnalysisIRJET Journal
 
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...IRJET Journal
 
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market InvestmentIRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market InvestmentIRJET Journal
 
Sentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningSentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningIRJET Journal
 
IRJET - Sentiment Analysis of Posts and Comments of OSN
IRJET -  	  Sentiment Analysis of Posts and Comments of OSNIRJET -  	  Sentiment Analysis of Posts and Comments of OSN
IRJET - Sentiment Analysis of Posts and Comments of OSNIRJET Journal
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
 
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm IJECEIAES
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
 
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET Journal
 
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...IRJET Journal
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine LearningIRJET Journal
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET Journal
 
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...IJECEIAES
 

Similar to Multi-lingual Twitter sentiment analysis using machine learning (20)

A STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNING
A STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNINGA STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNING
A STUDY ON TWITTER SENTIMENT ANALYSIS USING DEEP LEARNING
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
A Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
A Survey on Analysis of Twitter Opinion Mining using Sentiment AnalysisA Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
A Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...
 
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental Analysis
 
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
 
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market InvestmentIRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
 
Sentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningSentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine Learning
 
IRJET - Sentiment Analysis of Posts and Comments of OSN
IRJET -  	  Sentiment Analysis of Posts and Comments of OSNIRJET -  	  Sentiment Analysis of Posts and Comments of OSN
IRJET - Sentiment Analysis of Posts and Comments of OSN
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
 
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
 
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
 
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine Learning
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
 
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
 

More from IJECEIAES

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...IJECEIAES
 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionIJECEIAES
 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...IJECEIAES
 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...IJECEIAES
 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningIJECEIAES
 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...IJECEIAES
 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...IJECEIAES
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...IJECEIAES
 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...IJECEIAES
 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyIJECEIAES
 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationIJECEIAES
 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceIJECEIAES
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...IJECEIAES
 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...IJECEIAES
 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...IJECEIAES
 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...IJECEIAES
 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...IJECEIAES
 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...IJECEIAES
 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...IJECEIAES
 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...IJECEIAES
 

More from IJECEIAES (20)

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...
 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regression
 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learning
 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...
 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...
 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...
 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancy
 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimization
 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performance
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...
 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...
 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...
 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...
 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...
 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
 

Recently uploaded

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Multi-lingual Twitter sentiment analysis using machine learning

  • 1. International Journal of Electrical and Computer Engineering (IJECE) Vol. 10, No. 6, December 2020, pp. 5992~6000 ISSN: 2088-8708, DOI: 10.11591/ijece.v10i6.pp5992-6000  5992 Journal homepage: http://ijece.iaescore.com/index.php/IJECE Multi-lingual Twitter sentiment analysis using machine learning K. Arun1 , A. Srinagesh2 1 Department of Computer Science and Engineering, Acharya Nagarjuna University, India 2 Department of Computer Science and Engineering, Rayapati Venkata Ranga Rao and Jagarlamudi Chandramouli (R.V.R. and J.C.) College of Engineering, Acharya Nagarjuna University, India Article Info ABSTRACT Article history: Received Aug 2, 2019 Revised May 2, 2020 Accepted May 17, 2020 Twitter Sentiment Analysis is one of the leading research fields nowadays. Most of the researchers have contributed to the research in twitter sentiment analysis in English tweets, but few researchers have focused on the multilingual twitter sentiment analysis. Still, some more challenges are present and not yet addressed in the domain of multilingual twitter sentiment analysis (MLTSA). Research is highly warranted in these unexplored areas. This study presents the implementation of sentiment analysis in multilingual twitter data and improves the data classification up to the adequate level of accuracy. Twitter is the sixth leading social networking site in the world. Active users for twitter in a month are 330 million. People can tweet or retweet in their languages and allow users to use emoji’s, abbreviations, contraction words, misspellings, and shortcut words. The best platform for sentiment analysis is twitter. Multilingual tweets and data sparsity are the two main challenges. In this paper, the MLTSA algorithm gives the solution for these two challenges. MLTSA algorithm is divided into two parts. One for detecting and translating non-English tweets into English using natural language processing (NLP) and the second one is an appropriate pre-processing method with NLP support that can reduce the data sparsity. The result of the MLTSA with SVM achieves good accuracy by up to 95%. Keywords: Machine learning algorithms Multilingual NLP Twitter sentiment analysis Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved. Corresponding Author: K. Arun, Department of Computer Science Engineering, Acharya Nagarjuna University, NH16, Nagarjuna Nagar, Guntur, Andhra Pradesh, India. Email: karun014@gmail.com 1. INTRODUCTION Twitter is microblogging, which allows a maximum of 280 characters per tweet; twitter is one of the best social networks. Twitter has 330 million active users per month [1]. It provides a good platform to share user's opinions and views about the trending topics. Twitter has no local groups; any user can post tweets and has open access. Twitter provides a good environment for sentiment analysis [2]. Sentiment classification can be done on twitter data like positive, negative and neutral. Sentiment analysis can be applied to movie reviews, product reviews, market review analysis, twitter trending news [3], restaurant review analysis [4], age, gender sentiment classification [5], future predictions like election prediction, forecast applications [6, 7], and forecasting stock market movements [8]. Multilingual tweets can be seen in trending news in the twitter data. Extending the sentiment analysis into multilingual [9] tweets is very essential to improve the accuracy in sentiment analysis without losing any user tweets. The efficiency of this analysis was improved by the best pre-processing techniques [10]. Increasing the accuracy in sentiment analysis is a research challenge in big data. By reducing the data sparsity and translating the multilingual tweets accuracy can be increased [7]. In the sentiment analysis, pre-processing plays an important role [10, 11]. Mostly sentiment analysis is applied only on
  • 2. Int J Elec & Comp Eng ISSN: 2088-8708  Multi-lingual twitter sentiment analysis using machine learning (K. Arun) 5993 English tweets. In the pre-processing level non-English tweets are avoided, i.e. multilingual tweets are not considered in this analysis. On this occasion, the accuracy of the sentiment analysis gets to drop. Very valuable tweets and opinions of the public are not considered here. In twitter sentiment analysis, multilingual tweets must be translated into English tweets [12], and noise data like misspellings, emphasized words, code-mixed tweets need to be solved [13]. Multilingual twitter sentiment analysis is still prominent in reaserch area [14, 15]. In this context, analysis of multilingual tweets is not a direct process. The first step in this sequence is that it requires a translation from non-English tweets to the English language and the complementary steps are applied for the sentiment classification. Spelling mistakes, emphasized words, contraction words are called as data sparsity. Data sparsity can be reduced in pre-processing using NLP. In this way, the accuracy of the classification can be improved drastically through multilingual twitter sentiment analysis [16, 17]. In this paper, MLTSA (Algorithm-1) is a new proposed algorithm, which is used to define and solve the data sparsity by using efficient preprocessing methods and translation from non-English to English is by using Google translator. Finally, the sentiment analysis process has been applied in this algorithm. In this proposed system accuracy is increased up to 95%. Sentiment analysis for Swiss politician tweets; in this study tweets are available in multiple national languages like Swiss. The language was translated from Swiss to English with the help of machine learning and a lexicon-based approach [18]. Norah Fahad Alshammari [19] Sentiment analysis classification for twitter data from Arabic to English with deep learning model. A complete review of Arabic sentiment analysis is given by Abdullatif Ghallab [20]. Md. Al-Amin improves the accuracy of the Bengali comments up to 75.5%, with the combination of word2vec and sentiment extraction of words. This approach performance is not compromising in increasing the size of the dataset [21]. Narr, Presents multilingual tweets sentiment analysis with the language independent classification approach. Here four languages were translated using the Amazon Mechanical Turk such as English, German, French, and Portuguese. Language tweets are independently translated and classified on human-annotated tweets with good accuracy [22]. Nankani H., Dutta H focuses on sentiment analysis of various low resources multi-languages [23] having limited sentiment analysis resources such as annotated datasets, word embeddings and sentiment lexicons, along with the English language. 2. RESEARCH METHOD Multilingual twitter sentiment analysis (MLTSA) contains the following steps. - Pre-processing. - Language translation for each multi-language tweets. These two methods are very clearly defined in the flowchart Figure 1 and algorithm as follows. Figure 1. Flowchart for multilingual twitter sentiment analysis
  • 3.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000 5994 Algorithm 1. Multilingual twitter sentiment analysis (MLTSA) ALGORITHM: - “Multilingual Twitter Sentiment Analysis (MLTSA)” Purpose: - “Prediction of sentiment Analysis from the Multilingual and sparsity of tweets”. Input: - Tweets ᴦ, Tweet_language T_L, Sentiment_Lexicon L Output: - Tweet_Sentiment TS {P, Neg, NT}, Sentiment_Score SS Where P: Positive, Neg: Negative, and Nt: Neutral. Initialize: - POS,NEG, and NT=0,E=”English Language” 1. While Tokenize Tweets ᴦ into Tweet(ti) 2. Remove RT, Numbers, URL, Hyperlinks from ti 3. if T_L(ti) ≠ English then 4. Each ti∈ ᴦ is Tokenize into word set Wi 5. //translating into English using Google Translator(GNMT)// 6. ti=P(Ei|Wi)= ∏ 𝑃𝑛 𝑖=1 (Ei|E0,E1,E3…Ei-1;W1,W2,W3…Wi) 7. Tokenize ti ∈ ᴦ into word set Wi 8. Remove stop words, punctuation symbols and special symbols from Wi 9. if contraction words in Wi then 10. Replace with the complete word. 11. else if emphasized words in Wi then 12. Replace with the proper word. 13. //Lexicon values from the Lexicon Dictionary for each word// 14. Search for Wi in L 15. if Wi ∈ L.POS then 16. POSPOS+L.val 17. else if Wi∈ L.NEG then 18. NEGNEG+L.val 19. else 20. NTNT+1 21. // Tweets Sentiments and Sentiments scores calculation // 22. if POS>|NEG| then 23. TS=P, SS=POS|(POS+NEG) 24. else if POS<|NEG| then 25. TS=Neg, SS=NEG|(POS+NEG) 26. else 27. TS=Nt 28. End 2.1. Pre-processing In multi-lingual twitter sentiment analysis, pre-processing plays an important key role. The pre-processing step is used to prepare the raw data into suitable data for analysis. The basic method of this process is cleaning and replacing the required data. The pre-processing method directly affects the accuracy and efficiency of sentiment analysis. Here some of the pre-processing steps are used for the best analysis. In the Figure 1 and Algorithm 1 defines many steps in pre-processing. - Remove “RT”: In the twitter data set “RT” re-tweets are common, but it is not required for the processing, so it is removed from the data. - Remove numeric values: Numbers are not required in this analysis part, so numbers are also removed from the tweets in the data set. - Remove Isolated words: These words have no significant impact on sentiment analysis. Such as {i, a, y} which are single characters. - Remove undefined language tweets from the data set: In the twitter data set, every tweet is written in a specific language. But some of the tweets are posted in the undefined language, this type of tweets will not support for the language translations and sentiment analysis. So it is better to avoid undefined language tweets. - Remove punctuations and special symbols: Punctuation symbols do not affect the analysis; punctuations are removed from each and every tweet.The symbols are [!"#$%&'()*+,-./:;<=>?@[]^_`{|}~], not only these symbols some more special symbols are available regularly that are used in the tweets such as [:“ ‘ ’…], these are also removed from the tweets. - Remove hyperlinks: Hyperlinks do not affect the analysis; from each tweet, hyperlinks are removed. - Remove Tags: there are different types of tags used in the tweets such as @, and &. This will not show any effect on the analysis. - Remove Stop words: There is One Hundred and Seventy-Nine (179) list of English language stop words that were supported in the NLTK. All these are used in regular English sentences, but there is no effect in the sentiment analysis so, it is better to remove the stop words. In the stop word list, some of the negative words such as { 'nor', 'not', }has an effect in the sentiment classification, so these words must be excluded from the stop word list, otherwise negative tweets will be classified into positive.
  • 4. Int J Elec & Comp Eng ISSN: 2088-8708  Multi-lingual twitter sentiment analysis using machine learning (K. Arun) 5995 - Replace the contraction words: Contraction words are commonly used in the tweets. The contraction words are {I’ll, don’t, I’m, won’t, this’ll, etc}. But this type of contraction words are not directly processed, it needs to be replaced with the proper words instead of contraction words. Like {I’llI will, don’tdo not, I’m I am, won’t would not, this’llthis will, ..etc} - Spelling corrections: More than 10% of tweets contain misspellings, which decreases the efficiency of classification [24]. Here every tweet and every word in the tweet is verified. If any spelling mistakes occur, that misspellings are corrected by using dictionary-based which is proposed by Peter Norvig [9]. - Replace the emphasized words: Emphasized words are used for expressive words and it is very common to express the feelings in the text form. These words are not supported from any dictionary, with the help of NLTK, regular expressions and spelling corrections can be replaced with proper words, {‘cooooooool’’cool’,’perrrrrrrrrrrfffffect’’perfect’,’gooooood’’good’,’hooooot’’hot’,’toooooo’ ’too’}. 2.2. Language translation Language translation is used to translate non-English language tweets into English tweets. The language of the tweets is specified in the "lang" attribute along with the "text". Three types of labels are defining the language of tweets. The labels are 1) English, 2) Non-English, 3) Undefined. Example "en", "hi", "und" in the Figure 1 and Algorithm 1 explains the translation procedure. If the ’lang' type is 'und', These types of tweets provide hyperlinks, videos, stickers, or any graphics but not the textual information and there is no use in the sentiment analysis process. If the language is in the 'Non-English' type, it requires English translation. According to the given algorithm (MLTSA), Algorithm 1 uses one most Google translator that is GNMT(google neural machine translator). This GNMT [25-28] uses the LSTM RNN (long-short term memory recurrent neural networks) [29]. This GNMT architecture divides into four parts 1) Decoder LSTMs, 2) Encoder LSTMs, 3) Attention, 4) Softmax. Encoder LSTMs encodes the given input set of words from the source sentence into fixed-sized vectors in its hidden layers, Decoder LSTMs and Softmax can decode into target language words symbols, and attention provides the generated target word set and communication between encoder and decoder hidden layers. In this process, Twitter dataset ᴦ is tokenized into single tweets ti, and each tweet has its language, the Language of ti is 'eng' or 'und' no need of translation, otherwise, translation is needed for any source language into English only. In this process ti tokenized into words Wi. V1, V2, V3, ..... Vn = EncoderLSTM(W1, W2, W3, ..... Wn) (1) where Vn is the word symbol vector is the output of the encoder LSTM hidden layers. Then decoderLMST and Softmax modules can make it into English word vector by using the conditional probability model. P(E|V) = P(E| V1, V2, V3, ..... Vn ) (2) = ∏ Pn i=1 (Ei|E0,E1,E3…Ei-1; V1, V2, V3, ..... Vn ) (3) where E0 is the special symbol as the beginning of the sentence in any target language. Ei translated word vector. This system can do the translation task for the valid inputs. If Wi is a non-English word and which is defined in a dictionary of a language, that word(s) will go for translation. But in other cases where Wi is non-English but lexically English, translation is not possible. 3. RESULTS 3.1. Data set The data set is collected from the Twitter API. The topic that has been accessed from Twitter live stream is the honourable CM, he is the Andhra Pradesh honourable CM, and 200 tweets were collected. Tweets are in local languages, and mostly in English. The rest of the tweets are in Indian local languages such as Hindi, and Telugu. which is in the Figure 2. Initially, each tweet contains 69 attributes, such as used_id,user_str_id, text, language, country, out of this process it requires only two attributes {text,lang}. Here text contains tweets, re-tweets and lang is referred to as the language of each tweet. The dimensionality of the data set was reduced up to two features.
  • 5.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000 5996 One of the new feature added to the data set is "sentiment" as a third feature. This sentiment attribute contains three labels i.e. "positive, negative, neutral". An auto label is assigned with the support of a lexicon-based approach. Translation of different languages into English is done using a python programming language with NLP support. Finally data set contains three attributes. In Figure 3, 65-80% of tweets are in the English language (en), and remaining are in the local language [te: Telugu, hi: Hindi, be: Bengali, ar: Arabic, ur: Urdu] along with the undefined(und) language. Undefined language tweets are not considered in this process. Figure 2. Sample data set from twitter live Figure 3. Multilingual tweets [English, Telugu, Hindi, und]
  • 6. Int J Elec & Comp Eng ISSN: 2088-8708  Multi-lingual twitter sentiment analysis using machine learning (K. Arun) 5997 3.2. Pre-processing The preprocessing method contains 11 steps as mentioned above. The data set contains a huge amount of noise data, even though the attributes were reduced up to three. The noise data must be cleared before analyzing it. Python language completely supports the pre-processing because of its rich set of packages. URL's, numbers, HTML tags, stop words, punctuation symbols, special symbols, and emoticons were removed from each and every tweet. Additionally, the pre-processing module replaces contraction words with actual words, Negation words with complete word and misspelling words were rectified. Finally, the two data sets were pre-processed very effectively. Sample pre-processing steps: Input-1: RT@ravi: Apple phone it is verrrry costlllllllllllllllllllllllly Output-1: apple phone costly. Emphasized words and spelling corrections in the tweets are verrrrry, costllllllllllllllly. These words can reduce the length like very, costly by using some of the regular expressions. The remaining corrections depend on the spelling corrections. 3.3. Language translation According to the specified algorithm MLTSA, different language (non-English) tweets are translated. This translation is supported by one of the python modules, and that module depends on the Google translator. In this algorithm, it contains 3 cases. 3.3.1. Translating non-English into English Tweet in Hindi: पुलवामा हमले पर बोले भारतीय बैडममिंटन टीम के राष्ट्रीय कोच पुलेला गोपीचिंद पाक के साथ ककसी तरह के Translated to English: Pullela Gopichand, national coach of the Indian badminton team, spoke on the Pulwama attack. Tweet in Bengali: RT @IamSourav_b: যখন আজকালকার জীবনমুখী বাাংলা চলিচিত্র বা গানে গালিগালাজ ব্যবহার করা হয় তখন তো @MyAnandaBazar এর গায়ে লাগেনা দেখি| কিন্তু… Translated to English: When we are now using the lyrics of the life-changing Bangla film or song, we do not see it Tweet in Telugu and Translation: 3.3.2. Translating mixed language words into English Translating into English: Country universal election noise there farming attached mine the problems judiciary his majestically light losing Varukirata. 3.3.3. Code-mixed words A tweet is typed in one language but the meaning is in other languages, this type of tweet is not getting translated(Code mixed). Tweet: “Antha Rajanna mahima Appatlo annagarini devudu ni ela chesaro ipudu ysr ni devudu cheyadam chusthunte ardamaipoth”. This tweet is written in English, but each and every meaning of the word is related to Telugu language. So this type of tweet is not converted. 3.4. Training and testing Bag-of-words vector is created after the pre-processing and translations by using the counter vectorization. Data set splits into 7:3 ratio out of that 70% is for the train data, and 30% is for the test data. By using machine learning algorithms [30], the training and testing process is implemented. The ML algorithms are - Multinomial Naïve Bayes (MNB), - Logistic Regression (LR), - Support Vector Machines (SVM), - Decision Tree (DT), - k-Nearest Neighbour (kNN), - Random Forest (RF).
  • 7.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000 5998 Performance measures are computed for all the algorithms. Performance measures such as precision, recall, f1-scores. Figure 4 to Figure 6 are used to represent the sentiment classification with pre-processes and MLTSA algorithm. Figure 4. Precision in pre-process vs. MLTSA classification Figure 5. Recall in pre-process vs. MLTSA classification Figure 6. F1-Score in pre-process vs. MLTSA classification
  • 8. Int J Elec & Comp Eng ISSN: 2088-8708  Multi-lingual twitter sentiment analysis using machine learning (K. Arun) 5999 In the Figures 4-6 the results show the comparison between pre-processed tweets and MLTSA algorithm. In all the cases, MLTSA leads to good and accurate results. Precision, Recall and F1-scores are the averages for the positive, negative, neutrals. MLTSA line is increased above the line of pre-processing. Training and testing score accuracy in machine learning algorithms for SVM (support vector machine) is 95% and for RF (random forest), it is 93%. 4. CONCLUSION AND FUTURE WORK In this paper, sentiments are extracted from the multilingual tweets. This algorithm provides sufficient pre-processing techniques that were applied to set the dataset. Here totally 11 pre-processing techniques were implemented to improve the accuracy in the sentiment analysis. Multi-lingual tweets deal with multiple numbers of international or local languages. Python language translator module, which is supported by the Google translator, is used to translate the no English tweets into English, and then sentiments were extracted from the English data. MLTSA algorithm is a better pre-processing and language translation technique. Machine learning algorithms are applied in the trained and test data. Sentiment classifications are improved on the translated data. Machine learning algorithms are Multinomial naive bayes (MNB), logistic regression (LR), support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN), and random forest (RF) applied. The performance measures in machine learning are precision, recall, and f1-score. Acceptable improvements were recorded after the translation and SVM is the best classifier in multilingual twitter sentiment analysis. Accuracy is improved by up to 95%. In this work, some challenges need to be solved in situations like for example, accuracy drops when the tweets contain code-mixed and code-switched words and sentences. REFERENCES [1] J. Clement, “Twitter: number of active users 2010-2019,” Statista, Aug. 2019. [Online], Available: https://www.statista.com/statistics/282087/number-of-monthly-activetwitter-users. [2] E. Kouloumpis, et al., “Twitter Sentiment Analysis: The Good the Bad and the OMG!” in Proceedings of Fifth International AAAI Conference on Weblogs and Social Media, pp. 538-541, 2011. [3] K. Arun, et al., “Twitter Sentiment Analysis on Demonetization tweets in India Using R language,” International Journal of Computer Engineering in Research Trends, vol. 4, no. 6, pp. 252-258, 2017. [4] S. A. El Rahman, et al., “Sentiment Analysis of Twitter Data,” International Conference on Computer and Information Sciences (ICCIS), Saudi Arabia, pp. 1-4, 2019. [5] S. Kumar, et al., “Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning,” Electronics, vol. 9, no. 2, pp. 1-14, 2020. [6] H. Saif, et al., “Semantic sentiment analysis of twitter,” International Semantic Web Conference, pp. 508-524, 2012. [7] S. Shayaa, et al., “Sentiment Analysis of Big Data: Methods, Applications, and Open Challenges,” in IEEE Access, vol. 6, pp. 37807-37827, 2018. [8] T. M. Nisar and M. Yeung, “Twitter as a Tool for Forecasting Stock Market Movements: A Short-window Event Study,” The Journal of Finance and Data Science, vol. 4, no. 2, pp. 101-119, 2018. [9] A. K. Soni, “Multi-Lingual Sentiment Analysis of Twitter data by using classification algorithms,” 2007 Second International Conference on Electrical, Computer and Communication Technologies, pp. 1-5, 2017. [10] Z. Jianqiang and G. Xiaolin, “Comparison Research on Text Pre-processing Methods on Twitter Sentiment Analysis,” in IEEE Access, vol. 5, pp. 2870-2879, 2017. [11] A. Deshwal and S. K. Sharma, “Twitter Sentiment Analysis using Various Classification Algorithms,” 2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, pp. 251-257, 2016. [12] A. Balahur and M. Turchi, “Improving sentiment analysis in twitter using multilingual machine translated data,” Proceedings of the International Conference Recent Advances in Natural Language Processing, pp. 49-55, 2013. [13] S. K. Mahata, et al., “Analyzing Code-Switching Rules for English–Hindi Code-Mixed Text,” Emerging Technology in Modelling and Graphics, pp. 137-145, 2020. [14] K. Dashtipour, et al., “Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques,” Cognitive Computation, vol. 8, pp. 757-771, 2016. [15] I. Mozetič, et al., “Multilingual Twitter Sentiment Classification: The Role of Human Annotators,” PLOSONE, vol. 11, no. 5, 2016. [16] M. E. Basiri and A. Kabiri, “Uninorm operators for sentence-level score aggregation in sentiment analysis,” 4th International Conference on Web Research (ICWR), Tehran, pp. 97-102, 2018. [17] S. Sahu, S. K. Rout and D. Mohanty, “Twitter Sentiment Analysis - A More Enhanced Way of Classification and Scoring,” 2015 IEEE International Symposium on Nanoelectronic and Information Systems, pp. 67-72, 2015. [18] L. Brönnimann, “Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians,” M.Sc. Thesis, University of Applied Sciences Northwestern Switzerland, 2014.
  • 9.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 10, No. 6, December 2020 : 5992 - 6000 6000 [19] N. F. Alshammari and A. A. Al Mansour, “State-of-the-art review on Twitter Sentiment Analysis,” 2nd International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, pp. 1-8, 2019. [20] A. Ghallab, et al., “Arabic Sentiment Analysis: A Systematic Literature Review,” Applied Computational Intelligence and Soft Computing, pp. 1-21, 2020. [21] M. Al-Amin, et al., “Sentiment Analysis of Bengali Comments with words2VCE and Sentiment Information of Words,” 2017 International Conference on Electrical, computer and Communication Engineering (ECCE), pp. 186-190, 2017. [22] S. Narr, et al., “Language-Independent Twitter Sentiment Analysis,” Workshop on Knowledge Discovery, Data Mining and Machine Learning (KDML 2012), 2012. [23] Nankani H., et al., “Multilingual Sentiment Analysis,” in Agarwal B., et al. (eds), “Deep Learning-Based Approaches for Sentiment Analysis,” Algorithms for Intelligent Systems, pp. 193-236, 2020. [24] H. Saif, et al., “Alleviating data sparsity for Twitter sentiment analysis,” 2nd Workshop on Making Sense of Microposts, pp. 2-9, 2012. [25] Y. Wu, et al., “Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” arXiv: 1609.08144, 2016. [26] Q. V. Le and M. Schuster, “A Neural Network for Machine Translation, at Production Scale,” Google AI Blog, Sep 2016. [Online], Available: https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html. [27] S. Strassel, et al., “Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation,” Springer-Verlag, 2011. [28] L. Specia, et al., “Quality Estimation for Machine Translation,” Morgan & Claypool Publishers, 2018. [29] Christopher Olah, “Understanding LSTM Networks,” 2015. [Online], Available: https://colah.github.io/posts/2015-08- Understanding-LSTMs/. [30] Jesus Rogel-Salazar, “Data Science and Analytics with Python,” 1st. ed., Chapman & Hall/CRC, 2017