SlideShare a Scribd company logo
1 of 18
Download to read offline
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 1
Selangor, Malaysia. Organized by https://worldconferences.net
BENCHMARKING SUPERVISED LEARNING MODELS
FOR SENTIMENT ANALYSIS
Ang Jia Pheng, Duc Nghia Pham, Ong Hong Hoe
Artificial Intelligence Laboratory
MIMOS Berhad
{jp.ang, nghia.pham, hh.ong} @mimos.my
ABSTRACT
Sentiment analysis is an effective method to extract meaningful information from sentence and
documents. While pursuing performance of a model, we also need to take other factors such as
time, hardware and manpower into consideration. Beside algorithm, the characteristic and
feature of textual dataset also affect the performance of a model. In this paper, we benchmark
the performance of 1 deep learning model (Bidirectional long short-term memory – BiLSTM),
and 1 machine learning model (FastText) using 3 different datasets: binary class document
level Amazon Product Reviews dataset, binary class sentence level Sentiment140 Twitter
Sentiment dataset, and ternary class sentence level financial domain-specific dataset. Since
FastText does not support GPU for computation, this study ran on CPU to benchmark the
speed and performance. Our results show that BiLSTM performed better than FastText in all
tasks, with the cost of at least 15000% longer time for training process under the same machine
specification.
Field of Research: sentiment analysis, machine learning, natural language processing.
----------------------------------------------------------------------------------------------------------------
1. Introduction
With the advancement of technology and the increment of internet population, the volume of
data being generated daily is staggering and the increment has yet to reach an end. According
to ‘Data Never Sleeps 6.0’ infographic [1], “Over 2.5 quintillion bytes of data are created every
single day, and it’s only going to grow from there. By 2020, it’s estimated that 1.7MB of data
will be created every second for every person on earth”. Sentiment analysis is an efficient
method for extracting useful information from textual data. Organizations such as governments
and businesses have started to utilize sentiment analysis for decision making [2]. To generate
a classifier with high performance, benchmarking is necessary to find the most suitable
algorithm with low resource consumption.
In this paper, three benchmarking textual datasets were prepared: (i) Amazon Product Reviews
dataset [11], (ii) Sentiment140 Twitter Sentiment dataset [10], and (iii) self-collected financial
domain-specific dataset. We benchmarked 1 deep learning model: Bidirectional long short-
term memory (BiLSTM) [15], and 1 linear machine learning model: FastText [12][13]. For
BiLSTM model, we applied the default settings for the benchmark. FastText, which does not
have default setting, we configured the training hyper-parameters mimicking the hyper-
parameters of the BiLSTM model. Due to the fast computation speed of FastText, we were able
to train several models using different epochs, n-grams, and learning rates to benchmark the
changes in performance.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 2
Selangor, Malaysia. Organized by https://worldconferences.net
2. Background
2.1 Sentiment Analysis
Sentiment analysis, also known as opinion mining, is a process of extracting opinion from
textual data by building a system to analyze and identify the polarity of the text. Sentiment
analysis, a Natural Language Processing (NLP) task, recently gained popularity [3] due to its
practicality in terms of extracting direct information and usable feedback from the target
audience.
A sentiment analysis model will intake data from a source, analyze the data normally using a
pre-trained machine learning model, and produce an output to categorize the data into
predetermined classes. The sentiment classes are usually binary (‘positive’ or ‘negative’), or
ternary (‘positive’, ‘neutral’, and ‘negative’). In addition, the model can further categorize input
text into classes with degrees (such as ‘slightly positive’) and aspect-based information. For
aspect-based classification, instead of categorizing a text into polarity classes, an extra
categorization will be performed to determine the aspect of a topic/domain mentioned in the
text. For example, instead of only categorizing the following review of a restaurant: “the waiter
of that restaurant is very rude” as ‘negative’, an aspect-based classifier will further indicate that
the negativity came from the aspect ‘waiter’. Although aspect-based classification generally
requires more computing resources, it often have a lower error margin than general sentiment
classification as a usable aspect-based result requires higher precision accuracy [4]. In the other
hand, aspect-based classification provides a pinpoint information where decisions can be made
with minimal resource costs.
Due to the usefulness and practicality of sentiment analysis, may research works have been
devoted to this topic. The regular challenges and competitions, such as Kaggle [5] and the
International Workshop on Semantic Evaluation (SemEval) [6], also help facilitate works on
this topic. Indeed, many high-quality models were introduced during these events.
2.2 Sentence-level and Document-level Supervised Learning Sentiment Analysis
Dataset from different sources has different characteristics and features. Unlike formally written
text documents from sources like mainstream news, data originated from microblogs such as
Twitter, even with the recent higher message limit of 280 characters, normally have a median
message length of no more than 50 characters [7]. Micro-statuses like Twitter tweets are often
very short, and contain short hands, buzzword, emoji, and other short symbols that represent
long text. Depending on their characteristics, different datasets may require different methods
or settings to achieve the desired accuracy.
In this benchmark, we evaluate sentiment analysis models on both document-level and sentence-
level sentiment analysis. While a sentence-level analysis focuses on determining the sentiment
of each sentence, the sentiment of a document-level analysis is determined by the overall
sentiment of a paragraph that contains more than 1 sentence. Both sentiment analyses are
performed with supervision, meaning all training data must be labelled. For each data entry
input into the model, the machine will identify the pattern and produce a prediction which will
be used to compare with the label. When there is an error or mismatch between the prediction
and the actual result, the model will adjust its internal logic to improve its performance.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 3
Selangor, Malaysia. Organized by https://worldconferences.net
3. Datasets
In this study, we prepared three datasets: (i) self-collected financial domain-specific dataset,
(ii) Sentiment140 Twitter Sentiment dataset, and (iii) Amazon Product Reviews dataset. The
details of these datasets are discussed below.
3.1 Self-collected financial domain-specific dataset
Table 1: Label distribution for self-collected financial domain-specific dataset
Labels Size (%)
Positive 3040 (26.95%)
Neutral 4454 (39.49%)
Negative 3786 (33.56%)
We collected 11,280 data entries by crawling local (Malaysia) reputable websites and blogs,
such as Sin Chew Daily [8] to collect financial-related news, and blog posts. After initial
processing (segmentation and tagging), the dataset has an average of 11 words per sentence.
As the data’s source is mainly news, the content of the sentence is usually straightforward in
expressing a matter, grammatically correct, and have consistent sentence structure.
The crawled raw data are often in paragraphs and we manually segmented the paragraphs into
sentence. The segmentation is done to ensure that each sentence refers to only one subject or
one topic. Table 2 shows an example sentence mentions 4 companies and each has a different
sentiment. The sentence was then manually segmented into 4 entries for clear sentiment
representation. After each batch of segmentation, duplicates were removed to reduce bias in
the dataset.
Table 2: The comparison between original sentence and post segmentation sentence for financial dataset
Original Sentence CIMB fell four sen to RM5.98, Hong Leong Bank 10 sen lower at
RM20.62, Public Bank two sen to RM24.98 and Maybank unchanged
at RM9.60.
After
segmentation
 CIMB fell four sen to RM5.98,
 Hong Leong Bank 10 sen lower at RM20.62,
 Public Bank two sen to RM24.98
 and Maybank unchanged at RM9.60.
Each sentence is manually labeled with sentiment from the perspective of Malaysia and
company level. If a sentence involves Malaysia and other countries, the decision to label the
sentence will be made based on how it would affect Malaysia. Sentences that are beneficial to
companies, even though they might imply negativity on the consumer or citizen, the sentences
will be labelled as positive. For sentence that implies changes but without context or the context
it is referring is in another sentence, the sentence will be labelled as neutral.
Table 3 displays an example sentence showcasing the difficulties when dealing with financial
data containing many numbers, unique symbols, domain-specific terms, and domain-specific
short hand that carries solid sentiment representation. Unlike the other 2 datasets, the sentiment
of this dataset often depending on comparing 2 numbers in the sentence to indicated increment
or decrement, which directly implies positive or negative polarity. Option to remove
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 4
Selangor, Malaysia. Organized by https://worldconferences.net
punctuation during usual data preprocessing becomes a dilemma, as we fear that by removing
punctuation and symbol, the sentence will lose its sentiment indicator such as ‘+’ and ‘-’, where
in stock related news, those symbols are the key indicators for the sentence to be correctly
classified. The method to represent numbers also becomes an issue. Financial dataset consists
of large amount of numbers with various format and combination. Our model are using word
vectors as embedding and does not recognize numbers. For example: ‘10’, ‘10.0’, and ‘10.00’
while indicating same value, the model will identify it as 3 different individual vocabulary and
assign respective vectors and representations. Besides causing unnecessarily large word vectors
that slows down the computational speed, different vectors on vocabs that bring same meaning
will cause unwanted bias. Using the same example mentioned above, if ‘10’ appears frequently
in ‘negative’ labelled sentences, when another sentence with ‘10’ as part of the sentence is fed
into the model, the model will have high bias towards ‘negative’ label during classification.
Table 3: Example of domain-specific sentence that hard to understand but indicate sentiment
Sentence: Maintain MP with cum/ex-TP of RM0.41/RM0.34, pegged to 0.4x PBV
Translation Maintain Market perform with cumulative/ex-Target Price of
RM0.41/RM0.34, pegged to 0.4 times of Price-to-Book Value.
Sentiment Positive
3.2 Sentiment140 [9] Sentiment Dataset
Table 4: Label distribution for Sentiment140 Sentiment dataset
Labels Size (%)
Positive 50000 (50%)
Negative 50000 (50%)
The dataset [10] was obtained from Kaggle, which originally consists of 1,499,999 entries with
50.5% positive tweets and 49.5% negative tweets from Twitter. Due to time limitation, 100000
sentences were used in this benchmark. For each polarity, sentences were sorted by length and
then alphabetically before selection. Once ordered, we selected 1 sentence as a benchmark
entry for every 13 sentences to ensure we included all sentences with various lengths and avoid
duplication. With this selection strategy, we minimized the probability of obtaining very similar
or duplicated sentences to reduce bias.
The dataset has an average of 13 words per sentence. Unlike the financial dataset, the nature of
this dataset is general and does not have specific aspect. The entries in the dataset often contain
repeated alphabets in a word to express emotion, Twitter specific function words, URLs, emoji,
emoticon, and buzzwords. An example from the dataset is shown in Table 5.
Table 5: Example sentence from Sentiment140 Twitter Sentiment Dataset
Trying to signup to @SerialSeb Agile talk... the register button takes you to the venue page!!!
http://tinyurl.com/nd7t4u #signupfail
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 5
Selangor, Malaysia. Organized by https://worldconferences.net
3.3 Amazon Product Reviews Dataset
Table 6: Label Distribution for Amazon Product Review Dataset
Labels Size (%)
Positive 50000 (50%)
Negative 50000 (50%)
The dataset [11] was downloaded from Kaggle, which originally consists of 3,600,000
balanced entries with 50 % positive and 50% negative product reviews from Amazon, written
by Amazon users who purchased a product from the site. The dataset was presented in multiple
sentences, where only less than ten entries in the dataset are made up of one sentence. The label
of each entry is being determined based on the overall sentiment of entry, instead of each
sentence. Due to time limitation, 100,000 sentences are being used in this benchmark. We use
the identical amount of entries of Sentiment140 Twitter Sentiment Dataset to limit the influence
of dataset set on the model training speed and performance.
To obtain the data, we use identical technique mentioned in section 3.2 (Sentiment140 Twitter
Sentiment Dataset). We ensure we have a balanced dataset and contains entries with different
length for a better representation of the original dataset.
After the selection of the 100,000 dataset, the dataset has an average of 78 words per sentence.
Due to the source of the dataset, Amazon Product Reviews dataset comes with higher than
usual word count that estimate will consume significantly more time to train the model. The
characteristics of the dataset is similar to Sentiment140 Twitter Sentiment Dataset, but without
Twitter function words and the length is significantly longer. The content from the dataset is
also more grammatically correct compared to Twitter Dataset as Amazon have larger character
limit than Twitter. Table 7 below shows an example entry from the Amazon Product Review
Dataset.
Table 7: Example sentence from Amazon Product Review Dataset
Must-Have for Children's Library: I ordered this book as part of a third grade curriculum for
my son. These Everyman titles are really wonderful-- beautiful binding, heavy pages,
amazing illustrations. I was not disappointed with this book. It is the kind of book we will
read again and again and hopefully pass on to the next generation. Hawthorne's rendition of
the myths and fables is classic and engaging, and Rackham's illustrations are worth the price
on their own.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 6
Selangor, Malaysia. Organized by https://worldconferences.net
4. Model Description
4.1 FastText
FastText is a text classification [12] and word representation [13] library developed by
Facebook AI Research (FAIR) team. FastText can produce supervised and unsupervised text
classification models and word representation. FastText provides 2 models for computing word
representation: skipgram and continuous bag-of-words (CBOW). For this benchmark, we
compared the performance of FastText using two word representations: its own generated word
representation and a controlled word representation from word2vec [14].
FastText is known for its short training time and high accuracy is that on par with Deep
Learning text classifiers. The creator of FastText claimed: “We can train FastText on more than
one billion words in less than ten minutes using a standard multicore CPU, and classify half a
million sentences among 312K classes in less than a minute.” [12]. For word embedding layer,
word vectors are being averaged and then input into a multinomial logistic regression for
training and classification. Hashing method was introduced to speed up the computational
process. To reduce computational complexity, user can specify the use of hierarchical softmax
to further increase the computation speed, by reducing computational complexity O(k*h) to
O(h*log(k)), where k is the number of classes and h is dimension of text representation. Figure
1 shows the basic architecture of FastText.
4.2 BiLSTM
For deep learning, we selected the BiLSTM with attention algorithm by Baziotis et al [15], due
to its 1st
place and 2nd
place rankings in two different tasks in the SemEval-2017 Task 4
competition. This model consists of two layers of bidirectional LSTM, which feed into an
attention layer where the attention mechanism assigns a weight to each word annotation, and
finally send into an output layer with output probability distribution over all classes. Figure 2
depicts the structure of this model.
Figure 1: Model architecture of FastText for a sentence with N ngram features x1, …, xN. The features are embedded and
averaged to form the hidden variable
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 7
Selangor, Malaysia. Organized by https://worldconferences.net
Figure 2: Structure of Bi-LSTM with attention developed by Baziotis.et al. [16]
5. Experiment Setup
5.1 Data Pre-processing
Nowadays, people used emoji, emoticons, special symbols, etc. to express sentiment. These
icons and symbols are quite easy for human to interpret the sentiment they carry, but pose a
great challenge for machine to comprehend. In this study, we used the text pre-processing
method implemented by Baziotis et al. [15] to resolve these icons and symbols, replace
inconsistent features with consistent annotation, and reduce the vocabulary size of the model
while still retaining useful information. In particular, our text preprocessor performs the
following steps: (i) tokenization (also handling emoji, dates, currency, time, acronyms,
censored words, and emphasized words), (ii) spell correction, (iii) word normalization, (iv)
segmentation, and (v) annotation. Table 8 shows an example of the output text after
preprocessing.
Table 8: Changes of raw text after applying text-pre-processor
Before @Carm823 I did!! It looks good! Im gonna twitpic it when its not so red.... Im
definitely sore today....
After <user> i did ! <repeated> it looks good ! im gonna twitpic it when its not so red .
<repeated> im definitely sore today . <repeated>
5.2 BiLSTM
We used the default settings for BiLSTM as in [15]. Table 9 summarizes the settings. We trained
BiLSTM on each dataset for 50 epochs where each epoch training stops when the validation
lost stops decreasing.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 8
Selangor, Malaysia. Organized by https://worldconferences.net
Table 9: Parameters for BiLSTM Model
Parameters Values
Epoch 50
Batch Size 50
Input max length (Financial, Twitter, Amazon) 100, 50,
150
Embedding Dimension 300
Embedding Gaussian Noise (σ) 0.2
Embedding Dropout 0.5
LSTM Dropout 0.25
L2 Regularization 0.0001
5.3 FastText
Table 9 shows the different settings for FastText ran in this benchmark.
Table 10: Combination of hyperparameters used in FastText model training
Name Dimension Epoch Learning Rate N-gram Word Vectors
E50LR0.05N1BV 300 50 0.05 1
BiLSTM
word vector
E50LR0.05N2BV 300 50 0.05 2
E50LR0.5N1BV 300 50 0.5 1
E25LR0.05N1BV 300 25 0.05 1
E50LR0.05N1FV 300 50 0.05 1
FastText
word vector
E50LR0.05N2FV 300 50 0.05 2
E50LR0.5N1FV 300 50 0.5 1
E25LR0.05N1FV 300 25 0.05 1
6. Experimental Result
All our benchmark tests were conducted on an Ubuntu 18.04 machine with an Intel(R) Xeon(R)
quad-core CPU E5-1603 @ 2.80GHz and 16GB of RAM. As FastText does not support GPU
acceleration, we ran both FastText and BiLSTM on CPU only to have a fair comparison in
terms of time and speed. However, it is expected that the training time and processing time of
these models will be significantly improved with GPU acceleration.
We ran both FastText and BiLSTM on the 3 original raw datasets described in Section 3 as
well as their corresponding pre-processed datasets. Each of these 6 (raw or pre-processed)
datasets was split into 80-10-10, i.e. 80% for training, 10% for validation and 10% for testing.
10-fold cross validation was applied to each run of FastText and the macro-average results
were reported.
Tables 11-13 show the results of BiLSTM and 8 variants of FastText on the raw and pre-
processed versions of our three datasets: Financial domain-specific dataset, Amazon product
review dataset, and Sentiment140 Twitter sentiment dataset. In these tables, ‘P’, ‘R’, and ‘F1’
stand for precision, recall and F1-score, respectively. ‘Time’ column recorded the total time in
seconds to train, validate and test. The best results are bolded. All results in a table are sorted
in descending order according to F1-score of runs on pre-processed datasets.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page 9
Selangor, Malaysia. Organized by https://worldconferences.net
Table 11: Results on Financial domain-specific dataset
Raw Text Pre-processed Text
Model P R F1 Time P R F1 Time
BiLSTM 0.8272 0.8215 0.8243 14948.86 0.8583 0.8541 0.8562 15107.60
E50LR0.05N2BV 0.8262 0.8262 0.8262 97.11 0.8561 0.8561 0.8561 96.57
E50LR0.05N2FV 0.8213 0.8213 0.8213 4.23 0.8520 0.8520 0.8520 4.41
E25LR0.05N1FV 0.8140 0.8140 0.8140 1.07 0.8396 0.8396 0.8396 1.02
E25LR0.05N1BV 0.8192 0.8192 0.8192 88.00 0.8337 0.8337 0.8337 87.61
E50LR0.05N1FV 0.8129 0.8129 0.8129 1.71 0.8318 0.8318 0.8318 1.72
E50LR0.05N1BV 0.8174 0.8174 0.8174 90.34 0.8311 0.8311 0.8311 89.88
E50LR0.5N1FV 0.8107 0.8107 0.8107 1.71 0.8293 0.8293 0.8293 1.72
E50LR0.5N1BV 0.8160 0.8160 0.8160 90.57 0.8283 0.8283 0.8283 90.01
Table 12: Results on Sentiment140 Twitter sentiment dataset
Raw Text Pre-processed Text
Model P R F1 Time P R F1 Time
BiLSTM 0.7325 0.7324 0.7324 44340.00 0.8179 0.8177 0.8178 44318.65
E25LR0.05N1BV 0.7437 0.7437 0.7437 92.87 0.7739 0.7739 0.7739 91.68
E50LR0.05N1FV 0.7353 0.7353 0.7353 9.89 0.7679 0.7679 0.7679 9.58
E50LR0.05N2BV 0.7572 0.7572 0.7572 112.86 0.7675 0.7675 0.7675 110.74
E50LR0.5N1BV 0.7307 0.7307 0.7307 98.86 0.7653 0.7653 0.7653 97.85
E50LR0.05N2FV 0.7546 0.7546 0.7546 18.87 0.7651 0.7651 0.7651 18.83
E50LR0.5N1FV 0.7324 0.7324 0.7324 9.89 0.7622 0.7622 0.7622 9.56
E50LR0.05N1BV 0.7336 0.7336 0.7336 98.82 0.7603 0.7603 0.7603 97.70
E25LR0.05N1FV 0.7459 0.7459 0.7459 5.86 0.7520 0.7520 0.7520 5.22
Table 13: Results on Amazon product review dataset
Raw Text Pre-processed Text
Model P R F1 Time P R F1 Time
BiLSTM 0.8813 0.8812 0.8812 198155.24 0.9467 0.9466 0.9466 177514.44
E50LR0.05N2BV 0.9016 0.9016 0.9016 191.00 0.9160 0.9160 0.9160 189.44
E50LR0.05N2FV 0.9009 0.9009 0.9009 93.82 0.9147 0.9147 0.9147 94.95
E25LR0.05N1BV 0.8822 0.8822 0.8822 110.32 0.8934 0.8934 0.8934 108.33
E25LR0.05N1FV 0.8820 0.8820 0.8820 23.67 0.8903 0.8903 0.8903 22.59
E50LR0.05N1BV 0.8790 0.8790 0.8790 130.88 0.8860 0.8860 0.8860 130.12
E50LR0.05N1FV 0.8784 0.8784 0.8784 42.58 0.8855 0.8855 0.8855 42.60
E50LR0.5N1BV 0.8769 0.8769 0.8769 131.47 0.8830 0.8830 0.8830 130.31
E50LR0.5N1FV 0.8762 0.8762 0.8762 43.63 0.8821 0.8821 0.8821 42.57
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
10
Selangor, Malaysia. Organized by https://worldconferences.net
6.1 Pre-processed Text vs Raw Text
Figures 3-5 plot the pairwise comparison of F1-scores of nine models on the raw and pre-
processed versions of our three datasets. The charts in these figures clearly show that text pre-
processing uniformly boosted the performance of all models across the three datasets.
Figure 3: Pairwise F1-score comparison on raw vs pre-processed Financial domain-specific dataset
Figure 4: Pairwise F1-score comparison on raw vs pre-processed Sentiment140 Twitter sentiment dataset
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
11
Selangor, Malaysia. Organized by https://worldconferences.net
Figure 5: Pairwise F1-score comparison on raw vs pre-processed Amazon product review dataset
Overall, text pre-processing boosted the BiLSTM’s F1-score by up to 8.54% on Sentiment140
Twitter sentiment dataset. Among the three datasets, Sentiment140 dataset benefited the most
from text pre-processing with an average of 2.96% increase of F1-score. On the Financial and
Amazon datasets, each of the nine models received an average boost of 2.18% and 1.55%,
respectively.
6.2 Formal Text vs Informal Text
In this section, we looked at the impact of formal and informal written text on the performance
of supervised learning models. Figures 6 and 7 plot the results of nine tested models, grouped
by benchmark datasets. It is worthy to note that Financial dataset consists of sentence-level
formal text, whereas the other two datasets consists of informal text. In particular, Sentiment140
dataset consists of sentence-level short tweets, and Amazon dataset consists of document-level
long paragraph reviews. Among the three, our supervised models performed the best on Amazon
dataset, followed by Financial dataset and Sentiment140 dataset at the last. This suggest that
informal short text poses a great challenge to supervised learning models. Indeed, the accuracy
of our tested models on this dataset was between 11.76% and 14.96% worse than their measures
on the other two datasets. Interestingly, BiLSTM was the worst performer on the original
Sentiment140 dataset, but received a significant performance boost from text-preprocessing and
became the best model on the same pre-processed set.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
12
Selangor, Malaysia. Organized by https://worldconferences.net
Figure 6: F1-score of Nine Models on Raw Datasets
Figure 7: F1-score of Nine Models on Pre-processed Datasets
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
13
Selangor, Malaysia. Organized by https://worldconferences.net
6.3 BiLSTM
Figure 8: Comparison of Precision & Recall Value of Datasets Preformed on BiLSTM
Figure 8 shows the precision and recall of BiLSTM across the 6 datasets: raw and pre-processed.
BiLSTM performed best on Amazon dataset, followed by Financial and Sentiment140 datasets.
This order is true for both raw and pre-processing cases.
Figure 9 plots the precision of BiLSTM over 50 epochs of training. On the Amazon and
Sentiment140 datasets, the precision loss stabilized after 10 epochs. However, this validation
loss on the Financial dataset remained fluctuated until the last epoch. In addition, this chart
provides further evidence for the advantage of text pre-processing: all three pre-processing
curves are smoother than their corresponding raw curves.
Figure 9: Performance of BiLSTM in precision over 50 epoch
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
14
Selangor, Malaysia. Organized by https://worldconferences.net
6.4 FastText variants
Figure 10: Pairwise F1-score comparison on raw dataset using BiLSTM and FastText word vectors
Figures 10 and 11 plot the pairwise comparison of the impact of different word embedding
models on FastText’s performance. Without pre-processing, BiLSTM’s word embedding model
produced better results on the Financial dataset whereas FastText’s word vector gave better
results on Sentiment140 dataset. However, the conclusion is reversed when applying text pre-
processing: FastText’s word embedding model gave better results on the Financial dataset but
lost to BiLSTM’s word model on the other two datasets.
Figure 11: Pairwise F1-score comparison on pre-processed dataset using BiLSTM and FastText word vectors
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
15
Selangor, Malaysia. Organized by https://worldconferences.net
Figure 12: Pairwise F1-score comparison on three datasets using different epoch settings
Figures 12-14 plot the pairwise comparison on the impact of settings epoch size, learning rate,
and n-gram to FastText model. These figures clearly show that setting a smaller epoch size (25),
lower learning rate (0.05), and bi-gram improved FastText performance across the three
datasets. Among these settings, bi-gram is the clear winner, providing the biggest boost;
following by small epoch setting. On the other hand, the improvements by setting a lower
learning rate were slight small.
Figure 13: Pairwise F1-score comparison on three datasets using different learning rate
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
16
Selangor, Malaysia. Organized by https://worldconferences.net
Figure 14: Pairwise F1-score comparison on three datasets using unigram and bi-gram
6.5 BiLSTM vs FastText
Tables 14-16 compare the results of BiLSTM and the best FastText variant across the three
datasets. On the Financial dataset, FastText is the better choice for sentiment analysis given its
superior fast processing time even though the difference in accuracy is marginal (0.01%).
On the other hand, BiLSTM achieved higher accuracy than FastText on the other two datasets:
Amazon product reviews and Sentiment140 tweets, with a difference in F1-score of 3% and
4.39%, respectively. It is the clear choice if accuracy is prioritized. However, FastText remains
a solid choice, especially if computing resources and time are limited.
Table 14: Comparison of BiLSTM and the best FastText model on Financial domain specific dataset.
BiLSTM FastText Difference
Precision 0.8583 0.8561 0.0022
Recall 0.8541 0.8561 -0.002
F1 Score 0.8562 0.8561 0.0001
Time (s) 15107.6 96.57 15011.03
Table 15: Comparison of BiLSTM and the best FastText model on Sentiment140 Twitter sentiment dataset
BiLSTM FastText Difference
Precision 0.8179 0.7739 0.044
Recall 0.8177 0.7739 0.0438
F1 Score 0.8178 0.7739 0.0439
Time(s) 44318.65 91.68 44226.97
Table 16: Comparison of BiLSTM and the best FastText model on Amazon product review dataset
BiLSTM FastText Difference
Precision 0.9467 0.916 0.0307
Recall 0.9466 0.916 0.0306
F1 Score 0.9466 0.916 0.0306
Time(s) 177514.4 189.44 177325
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
17
Selangor, Malaysia. Organized by https://worldconferences.net
7. Conclusions and Future Work
In conclusion, we found that BiLSTM produced higher accuracy than FastText across all three
benchmark datasets, with or without text-preprocessing. However, FastText remains a solid
choice for sentiment analysis, given its superior fast processing and the small accuracy
difference (up to 4%) in comparison to BiLSTM. It is expected that GPU acceleration can help
reduce the training time required by BiLSTM, making it a more practical choice to users. In
addition, we found that text pre-processing was necessary when dealing with textual data,
especially informal written text. We also observed that informal short text like tweets pose a
huge challenge to supervised learning models. In future, we plan to identify and evaluate
mechanism to improve the accuracy of sentiment analysis on informal written micro-blogs.
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON
ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND
COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi
Page
18
Selangor, Malaysia. Organized by https://worldconferences.net
References
[1] Domo. (2018, June 5). Data Never Sleeps 6.0 Infographic. Retrieved from Domo
Corporation Website: https://www.domo.com/learn/data-never-sleeps-6
[2] G. R. Brindha, B. S. (2012). Application of opinion mining technique in talent
management. 2012 - International Conference on Management Issues in Emerging
Economies
[3] Sentiment Analysis's interest over time. Retrieved from Google Trend:
https://trends.google.com/trends/explore?date=all&geo=US&q=%2Fm%2F0g57xn
[4] Villena-Román, J., Martínez-Cámara, E., García-Morera, J., & Zafra, S.M. (2015). TASS
2014 - The Challenge of Aspect-based Sentiment Analysis. Procesamiento del Lenguaje
Natural, 54, 61-68.
[5] Sentiment Analysis on Movie Reviews. (2015, March 01). Retrieved from Kaggle:
https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews
[6] SemEval-2017. (2017, Jan 09). Retrieved from Arabic Language Technologies Group
(ALT): http://alt.qcri.org/semeval2017/index.php?id=tasks
[7] Christian M. Alis. (2015). Quantifying Regional Differences in the Length of Twitter
Messages. (p. 2). United Kingdom: PLOS one.
[8] https://www.sinchew.com.my/business.html
[9] Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant
supervision. CS224N Project Report, Stanford, 1(2009), p.12.
[10] Sentiment140 dataset with 1.6 million tweets. (2017, September 14). Retrieved from
Kaggle Website: https://www.kaggle.com/kazanova/sentiment140
[11] McAuley, J. Amazon product data. Retrieved from
http://jmcauley.ucsd.edu/data/amazon/
[12] Joulin, A. a. (2016). Bag of Tricks for Efficient Text Classification. arXiv preprint
arXiv:1607.01759.
[13] Bojanowski, P. a. (2016). Enriching Word Vectors with Subword Information. arXiv
preprint arXiv:1607.04606
[14] Dean, T. M. (2013). Efficient Estimation of Word Representations in Vector Space.
Retrieved from http://arxiv.org/abs/1301.3781
[15] Baziotis, C. a. (2017). DataStories at SemEval-2017 Task 4: Deep LSTM with Attention
for Message-level and Topic-based Sentiment Analysis. Proceedings of the 11th
International Workshop on Semantic Evaluation (SemEval-2017). Vancouver, Canada:
Association for Computational Linguistics

More Related Content

What's hot

Blockchain secure biometric access systems (bsbas)
Blockchain secure biometric access systems (bsbas)Blockchain secure biometric access systems (bsbas)
Blockchain secure biometric access systems (bsbas)Conference Papers
 
The design and implementation of trade finance application based on hyperledg...
The design and implementation of trade finance application based on hyperledg...The design and implementation of trade finance application based on hyperledg...
The design and implementation of trade finance application based on hyperledg...Conference Papers
 
My harmony generating statistics from clinical text for monitoring clinical...
My harmony   generating statistics from clinical text for monitoring clinical...My harmony   generating statistics from clinical text for monitoring clinical...
My harmony generating statistics from clinical text for monitoring clinical...Conference Papers
 
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCEANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCEIAEME Publication
 
An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...IJECEIAES
 
Smart information desk system with voice assistant for universities
Smart information desk system with voice assistant for universities Smart information desk system with voice assistant for universities
Smart information desk system with voice assistant for universities IJECEIAES
 
CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008
CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008
CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008Journal For Research
 
An overview of internet of things
An overview of internet of thingsAn overview of internet of things
An overview of internet of thingsTELKOMNIKA JOURNAL
 
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...ijcsit
 
A Study on the Applications and Impact of Artificial Intelligence in E Commer...
A Study on the Applications and Impact of Artificial Intelligence in E Commer...A Study on the Applications and Impact of Artificial Intelligence in E Commer...
A Study on the Applications and Impact of Artificial Intelligence in E Commer...ijtsrd
 
IRJET - Development of Cloud System for IoT Applications
IRJET - Development of Cloud System for IoT ApplicationsIRJET - Development of Cloud System for IoT Applications
IRJET - Development of Cloud System for IoT ApplicationsIRJET Journal
 
RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...
RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...
RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...ijcsit
 
Securing Cloud Using Fog: A Review
Securing Cloud Using Fog: A ReviewSecuring Cloud Using Fog: A Review
Securing Cloud Using Fog: A ReviewIRJET Journal
 
Vijayananda Mohire ORCID (0000 0002-0647-4130)
Vijayananda Mohire ORCID (0000 0002-0647-4130)Vijayananda Mohire ORCID (0000 0002-0647-4130)
Vijayananda Mohire ORCID (0000 0002-0647-4130)Vijayananda Mohire
 
Privacy Preserving Mining in Code Profiling Data
Privacy Preserving Mining in Code Profiling DataPrivacy Preserving Mining in Code Profiling Data
Privacy Preserving Mining in Code Profiling DataDr. Amarjeet Singh
 
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...IJMIT JOURNAL
 
PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...
PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...
PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...AIRCC Publishing Corporation
 
Study of Resource Discovery trends in Internet of Things (IoT)
Study of Resource Discovery trends in Internet of Things (IoT)Study of Resource Discovery trends in Internet of Things (IoT)
Study of Resource Discovery trends in Internet of Things (IoT)Eswar Publications
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionJustluk Luk
 

What's hot (20)

Blockchain secure biometric access systems (bsbas)
Blockchain secure biometric access systems (bsbas)Blockchain secure biometric access systems (bsbas)
Blockchain secure biometric access systems (bsbas)
 
The design and implementation of trade finance application based on hyperledg...
The design and implementation of trade finance application based on hyperledg...The design and implementation of trade finance application based on hyperledg...
The design and implementation of trade finance application based on hyperledg...
 
My harmony generating statistics from clinical text for monitoring clinical...
My harmony   generating statistics from clinical text for monitoring clinical...My harmony   generating statistics from clinical text for monitoring clinical...
My harmony generating statistics from clinical text for monitoring clinical...
 
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCEANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
 
An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...
 
Smart information desk system with voice assistant for universities
Smart information desk system with voice assistant for universities Smart information desk system with voice assistant for universities
Smart information desk system with voice assistant for universities
 
CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008
CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008
CHATBOT FOR COLLEGE RELATED QUERIES | J4RV4I1008
 
An overview of internet of things
An overview of internet of thingsAn overview of internet of things
An overview of internet of things
 
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ...
 
A Study on the Applications and Impact of Artificial Intelligence in E Commer...
A Study on the Applications and Impact of Artificial Intelligence in E Commer...A Study on the Applications and Impact of Artificial Intelligence in E Commer...
A Study on the Applications and Impact of Artificial Intelligence in E Commer...
 
IRJET - Development of Cloud System for IoT Applications
IRJET - Development of Cloud System for IoT ApplicationsIRJET - Development of Cloud System for IoT Applications
IRJET - Development of Cloud System for IoT Applications
 
RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...
RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...
RECOMMENDATION GENERATION JUSTIFIED FOR INFORMATION ACCESS ASSISTANCE SERVICE...
 
Securing Cloud Using Fog: A Review
Securing Cloud Using Fog: A ReviewSecuring Cloud Using Fog: A Review
Securing Cloud Using Fog: A Review
 
Vijayananda Mohire ORCID (0000 0002-0647-4130)
Vijayananda Mohire ORCID (0000 0002-0647-4130)Vijayananda Mohire ORCID (0000 0002-0647-4130)
Vijayananda Mohire ORCID (0000 0002-0647-4130)
 
Privacy Preserving Mining in Code Profiling Data
Privacy Preserving Mining in Code Profiling DataPrivacy Preserving Mining in Code Profiling Data
Privacy Preserving Mining in Code Profiling Data
 
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...
 
PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...
PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...
PROVIDES AN APPROACH BASED ON ADAPTIVE FORWARDING AND LABEL SWITCHING TO IMPR...
 
Study of Resource Discovery trends in Internet of Things (IoT)
Study of Resource Discovery trends in Internet of Things (IoT)Study of Resource Discovery trends in Internet of Things (IoT)
Study of Resource Discovery trends in Internet of Things (IoT)
 
Scirj p1220830(3)
Scirj p1220830(3)Scirj p1220830(3)
Scirj p1220830(3)
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detection
 

Similar to Benchmarking supervised learning models for sentiment analysis

IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...IRJET Journal
 
Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...
Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...
Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...IRJET Journal
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
 
IRJET- Stock Market Prediction using Financial News Articles
IRJET- Stock Market Prediction using Financial News ArticlesIRJET- Stock Market Prediction using Financial News Articles
IRJET- Stock Market Prediction using Financial News ArticlesIRJET Journal
 
Sentiment Analysis in Social Media and Its Operations
Sentiment Analysis in Social Media and Its OperationsSentiment Analysis in Social Media and Its Operations
Sentiment Analysis in Social Media and Its OperationsIRJET Journal
 
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...IRJET Journal
 
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...IRJET Journal
 
FESCCO: Fuzzy Expert System for Career Counselling
FESCCO: Fuzzy Expert System for Career CounsellingFESCCO: Fuzzy Expert System for Career Counselling
FESCCO: Fuzzy Expert System for Career Counsellingrahulmonikasharma
 
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA TechniqueIRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA TechniqueIRJET Journal
 
Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013ijcsbi
 
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...IRJET Journal
 
Applications of machine learning
Applications of machine learningApplications of machine learning
Applications of machine learningbusiness Corporate
 
Product Analyst Advisor
Product Analyst AdvisorProduct Analyst Advisor
Product Analyst AdvisorIRJET Journal
 
Machine learning based recommender system for e-commerce
Machine learning based recommender system for e-commerceMachine learning based recommender system for e-commerce
Machine learning based recommender system for e-commerceIAESIJAI
 
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation TechniquesReview on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniquesijtsrd
 
Graph embedding approach to analyze sentiments on cryptocurrency
Graph embedding approach to analyze sentiments on cryptocurrencyGraph embedding approach to analyze sentiments on cryptocurrency
Graph embedding approach to analyze sentiments on cryptocurrencyIJECEIAES
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET Journal
 
A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...
A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...
A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...IRJET Journal
 
Building a recommendation system based on the job offers extracted from the w...
Building a recommendation system based on the job offers extracted from the w...Building a recommendation system based on the job offers extracted from the w...
Building a recommendation system based on the job offers extracted from the w...IJECEIAES
 

Similar to Benchmarking supervised learning models for sentiment analysis (20)

IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
 
Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...
Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...
Twitter Text Sentiment Analysis: A Comparative Study on Unigram and Bigram Fe...
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment Analysis
 
IRJET- Stock Market Prediction using Financial News Articles
IRJET- Stock Market Prediction using Financial News ArticlesIRJET- Stock Market Prediction using Financial News Articles
IRJET- Stock Market Prediction using Financial News Articles
 
Sentiment Analysis in Social Media and Its Operations
Sentiment Analysis in Social Media and Its OperationsSentiment Analysis in Social Media and Its Operations
Sentiment Analysis in Social Media and Its Operations
 
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
 
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
 
FESCCO: Fuzzy Expert System for Career Counselling
FESCCO: Fuzzy Expert System for Career CounsellingFESCCO: Fuzzy Expert System for Career Counselling
FESCCO: Fuzzy Expert System for Career Counselling
 
ML
ML ML
ML
 
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA TechniqueIRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
 
Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013
 
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
 
Applications of machine learning
Applications of machine learningApplications of machine learning
Applications of machine learning
 
Product Analyst Advisor
Product Analyst AdvisorProduct Analyst Advisor
Product Analyst Advisor
 
Machine learning based recommender system for e-commerce
Machine learning based recommender system for e-commerceMachine learning based recommender system for e-commerce
Machine learning based recommender system for e-commerce
 
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation TechniquesReview on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
 
Graph embedding approach to analyze sentiments on cryptocurrency
Graph embedding approach to analyze sentiments on cryptocurrencyGraph embedding approach to analyze sentiments on cryptocurrency
Graph embedding approach to analyze sentiments on cryptocurrency
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
 
A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...
A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...
A Intensified Approach On Enhanced Transformer Based Models Using Natural Lan...
 
Building a recommendation system based on the job offers extracted from the w...
Building a recommendation system based on the job offers extracted from the w...Building a recommendation system based on the job offers extracted from the w...
Building a recommendation system based on the job offers extracted from the w...
 

More from Conference Papers

Ai driven occupational skills generator
Ai driven occupational skills generatorAi driven occupational skills generator
Ai driven occupational skills generatorConference Papers
 
Advanced resource allocation and service level monitoring for container orche...
Advanced resource allocation and service level monitoring for container orche...Advanced resource allocation and service level monitoring for container orche...
Advanced resource allocation and service level monitoring for container orche...Conference Papers
 
Adaptive authentication to determine login attempt penalty from multiple inpu...
Adaptive authentication to determine login attempt penalty from multiple inpu...Adaptive authentication to determine login attempt penalty from multiple inpu...
Adaptive authentication to determine login attempt penalty from multiple inpu...Conference Papers
 
Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...
Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...
Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...Conference Papers
 
A deployment scenario a taxonomy mapping and keyword searching for the appl...
A deployment scenario   a taxonomy mapping and keyword searching for the appl...A deployment scenario   a taxonomy mapping and keyword searching for the appl...
A deployment scenario a taxonomy mapping and keyword searching for the appl...Conference Papers
 
Automated snomed ct mapping of clinical discharge summary data for cardiology...
Automated snomed ct mapping of clinical discharge summary data for cardiology...Automated snomed ct mapping of clinical discharge summary data for cardiology...
Automated snomed ct mapping of clinical discharge summary data for cardiology...Conference Papers
 
Automated login method selection in a multi modal authentication - login meth...
Automated login method selection in a multi modal authentication - login meth...Automated login method selection in a multi modal authentication - login meth...
Automated login method selection in a multi modal authentication - login meth...Conference Papers
 
Atomization of reduced graphene oxide ultra thin film for transparent electro...
Atomization of reduced graphene oxide ultra thin film for transparent electro...Atomization of reduced graphene oxide ultra thin film for transparent electro...
Atomization of reduced graphene oxide ultra thin film for transparent electro...Conference Papers
 
An enhanced wireless presentation system for large scale content distribution
An enhanced wireless presentation system for large scale content distribution An enhanced wireless presentation system for large scale content distribution
An enhanced wireless presentation system for large scale content distribution Conference Papers
 
An analysis of a large scale wireless image distribution system deployment
An analysis of a large scale wireless image distribution system deploymentAn analysis of a large scale wireless image distribution system deployment
An analysis of a large scale wireless image distribution system deploymentConference Papers
 
Validation of early testing method for e government projects by requirement ...
Validation of early testing method for e  government projects by requirement ...Validation of early testing method for e  government projects by requirement ...
Validation of early testing method for e government projects by requirement ...Conference Papers
 
Towards predictive maintenance for marine sector in malaysia
Towards predictive maintenance for marine sector in malaysiaTowards predictive maintenance for marine sector in malaysia
Towards predictive maintenance for marine sector in malaysiaConference Papers
 
The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...
The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...
The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...Conference Papers
 
Searchable symmetric encryption security definitions
Searchable symmetric encryption security definitionsSearchable symmetric encryption security definitions
Searchable symmetric encryption security definitionsConference Papers
 
Super convergence of autonomous things
Super convergence of autonomous thingsSuper convergence of autonomous things
Super convergence of autonomous thingsConference Papers
 
Study on performance of capacitor less ldo with different types of resistor
Study on performance of capacitor less ldo with different types of resistorStudy on performance of capacitor less ldo with different types of resistor
Study on performance of capacitor less ldo with different types of resistorConference Papers
 
Stil test pattern generation enhancement in mixed signal design
Stil test pattern generation enhancement in mixed signal designStil test pattern generation enhancement in mixed signal design
Stil test pattern generation enhancement in mixed signal designConference Papers
 
On premise ai platform - from dc to edge
On premise ai platform - from dc to edgeOn premise ai platform - from dc to edge
On premise ai platform - from dc to edgeConference Papers
 
Review of big data analytics (bda) architecture trends and analysis
Review of big data analytics (bda) architecture   trends and analysis Review of big data analytics (bda) architecture   trends and analysis
Review of big data analytics (bda) architecture trends and analysis Conference Papers
 
Rapid reduction of ultrathin films of graphene oxide on large area silicon su...
Rapid reduction of ultrathin films of graphene oxide on large area silicon su...Rapid reduction of ultrathin films of graphene oxide on large area silicon su...
Rapid reduction of ultrathin films of graphene oxide on large area silicon su...Conference Papers
 

More from Conference Papers (20)

Ai driven occupational skills generator
Ai driven occupational skills generatorAi driven occupational skills generator
Ai driven occupational skills generator
 
Advanced resource allocation and service level monitoring for container orche...
Advanced resource allocation and service level monitoring for container orche...Advanced resource allocation and service level monitoring for container orche...
Advanced resource allocation and service level monitoring for container orche...
 
Adaptive authentication to determine login attempt penalty from multiple inpu...
Adaptive authentication to determine login attempt penalty from multiple inpu...Adaptive authentication to determine login attempt penalty from multiple inpu...
Adaptive authentication to determine login attempt penalty from multiple inpu...
 
Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...
Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...
Absorption spectrum analysis of dentine sialophosphoprotein (dspp) in orthodo...
 
A deployment scenario a taxonomy mapping and keyword searching for the appl...
A deployment scenario   a taxonomy mapping and keyword searching for the appl...A deployment scenario   a taxonomy mapping and keyword searching for the appl...
A deployment scenario a taxonomy mapping and keyword searching for the appl...
 
Automated snomed ct mapping of clinical discharge summary data for cardiology...
Automated snomed ct mapping of clinical discharge summary data for cardiology...Automated snomed ct mapping of clinical discharge summary data for cardiology...
Automated snomed ct mapping of clinical discharge summary data for cardiology...
 
Automated login method selection in a multi modal authentication - login meth...
Automated login method selection in a multi modal authentication - login meth...Automated login method selection in a multi modal authentication - login meth...
Automated login method selection in a multi modal authentication - login meth...
 
Atomization of reduced graphene oxide ultra thin film for transparent electro...
Atomization of reduced graphene oxide ultra thin film for transparent electro...Atomization of reduced graphene oxide ultra thin film for transparent electro...
Atomization of reduced graphene oxide ultra thin film for transparent electro...
 
An enhanced wireless presentation system for large scale content distribution
An enhanced wireless presentation system for large scale content distribution An enhanced wireless presentation system for large scale content distribution
An enhanced wireless presentation system for large scale content distribution
 
An analysis of a large scale wireless image distribution system deployment
An analysis of a large scale wireless image distribution system deploymentAn analysis of a large scale wireless image distribution system deployment
An analysis of a large scale wireless image distribution system deployment
 
Validation of early testing method for e government projects by requirement ...
Validation of early testing method for e  government projects by requirement ...Validation of early testing method for e  government projects by requirement ...
Validation of early testing method for e government projects by requirement ...
 
Towards predictive maintenance for marine sector in malaysia
Towards predictive maintenance for marine sector in malaysiaTowards predictive maintenance for marine sector in malaysia
Towards predictive maintenance for marine sector in malaysia
 
The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...
The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...
The new leaed (ii) ion selective electrode on free plasticizer film of pthfa ...
 
Searchable symmetric encryption security definitions
Searchable symmetric encryption security definitionsSearchable symmetric encryption security definitions
Searchable symmetric encryption security definitions
 
Super convergence of autonomous things
Super convergence of autonomous thingsSuper convergence of autonomous things
Super convergence of autonomous things
 
Study on performance of capacitor less ldo with different types of resistor
Study on performance of capacitor less ldo with different types of resistorStudy on performance of capacitor less ldo with different types of resistor
Study on performance of capacitor less ldo with different types of resistor
 
Stil test pattern generation enhancement in mixed signal design
Stil test pattern generation enhancement in mixed signal designStil test pattern generation enhancement in mixed signal design
Stil test pattern generation enhancement in mixed signal design
 
On premise ai platform - from dc to edge
On premise ai platform - from dc to edgeOn premise ai platform - from dc to edge
On premise ai platform - from dc to edge
 
Review of big data analytics (bda) architecture trends and analysis
Review of big data analytics (bda) architecture   trends and analysis Review of big data analytics (bda) architecture   trends and analysis
Review of big data analytics (bda) architecture trends and analysis
 
Rapid reduction of ultrathin films of graphene oxide on large area silicon su...
Rapid reduction of ultrathin films of graphene oxide on large area silicon su...Rapid reduction of ultrathin films of graphene oxide on large area silicon su...
Rapid reduction of ultrathin films of graphene oxide on large area silicon su...
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Benchmarking supervised learning models for sentiment analysis

  • 1. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 1 Selangor, Malaysia. Organized by https://worldconferences.net BENCHMARKING SUPERVISED LEARNING MODELS FOR SENTIMENT ANALYSIS Ang Jia Pheng, Duc Nghia Pham, Ong Hong Hoe Artificial Intelligence Laboratory MIMOS Berhad {jp.ang, nghia.pham, hh.ong} @mimos.my ABSTRACT Sentiment analysis is an effective method to extract meaningful information from sentence and documents. While pursuing performance of a model, we also need to take other factors such as time, hardware and manpower into consideration. Beside algorithm, the characteristic and feature of textual dataset also affect the performance of a model. In this paper, we benchmark the performance of 1 deep learning model (Bidirectional long short-term memory – BiLSTM), and 1 machine learning model (FastText) using 3 different datasets: binary class document level Amazon Product Reviews dataset, binary class sentence level Sentiment140 Twitter Sentiment dataset, and ternary class sentence level financial domain-specific dataset. Since FastText does not support GPU for computation, this study ran on CPU to benchmark the speed and performance. Our results show that BiLSTM performed better than FastText in all tasks, with the cost of at least 15000% longer time for training process under the same machine specification. Field of Research: sentiment analysis, machine learning, natural language processing. ---------------------------------------------------------------------------------------------------------------- 1. Introduction With the advancement of technology and the increment of internet population, the volume of data being generated daily is staggering and the increment has yet to reach an end. According to ‘Data Never Sleeps 6.0’ infographic [1], “Over 2.5 quintillion bytes of data are created every single day, and it’s only going to grow from there. By 2020, it’s estimated that 1.7MB of data will be created every second for every person on earth”. Sentiment analysis is an efficient method for extracting useful information from textual data. Organizations such as governments and businesses have started to utilize sentiment analysis for decision making [2]. To generate a classifier with high performance, benchmarking is necessary to find the most suitable algorithm with low resource consumption. In this paper, three benchmarking textual datasets were prepared: (i) Amazon Product Reviews dataset [11], (ii) Sentiment140 Twitter Sentiment dataset [10], and (iii) self-collected financial domain-specific dataset. We benchmarked 1 deep learning model: Bidirectional long short- term memory (BiLSTM) [15], and 1 linear machine learning model: FastText [12][13]. For BiLSTM model, we applied the default settings for the benchmark. FastText, which does not have default setting, we configured the training hyper-parameters mimicking the hyper- parameters of the BiLSTM model. Due to the fast computation speed of FastText, we were able to train several models using different epochs, n-grams, and learning rates to benchmark the changes in performance.
  • 2. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 2 Selangor, Malaysia. Organized by https://worldconferences.net 2. Background 2.1 Sentiment Analysis Sentiment analysis, also known as opinion mining, is a process of extracting opinion from textual data by building a system to analyze and identify the polarity of the text. Sentiment analysis, a Natural Language Processing (NLP) task, recently gained popularity [3] due to its practicality in terms of extracting direct information and usable feedback from the target audience. A sentiment analysis model will intake data from a source, analyze the data normally using a pre-trained machine learning model, and produce an output to categorize the data into predetermined classes. The sentiment classes are usually binary (‘positive’ or ‘negative’), or ternary (‘positive’, ‘neutral’, and ‘negative’). In addition, the model can further categorize input text into classes with degrees (such as ‘slightly positive’) and aspect-based information. For aspect-based classification, instead of categorizing a text into polarity classes, an extra categorization will be performed to determine the aspect of a topic/domain mentioned in the text. For example, instead of only categorizing the following review of a restaurant: “the waiter of that restaurant is very rude” as ‘negative’, an aspect-based classifier will further indicate that the negativity came from the aspect ‘waiter’. Although aspect-based classification generally requires more computing resources, it often have a lower error margin than general sentiment classification as a usable aspect-based result requires higher precision accuracy [4]. In the other hand, aspect-based classification provides a pinpoint information where decisions can be made with minimal resource costs. Due to the usefulness and practicality of sentiment analysis, may research works have been devoted to this topic. The regular challenges and competitions, such as Kaggle [5] and the International Workshop on Semantic Evaluation (SemEval) [6], also help facilitate works on this topic. Indeed, many high-quality models were introduced during these events. 2.2 Sentence-level and Document-level Supervised Learning Sentiment Analysis Dataset from different sources has different characteristics and features. Unlike formally written text documents from sources like mainstream news, data originated from microblogs such as Twitter, even with the recent higher message limit of 280 characters, normally have a median message length of no more than 50 characters [7]. Micro-statuses like Twitter tweets are often very short, and contain short hands, buzzword, emoji, and other short symbols that represent long text. Depending on their characteristics, different datasets may require different methods or settings to achieve the desired accuracy. In this benchmark, we evaluate sentiment analysis models on both document-level and sentence- level sentiment analysis. While a sentence-level analysis focuses on determining the sentiment of each sentence, the sentiment of a document-level analysis is determined by the overall sentiment of a paragraph that contains more than 1 sentence. Both sentiment analyses are performed with supervision, meaning all training data must be labelled. For each data entry input into the model, the machine will identify the pattern and produce a prediction which will be used to compare with the label. When there is an error or mismatch between the prediction and the actual result, the model will adjust its internal logic to improve its performance.
  • 3. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 3 Selangor, Malaysia. Organized by https://worldconferences.net 3. Datasets In this study, we prepared three datasets: (i) self-collected financial domain-specific dataset, (ii) Sentiment140 Twitter Sentiment dataset, and (iii) Amazon Product Reviews dataset. The details of these datasets are discussed below. 3.1 Self-collected financial domain-specific dataset Table 1: Label distribution for self-collected financial domain-specific dataset Labels Size (%) Positive 3040 (26.95%) Neutral 4454 (39.49%) Negative 3786 (33.56%) We collected 11,280 data entries by crawling local (Malaysia) reputable websites and blogs, such as Sin Chew Daily [8] to collect financial-related news, and blog posts. After initial processing (segmentation and tagging), the dataset has an average of 11 words per sentence. As the data’s source is mainly news, the content of the sentence is usually straightforward in expressing a matter, grammatically correct, and have consistent sentence structure. The crawled raw data are often in paragraphs and we manually segmented the paragraphs into sentence. The segmentation is done to ensure that each sentence refers to only one subject or one topic. Table 2 shows an example sentence mentions 4 companies and each has a different sentiment. The sentence was then manually segmented into 4 entries for clear sentiment representation. After each batch of segmentation, duplicates were removed to reduce bias in the dataset. Table 2: The comparison between original sentence and post segmentation sentence for financial dataset Original Sentence CIMB fell four sen to RM5.98, Hong Leong Bank 10 sen lower at RM20.62, Public Bank two sen to RM24.98 and Maybank unchanged at RM9.60. After segmentation  CIMB fell four sen to RM5.98,  Hong Leong Bank 10 sen lower at RM20.62,  Public Bank two sen to RM24.98  and Maybank unchanged at RM9.60. Each sentence is manually labeled with sentiment from the perspective of Malaysia and company level. If a sentence involves Malaysia and other countries, the decision to label the sentence will be made based on how it would affect Malaysia. Sentences that are beneficial to companies, even though they might imply negativity on the consumer or citizen, the sentences will be labelled as positive. For sentence that implies changes but without context or the context it is referring is in another sentence, the sentence will be labelled as neutral. Table 3 displays an example sentence showcasing the difficulties when dealing with financial data containing many numbers, unique symbols, domain-specific terms, and domain-specific short hand that carries solid sentiment representation. Unlike the other 2 datasets, the sentiment of this dataset often depending on comparing 2 numbers in the sentence to indicated increment or decrement, which directly implies positive or negative polarity. Option to remove
  • 4. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 4 Selangor, Malaysia. Organized by https://worldconferences.net punctuation during usual data preprocessing becomes a dilemma, as we fear that by removing punctuation and symbol, the sentence will lose its sentiment indicator such as ‘+’ and ‘-’, where in stock related news, those symbols are the key indicators for the sentence to be correctly classified. The method to represent numbers also becomes an issue. Financial dataset consists of large amount of numbers with various format and combination. Our model are using word vectors as embedding and does not recognize numbers. For example: ‘10’, ‘10.0’, and ‘10.00’ while indicating same value, the model will identify it as 3 different individual vocabulary and assign respective vectors and representations. Besides causing unnecessarily large word vectors that slows down the computational speed, different vectors on vocabs that bring same meaning will cause unwanted bias. Using the same example mentioned above, if ‘10’ appears frequently in ‘negative’ labelled sentences, when another sentence with ‘10’ as part of the sentence is fed into the model, the model will have high bias towards ‘negative’ label during classification. Table 3: Example of domain-specific sentence that hard to understand but indicate sentiment Sentence: Maintain MP with cum/ex-TP of RM0.41/RM0.34, pegged to 0.4x PBV Translation Maintain Market perform with cumulative/ex-Target Price of RM0.41/RM0.34, pegged to 0.4 times of Price-to-Book Value. Sentiment Positive 3.2 Sentiment140 [9] Sentiment Dataset Table 4: Label distribution for Sentiment140 Sentiment dataset Labels Size (%) Positive 50000 (50%) Negative 50000 (50%) The dataset [10] was obtained from Kaggle, which originally consists of 1,499,999 entries with 50.5% positive tweets and 49.5% negative tweets from Twitter. Due to time limitation, 100000 sentences were used in this benchmark. For each polarity, sentences were sorted by length and then alphabetically before selection. Once ordered, we selected 1 sentence as a benchmark entry for every 13 sentences to ensure we included all sentences with various lengths and avoid duplication. With this selection strategy, we minimized the probability of obtaining very similar or duplicated sentences to reduce bias. The dataset has an average of 13 words per sentence. Unlike the financial dataset, the nature of this dataset is general and does not have specific aspect. The entries in the dataset often contain repeated alphabets in a word to express emotion, Twitter specific function words, URLs, emoji, emoticon, and buzzwords. An example from the dataset is shown in Table 5. Table 5: Example sentence from Sentiment140 Twitter Sentiment Dataset Trying to signup to @SerialSeb Agile talk... the register button takes you to the venue page!!! http://tinyurl.com/nd7t4u #signupfail
  • 5. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 5 Selangor, Malaysia. Organized by https://worldconferences.net 3.3 Amazon Product Reviews Dataset Table 6: Label Distribution for Amazon Product Review Dataset Labels Size (%) Positive 50000 (50%) Negative 50000 (50%) The dataset [11] was downloaded from Kaggle, which originally consists of 3,600,000 balanced entries with 50 % positive and 50% negative product reviews from Amazon, written by Amazon users who purchased a product from the site. The dataset was presented in multiple sentences, where only less than ten entries in the dataset are made up of one sentence. The label of each entry is being determined based on the overall sentiment of entry, instead of each sentence. Due to time limitation, 100,000 sentences are being used in this benchmark. We use the identical amount of entries of Sentiment140 Twitter Sentiment Dataset to limit the influence of dataset set on the model training speed and performance. To obtain the data, we use identical technique mentioned in section 3.2 (Sentiment140 Twitter Sentiment Dataset). We ensure we have a balanced dataset and contains entries with different length for a better representation of the original dataset. After the selection of the 100,000 dataset, the dataset has an average of 78 words per sentence. Due to the source of the dataset, Amazon Product Reviews dataset comes with higher than usual word count that estimate will consume significantly more time to train the model. The characteristics of the dataset is similar to Sentiment140 Twitter Sentiment Dataset, but without Twitter function words and the length is significantly longer. The content from the dataset is also more grammatically correct compared to Twitter Dataset as Amazon have larger character limit than Twitter. Table 7 below shows an example entry from the Amazon Product Review Dataset. Table 7: Example sentence from Amazon Product Review Dataset Must-Have for Children's Library: I ordered this book as part of a third grade curriculum for my son. These Everyman titles are really wonderful-- beautiful binding, heavy pages, amazing illustrations. I was not disappointed with this book. It is the kind of book we will read again and again and hopefully pass on to the next generation. Hawthorne's rendition of the myths and fables is classic and engaging, and Rackham's illustrations are worth the price on their own.
  • 6. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 6 Selangor, Malaysia. Organized by https://worldconferences.net 4. Model Description 4.1 FastText FastText is a text classification [12] and word representation [13] library developed by Facebook AI Research (FAIR) team. FastText can produce supervised and unsupervised text classification models and word representation. FastText provides 2 models for computing word representation: skipgram and continuous bag-of-words (CBOW). For this benchmark, we compared the performance of FastText using two word representations: its own generated word representation and a controlled word representation from word2vec [14]. FastText is known for its short training time and high accuracy is that on par with Deep Learning text classifiers. The creator of FastText claimed: “We can train FastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.” [12]. For word embedding layer, word vectors are being averaged and then input into a multinomial logistic regression for training and classification. Hashing method was introduced to speed up the computational process. To reduce computational complexity, user can specify the use of hierarchical softmax to further increase the computation speed, by reducing computational complexity O(k*h) to O(h*log(k)), where k is the number of classes and h is dimension of text representation. Figure 1 shows the basic architecture of FastText. 4.2 BiLSTM For deep learning, we selected the BiLSTM with attention algorithm by Baziotis et al [15], due to its 1st place and 2nd place rankings in two different tasks in the SemEval-2017 Task 4 competition. This model consists of two layers of bidirectional LSTM, which feed into an attention layer where the attention mechanism assigns a weight to each word annotation, and finally send into an output layer with output probability distribution over all classes. Figure 2 depicts the structure of this model. Figure 1: Model architecture of FastText for a sentence with N ngram features x1, …, xN. The features are embedded and averaged to form the hidden variable
  • 7. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 7 Selangor, Malaysia. Organized by https://worldconferences.net Figure 2: Structure of Bi-LSTM with attention developed by Baziotis.et al. [16] 5. Experiment Setup 5.1 Data Pre-processing Nowadays, people used emoji, emoticons, special symbols, etc. to express sentiment. These icons and symbols are quite easy for human to interpret the sentiment they carry, but pose a great challenge for machine to comprehend. In this study, we used the text pre-processing method implemented by Baziotis et al. [15] to resolve these icons and symbols, replace inconsistent features with consistent annotation, and reduce the vocabulary size of the model while still retaining useful information. In particular, our text preprocessor performs the following steps: (i) tokenization (also handling emoji, dates, currency, time, acronyms, censored words, and emphasized words), (ii) spell correction, (iii) word normalization, (iv) segmentation, and (v) annotation. Table 8 shows an example of the output text after preprocessing. Table 8: Changes of raw text after applying text-pre-processor Before @Carm823 I did!! It looks good! Im gonna twitpic it when its not so red.... Im definitely sore today.... After <user> i did ! <repeated> it looks good ! im gonna twitpic it when its not so red . <repeated> im definitely sore today . <repeated> 5.2 BiLSTM We used the default settings for BiLSTM as in [15]. Table 9 summarizes the settings. We trained BiLSTM on each dataset for 50 epochs where each epoch training stops when the validation lost stops decreasing.
  • 8. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 8 Selangor, Malaysia. Organized by https://worldconferences.net Table 9: Parameters for BiLSTM Model Parameters Values Epoch 50 Batch Size 50 Input max length (Financial, Twitter, Amazon) 100, 50, 150 Embedding Dimension 300 Embedding Gaussian Noise (σ) 0.2 Embedding Dropout 0.5 LSTM Dropout 0.25 L2 Regularization 0.0001 5.3 FastText Table 9 shows the different settings for FastText ran in this benchmark. Table 10: Combination of hyperparameters used in FastText model training Name Dimension Epoch Learning Rate N-gram Word Vectors E50LR0.05N1BV 300 50 0.05 1 BiLSTM word vector E50LR0.05N2BV 300 50 0.05 2 E50LR0.5N1BV 300 50 0.5 1 E25LR0.05N1BV 300 25 0.05 1 E50LR0.05N1FV 300 50 0.05 1 FastText word vector E50LR0.05N2FV 300 50 0.05 2 E50LR0.5N1FV 300 50 0.5 1 E25LR0.05N1FV 300 25 0.05 1 6. Experimental Result All our benchmark tests were conducted on an Ubuntu 18.04 machine with an Intel(R) Xeon(R) quad-core CPU E5-1603 @ 2.80GHz and 16GB of RAM. As FastText does not support GPU acceleration, we ran both FastText and BiLSTM on CPU only to have a fair comparison in terms of time and speed. However, it is expected that the training time and processing time of these models will be significantly improved with GPU acceleration. We ran both FastText and BiLSTM on the 3 original raw datasets described in Section 3 as well as their corresponding pre-processed datasets. Each of these 6 (raw or pre-processed) datasets was split into 80-10-10, i.e. 80% for training, 10% for validation and 10% for testing. 10-fold cross validation was applied to each run of FastText and the macro-average results were reported. Tables 11-13 show the results of BiLSTM and 8 variants of FastText on the raw and pre- processed versions of our three datasets: Financial domain-specific dataset, Amazon product review dataset, and Sentiment140 Twitter sentiment dataset. In these tables, ‘P’, ‘R’, and ‘F1’ stand for precision, recall and F1-score, respectively. ‘Time’ column recorded the total time in seconds to train, validate and test. The best results are bolded. All results in a table are sorted in descending order according to F1-score of runs on pre-processed datasets.
  • 9. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 9 Selangor, Malaysia. Organized by https://worldconferences.net Table 11: Results on Financial domain-specific dataset Raw Text Pre-processed Text Model P R F1 Time P R F1 Time BiLSTM 0.8272 0.8215 0.8243 14948.86 0.8583 0.8541 0.8562 15107.60 E50LR0.05N2BV 0.8262 0.8262 0.8262 97.11 0.8561 0.8561 0.8561 96.57 E50LR0.05N2FV 0.8213 0.8213 0.8213 4.23 0.8520 0.8520 0.8520 4.41 E25LR0.05N1FV 0.8140 0.8140 0.8140 1.07 0.8396 0.8396 0.8396 1.02 E25LR0.05N1BV 0.8192 0.8192 0.8192 88.00 0.8337 0.8337 0.8337 87.61 E50LR0.05N1FV 0.8129 0.8129 0.8129 1.71 0.8318 0.8318 0.8318 1.72 E50LR0.05N1BV 0.8174 0.8174 0.8174 90.34 0.8311 0.8311 0.8311 89.88 E50LR0.5N1FV 0.8107 0.8107 0.8107 1.71 0.8293 0.8293 0.8293 1.72 E50LR0.5N1BV 0.8160 0.8160 0.8160 90.57 0.8283 0.8283 0.8283 90.01 Table 12: Results on Sentiment140 Twitter sentiment dataset Raw Text Pre-processed Text Model P R F1 Time P R F1 Time BiLSTM 0.7325 0.7324 0.7324 44340.00 0.8179 0.8177 0.8178 44318.65 E25LR0.05N1BV 0.7437 0.7437 0.7437 92.87 0.7739 0.7739 0.7739 91.68 E50LR0.05N1FV 0.7353 0.7353 0.7353 9.89 0.7679 0.7679 0.7679 9.58 E50LR0.05N2BV 0.7572 0.7572 0.7572 112.86 0.7675 0.7675 0.7675 110.74 E50LR0.5N1BV 0.7307 0.7307 0.7307 98.86 0.7653 0.7653 0.7653 97.85 E50LR0.05N2FV 0.7546 0.7546 0.7546 18.87 0.7651 0.7651 0.7651 18.83 E50LR0.5N1FV 0.7324 0.7324 0.7324 9.89 0.7622 0.7622 0.7622 9.56 E50LR0.05N1BV 0.7336 0.7336 0.7336 98.82 0.7603 0.7603 0.7603 97.70 E25LR0.05N1FV 0.7459 0.7459 0.7459 5.86 0.7520 0.7520 0.7520 5.22 Table 13: Results on Amazon product review dataset Raw Text Pre-processed Text Model P R F1 Time P R F1 Time BiLSTM 0.8813 0.8812 0.8812 198155.24 0.9467 0.9466 0.9466 177514.44 E50LR0.05N2BV 0.9016 0.9016 0.9016 191.00 0.9160 0.9160 0.9160 189.44 E50LR0.05N2FV 0.9009 0.9009 0.9009 93.82 0.9147 0.9147 0.9147 94.95 E25LR0.05N1BV 0.8822 0.8822 0.8822 110.32 0.8934 0.8934 0.8934 108.33 E25LR0.05N1FV 0.8820 0.8820 0.8820 23.67 0.8903 0.8903 0.8903 22.59 E50LR0.05N1BV 0.8790 0.8790 0.8790 130.88 0.8860 0.8860 0.8860 130.12 E50LR0.05N1FV 0.8784 0.8784 0.8784 42.58 0.8855 0.8855 0.8855 42.60 E50LR0.5N1BV 0.8769 0.8769 0.8769 131.47 0.8830 0.8830 0.8830 130.31 E50LR0.5N1FV 0.8762 0.8762 0.8762 43.63 0.8821 0.8821 0.8821 42.57
  • 10. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 10 Selangor, Malaysia. Organized by https://worldconferences.net 6.1 Pre-processed Text vs Raw Text Figures 3-5 plot the pairwise comparison of F1-scores of nine models on the raw and pre- processed versions of our three datasets. The charts in these figures clearly show that text pre- processing uniformly boosted the performance of all models across the three datasets. Figure 3: Pairwise F1-score comparison on raw vs pre-processed Financial domain-specific dataset Figure 4: Pairwise F1-score comparison on raw vs pre-processed Sentiment140 Twitter sentiment dataset
  • 11. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 11 Selangor, Malaysia. Organized by https://worldconferences.net Figure 5: Pairwise F1-score comparison on raw vs pre-processed Amazon product review dataset Overall, text pre-processing boosted the BiLSTM’s F1-score by up to 8.54% on Sentiment140 Twitter sentiment dataset. Among the three datasets, Sentiment140 dataset benefited the most from text pre-processing with an average of 2.96% increase of F1-score. On the Financial and Amazon datasets, each of the nine models received an average boost of 2.18% and 1.55%, respectively. 6.2 Formal Text vs Informal Text In this section, we looked at the impact of formal and informal written text on the performance of supervised learning models. Figures 6 and 7 plot the results of nine tested models, grouped by benchmark datasets. It is worthy to note that Financial dataset consists of sentence-level formal text, whereas the other two datasets consists of informal text. In particular, Sentiment140 dataset consists of sentence-level short tweets, and Amazon dataset consists of document-level long paragraph reviews. Among the three, our supervised models performed the best on Amazon dataset, followed by Financial dataset and Sentiment140 dataset at the last. This suggest that informal short text poses a great challenge to supervised learning models. Indeed, the accuracy of our tested models on this dataset was between 11.76% and 14.96% worse than their measures on the other two datasets. Interestingly, BiLSTM was the worst performer on the original Sentiment140 dataset, but received a significant performance boost from text-preprocessing and became the best model on the same pre-processed set.
  • 12. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 12 Selangor, Malaysia. Organized by https://worldconferences.net Figure 6: F1-score of Nine Models on Raw Datasets Figure 7: F1-score of Nine Models on Pre-processed Datasets
  • 13. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 13 Selangor, Malaysia. Organized by https://worldconferences.net 6.3 BiLSTM Figure 8: Comparison of Precision & Recall Value of Datasets Preformed on BiLSTM Figure 8 shows the precision and recall of BiLSTM across the 6 datasets: raw and pre-processed. BiLSTM performed best on Amazon dataset, followed by Financial and Sentiment140 datasets. This order is true for both raw and pre-processing cases. Figure 9 plots the precision of BiLSTM over 50 epochs of training. On the Amazon and Sentiment140 datasets, the precision loss stabilized after 10 epochs. However, this validation loss on the Financial dataset remained fluctuated until the last epoch. In addition, this chart provides further evidence for the advantage of text pre-processing: all three pre-processing curves are smoother than their corresponding raw curves. Figure 9: Performance of BiLSTM in precision over 50 epoch
  • 14. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 14 Selangor, Malaysia. Organized by https://worldconferences.net 6.4 FastText variants Figure 10: Pairwise F1-score comparison on raw dataset using BiLSTM and FastText word vectors Figures 10 and 11 plot the pairwise comparison of the impact of different word embedding models on FastText’s performance. Without pre-processing, BiLSTM’s word embedding model produced better results on the Financial dataset whereas FastText’s word vector gave better results on Sentiment140 dataset. However, the conclusion is reversed when applying text pre- processing: FastText’s word embedding model gave better results on the Financial dataset but lost to BiLSTM’s word model on the other two datasets. Figure 11: Pairwise F1-score comparison on pre-processed dataset using BiLSTM and FastText word vectors
  • 15. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 15 Selangor, Malaysia. Organized by https://worldconferences.net Figure 12: Pairwise F1-score comparison on three datasets using different epoch settings Figures 12-14 plot the pairwise comparison on the impact of settings epoch size, learning rate, and n-gram to FastText model. These figures clearly show that setting a smaller epoch size (25), lower learning rate (0.05), and bi-gram improved FastText performance across the three datasets. Among these settings, bi-gram is the clear winner, providing the biggest boost; following by small epoch setting. On the other hand, the improvements by setting a lower learning rate were slight small. Figure 13: Pairwise F1-score comparison on three datasets using different learning rate
  • 16. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 16 Selangor, Malaysia. Organized by https://worldconferences.net Figure 14: Pairwise F1-score comparison on three datasets using unigram and bi-gram 6.5 BiLSTM vs FastText Tables 14-16 compare the results of BiLSTM and the best FastText variant across the three datasets. On the Financial dataset, FastText is the better choice for sentiment analysis given its superior fast processing time even though the difference in accuracy is marginal (0.01%). On the other hand, BiLSTM achieved higher accuracy than FastText on the other two datasets: Amazon product reviews and Sentiment140 tweets, with a difference in F1-score of 3% and 4.39%, respectively. It is the clear choice if accuracy is prioritized. However, FastText remains a solid choice, especially if computing resources and time are limited. Table 14: Comparison of BiLSTM and the best FastText model on Financial domain specific dataset. BiLSTM FastText Difference Precision 0.8583 0.8561 0.0022 Recall 0.8541 0.8561 -0.002 F1 Score 0.8562 0.8561 0.0001 Time (s) 15107.6 96.57 15011.03 Table 15: Comparison of BiLSTM and the best FastText model on Sentiment140 Twitter sentiment dataset BiLSTM FastText Difference Precision 0.8179 0.7739 0.044 Recall 0.8177 0.7739 0.0438 F1 Score 0.8178 0.7739 0.0439 Time(s) 44318.65 91.68 44226.97 Table 16: Comparison of BiLSTM and the best FastText model on Amazon product review dataset BiLSTM FastText Difference Precision 0.9467 0.916 0.0307 Recall 0.9466 0.916 0.0306 F1 Score 0.9466 0.916 0.0306 Time(s) 177514.4 189.44 177325
  • 17. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 17 Selangor, Malaysia. Organized by https://worldconferences.net 7. Conclusions and Future Work In conclusion, we found that BiLSTM produced higher accuracy than FastText across all three benchmark datasets, with or without text-preprocessing. However, FastText remains a solid choice for sentiment analysis, given its superior fast processing and the small accuracy difference (up to 4%) in comparison to BiLSTM. It is expected that GPU acceleration can help reduce the training time required by BiLSTM, making it a more practical choice to users. In addition, we found that text pre-processing was necessary when dealing with textual data, especially informal written text. We also observed that informal short text like tweets pose a huge challenge to supervised learning models. In future, we plan to identify and evaluate mechanism to improve the accuracy of sentiment analysis on informal written micro-blogs.
  • 18. E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE 2019 E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-xxxxx-x-x). 29 July 2019, Puri Pujangga Hotel, Bangi Page 18 Selangor, Malaysia. Organized by https://worldconferences.net References [1] Domo. (2018, June 5). Data Never Sleeps 6.0 Infographic. Retrieved from Domo Corporation Website: https://www.domo.com/learn/data-never-sleeps-6 [2] G. R. Brindha, B. S. (2012). Application of opinion mining technique in talent management. 2012 - International Conference on Management Issues in Emerging Economies [3] Sentiment Analysis's interest over time. Retrieved from Google Trend: https://trends.google.com/trends/explore?date=all&geo=US&q=%2Fm%2F0g57xn [4] Villena-Román, J., Martínez-Cámara, E., García-Morera, J., & Zafra, S.M. (2015). TASS 2014 - The Challenge of Aspect-based Sentiment Analysis. Procesamiento del Lenguaje Natural, 54, 61-68. [5] Sentiment Analysis on Movie Reviews. (2015, March 01). Retrieved from Kaggle: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews [6] SemEval-2017. (2017, Jan 09). Retrieved from Arabic Language Technologies Group (ALT): http://alt.qcri.org/semeval2017/index.php?id=tasks [7] Christian M. Alis. (2015). Quantifying Regional Differences in the Length of Twitter Messages. (p. 2). United Kingdom: PLOS one. [8] https://www.sinchew.com.my/business.html [9] Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12. [10] Sentiment140 dataset with 1.6 million tweets. (2017, September 14). Retrieved from Kaggle Website: https://www.kaggle.com/kazanova/sentiment140 [11] McAuley, J. Amazon product data. Retrieved from http://jmcauley.ucsd.edu/data/amazon/ [12] Joulin, A. a. (2016). Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759. [13] Bojanowski, P. a. (2016). Enriching Word Vectors with Subword Information. arXiv preprint arXiv:1607.04606 [14] Dean, T. M. (2013). Efficient Estimation of Word Representations in Vector Space. Retrieved from http://arxiv.org/abs/1301.3781 [15] Baziotis, C. a. (2017). DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Vancouver, Canada: Association for Computational Linguistics