Aspect-Level Sentiment Analysis On Hotel Reviews

Nibedita Panigrahi and T. Asha
Abstract Sentiment analysis is a part of natural language processing that extracts
and analyzes opinions, sentiments, and emotions from written language. Every
organization wants to know the feedback of the public and of its customers about its
products and services, because this feedback gives important information about how
the products are received in the market and how the services can be improved.
Aspect-level sentiment analysis is a technique that finds and aggregates sentiment
on entities mentioned within documents, or on aspects of those entities. This paper
converts unstructured data into structured data by using Scrapy and a selection tool
in Python; the Natural Language Tool Kit (NLTK) is then used for tokenization and
part-of-speech tagging. Next, the reviews are broken into single sentences, and the
list of aspects in each sentence is identified. Finally, we analyze the different
aspects of reviews collected from hotel Web sites, along with their scores as
calculated by a sentiment score algorithm.
Keywords Opinion analysis ⋅ Aspects mining ⋅ Machine learning ⋅
Natural language processing (NLP) ⋅ POS tagging
N. Panigrahi (✉) ⋅ T. Asha
Department of Computer Science & Engineering, Bangalore Institute of Technology,
Bengaluru 560004, India
e-mail: nibedita.kuni@gmail.com
T. Asha
e-mail: asha.masthi@gmail.com
© Springer Nature Singapore Pte Ltd. 2019
H. S. Behera et al. (eds.), Computational Intelligence in Data Mining,
Advances in Intelligent Systems and Computing 711,
https://doi.org/10.1007/978-981-10-8055-5_34
1 Introduction
Opinions are central to almost all human activities. Sentiment analysis and opinion
mining provide information about the sentiments, emotions, and reactions expressed
in opinions. Since 2000, opinion analysis has become one of the most active research
areas in NLP. Sentiment analysis is mainly used in data mining. Beyond its importance
in computer science, sentiment analysis is widely used in management services and
social science as well. Opinions play a vital role in user actions, because every
user’s decision is influenced by the opinions of others. The basic task of sentiment
analysis is to determine the polarity of a given piece of user text from a data set
and to output it as positive, negative, or neutral. Sentiment analysis can be
performed at three levels: document level, sentence level, and aspect level.
Document-level analysis identifies whether a whole document expresses a positive,
negative, or neutral opinion. It assumes that each document gives opinions on a
single entity, so it is not suitable for documents that discuss more than one entity,
such as hotel reviews. In sentence-level analysis, every sentence is classified as
expressing a positive, negative, or neutral opinion. A positive opinion means the
sentence has a positive sense or similar feeling, a negative opinion means the
sentence has a negative sense or similar feeling, and a neutral opinion means the
sentence does not carry any sentiment. Here, sentences that express factual
information (objective sentences) are first separated from sentences that express
opinions (subjective sentences), and the sentiment value of each subjective sentence
is then examined. This kind of analysis is finer grained than document-level
analysis. However, neither document-level nor sentence-level analysis reveals exactly
what the user is trying to say. For that, another kind of analysis is needed:
aspect-level analysis, in which the aspects inside a sentence are identified first
and the polarity of each aspect is then determined as positive, negative, or neutral.
This kind of analysis gives a clearer sentiment score. For example, the sentence
“Hotel rooms are not good; wifi internet facility is good” may look positive overall,
but it combines a negative and a positive opinion. The analysis is negative for
“hotel rooms” and positive for “hotel facilities”: the sentence contains two
different aspects, where the first aspect has negative polarity and the second aspect
has positive polarity. So, the main aim of aspect-level analysis is to discover the
sentiment on each of the various aspects.
2 Related Work
The last two decades have seen considerable change in the field of opinion mining
and sentiment analysis. Several experimental papers have been published that present
new methods and original approaches to sentiment analysis. Still, many ideas are
needed in the areas of corpus creation and data extraction. According to Kim et al.
[1], the opinion on new movies can be analyzed in three phases: the first phase is
building the sentiment word list for analyzing the opinions of users, the second is
organizing certain contractions and phrases for performing the process of opinion
mining, and the last is handling the features of a new movie, for example, the
actors. D’Avanzo and Pilato [2] mine user opinions for specific business sectors.
Their analysis applies Vygotsky’s zone of proximal development model and introduces
a Bayesian learner and a TF-IDF-based selector. The procedure has been applied to
Facebook pages from the mobile device and fashion markets. Monti et al. [3] analyze
disaffection with the political process. The authors collect a large number of tweets
from the Italian Twitter database and use an adaptable machine learning method to
produce a time series of Italian political disaffection. Denecke [4] presented
sentiment analysis and multilingual sentiment analysis methodologies on the basis of
SentiWordNet. The former shows that opinion mining presents additional difficulties
once it is applied to a multilingual setting. In general, lexical approaches require
language-specific lexical and linguistic resources. Producing these resources is very
time-consuming and often requires labor-intensive work. The latter relies on
SentiWordNet. The work of Baccianella et al. [5] is based on a lexical resource that
associates three scores, objectivity Obj(s), positivity Pos(s), and negativity
Neg(s), with each group of cognitive synonyms, called a synset. Synsets are made up
of nouns, verbs, and adjectives, and each synset expresses a distinct concept. The
resource is an extension of the lexical database WordNet. The scores assigned to a
single synset are the result of combining the outputs of eight ternary classifiers,
all characterized by similar accuracy levels but different classification behavior.
Each score of a synset ranges from 0.0 to 1.0, and the sum of the three scores is
always equal to 1. Artale et al. [6] analyze the proliferation of word senses, which
is an issue for the computational use of WORDNET. According to Pang and Lee [7],
online review sites and personal blogs bring new opportunities and challenges, which
can be addressed using unsupervised lexicon-based approaches and other unsupervised
approaches to seek out and understand the opinions of others.
Generally, aspect and entity recognition techniques use either a machine learning
approach or a linguistic approach. In the machine learning approach, a collection of
data is used to learn rules automatically, and these rules are then applied to new
input data; this approach does not require any predefined rules, but it does require
a large annotated corpus. Supervised learning and semi-supervised learning are the
two techniques mainly used in the machine learning process. In the linguistic
approach, predefined rules are supplied by the user: some rules define patterns over
syntactic features, and others rely on dictionary features. This approach is also
known as the knowledge-based or rule-based approach.
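The score constraint described for SentiWordNet above can be made concrete with a small sketch. It is only an illustration of the rule that the three scores of a synset lie between 0.0 and 1.0 and sum to 1; the class name `SynsetScores` and the example values are assumptions and are not taken from SentiWordNet itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Positivity and negativity of one synset; objectivity is derived from them."""
    pos: float
    neg: float

    def __post_init__(self):
        # Each score must lie in [0.0, 1.0], and Pos(s) + Neg(s) must leave
        # a non-negative remainder for Obj(s).
        if not (0.0 <= self.pos <= 1.0 and 0.0 <= self.neg <= 1.0):
            raise ValueError("scores must lie in [0.0, 1.0]")
        if self.pos + self.neg > 1.0:
            raise ValueError("Pos(s) + Neg(s) must not exceed 1.0")

    @property
    def obj(self) -> float:
        # Obj(s) = 1 - Pos(s) - Neg(s), so the three scores always sum to 1.
        return 1.0 - self.pos - self.neg


# Hypothetical scores, chosen only for illustration.
example = SynsetScores(pos=0.625, neg=0.125)
print(example.pos, example.neg, example.obj)
```

Deriving Obj(s) from the other two scores, rather than storing it, keeps the sum constraint true by construction.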
3 Issues in Sentiment Analysis
Words that express a positive or negative sense are called sentiment words or
opinion words. For example, good, awesome, and amazing are positive opinion words,
and bad, worst, and poor are negative opinion words. Apart from individual sentiment
words, there are also phrases and idioms that carry a positive or negative sense;
together with the sentiment words, they form the sentiment lexicon or opinion
lexicon. Opinion lexicons play a very important role in opinion analysis, but they
are not sufficient on their own because of the following issues.
1. A positive or negative sentiment word may have different meanings in sentences
from different domains. For example, the word “sucks” is usually negative, but the
sentence “This vacuum cleaner sucks” indicates a positive opinion about the vacuum
cleaner.
2. Sarcastic sentences, which may not contain any sentiment words in their literal
sense, are hard to deal with. For example, “what a nice food! I stopped eating”.
These types of sentences are common in political discussion. When customers review
products and services, they rarely use sarcasm.
3. Many sentences contain factual information with no sentiment words and still
carry useful information. These are objective sentences that give useful evidence,
and they are numerous. For example, “This hotel charges lot of money for food”.
This sentence implies a negative sentiment about the “food” provided by the “hotel”:
it does not contain a sentiment word, but overall it expresses a negative sentiment.
4 Problem Definition
The fundamental aim of this paper is to identify the aspects of entities and the
sentiment expressed on each aspect, and finally to summarize all the aspects and
their sentiment values. The final outcome is the average opinion for each aspect of
an entity. The input consists of real reviews of a hotel located in New Delhi.
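As a rough illustration of the stated outcome, the average opinion for each aspect, the following sketch averages sentiment values per aspect. The function name `average_by_aspect` and the sample pairs are hypothetical; they are not data from the paper.

```python
from collections import defaultdict


def average_by_aspect(scored_mentions):
    """Average the sentiment values of all mentions of each aspect."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for aspect, score in scored_mentions:
        totals[aspect] += score
        counts[aspect] += 1
    return {aspect: totals[aspect] / counts[aspect] for aspect in totals}


# Hypothetical (aspect, sentiment value) pairs, one per opinion found in the reviews.
mentions = [("room", -1), ("wifi", 1), ("room", 1), ("food", -1), ("room", -1)]
summary = average_by_aspect(mentions)
for aspect, value in sorted(summary.items()):
    print(f"{aspect}: {value:+.2f}")
```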
5 Methodology
Aspect-level sentiment analysis tasks:
(1) Extraction and categorization of entities: In this task, all the entities are
extracted from the dataset, i.e., the hotel reviews written by customers, and are
then categorized into groups with a group name, where each group represents one
entity.
(2) Extraction and categorization of aspects: For each entity found in the above
task, the aspects of the entity are extracted and categorized into groups with a
group name, where one group or cluster represents one type of aspect.
(3) Extraction and categorization of opinion holder: This task runs in parallel with
the above two tasks; it extracts the opinion holder of each opinion and also records
the time.
(4) Classification of aspect sentiment: In this task, the sentiment value of each
opinion found in a user review sentence is calculated by using a sentiment score
algorithm. The value may be positive, negative, or neutral, i.e., zero; based on
this numeric value, the sentence is classified as having a positive opinion, a
negative opinion, or a neutral opinion.
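The four tasks above produce, for every opinion, an entity, an aspect group, an opinion holder, a time, and a sentiment value. A minimal sketch of such a record, and of grouping aspect expressions under a group name, could look as follows. The mapping `ASPECT_GROUPS`, the class `Opinion`, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from aspect expressions to a group name (task 2).
ASPECT_GROUPS = {
    "room": "room", "rooms": "room", "bed": "room",
    "wifi": "internet", "internet": "internet",
    "food": "food", "breakfast": "food",
}


@dataclass
class Opinion:
    entity: str                    # task 1: the entity the opinion is about
    aspect: str                    # task 2: group name of the aspect
    sentiment: int                 # task 4: +1, -1, or 0
    holder: Optional[str] = None   # task 3: who expressed the opinion
    time: Optional[str] = None     # task 3: when it was expressed


def categorize(expression: str) -> str:
    """Map an aspect expression to its group name; unknown ones form their own group."""
    key = expression.lower()
    return ASPECT_GROUPS.get(key, key)


# Hypothetical opinion extracted from one review sentence.
opinion = Opinion(entity="hotel", aspect=categorize("Rooms"), sentiment=-1,
                  holder="guest_1", time="2017-05-01")
print(opinion)
```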
Sentiment score algorithmic steps:
for each single sentence s
    Assign P = 0 and N = 0
    Step 1: Check for the presence of an idiom in s
        Set the idiom flag to 1 if an idiom exists and to 0 without an idiom
        Based on the idiom, update P and N
    Step 2: If no idiom exists, check for the presence of tokens
        tokenize = 1
        (a) For each token t, check for a negative word
        (b) If the token exists, then check for the emotion word
            If the emotion word exists, extract and invert its score and,
            based on the magnitude of the score, update the values of P and N
        (c) If the token exists, then check for the presence of the next
            emotion word
        (d) Then extract its score and verify whether the score is positive or
            negative
            If positive, add one to the emotion word score; otherwise subtract
            one from the emotion word score
            Again update the values of P and N based on the magnitudes of the scores
        (e) Check whether the token is a booster word, a negative word, or an
            emotion word; if it matches, then extract the scores and assess the
            values of P and N on the basis of the magnitudes of the scores
    Step 3: Check the values for the positive and negative words; if either one is
        nonnegative, then enter the line into the output file in a table format
        and end the while-loop
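The sentiment score steps listed above leave several details open, so the following Python sketch is one possible reading rather than the authors' implementation. The assumptions are: all word lists and strengths are hypothetical; a negative word inverts the score of the next emotion word; a booster word moves the next emotion word's score one step away from zero; P keeps the strongest positive score and N the strongest negative score; and a sentence is written to the output table when P or N is nonzero.

```python
# Hypothetical lexicons with hypothetical strengths; real lists would be far larger.
IDIOMS = {"value for money": 2, "rip off": -3}
EMOTION_WORDS = {"good": 2, "clean": 2, "excellent": 3, "bad": -2, "dirty": -3}
BOOSTER_WORDS = {"very", "really", "extremely"}
NEGATIVE_WORDS = {"not", "never", "no"}


def update(p, n, score):
    """Assumed update rule: P keeps the largest positive score, N the most negative."""
    return (max(p, score), n) if score > 0 else (p, min(n, score))


def sentence_scores(sentence):
    """Return the pair (P, N) for one sentence."""
    p, n = 0, 0
    tokens = [t.strip(".,;:!?\"'") for t in sentence.lower().split()]
    padded = " " + " ".join(tokens) + " "

    # Step 1: if an idiom is present, P and N are updated from the idiom alone.
    idiom_scores = [s for idiom, s in IDIOMS.items() if f" {idiom} " in padded]
    if idiom_scores:
        for score in idiom_scores:
            p, n = update(p, n, score)
        return p, n

    # Step 2: no idiom, so scan the tokens one by one.
    negate = boost = False
    for token in tokens:
        if token in NEGATIVE_WORDS:
            negate = True
        elif token in BOOSTER_WORDS:
            boost = True
        elif token in EMOTION_WORDS:
            score = EMOTION_WORDS[token]
            if boost:    # add one to a positive score, subtract one from a negative
                score += 1 if score > 0 else -1
            if negate:   # a preceding negative word inverts the score
                score = -score
            p, n = update(p, n, score)
            negate = boost = False
    return p, n


def score_table(sentences):
    """Step 3 (assumed reading): keep the rows where P or N is nonzero."""
    rows = [(s, *sentence_scores(s)) for s in sentences]
    return [row for row in rows if row[1] != 0 or row[2] != 0]


for row in score_table(["The room is very clean.", "The food is not good",
                        "The hotel is near the station"]):
    print(row)
```

Keeping only the strongest positive and negative score per sentence is one design choice; summing all scores would be an equally plausible reading of "update P and N".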

6 Classification of Aspect Sentiment
The supervised learning approach and the lexicon-based approach are the two main
approaches for finding the opinion value of each aspect in a given customer review
(Fig. 1).
a. Machine learning approach: It relies on well-known algorithms and treats
sentiment analysis as a regular text classification problem that uses syntactic
and/or linguistic features. Supervised learning procedures depend on the presence
of labeled training documents for finding the aspect and opinion value in a given
sentence. Because the supervised learning approach often relies on a small set of
training data, the trained model may not give correct results in large
applications, where it can perform worse than the lexicon-based approach.
b. Lexicon-based approach: Lexicon-based methods are unsupervised. The
lexicon-based approach gives good results across a large number of domains. A list
of sentiment words and phrases is used to find the sentiment orientation of every
aspect in the given input sentence. Opinion shifters, which may change the
orientation of an opinion, are also taken into account. The lexicon-based approach
has four main steps:
• Identify opinion words
• Apply opinion changer
• Handle but clauses
• Aggregate sentiments
Fig. 1 Sentiment classification technique

Identify Opinion Words
First, a customer review is taken as input and broken into single sentences, and
each sentence that has one or more aspects is identified. The aspects found in a
given sentence are then listed, and every positive opinion word is assigned a
sentiment score of +1 and every negative opinion word a score of −1.
Apply Opinion Changer
Opinion changers, also called opinion shifters or sentiment changers, are words and
phrases that can swing an opinion from positive to negative or from negative to
positive. The most common opinion shifters are not, none, neither, nobody, nowhere,
and cannot.
Handle But Clauses
“But” is mostly used in English sentences to change the opinion of a given sentence.
Words and phrases that contain “but” change the meaning and orientation of a
sentence and give a different output. The rule for handling “but” looks at the
clauses before and after it: if a sentiment word cannot be found on one side, that
side is given the opposite sentiment of the other side.
Aggregate Sentiments
Here the sentiment scores of all the opinion words are aggregated, and the total
number of aspects along with their sentiment scores is displayed.
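The four lexicon-based steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system described in the paper: the word lists are hypothetical, aspects are matched as single tokens, a shifter flips only the next opinion word, and each clause score is credited to every aspect that the clause mentions.

```python
import re
from collections import defaultdict

# Hypothetical lexicons for illustration only.
POSITIVE = {"good", "great", "clean", "friendly"}
NEGATIVE = {"bad", "poor", "dirty", "rude"}
SHIFTERS = {"not", "none", "neither", "nobody", "nowhere", "cannot"}


def clause_score(tokens):
    """Steps 1 and 2: score opinion words as +1/-1 and flip them after a shifter."""
    score, shifted = 0, False
    for token in tokens:
        if token in SHIFTERS:
            shifted = True
        elif token in POSITIVE or token in NEGATIVE:
            value = 1 if token in POSITIVE else -1
            score += -value if shifted else value
            shifted = False
    return score


def aspect_sentiments(sentence, aspects):
    """Return the aggregated sentiment score of each aspect in one sentence."""
    totals = defaultdict(int)
    for segment in re.split(r"[;.!?]", sentence.lower()):
        sides = [re.findall(r"[a-z']+", side)
                 for side in re.split(r"\bbut\b", segment)]
        scores = [clause_score(tokens) for tokens in sides]
        # Step 3: a side of "but" without opinion words takes the opposite
        # orientation of the other side.
        if len(sides) == 2:
            if scores[0] == 0 and scores[1] != 0:
                scores[0] = -scores[1]
            elif scores[1] == 0 and scores[0] != 0:
                scores[1] = -scores[0]
        # Step 4: aggregate each side's score onto the aspects it mentions.
        for tokens, score in zip(sides, scores):
            for aspect in aspects:
                if aspect in tokens:
                    totals[aspect] += score
    return dict(totals)


result = aspect_sentiments(
    "Hotel rooms are not good; wifi internet facility is good", ["rooms", "wifi"])
print(result)
```

For the example sentence from the introduction, this sketch assigns a negative score to "rooms" and a positive score to "wifi", which matches the two polarities discussed there.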
7 Natural Language Tool Kit (NLTK)
The Natural Language Tool Kit lets us write simple programs in Python that work with
large quantities of text. NLTK extracts keywords and phrases from the structured
text, gives them useful meaning, and saves that meaningful data into a database for
further use. NLTK takes raw text as input and provides a range of operations for
processing it. NLTK is free and open source, and it is a capable library for working
with natural language. It provides functionality that converts input text into
tokenized form and classifies the words; the labeling is done by part-of-speech
tagging (POS tagging). The POS tagger takes a tokenized sentence as input and gives
a tag for each word as output (Table 1 and Fig. 2).
Part-of-speech-based features
• Total number of adjectives in the sentence.
• Total number of adverbs in the sentence.
• Total number of interjections in the sentence (e.g., “hey”, “hello”, “wow”).
• All verbs in the sentence.
• All nouns in the sentence.
• All proper nouns in the sentence.
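The part-of-speech-based features listed above can be counted directly from tagged tokens. The sketch below assumes Penn Treebank-style tags such as those in Table 1. `nltk.word_tokenize` and `nltk.pos_tag` are the NLTK calls for tokenizing and tagging, and they need NLTK's tokenizer and tagger data to be downloaded first. The helper names and the hand-tagged example are hypothetical.

```python
# Tags that contribute to each feature (Penn Treebank-style tags).
FEATURE_TAGS = {
    "adjectives": {"JJ", "JJR", "JJS"},
    "adverbs": {"RB", "RBR", "RBS"},
    "interjections": {"UH"},
    "verbs": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    "nouns": {"NN", "NNS"},
    "proper_nouns": {"NNP", "NNPS"},
}


def tag_sentence(sentence):
    """Tokenize and tag a sentence with NLTK (needs NLTK and its data packages)."""
    import nltk  # imported lazily so the feature counter also runs without NLTK
    return nltk.pos_tag(nltk.word_tokenize(sentence))


def pos_features(tagged_tokens):
    """Count how many tokens fall into each part-of-speech feature group."""
    counts = {feature: 0 for feature in FEATURE_TAGS}
    for _, tag in tagged_tokens:
        for feature, tags in FEATURE_TAGS.items():
            if tag in tags:
                counts[feature] += 1
    return counts


# Hand-tagged example, so the sketch runs without NLTK installed.
tagged = [("Wow", "UH"), (",", ","), ("the", "DT"), ("rooms", "NNS"),
          ("are", "VBP"), ("really", "RB"), ("clean", "JJ")]
features = pos_features(tagged)
print(features)
```

With NLTK and its data available, `pos_features(tag_sentence("Wow, the rooms are really clean"))` would compute the same kind of counts from raw text.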

Table 1 POS tags (Penn Treebank tag set)
No  Tag   Description
1   CC    Coordinating conjunction
2   CD    Cardinal number
3   DT    Determiner
4   EX    Existential there
5   FW    Foreign word
6   IN    Preposition
7   JJ    Adjective
8   JJR   Adjective, comparative
9   JJS   Adjective, superlative
10  LS    List item marker
11  MD    Modal
12  NN    Noun, singular
13  NNS   Noun, plural
14  NNP   Proper noun, singular
15  NNPS  Proper noun, plural
16  PDT   Predeterminer
17  POS   Possessive ending
18  PRP   Personal pronoun
19  PRP$  Possessive pronoun
20  RB    Adverb
21  RBR   Adverb, comparative
22  RBS   Adverb, superlative
23  RP    Particle
24  SYM   Symbol
25  TO    To
26  UH    Interjection
27  VB    Verb, base form
28  VBD   Verb, past tense
29  VBG   Verb, gerund or present participle
30  VBN   Verb, past participle
31  VBP   Verb, non-3rd person singular present
32  VBZ   Verb, 3rd person singular present
33  WDT   Wh-determiner
34  WP    Wh-pronoun
35  WP$   Possessive wh-pronoun
36  WRB   Wh-adverb

8 System Architecture
NLTK provides different libraries for finding the subjective and objective sentences
in a text. The necessary steps of the aspect-based sentiment analysis are given below
(Fig. 3).
• Break the customer review into sentences and convert each sentence into tokenized
form.
• Remove unwanted symbols from the sentences and apply part-of-speech tagging to
each word of the tokenized sentence.
• Identify the important aspects inside each sentence with the help of the
part-of-speech tags.
• Classify the sentences as subjective or objective with the help of the lexicon
approach.
• With the help of the lexical dictionary, identify the sentiment score for each
positive, negative, or neutral sentence.
• Analyze the final output of the different aspects versus their sentiment scores.
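The first three steps above can be sketched with the standard library alone. The regular expressions for sentence splitting and symbol removal, and the heuristic of treating common nouns (NN, NNS) as candidate aspects, are assumptions made for illustration. The paper does not specify them, and the tagged tokens are hand-written here instead of coming from a tagger.

```python
import re


def split_review(review):
    """Break a review into sentences and strip unwanted symbols from each one."""
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    cleaned = [re.sub(r"[^A-Za-z0-9' ]+", " ", s) for s in sentences]
    return [re.sub(r"\s+", " ", s).strip() for s in cleaned if s.strip()]


def candidate_aspects(tagged_tokens):
    """Assumed heuristic: treat common nouns (NN, NNS) as candidate aspects."""
    return [word.lower() for word, tag in tagged_tokens if tag in ("NN", "NNS")]


# Hypothetical review text.
sentences = split_review("The rooms were dirty!! Wifi is good. Staff: very rude :(")
print(sentences)

# Hand-tagged tokens for the first sentence; a POS tagger would normally supply them.
aspects = candidate_aspects(
    [("The", "DT"), ("rooms", "NNS"), ("were", "VBD"), ("dirty", "JJ")])
print(aspects)
```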
Fig. 2 System architecture (blocks: Web source; text cleaning process; lexicon for
token tagging; text processing; sentiment classification; analyzing and processing;
knowledge bases for sentence structure)
Fig. 3 Steps of aspect-level analysis (hotel Web site; extract user review;
aspect-based sentiment analysis on each user review; sentiment value for each aspect)

9 Results and Analysis
This section shows the results from the different modules. The final result of this
project contains four different parts. The first part is the Scrapy module, which
converts unstructured data into structured data and saves that data into a text
file. The structured data are used as input for the next module, which breaks long
user reviews into sentences; these sentences are saved into separate files.
Structured data: All unstructured data are converted into structured form. First,
data are crawled from the Web site. Data crawling is done by implementing a Scrapy
spider. In this paper, the spider is Python code that crawls all the unstructured
data and saves it into a text file in structured form; later these data are taken as
input by the next module, i.e., the sentiment analysis module. The sentiment
analysis module takes the structured data and extracts aspects from them. Next, a
sentiment score algorithm is used to find the score of each aspect. Finally, the
result is analyzed by using a bar chart and a pie chart of the aspect counts and
sentiment scores (Fig. 4).
Fig. 4 Bar chart and pie chart of sentiment scores

10 Conclusion and Future Work
Aspect-based sentiment analysis is a new topic for academics, and customer reviews
play a central role in users’ actions. Online users, discussion groups, online
forums, and user blogs are growing very fast, and users share their information
through these Internet channels on a daily basis. It is therefore necessary to
design an efficient and effective aspect-based sentiment analysis system for online
user data. There are many challenges in the field of sentiment analysis, and
addressing them will give a better understanding of users’ data. Hence, sentiment
analysis has an important impact on natural language processing and also gives
great insight into political science, management science, and social science,
because all of these are affected by users’ opinions.
References
1. D. Kim et al., ‘A user opinion and metadata mining scheme for predicting box office
performance of movies in the social network environment’, New Review of Hypermedia and
Multimedia, 2013.
2. E. D’Avanzo, G. Pilato, ‘Mining Social Network users Opinions to Aid Buyers shopping
Decisions’, Procedia Computer Science 118, 2014.
3. C. Monti, A. Rozza, G. Zappela, A. Arvidsson, E. Colleoni, ‘Modelling Political Disaffection
from Twitter data’, WISDOM’13: Proceedings of the Second International Workshop on Issues
of Sentiment Discovery and Opinion Mining, 2013.
4. K. Denecke, ‘Using SentiWordNet for Multilingual Sentiment Analysis’, ICDEW, 2008.
5. S. Baccianella, A. Esuli, F. Sebastiani, ‘SENTIWORDNET 3.0: An enhanced lexical
Resource for Sentiment Analysis and Opinion Mining’, ELRA, 2010.
6. A. Artale, A. Goy, B. Magnini, E. Pianta, C. Strapparava, ‘Coping with WORDNET Sense
Proliferation’, ELRA, 1998.
7. B. Pang, L. Lee, ‘Opinion Mining and Sentiment Analysis’, Foundations and Trends in
Information Retrieval, vol. 2, pp. 1–135, 2008.
