International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976-6480 (Print), ISSN 0976-6499 (Online), Volume 5, Issue 6, June (2014), pp. 53-61 © IAEME
ASPECT BASED SENTIMENT ANALYSIS OF MOVIE REVIEWS
Mitisha Vaidya, Priyank Thakkar
Nirma University, Ahmedabad, 382481, Gujarat, India
ABSTRACT
Aspect-based sentiment analysis identifies a user’s sentiment towards a particular aspect of an entity. It involves two important tasks: aspect and sentiment word extraction, and sentiment polarity identification. In this paper, the Seeded Aspect and Sentiment (SAS) topic model is extended with part-of-speech (POS) tagging for aspect and sentiment word extraction. Two SentiWordNet-based approaches to sentiment polarity identification are also studied.
Keywords: Aspect, Aspect Extraction, Sentiment Analysis, SentiWordNet, Topic Modeling.
I. INTRODUCTION
Aspect-based sentiment analysis investigates precisely what individuals like or dislike. Document-level and sentence-level sentiment analysis cannot identify a user’s opinion towards a particular aspect of an entity: document-level analysis captures users’ overall opinion of an entity, while sentence-level analysis captures the opinion expressed in each sentence. Therefore, to review an entity accurately, aspect-based sentiment analysis is preferable.
In aspect-based sentiment analysis, aspect and sentiment word extraction identifies the aspects that have been evaluated [5]. For instance, in the sentence “The voice quality of this phone is amazing”, the aspect is “voice quality” of the entity “this phone”. Here, “this phone” does not denote the GENERAL aspect, because the evaluation concerns only the phone’s voice quality, not the phone as a whole. In contrast, the sentence “I love this phone.” evaluates the phone as a whole, i.e., the GENERAL aspect of the entity “this phone”.
Sentiment polarity identification determines whether the opinions on different aspects are positive, negative, or neutral [5]. In the first example above, the opinion on the “voice quality” aspect is positive; in the second, the opinion on the GENERAL aspect is also positive.
II. RELATED WORK
In aspect and sentiment word extraction, three main techniques are used. The first is aspect extraction based on frequent nouns and noun phrases [12]. The second is aspect extraction that exploits opinion and target relations [4], and the third is aspect extraction using topic modelling [7]. Aspect extraction using topic modelling, as discussed in [7], combines features of the first two techniques. In topic modelling, synonymous aspects must be grouped into the same category. To address this issue, [7] presented a different setting in which the user provides seed words for a few aspect categories, and the model simultaneously extracts aspect terms and groups them into categories. This setting is important because grouping aspects is a subjective task: different applications may require different groupings, so some form of user guidance is needed. The principal task addressed in [7] was to extract aspects and group them; however, the models can also extract aspect-specific sentiment words.
In sentiment polarity identification, two primary approaches are used: the lexicon-based approach [4], [9] and the supervised learning approach [3]. This paper uses the lexicon-based approach described in [9]. In [9], SentiWordNet is used to determine the sentiment polarity of aspects. This is done for all sentences in a review and subsequently for all reviews of a movie, and the scores for a particular aspect from all reviews of the movie are aggregated to obtain an opinionated analysis of that aspect. The aspect-level analysis thus first locates opinionated content about an aspect in a review and then uses the SentiWordNet-based approach to compute its sentiment polarity. This paper examines two SentiWordNet-based methods. The first is “Adjective + Adverb Combine”, denoted SWN(AAC) [9], and the second combines “Adjective + Adverb Combine” with “Adverb + Verb Combine”, denoted SWN(AAAVC) [9].
III. SAS MODEL [7] WITH POS TAGS
Figure 1: SAS model [7] with POS tags
ME-SAS [7] uses a maximum entropy method to generate priors based on the part-of-speech tags of aspect and sentiment words. In this paper, the Stanford POS Tagger [10] is used for the same purpose. Because this tagger itself assigns part-of-speech tags using a maximum entropy model, the proposed model does not need to compute maximum entropy priors for part-of-speech tags separately. As shown in Figure 1, $\psi_{d,s}$ is computed at the same level as in ME-SAS, but the parameters passed to it are the hyperparameter $\delta$ and the part-of-speech-tagged words, denoted “pos” in this study. The equations used to set the priors are the same as in the SAS model.
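The kind of information the tagger contributes can be illustrated with a small sketch. It assumes the tagger output is available as (word, tag) pairs in the Penn Treebank tag set, and it treats nouns as candidate aspect terms and adjectives, adverbs and verbs as candidate sentiment terms. The helper `split_by_pos` and this exact grouping are hypothetical illustrations, not the paper's prior equations; in the model, the final aspect/sentiment decision is still made by the sampler.

```python
from typing import Iterable, List, Tuple

# Assumed tag set: Penn Treebank prefixes. The grouping below is hypothetical.
ASPECT_PREFIXES = ("NN",)                # NN, NNS, NNP, NNPS: candidate aspect terms
SENTIMENT_PREFIXES = ("JJ", "RB", "VB")  # adjectives, adverbs, verbs: candidate sentiment terms


def split_by_pos(tagged: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: group tagged words into aspect and sentiment candidates.

    Words whose tags match neither group (determiners, prepositions, ...) are dropped.
    """
    aspects: List[str] = []
    sentiments: List[str] = []
    for word, tag in tagged:
        if tag.startswith(ASPECT_PREFIXES):
            aspects.append(word.lower())
        elif tag.startswith(SENTIMENT_PREFIXES):
            sentiments.append(word.lower())
    return aspects, sentiments


if __name__ == "__main__":
    sentence = [("The", "DT"), ("voice", "NN"), ("quality", "NN"), ("of", "IN"),
                ("this", "DT"), ("phone", "NN"), ("is", "VBZ"), ("amazing", "JJ")]
    print(split_by_pos(sentence))
```

For the example sentence this yields the aspect candidates voice, quality and phone and the sentiment candidates is and amazing. The auxiliary "is" lands in the sentiment group, which illustrates why POS tags are better used as a prior than as a hard decision.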
The entries of the vocabulary are denoted by $v_{1 \ldots V}$, where $V$ is the number of unique non-seed terms. $Q_{l=1 \ldots C}$ denotes the $C$ seed sets, where each seed set $Q_l$ is a group of semantically related terms. The $T$ aspects and the $T$ aspect-specific sentiment models are denoted by $\varphi^{A}_{t=1 \ldots T}$ and $\varphi^{O}_{t=1 \ldots T}$, respectively. The aspect-specific distribution of seeds in seed set $Q_l$ is denoted by $\Omega_{t,l}$. In this study, it is assumed that a review sentence usually talks about one aspect. A review document $d_{1 \ldots D}$ consists of $S_d$ sentences, and each sentence $s \in S_d$ has $N_{d,s}$ words. Sentence $s$ of document $d$ is denoted by $\mathit{Sent}_{d,s}$. To distinguish between aspect and sentiment terms, an indicator (switch) variable $r_{d,s,j} \in \{\hat{a}, \hat{o}\}$ is used for the $j$-th term of $\mathit{Sent}_{d,s}$, $w_{d,s,j}$. Further, let $\psi_{d,s}$ denote the distribution of aspects and sentiments in $\mathit{Sent}_{d,s}$. The sampling distributions are given by Equations (1), (2) and (3), which are the same as those used in the SAS model [7].
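$$
p\left(z_{d,s} = t \mid Z_{\neg d,s}, R_{\neg d,s}, W_{\neg d,s}, U_{\neg d,s}\right) \propto
\frac{B\left(n^{O}_{t,[\,]} + \beta_O\right)}{B\left(n^{O}_{t,[\,],\neg d,s} + \beta_O\right)}
\times \frac{B\left(n^{A}_{t,[\,]} + \beta_A\right)}{B\left(n^{A}_{t,[\,],\neg d,s} + \beta_A\right)}
\times \prod_{l=1}^{C} \frac{B\left(n^{S,A}_{t,l,[\,]} + \gamma\right)}{B\left(n^{S,A}_{t,l,[\,],\neg d,s} + \gamma\right)}
\times \frac{n^{Sent}_{d,t,\neg d,s} + \alpha}{n^{Sent}_{d,(\cdot),\neg d,s} + T\alpha}
\tag{1}
$$

$$
p\left(r_{d,s,j} = \hat{o} \mid Z_{\neg d,s}, R_{\neg d,s,j}, W_{\neg d,s,j}, U_{\neg d,s,j}, z_{d,s} = t, w_{d,s,j} = w\right) \propto
\frac{n^{O}_{t,w,\neg d,s,j} + \beta_O}{n^{O}_{t,(\cdot),\neg d,s,j} + \left|V \cup \bigcup_{l} Q_l\right| \beta_O}
\times \frac{n^{O}_{d,s,\neg d,s,j} + \delta}{n^{A}_{d,s,\neg d,s,j} + n^{O}_{d,s,\neg d,s,j} + 2\delta}
\tag{2}
$$

$$
p\left(r_{d,s,j} = \hat{a} \mid Z_{\neg d,s}, R_{\neg d,s,j}, W_{\neg d,s,j}, U_{\neg d,s,j}, z_{d,s} = t, w_{d,s,j} = w\right) \propto
\begin{cases}
\dfrac{n^{S,A}_{t,l,w,\neg d,s,j} + \gamma}{n^{S,A}_{t,l,(\cdot),\neg d,s,j} + |Q_l|\,\gamma}
\times \dfrac{n_{t,l,\neg d,s,j} + \beta_A}{n^{A}_{t,(\cdot),\neg d,s,j} + (V + C)\,\beta_A}
\times \dfrac{n^{A}_{d,s,\neg d,s,j} + \delta}{n^{A}_{d,s,\neg d,s,j} + n^{O}_{d,s,\neg d,s,j} + 2\delta}, & w \in Q_l \\[2ex]
\dfrac{n^{A}_{t,w,\neg d,s,j} + \beta_A}{n^{A}_{t,(\cdot),\neg d,s,j} + (V + C)\,\beta_A}
\times \dfrac{n^{A}_{d,s,\neg d,s,j} + \delta}{n^{A}_{d,s,\neg d,s,j} + n^{O}_{d,s,\neg d,s,j} + 2\delta}, & \nexists\, l : w \in Q_l
\end{cases}
\tag{3}
$$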
where $B(\vec{x}) = \dfrac{\prod_{i=1}^{\dim(\vec{x})} \Gamma(x_i)}{\Gamma\left(\sum_{i=1}^{\dim(\vec{x})} x_i\right)}$ is the multinomial Beta function. $n^{O}_{t,v}$ denotes the number of times term $v$ is assigned to aspect $t$ as an opinion/sentiment word. $n^{A}_{t,v}$ denotes the number of times non-seed term $v \in V$ is assigned to aspect $t$ as an aspect. $n^{S,A}_{t,l,v}$ denotes the number of times seed term $v \in Q_l$ is assigned to aspect $t$ as an aspect. $n^{Sent}_{d,t}$ is the number of sentences in document $d$ that were assigned to aspect $t$. $n^{A}_{d,s}$ and $n^{O}_{d,s}$ denote the numbers of terms in $\mathit{Sent}_{d,s}$ that were assigned to aspects and to opinions, respectively. $n_{t,l}$ denotes the number of times any term of seed set $Q_l$ is assigned to aspect $t$. Omitting the last index, written “[ ]” in the notation above, denotes the corresponding row vector spanning that index; for example, $n^{A}_{t,[\,]} = \left[n^{A}_{t,v=1}, \ldots, n^{A}_{t,v=V}\right]$. $(\cdot)$ denotes the marginalized sum over the last index. The subscript $\neg d,s$ denotes counts that exclude the assignments of all terms in $\mathit{Sent}_{d,s}$, and $\neg d,s,j$ denotes counts that exclude $w_{d,s,j}$.

Sampling in this paper is hierarchical. For each sentence, an aspect $z_{d,s}$ is first sampled using Equation (1). Once the aspect is sampled, the switch variable $r_{d,s,j}$ is sampled for each term. The probability of $w_{d,s,j}$ being an opinion or sentiment term, $p(r_{d,s,j} = \hat{o})$, is given by Equation (2). For $p(r_{d,s,j} = \hat{a})$ there are two cases: (i) the observed term $w = w_{d,s,j}$ belongs to a seed set, $w \in Q_l$, or (ii) it does not belong to any seed set, $\nexists\, l : w \in Q_l$, i.e., $w$ is a non-seed term. Both cases are handled by Equation (3).
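Two building blocks of this sampler can be sketched in Python. `log_beta_ratio` evaluates one Beta-function ratio of Equation (1) in log space with `math.lgamma`, which avoids overflow for large counts, and `switch_probs_nonseed` normalizes Equation (2) against case (ii) of Equation (3) for a non-seed term. Both function names and argument layouts are hypothetical; all counts are assumed to already exclude the sentence or term being resampled, and the seed-term case of Equation (3) is omitted for brevity.

```python
import math
from typing import Sequence, Tuple


def log_multinomial_beta(x: Sequence[float]) -> float:
    """log B(x) = sum_i log Gamma(x_i) - log Gamma(sum_i x_i)."""
    return sum(math.lgamma(xi) for xi in x) - math.lgamma(sum(x))


def log_beta_ratio(n_with: Sequence[int], n_without: Sequence[int], prior: float) -> float:
    """log of B(n_with + prior) / B(n_without + prior): one factor of Equation (1).

    n_with: counts of aspect t after adding the sentence's words (hypothetical layout);
    n_without: the same counts with the sentence's words removed.
    """
    return (log_multinomial_beta([c + prior for c in n_with])
            - log_multinomial_beta([c + prior for c in n_without]))


def switch_probs_nonseed(
    n_o_tw: int, n_o_t: int, n_a_tw: int, n_a_t: int,
    n_a_ds: int, n_o_ds: int,
    n_terms_all: int, V: int, C: int,
    beta_o: float, beta_a: float, delta: float,
) -> Tuple[float, float]:
    """Normalized (p(r = o_hat), p(r = a_hat)) for a non-seed term w (hypothetical helper).

    n_o_tw, n_o_t: opinion counts of w and of all terms under aspect t (Eq. 2);
    n_a_tw, n_a_t: aspect counts of w and of all terms under aspect t (Eq. 3, case ii);
    n_a_ds, n_o_ds: aspect and opinion terms already in the sentence;
    n_terms_all: size of V united with all seed sets Q_1..Q_C.
    """
    sentence_norm = n_a_ds + n_o_ds + 2 * delta
    p_o = ((n_o_tw + beta_o) / (n_o_t + n_terms_all * beta_o)
           * (n_o_ds + delta) / sentence_norm)
    p_a = ((n_a_tw + beta_a) / (n_a_t + (V + C) * beta_a)
           * (n_a_ds + delta) / sentence_norm)
    total = p_o + p_a
    return p_o / total, p_a / total
```

For a one-word sentence, `log_beta_ratio([3, 0], [2, 0], 0.5)` reduces to the familiar word probability log((2 + 0.5) / (2 + 2 * 0.5)). A sampler would then label the term an opinion word when `random.random() < p_o`.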
IV. SENTIWORDNET
After aspect and sentiment words have been extracted for each sentence in a document, two approaches are implemented for sentiment polarity identification. In SWN(AAC), adjectives and “Adjective + Adverb” combinations are extracted from the sentences that contain aspects. Polarities are assigned to these words with SentiWordNet using the following algorithm [9]. The scaling factor (sf) for adverbs is set to 0.35, as suggested in [9]. Adjectives are denoted adj and adverbs adv.
Algorithm 1: SWN(AAC) [9]
For each sentence, extract adv+adj combines.
For each extracted adv+adj combine do:
• If score(adj) = 0, ignore it.
• If adv is affirmative, then
   o If score(adj) > 0: f(adv, adj) = min(1, score(adj) + sf*score(adv))
   o If score(adj) < 0: f(adv, adj) = min(1, score(adj) - sf*score(adv))
• If adv is negative, then
   o If score(adj) > 0: f(adv, adj) = max(-1, score(adj) + sf*score(adv))
   o If score(adj) < 0: f(adv, adj) = max(-1, score(adj) - sf*score(adv))
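Algorithm 1 can be transcribed into a short Python function. The sketch assumes SentiWordNet scores have already been reduced to one signed value per word and are passed in as plain floats, and it treats an adverb as affirmative when its score is non-negative and as negative otherwise; the paper does not state how affirmative and negative adverbs are identified, so this is an assumption. The bounds are reproduced exactly as listed, so the branch for an affirmative adverb with a negative adjective is capped at 1 but never floored at -1. Summing pair scores into a sentence score, and the names `score_pair` and `score_sentence_aac`, are illustrative choices.

```python
from typing import Iterable, Optional, Tuple

SF = 0.35  # adverb scaling factor, as suggested in [9]


def score_pair(adv: float, head: float, sf: float = SF) -> Optional[float]:
    """Score an adv+adj pair as in Algorithm 1; returns None when score(adj) == 0.

    adv, head: signed SentiWordNet scores of the adverb and the adjective.
    A bare adjective can be scored by passing adv = 0.0.
    """
    if head == 0:
        return None                          # pair is ignored by the algorithm
    if adv >= 0:                             # assumption: non-negative score = affirmative adverb
        if head > 0:
            return min(1.0, head + sf * adv)
        return min(1.0, head - sf * adv)     # bound reproduced as listed
    if head > 0:                             # negative adverb
        return max(-1.0, head + sf * adv)
    return max(-1.0, head - sf * adv)


def score_sentence_aac(pairs: Iterable[Tuple[float, float]]) -> float:
    """Hypothetical sentence score: sum of the non-ignored adv+adj pair scores."""
    scores = (score_pair(adv, adj) for adv, adj in pairs)
    return sum(s for s in scores if s is not None)
```

With these conventions, an affirmative adverb scored 0.5 raises an adjective score of 0.6 to 0.775, while a negative adverb scored -0.5 lowers it to 0.425.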
In SWN(AAAVC), “Adverb + Verb” combinations are used in addition to “Adjective + Adverb” combinations. The “Adverb + Verb” scores are multiplied by a weight factor varied from 0.1 to 1, as suggested in [9]. In this implementation, the best result is obtained when the weight factor is set to 1.
Algorithm 2: SWN(AAAVC) [9]
For each sentence, extract adv+adj and adv+verb combines.
1. For each extracted adv+adj combine do:
• If score(adj) = 0, ignore it.
• If adv is affirmative, then
   o If score(adj) > 0: f(adv, adj) = min(1, score(adj) + sf*score(adv))
   o If score(adj) < 0: f(adv, adj) = min(1, score(adj) - sf*score(adv))
• If adv is negative, then
   o If score(adj) > 0: f(adv, adj) = max(-1, score(adj) + sf*score(adv))
   o If score(adj) < 0: f(adv, adj) = max(-1, score(adj) - sf*score(adv))
2. For each extracted adv+verb combine do:
• If score(verb) = 0, ignore it.
• If adv is affirmative, then
   o If score(verb) > 0: f(adv, verb) = min(1, score(verb) + sf*score(adv))
   o If score(verb) < 0: f(adv, verb) = min(1, score(verb) - sf*score(adv))
• If adv is negative, then
   o If score(verb) > 0: f(adv, verb) = max(-1, score(verb) + sf*score(adv))
   o If score(verb) < 0: f(adv, verb) = max(-1, score(verb) - sf*score(adv))
3. f(sentence) = f(adv, adj) + 1*f(adv, verb)
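Step 3 of Algorithm 2 can be sketched as a weighted sum in which the adverb + verb contribution is multiplied by a weight factor (1 in the best reported setting). The sketch assumes signed SentiWordNet scores supplied as floats, treats adverbs with non-negative scores as affirmative, and takes each f(·) term to be the sum over all pairs found in the sentence; these choices and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

SF = 0.35  # adverb scaling factor from [9]


def combine(adv: float, head: float, sf: float = SF) -> Optional[float]:
    """adv+adj or adv+verb score as in Algorithm 2 (None when the head score is 0)."""
    if head == 0:
        return None
    affirmative = adv >= 0  # assumption: the sign of the adverb score decides its type
    raw = head + sf * adv if head > 0 else head - sf * adv
    return min(1.0, raw) if affirmative else max(-1.0, raw)


def pair_sum(pairs: Iterable[Tuple[float, float]]) -> float:
    """Sum of the non-ignored pair scores (hypothetical aggregation of f over pairs)."""
    scores = (combine(adv, head) for adv, head in pairs)
    return sum(s for s in scores if s is not None)


def score_sentence_aaavc(adv_adj: Iterable[Tuple[float, float]],
                         adv_verb: Iterable[Tuple[float, float]],
                         weight: float = 1.0) -> float:
    """f(sentence) = f(adv, adj) + weight * f(adv, verb); weight = 1 worked best here."""
    return pair_sum(adv_adj) + weight * pair_sum(adv_verb)
```

Varying `weight` between 0.1 and 1 reproduces the kind of sweep summarized in the weight-factor experiment.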
V. EXPERIMENTAL EVALUATION
Dataset
In all experiments, the benchmark dataset aclImdb [6] is used. It contains 50,000 movie reviews from www.imdb.com, of which 25,000 are negative and 25,000 are positive. For aspect and sentiment word extraction, seeds were created manually using various film awards, movie review sites and film magazines.
Evaluation Measures
Accuracy and F-measure are used to evaluate performance. Accuracy is defined as the number of reviews whose polarity is correctly identified divided by the total number of reviews. In this paper, a review in which the user likes the movie is considered positive, and a review in which the user dislikes the movie is considered negative. Accordingly, true positive (TP), false negative (FN), false positive (FP) and true negative (TN) are defined as follows [13].
TP: the number of positive reviews correctly identified as positive
FN: the number of positive reviews incorrectly identified as negative
FP: the number of negative reviews incorrectly identified as positive
TN: the number of negative reviews correctly identified as negative
Based on these definitions, precision ($p$) and recall ($r$) are defined in Equations (4) and (5), respectively.

$$p = \frac{TP}{TP + FP} \tag{4}$$

$$r = \frac{TP}{TP + FN} \tag{5}$$

The F-measure ($F$) combines precision and recall into a single measure for comparing classifiers, and is given by Equation (6).

$$F = \frac{2pr}{p + r} \tag{6}$$
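Equations (4) to (6), together with the accuracy definition above, translate directly into code. The sketch takes the four confusion counts for the positive class; the function name `evaluate` is hypothetical, and no guard is included for the degenerate case in which no review is labelled positive (precision would then be undefined).

```python
from typing import Dict


def evaluate(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision (Eq. 4), recall (Eq. 5), F-measure (Eq. 6)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

For example, `evaluate(tp=6, fn=2, fp=3, tn=9)` gives an accuracy of 0.75, a precision of 2/3, a recall of 0.75 and an F-measure of 12/17.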
Experimental Methodology, Results and Discussions
First, the dataset was pre-processed by removing stop words, excluding negation words such as not, isn’t and doesn’t. Words that appear fewer than five times in the corpus were removed. The seeds for aspects were created manually from various film award sites, film magazines and film review sites.
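The pre-processing step can be sketched as follows. The stop-word list is a tiny hypothetical placeholder, since the paper does not say which list was used; negation words are kept, and words occurring fewer than five times in the corpus are dropped. Lower-casing and whitespace tokenization are simplifying assumptions.

```python
from collections import Counter
from typing import List

# Hypothetical stop-word list; the list actually used in the paper is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "this", "to", "it", "not", "isn't", "doesn't"}
NEGATIONS = {"not", "isn't", "doesn't"}  # negation words are retained
MIN_COUNT = 5                            # drop words seen fewer than five times


def preprocess(reviews: List[str]) -> List[List[str]]:
    """Remove stop words (except negations), then drop rare words corpus-wide."""
    tokenized = [review.lower().split() for review in reviews]  # assumed tokenizer
    stop = STOP_WORDS - NEGATIONS
    tokenized = [[w for w in doc if w not in stop] for doc in tokenized]
    counts = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if counts[w] >= MIN_COUNT] for doc in tokenized]
```

Keeping negations matters for the later scoring step, where a negative adverb such as "not" flips or weakens the score of the word it modifies.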
After pre-processing, the SAS model with POS tags is applied to the dataset to extract aspects and aspect-specific sentiment words. The SWN(AAC) and SWN(AAAVC) schemes are then used to assign sentiment scores to the sentiment words extracted by the SAS model. Once the scores of the sentiment words assigned to the aspects appearing in a review have been identified, the final score of the review is computed by aggregating them. If the score is greater than 0, the review is considered positive; otherwise, it is considered negative. The computed polarity is then compared with the actual polarity to compute accuracy and F-measure.
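The review-level decision rule can be sketched as below. It assumes the sentiment-word scores for each aspect in a review have already been computed and are supplied as a mapping from aspect to a list of scores; aggregating by summation and the function names are hypothetical choices.

```python
from typing import Dict, List


def review_polarity(aspect_scores: Dict[str, List[float]]) -> str:
    """Sum all sentiment-word scores (assumed aggregation); > 0 is positive, else negative."""
    total = sum(score for scores in aspect_scores.values() for score in scores)
    return "positive" if total > 0 else "negative"


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of reviews whose computed polarity matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Under this rule a total of exactly 0, including a review in which no sentiment word received a score, is classified as negative.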
Table 1: Comparison of SentiWordNet schemes with computed sentiment polarity

| Scheme | Polarity | Actual | Correctly computed |
|---|---|---|---|
| SWN(AAC) | Positive | 25000 | 21736 |
| SWN(AAC) | Negative | 25000 | 17774 |
| SWN(AAAVC) | Positive | 25000 | 23002 |
| SWN(AAAVC) | Negative | 25000 | 19422 |
Table 1 shows the number of reviews whose polarity is correctly identified by each SentiWordNet scheme, together with the actual number of reviews. SWN(AAAVC) correctly identifies more reviews than SWN(AAC) for both positive and negative reviews. Table 2 shows the correctly classified polarities for both schemes as percentages.
Table 2: Percentage of correctly classified polarity by two schemes

| Scheme | Polarity | Correctly classified polarity (%) |
|---|---|---|
| SWN(AAC) | Positive | 86.94 |
| SWN(AAC) | Negative | 71.10 |
| SWN(AAAVC) | Positive | 92.00 |
| SWN(AAAVC) | Negative | 77.69 |
Table 3: Accuracy and F-measure

| Scheme | Accuracy | F-measure |
|---|---|---|
| SWN(AAC) | 79.02% | 78.89% |
| SWN(AAAVC) | 84.85% | 84.77% |
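As a consistency check, the measures in Table 3 can be recomputed from the counts in Table 1. The sketch assumes that positive is the positive class for TP, FN, FP and TN, and that the reported F-measure is the unweighted average of the positive-class and negative-class F-measures. Under that reading it reproduces the reported F-measures (78.89% and 84.77%) and the SWN(AAAVC) accuracy (84.85%), while the SWN(AAC) counts give an accuracy of 39,510/50,000 = 79.02%.

```python
from typing import Dict

TOTAL = 25000  # reviews per class in aclImdb

# Correctly identified reviews per class, from Table 1.
TABLE1 = {
    "SWN(AAC)": {"positive": 21736, "negative": 17774},
    "SWN(AAAVC)": {"positive": 23002, "negative": 19422},
}


def f1(tp: int, fp: int, fn: int) -> float:
    return 2 * tp / (2 * tp + fp + fn)  # equivalent to 2pr / (p + r)


def table3_measures(correct_pos: int, correct_neg: int) -> Dict[str, float]:
    fn = TOTAL - correct_pos         # positive reviews labelled negative
    fp = TOTAL - correct_neg         # negative reviews labelled positive
    accuracy = (correct_pos + correct_neg) / (2 * TOTAL)
    f_pos = f1(correct_pos, fp, fn)  # positive class
    f_neg = f1(correct_neg, fn, fp)  # negative class (error roles swapped)
    # Assumption: the reported F-measure is the macro average of both classes.
    return {"accuracy": accuracy, "macro_f": (f_pos + f_neg) / 2}


if __name__ == "__main__":
    for scheme, counts in TABLE1.items():
        m = table3_measures(counts["positive"], counts["negative"])
        print(f"{scheme}: accuracy {m['accuracy']:.2%}, F-measure {m['macro_f']:.2%}")
```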
As shown in Table 3, the SWN(AAAVC) scheme achieves an accuracy of 84.85% for sentiment polarity identification. Figure 2 depicts the impact of the fraction of the verb score used (the weight factor) on accuracy for the SWN(AAAVC) scheme; the best accuracy is achieved when the weight factor is set to 1.
Figure 2: Impact of weight factors on accuracy
Aspect-level sentiment analysis makes it possible to build a detailed review profile of a movie. Figure 3 shows the review profile of a movie with mostly positive reviews, while Figure 4 shows the same for a movie with mostly negative reviews.
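A review profile of this kind can be built by counting, for each aspect, how many reviews express a positive or a negative opinion about it. The sketch assumes each review has already been reduced to a mapping from aspect to an aggregated score and counts a score of exactly 0 as neither; the input format, aspect names and function name are hypothetical, and the paper does not specify how its profile figures were computed.

```python
from collections import defaultdict
from typing import Dict, Iterable


def review_profile(reviews: Iterable[Dict[str, float]]) -> Dict[str, Dict[str, int]]:
    """Per-aspect counts of positive and negative opinions across a movie's reviews."""
    profile: Dict[str, Dict[str, int]] = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect_scores in reviews:
        for aspect, score in aspect_scores.items():
            if score > 0:
                profile[aspect]["positive"] += 1
            elif score < 0:
                profile[aspect]["negative"] += 1
    return dict(profile)
```

The resulting counts can be drawn as one pair of bars per aspect.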
Figure 3: Review Profile of a movie with majority positive reviews
Figure 4: Review Profile of a movie with majority negative reviews
VI. CONCLUSIONS & FUTURE WORK
This paper focuses on identifying the polarity of reviews about products or items. To identify the sentiment, aspects and sentiment words are first extracted using the SAS model with POS tagging. Two SentiWordNet schemes are then used to score the sentiment words related to the aspects appearing in a review, and the final score of the review is computed by aggregating these scores. The results show that the SWN(AAAVC) scheme outperforms the SWN(AAC) scheme in both accuracy and F-measure. One potential direction for future work is experimentation on other datasets, both in the movie review domain and in other domains.
REFERENCES
[1] http://www.tripadvisor.com.
[2] SentiWordNet, available at http://www.sentiwordnet.isti.cnr.it.
[3] Murthy Ganapathibhotla and Bing Liu. Mining opinions in comparative sentences. In International Conference on Computational Linguistics (COLING-2008), 2008.
[4] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), 2004.
[5] Bing Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, May 2012.
[6] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142-150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
[7] Arjun Mukherjee and Bing Liu. Aspect extraction through semi-supervised modeling. In ACL, 2012.
[8] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Sentiment classification using machine learning techniques. In Conference on Empirical Methods in Natural Language Processing (EMNLP-2002), pages 79-86, July 2002.
[9] V. K. Singh, R. Piryani, and A. Uddin. Sentiment analysis of movie reviews. In IEEE Xplore, 2013.
[10] Kristina Toutanova and Christopher D. Manning. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000.
[11] Janyce Wiebe, Rebecca Bruce, and Thomas O'Hara. Development and use of a gold-standard data set for subjectivity classification. In Association for Computational Linguistics, 1999.
[12] L. Zhang and B. Liu. Identifying noun product features that imply opinions. In ACL (short paper), 2011.
[13] Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd Edition, July 2011.
[14] Ronak Patel, Priyank Thakkar and K Kotecha, “Enhancing Movie Recommender System”,
International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 5,
Issue 1, 2014, pp. 73 - 82, ISSN Print: 0976-6480, ISSN Online: 0976-6499.
[15] R. Manickam, D. Boominath and V. Bhuvaneswari, “An Analysis of Data Mining: Past,
Present and Future”, International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 1, 2012, pp. 1 - 9, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[16] Dr. Jamshed Siddiqui, “An Overview of Opinion Mining Techniques”, International Journal of
Advanced Research in Engineering & Technology (IJARET), Volume 4, Issue 7, 2013,
pp. 176 - 182, ISSN Print: 0976-6480, ISSN Online: 0976-6499.