2. Contents
Introduction
Motivation
Literature Review
Dataset Structure
Proposed Methodology
Results and Discussions
Conclusion and future scope
References
3. Mining opinions is very important
Social media contains a huge repository of opinions.
Mining opinions is an important task in knowledge mining.
It discovers collective and subjective information.
5. Motivation
Existing approaches to aspect-opinion mining focus on the text domain or on multimedia, and refer to aspects by considering the opinion words associated with them.
They obtain aspects and opinions from opinion documents and classify each opinion based on the polarity of its adjectives.
6. Research Objectives
To introduce a new technique of custom heuristic rules for identifying aspects and their corresponding opinion words from an opinion document.
To include phrases other than adjectives as opinion words.
To classify the opinion using the Stanford NLP (SNLP) and Naïve Bayes (NB) classifiers.
7. Literature Review
An aspect-opinion mining model is used to extract aspects and their corresponding opinions from user-generated reviews or user-tagged photos.
Textual aspects are represented by nouns in the documents, and opinions are conveyed through adjectives, verbs, and adverbs [10].
For fine-grained knowledge, considering only adjectives is not sufficient [9].
[Diagram: Aspect Based Sentiment Analysis, comprising aspect identification and extraction, and sentiment classification]
9. 9
Paper title | Author and publication year | Dataset used | Methods used | Evaluation metrics
Sentiment Analysis based on a deep stochastic network and active learning | T. Jain et al., 2018 | Movie review dataset from Rotten Tomatoes | Naïve Bayes, RNN | Run time and accuracy
Aspect extraction from product reviews using category hierarchy information | Y. Yang et al., April 3-7, 2017 | Amazon.com review dataset | Cat-LDA | Hit rate of aspect
Improving Opinion aspect extraction using semantic similarity & aspect association | Q. Liu et al., 2016 | Review dataset from amazon.com for eight products | AER | Precision, recall and F1-score
10. 10
Paper title | Author and publication year | Dataset used | Methods used | Evaluation metrics
Word of mouth understanding: Entity centric multimodal aspect opinion mining on social media | Quan Fang et al., 2015 | Opinions collected from Flickr, BBC News and TripAdvisor | mmAOM (Bernoulli classifier and Gibbs sampling) | Perplexity, purity, precision, recall and F-measure
Opinion polarity identification through adjectives | S. Moghaddam & Fred Popowich, June 2010 | Movie review dataset from NLTK | Naïve Bayes | Accuracy
Jointly modelling aspects and opinions with MaxEnt-LDA Hybrid | W. Zhao et al., 9-11 October 2010 | A restaurant and hotel review dataset in Ganu et al. 2009 | MaxEnt-LDA | Precision, recall and F1-score
11. Research Gap
Existing methods predict sentiment using a training set of previously defined opinion words and have a limited capacity to identify associations between aspects and opinion words.
They consider only adjectives as opinion words.
The proposed generalized technique addresses this problem by generating heuristic rules for identifying opinions and aspects, and applies Stanford NLP for classification.
13. Aspect-Opinion identification
Documents related to entities are collected from social media sites.
Entities carry certain aspects and opinions.
Aspect words are nouns (NN, NNP, NP, NNS).
Opinion words are adjectives, adverbs and verbs (JJ, JJS, JJR, RB, RBR, VB, VBP, VBZ).
Tokens are classified into these categories using the part-of-speech tagging function provided by the Stanford NLP toolkit.
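The tag-based split described above can be sketched as follows. This is a minimal illustration, assuming the tokens have already been tagged (for example by the Stanford NLP toolkit) and are supplied as (word, tag) pairs; the function name and input format are hypothetical, and only the two tag sets come from the slide.

```python
# Tag sets copied from the slide; everything else is an illustrative assumption.
ASPECT_TAGS = {"NN", "NNP", "NP", "NNS"}
OPINION_TAGS = {"JJ", "JJS", "JJR", "RB", "RBR", "VB", "VBP", "VBZ"}


def split_aspects_opinions(tagged_tokens):
    """Partition (word, tag) pairs into candidate aspect and opinion words."""
    aspects, opinions = [], []
    for word, tag in tagged_tokens:
        if tag in ASPECT_TAGS:
            aspects.append(word)
        elif tag in OPINION_TAGS:
            opinions.append(word)
    return aspects, opinions


# Hand-tagged input standing in for real tagger output.
tagged = [("iPhone", "NNP"), ("has", "VBZ"), ("a", "DT"),
          ("solid", "JJ"), ("body", "NN")]
aspects, opinions = split_aspects_opinions(tagged)
print("aspects:", aspects)
print("opinions:", opinions)
```

Tokens whose tags fall in neither set, such as the determiner here, are simply skipped.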
14. Custom Heuristic Rules
Unigrams and bigrams are generated using the n-gram feature engineering method.
Four vocabularies (lists) of aspect words and opinion words are constructed: raw aspects (RA), raw opinions (RO), processed aspects (PA) and processed opinions (PO).
Rules are created using regular expressions on POS tags.
Alphabetical list of part-of-speech tags used in the Penn Treebank, for example:
Tag | Description
NN | Noun, singular or mass
NNS | Noun, plural
NNP | Proper noun, singular
NNPS | Proper noun, plural
JJ | Adjective
JJR | Adjective, comparative
JJS | Adjective, superlative
RB | Adverb
RBR | Adverb, comparative
RBS | Adverb, superlative
VBG | Verb, gerund
VBN | Verb, past participle
DT | Determiner
CC | Coordinating conjunction
CD | Cardinal number
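To make the idea of a regular-expression rule over POS tags concrete, here is a small sketch. The slides do not list the actual rules, so the rule below (an adjective immediately followed by a common noun) is a hypothetical example, and the helper names are assumptions; the bigram step follows the n-gram description above.

```python
import re

# Hypothetical rule, not taken from the slides: an adjective (JJ, JJR, JJS)
# immediately followed by a common noun (NN, NNS) gives an (opinion, aspect) pair.
ADJ_NOUN_RULE = re.compile(r"JJ[RS]? NNS?")


def bigrams(items):
    """Return consecutive pairs of items."""
    return list(zip(items, items[1:]))


def apply_rule(tagged_tokens, rule=ADJ_NOUN_RULE):
    """Match the rule against the tag pair of every bigram of (word, tag) tokens."""
    pairs = []
    for (w1, t1), (w2, t2) in bigrams(tagged_tokens):
        if rule.fullmatch(f"{t1} {t2}"):
            pairs.append((w1, w2))
    return pairs


tagged = [("a", "DT"), ("super", "JJ"), ("solid", "JJ"),
          ("stainless", "JJ"), ("steel", "NN"), ("body", "NN")]
pairs = apply_rule(tagged)
print(pairs)
```

Because the rule is matched per bigram, only the adjective directly adjacent to the noun is paired; covering longer adjective runs would need additional rules.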
16. Aspects and opinion identified
Example Review:
“iPhone has a super solid stainless steel body surrounded by glass. It is simply the best,
more secure among all the smartphones.”
POS tagging:
[(ROOT(S(NP(NNP iPhone))(VP (VBZ has)(NP(NP (DT a) (JJ super) (JJ solid) (JJ stainless) (NN steel)(NN body))(VP (VBN surrounded)(PP (IN by)(NP (NN glass))))))(. .)))]
[(ROOT(S(NP (PRP It))(VP (VBZ is)(ADJP (RB simply)(ADJP (DT the) (JJS best))(, ,)(ADJP (RBR more) (JJ secure)(PP (IN among)(NP (PDT all) (DT the) (NNS smartphones))))))(. .)))]
Raw aspect list | Processed aspect list | Raw opinion list | Processed opinion list
Steel | iphone | secure | Super
Body | | | Solid
Smartphones | | | stainless
glass | | |
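The bracketed parse output shown in this example can be turned back into (word, tag) pairs with a short regular expression. The sketch below is an assumption about how such output might be post-processed, not the authors' code; the function name is hypothetical, and the input is the second parse string from the example.

```python
import re

# A leaf of the bracketed parse looks like "(TAG word)" with no nested parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def parse_leaves(bracketed):
    """Extract (word, tag) pairs from a Penn Treebank style bracketed parse."""
    return [(word, tag) for tag, word in LEAF.findall(bracketed)]


parse = ("(ROOT(S(NP (PRP It))(VP (VBZ is)(ADJP (RB simply)(ADJP (DT the) (JJS best))"
         "(, ,)(ADJP (RBR more) (JJ secure)(PP (IN among)(NP (PDT all) (DT the) "
         "(NNS smartphones))))))(. .)))")
leaves = parse_leaves(parse)
for word, tag in leaves:
    print(tag, word)
```

Phrase-level labels such as NP or ADJP are skipped because they are followed by another opening parenthesis rather than a word.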
17. Pseudo code analysis
Input: Text review
Output: Aspects, opinions and sentence-wise sentiment analysis
1: Take a review as input; it may comprise n sentences.
2: Initialize Stanford NLP to perform tokenization and part-of-speech tagging.
3: Build a dependency tree with the part-of-speech tags.
4: Identify stop words and extract nouns (common and proper), adjectives, verbs, and superlative and comparative adjectives and adverbs.
5: Initialize pattern matching with the regex tree built from the heuristic rules.
6: Obtain aspect and opinion words.
7: Obtain the sentiment for every sentence.
8: Return the result.
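The steps above can be sketched as a small Python pipeline. This is only an outline under stated assumptions: the tagger and the sentence-level sentiment scorer are passed in as functions so that Stanford NLP (or any other tool) could be plugged in, and the toy tagger, toy scorer, stop-word list and all function names are hypothetical stand-ins used to keep the example self-contained. The dependency tree and regex rule matching of steps 3 and 5 are omitted.

```python
import re

# Tag sets as listed on the aspect-opinion identification slide.
ASPECT_TAGS = {"NN", "NNP", "NP", "NNS"}
OPINION_TAGS = {"JJ", "JJS", "JJR", "RB", "RBR", "VB", "VBP", "VBZ"}
# Hypothetical stop-word list for the toy example.
STOP_WORDS = {"a", "the", "is", "has", "it", "by", "among", "all"}


def split_sentences(review):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def analyze_review(review, tag_sentence, score_sentence):
    """Return aspects, opinions and a sentiment label for every sentence."""
    results = []
    for sentence in split_sentences(review):
        tagged = [(w, t) for w, t in tag_sentence(sentence)
                  if w.lower() not in STOP_WORDS]
        results.append({
            "sentence": sentence,
            "aspects": [w for w, t in tagged if t in ASPECT_TAGS],
            "opinions": [w for w, t in tagged if t in OPINION_TAGS],
            "sentiment": score_sentence(sentence),
        })
    return results


# Toy stand-ins for a real POS tagger and sentiment classifier.
TOY_LEXICON = {"body": "NN", "solid": "JJ", "best": "JJS", "smartphones": "NNS"}


def toy_tagger(sentence):
    words = re.findall(r"[A-Za-z]+", sentence)
    return [(w, TOY_LEXICON.get(w.lower(), "X")) for w in words]


def toy_sentiment(sentence):
    return "positive" if re.search(r"\b(best|solid)\b", sentence.lower()) else "neutral"


review = "The body is solid. It is the best among all smartphones."
results = analyze_review(review, toy_tagger, toy_sentiment)
for row in results:
    print(row)
```

Passing the tagger and scorer as arguments keeps the control flow of the pseudo code visible without tying the sketch to a particular toolkit.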
20. Aspect-opinion identification
Reviews are collected for Hotel [11] and Smartphone [12], and the experiment is conducted on each review document separately.
The total number of aspect words and their corresponding opinion words identified:
Entity | #documents | #RA | #PA | #RO | #PO
Hotel | 30 | 348 | 63 | 28 | 266
Smartphone | 30 | 385 | 121 | 63 | 263
22. Results obtained
Results obtained for aspect and opinion identification using custom heuristic rules with the SNLP classifier:
Entity | Precision (%) | Recall (%) | Accuracy (%) | Kappa coefficient
Hotel | 92.13 | 89.94 | 88.57 | 0.6849
Smartphone | 79.47 | 90.1 | 81.2 | 0.6197
Results obtained for aspect and opinion identification using the NB classifier:
Entity | Precision (%) | Recall (%) | Accuracy (%)
Hotel | 55.91 | 86.27 | 56.61
Smartphone | 55.25 | 86.25 | 55.96
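For reference, the four measures reported in these tables can be computed from a binary confusion matrix as sketched below. The slides do not state how the kappa coefficient was obtained, so Cohen's kappa is assumed here; the function name and the counts in the usage example are hypothetical and are not the experiment's data.

```python
def evaluate(tp, fp, fn, tn):
    """Precision, recall, accuracy and Cohen's kappa from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the marginal totals of both labelings.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "kappa": kappa}


# Hypothetical counts for illustration only.
scores = evaluate(tp=40, fp=10, fn=5, tn=45)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Kappa is a proportion between -1 and 1 rather than a percentage, which is why it is reported on a different scale from the other three measures.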
23. Performance evaluation
[Figure: two bar charts of precision, recall and accuracy (%) for the SNLP classifier and the NB classifier.
(a) SNLP classifier: precision 83.26, recall 88.7, accuracy 81.88; NB classifier: precision 49.97, recall 86.43, accuracy 54.23
(b) SNLP classifier: precision 92.3, recall 89.1, accuracy 88.3; NB classifier: precision 58.49, recall 62.29, accuracy 58.4]
Identification of aspects and opinions using heuristic rules with Stanford NLP and NB classification techniques on (a) hotel and (b) smartphone entities.
24. Conclusion
The presented work concludes that:
Custom heuristic rules, expressed as regular expressions on POS tags, are used to identify aspects and opinions from an opinion corpus.
The accuracy obtained for aspects and opinions identified by SNLP with the heuristic rules is higher than that of the NB classifier [9].
25. Future Scope
Future work includes:
Associating aspects with aspect-specific opinions, which will give detailed aspect-level sentiment towards an entity.
Adding more heuristic rules, together with a successful implementation of stemming, to extract fine-grained information from user-generated opinions.
26. References
1. Bo Pang, Lillian Lee, Shivakumar Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning Techniques,” in Proceedings of EMNLP 2002, pp. 79–86.
2. S. Moghaddam, F. Popowich, “Opinion polarity identification through adjectives,” CoRR arXiv:1011.4623 (2010).
3. W. X. Zhao, J. Jiang, H. Yan, and X. Li, “Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid,” in Proc. EMNLP, 2010, pp. 56–65.
4. Sida Wang and Christopher Manning, “Baselines and bigrams: simple, good sentiment and topic classification,” in ACL’12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, volume 2, July 08–14, 2012, pp. 90–94.
5. F. Chua, W. Cohen, J. Betteridge, E. Lim, “Community-Based Classification of Noun Phrases in Twitter,” in Proc. CIKM’12, Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1702–1706.
6. B. Liu and L. Zhang, “A survey of opinion mining and sentiment analysis,” in Mining Text Data. New York, NY, USA: Springer, 2012, pp. 415–463.
7. Masud Karim, Rashedur M. Rahman, “Decision tree and naïve Bayes algorithm for classification and generation of actionable knowledge for direct marketing,” Journal of Software Engineering and Applications, 6, 2013, pp. 196–206.
8. Richard Socher et al., “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP’13, 2013.
9. Qian Liu, Zhiqiang Gao, Bing Liu and Yuanlin Zhang, “Automated Rule Selection for Aspect Extraction in Opinion Mining,” in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), pp. 1291–1297.
10. Quan Fang, Changsheng Xu, Jitao Sang, M. Shamim Hossain and Ghulam Muhammad, “Word-of-mouth understanding: entity-centric multimodal aspect-opinion mining in social media,” IEEE Transactions on Multimedia, volume 17, no. 12, December 2015, pp. 2281–2296.
11. Y. Yang, C. Chen, M. Qiu, F. S. Bao, “Aspect extraction from product reviews using category hierarchy information,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 675–680.
12. www.tripadvisor.in
13. www.mouthshut.com
14. https://nlp.stanford.edu/sentiment/