Opinion Mining and Classification Technique to help make better choices before buying a product

Data Mining and Business Intelligence
PGP 2012-14
Group no. 1
Amit Singh Chauhan (60)
Komal Billu (21)

• The consumer market is flooded with products of the most varied sorts, each advertised as better, cheaper, and more resistant.
• Are the advertisements really true?

INDIAN INSTITUTE OF MANAGEMENT RAIPUR


• A good solution is to turn to "word of mouth" on the web.
• Ideally, one would read all the available reviews and form an opinion, but:
  • The time spent reading reviews would be huge.
  • Product reviews are written in different languages.





How can we extract the features of a given product that could be commented upon in a customer review?



• Significance of the problem
  • Mining the web for customer opinions on different products is both a useful and a challenging task.
  • This research will give the customer a clear polarity that is binary in nature (positive or negative).
  • Ultimately, it will help the customer form a firm opinion about the product being researched.


• What are the expected results of the project?

The project will develop methods to evaluate a system implementing the approach presented, and will report the evaluation results obtained by applying the system to a set of manually annotated texts containing customer reviews in English and Spanish.



• The approach to the problem is divided into two major phases, followed by polarity assignment, summarization, and evaluation:
  • Preprocessing
  • Main processing
  • Assigning polarity to feature attributes
  • Summarization of feature polarity
  • Discussion and evaluation



• Once the user enters a query about the product, a series of documents is downloaded in different languages.
• A second operation determines the category of the product.
• Once the category is determined, the product-specific features are extracted using WordNet and ConceptNet.
• Product-independent features, which apply to all products, are also extracted.


• Once we are done with WordNet, we search ConceptNet for further attributes and features.
• In the next step we look for undiscovered features of the product. For example, for a camera these features would be battery life, picture resolution, and auto mode.
• These features are extracted using bigrams, built from a corpus of target words and the other words used with them in customer reviews.
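
The bigram step above can be sketched roughly as follows. This is a minimal illustration, not the tooling used in the project: the function name `feature_bigrams`, the `min_count` threshold, the naive tokenizer, and the sample reviews are all assumptions made for the example.

```python
import re
from collections import Counter


def feature_bigrams(reviews, target_words, min_count=2):
    """Count bigrams that contain a target word and keep the frequent ones.

    `target_words` and `min_count` are illustrative parameters (assumptions).
    """
    targets = {w.lower() for w in target_words}
    counts = Counter()
    for review in reviews:
        # Naive tokenizer: lowercase alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", review.lower())
        for first, second in zip(tokens, tokens[1:]):
            if first in targets or second in targets:
                counts[(first, second)] += 1
    return [(bigram, n) for bigram, n in counts.most_common() if n >= min_count]


# Hypothetical sample reviews for a camera.
reviews = [
    "The battery life is great and the picture resolution is sharp.",
    "Battery life could be better, but picture resolution is fine.",
]
candidates = feature_bigrams(reviews, ["life", "resolution"])
```

Frequent bigrams such as ("battery", "life") then become candidate features; a real system would also filter out pairs that end in function words.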

[Figure: panels labelled English and Spanish]

• Main processing starts with anaphora resolution, in which anaphoric references are replaced with their corresponding referents.
• For example, "I bought this camera about a week ago, and so far have found it very simple to use" becomes, after anaphora resolution, "I bought this camera about a week ago, and so far have found <this camera> very simple to use".
• Sentence chunking then splits the modified text into sentences, and sentence extraction removes text of no importance.
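
A rough sketch of the chunking and extraction steps is given below. The slides do not say what counts as "text of no importance"; this example assumes it means sentences that mention none of the known feature terms. The function names and the sample text are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence chunker: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_relevant(sentences, feature_terms):
    """Keep only sentences that mention at least one feature term (assumed criterion)."""
    terms = [t.lower() for t in feature_terms]
    return [s for s in sentences if any(t in s.lower() for t in terms)]


# Hypothetical review text.
text = ("I bought this camera about a week ago. My brother was visiting. "
        "The battery life is excellent!")
sentences = split_sentences(text)
relevant = extract_relevant(sentences, ["camera", "battery life"])
```

Here the middle sentence is dropped because it mentions no feature term. A production system would use a proper sentence splitter rather than a regular expression.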


• Sentence parsing is performed to obtain the sentence structure and component dependencies.
• In the next step, the features and their values (the attributes) are extracted.
• A modifier is also assigned to each feature attribute to determine whether the attribute is positive or negative.
• The result is a set of triplets of the form (feature, featureAttribute, valueOfModifier).
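
One possible way to represent these triplets in code is shown below. The class name, the field names, and the two example triplets are assumptions for illustration; the slides specify only the three-part shape.

```python
from typing import NamedTuple, Optional


class FeatureTriplet(NamedTuple):
    feature: str
    feature_attribute: str
    value_of_modifier: Optional[str]  # None when no modifier is attached


# Hypothetical triplets for "The battery life is not good"
# and "The lens is very sharp".
triplets = [
    FeatureTriplet("battery life", "good", "not"),
    FeatureTriplet("lens", "sharp", "very"),
]
```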




• ConceptNet methodology:
  • OUT relations: PropertyOf and CapableOf
  • IN relations: PartOf and UsedFor

• Feature value extraction:
  • Triplets of the form (feature, attributeFeature, valueOfModifier)



• Assigning polarity to feature attributes, using a Sequential Minimal Optimization (SMO) Support Vector Machine (SVM):
  • The set of anchors contains the terms {featureName, happy, unsatisfied, nice, small, buy}.
  • The 6-dimensional training vectors are v(j,i) = NGD(w_i, a_j), where a_j, with j ranging from 1 to 6, are the anchors and w_i, with i from 1 to 30, are the words from the positive and negative categories.
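
The construction of the anchor-based vectors described above can be sketched as follows. The distance function is passed in as a parameter; `toy_ngd` and its lookup table are hypothetical stand-ins for distances that would really be computed from search-engine hit counts.

```python
def build_vectors(words, anchors, ngd):
    """Return {word: [ngd(word, anchor) for each anchor]}, one vector per word."""
    return {w: [ngd(w, a) for a in anchors] for w in words}


# Anchors from the slide; "featureName" stands in for the actual feature name.
anchors = ["featureName", "happy", "unsatisfied", "nice", "small", "buy"]

# Hypothetical distance lookup; real values would come from hit counts.
toy_distances = {("excellent", "happy"): 0.2, ("awful", "unsatisfied"): 0.3}


def toy_ngd(word, anchor):
    # Fall back to 1.0 for pairs missing from the toy table (arbitrary choice).
    return toy_distances.get((word, anchor), 1.0)


vectors = build_vectors(["excellent", "awful"], anchors, toy_ngd)
```

Each word yields one 6-dimensional vector. Paired with a positive or negative label, these vectors would form the training set for the SVM.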


• Summarization of feature polarity:

The formulas can be summarized as:
  • Fpos(i) = #pos_feature_attributes(i) / #feature_attributes(i)
  • Fneg(i) = #neg_feature_attributes(i) / #feature_attributes(i)
  • The results are shown as triplets of the form (feature, % positive opinions, % negative opinions).
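
The two ratios can be computed with a few lines of code. The sketch below assumes the classifier output is available as (feature, polarity) pairs with polarity either "pos" or "neg"; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def summarize(labelled_attributes):
    """labelled_attributes: iterable of (feature, polarity), polarity 'pos' or 'neg'.

    Returns a list of (feature, % positive, % negative) triplets.
    """
    pos = defaultdict(int)
    total = defaultdict(int)
    for feature, polarity in labelled_attributes:
        total[feature] += 1
        if polarity == "pos":
            pos[feature] += 1
    return [
        (f, 100.0 * pos[f] / total[f], 100.0 * (total[f] - pos[f]) / total[f])
        for f in total
    ]


# Hypothetical classifier output for two camera features.
labelled = [("battery life", "pos"), ("battery life", "pos"),
            ("battery life", "neg"), ("battery life", "pos"),
            ("lens", "neg")]
summary = summarize(labelled)
```

Because polarity is binary, the negative share is simply the remainder of the positive share for each feature.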

• Discussion and evaluation:

Three formulas are used to compute system performance:
  • System Accuracy (SA)
  • Feature Identification Precision (FIP)
  • Feature Identification Recall (FIR)
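
The slides name these measures without giving their formulas. As a sketch only, the code below assumes FIP and FIR follow the usual set-based definitions of precision and recall, comparing the features the system identifies against those in the manual annotation. The function name and the sample feature lists are hypothetical.

```python
def precision_recall(identified, gold):
    """Set-based precision and recall of identified features against annotated ones."""
    identified, gold = set(identified), set(gold)
    correct = len(identified & gold)
    precision = correct / len(identified) if identified else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical system output and manual annotation for one review set.
fip, fir = precision_recall(
    identified=["battery life", "lens", "price", "strap"],
    gold=["battery life", "lens", "zoom", "price", "flash"],
)
```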




The Normalized Google Distance (NGD) is a semantic similarity measure derived from the number of hits returned by the Google search engine for a given set of keywords. Keywords with the same or similar meanings in a natural language sense tend to be "close" in units of Normalized Google Distance, while words with dissimilar meanings tend to be farther apart.

NGD(x, y) = [max{log f(x), log f(y)} - log f(x, y)] / [log N - min{log f(x), log f(y)}]

Where:
• N is the total number of web pages searched by Google multiplied by the average number of singleton search terms occurring on pages
• f(x) and f(y) are the number of hits for search terms x and y, respectively
• f(x, y) is the number of web pages on which both x and y occur.
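
The formula translates directly into code. The sketch below takes the hit counts as plain numbers; the counts in the example are made up, and returning infinity when the two terms never co-occur is a choice made for this example.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y) and N."""
    if fxy == 0:
        # The terms never co-occur; treat the distance as infinite.
        return math.inf
    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical hit counts, not real search results.
identical = ngd(1000, 1000, 1000, 10**6)
related = ngd(100, 1000, 10, 10**6)
```

Terms that always appear together get a distance of 0, and the distance grows as co-occurrence becomes rarer relative to the individual hit counts.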




• Once the product category is determined, the product-specific features and feature attributes are extracted using:
  • WordNet for English
  • EuroWordNet for Spanish
• ConceptNet is then used to determine the specific product features.





• Specialised tools for anaphora resolution:
  • JavaRAP for English
  • SUPAR (Slot Unification Parser for Anaphora Resolution) for Spanish
• A Named Entity Recognizer spots the names of products, brands, and shops.
• LingPipe is used to split the text into sentences and to identify the named entities being referred to.



• Sentence parsing tools:
  • Minipar (English)
  • FreeLing (Spanish)
• To assign polarity to each identified attribute of the product, the following are used in sequence:
  • Sequential Minimal Optimization (SMO) Support Vector Machine (SVM)
  • Normalized Google Distance (NGD)



• SVM and NGD scores rely on a set of anchors that must be established beforehand, and the choice of anchors remains largely subjective.
• The informal language customers use when writing reviews sometimes makes it impossible to identify words and dependencies in phrases.



• Currently, consumer comments can be reviewed in two languages; the system can be extended to include other languages.
• It can also be extended to extract information from images and photos posted by other users.
• It can also be used for suggestive selling: the user provides their criteria for buying the product, along with how important each factor is to them, and the system gives suggestions accordingly.




• Alexandra Balahur and Andrés Montoyo (DLSI, Univ. Alicante, Alicante, Spain), "A Feature Dependent Method for Opinion Mining and Classification"
• http://en.wikipedia.org/wiki/Sequential_minimal_optimization
• http://en.wikipedia.org/wiki/Normalized_Google_distance
• http://research.microsoft.com/en-us/groups/nlp/
• http://en.wikipedia.org/wiki/Natural_language_processing
• http://wordnet.princeton.edu/
• http://conceptnet5.media.mit.edu/
• http://web.media.mit.edu/~hugo/publications/papers/BTTJ-ConceptNet.pdf
• http://www.acronymfinder.com/Slot-Unification-Parser-for-Anaphora-Resolution(computer-science)-(SUPAR).html
• http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.8911&rep=rep1&type=pdf
