SlideShare a Scribd company logo
1 of 11
Download to read offline
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
DOI: 10.5121/ijsc.2014.5102 11
METHODOLOGICAL STUDY OF OPINION MINING
AND SENTIMENT ANALYSIS TECHNIQUES
Pravesh Kumar Singh1
, Mohd Shahid Husain2
1
M.Tech, Department of Computer Science and Engineering, Integral University,
Lucknow, India
2
Assistant Professor, Department of Computer Science and Engineering, Integral
University, Lucknow, India
ABSTRACT
Decision making both on individual and organizational level is always accompanied by the search of
other’s opinion on the same. With tremendous establishment of opinion rich resources like, reviews, forum
discussions, blogs, micro-blogs, Twitter etc provide a rich anthology of sentiments. This user generated
content can serve as a benefaction to market if the semantic orientations are deliberated. Opinion mining
and sentiment analysis are the formalization for studying and construing opinions and sentiments. The
digital ecosystem has itself paved way for use of huge volume of opinionated data recorded. This paper is
an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
KEYWORDS
Opinion Mining, Sentiment Analysis, Feature Extraction Techniques, Naïve Bayes Classifiers, Clustering,
Support Vector Machines
1. INTRODUCTION
Generally individuals and companies are always interested in other’s opinion like if someone
wants to purchase a new product, then firstly, he/she tries to know the reviews i.e., what other
people think about the product and based on those reviews, he/she takes the decision.
Similarly, companies also excavate deep for consumer reviews. Digital ecosystem has a plethora
for same in the form of blogs, reviews etc.
A very basic step of opinion mining and sentiment analysis is feature extraction. Figure 1 shows
the process of opinion mining and sentiment analysis
.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
12
There are various methods used for opinion mining and sentiment analysis among which
following are the important ones:
1) Naïve Bays Classifier.
2) Support Vector Machine (SVM).
3) Multilayer Perceptron.
4) Clustering.
In this paper, categorization of work done for feature extraction and classification in opinion
mining and sentiment analysis is done. In addition to this, performance analysis, advantages and
disadvantages of different techniques are appraised.
2. DATA SETS
This section provides brief details of datasets used in experiments.
2.1. Product Review Dataset
Blitzer takes the review of products from amazon.com which belong to a total of 25 categories
like videos, toys etc. He randomly selected 4000 +ve and 4000 –ve reviews.
2.2. Movie Review Dataset
The movie review dataset is taken from the Pang and Lee (2004) works. It contains movie review
with feature of 1000 +ve and 1000 –ve processed movie reviews.
3. CLASSIFICATION TECHNIQUES
3.1. Naïve Bayes Classifier
It’s a probabilistic and supervised classifier given by Thomas Bayes. According to this theorem,
if there are two events say, e1 and e2 then the conditional probability of occurrence of event e1
when e2 has already occurred is given by the following mathematical formula:
2
1
1
2
2
1
)
(
)
|
(
)
|
(
e
e
P
e
e
P
e
e
P =
This algorithm is implemented to calculate the probability of a data to be positive or negative. So,
conditional probability of a sentiment is given as:
)
Sentence
(
P
)
Sentiment
|
Sentence
(
P
)
Sentiment
(
P
)
Sentence
|
Sentiment
(
P =
And conditional probability of a word is given as:
Word
of
nos
Total
+
class
a
to
belonging
words
of
Number
1
+
class
in
occurence
word
of
Number
=
)
Sentiment
|
Word
(
P
Algorithm
S1: Initialize P(positive) ← num − popozitii (positive)/ num_total_propozitii
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
13
S2: Initialize P(negative) ← num − popozitii (negative) / num_total_propozitii
S3: Convert sentences into words
for each class of {positive, negative}:
for each word in {phrase}
P(word | class) < num_apartii (word | class) 1 | num_cuv (class) +
num_total_cuvinte
P (class) ←P (class) * P (word | class)
Returns max {P(pos), P(neg)}
The above algorithm can be represented using figure 2
3.1.1. Evaluation of Algorithm
To evaluate the algorithm following measures are used:
 Accuracy
 Precision
 Recall
 Relevance
Following contingency table is used to calculate the various measures.
Relevant Irrelevant
Detected Opinions True Positive (tp) False Positive (fp)
Undetected Opinions False Negative (fn) True Negative (tn)
Now, Precision =
fp
+
tp
tp
Accuracy = F
,
fn
+
fp
+
tn
+
tp
tn
+
tp
=
call
Re
ecision
Pr
call
Re
*
ecision
Pr
*
2
+
; Recall =
fn
+
tp
tp
Classifier
Classifier
Training Set
+ve Sentence
−ve Sentence
Classifier
Sentence
Review
Book Review
Figure 2. Algorithm of Naïve Bayes
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
14
3.1.2. Accuracy
On the 5000 sentences [1] Ion SMEUREANU, Cristian BUCUR train the Naïve Gauss Algorithm
and got 0.79939209726444 accuracy; Where number of groups (n) is 2.
3.1.3. Advantages of Naïve Bayes Classification Method
1. Model is easy to interpret.
2. Efficient computation.
3.1.4. Disadvantage of Naïve Bayes Classification Method
Assumptions of attributes being independent, which may not be necessarily valid.
3.2 Support Vector Machine (SVM)
SVM is a supervised learning model. This model is associated with a learning algorithm that
analyzes the data and identifies the pattern for classification.
The concept of SVM algorithm is based on decision plane that defines decision boundaries. A
decision plane separates group of instances having different class memberships.
For example, consider an instance which belongs to either class Circle or Diamond. There is a
separating line (figure 3) which defines a boundary. At the right side of boundary all instances are
Circle and at the left side all instances are Diamond.
Is there is an exercise/training data set D, a set of n points is written as:
( ) { }
{ } )
1
.......(
1
,
1
c
,
R
x
c
,
x
D
n
1
i
i
p
i
i
i
−
¬
ε
ε
=
Where, xi is a p-dimensional real vector. Find the maximum-margin hyper plane i.e. splits the
points having ci = 1 from those having ci = -1. Any hyperplane can be written as the set of points
satisfying:
)
........(2
1
b
-
x
w =
•
Support
Vectors
Support
Vectors
Figure 3. Principle of SVM
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
15
Finding a maximum margin hyperplane, reduces to find the pair w and b, such that the distance
between the hyperplanes is maximal while still separating the data. These hyperplanes are
described by:
1
b=
−
• x
w and 1
b
x
w −
=
−
•
The distance between two hyperplanes is
w
b
and therefore w needs to be minimized. The
minimized w in w, b subject to ci (w.xi − b) ≥ 1 for any i = 1… n.
Using Lagrange’s multipliers (αi) this optimization problem can be expressed as:
( )
3
.....
}
]
1
-
)
b
-
x
.
w
(
c
[
-
w
2
1
{
b
,
w
max
min
∑
n
1
i
i
i
i
2




















α
α =
3.2.1. Extensions of SVM
There are some extensions which makes SVM more robust and adaptable to real world problem.
These extensions include the following:
1. Soft Margin Classification
In text classification sometimes data are linearly divisible, for very high dimensional problems
and for multi-dimensional problems data are also separable linearly. Generally (in maximum
cases) the opinion mining solution is one that classifies most of the data and ignores outliers and
noisy data. If a training set data say D cannot be separated clearly then the solution is to have fat
decision classifiers and make some mistake.
Mathematically, a slack variable ξi are introduced that are not equal to zero which allows xi to not
meet the margin requirements with a cost i.e., proportional to ξ.
2. Non-linear Classification
Non-linear classifiers are given by the Bemhard Boser, Isabelle Guyon and Vapnik in 1992 using
kernel to max margin hyperplanes.
Aizeman given a kernel trick i.e., every dot product is replaced by non-linear kernel function.
When this case is apply then the effectiveness of SVM lies in the selection of kernel and soft
margin parameters.
3. Multiclass SVM
Basically SVM relevant for two class tasks but for the multiclass problems there is multiclass
SVM is available. In the multi class case labels are designed to objects which are drawn from a
finite set of numerous elements. These binary classifiers might be built using two classifiers:
1. Distinguishing one versus all labels and
2. Among each pair of classes one versus one.
3.2.2. Accuracy
When pang take unigrams learning method then it gives the best output in a presence based
frequency model run by SVM and he calculated 82.9% accuracy in the process.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
16
3.2.3. Advantages of Support Vector Machine Method
1. Very good performance on experimental results.
2. Low dependency on data set dimensionality.
3.2.4. Disadvantages of Support Vector Machine Method
1. One disadvantages of SVM is i.e. in case of categorical or missing value it needs pre-processed.
2. Difficult interpretation of resulting model.
3.3. Multi-Layer Perceptron (MLP)
Multi-Layer perceptron is a feed forward neural network, with one or N layers among inputs and
output. Feed forward means i.e, uni-direction flow of data such as from input layer to output
layer. This ANN which multilayer perceptron begin with input layer where every node means a
predicator variable. Input nodes or neurons are connected with every neuron in next layer (named
as hidden layers). The hidden layer neurons are connected to other hidden layer neuron.
Output layer is made up as follows:
1. When prediction is binary output layer made up of one neuron and
2. When prediction is non-binary then output layer made up of N neuron.
This arrangement makes an efficient flow of information from input layer to output layer.
Figure 4 shows the structure of MLP. In figure 4 there is input layer and an output layer like
single layer perceptron but there is also a hidden layer work in this algorithm.
MLP is a back propagation algorithm and has two phases:
Phase I: It is the forward phase where activation are propagated from the input layer to output
layer.
Phase II: In this phase to change the weight and bias value errors among practical & real values
and the requested nominal value in the output layer is propagate in the backward direction.
MLP is popular technique due to the fact i.e. it can act as universal function approximator. MLP
is a general, flexible and non-linear tool because a “back propagation” network has minimum one
hidden layer with various non-linear entities that can learn every function or relationship between
group of input and output variable (whether variables are discrete or continuous).
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
17
An advantage of MLP, compare to classical modeling method is that it does not enforce any sort
of constraint with respect to the initial data neither does it generally start from specific
assumptions.
Another benefit of the method lies in its capability to evaluation good models even despite the
presence of noise in the analyzed information, as arises when there is an existence of omitted and
outlier values in the spreading of the variables. Hence, it is a robust method when dealing with
problems of noise in the given information.
3.3.1. Accuracy
On the health care data Ludmila I. Kuncheva, (IEEE Member) calculate accuracy of MLP as
84.25%-89.50%.
3.3.2. Advantages of MLP
1. It acts as a universal function approximator.
2. MLP can learn each and every relationship among input and output variables.
3.3.3. Disadvantages of MLP
1. MLP needs more time for execution compare to other technique because flexibility lies in the
need to have enough training data.
2. It is considered as complex “black box”.
3.4 Clustering Classifier
Clustering is an unsupervised learning method and has no labels on any point. Clustering
technique recognizes the structure in data and group, based on how nearby they are to one
another.
So, clustering is process of organizing objects and instances in a class or group whose members
are similar in some way and members of class or cluster is not similar to those are in the other
cluster
This method is an unsupervised method, so one does not know that how many clusters or groups
are existing in the data.
Using this method one can organize the data set into different clusters based on the similarities
and distance among data points.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
18
Clustering organization is denoted as a set of subsets C = C1 . . . Ck of S, such that:
φ
=
∩
=
= j
i
k
1
i i C
C
and
C
S  for j
i ≠ . Therefore, any object in S related to exactly one and only one
subset.
For example, consider figure 5 where data set has three normal clusters.
Now consider the some real-life examples for illustrating clustering:
Example 1: Consider the people having similar size together to make small and large shirts.
1. Tailor-made for each person: expensive
2. One-size-fits-all: does not fit all.
Example 2: In advertising, segment consumers according to their similarities: To do targeted
advertising.
Example 3: To create a topic hierarchy, we can take a group of text and organize those texts
according to their content matches.
Basically there are two types of measures used to estimate the relation: Distance measures and
similarity measures.
Basically following are two kinds of measures used to guesstimate this relation:
1. Distance measures and
2. Similarity measures
Distance Measures
To get the similarity and difference between the group of objects distance measures uses the
various clustering methods.
It is convenient to represent the distance between two instances let say xi and xj as: d (xi, xj). A
valid distance measure should be symmetric and gains its minimum value (usually zero) in case
of identical vectors.
If distance measure follows the following properties then it is known as metric distance measure:
S
∈
x
,
x
∀
x
=
x
⇒
0
=
)
x
,
x
(
d
.
2
S
∈
x
,
x
,
x
∀
)
x
,
x
(
d
+
)
x
,
x
(
d
≤
)
x
,
x
(
d
inequality
Triangle
.
1
j
i
j
i
j
i
k
j
i
k
j
j
i
k
i
There are variations in distance measures depending upon the attribute in question.
3.4.1. Clustering Algorithms
A number of clustering algorithms are getting popular. The basic reason of a number of clustering
methods is that “cluster” is not accurately defined (Estivill-Castro, 2000). As a result many
clustering methods have been developed, using a different induction principle.
1. Exclusive Clustering
In this clustering algorithm, data are clusters in an exclusive way, so that a data fits to only one
certain cluster. Example of exclusive clustering is K-means clustering.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
19
2. Overlapping Clustering
This clustering algorithm uses fuzzy sets to grouped data, so each point may fit to two or more
groups or cluster with various degree of membership.
3. Hierarchical Clustering
Hierarchical clustering has two variations: agglomerative and divisive clustering
Agglomerative clustering is based on the union among the two nearest groups. The start state is
realized by setting every data as a group or cluster. After some iteration it gets the final clusters
needed. It is a bottom-up version.
Divisive clustering begins from one group or cluster containing all data items. At every step,
clusters are successively fragmented into smaller groups or clusters according to some difference.
It is a top-down version.
4. Probabilistic Clustering
It is a mix of Gaussian, and uses totally a probabilistic approach.
3.4.2. Evaluation Criteria Measures for Clustering Technique
Basically, it is divided into two group’s internal quality criteria and external quality criteria.
1. Internal Quality Criteria
Using similarity measure it measures the compactness if clusters. It generally takes into
consideration intra-cluster homogeneity, the inter-cluster separability or a combination of these
two. It doesn’t use any exterior information beside the data itself.
2. External Quality Criteria
External quality criteria are important for observing the structure of the cluster match to some
previously defined classification of the instance or objects.
3.4.3. Accuracy
Depending on the data accuracy of the clustering techniques varied from 65.33% to 99.57%.
3.4.4. Advantages of Clustering Method
The most important benefit of this technique is that it offers the classes or groups that fulfill
(approximately) an optimality measure.
3.4.5. Disadvantages of Clustering Method
1. There is no learning set of labeled observations.
2. Number of groups is usually unknown.
3. Implicitly, users already choose the appropriate features and distance measure.
4. CONCLUSION
The important part of gathering information always seems as, what the people think. The rising
accessibility of opinion rich resources such as online analysis websites and blogs means that, one
can simply search and recognize the opinions of others. One can precise his/her ideas and
opinions concerning goods and facilities. These views and thoughts are subjective figures which
signify opinions, sentiments, emotional state or evaluation of someone.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
20
In this paper, different methods for data (feature or text) extraction are presented. Every method
has some benefits and limitations and one can use these methods according to the situation for
feature and text extraction. Based on the survey we can find the accuracy of different methods in
different data set using N-gram feature shown in table 1.
Table 1: Accuracy of Different Methods
Movie Reviews Product Reviews
N-gram
Feature
NB MLP SVM NB MLP SVM
75.50 81.05 81.15 62.50 79.27 79.40
According to the survey, accuracy of SVM is better than other three methods when N-gram
feature was used.
The four methods discussed in the paper are actually applicable in different areas like clustering is
applied in movie reviews and SVM techniques is applied in biological reviews & analysis.
Although the field of opinion mining is new, but still diverse methods available to provide a way
to implement these methods in various programming languages like PHP, Python etc. with an
outcome of innumerable applications. From a convergent point of view Naïve Bayes is best
suitable for textual classification, clustering for consumer services and SVM for biological
reading and interpretation.
ACKNOWLEDGEMENTS
Every good writing requires the help and support of many people for it to be truly good. I would
take the opportunity of thanking all those who extended a helping hand whenever I needed one.
I offer my heartfelt gratitude to Mr. Mohd. Shahid Husain, who encouraged, guided and helped
me a lot in the project. I extent my thanks to Miss. Ratna Singh (fiancee) for her incandescent
help to complete this paper.
A vote of thanks to my family for their moral and emotional support. Above all utmost thanks to
the Almighty God for the divine intervention in this academic endeavor.
REFERENCES
[1] Ion SMEUREANU, Cristian BUCUR, Applying Supervised Opinion Mining Techniques on Online
User Reviews, Informatica Economică vol. 16, no. 2/2012.
[2] Bo Pang and Lillian Lee, “Opinion Mining and Sentiment Analysis”, Foundations and TrendsR_ in
Information Retrieval Vol. 2, Nos. 1–2 (2008).
[3] Abbasi, “Affect intensity analysis of dark web forums,” in Proceedings of Intelligence and Security
Informatics (ISI), pp. 282–288, 2007.
[4] K. Dave, S. Lawrence & D. Pennock. Mining the Peanut Gallery: Opinion Extraction and Semantic
Classi_cation of Product Reviews." Proceedings of the 12th International Conference on World Wide
Web, pp. 519-528, 2003.
[5] B. Liu. Web Data Mining: Exploring hyperlinks, contents, and usage data," Opinion Mining.
Springer, 2007.
[6] B. Pang & L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with
respect to rating scales." Proceedings of the Association for Computational Linguistics (ACL), pp.
15124,2005.
[7] Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, Survey of Techniques for Opinion Mining,
International Journal of Computer Applications (0975 – 8887) Volume 57– No.13, November 2012.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
21
[8] Nidhi Mishra and C K Jha, Classification of Opinion Mining Techniques, International Journal of
Computer Applications 56 (13):1-6, October 2012, Published by Foundation of Computer Science,
New York, USA.
[9] Oded Z. Maimon, Lior Rokach, “Data Mining and Knowledge Discovery Handbook” Springer, 2005.
[10] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Sentiment classification using machine
learning techniques.” In Proceedings of the 2002 Conference on Empirical Methods in Natural
Language Processing (EMNLP), pages 79–86.
[11] Towards Enhanced Opinion Classification using NLP Techniques, IJCNLP 2011, pages 101–107,
Chiang Mai, Thailand, November 13, 2011
Author
Pravesh Kumar Singh is a fine blend of strong scientific orientation and editing.
He is a Computer Science (Bachelor in Technology) graduate from a renowned
gurukul in India called Dr. Ram Manohar Lohia Awadh University with
excellence not only in academics but also had flagship in choreography. He
mastered in Computer Science and Engineering from Integral University,
Lucknow, India. Currently he is acting as Head MCA (Master in Computer
Applications) department in Thakur Publications and also working in the capacity
of Senior Editor.

More Related Content

What's hot

Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...ijaia
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavAgile Testing Alliance
 
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...ijfls
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...hyunsung lee
 
Citython presentation
Citython presentationCitython presentation
Citython presentationAnkit Tewari
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Ahmed Gad
 
Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)hyunsung lee
 
Ijarcet vol-2-issue-7-2363-2368
Ijarcet vol-2-issue-7-2363-2368Ijarcet vol-2-issue-7-2363-2368
Ijarcet vol-2-issue-7-2363-2368Editor IJARCET
 
Classifiers
ClassifiersClassifiers
ClassifiersAyurdata
 
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET Journal
 
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...ijfls
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
 

What's hot (18)

Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
 
Text categorization
Text categorizationText categorization
Text categorization
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
 
Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)
 
Ijarcet vol-2-issue-7-2363-2368
Ijarcet vol-2-issue-7-2363-2368Ijarcet vol-2-issue-7-2363-2368
Ijarcet vol-2-issue-7-2363-2368
 
Classifiers
ClassifiersClassifiers
Classifiers
 
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
 
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
 

Similar to Methodological Study Of Opinion Mining And Sentiment Analysis Techniques

Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...cscpconf
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...gerogepatton
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptAnshika865276
 
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...ijfls
 
A Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdfA Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdfLori Head
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXmlaij
 
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET Journal
 
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTSSPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTScscpconf
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text IJERA Editor
 
IRJET - Cognitive based Emotion Analysis of a Child Reading a Book
IRJET -  	  Cognitive based Emotion Analysis of a Child Reading a BookIRJET -  	  Cognitive based Emotion Analysis of a Child Reading a Book
IRJET - Cognitive based Emotion Analysis of a Child Reading a BookIRJET Journal
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorialbutest
 
Selection of optimal hyper-parameter values of support vector machine for sen...
Selection of optimal hyper-parameter values of support vector machine for sen...Selection of optimal hyper-parameter values of support vector machine for sen...
Selection of optimal hyper-parameter values of support vector machine for sen...journalBEEI
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networksbutest
 
A Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information ExtractionA Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information ExtractionTarekMourad8
 

Similar to Methodological Study Of Opinion Mining And Sentiment Analysis Techniques (20)

Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
61_Empirical
61_Empirical61_Empirical
61_Empirical
 
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.ppt
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
 
A Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdfA Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdf
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOX
 
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
 
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTSSPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text
 
5th_sem_presentationtoday.pdf
5th_sem_presentationtoday.pdf5th_sem_presentationtoday.pdf
5th_sem_presentationtoday.pdf
 
IRJET - Cognitive based Emotion Analysis of a Child Reading a Book
IRJET -  	  Cognitive based Emotion Analysis of a Child Reading a BookIRJET -  	  Cognitive based Emotion Analysis of a Child Reading a Book
IRJET - Cognitive based Emotion Analysis of a Child Reading a Book
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorial
 
Ankit presentation
Ankit presentationAnkit presentation
Ankit presentation
 
Selection of optimal hyper-parameter values of support vector machine for sen...
Selection of optimal hyper-parameter values of support vector machine for sen...Selection of optimal hyper-parameter values of support vector machine for sen...
Selection of optimal hyper-parameter values of support vector machine for sen...
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
 
A Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information ExtractionA Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information Extraction
 

Recently uploaded

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 

Methodological Study Of Opinion Mining And Sentiment Analysis Techniques

  • 1. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 DOI: 10.5121/ijsc.2014.5102 11 METHODOLOGICAL STUDY OF OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUES Pravesh Kumar Singh1 , Mohd Shahid Husain2 1 M.Tech, Department of Computer Science and Engineering, Integral University, Lucknow, India 2 Assistant Professor, Department of Computer Science and Engineering, Integral University, Lucknow, India ABSTRACT Decision making both on individual and organizational level is always accompanied by the search of other’s opinion on the same. With tremendous establishment of opinion rich resources like, reviews, forum discussions, blogs, micro-blogs, Twitter etc provide a rich anthology of sentiments. This user generated content can serve as a benefaction to market if the semantic orientations are deliberated. Opinion mining and sentiment analysis are the formalization for studying and construing opinions and sentiments. The digital ecosystem has itself paved way for use of huge volume of opinionated data recorded. This paper is an attempt to review and evaluate the various techniques used for opinion and sentiment analysis. KEYWORDS Opinion Mining, Sentiment Analysis, Feature Extraction Techniques, Naïve Bayes Classifiers, Clustering, Support Vector Machines 1. INTRODUCTION Generally individuals and companies are always interested in other’s opinion like if someone wants to purchase a new product, then firstly, he/she tries to know the reviews i.e., what other people think about the product and based on those reviews, he/she takes the decision. Similarly, companies also excavate deep for consumer reviews. Digital ecosystem has a plethora for same in the form of blogs, reviews etc. A very basic step of opinion mining and sentiment analysis is feature extraction. Figure 1 shows the process of opinion mining and sentiment analysis .
  • 2. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 12 There are various methods used for opinion mining and sentiment analysis among which following are the important ones: 1) Naïve Bays Classifier. 2) Support Vector Machine (SVM). 3) Multilayer Perceptron. 4) Clustering. In this paper, categorization of work done for feature extraction and classification in opinion mining and sentiment analysis is done. In addition to this, performance analysis, advantages and disadvantages of different techniques are appraised. 2. DATA SETS This section provides brief details of datasets used in experiments. 2.1. Product Review Dataset Blitzer takes the review of products from amazon.com which belong to a total of 25 categories like videos, toys etc. He randomly selected 4000 +ve and 4000 –ve reviews. 2.2. Movie Review Dataset The movie review dataset is taken from the Pang and Lee (2004) works. It contains movie review with feature of 1000 +ve and 1000 –ve processed movie reviews. 3. CLASSIFICATION TECHNIQUES 3.1. Naïve Bayes Classifier It’s a probabilistic and supervised classifier given by Thomas Bayes. According to this theorem, if there are two events say, e1 and e2 then the conditional probability of occurrence of event e1 when e2 has already occurred is given by the following mathematical formula: 2 1 1 2 2 1 ) ( ) | ( ) | ( e e P e e P e e P = This algorithm is implemented to calculate the probability of a data to be positive or negative. So, conditional probability of a sentiment is given as: ) Sentence ( P ) Sentiment | Sentence ( P ) Sentiment ( P ) Sentence | Sentiment ( P = And conditional probability of a word is given as: Word of nos Total + class a to belonging words of Number 1 + class in occurence word of Number = ) Sentiment | Word ( P Algorithm S1: Initialize P(positive) ← num − popozitii (positive)/ num_total_propozitii
  • 3. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 13 S2: Initialize P(negative) ← num − popozitii (negative) / num_total_propozitii S3: Convert sentences into words for each class of {positive, negative}: for each word in {phrase} P(word | class) < num_apartii (word | class) 1 | num_cuv (class) + num_total_cuvinte P (class) ←P (class) * P (word | class) Returns max {P(pos), P(neg)} The above algorithm can be represented using figure 2 3.1.1. Evaluation of Algorithm To evaluate the algorithm following measures are used:  Accuracy  Precision  Recall  Relevance Following contingency table is used to calculate the various measures. Relevant Irrelevant Detected Opinions True Positive (tp) False Positive (fp) Undetected Opinions False Negative (fn) True Negative (tn) Now, Precision = fp + tp tp Accuracy = F , fn + fp + tn + tp tn + tp = call Re ecision Pr call Re * ecision Pr * 2 + ; Recall = fn + tp tp Classifier Classifier Training Set +ve Sentence −ve Sentence Classifier Sentence Review Book Review Figure 2. Algorithm of Naïve Bayes
  • 4. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 14 3.1.2. Accuracy On the 5000 sentences [1] Ion SMEUREANU, Cristian BUCUR train the Naïve Gauss Algorithm and got 0.79939209726444 accuracy; Where number of groups (n) is 2. 3.1.3. Advantages of Naïve Bayes Classification Method 1. Model is easy to interpret. 2. Efficient computation. 3.1.4. Disadvantage of Naïve Bayes Classification Method Assumptions of attributes being independent, which may not be necessarily valid. 3.2 Support Vector Machine (SVM) SVM is a supervised learning model. This model is associated with a learning algorithm that analyzes the data and identifies the pattern for classification. The concept of SVM algorithm is based on decision plane that defines decision boundaries. A decision plane separates group of instances having different class memberships. For example, consider an instance which belongs to either class Circle or Diamond. There is a separating line (figure 3) which defines a boundary. At the right side of boundary all instances are Circle and at the left side all instances are Diamond. Is there is an exercise/training data set D, a set of n points is written as: ( ) { } { } ) 1 .......( 1 , 1 c , R x c , x D n 1 i i p i i i − ¬ ε ε = Where, xi is a p-dimensional real vector. Find the maximum-margin hyper plane i.e. splits the points having ci = 1 from those having ci = -1. Any hyperplane can be written as the set of points satisfying: ) ........(2 1 b - x w = • Support Vectors Support Vectors Figure 3. Principle of SVM
  • 5. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 15 Finding a maximum margin hyperplane, reduces to find the pair w and b, such that the distance between the hyperplanes is maximal while still separating the data. These hyperplanes are described by: 1 b= − • x w and 1 b x w − = − • The distance between two hyperplanes is w b and therefore w needs to be minimized. The minimized w in w, b subject to ci (w.xi − b) ≥ 1 for any i = 1… n. Using Lagrange’s multipliers (αi) this optimization problem can be expressed as: ( ) 3 ..... } ] 1 - ) b - x . w ( c [ - w 2 1 { b , w max min ∑ n 1 i i i i 2                     α α = 3.2.1. Extensions of SVM There are some extensions which makes SVM more robust and adaptable to real world problem. These extensions include the following: 1. Soft Margin Classification In text classification sometimes data are linearly divisible, for very high dimensional problems and for multi-dimensional problems data are also separable linearly. Generally (in maximum cases) the opinion mining solution is one that classifies most of the data and ignores outliers and noisy data. If a training set data say D cannot be separated clearly then the solution is to have fat decision classifiers and make some mistake. Mathematically, a slack variable ξi are introduced that are not equal to zero which allows xi to not meet the margin requirements with a cost i.e., proportional to ξ. 2. Non-linear Classification Non-linear classifiers are given by the Bemhard Boser, Isabelle Guyon and Vapnik in 1992 using kernel to max margin hyperplanes. Aizeman given a kernel trick i.e., every dot product is replaced by non-linear kernel function. When this case is apply then the effectiveness of SVM lies in the selection of kernel and soft margin parameters. 3. Multiclass SVM Basically SVM relevant for two class tasks but for the multiclass problems there is multiclass SVM is available. In the multi class case labels are designed to objects which are drawn from a finite set of numerous elements. These binary classifiers might be built using two classifiers: 1. Distinguishing one versus all labels and 2. Among each pair of classes one versus one. 3.2.2. Accuracy When pang take unigrams learning method then it gives the best output in a presence based frequency model run by SVM and he calculated 82.9% accuracy in the process.
  • 6. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 16 3.2.3. Advantages of Support Vector Machine Method 1. Very good performance on experimental results. 2. Low dependency on data set dimensionality. 3.2.4. Disadvantages of Support Vector Machine Method 1. One disadvantages of SVM is i.e. in case of categorical or missing value it needs pre-processed. 2. Difficult interpretation of resulting model. 3.3. Multi-Layer Perceptron (MLP) Multi-Layer perceptron is a feed forward neural network, with one or N layers among inputs and output. Feed forward means i.e, uni-direction flow of data such as from input layer to output layer. This ANN which multilayer perceptron begin with input layer where every node means a predicator variable. Input nodes or neurons are connected with every neuron in next layer (named as hidden layers). The hidden layer neurons are connected to other hidden layer neuron. Output layer is made up as follows: 1. When prediction is binary output layer made up of one neuron and 2. When prediction is non-binary then output layer made up of N neuron. This arrangement makes an efficient flow of information from input layer to output layer. Figure 4 shows the structure of MLP. In figure 4 there is input layer and an output layer like single layer perceptron but there is also a hidden layer work in this algorithm. MLP is a back propagation algorithm and has two phases: Phase I: It is the forward phase where activation are propagated from the input layer to output layer. Phase II: In this phase to change the weight and bias value errors among practical & real values and the requested nominal value in the output layer is propagate in the backward direction. MLP is popular technique due to the fact i.e. it can act as universal function approximator. MLP is a general, flexible and non-linear tool because a “back propagation” network has minimum one hidden layer with various non-linear entities that can learn every function or relationship between group of input and output variable (whether variables are discrete or continuous).
  • 7. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 17 An advantage of MLP, compare to classical modeling method is that it does not enforce any sort of constraint with respect to the initial data neither does it generally start from specific assumptions. Another benefit of the method lies in its capability to evaluation good models even despite the presence of noise in the analyzed information, as arises when there is an existence of omitted and outlier values in the spreading of the variables. Hence, it is a robust method when dealing with problems of noise in the given information. 3.3.1. Accuracy On the health care data Ludmila I. Kuncheva, (IEEE Member) calculate accuracy of MLP as 84.25%-89.50%. 3.3.2. Advantages of MLP 1. It acts as a universal function approximator. 2. MLP can learn each and every relationship among input and output variables. 3.3.3. Disadvantages of MLP 1. MLP needs more time for execution compare to other technique because flexibility lies in the need to have enough training data. 2. It is considered as complex “black box”. 3.4 Clustering Classifier Clustering is an unsupervised learning method and has no labels on any point. Clustering technique recognizes the structure in data and group, based on how nearby they are to one another. So, clustering is process of organizing objects and instances in a class or group whose members are similar in some way and members of class or cluster is not similar to those are in the other cluster This method is an unsupervised method, so one does not know that how many clusters or groups are existing in the data. Using this method one can organize the data set into different clusters based on the similarities and distance among data points.
  • 8. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 18 Clustering organization is denoted as a set of subsets C = C1 . . . Ck of S, such that: φ = ∩ = = j i k 1 i i C C and C S  for j i ≠ . Therefore, any object in S related to exactly one and only one subset. For example, consider figure 5 where data set has three normal clusters. Now consider the some real-life examples for illustrating clustering: Example 1: Consider the people having similar size together to make small and large shirts. 1. Tailor-made for each person: expensive 2. One-size-fits-all: does not fit all. Example 2: In advertising, segment consumers according to their similarities: To do targeted advertising. Example 3: To create a topic hierarchy, we can take a group of text and organize those texts according to their content matches. Basically there are two types of measures used to estimate the relation: Distance measures and similarity measures. Basically following are two kinds of measures used to guesstimate this relation: 1. Distance measures and 2. Similarity measures Distance Measures To get the similarity and difference between the group of objects distance measures uses the various clustering methods. It is convenient to represent the distance between two instances let say xi and xj as: d (xi, xj). A valid distance measure should be symmetric and gains its minimum value (usually zero) in case of identical vectors. If distance measure follows the following properties then it is known as metric distance measure: S ∈ x , x ∀ x = x ⇒ 0 = ) x , x ( d . 2 S ∈ x , x , x ∀ ) x , x ( d + ) x , x ( d ≤ ) x , x ( d inequality Triangle . 1 j i j i j i k j i k j j i k i There are variations in distance measures depending upon the attribute in question. 3.4.1. Clustering Algorithms A number of clustering algorithms are getting popular. The basic reason of a number of clustering methods is that “cluster” is not accurately defined (Estivill-Castro, 2000). As a result many clustering methods have been developed, using a different induction principle. 1. Exclusive Clustering In this clustering algorithm, data are clusters in an exclusive way, so that a data fits to only one certain cluster. Example of exclusive clustering is K-means clustering.
  • 9. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 19 2. Overlapping Clustering This clustering algorithm uses fuzzy sets to grouped data, so each point may fit to two or more groups or cluster with various degree of membership. 3. Hierarchical Clustering Hierarchical clustering has two variations: agglomerative and divisive clustering Agglomerative clustering is based on the union among the two nearest groups. The start state is realized by setting every data as a group or cluster. After some iteration it gets the final clusters needed. It is a bottom-up version. Divisive clustering begins from one group or cluster containing all data items. At every step, clusters are successively fragmented into smaller groups or clusters according to some difference. It is a top-down version. 4. Probabilistic Clustering It is a mix of Gaussian, and uses totally a probabilistic approach. 3.4.2. Evaluation Criteria Measures for Clustering Technique Basically, it is divided into two group’s internal quality criteria and external quality criteria. 1. Internal Quality Criteria Using similarity measure it measures the compactness if clusters. It generally takes into consideration intra-cluster homogeneity, the inter-cluster separability or a combination of these two. It doesn’t use any exterior information beside the data itself. 2. External Quality Criteria External quality criteria are important for observing the structure of the cluster match to some previously defined classification of the instance or objects. 3.4.3. Accuracy Depending on the data accuracy of the clustering techniques varied from 65.33% to 99.57%. 3.4.4. Advantages of Clustering Method The most important benefit of this technique is that it offers the classes or groups that fulfill (approximately) an optimality measure. 3.4.5. Disadvantages of Clustering Method 1. There is no learning set of labeled observations. 2. Number of groups is usually unknown. 3. Implicitly, users already choose the appropriate features and distance measure. 4. CONCLUSION The important part of gathering information always seems as, what the people think. The rising accessibility of opinion rich resources such as online analysis websites and blogs means that, one can simply search and recognize the opinions of others. One can precise his/her ideas and opinions concerning goods and facilities. These views and thoughts are subjective figures which signify opinions, sentiments, emotional state or evaluation of someone.
  • 10. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 20 In this paper, different methods for data (feature or text) extraction are presented. Every method has some benefits and limitations and one can use these methods according to the situation for feature and text extraction. Based on the survey we can find the accuracy of different methods in different data set using N-gram feature shown in table 1. Table 1: Accuracy of Different Methods Movie Reviews Product Reviews N-gram Feature NB MLP SVM NB MLP SVM 75.50 81.05 81.15 62.50 79.27 79.40 According to the survey, accuracy of SVM is better than other three methods when N-gram feature was used. The four methods discussed in the paper are actually applicable in different areas like clustering is applied in movie reviews and SVM techniques is applied in biological reviews & analysis. Although the field of opinion mining is new, but still diverse methods available to provide a way to implement these methods in various programming languages like PHP, Python etc. with an outcome of innumerable applications. From a convergent point of view Naïve Bayes is best suitable for textual classification, clustering for consumer services and SVM for biological reading and interpretation. ACKNOWLEDGEMENTS Every good writing requires the help and support of many people for it to be truly good. I would take the opportunity of thanking all those who extended a helping hand whenever I needed one. I offer my heartfelt gratitude to Mr. Mohd. Shahid Husain, who encouraged, guided and helped me a lot in the project. I extent my thanks to Miss. Ratna Singh (fiancee) for her incandescent help to complete this paper. A vote of thanks to my family for their moral and emotional support. Above all utmost thanks to the Almighty God for the divine intervention in this academic endeavor. REFERENCES [1] Ion SMEUREANU, Cristian BUCUR, Applying Supervised Opinion Mining Techniques on Online User Reviews, Informatica Economică vol. 16, no. 2/2012. [2] Bo Pang and Lillian Lee, “Opinion Mining and Sentiment Analysis”, Foundations and TrendsR_ in Information Retrieval Vol. 2, Nos. 1–2 (2008). [3] Abbasi, “Affect intensity analysis of dark web forums,” in Proceedings of Intelligence and Security Informatics (ISI), pp. 282–288, 2007. [4] K. Dave, S. Lawrence & D. Pennock. Mining the Peanut Gallery: Opinion Extraction and Semantic Classi_cation of Product Reviews." Proceedings of the 12th International Conference on World Wide Web, pp. 519-528, 2003. [5] B. Liu. Web Data Mining: Exploring hyperlinks, contents, and usage data," Opinion Mining. Springer, 2007. [6] B. Pang & L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales." Proceedings of the Association for Computational Linguistics (ACL), pp. 15124,2005. [7] Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, Survey of Techniques for Opinion Mining, International Journal of Computer Applications (0975 – 8887) Volume 57– No.13, November 2012.
  • 11. International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014 21 [8] Nidhi Mishra and C K Jha, Classification of Opinion Mining Techniques, International Journal of Computer Applications 56 (13):1-6, October 2012, Published by Foundation of Computer Science, New York, USA. [9] Oded Z. Maimon, Lior Rokach, “Data Mining and Knowledge Discovery Handbook” Springer, 2005. [10] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Sentiment classification using machine learning techniques.” In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86. [11] Towards Enhanced Opinion Classification using NLP Techniques, IJCNLP 2011, pages 101–107, Chiang Mai, Thailand, November 13, 2011 Author Pravesh Kumar Singh is a fine blend of strong scientific orientation and editing. He is a Computer Science (Bachelor in Technology) graduate from a renowned gurukul in India called Dr. Ram Manohar Lohia Awadh University with excellence not only in academics but also had flagship in choreography. He mastered in Computer Science and Engineering from Integral University, Lucknow, India. Currently he is acting as Head MCA (Master in Computer Applications) department in Thakur Publications and also working in the capacity of Senior Editor.