Classification of Sentimental Reviews Using
Machine Learning Techniques
ICRTC-2015: 3rd International Conference On Recent Trends In
Computing
Presented At
SRM University
Delhi-NCR Campus, Ghaziabad (U.P.)
Ankit Agrawal
Department of Computer Science and Engineering,
National Institute of Technology Rourkela,
Rourkela - 769008, Odisha, India
March 9, 2015
Sentiment Analysis
Sentiment mainly refers to feelings, emotions, opinions, or attitudes (Argamon et al., 2009).
With the rapid growth of the World Wide Web, people often express their sentiments over the internet through social media, blogs, ratings, and reviews.
Business owners and advertising companies often employ sentiment analysis to discover new business strategies and advertising campaigns.
Machine Learning Techniques
Machine learning algorithms are used to classify and predict whether a document represents positive or negative sentiment. The different types of machine learning algorithms are:
Supervised algorithms use a labeled dataset, where each document of the training set is labeled with the appropriate sentiment.
Unsupervised algorithms work on unlabeled datasets (Singh et al., 2007).
This study is mainly concerned with supervised learning techniques on a labeled dataset.
Types of Sentiment Analysis
The different types of sentiment analysis are as follows:
Document Level: Document-level sentiment classification aims at classifying an entire document or review as either positive or negative.
Sentence Level: Sentence-level sentiment classification considers the polarity of each individual sentence of a document.
Aspect Level: Aspect-level sentiment classification first identifies the different aspects of a corpus and then, for each document, calculates the polarity with respect to the obtained aspects.
Document-level sentiment analysis is considered in this study.
Review of Related work
Author(s) — Sentiment Analysis Approach
(Pang et al., 2002): They considered sentiment classification as a categorization problem with positive and negative classes. They applied three different machine learning algorithms, i.e., Naive Bayes, SVM, and Maximum Entropy classification, over n-gram features.
(Turney, 2002): He presented an unsupervised algorithm to classify a review as either recommended (positive) or not recommended (negative). The author used a Part of Speech (POS) tagger to identify phrases containing adjectives or adverbs.
(Read, 2005): He used emoticons for labeling the dataset, because emoticons are independent of time, topic, and domain. He then applied machine learning classifiers to this labeled dataset.
Continued...
Author(s) — Sentiment Analysis Approach
(Dave et al., 2003): They used structured reviews for training and testing, identifying features and scoring methods to determine whether reviews are positive or negative. They then used the classifier to classify sentences obtained from web searches, using the product name as the search query.
(Whitelaw et al., 2005): They presented a sentiment classification technique based on the analysis and extraction of appraisal groups. An appraisal group represents a set of attribute values in task-independent semantic taxonomies.
(Li et al., 2011): They proposed various semi-supervised techniques to address the shortage of labeled data for sentiment classification. They used an under-sampling technique to deal with the class-imbalance problem in sentiment classification.
Types of Classification
Binary sentiment classification: Each document or review of the corpus is classified into one of two classes, either positive or negative.
Multi-class sentiment classification: Each review can be classified into more than two classes (strong positive, positive, neutral, negative, strong negative).
Generally, binary classification is useful when two products need to be compared. In this study, the implementation is done with respect to binary sentiment classification.
Preparation of Data
The unstructured textual review data need to be converted into meaningful data before machine learning algorithms can be applied.
The following methods have been used to transform the textual data into numerical vectors.
CountVectorizer: Based on the number of occurrences of each feature in the review, a sparse matrix is created (Garreta and Moncecchi, 2013).
Term Frequency - Inverse Document Frequency (TF-IDF): The TF-IDF score helps balance the weight between the most frequent or general words and less commonly used words. Term frequency counts the frequency of each token in the review, but this frequency is offset by the frequency of that token in the whole corpus (Garreta and Moncecchi, 2013). The TF-IDF value shows the importance of a token to a document in the corpus.
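A minimal sketch of both vectorization schemes using scikit-learn (the library documented by Garreta and Moncecchi, 2013); the two example reviews are invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented example reviews, only to show the shape of the output.
reviews = [
    "an awesome movie with a brilliant plot",
    "a dull movie with a predictable plot",
]

# CountVectorizer: sparse matrix of raw token counts per review.
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(reviews)
print(count_vec.get_feature_names_out())
print(X_counts.toarray())

# TfidfVectorizer: counts re-weighted by inverse document frequency,
# so tokens appearing in every review (e.g. "movie", "plot") carry less weight.
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(reviews)
print(X_tfidf.toarray().round(2))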
Machine Learning Algorithms Used
Naive Bayes Algorithm: It is a probabilistic classifier that applies Bayes' theorem under a strong independence assumption between the features (McCallum et al., 1998).
For a given textual review 'd' and a class 'c' (positive, negative), the conditional probability of the class given the review is P(c|d). According to Bayes' theorem, this quantity can be computed using equation 1:
P(c|d) = \frac{P(d|c) \cdot P(c)}{P(d)} \qquad (1)
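As a hedged illustration of how equation 1 is applied, the toy snippet below scores a review against each class using word-count likelihoods with add-one smoothing and picks the class with the highest log-posterior; the word counts and priors are invented for the example, and P(d) is dropped because it is identical for both classes:

import math
from collections import Counter

# Invented toy statistics, only to illustrate equation (1).
class_word_counts = {
    "positive": Counter({"awesome": 4, "great": 3, "plot": 2}),
    "negative": Counter({"boring": 4, "awful": 3, "plot": 2}),
}
class_priors = {"positive": 0.5, "negative": 0.5}
vocab = set().union(*class_word_counts.values())

def log_posterior(tokens, c):
    # log P(c) + sum over tokens of log P(w|c), with Laplace smoothing.
    total = sum(class_word_counts[c].values())
    score = math.log(class_priors[c])
    for w in tokens:
        score += math.log((class_word_counts[c][w] + 1) / (total + len(vocab)))
    return score

review = ["awesome", "plot"]
print(max(class_priors, key=lambda c: log_posterior(review, c)))  # -> positive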
Continued...
Support Vector Machine Algorithm: SVM is a non-probabilistic binary linear classifier (Turney, 2002). The SVM model represents each review, in vectorized form, as a data point in space. The method analyzes the complete vectorized data, and the key idea behind training the model is to find a separating hyperplane.
The set of textual data vectors is said to be optimally separated by the hyperplane only when it is separated without error and the distance between the closest points of each class and the hyperplane is maximum.
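For reference, the standard hard-margin formulation behind this description (a generic statement of the optimization, not necessarily the exact variant used in the study) is

\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n

where x_i are the vectorized reviews, y_i \in \{-1, +1\} are the sentiment labels, and maximizing the margin 2 / \lVert w \rVert around the hyperplane w \cdot x + b = 0 is equivalent to minimizing \lVert w \rVert^2 / 2.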
Confusion Matrix
A confusion matrix is generated to tabulate the performance of a classifier.

                       Correct Labels
                       Positive              Negative
Predicted Positive     True Positive (TP)    False Positive (FP)
Predicted Negative     False Negative (FN)   True Negative (TN)

Table: Confusion Matrix

TP (True Positive) is the number of positive reviews correctly predicted as positive, and FP (False Positive) is the number of negative reviews incorrectly predicted as positive.
TN (True Negative) is the number of negative reviews correctly predicted as negative, and FN (False Negative) is the number of positive reviews incorrectly predicted as negative.
Evaluation Parameters
Precision: It gives the exactness of the classifier. It is the ratio of the number of correctly predicted positive reviews to the total number of reviews predicted as positive.

Precision = \frac{TP}{TP + FP} \qquad (2)
Recall: It measures the completeness of the classifier. It is the ratio of the number of correctly predicted positive reviews to the actual number of positive reviews present in the corpus.

Recall = \frac{TP}{TP + FN} \qquad (3)
Continued...
F-measure: It is the harmonic mean of precision and recall. The F-measure has a best value of 1 and a worst value of 0. The formula for calculating the F-measure is given in equation 4:

F\text{-measure} = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (4)
Accuracy: It is one of the most common performance evaluation parameters. It is calculated as the ratio of the number of correctly predicted reviews to the total number of reviews present in the corpus. The formula for calculating accuracy is given in equation 5:

Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \qquad (5)
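A small sketch showing how equations 2-5 are computed from confusion-matrix counts (the counts below are placeholders, not the study's results):

def evaluate(tp, fp, fn, tn):
    # Equations (2)-(5): precision, recall, F-measure, and accuracy.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

# Placeholder counts, purely for illustration.
print(evaluate(tp=90, fp=10, fn=20, tn=80))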
Proposed Approach
Dataset → Preprocessing (stopword, numerical, and special character removal) → Vectorization → Training using a machine learning algorithm → Classification → Result

Figure: Steps to obtain the required output
Dataset
In this study, the labeled aclImdb movie review dataset (IMDb, 2006) is considered.
The dataset contains 12,500 positive and 12,500 negative labeled reviews for training the model.
It also contains 12,500 positive and 12,500 negative reviews for testing the model.
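A hedged sketch of loading this dataset with scikit-learn's load_files helper, assuming the aclImdb archive has been unpacked locally into the usual train/ and test/ folders with pos/ and neg/ subfolders:

from sklearn.datasets import load_files

# Assumed local layout: aclImdb/{train,test}/{pos,neg}/*.txt
train = load_files("aclImdb/train", categories=["pos", "neg"], encoding="utf-8")
test = load_files("aclImdb/test", categories=["pos", "neg"], encoding="utf-8")

print(len(train.data), len(test.data))   # 25000 reviews in each split
print(train.target_names)                # ['neg', 'pos']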
Preprocessing
The reviews contain a large amount of irrelevant information that needs to be eliminated.
In the preprocessing step, first, all special characters (such as ! and @) and unnecessary blank spaces are removed.
It is observed that reviewers often repeat a particular character of a word to give more emphasis to an expression or to make the review trendy (Amir et al., 2014).
The second step in preprocessing involves the removal of all English stopwords.
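A minimal sketch of these two preprocessing steps, assuming NLTK's English stopword list (any stopword list could be substituted):

import re
from nltk.corpus import stopwords   # assumes the NLTK stopwords corpus is downloaded

STOPWORDS = set(stopwords.words("english"))

def preprocess(review):
    # Remove special characters and digits, keeping only letters and spaces.
    review = re.sub(r"[^A-Za-z\s]", " ", review)
    # Collapse unnecessary blank spaces and lowercase the text.
    review = re.sub(r"\s+", " ", review).strip().lower()
    # Remove English stopwords.
    return " ".join(w for w in review.split() if w not in STOPWORDS)

print(preprocess("Wooow!!! This movie was soooo good... 10/10 would watch it again!"))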
Vectorization
CountVectorizer: It transforms each review into a token count matrix. First, it tokenizes the review; then, according to the number of occurrences of each token, a sparse matrix is created.
TF-IDF: Its value represents the importance of a word to a document in a corpus. The TF-IDF value is proportional to the frequency of a word in a document, but it is offset by the frequency of the word in the corpus.
Calculation of the TF-IDF value: suppose a movie review contains 100 words, in which the word Awesome appears 5 times. The term frequency (TF) for Awesome is then 5 / 100 = 0.05. Now suppose there are 1 million reviews in the corpus and the word Awesome appears in 1,000 of them. Then the inverse document frequency (IDF) is calculated as log(1,000,000 / 1,000) = 3. Thus, the TF-IDF value is 0.05 * 3 = 0.15.
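The arithmetic above can be checked directly (using the base-10 logarithm implied by log(1,000,000 / 1,000) = 3):

import math

tf = 5 / 100                          # term frequency of "Awesome" in the review
idf = math.log10(1_000_000 / 1_000)   # inverse document frequency = 3.0
print(tf * idf)                       # ~0.15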
Training and Classification
Naive Bayes (NB) algorithm: Using probabilistic analysis, features are extracted from the numeric vectors. These features are used to train the Naive Bayes classifier model (McCallum et al., 1998).
Support Vector Machine (SVM) algorithm: SVM plots all the numeric vectors in space and defines decision boundaries by hyperplanes. The hyperplane separates the vectors into two categories such that the distance from the closest point of each category to the hyperplane is maximum (Turney, 2002).
After the models are trained using the above-mentioned algorithms, the 12,500 positive and 12,500 negative reviews reserved for testing are classified using the trained models.
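A hedged end-to-end sketch of this training and classification step using scikit-learn pipelines; MultinomialNB and LinearSVC are used here as common choices, since the slides do not spell out the exact estimator settings:

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report

# `train` and `test` as returned by load_files in the dataset sketch above.
models = {
    "Naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

for name, model in models.items():
    model.fit(train.data, train.target)
    predicted = model.predict(test.data)
    print(name, accuracy_score(test.target, predicted))
    print(classification_report(test.target, predicted, target_names=test.target_names))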
Results: Naive Bayes
The confusion matrix obtained after Naive Bayes classification is shown in table 2, and the evaluation parameters Precision, Recall, and F-Measure are shown in table 3:

Table: Confusion matrix for the Naive Bayes classifier

           Correct Labels
           Positive   Negative
Positive   11025      1475
Negative   2612       9888

Table: Evaluation parameters for the Naive Bayes classifier

           Precision   Recall   F-Measure
Negative   0.81        0.88     0.84
Positive   0.87        0.79     0.83

The accuracy for the Naive Bayes classifier is 0.83652.
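As a quick check, the reported accuracy follows from equation 5 and the confusion-matrix counts above:

# Accuracy from equation (5) using the Naive Bayes confusion matrix.
print((11025 + 9888) / (11025 + 1475 + 2612 + 9888))   # 0.83652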
Results: Support Vector Machine
The confusion matrix obtained after Support Vector Machine classification is shown in table 4, and the evaluation parameters Precision, Recall, and F-Measure are shown in table 5:

Table: Confusion matrix for the Support Vector Machine classifier

           Correct Labels
           Positive   Negative
Positive   10993      1507
Negative   1749       10751

Table: Evaluation parameters for the Support Vector Machine classifier

           Precision   Recall   F-Measure
Negative   0.86        0.88     0.87
Positive   0.88        0.86     0.87

The accuracy of the Support Vector Machine classifier for unigrams is 0.86976.
Comparison of Work
Table: Comparative results obtained among different literature using the IMDb dataset (values are classification accuracy)

Classifier               Pang et al. (2002)   Salvetti et al. (2004)   Mullen and Collier (2004)   Beineke et al. (2004)   Matsumoto et al. (2005)   Proposed approach
Naive Bayes              0.815                0.796                    x                           0.659                   x                         0.83
Support Vector Machine   0.659                x                        0.86                        x                       0.883                     0.884

The 'x' mark indicates that the algorithm was not considered by the respective authors in their paper.
Conclusion
In this study, an attempt has been made to classify sentimental movie reviews using machine learning techniques.
Two different algorithms, namely Naive Bayes (NB) and Support Vector Machine (SVM), have been implemented.
It is observed that the SVM classifier outperforms the Naive Bayes classifier in predicting the sentiment of a review.
The results obtained in this study compare favorably with those reported in other literature on the same dataset.
Future Work
In this study, only two different classifiers have been implemented.
In the future, other classification strategies under the supervised learning methodology, such as the Maximum Entropy classifier, Stochastic Gradient Descent classifier, K-Nearest Neighbors, and others, can be considered for implementation.
Finally, their results can be compared with those of SVM, which is currently the best-performing classifier for this sentiment analysis task.
Reference I
Amir, S., Almeida, M., Martins, B., Filgueiras, J., and Silva, M. J. (2014). Tugas: Exploiting unlabelled data for
twitter sentiment analysis. SemEval 2014, page 673.
Argamon, S., Bloom, K., Esuli, A., and Sebastiani, F. (2009). Automatically determining attitude type and force
for sentiment analysis. In Human Language Technology. Challenges of the Information Society, pages 218–231.
Springer.
Beineke, P., Hastie, T., and Vaithyanathan, S. (2004). The sentimental factor: Improving review classification via
human-provided information. In Proceedings of the 42nd Annual Meeting on Association for Computational
Linguistics, page 263. Association for Computational Linguistics.
Dave, K., Lawrence, S., and Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic
classification of product reviews. In Proceedings of the 12th international conference on World Wide Web, pages
519–528. ACM.
Garreta, R. and Moncecchi, G. (2013). Learning scikit-learn: Machine Learning in Python. Packt Publishing Ltd.
IMDb (2006). IMDb, Internet Movie Database sentiment analysis dataset.
Li, S., Wang, Z., Zhou, G., and Lee, S. Y. M. (2011). Semi-supervised learning for imbalanced sentiment classification.
In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 1826.
Matsumoto, S., Takamura, H., and Okumura, M. (2005). Sentiment classification using word sub-sequences and
dependency sub-trees. In Advances in Knowledge Discovery and Data Mining, pages 301–311. Springer.
McCallum, A., Nigam, K., et al. (1998). A comparison of event models for naive bayes text classification. In AAAI-98
workshop on learning for text categorization, volume 752, pages 41–48. Citeseer.
Mullen, T. and Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources.
In EMNLP, volume 4, pages 412–418.
Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning
techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-
Volume 10, pages 79–86. Association for Computational Linguistics.
Reference II
Read, J. (2005). Using emoticons to reduce dependency in machine learning techniques for sentiment classification.
In Proceedings of the ACL Student Research Workshop, pages 43–48. Association for Computational Linguistics.
Salvetti, F., Lewis, S., and Reichenbach, C. (2004). Automatic opinion polarity classification of movie. Colorado
research in linguistics, 17:2.
Singh, Y., Bhatia, P. K., and Sangwan, O. (2007). A review of studies on machine learning techniques. International
Journal of Computer Science and Security, 1(1):70–84.
Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of
reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 417–424.
Association for Computational Linguistics.
Whitelaw, C., Garg, N., and Argamon, S. (2005). Using appraisal groups for sentiment analysis. In Proceedings of
the 14th ACM international conference on Information and knowledge management, pages 625–631. ACM.
Thank You!