Performance Analysis of Supervised Machine
Learning Techniques for Sentiment Analysis
Presented By : BiswaRanja Samal
Software Engineer @ Acesocloud
IEEE 3rd International Conference on Sensing, Signal Processing & Security
Authors :
1. Anil Kumar Behera
2. Mrutyunjaya Panda
P.G. Department of Computer Science and Applications
Utkal University, Vani Vihar, Bhubaneswar-751004, India
Contents
• Introduction
• Motivation
• Proposed Methodology
• Experimental Results
• Conclusions and Future Scope
• References
Introduction
>> What is Machine Learning?
>> Types of Machine Learning.
>>> Supervised Machine Learning
>>> Unsupervised Machine Learning
>>> Reinforcement Learning
>>> Transduction
>>> Semi-supervised Machine Learning
>> What is Sentiment Analysis?
>> Why Sentiment Analysis?
Expressing emotions and feelings in words is what makes human
beings unique [19]. These feelings are known as sentiments, and
the process of analyzing such statements is known as sentiment
analysis.
Sentiment analysis combined with machine learning techniques
can lead to high-performance intelligent systems and can prove
its value in the area of artificial intelligence [16].
Motivation
Researchers often find it difficult to select a machine learning
technique appropriate to their requirements, and a poor choice
leads to improper results, with very poor model accuracy and
performance.
This motivated us to investigate the performance of the available
machine learning techniques for sentiment analysis. We considered
only supervised machine learning techniques and compared them
with one another on each criterion.
Proposed Methodology
Algorithm flow of the proposed methodology
Step 1: Start
Step 2: For each dataset present in dataSetList
    Step 2.1: Clean the data set
    Step 2.2: Prepare training data set
    Step 2.3: Prepare testing data set
    Step 2.4: For each classifier present in classifierList
        Step 2.4.1: Train classifier with training data set
        Step 2.4.2: Test classifier with testing data set
        Step 2.4.3: Obtain the accuracy percent from result
Step 3: Finish
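The steps above can be sketched in Python as a pair of nested loops. This is only an illustrative outline, not the authors' implementation: the names `run_experiments`, `clean`, `accuracy_percent` and `MajorityClassifier` are hypothetical, the cleaning step is a placeholder, and the toy classifier merely stands in for a real one. Any object exposing `fit` and `predict` methods on raw text (for example a scikit-learn pipeline [1]) could be substituted.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in classifier: always predicts the most common training label."""

    def fit(self, texts, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label_ for _ in texts]


def clean(text):
    # Placeholder for Step 2.1; the actual cleaning rules are not given in the slides.
    return text.lower().strip()


def accuracy_percent(predicted, actual):
    # Step 2.4.3: share of correct predictions, expressed as a percentage.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def run_experiments(dataset_list, classifier_list, train_fraction=0.7):
    results = {}
    for data_name, (texts, labels) in dataset_list.items():      # Step 2
        texts = [clean(t) for t in texts]                        # Step 2.1
        cut = round(len(texts) * train_fraction)
        train_x, train_y = texts[:cut], labels[:cut]             # Step 2.2
        test_x, test_y = texts[cut:], labels[cut:]               # Step 2.3
        for clf_name, make_clf in classifier_list.items():       # Step 2.4
            clf = make_clf().fit(train_x, train_y)               # Step 2.4.1
            predicted = clf.predict(test_x)                      # Step 2.4.2
            results[(data_name, clf_name)] = accuracy_percent(predicted, test_y)
    return results


# Toy data set of ten labelled reviews (invented for illustration only).
toy = (["Good"] * 4 + ["Bad"] * 3 + ["Good", "Good", "Bad"],
       ["pos"] * 4 + ["neg"] * 3 + ["pos", "pos", "neg"])
results = run_experiments({"toy": toy}, {"majority": MajorityClassifier})
print(results)
```

Keeping the classifiers in a name-to-factory mapping means each data set trains a fresh model, so results from one data set cannot leak into the next.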
Methodology in Detail
>> Collecting Movie Review Data Sets
We have collected movie review data sets of five sizes:
10,600, 25,000, 35,600, 50,000 and 85,600 reviews.
>> Cleaning the Data Sets
A movie review data set consists of letters, numbers, special
characters and unrecognized characters, which may cause
problems for our classifiers. That is why, after collecting the
data sets, we carried out a data set cleaning procedure.
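A cleaning step of this kind might look like the sketch below. The slides do not list the exact cleaning rules, so everything here is an assumption: the function name `clean_review` is hypothetical, and the choices to strip HTML-like tags, drop non-ASCII characters, and keep letters only are one plausible reading of "numbers, special characters and unrecognized characters".

```python
import re
import unicodedata


def clean_review(text):
    # Assumption: scraped reviews may contain HTML-like tags such as <br />.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decompose accented letters, then drop anything that is not plain ASCII
    # (this removes unrecognized characters such as U+FFFD).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace every run of digits and special characters with a single space.
    text = re.sub(r"[^A-Za-z]+", " ", text)
    return text.lower().strip()


cleaned = clean_review("Great movie!!! <br /> 10/10 \ufffd would watch again :)")
print(cleaned)  # great movie would watch again
```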
>> Data Categorization
Data Set Size    Positive Feedbacks    Negative Feedbacks
10,600           5,300                 5,300
25,000           12,500                12,500
35,600           17,800                17,800
50,000           25,000                25,000
85,600           42,800                42,800
Number of positive and negative feedbacks in each data set
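Every data set is split evenly between the two classes. A quick balance check along these lines can confirm that before training; the label strings "pos" and "neg" and the function name `class_balance` are assumptions made for this illustration.

```python
from collections import Counter


def class_balance(labels):
    # Count how many reviews carry each sentiment label.
    counts = Counter(labels)
    return counts["pos"], counts["neg"]


# Synthetic labels matching the smallest data set (10,600 reviews).
labels = ["pos"] * 5300 + ["neg"] * 5300
positive, negative = class_balance(labels)
print(positive, negative, positive == negative)
```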
>> Preparing Training and Testing Data Sets
It is a common convention to use 70% of the data set for
training and the remaining 30% for testing the model. We
have followed the same convention.
Data Set Size    Training Data Set Size    Testing Data Set Size
10,600           7,420                     3,180
25,000           17,500                    7,500
35,600           24,920                    10,680
50,000           35,000                    15,000
85,600           59,920                    25,680
Number of training and testing reviews for each data set
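The 70/30 arithmetic can be sketched as follows. The helper names `split_sizes` and `train_test_split` are hypothetical, and shuffling before the cut is an assumption, since the slides do not say how reviews were assigned to each part. Integer arithmetic is used so that the two sizes always add up to the data set size.

```python
import random


def split_sizes(total, train_tenths=7):
    """Return (train, test) sizes; integer arithmetic avoids rounding drift."""
    train = total * train_tenths // 10
    return train, total - train


def train_test_split(items, seed=0, train_tenths=7):
    """Shuffle a copy of items reproducibly, then cut it at the 70% mark."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train_size, _ = split_sizes(len(shuffled), train_tenths)
    return shuffled[:train_size], shuffled[train_size:]


# Split sizes for the five data set sizes used in the study.
for total in (10600, 25000, 35600, 50000, 85600):
    print(total, split_sizes(total))

# Example with stand-in items (integers in place of reviews).
train, test = train_test_split(range(10600))
```

Fixing the seed keeps the split reproducible, so every classifier is trained and tested on exactly the same reviews.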
>> Training the Model with Training Data Sets
>> Testing the Model with Testing Data Sets
Experimental Results
Performance of the classifiers used, with the most accurate
classifier highlighted.
Graphical representation of the performance of all classifiers.
Conclusions and Future Scope
>> In this paper, a simple yet novel approach to sentiment
analysis of movie reviews is carried out using seven
promising supervised machine learning algorithms.
>> The results indicate that linear SVC/SVM is the best
classifier among those tested, achieving 100% accuracy
on a large number of movie reviews.
>> In future, we will try to investigate its effectiveness
on big data sets using unsupervised and semi-supervised
machine learning techniques.
References
[1] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830,
2011.
[2] Taiwo Oladipupo Ayodele, Types of Machine Learning Algorithms, New Advances in
Machine Learning, Yagang Zhang (Ed.), InTech, 2010, DOI: 10.5772/9385.
[3] https://www.jasondavies.com/wordcloud/
[4] Cagatay Catal, Mehmet Nangir, A Sentiment Classification Model Based On Multiple
Classifiers, Applied Soft Computing Journal,
http://dx.doi.org/10.1016/j.asoc.2016.11.022.
[5] S.C. Satapathy et al. (eds.), Proceedings of the 5th International Conference on
Frontiers in Intelligent Computing: Theory and Applications, Advances in Intelligent
Systems and Computing 516, DOI 10.1007/978-981-10-3156-4_39.
[6] L. Igual and S. Seguí, Introduction to Data Science, Undergraduate Topics in Computer
Science, Springer International Publishing Switzerland, 2017,
DOI 10.1007/978-3-319-50017-1_10.
[7] S.V. Solai Ananth, Chandu PMSS, Live Twitter Knowledge as a Corpus for
Sentiment Analysis and Opinion Mining, International Journal of Engineering Science
and Computing, January 2017.
[8] Singh, J.P., et al., Predicting the “helpfulness” of online consumer reviews, Journal
of Business Research (2016), http://dx.doi.org/10.1016/j.jbusres.2016.08.008.
[9] http://northcampus.uok.edu.in/downloads/20161105144024077.pdf
[10] Bing Liu, Xiaoli Li, Wee Sun Lee and Philip S. Yu, “Text Classification by Labeling
Words”, American Association for Artificial Intelligence, 2004.
[11] Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds. (London, U.K.:
MIT Press, 2006, pp. 508, ISBN: 978-0-262-03358-9). Reviewed by Philippe Thomas.
[12] Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning:
Data Mining, Inference, and Prediction (2nd edition) (Springer Series in Statistics), 2009.
[13] Sebastian B. Thrun, Efficient Exploration In Reinforcement Learning (1992).
[14] Stiglitz, Joseph E. "Learning to learn, localized learning and technological progress."
Economic policy and technological performance (1987): 125-153.
[15] Freitag, Dayne. "Machine learning for information extraction in informal domains."
Machine learning 39.2-3 (2000): 169-202.
[16] Bing Liu. Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers, May
2012.
[17] Timothy et al. (Timothy Jason Shepard, 1998).
[18] Maas, Andrew L., et al. "Learning word vectors for sentiment analysis." Proceedings of
the 49th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies-Volume 1. Association for Computational Linguistics, 2011.
[19] BiswaRanjan Samal, Mrutyunjaya Panda, Human Being Character Analysis from Their
Social Networking Profiles: A Semisupervised Machine Learning Approach, (IJCSIS)
International Journal of Computer Science and Information Security, Vol. 14, No. 5, May 2016.
[20] Murphy, Kevin P. "Naive bayes classifiers." University of British Columbia (2006).
[21] McCallum, Andrew, and Kamal Nigam. "A comparison of event models for naive bayes
text classification." AAAI-98 workshop on learning for text categorization. Vol. 752. 1998.
[22] Meena, M. Janaki, and K. R. Chandran. "Naive Bayes text classification with positive
features selected by statistical method." Advanced Computing, 2009. ICAC 2009. First.
[23] Kurt, Imran, Mevlut Ture, and A. Turhan Kurum. "Comparing performances of logistic
regression, classification and regression tree, and neural networks for predicting coronary.
[24] Bottou, Léon. "Large-scale machine learning with stochastic gradient descent."
Proceedings of COMPSTAT'2010. Physica-Verlag HD, 2010. 177-186.
[25] Alfaro, René, et al. "Forests for the New Millennium-Making Forests Work for People and
Nature." Selected Books 1 (2005).
Thank You
