Introduction to Sentiment Analysis


This is a seminar report on Sentiment Analysis. It gives a brief introduction to what sentiment analysis is and the various ways to implement it.

Published in: Technology, Education

  1. 1. SENTIMENT ANALYSIS A Seminar Report Submitted in Partial Fulfillment of the Requirements for the Degree of Bachelor of Engineering in Computer Engineering Submitted by Patil Makrand Anil DEPARTMENT OF COMPUTER ENGINEERING SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE 2013 - 2014
  2. 2. SENTIMENT ANALYSIS A Seminar Report Submitted in Partial Fulfillment of the Requirements for the Degree of Bachelor of Engineering in Computer Engineering Submitted by Patil Makrand Anil Guided by Ms. A. A. Chavan DEPARTMENT OF COMPUTER ENGINEERING SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE 2013 - 2014
  3. 3. SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE
     DEPARTMENT OF COMPUTER ENGINEERING
     CERTIFICATE
     This is to certify that the Seminar entitled “Sentiment Analysis” has been carried out by Patil Makrand Anil under my guidance in partial fulfillment of the degree of Bachelor of Engineering in Computer Engineering of North Maharashtra University, Jalgaon during the academic year 2013 - 2014. To the best of my knowledge and belief this work has not been submitted elsewhere for the award of any other degree.
     Date:
     Place: Dhule
     Guide: Ms. A. A. Chavan
     Head: Prof. B. R. Mandre
     Principal: Dr. Hitendra D. Patil
  4. 4. Acknowledgement
     The completion of the report on “Sentiment Analysis” has given me profound knowledge. I am sincerely thankful to Prof. B. R. Mandre and my guide Ms. A. A. Chavan, who have cooperated with and guided me at different stages during the preparation of this report. My sincere thanks go to the staff of the Computer Engineering Department, without whose help I could not have even conceived of accomplishing this report. This work is virtually the result of their inspiration and guidance. I would also like to thank the entire library staff and all those who were directly or indirectly a part of this work.
     Patil Makrand Anil
  5. 5. Contents
     Acknowledgement
     Abstract
     1 Introduction
       1.1 What is Sentiment Analysis
       1.2 Need of Sentiment Analysis
       1.3 Summary
     2 Literature Survey
     3 Methodology
       3.1 Machine Learning
       3.2 Natural Language Processing
       3.3 Summary
     4 Implementation
       4.1 Machine Learning Approach
       4.2 Natural Language Processing Approach
       4.3 Summary
     5 Applications
       5.1 Summary
     6 Advantages & Disadvantages
       6.1 Advantages
       6.2 Summary
     7 Conclusion
     Bibliography
  6. 6. List of Figures
     4.1 Implementation Architecture using Machine Learning Approach
     4.2 Implementation Architecture using NLP Approach
  7. 7. Abstract
     Our day-to-day life has always been influenced by what people think. Ideas and opinions of others have always affected our own opinions. The explosion of Web 2.0 has led to increased activity in Podcasting, Blogging, Tagging, Contributing to RSS, Social Bookmarking, and Social Networking. As a result there has been an eruption of interest in mining these vast resources of data for opinions. Sentiment Analysis or Opinion Mining is the computational treatment of opinions, sentiments and subjectivity of text. In this report, we discuss various approaches to the computational treatment of sentiments and opinions. These include supervised, data-driven techniques such as Naive Bayes and Support Vector Machines, as well as the SentiWordNet approach to Sentiment Analysis.
  8. 8. Chapter 1 Introduction
     1.1 What is Sentiment Analysis
     Sentiment Analysis is a Natural Language Processing and Information Extraction task that aims to identify writers’ feelings, expressed in positive or negative comments, questions and requests, by analyzing a large number of documents. For example, “I am so happy today, good morning to everyone” is a generally positive text. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall polarity of a document. Sentiment analysis is also known as opinion mining. In its basic form, Sentiment Analysis is the task of identifying whether the opinion expressed in a text is positive or negative. Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
     1.2 Need of Sentiment Analysis
     According to statistics from the social media tracking company Technorati, four out of every five Internet users use social media in some form. This includes friendship networks, blogging and micro-blogging sites, content and video sharing sites, etc. It is worth observing that the World Wide Web has now completely transformed into a more participative and co-creative Web. It allows a large number of users to contribute in a variety of forms. Even those who are virtually novices in the technicalities of Web publishing are creating content on the Web. In fact, the value of a Website is now determined largely by its user base, which in turn decides the amount of data available on it. It may perhaps be true to say that Data is the new Intel inside.[1] One such interesting form of user contribution on the Web is reviews. Many sites on the Web allow users to write about their experiences with, or opinions of, a product or service in the form
  9. 9. of a review. The Web is now full of user reviews for items ranging from mobile phones and holiday trips to hotel services and movies. It is interesting to observe that these reviews not only express the opinions of a group of users but are also a valuable source for harnessing collective intelligence. For example, a user looking for a hotel in a particular tourist city may prefer to go through the reviews of available hotels in the city before deciding to book one of them. Or a user willing to buy a particular model of digital camera may first look at reviews posted by many other users about that camera before making a buying decision. This not only gives the user more, and more relevant, information about different products and services at a mouse click, but also helps in arriving at a more informed decision. Sometimes users prefer to write about their experiences with a product or service in the form of a blog post rather than an explicit review. However, in both cases the data is basically textual. Popular sites like, are now full of user reviews, in this case reviews of cars and movies respectively.[3]
     Though these reviews and posts are beyond doubt very useful and valuable, it is also quite difficult for a new user (or a prospective customer) to read all the reviews/posts in a short span of time. Fortunately we have a solution to this information overload problem, one which can present a comprehensive summary of a large number of reviews. The new Information Retrieval formulations, popularly called sentiment classifiers, not only allow a review to be labeled automatically as positive or negative, but also extract and highlight the positive and negative aspects of a product/service. Sentiment analysis is now an important part of Information Retrieval based formulations in a variety of domains. It is traditionally used for automatic extraction of opinion types about a product and for highlighting positive or negative aspects/features of a product. It is widely believed that Sentiment analysis is needed and useful. It is also widely accepted that extracting sentiment from text is a hard semantic problem even for human beings. So in general, Sentiment Analysis will be useful for extracting sentiments available on blogging sites, social networks and discussion forums in order to benefit both the company and the customer/user.
     1.3 Summary
     This chapter has covered what Sentiment Analysis is, why it is needed, and a basic introduction to the topic.
  10. 10. Chapter 2 Literature Survey
     Balamurali et al. (2011) present an innovative idea: sense based sentiment analysis. This implies shifting from the lexeme feature space to the semantic space, i.e. from simple words to their synsets. Work in Sentiment Analysis had, for a long time, concentrated on the lexeme feature space or on identifying relations between words using parsing. Integrating sense into Sentiment Analysis was needed because of the following scenarios, as identified by the authors:
     • A word may have some sentiment-bearing and some non-sentiment-bearing senses
     • There may be different senses of a word that bear sentiment of opposite polarity
     • The same sense can be manifested by different words (appearing in the same synset)
     Using senses as features helps to exploit the idea of senses/concepts and the hierarchical structure of WordNet. The following feature representations were used by the authors, and their performance was compared to that of lexeme based features:
     • A group of word senses that have been manually annotated (Sense(M))
     • A group of word senses that have been annotated by an automatic WSD (Sense(I))
     • A group of manually annotated word senses and words (both separately as features) (Sense + Words(M))
     • A group of automatically annotated word senses and words (both separately as features) (Sense + Words(I))
     Sense + Words(M) and Sense + Words(I) were used to overcome the non-coverage of WordNet for some noun synsets. The authors used synset-replacement strategies to deal with non-coverage, i.e. the case where a synset in a test document is not found in the training documents. In that case the unknown target synset is replaced with its closest counterpart among the WordNet synsets according to some similarity metric.
  11. 11. Support Vector Machines were used for classification of the feature vectors and IWSD was used for automatic WSD. Extensive experiments were done to compare the performance of the 4 feature representations with the lexeme representation. The best performance, in terms of accuracy, was obtained by using sense based SA with manual annotation (with an accuracy of 90.2 percent and an increase of 5.3 percent over the baseline accuracy), followed by Sense(M), Sense + Words(I), Sense(I) and the lexeme feature representation. Lesk was found to perform the best among the 3 metrics used in the replacement strategies. One of the reasons for the improvement was attributed to feature abstraction and dimensionality reduction leading to noise reduction. The work achieved its target of bringing a new dimension to Sentiment Analysis by introducing sense based Sentiment Analysis.
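The shift from lexeme features to sense features described in this chapter can be illustrated with a minimal Python sketch. The sense inventory `TOY_SENSES`, its synset identifiers, and the fallback to the raw word are all assumptions made for illustration; they are not taken from WordNet or from the authors' system, which relied on manual or automatic word sense disambiguation.

```python
from collections import Counter

# Hypothetical toy sense inventory: word -> made-up synset identifier.
# A real system would obtain senses from WordNet plus a WSD step.
TOY_SENSES = {
    "good": "syn:good",
    "great": "syn:good",   # assumed to share a toy synset with "good"
    "film": "syn:movie",
    "movie": "syn:movie",
    "bad": "syn:bad",
    "awful": "syn:bad",
}

def lexeme_features(tokens):
    """Bag of words: every distinct word is its own feature."""
    return Counter(tokens)

def sense_features(tokens, senses=TOY_SENSES):
    """Bag of senses: words sharing a synset map to one feature.

    Words without a known sense fall back to the word itself
    (an assumption of this sketch, not the authors' replacement strategy).
    """
    return Counter(senses.get(token, token) for token in tokens)

review_a = ["great", "film"]
review_b = ["good", "movie"]

# In the lexeme space the two reviews share no features ...
print(lexeme_features(review_a) == lexeme_features(review_b))
# ... while in the sense space they become identical.
print(sense_features(review_a) == sense_features(review_b))
```

The sketch shows why sense features can reduce dimensionality: several surface words collapse into a single feature, so a classifier trained on one wording can still recognise another.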
  12. 12. Chapter 3 Methodology
     There are primarily two types of approaches for sentiment classification of opinionated texts[1]:
     1. using a Machine Learning based text classifier such as Naive Bayes or Support Vector Machine
     2. using Natural Language Processing
     3.1 Machine Learning
     Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders. Machine learning focuses on prediction, based on known properties learned from the training data. Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An example would be assigning a given email to the “spam” or “non-spam” class. An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term classifier sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category. Training means fitting a classifier on particular inputs so that it can later be tested on unknown inputs (which it has never seen before), which it classifies or predicts based on what it has learned. Classifying data is a common task in machine learning. Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in.
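The two-class setting described above can be sketched with a small Naive Bayes text classifier. This is a minimal illustration under stated assumptions: the class name `NaiveBayesText`, the multinomial model with add-one smoothing, and the four-document training set are choices made for this sketch and are not taken from the report.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of log likelihoods of the known words
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                if token in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny hand-labeled training set (illustrative only).
train_docs = [["great", "movie"], ["loved", "it"],
              ["terrible", "plot"], ["boring", "movie"]]
train_labels = ["positive", "positive", "negative", "negative"]

clf = NaiveBayesText().fit(train_docs, train_labels)
print(clf.predict(["loved", "great"]))  # prints: positive
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word from forcing a class probability to zero.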
  13. 13. The machine learning based text classifiers follow a supervised machine learning paradigm, where the classifier needs to be trained on some labeled training data before it can be applied to the actual classification task. The training data is usually a portion extracted from the original data and labeled by hand. After suitable training, the classifiers can be used on the actual test data. Naive Bayes is a statistical classifier whereas Support Vector Machine is a kind of vector space classifier. The statistical text classifier scheme of Naive Bayes (NB) can be adapted to the sentiment classification problem, as this can be viewed as a 2-class text classification problem with positive and negative classes.[2] Support Vector Machine (SVM) is a vector space model based classifier which requires that the text documents be transformed to feature vectors before they are used for classification. Usually the text documents are transformed to multidimensional tf.idf vectors. The entire problem of classification is then to assign every text document, represented as a vector, to a particular class. SVM is a type of large margin classifier: the goal is to find a decision boundary between two classes that is maximally far from any document in the training data.
     This approach needs
     1. A good classifier such as Naive Bayes, Support Vector Machine, etc.
     2. A training set for each class
     There are various training sets available on the Internet, such as the Movie Reviews data set, Twitter data sets, etc. The classes can be positive and negative. We need training data for both classes.
     3.2 Natural Language Processing
     Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. This approach utilizes the publicly available SentiWordNet lexicon, which provides sentiment polarity values for every term occurring in the document. In this lexical resource each term t occurring in WordNet is associated with three numerical scores obj(t), pos(t) and neg(t), describing the objective, positive and negative polarities of the term, respectively. These three scores are computed by combining the results produced by eight ternary classifiers.[3] WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
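The tf.idf document vectors that Section 3.1 names as the usual input to a Support Vector Machine can be sketched in a few lines of Python. The weighting used here (raw term count multiplied by log(N / document frequency)) is one common variant and is an assumption of this sketch; the report does not say which variant was intended, and the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into tf.idf vectors over a shared vocabulary.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))   # count each term once per document
    vocab = sorted(doc_freq)
    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)
        vectors.append([term_freq[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors

docs = [["good", "movie"], ["bad", "movie"]]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print(vectors)
```

A term such as "movie" that occurs in every document receives weight zero, so the vectors emphasise the words that actually distinguish one review from another.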
  14. 14. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing. It groups words together based on their meanings. A synset is simply a set of one or more synonyms. This approach uses semantics to understand the language. Major tasks in NLP that help in extracting sentiment from a sentence[1]:
     1. Extracting the part of the sentence that reflects the sentiment
     2. Understanding the structure of the sentence
     3. Different tools which help process the textual data
     In short, the positive and negative scores for each synset are obtained from SentiWordNet according to its part-of-speech tag. The positive and negative scores are then totalled, and the sentiment polarity is the class (either positive or negative) that has received the higher total.
     3.3 Summary
     The various approaches to Sentiment Analysis have been discussed in this chapter. There are two in total: one uses Machine Learning and the other uses Natural Language Processing.
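The scoring procedure of Section 3.2 (look up positive and negative scores by word and part-of-speech tag, total them, pick the larger) can be sketched as follows. `TOY_LEXICON` and its score values are invented for illustration and are not real SentiWordNet entries; returning "neutral" on a tie is likewise an assumption of this sketch, since the report only distinguishes positive from negative.

```python
# Hypothetical miniature lexicon: (word, POS tag) -> (positive, negative) score.
# The values are made up; a real system would read them from SentiWordNet.
TOY_LEXICON = {
    ("good", "a"): (0.75, 0.0),
    ("happy", "a"): (0.875, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("boring", "a"): (0.0, 0.5),
    ("love", "v"): (0.5, 0.0),
}

def sentence_polarity(tagged_tokens, lexicon=TOY_LEXICON):
    """Sum positive and negative scores and return the winning class."""
    pos_total = 0.0
    neg_total = 0.0
    for word, tag in tagged_tokens:
        # Terms missing from the lexicon contribute nothing.
        pos, neg = lexicon.get((word.lower(), tag), (0.0, 0.0))
        pos_total += pos
        neg_total += neg
    if pos_total > neg_total:
        return "positive"
    if neg_total > pos_total:
        return "negative"
    return "neutral"  # tie or no sentiment-bearing terms (assumed behaviour)

tagged = [("I", "n"), ("am", "v"), ("so", "r"), ("happy", "a"), ("today", "n")]
print(sentence_polarity(tagged))  # prints: positive
```

Keying the lexicon on the part-of-speech tag as well as the word matters because the same word can carry different scores as, say, an adjective and a noun.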
  15. 15. Chapter 4 Implementation
     Sentiment Analysis can be implemented using 2 approaches[1]:
     1. Machine Learning Approach
     2. Natural Language Processing Approach
     4.1 Machine Learning Approach
     The machine learning approach needs a dataset and a classifier to train. The basic idea is that we first collect a data set, which can be a movie review dataset, a Twitter dataset, etc. These data sets are freely available on the Internet. We then pre-process the data set and prepare a training set for our classifier. Using the training set we train the classifier; after training, we provide the test data set to the classifier. The following figure shows the basic implementation model of Sentiment Analysis using the Machine Learning Approach.
     Figure 4.1: Implementation Architecture using Machine Learning Approach
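The pre-processing step mentioned above can be sketched as follows, assuming it covers tokenization, case conversion, punctuation removal and expansion of contracted words to their full forms. The function name `preprocess` and the small `CONTRACTIONS` table are hypothetical; a practical system would need a much larger table and handling for typographic apostrophes.

```python
import string

# Hypothetical, deliberately small table of contracted forms.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
    "i'm": "i am",
}

def preprocess(sentence):
    """Lowercase, split on whitespace, strip punctuation, expand contractions."""
    tokens = []
    for raw in sentence.lower().split():
        # Strip punctuation from both ends; inner apostrophes are kept
        # so that contractions can still be looked up.
        word = raw.strip(string.punctuation)
        if not word:
            continue  # the token consisted only of punctuation
        tokens.extend(CONTRACTIONS.get(word, word).split())
    return tokens

print(preprocess("I'm so happy today, good morning to everyone!"))
```

The resulting token lists are the form a classifier would be trained on, so the same function has to be applied to both the training set and the test set.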
  16. 16. Data sets are freely available on the Internet. For example, City Grid Media is an online media company that connects web and mobile publishers with local businesses by linking them through City Grid. It provides APIs, reviews and ratings (1-10). Its domain is restaurants. Pre-processing involves dividing the sentence into tokens, case conversion, removal of punctuation, and conversion of words to their full forms.
     4.2 Natural Language Processing Approach
     The Natural Language Processing approach uses the SentiWordNet lexicon, which consists of a positive and a negative score for each term occurring in WordNet. The implementation extracts the adjectives from the sentence and then looks each one up in SentiWordNet to find its positive and negative scores. In this way the total score of the sentence is calculated, and whichever total is greater (positive or negative) determines the label for the sentence. The following figure shows the basic implementation architecture of Sentiment Analysis using the Natural Language Processing Approach.
     Figure 4.2: Implementation Architecture using NLP Approach
  17. 17. 4.3 Summary
     The various approaches to implementing Sentiment Analysis have been discussed in detail in this chapter. There are two in total: one uses Machine Learning and the other uses Natural Language Processing.
  18. 18. Chapter 5 Applications
     Word of mouth is the process of conveying information from person to person, and it plays a major role in customer buying decisions. In commercial situations, word of mouth involves consumers sharing attitudes, opinions, or reactions about businesses, products, or services with other people. Word of mouth communication functions on the basis of social networking and trust. People rely on families, friends, and others in their social network. Research also indicates that people appear to trust seemingly disinterested opinions from people outside their immediate social network, such as online reviews. This is where Sentiment Analysis comes into play. The growing availability of opinion-rich resources like online review sites, blogs and social networking sites has made this “decision-making process” easier for us. With the explosion of Web 2.0 platforms, consumers have a soapbox of unprecedented reach and power by which they can share opinions. Major companies have realized that these consumer voices shape the voices of other consumers.[2] Sentiment Analysis thus finds its use in the consumer market for product reviews, in marketing for knowing consumer attitudes and trends, in social media for finding the general opinion about recent hot topics, and in movies for finding whether a recently released movie is a hit.[2]
  19. 19. Applications can be classified into the following categories:
     1. Review-Related Websites: movie reviews, product reviews, etc.
     2. As a Sub-Component Technology: detecting antagonistic, heated language in mails, spam detection, context sensitive information detection, etc.
     3. Businesses and Organizations:
        • Brand analysis
        • New product perception
        • Product and service benchmarking
        • Market intelligence
        • Businesses spend a huge amount of money to find consumer sentiments and opinions (consultants, surveys, focus groups, etc.)
     4. Individuals: interested in others’ opinions when
        • Purchasing a product or using a service
        • Finding opinions on political topics
     5. Ad Placements: placing ads in user-generated content
        • Place an ad when one praises a product.
        • Place an ad from a competitor if one criticizes a product.
     5.1 Summary
     This chapter describes the various applications of Sentiment Analysis.
  20. 20. Chapter 6 Advantages & Disadvantages
     6.1 Advantages
     1. A lower cost than traditional methods of getting customer insight.
     2. A faster way of getting insight from customer data.
     3. The ability to act on customer suggestions.
     4. Identifies an organisation’s Strengths, Weaknesses, Opportunities & Threats (SWOT Analysis).
     5. As 80% of all data in a business consists of words, the Sentiment Engine is an essential tool for making sense of it all.
     6. More accurate and insightful customer perceptions and feedback.
     6.2 Summary
     This chapter gives the advantages of Sentiment Analysis.
  21. 21. Chapter 7 Conclusion
     Sentiment analysis is an interdisciplinary field that spans natural language processing, artificial intelligence, and text mining. We have seen that Sentiment Analysis can be used for analyzing opinions in blogs, newspapers, articles, product reviews, social media websites and movie-review websites where a third person narrates his/her views. We also studied the Natural Language Processing and Machine Learning approaches to Sentiment Analysis. We have seen that it is easier to implement Sentiment Analysis via the SentiWordNet approach than via the classifier approach. We have seen that sentiment analysis has many applications and that it is an important field to study. Sentiment analysis has strong commercial interest because companies want to know how their products are being perceived, and prospective consumers want to know what existing users think.
  22. 22. Bibliography
     [1] P. W., V. K. Singh, R. Piryani, A. Uddin, “Sentiment analysis of movie reviews and blog posts,” IEEE International Advance Computing Conference (IACC), vol. 3, 2013.
     [2] A. A. G., Mostafa Karamibekr, “Sentiment analysis of social issues,” International Conference on Social Informatics, 2012.
     [3] M. R., Alaa Hamouda, “Reviews classification using SentiWordNet lexicon,” The Online Journal on Computer Science and Information Technology (OJCSIT), vol. 2, August 2011.