Introduction to Sentiment Analysis

SENTIMENT ANALYSIS
A Seminar Report Submitted
in Partial Fulﬁllment of the Requirements
for the Degree of

Bachelor of Engineering
in

Computer Engineering
Submitted by

Patil Makrand Anil

DEPARTMENT OF COMPUTER ENGINEERING

SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE
2013 - 2014

Guided by

Ms. A. A. Chavan


CERTIFICATE
This is to certify that the Seminar entitled Sentiment Analysis has been carried out
by
Patil Makrand Anil
under my guidance in partial fulﬁllment of the degree of Bachelor of Engineering in
Computer Engineering of North Maharashtra University, Jalgaon during the academic
year 2013 - 2014. To the best of my knowledge and belief this work has not been
submitted elsewhere for the award of any other degree.

Date:
Place: Dhule
Guide: Ms. A. A. Chavan
Head: Prof. B. R. Mandre
Principal: Dr. Hitendra D. Patil


Acknowledgement
The completion of this report on “Sentiment Analysis” has given me profound knowledge. I
am sincerely thankful to Prof. B. R. Mandre and my guide Ms. A. A. Chavan, who cooperated
with and guided me at different stages during the preparation of this report. My sincere
thanks go to the staff of the Computer Engineering Department, without whose help I could
not have even conceived of accomplishing this report. This work is virtually the result
of their inspiration and guidance. I would also like to thank the entire library staff and all
those who were directly or indirectly a part of this work.
Patil Makrand Anil


Contents
Acknowledgement
Abstract
1 Introduction
1.1 What is Sentiment Analysis
1.2 Need of Sentiment Analysis
1.3 Summary
2 Literature Survey
3 Methodology
3.1 Machine Learning
3.2 Natural Language Processing
3.3 Summary
4 Implementation
4.1 Machine Learning Approach
4.2 Natural Language Processing Approach
4.3 Summary
5 Applications
5.1 Summary
6 Advantages & Disadvantages
6.1 Advantages
6.2 Summary
7 Conclusion
Bibliography

List of Figures
4.1 Implementation Architecture using Machine Learning Approach
4.2 Implementation Architecture using NLP Approach

Abstract
Our day-to-day life has always been influenced by what people think. Ideas and opinions of
others have always affected our own opinions. The explosion of Web 2.0 has led to increased
activity in podcasting, blogging, tagging, contributing to RSS, social bookmarking, and
social networking. As a result, there has been a surge of interest in mining
these vast resources of data for opinions. Sentiment Analysis, or Opinion Mining, is the
computational treatment of opinions, sentiments and subjectivity in text. In this report, we
discuss various approaches to the computational treatment of sentiments and opinions:
supervised, data-driven techniques such as Naive Bayes and Support Vector Machines, and
the lexicon-based SentiWordNet approach.


Chapter 1
Introduction
1.1 What is Sentiment Analysis

Sentiment Analysis is a Natural Language Processing and Information Extraction task that
aims to obtain the writer's feelings, expressed in positive or negative comments, questions
and requests, by analyzing a large number of documents. For example, “I am so happy today,
good morning to everyone” is a generally positive text. Generally speaking, sentiment analysis
aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall
tonality of a document. Sentiment analysis is also known as opinion mining. In its basic form,
Sentiment Analysis is the task of identifying whether the opinion expressed in a text is
positive or negative. Natural language processing (NLP) is a field of computer science, artificial
intelligence, and linguistics concerned with the interactions between computers and human
(natural) languages.

1.2 Need of Sentiment Analysis

According to recent statistics from the social media tracking company Technorati, four out of
every five Internet users use social media in some form. This includes friendship networks,
blogging and micro-blogging sites, content and video sharing sites, etc. It is worth observing
that the World Wide Web has now transformed into a more participative and
co-creative Web. It allows a large number of users to contribute in a variety of forms.
Even those who are virtual novices in the technicalities of Web publishing
are creating content on the Web. In fact, the value of a website is now determined largely
by its user base, which in turn decides the amount of data available on it. It may perhaps
be true to say that data is the new “Intel Inside”.[1]
One such interesting form of user contribution on the Web is reviews. Many sites on the
Web allow users to write about their experiences with or opinions of a product or service in the
form of a review. The Web is now full of user reviews for items ranging from mobile
phones, holiday trips, and hotel services to movies. It is interesting to observe
that these reviews not only express the opinions of a group of users but are also a valuable
source for harnessing collective intelligence. For example, a user looking for a hotel in a
particular tourist city may prefer to go through the reviews of available hotels in the city
before deciding to book one of them. Or a user wanting to buy a particular model
of digital camera may first look at reviews posted by many other users about that camera
before making a buying decision. This not only gives the user more
relevant information about different products and services at the click of a mouse, but also helps
in arriving at a more informed decision. Sometimes users prefer to write about their experiences
with a product or service in the form of a blog post rather than an explicit review. In both
cases, however, the data is basically textual. Popular sites like carwale.com and imdb.com are now
full of user reviews, in this case reviews of cars and movies respectively.[3]
Though these reviews and posts are beyond doubt very useful and valuable, it is
also quite difficult for a new user (or a prospective customer) to read all the
reviews and posts in a short span of time. Fortunately, there is a solution to this information
overload problem that can present a comprehensive summary of a large number of
reviews. The new Information Retrieval formulations, popularly called sentiment classifiers,
not only automatically label a review as positive or negative, but also extract
and highlight the positive and negative aspects of a product or service. Sentiment analysis is
now an important part of Information Retrieval based formulations in a variety of domains.
It is traditionally used for automatic extraction of opinion types about a product and for
highlighting the positive or negative aspects or features of a product.
It is widely believed that sentiment analysis is needed and useful. It is also widely accepted
that extracting sentiment from text is a hard semantic problem, even for human beings. In
general, sentiment analysis is useful for extracting the sentiments available on blogging
sites, social networks and discussion forums, to the benefit of both companies and customers or users.

1.3 Summary

This chapter covered what Sentiment Analysis is, why it is needed, and a basic introduction to the field.


Chapter 2
Literature Survey
Balamurali et al. (2011) present an innovative idea: sense-based sentiment
analysis. This implies shifting from the lexeme feature space to the semantic space, i.e. from simple
words to their synsets. Work in Sentiment Analysis had so far concentrated on the lexeme
feature space or on identifying relations between words using parsing. Integrating
sense into Sentiment Analysis was needed because of the following scenarios,
identified by the authors:
• A word may have some sentiment-bearing and some non-sentiment-bearing senses
• There may be different senses of a word that bear sentiment of opposite polarity
• The same sense can be manifested by different words (appearing in the same synset)
Using senses as features helps to exploit the idea of senses/concepts and the hierarchical
structure of WordNet. The following feature representations were used by the authors,
and their performance was compared to that of lexeme-based features:
• A group of word senses that have been manually annotated (M)
• A group of word senses that have been annotated by an automatic WSD (I)
• A group of manually annotated word senses and words (both separately as features)
(Sense + Words(M))
• A group of automatically annotated word senses and words (both separately as features) (Sense + Words(I))
Sense + Words(M) and Sense + Words(I) were used to overcome WordNet's lack of coverage
for some noun synsets. The authors used synset-replacement strategies to deal with
non-coverage, i.e. the case where a synset in a test document is not found in the training documents.
In that case the unknown target synset is replaced with its closest counterpart among the
WordNet synsets, as measured by a similarity metric.
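The replacement strategy can be sketched as follows. This is a minimal illustration and not the authors' implementation: the synset identifiers, the glosses, and the gloss-overlap similarity (a simplified, Lesk-style measure) are all assumptions made for the example, as is the choice to take the candidates from the synsets seen in training.

```python
# Toy illustration of synset replacement; all identifiers and glosses are hypothetical.

def gloss_overlap(gloss_a, gloss_b):
    """Lesk-style similarity: number of distinct words shared by two glosses."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))


def replace_unknown_synsets(doc_synsets, known_synsets, glosses):
    """Map every synset not seen in training to its most similar known synset."""
    replaced = []
    for synset in doc_synsets:
        if synset in known_synsets:
            replaced.append(synset)
            continue
        # Pick the known synset whose gloss overlaps most with the unknown one.
        best = max(sorted(known_synsets),
                   key=lambda known: gloss_overlap(glosses[synset], glosses[known]))
        replaced.append(best)
    return replaced


# Hypothetical glosses keyed by made-up synset identifiers.
GLOSSES = {
    "good.a.01": "having desirable or positive qualities",
    "bad.a.01": "having undesirable or negative qualities",
    "superb.a.01": "of surpassing excellence with highly desirable positive qualities",
}

known = {"good.a.01", "bad.a.01"}
print(replace_unknown_synsets(["superb.a.01", "bad.a.01"], known, GLOSSES))
```

In this toy setting the unseen synset is mapped to the known synset with which its gloss shares the most words, so the classifier only ever sees features it was trained on.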
Support Vector Machines were used for classification of the feature vectors and IWSD was
used for automatic WSD (word sense disambiguation). Extensive experiments were done to
compare the performance of the four feature representations with the lexeme representation.
The best performance, in terms of accuracy, was obtained by using sense-based SA with manual
annotation (an accuracy of 90.2 percent, an increase of 5.3 percent over the baseline accuracy),
followed by Sense(M), Sense + Words(I), Sense(I) and the lexeme feature representation. LESK
was found to perform best among the three metrics used in the replacement strategies.
One of the reasons for the improvement was attributed to feature abstraction and dimensionality
reduction leading to noise reduction. The work achieved its target of bringing a new
dimension to Sentiment Analysis by introducing sense-based Sentiment Analysis.


Chapter 3
Methodology
There are primarily two types of approaches for sentiment classification of opinionated
texts[1]:
1. using a machine learning based text classifier such as Naive Bayes or a Support Vector
Machine
2. using Natural Language Processing

3.1 Machine Learning

Machine learning, a branch of artificial intelligence, concerns the construction and study of
systems that can learn from data. For example, a machine learning system could be trained
on email messages to learn to distinguish between spam and non-spam messages. After
learning, it can then be used to classify new email messages into spam and non-spam folders.
Machine learning focuses on prediction, based on known properties learned from the training
data.
Classification is the problem of identifying to which of a set of categories (sub-populations)
a new observation belongs, on the basis of a training set of data containing observations (or
instances) whose category membership is known. An example would be assigning a given
email to the “spam” or “non-spam” class.
An algorithm that implements classification, especially in a concrete implementation, is
known as a classifier. The term classifier sometimes also refers to the mathematical function,
implemented by a classification algorithm, that maps input data to a category.
Training means fitting a classifier on particular inputs so that it can later be tested
on unknown inputs (which it has never seen before), which it classifies or
predicts based on what it has learned. Classifying data is a common task in machine learning.
Suppose some given data points each belong to one of two classes, and the goal is to decide
which class a new data point will be in.
Machine learning based text classifiers follow the supervised machine learning
paradigm, where the classifier needs to be trained on some labeled training data before it
can be applied to the actual classification task. The training data is usually a portion
of the original data that has been labeled by hand. After suitable training, the classifier can be
used on the actual test data. Naive Bayes is a statistical classifier, whereas the Support Vector
Machine is a kind of vector space classifier. The statistical text classification scheme of Naive
Bayes (NB) can be adapted to the sentiment classification problem, as it can be viewed
as a 2-class text classification problem with positive and negative classes.[2] The Support Vector
Machine (SVM) is a vector space model based classifier, which requires that the text
documents be transformed to feature vectors before they are used for classification.
Usually the text documents are transformed to multidimensional tf.idf vectors. The
classification problem then becomes one of assigning every text document, represented as a vector,
to a particular class. The SVM is a type of large margin classifier: the goal is to find a decision
boundary between two classes that is maximally far from any document in the training data.
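As an illustration of the vector representation, the following sketch computes tf.idf weights for a tiny, made-up corpus. The weighting variant used here (raw term frequency multiplied by log(N/df)) is one common choice and is an assumption, since the variant intended in the text is not stated.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return the vocabulary and one tf.idf vector (list of floats) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append([term_freq[term] * math.log(n_docs / doc_freq[term])
                        for term in vocabulary])
    return vocabulary, vectors


# Made-up mini corpus, for illustration only.
docs = ["good movie", "bad movie", "good good plot"]
vocab, vecs = tfidf_vectors(docs)
for doc, vec in zip(docs, vecs):
    print(doc, [round(weight, 3) for weight in vec])
```

Each document becomes a vector with one dimension per vocabulary term; these vectors are what a vector space classifier such as an SVM would take as input.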
This approach needs
1. A good classifier such as Naive Bayes, Support Vector Machine, etc.
2. A training set for each class
Various training sets are available on the Internet, such as the Movie Reviews data set, Twitter
data sets, etc.
The classes can be positive and negative. Training data is needed for both classes.
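A minimal sketch of a two-class multinomial Naive Bayes sentiment classifier is given below. The training sentences are invented for illustration, and add-one (Laplace) smoothing is an assumption; a real system would be trained on one of the labeled data sets mentioned above.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # log P(class) + sum of log P(word | class) over words seen in training
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocabulary:
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training data, for illustration only.
train_texts = ["great acting and a great story", "boring plot and terrible acting",
               "a wonderful enjoyable film", "terrible boring film"]
train_labels = ["positive", "negative", "positive", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("a great enjoyable story"))
print(model.predict("boring and terrible"))
```

On this toy data the first sentence is labeled positive and the second negative. Working with log-probabilities avoids numerical underflow when many word probabilities are multiplied.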

3.2 Natural Language Processing

Natural language processing (NLP) is a field of computer science, artificial intelligence, and
linguistics concerned with the interactions between computers and human (natural) languages.
This approach utilizes the publicly available lexical resource SentiWordNet, which provides
sentiment polarity values for every term occurring in the document. In this lexical resource
each term t occurring in WordNet is associated with three numerical scores obj(t), pos(t)
and neg(t), describing the objective, positive and negative polarities of the term, respectively.
These three scores are computed by combining the results produced by eight ternary
classifiers.[3]
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are
grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
WordNet is also freely and publicly available for download. WordNet’s structure makes it a
useful tool for computational linguistics and natural language processing. It groups words
together based on their meanings.
A synset is simply a set of one or more synonyms.
This approach uses semantics to understand the language. The major tasks in NLP that help
in extracting sentiment from a sentence are[1]:
1. Extracting the part of the sentence that reflects the sentiment
2. Understanding the structure of the sentence
3. Different tools which help process the textual data
Basically, the positive and negative scores for a particular synset are obtained from SentiWordNet
according to its part-of-speech tag. The positive and negative scores are then totalled,
and the sentiment polarity is determined by whichever class (positive or
negative) has received the higher total score.
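The decision rule described above can be written compactly. The notation is introduced here for illustration only: S denotes the set of scored terms (synsets) extracted from the sentence. The case of equal totals is left open because the text does not say how ties are handled. In SentiWordNet the three scores of a synset sum to 1, so obj(t) = 1 - pos(t) - neg(t) and only the positive and negative scores need to be accumulated.

```latex
\[
\mathrm{Pos}(S) = \sum_{t \in S} \mathrm{pos}(t), \qquad
\mathrm{Neg}(S) = \sum_{t \in S} \mathrm{neg}(t),
\]
\[
\mathrm{polarity}(S) =
\begin{cases}
\text{positive} & \text{if } \mathrm{Pos}(S) > \mathrm{Neg}(S),\\
\text{negative} & \text{if } \mathrm{Pos}(S) < \mathrm{Neg}(S).
\end{cases}
\]
```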

3.3 Summary

The various approaches to Sentiment Analysis have been discussed in this chapter. There
are two approaches in total: one uses Machine Learning and the other uses Natural Language
Processing.


Chapter 4
Implementation
Sentiment Analysis can be implemented using two approaches[1]:
1. Machine Learning Approach
2. Natural Language Processing Approach

4.1 Machine Learning Approach

The machine learning approach needs a data set and a classifier to train. The basic idea behind
this approach is that we first collect a data set, which can be a movie review data set, a Twitter
data set, etc. Such data sets are freely available on the Internet. We then pre-process the data set
and prepare a training set for our classifier. Using the training set we train the classifier; after
training, we provide the test data set to the classifier.
The following figure shows the basic implementation model of Sentiment Analysis using the
Machine Learning Approach.

Figure 4.1: Implementation Architecture using Machine Learning Approach

Data sets are freely available on the Internet. For example, City Grid Media is an online
media company that connects web and mobile publishers with local businesses by linking
them through City Grid. It provides APIs, reviews and ratings (1-10). Its domain is restaurants.
Pre-processing involves dividing the sentence into tokens, case conversion, removal of
punctuation, and conversion of words to their full forms.
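The pre-processing steps listed above can be sketched as follows. The table of short forms is hypothetical and deliberately tiny, and whitespace tokenization is an assumption; a real system would use a larger expansion table and a proper tokenizer.

```python
import string

# Hypothetical expansion table; a real system would use a much larger one.
FULL_FORMS = {"don't": "do not", "can't": "cannot", "it's": "it is", "i'm": "i am"}


def preprocess(sentence):
    """Lowercase, expand short forms, strip punctuation and split into tokens."""
    text = sentence.lower()                      # case conversion
    for short, full in FULL_FORMS.items():       # word conversion to full forms
        text = text.replace(short, full)
    # Removal of punctuation (ASCII punctuation only in this sketch).
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()                          # tokenization on whitespace


print(preprocess("I'm happy, but it's NOT cheap!"))
```

Short forms are expanded before punctuation is removed, because stripping the apostrophe first would turn "it's" into "its" and the expansion would no longer match.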

4.2 Natural Language Processing Approach

The Natural Language Processing approach uses the SentiWordNet lexicon, which contains
positive and negative scores for each term occurring in WordNet. The implementation
extracts the adjectives from the sentence and then looks each of them up in SentiWordNet
to find its positive and negative scores. In this way the total positive and negative scores of
the sentence are calculated, and whichever is greater (positive or negative) becomes the label
for the sentence.
The following figure shows the basic implementation architecture of Sentiment Analysis using
the Natural Language Processing Approach.

Figure 4.2: Implementation Architecture using NLP Approach
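The adjective-based scoring procedure can be sketched as follows. The part-of-speech tags and the scores in the small lexicon are invented for illustration and are not taken from SentiWordNet; a real implementation would obtain the tags from a part-of-speech tagger and the scores from the SentiWordNet lexicon. Returning "neutral" on a tie is also an assumption.

```python
# Hypothetical (positive, negative) scores for a few adjectives; not real SentiWordNet values.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "friendly": (0.5, 0.0),
    "slow": (0.0, 0.25),
    "terrible": (0.0, 0.625),
}


def classify_sentence(tagged_tokens, lexicon=TOY_LEXICON):
    """Sum positive and negative scores of adjectives and return the winning class."""
    positive_total = 0.0
    negative_total = 0.0
    for word, tag in tagged_tokens:
        if tag != "ADJ":          # only adjectives are scored in this approach
            continue
        pos_score, neg_score = lexicon.get(word.lower(), (0.0, 0.0))
        positive_total += pos_score
        negative_total += neg_score
    if positive_total > negative_total:
        label = "positive"
    elif negative_total > positive_total:
        label = "negative"
    else:
        label = "neutral"         # tie handling is an assumption, not specified in the text
    return label, positive_total, negative_total


# Tokens are assumed to be already tagged with their part of speech.
review = [("The", "DET"), ("food", "NOUN"), ("was", "VERB"), ("good", "ADJ"),
          ("but", "CONJ"), ("the", "DET"), ("service", "NOUN"), ("was", "VERB"),
          ("slow", "ADJ")]
print(classify_sentence(review))
```

For the sample review the positive total (0.75) exceeds the negative total (0.25), so the sentence is labeled positive.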


4.3 Summary

The various approaches to implementing Sentiment Analysis have been discussed in detail in
this chapter. There are two approaches in total: one uses Machine Learning and the other uses
Natural Language Processing.


Chapter 5
Applications
Word of mouth is the process of conveying information from person to person and plays a
major role in customer buying decisions. In commercial situations, word of mouth involves
consumers sharing attitudes, opinions, or reactions about businesses, products, or services
with other people. Word-of-mouth communication functions on the basis of social networking and
trust. People rely on family, friends, and others in their social network. Research also
indicates that people appear to trust seemingly disinterested opinions from people outside
their immediate social network, such as online reviews. This is where Sentiment Analysis
comes into play. The growing availability of opinion-rich resources like online review sites,
blogs and social networking sites has made this “decision-making process” easier for us. With
the explosion of Web 2.0 platforms, consumers have a soapbox of unprecedented reach and power
from which they can share opinions. Major companies have realized that these consumer voices
shape the voices of other consumers.[2]
Sentiment Analysis thus finds use in the consumer market for product reviews, in marketing
for learning consumer attitudes and trends, in social media for finding the general opinion about
recent hot topics, and in the movie industry for finding out whether a recently released movie is a hit.[2]

Applications can be classified into the following categories:
1. Review-Related Websites : Movie Reviews, Product Reviews etc.
2. As a Sub-Component Technology : Detecting antagonistic, heated language in mails,
spam detection, context sensitive information detection etc.
3. Businesses and Organizations :
• Brand analysis
• New product perception
• Product and Service benchmarking
• Market Intelligence
• Businesses spend a huge amount of money to find consumer sentiments and opinions
– Consultants, surveys, focus groups, etc.
4. Individuals : Interested in others’ opinions when
• Purchasing a product or using a service
• Finding opinions on political topics
5. Ads Placements : Placing ads in the user-generated content
• Place an ad when one praises a product.
• Place an ad from a competitor if one criticizes a product.

5.1 Summary

This chapter described the various applications of Sentiment Analysis.


Chapter 6
Advantages & Disadvantages
6.1 Advantages

1. A lower cost than traditional methods of getting customer insight.
2. A faster way of getting insight from customer data.
3. The ability to act on customer suggestions.
4. Identifies an organisation’s Strengths, Weaknesses, Opportunities & Threats (SWOT
Analysis).
5. As most of the data in a business (by the report's estimate, 80%) consists of words, a
sentiment engine is an essential tool for making sense of it all.
6. More accurate and insightful customer perceptions and feedback.

6.2 Summary

This chapter gives the advantages of Sentiment Analysis.


Chapter 7
Conclusion
Sentiment analysis is an interdisciplinary field that crosses natural language processing,
artificial intelligence, and text mining. We have seen that Sentiment Analysis can be used
for analyzing opinions in blogs, newspapers, articles, product reviews, social media websites,
and movie-review websites where a third person narrates his or her views. We also studied the
Natural Language Processing and Machine Learning approaches to Sentiment Analysis. We have
seen that it is easier to implement Sentiment Analysis via the SentiWordNet approach than via
the classifier approach. We have seen that sentiment analysis has many applications and that it
is an important field to study. Sentiment analysis attracts strong commercial interest because
companies want to know how their products are being perceived, and prospective consumers
want to know what existing users think.


Bibliography
[1] P. W., V. K. Singh, R. Piryani, and A. Uddin, “Sentiment analysis of movie reviews and blog
posts,” IEEE International Advance Computing Conference (IACC), vol. 3, 2013.
[2] A. A. G. and Mostafa Karamibekr, “Sentiment analysis of social issues,” International
Conference on Social Informatics, 2012.
[3] M. R. and Alaa Hamouda, “Reviews classification using SentiWordNet lexicon,” The Online
Journal on Computer Science and Information Technology (OJCSIT), vol. 2, August
2011.
