SENTIMENT ANALYSIS OF TWITTER
DATA
ANARGHA GANGADHARAN
anarghagangadharan@gmail.com
ANJU ANIL
anjuanil1217@gmail.com
MARY LIS JOSEPH
marylisjp@gmail.com
PARVATHY D
parvathydevaraj8@gmail.com
B.Tech Scholars
Department of Computer Science
College of Engineering Cherthala
Abstract—Microblogging has become a very popular
communication tool. Millions of people share their views
and opinions on various topics on microblogging sites,
which makes these sites a rich source of opinion. Among
them, Twitter is one of the most popular. Reading the
news online is now a daily practice for many people. In
this paper we examine sentiment analysis of Twitter
data, focusing on news channels and other news sites
that post about current news: the tweets about the news
posted each day are analysed to determine the overall
sentiment of that news. We present a system that gives
a score indicating whether a news item is positive or
negative. Each news item is tokenized, and its sentiment
is calculated using a naive Bayes classifier, which
classifies the data as positive, negative or neutral.
The main feature is that the sentiment calculation is
done on real-time data.
Key words: sentiment analysis, machine learning,
naive bayes classifier.
I. INTRODUCTION
Microblogging sites have become part of our
day-to-day life as a source of various kinds of
information. People increasingly rely on websites rather
than other media, because there they can post real-time
messages and their opinions on various topics. Among
these sites we chose Twitter as the platform for
sentiment analysis because of the facilities and
features it provides; for example, it is a medium
through which organisations can communicate directly
with their potential customers. The Twitter audience
ranges from regular users to celebrities, company
representatives, politicians, students and even
high-ranking government officials, including presidents.
It is therefore possible to collect text posts from
users of many categories.

Most work on sentiment analysis has been done on
subjective text types such as blogs, result prediction
and product reviews. Authors of such texts typically
express their individual opinions freely, and the
sentiment may be restricted to a single group of people
or even to a single person. The situation is different
for news articles. News can be good or bad, but it is
seldom neutral. Analysing news and calculating the
sentiment expressed by the Twitter audience can give a
meaningful sense of how the latest news affects
important entities. Another difference between reviews
and news is that reviews are usually about a relatively
concrete object, the target subject, whereas news
articles cover a larger subject domain, with more
complex event descriptions and a whole range of targets.

Our paper concentrates on an experimental evaluation of
a set of real-time news items posted on Twitter by
various news channels and newspapers, and thereby
evaluates the overall impact of the news on people. We
take a news article and obtain the tweets about that
news; a tweet may be a link, an opinion or even a query.
We classify the news as positive, negative or neutral
and consider only positive and negative news for the
sentiment calculation.

This paper is structured as follows. The first module
covers collecting the data. The second module is text
pre-processing. The third module deals with term
frequencies. The fourth module discusses the rugby data
set and term co-occurrences, and the fifth module deals
with data visualisation basics.
II. LITERATURE SURVEY
Social media makes up an important share of
the web. Users have become participants in and
co-creators of web content, and they now contribute a
major part of social media, ranging from articles and
news to reviews. This creates a large volume of
unstructured text on the web. Among social media,
Twitter plays an important role in connecting people
all around the world. The task here is to analyse the
sentiment of such data, which is a pertinent research
topic at present.

In previous studies, Namrata Godbole, Manjunath
Srinivasaiah and Steven Skiena performed sentiment
analysis on news articles and blogs. Kiran Shriniwas
Dodd, Dr. Mrs. Y. V. Haribhakta and Dr. Parag Kulkarni
also performed sentiment analysis on online news media.
However, not much research in opinion mining considers
blogs, and even less addresses microblogging. Turney
(2002) and Pang and Lee (2004) carried out sentiment
analysis as document-level classification, whereas Hu
and Liu (2004) and Kim and Hovy (2004) analysed data at
the sentence level. Bermingham and Smeaton (2010)
analysed data but did not break it into tokens and
handled only unigrams. Go et al. (2009) classified data
after tokenization but also did not handle n-grams. In
"Sentiment Analysis of Twitter Data", Apoorv Agarwal et
al. performed sentiment analysis on Twitter data. They
included POS-specific prior polarity features and
demonstrated two kinds of models: tree kernel models
and feature-based models. In another paper, Alexander
Pak and Patrick Paroubek used TreeTagger for POS
tagging and presented a method for automatically
collecting data to train a sentiment classifier; the
authors used syntactic structures to describe emotions
or state facts. The work of James Spencer and Gulden
Uchyigit on sentiment analysis of Twitter data deals
only with common NLP processes for finding the
sentiment or meaning of a given phrase or text, and it
achieved an accuracy of only 50%.

In another paper on sentiment analysis of news, by
Alexandra Balahur, the sentiment of a particular news
item is calculated, but no method is included for
evaluating the impact of negation and valence shifting.
In all the papers we considered as references for our
work, sentiment analysis on Twitter has been done only
on structured data such as product reviews, election
prediction and blogs; no past work has addressed the
news posted daily on Twitter. Among the works we
surveyed, none dealt with real-time data for sentiment
calculation, and none followed a specific algorithm for
calculating sentiment.
III. DATA DESCRIPTION
Twitter is one of the most popular social
networking sites; it allows users to post real-time
messages called tweets. Tweets are short, limited to
140 characters. Because of this, users resort to
wordplay, misspellings and emoticons to express their
ideas. The following jargon is associated with tweets.
Hashtags: A word or phrase prefixed with a hash symbol
(#) that identifies the topic of the tweet.
Emoticons: A representation of a facial expression that
conveys the user's feelings towards a particular topic.
Targets: A target is written with the @ symbol and
identifies a particular user.
We collected real-time messages from Twitter. There
were no restrictions on the collection of data; the
collection consists of all the tweets received. After
gathering them we arranged them into two types,
positive and negative.
IV. COLLECTING DATA
The first step in collecting data is
registering our application. To do this we log in to
our Twitter account and register a name and description
for the application. After these details are entered, a
consumer key and a consumer secret are issued; these
should be kept private. From the configuration page we
obtain an access token and an access token secret. The
application access permitted in this way is read-only.
Twitter provides an API for interacting with its
services. We use Tweepy, a Python library, to stream
data from Twitter. Tweepy provides a convenient Cursor
interface to iterate through different types of
objects.
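The registration steps above can be sketched in Python as follows. This is a minimal sketch, not the code used in this work: the environment variable names and both function names are hypothetical, and it assumes a Tweepy version that still exposes OAuthHandler, API.user_timeline and Cursor, plus an account whose keys are allowed to read timelines.

```python
import os

# Hypothetical environment variable names for the four values issued
# when the application is registered.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=os.environ):
    """Read the four OAuth values from a mapping, keeping them out of the code."""
    missing = [name for name in CREDENTIAL_NAMES if name not in env]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_NAMES)


def fetch_timeline_texts(screen_name, limit=10, env=os.environ):
    """Return the text of up to `limit` recent tweets from one account."""
    import tweepy  # imported lazily so load_credentials works without it

    consumer_key, consumer_secret, access_token, access_secret = load_credentials(env)
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    # Cursor hides the paging of the timeline endpoint.
    cursor = tweepy.Cursor(api.user_timeline, screen_name=screen_name)
    return [status.text for status in cursor.items(limit)]
```

Reading the keys from the environment is one way to keep the consumer secret and access token secret private, as recommended above.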
V. TEXT ANALYSIS
Text analysis is used to extract meaningful
patterns from unstructured text. Here we use components
and concepts from text analysis to analyse the
sentiment in tweets. The process consists of multiple
steps. The first step is breaking text into words, a
process known as tokenization. Its purpose is to split
the text of a tweet, which is streamed in real time,
into several smaller units called tokens. Tokens can be
either words or phrases, and they are the primary
building blocks for our sentiment analysis.
Tokenization is crucial for Twitter data in particular,
since the nature of the language used poses many
challenges. In the second phase we extract meaningful
terms and their counts from our tweets; this is called
term frequency. This phase has three parts: counting
terms, stopword removal and term filtering. In counting
terms we observe which terms are most commonly used in
the data set. In every language some words are
particularly common and do not convey any special
meaning; these are called stopwords. After stopword
removal, counting and sorting, we obtain the most
frequently used words. Sometimes terms that occur
together make more sense than single terms; term
co-occurrence applies this concept. The visualization
phase plots a graph of the frequently used words.
Finally, we calculate the sentiment of real-time tweets
using the naive Bayes algorithm.
A. Tokenization
Table 1: tokenization of tweets
The tokenization is based on regular
expressions. Some specific types of tokens will not be
captured; this can be addressed by improving the
regular expressions, or by employing more advanced
techniques such as Named Entity Recognition. The key
component of the tokenizer is the regex_str variable,
which is a list of possible patterns. In particular, we
need patterns for emoticons, HTML tags, Twitter
@usernames (@-mentions), Twitter #hashtags, URLs,
numbers, and words with and without dashes and
apostrophes. Punctuation and whitespace may or may not
be included in the resulting list of tokens. All
contiguous strings of alphabetic characters form one
token, and likewise for numbers. Tokens are separated
by whitespace characters, such as a space or line
break, or by punctuation characters.
After tokenization, @-mentions, emoticons, URLs and
#hashtags are preserved as individual tokens using the
NLTK libraries. Table 1 shows what the tokenized tweets
look like: each token separated by white space is
preserved as an individual token.
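A regular-expression tokenizer of the kind described in this section might look like the sketch below. Only the name regex_str comes from the text; the individual patterns and the tokenize function are illustrative assumptions written with Python's standard re module, and they are simpler than a production tokenizer would need to be.

```python
import re

# Ordered list of patterns; earlier entries win when several could match.
# The patterns are illustrative assumptions, not the ones used in the paper.
regex_str = [
    r"(?:[:=;][oO\-]?[D\)\(\]/\\OpP])",    # emoticons such as :) ;-) :D
    r"<[^>]+>",                            # HTML tags
    r"(?:@\w+)",                           # @-mentions
    r"(?:#\w+)",                           # #hashtags
    r"https?://\S+",                       # URLs
    r"(?:\d+(?:[.,]\d+)*)",                # numbers
    r"(?:[a-zA-Z]+(?:['\-][a-zA-Z]+)+)",   # words with dashes or apostrophes
    r"(?:\w+)",                            # other words
    r"(?:\S)",                             # any other non-space character
]

# No capturing groups are used, so findall returns the whole match.
tokens_re = re.compile("|".join(regex_str))


def tokenize(text):
    """Split a tweet into tokens, keeping mentions, hashtags and URLs whole."""
    return tokens_re.findall(text)
```

Because the alternatives are tried in order, the specific patterns (emoticons, mentions, hashtags, URLs) are listed before the generic word pattern; otherwise "#python" would be split into "#" and "python".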
B. Term Frequencies
In term frequency we extract the frequently used
meaningful tokens and their counts. Term frequency can
be partitioned into three parts:
• Counting terms
• Stopword removal
• Term filter
By performing a simple word count we can find the most
commonly used terms in the data set. To keep track of
the frequencies while we process the tweets, we can use
collections.Counter(), which internally is a dictionary
with some useful methods like most_common().

Tweets | Tokenized tweets
"How I feel when dealing with Unicode strings in #python n #programming https://t.co/xqFmmmyJiJ" | 'How', 'I', 'feel', 'when', 'dealing', 'with', 'Unicode', 'strings', 'in', '#Python', 'n', '#programming', 'http://t.co/xqFmmmyJiJ'
A $5 microcontroller with wi-fi that runs python #python | 'A', 'microcontroller', 'with', 'wi-fi', 'that', 'runs', '#', 'python'
A # python coding dojo to end the day @ Downham Market Academy #rocks | 'A', '#', 'python', 'coding', 'dojo', 'to', 'end', 'the', 'day', '@', 'Downham', 'Market', 'Academy', '#rocks'

Terms | Count
The | 42
It | 25
Has | 06
On | 14
And | 23
After processing the tokens, we obtain word
frequencies like those in the table above. Sometimes
the most frequent words are not particularly
meaningful. This is due to the presence of articles,
conjunctions, adverbs and so on, which are commonly
called stop-words. Stop-word removal is an important
step that should be considered during pre-processing.
Anyone can build a custom list of stop-words or use an
available list; NLTK provides a simple list of English
stop-words. We also remove punctuation marks, along
with terms such as RT (used for re-tweets) and via,
which are not in the default stop-word list. After
counting and sorting, we obtain the most commonly used
terms.
Filtering single terms does not give us a deep
explanation of what the text is about.
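The counting and stop-word steps of this section can be sketched with collections.Counter as follows. The stop-word list here is a tiny hand-written assumption standing in for the NLTK English list, and count_terms is a hypothetical helper name.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller list
# such as the English one shipped with NLTK.
STOP_WORDS = {"the", "it", "has", "on", "and", "a", "to", "in", "is"}
# Punctuation plus the Twitter-specific terms mentioned in the text.
EXTRA_STOP = set(string.punctuation) | {"rt", "via"}


def count_terms(tokenized_tweets, stop=STOP_WORDS | EXTRA_STOP):
    """Count lower-cased tokens across all tweets, skipping stop-words."""
    counts = Counter()
    for tokens in tokenized_tweets:
        counts.update(t.lower() for t in tokens if t.lower() not in stop)
    return counts
```

Calling most_common(n) on the returned Counter gives the n most frequent terms, already sorted by count.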
C. Term co-occurrence
To place things in context, let us consider
sequences of two terms. Terms that occur together give
more insight into the meaning of the text; see the
table of bigrams given below. A pair of terms that
occur together is called a bigram. The bigrams()
function from NLTK takes a list of tokens and produces
a list of tuples of adjacent tokens. If we decide to
analyse longer n-grams, that is, sequences of n tokens,
it could make sense to keep the stop-words, in case we
want to capture phrases like those given in the table.
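To illustrate what pairing adjacent tokens produces, here is a dependency-free sketch. The ngrams helper is a hypothetical stand-in written in plain Python, not the NLTK function itself; with n=2 it yields the same adjacent pairs that a bigram function would.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigrams(tokens):
    """Adjacent pairs of tokens, the n=2 case of ngrams."""
    return ngrams(tokens, 2)
```

Keeping stop-words matters here: with "not", "to" and "be" all removed as stop-words, the phrase "not to be" would leave no n-grams at all.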
Terms that occur together give us better information
about the meaning of a term, supporting applications
such as word disambiguation or semantic similarity. We
build a co-occurrence matrix that contains the number
of times the term x has been seen in the same tweet as
the term y. For each term, we then extract the most
frequent co-occurrent terms, creating the list of
tuples that we collect.
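One way to build such a co-occurrence matrix is sketched below, using nested dictionaries from the standard library. The function names are hypothetical, and the choice to count each pair at most once per tweet and to store it only under the alphabetically smaller term is an assumption made to keep the matrix free of duplicates.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(tokenized_tweets):
    """Map term x -> term y -> number of tweets containing both (x < y)."""
    com = defaultdict(lambda: defaultdict(int))
    for tokens in tokenized_tweets:
        # Sorting the distinct terms means each pair is stored once,
        # under the alphabetically smaller term.
        terms = sorted(set(tokens))
        for x, y in combinations(terms, 2):
            com[x][y] += 1
    return com


def most_common_pairs(com, n=5):
    """Return the n most frequent ((x, y), count) tuples."""
    pairs = [((x, y), c) for x, row in com.items() for y, c in row.items()]
    # Highest count first; ties broken alphabetically for a stable order.
    pairs.sort(key=lambda item: (-item[1], item[0]))
    return pairs[:n]
```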
D. Visualisation
A good pictorial representation of our data
can help us make sense of it and highlight interesting
insights. While there are several options for creating
plots in Python, using libraries like matplotlib or
ggplot, Vincent bridges the gap between a Python
back-end and a front-end that supports D3.js
visualisation, allowing us to benefit from both sides.
Using the list of most frequent terms (without
hashtags) from our rugby data set, we want to plot
their frequencies; Vincent lets us plot many different
types of charts.
E. Naive Bayes Classifier Algorithm
The final step is to calculate the sentiment
of the real-time tweets using the naive Bayes
algorithm. We used naive Bayes (NB) classification
because it is a simple and natural method that combines
efficiency with reasonable accuracy. An important
feature for this algorithm is that the extracted text
can be tokenised easily, although it is evident that
the words of a text cannot truly be considered
independent of one another. NB is a classification
technique based on Bayes' theorem with an assumption of
independence among predictors. In simple terms, a naive
Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the
presence of any other feature.
A naive Bayes model is easy to build and particularly
useful for very large data sets. Along with simplicity,
naive Bayes is known to sometimes outperform even
highly sophisticated classification methods.

Bigrams
To be
Not to be
Miss you
I know
Look better

Here we use two types of data set: training data and
test data. Naive Bayes is a supervised learning
algorithm; supervised learning is the machine learning
task of inferring a function from labelled training
data. The training data consist of a set of training
examples. In supervised learning, each example is a
pair consisting of an input object and a desired output
value. The training data is historical data.
Two different naive Bayes classifiers were built,
according to two different strategies; here we use the
second classifier. It was trained on a simplified
training corpus and makes use of a polarity lexicon.
The corpus was simplified in that only positive and
negative tweets were considered; neutral tweets were
not taken into account. As a result, a basic binary
(Boolean) classifier, which identifies only positive
and negative tweets, was trained. To detect tweets
without polarity (neutral), the following basic rule is
used: if the tweet contains at least one word that is
also found in the polarity lexicon, then the tweet has
some degree of polarity. Otherwise, the tweet has no
polarity at all and is classified as neutral. The
binary classifier is well suited to determining the
basic polarity between positive and negative, reaching
a precision of more than 80% on a corpus with just
these two categories.
Bayes' theorem provides a way of calculating the
posterior probability P(c|x) from P(c), P(x) and
P(x|c):

\[ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \]

Above,
• P(c|x) is the posterior probability of class (c,
target) given predictor (x, attributes).
• P(c) is the prior probability of class.
• P(x|c) is the likelihood which is the probability of
predictor given class.
• P(x) is the prior probability of predictor.
With this approach we are able to reach almost 73%
accuracy. This is reasonably close to human accuracy,
as people apparently agree on sentiment only around 80%
of the time.
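The binary classifier and the lexicon rule for neutral tweets can be illustrated with a small multinomial naive Bayes written from scratch. This is a sketch under stated assumptions rather than the classifier built for this work: the class and function names are hypothetical, add-one (Laplace) smoothing is an added assumption, and P(x) is dropped because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial naive Bayes over token lists."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant (assumed)
        self.class_counts = Counter()           # number of examples per class
        self.word_counts = defaultdict(Counter) # token counts per class
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (tokens, label) pairs."""
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_scores(self, tokens):
        """Return log P(c) + sum of log P(token|c) for each class c."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    count = self.word_counts[label][t]
                    score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


def classify_with_neutral(model, tokens, lexicon):
    """Apply the lexicon rule first: no polarity word means neutral."""
    if not any(t in lexicon for t in tokens):
        return "neutral"
    return model.classify(tokens)
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow when a tweet has many tokens.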
VI. CONCLUSION
We have presented results for sentiment
analysis on Twitter based on daily news. We used SVM
and naive Bayes classifiers to find the sentiment of
people towards current news. We dealt with two possible
kinds of sentiment, positive and negative. We also
dealt with unigrams, bigrams and general n-grams, and
considered hyphenated words. We also handled tweets
that come in the form of queries or links. As future
work we look forward to developing an application that
carries out our textual analysis on voice data, and to
extending the textual analysis to specify the overall
impact of news on people as either positive or
negative, along with the root cause.
VII. REFERENCES
[1] Namrata Godbole, Manjunath Srinivasaiah, Steven
Skiena. "Large Scale Sentiment Analysis for News and
Blogs."
[2] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow,
Rebecca Passonneau. "Sentiment Analysis of Twitter
Data."
[3] Apoorv Agarwal, Fadi Biadsy, Kathleen Mckeown. 2009.
"Contextual phrase-level polarity analysis using
lexical affect scoring and syntactic n-grams."
Proceedings of the 12th Conference of the European
Chapter of the ACL.
[4] James Spencer, Gulden Uchyigit. "Sentimentor:
Sentiment Analysis of Twitter Data."
[5] Pang, B., and Lee, L. "Opinion mining and sentiment
analysis." Foundations and Trends in Information
Retrieval, Volume 2, Issue 1-2, 1–94 (2008).
[6] Pak, A., and Paroubek, P. 2010. "Twitter as a
corpus for sentiment analysis and opinion mining."
[7] Pang, B., and Lee, L. 2008. "Opinion mining and
sentiment analysis." Foundations and Trends in
Information Retrieval.

More Related Content

What's hot

Sentiment Analysis
Sentiment Analysis Sentiment Analysis
Sentiment Analysis prnk08
 
Sentiment analysis in twitter using python
Sentiment analysis in twitter using pythonSentiment analysis in twitter using python
Sentiment analysis in twitter using pythonCloudTechnologies
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project reportBharat Khanna
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisSunil Kandari
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataHari Prasad
 
Approaches to Sentiment Analysis
Approaches to Sentiment AnalysisApproaches to Sentiment Analysis
Approaches to Sentiment AnalysisNihar Suryawanshi
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSmritiAgarwal26
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
 
sentiment analysis text extraction from social media
sentiment  analysis text extraction from social media sentiment  analysis text extraction from social media
sentiment analysis text extraction from social media Ravindra Chaudhary
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysisAntaraBhattacharya12
 
Sentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataSentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataIswarya M
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesKarol Chlasta
 
IRE2014-Sentiment Analysis
IRE2014-Sentiment AnalysisIRE2014-Sentiment Analysis
IRE2014-Sentiment AnalysisGangasagar Patil
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter dataBhagyashree Deokar
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 

What's hot (20)

Sentiment Analysis
Sentiment Analysis Sentiment Analysis
Sentiment Analysis
 
Sentiment analysis in twitter using python
Sentiment analysis in twitter using pythonSentiment analysis in twitter using python
Sentiment analysis in twitter using python
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project report
 
Twitter sentiment analysis ppt
Twitter sentiment analysis pptTwitter sentiment analysis ppt
Twitter sentiment analysis ppt
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter Data
 
Approaches to Sentiment Analysis
Approaches to Sentiment AnalysisApproaches to Sentiment Analysis
Approaches to Sentiment Analysis
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in Twitter
 
sentiment analysis text extraction from social media
sentiment  analysis text extraction from social media sentiment  analysis text extraction from social media
sentiment analysis text extraction from social media
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysis
 
Sentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataSentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big Data
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use cases
 
IRE2014-Sentiment Analysis
IRE2014-Sentiment AnalysisIRE2014-Sentiment Analysis
IRE2014-Sentiment Analysis
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter data
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Sentimental Analysis of twitter data .
Sentimental Analysis of twitter data .Sentimental Analysis of twitter data .
Sentimental Analysis of twitter data .
 

Similar to SENTIMENT ANALYSIS OF TWITTER DATA

Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis reportSavio Aberneithie
 
Sentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A ReviewSentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A Reviewiosrjce
 
Sentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logicSentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logicijcseit
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...ijcseit
 
Sentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logicSentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logicVinay Sawant
 
SENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGICSENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGICijcseit
 
Sentiment analysis using machine learning and deep Learning
Sentiment analysis using machine learning and deep LearningSentiment analysis using machine learning and deep Learning
Sentiment analysis using machine learning and deep LearningVenkat Projects
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsRESHAN FARAZ
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique IJERA Editor
 
IRJET- Real Time Sentiment Analysis of Political Twitter Data using Machi...
IRJET-  	  Real Time Sentiment Analysis of Political Twitter Data using Machi...IRJET-  	  Real Time Sentiment Analysis of Political Twitter Data using Machi...
IRJET- Real Time Sentiment Analysis of Political Twitter Data using Machi...IRJET Journal
 
Text mining on Twitter information based on R platform
Text mining on Twitter information based on R platformText mining on Twitter information based on R platform
Text mining on Twitter information based on R platformFayan TAO
 
ONLINE TOXIC COMMENTS.pptx
ONLINE TOXIC COMMENTS.pptxONLINE TOXIC COMMENTS.pptx
ONLINE TOXIC COMMENTS.pptxyegnajayasimha21
 
Sentiment analysis using machine learning
Sentiment analysis using machine learningSentiment analysis using machine learning
Sentiment analysis using machine learningVenkat Projects
 

Similar to SENTIMENT ANALYSIS OF TWITTER DATA (20)

Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis report
 
Sentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A ReviewSentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A Review
 
W01761157162
W01761157162W01761157162
W01761157162
 
vishwas
vishwasvishwas
vishwas
 
Sentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logicSentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logic
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Sentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logicSentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logic
 
SENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGICSENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGIC
 
Sentiment analysis using machine learning and deep Learning
Sentiment analysis using machine learning and deep LearningSentiment analysis using machine learning and deep Learning
Sentiment analysis using machine learning and deep Learning
 
F017433947
F017433947F017433947
F017433947
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique
 
IRJET- Real Time Sentiment Analysis of Political Twitter Data using Machi...
IRJET-  	  Real Time Sentiment Analysis of Political Twitter Data using Machi...IRJET-  	  Real Time Sentiment Analysis of Political Twitter Data using Machi...
IRJET- Real Time Sentiment Analysis of Political Twitter Data using Machi...
 
E017433538
E017433538E017433538
E017433538
 
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
 
Monitoring opinion on esop through social media and clustering its polarity
Monitoring opinion on esop through social media and clustering its polarityMonitoring opinion on esop through social media and clustering its polarity
Monitoring opinion on esop through social media and clustering its polarity
 
Text mining on Twitter information based on R platform
Text mining on Twitter information based on R platformText mining on Twitter information based on R platform
Text mining on Twitter information based on R platform
 
Sub1557
Sub1557Sub1557
Sub1557
 
ONLINE TOXIC COMMENTS.pptx
ONLINE TOXIC COMMENTS.pptxONLINE TOXIC COMMENTS.pptx
ONLINE TOXIC COMMENTS.pptx
 
Sentiment analysis using machine learning
Sentiment analysis using machine learningSentiment analysis using machine learning
Sentiment analysis using machine learning
 

SENTIMENT ANALYSIS OF TWITTER DATA

  • 1. SENTIMENT ANALYSIS OF TWITTER DATA ANARGHA GANGADHARAN anarghagangadharan@gmail.com ANJU ANIL anjuanil1217@gmail.com MARY LIS JOSEPH marylisjp@gmail.com PARVATHY D parvathydevaraj8@gmail.com B.Tech Scholars Department of Computer Science College of Engineering Cherthala Abstract—Micro blogging has now become a very popular communication tool. Millions of people share their views, opinions on various topics in these sites. Therefore these sites have become a rich source of opinion and views of different people among many micro blogging sites twitter is one of the popular sites. Today it is a daily practice for many people to read the news online and therefore In this paper we examined the sentiment analysis of twitter data and we focused on news channels and other news sites which post about current news and the tweets of those news posted daily is being analysed and the overall sentiment of that news is being analysed. Here we have presented a system which gives a score that indicates whether the news is positive or negative. Each news is being considered and is being tokenized and sentiment is being calculated using naive bayes classifier which classify the data into positive, negative or neutral and the main feature is that the sentiment calculation is being done on real time data. Key words: sentiment analysis, machine learning, naive bayes classifier. I. INTRODUCTION Various microblogging sites have become a part of our day today life as a source for varIous kinds of information. This is because people rely mostly on websites rather than any other media. This is because people can post real time messages and their opinions on various topics. Among various sites we have chosen twitter as a platform for performing sentiment analysis because of various facilities and features that twitter provides us such as it is the only web site media through which each can communicate with their potential customers. 
Twitter's audience ranges from regular users to celebrities, company representatives, politicians, students and even high-ranking government officials, including presidents. It is therefore possible to collect text posts from users in many categories. Most work on sentiment analysis has been done on subjective text types such as blogs, result prediction and product reviews. Authors of such texts typically express their individual opinions freely, and the sentiment may be restricted to a single group of people or even a single person. The situation is different for news articles. News can be good or bad, but it is seldom neutral. Analysing the news, and thereby calculating the sentiment expressed by the Twitter audience, can provide a meaningful sense of how the latest news affects important entities. Another difference is that reviews are frequently about a relatively concrete object, the target subject, whereas news articles cover a larger subject domain, with more complex event descriptions and a whole range of targets.

Our paper concentrates on an experimental evaluation of a set of real-time news items posted on Twitter by various news channels and newspapers, and thereby evaluates the overall impact of the news on people. We look at each news article and obtain the tweets based on that news; a tweet may be a link, an opinion or even a query. We classify the news as positive, negative or neutral and consider only positive and negative news for sentiment calculation.

This paper is structured as follows. The first module covers collecting the data. The second module is text pre-processing. The third module deals with term frequencies. The fourth module discusses the rugby data set and term co-occurrences, and the fifth module covers data visualisation basics.

II. LITERATURE SURVEY

Social media accounts for an important share of the web. Users have become co-creators of content on the web, contributing a major part of social media: articles, news, reviews and so on. This has led to a large body of unstructured text on the web. Among all social media, Twitter plays an important role in connecting people all around the world. The task here is to analyse the sentiment of such data, which is a pertinent research topic at present.

In previous studies, Namrata Godbole, Manjunath Srinivasaiah and Steven Skiena performed sentiment analysis on general news, covering news articles and blogs. Kiran Shriniwas Dodd, Dr. Mrs. Y. V. Haribhakta and Dr. Parag Kulkarni also performed sentiment analysis on online news media. However, few studies in opinion mining consider blogs, and even fewer address microblogging. Turney (2002) and Pang and Lee (2004) carried out sentiment analysis as document-level classification, whereas Hu and Liu (2004) and Kim and Hovy (2004) analysed data at the sentence level. Bermingham and Smeaton (2010) analysed data but did not break it into tokens and handled only unigrams. Go et al. (2009) succeeded in classifying tokenized data but did not handle n-grams. In "Sentiment Analysis of Twitter Data" by Apoorv Agarwal et al., sentiment analysis was performed on Twitter data using POS-specific prior polarity features. They dealt mainly with two kinds of models, tree kernel and feature-based models, and demonstrated both.
In another paper, Alexander Pak and Patrick Paroubek used TreeTagger for POS tagging and presented a method for automatically collecting data to train a sentiment classifier; the authors used syntactic structures to describe emotions or state facts. The work of James Spencer and Gulden Uchyigit on sentiment analysis of Twitter data dealt only with common NLP processes for finding the sentiment or meaning of a given phrase or text, and achieved an accuracy of only 50%. In another paper on sentiment analysis of news, by Alexandra Balahur, the news is analysed and the sentiment of a particular news item is calculated, but no method is included for evaluating the impact of negation and valence shifters.

In all the papers we considered as references for our work, sentiment analysis on Twitter was done only on structured data such as product reviews, election prediction and blogs; no past work has addressed the news posted daily on Twitter. None of the past work dealt with real-time data for sentiment calculation, and none followed a specific algorithm for calculating sentiment.

III. DATA DESCRIPTION

Twitter is one of the most popular social networking sites, on which users post real-time messages called tweets. Tweets are short, limited to 140 characters. Because of this, users employ wordplay, misspellings and emoticons to express their ideas. The following jargon is associated with tweets.

Hashtags: a word or phrase prefixed by a hash symbol (#) to identify the topic.
Emoticons: a representation of a facial expression that conveys the user's feelings towards a particular topic.
Targets: a target is marked by the @ symbol to identify a particular user.

We collected real-time messages from Twitter.
There were no restrictions on the collection of data; the collection consists of all the tweets received. After gathering them we arranged them into two types, positive and negative.

IV. COLLECTING DATA

The first step in collecting data is registering our application. For this we log in to our Twitter account and register a name and description for the application. A consumer key and a consumer secret are then obtained, and these should be kept private. From the configuration page we obtain an access token and an access token secret; the application access thus permitted is read-only. Twitter provides an API to interact with its services. We also use Tweepy, a Python library, to stream data from Twitter. Tweepy provides a convenient Cursor interface to iterate through different types of objects.

V. TEXT ANALYSIS

Text analysis is used to extract meaningful patterns from unstructured text. Here we use components and concepts from text analysis to analyse the sentiment in tweets. The process consists of multiple steps.

The first step is breaking text into words, a process known as tokenization. The purpose of tokenization is to split a text or tweet, streamed in real time, into several smaller units called tokens. Tokens can be either words or phrases. These tokens are the primary building blocks for our sentiment analysis. Tokenization is crucial for Twitter data in particular, since the nature of the language used poses many challenges.

In the second phase we extract meaningful terms and their counts from our tweets, called term frequency. This phase has three parts: counting terms, stop-word removal and term filtering. In counting terms we observe which terms are most commonly used in the data set. In every language some words are particularly common and convey no special meaning; these are called stop-words. After stop-word removal, counting and sorting, we obtain the most frequently used words.

Sometimes terms that occur together make more sense than single terms; term co-occurrence applies this concept. The visualisation phase presents a graph of the frequently used words. Finally we calculate the sentiment of real-time tweets using the naive Bayes algorithm.

A. Tokenization

Table 1: tokenization of tweets

The tokenization is based on regular expressions. Some specific types of tokens will not be captured. This problem can be solved by improving the regular expressions, or by employing more advanced techniques such as Named Entity Recognition.
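The first phases described in this section (regular-expression tokenization, stop-word removal, term counting and pairs of adjacent terms) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the paper's implementation: the patterns in TOKEN_PATTERNS, the small STOPWORDS set, the helper names and the two sample tweets are all assumptions made for illustration, whereas the paper relies on its own pattern list and on NLTK's stop-word list.

```python
import re
import string
from collections import Counter

# Illustrative patterns (assumed, not the paper's own list), tried from
# most to least specific so that mentions, hashtags and URLs stay whole.
TOKEN_PATTERNS = [
    r"[:;=][-o]?[)(DPp]",            # a few simple emoticons, e.g. :) ;-) :D
    r"@\w+",                         # @-mentions
    r"#\w+",                         # hashtags
    r"https?://\S+",                 # URLs
    r"\d+(?:[.,]\d+)*",              # numbers
    r"[A-Za-z]+(?:['-][A-Za-z]+)*",  # words, allowing inner dashes and apostrophes
    r"\S",                           # any other single non-space character
]
TOKEN_RE = re.compile("|".join(TOKEN_PATTERNS))

# Tiny hand-made stop-word list for the example, plus the Twitter terms RT and via.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "on", "the", "to",
             "with", "rt", "via"}
PUNCTUATION = set(string.punctuation)


def tokenize(text):
    """Split a tweet into tokens, keeping mentions, hashtags and URLs whole."""
    return TOKEN_RE.findall(text)


def count_terms(tweets):
    """Count lower-cased tokens, skipping stop-words and bare punctuation."""
    counts = Counter()
    for tweet in tweets:
        tokens = [token.lower() for token in tokenize(tweet)]
        counts.update(t for t in tokens
                      if t not in STOPWORDS and t not in PUNCTUATION)
    return counts


def adjacent_pairs(tokens):
    """Return the list of (token, next_token) pairs, i.e. the bigrams."""
    return list(zip(tokens, tokens[1:]))


# Made-up sample tweets, used only to exercise the functions.
tweets = [
    "RT @some_user: Python is fun #python",
    "Learning python with #python tips via @other_user",
]
counts = count_terms(tweets)
print(tokenize(tweets[0]))
print(counts.most_common(3))
print(adjacent_pairs(["not", "to", "be"]))
```

Ordering the patterns from specific to general is the main design choice here: the catch-all single-character pattern comes last, so it only picks up what the earlier patterns leave behind, such as stray punctuation.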
The key component of the tokenizer is the regex_str variable, which is a list of possible patterns. In particular, we need patterns for emoticons, HTML tags, Twitter @usernames (@-mentions), Twitter #hashtags, URLs, numbers, and words with and without dashes and apostrophes. Punctuation and whitespace may or may not be included in the resulting list of tokens. All contiguous strings of alphabetic characters are part of one token; likewise with numbers. Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters. After tokenization, @-mentions, emoticons, URLs and #hashtags are preserved as individual tokens using the NLTK libraries. Let us see the example given below:

Tweets and tokenized tweets:

1. Tweet: "How I feel when dealing with Unicode strings in #python n #programming https://t.co/xqFmmmyJiJ"
   Tokenized tweet: 'How', 'I', 'feel', 'when', 'dealing', 'with', 'Unicode', 'strings', 'in', '#Python', 'n', '#programming', 'http://t.co/xqFmmmyJiJ'
2. Tweet: A $5 microcontroller with wi-fi that runs python #python
   Tokenized tweet: 'A', 'microcontroller', 'with', 'wi-fi', 'that', 'runs', '#', 'python'
3. Tweet: A # python coding dojo to end the day @ Downham Market Academy #rocks
   Tokenized tweet: 'A', '#', 'python', 'coding', 'dojo', 'to', 'end', 'the', 'day', '@', 'Downham', 'Market', 'Academy', '#rocks'

The table shows what the tokenized tweets look like: each token separated by white space is preserved as an individual token.

B. Term Frequencies

In term frequency we extract the frequently used meaningful tokens and their counts. On this basis, term frequency can be partitioned into three parts:

• Counting terms
• Stopword removal
• Term filter

By performing a simple word count we can find the most commonly used terms in the data set. To keep track of the frequencies while processing the tweets, we can use collections.Counter(), which internally is a dictionary with some useful methods such as most_common().

Terms   Count
The     42
It      25
Has     06
On      14
And     23

After processing the tokens we get the word frequencies shown in the table above. Sometimes the most frequent words are not exactly meaningful. This is due to the presence of articles, conjunctions, adverbs and so on, which are commonly called stop-words. Stop-word removal is an important step that should be considered during the pre-processing stages. Anyone can build a custom list of stop-words or use available lists; NLTK provides a simple list of English stop-words. Punctuation marks, along with terms such as RT (used for re-tweets) and via, are not in the default stop-word list and are removed as well. After counting and sorting, we get the most commonly used terms. The term filter alone does not give us a deep explanation of what the text is about.

C. Term co-occurrence

To place things in context, let us consider sequences of two terms. Terms that come together give more insight into the meaning of the text; look at the table given below. Terms that come together are called bigrams. The bigrams() function from NLTK takes a list of tokens and produces a list of tuples of adjacent tokens. If we decide to analyse longer n-grams, that is, sequences of n tokens, it could make sense to keep the stop-words, in case we want to capture phrases such as those given in the table.

Bigrams
To be
Not to be
Miss you
I know
Look better

Terms that come together give us better information about the meaning of a term, supporting applications such as word disambiguation or semantic similarity. We build a co-occurrence matrix that contains the number of times the term x has been seen in the same tweet as the term y. For each term, we then extract the most frequent co-occurrent terms, creating a list of tuples.

D. Visualisation

A good pictorial representation of our data can help us make sense of it and highlight interesting insights. While there are several options for creating plots in Python using libraries such as matplotlib or ggplot, Vincent bridges the gap between a Python back-end and a front-end that supports D3.js visualisation, allowing us to benefit from both sides. Using the list of most frequent terms (without hashtags) from our rugby data set, we want to plot their frequencies; many different types of charts can be plotted with Vincent.

E. Naive Bayes Classifier Algorithm

Real-time sentiment analysis using the naive Bayes algorithm. The final step is to calculate the sentiment of the real-time tweets. We used Naive Bayes (NB) classification because it is a simple and natural method that combines efficiency with reasonable accuracy. An important feature of this algorithm is that the extracted text can be tokenised easily, although it is evident that the tokens, being words, cannot strictly be considered independent. It is a classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. A naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, naive Bayes can sometimes outperform more sophisticated classification methods.

Here we use two types of data set: test data and training data. Naive Bayes uses supervised learning, which is the machine learning task of inferring a function from labelled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. The training data is historical data.

Two different naive Bayes classifiers have been built according to two different strategies; here we use the second classifier. It was trained on a simplified training corpus and makes use of a polarity lexicon. The corpus was simplified in that only positive and negative tweets were considered; neutral tweets were not taken into account. As a result, a basic binary (or Boolean) classifier that identifies only positive and negative tweets was trained. To detect tweets without polarity (neutral), the following basic rule is used: if the tweet contains at least one word that is also found in the polarity lexicon, then the tweet has some degree of polarity. Otherwise, the tweet has no polarity at all and is classified as neutral. The binary classifier is well suited to distinguishing the basic polarity between positive and negative, reaching a precision of more than 80% on a corpus with just these two categories.

Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

\[ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \]

Above,
• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of class.
• P(x|c) is the likelihood, which is the probability of predictor given class.
• P(x) is the prior probability of predictor.

We are able to get almost 73% accuracy.
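The classification strategy described in this section, a binary positive/negative naive Bayes classifier combined with a lexicon rule for neutral tweets, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the NaiveBayes class, the classify helper, the add-one smoothing, the toy training tweets and the six-word lexicon are all hypothetical. Because P(x) is the same for every class, the sketch compares only log P(c) plus the sum of log P(word|c).

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, tokens):
        """Return log P(c) + sum of log P(word | c) for every class c."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # prior, log P(c)
            for token in tokens:
                if token in self.vocab:            # words never seen in training are skipped
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


def classify(tokens, model, lexicon):
    """Lexicon rule first: no polarity word means neutral, else ask the model."""
    if not any(token in lexicon for token in tokens):
        return "neutral"
    return model.predict(tokens)


# Toy training corpus and polarity lexicon, made up for this example.
train_docs = [
    ["good", "news", "great", "win"],
    ["happy", "great", "day"],
    ["bad", "news", "terrible", "loss"],
    ["sad", "terrible", "day"],
]
train_labels = ["positive", "positive", "negative", "negative"]
lexicon = {"good", "great", "happy", "bad", "terrible", "sad"}

model = NaiveBayes().fit(train_docs, train_labels)
print(classify(["great", "win", "today"], model, lexicon))
print(classify(["terrible", "loss"], model, lexicon))
print(classify(["match", "starts", "today"], model, lexicon))
```

Working with log-probabilities avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a word seen in only one class from driving the other class's probability to zero.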
An accuracy of almost 73% is somewhat near human accuracy, as people apparently agree on sentiment only around 80% of the time.

VI. CONCLUSION

We presented results for sentiment analysis on Twitter based on daily news. We used SVM and naive Bayes classifiers to find the sentiment of people based on current news. We dealt with the two possible kinds of sentiment, positive and negative. We also dealt with unigrams, bigrams and n-grams, and considered hyphenated words, as well as tweets that come in the form of queries or links. As future work we look forward to developing an application that carries out our textual analysis on voice data, and to extending the analysis to specify the overall impact of news on people as positive or negative, along with its root cause.

VII. REFERENCES

[1] Namrata Godbole, Manjunath Srinivasaiah, and Steven Skiena. “Large Scale Sentiment Analysis for News and Blogs.”
[2] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. “Sentiment Analysis of Twitter Data.”
[3] Apoorv Agarwal, Fadi Biadsy, and Kathleen Mckeown. 2009. “Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams.” Proceedings of the 12th Conference of the European Chapter of the ACL.
[4] James Spencer and Gulden Uchyigit. “Sentimentor: Sentiment Analysis of Twitter Data.”
[5] Bo Pang and L. Lee. 2008. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval, Volume 2, Issue 1-2, 1–94.
[6] A. Pak and P. Paroubek. 2010. “Twitter as a corpus for sentiment analysis and opinion mining.”