Sentiment Analysis of Twitter Data


Query a topic of interest and see the sentiment of the tweets gathered from twitter.com, shown as a pie chart for the day or as a line chart for the week.



  1. Sentiment Analysis of Twitter Data
     Presented by:
     RITESH KUMAR (1DS09IS069)
     SAMEER KUMAR SINHA (1DS09IS074)
     SUMIT KUMAR RAJ (1DS09IS082)
     Under the guidance of Mrs. Madhura M, Asst. Professor,
     Department of Information Science & Engineering,
     Dayananda Sagar College of Engineering, Bangalore
  2. TABLE OF CONTENTS
     • Introduction
     • Literature Survey
     • Motivation
     • Proposed System
     • Code Snippets
     • Applications
     • Results & Conclusion
     • References
  3. INTRODUCTION
     twitter.com is a popular microblogging website.
     Each tweet is limited to 140 characters.
     Tweets are frequently used to express a tweeter's emotion on a particular subject.
     Some firms poll Twitter to analyse sentiment on a particular topic.
     The challenge is to gather all such relevant data, then detect and summarize the overall sentiment on a topic.
  4. INTRODUCTION CONTINUED
     PROBLEM STATEMENT:
     • The problem in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level.
     • That is, deciding whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral.
  5. INTRODUCTION CONTINUED
     OBJECTIVES:
     ● To implement an algorithm for automatic classification of text as positive, negative or neutral.
     ● To use sentiment analysis to determine whether the attitude of the public towards the subject of interest is positive, negative or neutral.
     ● To represent the sentiment graphically in the form of a pie chart.
  6. LITERATURE SURVEY
     • Efthymios Kouloumpis, Theresa Wilson (Johns Hopkins University, USA) and Johanna Moore (School of Informatics, University of Edinburgh, Edinburgh, UK), in the paper "Twitter Sentiment Analysis: The Good the Bad and the OMG!" (July 2011), investigate the utility of linguistic features for detecting the sentiment of Twitter messages. They evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. They take a supervised approach to the problem, but leverage existing hashtags in the Twitter data to build training data.
     • Hassan Saif, Yulan He and Harith Alani (Knowledge Media Institute, The Open University, United Kingdom), in the paper "Semantic Sentiment Analysis of Twitter" (Nov 2012), introduce a novel approach of adding semantics as additional features to the training set for sentiment analysis. For each entity extracted from tweets (e.g. iPhone), they add its semantic concept (e.g. "Apple product") as an additional feature, and measure the correlation of the representative concept with negative/positive sentiment.
  7. LITERATURE SURVEY
     • Subhabrata Mukherjee, Akshat Malu, Balamurali A.R. and Pushpak Bhattacharyya (Dept. of Computer Science and Engineering, IIT Bombay; IITB-Monash Research Academy, IIT Bombay), in the paper "TwiSent: A Multistage System for Analyzing Sentiment in Twitter" (Feb 2013), present TwiSent, a sentiment analysis system for Twitter. Based on the topic searched, TwiSent collects tweets pertaining to it and categorizes them into the polarity classes positive, negative and objective. However, analyzing micro-blog posts has many inherent challenges compared to other text genres.
     • Isaac G. Councill, Ryan McDonald and Leonid Velikovich (Google, Inc., New York), in the paper "What's Great and What's Not: Learning to Classify the Scope of Negation for Improved Sentiment Analysis" (July 2010), present a negation detection system based on a conditional random field modelled using features from an English dependency parser. The scope of negation detection is limited to explicit rather than implied negations within single sentences.
  8. MOTIVATION
     • Social media data such as Twitter messages includes rich structured information about the individuals involved in the communication.
     • This information can lead to more accurate tools for extracting semantic information.
     • It provides a means for empirically studying properties of social interactions.
     • A freely available annotated corpus and pre-written classifier code in Python using NLTK can be used in NLP to promote research that will lead to a better understanding of how sentiment is conveyed in tweets and texts.
  9. PROPOSED SYSTEM
  10. Graphical Representation of the Sentiment
      The graphical representation is produced using the Google Charts API (chart image not reproduced in this transcript).
  11. IMPLEMENTED METHODS
      MACHINE LEARNING METHODS
      We have used a baseline method and built-in classifiers from NLTK: Naive Bayes and maximum entropy.
      1. Baseline
         The baseline approach is to use a list of positive and negative keywords. For this we use Twittratr's list of keywords, which is publicly available. This list consists of 444 positive words and 588 negative words. For each tweet, we count the number of negative keywords and positive keywords that appear. This classifier returns the polarity with the higher count. If there is a tie, then positive polarity (the majority class) is returned.
      2. Naive Bayes
         Naive Bayes is a simple model which works well on text categorization. We use a multinomial Naive Bayes model. Class c* is assigned to tweet d, where c* = argmax_c P_NB(c | d).
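The multinomial Naive Bayes rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the NLTK classifier the slides use: it scores each class by log P(c) plus the sum of log P(f | c) over the tokens of the tweet, which is the usual multinomial form. The add-one smoothing, the function names and the four-tweet training set are assumptions made only for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_tweets):
    """Count class and per-class word frequencies from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the class c* with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # log P(c)
        score = math.log(class_counts[label] / total_docs)
        # Denominator of the add-one smoothed estimate of P(f | c)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # A repeated token is added once per occurrence
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training set, for illustration only.
training = [
    ("i love this phone".split(), "positive"),
    ("what a great day".split(), "positive"),
    ("i hate this traffic".split(), "negative"),
    ("this is a terrible movie".split(), "negative"),
]
model = train_naive_bayes(training)
print(classify("love this great phone".split(), model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a word unseen in one class from forcing that class's probability to zero.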
  12. 3. Maximum Entropy
      ● Maximum entropy classifiers are commonly used as alternatives to naive Bayes classifiers because they do not assume statistical independence of the random variables (commonly known as features) that serve as predictors.
      ● However, learning in such a model is slower than for a naive Bayes classifier, and thus may not be appropriate given a very large number of classes to learn.
      ● Learning in a naive Bayes classifier is a simple matter of counting up the number of co-occurrences of features and classes, while in a maximum entropy classifier the weights, which are typically maximized using maximum a posteriori (MAP) estimation, must be learned using an iterative procedure.
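To make the contrast with counting concrete, here is a minimal sketch of the iterative procedure a maximum entropy classifier needs. It is not NLTK's implementation: it fits one weight per (class, word) pair by plain gradient ascent on the log-likelihood with an L2 penalty, which corresponds to MAP estimation under a Gaussian prior. The function names, the learning rate, the number of epochs and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def predict_proba(tokens, weights, labels):
    """Softmax over per-class sums of feature weights, giving P(c | d)."""
    scores = {c: sum(weights.get((c, t), 0.0) for t in tokens) for c in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_maxent(data, labels, epochs=300, rate=0.1, l2=0.01):
    """Fit weights by batch gradient ascent on the L2-penalised log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for tokens, gold in data:
            probs = predict_proba(tokens, weights, labels)
            for c in labels:
                # Observed feature count minus expected feature count
                delta = (1.0 if c == gold else 0.0) - probs[c]
                for t in tokens:
                    grad[(c, t)] += delta
        for key, g in grad.items():
            weights[key] += rate * (g - l2 * weights[key])
    return weights


def classify(tokens, weights, labels):
    """Return the most probable class for a tokenised tweet."""
    probs = predict_proba(tokens, weights, labels)
    return max(probs, key=probs.get)


# Tiny hypothetical training set, for illustration only.
labels = ["positive", "negative"]
data = [
    ("i love this phone".split(), "positive"),
    ("what a great day".split(), "positive"),
    ("i hate this traffic".split(), "negative"),
    ("this is a terrible movie".split(), "negative"),
]
weights = train_maxent(data, labels)
print(classify("love this great phone".split(), weights, labels))
```

Each epoch re-scores every training tweet, which is why training is slower than the single counting pass that Naive Bayes needs.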
  13. System Requirements, Libraries & Languages used:
      * Linux Operating System (Ubuntu preferred)
      * Python 3.0 or above
      * NLTK Package
      * WebPy Framework Package
      * Modern Web Browser
      * HTML, CSS, JavaScript
      * Twitter API, Google API
  14. Code Snippets
      Preprocessing the tweets:

          import re

          # start process_tweet
          def process_tweet(self, tweet):
              # Convert to lower case
              tweet = tweet.lower()
              # Convert www.* or https?://* to URL
              tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
              # Convert @username to AT_USER
              tweet = re.sub(r'@[^\s]+', 'AT_USER', tweet)
              # Collapse runs of white space into a single space
              tweet = re.sub(r'[\s]+', ' ', tweet)
              # Replace #word with word
              tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
              # Trim leading and trailing white space
              tweet = tweet.strip()
              # Remove " characters from the start and end of the string
              tweet = tweet.rstrip('"')
              tweet = tweet.lstrip('"')
              return tweet
          # end
  15. Classifying the tweets:

          # start processing each tweet
          # positive_words / negative_words are the keyword lists;
          # string_found is a helper method not shown on this slide
          for i in self.tweets:
              tw = self.tweets[i]
              count = 0
              res = {}
              for t in tw:
                  neg_words = [word for word in negative_words if self.string_found(word, t)]
                  pos_words = [word for word in positive_words if self.string_found(word, t)]
                  if len(pos_words) > len(neg_words):
                      label = 'positive'
                      self.pos_count[i] += 1
                  elif len(pos_words) < len(neg_words):
                      label = 'negative'
                      self.neg_count[i] += 1
                  else:
                      # Tie: positive if any keywords matched, otherwise neutral
                      if len(pos_words) > 0 and len(neg_words) > 0:
                          label = 'positive'
                          self.pos_count[i] += 1
                      else:
                          label = 'neutral'
                          self.neut_count[i] += 1
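The classification loop above relies on a `string_found` helper whose body does not appear on the slides. A plausible sketch, assuming it is meant to match a keyword only as a whole word, is given below together with a standalone `label_tweet` function (a hypothetical name) that applies the same tie-breaking rule to a single tweet.

```python
import re


def string_found(word, text):
    """Return True if `word` occurs in `text` as a whole word (assumed behaviour)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None


def label_tweet(tweet, positive_words, negative_words):
    """Label one tweet by counting matched positive and negative keywords."""
    pos = sum(1 for w in positive_words if string_found(w, tweet))
    neg = sum(1 for w in negative_words if string_found(w, tweet))
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Tie: positive if any keywords matched at all, otherwise neutral
    return "positive" if pos > 0 else "neutral"


# Hypothetical two-word keyword lists, for illustration only.
print(label_tweet("what a good day", ["good", "great"], ["bad", "awful"]))
```

Matching on word boundaries keeps a keyword such as "good" from firing inside "goodbye", which a plain substring test would not prevent.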
  16. Finalizing the Result and Output:
      * We make use of Google Chart Tools to show the sentiment in graphical form.
      * Google Chart Tools provide a convenient way to visualize data on a web page. From simple line charts to complex hierarchical tree maps, the chart gallery provides a large number of well-designed chart types.
      * We make use of the Pie Chart and the Line Chart.
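One way the server side could hand the counts to the charts is sketched below. It assumes the page-side JavaScript passes the resulting array to `google.visualization.arrayToDataTable`, which accepts a two-dimensional array whose first row holds the column labels. The function names and the sample counts are hypothetical, not taken from the project.

```python
import json


def pie_chart_rows(pos_count, neg_count, neut_count):
    """Rows for a pie chart of one day's sentiment counts."""
    return [
        ["Sentiment", "Tweets"],
        ["Positive", pos_count],
        ["Negative", neg_count],
        ["Neutral", neut_count],
    ]


def line_chart_rows(daily_counts):
    """Rows for a line chart; daily_counts holds (day, pos, neg, neut) tuples."""
    rows = [["Day", "Positive", "Negative", "Neutral"]]
    for day, pos, neg, neut in daily_counts:
        rows.append([day, pos, neg, neut])
    return rows


# Hypothetical counts, for illustration only.
pie_json = json.dumps(pie_chart_rows(12, 5, 3))
week_json = json.dumps(line_chart_rows([("Mon", 12, 5, 3), ("Tue", 9, 7, 4)]))
print(pie_json)
print(week_json)
```

Serialising to JSON keeps the Python side independent of the chart type: the same counts can feed the pie chart for a day or one row of the weekly line chart.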
  17. APPLICATIONS:
      • Applications to review-related websites: movie reviews, product reviews etc.
      • Applications as a sub-component technology: detecting antagonistic, heated language in mails, spam detection, context-sensitive information detection etc.
      • Applications in business and government intelligence: knowing consumer attitudes and trends.
      • Applications across different domains: knowing public opinion of political leaders, or their notions about rules and regulations in place etc.
  18. RESULTS:
      • Real-time sentiment analysis of social media user content has become increasingly critical for organizations to master in order to predict market trends, analyze consumer opinions, and remain competitive.
      Classifier Accuracy: (figure not reproduced in this transcript)
  19. CONCLUSION & FUTURE WORK:
      Conclusion:
      We conclude that the different NLTK classifiers make it easier to classify the tweets, and that the more we improve the training data set, the more accurate the results become.
      Future Work:
      We look forward to using a bigger dataset to improve the accuracy, and to handling emoticons and internationalization.
  20. REFERENCES:
      [1] Aditya Joshi, Balamurali A.R., Pushpak Bhattacharyya. A Fall-Back Strategy for Sentiment Analysis in a New Language: A Case Study for Hindi. ICON 2010, Kharagpur, India.
      [2] Alec Go, Richa Bhayani, Lei Huang. Twitter Sentiment Classification using Distant Supervision. Technical report, Stanford University, 2009.
      [3] http://help.sentiment140.com/for-students
      [4] http://www.gbsheli.com/2009/03/twitgraph-en.html
      [5] http://en.wikipedia.org
      [6] http://ravikiranj.net/drupal/201205/code/machine-learning/how-build-twitter-sentiment-analyzer