IRE2014-Sentiment Analysis

IRE2014-Sentiment Analysis in Twitter
IIITh

Published in: Education, Technology

  1. Sentiment Analysis in Twitter
     IRE 2014
     Siddharth Goyal Chetna Gagandeep Singh Gangasagar Patil
  2. ● Introduction
     ● Message Polarity Classification
     ● Contextual Polarity Disambiguation
     ● Results
  3. Introduction
     ● Opinion-rich resources such as online review sites, personal blogs, and microblogging websites like Twitter are growing in availability and popularity.
     ● A major challenge is building technology that detects and summarizes the overall sentiment expressed on such websites.
     ● The goal is to automatically extract sentiment from a given block of text or tweet.
     ● Marketers can use this to research public opinion of their company and products, or to analyze customer satisfaction.
     ● Organizations can also use this to gather critical feedback about problems in newly released products.
     ● To promote research into how sentiment is conveyed in tweets and texts, the SemEval (Semantic Evaluation) 2014 organizers ran a task (Task 9) on sentiment analysis over a Twitter dataset.
  4. Message Polarity Classification
     “Given a message, classify whether the message is of positive, negative, or neutral sentiment. For messages conveying both a positive and negative sentiment, whichever is the stronger sentiment should be chosen.”
     ● Two approaches:
       1. Naive Bayes classifier
          a. Pre-processing of tweets: lowercasing, @username, URLs, #hashtag, punctuation, additional spaces
          b. Feature vector creation: unigram model, trained and tested using the nltk library
       2. Support Vector Machine (SVM)
          a. Pre-processing of tweets: CMU tokenizer, POS tagging, URLs, @username, negations, lowercasing
          b. Feature vector creation: POS tags, word n-grams, emoticons, all-caps, lexicon scores, clusters, punctuation, elongated words
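The pre-processing and unigram steps listed for the Naive Bayes approach could look roughly like the sketch below. It is a minimal illustration using only the standard library, not the authors' code: the placeholder tokens `URL` and `AT_USER`, the regular expressions, and the `contains(word)` feature naming are all assumptions.

```python
import re

# All patterns and placeholder tokens below are illustrative assumptions.
URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
PUNCT_RE = re.compile(r"[^\w\s]")
SPACE_RE = re.compile(r"\s+")


def preprocess(tweet):
    """Normalize a raw tweet into a lowercase, whitespace-separated string."""
    text = tweet.lower()
    text = URL_RE.sub("URL", text)          # replace links with a placeholder
    text = MENTION_RE.sub("AT_USER", text)  # replace @username with a placeholder
    text = HASHTAG_RE.sub(r"\1", text)      # keep the hashtag word, drop the '#'
    text = PUNCT_RE.sub(" ", text)          # strip punctuation
    return SPACE_RE.sub(" ", text).strip()  # collapse additional spaces


def unigram_features(clean_text):
    """Build a unigram presence dictionary, one entry per distinct word."""
    return {"contains(%s)" % word: True for word in clean_text.split()}


if __name__ == "__main__":
    clean = preprocess("@Bob I LOVE this!!!   http://t.co/xyz #Happy")
    print(clean)
    print(unigram_features(clean))
```

A list of `(unigram_features(text), label)` pairs has the shape that `nltk.NaiveBayesClassifier.train` accepts as training data, which is presumably how a unigram model would be trained with nltk.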
  5. Contextual Polarity Disambiguation
     “Given a message containing a marked instance of a word or phrase, determine whether that instance is positive, negative or neutral in that context.”
     1. Lexicons used: NRC Hashtag Sentiment Lexicon and Sentiment140 Lexicon
     2. Pre-processing of tweets: CMU tokenizer, POS tagging, @username, URLs, negation, lowercasing
     3. Features used: POS tags, word n-grams, emoticons, all-caps, lexicon scores, punctuation, elongated words, linguistic features
     4. Semantic features: adjective, modifier, verb-modifier, subjective relationship, dependency, etc.
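Several of the features named above (all-caps, elongated words, emoticons, punctuation, lexicon scores with negation) can be computed from a token list. The sketch below is one plausible way to do so, not the authors' implementation: `TOY_LEXICON` is a hypothetical stand-in for the NRC Hashtag and Sentiment140 lexicons, and the rule that negation lasts until the next punctuation token is an assumption.

```python
import re

ELONGATED_RE = re.compile(r"(\w)\1{2,}")        # same character 3+ times in a row
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")  # a few common emoticons only
NEGATIONS = {"not", "no", "never", "cannot"}

# Hypothetical scores for illustration; the real lexicons are far larger.
TOY_LEXICON = {"good": 1.0, "love": 1.5, "bad": -1.0, "hate": -1.5}


def surface_features(tokens):
    """Count simple surface cues in an already tokenized tweet."""
    return {
        "all_caps": sum(
            1 for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()
        ),
        "elongated": sum(1 for t in tokens if ELONGATED_RE.search(t)),
        "emoticons": sum(1 for t in tokens if EMOTICON_RE.fullmatch(t)),
        "exclaim_or_question": sum(
            1 for t in tokens if re.fullmatch(r"[!?]+", t)
        ),
    }


def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Sum lexicon scores, flipping the sign of words in a negated context."""
    score = 0.0
    negated = False
    for tok in tokens:
        word = tok.lower()
        if word in NEGATIONS or word.endswith("n't"):
            negated = True   # negation starts here
            continue
        if re.fullmatch(r"[.,!?;:]+", tok):
            negated = False  # assumed: punctuation ends the negated context
            continue
        value = lexicon.get(word, 0.0)
        score += -value if negated else value
    return score


if __name__ == "__main__":
    tokens = ["I", "LOVE", "this", "sooooo", "much", "!!!", ":)"]
    print(surface_features(tokens))
    print(lexicon_score(["not", "good", ",", "love", "it"]))
```

For the contextual task, the same functions could be applied to the tokens of the marked phrase only, rather than to the whole message.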
  6. Results (Message Polarity Classification)

     Experiment with different features using SVM   Accuracy achieved (%)
     Only unigrams                                  63.8554
     Without sentiment scores                       64.0275
     Bigram with thresholding (d = 1)               64.3718
     All features + trigrams                        65.4045
     All features                                   66.6093
     All features without bigrams                   67.2978

     Experiment using SVM                  Recall    Precision   F-Score
     Bigrams with thresholding (d = 1)     61.6582   63.5262     62.2186
     Bigrams without thresholding          63.8728   66.2443     64.6489
     Without bigrams                       64.7117   66.3478     65.2356
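The slides do not say how recall, precision, and F-score were averaged over the three classes. As a reference point, the sketch below computes accuracy and macro-averaged scores (an unweighted mean over the positive, negative, and neutral classes). The macro averaging and the function names are assumptions, not the authors' evaluation script.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of messages whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_prf(gold, predicted, labels=("positive", "negative", "neutral")):
    """Return (precision, recall, F-score), each averaged over the labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p is a false positive
            fn[g] += 1  # gold label g was missed
    precisions, recalls, fscores = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        fscores.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(fscores) / n


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "neutral"]
    pred = ["positive", "negative", "negative", "neutral"]
    print(accuracy(gold, pred))
    print(macro_prf(gold, pred))
```

With macro averaging, the F-score is the mean of the per-class F-scores, so it is generally not the harmonic mean of the averaged precision and recall.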
  7. Results (Contextual Polarity Disambiguation)

     Experiment with different features using SVM                           Accuracy achieved (%)
     All features (with 1000 test data-points, 21673 train data-points)     85.9
     All features (with 10000 test data-points, 11673 train data-points)    87.71

     Experiment using SVM    Recall    Precision   F-Score
     All Features (1K)       78.4063   79.7105     70.0381
     All Features (10K)      81.2385   82.4210     81.7664
  8. Demo
