19-14-Sentiment Analysis On Twitter

This is a small presentation on the project "Sentiment Analysis on Twitter", done as part of the course Information Retrieval and Extraction under Prof. Vasudeva Varma at IIIT Hyderabad.


  1. Under the guidance of Prof. Vasudeva Varma, IIIT Hyderabad. Submitted by: Abhishek Jain (201201137), Pradeep Anumala (201350843), Pragya Musal (201405545), Shashank S (201405599). Mentored by: Satarupa Guha.
  2. Problem. Input: the textual content of a tweet. Output: a label signifying whether the tweet is positive, negative or neutral.
  3. Pipeline: Download → Parser → Tokenizer → PreProcessor → Feature Vector Builder → Add Additional Features → Construct Feed File → SVM Classifier → training.model; training.model + test data → SVM Classifier → polarity of tweet (positive/negative/neutral).
  4. A total of 9684 training tweets and 8987 test tweets were downloaded from Twitter and fed to the parser. The parser (1) removes unavailable tweets and (2) separates each tweet from its polarity label. After the unavailable tweets were removed, 7875 training tweets and 8011 test tweets remained.
  5. We used the ‘ARK tokenizer’ to tokenize the tweets. The tokenizer divides each tweet into a sequence of space-separated tokens and writes them to a file, which is used at a later stage of processing.
  6. The tokenized tweets are fed to the preprocessor, which (1) replaces URLs with ||U||, (2) replaces @references with ||T||, (3) replaces positive emoticons with the word ‘epositive’ and negative emoticons with the word ‘enegative’, and (4) replaces words that signify a negative context with the word “not”.
  7. The preprocessed file is fed to the feature vector builder, which creates the final feature vector. The baseline feature was unigrams: a list of all unique unigrams across the training set was constructed, and it formed the basic vector for each tweet.
  8. The feature vector was enhanced with additional features: POS tagging; counts of emoticons, hashtags and exclamations; scores from standard lexicons; negated contexts; elongated words (sooooo, happppppppppy).
  9. The resulting feature vector was written to a file in the format expected by the libsvm classifier. A linear SVM classifier was trained on the training file, producing a training.model file. This model file was then used on the test file to predict the results.
  10. The model was tested on a set of 8011 test tweets, with the following results: accuracy 64% (5127/8011); F-measure 0.6163.
  11. Thank You
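
The preprocessing, unigram and feed-file steps described in slides 6, 7 and 9 can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The emoticon and negation word lists, the function names, the integer labels (1 / -1) and the `||U||` / `||T||` placeholder spelling are assumptions made for the example, and plain whitespace splitting stands in for the ARK tokenizer. The output follows libsvm's sparse `label index:value` text format with 1-based, ascending feature indices.

```python
import re

# Illustrative patterns and word lists (assumptions, not the project's real lists).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
POS_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEG_EMOTICONS = {":(", ":-(", ":'("}
NEGATION_WORDS = {"no", "never", "cannot", "can't", "don't", "won't", "isn't"}


def preprocess(tokens):
    """Apply the four replacements from slide 6 to a list of tokens."""
    out = []
    for tok in tokens:
        low = tok.lower()
        if URL_RE.fullmatch(tok):
            out.append("||U||")        # URL placeholder
        elif MENTION_RE.fullmatch(tok):
            out.append("||T||")        # @reference placeholder
        elif tok in POS_EMOTICONS:
            out.append("epositive")
        elif tok in NEG_EMOTICONS:
            out.append("enegative")
        elif low in NEGATION_WORDS:
            out.append("not")
        else:
            out.append(low)
    return out


def build_vocab(tweets):
    """Map every unique unigram in the training set to a 1-based index."""
    vocab = {}
    for tokens in tweets:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1  # libsvm feature indices start at 1
    return vocab


def to_libsvm_line(label, tokens, vocab):
    """Encode one tweet as a binary unigram vector in libsvm's sparse format."""
    indices = sorted({vocab[tok] for tok in tokens if tok in vocab})
    return " ".join([str(label)] + [f"{i}:1" for i in indices])


# Two made-up tweets; whitespace splitting stands in for the ARK tokenizer.
raw = [("I love this :) http://t.co/x", 1), ("@bob never again :(", -1)]
processed = [(preprocess(text.split()), label) for text, label in raw]
vocab = build_vocab(tokens for tokens, _ in processed)
lines = [to_libsvm_line(label, tokens, vocab) for tokens, label in processed]
for line in lines:
    print(line)
```

Lines written this way to a training file could then be handed to a linear SVM trainer. The additional features from slide 8 (POS tags, counts, lexicon scores) would occupy further indices after the unigram block.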
