Twitter Sentiment Analysis
Akhil Batra
Avinash Kalivarapu
Sunil Kandari
What is Sentiment Analysis?
• Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or
neutral.
• Also referred to as opinion mining. Our goal is to determine whether the data (a tweet) is
positive, negative or neutral.
Why is Sentiment Analysis Important?
• It answers questions about public opinion, e.g.:
• Is this product review positive or negative?
• Is this customer email satisfied or dissatisfied?
• Based on a sample of tweets, how are people responding to this ad campaign/product
release/news item?
• How have bloggers' attitudes about the president changed since the election?
Why Twitter Data for Sentiment Analysis?
• Popular microblogging site
• Short Text Messages of 140 characters
• 240+ million active users
• 500 million tweets are generated every day
• Twitter audience varies from common man to celebrities
• Users often discuss current affairs and share personal views on various subjects
• Tweets are small in length and hence unambiguous
Problem Statement
Given a message, decide whether the message is of positive, negative, or neutral
sentiment. For messages conveying both a positive and a negative sentiment,
the stronger of the two should be chosen.
Challenges
• People express opinions in complex ways
• In opinion texts, lexical content alone can be misleading
• Intra-textual and sub-sentential reversals, negation, and topic changes are common
• Rhetorical devices/modes such as sarcasm, irony, implication, etc.
• Tweets are unstructured and often ungrammatical
• Lexical Variation
• Out of Vocabulary Words
• Extensive usage of acronyms like asap, lol, afaik
Process Flow
• Twitter Dataset
• Preprocessing
• Tokenizer
• Feature Extraction (Word + Senti Feature)
• Classification (unigram-bigram SVM/Bayes)
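The preprocessing and tokenizer stages of this flow could look roughly like the sketch below. The slides do not say which cleaning rules or tokenizer were used, so the placeholder tags (`<url>`, `<user>`), the regular expressions, and the function names `preprocess` and `tokenize` are all assumptions made for illustration.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# One token is an emoticon, a hashtag, a placeholder tag, a word
# (optionally with an apostrophe), or a single "!" / "?".
TOKEN_RE = re.compile(r"[:;=]-?[)(dp](?!\w)|#\w+|<\w+>|\w+(?:'\w+)?|[!?]")


def preprocess(tweet: str) -> str:
    """Replace URLs and @mentions with placeholders, then lowercase."""
    text = URL_RE.sub("<url>", tweet)
    text = MENTION_RE.sub("<user>", text)
    return text.lower()


def tokenize(text: str) -> List[str]:
    """Split preprocessed text into tokens, keeping emoticons and hashtags whole."""
    return TOKEN_RE.findall(text)


tokens = tokenize(preprocess("@bob I LOVE this phone :) #happy http://t.co/x"))
print(tokens)
```

Keeping emoticons and hashtags as single tokens matters here because later stages use them as sentiment features.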
Training
Testing
Extracted Features
• Word feature
• Word polarity score using WordNet
• Positive/Negative Hashtags
• Positive/Negative/Extremely Positive/Extremely Negative Emoticons
• Negations
• POS tag polarity score (Noun, Preposition, Adjectives)
• Special characters
• Count of repetition words
• Count of non-English words
• Count of Acronyms
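Several of the count-based features above can be sketched as simple lookups over a token list. The lexicons below are tiny hypothetical stand-ins, not the lists used in the project, and the function name `senti_features` is invented for this example. "Repetition words" is read here as words with a character repeated three or more times (e.g. "sooo"), which is an assumption. The WordNet and POS polarity scores are left out because they need external resources.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicons; a real system would use much larger lists.
POSITIVE_EMOTICONS = {":)", ":-)", ":d", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-("}
POSITIVE_HASHTAGS = {"#happy", "#love"}
NEGATIVE_HASHTAGS = {"#fail", "#sad"}
NEGATIONS = {"not", "no", "never", "don't", "can't", "won't"}
ACRONYMS = {"asap", "lol", "afaik"}
SPECIAL_CHARS = {"!", "?"}

# Same character three or more times in a row, e.g. "sooo".
ELONGATED_RE = re.compile(r"(\w)\1{2,}")


def senti_features(tokens: List[str]) -> Dict[str, int]:
    """Count simple sentiment cues in a list of lowercase tokens."""
    return {
        "pos_emoticons": sum(t in POSITIVE_EMOTICONS for t in tokens),
        "neg_emoticons": sum(t in NEGATIVE_EMOTICONS for t in tokens),
        "pos_hashtags": sum(t in POSITIVE_HASHTAGS for t in tokens),
        "neg_hashtags": sum(t in NEGATIVE_HASHTAGS for t in tokens),
        "negations": sum(t in NEGATIONS for t in tokens),
        "acronyms": sum(t in ACRONYMS for t in tokens),
        "special_chars": sum(t in SPECIAL_CHARS for t in tokens),
        "elongated_words": sum(bool(ELONGATED_RE.search(t)) for t in tokens),
    }


features = senti_features(["not", "sooo", "good", "lol", "!", ":("])
print(features)
```

These counts would be appended to the word features before classification.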
Classifiers
• Naive Bayes Classifier
• SVM
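To show what the Naive Bayes classifier does with word features, here is a minimal multinomial Naive Bayes with add-one smoothing in plain Python. It is a sketch, not the project's implementation: the class name, the toy training tweets, and the choice to ignore unseen tokens are assumptions. An SVM would normally come from a library and is not sketched here.

```python
import math
from collections import Counter, defaultdict
from typing import List


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens: List[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data for illustration only.
docs = [
    ["love", "this", "phone"], ["great", "day"],
    ["hate", "this", "phone"], ["terrible", "service"],
    ["phone", "arrives", "today"], ["meeting", "at", "noon"],
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["love", "great"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.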
Analysis and Results
Classifiers                                    % Accuracy
Unigram + Bayes classification function        50*
Bigram + Bayes classification function         54*
Unigram + SVM                                  65*
Unigram + Senti-Feature + SVM                  66*
Unigram + Senti-Feature + POS polarity + SVM   68*
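The unigram and bigram rows differ only in how tokens are grouped into word features before classification. A minimal sketch of that grouping follows; the function name `ngrams` is chosen for this example and is not taken from the project.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all runs of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["not", "a", "good", "phone"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
print(unigrams)
print(bigrams)
```

Bigrams keep some local word order, so a pair such as ("not", "a") carries context that single words lose, at the cost of a much sparser feature space.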
Conclusion
We conclude that combining the extracted sentiment features and POS tag polarity
with the SVM classifier gives the best result.
There is always scope for increasing the accuracy by extracting more features
that are relevant to sentiment.
Increasing the n-gram value beyond 2 does not necessarily increase the
accuracy.
