Sentiment Analysis in Twitter
1. Sentiment Analysis for Twitter
Priyanka Bajaj priyanka.bajaj@students.iiit.ac.in
Kamal Gurala kamal.gurala@students.iiit.ac.in
Faraz Alam faraz.alam@students.iiit.ac.in
Ritesh Kumar Gupta ritesh.kumar.gupta@in.ibm.com
Guided by: Satarupa Guha satarupaguha11@gmail.com
2. AGENDA
1. Introduction – Sentiment Analysis
2. About Twitter and Our Goal
3. Glossary
4. Challenges
5. Approach
6. Results and Conclusion
7. Tools and Technologies
3. What is Sentiment Analysis?
A mechanism for extracting opinions, emotions and sentiments from text
Enables us to track attitudes and feelings on the web based on blog posts, comments, reviews and tweets on different topics
Enables us to track products, brands and people and determine whether they are viewed positively or negatively on the web
Facts: "The painting was more expensive than a Monet"
Opinions: "I honestly don't like Monet, Pollock is the better artist"
4. Challenges
• Tweets are highly unstructured and often ungrammatical
• Out-of-vocabulary words
• Lexical variation
• Extensive use of acronyms such as asap, lol, afaik
6. System Details
• Tweet Downloader
– Downloads tweets using the Twitter API
• Tokenisation
– Twitter-specific POS tagger developed by ARK Social Media Search
• Preprocessing
– Replace emoticons with their polarity and assign scores
– Remove URLs and target mentions
– Replace #text with text, since hashtags may contribute to the sentiment
– Replace sequences of repeated characters, e.g. ‘cooooool’ becomes ‘cool’, and assign a higher score
– Remove Twitter-specific stop words
– Expand acronyms
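The preprocessing steps listed for this slide could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the lookup tables (`EMOTICON_POLARITY`, `ACRONYMS`, `STOP_WORDS`), the placeholder tokens and the function name `preprocess` are all hypothetical, and repeated characters are collapsed to two occurrences, which is one common heuristic that happens to map ‘cooooool’ to ‘cool’.

```python
import re

# Hypothetical lookup tables; a real system would load much larger resources.
EMOTICON_POLARITY = {":)": "POS_EMOTICON", ":-)": "POS_EMOTICON",
                     ":(": "NEG_EMOTICON", ":-(": "NEG_EMOTICON"}
ACRONYMS = {"asap": "as soon as possible", "lol": "laughing out loud",
            "afaik": "as far as i know"}
STOP_WORDS = {"rt", "via", "a", "the", "is"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(\w)\1{2,}")  # same character three or more times


def preprocess(tweet):
    """Return (tokens, elongated_count) for one raw tweet."""
    text = URL_RE.sub(" ", tweet)        # remove URLs
    text = MENTION_RE.sub(" ", text)     # remove target mentions
    text = HASHTAG_RE.sub(r"\1", text)   # "#text" -> "text"
    tokens, elongated = [], 0
    for raw in text.split():
        # Emoticons are replaced by a polarity placeholder before any cleanup.
        if raw in EMOTICON_POLARITY:
            tokens.append(EMOTICON_POLARITY[raw])
            continue
        word = raw.lower().strip(".,!?;:\"'")
        if not word:
            continue
        # Collapse runs of repeated characters and remember that it happened,
        # so a later stage can give the word a higher score.
        collapsed = REPEAT_RE.sub(r"\1\1", word)
        if collapsed != word:
            elongated += 1
        if collapsed in STOP_WORDS:
            continue
        # Acronym expansion may yield several tokens.
        tokens.extend(ACRONYMS.get(collapsed, collapsed).split())
    return tokens, elongated
```

Calling `preprocess("@bob This is cooooool :) #happy http://t.co/x asap")` returns the cleaned token list together with a count of elongated words; how that count feeds into the "higher score" is left open here because the slides do not specify it.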
7. System Details (Contd.)
• Feature Extractor
– Unigrams and bigrams
– Polarity score of the tweet (f1)
– Count of positive/negative words (f2, f3)
– Maximum positive/negative score for words (f4, f5)
– Count of positive/negative emoticons, with assigned scores (contributes to all of f1, f2, f3, f4, f5)
– Polarity score of positive/negative special POS tags
• Classifier and Prediction
– The extracted features are fed into an SVM classifier
– The trained model is used to predict the sentiment of new tweets
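As a rough illustration of the feature extractor, the sketch below builds unigram and bigram indicators plus the five lexicon features f1 to f5 from a token list. The lexicon `LEXICON`, its scores and all function names are hypothetical, "maximum negative score" is assumed to mean the most negative word score, and the POS-tag polarity feature is omitted because the slides do not describe how it is computed.

```python
# Hypothetical prior-polarity lexicon: token -> score in [-1, 1].
# Emoticon placeholders are included so they contribute to f1..f5.
LEXICON = {"good": 0.7, "great": 0.9, "cool": 0.6,
           "bad": -0.7, "awful": -0.9,
           "POS_EMOTICON": 1.0, "NEG_EMOTICON": -1.0}


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicon_features(tokens):
    """Compute f1..f5 from the prior polarity of the tokens."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    pos = [s for s in scores if s > 0]
    neg = [s for s in scores if s < 0]
    return {
        "f1_polarity": sum(scores),           # polarity score of the tweet
        "f2_pos_count": len(pos),             # count of positive words
        "f3_neg_count": len(neg),             # count of negative words
        "f4_max_pos": max(pos, default=0.0),  # strongest positive score
        "f5_max_neg": min(neg, default=0.0),  # strongest negative score
    }


def extract_features(tokens):
    """Merge n-gram indicator features with the lexicon features."""
    feats = {f"uni={g}": 1 for g in ngrams(tokens, 1)}
    feats.update({f"bi={g}": 1 for g in ngrams(tokens, 2)})
    feats.update(lexicon_features(tokens))
    return feats
```

A dictionary like the one returned by `extract_features` would then need to be converted to a numeric vector before being passed to whichever SVM implementation is used; that step is not shown here.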
8. Results and Conclusion
A baseline model built from unigrams is compared with a feature-based model that uses bigrams and lexicon features.
Accuracy:
Sub-Task          Baseline Model    Feature Based Model
Sentence Based    49.81%            57.85%
F1 Score (F-measure):
Sub-Task          Baseline Model    Feature Based Model
Sentence Based    55.56             61.17
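For readers unfamiliar with the two metrics reported above, the following sketch shows how accuracy and a per-class F1 score can be computed from gold and predicted labels. The slides do not say how F1 was averaged across classes, so the macro average over a chosen label set used here is an assumption, and the label names are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of tweets whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_label(gold, pred, label):
    """Harmonic mean of precision and recall for a single class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 over the given labels (assumed scheme)."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)
```

For example, `macro_f1(gold, pred, ["pos", "neg"])` averages the F1 of the positive and negative classes only; passing a different label list changes which classes are included.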
• We investigated two kinds of models: a baseline model and a feature-based model
• For our feature-based approach, feature analysis reveals that the most important features are bigrams and those that combine the prior polarity of words with their part-of-speech tags