Qualitative Analysis of
Social Media
Ayush Pareek
Outline
● Qualitative Data
● Qualitative Analysis
● Sentiment Analysis
○ What?
○ Why?
○ Literature Survey
■ General Approach
■ Recent Solutions
■ State of the art
■ Practical Implementation
○ Proposed Solution
■ Dataset
■ Feature Reduction
■ Features and Negation Handling
○ Experiment
○ Results
● Future Work
● References
Qualitative Data
● Score: Sunderland 0 Liverpool 0
● Newspaper: There was more excitement in the Selhurst car park than on the
pitch…
(Diagram: an Event yields both Quantitative and Qualitative data.)
Qualitative Analysis
(Diagram: video footage of the match is analysed in three steps: Describing, Classifying (e.g. favourable for Team A v/s favourable for Team B) and Connecting.)
What is Sentiment Analysis?
● It is the classification of a given text at the document,
sentence or phrase level
● The goal is to determine whether the opinion expressed in the text is
positive, negative or neutral.
Why is Sentiment Analysis important?
● Microblogging has become an important communication tool
● The opinion of the masses is important
● A political party may want to know whether people support it or not.
● Before investing in a company, one can leverage public sentiment
about the company to find out where it stands
● A company might want to find out the reviews of its products
Literature Survey
Using Twitter for Sentiment Analysis
● Twitter is a popular microblogging site
● It allows short text messages of up to 280 characters
● More than 330 million active users
● Every second, on average, around 6,000 tweets are posted
● The Twitter audience varies from the common man to celebrities
● Users often discuss current affairs and share personal views on various
subjects
● Tweets are short, which keeps them relatively unambiguous
General Approach
Recent Solutions
1. Twitter as a Corpus for Sentiment Analysis and Opinion Mining
2. Twitter Sentiment Analysis: The Good, the Bad and the OMG!
3. Twitter Sentiment Classification using Distant Supervision
4. Semantic Sentiment Analysis of Twitter
Twitter as a Corpus for Sentiment Analysis and
Opinion Mining
Alexander Pak, Patrick Paroubek
1. Collected tweets for training by querying emoticons
2. Feature extraction: filtering, tokenization, stop-word
removal and constructing n-grams
3. Classifiers used: Naive Bayes and SVM
4. Tested the impact of n-gram order.
Twitter Sentiment Analysis: The Good, the Bad and the OMG!
● Evaluated the usefulness of features
● Explored a novel method for building the dataset
● Data preprocessing: n-grams, lexicon, POS and microblogging features
● The best performance on the evaluation data came from combining
n-grams with the lexicon and microblogging features
Efthymios Kouloumpis, Theresa Wilson, Johanna Moore
Twitter Sentiment Classification using Distant Supervision
1. Collected tweets by querying emoticons
2. Feature reduction: usernames, usage of links and repeated
letters
3. Classifiers: baseline, Naive Bayes and SVM
4. Emoticon training data improved accuracy; POS tags were found not useful.
Alec Go, Richa Bhayani, Lei Huang
Semantic Sentiment Analysis of Twitter
● Semantic features for training the model
● Approaches for extracting and incorporating these
features
● Performed a comparison of the various approaches
● Demonstrated the value of not removing stop words
Hassan Saif, Yulan He and Harith Alani
Sentiment Treebank
Stanford Ph.D. student Richard Socher
Developed a computer model that can accurately classify the sentiment of a
sentence 85 percent of the time.
Socher’s team pulled off its accomplishment by focusing not just on single words,
but on entire sentences.
They took nearly 11,000 sentences from online movie reviews (from research
database culled from Rotten Tomatoes, specifically) and created what the team
has dubbed the Sentiment Treebank.
The team split those nearly 11,000 sentences into more than 215,000 individual
phrases and then used human workers — via Amazon Mechanical Turk — to
classify each phrase on a scale from “very negative” to “very positive.”
The Labeling Interface
A visual representation of how the model breaks down sentences
Twitter Sentiment Analysis:
> Installing packages
> Authentication
> Fetching the tweets
> Preprocessing the tweets
> Getting the sentiment of each tweet
> Agglomerating the sentiments
Packages used :
● re
● tweepy
● OAuthHandler
● TextBlob
Tweepy:
To begin the process:
● We need to register our client application with Twitter.
● Create a new application.
● Create an access token.
● Two key pairs:
■ Consumer Key/Secret
■ Access Token/Secret
● Create an OAuthHandler instance and
attempt authentication:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
try:
    api = tweepy.API(auth)
    api.verify_credentials()  # tweepy.API alone does not validate credentials
    print("Successful Authentication")
except tweepy.TweepError:
    print("Error: Authentication Failed")
Fetching the tweets:
tc = TwitterClient()
query = "Google"
tweets = tc.get_tweets(query, count=500)
Inside TwitterClient.get_tweets():
fetched_tweets = [tweet for tweet in
                  tweepy.Cursor(api.search, q=query).items(count)]
Benefits of Preprocessing :
➢ Noise Removal
➢ Normalisation
➢ Natural Language Analysis
○ Tokenization
○ Sentence splitting
○ Stop-word removal
○ Stemming
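The preprocessing steps above can be sketched in a few lines of pure Python. This is an illustrative example, not the project's actual code: the stop-word list, the USER_MENTION/HASH_ markers and the regexes are assumptions chosen to match the token style used later in the negation example.

```python
import re

STOP_WORDS = {"is", "a", "an", "the", "this", "from", "to"}  # illustrative subset

def preprocess(tweet):
    """Noise removal + normalisation + tokenization + stop-word removal."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "", tweet)        # strip links
    tweet = re.sub(r"@\w+", "USER_MENTION", tweet)    # normalise mentions
    tweet = re.sub(r"#(\w+)", r"HASH_\1", tweet)      # mark hashtags
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)      # muuuuch -> muuch
    tokens = re.findall(r"\w+", tweet)                # tokenize, dropping punctuation
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("@Skype crashes too muuuuch! Not expecting this from #Microsoft"))
# ['USER_MENTION', 'crashes', 'too', 'muuch', 'not', 'expecting', 'HASH_microsoft']
```

The repeated-letter rule follows the trick from Go et al.: any letter occurring more than twice in a row is collapsed to two occurrences.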
TextBlob: Python library for processing textual data.
● N-grams:
I/P => blob = TextBlob("It is a Python library")
blob.ngrams(n=3)
O/P => [['It', 'is', 'a'], ['is', 'a', 'Python'], ['a', 'Python', 'library']]
● Words Inflection and Lemmatization:
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words[2].singularize()
O/P=> 'space'
from textblob import Word
w = Word("octopi")
w.lemmatize()
O/P=> 'octopus'
● Sentiment Analysis:
Module: textblob.sentiments
> PatternAnalyzer (based on the pattern library):
polarity: [-1.0, 1.0]
subjectivity: [0.0, 1.0] (0.0 is very objective, 1.0 is very subjective)
e.g. Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
> NaiveBayesAnalyzer (an NLTK classifier trained on a movie reviews corpus):
e.g. Sentiment(classification='pos', p_pos=0.7996209910191279, p_neg=0.2003790089808724)
Output:
Query: Google
Tweets fetched: 500
Features
● Most common feature set: “word n-grams”
● What is an n-gram?
○ Example: “My name is Ayush”
● Unigrams: ‘My’, ‘name’, ‘is’, ‘Ayush’
● Bigrams: ‘My name’, ‘name is’, ‘is Ayush’
● No clear conclusion regarding the performance of n-grams in sentiment analysis
● Different results based on task (Pang & Lee, 2008)
○ Unigrams alone are better than bigrams for classification of movie reviews
○ Bigrams and trigrams yield better product-review polarity classification
● Research problem: which combination of features gives better accuracy for sentiment analysis on a
general corpus?
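The unigram and bigram example above can be reproduced with a small helper (a hypothetical function for illustration, not the project's code):

```python
def ngrams(tokens, n):
    """Return the word n-grams of a token sequence as joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "My name is Ayush".split()
print(ngrams(tokens, 1))  # ['My', 'name', 'is', 'Ayush']
print(ngrams(tokens, 2))  # ['My name', 'name is', 'is Ayush']
```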
Proposed Solution: Dataset
● Twitter Sentiment Corpus
○ Collection of 5513 tweets collected for 4 topics:
Apple, Google, Microsoft, Twitter
○ Each tweet is classified as Positive or Negative
Table 1: Twitter Sentiment Corpus
Proposed Solution: Dataset
● Stanford Twitter Corpus
○ Collected by querying the Twitter API for 5000+ tweets
and manually labelling them as positive or negative
Table 2: Stanford Twitter Corpus
Preprocessing: Reducing the size of feature set
Results of Reduction in Feature Set + Stemming
Table 3: Results of feature reduction
Table 4: Basic steps of the Porter Stemmer
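To give the flavour of the Porter stemmer's rule-based steps, here is a sketch of its first rule group (plural reduction); the full algorithm has several further steps and conditions, so this is only an illustration:

```python
def porter_step1a(word):
    """Step 1a of the Porter stemmer: reduce plural suffixes."""
    if word.endswith("sses"):
        return word[:-2]          # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]          # ponies -> poni
    if word.endswith("ss"):
        return word               # caress stays caress
    if word.endswith("s"):
        return word[:-1]          # cats -> cat
    return word

print([porter_step1a(w) for w in ["caresses", "ponies", "caress", "cats"]])
# ['caress', 'poni', 'caress', 'cat']
```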
Feature - Unigram
Graph 1: 50 most frequent unigrams (Unigram v/s Frequency)
Presence v/s Frequency (Pak & Paroubek, 2010)
Features: N-grams
● Higher-order n-grams
○ Sparsely populated
○ Remove those which occur only once
Graph 2: 50 most frequent bigrams (Bigram v/s Frequency)
Negation Handling
● “Ram is a good boy” vs “Ram is not a good boy”
● Two tasks:
○ Detection of negation cues
■ Using a word list
○ Scope of negation
■ Negation vector
● A method for negation detection based on the left and right distances of a token to the nearest explicit negation
cue (Councill et al.)
● Example tweet: “@Skype crashes too much ! Not expecting this from #microsoft”
● Words: [ ‘HASH_Skype’, ‘crashes’, ‘too’, ‘much’, ‘PUNC_EXCL’, ‘not’, ‘expect’, ‘this’, ‘from’, ‘HASH_MICROSOFT’ ]
● Neg V: [ 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.9, 0.8, 0.7, 0.6 ]
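The negation vector above can be sketched as follows. This is a toy version assuming a right-only scope with a linear 0.1 decay per token (the Councill et al. method also uses the left distance); the cue list is an illustrative assumption:

```python
NEGATION_CUES = {"not", "no", "never", "n't"}  # illustrative cue list

def negation_vector(tokens, decay=0.1):
    """Value 1.0 at a negation cue, fading by `decay` per token after it."""
    vec = [0.0] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in NEGATION_CUES:
            for j in range(i, len(tokens)):
                vec[j] = max(vec[j], round(1.0 - decay * (j - i), 1))
    return vec

words = ['HASH_Skype', 'crashes', 'too', 'much', 'PUNC_EXCL',
         'not', 'expect', 'this', 'from', 'HASH_MICROSOFT']
print(negation_vector(words))
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.9, 0.8, 0.7, 0.6]
```

A token's feature weight is then 1 minus its negation value, so words in the scope of "not" count less (or flip) for the classifier.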
Experiment
● Finding the best results by trying different combinations of features:
○ only unigrams,
○ unigrams + bigrams and trigrams,
○ unigrams + negation,
○ unigrams + bigrams and trigrams + negation.
● Classifier used:
○ Naive Bayes
■ Many researchers claim to have achieved their best results using this classifier for sentiment
analysis (Go, Bhayani & Huang, 2009) (Pak & Paroubek, 2010)
● Cross-validation
○ 10-fold: train on 9 parts and test on the held-out part, for each of the 10 parts
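For reference, the Naive Bayes classifier used in the experiment can be sketched as a minimal multinomial model over bag-of-words features with Laplace smoothing. This is a self-contained illustration of the technique, not the project's actual implementation (which could equally use an off-the-shelf library), and the tiny training set is made up:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = Counter(labels)            # class counts for P(c)
        self.word_counts = defaultdict(Counter)  # per-class word counts
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, tokens):
        def log_score(c):
            total = sum(self.word_counts[c].values())
            s = math.log(self.priors[c] / sum(self.priors.values()))
            for w in tokens:
                # Laplace (add-one) smoothing over the shared vocabulary
                s += math.log((self.word_counts[c][w] + 1) /
                              (total + len(self.vocab)))
            return s
        return max(self.classes, key=log_score)

train = [(["good", "great", "love"], "pos"),
         (["bad", "crashes", "hate"], "neg")]
nb = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
print(nb.predict(["love", "great"]))  # pos
```

For 10-fold cross-validation, the labelled tweets are shuffled into 10 parts, and this fit/predict cycle is repeated once per held-out part.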
Results
Precision vs Recall
Conclusion
● Took Sentiment Analysis as a detailed example of Qualitative Analysis
● Created a sentiment classifier for Twitter using labelled data sets
● Took unigrams as the baseline
● Investigated the use of negation detection and n-grams and found that they
can improve accuracy on our dataset
● The accuracy of the classifier increases on using negation detection or using
bigrams and trigrams
● Uni + Bi + Tri without negation gives the best results: 86.7% accuracy
Future Work
● Try to improve the sentiment analysis method
● Explore further methods of qualitative analysis, such as summarization.
References
1. Saif, H., He, Y., Alani, H.: Semantic sentiment analysis of Twitter. In: The Semantic
Web – ISWC 2012, pp. 508–524 (2012)
2. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information
Retrieval 2(1–2), 1–135 (2008)
3. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N
Project Report, Stanford, 1–6 (2009)
4. Sanders, N.: Twitter sentiment corpus. Sanders Analytics,
http://www.sananalytics.com/lab/twitter-sentiment/
5. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC,
vol. 2010, pp. 1320–1326 (2010)
6. Kouloumpis, E., Wilson, T., Moore, J.: Twitter sentiment analysis: The good the bad and the omg! In:
ICWSM, vol. 11, pp. 538–541 (2011)
7. Councill, I.G., McDonald, R., Velikovich, L.: What's great and what's not: learning to classify the
scope of negation for improved sentiment analysis. pp. 51–59. Association for
Computational Linguistics (2010)
Thank You
● Words: [ ‘HASH_Skype’, ‘crashes’, ‘too’, ‘much’, ‘PUNC_EXCL’, ‘not’, ‘expect’, ‘this’, ‘from’, ‘HASH_MICROSOFT’ ]
● Neg V: [ 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.9, 0.8, 0.7, 0.6 ]
● Neg_Weights: [ 1-(1*0.0), 1-(1*0.0), ..., 1-(1*1.0), 1-(1*0.9), ... ]
● Neg_Weights: [ 1, 1, 1, 1, 1, 0, 0.1, 0.2, 0.3, 0.4 ]

Editor's Notes

  • #2 <Ayush> Good afternoon, the title of our presentation is Qualitative Analysis of Social Media.
  • #3 <Ayush> The basic outline is that we are going to explain the meaning of Qualitative Analysis and then take one of its examples, sentiment analysis, in detail.
  • #4 <Ayush> Before we dive deep into this topic, let us discuss some basic terminology, starting with qualitative data. Suppose that you want to know about a soccer game that you missed last week. You are given two pieces of information: the score, and a complete newspaper report on what went down in the game. Which would a person be more interested in? The answer is that it depends. If you are a manager or a statistician, you might be more interested in just the quantitative results. But most of us don't watch matches just to know the results. It is the rich experience made up of unquantified information which captures our imagination. For example: which player was trying the hardest, what were the most crucial moments in the match, who missed the goal by a tiny margin, and so on. So, qualitative data is subjective information about qualities which can't be measured.
  • #5 <Ayush> Now let us talk about the general method for analysing qualitative data. It is divided into three components. The first step is to describe the data. Taking the football match as an example, when they show post-match highlights, they are essentially describing specific portions of the match which can give you a complete picture of the entire event. The second component is classification: for instance, dividing those highlights into parts which were favourable for Team A and those which were favourable for Team B. This way, we have started making sense of the highlights with respect to analysing the performance of a specific team. But even that might not be enough. What if we want to understand, and even predict, the next entries in our data? We need to connect these discrete data points and make sense of them. For example, we can extract speech from this video and look for terms like "quick", "swift on the feet" and "master of the ball" to find the group of players who are described as having good speed and footwork. So, we have gone much beyond the plain statistics to extract more meaningful information about the match. Qualitative analysis is thus the subjective judgement of data based on unquantifiable information, used to extract meaningful information.
  • #6 We’ll take Sentiment Analysis as an example of Qualitative Analysis// Describe+Classify+Connect
  • #7 Sentiment analysis is used extensively during US and Indian elections. After releasing a product or service, companies can determine the sentiment and thereby improve it.
  • #9 So why have we chosen Twitter for sentiment analysis?
  • #12 1. Collected positive and negative tweets for training the classifier by querying with happy emoticons (“:-)”, “:)”, “=)”, “:D” etc.) and sad emoticons (“:-(”, “:(”, “=(”, “;(” etc.) respectively. Queried the accounts of 44 newspapers to collect a training set of objective texts. 2. Feature extraction: filtering, tokenization, removing stop words and constructing n-grams. 3. Classifiers used: Naive Bayes, SVM and CRF; Naive Bayes yields the best results. 4. Tested the impact of n-gram order on the classifier's performance. The best performance is achieved using bigrams, which provide a balance between coverage (unigrams) and the ability to capture sentiment expression patterns (trigrams).
  • #13 1. Evaluated the usefulness of features such as automatic POS tags and resources such as sentiment lexicons. 2. Explored a novel method for building the dataset: using Twitter hashtags (e.g. #bestfeeling, #epicfail, #news) to identify positive, negative, and neutral tweets to use for training three-way sentiment classifiers. 3. Datasets: the hashtagged dataset and the emoticon dataset (training), and the iSieve dataset (testing). Data preprocessing consists of three steps: 1) tokenization, 2) normalization, and 3) part-of-speech (POS) tagging. 4. n-gram features, lexicon features, part-of-speech features and microblogging features. The best performance on the evaluation data comes from using the n-grams together with the lexicon features and the microblogging features. Including POS actually drops performance.
  • #14 1. The main contribution of this paper is collecting tweets with emoticons. This removes the manual work of labelling the tweets, saving much time and effort. 2. Repeated letters: tweets contain very casual language. For example, if you search "hungry" with an arbitrary number of u's in the middle (e.g. huuuungry, huuuuuuungry, huuuuuuuuuungry) on Twitter, there will most likely be a non-empty result set. We use preprocessing so that any letter occurring more than two times in a row is replaced with two occurrences; in the samples above, these words would be converted into the token huungry. 3. Classifiers: baseline, Naive Bayes, Maximum Entropy and SVM. 4. Unigrams: the unigram feature extractor is the simplest way to retrieve features from a tweet. The machine learning algorithms clearly perform better than our keyword baseline. These results are very similar to Pang and Lee [9], who report 81.0%, 80.4%, and 82.9% accuracy for Naive Bayes, MaxEnt, and SVM, respectively; this is very similar to our results of 81.3%, 80.5%, and 82.2% for the same set of classifiers. To capture negation they tried bigram features; the use of both unigram and bigram features was found helpful.
  • #15 1. Introduce and implement a new set of semantic features for training a model for sentiment analysis of tweets. For example, if words like iPhone and iPad are present in the tweet, we can extract the semantic concept Apple Product; the polarities of iPhone and iPad can be mapped onto Apple, so if Apple appears in a test tweet, it can be given the corresponding polarity. 2. Semantic-extraction APIs are available; of those, the best-performing one is chosen. 3. Semantic replacement, semantic augmentation and semantic interpolation. 4. Demonstrate the value of not removing stop words. 5. It performs better than unigram and bigram features. Semantic interpolation: a more principled way to incorporate semantic concepts is through interpolation, where we interpolate the unigram language model in NB with the generative model of words given semantic concepts.
  • #16 The Sentiment Treebank includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences.
  • #17 Random phrases were shown, and annotators (who supply critical or explanatory notes) had a slider for selecting the sentiment and its degree. AMT: HITs (Human Intelligence Tasks) are individual tasks that you work on to earn money; you ask workers to complete HITs and get results using Mechanical Turk.
  • #18 When you're dealing with text like movie reviews that contain linguistic intricacies (having many interrelated parts or facets; entangled or involved), Socher explained, you need a model that can really understand how words play off each other to alter the meaning of sentences. The order in which they come, and what connects them, matters a lot. A simple example of what Socher means would be a sentence like "There are slow and repetitive parts, but it has just enough spice to keep it interesting." "Usually," he said, "what comes after the 'but' dominates what comes before the 'but'," and that's something a model focusing on single words or even single phrases might not be able to pick up. The team then built a new model it calls a Recursive Neural Tensor Network (an evolution of existing models called Recursive Neural Networks), which is what actually processes all the words and phrases to create numeric representations for them and calculate how they interact with one another.
  • #19 The following are the steps involved in Twitter sentiment analysis by parsing the tweets fetched from Twitter: > Install the required packages. > In order to fetch tweets through the Twitter API, one needs to register an app through their Twitter account. > Authorize the Twitter API client. > Make a GET request to the Twitter API to fetch tweets for a particular query. > Parse the tweets and classify each tweet as positive, negative or neutral. > Agglomerate (collect or form into a mass or group) the results.
  • #20 > re: this module provides regular expression matching operations and is used mainly in the preprocessing of tweets. > tweepy: Tweepy is an open-source library, hosted on GitHub, that enables Python to communicate with the Twitter platform and use its API. > Alternatives: python-twitter maintained by @bear (a pure Python interface for the Twitter API); tweepy maintained by @applepie & more (a Python wrapper for the Twitter API); TweetPony by @Mezgrman (a Python library aimed at simplicity and flexibility). > The OAuth 2.0 authorization framework enables a third-party application to obtain limited access to an HTTP service. {OAuth is a simple way to publish and interact with protected data. It's also a safer and more secure way for people to give you access.} {If you're storing protected data on your users' behalf, they shouldn't be spreading their passwords around the web to get access to it. Use OAuth to give your users access to their data while protecting their account credentials.} > TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. Alternatives: Inbenta, IBM SPSS Text Analytics for Surveys. > A module is a set of functions, types and classes put together in a common namespace. A library is a set of modules which makes sense to be together and that can be used in a program or another library. A package is a unit of distribution that can contain a library or an executable or both.
  • #21 >This access token can be used to make API requests on your own account’s behalf. Consumer Key/Secret help twitter identify the app and Access Token/Secret help twitter identify the user (that is you). >Representational state transfer (REST) or RESTful web services are a way of providing interoperability between computer systems on the Internet. REST-compliant Web services allow requesting systems to access and manipulate textual representations of Web resources using a uniform and predefined set of stateless operations. Other forms of Web services exist which expose their own arbitrary sets of operations such as WSDL and SOAP
  • #22 > Create an OAuthHandler object. > Set the access token and secret. > Create a tweepy API object to fetch tweets.
  • #23 First of all, we create a TwitterClient class. This class contains all the methods to interact with the Twitter API and parse tweets.
  • #24 > Usually the texts found on the Internet contain much noise, such as HTML tags, scripts, and advertisements. Data preprocessing can reduce noise in the text and improve the performance and accuracy of classification. > Often sentiment analysis and opinion mining are performed on texts from social networks and other user-generated content characterized by very informal language, with grammar and lexicon that greatly differ from usual language use, especially on Twitter. Such texts need to be transformed into a more grammatical form, more suitable for processing by natural language analysis tools. > Tokenization is used to break the text down into words and symbols. Sentence splitting is used to determine sentence boundaries. Stop words are common words in the given language that do not carry important meaning (such as "the", "a", "an", "in"); their removal usually improves the performance of sentiment analysis. Stemming is used to transform words into their root form: for example, the word "working" is changed to its root form "work".
  • #25 TextBlob is a Python (2 and 3) library for processing textual data. You can treat textblob objects as if they were Python strings that learned how to do Natural Language Processing. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.
  • #26 Written text can be broadly categorized into two types: facts and opinions. Opinions carry people's sentiments, appraisals and feelings toward the world. The pattern.en module bundles a lexicon of adjectives (e.g., good, bad, amazing, irritating, ...) that occur frequently in product reviews, annotated with scores for sentiment polarity (positive ↔ negative) and subjectivity (objective ↔ subjective). Objective: Facts. Subjective: Feelings. The sentiment() function returns a (polarity, subjectivity) tuple for the given sentence, based on the adjectives it contains, where polarity is a value between -1.0 and +1.0 and subjectivity between 0.0 and 1.0. The default implementation is PatternAnalyzer, but you can override the analyzer by passing another implementation into a TextBlob's constructor. The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). NLTK: The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English, written in the Python programming language.
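The (polarity, subjectivity) idea above can be sketched in pure Python. The lexicon below is a tiny made-up sample, and the averaging scheme is only an approximation of what TextBlob's PatternAnalyzer does over its much larger annotated adjective lexicon.

```python
# Tiny illustrative lexicon: word -> (polarity, subjectivity).
# The specific scores here are made up for the sketch.
LEXICON = {
    "good": (0.7, 0.6),
    "bad": (-0.7, 0.7),
    "amazing": (0.9, 0.9),
    "irritating": (-0.8, 0.9),
}

def sentiment(sentence):
    # average the scores of the opinion adjectives in the sentence
    scores = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    if not scores:
        return (0.0, 0.0)  # no opinion adjectives: objective and neutral
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

print(sentiment("an amazing day"))  # → (0.9, 0.9)
```

With TextBlob itself this is simply `TextBlob(text).sentiment`.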
  • #28 <Ayush> A feature is an individual measurable property. Choosing informative and independent features is a crucial step for classification. We found the most common feature set to consist of word n-grams. So, just to explain, an n-gram is a contiguous sequence of n words taken together. So, unigrams would just be individual words. Bigrams would be all sequences of two contiguous words taken together. Now, from the literature review, we found that there is no common consensus on whether unigrams or n-grams work better as features. We found that researchers have found unigrams to be better for movie review classification and n-grams to be better for product classification. So, our research problem is to find the combination of features that gives better accuracy on a general corpus. //Q. Explain the work of the citation
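Extracting n-grams is a one-liner; a minimal sketch of the definition just given:

```python
def ngrams(tokens, n):
    # contiguous sequences of n words taken together
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "this movie was really good".split()
unigrams = ngrams(words, 1)  # individual words
bigrams = ngrams(words, 2)   # e.g. ('really', 'good')
```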
  • #29 <Ayush> One of the major challenges in sentiment analysis of Twitter is the collection of a labelled dataset. We used two datasets for training and testing classifiers. The first one is the 'Twitter Sentiment corpus', which contains more than 5500 tweets on four companies (Apple, Google, Microsoft and Twitter) classified as Positive, Negative or Neutral.
  • #30 <Ayush> Another dataset we used is the Stanford Twitter corpus, which has 5000+ tweets labelled as positive or negative. The corpus was created by using various query terms on the Twitter API, including movies, persons and misc keywords. Now, since we are using both datasets, our data contains product reviews (from the Twitter Sentiment corpus), movie reviews (from the Stanford Twitter corpus) and other categories of tweets (from the misc keywords used in making the Stanford corpus). Hence, we have formed a more generalized corpus which takes comparable amounts of tweets from both corpora (5k + 5k). Now we'll test different feature sets to get our results on this bigger dataset. Q. Format of DS Q. Why this DS
  • #31 <Ayush> User-generated data is very often not present in a usable form for learning. It becomes important to normalize the text by applying a series of pre-processing steps, otherwise the set of features would be too sparse for processing. We have applied a set of pre-processing steps to reduce the feature set and make it suitable for learning algorithms. This tweet is a typical example. Just like the stop words of text analysis, there are certain Twitter-specific features which don't add any value to our analysis, and hence we're removing them. For example, we find and replace several features, such as hashtags, URLs and repeating characters, by writing regular expressions that describe them and running find-and-replace with them. So let's say that our tweet has a URL; then we make a regular expression for all URLs and use that to replace the web addresses with simply the word 'URL'. So, we do this with all these features. Q. Regexp -> Explain in Python and Linux. Python -> re.sub(pattern, repl, string) Linux -> $ grep "exit" demo_file
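The find-and-replace step can be sketched with `re.sub`. The exact patterns below are illustrative assumptions, covering three of the Twitter-specific features mentioned (URLs, user handles, hashtags) plus repeated characters:

```python
import re

def normalize(tweet):
    # replace web addresses with the word 'URL'
    tweet = re.sub(r"https?://\S+|www\.\S+", "URL", tweet)
    # replace @handles with a placeholder
    tweet = re.sub(r"@\w+", "AT_USER", tweet)
    # strip the '#' from hashtags, keeping the word
    tweet = re.sub(r"#(\w+)", r"\1", tweet)
    # collapse characters repeated 3+ times: 'loooove' -> 'loove'
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)
    return tweet

print(normalize("loooove it! http://t.co/abc #happy @friend"))
# → 'loove it! URL happy AT_USER'
```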
  • #32 <Ayush> We observe that by removing these features, we have reduced the number of words to about 53% of the original corpus. So 47% of the words have been removed just by finding and removing these 7 features which occur frequently in tweets. Next we'll perform stemming on the remaining text so that different tenses of the same verbs are reduced to their roots. So these are the basic rules of the famous Porter stemmer. For instance, Play, Playing and Played will all be reduced to the root 'Play'.
  • #33 <Ayush> So, we generated a frequency matrix from the dataset and plotted a unigram vs. frequency graph of the most common unigrams. Another important point to note here is whether we should pay attention to the frequency of a term or the presence of a term. So, let's say that a tweet contains the word 'fantastic'. That means it's most probably a positive tweet. But say 'fantastic' occurs 5 times; that doesn't make it more fantastic. So, we'll only use a boolean vector, with 1 denoting the presence of the term and 0 denoting its absence.
  • #34 Next, we try n-grams as features. So, even intuitively, we can understand that as we go towards higher dimensions of n-gram, the frequency of a particular pattern reduces. So, if an n-gram occurs only once, it doesn’t make sense to include it in our analysis. So, we only used bi-grams which occur multiple times and this graph shows 50 of the most frequent bigrams.
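Counting bigrams over the corpus and dropping the ones that occur only once can be sketched with a `Counter`; the `min_count` threshold of 2 is an assumption matching the "occur multiple times" criterion above:

```python
from collections import Counter

def frequent_bigrams(tweets, min_count=2):
    # keep only bigrams that occur multiple times across the corpus
    counts = Counter()
    for tweet in tweets:
        words = tweet.split()
        counts.update(zip(words, words[1:]))
    return {bg: c for bg, c in counts.items() if c >= min_count}

tweets = ["not good at all", "not good really", "very good movie"]
print(frequent_bigrams(tweets))  # → {('not', 'good'): 2}
```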
  • #35 Let's say that I have the negation of a simple sentence. So, 'Today is a sunny day' becomes 'Today is not a sunny day'. So, for such simple negations, instead of depending on unigrams and n-grams to predict that this tweet is negative, we can immediately identify the sentence to be negative by the presence of the cue 'not'. And there are multiple cues which can induce negation in a sentence, for example never, no, nothing etc. So, we used the technique of computing a negation vector. Here, the intuition is that words immediately following the negation cue are the most negated, and words that come farther away do not lie in the scope of negation of the cue. We define the negativity of a word as the chance that the meaning of that word is actually the opposite. So, we have a negation vector which has the highest value on the negation cue, and its value decreases with distance from the cue.
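A minimal sketch of such a negation vector: the value is 1.0 on the cue itself and decays for each following word. The cue list and the decay factor of 0.5 are illustrative assumptions, not the exact values used in the experiments.

```python
NEGATION_CUES = {"not", "no", "never", "nothing"}

def negation_vector(tokens, decay=0.5):
    # each word's "negativity": highest on a negation cue,
    # decaying with distance so far-away words fall out of scope
    vec = [0.0] * len(tokens)
    strength = 0.0
    for i, tok in enumerate(tokens):
        if tok in NEGATION_CUES:
            strength = 1.0  # reset to the maximum on each cue
        vec[i] = strength
        strength *= decay   # decays for the words that follow
    return vec

print(negation_vector("today is not a sunny day".split()))
# → [0.0, 0.0, 1.0, 0.5, 0.25, 0.125]
```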
  • #36 We perform our experiments by taking a combination of features and the negation technique to determine the best classification results. So first we tried out only unigrams, then unigrams + bigrams + trigrams, and so on. We used the Naive Bayes classifier because many researchers have reported that it is the fastest and most accurate. Also, we perform a 10-fold cross validation, which means that we break the dataset into 10 parts, keep aside 1 part for testing and train on the other 9. We repeat the process 10 times, with each of the 10 parts used exactly once for testing.
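The 10-fold splitting logic can be sketched in a few lines of plain Python (the classifier itself would come from a library such as NLTK's NaiveBayesClassifier; this only shows the fold mechanics):

```python
def k_fold_splits(data, k=10):
    # break the dataset into k parts; each part is used exactly once
    # for testing while the other k-1 parts are used for training
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(20))
for train, test in k_fold_splits(data, k=10):
    # with 20 items and 10 folds: 2 for testing, 18 for training
    assert len(test) == 2 and len(train) == 18
```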
  • #37 The accuracy of unigrams is the lowest. The accuracy increases if we also use negation detection or higher-order n-grams. The best result, however, uses unigrams, bigrams and trigrams without negation.
  • #38 We have also shown mean Precision vs. Recall values for the Naive Bayes classifier using all these methods, corresponding to the different classes. We find that both precision and recall are higher for negative classification than for positive. //This is an empirical result, so we can only hypothesize why this happened. In our dataset, tweets that contain a negation cue but are actually positive are very rare. That is, due to negation handling, the probability that a tweet with a negation cue will be classified as positive becomes so low that cue-containing tweets are almost always classified as negative. But on a dataset of, say, humorous or sarcastic tweets, negation detection would be very complicated, and we couldn't just infer a tweet to be negative because it contains negation cues.
  • #39 In conclusion, we used sentiment analysis as an example of QA and chose certain feature sets to identify which ones work best for our dataset. We found that for a mixed dataset with movie reviews, product reviews and other misc content, unigram + bigram + trigram without negation gives the best accuracy, which is 86.7%.
  • #42 One major question arises when we use unigrams: should we use the presence of a unigram as a feature, or its frequency? From the literature review, we found a paper which states that the presence of a unigram is a better feature for classification than its frequency. So binary-valued vectors, in which the entries just indicate whether a term occurs (value 1) or not (value 0), form a more effective basis for classification than vectors in which entries increase with the frequency of the corresponding term. Q. What did Pang and Lee do? //It is traditional in information retrieval to represent a piece of text as a feature vector wherein the entries correspond to individual terms. One influential finding in the sentiment-analysis area is as follows. Term frequencies have traditionally been important in standard IR, as the popularity of tf-idf weighting shows; but in contrast, Pang et al. [235] obtained better performance using presence rather than frequency. That is, binary-valued feature vectors in which the entries merely indicate whether a term occurs (value 1) or not (value 0) formed a more effective basis for review polarity classification than did real-valued feature vectors in which entry values increase with the occurrence frequency of the corresponding term. This finding may be indicative of an interesting difference between typical topic-based text categorization and polarity classification: while a topic is more likely to be emphasized by frequent occurrences of certain keywords, overall sentiment may not usually be highlighted through repeated use of the same terms. (We discussed this point previously in Section 3.2 on factors that make opinion mining difficult.)
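The presence-vs-frequency distinction can be made concrete with a small sketch (the vocabulary and example tweet below are made up for illustration):

```python
def count_vector(vocab, tokens):
    # entries increase with the occurrence frequency of each term
    return [tokens.count(term) for term in vocab]

def presence_vector(vocab, tokens):
    # binary-valued: 1 if the term occurs at all, else 0
    return [1 if term in tokens else 0 for term in vocab]

vocab = ["fantastic", "movie", "boring"]
tokens = "fantastic fantastic fantastic movie".split()
print(count_vector(vocab, tokens))     # → [3, 1, 0]
print(presence_vector(vocab, tokens))  # → [1, 1, 0]
```

Per the Pang et al. finding quoted above, the second (binary) representation worked better for polarity classification.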