This document provides information on analyzing Twitter data for social science and humanities research. It discusses how to gather Twitter data using tools like Mozdeh, and what types of analysis can be done, including time series analysis to identify trends over time, sentiment analysis to determine high sentiment topics, and comparing differences between subsets by gender or keywords. Content analysis of tweets can provide quick insights into research topics. The document also describes the SentiStrength algorithm for detecting sentiment in informal short text.
2. Twitter Data: Quick and Slow
1. Quick: http://topsy.com search, then click
2. Slow: Download Mozdeh
http://mozdeh.wlv.ac.uk,
collect data in advance
then analyse
Graph: tweets per hour including ‘Nobel’
Graph: tweets per day including ‘Nobel’
3. Gathering Twitter data
Tweets can be obtained via Mozdeh
Also: Chorus http://www.chorusanalytics.co.uk/
NodeXL and other software
Must be gathered in near to real time
Not available after about 2 weeks
Gather using keyword or phrase
searches
If historical tweets needed, buy from a
data provider
4. Twitter Time Series Analysis
with Mozdeh: Step by Step
Download free from http://mozdeh.wlv.ac.uk for Windows
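As a rough illustration of what a keyword time series is (this is not Mozdeh's own code), the sketch below counts tweets containing a keyword in each hour. The record layout, the function name and the sample tweets are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(tweets, keyword):
    """Count tweets containing `keyword`, bucketed by hour.

    `tweets` is an iterable of (iso_timestamp, text) pairs; this layout
    is an assumption made for the example, not a Mozdeh export format.
    """
    counts = Counter()
    for timestamp, text in tweets:
        if keyword.lower() in text.lower():
            # Truncate the timestamp to the start of its hour.
            hour = datetime.fromisoformat(timestamp).replace(
                minute=0, second=0, microsecond=0
            )
            counts[hour] += 1
    return dict(sorted(counts.items()))


# Made-up sample data, for illustration only.
sample = [
    ("2013-10-08T10:05:00", "Nobel prize in physics announced"),
    ("2013-10-08T10:40:00", "Who won the nobel this year?"),
    ("2013-10-08T11:15:00", "Lunch time"),
    ("2013-10-08T11:20:00", "Congratulations to the Nobel winners"),
]
hourly = tweets_per_hour(sample, "Nobel")
```

Plotting the resulting counts against time gives the kind of graph used to spot peaks; bucketing by day instead only requires truncating the timestamp to the date.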
17. Co-word analysis
An automatic method to compare different subsets of tweets for key differences
Identifies words that are relatively frequent for one keyword (or gender) compared to another
18. The above example will compare the
words in tweets containing book and
read by gender
19. Words more common in male tweets containing “book”
than in female tweets containing “book”
21. Words more common in ? tweets than in ? Tweets
(male or female)
A statistical measure of the significance
of the difference between genders
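The slides do not say which statistical measure is used, so the sketch below assumes a 2x2 chi-square statistic as one common choice for comparing how often a word occurs in two subsets of tweets. The function name and the sample tweets are hypothetical.

```python
def word_difference_chi2(tweets_a, tweets_b, word):
    """Chi-square statistic for a 2x2 table: subset (A/B) x word present/absent.

    Assumption: a plain chi-square without continuity correction; the
    measure actually used in the talk is not specified.
    """
    a = sum(word in t.lower().split() for t in tweets_a)  # A, word present
    b = len(tweets_a) - a                                 # A, word absent
    c = sum(word in t.lower().split() for t in tweets_b)  # B, word present
    d = len(tweets_b) - c                                 # B, word absent
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column total is zero, so no comparison is possible.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Made-up subsets, for illustration only.
subset_a = ["reading a novel tonight", "this novel is great",
            "new novel out", "book club"]
subset_b = ["love this book", "novel idea",
            "my book arrived", "book fair"]
chi2 = word_difference_chi2(subset_a, subset_b, "novel")
```

Computing this value for every word and sorting in descending order would give a ranked list of the words that most distinguish one subset from the other.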
22. Twitter research ideas
Monitor keyword searches for a topic
Analysis ideas
Time series – identify trends & causes of
any peaks
Content analysis of random sample tweets –
why is the topic tweeted?
Gender/sentiment/high frequency keywords
Network analysis of tweeters/tweeting &
qualitative analysis of key tweeters – who is
tweeting and who is successful?
23. How and why scholars cite on
Twitter – Priem & Costello
Example of a Twitter study using interviews
and content analysis of tweets
Gathered and analysed a sample of tweets
sent by scholars
Concluded that: “Twitter citations are also
uniquely conversational, reflecting a broader
discussion crossing traditional disciplinary
boundaries.”
24. High frequency word analysis
An analysis of high frequency words for
Nobel prizes found tweets about:
Alternative winners’ names: non-academic prizes only
Gender references: female winner only (9% mentioned her gender)
Expressions of sentiment: literature prize only (42% positive)
A quick analysis can give new insights.
25. Sentiment Strength Detection in
the Social Web - SentiStrength
• Detect positive and negative sentiment
strength in short informal text
Develop workarounds for lack of standard
grammar and spelling
Harness emotion expression forms unique to
MySpace or CMC (e.g., :-) or haaappppyyy!!!)
Classify simultaneously as positive 1-5 AND
negative 1-5 sentiment
Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web.
Journal of the American Society for Information Science and Technology, 63(1), 163-173.
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text.
Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.
26. SentiStrength Algorithm - Core
List of 2,489 positive and negative
sentiment term stems and strengths (1
to 5), e.g.
ache = -2, dislike = -3, hate = -4,
excruciating = -5
encourage = 2, coolest = 3, lover = 4
A sentence's sentiment strength is that of its strongest term; a text with multiple sentences takes the strength of its strongest sentence
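27. Examples: term scores, then the (positive, negative) result
My legs ache. Term scores: -2. Result: 1, -2
You are the coolest. Term scores: 3. Result: 3, -1
I hate Paul but encourage him. Term scores: -4, 2. Result: 2, -4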
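The core rule can be sketched as follows. This is a minimal illustration, not the SentiStrength implementation: it uses only the seven terms quoted on the slide, matches whole words rather than stems, and the function names are made up for the example.

```python
import re

# Term strengths quoted on the slide; the full list has 2,489 stems.
LEXICON = {
    "ache": -2, "dislike": -3, "hate": -4, "excruciating": -5,
    "encourage": 2, "coolest": 3, "lover": 4,
}


def sentence_scores(sentence):
    """Return (positive, negative) for one sentence.

    Positive runs from 1 to 5 and negative from -1 to -5; 1 and -1 mean
    that no sentiment term of that polarity was found.
    """
    positive, negative = 1, -1
    for word in re.findall(r"[a-z']+", sentence.lower()):
        strength = LEXICON.get(word, 0)
        positive = max(positive, strength)  # strongest positive term
        negative = min(negative, strength)  # strongest negative term
    return positive, negative


def text_scores(text):
    """Score a text as its strongest sentence for each polarity."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_scores(s) for s in sentences] or [(1, -1)]
    return max(p for p, _ in scores), min(n for _, n in scores)


ache = text_scores("My legs ache.")
coolest = text_scores("You are the coolest.")
mixed = text_scores("I hate Paul but encourage him.")
```

Keeping the positive and negative scores separate is what allows a mixed sentence such as the third example to register both sentiments at once.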
28. Extra sentiment methods
spelling correction: nicce -> nice
booster words alter strength: very happy
negating words flip emotions: not nice
repeated letters boost sentiment/+ve: niiiice
emoticon list: :) = +2
exclamation marks count as +2 unless -ve: hi!
repeated punctuation boosts sentiment: good!!!
negative emotion ignored in questions: u h8 me?
sentiment idiom list: shock horror = -2
Online at http://sentistrength.wlv.ac.uk/
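Three of these rules (booster words, negating words and repeated letters) might be sketched as below. Everything numeric here is an assumption made for illustration: the strengths given to "nice" and "happy", the +1 size of each boost, and the crude letter-collapsing step, which would mishandle words with legitimate double letters. It does not reproduce SentiStrength's actual rules.

```python
import re

LEXICON = {"nice": 2, "happy": 2, "hate": -4}  # nice/happy strengths are assumed
BOOSTERS = {"very": 1}                          # assumed boost of +1
NEGATORS = {"not"}


def squeeze(word):
    """Collapse runs of 3+ identical letters to one letter.

    Returns the collapsed word and whether any run was collapsed.
    """
    squeezed = re.sub(r"([a-z])\1{2,}", r"\1", word)
    return squeezed, squeezed != word


def term_strengths(text):
    """Return the adjusted strength of each sentiment term in `text`."""
    strengths = []
    boost, negate = 0, False
    for raw in re.findall(r"[a-z]+", text.lower()):
        word, emphasised = squeeze(raw)
        if word in BOOSTERS:
            boost += BOOSTERS[word]
            continue
        if word in NEGATORS:
            negate = True
            continue
        strength = LEXICON.get(word, 0)
        if strength:
            # Boosters and repeated letters raise the magnitude, capped at 5.
            magnitude = min(abs(strength) + boost + (1 if emphasised else 0), 5)
            sign = 1 if strength > 0 else -1
            if negate:
                sign = -sign  # a negating word flips the polarity
            strengths.append(sign * magnitude)
        # Modifiers only apply to the word that follows them.
        boost, negate = 0, False
    return strengths


boosted = term_strengths("very happy")
negated = term_strengths("not nice")
stretched = term_strengths("niiiice")
```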
29. Tests against human coders
Correlation with humans, by data set (positive scores, negative scores):
YouTube: 0.589, 0.521
MySpace: 0.647, 0.599
Twitter: 0.541, 0.499
Sports forum: 0.567, 0.541
Digg.com news: 0.352, 0.552
BBC forums: 0.296, 0.591
All 6 data sets: 0.556, 0.565
SentiStrength agrees with humans as much as they agree with each other
1 is perfect agreement, 0 is random agreement
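The slide does not state which correlation coefficient was used; assuming Pearson's r, agreement between program scores and human scores could be computed as in this sketch. The score lists are made up and are not taken from the data sets above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical positive-strength scores for five texts.
human_scores = [1, 3, 2, 5, 4]
program_scores = [1, 2, 2, 4, 4]
agreement = pearson(human_scores, program_scores)
```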
30. Why the bad results for BBC?
(and Digg)
Irony, sarcasm and expressive language
e.g.,
David Cameron must be very happy that I
have lost my job.
It is really interesting that David Cameron
and most of his ministers are millionaires.
Your argument is a joke.
31. Example – sentiment in major
media events
Analysis of a corpus of 1 month of English
Twitter posts (35 Million, from 2.7M accounts)
Automatic detection of spikes (events)
Assessment of whether sentiment changes
during major media events
32. Automatically-identified Twitter
spikes
Graph: proportion of tweets mentioning keyword, 9 Feb 2010 to 9 Mar 2010
Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events.
Journal of the American Society for Information Science and Technology, 62(2), 406-418.
33. Chile
Graphs by date and time, 9 Feb 2010 to 9 Mar 2010:
Proportion of tweets mentioning Chile
Av. +ve sentiment and av. -ve sentiment (just subj.)
Increase in -ve sentiment strength
34. #oscars
Graphs by date and time, 9 Feb 2010 to 9 Mar 2010:
Proportion of tweets mentioning the Oscars
Av. +ve sentiment and av. -ve sentiment (just subj.)
Increase in -ve sentiment strength
35. Sentiment and spikes
Statistical analysis of top 30 events:
Strong evidence that higher volume hours
have stronger negative sentiment than
lower volume hours
No evidence that higher volume hours
have different positive sentiment strength
than lower volume hours
=> Spikes are typified by small increases
in negativity
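The comparison behind this finding can be illustrated descriptively. The sketch below only contrasts average negative strength in higher and lower volume hours; it is not the statistical test used in the study, and the median split, the function name and the hourly figures are all assumptions made for the example.

```python
from statistics import mean, median


def negativity_by_volume(hours):
    """Compare mean negative strength in higher vs lower volume hours.

    `hours` is a list of (tweet_count, mean_negative_strength) pairs,
    with strength given as a magnitude from 1 to 5. Hours are split at
    the median tweet count (an assumed cut-off).
    """
    cutoff = median(count for count, _ in hours)
    high = [strength for count, strength in hours if count > cutoff]
    low = [strength for count, strength in hours if count <= cutoff]
    if not high or not low:
        raise ValueError("hours cannot be split into two volume groups")
    return mean(high), mean(low)


# Made-up hourly figures, for illustration only.
hourly = [(100, 1.3), (120, 1.3), (500, 1.6), (450, 1.5)]
high_mean, low_mean = negativity_by_volume(hourly)
```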
36. Summary
Tweets gathered free with Mozdeh
Search tweets and conduct content analysis
Time series analysis graphs – trends over time
Identify topics causing high sentiment
Identify differences by gender
Identify important topics by high freq. keywords
Identify important topic differences by co-words
Pilot test your ideas first
Gather tweets in advance for a period of time
Can give quick insights into your research
goals – or can be a primary research method
Editor's Notes
Twitter Analysis for the Social Sciences and Humanities
This talk will describe how to download and analyse tweets using the free software Mozdeh. Mozdeh allows users to enter a set of
queries and then collects tweets matching these queries for as long as needed. Once the collection period is complete, Mozdeh offers a
range of different quantitative analysis methods, from simple to complex. These methods include graphs of changes over time,
sentiment analysis, simple gender analysis and various types of word frequency analysis. Together, the methods can quickly identify
themes within the tweets and compare the content of different topics within them. The talk will demonstrate Mozdeh, describe its
main analysis methods and give examples of Twitter investigations.