Rob Procter
1. READING THE RIOTS ON TWITTER
Challenges of new social media for quantitative researchers
Rob Procter (University of Manchester)
Farida Vis (University of Leicester)
Alexander Voss (University of St Andrews)
http://www.analysingsocialmedia.org/
#readingtheriots
2. Methodology
• Development of computer-based tools for
sentiment and topic analysis of tweets is an
active area of research.
• Our methodology combines computer-based
tools with established content analysis
techniques in ways that are complementary to
their respective strengths.
3. Information Flows
• Any collection of tweets can be divided into tweets
that are ‘original’ and retweets.
• If we are interested in how Twitter is used to
communicate and share information, the only
reliable evidence that a tweet has been read is that it
has been retweeted.
• We use computational tools to group a tweet (the
parent) and its retweets (its children) into
information flows.
4. Information Flow Analysis
For N = 1 to CorpusMax
    InformationFlow[N-1] = {}
    If Corpus[N] == "RT @" . username . body
        (LevenshteinDistance, Parent) = LDMin(N-1, username, body)
        If LevenshteinDistance < 30
            InformationFlow[Parent] = InformationFlow[Parent] + Corpus[N]
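The pseudocode above could be sketched in Python roughly as follows. This is an illustrative reading, not the authors' tool: the `(username, text)` corpus layout, the retweet regular expression, and the choice to consider only earlier tweets by the named user as candidate parents are all assumptions. Only the distance threshold of 30 comes from the slide.

```python
import re

# Assumed retweet syntax: "RT @username: body" (the colon is optional).
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*(.*)$", re.DOTALL)
MAX_DISTANCE = 30  # threshold taken from the slide


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_information_flows(corpus):
    """Group retweets under their closest earlier parent tweet.

    corpus: list of (username, text) tuples in chronological order
            (hypothetical layout chosen for this sketch).
    Returns a dict mapping each tweet index to the indices of its retweets.
    """
    flows = {}
    for n, (_, text) in enumerate(corpus):
        flows[n] = []
        match = RT_PATTERN.match(text)
        if not match:
            continue  # an 'original' tweet
        rt_user, rt_body = match.groups()
        best = None  # (distance, parent_index)
        for p in range(n):  # only earlier tweets can be parents
            author, parent_text = corpus[p]
            if author.lower() != rt_user.lower():
                continue
            distance = levenshtein(rt_body, parent_text)
            if best is None or distance < best[0]:
                best = (distance, p)
        if best is not None and best[0] < MAX_DISTANCE:
            flows[best[1]].append(n)
    return flows


# Hypothetical mini-corpus for illustration only.
corpus = [
    ("alice", "Shops closed in Hackney tonight"),
    ("bob", "RT @alice: Shops closed in Hackney tonight"),
    ("carol", "RT @alice Shops closed in Hackney 2nite"),
    ("dave", "Nothing to do with the above"),
]
flows = build_information_flows(corpus)
```

Using an edit-distance threshold rather than exact matching lets the grouping tolerate retweets that were truncated or lightly edited, at the cost of a quadratic scan that would need indexing for a corpus of millions of tweets.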
5. Example Information Flows
Riots Corpus: 2.6M tweets, 700,000 accounts
• "Great sight in my #Birmingham where #Pakistani lads are protecting temples while Sikh lads protecting the mosques": 758
• "someone calling themselves #punchcroft has just posted Go on Hackney! Fuck the feds! #hackney Can we have them arrested for incitement pls?": 5
6. Coding Frames
• We use established methods of content
analysis to understand how Twitter was being
used in the context of topics we wished to
analyse.
• Inductively code information flows to develop
a ‘code frame’ to categorise topics and
examine relationships in context of a given
topic.
15. Sampling Issues
• Riots corpus selected from Twitter firehose
using set of hashtags:
– Sample may systematically exclude some
relevant data.
• Twitter users not representative of the
population as a whole:
– Younger, better off, better educated, urban
– How can we use profile info to counter bias?
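The first sampling issue can be illustrated with a small sketch of hashtag-based selection. The hashtag set and the example tweets are hypothetical, and the function is only a stand-in for whatever selection was actually applied to the firehose; the point is that a relevant tweet carrying no tracked hashtag never enters the corpus.

```python
import re

HASHTAG = re.compile(r"#(\w+)")


def select_by_hashtags(tweets, hashtags):
    """Split tweets into those carrying a tracked hashtag and the rest."""
    wanted = {h.lower().lstrip("#") for h in hashtags}
    selected, excluded = [], []
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if tags & wanted:
            selected.append(text)
        else:
            excluded.append(text)
    return selected, excluded


# Hypothetical tracked hashtags and tweets, for illustration only.
tracked = ["#londonriots", "#hackney"]
tweets = [
    "Police lines forming near the station #LondonRiots",
    "Shops boarded up all along the high street tonight",  # relevant, but untagged
    "Clean-up volunteers meeting at 9am #hackney",
]
selected, excluded = select_by_hashtags(tweets, tracked)
```

Here the second tweet is about the same events but is excluded, which is the systematic gap the slide warns about.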
16. Twitter Data APIs
• Twitter offers a number of different APIs
providing access to different sets of data.
• Differences are in terms of:
– Timescale
– Real-time vs. retrospective
– Completeness
– Functionality to specify subsets of tweets
17. Search and REST APIs
• Search API – unauthenticated use:
– Search by keyword, account, etc.
– No tweets older than about one week
– Rate limited, but details not published
• REST API – authenticated use, account centric:
– Retrieve tweets, profile data, friends & followers,
etc. and authenticated users’ direct messages
– Searching public tweets also possible
– Rate limited to 350 requests per hour
18. REST and Search Limitations
• Users can delete tweets.
• Twitter will retire tweets (depending on the
traffic an account generates).
• Rate limiting means it is difficult to
collect substantial corpora that are
complete.
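A rough back-of-envelope calculation shows why rate limiting matters at this scale. The 350 requests per hour figure is the REST API limit quoted in these slides; the number of tweets returned per request is an assumption made purely for illustration, not a documented value.

```python
import math


def hours_to_collect(total_tweets, tweets_per_request, requests_per_hour=350):
    """Lower bound on collection time under a fixed hourly request limit."""
    requests_needed = math.ceil(total_tweets / tweets_per_request)
    return requests_needed / requests_per_hour


# 2.6M tweets is the size of the riots corpus; 200 tweets per request
# is a hypothetical page size chosen for this example.
hours = hours_to_collect(2_600_000, tweets_per_request=200)
print(f"At least {hours:.1f} hours of continuous requests")
```

This is a best case: it assumes every request returns a full page of relevant tweets, and it ignores tweets that are deleted or retired while collection is still under way.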
19. Twitter Streaming API
• Twitter Streaming API allows streaming either a
random sample or tweets selected by keyword
(track), account (follow) or geo-location (but few
tweets are geo-coded).
• Track and follow can be rate-limited if too much
traffic is generated, but in such a way as to produce
a random sample (needs to be confirmed).
21. Reliability of Computer-based Tools
• If, for a given corpus sample, a computer-
based tool matches the performance of human
coders with a precision of y%:
– what is the estimated precision over the whole
corpus?
• How would a representative corpus sample be
specified?
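One conventional way to approach the first question, assuming the coded sample is a simple random sample of the corpus (which is exactly what the second question leaves open), is to put a confidence interval around the observed precision. The sketch below uses the Wilson score interval; the sample size and agreement count are hypothetical.

```python
import math


def wilson_interval(agreements, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = agreements / sample_size
    z2 = z * z
    denominator = 1 + z2 / sample_size
    centre = (p + z2 / (2 * sample_size)) / denominator
    half_width = (z * math.sqrt(p * (1 - p) / sample_size
                                + z2 / (4 * sample_size ** 2))) / denominator
    return centre - half_width, centre + half_width


# Hypothetical: the tool agreed with human coders on 85 of 100 sampled tweets.
low, high = wilson_interval(85, 100)
print(f"Estimated corpus-wide precision: {low:.3f} to {high:.3f}")
```

The interval narrows as the coded sample grows, but it says nothing about bias: if the sample is not representative of the corpus, the estimate can be wrong however tight the interval.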
Editor's Notes
The cloud economics argument from Amazon shows how traditional forms of providing computational and storage resources are either wasteful or risk customer dissatisfaction. Using a cloud model, the level of resource provision can be adapted to current demand. The St Andrews Cloud Collaboratory (StACC) is a private cloud (actually, more than one) that allows us to allocate resources to a research project when needed and release them for other uses when not needed for the project. This allows St Andrews to do more research per server room / watt / CO2.