CHARACTERISING TWITTER REPLY CHAINS IN
THE AUSTRALIAN TWITTERSPHERE
Brenda Moon
Digital Media Research Centre
Queensland University of Technology
brenda.moon@qut.edu.au, @brendam
Reply trees or reply cascades
Reply tree with all tweets in TrISMA
dataset
Reply tree with missing tweet
Reply tree with tweets outside
TrISMA dataset
Finding reply trees – previous approach
Seed tweet in_reply_to
tweet_id
in_reply_to
tweet_id
follow reply
chain back in
time using
‘in_reply_to’
field which
contains the
tweet_id of the
replied to
tweet
searching for
tweet_id in
‘in_reply_to’
field
Finding reply trees – new approach
• Set a unique reply_tree_id for each tweet in a reply tree by
working through all tweets in dataset in time order starting
from oldest
• reply_tree_id is set to the tweet_id of the originating tweet
(originating tweets don’t have in_reply_to set) if it is in the
TrISMA dataset, otherwise to the in_reply_to field of the
oldest tweet in the reply tree in the TrISMA dataset.
• The reply_tree_id may be a tweet id that is outside the
TrISMA dataset.
• For each reply tweet, can find reply_tree_id by looking up the
tweet it is in reply.
Results based on new approach
Identified
• 383 million tweets as being part of reply trees in the
whole TrISMA dataset (20.4% of all tweets)
– Of these, 56 million (14.6%) are replying to tweets inside
the TrISMA dataset and 323 million are replying to external
tweets
• These reply to 322 million unique tweets
– of which 47 million (14.6%) are in the TrISMA dataset and
275 million are external.
Histogram of reply tree sizes
Structure of reply trees: single layer vs
multiple layers
• Using ratio of reply tweets to reply target tweets as
indicator.
– 1/n (approaching 0) is single layer – all tweets replying to
single original tweet
– 1 is linear – each tweet replies to previous one in the tree
Ratio of unique reply targets to total replies by reply tree size
(count of in_reply_to)
(largest trees filtered, min tree size 10)
Largest reply tree
• Largest reply tree has 11,656
tweets replying to 41 different
tweets – nearly all the tweets reply
to an original tweet by a celebrity
(original tweet has been deleted
and is outside TrISMA dataset)
• single layer tree
• Replies like:
– @NiallOfficial can't wait it's amazing!!
– @NiallOfficial follow me please!!!!
– @NiallOfficial follow me!
Reply tree with most unique replies
• A single conversation between two accounts with
230 tweets has the most unique reply targets of all
the reply trees
• There are 180 unique reply targets
• The conversation continues from 30/8/2010 to
27/10/2010
• Looks like a private conversation, range of day to day
topics.
Limitations
• The initial results show that 65% of the replied-to tweets
are not in the TrISMA dataset, which is limited to tweets
from Australian accounts. Although it is possible to
retrieve these tweets using the Twitter API to find the
older tweets in the reply tree, it is not possible to go the
other way and find external tweets that reply to tweets
in the TrISMA dataset except by manually using the
Twitter client interface.
• I focus on the Australian Twittersphere so I exclude these
external tweets, even though that will reduce the lengths
of some of the reply chains.
Further work
Now that all the reply trees in the TrISMA database have been
identified it is possible to:
• continue with the analysis of the reply trees, looking at different
metrics for reply trees and variations over time.
• evaluate the utility of reply tree metrics in categorising different
conversational patterns. This may make it possible to
computationally distinguish types of conversation, for example
single-authored threads, rapid and heated arguments, extended
chats, multi-user responses to a single controversial tweet.
• apply the reply trees for large-scale social media discourse analysis
and in computational filtering of ‘big social data’ into smaller
samples for analysis.
http://mappingonlinepublics.net/
@brendam
@socialmediaQUT – http://socialmedia.qut.edu.au/
@qutdmrc – https://www.qut.edu.au/research/dmrc
This research is funded by the Australian Research Council through Future Fellowship and LIEF grants FT130100703 and
LE140100148.

Characterising Twitter reply chains in the Australian twittersphere

  • 1.
    CHARACTERISING TWITTER REPLYCHAINS IN THE AUSTRALIAN TWITTERSPHERE Brenda Moon Digital Media Research Centre Queensland University of Technology brenda.moon@qut.edu.au, @brendam
  • 2.
    Reply trees orreply cascades
  • 3.
    Reply tree withall tweets in TrISMA dataset
  • 4.
    Reply tree withmissing tweet
  • 5.
    Reply tree withtweets outside TrISMA dataset
  • 6.
    Finding reply trees– previous approach Seed tweet in_reply_to tweet_id in_reply_to tweet_id follow reply chain back in time using ‘in_reply_to’ field which contains the tweet_id of the replied to tweet searching for tweet_id in ‘in_reply_to’ field
  • 7.
    Finding reply trees– new approach • Set a unique reply_tree_id for each tweet in a reply tree by working through all tweets in dataset in time order starting from oldest • reply_tree_id is set to the tweet_id of the originating tweet (originating tweets don’t have in_reply_to set) if it is in the TrISMA dataset, otherwise to the in_reply_to field of the oldest tweet in the reply tree in the TrISMA dataset. • The reply_tree_id may be a tweet id that is outside the TrISMA dataset. • For each reply tweet, can find reply_tree_id by looking up the tweet it is in reply.
  • 8.
    Results based onnew approach Identified • 383 million tweets as being part of reply trees in the whole TrISMA dataset (20.4% of all tweets) – Of these, 56 million (14.6%) are replying to tweets inside the TrISMA dataset and 323 million are replying to external tweets • These reply to 322 million unique tweets – of which 47 million (14.6%) are in the TrISMA dataset and 275 million are external.
  • 9.
  • 10.
    Structure of replytrees: single layer vs multiple layers • Using ratio of reply tweets to reply target tweets as indicator. – 1/n (approaching 0) is single layer – all tweets replying to single original tweet – 1 is linear – each tweet replies to previous one in the tree
  • 11.
    Ratio of uniquereply targets to total replies by reply tree size (count of in_reply_to) (largest trees filtered, min tree size 10)
  • 12.
    Largest reply tree •Largest reply tree has 11,656 tweets replying to 41 different tweets – nearly all the tweets reply to an original tweet by a celebrity (original tweet has been deleted and is outside TrISMA dataset) • single layer tree • Replies like: – @NiallOfficial can't wait it's amazing!! – @NiallOfficial follow me please!!!! – @NiallOfficial follow me!
  • 13.
    Reply tree withmost unique replies • A single conversation between two accounts with 230 tweets has the most unique reply targets of all the reply trees • There are 180 unique reply targets • The conversation continues from 30/8/2010 to 27/10/2010 • Looks like a private conversation, range of day to day topics.
  • 14.
    Limitations • The initialresults show that 65% of the replied-to tweets are not in the TrISMA dataset, which is limited to tweets from Australian accounts. Although it is possible to retrieve these tweets using the Twitter API to find the older tweets in the reply tree, it is not possible to go the other way and find external tweets that reply to tweets in the TrISMA dataset except by manually using the Twitter client interface. • I focus on the Australian Twittersphere so I exclude these external tweets, even though that will reduce the lengths of some of the reply chains.
  • 15.
    Further work Now thatall the reply trees in the TrISMA database have been identified it is possible to: • continue with the analysis of the reply trees, looking at different metrics for reply trees and variations over time. • evaluate the utility of reply tree metrics in categorising different conversational patterns. This may make it possible to computationally distinguish types of conversation, for example single-authored threads, rapid and heated arguments, extended chats, multi-user responses to a single controversial tweet. • apply the reply trees for large-scale social media discourse analysis and in computational filtering of ‘big social data’ into smaller samples for analysis.
  • 16.
    http://mappingonlinepublics.net/ @brendam @socialmediaQUT – http://socialmedia.qut.edu.au/ @qutdmrc– https://www.qut.edu.au/research/dmrc This research is funded by the Australian Research Council through Future Fellowship and LIEF grants FT130100703 and LE140100148.

Editor's Notes

  • #3 While working on this I have realised that reply chain is a misleading name, and changed to using reply trees or reply cascades.
  • #10 This shows that a lot of the conversation in Australia happens with accounts outside of Australia, and that most of the reply trees are of the deep, ‘thread’ type.
  • #11 Most reply trees are very short small (note that y axis is logarithmic scale) – log log graph is almost straight line
  • #13 Larger reply trees tend to be created by many replies to few tweets.