1. Election 2010: The View from Twitter
Axel Bruns / Jean Burgess
ARC Centre of Excellence for Creative Industries and Innovation,
Brisbane
a.bruns@qut.edu.au – @snurb_dot_info
je.burgess@qut.edu.au – @jeanburgess
http://mappingonlinepublics.net – http://cci.edu.au/
Image by campoalto
2. Project: New Media and Public Communication
• ARC Discovery (2010-12) – A$410,000
• Axel Bruns (CI), Jean Burgess (SRF) – QUT, Brisbane
• Lars Kirchhoff, Thomas Nicolai (PIs) – Sociomantic Labs, Berlin
• Project blog: http://mappingonlinepublics.net/
Timeline: Year 1, Year 2, Year 3
• Social network sources: YouTube, Flickr, Twitter, blogs
• Research tools: network crawler, content scraper, content analysis, network analysis
• Research tool development and baseline data
• Baseline information: data extraction, content creation statistics, patterns in terms and themes, baseline social networking map, interconnections between social network spaces
• Content creation patterns
• Changes over time: short-term statistics, regular / seasonal patterns
• Cluster profiling: common themes / patterns, lead users
• Focus on specific events
• Cultural dynamics: rapid spread of new ideas, communication across clusters, thematic discourse analysis, relationship with mainstream media coverage
5. Data Processing – Twitter
• Tools:
• Gawk – Scripting tool for CSV processing (open source)
• Excel – Data aggregation, pivot tables and charts
• Leximancer / WordStat – Keyword extraction, co-occurrence matrices
• Gephi – Network analysis and visualisation (open source)
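To make the "co-occurrence matrices" step concrete, here is a minimal sketch in Python (not one of the tools listed above) that counts how many tweets contain each pair of keywords. The function name, the keyword list and the sample tweets are made up for illustration; Leximancer and WordStat use their own, more sophisticated methods.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurrence(tweets, keywords):
    """Count, for each pair of keywords, the number of tweets containing both."""
    counts = Counter()
    for tweet in tweets:
        # crude tokenisation: lower-case runs of letters, digits, _, # and @
        words = set(re.findall(r"[a-z0-9_#@]+", tweet.lower()))
        present = sorted(k for k in keywords if k in words)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# hypothetical sample data, not taken from the #ausvotes archive
tweets = [
    "Gillard on broadband and the filter #ausvotes",
    "Broadband policy and filter debate #ausvotes",
    "Gillard campaign launch #ausvotes",
]
keywords = ["gillard", "broadband", "filter"]

for pair, n in sorted(cooccurrence(tweets, keywords).items()):
    print(pair, n)
```

Each pair is stored in alphabetical order, so the result is the upper triangle of a symmetric matrix.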
# Extract @replies for network visualisation
#
# this script takes a CSV archive of tweets, and reworks it into network data for visualisation
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
# geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# from,to,tweet,time,timestamp
#
# the script extracts @replies from tweets, and creates duplicates where multiple @replies are
# present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in one
# line starting user,one,@one @two hello and another starting user,two,@one @two hello
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au
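# run with gawk -F , so that fields are split on commas
BEGIN {
    print "from,to,tweet,time,timestamp"
}
# only process lines whose tweet text (field 1) contains at least one @mention
$1 ~ /@[A-Za-z0-9_]+/ {
    rest = $1
    # emit one output line per @mention, consuming the tweet text from left to right
    while (match(rest, /@([A-Za-z0-9_]+)/, atArray)) {
        print $3 "," atArray[1] "," $1 "," $12 "," $13
        rest = substr(rest, RSTART + RLENGTH)
    }
}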
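The from,to pairs written by the script above can be collapsed into weighted edges before they are loaded into a network tool. The following is a minimal Python sketch of that step, not part of the original tool chain. The function name and the sample lines (including the time values) are invented for illustration, and the Source,Target,Weight header is an assumption about what the network tool's import expects.

```python
from collections import Counter

def weighted_edges(lines):
    """Count how often each (from, to) pair occurs in the script's output."""
    counts = Counter()
    for line in lines[1:]:                      # skip the "from,to,tweet,..." header
        # split off only the first two fields, so commas in the tweet text do no harm
        parts = line.rstrip("\n").split(",", 2)
        if len(parts) < 2:
            continue                            # ignore blank or malformed lines
        # Twitter usernames are case-insensitive, so normalise them
        sender, recipient = parts[0].lower(), parts[1].lower()
        counts[(sender, recipient)] += 1
    return counts

# hypothetical sample output; the time values are made up
sample = [
    "from,to,tweet,time,timestamp",
    "user,one,@one @two hello,Sat Aug 21 2010,1282348800",
    "user,two,@one @two hello,Sat Aug 21 2010,1282348800",
    "User,one,@one again,Sat Aug 21 2010,1282348860",
]

print("Source,Target,Weight")
for (sender, recipient), weight in sorted(weighted_edges(sample).items()):
    print(f"{sender},{recipient},{weight}")
```

Counting repeated pairs keeps the edge list small and lets edge weight reflect how often one user addressed another.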
# filter.awk - Filter list of tweets
#
# this script takes a CSV or other list of tweets, and removes any lines that don't match the
# search criteria
# the script preserves the first line, expecting that it contains header information
#
# script expects command-line argument search={searchcriteria} _before_ the input CSV filename
# enclose the search term in quotation marks if it contains any special characters
#
# e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
#
# expected data format:
# CSV or simple list of tweets, line-by-line
#
# output format:
# same as above, listing only the tweets that match the search criteria
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au
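BEGIN {
    # read and print the first line (header) before regular processing starts
    getline
    print $0
}
# print every remaining line that matches the search term; lines are
# lower-cased first, so the search term should be given in lower case
tolower($0) ~ search {
    print $0
}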
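The filter's behaviour (header kept, remaining lines lower-cased and matched against the search term) can be mirrored in a few lines of Python, which also shows why an upper-case search term never matches. This is only a sketch: the function name and sample lines are invented, and Python's regular expression syntax is assumed to be close enough to Gawk's for simple patterns such as the one in the example above.

```python
import re

def filter_tweets(lines, search):
    """Keep the header line plus every line whose lower-cased text matches search."""
    if not lines:
        return []
    pattern = re.compile(search)
    kept = [lines[0]]                     # the header is preserved unconditionally
    kept += [line for line in lines[1:] if pattern.search(line.lower())]
    return kept

# hypothetical sample data in a shortened column layout
sample = [
    "text,to_user_id,from_user",
    "Julia Gillard speaks in Brisbane #ausvotes,,user_a",
    "Broadband policy debate #ausvotes,,user_b",
]

print(filter_tweets(sample, "(julia|gillard)"))   # header plus the first tweet
print(filter_tweets(sample, "(Julia|Gillard)"))   # header only
```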
17. Notes and Limitations
• Twapperkeeper relies on #hashtags
• Problem if #hashtags are inconsistent/unclear
• Follow-on @replies and retweets may not continue to use #hashtags
• Casual commenters may not use #hashtags in the first place
• May miss early developments – e.g. #hashtag standardisation
• Twitter as a subset of society:
• Broadband policy and Internet filter are overrepresented, asylum seekers underrepresented
• #hashtag use is a further sign of self-selection
• Need to look to Twitter firehose for more comprehensive picture
• Need to track baseline activity to understand how exceptional #ausvotes was
• See more at mappingonlinepublics.net – up next: time-based animations...
• Or find us at @snurb_dot_info and @jeanburgess