This document summarizes the work of Associate Professors Axel Bruns and Jean Burgess on developing new methods for analyzing social media data, with a focus on Twitter metrics and analysis. It outlines their work mapping online publics and analyzing large-scale Twitter data to understand issue-based hashtag communities and the Australian Twitter sphere. It also discusses challenges around data access, research ethics, and integrating computational analysis with traditional social research methods.
Notes towards the Scientific Study of Public Communication on Twitter
1. Notes towards the Scientific Study of Public Communication on Twitter
Associate Professor Axel Bruns
Associate Professor Jean Burgess
@snurb_dot_info | @jeanburgess
http://mappingonlinepublics.net/
Queensland University of Technology
3. CCI INTERNET AND SOCIETY PROJECTS
MEDIA ECOLOGIES & METHODOLOGICAL INNOVATION
(Axel Bruns, Jean Burgess, Tim Highfield et al.)
Development of new interdisciplinary methods (especially
computational methods) for media and communication studies, in order
to better map, track and analyse the changing media environment.
MAPPING ONLINE PUBLICS
- Large-scale blogosphere mapping and (mostly) Twitter analysis
- Crisis, politics, culture – e.g. #qldfloods #ausvotes #royalwedding
- Comprehensive project website and blog at http://mappingonlinepublics.net/
5. ‘BIG DATA’ & INTERNET RESEARCH
• Big Data as currency across the sciences and social sciences
• Business intelligence & ‘data markets’ (including social media &
online behavioral data)
• ‘Computational turn’ in new humanities research: shift from
computational tools to a new computational paradigm, changing the
ontologies and epistemologies of humanities research (Berry, 2012)
• E.g. shift from ‘close’ to ‘distant’ reading (Moretti); ‘software studies’
(e.g. Fuller, 2008) and ANT approaches to new media platforms
• From ‘virtual’/traditional social research methods to ‘natively’ digital
methods to diagnose patterns of social change (Rogers, 2009)
6. DATA-DRIVEN SOCIAL MEDIA RESEARCH
• Twitter – an emergent and dynamic site of public communication
• Wide range of quantitative and qualitative methods in humanities &
social sciences to understand cultures of use, power relations etc.
• Computer/information science – data-driven, large-scale
‘computational’ approaches (e.g. SNA, diffusion of information, etc.)
• Data-driven, ‘natively’ digital methods within humanities-oriented
media and communication studies proving to be very productive
• Need for shared basic metrics:
– For comparative work (across topics & events; across international
research teams)
– As a baseline for development of mixed-methods research
7. DEVELOPING TWITTER METRICS
• Key data points available through the Twitter API:
– text: contents of the tweet itself, in 140 characters or less
– to_user_id: numerical ID of the tweet recipient (for @replies)
– from_user: screen name of the tweet sender
– id: numerical ID of the tweet itself
– from_user_id: numerical ID of the tweet sender
– iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
– source: client software used to tweet (e.g. Web, Tweetdeck, ...)
– profile_image_url: URL of the tweet sender’s profile picture
– geo_type: format of the sender’s geographical coordinates
– geo_coordinates_0: first element of the geographical coordinates
– geo_coordinates_1: second element of the geographical coordinates
– created_at: tweet timestamp in human-readable format
– time: tweet timestamp as a numerical Unix timestamp
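A minimal sketch of how one archived tweet with these data points might be represented for analysis. The field names are taken from the list above; the `TweetRecord` class and the `parse_row` helper are hypothetical names introduced only for illustration, and the sketch assumes each tweet arrives as a dict of strings (for example, a row read by `csv.DictReader`). The profile image and geo fields are left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TweetRecord:
    """One archived tweet (hypothetical container, field names from the slide)."""
    id: int
    text: str
    from_user: str
    from_user_id: int
    time: int                          # numerical Unix timestamp, in seconds
    to_user_id: Optional[int] = None   # only present for @replies
    iso_language_code: str = ""
    source: str = ""

    @property
    def created(self) -> datetime:
        """The Unix timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.time, tz=timezone.utc)


def parse_row(row: dict) -> TweetRecord:
    """Build a TweetRecord from a dict of strings (assumed input shape)."""
    to_id = row.get("to_user_id") or ""
    return TweetRecord(
        id=int(row["id"]),
        text=row["text"],
        from_user=row["from_user"],
        from_user_id=int(row["from_user_id"]),
        time=int(row["time"]),
        to_user_id=int(to_id) if to_id else None,
        iso_language_code=row.get("iso_language_code", ""),
        source=row.get("source", ""),
    )
```

Converting `time` rather than the human-readable `created_at` avoids having to assume a particular date format.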
8. DEVELOPING TWITTER METRICS
• Additional data points from tweets:
– original tweets: tweets which are neither @reply nor retweet
– retweets: tweets which contain RT @user… (or similar)
• unedited retweets: retweets which start with RT @user…
• edited retweets: retweets which do not start with RT @user…
– genuine @replies: tweets which contain @user, but are not retweets
– URL sharing: tweets which contain URLs
• Potential uses:
– metrics per hashtag
– metrics per timeframe (day, hour, minute, second, …)
– metrics per user (or group of users)
– …
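The tweet categories and the per-user metrics above could be derived from the tweet text roughly as follows. This is a sketch under stated assumptions, not the project's actual implementation: it only recognises the `RT @user` convention (the "or similar" variants mentioned on the slide are not covered), it treats any `@user` mention in a non-retweet as a genuine @reply, and the function names and regular expressions are illustrative choices.

```python
import re
from collections import Counter, defaultdict

# Assumed patterns; real tweets contain more variants than these cover.
RT_PATTERN = re.compile(r"\bRT @\w+", re.IGNORECASE)
MENTION_PATTERN = re.compile(r"@\w+")
URL_PATTERN = re.compile(r"https?://\S+")


def classify(text: str) -> str:
    """Assign one of the four tweet types listed on the slide."""
    stripped = text.lstrip()
    if RT_PATTERN.search(stripped):
        # Unedited retweets start with RT @user; edited ones add text before it.
        if RT_PATTERN.match(stripped):
            return "unedited retweet"
        return "edited retweet"
    if MENTION_PATTERN.search(stripped):
        return "genuine @reply"
    return "original"


def has_url(text: str) -> bool:
    """True if the tweet shares at least one URL."""
    return URL_PATTERN.search(text) is not None


def metrics_per_user(tweets):
    """Count tweet types and URL sharing for each (from_user, text) pair."""
    metrics = defaultdict(Counter)
    for user, text in tweets:
        metrics[user][classify(text)] += 1
        if has_url(text):
            metrics[user]["URL sharing"] += 1
    return dict(metrics)
```

The same counting loop could be keyed by hashtag or by a truncated timestamp instead of by user to obtain the other metrics listed.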
12. TOWARDS A TYPOLOGY OF TWITTER USES
• How are hashtags used (during acute events)?
– Gatewatching:
• Finding and sharing information about breaking news (before the
mainstream media do?)
• Ad hoc publics: many URLs, many retweets (even unedited)
– Audiencing:
• Shared experience of major (foreseen) events
• Imagined community of fellow participants: few URLs, limited retweeting
• What other uses are there?
– Continuing discussions (#auspol, #bundesliga, …)
– Memes (#ghettohurricanenames, …)
– Emotive hashtags (#fail, #win, #headdesk, …)
– What about keywords?
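The contrast drawn above (many URLs and retweets for gatewatching, few URLs and limited retweeting for audiencing) suggests comparing hashtags by the share of their tweets that contain URLs or are retweets. The following sketch only computes those two shares per hashtag; it does not decide which type a hashtag belongs to, since the slide gives no thresholds. The function name and regular expressions are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed patterns for illustration only.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
RETWEET = re.compile(r"\bRT @\w+", re.IGNORECASE)


def hashtag_profile(texts):
    """Return tweet count, URL share and retweet share for each hashtag."""
    counts = defaultdict(lambda: {"tweets": 0, "urls": 0, "retweets": 0})
    for text in texts:
        # Count each hashtag once per tweet, ignoring case.
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        for tag in tags:
            c = counts[tag]
            c["tweets"] += 1
            c["urls"] += bool(URL.search(text))
            c["retweets"] += bool(RETWEET.search(text))
    return {
        tag: {
            "tweets": c["tweets"],
            "url_share": c["urls"] / c["tweets"],
            "retweet_share": c["retweets"] / c["tweets"],
        }
        for tag, c in counts.items()
    }
```

Interpreting the resulting shares still requires comparison across several hashtags and events.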
13. BEYOND HASHTAGS
• Publics on Twitter:
– Micro: @reply and retweet conversations
– Meso: hashtag ‘communities’
– Macro: follower/followee networks
Multiple overlapping publics / networks
• What drives their formation and dissipation?
• How do they interact and interweave?
• How are they interleaved with the wider media ecology?
• Twitter doesn’t contain publics: publics transcend Twitter
14. UNDERSTANDING AUSTRALIAN TWITTER USE
• What is the Australian Twitter userbase?
– Large-scale snowballing project
– Starting from selected hashtag communities
(e.g. #ausvotes, #qldfloods, #masterchef)
– Identifying participating users, testing for ‘Australianness’:
• Timezone setting, location information, profile information
– Retrieving follower/followee information for each account (very slow)
• Progress update:
– ~950,000 Australian users identified so far, ~21m connections
~2 million Australian users in total?
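The snowballing procedure outlined above can be sketched as a breadth-first traversal. Everything specific here is an assumption: the profile keys (`time_zone`, `location`, `description`), the list of place-name hints, the decision to expand only accounts that pass the test, and the two callables `get_profile` and `get_connections`, which stand in for whatever retrieval mechanism is actually used.

```python
from collections import deque

# Hypothetical hints; a real 'Australianness' test would need to be richer.
AUSTRALIAN_HINTS = (
    "australia", "sydney", "melbourne", "brisbane", "perth",
    "adelaide", "hobart", "darwin", "canberra",
)


def looks_australian(profile: dict) -> bool:
    """Heuristic check of timezone, location and profile text."""
    fields = (
        profile.get("time_zone", ""),
        profile.get("location", ""),
        profile.get("description", ""),
    )
    haystack = " ".join(fields).lower()
    return any(hint in haystack for hint in AUSTRALIAN_HINTS)


def snowball(seeds, get_profile, get_connections, max_users=1000):
    """Expand from seed users, keeping accounts that pass the heuristic.

    get_profile(user) returns a profile dict (or None); get_connections(user)
    returns the user's followers and followees. Both are supplied by the
    caller and are placeholders for the real data source.
    """
    accepted = set()
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(accepted) < max_users:
        user = queue.popleft()
        profile = get_profile(user) or {}
        if not looks_australian(profile):
            continue  # assumption: only accepted accounts are expanded
        accepted.add(user)
        for other in get_connections(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return accepted
```

Because each accepted account triggers another connection lookup, the retrieval step dominates the running time, which fits the slide's remark that it is very slow.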
15. THE AUSTRALIAN TWITTERSPHERE?
Follower/followee network:
~120,000 Australian Twitter users
(of ~950,000 known accounts by early 2012)
colour = outdegree, size = indegree
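For readers unfamiliar with the two measures used in the map, indegree and outdegree can be computed from a list of follower relationships as sketched below. The edge direction is an assumption: each pair is read as (follower, followee), so an account's indegree is its number of followers and its outdegree is the number of accounts it follows. The function name is illustrative.

```python
from collections import Counter


def degree_counts(edges):
    """Return (indegree, outdegree) Counters for (follower, followee) pairs."""
    indegree = Counter()
    outdegree = Counter()
    # set() drops duplicate pairs so each relationship is counted once.
    for follower, followee in set(edges):
        outdegree[follower] += 1
        indegree[followee] += 1
    return indegree, outdegree
```

Accounts that never appear as a followee simply have an indegree of zero, since a `Counter` returns 0 for missing keys.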
16. THEMATIC CLUSTERS
[Figure: follower/followee network map annotated with thematic cluster labels. Labels recovered from the slide, with their original positions lost and some multi-word labels possibly split: Real Estate, Jobs, Property, HR, Business, Parenting, Mums, Craft, Design, Social Media, Arts, Web, Creative, Tech, Food, Perth, PR, Wine, Marketing / PR, Advertising, IT, Beer, Social, ICTs, NGOs, Fashion, Utilities, Farming, Social Policy, Beauty, Services, Agriculture, Net Culture, Adelaide, Opinion, Books, Theatre, Greens, News, Literature, Film, Publishing, ALP, Hardline, Progressives, @KRuddMP, Conservatives, @JuliaGillard, Radio, TV, Music, Journalists, Triple J, Talkback, Dance, Breakfast TV, Hip Hop, Cycling, Celebrities, Union, Evangelicals, Swimming, NRL, V8s, Football, Teens, Christians, Cricket, Teaching, Hillsong, AFL, e-Learning, Schools, Jonas Bros., Beliebers]
17. #AUSPOL
Follower/followee network:
~120,000 Australian Twitter users
(of ~950,000 known accounts by early 2012)
colour = #auspol tweets, size = indegree
18. #AUSVOTES
Follower/followee network:
~120,000 Australian Twitter users
(of ~950,000 known accounts by early 2012)
colour = #ausvotes tweets, size = indegree
19. #ROYALWEDDING
Follower/followee network:
~120,000 Australian Twitter users
(of ~950,000 known accounts by early 2012)
colour = #royalwedding tweets, size = indegree
20. LOOKING AHEAD
• Media & communication studies of social media as a ‘special case’
of the big data paradigm: we can be more reflexive
• Entangled with the evolving business models & architectures of
social media platforms, shifting and variable regulatory
structures, as well as public anxieties around the control and use of
our social data
• Still need to move beyond ‘snapshots’ and single platform studies
(cf. Burgess & Green, 2009): accounting for complexity and change
21. CHALLENGES
• Data access, platform volatility
– Limitations of API rules and TOS, lack of public archives
– Siloed data gathering, difficult to share and compare
– ‘Data divide’ (boyd & Crawford) caused by uneven distribution of
funding and skills
24. CHALLENGES
• Propagation and regularisation of methods (and consequences for
research training)
– ‘Code literacy’ sufficient to engage with the material
consequences of software platforms
• Better integration with existing social and cultural theory & empirical
work
– Mixed methods, especially integration of qualitative and
ethnographic approaches
25. CHALLENGES
• Research ethics
– textual research or ‘human subjects’ research?
– Consequences of public/personal convergence, data
markets/open data movement, and ‘context collapse’ (boyd)
– Cross-national and cross-disciplinary differences, need for public
discussion (Burgess & Puschmann, in progress)