‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
1. ‘Big Social Data’ in Context:
Connecting Social Media Data
and Other Sources
Axel Bruns and Tim Highfield
Social Media Research Group
Queensland University of Technology
Brisbane, Australia
a.bruns / t.highfield @ qut.edu.au
@snurb_dot_info / @timhighfield
2. THE PROMISE OF BIG SOCIAL DATA
• Social media and big data:
– Substantial growth in social media usage
– User activity generates data and metadata
– Readily accessible through APIs
– New tools for processing and visualising big data at scale
• Emergence of social media analytics:
– Large-scale tracking of public user activities
– ‘Trending topics’, user sentiment, network influencers
– Scholarly and commercial research
– A ‘computational turn’ towards the digital humanities (David Berry)
– Ethical concerns around profiling and content ownership
3. BIG DATA AND SOCIETY
• New methodologies:
– Empirical, large-scale, real-time investigation
– Data-led, comprehensive evaluation rather than small-scale sampling of public
communication
– But also: combined quantitative/qualitative approaches
– Not studying the Internet, but studying society with the Internet (Richard Rogers)
• Applications:
– Political engagement, especially during elections, crises, scandals
– Crisis communication during natural and human-made disasters
– Engagement with mainstream media: watching, reading, sharing, …
– Brand communication, especially during brand crises
– Identification of earthquakes (USGS), tracking of epidemics (Google)
– …
4. SOCIAL MEDIA AND BEYOND
• Facebook, Twitter:
– Useful but highly particular areas of online activity
– Not necessarily generalisable to overall activity patterns
– Current research approaches and API limitations introduce further biases
• E.g. publics on Twitter:
– Micro: @reply and retweet conversations
– Meso: follower/followee networks
– Macro: #hashtag ‘communities’ (Bruns & Moe, 2014)
• Key needs in Twitter research:
– Situation of hashtags in wider communicative ecology on Twitter
– Day-to-day uses of Twitter, beyond and outside hashtags
– Dynamics of everyday quasi-private, interpersonal, and/or public communication
– Track impact of social and technological changes on these uses
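As a rough illustration of the micro / meso / macro distinction above, the sketch below sorts tweet texts into layers using surface features only. The function name `classify_layer`, the regular expressions, and the precedence of the rules are assumptions made for illustration; this is not the procedure used by Bruns & Moe (2014). A tweet containing a hashtag is counted as macro, a tweet that opens with an @mention or a retweet marker as micro, and anything else as meso (addressed by default to the sender's followers).

```python
import re
from collections import Counter

# Hypothetical surface-feature rules; real studies would rely on API metadata.
HASHTAG = re.compile(r"#\w+")
REPLY_OR_RT = re.compile(r"^(?:@\w+|RT @\w+)")

def classify_layer(text: str) -> str:
    """Assign a tweet text to a layer: 'macro', 'micro', or 'meso'."""
    text = text.strip()
    if HASHTAG.search(text):        # hashtagged tweets join a hashtag 'community'
        return "macro"
    if REPLY_OR_RT.match(text):     # @replies and retweets are conversational
        return "micro"
    return "meso"                   # plain updates reach the follower network

# Invented example tweets, not taken from any dataset.
tweets = [
    "@alice thanks for the link",
    "RT @bob: big news",
    "Voting today #ausvotes",
    "Coffee time",
]
layer_counts = Counter(classify_layer(t) for t in tweets)
```

A tweet that is both an @reply and hashtagged is counted as macro here; reversing the two checks would give the opposite priority, and which is appropriate depends on the research question.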
5. BIG DATA, RARE DATA?
• The political economy of social media research:
– API-based data access is shaped to privilege certain approaches
– Research funding is easier to obtain for specific, limited purposes
– Longitudinal, ‘big’ data access requires ongoing, substantial funding and
infrastructure
– Exploratory, data-driven research is difficult to sell to most funding bodies
– Also related to divergent resources available to different scholarly disciplines
• Most ‘difficult’ large-scale social media research is conducted by Facebook /
Twitter and commercial research institutes
6. RESEARCH PROJECT
• ARC Future Fellowship:
– Four-year project, $876,973
– Axel Bruns (FF), Tim Highfield (Postdoc), Felix Münch (PhD1, 2014-2017);
PhD2 (2015-2018) – enquire within
At the intersection of mainstream, niche, and social media, the
processes by which public opinion forms and public debate unfolds
are increasingly complex, and poorly understood. This project draws
on large datasets and innovative methods to develop a new model of
the Australian online public sphere.
• Also supported by ARC LIEF project:
– Two-year project (2014/15; QUT, Curtin, Deakin, Swinburne) to develop
comprehensive infrastructure for large-scale social media data analytics
7. WHAT DATA SOURCES ARE THERE?
• Data sources:
– Facebook: user engagement with public pages (profile activity is semi-private)
– Twitter:
• hashtag, keyword, URL sharing datasets (public accounts only)
• Australian network data; Australian firehose (public accounts only)
– Other social media sources…
– Experian Hitwise:
• Australian Web browsing data (ISP-level, anonymous and opt-in panels, 1.5m users)
• Australian Web searching data (same methodology)
– Proprietary datasets:
• Website analytics for major news sites (e.g. News Ltd. / Fairfax Digital sites)
– Mainstream media monitoring:
• Content databases for mainstream media coverage
8. PROJECT AGENDA
• Data sources:
– Australian Internet browse / search patterns (Experian Hitwise)
– Online news media reading patterns (Fairfax Digital)
– Big social data on news sharing via social media (ARC LIEF)
• Multiple overlapping publics / networks:
– What drives their formation and dissipation?
– How do they interact and interweave?
– How are they interleaved with the wider media ecology?
– Social media do not contain publics: publics transcend social media
9. RESEARCH AIMS
• Methodological development (Y1):
– How do we process and integrate these data?
• Standard methods for gathering, processing, storing, analysing datasets
• Regular, automated workflows
• Short term (Y2):
– What happens as news breaks?
• Search, browsing, reading, sharing patterns
• Formation of ad hoc publics
• Medium term (Y3):
– How do themes, topics, actors wax and wane?
• Prominence in user activities
• Development of stable issue publics
• Long term (Y4):
– How do these patterns affect public opinion formation?
• Predictable patterns, stable networks of interaction
• Structural analysis of the online public sphere
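One way to approach the Y1 question of processing and integrating these data is to bin events from each source into a shared time grid, so that search, browsing, reading, and sharing activity can be compared hour by hour. The sketch below is a minimal illustration under stated assumptions: the helper names `hourly_counts` and `align`, the source labels, and the timestamps are all invented, and events are assumed to arrive as ISO 8601 strings in a common time zone.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Count events per hour from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return counts

def align(sources):
    """Return (hours, series) with one count per hour for every source.

    Hours come from the union of all sources; a source with no events
    in a given hour gets a count of 0 for that hour.
    """
    binned = {name: hourly_counts(ts) for name, ts in sources.items()}
    hours = sorted(set().union(*binned.values()))
    series = {name: [counts[h] for h in hours] for name, counts in binned.items()}
    return hours, series

# Invented timestamps for two hypothetical sources.
sources = {
    "search": ["2015-01-31T18:05:00", "2015-01-31T18:40:00", "2015-01-31T19:10:00"],
    "sharing": ["2015-01-31T19:15:00", "2015-01-31T19:50:00"],
}
hours, series = align(sources)
# series["search"] is [2, 1] and series["sharing"] is [0, 2]
```

Hours in which no source recorded any event are not included in the grid; a production workflow would probably fill the full range between the first and last hour.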
13. THE AUSTRALIAN TWITTERSPHERE
• [Figure: network map of ~140k Australian accounts with degree > 1000, as of Sep. 2013]
• Cluster labels shown on the map:
– Education, Agriculture, Literature, Adelaide / SA, Food, Wine, Beer
– Leftists, Hard Right, Netizens, Politics, Journalists
– Marketing, Mums, PR, Parenting, Real Estate, Investing, Home Business, Sole Traders
– Self-Help, HR / Support, NRL, Followback, Urban Media, Utilities, Advertising, Business
– TV, Fashion, Beauty, Arts, Cinema, News, Talkback, Cycling, Music
– V8s, UFC, AFL, Football, Horse Racing, Cricket, NRU
– Celebrities, Hillsong, Perth, Pop, Media, Teen Idols, Cody Simpson
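The degree threshold in the map caption above can be illustrated with a small filter over a follower edge list. This is a sketch only: the function name `filter_by_degree` and the toy edges are hypothetical, and it assumes that "degree" means the sum of an account's followers and followees within the collected network, which the slide does not spell out.

```python
from collections import Counter

def filter_by_degree(edges, threshold):
    """Return accounts whose degree exceeds the threshold.

    edges: iterable of (follower, followee) pairs.
    Degree is taken here as in-degree plus out-degree (an assumption).
    """
    degree = Counter()
    for follower, followee in edges:
        degree[follower] += 1   # this account follows someone
        degree[followee] += 1   # this account is followed by someone
    return {account for account, d in degree.items() if d > threshold}

# Toy network with invented account names; the slide's threshold is 1000.
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c")]
well_connected = filter_by_degree(edges, threshold=2)   # {"b"}
```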
17. NEXT STEPS
• Further data points:
– More detailed data on search patterns (Experian Hitwise)
– Readership patterns (Fairfax Digital sites)
– Facebook audience engagement patterns with news pages
• Further analytical approaches:
– Activity patterns around key issues and events
(e.g. G20, AFC Asian Cup, ANZAC Day, Queensland state election)
– Correlation of activity patterns across datasets
– Computational modelling of patterns to identify cross-influence of activities on
different platforms on each other
• Further theory development:
– Ad hoc publics, issue publics, public sphericules in the Australian public sphere
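The correlation of activity patterns across datasets mentioned above could, in its simplest form, be a Pearson correlation between two aligned time series, optionally shifted by a lag to look for one activity leading another. The sketch below is an illustration under assumptions, not the project's modelling approach: the function names `pearson` and `lagged_correlation` and the numbers are invented, and both series are assumed to be already binned to the same time grid.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)

def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])

# Invented hourly counts: sharing repeats the search pattern one hour later.
searches = [1, 4, 9, 4, 1, 0]
shares = [0, 1, 4, 9, 4, 1]
same_hour = lagged_correlation(searches, shares, 0)
one_hour_later = lagged_correlation(searches, shares, 1)   # close to 1.0
```

A high lagged correlation indicates only that the two patterns move together with a delay; it does not by itself establish that activity on one platform influences the other.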
18. http://mappingonlinepublics.net/
@snurb_dot_info
@timhighfield
@stationsarzt
@dpwoodford
@katieprowd
@tsadkowsky
@jeanburgess
@socialmediaQUT – http://socialmedia.qut.edu.au/
This research is funded by the Australian Research Council through Future Fellowship and LIEF
grants FT130100703 and LE140100148.