‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources

Axel Bruns, Professor at Digital Media Research Centre, Queensland University of Technology
‘Big Social Data’ in Context:
Connecting Social Media Data
and Other Sources
Axel Bruns and Tim Highfield 
Social Media Research Group 
Queensland University of Technology 
Brisbane, Australia 
a.bruns / t.highfield @ qut.edu.au 
@snurb_dot_info / @timhighfield
THE PROMISE OF BIG SOCIAL DATA 
• Social media and big data: 
– Substantial growth in social media usage 
– User activity generates data and metadata 
– Readily accessible through APIs 
– New tools for processing and visualising big data at scale 
• Emergence of social media analytics: 
– Large-scale tracking of public user activities 
– ‘Trending topics’, user sentiment, network influencers 
– Scholarly and commercial research 
– A ‘computational turn’ towards the digital humanities (David Berry) 
– Ethical concerns around profiling and content ownership
BIG DATA AND SOCIETY 
• New methodologies: 
– Empirical, large-scale, real-time investigation 
– Data-led, comprehensive evaluation rather than small-scale sampling of public 
communication 
– But also: combined quantitative/qualitative approaches 
– Not studying the Internet, but studying society with the Internet (Richard Rogers) 
• Applications: 
– Political engagement, especially during elections, crises, scandals 
– Crisis communication during natural and human-made disasters 
– Engagement with mainstream media: watching, reading, sharing, … 
– Brand communication, especially during brand crises 
– Identification of earthquakes (USGS), tracking of epidemics (Google) 
– …
SOCIAL MEDIA AND BEYOND 
• Facebook, Twitter: 
– Useful but highly particular areas of online activity 
– Not necessarily generalisable to overall activity patterns 
– Current research approaches and API limitations introduce further biases 
• E.g. publics on Twitter: 
– Micro: @reply and retweet conversations 
– Meso: follower/followee networks 
– Macro: #hashtag ‘communities’ (Bruns & Moe, 2014) 
• Key needs in Twitter research: 
– Situation of hashtags in wider communicative ecology on Twitter 
– Day-to-day uses of Twitter, beyond and outside hashtags 
– Dynamics of everyday quasi-private, interpersonal, and/or public communication 
– Track impact of social and technological changes on these uses
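
The micro and macro layers named above leave visible markers in tweet text, so a rough first pass can sort tweets by those markers. The following is a minimal sketch under stated assumptions, not the authors' method: the function name, the regular expressions, and the "unmarked" label are hypothetical, and the meso layer (follower/followee networks) cannot be read from tweet text at all and would need separate network data.

```python
import re

# Hypothetical patterns for illustration; real tweet parsing would rely on
# the entity metadata returned by the API rather than on regular expressions.
MENTION_START = re.compile(r"^@\w+")
RETWEET = re.compile(r"^RT @\w+", re.IGNORECASE)
HASHTAG = re.compile(r"#\w+")


def classify_tweet(text):
    """Return the set of layers suggested by the markers in a tweet's text."""
    stripped = text.strip()
    layers = set()
    # Micro: @reply and retweet conversations.
    if MENTION_START.match(stripped) or RETWEET.match(stripped):
        layers.add("micro")
    # Macro: participation in a #hashtag 'community'.
    if HASHTAG.search(stripped):
        layers.add("macro")
    # No marker: everyday tweets outside hashtags and conversations.
    if not layers:
        layers.add("unmarked")
    return layers
```

Counting the share of "unmarked" tweets in a sample gives a first indication of how much day-to-day activity a purely hashtag-based dataset would miss.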
BIG DATA, RARE DATA? 
• The political economy of social media research: 
– API-based data access is shaped to privilege certain approaches 
– Research funding is easier to obtain for specific, limited purposes 
– Longitudinal, ‘big’ data access requires ongoing, substantial funding and 
infrastructure 
– Exploratory, data-driven research is difficult to sell to most funding bodies 
– Also related to divergent resources available to different scholarly disciplines 
• Most ‘difficult’ large-scale social media research is conducted by Facebook / 
Twitter and commercial research institutes
RESEARCH PROJECT 
• ARC Future Fellowship: 
– Four-year project, $876,973 
– Axel Bruns (FF), Tim Highfield (Postdoc), Felix Münch (PhD1, 2014-2017); 
PhD2 (2015-2018) – enquire within 
At the intersection of mainstream, niche, and social media, the 
processes by which public opinion forms and public debate unfolds 
are increasingly complex, and poorly understood. This project draws 
on large datasets and innovative methods to develop a new model of 
the Australian online public sphere. 
• Also supported by ARC LIEF project: 
– Two-year project (2014/15; QUT, Curtin, Deakin, Swinburne) to develop 
comprehensive infrastructure for large-scale social media data analytics
WHAT DATA SOURCES ARE THERE? 
• Data sources: 
– Facebook: user engagement with public pages (profile activity is semi-private) 
– Twitter: 
• hashtag, keyword, URL sharing datasets (public accounts only) 
• Australian network data; Australian firehose (public accounts only) 
– Other social media sources… 
– Experian Hitwise: 
• Australian Web browsing data (ISP-level, anonymous and opt-in panels, 1.5m users) 
• Australian Web searching data (same methodology) 
– Proprietary datasets: 
• Website analytics for major news sites (e.g. News Ltd. / Fairfax Digital sites) 
– Mainstream media monitoring: 
• Content databases for mainstream media coverage
PROJECT AGENDA 
• Data sources: 
– Australian Internet browsing / search patterns (Experian Hitwise) 
– Online news media reading patterns (Fairfax Digital) 
– Big social data on news sharing via social media (ARC LIEF) 
• Multiple overlapping publics / networks: 
– What drives their formation and dissipation? 
– How do they interact and interweave? 
– How are they interleaved with the wider media ecology? 
– Social media do not contain publics: publics transcend social media
RESEARCH AIMS 
• Methodological development (Y1): 
– How do we process and integrate these data? 
• Standard methods for gathering, processing, storing, analysing datasets 
• Regular, automated workflows 
• Short term (Y2): 
– What happens as news breaks? 
• Search, browsing, reading, sharing patterns 
• Formation of ad hoc publics 
• Medium term (Y3): 
– How do themes, topics, actors wax and wane? 
• Prominence in user activities 
• Development of stable issue publics 
• Long term (Y4): 
– How do these patterns affect public opinion formation? 
• Predictable patterns, stable networks of interaction 
• Structural analysis of the online public sphere
HITWISE: NEWS SEARCHING TRENDS
HITWISE: NEWS BROWSING TRENDS
TWITTER: NEWS SHARING TRENDS
THE AUSTRALIAN TWITTERSPHERE 
• Network map: ~140k Australian accounts with degree > 1000, as of Sep. 2013 
• Cluster labels shown on the map: 
– Education, Agriculture, Literature, Adelaide / SA, Food, Wine, Beer 
– Leftists, Hard Right, Netizens, Politics, Journalists, News, Talkback 
– Marketing, PR, Advertising, Business, Real Estate, Investing, Home Business, Sole Traders, Self-Help, HR / Support, Utilities 
– Mums, Parenting, Fashion, Beauty, Arts, Cinema, TV, Music, Pop, Media, Urban Media 
– NRL, AFL, NRU, Football, Cricket, Cycling, V8s, UFC, Horse Racing 
– Celebrities, Teen Idols, Cody Simpson, Hillsong, Perth, Followback 
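
The degree threshold used for the map can be illustrated with a short filter over a follower edge list. This is a sketch under assumptions: it takes degree to mean followers plus followees combined, which the slides do not specify, and the function name and edge format are hypothetical.

```python
from collections import Counter


def high_degree_accounts(edges, threshold):
    """Return accounts whose combined follower/followee count exceeds threshold.

    `edges` is an iterable of (follower, followee) pairs; this format is an
    assumption made for the example.
    """
    degree = Counter()
    for follower, followee in edges:
        degree[follower] += 1  # one more account followed
        degree[followee] += 1  # one more follower
    return {account for account, d in degree.items() if d > threshold}
```

Applied to a full network with `threshold=1000`, a filter of this kind would keep only the well-connected accounts that a map like this one displays.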
Q&A (3 SEP. TO 7 OCT. 2014)
ABC NEWS (JUNE 2012 TO SEP. 2014)
DAILY TELEGRAPH (JUNE 2012 TO SEP. 2014)
NEXT STEPS 
• Further data points: 
– More detailed data on search patterns (Experian Hitwise) 
– Readership patterns (Fairfax Digital sites) 
– Facebook audience engagement patterns with news pages 
• Further analytical approaches: 
– Activity patterns around key issues and events 
(e.g. G20, AFC Asian Cup, ANZAC Day, Queensland state election) 
– Correlation of activity patterns across datasets 
– Computational modelling of patterns to identify how activities on different 
platforms influence each other 
• Further theory development: 
– Ad hoc publics, issue publics, public sphericules in the Australian public sphere
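
The correlation of activity patterns across datasets listed above can be sketched as a lagged Pearson correlation between two daily activity series, for example search volume and sharing volume for the same topic. This is an illustrative sketch only: the function names are hypothetical, Pearson correlation is one possible measure rather than the project's stated method, and a high lagged correlation does not by itself establish influence.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0 time steps."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])
```

Comparing the coefficient across several lags, in both directions, indicates whether activity in one dataset tends to precede activity in the other.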
http://mappingonlinepublics.net/ 
@snurb_dot_info 
@timhighfield 
@stationsarzt 
@dpwoodford 
@katieprowd 
@tsadkowsky 
@jeanburgess 
@socialmediaQUT – http://socialmedia.qut.edu.au/ 
This research is funded by the Australian Research Council through Future Fellowship and LIEF 
grants FT130100703 and LE140100148.
