Twitter, Big Data, and
 the Search for Meaning:
 Methodology in Progress


Associate Professor Axel Bruns
@snurb_dot_info
http://mappingonlinepublics.net/
Queensland University of Technology
WHY TWITTER?

• Researching Twitter:
   – Significant world-wide social network
   – ~500 million accounts (but how many active?)
   – Varied range of uses: from phatic communication to emergency coordination
   – Healthy third-party ecosystem (for now)
   – Strong history of user innovation:
     @replies, #hashtags
   – Flat and open network structure:
     non-reciprocal following, public profiles by default
   – Good API for gathering (big) data for research
NEW MEDIA AND PUBLIC COMMUNICATION:
      MAPPING AUSTRALIAN USER-CREATED CONTENT
             IN ONLINE SOCIAL NETWORKS

•   Australian Research Council (ARC) Discovery Project (2010-13) – $410,000
     –   QUT (Brisbane), Sociomantic Labs (Berlin)
     –   First comprehensive study of Australian social media use
     –   Computer-assisted cultural analysis: tracking, mapping, analysing blogs, Twitter, Flickr,
         YouTube as ‘networked publics’
     –   Addressing the problem of scale (‘Big Data’) and disciplinary change in media, cultural and
         communication studies – natively digital methods
     –   Studying society with the Internet (Richard Rogers)

      http://mappingonlinepublics.net/
A TWITTER RESEARCH TOOLKIT

• Data Gathering
   – yourTwapperkeeper + in-house crawler

• Data Processing
   – Gawk – open source, multiplatform, programmable command-line tool for
     processing CSV documents

• Textual Analysis
   – Leximancer – commercial, multiplatform: extracts key concepts from large
     corpora of text, examines and visualises concept co-occurrence
   – WordStat – commercial, PC-only text analysis tool; generates concept
     co-occurrence data that can be exported for visualisation

• Visualisation
   – Gephi – open source, multiplatform network visualisation tool
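
A minimal sketch of how the processing and visualisation steps might connect, assuming tweets are available as (sender, text) pairs. It counts @reply pairs and writes an edge list with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet import recognises. The function names and sample tweets are hypothetical, not the project's own scripts (which use Gawk).

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
RETWEET = re.compile(r"\bRT @\w+", re.IGNORECASE)

def reply_edges(tweets):
    """Count sender -> mentioned-user pairs, skipping retweets."""
    edges = Counter()
    for sender, text in tweets:
        if RETWEET.search(text):
            continue  # retweets are not treated as @replies here
        for target in MENTION.findall(text):
            edges[(sender.lower(), target.lower())] += 1
    return edges

def write_edge_list(edges, out):
    """Write a weighted edge list in CSV form."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])

# Hypothetical sample data for illustration only.
sample = [
    ("carol", "@alice any source for that?"),
    ("carol", "@Alice thanks!"),
    ("bob", "RT @alice: Leadership #spill tonight?"),
    ("dave", "@alice @bob watching this too"),
]
edges = reply_edges(sample)
buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
```

Screen names are lower-cased so that @Alice and @alice count as the same node.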
SO NOW WHAT?
#HASHTAGS AS PUBLICS

• #hashtags
   – ‘#’ + keyword makes tweets easily discoverable and marks themes
   – E.g. #ausvotes, #qldfloods, #londonriots, #royalwedding, #euro2012, …


• Publics
   – Attend to matters of shared concern with some level of co-awareness
   – Varied in intensity and temporality
   – Emergent, constituted via discourse & affect


• #hashtag publics
   – Not all hashtags constitute publics; Twitter doesn’t ‘contain’ publics
   – What are the patterns in the dynamics of different hashtag-based publics?
   – What might account for these differences?
#SPILL: 23 JUNE 2010, 6-7 P.M.




http://mappingonlinepublics.net/2010/12/30/visualising-twitter-dynamics-in-gephi-part-2/
#SPILL: 23 JUNE 2010, 7-8 P.M.
#SPILL: 23 JUNE 2010, 8-9 P.M.
BUT WHY?

• Possible research questions:
   – Ad hoc events and publics:
       • How do online publics form and dissolve? How do they interact, what
         structures do they form?
       • Where do they draw information from? What do they share?
       • Do they simply consist of the usual suspects? How insular and disconnected
         are online publics?
   – Hashtags in context:
       • How do different hashtag events compare? Are there common types of
         hashtags/publics?
       • How ‘big’ are they? What topics attract attention on Twitter?
       • What community (?) structures emerge?
DEVELOPING TWITTER METRICS

• Key data points available through the Twitter API:
    –   text:                contents of the tweet itself, in 140 characters or less
    –   to_user_id:          numerical ID of the tweet recipient (for @replies)
    –   from_user:           screen name of the tweet sender
    –   id:                  numerical ID of the tweet itself
    –   from_user_id:        numerical ID of the tweet sender
    –   iso_language_code:   code (e.g. en, de, fr, ...) of the sender’s default language
    –   source:              client software used to tweet (e.g. Web, Tweetdeck, ...)
    –   profile_image_url:   URL of the tweet sender’s profile picture
    –   geo_type:            format of the sender’s geographical coordinates
    –   geo_coordinates_0:   first element of the geographical coordinates
    –   geo_coordinates_1:   second element of the geographical coordinates
    –   created_at:          tweet timestamp in human-readable format
    –   time:                tweet timestamp as a numerical Unix timestamp
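
A minimal sketch of reading such records, assuming a CSV export whose header row uses the field names listed above. It counts tweets per hour from the numerical `time` field. The sample rows and the function name `tweets_per_hour` are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample export; a real archive would be read from a file.
SAMPLE_CSV = """text,from_user,id,time
"Leadership #spill tonight?",alice,1,1277280000
"RT @alice: Leadership #spill tonight?",bob,2,1277281800
"@alice any source for that?",carol,3,1277283600
"""

def tweets_per_hour(csv_file):
    """Count tweets per UTC hour using the Unix `time` field."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        stamp = datetime.fromtimestamp(int(row["time"]), tz=timezone.utc)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts

hourly = tweets_per_hour(io.StringIO(SAMPLE_CSV))
for hour, n in sorted(hourly.items()):
    print(hour, n)
```

Hours are reported in UTC; converting to local time (e.g. Australian Eastern Standard Time) would be a separate step.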
DEVELOPING TWITTER METRICS

• Additional data points from tweets:
    – original tweets:          tweets which are neither @replies nor retweets
    – retweets:                 tweets which contain RT @user… (or similar)
         • unedited retweets:            retweets which start with RT @user…
         • edited retweets:              retweets which do not start with RT @user…
    – genuine @replies:         tweets which contain @user, but are not retweets
    – URL sharing:              tweets which contain URLs


• Potential uses:
    –   metrics per hashtag
    –   metrics per timeframe (day, hour, minute, second, …)
    –   metrics per user (or group of users)
    –   …
                                                          (Bruns & Stieglitz, forthcoming)
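
The tweet categories above can be sketched as a simple classifier. This is an illustration under stated assumptions, not the published method: it only recognises the `RT @user` convention (not the "or similar" variants), and the function names and sample tweets are hypothetical.

```python
import re
from collections import Counter

RETWEET = re.compile(r"\bRT @\w+", re.IGNORECASE)
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")

def classify(text):
    """Assign a tweet to one of the four categories listed above."""
    stripped = text.lstrip()
    if RETWEET.search(stripped):
        # Unedited retweets start with RT @user; edited ones add text first.
        if RETWEET.match(stripped):
            return "unedited retweet"
        return "edited retweet"
    if MENTION.search(stripped):
        return "genuine @reply"
    return "original tweet"

def shares_url(text):
    """True if the tweet contains at least one URL."""
    return URL.search(text) is not None

# Hypothetical sample tweets for illustration only.
sample = [
    "RT @alice: Leadership #spill tonight?",
    "Wow! RT @alice: Leadership #spill tonight?",
    "@alice any source for that?",
    "Watching the news http://example.org/story #spill",
]
per_type = Counter(classify(t) for t in sample)
url_count = sum(1 for t in sample if shares_url(t))
print(per_type, url_count)
```

The same counts could be grouped per hashtag, per timeframe or per user by keying the counter on those fields as well.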
#QLDFLOODS @REPLIES
[Figure: #qldfloods @reply network; labelled clusters: authorities, mainstream media]
#ROYALWEDDING
#AUSPOL (FEB.-DEC. 2011)
HASHTAG METRICS
TOWARDS A TYPOLOGY
                   OF TWITTER USES
• How are hashtags used (during acute events)?
   – Gatewatching:
        • Finding and sharing information about breaking news (before the
          mainstream media do?)
        • Ad hoc publics: many URLs, many retweets (even unedited)
   – Audiencing:
        • Shared experience of major (foreseen) events
        • Imagined community of fellow participants: few URLs, limited retweeting
• What other uses are there?
   –   Continuing discussions (#auspol, #bundesliga, …)
   –   Memes (#ghettohurricanenames, …)
   –   Emotive hashtags (#fail, #win, #headdesk, …)
   –   What about keywords?
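
One way to compare hashtags along the two indicators named above (URL sharing and retweeting) is to compute both as percentages of all tweets in a dataset. A minimal sketch, assuming each dataset is a list of tweet texts; the function name, the sample tweets and the choice of percentages are illustrative, and no cut-off between "gatewatching" and "audiencing" is implied.

```python
import re

RETWEET = re.compile(r"\bRT @\w+", re.IGNORECASE)
URL = re.compile(r"https?://\S+")

def hashtag_profile(texts):
    """Return the share of retweets and of URL-bearing tweets, in percent."""
    total = len(texts)
    if total == 0:
        return {"tweets": 0, "retweet_pct": 0.0, "url_pct": 0.0}
    retweets = sum(1 for t in texts if RETWEET.search(t))
    with_urls = sum(1 for t in texts if URL.search(t))
    return {
        "tweets": total,
        "retweet_pct": 100.0 * retweets / total,
        "url_pct": 100.0 * with_urls / total,
    }

# Hypothetical miniature datasets for illustration only.
crisis_sample = [
    "RT @police: Road closed http://example.org/a",
    "Flood map here http://example.org/b",
    "RT @news: Evacuation centres open http://example.org/c",
    "Stay safe everyone",
]
event_sample = [
    "That dress!",
    "Watching with the whole family",
    "RT @fan: What a kiss",
    "Here they come",
]
crisis = hashtag_profile(crisis_sample)
event = hashtag_profile(event_sample)
print(crisis)
print(event)
```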
BEYOND HASHTAGS

• Publics on Twitter:
     – Micro:    @reply and retweet conversations
     – Meso:     follower/followee networks
     – Macro:    hashtag ‘communities’              (Bruns & Moe, forthcoming)


 → Multiple overlapping publics / networks

•   What drives their formation and dissipation?
•   How do they interact and interweave?
•   How are they interleaved with the wider media ecology?
•   Twitter doesn’t contain publics: publics transcend Twitter
UNDERSTANDING AUSTRALIAN TWITTER USE

• What is the Australian Twitter userbase?
   – Large-scale snowballing project
   – Starting from selected hashtag communities
     (e.g. #ausvotes, #qldfloods, #masterchef)
   – Identifying participating users, testing for ‘Australianness’:
        • Timezone setting, location information, profile information
   – Retrieving follower/followee information for each account (very slow)


• Progress update:
   – ~1.06 million Australian users identified so far
   → ~2 million Australian users in total?
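
A minimal sketch of what a test for 'Australianness' might look like, assuming each account profile is a dictionary with `time_zone`, `location` and `description` keys. These key names, the place lists and the function name are assumptions for illustration; the project's actual criteria are not specified beyond the three kinds of information listed above.

```python
import re

# Illustrative lists only; a real test would need fuller coverage.
AUSTRALIAN_TIME_ZONES = {
    "Brisbane", "Sydney", "Melbourne", "Adelaide",
    "Perth", "Hobart", "Darwin", "Canberra",
}
AUSTRALIAN_PLACES = re.compile(
    r"\b(australia|brisbane|sydney|melbourne|perth|adelaide|"
    r"hobart|darwin|canberra|queensland)\b",
    re.IGNORECASE,
)

def australian_signals(profile):
    """Return the names of the profile fields that suggest an Australian user."""
    signals = []
    if profile.get("time_zone") in AUSTRALIAN_TIME_ZONES:
        signals.append("time_zone")
    if AUSTRALIAN_PLACES.search(profile.get("location") or ""):
        signals.append("location")
    if AUSTRALIAN_PLACES.search(profile.get("description") or ""):
        signals.append("description")
    return signals

# Hypothetical profiles for illustration only.
local = {"time_zone": "Brisbane", "location": "Brisbane, QLD",
         "description": "Coffee and cricket"}
visitor = {"time_zone": "London", "location": "UK",
           "description": "Missing Melbourne already"}
unknown = {"time_zone": None, "location": "", "description": None}

print(australian_signals(local))
print(australian_signals(visitor))
print(australian_signals(unknown))
```

Such heuristics produce false positives (Perth in Scotland, Sydney in Nova Scotia), so requiring more than one signal may be safer than accepting any single match.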
THE AUSTRALIAN TWITTERSPHERE?




                     Follower/followee network:
                     ~120,000 Australian Twitter users
                     (of ~950,000 known accounts by early 2012)
                     colour = outdegree, size = indegree
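
The two measures used in this map can be sketched as follows, assuming the network is stored as (follower, followee) pairs: indegree is the number of known accounts following a user, outdegree the number of known accounts that user follows. The function name and sample edges are hypothetical.

```python
from collections import Counter

def degree_counts(follow_edges):
    """Return (indegree, outdegree) counters for follower -> followee edges."""
    indegree = Counter()
    outdegree = Counter()
    for follower, followee in follow_edges:
        outdegree[follower] += 1   # follower follows one more account
        indegree[followee] += 1    # followee gains one more follower
    return indegree, outdegree

# Hypothetical edges for illustration only.
sample_edges = [
    ("alice", "newsdesk"),
    ("bob", "newsdesk"),
    ("carol", "newsdesk"),
    ("newsdesk", "alice"),
    ("bob", "alice"),
]
indegree, outdegree = degree_counts(sample_edges)
print(indegree.most_common())
print(outdegree.most_common())
```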
THEMATIC CLUSTERS

[Figure: follower/followee network map annotated with thematic cluster labels; map layout not recoverable. Labels, grouped loosely by theme (some recur on the map):]
   – Real Estate, Property, Jobs, HR, Business, Parenting, Mums, Craft
   – Design, Web, Creative, Social Media, Tech, IT, Social, ICTs, Net Culture
   – Marketing / PR, PR, Advertising, Utilities, Services
   – Food, Wine, Beer, Fashion, Beauty
   – Perth, Adelaide
   – NGOs, Social Policy, Farming, Agriculture
   – Opinion, News, Books, Literature, Publishing, Theatre, Film, Arts
   – Greens, ALP, Progressives, Hardline Conservatives, Conservatives, Journalists, Union, @KRuddMP, @JuliaGillard
   – Radio, TV, Talkback, Breakfast TV, Triple J, Music, Dance, Hip Hop, Celebrities
   – Cycling, Swimming, NRL, V8s, Football, Cricket, AFL
   – Evangelicals, Christians, Hillsong, Teaching, e-Learning, Schools
   – Teens, Jonas Bros., Beliebers
#AUSPOL




          Follower/followee network:
          ~120,000 Australian Twitter users
          (of ~950,000 known accounts by early 2012)
          colour = #auspol tweets, size = indegree
#AUSVOTES




            Follower/followee network:
            ~120,000 Australian Twitter users
            (of ~950,000 known accounts by early 2012)
            colour = #ausvotes tweets, size = indegree
#ROYALWEDDING




            Follower/followee network:
            ~120,000 Australian Twitter users
            (of ~950,000 known accounts by early 2012)
            colour = #royalwedding tweets, size = indegree
ABC.NET.AU URLS




              Follower/followee network:
              ~120,000 Australian Twitter users
              (of ~950,000 known accounts by early 2012)
              colour = tweets with URLs, size = indegree
AUSTRALIAN TWITTER NEWS INDEX

• ATNIX:
   – Tracking tweets which link to 29 key Australian news / opinion sites
     (even if URLs are shortened: e.g. t.co → bit.ly → ow.ly → abc.net.au)
   – Regular processing and evaluation


• Potential uses:
   –   Examination of general market share
   –   Impact of key events and stories
   –   Tracking of specific articles
   –   Examination of retweet chains for new URLs – how does news
       disseminate?

   – Coming soon: DeTNIX (Germany), others?
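
A minimal sketch of the counting step behind such an index, assuming shortened links have already been resolved to their final URLs in an earlier step. The tracked-site list here is illustrative (only abc.net.au is named in these slides; the second entry is a placeholder), and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# abc.net.au appears in the slides; the second entry is a hypothetical placeholder.
TRACKED_SITES = ["abc.net.au", "news.example.com.au"]

def tracked_site(url, sites=TRACKED_SITES):
    """Return the tracked site a resolved URL belongs to, or None."""
    host = (urlparse(url).hostname or "").lower()
    for site in sites:
        if host == site or host.endswith("." + site):
            return site
    return None

def site_counts(resolved_urls):
    """Count links per tracked site, ignoring untracked domains."""
    counts = Counter()
    for url in resolved_urls:
        site = tracked_site(url)
        if site is not None:
            counts[site] += 1
    return counts

# Hypothetical resolved URLs for illustration only.
sample_urls = [
    "http://www.abc.net.au/news/story-one",
    "http://abc.net.au/news/story-two",
    "http://news.example.com.au/politics/item",
    "http://bit.ly/unresolved",
]
counts = site_counts(sample_urls)
print(counts)
```

Matching on the hostname suffix keeps subdomains such as www.abc.net.au with their parent site while leaving unresolved short links uncounted.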
TWITTER AND/IN THE MEDIA ECOLOGY
TWITTER AND/IN THE MEDIA ECOLOGY
UNDERSTANDING TWITTER PUBLICS

• #hashtags:
   –   Useful coordinating mechanism for core discussion
   –   Relatively easy to capture and analyse
   –   Fails to capture non-hashtagged tweets about the topic
   –   Good case studies, but very little comparative work to date


• National / global Twittersphere maps
   –   Crucial contextual baseline for #hashtag case studies
   –   Slow and laborious data gathering process, never complete
   –   Very long-term perspective, beyond most funded projects
   –   Indispensable for study of Twitter as a public space
‘BIG DATA’ AND THE DIGITAL HUMANITIES

• Emerging needs in Twitter research:
    – Unified, compatible methods and metrics for Twitter analysis
         → Tools and approaches shared at http://mappingonlinepublics.net/
    – Powerful infrastructure for long-term, high-volume tracking of public
      communication on Twitter
         → Data access requires substantial funding stream
    – Facilities for long-term data storage and preservation
         → Key roles for National Libraries, National Archives
    – Integration with related datasets (e.g. MSM content)
         → Need to address data interoperability questions


• Twitter as a test case for digital humanities research
    – Widespread, open, public platform for everyday communication
    – Tool for observing society at scale through Internet research
‘BIG DATA’ AND STUDENT SKILLS

• Students need interdisciplinary skill sets:
    –   Media & communication to understand the media environment
    –   Maths and statistics to deal with ‘big data’
    –   Computer science to develop tools to process social media data
    –   Communication design to develop effective visualisations
    –   Writing and communication skills to communicate the results
    –   …

    – Where do we find them?
         (few people have such a diverse range of skills)
    – How do we support their work?
         (we’re only just developing our methods and tools)
    – What is our strategy for dealing with precarity?
         (sudden API changes, changing fortunes of platforms, …)
http://mappingonlinepublics.net/
@snurb_dot_info
@jeanburgess
@_StephenH
@DrTNitins
@timhighfield
@cdtavijit
