New Methodologies for Capturing and Working with Publicly Available Twitter Data

Paper presented at the Association of Internet Researchers conference, Salford, 19-21 Oct. 2012.

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

Presentation Transcript

  • New Methodologies for Capturing and Working with Publicly Available Twitter Data
    Associate Professor Axel Bruns
    @snurb_dot_info
    http://mappingonlinepublics.net/
    Queensland University of Technology
  • WHY TWITTER?
    • Researching Twitter:
      – Significant world-wide social network
      – ~500 million accounts (but how many active?)
      – Varied range of uses: from phatic communication to emergency coordination
      – Healthy third-party ecosystem (for now)
      – Strong history of user innovation: @replies, #hashtags
      – Flat and open network structure: non-reciprocal following, public profiles by default
      – Good API for gathering (big) data for research
  • NEW MEDIA AND PUBLIC COMMUNICATION: MAPPING AUSTRALIAN USER-CREATED CONTENT IN ONLINE SOCIAL NETWORKS
    • Australian Research Council (ARC) Discovery Project (2010-13)
      – $410,000
      – QUT (Brisbane), Sociomantic Labs (Berlin)
      – First comprehensive study of Australian social media use
      – Computer-assisted cultural analysis: tracking, mapping, analysing blogs, Twitter, Flickr, YouTube as ‘networked publics’
      – Addressing the problem of scale (‘Big Data’) and disciplinary change in media, cultural and communication studies – natively digital methods
      – Studying society with the Internet (Richard Rogers)
        → http://mappingonlinepublics.net/
  • A TWITTER RESEARCH TOOLKIT
    • Data Gathering
      – yourTwapperkeeper + in-house crawler
    • Data Processing
      – Gawk – open source, multiplatform, programmable command-line tool for processing CSV documents
    • Textual Analysis
      – Leximancer – commercial, multiplatform: extracts key concepts from large corpora of text, examines and visualises concept co-occurrence
      – WordStat – commercial, PC-only text analysis tool; generates concept co-occurrence data that can be exported for visualisation
    • Visualisation
      – Gephi – open source, multiplatform network visualisation tool
  • SO NOW WHAT?
  • APPROACHING TWITTER
    • Possible research questions:
      – Hashtags as vehicles for ad hoc events and publics:
        • How do online publics form and dissolve? How do they interact, what structures do they form?
        • Where do they draw information from? What do they share?
        • Do they simply consist of the usual suspects? How insular and disconnected are online publics?
      – Hashtags in context:
        • How do different hashtag events compare? Are there common types of hashtags/publics?
        • How ‘big’ are they? What topics attract attention on Twitter?
        • What community (?) structures emerge?
  • DEVELOPING TWITTER METRICS
    • Key data points available through the Twitter API:
      – text: contents of the tweet itself, in 140 characters or less
      – to_user_id: numerical ID of the tweet recipient (for @replies)
      – from_user: screen name of the tweet sender
      – id: numerical ID of the tweet itself
      – from_user_id: numerical ID of the tweet sender
      – iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
      – source: client software used to tweet (e.g. Web, Tweetdeck, ...)
      – profile_image_url: URL of the tweet sender’s profile picture
      – geo_type: format of the sender’s geographical coordinates
      – geo_coordinates_0: first element of the geographical coordinates
      – geo_coordinates_1: second element of the geographical coordinates
      – created_at: tweet timestamp in human-readable format
      – time: tweet timestamp as a numerical Unix timestamp
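A minimal sketch of how the data points listed above might be loaded for analysis. It assumes the archive has been exported as a CSV file whose header row uses these field names; the export layout, the `Tweet` class, the helper functions and the sample rows are illustrative assumptions, not something specified in the slides. Only a subset of the fields is modelled.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tweet:
    """Hypothetical record holding a subset of the listed data points."""
    text: str
    from_user: str
    id: str
    from_user_id: str
    to_user_id: Optional[str]  # empty unless the tweet is an @reply
    iso_language_code: str
    source: str
    time: int  # numerical Unix timestamp


def parse_row(row: dict) -> Tweet:
    """Convert one CSV row (keyed by column name) into a Tweet."""
    return Tweet(
        text=row["text"],
        from_user=row["from_user"],
        id=row["id"],
        from_user_id=row["from_user_id"],
        to_user_id=row["to_user_id"] or None,
        iso_language_code=row["iso_language_code"],
        source=row["source"],
        time=int(row["time"]),
    )


def read_tweets(handle) -> List[Tweet]:
    """Read every row of an open CSV file (or file-like object)."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Made-up sample data standing in for a real export.
SAMPLE = (
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,time\n"
    '"@bob stay safe #qldfloods",42,alice,1001,7,en,Web,1294790400\n'
    '"Roads closed, see http://example.org #qldfloods",,bob,1002,42,en,Tweetdeck,1294790460\n'
)

tweets = read_tweets(io.StringIO(SAMPLE))
for tweet in tweets:
    print(tweet.from_user, tweet.time, tweet.text)
```

IDs are kept as strings here so that large numerical tweet IDs are not altered by any downstream tool that treats them as floating-point numbers.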
  • DEVELOPING TWITTER METRICS
    • Additional data points from tweets:
      – original tweets: tweets which are neither @reply nor retweet
      – retweets: tweets which contain RT @user… (or similar)
        • unedited retweets: retweets which start with RT @user…
        • edited retweets: retweets which do not start with RT @user…
      – genuine @replies: tweets which contain @user, but are not retweets
      – URL sharing: tweets which contain URLs
    • Potential uses:
      – metrics per hashtag
      – metrics per timeframe (day, hour, minute, second, …)
      – metrics per user (or group of users)
      – …
    (Bruns & Stieglitz, forthcoming)
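The tweet categories above can be sketched as a small classifier, with the counts then grouped per user or per timeframe. This is an illustration under stated assumptions rather than the authors' implementation: the regular expressions are one possible reading of the definitions, the "or similar" retweet markers are assumed here to be `MT @user` and `via @user`, and the function names and sample records are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: "RT @user… (or similar)" is read as RT, MT or via before @user.
RETWEET = re.compile(r"\b(?:RT|MT|via)\s*@\w+", re.IGNORECASE)
UNEDITED = re.compile(r"^\s*RT\s*@\w+", re.IGNORECASE)
MENTION = re.compile(r"(?<!\w)@\w+")  # lookbehind skips e-mail addresses
URL = re.compile(r"https?://\S+")


def classify(text):
    """Return one of the four mutually exclusive tweet types."""
    if RETWEET.search(text):
        return "unedited_retweet" if UNEDITED.match(text) else "edited_retweet"
    if MENTION.search(text):
        return "reply"  # contains @user but is not a retweet
    return "original"


def has_url(text):
    """URL sharing is counted independently of the tweet type."""
    return bool(URL.search(text))


def metrics_per(records, key):
    """Group type counts by key(record); records are (user, unix_time, text)."""
    table = defaultdict(Counter)
    for record in records:
        text = record[2]
        bucket = table[key(record)]
        bucket[classify(text)] += 1
        if has_url(text):
            bucket["url"] += 1
    return table


# Made-up sample records: (from_user, time, text).
RECORDS = [
    ("alice", 1294790400, "Brisbane river rising #qldfloods"),
    ("bob", 1294790460, "RT @alice: Brisbane river rising #qldfloods"),
    ("carol", 1294794000, "Stay safe! RT @alice: Brisbane river rising #qldfloods"),
    ("bob", 1294794100, "@carol sandbags at http://example.org"),
]

per_user = metrics_per(RECORDS, key=lambda r: r[0])
per_hour = metrics_per(RECORDS, key=lambda r: r[1] // 3600)  # hourly buckets

for user, counts in sorted(per_user.items()):
    print(user, dict(counts))
```

Changing the `key` function is enough to switch between the potential uses listed on the slide: dividing the timestamp by 60 or 86400 gives per-minute or per-day buckets, and a key that maps users to a group label gives metrics per group of users.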
  • #QLDFLOODS @REPLIES
    [Figure; labels: authorities, mainstream media]
  • #ROYALWEDDING
  • #AUSPOL (FEB.-DEC. 2011)
  • HASHTAG METRICS
  • BEYOND HASHTAGS
    • Publics on Twitter:
      – Micro: @reply and retweet conversations
      – Meso: follower/followee networks
      – Macro: hashtag ‘communities’
      (Bruns & Moe, forthcoming)
      Multiple overlapping publics / networks
    • What drives their formation and dissipation?
    • How do they interact and interweave?
    • How are they interleaved with the wider media ecology?
    • Twitter doesn’t contain publics: publics transcend Twitter
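As one possible illustration of the micro level (@reply and retweet conversations), the sketch below turns tweets into a weighted, directed edge list of who mentions whom, written out as CSV for a network visualisation tool. The function names and sample data are hypothetical, and treating every @mention (including those inside retweets) as an edge is a simplifying assumption. The Source/Target/Weight header is intended to match the column names commonly used for edge tables imported into Gephi.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"(?<!\w)@(\w+)")  # captures the screen name after @


def mention_edges(records):
    """Count directed (sender, mentioned user) pairs; records are (user, text)."""
    edges = Counter()
    for sender, text in records:
        for target in MENTION.findall(text):
            # Screen names are case-insensitive; self-mentions are skipped.
            if target.lower() != sender.lower():
                edges[(sender.lower(), target.lower())] += 1
    return edges


def to_edge_csv(edges):
    """Serialise the edge counts as a Source,Target,Weight table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up sample conversation.
RECORDS = [
    ("alice", "@bob any news from Ipswich?"),
    ("bob", "@alice river peaking tonight"),
    ("carol", "RT @alice: river peaking tonight cc @bob"),
    ("alice", "@bob thanks"),
]

edges = mention_edges(RECORDS)
print(to_edge_csv(edges))
```

Follower/followee networks (the meso level) cannot be derived from tweet text in this way and would need separately gathered data.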
  • ‘BIG DATA’ AND THE DIGITAL HUMANITIES
    • Emerging needs in Twitter research:
      – Unified, compatible methods and metrics for Twitter analysis
        → Tools and approaches shared at http://mappingonlinepublics.net/
      – Powerful infrastructure for long-term, high-volume tracking of public communication on Twitter
        → Data access requires substantial funding stream
      – Facilities for long-term data storage and preservation
        → Key roles for National Libraries, National Archives
      – Integration with related datasets (e.g. MSM content)
        → Need to address data interoperability questions
      – Robust frameworks for Internet research ethics
        → Clear guidelines which take into account complex new public/private structures
    • Twitter as a test case for digital humanities research
      – Widespread, open, public platform for everyday communication
      – Tool for observing society at scale through Internet research
  • http://mappingonlinepublics.net/
    @snurb_dot_info
    @jeanburgess
    @_StephenH
    @DrTNitins
    @timhighfield
    @cdtavijit