New Methodologies for Capturing and Working with Publicly Available Twitter Data
Associate Professor Axel Bruns
@snurb_dot_info
http://mappingonlinepublics.net/
Queensland University of Technology
WHY TWITTER?
• Researching Twitter:
  – Significant worldwide social network
  – ~500 million accounts (but how many are active?)
  – Varied range of uses: from phatic communication to emergency coordination
  – Healthy third-party ecosystem (for now)
  – Strong history of user innovation: @replies, #hashtags
  – Flat and open network structure: non-reciprocal following, public profiles by default
  – Good API for gathering (big) data for research
NEW MEDIA AND PUBLIC COMMUNICATION: MAPPING AUSTRALIAN USER-CREATED CONTENT IN ONLINE SOCIAL NETWORKS
• Australian Research Council (ARC) Discovery Project (2010-13)
  – $410,000
  – QUT (Brisbane), Sociomantic Labs (Berlin)
  – First comprehensive study of Australian social media use
  – Computer-assisted cultural analysis: tracking, mapping and analysing blogs, Twitter, Flickr and YouTube as ‘networked publics’
  – Addressing the problem of scale (‘Big Data’) and disciplinary change in media, cultural and communication studies
  – Natively digital methods
  – Studying society with the Internet (Richard Rogers)
http://mappingonlinepublics.net/
A TWITTER RESEARCH TOOLKIT
• Data Gathering
  – yourTwapperkeeper + in-house crawler
• Data Processing
  – Gawk: open source, multiplatform, programmable command-line tool for processing CSV documents
• Textual Analysis
  – Leximancer: commercial, multiplatform; extracts key concepts from large corpora of text, and examines and visualises concept co-occurrence
  – WordStat: commercial, PC-only text analysis tool; generates concept co-occurrence data that can be exported for visualisation
• Visualisation
  – Gephi: open source, multiplatform network visualisation tool
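Leximancer and WordStat are commercial tools, and their concept-extraction methods are not described here. Purely to illustrate what raw co-occurrence data look like, the following minimal sketch counts, for each pair of terms, how many tweets contain both. It assumes simple regex tokenisation, treats individual words and hashtags as "concepts" (far cruder than either tool), and uses a hypothetical stopword list; the function names are likewise hypothetical. The resulting pair counts could serve as edge weights for a network visualisation in Gephi.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list for illustration only
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "rt"}

def tokenize(text):
    """Lower-case a tweet and keep word-like tokens, including #hashtags."""
    tokens = re.findall(r"#?\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def cooccurrence_counts(tweets):
    """Count how many tweets each unordered pair of distinct terms appears in."""
    pairs = Counter()
    for text in tweets:
        terms = sorted(set(tokenize(text)))  # count each term once per tweet
        pairs.update(combinations(terms, 2))
    return pairs

if __name__ == "__main__":
    # Hypothetical sample tweets, not real data
    sample = [
        "Flood warning for Brisbane river #qldfloods",
        "Brisbane river peak expected tomorrow #qldfloods",
        "Evacuation centres open in Ipswich",
    ]
    for (a, b), n in cooccurrence_counts(sample).most_common(3):
        print(a, b, n)
```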
APPROACHING TWITTER
• Possible research questions:
  – Hashtags as vehicles for ad hoc events and publics:
    • How do online publics form and dissolve? How do they interact, what structures do they form?
    • Where do they draw information from? What do they share?
    • Do they simply consist of the usual suspects? How insular and disconnected are online publics?
  – Hashtags in context:
    • How do different hashtag events compare? Are there common types of hashtags/publics?
    • How ‘big’ are they? What topics attract attention on Twitter?
    • What community (?) structures emerge?
DEVELOPING TWITTER METRICS
• Key data points available through the Twitter API:
  – text: contents of the tweet itself, in 140 characters or less
  – to_user_id: numerical ID of the tweet recipient (for @replies)
  – from_user: screen name of the tweet sender
  – id: numerical ID of the tweet itself
  – from_user_id: numerical ID of the tweet sender
  – iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
  – source: client software used to tweet (e.g. Web, Tweetdeck, ...)
  – profile_image_url: URL of the tweet sender’s profile picture
  – geo_type: format of the sender’s geographical coordinates
  – geo_coordinates_0: first element of the geographical coordinates
  – geo_coordinates_1: second element of the geographical coordinates
  – created_at: tweet timestamp in human-readable format
  – time: tweet timestamp as a numerical Unix timestamp
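In the toolkit these fields are processed with Gawk. As a minimal sketch in Python instead, assuming the archive has been exported as a CSV file whose header row uses the field names above (only from_user and time are needed here), the following reads the rows and counts tweets per hour and per sender. It uses the numerical time field rather than parsing created_at, and reports hours in UTC. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

def read_tweets(handle):
    """Yield one dict per tweet from a CSV export with a header row
    (assumed to use the field names listed above)."""
    yield from csv.DictReader(handle)

def hour_bucket(unix_time):
    """Turn the numerical `time` field into a UTC hour label such as '2011-01-12 06:00'."""
    ts = datetime.fromtimestamp(int(unix_time), tz=timezone.utc)
    return ts.strftime("%Y-%m-%d %H:00")

def tweets_per_hour(rows):
    return Counter(hour_bucket(row["time"]) for row in rows)

def tweets_per_user(rows):
    return Counter(row["from_user"] for row in rows)

if __name__ == "__main__":
    # Hypothetical three-tweet export, for illustration only
    sample = io.StringIO(
        "from_user,text,time\n"
        "user_a,Flood update #qldfloods,1294812000\n"
        "user_b,RT @user_a: Flood update #qldfloods,1294813800\n"
        "user_a,@user_b thanks,1294815600\n"
    )
    rows = list(read_tweets(sample))  # materialise: the rows are used twice
    print(tweets_per_hour(rows))
    print(tweets_per_user(rows))
```

Because read_tweets is a generator, the rows are collected into a list before being counted more than once.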
DEVELOPING TWITTER METRICS
• Additional data points derived from tweets:
  – original tweets: tweets which are neither @replies nor retweets
  – retweets: tweets which contain RT @user… (or similar)
    • unedited retweets: retweets which start with RT @user…
    • edited retweets: retweets which do not start with RT @user…
  – genuine @replies: tweets which contain @user, but are not retweets
  – URL sharing: tweets which contain URLs
• Potential uses:
  – metrics per hashtag
  – metrics per timeframe (day, hour, minute, second, …)
  – metrics per user (or group of users)
  – …
(Bruns & Stieglitz, forthcoming)
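The tweet categories above can be expressed as a few text tests. The following is a minimal sketch, not the authors' own Gawk scripts: it assumes "RT @user", "MT @user" and "via @user" as the retweet markers standing in for "(or similar)", treats any @mention outside a retweet as a genuine @reply (as the definition above does, rather than relying on to_user_id), and recognises URLs only when they begin with http:// or https://. The regular expressions and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed retweet markers: "RT @user" plus the variants "MT @user" and
# "via @user", standing in for the slide's "(or similar)"
RETWEET_RE = re.compile(r"\b(?:RT|MT|via)\s*@\w+", re.IGNORECASE)
# Unedited retweets start with "RT @user" (checked with re.match)
UNEDITED_RE = re.compile(r"RT\s*@\w+", re.IGNORECASE)
# An @mention not preceded by a word character, so e-mail addresses are skipped
MENTION_RE = re.compile(r"(?<!\w)@\w+")
# Assumption: URLs are recognised only by an http:// or https:// prefix
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def tweet_type(text):
    """Return 'unedited_retweet', 'edited_retweet', 'reply' or 'original'."""
    if RETWEET_RE.search(text):
        if UNEDITED_RE.match(text.lstrip()):
            return "unedited_retweet"
        return "edited_retweet"
    if MENTION_RE.search(text):
        return "reply"  # contains @user but is not a retweet
    return "original"   # neither @reply nor retweet

def shares_url(text):
    return URL_RE.search(text) is not None

def tweet_metrics(texts):
    """Count tweet types (mutually exclusive) plus URL-sharing tweets
    (which overlap with the types) for one set of tweets."""
    texts = list(texts)
    counts = Counter(tweet_type(t) for t in texts)
    counts["url"] = sum(shares_url(t) for t in texts)
    return counts
```

Applied to all tweets carrying one hashtag, or to the subset posted in a given hour or by a given user, tweet_metrics yields the per-hashtag, per-timeframe and per-user metrics listed above.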
#QLDFLOODS @REPLIES
[Figure: #qldfloods @reply network; labelled clusters: authorities, mainstream media]
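The #qldfloods slide appears to visualise @reply activity as a network. As a minimal sketch (not necessarily how that graph was produced), assuming tweet rows with from_user and text fields, the following builds a weighted, directed edge list from each sender to every account they @mention, and writes it with the Source, Target and Weight column headers that Gephi's spreadsheet (CSV) import recognises. The function names are hypothetical. This sketch counts every @mention, including those inside retweets; excluding retweets would require a retweet test like the classification rules described earlier.

```python
import csv
import io
import re
from collections import Counter

# @mention not preceded by a word character (skips e-mail addresses)
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")

def reply_edges(rows):
    """Count directed sender -> mentioned-account edges from rows
    with 'from_user' and 'text' fields (screen names lower-cased)."""
    edges = Counter()
    for row in rows:
        sender = row["from_user"].lower()
        for target in MENTION_RE.findall(row["text"]):
            target = target.lower()
            if target != sender:  # ignore self-mentions
                edges[(sender, target)] += 1
    return edges

def write_gephi_csv(edges, handle):
    """Write an edge table with Source/Target/Weight headers for Gephi."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])

if __name__ == "__main__":
    # Hypothetical rows, for illustration only
    rows = [
        {"from_user": "user_a", "text": "@user_b any news from Rocklea? @user_c"},
        {"from_user": "user_b", "text": "@User_A roads closed, stay home"},
    ]
    out = io.StringIO()
    write_gephi_csv(reply_edges(rows), out)
    print(out.getvalue())
```

When writing to a real file rather than a StringIO, open it with newline="" so the csv module controls line endings.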
BEYOND HASHTAGS
• Publics on Twitter:
  – Micro: @reply and retweet conversations
  – Meso: follower/followee networks
  – Macro: hashtag ‘communities’ (Bruns & Moe, forthcoming)
  Multiple overlapping publics/networks
• What drives their formation and dissipation?
• How do they interact and interweave?
• How are they interleaved with the wider media ecology?
• Twitter doesn’t contain publics: publics transcend Twitter
‘BIG DATA’ AND THE DIGITAL HUMANITIES
• Emerging needs in Twitter research:
  – Unified, compatible methods and metrics for Twitter analysis
    Tools and approaches shared at http://mappingonlinepublics.net/
  – Powerful infrastructure for long-term, high-volume tracking of public communication on Twitter
    Data access requires a substantial funding stream
  – Facilities for long-term data storage and preservation
    Key roles for National Libraries, National Archives
  – Integration with related datasets (e.g. mainstream media (MSM) content)
    Need to address data interoperability questions
  – Robust frameworks for Internet research ethics
    Clear guidelines which take into account complex new public/private structures
• Twitter as a test case for digital humanities research
  – Widespread, open, public platform for everyday communication
  – Tool for observing society at scale through Internet research