Mapping Online Publics (Part 2)

Part 2 of the "Making Sense of Twitter: Quantitative Analysis Using Twapperkeeper and Other Tools" workshop, presented at the Communities & Technologies 2011 conference, Brisbane, 29 June 2011.



  1. Mapping Online Publics
     Axel Bruns / Jean Burgess
     ARC Centre of Excellence for Creative Industries and Innovation, Queensland University of Technology
     a.bruns@qut.edu.au – @snurb_dot_info / je.burgess@qut.edu.au – @jeanburgess
     http://mappingonlinepublics.net – http://cci.edu.au/
  2. Gathering Data
     Keyword / #hashtag archives:
     - Twapperkeeper.com: no longer fully functional
     - yourTwapperkeeper: open source solution; runs on your own server; use our modifications to be able to export CSV / TSV
     Uses the Twitter streaming API to track keywords, including #hashtags and @mentions
  3. Twapperkeeper / yourTwapperkeeper data
     Typical data format (#ausvotes):
  4. Processing Data
     Gawk:
     - Command-line tool for processing CSV / TSV data
     - Can use ready-made scripts for complex processing
     - Vol. 1 of our scripts collection now online at MOP
     Regular expressions (regex):
     - Key tool for working with Gawk
     - Powerful way of expressing search patterns
     - E.g. @[A-Za-z0-9_]+ = any @username
     - See online regex primers...
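The @username regex from the slide can be tried out directly on the command line. This is a minimal sketch using portable POSIX awk rather than gawk, with a made-up sample tweet; it finds every match in turn via awk's built-in RSTART / RLENGTH variables.

```shell
#!/bin/sh
# Extract every @username from a line of tweet text, using the
# @[A-Za-z0-9_]+ pattern from the slide. Sample text is invented.
echo '@one @two hello from @user_3' | awk '
{
  line = $0
  while (match(line, /@[A-Za-z0-9_]+/)) {
    print substr(line, RSTART + 1, RLENGTH - 1)   # drop the leading @
    line = substr(line, RSTART + RLENGTH)          # continue after match
  }
}'
```

This prints one username per line (`one`, `two`, `user_3`).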
  5. # atextractfromtoonly.awk - Extract @replies for network visualisation
     #
     # this script takes a Twapperkeeper CSV/TSV archive of tweets, and reworks it into simple network data for visualisation
     # the output format for this script is always CSV, to enable import into Gephi and other visualisation tools
     #
     # expected data format:
     # text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
     #
     # output format:
     # from,to
     #
     # the script extracts @replies from tweets, and creates duplicates where multiple @replies are
     # present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
     # @user,@one and @user,@two
     #
     # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

     BEGIN {
       print "from,to"
     }

     /@([A-Za-z0-9_]+)/ {
       a = 0
       do {
         match(substr($1, a), /@([A-Za-z0-9_]+)?/, atArray)
         a = a + atArray[1, "start"] + atArray[1, "length"]
         if (atArray[1] != 0) print tolower($3) "," tolower(atArray[1])
       } while (atArray[1, "start"] != 0)
     }
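The script above uses gawk's three-argument match() extension. As a sketch, the same from,to extraction can be written in portable POSIX awk; this assumes the Twapperkeeper layout described in the header (text in field 1, from_user in field 3) and feeds in a single invented TSV row.

```shell
#!/bin/sh
# Sketch of the from,to extraction in POSIX awk (no gawk match() array).
# Input: TSV with text in field 1 and from_user in field 3; the sample
# row "@one @two hello" from user "User" is made up for illustration.
printf '@one @two hello\t\tUser\n' | awk -F '\t' '
BEGIN { print "from,to" }
{
  line = $1
  while (match(line, /@[A-Za-z0-9_]+/)) {
    print tolower($3) "," tolower(substr(line, RSTART + 1, RLENGTH - 1))
    line = substr(line, RSTART + RLENGTH)
  }
}'
```

As in the original, one tweet with two @mentions yields two edges: `user,one` and `user,two`.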
  6. Running Gawk Scripts
     Gawk command line execution:
     - Open a terminal window
     - Run the command:
       #> gawk -F "\t" -f scripts/explodetime.awk input.tsv >output.tsv
     Arguments:
     - -F "\t" = field separator is a TAB (otherwise -F ,)
     - -f scripts/explodetime.awk = run the explodetime.awk script (adjust scripts path as required)
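The invocation pattern above (field separator flag, script file, input file, redirected output) can be rehearsed end to end with a trivial stand-in script; the file names here are illustrative, not the workshop's own scripts.

```shell
#!/bin/sh
# Rehearse the slide's command shape with a one-line stand-in script:
# count the TAB-separated fields per row of a tiny invented input.tsv.
printf 'hello world\tfield2\n' > input.tsv
printf '{ print NF }\n' > count.awk
awk -F '\t' -f count.awk input.tsv > output.tsv
cat output.tsv
```

The two-column input row produces `2`; the same shape applies to the real explodetime.awk run.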
  7. Basic #hashtag data: most active users
     Pivot table in Excel – ‘from_user’ against ‘count of text’
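The same 'tweets per user' count that the Excel pivot table produces can be sketched on the command line; this assumes from_user in field 3 of the TSV, with made-up sample rows.

```shell
#!/bin/sh
# Count tweets per user (the pivot-table result) with POSIX awk,
# assuming from_user is field 3; sample rows are invented.
printf 'hi\t\talice\nyo\t\tbob\nok\t\talice\n' | awk -F '\t' '
{ count[$3]++ }
END { for (u in count) print u "\t" count[u] }
' | sort
```

The `sort` fixes awk's unspecified for-in iteration order; here the output is `alice 2` and `bob 1`.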
  8. Identifying Time-Based Patterns
     #> gawk -F "\t" -f scripts/explodetime.awk input.tsv >output.tsv
     Output: additional time data
     - Original format + year,month,day,hour,minute
     Uses:
     - Time series per year, month, day, hour, minute
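The core of the explodetime idea — appending year, month, day, hour, and minute columns derived from a timestamp field — can be sketched in a few lines. The real script works on the Twapperkeeper format; an ISO-style timestamp in field 2 is assumed here purely for illustration.

```shell
#!/bin/sh
# Sketch: split a timestamp field into year/month/day/hour/minute and
# append them as extra TSV columns. Sample row and field position are
# invented; the workshop's explodetime.awk handles the real format.
printf 'some tweet text\t2011-06-29 14:05:00\n' | awk -F '\t' '
{
  split($2, d, /[- :]/)   # d[1]=year d[2]=month d[3]=day d[4]=hour d[5]=minute
  print $0 "\t" d[1] "\t" d[2] "\t" d[3] "\t" d[4] "\t" d[5]
}'
```

The output row keeps the original columns and gains five new ones, ready for per-hour or per-day pivot tables.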
  9. Basic #hashtag data: activity over time
     Pivot table – ‘day’ against ‘count of text’
  10. Identifying @reply Networks
      #> gawk -F "\t" -f scripts/atreplyfromtoonly.awk input.tsv >output.tsv
      Output: basic network information
      - from,to
      Uses:
      - Key @reply recipients
      - Network visualisation
  11. Basic #hashtag data: @replies received
      Pivot table – ‘to’ against ‘from’
  12. Basic @reply Network Visualisation
      Gephi:
      - Open source network visualisation tool – Gephi.org
      - Frequently updated, growing number of plugins
      Workflow:
      - Load the CSV into Gephi
      - Run the ‘Average Degree’ network metric
      - Filter for minimum degree / indegree / outdegree
      - Adjust node size and node colour settings, e.g. colour = outdegree, size = indegree
      - Run a network layout, e.g. ForceAtlas – play with settings as appropriate
  13. Basic @reply Network Visualisation
      Degree = 100+, colour = outdegree, size = indegree
  14. Tracking Themes (and More) over Time
      #> gawk -F "\t" -f multifilter.awk search="term1,term2,..." input.tsv >output.tsv
      Term examples:
      - (julia|gillard),(tony|abbott)
      - .?,@[A-Za-z0-9_]+,RT @[A-Za-z0-9_]+,http
      Output:
      - Original format + term1 match, term2 match, ...
      Uses:
      - Use on output from explodetime.awk
      - Graph occurrences of terms per time period (hour, day, ...)
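The multifilter idea — appending one column per search term that flags whether the tweet text matches it — can be sketched with awk's dynamic regexes. The terms mirror the slide's example; the sample rows and the hard-coded term list are invented (the real script takes terms via search="term1,term2,...").

```shell
#!/bin/sh
# Sketch: flag each row with 1/0 per search term, appended as extra
# columns. Terms from the slide example; sample tweets are invented.
printf 'gillard wins debate\nabbott responds\n' | awk '
BEGIN { n = split("(julia|gillard),(tony|abbott)", terms, ",") }
{
  out = $0
  for (i = 1; i <= n; i++) out = out "\t" ($0 ~ terms[i] ? 1 : 0)
  print out
}'
```

Summing these 1/0 columns per day (after explodetime.awk) gives the per-period term counts that the next slide's pivot table graphs.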
  15. Tracking Themes over Time
      Pivot table – ‘day’ against keyword bundles, normalised to 100%
  16. Dynamic @reply Network Visualisation
      Multi-step process:
      - Make sure tweets are in ascending chronological order
      - Use timeframe.awk to select the period to visualise:
        #> gawk -F , -f timeframe.awk start="2011 01 01 00 00 00" end="2011 01 01 23 59 59" tweets.csv >tweets-1Jan.csv
        (start / end = start and end of the period to select, as YYYY MM DD HH MM SS)
      - Use preparegexfattimeintervals.awk to prepare the data:
        #> gawk -F , -f preparegexfattimeintervals.awk tweets-1Jan.csv >tweets-1Jan-prep.csv
      - Use gexfattimeintervals.awk to convert to Gephi’s GEXF format:
        #> gawk -F , -f gexfattimeintervals.awk decaytime="1800" tweets-1Jan-prep.csv >tweets-1Jan.gexf
        (decaytime = time in seconds that an @reply remains ‘active’, once made)
      - This may take some time...
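The first step above, selecting a time window, amounts to keeping only rows whose timestamp falls between start and end. As a portable sketch, this version compares epoch seconds in an assumed second CSV field rather than the "YYYY MM DD HH MM SS" strings that timeframe.awk itself parses; all sample rows are invented.

```shell
#!/bin/sh
# Sketch of the timeframe step: keep rows whose (assumed) epoch-seconds
# field 2 lies inside the start/end window. The real timeframe.awk
# parses "YYYY MM DD HH MM SS" strings instead.
printf 'a,1293840000\nb,1293926400\nc,1294012800\n' | awk -F , \
  -v start=1293900000 -v end=1294000000 '
$2 + 0 >= start + 0 && $2 + 0 <= end + 0 { print }'
```

Only the middle row falls inside the window, so the output is `b,1293926400`; the `+ 0` forces a numeric rather than string comparison.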
  17. [image-only slide]
  18. http://mappingonlinepublics.net/
      Image by campoalto
      @snurb_dot_info
      @jeanburgess