Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges


Paper presented at AoIR 2010, Gothenburg, 22 Oct. 2010

Published in: Education
Transcript of "Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges"

  1. Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges
      Image by campoalto
      Axel Bruns / Jean Burgess
      ARC Centre of Excellence for Creative Industries and Innovation, Brisbane
      a.bruns@qut.edu.au – @snurb_dot_info
      je.burgess@qut.edu.au – @jeanburgess
      http://mappingonlinepublics.net – http://cci.edu.au/
      Thomas Nicolai / Lars Kirchhoff
      Sociomantic Labs, Berlin
      thomas.nicolai@sociomantic.com / lars.kirchhoff@sociomantic.com
      http://sociomantic.com/
  2. Project: New Media and Public Communication
      ARC Discovery (2010-12) – A$410,000
      Axel Bruns (CI), Jean Burgess (SRF) – QUT, Brisbane
      Lars Kirchhoff, Thomas Nicolai (PIs) – Sociomantic Labs, Berlin
      Project blog: http://mappingonlinepublics.net/
      Year 1 / Year 2 / Year 3
      Social network sources:
        • YouTube
        • Flickr
        • Twitter
        • blogs
      Research tool development and baseline data
      Baseline information:
        • data extraction
        • content creation statistics
        • patterns in terms and themes
        • baseline social networking map
        • interconnections between social network spaces
      Content creation patterns
      Changes over time:
        • short-term statistics
        • regular / seasonal patterns
      Cluster profiling:
        • common themes / patterns
        • lead users
      Focus on specific events
      Cultural dynamics:
        • rapid spread of new ideas
        • communication across clusters
        • thematic discourse analysis
        • relationship with mainstream media coverage
      Research tools:
        • network crawler
        • content scraper
        • content analysis
        • network analysis
  17. Methodology – Blogs
  18. Analysis – Blogs
  19. Blog Network (between known blogs only)
      (~8500 blogs / 17 July to 25 Aug. 2010 / All page links / Node size: Indegree)
      Cluster labels: parenting, politics, food, arts & crafts, design and style
  20. Methodology – Twitter
  21. Analysis – Twitter
  22. Data Processing – Twitter
      Typical data structure (#ausvotes):
  23. Data Processing – Twitter
      Tools:
        • Gawk – Scripting tool for CSV processing (open source)
        • Excel – Data aggregation, pivot tables and charts
        • Leximancer / WordStat – Keyword extraction, co-occurrence matrices
        • Gephi – Network analysis and visualisation (open source)

      ```awk
      # Extract @replies for network visualisation
      #
      # this script takes a CSV archive of tweets, and reworks it into network data for visualisation
      #
      # expected data format:
      # text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
      # geo_coordinates_0,geo_coordinates_1,created_at,time
      #
      # output format:
      # from,to,tweet,time,timestamp
      #
      # the script extracts @replies from tweets, and creates duplicates where multiple @replies are
      # present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
      # @user,@one,"@one @two hello" and @user,@two,"@one @two hello"
      #
      # requires gawk (the three-argument match() is a gawk extension); run with -F , so that
      # fields are split on commas
      #
      # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

      BEGIN {
          print "from,to,tweet,time,timestamp"
      }

      # only process lines whose tweet text (field 1) contains at least one @username
      $1 ~ /@[A-Za-z0-9_]+/ {
          rest = $1
          # emit one output row per @username, then keep searching after that match
          while (match(rest, /@([A-Za-z0-9_]+)/, atArray)) {
              print $3 "," atArray[1] "," $1 "," $12 "," $13
              rest = substr(rest, RSTART + RLENGTH)
          }
      }
      ```

      ```awk
      # filter.awk - Filter list of tweets
      #
      # this script takes a CSV or other list of tweets, and removes any lines that don't match
      # the search criteria
      # the script preserves the first line, expecting that it contains header information
      #
      # script expects command-line argument search={searchcriteria} _before_ the input CSV filename
      # enclose the search term in quotation marks if it contains any special characters
      # lines are lower-cased before matching, so write the search term in lower case
      #
      # e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
      #
      # expected data format:
      # CSV or simple list of tweets, line-by-line
      #
      # output format:
      # same as above, listing only matching tweets
      #
      # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

      # always pass the header line through
      NR == 1 {
          print
          next
      }

      # print every other line whose lower-cased text matches the search term
      tolower($0) ~ search {
          print
      }
      ```
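Splitting with `-F ,` treats every comma as a field separator, so a tweet whose text contains a comma shifts the later fields. A minimal sketch of the same @reply extraction with a CSV-aware reader, assuming the archive quotes fields that contain commas and has a header row using the column names listed in the expected data format above (`text`, `from_user`, `created_at`, `time`). The function name `reply_edges` and the sample row are hypothetical and not part of the original scripts.

```python
import csv
import io
import re

# Same username pattern as the gawk script.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")

def reply_edges(rows):
    """Yield one (from, to, tweet, time, timestamp) tuple per @username in a tweet."""
    for row in rows:
        for target in MENTION.findall(row["text"]):
            yield (row["from_user"], target, row["text"], row["created_at"], row["time"])

# Hypothetical one-tweet archive; a real export carries more columns.
sample = io.StringIO(
    'text,from_user,created_at,time\n'
    '"@one @two hello, world",user,"Sat, 17 Jul 2010 00:00:00 +0000",1279324800\n'
)
edges = list(reply_edges(csv.DictReader(sample)))

# Write the edge list back out as CSV, quoting fields where needed.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["from", "to", "tweet", "time", "timestamp"])
writer.writerows(edges)
```

As in the gawk version, a tweet that addresses two accounts produces two rows, one per recipient.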
  24. #ausvotes: Overall Activity (17 July – 24 Aug. 2010)
  25. #ausvotes: Discussion Network
      (17 July to 25 Aug. 2010 / All @replies / Node size: Indegree / Node colours: betweenness centrality)
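Node size in this network is indegree, that is, the number of @replies an account receives. A minimal sketch of that count, assuming the edges are available as (from, to) pairs such as the first two columns of the `from,to,tweet,time,timestamp` output. The function name and the case-folding of usernames are illustrative choices, not part of the original toolchain, and betweenness centrality (used for node colours) is not covered here.

```python
from collections import Counter

def indegree(edges):
    """Count incoming @replies per account from (from_user, to_user) pairs.

    Usernames are lower-cased because Twitter usernames are case-insensitive.
    """
    return Counter(to_user.lower() for _from_user, to_user in edges)

# Hypothetical edge list for illustration.
example_edges = [("a", "B"), ("c", "b"), ("b", "a")]
counts = indegree(example_edges)
```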
  26. Keyword Co-Occurrence
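A co-occurrence matrix records how often two keywords appear in the same tweet. The sketch below counts keyword pairs per tweet under simple assumptions: the keyword list is supplied by hand, tokens are split with a basic regular expression, and each pair is counted once per tweet. It illustrates the idea only and does not reproduce how Leximancer or WordStat extract keywords; the function name and sample tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurrence(tweets, keywords):
    """Return a Counter mapping alphabetically sorted keyword pairs to tweet counts."""
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for tweet in tweets:
        # Crude tokenisation: lower-case runs of letters, digits and a few symbols.
        words = set(re.findall(r"[a-z0-9_#@']+", tweet.lower()))
        present = sorted(words & wanted)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Hypothetical tweets for illustration.
sample_tweets = [
    "Gillard and Abbott debate",
    "Abbott on nbn",
    "gillard, abbott, nbn",
]
pairs = cooccurrence(sample_tweets, ["gillard", "abbott", "nbn"])
```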
  27. #ausvotes: Mentions of the Leaders (cumulative)
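A cumulative mention series can be built by counting matching tweets per day and keeping a running total. The sketch below assumes tweets are available as (day, text) pairs with ISO-formatted days, and reuses the `(julia|gillard)` search term from the filter.awk usage example. The function name and sample data are hypothetical, and it counts matching tweets rather than individual mentions.

```python
from collections import Counter
from itertools import accumulate
import re

def cumulative_mentions(tweets, pattern):
    """Return [(day, running_total)] of tweets matching pattern, in day order."""
    regex = re.compile(pattern, re.IGNORECASE)
    daily = Counter()
    for day, text in tweets:
        # Days without any match still appear, with a daily count of zero.
        daily[day] += 1 if regex.search(text) else 0
    days = sorted(daily)
    totals = accumulate(daily[d] for d in days)
    return list(zip(days, totals))

# Hypothetical tweets for illustration.
sample = [
    ("2010-07-17", "Julia Gillard calls election"),
    ("2010-07-17", "weather"),
    ("2010-07-18", "gillard again"),
    ("2010-07-18", "JULIA speaks"),
]
series = cumulative_mentions(sample, "(julia|gillard)")
```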
  28. #ausvotes: Key Themes
  29. Challenges
        • Twapperkeeper relies on #hashtags
        • Problem if #hashtags are inconsistent/unclear
        • Follow-on @replies and retweets may not continue to use #hashtags
        • May miss early developments – e.g. #hashtag standardisation
        • Need to look at overall user activity / Twitter firehose for more comprehensive picture
        • Need to track baseline activity to understand how exceptional acute events are
      Ethical considerations:
        • Using only publicly available data (no protected tweets, no firewalled blogs)
        • But technical publicness not enough – ‘publicly available’ ≠ ‘meant to be public’
        • No easy answers – #hashtags probably indicate intention to be public, but may not
        • Need to consider data storage and publication carefully, too
      See more at mappingonlinepublics.net – up next: time-based animations...
      Or find us at @snurb_dot_info and @jeanburgess