Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges

Paper presented at AoIR 2010, Gothenburg, 22 Oct. 2010

Transcript

  • 1. Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges
    Image by campoalto
    Axel Bruns / Jean Burgess
    ARC Centre of Excellence for Creative Industries and Innovation, Brisbane
    a.bruns@qut.edu.au – @snurb_dot_info
    je.burgess@qut.edu.au – @jeanburgess
    http://mappingonlinepublics.net – http://cci.edu.au/
    Thomas Nicolai / Lars Kirchhoff
    Sociomantic Labs, Berlin
    thomas.nicolai@sociomantic.com / lars.kirchhoff@sociomantic.com
    http://sociomantic.com/
  • 2. Project: New Media and Public Communication
    ARC Discovery (2010-12) – A$410,000
    Axel Bruns (CI), Jean Burgess (SRF) – QUT, Brisbane
    Lars Kirchhoff, Thomas Nicolai (PIs) – Sociomantic Labs, Berlin
    Project blog: http://mappingonlinepublics.net/
    Year 1 Year 2 Year 3
    Social network sources:
    Research tool development and baseline data
    Baseline information:
    • data extraction
    • content creation statistics
    • patterns in terms and themes
    • baseline social networking map
    • interconnections between social network spaces
    Content creation patterns
    Changes over time:
    • short-term statistics
    • regular / seasonal patterns
    Cluster profiling:
    • common themes / patterns
    • lead users
    Focus on specific events
    Cultural dynamics:
    • rapid spread of new ideas
    • communication across clusters
    • thematic discourse analysis
    • relationship with mainstream media coverage
    Research tools:
    • network crawler
    • content scraper
    • content analysis
    • network analysis
  • Methodology – Blogs
  • 18. Analysis – Blogs
  • 19. Blog Network (between known blogs only) (~8500 blogs / 17 July to 25 Aug. 2010 / All page links / Node size: Indegree)
    parenting
    politics
    food
    arts & crafts
    design and style
  • 20. Methodology – Twitter
  • 21. Analysis – Twitter
  • 22. Data Processing – Twitter
    Typical data structure (#ausvotes):
  • 23. Data Processing – Twitter
    Tools:
    Gawk – Scripting tool for CSV processing (open source)
    Excel – Data aggregation, pivot tables and charts
    Leximancer / WordStat – Keyword extraction, co-occurrence matrices
    Gephi – Network analysis and visualisation (open source)
    # Extract @replies for network visualisation
    #
    # this script takes a CSV archive of tweets, and reworks it into network data for visualisation
    #
    # expected data format:
    # text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
    # geo_coordinates_0,geo_coordinates_1,created_at,time
    #
    # output format:
    # from,to,tweet,time,timestamp
    #
    # the script extracts @replies from tweets, and creates duplicates where multiple @replies are
    # present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
    # @user,@one,"@one @two hello" and @user,@two,"@one @two hello"
    #
    # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au
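    BEGIN {
        print "from,to,tweet,time,timestamp"
    }
    # Requires gawk (the three-argument match() is a gawk extension); run with -F , so that
    # $1 = text, $3 = from_user, $12 = created_at, $13 = time.
    /@[A-Za-z0-9_]+/ {
        rest = $1
        # atArray[1] holds the matched username without its leading "@"
        while (match(rest, /@([A-Za-z0-9_]+)/, atArray)) {
            print $3 "," atArray[1] "," $1 "," $12 "," $13
            rest = substr(rest, RSTART + RLENGTH)    # continue scanning after this mention
        }
    }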
    # filter.awk - Filter list of tweets
    #
    # this script takes a CSV or other list of tweets, and removes any lines that don't match the search criteria
    # the script preserves the first line, expecting that it contains header information
    #
    # script expects command-line argument search={searchcriteria} _before_ the input CSV filename
    # enclose the search term in quotation marks if it contains any special characters
    #
    # e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
    #
    # expected data format:
    # CSV or simple list of tweets, line-by-line
    #
    # output format:
    # same as above, listing only the tweets that match the search criteria
    #
    # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au
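    # the search term should be lower-case, because each line is lower-cased before matching
    NR == 1 { print; next }           # keep the header line
    tolower($0) ~ search { print }    # keep lines that match the search term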
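The gawk approach above splits fields on every comma, so a tweet whose text itself contains a comma would shift the columns. As a minimal sketch of the same @reply extraction with quote-aware CSV parsing, the following uses Python's csv module rather than gawk. It assumes the archive has a header row with the column names listed in the script comments; the function name and the sample row are made up for illustration.

```python
import csv
import io
import re

MENTION = re.compile(r"@([A-Za-z0-9_]+)")

def reply_edges(csv_file):
    """Yield (from_user, to_user, tweet, created_at, time) for each @mention."""
    for row in csv.DictReader(csv_file):
        for name in MENTION.findall(row["text"]):
            yield (row["from_user"], name, row["text"], row["created_at"], row["time"])

# Hypothetical one-row archive in the column layout described in the script comments.
SAMPLE = io.StringIO(
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,"
    "profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time\n"
    '"@one, @two hello",,user,1,10,en,web,,,0,0,"Sat, 17 Jul 2010 00:00:00 +0000",1279324800\n'
)

edges = list(reply_edges(SAMPLE))
for edge in edges:
    print(edge)
```

As in the gawk version, a tweet that addresses several accounts produces one row per addressee, and the addressee is recorded without its leading "@".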
  • 24. #ausvotes: Overall Activity (17 July – 24 Aug. 2010)
  • 25. #ausvotes: Discussion Network (17 July to 25 Aug. 2010 / All @replies / Node size: Indegree / Node colours: betweenness centrality)
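Indegree here is the number of incoming @replies an account receives. The slides compute it in Gephi; as a rough Python illustration of the same measure, the sketch below counts incoming edges from a list of (from, to) pairs. The account names are hypothetical, and whether repeated replies between the same pair were counted once or every time is not stated in the slides, so both variants are shown.

```python
from collections import Counter

# Hypothetical (from, to) edges, as produced by an @reply extraction step.
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("alice", "bob")]

def indegree(edge_list, count_repeats=True):
    """Count incoming @replies per account."""
    if not count_repeats:
        edge_list = set(edge_list)  # collapse repeated replies between the same pair
    return Counter(to for _, to in edge_list)

weighted = indegree(edges)                       # every reply counts
unique = indegree(edges, count_repeats=False)    # each sender counts once per recipient
print(weighted.most_common())
print(unique.most_common())
```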
  • 26. Keyword Co-Occurrence
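The slides produce co-occurrence matrices with Leximancer / WordStat. To make the idea concrete, here is a minimal Python sketch, not a reproduction of either tool, that counts how often pairs of keywords appear in the same tweet. The tweets, the keyword list and the simple tokeniser are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(texts, keywords):
    """Count, for each keyword pair, the number of texts containing both."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z0-9#@_]+", text.lower()))
        present = sorted(k for k in keywords if k in words)
        counts.update(combinations(present, 2))
    return counts

# Hypothetical sample data.
tweets = ["Julia Gillard on #ausvotes", "Gillard speech #ausvotes", "julia who?"]
keywords = {"julia", "gillard", "#ausvotes"}

pairs = cooccurrence(tweets, keywords)
for pair, n in pairs.most_common():
    print(pair, n)
```

Each pair is stored in sorted order, so the counts form the upper triangle of a symmetric co-occurrence matrix.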
  • 27. #ausvotes: Mentions of the Leaders (cumulative)
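A cumulative mentions curve can be built by counting matching tweets per day and keeping a running total. The sketch below is one possible way to do this in Python, assuming tweets are available as (ISO date, text) pairs; the sample tweets are invented, and the search pattern reuses the "(julia|gillard)" example from the filter script's usage line.

```python
import re
from collections import Counter
from itertools import accumulate

def cumulative_mentions(tweets, pattern):
    """tweets: iterable of (date_string, text); returns [(date, running_total)]."""
    regex = re.compile(pattern, re.IGNORECASE)
    daily = Counter()
    for day, text in tweets:
        daily[day] += bool(regex.search(text))  # counts tweets, not individual occurrences
    days = sorted(daily)  # ISO dates sort chronologically
    return list(zip(days, accumulate(daily[d] for d in days)))

# Hypothetical sample data.
tweets = [
    ("2010-07-17", "Gillard calls the election"),
    ("2010-07-17", "nothing relevant"),
    ("2010-07-18", "julia gillard again"),
    ("2010-07-19", "quiet day"),
]

series = cumulative_mentions(tweets, "(julia|gillard)")
for day, total in series:
    print(day, total)
```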
  • 28. #ausvotes: Key Themes
  • 29. Challenges
    Twapperkeeper relies on #hashtags
    Problem if #hashtags are inconsistent/unclear
    Follow-on @replies and retweets may not continue to use #hashtags
    May miss early developments – e.g. #hashtag standardisation
    Need to look at overall user activity / Twitter firehose for more comprehensive picture
    Need to track baseline activity to understand how exceptional acute events are
    Ethical considerations:
    Using only publicly available data (no protected tweets, no firewalled blogs)
    But technical publicness not enough – ‘publicly available’ ≠ ‘meant to be public’
    No easy answers – #hashtags probably indicate intention to be public, but may not
    Need to consider data storage and publication carefully, too
    See more at mappingonlinepublics.net – up next: time-based animations...
    Or find us at @snurb_dot_info and @jeanburgess