Rob Procter

Published in: Technology, Business

  • The cloud economics argument from Amazon shows how traditional forms of providing computational and storage resources are either wasteful or risk customer dissatisfaction. Using a cloud model, the level of resource provision can be adapted to current demand. The St Andrews Cloud Collaboratory (StACC) is a private cloud (actually, more than one) that allows us to allocate resources to a research project when needed and release them for other uses when not needed for the project. This allows St Andrews to do more research per server room / watt / CO2.

    1. READING THE RIOTS ON TWITTER: Challenges of new social media for quantitative researchers
       Rob Procter (University of Manchester), Farida Vis (University of Leicester), Alexander Voss (University of St Andrews)
       http://www.analysingsocialmedia.org/ #readingtheriots
    2. Methodology
       • Developing computer-based tools for sentiment and topic analysis of tweets is an active area of research.
       • Our methodology combines computer-based tools with established content analysis techniques so that each complements the other's strengths.
    3. Information Flows
       • Any collection of tweets can be divided into ‘original’ tweets and retweets.
       • If we are interested in how Twitter is used to communicate and share information, the only reliable evidence that a tweet has been read is that it has been retweeted.
       • We use computational tools to group a tweet (the parent) and its retweets (its children) into information flows.
    4. Information Flow Analysis
       For N = 1, CorpusMax
           InformationFlow[N-1] = {}
           If Corpus[N] == “RT @”.username.body
               (LevenshteinDistance, Parent) = LDMin(N-1, username, body)
               If LevenshteinDistance < 30
                   InformationFlow[Parent] = InformationFlow[Parent] + Corpus[N]
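The information flow pseudocode can be read as the following Python sketch. It is an interpretation, not the authors' implementation: the slide does not define `LDMin`, so the sketch assumes it searches earlier tweets by the named user for the text with the smallest Levenshtein distance to the retweet body. The `(username, text)` tuple layout, the regular expression, and the function names are hypothetical. Only the threshold of 30 comes from the slide.

```python
import re

# Assumed retweet convention: "RT @username" followed by an optional colon and the body.
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*(.*)$", re.DOTALL)


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute ca -> cb
            ))
        previous = current
    return previous[-1]


def build_information_flows(corpus, max_distance=30):
    """Group retweets under the closest earlier tweet by the retweeted user.

    corpus: list of (username, text) tuples in chronological order (assumed layout).
    Returns a dict mapping a parent's index to the indices of its retweets.
    """
    flows = {}
    for n, (_, text) in enumerate(corpus):
        match = RT_PATTERN.match(text)
        if not match:
            continue  # not a retweet, so it cannot be a child
        author, body = match.group(1).lower(), match.group(2)
        best_distance, parent = None, None
        # Assumed meaning of LDMin: scan earlier tweets by the named author.
        for k in range(n):
            user_k, text_k = corpus[k]
            if user_k.lower() != author:
                continue
            distance = levenshtein(body, text_k)
            if best_distance is None or distance < best_distance:
                best_distance, parent = distance, k
        if parent is not None and best_distance < max_distance:
            flows.setdefault(parent, []).append(n)
    return flows


corpus = [
    ("alice", "Great sight in Birmingham tonight"),
    ("bob", "RT @alice: Great sight in Birmingham tonight"),
    ("carol", "RT @alice Great sight in Birmingham tonite!"),
    ("dave", "Something unrelated"),
]
flows = build_information_flows(corpus)
```

Unlike the pseudocode, the sketch only creates a flow entry once a parent has at least one child. Note that a fixed threshold of 30 is loose for very short tweets, since any two short texts by the same author fall within it.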
    5. Example Information Flows
       Riots Corpus: 2.6M tweets, 700,000 accounts
       • Great sight in my #Birmingham where #Pakistani lads are protecting temples while Sikh lads protecting the mosques: 758
       • someone calling themselves #punchcroft has just posted Go on Hackney! Fuck the feds! #hackney Can we have them arrested for incitement pls?: 5
    6. Coding Frames
       • We use established methods of content analysis to understand how Twitter was being used in the context of the topics we wished to analyse.
       • We inductively code information flows to develop a ‘code frame’ that categorises topics and lets us examine relationships in the context of a given topic.
    7. Coding Frames
    8. Rumours on Twitter
    15. Sampling Issues
        • Riots corpus selected from the Twitter firehose using a set of hashtags:
          – Sample may systematically exclude some relevant data.
        • Twitter users are not representative of the population as a whole:
          – Younger, better off, better educated, urban
          – How can we use profile info to counter bias?
    16. Twitter Data APIs
        • Twitter offers a number of different APIs providing access to different sets of data.
        • They differ in terms of:
          – Timescale
          – Real-time vs. retrospective
          – Completeness
          – Functionality to specify subsets of tweets
    17. Search and REST APIs
        • Search API (unauthenticated use):
          – Search by keyword, account, etc.
          – No tweets older than about one week
          – Rate limited, but details not published
        • REST API (authenticated use, account-centric):
          – Retrieve tweets, profile data, friends and followers, etc., plus authenticated users’ direct messages
          – Searching public tweets also possible
          – Rate limited to 350 requests per hour
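To give a sense of what a limit of 350 requests per hour means in practice, the arithmetic can be sketched as below. The 350 requests per hour and the 2.6M-tweet corpus size come from the slides. The figure of 200 tweets returned per request is an assumption made purely for illustration, and both function names are hypothetical.

```python
import math


def min_interval_seconds(requests_per_hour=350):
    """Smallest even spacing between requests that stays within the hourly limit."""
    return 3600.0 / requests_per_hour


def hours_to_collect(total_items, items_per_request, requests_per_hour=350):
    """Lower bound on collection time if every request returns a full page."""
    requests_needed = math.ceil(total_items / items_per_request)
    return requests_needed / requests_per_hour


# Assumption: 200 tweets per request (illustrative page size, not from the slides).
spacing = min_interval_seconds()
hours = hours_to_collect(2_600_000, 200)
```

Under these assumptions a client would wait a little over ten seconds between requests, and a corpus the size of the riots corpus would need well over a day of uninterrupted collection, during which tweets may be deleted or retired.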
    18. REST and Search Limitations
        • Users can delete tweets.
        • Twitter will retire tweets (depending on the traffic an account generates).
        • Rate limiting makes it difficult to collect substantial corpora that are complete.
    19. Twitter Streaming API
        • The Twitter Streaming API allows streaming either a random sample or tweets selected by keyword (track), account (follow) or geo-location (but few tweets are geo-coded).
        • Track and follow can be rate-limited if too much traffic is generated, but in such a way as to produce a random sample (needs to be confirmed).
    21. Reliability of Computer-based Tools
        • If, for a given corpus sample, a computer-based tool matches the performance of human coders with a precision of y%:
          – what is the estimated precision over the whole corpus?
        • How would a representative corpus sample be specified?
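One standard way to approach the first question, offered here as an illustration rather than as the authors' method, is to treat the human-coded sample as a simple random sample from the corpus and put a confidence interval around the observed precision. The sketch below uses the Wilson score interval. The function name and the example counts (170 of 200 tool decisions confirmed by human coders) are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives about 95% confidence.

    Assumes the n checked items are a simple random sample of the corpus.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical sample: human coders agree with 170 of 200 tool decisions (85%).
low, high = wilson_interval(170, 200)
```

With these made-up counts the interval runs from roughly 0.79 to 0.89. The interval is only meaningful if the sample really is random, which is why the second question on the slide, how to specify a representative sample, matters.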
    22. 140m Tweets a Day…
    23. Why Cloud Computing?
        • St Andrews Cloud Collaboratory (StACC)
        • Information flow analysis: 16 instances, one working day.
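The slides do not say how the information flow analysis was split across the 16 instances. One plausible scheme, sketched here purely as an assumption, is to shard tweets by the account at the root of a potential flow, so that a parent tweet and its retweets always land on the same instance. The function names, the regular expression, and the choice of CRC32 as the hash are all hypothetical.

```python
import re
import zlib

# Assumed retweet convention: the text starts with "RT @username".
RT_AUTHOR = re.compile(r"^RT @(\w+)")


def flow_key(username, text):
    """Account whose tweet this one belongs with: the retweeted user, else the author."""
    match = RT_AUTHOR.match(text)
    return (match.group(1) if match else username).lower()


def shard_for(username, text, n_shards=16):
    """Deterministically map a tweet to one of n_shards workers."""
    key = flow_key(username, text).encode("utf-8")
    return zlib.crc32(key) % n_shards


parent_shard = shard_for("alice", "Great sight in Birmingham tonight")
child_shard = shard_for("bob", "RT @alice: Great sight in Birmingham tonight")
```

Because both tweets hash on the key `alice`, they are assigned to the same shard, and each instance can then group its own tweets into flows independently.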
    24. How to cope… distribution
    25. READING THE RIOTS ON TWITTER: Challenges of new social media for quantitative researchers
        Rob Procter (University of Manchester), Farida Vis (University of Leicester), Alexander Voss (University of St Andrews)
        http://www.analysingsocialmedia.org/ #readingtheriots
