Thou Shalt not Share Collections of Tweets: Should we give a TOS?


A conversation about Twitter's recent moves to enforce the parts of its API TOS that prohibit online research services from offering tweet archives for download. It was informed by a recent discussion on the AoIR mailing list and by my own experiences.

Published in: Technology

  1. Thou Shalt not Share Collections of Tweets: Should we give a TOS?
  2. “Thou Shalt not Share Collections ...”
     • Interest sparked by AoIR discussion
     • Post by Prof Stuart Shulman on May 5th
  3. The Original Post (OP)
     • [Posted: Thu May 5 05:24:10 PDT 2011]
  4. What Twitter said
  5. (no slide text)
  6. Twitter-History a.k.a. ‘Twistory’
     • “We hope Twitter will realize the value of enabling researchers, journalists and citizens better ways to search, sort and analyze clusters of this important historical information.”
  7. Twitter appears to think so too!
  8. Twitter says “desist!”
     • Prohibited other services from offering archives (for download), e.g., 140kit, TwapperKeeper, DiscoverText, ...
     • Shut down 3rd party clients (Twidroyd & UberTwitter) for:
         • Private Direct Messages longer than 140 characters
         • Trademark infringement
         • Changing the content of users' Tweets in order to make money
  9. Twitter responds ...
     • “... abide by a simple set of rules that are in the interests of our users, as well as the health and vitality of the platform as a whole.”
     • “... on an average day we turn off more than one hundred services that violate our API rules of the road.”
     • “You can download Twitter for Blackberry, Twitter for Android and other official Twitter apps here. You can also try our mobile web site or apps from other third-party developers.”
  10. Why now?
  11. Perspectives:
      • Online social messaging service (user)
      • Open ecosystem infrastructure (developer)
      • Historical social record (researchers)
      • Post “tweets” with max. 140 characters in real-time
      • Publicly accessible (cf. CB radios) with some privacy
      • Provides search (limited)
      • Uses & develops open-source software (e.g., Cassandra, Lucene, FlockDB, ...)
  12. (no slide text)
  13. Some Twitter numbers
      • Valuation: $4 billion (January 2011)
      • Investment: $360 million (200m, Dec 2010)
      • Employees: 400 (Jan 2011); 200 are engineers
      • Revenue: ad revenue estimated at $150 million for 2011
      • No. of tweets: 140-150 million per day
      • Users/Accounts: 200 million (approx.)
      • Website ranking: Top 10 to Top 20
      • Twitter search: One billion queries per day
  14. 2006 (late)-2008
  15. 2009-2010
  16. 2011
  17. A quick aside ...
  18. Twitter Research
      • Services: 140kit, TwapperKeeper, DiscoverText, The Archivist, ...
      • Several hundred publications
      • Areas: social network analysis, recommendation systems, social influence, user sentiment, business strategy, disaster prediction & alerts, education, software engineering, politics, ...
      • Using: content analysis (narrative), ethnography, SVMs, TextRank, TF-IDF, BoW, POS, ...
  19. The Twitter API
      • REST API over HTTP
      • All website features supported through the API
      • Programming libraries available
      • Rate limiting (per user & per IP):
          • Anonymous: 150 requests per hour
          • OAuth: 350 requests per hour
          • Whitelisted: e.g., 20,000 requests
      • Streaming offerings:
          • Spritzer (1%)
          • Gardenhose (10%)
          • Firehose (100%)
  20. General Terms of Service (Nov 2010)
      • Under “Your Rights”:
      • “... You grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).”
  21. TOS tips
      • “This license is you authorizing us to make your Tweets available to the rest of the world and to let others do the same. But what’s yours is yours – you own your content.”
      • “Twitter has an evolving set of rules for how API developers can interact with your content. These rules exist to enable an open ecosystem with your rights in mind.”
  22. API TOS (Feb 2011)
      • Access to Twitter Content: You will not attempt or encourage others to:
          • sell, rent, lease, sublicense, redistribute, or syndicate the Twitter API or Twitter Content to any third party for such party to develop additional products or services without prior written approval from Twitter
      • Content = “All use of the Twitter API and content, documentation, code, and related materials made available to you on or through Twitter.”
  23. Authorised resyndication = GNIP
      • First authorized reseller of Twitter data, Nov 2010
      • Offerings:
          • Halfhose (50%, $30k / mo)
          • Decahose (10%, $5k / mo)
          • Power Track ($.10 per 1,000 Tweets)
          • Link Stream ($50k / mo)
          • User Mention Stream ($20k / mo)
          • Keyword Search
  24. Potential consequences
      • Obstructs peer review of datasets
      • Prevents researchers from getting access to data (in a timely way, if at all)
      • Stifles innovation (most of it comes from the user community & 3rd party developers!)
      • Users become more cautious about using social media
      • Twitter becomes less useful (protest, reporting, ...)
      • Twitter services become hacking targets (unreliable, unstable, slow, ...)
      • Social science researchers twiddle their thumbs
  25. One solution ...
      • One solution?
  26. Talking points
      • Is there a problem here?
      • Does Twitter have any obligation to users, developers & researchers?
      • Is it worthwhile (or even ethical) to violate Twitter’s TOS to get access to researchable data?
      • Should users’ content even be available to researchers?
  27. Thanks!
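
To give a feel for what the rate limits on slide 19 mean for data collection, here is a minimal sketch that estimates how many hours a batch of REST requests would take under each tier. The workload of 10,000 requests is hypothetical, and treating the whitelisted figure as an hourly cap is an assumption (the slide only says "20,000 requests"). Nothing here calls the Twitter API; it is arithmetic on the slide's numbers.

```python
import math

# Hourly request caps as listed on slide 19.
# Assumption: the whitelisted figure is also per hour.
RATE_LIMITS_PER_HOUR = {
    "anonymous": 150,
    "oauth": 350,
    "whitelisted": 20_000,
}


def hours_to_complete(num_requests: int, tier: str) -> int:
    """Whole hours needed to issue num_requests under a tier's hourly cap."""
    limit = RATE_LIMITS_PER_HOUR[tier]
    return math.ceil(num_requests / limit)


if __name__ == "__main__":
    workload = 10_000  # hypothetical number of REST calls for a study
    for tier in RATE_LIMITS_PER_HOUR:
        print(tier, hours_to_complete(workload, tier))
    # anonymous 67
    # oauth 29
    # whitelisted 1
```

Under these assumptions, the same job that takes roughly three days anonymously fits in a single hour for a whitelisted account, which is part of why archive services mattered to researchers.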
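
The Gnip prices on slide 23 can be put side by side with the tweet volume on slide 13. The sketch below is a rough comparison only: it uses the lower bound of 140 million tweets per day, assumes a 30-day month, and ignores any differences in what the two products actually deliver. All function and constant names are made up for illustration.

```python
# Prices from slide 23; daily tweet volume from slide 13 (lower bound).
POWER_TRACK_CENTS_PER_1000_TWEETS = 10  # "$.10 per 1,000 Tweets"
DECAHOSE_USD_PER_MONTH = 5_000          # "Decahose (10%, $5k / mo)"
DECAHOSE_PERCENT = 10
TWEETS_PER_DAY = 140_000_000
DAYS_PER_MONTH = 30                     # assumption for a round estimate


def power_track_cost_usd(num_tweets: int) -> float:
    """Cost in USD of num_tweets at the per-1,000 Power Track price."""
    cents = num_tweets * POWER_TRACK_CENTS_PER_1000_TWEETS // 1000
    return cents / 100


def decahose_tweets_per_month() -> int:
    """Approximate tweets delivered per month by a 10% sample."""
    return TWEETS_PER_DAY * DECAHOSE_PERCENT // 100 * DAYS_PER_MONTH


def break_even_tweets() -> int:
    """Tweet count at which Power Track costs as much as one Decahose month."""
    return DECAHOSE_USD_PER_MONTH * 100 * 1000 // POWER_TRACK_CENTS_PER_1000_TWEETS


if __name__ == "__main__":
    monthly = decahose_tweets_per_month()
    print(monthly)                        # 420000000
    print(power_track_cost_usd(monthly))  # 42000.0
    print(break_even_tweets())            # 50000000
```

On these numbers, per-tweet pricing is cheaper than the flat Decahose fee below about 50 million tweets per month, which is still a large bill for a typical research budget.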