Twitter6

Published in: Technology
  1. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet #2 (charsyam@naver.com)
  2. Data Mining
  3. Discover New Knowledge
  4. Discover New Knowledge from Existing Information
  5. What do #TeaParty and #JustinBieber have in common?
  6. Tools: PyMongo, MongoDB
      apt-get install python-dev
      pip install pymongo
  7. Get Tweets
      # Legacy PyMongo API; newer releases use pymongo.MongoClient instead
      from pymongo.connection import Connection
      import tweepy

      connection = Connection("localhost")
      db = connection.foo

      api = tweepy.API()
      # Fetch up to 100 results for the hashtag and store each one
      tweets = api.search("#JustinBieber", rpp=100)
      for tweet in tweets:
          db.foo.save(tweet.__getstate__())
  8. Insert into MongoDB
      from pymongo.connection import Connection
      import tweepy

      connection = Connection("localhost")
      db = connection.foo

      api = tweepy.API()
      # Walk result pages 1 to 15, 100 tweets per page
      for num in range(1, 16):
          tweets = api.search("#JustinBieber", rpp=100, page=num)
          for tweet in tweets:
              db.foo.save(tweet.__getstate__())
  9. Count Frequency in mongo: MAP
      // Emit one {count: 1} record per space-separated word of the tweet text
      map = function () {
          var words = this.text.split(" ");
          for (var i in words) {
              emit({ key: words[i] }, { count: 1 });
          }
      };
  10. Count Frequency in mongo: REDUCE
      // Sum the counts emitted for the same key
      reduce = function (key, values) {
          var count = 0;
          values.forEach(function (v) { count += v.count; });
          return { count: count };
      };
  11. Count Frequency in mongo: EXECUTE
      // Results are written to the "mystring" collection
      res = db.foo.mapReduce(map, reduce, { out: "mystring" });
  12. Count Frequency in mongo: RESULT
      { "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } }
      { "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } }
      { "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } }
      { "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } }
      { "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } }
      { "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
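The map and reduce steps above amount to a word-frequency tally. For checking results on a small sample without MongoDB, roughly the same count can be sketched in plain Python. `count_words`, its `fold_case` flag, and the sample tweets are hypothetical, introduced only for illustration. The slides' version is case-sensitive, which is why `#1000ADay` and `#1000aday` show up as separate keys; folding case is an added assumption, not something the slides do.

```python
from collections import Counter

def count_words(texts, fold_case=False):
    """Tally space-separated tokens across tweet texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        for word in text.split(" "):
            if not word:
                # Unlike the mongo map function, skip the empty tokens
                # that repeated spaces produce.
                continue
            counts[word.lower() if fold_case else word] += 1
    return counts

# Made-up sample texts, not data from the slides
sample = ["I love #Bieber", "#bieber fever #Bieber"]
print(count_words(sample))
print(count_words(sample, fold_case=True))
```

With `fold_case=True`, hashtag variants that differ only in capitalization collapse into a single key, which may be closer to what an entity count is meant to capture.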
  13. Get From MongoDB
      from pymongo.connection import Connection

      connection = Connection("localhost")
      db = connection.foo

      # Print every word-count document produced by the map-reduce job
      cursor = db.mystring.find()
      for d in cursor:
          print(d)
  14. What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
  15. intersection
      import sys

      def first_tokens(path):
          # Collect the first whitespace-separated token of each non-empty line
          with open(path) as f:
              return set(line.split()[0] for line in f if line.split())

      if __name__ == "__main__":
          s1 = first_tokens(sys.argv[1])
          s2 = first_tokens(sys.argv[2])
          s3 = s1.intersection(s2)
          print(len(s1))
          print(len(s2))
          print(len(s3))
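The co-occurrence question on slide 14 needs per-entity counts rather than set sizes alone. A minimal sketch follows, assuming that "entities" means hashtags and @mentions pulled out of the tweet text with a regular expression. The slides do not say how entities were extracted, so `ENTITY_RE` and `cooccurring_entities` are hypothetical names, and the sample tweets are made up.

```python
import re
from collections import Counter

# Assumption: an entity is a hashtag or an @mention
ENTITY_RE = re.compile(r"[#@]\w+")

def cooccurring_entities(texts, query_tag):
    """Count entities that appear in the same tweet as query_tag."""
    query = query_tag.lower()
    counts = Counter()
    for text in texts:
        # Lowercase so that #Music and #music count as one entity
        entities = {e.lower() for e in ENTITY_RE.findall(text)}
        if query in entities:
            counts.update(entities - {query})
    return counts

# Made-up sample texts, not data from the slides
tweets = [
    "#JustinBieber rocks @justinbieber #music",
    "#justinbieber #Music again",
    "no tags here",
]
print(cooccurring_entities(tweets, "#JustinBieber").most_common(3))
```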
  16. On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
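One way to approach this question is to average the number of hashtags per tweet for each result set and compare the two figures. The sketch below assumes a hashtag is `#` followed by word characters. `average_hashtags` is a hypothetical helper, and no figures from the actual data sets are implied.

```python
import re

HASHTAG_RE = re.compile(r"#\w+")

def average_hashtags(texts):
    """Mean number of hashtags per tweet; 0.0 for an empty collection."""
    texts = list(texts)
    if not texts:
        return 0.0
    total = sum(len(HASHTAG_RE.findall(t)) for t in texts)
    return total / len(texts)

# Made-up sample: 2, 0 and 1 hashtags respectively
print(average_hashtags(["#a #b", "none", "#c"]))
```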
  17. Which Get Retweeted More Often: #JustinBieber or #TeaParty?
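A rough proxy for this question is the share of tweets in each sample that look like retweets. The sketch below assumes the textual conventions `RT @user` and `via @user` mark a retweet. That heuristic is an assumption rather than something the slides specify, and `retweet_fraction` is a hypothetical helper. It measures how many sampled tweets are retweets, not how many times each original tweet was retweeted.

```python
import re

# Assumption: "RT @user" or "via @user" marks a retweet
RETWEET_RE = re.compile(r"\b(?:RT|via)\s+@\w+")

def retweet_fraction(texts):
    """Fraction of tweets whose text matches the retweet heuristic."""
    texts = list(texts)
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if RETWEET_RE.search(t))
    return hits / len(texts)

# Made-up sample: the first and third texts match the heuristic
print(retweet_fraction(["RT @a: hi", "hello", "nice (via @b)", "smart @c"]))
```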
  18. How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweets?
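The raw intersection size from slide 15 depends on how large the two entity sets are. A normalized figure such as the Jaccard index (intersection size divided by union size) makes the overlap easier to compare. The slides do not use this measure; it is offered here as one possible way to quantify the overlap, with `jaccard` as a hypothetical helper and made-up inputs.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets: define the overlap as zero
    return len(a & b) / len(union)

# Made-up entity sets: two shared items out of four distinct ones
print(jaccard({"#a", "#b", "#c"}, {"#b", "#c", "#d"}))
```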
  19. Thank You!
