Unleashing Twitter data for fun and insight

Matthew Russell, VP Engineering at Digital Reasoning, discusses techniques and results of mining Twitter data for fun and insight.

Transcript

  • 1. Unleashing Twitter Data for Fun and Insight
        Matthew A. Russell
        http://linkedin.com/in/ptwobrussell
        @ptwobrussell
  • 2. Happy Groundhog Day!
  • 3. Mining the Social Web, Chapters 1-5
        Introduction: Trends, Tweets, and Twitterers
        Microformats: Semantic Markup and Common Sense Collide
        Mailboxes: Oldies but Goodies
        Friends, Followers, and Setwise Operations
        Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet
  • 4. Mining the Social Web, Chapters 6-10
        LinkedIn: Clustering Your Professional Network For Fun (and Profit?)
        Google Buzz: TF-IDF, Cosine Similarity, and Collocations
        Blogs et al: Natural Language Processing (and Beyond)
        Facebook: The All-In-One Wonder
        The Semantic Web: A Cocktail Discussion
  • 5. Overview
        • Trends, Tweets, and Retweet Visualizations
        • Friends, Followers, and Setwise Operations
        • The Tweet, the Whole Tweet, and Nothing but the Tweet
  • 6. Insight Matters
        • What is @user's potential influence?
        • What are @user's passions right now?
        • Who are @user's most trusted friends?
  • 7. Part 1: Tweets, Trends, and Retweet Visualizations
  • 8. A point to ponder:Twitter : Data :: JavaScript : Programming Languages (???)
  • 9. Getting Ready To Code
  • 10. Python Installation• Mac users already have it• Linux users probably have it• Windows users should grab ActivePython
  • 11. easy_install
        • Installs packages from PyPI
        • Get it:
          • http://pypi.python.org/pypi/setuptools
          • Ships with ActivePython
        • It really is easy:
          easy_install twitter
          easy_install nltk
          easy_install networkx
  • 12. Git It?
        • http://github.com/ptwobrussell/Mining-the-Social-Web
        • git clone git://github.com/ptwobrussell/Mining-the-Social-Web.git
          • introduction__*.py
          • friends_followers__*.py
          • the_tweet__*.py
  • 13. Getting Data
  • 14. Twitter Data Sources• Twitter API Resources• GNIP• Infochimps• Library of Congress
  • 15. Trending Topics
        >>> import twitter # Remember to "easy_install twitter"
        >>> twitter_search = twitter.Twitter(domain="search.twitter.com")
        >>> trends = twitter_search.trends()
        >>> [ trend['name'] for trend in trends['trends'] ]
        [u'#ZodiacFacts', u'#nowplaying', u'#ItsOverWhen', u'#Christoferdrew', u'Justin Bieber', u'#WhatwouldItBeLike', u'#Sagittarius', u'SNL', u'#SurveySays', u'#iDoit2']
  • 16. Search Results
        >>> search_results = []
        >>> for page in range(1,6):
        ...     search_results.append(twitter_search.search(q="SNL", rpp=100, page=page))
  • 17. Search Results (continued) >>> import json >>> print json.dumps(search_results, sort_keys=True, indent=1) [ { "completed_in": 0.088122000000000006, "max_id": 11966285265, "next_page": "?page=2&max_id=11966285265&rpp=100&q=SNL", "page": 1, "query": "SNL", "refresh_url": "?since_id=11966285265&q=SNL", ...more...
  • 18. Search Results (continued) "results": [ { "created_at": "Sun, 11 Apr 2010 01:34:52 +0000", "from_user": "bieber_luv2", "from_user_id": 106998169, "geo": null, "id": 11966285265, "iso_language_code": "en", "metadata": { "result_type": "recent" }, ...more...
  • 19. Search Results (continued) "profile_image_url": "http://a1.twimg.com/profile_images/80...", "source": "<a href="http://twitter.com/&quo...", "text": "im nt gonna go to sleep happy unless i see ...", "to_user_id": null } ... output truncated - 99 more tweets ... ], "results_per_page": 100, "since_id": 0 }, ... output truncated - 4 more pages ...]
  • 20. Lexical Diversity• Ratio of unique terms to total terms • A measure of "stickiness"? • A measure of "group think"? • A crude indicator of retweets to originally authored tweets?
  • 21. Distilling Tweet Text
        >>> # search_results is already defined
        >>> tweets = [ r['text']
        ...     for result in search_results
        ...         for r in result['results'] ]
        >>> words = []
        >>> for t in tweets:
        ...     words += [ w for w in t.split() ]
        ...
  • 22. Analyzing Data
  • 23. Lexical Diversity
        >>> len(words)
        7238
        >>> # unique words
        >>> len(set(words))
        1636
        >>> # lexical diversity
        >>> 1.0*len(set(words))/len(words)
        0.22602928985907708
        >>> # average number of words per tweet
        >>> 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets)
        14.476000000000001
  • 24. Size Frequency Matters• Counting: always the first step• Simple but effective• NLTK saves us a little trouble
  • 25. Frequency Analysis
        >>> import nltk
        >>> freq_dist = nltk.FreqDist(words)
        >>> freq_dist.keys()[:50] # 50 most frequent tokens
        [u'snl', u'on', u'rt', u'is', u'to', u'i', u'watch', u'justin', u'@justinbieber', u'be', u'the', u'tonight', u'gonna', u'at', u'in', u'bieber', u'and', u'you', u'watching', u'tina', u'for', u'a', u'wait', u'fey', u'of', u'@justinbieber:', u'if', u'with', u'so', u"can't", u'who', u'great', u'it', u'going', u'im', u':)', u'snl...', u'2nite...', u'are', u'cant', u'dress', u'rehearsal', u'see', u'that', u'what', u'but', u'tonight!', u':d', u'2', u'will']
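[Editor's note: the slides use NLTK's FreqDist under Python 2; the same counting step can be sketched with only the standard library in Python 3. The sample word list below is made up for illustration.]

```python
from collections import Counter

# Hypothetical tokens standing in for the harvested "words" list
words = "rt snl justin bieber is on snl tonight rt watch snl tonight".split()

# Counter plays the role of nltk.FreqDist for plain frequency counting
freq_dist = Counter(words)

# The most frequent tokens, analogous to freq_dist.keys()[:50] in NLTK
top_tokens = freq_dist.most_common(3)
print(top_tokens)
```

`most_common` returns (token, count) pairs sorted by descending count, which is all the frequency analysis on this slide needs.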
  • 26. Frequency Visualization
  • 27. Tweet and RT were sitting on a fence. Tweet fell off. Who was left?
  • 28. RTs: past, present, & future
        • Retweet: Tweeting a tweet that's already been tweeted
        • RT or via followed by @mention
        • Example: RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?
        • Relatively new APIs were rolled out last year for retweeting sans conventions
  • 29. Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski
  • 30. Parsing Retweets
        >>> example_tweets = ["Visualize Twitter search results w/ this simple script http://bit.ly/cBu0l4 - Gist instructions http://bit.ly/9SZ2kb (via @SocialWebMining @ptwobrussell)"]
        >>> import re
        >>> rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)",
        ...     re.IGNORECASE)
        >>> rt_origins = []
        >>> for t in example_tweets:
        ...     try:
        ...         rt_origins += [mention.strip()
        ...             for mention in rt_patterns.findall(t)[0][1].split()]
        ...     except IndexError, e:
        ...         pass
        >>> [rto.strip("@") for rto in rt_origins]
  • 31. Visualizing Data
  • 32. Graph Construction
        >>> import networkx as nx
        >>> g = nx.DiGraph()
        >>> g.add_edge("@SocialWebMining", "@ptwobrussell",
        ...     {"tweet_id" : 4815162342})
  • 33. Writing out DOT
        OUT_FILE = "out_file.dot"
        try:
            nx.drawing.write_dot(g, OUT_FILE)
        except ImportError, e:
            # fallback: generate the DOT language output by hand
            dot = ['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id'])
                   for n1, n2 in g.edges()]
            f = open(OUT_FILE, 'w')
            f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
            f.close()
  • 34. Example DOT Language
        strict digraph {
            "@ericastolte" -> "bonitasworld" [tweet_id=11965974697];
            "@mpcoelho" -> "Lil_Amaral" [tweet_id=11965954427];
            "@BieberBelle123" -> "BELIEBE4EVER" [tweet_id=11966261062];
            "@BieberBelle123" -> "sabrina9451" [tweet_id=11966197327];
        }
  • 35. DOT to Image
        • Download Graphviz: http://www.graphviz.org/
        • $ dot -Tpng out_file.dot > graph.png
        • Windows users might prefer GVEdit
  • 36. Graphviz: Extreme Closeup
  • 37. But you want more sexy?
  • 38. Protovis: Extreme Closeup
  • 39. It Doesn't Have To Be a Graph: Graph Connectedness
  • 40. Part 2: Friends, Followers, and Setwise Operations
  • 41. Insight Matters• What is my potential influence?• Who are the most popular people in my network?• Who are my mutual friends?• What common friends/followers do I have with @user?• Who is not following me back?• What can I learn from analyzing my friendship cliques?
  • 42. Getting Data
  • 43. OAuth (1.0a)
        import twitter
        from twitter.oauth_dance import oauth_dance
        # Get these from http://dev.twitter.com/apps/new
        consumer_key, consumer_secret = 'key', 'secret'
        (oauth_token, oauth_token_secret) = oauth_dance('MiningTheSocialWeb',
            consumer_key, consumer_secret)
        auth = twitter.oauth.OAuth(oauth_token, oauth_token_secret,
            consumer_key, consumer_secret)
        t = twitter.Twitter(domain='api.twitter.com', auth=auth)
  • 44. Getting Friendship Data
        friend_ids = t.friends.ids(screen_name='timoreilly', cursor=-1)
        follower_ids = t.followers.ids(screen_name='timoreilly', cursor=-1)
        # store the data somewhere...
  • 45. Perspective: Fetching all of Lady Gaga's ~7M followers would take ~4 hours
  • 46. But there's always a catch...
  • 47. Rate Limits• 350 requests/hr for authenticated requests• 150 requests/hr for anonymous requests• Coping mechanisms: • Caching & Archiving Data • Streaming API • HTTP 400 codes• See http://dev.twitter.com/pages/rate-limiting
  • 48. The Beloved Fail Whale • Twitter is sometimes "overcapacity" • HTTP 503 Error • Handle it just as any other HTTP error • RESTfulness has its advantages
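[Editor's note: the deck's later snippets lean on a makeTwitterRequest helper that copes with rate limits and 503s. A minimal, standalone sketch of that kind of wrapper (the function name and the flaky stand-in call below are hypothetical, not part of the twitter package):]

```python
import time

def make_request_with_backoff(request, max_retries=4, wait_period=2):
    """Retry a callable that may raise IOError (standing in for an HTTP
    503 from an overcapacity Twitter), doubling the wait between attempts."""
    for attempt in range(max_retries):
        try:
            return request()
        except IOError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(wait_period)
            wait_period *= 2  # exponential backoff

# Demo with a flaky stand-in for an API call that fails twice, then succeeds
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise IOError("503 Service Unavailable")
    return {'ids': [1, 2, 3]}

result = make_request_with_backoff(flaky, wait_period=0)
print(result)
```

A real wrapper would inspect the HTTP status code (retry on 503, sleep out the window on a rate-limit response, bail on a 401), but the retry-with-backoff skeleton is the same.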
  • 49. Abstraction Helps
        friend_ids = []
        wait_period = 2 # secs
        cursor = -1
        while cursor != 0:
            response = makeTwitterRequest(t, # twitter.Twitter instance
                t.friends.ids,
                screen_name=screen_name,
                cursor=cursor)
            friend_ids += response['ids']
            cursor = response['next_cursor']
            # break out of loop early if you don't need all ids
  • 50. Abstracting Abstractions
        screen_name = 'timoreilly'
        # This is what you ultimately want...
        friend_ids = getFriends(screen_name)
        follower_ids = getFollowers(screen_name)
  • 51. Storing Data
  • 52. Flat Files?
        ./
          screen_name1/
            friend_ids.json
            follower_ids.json
            user_info.json
          screen_name2/
            ...
        ...
  • 53. Pickles?
        import cPickle
        o = {
            'friend_ids' : friend_ids,
            'follower_ids' : follower_ids,
            'user_info' : user_info
        }
        f = open('screen_name1.pickle', 'wb')
        cPickle.dump(o, f)
        f.close()
  • 54. A relational database?
        import sqlite3 as sqlite
        conn = sqlite.connect('data.db')
        c = conn.cursor()
        c.execute('create table friends...')
        c.execute('insert into friends...')
        # Lots of fun...sigh...
  • 55. Redis (A Data Structures Server)
        import redis
        r = redis.Redis()
        [ r.sadd("timoreilly$friend_ids", i) for i in friend_ids ]
        r.smembers("timoreilly$friend_ids") # returns a set
        Project page: http://redis.io
        Windows binary: http://code.google.com/p/servicestack/wiki/RedisWindowsDownload
  • 56. Redis Set Operations
        • Key/value store...on typed values!
        • Common set operations
          • smembers, scard
          • sinter, sdiff, sunion
          • sadd, srem, etc.
        • See http://code.google.com/p/redis/wiki/CommandReference
        • Don't forget to $ easy_install redis
  • 57. Analyzing Data
  • 58. Setwise Operations• Union• Intersection• Difference• Complement
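[Editor's note: before reaching for Redis, the same setwise operations on friend/follower ids can be shown with plain Python sets. The id values below are made up:]

```python
# Hypothetical harvested ids standing in for real friend/follower data
friend_ids = {1, 2, 3, 4}
follower_ids = {3, 4, 5, 6}

mutual_friends = friend_ids & follower_ids       # intersection, like sinter
not_following_back = friend_ids - follower_ids   # difference, like sdiff
not_followed_back = follower_ids - friend_ids    # difference the other way
everyone = friend_ids | follower_ids             # union, like sunion

print(mutual_friends, not_following_back, not_followed_back, everyone)
```

Redis's sinterstore/sdiffstore/sunionstore used on the next slides compute exactly these operations server-side, which matters once the sets hold millions of follower ids.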
  • 59. Venn Diagrams: Friends - Followers, Followers - Friends, Friends ∩ Followers, Friends ∪ Followers
  • 60. Count Your Blessings
        # A utility function
        def getRedisIdByScreenName(screen_name, key_name):
            return 'screen_name$' + screen_name + '$' + key_name
        # Number of friends
        n_friends = r.scard(getRedisIdByScreenName(screen_name, 'friend_ids'))
        # Number of followers
        n_followers = r.scard(getRedisIdByScreenName(screen_name, 'follower_ids'))
  • 61. Asymmetric Relationships
        # Friends who aren't following back
        friends_diff_followers = r.sdiffstore('temp',
            [ getRedisIdByScreenName(screen_name, 'friend_ids'),
              getRedisIdByScreenName(screen_name, 'follower_ids') ])
        # ... compute interesting things ...
        r.delete('temp')
  • 62. Asymmetric Relationships
        # Followers who aren't friended
        followers_diff_friends = r.sdiffstore('temp',
            [ getRedisIdByScreenName(screen_name, 'follower_ids'),
              getRedisIdByScreenName(screen_name, 'friend_ids') ])
        # ... compute interesting things ...
        r.delete('temp')
  • 63. Symmetric Relationships
        mutual_friends = r.sinterstore('temp',
            [ getRedisIdByScreenName(screen_name, 'follower_ids'),
              getRedisIdByScreenName(screen_name, 'friend_ids') ])
        # ... compute interesting things ...
        r.delete('temp')
  • 64. Sample Output
        timoreilly is following 663
        timoreilly is being followed by 1,423,704
        131 of 663 are not following timoreilly back
        1,423,172 of 1,423,704 are not being followed back by timoreilly
        timoreilly has 532 mutual friends
  • 65. Who Isn't Following Back?
        user_ids = [ ... ] # Resolve these to user info objects
        while len(user_ids) > 0:
            user_ids_str = ','.join([ str(i) for i in user_ids[:100] ])
            user_ids = user_ids[100:]
            response = t.users.lookup(user_id=user_ids_str)
            if type(response) is dict:
                response = [response]
            r.mset(dict([(getRedisIdByUserId(resp['id'], 'info.json'),
                json.dumps(resp)) for resp in response]))
            r.mset(dict([(getRedisIdByScreenName(resp['screen_name'], 'info.json'),
                json.dumps(resp)) for resp in response]))
  • 66. Friends in Common
        # Assume we've harvested friends/followers and it's in Redis...
        screen_names = ['timoreilly', 'mikeloukides']
        r.sinterstore('temp$friends_in_common',
            [getRedisIdByScreenName(screen_name, 'friend_ids')
             for screen_name in screen_names])
        r.sinterstore('temp$followers_in_common',
            [getRedisIdByScreenName(screen_name, 'follower_ids')
             for screen_name in screen_names])
        # Manipulate the sets
  • 67. Potential Influence
        • My followers?
        • My followers' followers?
        • My followers' followers' followers?
        for n in range(1, 7): # 6 degrees?
            print "My " + "followers "*n + "followers?"
  • 68. Saving a Thousand Words... (tree diagram: nodes 1-15, branching factor = 2, depth = 3)
  • 69. Same Data, Different Layout (radial layout of the same 15-node tree)
  • 70. Space Complexity (total nodes by branching factor and depth)
                              Depth:  1     2     3     4     5
        Branching Factor 2:          3     7    15    31    63
        Branching Factor 3:          4    13    40   121   364
        Branching Factor 4:          5    21    85   341  1365
        Branching Factor 5:          6    31   156   781  3906
        Branching Factor 6:          7    43   259  1555  9331
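[Editor's note: the table above is a geometric series: a tree with branching factor b and depth d holds 1 + b + b² + ... + b^d = (b^(d+1) - 1)/(b - 1) nodes. A one-liner reproduces it:]

```python
def total_nodes(branching_factor, depth):
    # Closed form of sum(b**i for i in range(depth + 1))
    return (branching_factor ** (depth + 1) - 1) // (branching_factor - 1)

# Reproduce a row of the space-complexity table (branching factor 2)
print([total_nodes(2, d) for d in range(1, 6)])
```

This is why the "my followers' followers' followers" question blows up so quickly: the node count is exponential in depth.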
  • 71. Breadth-First Traversal
        Create an empty graph
        Create an empty queue to keep track of unprocessed nodes
        Add the starting point to the graph as the "root node"
        Add the root node to a queue for processing
        Repeat until some maximum depth is reached or the queue is empty:
            Remove a node from the queue
            For each of the node's neighbors:
                If the neighbor hasn't already been processed:
                    Add it to the graph
                    Add it to the queue
                    Add an edge to the graph connecting the node & its neighbor
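[Editor's note: the pseudocode above can be sketched directly in Python 3. The toy follower graph and the get_neighbors callable are stand-ins for a getFollowers-style API call:]

```python
from collections import deque

def breadth_first_graph(root, get_neighbors, max_depth):
    """Build an edge list by BFS from root, following the steps above."""
    edges = []
    seen = {root}                 # nodes already processed
    queue = deque([(root, 0)])    # (node, depth) pairs awaiting processing
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:    # stop expanding past the maximum depth
            continue
        for neighbor in get_neighbors(node):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
                edges.append((node, neighbor))
    return edges

# Toy follower adjacency (hypothetical data)
followers = {'a': ['b', 'c'], 'b': ['d'], 'c': ['a'], 'd': []}
edges = breadth_first_graph('a', lambda n: followers.get(n, []), max_depth=2)
print(edges)
```

The "Breadth-First Harvest" snippet on the next slide is the same idea specialized to the Twitter API, with whole levels fetched per iteration to batch requests.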
  • 72. Breadth-First Harvest
        next_queue = [ 'timoreilly' ] # seed node
        d = 1
        while d < depth:
            d += 1
            queue, next_queue = next_queue, []
            for screen_name in queue:
                follower_ids = getFollowers(screen_name=screen_name)
                next_queue += follower_ids
            getUserInfo(user_ids=next_queue)
  • 73. The Most Popular Followers
        freqs = {}
        for follower in followers:
            cnt = follower['followers_count']
            if not freqs.has_key(cnt):
                freqs[cnt] = []
            freqs[cnt].append({'screen_name': follower['screen_name'],
                               'user_id': follower['id']})
        popular_followers = sorted(freqs, reverse=True)[:100]
  • 74. Average # of Followers
        all_freqs = [k for k in freqs for user in freqs[k]]
        avg = sum(all_freqs) / len(all_freqs)
  • 75. @timoreilly's Popular Followers
        The top 10 followers from the sample:
        aplusk 4,993,072
        BarackObama 4,114,901
        mashable 2,014,615
        MarthaStewart 1,932,321
        Schwarzenegger 1,705,177
        zappos 1,689,289
        Veronica 1,612,827
        jack 1,592,004
        stephenfry 1,531,813
        davos 1,522,621
  • 76. Futzing the Numbers
        • The average number of timoreilly's followers' followers: 445
        • Discarding the top 10 lowers the average to around 300
        • Discarding any follower with fewer than 10 followers of their own increases the average to over 1,000!
        • Doing both brings the average to around 800
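[Editor's note: the effect described above is easy to reproduce on toy data. The follower counts below are made up; the point is only that a few huge accounts pull the mean up while a long tail of tiny accounts pulls it down:]

```python
# Hypothetical follower counts for a sample of followers
follower_counts = [5000000, 2, 3, 150, 600, 1200, 8, 300, 90, 47]

# Plain mean: dominated by the one celebrity account
avg = sum(follower_counts) / len(follower_counts)

# Discard the single largest account: the mean collapses
no_top = sorted(follower_counts)[:-1]
avg_no_top = sum(no_top) / len(no_top)

# Discard accounts with fewer than 10 followers of their own: the mean rises
at_least_10 = [c for c in follower_counts if c >= 10]
avg_at_least_10 = sum(at_least_10) / len(at_least_10)

print(avg, avg_no_top, avg_at_least_10)
```

The takeaway matches the slide: a raw average over a heavy-tailed distribution like follower counts is rarely the number you want to report.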
  • 77. The Right Tool For the Job: NetworkX for Networks
  • 78. Friendship Graphs
        for i in ids: # ids is timoreilly's id along with friend ids
            info = json.loads(r.get(getRedisIdByUserId(i, 'info.json')))
            screen_name = info['screen_name']
            friend_ids = list(r.smembers(getRedisIdByScreenName(screen_name, 'friend_ids')))
            for friend_id in [fid for fid in friend_ids if fid in ids]:
                friend_info = json.loads(r.get(getRedisIdByUserId(friend_id, 'info.json')))
                g.add_edge(screen_name, friend_info['screen_name'])
        nx.write_gpickle(g, 'timoreilly.gpickle') # see also nx.read_gpickle
  • 79. Clique Analysis
        • Cliques
        • Maximum Cliques
        • Maximal Cliques
        http://en.wikipedia.org/wiki/Clique_problem
  • 80. Calculating Cliques
        cliques = [c for c in nx.find_cliques(g)]
        num_cliques = len(cliques)
        clique_sizes = [len(c) for c in cliques]
        max_clique_size = max(clique_sizes)
        avg_clique_size = sum(clique_sizes) / num_cliques
        max_cliques = [c for c in cliques if len(c) == max_clique_size]
        num_max_cliques = len(max_cliques)
        people_in_every_max_clique = list(reduce(
            lambda x, y: x.intersection(y),
            [set(c) for c in max_cliques]))
  • 81. Cliques for @timoreilly
        Num cliques: 762573
        Avg clique size: 14
        Max clique size: 26
        Num max cliques: 6
        Num people in every max clique: 20
  • 82. Visualizing Data
  • 83. Graphs, etc. • Your first instinct is naturally G = (V, E)?
  • 84. Dorling Cartogram • A location-aware bubble chart (ish) • At least 3-dimensional • Position, color, size • Look at friends/followers by state
  • 85. Sunburst of Friends • A very compact visualization • Slice and dice friends/followers by gender, country, locale, etc.
  • 86. Part 3: The Tweet, the Whole Tweet, and Nothing but the Tweet
  • 87. Insight Matters
        • Which entities frequently appear in @user's tweets?
        • How often does @user talk about specific friends?
        • Who does @user retweet most frequently?
        • How frequently is @user retweeted (by anyone)?
        • How many #hashtags are usually in @user's tweets?
  • 88. Pen : Sword :: Tweet : Machine Gun (?!?)
  • 89. Getting Data
  • 90. Let me count the APIs...• Timelines• Tweets• Favorites• Direct Messages• Streams
  • 91. Anatomy of a Tweet (1/2)
        {
            "created_at" : "Thu Jun 24 14:21:11 +0000 2010",
            "id" : 16932571217,
            "text" : "Great idea from @crowdflower: Crowdsourcing ... #opengov",
            "user" : {
                "description" : "Founder and CEO, O'Reilly Media. Watching the alpha geeks...",
                "id" : 2384071,
                "location" : "Sebastopol, CA",
                "name" : "Tim O'Reilly",
                "screen_name" : "timoreilly",
                "url" : "http://radar.oreilly.com"
            },
            ...
  • 92. Anatomy of a Tweet (2/2)
            ...
            "entities" : {
                "hashtags" : [
                    {"indices" : [ 97, 103 ], "text" : "gov20"},
                    {"indices" : [ 104, 112 ], "text" : "opengov"} ],
                "urls" : [
                    {"expanded_url" : null, "indices" : [ 76, 96 ],
                     "url" : "http://bit.ly/9o4uoG"} ],
                "user_mentions" : [
                    {"id" : 28165790, "indices" : [ 16, 28 ],
                     "name" : "crowdFlower", "screen_name" : "crowdFlower"}]
            }
        }
  • 93. Entities & Annotations• Entities • Opt-in now but will "soon" be standard • $ easy_install twitter_text• Annotations • User-defined metadata • See http://dev.twitter.com/pages/annotations_overview
  • 94. Manual Entity Extraction
        import twitter_text
        extractor = twitter_text.Extractor(tweet['text'])
        mentions = extractor.extract_mentioned_screen_names_with_indices()
        hashtags = extractor.extract_hashtags_with_indices()
        urls = extractor.extract_urls_with_indices()
        # Splice info into a tweet object
  • 95. Storing Data
  • 96. Storing Tweets• Flat files? (Really, who does that?)• A relational database?• Redis?• CouchDB (Relax...?)
  • 97. CouchDB: Relax• Document-oriented key/value• Map/Reduce• RESTful API• Erlang
  • 98. As easy as sitting on the couch• Get it - http://www.couchone.com/get• Install it• Relax - http://localhost:5984/_utils/• Also - $ easy_install couchdb
  • 99. Storing Timeline Data
        import couchdb
        import twitter
        TIMELINE_NAME = "user" # or "home" or "public"
        t = twitter.Twitter(domain='api.twitter.com', api_version='1')
        server = couchdb.Server('http://localhost:5984')
        db = server.create(DB)
        page_num = 1
        while page_num <= MAX_PAGES:
            api_call = getattr(t.statuses, TIMELINE_NAME + '_timeline')
            tweets = makeTwitterRequest(t, api_call, page=page_num)
            db.update(tweets, all_or_nothing=True)
            print 'Fetched %i tweets' % len(tweets)
            page_num += 1
  • 100. Analyzing & Visualizing Data
  • 101. Approach:Map/Reduce on Tweets
  • 102. Map/Reduce Paradigm• Mapper: yields key/value pairs• Reducer: operates on keyed mapper output• Example: Computing the sum of squares • Mapper Input: (k, [2,4,6]) • Mapper Output: (k, [4,16,36]) • Reducer Input: [(k, 4,16), (k, 36)] • Reducer Output: 56
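[Editor's note: the sum-of-squares example above can be sketched in a few lines of Python 3 with an explicit map, shuffle/sort, and reduce phase; CouchDB runs the same pattern with JavaScript view functions:]

```python
from itertools import groupby
from operator import itemgetter

def mapper(key, values):
    # Map phase: emit a (key, squared value) pair per input value
    return [(key, v * v) for v in values]

def reducer(key, values):
    # Reduce phase: sum all squared values sharing a key
    return (key, sum(values))

# The slide's example: sum of squares of [2, 4, 6]
mapped = mapper('k', [2, 4, 6])        # [('k', 4), ('k', 16), ('k', 36)]
mapped.sort(key=itemgetter(0))         # shuffle/sort: group pairs by key
results = [reducer(k, [v for _, v in group])
           for k, group in groupby(mapped, key=itemgetter(0))]
print(results)
```

With one key the reduce is trivial, but the same skeleton handles many keys at once, which is exactly how the entity-counting views over tweets on the following slides work.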
  • 103. Which entities frequently appear in @user's tweets?
  • 104. @timoreilly's Tweet Entities
  • 105. How often does @timoreilly mention specific friends?
  • 106. Filtering Tweet Entities
        • Let's find out how often someone talks about specific friends
        • We have friend info on hand
        • We've extracted @mentions from the tweets
        • Let's count friend vs non-friend mentions
  • 107. @timoreilly's friend mentions
        Number of @user entities in tweets: 20
        Number of @user entities in tweets who are not friends: 2
            n2vip, timoreilly
        Number of @user entities in tweets who are friends: 18
            ahier, andrewsavikas, pkedrosky, gnat, CodeforAmerica, slashdot, nytimes, OReillyMedia, brady, dalepd, carlmalamud, mikeloukides, pahlkadot, monkchips, make, fredwilson, jamesoreilly, digiphile
  • 108. Who does @timoreilly retweet most frequently?
  • 109. Counting Retweets• Map @mentions out of tweets using a regex• Reduce to sum them up• Sort the results• Display results
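[Editor's note: the four steps above fit in a few lines when the tweets are in memory, reusing the deck's retweet-convention regex. The tweets below are made up for illustration:]

```python
import re
from collections import Counter

# The deck's RT/via convention pattern
rt_pattern = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

# Hypothetical tweet texts standing in for a harvested timeline
tweets = [
    "RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?",
    "Great read (via @ptwobrussell)",
    "RT @SocialWebMining Mining the Social Web is out",
]

# Map @mentions out of each tweet, reduce by summing counts per origin
counts = Counter()
for t in tweets:
    for _, mentions in rt_pattern.findall(t):
        counts.update(m.strip('@') for m in mentions.split())

# Sorted results, most-retweeted origin first
print(counts.most_common())
```

Counter collapses the map/reduce/sort pipeline into one object; at CouchDB scale the same counting is expressed as a map function emitting mentions and a reduce function summing them.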
  • 110. Retweets by @timoreilly
  • 111. How frequently is @timoreilly retweeted?
  • 112. Retweet Counts• An API resource /statuses/retweet_count exists (and is now functional)• Example: http://twitter.com/statuses/show/29016139807.json • retweet_count • retweeted
  • 113. Survey Says...@timoreilly is retweeted about 2/3 of the time
  • 114. How often does @timoreilly include #hashtags in tweets?
  • 115. Counting Hashtags• Use a mapper to emit #hashtag entities from tweets• Use a reducer to sum them all up• Been there, done that...
  • 116. Survey Says... About 1 out of every 3 tweets by @timoreilly contains #hashtags
  • 117. But if you order within the next 5 minutes...
  • 118. Bonus Material: What do #JustinBieber and #TeaParty have in common?
  • 119. Tweet Entities
  • 120. #JustinBieber co-occurrences #bieberblast http://tinyurl.com/ #music #Eclipse 343kax4 @justinbieber #somebodytolove @JustBieberFact #nowplaying http://bit.ly/aARD4t @TinselTownDirt #Justinbieber http://bit.ly/b2Kc1L #beliebers #JUSTINBIEBER #Escutando #BieberFact #Proform #justinBieber #Celebrity http://migre.me/TJwj #Restart #Dschungel @ProSieben #TT @_Yassi_ @lojadoaltivo #Telezwerge #musicmonday #JustinBieber @rheinzeitung #video #justinbieber #WTF #tickets
  • 121. #TeaParty co-occurrences @STOPOBAMA2012 #jcot @blogging_tories @TheFlaCracker #tweetcongress #cdnpoli #palin2012 #Obama #fail #AZ #topprog #nra #TopProg #palin #roft #conservative #dems @BrnEyeSuss http://tinyurl.com/386k5hh #acon @crispix49 @ResistTyranny #cspj @koopersmith #tsot #immigration @Kriskxx @ALIPAC #politics #Kagan #majority #hhrs @Liliaep #NoAmnesty #TeaParty #nvsen #patriottweets #vote2010 @First_Patriots @Drudge_Report #libertarian #patriot #military #obama #pjtv #palin12 #ucot @andilinks #rnc #iamthemob @RonPaulNews #TCOT #GOP #ampats http://tinyurl.com/24h36zq #tpp #cnn #spwbt #dnc #jews @welshman007 #twisters #GOPDeficit #FF #sgp #wethepeople #liberty #ocra #asamom #glennbeck #gop @thenewdeal #news #tlot #AFIRE #oilspill #p2 #Dems #rs #tcot @JIDF #Teaparty #teaparty
  • 122. Hashtag Distributions
  • 123. Hashtag Analysis• TeaParty: ~ 5 hashtags per tweet.• Example: “Rarely is the questioned asked: Is our children learning?” - G.W. Bush #p2 #topprog #tcot #tlot #teaparty #GOP #FF• JustinBieber: ~ 2 hashtags per tweet• Example: #justinbieber is so coool
  • 124. Common #hashtags #lol #dancing #jesus #music #worldcup #glennbeck #teaparty @addthis #AZ #nowplaying #milk #news #ff #WTF #guns #fail #WorldCup #toomanypeople #bp #oilspill #News #catholic
  • 125. Retweet Patterns
  • 126. Retweet Behaviors
  • 127. Friendship Networks
  • 128. Juxtaposing Friendships• Harvest search results for #JustinBieber and #TeaParty• Get friend ids for each @mention with /friends/ids• Resolve screen names with /users/lookup• Populate a NetworkX graph• Analyze it• Visualize with Graphviz
  • 129. Nodes Degrees
  • 130. Two Kinds of Hairballs... #JustinBieber #TeaParty
  • 131. The world (er, twitterverse) is your oyster
  • 132.
        • Twitter: @SocialWebMining
        • GitHub: http://bit.ly/socialwebmining
        • Facebook: http://facebook.com/MiningTheSocialWeb