Unleashing Twitter Data
     for fun and insight

Matthew A. Russell
http://linkedin.com/in/ptwobrussell
@ptwobrussell

Happy Groundhog Day!
Mining the Social Web
          Chapters 1-5
Introduction: Trends, Tweets, and Twitterers
Microformats: Semantic Markup and Common Sense Collide
Mailboxes: Oldies but Goodies
Friends, Followers, and Setwise Operations
Twitter: The Tweet, the Whole Tweet, and
Nothing but the Tweet
Mining the Social Web
            Chapters 6-10

LinkedIn: Clustering Your Professional Network For Fun (and
Profit?)
Google Buzz: TF-IDF, Cosine Similarity, and Collocations
Blogs et al: Natural Language Processing (and Beyond)
Facebook: The All-In-One Wonder
The Semantic Web: A Cocktail Discussion
Overview


• Trends, Tweets, and Retweet Visualizations
• Friends, Followers, and Setwise Operations
• The Tweet, the Whole Tweet, and Nothing but the Tweet
Insight Matters


• What is @user's potential influence?
• What are @user's passions right now?
• Who are @user's most trusted friends?
Part 1:
Tweets, Trends, and Retweet
       Visualizations



A point to ponder:
Twitter : Data :: JavaScript : Programming Languages (???)
Getting Ready To Code




Python Installation


• Mac users already have it
• Linux users probably have it
• Windows users should grab ActivePython
easy_install
• Installs packages from PyPI
• Get it:
  • http://pypi.python.org/pypi/setuptools
  • Ships with ActivePython
• It really is easy:
 easy_install twitter
 easy_install nltk
 easy_install networkx
Git It?
• http://github.com/ptwobrussell/Mining-the-Social-Web
• git clone git://github.com/ptwobrussell/Mining-the-Social-Web.git
 • introduction__*.py
 • friends_followers__*.py
 • the_tweet__*.py
Getting Data




Twitter Data Sources


• Twitter API Resources
• GNIP
• Infochimps
• Library of Congress
Trending Topics

>>>   import twitter # Remember to "easy_install twitter"
>>>   twitter_search = twitter.Twitter(domain="search.twitter.com")
>>>   trends = twitter_search.trends()
>>>   [ trend['name'] for trend in trends['trends'] ]

[u'#ZodiacFacts', u'#nowplaying', u'#ItsOverWhen',
 u'#Christoferdrew', u'Justin Bieber', u'#WhatwouldItBeLike',
 u'#Sagittarius', u'SNL', u'#SurveySays', u'#iDoit2']
Search Results


>>> search_results = []
>>> for page in range(1,6):
...   search_results.append(twitter_search.search(q="SNL",rpp=100, page=page))
Search Results (continued)
 >>> import json
 >>> print json.dumps(search_results, sort_keys=True, indent=1)
 [
   {
     "completed_in": 0.088122000000000006,
     "max_id": 11966285265,
     "next_page": "?page=2&max_id=11966285265&rpp=100&q=SNL",
     "page": 1,
     "query": "SNL",
     "refresh_url": "?since_id=11966285265&q=SNL",

   ...more...
Search Results (continued)
  "results": [
   {
     "created_at": "Sun, 11 Apr 2010 01:34:52 +0000",
     "from_user": "bieber_luv2",
     "from_user_id": 106998169,
     "geo": null,
     "id": 11966285265,
     "iso_language_code": "en",
     "metadata": {
      "result_type": "recent"
     },
     ...more...
Search Results (continued)
       "profile_image_url": "http://a1.twimg.com/profile_images/80...",
       "source": "<a href="http://twitter.com/&quo...",
       "text": "im nt gonna go to sleep happy unless i see ...",
       "to_user_id": null
       }
       ... output truncated - 99 more tweets ...
     ],
     "results_per_page": 100,
     "since_id": 0
    },
    ... output truncated - 4 more pages ...
]
Lexical Diversity

• Ratio of unique terms to total terms
  • A measure of "stickiness"?
  • A measure of "group think"?
  • A crude indicator of retweets to originally authored tweets?
Distilling Tweet Text
 >>> # search_results is already defined

 >>> tweets = [ r['text'] 
 ...     for result in search_results 
 ...         for r in result['results'] ]

 >>> words = []

 >>> for t in tweets:
 ...     words += [ w for w in t.split() ]
 ...
Analyzing Data




Lexical Diversity
 >>> len(words)
 7238

 >>> # unique words
 >>> len(set(words))
 1636

 >>> # lexical diversity
 >>> 1.0*len(set(words))/len(words)
 0.22602928985907708

 >>> # average number of words per tweet
 >>> 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets)
 14.476000000000001
Size Frequency Matters


• Counting: always the first step
• Simple but effective
• NLTK saves us a little trouble
Frequency Analysis
 >>> import nltk
 >>> freq_dist = nltk.FreqDist(words)
 >>> freq_dist.keys()[:50] #50 most frequent tokens

 [u'snl', u'on', u'rt', u'is', u'to', u'i', u'watch', u'justin',
  u'@justinbieber', u'be', u'the', u'tonight', u'gonna', u'at',
  u'in', u'bieber', u'and', u'you', u'watching', u'tina', u'for',
  u'a', u'wait', u'fey', u'of', u'@justinbieber:', u'if', u'with',
  u'so', u"can't", u'who', u'great', u'it', u'going', u'im', u':)',
  u'snl...', u'2nite...', u'are', u'cant', u'dress', u'rehearsal',
  u'see', u'that', u'what', u'but', u'tonight!', u':d', u'2',
  u'will']
Frequency Visualization
Tweet and RT were sitting on a fence.
    Tweet fell off. Who was left?
RTs: past, present, & future


• Retweet: Tweeting a tweet that's already been tweeted
• RT or via followed by @mention
• Example: RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?
• Relatively new APIs were rolled out last year for retweeting sans
 conventions
Some people, when confronted with a problem, think "I know,
   I'll use regular expressions." Now they have two
                 problems. -- Jamie Zawinski
Parsing Retweets
 >>> example_tweets = ["Visualize Twitter search results w/ this simple script
 http://bit.ly/cBu0l4 - Gist instructions http://bit.ly/9SZ2kb (via
 @SocialWebMining @ptwobrussell)"]

 >>> import re
 >>> rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)",
 ...                          re.IGNORECASE)

 >>> rt_origins = []
 >>> for t in example_tweets:
 ...    try:
 ...         rt_origins += [mention.strip() 
 ...         for mention in rt_patterns.findall(t)[0][1].split()]
 ...    except IndexError, e:
 ...         pass

 >>> [rto.strip("@") for rto in rt_origins]
Visualizing Data




Graph Construction

 >>> import networkx as nx
 >>> g = nx.DiGraph()
 >>> g.add_edge("@SocialWebMining", "@ptwobrussell", 
 ...            {"tweet_id" : 4815162342},)
Writing out DOT
OUT_FILE = "out_file.dot"

try:
    nx.drawing.write_dot(g, OUT_FILE)
except ImportError, e:
    dot = ['"%s" -> "%s" [tweet_id=%s]' % 
    (n1, n2, g[n1][n2]['tweet_id']) for n1, n2 in g.edges()]

       f = open(OUT_FILE, 'w')
       f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
       f.close()
Example DOT Language

 strict digraph {
   "@ericastolte" -> "bonitasworld" [tweet_id=11965974697];
   "@mpcoelho" ->"Lil_Amaral" [tweet_id=11965954427];
   "@BieberBelle123" -> "BELIEBE4EVER" [tweet_id=11966261062];
   "@BieberBelle123" -> "sabrina9451" [tweet_id=11966197327];
 }
DOT to Image


• Download Graphviz: http://www.graphviz.org/
•$ dot -Tpng out_file.dot > graph.png
• Windows users might prefer GVEdit
Graphviz: Extreme Closeup
But you want more sexy?
Protovis: Extreme Closeup




It Doesn't Have To Be a Graph

                Graph Connectedness
Part 2:
Friends, Followers, and Setwise
          Operations



Insight Matters

• What is my potential influence?
• Who are the most popular people in my network?
• Who are my mutual friends?
• What common friends/followers do I have with @user?
• Who is not following me back?
• What can I learn from analyzing my friendship cliques?
Getting Data



OAuth (1.0a)
import twitter
from twitter.oauth_dance import oauth_dance

# Get these from http://dev.twitter.com/apps/new
consumer_key, consumer_secret = 'key', 'secret'

(oauth_token, oauth_token_secret) = oauth_dance('MiningTheSocialWeb',
                                       consumer_key, consumer_secret)

auth=twitter.oauth.OAuth(oauth_token, oauth_token_secret,
                         consumer_key, consumer_secret)

t = twitter.Twitter(domain='api.twitter.com', auth=auth)
Getting Friendship Data


 friend_ids = t.friends.ids(screen_name='timoreilly', cursor=-1)
 follower_ids = t.followers.ids(screen_name='timoreilly', cursor=-1)

 # store the data somewhere...
Perspective: Fetching all of Lady Gaga's
~7M followers would take ~4 hours
But there's always a catch...
Rate Limits
• 350 requests/hr for authenticated requests
• 150 requests/hr for anonymous requests
• Coping mechanisms:
  • Caching & Archiving Data
  • Streaming API
  • HTTP 400 codes
• See http://dev.twitter.com/pages/rate-limiting
The Beloved Fail Whale


 • Twitter is sometimes "overcapacity"
 • HTTP 503 Error
 • Handle it just as any other HTTP error (see the sketch below)
 • RESTfulness has its advantages
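
The next slide calls a makeTwitterRequest helper without defining it. A minimal sketch of such a wrapper, assuming the twitter package's TwitterHTTPError exposes the HTTP status via e.e.code (the book's actual helper may differ):

 import time
 from twitter.api import TwitterHTTPError

 def makeTwitterRequest(t, twitterFunction, max_errors=3, **kwargs):
     # t is accepted only to match the call on the next slide
     wait_period = 2 # secs; doubled after each rate-limit hit
     error_count = 0
     while True:
         try:
             return twitterFunction(**kwargs)
         except TwitterHTTPError, e:
             if e.e.code == 400: # rate limited (see "Rate Limits")
                 print 'Rate limited. Sleeping %i secs' % wait_period
                 time.sleep(wait_period)
                 wait_period *= 2
             elif e.e.code in (502, 503): # fail whale / overcapacity
                 print 'Twitter overcapacity. Retrying shortly...'
                 time.sleep(wait_period)
             else:
                 error_count += 1
                 if error_count > max_errors:
                     raise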
Abstraction Helps
 friend_ids = []
 wait_period = 2 # secs
 cursor = -1

 while cursor != 0:
     response = makeTwitterRequest(t, # twitter.Twitter instance
                                   t.friends.ids,
                                   screen_name=screen_name,
                                   cursor=cursor)

     friend_ids += response['ids']
     cursor = response['next_cursor']
     # break out of loop early if you don't need all ids
Abstracting Abstractions
 screen_name = 'timoreilly'

 # This is what you ultimately want...

 friend_ids = getFriends(screen_name)
 follower_ids = getFollowers(screen_name)
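
One way those wrappers might be built on makeTwitterRequest and the cursor loop from the previous slides (a sketch; assumes t is the authenticated twitter.Twitter instance):

 def _getIds(api_call, screen_name):
     ids, cursor = [], -1
     while cursor != 0:
         response = makeTwitterRequest(t, api_call,
                                       screen_name=screen_name,
                                       cursor=cursor)
         ids += response['ids']
         cursor = response['next_cursor']
     return ids

 def getFriends(screen_name):
     return _getIds(t.friends.ids, screen_name)

 def getFollowers(screen_name):
     return _getIds(t.followers.ids, screen_name)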
Storing Data



Flat Files?
  ./
  screen_name1/
      friend_ids.json
      follower_ids.json
      user_info.json

  screen_name2/
      ...

  ...
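
If flat files are good enough, the layout above can be written out with json; a sketch (saveToFlatFiles is a hypothetical helper, not from the book):

 import json
 import os

 def saveToFlatFiles(screen_name, friend_ids, follower_ids, base_dir='.'):
     out_dir = os.path.join(base_dir, screen_name)
     if not os.path.isdir(out_dir):
         os.makedirs(out_dir)
     for fname, data in (('friend_ids.json', friend_ids),
                         ('follower_ids.json', follower_ids)):
         f = open(os.path.join(out_dir, fname), 'w')
         json.dump(data, f)
         f.close()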
Pickles?
import cPickle

o = {
    'friend_ids'   : friend_ids,
    'follower_ids' : follower_ids,
    'user_info'    : user_info
}

f = open('screen_name1.pickle', 'wb')
cPickle.dump(o, f)
f.close()
A relational database?
 import sqlite3 as sqlite

 conn = sqlite.connect('data.db')
 c = conn.cursor()

 c.execute('''create table
              friends...''')


 c.execute('''insert into friends...
 ''')


 # Lots of fun...sigh...
Redis (A Data Structures Server)


  import redis

  r = redis.Redis()

  [ r.sadd("timoreilly$friend_ids", i) for i in friend_ids ]

  r.smembers("timoreilly$friend_ids") # returns a set


         Project page: http://redis.io
         Windows binary: http://code.google.com/p/servicestack/wiki/RedisWindowsDownload
Redis Set Operations
• Key/value store...on typed values!
• Common set operations
  • smembers, scard
  • sinter, sdiff, sunion
  • sadd, srem, etc.
• See http://code.google.com/p/redis/wiki/CommandReference
• Don't forget to $ easy_install redis
Analyzing Data



Setwise Operations

• Union
• Intersection
• Difference
• Complement
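
For orientation, the same operations with plain Python sets (a toy illustration; the Redis commands on the following slides do the same thing server-side):

 friends = set([1, 2, 3, 4])   # stand-in for friend ids
 followers = set([3, 4, 5, 6]) # stand-in for follower ids

 print friends | followers     # union
 print friends & followers     # intersection (mutual friends)
 print friends - followers     # difference (friends not following back)
 print followers - friends     # difference (followers not friended back)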
Venn Diagrams

(Venn diagram: Friends and Followers drawn as two overlapping sets, with
regions for Friends - Followers, Followers - Friends, and the union
Friends ∪ Followers; the overlap is the mutual friends.)
Count Your Blessings
# A utility function
def getRedisIdByScreenName(screen_name, key_name):
    return 'screen_name$' + screen_name + '$' + key_name


# Number of friends
n_friends = r.scard(getRedisIdByScreenName(screen_name,
                                           'friend_ids'))

# Number of followers
n_followers = r.scard(getRedisIdByScreenName(screen_name,
                                             'follower_ids'))
Asymmetric Relationships


# Friends who aren't following back
friends_diff_followers = r.sdiffstore('temp', [
                 getRedisIdByScreenName(screen_name, 'friend_ids'),
                 getRedisIdByScreenName(screen_name, 'follower_ids')
                 ])
# ... compute interesting things ...
r.delete('temp')
Asymmetric Relationships


# Followers who aren't friended
followers_diff_friends = r.sdiffstore('temp', [
                  getRedisIdByScreenName(screen_name, 'follower_ids'),
                  getRedisIdByScreenName(screen_name, 'friend_ids')
                  ])
# ... compute interesting things ...
r.delete('temp')
Symmetric Relationships

 mutual_friends = r.sinterstore('temp', [
         getRedisIdByScreenName(screen_name, 'follower_ids'),
         getRedisIdByScreenName(screen_name, 'friend_ids')
         ])
 # ... compute interesting things ...
 r.delete('temp')
Sample Output

 timoreilly is following 663

 timoreilly is being followed by 1,423,704

 131 of 663 are not following timoreilly back

 1,423,172 of 1,423,704 are not being followed back by
 timoreilly

 timoreilly has 532 mutual friends
Who Isn't Following Back?
 user_ids = [ ... ] # Resolve these to user info objects

 while len(user_ids) > 0:
   user_ids_str = ','.join([ str(i) for i in user_ids[:100] ])
   user_ids = user_ids[100:]

   response = t.users.lookup(user_id=user_ids_str)

   if type(response) is dict: response = [response]
   r.mset(dict([(getRedisIdByUserId(resp['id'], 'info.json'), json.dumps(resp))
                for resp in response]))

   r.mset(dict([(getRedisIdByScreenName(resp['screen_name'],'info.json'),
                json.dumps(resp)) for resp in response]))
Friends in Common
# Assume we've harvested friends/followers and it's in Redis...
screen_names = ['timoreilly', 'mikeloukides']

r.sinterstore('temp$friends_in_common',
              [getRedisIdByScreenName(screen_name, 'friend_ids')
              for screen_name in screen_names])

r.sinterstore('temp$followers_in_common',
              [getRedisIdByScreenName(screen_name,'follower_ids')
              for screen_name in screen_names])

# Manipulate the sets
Potential Influence

• My followers?
• My followers' followers?
• My followers' followers' followers?
•for n in range(1, 7): # 6 degrees?
   print "My " + "followers' "*n + "followers?"
Saving a Thousand Words...




(Figure: a tree with branching factor 2 and depth 3: root node 1, its
children 2 and 3, grandchildren 4-7, and leaf nodes 8-15.)
Same Data, Different Layout
(Figure: the same 15-node tree drawn with a radial layout, node 1 at the
center and the leaves around the rim.)
Space Complexity
Total nodes reached, by branching factor and depth:

                      Depth 1   Depth 2   Depth 3   Depth 4   Depth 5
Branching factor 2       3         7        15        31        63
Branching factor 3       4        13        40       121       364
Branching factor 4       5        21        85       341      1365
Branching factor 5       6        31       156       781      3906
Branching factor 6       7        43       259      1555      9331
Breadth-First Traversal
Create an empty graph
Create an empty queue to keep track of unprocessed nodes

Add the starting point to the graph as the "root node"
Add the root node to a queue for processing

Repeat until some maximum depth is reached or the queue is empty:
  Remove a node from queue
  For each of the node's neighbors:
    If the neighbor hasn't already been processed:
      Add it to the graph
      Add it to the queue
      Add an edge to the graph connecting the node & its neighbor
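
A minimal sketch of that traversal in Python with NetworkX; getNeighbors is a stand-in for whatever returns a node's neighbors (e.g. the getFollowers wrapper from earlier):

 import networkx as nx

 def breadthFirstCrawl(seed, getNeighbors, max_depth=2):
     g = nx.Graph()
     g.add_node(seed) # the "root node"
     queue, depth = [seed], 0
     while queue and depth < max_depth:
         depth += 1
         next_queue = []
         for node in queue:
             for neighbor in getNeighbors(node):
                 if neighbor not in g: # not processed yet
                     g.add_node(neighbor)
                     next_queue.append(neighbor)
                     g.add_edge(node, neighbor)
         queue = next_queue
     return g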
Breadth-First Harvest

 depth = 2 # maximum number of hops to crawl (assumed; not set on the slide)
 next_queue = [ 'timoreilly' ] # seed node
 d = 1

 while d < depth:
     d += 1
     queue, next_queue = next_queue, []
     for screen_name in queue:
         follower_ids = getFollowers(screen_name=screen_name)
         next_queue += follower_ids
         getUserInfo(user_ids=next_queue)
The Most Popular Followers

 freqs = {}
 for follower in followers:
     cnt = follower['followers_count']
     if not freqs.has_key(cnt):
         freqs[cnt] = []

     freqs[cnt].append({'screen_name': follower['screen_name'],
                        'user_id': follower['id']})

 popular_followers = sorted(freqs, reverse=True)[:100]
Average # of Followers

 all_freqs = [k for k in freqs for user in freqs[k]]
 avg = sum(all_freqs) / len(all_freqs)
@timoreilly's Popular Followers

          The top 10 followers from the sample:

          aplusk              4,993,072
          BarackObama         4,114,901
          mashable            2,014,615
          MarthaStewart       1,932,321
          Schwarzenegger      1,705,177
          zappos              1,689,289
          Veronica            1,612,827
          jack                1,592,004
          stephenfry          1,531,813
          davos               1,522,621
Futzing the Numbers

• The average number of timoreilly's followers' followers: 445
• Discarding the top 10 lowers the average to around 300
• Discarding any follower with less than 10 followers of their
 own increases the average to over 1,000!
• Doing both brings the average to around 800
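
A toy sketch of the futzing above, showing how trimming outliers and tiny accounts shifts an average (numbers are made up, not timoreilly's data):

 follower_counts = [2, 3, 5, 8, 40, 50, 75, 120, 4000000] # hypothetical

 def avg(xs):
     return sum(xs) / float(len(xs))

 print avg(follower_counts)                         # raw average
 print avg(sorted(follower_counts)[:-1])            # drop the top outlier
 print avg([c for c in follower_counts if c >= 10]) # drop tiny accounts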
The Right Tool For the Job:
NetworkX for Networks
Friendship Graphs
# g is a NetworkX Graph and r is the Redis client from earlier slides
for i in ids: # ids is timoreilly's id along with friend ids
  info = json.loads(r.get(getRedisIdByUserId(i, 'info.json')))
  screen_name = info['screen_name']
  friend_ids = list(r.smembers(getRedisIdByScreenName(screen_name,
                                                      'friend_ids')))

  for friend_id in [fid for fid in friend_ids if fid in ids]:
      friend_info = json.loads(r.get(getRedisIdByUserId(friend_id, 'info.json')))
      g.add_edge(screen_name, friend_info['screen_name'])

nx.write_gpickle(g, 'timoreilly.gpickle') # see also nx.read_gpickle
Clique Analysis


                                              • Cliques
                                              • Maximum Cliques
                                              • Maximal Cliques

http://en.wikipedia.org/wiki/Clique_problem
Calculating Cliques
cliques = [c for c in nx.find_cliques(g)]

num_cliques = len(cliques)
clique_sizes = [len(c) for c in cliques]

max_clique_size = max(clique_sizes)
avg_clique_size = sum(clique_sizes) / num_cliques
max_cliques = [c for c in cliques if len(c) == max_clique_size]
num_max_cliques = len(max_cliques)

people_in_every_max_clique = list(reduce(
    lambda x, y: x.intersection(y),[set(c) for c in max_cliques]
))
Cliques for @timoreilly


         Num   cliques:                762573
         Avg   clique size:                14
         Max   clique size:                26
         Num   max cliques:                 6
         Num   people in every max clique: 20
Visualizing Data



Graphs, etc


    • Your first instinct is naturally
      G = (V, E) ?
Dorling Cartogram

  • A location-aware bubble chart (ish)
  • At least 3-dimensional
    • Position, color, size
  • Look at friends/followers by state
Sunburst of Friends


 • A very compact visualization
 • Slice and dice friends/followers by
  gender, country, locale, etc.
Part 3:
The Tweet, the Whole Tweet, and
     Nothing but the Tweet



Insight Matters

• Which entities frequently appear in @user's tweets?
• How often does @user talk about specific friends?
• Who does @user retweet most frequently?
• How frequently is @user retweeted (by anyone)?
• How many #hashtags are usually in @user's tweets?
Pen : Sword :: Tweet : Machine Gun (?!?)
Getting Data



Let me count the APIs...

• Timelines
• Tweets
• Favorites
• Direct Messages
• Streams
Anatomy of a Tweet (1/2)
{
    "created_at" : "Thu Jun 24 14:21:11 +0000 2010",
    "id" : 16932571217,
    "text" : "Great idea from @crowdflower: Crowdsourcing ... #opengov",
    "user" : {
       "description" : "Founder and CEO, O'Reilly Media. Watching the alpha geeks...",
       "id" : 2384071,
       "location" : "Sebastopol, CA",
       "name" : "Tim O'Reilly",
       "screen_name" : "timoreilly",
       "url" : "http://radar.oreilly.com"
    },

    ...
Anatomy of a Tweet (2/2)

    ...

    "entities" : {
      "hashtags" : [    {"indices" : [ 97, 103 ], "text" : "gov20"},
                        {"indices" : [ 104, 112 ], "text" : "opengov"} ],

        "urls" : [{"expanded_url" : null, "indices" : [ 76, 96 ],
                   "url" : "http://bit.ly/9o4uoG"} ],

        "user_mentions" : [{"id" : 28165790, "indices" : [ 16, 28 ],
                            "name" : "crowdFlower","screen_name" : "crowdFlower"}]
    }
}
Entities & Annotations

• Entities
  • Opt-in now but will "soon" be standard
 • $ easy_install twitter_text
• Annotations
  • User-defined metadata
  • See http://dev.twitter.com/pages/annotations_overview
Manual Entity Extraction
 import twitter_text

 extractor = twitter_text.Extractor(tweet['text'])

 mentions = extractor.extract_mentioned_screen_names_with_indices()
 hashtags = extractor.extract_hashtags_with_indices()
 urls = extractor.extract_urls_with_indices()

 # Splice info into a tweet object
Storing Data



Storing Tweets

• Flat files? (Really, who does that?)
• A relational database?
• Redis?
• CouchDB (Relax...?)
CouchDB: Relax

• Document-oriented key/value
• Map/Reduce
• RESTful API
• Erlang
As easy as sitting on the couch


• Get it - http://www.couchone.com/get
• Install it
• Relax - http://localhost:5984/_utils/
• Also - $ easy_install couchdb
Storing Timeline Data
import couchdb
import twitter

TIMELINE_NAME = "user" # or "home" or "public"
DB = 'tweets-user-timeline' # hypothetical database name
MAX_PAGES = 5               # hypothetical page limit

t = twitter.Twitter(domain='api.twitter.com', api_version='1')

server = couchdb.Server('http://localhost:5984')
db = server.create(DB)

page_num = 1
while page_num <= MAX_PAGES:
    api_call = getattr(t.statuses, TIMELINE_NAME + '_timeline')
    tweets = makeTwitterRequest(t, api_call, page=page_num)
    db.update(tweets, all_or_nothing=True)
    print 'Fetched %i tweets' % len(tweets)
    page_num += 1
Analyzing & Visualizing Data



Approach:
Map/Reduce on Tweets
Map/Reduce Paradigm

• Mapper: yields key/value pairs
• Reducer: operates on keyed mapper output
• Example: Computing the sum of squares
  • Mapper Input: (k, [2,4,6])
  • Mapper Output: (k, [4,16,36])
  • Reducer Input: [(k, [4, 16]), (k, [36])]
  • Reducer Output: 56
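
The sum-of-squares example spelled out as a toy mapper and reducer in plain Python (an illustration of the paradigm, not CouchDB code):

 def mapper(key, values):
     return [(key, v ** 2) for v in values] # (k, [2,4,6]) -> [(k,4), (k,16), (k,36)]

 def reducer(key, squared_values):
     return sum(squared_values)             # 4 + 16 + 36 = 56

 mapped = mapper('k', [2, 4, 6])
 print reducer('k', [v for (k, v) in mapped]) # 56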
Which entities frequently appear in
       @mention's tweets?
@timoreilly's Tweet Entities
How often does @timoreilly
 mention specific friends?
Filtering Tweet Entities

• Let's find out how often someone talks about
 specific friends
• We have friend info on hand
• We've extracted @mentions from the tweets
 • Let's count friend vs. non-friend mentions (a sketch follows)
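
A toy sketch of that filtering step; both lists below are hypothetical stand-ins for data harvested earlier (friend ids resolved to screen names in Part 2, @mentions extracted from the tweet entities):

 friend_screen_names = set(['ahier', 'gnat', 'pkedrosky']) # hypothetical
 mentions = ['ahier', 'gnat', 'n2vip', 'ahier']            # hypothetical

 friend_mentions = [m for m in mentions if m in friend_screen_names]
 nonfriend_mentions = [m for m in mentions if m not in friend_screen_names]

 print 'Number of @user entities in tweets:', len(mentions)
 print 'Number who are friends:', len(friend_mentions)
 print 'Number who are not friends:', len(nonfriend_mentions)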
@timoreilly's friend mentions
 Number of @user entities in tweets: 20
 Number of @user entities in tweets who are friends: 18
 Number of @user entities in tweets who are not friends: 2

 Not friends:
   n2vip, timoreilly

 Friends:
   ahier, andrewsavikas, pkedrosky, gnat, CodeforAmerica, slashdot,
   nytimes, OReillyMedia, brady, dalepd, carlmalamud, mikeloukides,
   pahlkadot, monkchips, make, fredwilson, jamesoreilly, digiphile
Who does @timoreilly retweet
     most frequently?
Counting Retweets

• Map @mentions out of tweets using a regex
• Reduce to sum them up
• Sort the results
• Display results (a sketch follows)
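
A sketch of those steps using the retweet regex from Part 1; tweets is a stand-in for the harvested tweet texts:

 import re

 rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

 tweets = ["RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?"] # stand-in

 counts = {}
 for t in tweets:
     for match in rt_patterns.findall(t):       # map @mentions out of tweets
         for origin in match[1].split():
             origin = origin.strip().strip('@')
             counts[origin] = counts.get(origin, 0) + 1 # reduce: sum them up

 # sort and display
 for origin, cnt in sorted(counts.items(), key=lambda x: x[1], reverse=True)[:10]:
     print origin, cnt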
Retweets by @timoreilly
How frequently is @timoreilly
        retweeted?
Retweet Counts


• An API resource /statuses/retweet_count exists (and is now functional)
• Example: http://twitter.com/statuses/show/29016139807.json
  • retweet_count
  • retweeted
Survey Says...
@timoreilly is retweeted about 2/3
            of the time
How often does @timoreilly
include #hashtags in tweets?
Counting Hashtags


• Use a mapper to emit #hashtag entities for tweets
• Use a reducer to sum them all up
• Been there, done that...
Survey Says...
About 1 out of every 3 tweets by
 @timoreilly contains #hashtags
But if you order within the next 5
            minutes...



Bonus Material:
What do #JustinBieber and #TeaParty
         have in common?


Tweet Entities
#JustinBieber co-occurrences

 #bieberblast            @JustBieberFact              #music
 #Eclipse                @TinselTownDirt              @justinbieber
 #somebodytolove         #beliebers                   #nowplaying
 http://bit.ly/aARD4t    #BieberFact                  #Justinbieber
 http://bit.ly/b2Kc1L    #Celebrity                   #JUSTINBIEBER
 #Escutando              #Dschungel                   #Proform
 #justinBieber           @_Yassi_                     http://migre.me/TJwj
 #Restart                #musicmonday                 @ProSieben
 #TT                     #video                       @lojadoaltivo
 #Telezwerge             #tickets                     #JustinBieber
 @rheinzeitung           http://tinyurl.com/343kax4   #justinbieber
 #WTF
#TeaParty co-occurrences
                   @STOPOBAMA2012               #jcot
@blogging_tories   @TheFlaCracker               #tweetcongress
#cdnpoli           #palin2012                   #Obama
#fail              #AZ                          #topprog
#nra               #TopProg                     #palin
#roft              #conservative                #dems
@BrnEyeSuss        http://tinyurl.com/386k5hh   #acon
@crispix49         @ResistTyranny               #cspj
@koopersmith       #tsot                        #immigration
@Kriskxx           @ALIPAC                      #politics
#Kagan             #majority                    #hhrs
@Liliaep           #NoAmnesty                   #TeaParty
#nvsen             #patriottweets               #vote2010
@First_Patriots    @Drudge_Report               #libertarian
#patriot           #military                    #obama
#pjtv              #palin12                     #ucot
@andilinks         #rnc                         #iamthemob
@RonPaulNews       #TCOT                        #GOP
#ampats            http://tinyurl.com/24h36zq   #tpp
#cnn               #spwbt                       #dnc
#jews              @welshman007                 #twisters
#GOPDeficit        #FF                          #sgp
#wethepeople       #liberty                     #ocra
#asamom            #glennbeck                   #gop
@thenewdeal        #news                        #tlot
#AFIRE             #oilspill                    #p2
#Dems              #rs                          #tcot
@JIDF              #Teaparty                    #teaparty
Hashtag Distributions
Hashtag Analysis

• TeaParty: ~ 5 hashtags per tweet.
• Example: “Rarely is the question asked: Is our children
 learning?” - G.W. Bush #p2 #topprog #tcot #tlot #teaparty
 #GOP #FF
• JustinBieber: ~ 2 hashtags per tweet
• Example: #justinbieber is so coool
Common #hashtags
 #lol              #dancing
 #jesus            #music
 #worldcup         #glennbeck
 #teaparty         @addthis
 #AZ               #nowplaying
 #milk             #news
 #ff               #WTF
 #guns             #fail
 #WorldCup         #toomanypeople
 #bp               #oilspill
 #News             #catholic
Retweet Patterns
Retweet Behaviors
Friendship Networks
Juxtaposing Friendships

• Harvest search results for #JustinBieber and #TeaParty
• Get friend ids for each @mention with /friends/ids
• Resolve screen names with /users/lookup
• Populate a NetworkX graph
• Analyze it
• Visualize with Graphviz
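
A compressed sketch of that pipeline. It assumes search results for a hashtag (as in Part 1), the twitter_text extractor, and the getFriends wrapper (as in Part 2); resolving friend ids to screen names with /users/lookup is elided:

 import networkx as nx
 import twitter_text

 g = nx.Graph()
 for result in search_results:            # search_results harvested earlier
     for tweet in result['results']:
         extractor = twitter_text.Extractor(tweet['text'])
         for entity in extractor.extract_mentioned_screen_names_with_indices():
             screen_name = entity['screen_name']
             for friend_id in getFriends(screen_name):
                 g.add_edge(screen_name, friend_id)

 # analyze with NetworkX, or export DOT for Graphviz as in Part 1
 print nx.number_of_nodes(g), nx.number_of_edges(g)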
Node Degrees
Two Kinds of Hairballs...




       #JustinBieber        #TeaParty
The world twitterverse is your oyster
• Twitter: @SocialWebMining
• GitHub: http://bit.ly/socialwebmining
• Facebook: http://facebook.com/MiningTheSocialWeb





More Related Content

Viewers also liked

Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015Jun Julien Matsushita
 
Digital Winners 2013: Aleksander stensby
Digital Winners 2013: Aleksander stensbyDigital Winners 2013: Aleksander stensby
Digital Winners 2013: Aleksander stensbyTelenor Group
 
Data Driven PR: 8 Steps to Building Media Attention with Research
Data Driven PR: 8 Steps to Building Media Attention with ResearchData Driven PR: 8 Steps to Building Media Attention with Research
Data Driven PR: 8 Steps to Building Media Attention with ResearchWalkerSands
 
Analyzing social conversation: a guide to data mining and data visualization
Analyzing social conversation: a guide to data mining and data visualization Analyzing social conversation: a guide to data mining and data visualization
Analyzing social conversation: a guide to data mining and data visualization Tempero UK
 
Analysis and Visualization of Real-Time Twitter Data
Analysis and Visualization of Real-Time Twitter DataAnalysis and Visualization of Real-Time Twitter Data
Analysis and Visualization of Real-Time Twitter DataEducational Technology
 
Searching lexis nexis in power search mode
Searching lexis nexis in power search modeSearching lexis nexis in power search mode
Searching lexis nexis in power search modeJoyce Johnston
 
What is 1st, 2nd, 3rd party data?
What is 1st, 2nd, 3rd party data?What is 1st, 2nd, 3rd party data?
What is 1st, 2nd, 3rd party data?Sparc Media Poland
 
Can Digital Data help predict the results of the US elections?
Can Digital Data help predict the results of the US elections? Can Digital Data help predict the results of the US elections?
Can Digital Data help predict the results of the US elections? Laurence Borel
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowTony Russell-Rose
 
Topic and text analysis for sentiment, emotion, and computational social science
Topic and text analysis for sentiment, emotion, and computational social scienceTopic and text analysis for sentiment, emotion, and computational social science
Topic and text analysis for sentiment, emotion, and computational social scienceAlice Oh
 
Text Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry ViewText Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry ViewSeth Grimes
 
Big Data: Mapping Twitter Communities
Big Data: Mapping Twitter CommunitiesBig Data: Mapping Twitter Communities
Big Data: Mapping Twitter CommunitiesSocialphysicist
 
Learn How a New Kind of Marketing Mix Modeling is Better for Media Planning
Learn How a New Kind of Marketing Mix Modeling is Better for Media PlanningLearn How a New Kind of Marketing Mix Modeling is Better for Media Planning
Learn How a New Kind of Marketing Mix Modeling is Better for Media PlanningThinkVine
 
How to Build a Basic Model with Analytica
How to Build a Basic Model with AnalyticaHow to Build a Basic Model with Analytica
How to Build a Basic Model with AnalyticaTorsten Röhner
 
Deep Social Insight
Deep Social InsightDeep Social Insight
Deep Social InsightSysomos
 
Staying on the Right Side of the Fence when Analyzing Human Data
Staying on the Right Side of the Fence when Analyzing Human DataStaying on the Right Side of the Fence when Analyzing Human Data
Staying on the Right Side of the Fence when Analyzing Human DataDataSift
 
Evolving in a new Data economy
Evolving in a new Data economyEvolving in a new Data economy
Evolving in a new Data economyAcxiom Corporation
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011Seth Grimes
 

Viewers also liked (20)

Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015
 
Digital Winners 2013: Aleksander stensby
Digital Winners 2013: Aleksander stensbyDigital Winners 2013: Aleksander stensby
Digital Winners 2013: Aleksander stensby
 
Data Driven PR: 8 Steps to Building Media Attention with Research
Data Driven PR: 8 Steps to Building Media Attention with ResearchData Driven PR: 8 Steps to Building Media Attention with Research
Data Driven PR: 8 Steps to Building Media Attention with Research
 
Analyzing social conversation: a guide to data mining and data visualization
Analyzing social conversation: a guide to data mining and data visualization Analyzing social conversation: a guide to data mining and data visualization
Analyzing social conversation: a guide to data mining and data visualization
 
Analysis and Visualization of Real-Time Twitter Data
Analysis and Visualization of Real-Time Twitter DataAnalysis and Visualization of Real-Time Twitter Data
Analysis and Visualization of Real-Time Twitter Data
 
Searching lexis nexis in power search mode
Searching lexis nexis in power search modeSearching lexis nexis in power search mode
Searching lexis nexis in power search mode
 
What is 1st, 2nd, 3rd party data?
What is 1st, 2nd, 3rd party data?What is 1st, 2nd, 3rd party data?
What is 1st, 2nd, 3rd party data?
 
Can Digital Data help predict the results of the US elections?
Can Digital Data help predict the results of the US elections? Can Digital Data help predict the results of the US elections?
Can Digital Data help predict the results of the US elections?
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and Tomorrow
 
Market Mix Models: Shining a Light in the Black Box
Market Mix Models: Shining a Light in the Black BoxMarket Mix Models: Shining a Light in the Black Box
Market Mix Models: Shining a Light in the Black Box
 
Topic and text analysis for sentiment, emotion, and computational social science
Topic and text analysis for sentiment, emotion, and computational social scienceTopic and text analysis for sentiment, emotion, and computational social science
Topic and text analysis for sentiment, emotion, and computational social science
 
Text Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry ViewText Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry View
 
Big Data: Mapping Twitter Communities
Big Data: Mapping Twitter CommunitiesBig Data: Mapping Twitter Communities
Big Data: Mapping Twitter Communities
 
Text mining and Visualizations
Text mining  and VisualizationsText mining  and Visualizations
Text mining and Visualizations
 
Learn How a New Kind of Marketing Mix Modeling is Better for Media Planning
Learn How a New Kind of Marketing Mix Modeling is Better for Media PlanningLearn How a New Kind of Marketing Mix Modeling is Better for Media Planning
Learn How a New Kind of Marketing Mix Modeling is Better for Media Planning
 
How to Build a Basic Model with Analytica
How to Build a Basic Model with AnalyticaHow to Build a Basic Model with Analytica
How to Build a Basic Model with Analytica
 
Deep Social Insight
Deep Social InsightDeep Social Insight
Deep Social Insight
 
Staying on the Right Side of the Fence when Analyzing Human Data
Staying on the Right Side of the Fence when Analyzing Human DataStaying on the Right Side of the Fence when Analyzing Human Data
Staying on the Right Side of the Fence when Analyzing Human Data
 
Evolving in a new Data economy
Evolving in a new Data economyEvolving in a new Data economy
Evolving in a new Data economy
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011
 

Similar to Unleashing Twitter Data for Fun and Insight

Mining social data
Mining social dataMining social data
Mining social dataMalk Zameth
 
Life at Twitter + Career Advice for Students
Life at Twitter + Career Advice for StudentsLife at Twitter + Career Advice for Students
Life at Twitter + Career Advice for StudentsChris Aniszczyk
 
"R & Text Analytics" (15 January 2013)
"R & Text Analytics" (15 January 2013)"R & Text Analytics" (15 January 2013)
"R & Text Analytics" (15 January 2013)Portland R User Group
 
Mining the Geo Needles in the Social Haystack
Mining the Geo Needles in the Social HaystackMining the Geo Needles in the Social Haystack
Mining the Geo Needles in the Social HaystackMatthew Russell
 
Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)Matthew Russell
 
Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)
Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)
Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)Matthew Russell
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchainjasonhaddix
 
Twitter Presentation: #APIConSF
Twitter Presentation: #APIConSFTwitter Presentation: #APIConSF
Twitter Presentation: #APIConSFRyan Choi
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
Datasploit - An Open Source Intelligence Tool
Datasploit - An Open Source Intelligence ToolDatasploit - An Open Source Intelligence Tool
Datasploit - An Open Source Intelligence ToolShubham Mittal
 
Mining the social web ch1
Mining the social web ch1Mining the social web ch1
Mining the social web ch1HyeonSeok Choi
 
Idea2app
Idea2appIdea2app
Idea2appFlumes
 
Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementLaurent Leturgez
 
Creating More Engaging Content For Social
Creating More Engaging Content For SocialCreating More Engaging Content For Social
Creating More Engaging Content For SocialEric T. Tung
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Build a Twitter Bot with Basic Python
Build a Twitter Bot with Basic PythonBuild a Twitter Bot with Basic Python
Build a Twitter Bot with Basic PythonThinkful
 
ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...
ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...
ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...Cyber Security Alliance
 
Protect Your Payloads: Modern Keying Techniques
Protect Your Payloads: Modern Keying TechniquesProtect Your Payloads: Modern Keying Techniques
Protect Your Payloads: Modern Keying TechniquesLeo Loobeek
 

Similar to Unleashing Twitter Data for Fun and Insight (20)

Mining social data
Mining social dataMining social data
Mining social data
 
Life at Twitter + Career Advice for Students
Life at Twitter + Career Advice for StudentsLife at Twitter + Career Advice for Students
Life at Twitter + Career Advice for Students
 
Developing apps using Perl
Developing apps using PerlDeveloping apps using Perl
Developing apps using Perl
 
"R & Text Analytics" (15 January 2013)
"R & Text Analytics" (15 January 2013)"R & Text Analytics" (15 January 2013)
"R & Text Analytics" (15 January 2013)
 
Big data. Opportunità e rischi
Big data. Opportunità e rischiBig data. Opportunità e rischi
Big data. Opportunità e rischi
 
Mining the Geo Needles in the Social Haystack
Mining the Geo Needles in the Social HaystackMining the Geo Needles in the Social Haystack
Mining the Geo Needles in the Social Haystack
 
Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)
 
Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)
Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)
Mining Social Web APIs with IPython Notebook (Data Day Texas 2015)
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchain
 
Twitter Presentation: #APIConSF
Twitter Presentation: #APIConSFTwitter Presentation: #APIConSF
Twitter Presentation: #APIConSF
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Datasploit - An Open Source Intelligence Tool
Datasploit - An Open Source Intelligence ToolDatasploit - An Open Source Intelligence Tool
Datasploit - An Open Source Intelligence Tool
 
Mining the social web ch1
Mining the social web ch1Mining the social web ch1
Mining the social web ch1
 
Idea2app
Idea2appIdea2app
Idea2app
 
Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data management
 
Creating More Engaging Content For Social
Creating More Engaging Content For SocialCreating More Engaging Content For Social
Creating More Engaging Content For Social
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Build a Twitter Bot with Basic Python
Build a Twitter Bot with Basic PythonBuild a Twitter Bot with Basic Python
Build a Twitter Bot with Basic Python
 
ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...
ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...
ASFWS 2012 - Contourner les conditions d’utilisation et l’API du service Twit...
 
Protect Your Payloads: Modern Keying Techniques
Protect Your Payloads: Modern Keying TechniquesProtect Your Payloads: Modern Keying Techniques
Protect Your Payloads: Modern Keying Techniques
 

More from Matthew Russell

Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMatthew Russell
 
Privacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social WebPrivacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social WebMatthew Russell
 
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)Matthew Russell
 
Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMatthew Russell
 
Why Twitter Is All the Rage: A Data Miner's Perspective
Why Twitter Is All the Rage: A Data Miner's PerspectiveWhy Twitter Is All the Rage: A Data Miner's Perspective
Why Twitter Is All the Rage: A Data Miner's PerspectiveMatthew Russell
 
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Matthew Russell
 
Mining Social Web APIs with IPython Notebook (Strata 2013)
Mining Social Web APIs with IPython Notebook (Strata 2013)Mining Social Web APIs with IPython Notebook (Strata 2013)
Mining Social Web APIs with IPython Notebook (Strata 2013)Matthew Russell
 
Mining Social Web Data Like a Pro: Four Steps to Success
Mining Social Web Data Like a Pro: Four Steps to SuccessMining Social Web Data Like a Pro: Four Steps to Success
Mining Social Web Data Like a Pro: Four Steps to SuccessMatthew Russell
 

More from Matthew Russell (8)

Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started Guide
 
Privacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social WebPrivacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social Web
 
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)
 
Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started Guide
 
Why Twitter Is All the Rage: A Data Miner's Perspective
Why Twitter Is All the Rage: A Data Miner's PerspectiveWhy Twitter Is All the Rage: A Data Miner's Perspective
Why Twitter Is All the Rage: A Data Miner's Perspective
 
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
 
Mining Social Web APIs with IPython Notebook (Strata 2013)
Mining Social Web APIs with IPython Notebook (Strata 2013)Mining Social Web APIs with IPython Notebook (Strata 2013)
Mining Social Web APIs with IPython Notebook (Strata 2013)
 
Mining Social Web Data Like a Pro: Four Steps to Success
Mining Social Web Data Like a Pro: Four Steps to SuccessMining Social Web Data Like a Pro: Four Steps to Success
Mining Social Web Data Like a Pro: Four Steps to Success
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf

Unleashing Twitter Data for Fun and Insight

  • 1. Unleashing Twitter Data for fun and insight Matthew A. Russell http://linkedin.com/in/ptwobrussell @ptwobrussell Agile Data Solutions Social Web Mining the
  • 3. Mining the Social Web Chapters 1-5 Introduction: Trends, Tweets, and Twitterers Microformats: Semantic Markup and Common Sense Collide Mailboxes: Oldies but Goodies Friends, Followers, and Setwise Operations Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet
  • 4. Mining the Social Web Chapters 6-10 LinkedIn: Clustering Your Professional Network For Fun (and Profit?) Google Buzz: TF-IDF, Cosine Similarity, and Collocations Blogs et al: Natural Language Processing (and Beyond) Facebook: The All-In-One Wonder The Semantic Web: A Cocktail Discussion
  • 5. Overview • Trends, Tweets, and Retweet Visualizations • Friends, Followers, and Setwise Operations • The Tweet, the Whole Tweet, and Nothing but the Tweet
  • 6. Insight Matters • What is @user's potential influence? • What are @user's passions right now? • Who are @user's most trusted friends?
  • 7. Part 1: Tweets, Trends, and Retweet Visualizations Agile Data Solutions Social Web Mining the
  • 8. A point to ponder: Twitter : Data :: JavaScript : Programming Languages (???)
  • 9. Getting Ready To Code Agile Data Solutions Social Web Mining the
  • 10. Python Installation • Mac users already have it • Linux users probably have it • Windows users should grab ActivePython
  • 11. easy_install • Installs packages from PyPI • Get it: • http://pypi.python.org/pypi/setuptools • Ships with ActivePython • It really is easy: easy_install twitter easy_install nltk easy_install networkx
  • 12. Git It? • http://github.com/ptwobrussell/Mining-the-Social-Web • git clone git://github.com/ptwobrussell/Mining-the-Social-Web.git • introduction__*.py • friends_followers__*.py • the_tweet__*.py
  • 13. Getting Data Agile Data Solutions Social Web Mining the
  • 14. Twitter Data Sources • Twitter API Resources • GNIP • Infochimps • Library of Congress
  • 15. Trending Topics >>> import twitter # Remember to "easy_install twitter" >>> twitter_search = twitter.Twitter(domain="search.twitter.com") >>> trends = twitter_search.trends() >>> [ trend['name'] for trend in trends['trends'] ] [u'#ZodiacFacts', u'#nowplaying', u'#ItsOverWhen', u'#Christoferdrew', u'Justin Bieber', u'#WhatwouldItBeLike', u'#Sagittarius', u'SNL', u'#SurveySays', u'#iDoit2']
  • 16. Search Results >>> search_results = [] >>> for page in range(1,6): ... search_results.append(twitter_search.search(q="SNL",rpp=100, page=page))
  • 17. Search Results (continued) >>> import json >>> print json.dumps(search_results, sort_keys=True, indent=1) [ { "completed_in": 0.088122000000000006, "max_id": 11966285265, "next_page": "?page=2&max_id=11966285265&rpp=100&q=SNL", "page": 1, "query": "SNL", "refresh_url": "?since_id=11966285265&q=SNL", ...more...
  • 18. Search Results (continued) "results": [ { "created_at": "Sun, 11 Apr 2010 01:34:52 +0000", "from_user": "bieber_luv2", "from_user_id": 106998169, "geo": null, "id": 11966285265, "iso_language_code": "en", "metadata": { "result_type": "recent" }, ...more...
  • 19. Search Results (continued) "profile_image_url": "http://a1.twimg.com/profile_images/80...", "source": "&lt;a href=&quot;http://twitter.com/&quo...", "text": "im nt gonna go to sleep happy unless i see ...", "to_user_id": null } ... output truncated - 99 more tweets ... ], "results_per_page": 100, "since_id": 0 }, ... output truncated - 4 more pages ... ]
  • 20. Lexical Diversity • Ratio of unique terms to total terms • A measure of "stickiness"? • A measure of "group think"? • A crude indicator of retweets to originally authored tweets?
  • 21. Distilling Tweet Text >>> # search_results is already defined >>> tweets = [ r['text'] ... for result in search_results ... for r in result['results'] ] >>> words = [] >>> for t in tweets: ... words += [ w for w in t.split() ] ...
  • 22. Analyzing Data Agile Data Solutions Social Web Mining the
  • 23. Lexical Diversity >>> len(words) 7238 >>> # unique words >>> len(set(words)) 1636 >>> # lexical diversity >>> 1.0*len(set(words))/len(words) 0.22602928985907708 >>> # average number of words per tweet >>> 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets) 14.476000000000001
  • 24. Size Frequency Matters • Counting: always the first step • Simple but effective • NLTK saves us a little trouble
  • 25. Frequency Analysis >>> import nltk >>> freq_dist = nltk.FreqDist(words) >>> freq_dist.keys()[:50] #50 most frequent tokens [u'snl', u'on', u'rt', u'is', u'to', u'i', u'watch', u'justin', u'@justinbieber', u'be', u'the', u'tonight', u'gonna', u'at', u'in', u'bieber', u'and', u'you', u'watching', u'tina', u'for', u'a', u'wait', u'fey', u'of', u'@justinbieber:', u'if', u'with', u'so', u"can't", u'who', u'great', u'it', u'going', u'im', u':)', u'snl...', u'2nite...', u'are', u'cant', u'dress', u'rehearsal', u'see', u'that', u'what', u'but', u'tonight!', u':d', u'2', u'will']
  • 27. Tweet and RT were sitting on a fence. Tweet fell off. Who was left?
  • 28. RTs: past, present, & future • Retweet: Tweeting a tweet that's already been tweeted • RT or via followed by @mention • Example: RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!? • Relatively new APIs were rolled out last year for retweeting sans conventions
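Note: a retweet made through those newer APIs does not necessarily carry "RT @user" in its text at all; the original tweet shows up under a retweeted_status field on the tweet object instead. A minimal sketch of checking for that first (tweet here is any tweet dict returned by the API):

def get_rt_origin(tweet):
    # Native retweets carry the original tweet under 'retweeted_status',
    # so prefer that over parsing "RT @user" / "via @user" out of the text.
    if 'retweeted_status' in tweet:
        return tweet['retweeted_status']['user']['screen_name']
    return None  # fall back to the regex approach on the next slide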
  • 29. Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski
  • 30. Parsing Retweets >>> example_tweets = ["Visualize Twitter search results w/ this simple script http://bit.ly/cBu0l4 - Gist instructions http://bit.ly/9SZ2kb (via @SocialWebMining @ptwobrussell)"] >>> import re >>> rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", ... re.IGNORECASE) >>> rt_origins = [] >>> for t in example_tweets: ... try: ... rt_origins += [mention.strip() ... for mention in rt_patterns.findall(t)[0][1].split()] ... except IndexError, e: ... pass >>> [rto.strip("@") for rto in rt_origins]
  • 31. Visualizing Data Agile Data Solutions Social Web Mining the
  • 32. Graph Construction >>> import networkx as nx >>> g = nx.DiGraph() >>> g.add_edge("@SocialWebMining", "@ptwobrussell", ... {"tweet_id" : 4815162342},)
  • 33. Writing out DOT OUT_FILE = "out_file.dot" try: nx.drawing.write_dot(g, OUT_FILE) except ImportError, e: dot = ['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id']) for n1, n2 in g.edges()] f = open(OUT_FILE, 'w') f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),)) f.close()
  • 34. Example DOT Language strict digraph { "@ericastolte" -> "bonitasworld" [tweet_id=11965974697]; "@mpcoelho" ->"Lil_Amaral" [tweet_id=11965954427]; "@BieberBelle123" -> "BELIEBE4EVER" [tweet_id=11966261062]; "@BieberBelle123" -> "sabrina9451" [tweet_id=11966197327]; }
  • 35. DOT to Image • Download Graphviz: http://www.graphviz.org/ •$ dot -Tpng out_file.dot > graph.png • Windows users might prefer GVEdit
  • 37. But you want more sexy?
  • 38. Protovis: Extreme Closeup 38 Mining the Social Web
  • 39. It Doesn't Have To Be a Graph (chart: Graph Connectedness)
  • 40. Part 2: Friends, Followers, and Setwise Operations Agile Data Solutions Social Web Mining the
  • 41. Insight Matters • What is my potential influence? • Who are the most popular people in my network? • Who are my mutual friends? • What common friends/followers do I have with @user? • Who is not following me back? • What can I learn from analyzing my friendship cliques?
  • 42. Getting Data Agile Data Solutions Social Web Mining the
  • 43. OAuth (1.0a) import twitter from twitter.oauth_dance import oauth_dance # Get these from http://dev.twitter.com/apps/new consumer_key, consumer_secret = 'key', 'secret' (oauth_token, oauth_token_secret) = oauth_dance('MiningTheSocialWeb', consumer_key, consumer_secret) auth=twitter.oauth.OAuth(oauth_token, oauth_token_secret, consumer_key, consumer_secret) t = twitter.Twitter(domain='api.twitter.com', auth=auth)
  • 44. Getting Friendship Data friend_ids = t.friends.ids(screen_name='timoreilly', cursor=-1) follower_ids = t.followers.ids(screen_name='timoreilly', cursor=-1) # store the data somewhere...
  • 45. Perspective: Fetching all of Lady Gaga's ~7M followers would take ~4 hours
  • 46. But there's always a catch...
  • 47. Rate Limits • 350 requests/hr for authenticated requests • 150 requests/hr for anonymous requests • Coping mechanisms: • Caching & Archiving Data • Streaming API • HTTP 400 codes • See http://dev.twitter.com/pages/rate-limiting
  • 48. The Beloved Fail Whale • Twitter is sometimes "overcapacity" • HTTP 503 Error • Handle it just as any other HTTP error • RESTfulness has its advantages
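The next slide leans on a makeTwitterRequest helper that isn't shown in full here. A minimal sketch of what such a wrapper might look like, assuming the twitter package's TwitterHTTPError exposes the HTTP status as e.e.code (the sleep periods and doubling strategy are arbitrary choices):

import time
import twitter

def makeTwitterRequest(t, twitterFunction, **kwArgs):
    # t is the twitter.Twitter instance (unused here, kept for signature parity)
    wait_period = 2  # secs, doubled after each failed attempt
    while True:
        try:
            return twitterFunction(**kwArgs)
        except twitter.api.TwitterHTTPError, e:
            if e.e.code in (400, 420):    # rate limited
                print 'Rate limited. Sleeping %i secs' % wait_period
            elif e.e.code in (502, 503):  # fail whale / overcapacity
                print 'Twitter overcapacity. Sleeping %i secs' % wait_period
            else:
                raise
            time.sleep(wait_period)
            wait_period *= 2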
  • 49. Abstraction Helps friend_ids = [] wait_period = 2 # secs cursor = -1 while cursor != 0: response = makeTwitterRequest(t, # twitter.Twitter instance t.friends.ids, screen_name=screen_name, cursor=cursor) friend_ids += response['ids'] cursor = response['next_cursor'] # break out of loop early if you don't need all ids
  • 50. Abstracting Abstractions screen_name = 'timoreilly' # This is what you ultimately want... friend_ids = getFriends(screen_name) follower_ids = getFollowers(screen_name)
  • 51. Storing Data Agile Data Solutions Social Web Mining the
  • 52. Flat Files? ./ screen_name1/ friend_ids.json follower_ids.json user_info.json screen_name2/ ... ...
  • 53. Pickles? import cPickle o = { 'friend_ids' : friend_ids, 'follower_ids' : follower_ids, 'user_info' : user_info } f = open('screen_name1.pickle', 'wb') cPickle.dump(o, f) f.close()
  • 54. A relational database? import sqlite3 as sqlite conn = sqlite.connect('data.db') c = conn.cursor() c.execute('''create table friends...''') c.execute('''insert into friends... ''') # Lots of fun...sigh...
  • 55. Redis (A Data Structures Server) import redis r = redis.Redis() [ r.sadd("timoreilly$friend_ids", i) for i in friend_ids ] r.smembers("timoreilly$friend_ids") # returns a set Project page: http://redis.io Windows binary: http://code.google.com/p/servicestack/wiki/RedisWindowsDownload
  • 56. Redis Set Operations • Key/value store...on typed values! • Common set operations • smembers, scard • sinter, sdiff, sunion • sadd, srem, etc. • See http://code.google.com/p/redis/wiki/CommandReference • Don't forget to $ easy_install redis
  • 57. Analyzing Data Agile Data Solutions Social Web Mining the
  • 58. Setwise Operations • Union • Intersection • Difference • Complement
  • 59. Venn Diagrams [Venn diagram of Friends vs. Followers: the regions Friends - Followers, Followers - Friends, their overlap (mutual friends), and the union Friends U Followers]
  • 60. Count Your Blessings # A utility function def getRedisIdByScreenName(screen_name, key_name): return 'screen_name$' + screen_name + '$' + key_name # Number of friends n_friends = r.scard(getRedisIdByScreenName(screen_name, 'friend_ids')) # Number of followers n_followers = r.scard(getRedisIdByScreenName(screen_name, 'follower_ids'))
  • 61. Asymmetric Relationships # Friends who aren't following back friends_diff_followers = r.sdiffstore('temp', [ getRedisIdByScreenName(screen_name, 'friend_ids'), getRedisIdByScreenName(screen_name, 'follower_ids') ]) # ... compute interesting things ... r.delete('temp')
  • 62. Asymmetric Relationships # Followers who aren't friended followers_diff_friends = r.sdiffstore('temp', [ getRedisIdByScreenName(screen_name, 'follower_ids'), getRedisIdByScreenName(screen_name, 'friend_ids') ]) # ... compute interesting things ... r.delete('temp')
  • 63. Symmetric Relationships mutual_friends = r.sinterstore('temp', [ getRedisIdByScreenName(screen_name, 'follower_ids'), getRedisIdByScreenName(screen_name, 'friend_ids') ]) # ... compute interesting things ... r.delete('temp')
  • 64. Sample Output timoreilly is following 663 timoreilly is being followed by 1,423,704 131 of 663 are not following timoreilly back 1,423,172 of 1,423,704 are not being followed back by timoreilly timoreilly has 532 mutual friends
  • 65. Who Isn't Following Back? user_ids = [ ... ] # Resolve these to user info objects while len(user_ids) > 0: user_ids_str = ','.join([ str(i) for i in user_ids[:100] ]) user_ids = user_ids[100:] response = t.users.lookup(user_id=user_ids_str) if type(response) is dict: response = [response] r.mset(dict([(getRedisIdByUserId(resp['id'], 'info.json'), json.dumps(resp)) for resp in response])) r.mset(dict([(getRedisIdByScreenName(resp['screen_name'],'info.json'), json.dumps(resp)) for resp in response]))
  • 66. Friends in Common # Assume we've harvested friends/followers and it's in Redis... screen_names = ['timoreilly', 'mikeloukides'] r.sinterstore('temp$friends_in_common', [getRedisIdByScreenName(screen_name, 'friend_ids') for screen_name in screen_names]) r.sinterstore('temp$followers_in_common', [getRedisIdByScreenName(screen_name,'follower_ids') for screen_name in screen_names]) # Manipulate the sets
  • 67. Potential Influence • My followers? • My followers' followers? • My followers' followers' followers? •for n in range(1, 7): # 6 degrees? print "My " + "followers' "*n + "followers?"
  • 68. Saving a Thousand Words... [Tree diagram: nodes 1 through 15 arranged with branching factor = 2 and depth = 3]
  • 69. Same Data, Different Layout [The same fifteen-node tree rendered in a radial layout]
  • 70. Space Complexity (total nodes harvested at depths 1 through 5, by branching factor): branching factor 2: 3, 7, 15, 31, 63; branching factor 3: 4, 13, 40, 121, 364; branching factor 4: 5, 21, 85, 341, 1365; branching factor 5: 6, 31, 156, 781, 3906; branching factor 6: 7, 43, 259, 1555, 9331
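Each cell above is just the size of a complete b-ary tree, so the whole table follows from the geometric series total = (b^(d+1) - 1) / (b - 1). A quick check:

# Total nodes reachable from one seed at branching factor b, depth d:
# 1 + b + b**2 + ... + b**d
for b in range(2, 7):
    print b, [(b ** (d + 1) - 1) // (b - 1) for d in range(1, 6)]
# 2 [3, 7, 15, 31, 63]
# 3 [4, 13, 40, 121, 364]
# ... and so on, matching the table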
  • 71. Breadth-First Traversal Create an empty graph Create an empty queue to keep track of unprocessed nodes Add the starting point to the graph as the "root node" Add the root node to a queue for processing Repeat until some maximum depth is reached or the queue is empty: Remove a node from queue For each of the node's neighbors: If the neighbor hasn't already been processed: Add it to the graph Add it to the queue Add an edge to the graph connecting the node & its neighbor
  • 72. Breadth-First Harvest next_queue = [ 'timoreilly' ] # seed node d = 1 while d < depth: d += 1 queue, next_queue = next_queue, [] for screen_name in queue: follower_ids = getFollowers(screen_name=screen_name) next_queue += follower_ids getUserInfo(user_ids=next_queue)
  • 73. The Most Popular Followers freqs = {} for follower in followers: cnt = follower['followers_count'] if not freqs.has_key(cnt): freqs[cnt] = [] freqs[cnt].append({'screen_name': follower['screen_name'], 'user_id': follower['id']}) popular_followers = sorted(freqs, reverse=True)[:100]
  • 74. Average # of Followers all_freqs = [k for k in freqs for user in freqs[k]] avg = sum(all_freqs) / len(all_freqs)
  • 75. @timoreilly's Popular Followers The top 10 followers from the sample: aplusk 4,993,072 BarackObama 4,114,901 mashable 2,014,615 MarthaStewart 1,932,321 Schwarzenegger 1,705,177 zappos 1,689,289 Veronica 1,612,827 jack 1,592,004 stephenfry 1,531,813 davos 1,522,621
  • 76. Futzing the Numbers • The average number of timoreilly's followers' followers: 445 • Discarding the top 10 lowers the average to around 300 • Discarding any follower with less than 10 followers of their own increases the average to over 1,000! • Doing both brings the average to around 800
  • 77. The Right Tool For the Job: NetworkX for Networks
  • 78. Friendship Graphs g = nx.Graph() # assumes networkx imported as nx and friend data already in Redis for i in ids: #ids is timoreilly's id along with friend ids info = json.loads(r.get(getRedisIdByUserId(i, 'info.json'))) screen_name = info['screen_name'] friend_ids = list(r.smembers(getRedisIdByScreenName(screen_name, 'friend_ids'))) for friend_id in [fid for fid in friend_ids if fid in ids]: friend_info = json.loads(r.get(getRedisIdByUserId(friend_id, 'info.json'))) g.add_edge(screen_name, friend_info['screen_name']) nx.write_gpickle(g, 'timoreilly.gpickle') # see also nx.read_gpickle
  • 79. Clique Analysis • Cliques • Maximum Cliques • Maximal Cliques http://en.wikipedia.org/wiki/Clique_problem
  • 80. Calculating Cliques cliques = [c for c in nx.find_cliques(g)] num_cliques = len(cliques) clique_sizes = [len(c) for c in cliques] max_clique_size = max(clique_sizes) avg_clique_size = sum(clique_sizes) / num_cliques max_cliques = [c for c in cliques if len(c) == max_clique_size] num_max_cliques = len(max_cliques) people_in_every_max_clique = list(reduce( lambda x, y: x.intersection(y),[set(c) for c in max_cliques] ))
  • 81. Cliques for @timoreilly Num cliques: 762573 Avg clique size: 14 Max clique size: 26 Num max cliques: 6 Num people in every max clique: 20
  • 82. Visualizing Data Agile Data Solutions Social Web Mining the
  • 83. Graphs, etc • Your first instinct is naturally G = (V, E) ?
  • 84. Dorling Cartogram • A location-aware bubble chart (ish) • At least 3-dimensional • Position, color, size • Look at friends/followers by state
  • 85. Sunburst of Friends • A very compact visualization • Slice and dice friends/followers by gender, country, locale, etc.
  • 86. Part 3: The Tweet, the Whole Tweet, and Nothing but the Tweet Agile Data Solutions Social Web Mining the
  • 87. Insight Matters • Which entities frequently appear in @user's tweets? • How often does @user talk about specific friends? • Who does @user retweet most frequently? • How frequently is @user retweeted (by anyone)? • How many #hashtags are usually in @user's tweets?
  • 88. Pen : Sword :: Tweet : Machine Gun (?!?)
  • 89. Getting Data Mining the Social Web
  • 90. Let me count the APIs... • Timelines • Tweets • Favorites • Direct Messages • Streams
  • 91. Anatomy of a Tweet (1/2) { "created_at" : "Thu Jun 24 14:21:11 +0000 2010", "id" : 16932571217, "text" : "Great idea from @crowdflower: Crowdsourcing ... #opengov", "user" : { "description" : "Founder and CEO, O'Reilly Media. Watching the alpha geeks...", "id" : 2384071, "location" : "Sebastopol, CA", "name" : "Tim O'Reilly", "screen_name" : "timoreilly", "url" : "http://radar.oreilly.com" }, ...
  • 92. Anatomy of a Tweet (2/2) ... "entities" : { "hashtags" : [ {"indices" : [ 97, 103 ], "text" : "gov20"}, {"indices" : [ 104, 112 ], "text" : "opengov"} ], "urls" : [{"expanded_url" : null, "indices" : [ 76, 96 ], "url" : "http://bit.ly/9o4uoG"} ], "user_mentions" : [{"id" : 28165790, "indices" : [ 16, 28 ], "name" : "crowdFlower","screen_name" : "crowdFlower"}] } }
  • 93. Entities & Annotations • Entities • Opt-in now but will "soon" be standard • $ easy_install twitter_text • Annotations • User-defined metadata • See http://dev.twitter.com/pages/annotations_overview
  • 94. Manual Entity Extraction import twitter_text extractor = twitter_text.Extractor(tweet['text']) mentions = extractor.extract_mentioned_screen_names_with_indices() hashtags = extractor.extract_hashtags_with_indices() urls = extractor.extract_urls_with_indices() # Splice info into a tweet object
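One plausible way to do that last splice, mirroring the entities layout shown on slide 92 (a sketch only; the values below come straight from the extractor calls above):

# Attach the extracted entities to the tweet, mimicking the native field
if 'entities' not in tweet:
    tweet['entities'] = {
        'user_mentions': mentions,
        'hashtags': hashtags,
        'urls': urls,
    }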
  • 95. Storing Data Mining the Social Web
  • 96. Storing Tweets • Flat files? (Really, who does that?) • A relational database? • Redis? • CouchDB (Relax...?)
  • 97. CouchDB: Relax • Document-oriented key/value • Map/Reduce • RESTful API • Erlang
  • 98. As easy as sitting on the couch • Get it - http://www.couchone.com/get • Install it • Relax - http://localhost:5984/_utils/ • Also - $ easy_install couchdb
  • 99. Storing Timeline Data import couchdb import twitter TIMELINE_NAME = "user" # or "home" or "public" MAX_PAGES = 15 # for example DB = 'tweets-%s-timeline' % TIMELINE_NAME # any CouchDB database name works t = twitter.Twitter(domain='api.twitter.com', api_version='1') server = couchdb.Server('http://localhost:5984') db = server.create(DB) page_num = 1 while page_num <= MAX_PAGES: api_call = getattr(t.statuses, TIMELINE_NAME + '_timeline') tweets = makeTwitterRequest(t, api_call, page=page_num) db.update(tweets, all_or_nothing=True) print 'Fetched %i tweets' % len(tweets) page_num += 1
  • 100. Analyzing & Visualizing Data Mining the Social Web
  • 102. Map/Reduce Paradigm • Mapper: yields key/value pairs • Reducer: operates on keyed mapper output • Example: Computing the sum of squares • Mapper Input: (k, [2,4,6]) • Mapper Output: (k, [4,16,36]) • Reducer Input: [(k, [4,16]), (k, [36])] • Reducer Output: 56
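The sum-of-squares example in plain Python, just to make the division of labor concrete (illustrative only; CouchDB expresses its views in JavaScript or through couchdb-python rather than like this):

def mapper(key, values):
    # emit one (key, squared value) pair per input value
    return [(key, v ** 2) for v in values]

def reducer(key, mapped_values):
    # combine everything that shares the same key
    return sum(mapped_values)

pairs = mapper('k', [2, 4, 6])               # [('k', 4), ('k', 16), ('k', 36)]
print reducer('k', [v for _, v in pairs])    # 56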
  • 103. Which entities frequently appear in @mention's tweets?
  • 105. How often does @timoreilly mention specific friends?
  • 106. Filtering Tweet Entities • Let's find out how often someone talks about specific friends • We have friend info on hand • We've extracted @mentions from the tweets • Let's count friend vs non-friend mentions
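A minimal sketch of that comparison using plain sets (mentioned_screen_names and friend_screen_names are hypothetical lists assumed to have been collected from the tweet entities and the friends data, respectively):

mentions = set(mentioned_screen_names)
friends = set(friend_screen_names)

friend_mentions = mentions & friends        # mentioned people who are friends
non_friend_mentions = mentions - friends    # everyone else

print 'Number of @user entities in tweets:', len(mentions)
print 'Number who are friends:', len(friend_mentions)
print 'Number who are not friends:', len(non_friend_mentions)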
  • 107. @timoreilly's friend mentions. Number of @user entities in tweets: 20; number of @user entities in tweets who are friends: 18; number of user entities in tweets who are not friends: 2. Screen names mentioned: n2vip timoreilly ahier andrewsavikas pkedrosky gnat CodeforAmerica slashdot nytimes OReillyMedia brady dalepd carlmalamud mikeloukides pahlkadot monkchips make fredwilson jamesoreilly digiphile andrewsavikas
  • 108. Who does @timoreilly retweet most frequently?
  • 109. Counting Retweets • Map @mentions out of tweets using a regex • Reduce to sum them up • Sort the results • Display results
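Those four steps fit in a few lines once the tweets are on hand; a sketch reusing the rt_patterns regex from slide 30 (tweets is assumed to be a list of tweet text, and a defaultdict stands in for a real reducer):

import re
from collections import defaultdict

rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

retweet_counts = defaultdict(int)
for t in tweets:
    for match in rt_patterns.findall(t):
        for origin in match[1].split():
            retweet_counts[origin.strip('@')] += 1

# Sort and display the most frequently retweeted people
for screen_name, count in sorted(retweet_counts.items(),
                                 key=lambda x: x[1], reverse=True)[:10]:
    print screen_name, count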
  • 111. How frequently is @timoreilly retweeted?
  • 112. Retweet Counts • An API resource /statuses/retweet_count exists (and is now functional) • Example: http://twitter.com/statuses/show/29016139807.json • retweet_count • retweeted
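A sketch of reading those fields with the same twitter.Twitter instance used earlier (assuming the status id shown on the slide is still retrievable; the field names are exactly the two listed above):

# Fetch a single status and inspect its retweet metadata
status = t.statuses.show(id=29016139807)
print status['retweet_count'], status['retweeted']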
  • 113. Survey Says... @timoreilly is retweeted about 2/3 of the time
  • 114. How often does @timoreilly include #hashtags in tweets?
  • 115. Counting Hashtags • Use a mapper to emit the #hashtag entities for each tweet • Use a reducer to sum them all up • Been there, done that...
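If entities have already been spliced into each tweet (slide 94), the whole thing collapses to a counter; a sketch assuming tweets_with_entities is a list of such tweet dicts and the hashtag entity layout matches slide 92:

from collections import Counter  # Python 2.7+

hashtag_counts = Counter(h['text'].lower()
                         for tw in tweets_with_entities
                         for h in tw['entities']['hashtags'])
print hashtag_counts.most_common(10)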
  • 116. Survey Says... About 1 out of every 3 tweets by @timoreilly contains #hashtags
  • 117. But if you order within the next 5 minutes... Mining the Social Web
  • 118. Bonus Material: What do #JustinBieber and #TeaParty have in common? Mining the Social Web
  • 120. #JustinBieber co-occurrences #bieberblast http://tinyurl.com/ #music #Eclipse 343kax4 @justinbieber #somebodytolove @JustBieberFact #nowplaying http://bit.ly/aARD4t @TinselTownDirt #Justinbieber http://bit.ly/b2Kc1L #beliebers #JUSTINBIEBER #Escutando #BieberFact #Proform #justinBieber #Celebrity http://migre.me/TJwj #Restart #Dschungel @ProSieben #TT @_Yassi_ @lojadoaltivo #Telezwerge #musicmonday #JustinBieber @rheinzeitung #video #justinbieber #WTF #tickets
  • 121. #TeaParty co-occurrences @STOPOBAMA2012 #jcot @blogging_tories @TheFlaCracker #tweetcongress #cdnpoli #palin2012 #Obama #fail #AZ #topprog #nra #TopProg #palin #roft #conservative #dems @BrnEyeSuss http://tinyurl.com/386k5hh #acon @crispix49 @ResistTyranny #cspj @koopersmith #tsot #immigration @Kriskxx @ALIPAC #politics #Kagan #majority #hhrs @Liliaep #NoAmnesty #TeaParty #nvsen #patriottweets #vote2010 @First_Patriots @Drudge_Report #libertarian #patriot #military #obama #pjtv #palin12 #ucot @andilinks #rnc #iamthemob @RonPaulNews #TCOT #GOP #ampats http://tinyurl.com/24h36zq #tpp #cnn #spwbt #dnc #jews @welshman007 #twisters #GOPDeficit #FF #sgp #wethepeople #liberty #ocra #asamom #glennbeck #gop @thenewdeal #news #tlot #AFIRE #oilspill #p2 #Dems #rs #tcot @JIDF #Teaparty #teaparty
  • 123. Hashtag Analysis • TeaParty: ~ 5 hashtags per tweet. • Example: “Rarely is the questioned asked: Is our children learning?” - G.W. Bush #p2 #topprog #tcot #tlot #teaparty #GOP #FF • JustinBieber: ~ 2 hashtags per tweet • Example: #justinbieber is so coool
  • 124. Common #hashtags #lol #dancing #jesus #music #worldcup #glennbeck #teaparty @addthis #AZ #nowplaying #milk #news #ff #WTF #guns #fail #WorldCup #toomanypeople #bp #oilspill #News #catholic
  • 128. Juxtaposing Friendships • Harvest search results for #JustinBieber and #TeaParty • Get friend ids for each @mention with /friends/ids • Resolve screen names with /users/lookup • Populate a NetworkX graph • Analyze it • Visualize with Graphviz
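A compressed sketch of those steps, assuming the getFriends helper from Part 2 and a hypothetical resolveUserInfo wrapper around /users/lookup, and ignoring rate limiting for brevity (hashtag_search_results is a hypothetical list of harvested search results):

import networkx as nx
import twitter_text

g = nx.Graph()
for tweet in hashtag_search_results:        # e.g. harvested for '#TeaParty'
    extractor = twitter_text.Extractor(tweet['text'])
    for m in extractor.extract_mentioned_screen_names_with_indices():
        screen_name = m['screen_name']
        friend_ids = getFriends(screen_name)          # /friends/ids, as in Part 2
        for friend in resolveUserInfo(friend_ids):    # /users/lookup wrapper (hypothetical)
            g.add_edge(screen_name, friend['screen_name'])

nx.drawing.write_dot(g, 'hashtag_friendships.dot')    # then render with Graphviz, as on slide 35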
  • 130. Two Kinds of Hairballs... #JustinBieber #TeaParty
  • 131. The world twitterverse is your oyster
  • 132. • Twitter: @SocialWebMining • GitHub: http://bit.ly/socialwebmining • Facebook: http://facebook.com/MiningTheSocialWeb Mining the Social Web