Follower / following networks are essentially meaningless on Twitter because of the prevalence of spam. Graphing conversation networks instead gives a better picture of a user's meaningful connections: the other users who interact with, or are interacted with by, a given user. For power users, however, these graphs can be extremely busy, making it difficult to pick out the most important conversations and connections.
One potential way to summarize the most important connections in a network is to pull out cliques, that is, completely connected sub-graphs. These cliques may represent part of a user's core network. They may also suggest new users to interact with, by generating the cliques that a user's own cliques are connected to. For example, user A might be in a clique with users B, C and D, and users B, C and D may be in a clique with a fifth user, E. This suggests that user A might well be interested in interacting with user E. Cliques may also help determine tie strength, since membership in a clique is likely to indicate a stronger tie than a single connection to a node.
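The A/B/C/D/E example can be made concrete with a small Python sketch. It is an illustration only, not code from the talk: the function name `suggest_users` and the rule "the rest of the user's clique must sit entirely inside another clique" are assumptions chosen to match the example.

```python
def suggest_users(user, cliques):
    """Suggest users who share a clique with the rest of one of `user`'s cliques.

    Assumed rule (hypothetical, for illustration): if `user` is in a clique
    with some group, and that whole group is also in a clique that does not
    contain `user`, the extra members of that second clique are suggested.
    """
    own = [c for c in cliques if user in c]
    known = set().union(*own)          # everyone the user already shares a clique with
    suggestions = set()
    for mine in own:
        others = mine - {user}
        for c in cliques:
            if user not in c and others <= c:
                suggestions |= c - known
    return suggestions


# A is in a clique with B, C, D; B, C, D are in a clique with E.
cliques = [frozenset("ABCD"), frozenset("BCDE"), frozenset("EFG")]
print(sorted(suggest_users("A", cliques)))
```

A looser rule, such as requiring only most of the group to overlap, would produce more suggestions at the cost of weaker evidence.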
2. What’s Twitter?
Micro-blogging service.
Answer the question, “What’s Happening” in 140 characters or less.
3. Is it Useful?
• Yes!
• Information Gathering
• Ambient Awareness
• Customer Relationship Management
• Conversations
4. Follower / Following Networks
• Meaningless - too much spam
• Looking for a better measure of engagement
• Spammers don’t have conversations - nobody talks to spammers.
12. Data-mine conversation networks
→ GraphML
→ Python script processes and creates a mapping
→ List of connections
→ Find Cliques (Haskell)
→ List of Cliques
→ Python script combines cliques and mapping
→ GraphML
→ Visualize (Prefuse)
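The two Python steps in this pipeline might look roughly like the sketch below. It is not the script from the talk: the function names, the choice of integer ids, and the sample graph are assumptions. It relies only on the standard GraphML layout (`node` elements with an `id`, `edge` elements with `source` and `target`).

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def graphml_to_connections(xml_text):
    """Return (mapping, connections): user id -> integer, and integer edge pairs."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for node in root.iterfind(".//g:node", NS):
        mapping.setdefault(node.get("id"), len(mapping))
    connections = [
        (mapping[edge.get("source")], mapping[edge.get("target")])
        for edge in root.iterfind(".//g:edge", NS)
    ]
    return mapping, connections


def cliques_to_names(cliques, mapping):
    """Combine cliques (as integers) with the mapping to recover user ids."""
    names = {number: user for user, number in mapping.items()}
    return [[names[v] for v in clique] for clique in cliques]


# Hypothetical sample input.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <node id="carol"/>
    <edge source="alice" target="bob"/>
    <edge source="bob" target="alice"/>
    <edge source="bob" target="carol"/>
  </graph>
</graphml>"""

mapping, connections = graphml_to_connections(SAMPLE)
print(mapping)
print(connections)
```

Numbering the users lets the clique finder work on plain integers, and the mapping is kept so that the cliques can be turned back into user names before visualization.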
19. About the Algorithm
• Written in Haskell
• Fast
• Over 1000 nodes (many more edges) in around 5 seconds
• Compact
• Clique finding consists of 16 lines of code
• Including white-space, comments, signatures and pre-processing functions, 90 lines.
20. Leveraging the Sparse Nature of the Graphs
• In a graph of over 1000 nodes, many don’t have enough connections to be part of a clique of size 3 or 4: eliminate them.
• Eliminate many connections by removing duplicates.
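Both reductions can be sketched in a few lines of Python. This is an illustration rather than the Haskell pre-processing from the talk. It assumes that a connection in either direction counts as one undirected edge, and it uses the fact that a member of a clique of size k needs at least k - 1 neighbours. The names `dedupe` and `prune` are hypothetical.

```python
def dedupe(connections):
    """Collapse repeated and reversed pairs into single undirected edges.

    Self-loops are dropped, since they cannot contribute to a clique.
    """
    return {frozenset(pair) for pair in connections if pair[0] != pair[1]}


def prune(edges, min_size):
    """One pass: drop every node with fewer than min_size - 1 neighbours."""
    degree = {}
    for edge in edges:
        for v in edge:
            degree[v] = degree.get(v, 0) + 1
    keep = {v for v, d in degree.items() if d >= min_size - 1}
    return {edge for edge in edges if edge <= keep}


connections = [(1, 2), (2, 1), (1, 3), (2, 3), (3, 4), (3, 4)]
edges = dedupe(connections)          # 6 pairs become 4 undirected edges
core_edges = prune(edges, min_size=3)
print(sorted(tuple(sorted(e)) for e in core_edges))
```

Node 4 has only one neighbour, so it cannot belong to a clique of size 3 and is removed together with its edge.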
21. Going Further?
• Possible to do this multiple times, until the graph is reduced to the central core.
• This is where the cliques tend to be.
• Perhaps due to the nature of Haskell, the benefits of this are limited.
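Repeating the elimination until nothing changes can be sketched as below. This is a Python illustration under assumptions, not the talk's Haskell code: edges are `(a, b)` tuples with no duplicates, and the function name `reduce_to_core` is hypothetical.

```python
def reduce_to_core(edges, min_size):
    """Repeatedly drop nodes with fewer than min_size - 1 neighbours.

    Removing a node can push its neighbours below the threshold, so the
    pass is repeated until the edge set stops changing.
    """
    edges = set(edges)
    while True:
        degree = {}
        for a, b in edges:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        kept = {
            (a, b) for a, b in edges
            if degree[a] >= min_size - 1 and degree[b] >= min_size - 1
        }
        if kept == edges:
            return kept
        edges = kept


# A triangle (1, 2, 3) with a small tree hanging off node 3.
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6)}
print(sorted(reduce_to_core(edges, min_size=3)))
```

In this example a single pass removes nodes 5 and 6 but leaves node 4, which only falls below the threshold on the second pass. Only the triangle survives.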
25. Number of Cliques relative to follower/following. See full gadget at http://bit.ly/4FwrHm
26. Algorithm Summary
• Set of possibilities
• Set of movements for a vertex from the set of possibilities.
• Constraint: don’t want to go somewhere we’ve already been. Prevents endless loop.
• Constraint: want to move to a higher number than we have so far. Pruning - don’t want to generate the same clique multiple times.
• Intersect the two sets and recurse with the new set of possibilities.
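One way to read this summary is the recursive sketch below. It is a Python reconstruction under assumptions, not the 16 lines of Haskell from the talk: vertices are integers, `adjacency` maps each vertex to the set of its neighbours, and every clique of at least `min_size` vertices is reported, not only the maximal ones.

```python
def find_cliques(adjacency, min_size=3):
    """Enumerate each clique of at least min_size vertices exactly once."""
    cliques = []

    def extend(clique, possibilities):
        if len(clique) >= min_size:
            cliques.append(tuple(clique))
        for v in sorted(possibilities):
            # Movements from v: only neighbours with a higher number, so we
            # never revisit a vertex and never build the same clique twice.
            higher = {w for w in adjacency[v] if w > v}
            # Intersect the two sets and recurse with the new possibilities.
            extend(clique + [v], possibilities & higher)

    extend([], set(adjacency))
    return cliques


# Vertices 1 to 4 are fully connected; vertex 5 only talks to vertex 4.
adjacency = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3, 5},
    5: {4},
}
for clique in find_cliques(adjacency):
    print(clique)
```

Because each step moves to a strictly higher vertex, every clique is built in ascending order only, which covers both constraints on the slide: the recursion always terminates and no clique is generated twice.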
27. Further Work
• Can we determine the nature of a user by the number of cliques they are in?
• Do cliques indicate stronger ties?
• Do the cliques that a user is connected to make good suggestions for interaction?
28. Further Work
• What optimizations work for Haskell?
• Can we make it faster?
• How does its speed compare to algorithms written in C?
• How does average clique size compare to Dunbar’s research?