Twitter Community Extraction by Markov Clustering

Twitter Community Extraction
(Beware of geeks bearing .gifs?)
https://github.com/kingsBSD
http://bigsocialdata.net/
@kingsBSD
Our Data Ourselves Project:
Giles Greenway, Tobias Blanke, Jennifer Pybus & Mark Coté.
Departments of Digital Humanities & Culture, Media & Creative
Industries, King's College London
“Our aims are to increase our understanding of the nature
and role of the data that young people produce when they
use platforms and applications on their smartphones.”
Social network community extraction is part of this.

Markov Clustering (MCL)
Assumptions:
● There are clusters of Twitter users with densely connected networks of friend/follower relationships.
● If you take a random walk around the network, you are likely to stay within the cluster you started in.
http://www.micans.org/mcl/
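
The random-walk assumption can be checked numerically on a toy graph. The following is a minimal sketch, not the project's code: the six-node graph (two triangles joined by a single edge), the function name `stay_probability` and its parameters are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
ADJACENCY = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)


def stay_probability(adjacency, start, cluster, steps):
    """Probability that a random walk from `start` is inside `cluster` after `steps` steps."""
    A = np.asarray(adjacency, dtype=float)
    # Column j holds the probabilities of moving from node j to each neighbour.
    M = A / A.sum(axis=0)
    # Start with all probability mass on the starting node.
    p = np.zeros(A.shape[0])
    p[start] = 1.0
    for _ in range(steps):
        p = M @ p
    return float(p[list(cluster)].sum())


# Chance of still being in the first triangle two steps after starting at node 0.
print(stay_probability(ADJACENCY, start=0, cluster=[0, 1, 2], steps=2))
```

On this toy graph most of the probability mass stays inside the starting triangle, which is the behaviour MCL relies on.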

MCL: A Trivial Example
1: Build an adjacency matrix for the graph.
2: Normalize the columns to produce transition probabilities.
3: Square the matrix to get the probabilities after two steps (expansion).
4: Square the matrix element-wise and re-normalize the columns (inflation).
5: Rinse and repeat until convergence.
At convergence the matrix entries will be 0 or 1. Interpret rows as: “If I'm in this row node, which column nodes are credible startpoints?”
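
The five steps can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the project: the function names, the toy graph, the convergence tolerance and the threshold for reading off clusters are hypothetical, and adding self-loops before normalizing is a common MCL convention that the slides do not mention.

```python
import numpy as np

# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
ADJACENCY = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)


def normalize_columns(M):
    """Scale each column to sum to 1, giving transition probabilities."""
    return M / M.sum(axis=0, keepdims=True)


def mcl(adjacency, inflation=2, max_iter=100, tol=1e-9):
    """Run the expansion/inflation loop until the matrix stops changing."""
    A = np.asarray(adjacency, dtype=float)
    # Assumption: add self-loops, a common MCL convention not stated in the slides.
    A = A + np.eye(A.shape[0])
    M = normalize_columns(A)                   # steps 1-2
    for _ in range(max_iter):
        previous = M
        M = M @ M                              # step 3: two-step probabilities
        M = normalize_columns(M ** inflation)  # step 4: element-wise power, re-normalize
        if np.allclose(M, previous, atol=tol):  # step 5: stop at convergence
            break
    return M


def clusters_from_matrix(M, threshold=1e-6):
    """Read each non-empty row as one cluster: its non-zero columns are the members."""
    clusters = set()
    for row in M:
        members = np.flatnonzero(row > threshold)
        if members.size:
            clusters.add(tuple(members.tolist()))
    return sorted(clusters)


result = mcl(ADJACENCY)
print(clusters_from_matrix(result))
```

Raising `inflation` above 2 generally yields smaller, tighter clusters, while values closer to 1 yield coarser ones.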

Does it work?
Gephi's “OpenOrd” layout is meant to emphasise clusters. Are nodes in the same cluster close together?
Compare with Gephi's own “modularity algorithm”, the Louvain method.
MCL was applied to the Twitter accounts of two digital culture researchers, with ~7000 once-removed friend/follower relationships.

Does it work?
Why did Gephi put these two in the same modularity class?
Researchers rated clusters for both methods.
                                                     MCL    Louvain
Cluster is identifiable and relevant.                20%    0% !
Cluster is not identifiable, but possibly relevant.  37%
Cluster is neither identifiable nor relevant.        43%

Tools
● Acquire Twitter data with Twython/Celery/Redis/RabbitMQ.
● Store Twitter data with Neo4j/Py2Neo.
● Perform MCL with NumPy.
● Export to Gephi with NetworkX.
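
As an illustration of the last step, cluster labels can be attached to nodes as an attribute and written to a GEXF file that Gephi can open. This is a hedged sketch rather than the project's export code: the account names, the `cluster` attribute name, the output file name and the helper `export_for_gephi` are all hypothetical.

```python
import os
import tempfile

import networkx as nx


def export_for_gephi(edges, cluster_of, path):
    """Build a directed follower graph, tag nodes with cluster ids, write GEXF."""
    graph = nx.DiGraph()
    graph.add_edges_from(edges)
    # Store the cluster id as a node attribute so Gephi can colour or partition by it.
    nx.set_node_attributes(graph, cluster_of, name="cluster")
    nx.write_gexf(graph, path)
    return graph


# Hypothetical data: (follower, followed) pairs and an MCL cluster id per account.
edges = [("alice", "bob"), ("bob", "alice"), ("bob", "carol"), ("dave", "carol")]
cluster_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}

demo_path = os.path.join(tempfile.gettempdir(), "twitter_clusters.gexf")
graph = export_for_gephi(edges, cluster_of, demo_path)
```

In Gephi the `cluster` attribute should then be available for colouring nodes, which makes it easy to compare against the layout.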
Conclusions
● The Louvain method works by combining smaller clusters to maximize modularity. Does the very high degree of Twitter networks harm its performance?
● MCL produces highly relevant clusterings, albeit rather slowly.
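
For reference, the modularity that the Louvain method maximizes can be computed directly for an undirected graph and a given partition. The sketch below uses the standard definition Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j); the function name and the toy graph are hypothetical, and this is neither Gephi's code nor the Louvain optimisation itself.

```python
import numpy as np

# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
ADJACENCY = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)


def modularity(adjacency, labels):
    """Modularity of a partition of an undirected graph (labels[i] is node i's cluster)."""
    A = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()                          # twice the number of edges
    same = labels[:, None] == labels[None, :]      # True where nodes share a cluster
    expected = np.outer(degrees, degrees) / two_m  # edges expected by chance
    return float(((A - expected) * same).sum() / two_m)


# Splitting the graph into its two triangles scores higher than one big cluster.
print(modularity(ADJACENCY, [0, 0, 0, 1, 1, 1]))
print(modularity(ADJACENCY, [0, 0, 0, 0, 0, 0]))
```

Scoring an MCL partition this way gives one rough, quantitative point of comparison with a Louvain result, alongside the researchers' ratings.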
