We illustrate the use of the MCL algorithm to find communities of Twitter users. This is compared to the out-of-the-box algorithm used by Gephi, forming a cautionary tale about the careless use of visualizations.
1.
Twitter Community Extraction
(Beware of geeks bearing .gifs?)
https://github.com/kingsBSD
http://bigsocialdata.net/
@kingsBSD
Our Data Ourselves Project:
Giles Greenway, Tobias Blanke, Jennifer Pybus & Mark Coté.
Departments of Digital Humanities & Culture, Media & Creative
Industries, King's College London
“Our aims are to increase our understanding of the nature
and role of the data that young people produce when they
use platforms and applications on their smartphones.”
Social network community extraction is part of this.
2.
Markov Clustering (MCL)
Assumptions:
● There are clusters of
Twitter users with
densely connected
networks of
friend/follower
relationships.
● If you take a random
walk around the network,
you are likely to stay
within the cluster you
started in.
http://www.micans.org/mcl/
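The random-walk assumption can be checked on a toy graph. This is a minimal sketch, not the project's code: the graph (two triangles joined by one edge) and the helper name `stay_probability` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Column j holds the probabilities of stepping from node j to each neighbour.
M = A / A.sum(axis=0)

def stay_probability(M, start, cluster, steps):
    """Probability that a walk from `start` is inside `cluster` after `steps` steps."""
    p = np.zeros(M.shape[0])
    p[start] = 1.0
    for _ in range(steps):
        p = M @ p  # advance the walk's distribution by one step
    return p[list(cluster)].sum()

for steps in (1, 2, 3):
    print(steps, stay_probability(M, 0, {0, 1, 2}, steps))
```

For short walks the probability of still being in the starting triangle stays well above the long-run value of one half, which is the property MCL exploits.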
3.
MCL: A Trivial Example
1: Build an adjacency matrix for the graph.
2: Normalize the columns to produce
transition probabilities.
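Steps 1 and 2 might look like this in NumPy. A sketch under stated assumptions: the function name `transition_matrix` and the edge list are hypothetical, and the self-loops are a common MCL convention that the slides do not mention.

```python
import numpy as np

def transition_matrix(edges, n, self_loops=True):
    """Build a symmetric adjacency matrix and normalize each column to sum to 1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # friend/follower links treated as undirected
    if self_loops:
        A += np.eye(n)  # assumption: self-loops, as is usual for MCL
    # Broadcasting divides every entry in column j by that column's sum.
    return A / A.sum(axis=0)

# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
M = transition_matrix(edges, 6)
print(M.round(3))
```

Entry `M[i, j]` is then the probability of stepping from node `j` to node `i`.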
4.
MCL: A Trivial Example
3: Square the matrix to get probabilities
after two steps.
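"Square" in step 3 means the matrix product, not the element-wise square. A small sketch, assuming a hypothetical three-node path graph 0 - 1 - 2 whose columns are already normalized:

```python
import numpy as np

# Column j lists where a walker standing on node j can go next.
M = np.array([
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
])

# Expansion: the matrix product gives the two-step transition probabilities.
two_step = M @ M

# Column j of two_step is the distribution over nodes two steps after starting at j.
print(two_step)
```

The columns of the product still sum to 1, so no re-normalization is needed after this step.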
5.
MCL: A Trivial Example
4: Square the matrix element-wise and
re-normalize the columns.
5: Rinse and repeat until convergence.
The converged matrix entries will be 0 or 1. Interpret
each row as: “If I'm in this row node, which
column nodes are credible start points?”
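Steps 1 to 5 can be sketched end to end in NumPy. This is an illustration, not the authors' implementation: the names `mcl` and `clusters`, the toy graph, the added self-loops, the tolerance and the iteration cap are all assumptions.

```python
import numpy as np

def mcl(A, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    M = A + np.eye(A.shape[0])  # assumption: self-loops, a common MCL convention
    M = M / M.sum(axis=0)       # column-normalize to transition probabilities
    for _ in range(max_iter):
        previous = M
        M = M @ M               # expansion: two-step probabilities
        M = M ** inflation      # inflation: element-wise power
        M = M / M.sum(axis=0)   # re-normalize the columns
        if np.allclose(M, previous, atol=tol):
            break
    return M

def clusters(M, threshold=1e-6):
    """Each row with non-zero entries lists the columns (nodes) of one cluster."""
    found = []
    for row in M:
        members = set(np.flatnonzero(row > threshold).tolist())
        if members and members not in found:
            found.append(members)
    return found

# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

print(clusters(mcl(A)))
```

An inflation power of 2 corresponds to the element-wise squaring in step 4; larger powers generally give smaller, more numerous clusters.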
6.
Does it work?
Gephi's “OpenOrd” layout is meant to
emphasise clusters. Are nodes in the same
cluster close together?
Compare with Gephi's own “modularity
algorithm”, the Louvain method.
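For reference, the quantity the Louvain method tries to maximize can be computed directly. The sketch below uses the standard modularity definition for an undirected, unweighted graph; it is not the Louvain algorithm itself, Gephi's implementation may differ in detail, and the function name and toy graph are assumptions.

```python
import numpy as np

def modularity(A, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [label_i == label_j]."""
    labels = np.asarray(labels)
    degrees = A.sum(axis=0)
    two_m = degrees.sum()  # twice the number of edges
    same = labels[:, None] == labels[None, :]          # True where i, j share a cluster
    expected = np.outer(degrees, degrees) / two_m      # edges expected by chance
    return ((A - expected) * same).sum() / two_m

# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

print(modularity(A, [0, 0, 0, 1, 1, 1]))  # one cluster per triangle
print(modularity(A, [0, 0, 0, 0, 0, 0]))  # everything in a single cluster
```

Scoring an MCL clustering and a Louvain clustering of the same graph with one function like this gives a like-for-like number to set beside the visual comparison.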
MCL was applied to the Twitter accounts of two
digital culture researchers, with ~7000 once-removed
friend/follower relationships.
8.
Does it work?
Why did Gephi put these two in the same modularity class?
Researchers rated clusters for both methods.
Rating: MCL / Louvain
● Cluster is identifiable and relevant: 20% / 0% (!)
● Cluster is not identifiable, but possibly relevant: 37%
● Cluster is neither identifiable nor relevant: 43%
9.
Tools
● Acquire Twitter data with
Twython/Celery/Redis/RabbitMQ.
● Store Twitter data with Neo4j/Py2Neo.
● Perform MCL with NumPy.
● Export to Gephi with NetworkX.
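The export step might look like the sketch below, which tags each node with a cluster label and writes a GEXF file that Gephi can open. The function name, the `cluster` attribute name, the output path and the toy data are assumptions; the slides do not show the project's actual export code.

```python
import os
import tempfile

import networkx as nx

def export_clusters(edges, clusters, path):
    """Write the graph to GEXF with a per-node integer `cluster` attribute."""
    G = nx.Graph()
    G.add_edges_from(edges)
    for label, members in enumerate(clusters):
        for node in members:
            # Every clustered node is assumed to appear in at least one edge.
            G.nodes[node]["cluster"] = label
    nx.write_gexf(G, path)
    return G

# Hypothetical toy data: two triangles joined by the edge 2-3, already clustered.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
clusters = [{0, 1, 2}, {3, 4, 5}]
out_path = os.path.join(tempfile.gettempdir(), "twitter_clusters.gexf")
G = export_clusters(edges, clusters, out_path)
```

In Gephi the `cluster` attribute can then be used to colour nodes, so an MCL clustering can be viewed on the same OpenOrd layout as Gephi's own modularity classes.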
Conclusions
● The Louvain method works by combining smaller
clusters to maximize modularity. Does the very high
degree of Twitter networks harm its performance?
● MCL produces highly relevant clusterings, albeit rather
slowly.