Twitter Community Extraction by Markov Clustering

We illustrate the use of the MCL algorithm to find communities of Twitter users. This is compared to the out-of-the-box algorithm used by Gephi, forming a cautionary tale about the careless use of visualizations.

Published in: Internet, Technology

  1. 1. Twitter Community Extraction (Beware of geeks bearing .gifs?) https://github.com/kingsBSD http://big-social-data.net/ @kingsBSD Our Data Ourselves Project: Giles Greenway, Tobias Blanke, Jennifer Pybus & Mark Coté. Departments of Digital Humanities & Culture, Media & Creative Industries, King's College London. “Our aims are to increase our understanding of the nature and role of the data that young people produce when they use platforms and applications on their smartphones.” Social network community extraction is part of this.
  2. 2. Markov Clustering - MCL Assumptions: ● There are clusters of Twitter users with densely connected networks of friend/follower relationships. ● If you take a random walk around the network, you are likely to stay within the cluster you started in. http://www.micans.org/mcl/
  3. 3. MCL - A Trivial Example 1: Build an adjacency matrix for the graph. 2: Normalize the columns to produce transition probabilities.
  4. 4. MCL - A Trivial Example 3: Square the matrix to get probabilities after two steps.
  5. 5. MCL - A Trivial Example 4: Square the matrix element-wise and re-normalize the columns. 5: Rinse and repeat until convergence. At convergence the matrix entries will be 0 or 1. Interpret rows as: “If I'm in this row node, which column nodes are credible start-points?”
  6. 6. Does it work? Gephi's “OpenOrd” layout is meant to emphasise clusters. Are nodes in the same cluster close together? Compare with Gephi's own “modularity algorithm”, the Louvain method. MCL was applied to two Twitter accounts of digital culture researchers with ~7000 once-removed friend-follower relationships.
  7. 7. Does it work? [Figure: two network layouts, one labelled MCL and one labelled Louvain.]
  8. 8. Does it work? Why did Gephi put these two in the same modularity class? Researchers rated clusters for both methods (columns: MCL, Louvain). Cluster is identifiable and relevant: 20%, 0% (!). Cluster is not identifiable, but possibly relevant: 37%. Cluster is neither identifiable nor relevant: 43%.
  9. 9. Tools ● Acquire Twitter data with Twython/Celery/Redis/RabbitMQ. ● Store Twitter data with Neo4J/Py2Neo. ● Perform MCL with NumPy. ● Export to Gephi with NetworkX. Conclusions ● The Louvain method works by combining smaller clusters to maximize modularity. Does the very high node degree of Twitter networks harm its performance? ● MCL produces highly relevant clusterings, albeit rather slowly.
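
The five MCL steps from the trivial example can be sketched in NumPy roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`normalize_columns`, `mcl`, `clusters_from`), the convergence tolerance, the iteration cap and the six-node example graph are all assumptions made here. Adding self-loops before normalizing is a common MCL convention that the slides do not mention.

```python
import numpy as np


def normalize_columns(m):
    """Scale every column so that it sums to 1 (transition probabilities)."""
    return m / m.sum(axis=0, keepdims=True)


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-8):
    """Run a basic Markov Clustering loop and return the converged matrix.

    inflation=2.0 matches the "element-wise square" step in the slides;
    max_iter and tol are illustrative assumptions.
    """
    m = np.asarray(adjacency, dtype=float)
    # Assumption: add self-loops, a common MCL convention not in the slides.
    m = m + np.eye(len(m))
    # Step 2: normalize columns to get transition probabilities.
    m = normalize_columns(m)
    for _ in range(max_iter):
        previous = m
        # Step 3: square the matrix (probabilities after two steps).
        m = m @ m
        # Step 4: element-wise power, then re-normalize the columns.
        m = normalize_columns(m ** inflation)
        # Step 5: stop once the matrix no longer changes.
        if np.allclose(m, previous, atol=tol):
            break
    return m


def clusters_from(m, threshold=1e-6):
    """Read clusters off the rows: each non-empty row lists one cluster."""
    found = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > threshold).tolist())
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Hypothetical example graph: two triangles (0,1,2) and (3,4,5)
# joined by a single edge between nodes 2 and 3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(clusters_from(mcl(adjacency)))
```

Dense matrix squaring costs on the order of n cubed operations per iteration, which is one plausible reason for the "rather slowly" remark in the conclusions; sparse matrices and pruning of small entries are the usual remedies, though the slides do not say whether they were used.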