ECWAY TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
OUR OFFICES @ CHENNAI / TRICHY / KARUR / ERODE / MADURAI / SALEM / COIMBATORE
CELL: +91 98949 17187, +91 875487 2111 / 3111 / 4111 / 5111 / 6111
VISIT: www.ecwayprojects.com MAIL TO: ecwaytechnologies@gmail.com

CLUSTERING LARGE PROBABILISTIC GRAPHS

ABSTRACT:

We study the problem of clustering probabilistic graphs. Similar to the problem of clustering
standard graphs, probabilistic graph clustering has numerous applications, such as finding
complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of
users in affiliation networks.

We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We
establish a connection between our objective function and correlation clustering to propose
practical approximation algorithms for our problem. A benefit of our approach is that our
objective function is parameter-free. Therefore, the number of clusters is part of the output.
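
To make the objective concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes edges exist independently and that the graph is given as a dictionary of pairwise edge probabilities; missing pairs have probability 0. Under those assumptions, the expected edit distance to a clustering is the sum of (1 - p) over pairs placed in the same cluster plus the sum of p over pairs placed in different clusters. The function names `expected_edit_distance` and `pivot_clustering` are hypothetical. The pivot routine is a standard randomized heuristic from the correlation clustering literature, used here as a stand-in for the approximation algorithms the abstract mentions.

```python
import random
from itertools import combinations


def expected_edit_distance(nodes, prob, clusters):
    """Expected number of edge insertions/deletions needed to turn a
    probabilistic graph into the cluster graph described by `clusters`.

    Assumes independent edges; `prob` maps frozenset({u, v}) -> probability,
    and pairs missing from `prob` have probability 0.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        p = prob.get(frozenset((u, v)), 0.0)
        # Same cluster: the edge must exist, so we pay when it is absent.
        # Different clusters: the edge must not exist, so we pay when present.
        cost += (1.0 - p) if label[u] == label[v] else p
    return cost


def pivot_clustering(nodes, prob, rng=None):
    """Pivot-style correlation clustering (illustrative stand-in).

    Visits nodes in random order; each still-unassigned node becomes a pivot
    and absorbs every unassigned node whose edge to it has probability > 0.5.
    The number of clusters is not an input; it falls out of the procedure.
    """
    rng = rng or random.Random(0)
    order = list(nodes)
    rng.shuffle(order)
    unassigned = set(order)
    clusters = []
    for pivot in order:
        if pivot not in unassigned:
            continue
        cluster = {pivot} | {
            v for v in unassigned
            if v != pivot and prob.get(frozenset((pivot, v)), 0.0) > 0.5
        }
        unassigned -= cluster
        clusters.append(cluster)
    return clusters


# Toy input (made-up probabilities, for illustration only).
nodes = ["a", "b", "c", "d"]
prob = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.7,
    frozenset(("c", "d")): 0.2,
}
clusters = pivot_clustering(nodes, prob)
print(clusters, expected_edit_distance(nodes, prob, clusters))
```

On this toy input every pivot order groups a, b, and c together and leaves d alone, for an expected edit distance of about 0.8, compared with about 2.6 when every node is its own cluster. No cluster count is passed in, which is the sense in which the objective is parameter-free.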

We develop methods for testing the statistical significance of the output clustering and study the
case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data,
we show that our methods discover the correct number of clusters and identify established
protein relationships. Finally, we show the practicality of our techniques using a large social
network of Yahoo! users consisting of one billion edges.
