15. Community Detection
• Communities and clusters are different
• Network data is related to graph properties
• Real world means big data
DIME 2014, FEUP, 5-12-2014 15
16. Modularity
• Compares the number of edges inside communities with the number
expected in a random network with the same degrees
• Maximizing Q is NP-hard
$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - P_{ij} \right) \delta(g_i, g_j)$$
$$P_{ij} = \frac{k_i k_j}{2m}$$
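As a rough illustration, modularity can be computed directly from an adjacency matrix: sum A_ij minus the expected value k_i k_j / 2m over all node pairs that share a community, then divide by 2m. The function name `modularity` and the list-of-lists input format are assumptions made for this sketch, and the double loop is quadratic in the number of nodes, so it only suits small examples.

```python
def modularity(adj, groups):
    """Return Q for an undirected graph.

    adj    -- symmetric adjacency matrix as a list of lists (A_ij)
    groups -- groups[i] is the community label g_i of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]   # k_i
    two_m = sum(degree)                  # 2m (every edge counted twice)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:   # delta(g_i, g_j)
                expected = degree[i] * degree[j] / two_m   # P_ij
                q += adj[i][j] - expected
    return q / two_m


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

print(modularity(adj, [0, 0, 0, 1, 1, 1]))   # one community per triangle
print(modularity(adj, [0, 0, 0, 0, 0, 0]))   # everything in one community
```

Splitting the graph into its two triangles gives Q = 5/14, while putting every node in a single community gives Q = 0.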
17. Clauset-Newman-Moore
A hierarchical agglomeration algorithm for detecting community
structure which is faster than many competing algorithms.
Its running time on a network with n vertices and m edges is
O(md log n) where d is the depth of the dendrogram describing the
community structure.
NodeXL
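The agglomeration idea can be sketched in a few lines: start with every node in its own community and repeatedly merge the connected pair of communities whose merge increases modularity the most. This is a naive version written for illustration only. It rescans all candidate pairs at each step and does not use the heaps and sparse data structures that give the published algorithm its O(md log n) running time. The function name `greedy_modularity` is hypothetical, and the sketch assumes an undirected, unweighted edge list without self-loops. The gain formula E_ab / m - D_a D_b / (2 m^2) follows from writing Q as a sum over communities, with E_ab the number of edges between communities a and b and D_a the sum of degrees in a.

```python
def greedy_modularity(edges):
    """Naive greedy agglomeration; returns a list of node sets."""
    m = len(edges)
    nodes = {v for edge in edges for v in edge}
    members = {v: {v} for v in nodes}     # community id -> nodes
    degree_sum = {v: 0 for v in nodes}    # community id -> D_c
    between = {}                          # frozenset({a, b}) -> E_ab
    for u, v in edges:
        degree_sum[u] += 1
        degree_sum[v] += 1
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + 1

    def gain(key):
        # Change in Q if the two communities in `key` are merged.
        a, b = tuple(key)
        return between[key] / m - degree_sum[a] * degree_sum[b] / (2 * m * m)

    while between:
        best = max(between, key=gain)
        if gain(best) <= 0:
            break                         # no merge improves Q
        a, b = tuple(best)
        members[a] |= members.pop(b)      # merge community b into a
        degree_sum[a] += degree_sum.pop(b)
        del between[best]
        # Re-attach b's remaining neighbours to a.
        for key in [k for k in between if b in k]:
            (other,) = key - {b}
            count = between.pop(key)
            new_key = frozenset((a, other))
            between[new_key] = between.get(new_key, 0) + count
    return list(members.values())


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = greedy_modularity(edges)
print(sorted(sorted(c) for c in communities))
```

On this toy graph the merges stop once each triangle has become one community, because joining the two triangles across the bridge would lower Q.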
18. Wakita-Tsurumi
The CNM algorithm does not scale well, and its use is practically limited
to networks of up to 500,000 nodes.
A simple heuristic that attempts to merge community structures in a
balanced manner can dramatically improve community structure
analysis.
NodeXL
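One way to read "merging in a balanced manner" is to weight the modularity gain of a candidate merge by how similar the two communities are in size, so that a large community absorbing tiny ones one at a time becomes unattractive. The sketch below assumes this reading. The names `consolidation_ratio` and `balanced_gain` are chosen for illustration, and how "size" is measured (number of members, degree, or something else) is left as an assumption.

```python
def consolidation_ratio(size_a, size_b):
    """Return 1.0 for equal sizes and a value near 0 for very unequal sizes."""
    return min(size_a / size_b, size_b / size_a)


def balanced_gain(delta_q, size_a, size_b):
    """Modularity gain of a merge, discounted when the pair is unbalanced."""
    return delta_q * consolidation_ratio(size_a, size_b)


# A slightly smaller gain between equal-sized communities is preferred
# over a larger gain between a 1-node and a 100-node community.
print(balanced_gain(0.02, 50, 50))
print(balanced_gain(0.03, 1, 100))
```

In a greedy agglomeration loop, this weighted score would replace the plain modularity gain when choosing which pair to merge next.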
19. Girvan-Newman
Many networks share the property of community structure: network
nodes are joined together in tightly knit groups, between which there
are only looser connections.
Girvan and Newman propose a method for detecting such communities,
built around the idea of using centrality indices to find community
boundaries.
NodeXL
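The centrality-based idea can be sketched as follows: compute the betweenness of every edge (how many shortest paths run through it), remove the edge with the highest score, and repeat until the graph falls apart into more components. This is a minimal pure-Python sketch for small, unweighted, undirected graphs. The function names are hypothetical, and betweenness is recomputed from scratch after every removal, which is slow on large networks.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness; each path is counted from both ends."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares back from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Return the connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:
            break                         # no edges left to remove
        u, v = tuple(max(score, key=score.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = girvan_newman_split(edges)
print(sorted(sorted(c) for c in split))
```

On this toy graph the bridge 2-3 carries every shortest path between the two triangles, so it is removed first and the two triangles become the communities.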
20. Chinese Whispers [Biemann]
• A randomized graph-clustering algorithm that runs in time linear in
the number of edges
• It can be viewed as a simulation of an agent-based social network
Gephi plugin
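A minimal sketch of the idea, assuming an unweighted, undirected edge list: every node starts in its own class, and in each round the nodes, visited in random order, adopt the class that is most frequent among their neighbours, with ties broken at random. Each round looks at every edge twice, which is where the linear dependence on the number of edges comes from. The function name, the fixed number of rounds, and the `seed` parameter are assumptions made for this illustration and are not taken from the Gephi plugin.

```python
import random


def chinese_whispers(edges, iterations=20, seed=0):
    """Return a dict mapping each node to its class label."""
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    label = {v: v for v in adj}           # every node starts in its own class
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)                # random processing order
        for v in nodes:
            votes = {}
            for w in adj[v]:
                votes[label[w]] = votes.get(label[w], 0) + 1
            top = max(votes.values())
            winners = [c for c, n in votes.items() if n == top]
            label[v] = rng.choice(winners)  # ties broken at random
    return label


# Two disjoint triangles: labels can never cross between them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
labels = chinese_whispers(edges)
print(labels)
```

Because the procedure is randomized, the labels can differ from run to run and convergence is not guaranteed, so the number of rounds is a tuning choice.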
26. Figure 5. Map of science derived from clickstream data.
Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, et al. (2009) Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3):
e4803. doi:10.1371/journal.pone.0004803
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0004803
38. Datasets
• netvizz
• I keep my collection here
https://sites.google.com/site/frestivo/networked-life/databases
• There is another list on Quora:
"Where can I find large datasets open to the public?"
47. Project approach
• Use a big data set
• Consider whether the detected communities make sense
• Compare different approaches
• Explain your findings