Model-based Overlapping Seed ExpanSion
(MOSES)
Aaron McDaid and Neil Hurley. This research was supported by
Science Foundation Ireland (SFI) Grant No. 08/SRC/I1407.
Clique: Graph & Network Analysis Cluster
School of Computer Science & Informatics
University College Dublin, Ireland
Overview
Community finding
The MOSES model
The MOSES algorithm
Evaluation
Scalability
Other/future work
August 7, 2010 2
Communities
Facebook
Traud et al. Community Structure In Online Collegiate Social
Networks
M. Salter-Townshend and T.B. Murphy. Variational Bayesian
Inference for the Latent Position Cluster Model
Marlow et al. Maintained relationships on Facebook
Communities
Some nodes assigned to multiple communities.
Most edges assigned to just one community.
Multiple researchers have found that Facebook members belong to 6
or 7 communities.
Communities
A partition will break some of the communities in that simple
example.
Graclus breaks synthetic communities with low levels of
overlap. (A. Lancichinetti and S. Fortunato, Benchmarks for
testing community detection algorithms on directed and
weighted graphs with overlapping communities.)
Graclus breaks communities found by MOSES in Facebook
networks. (Traud et al., Community Structure in Online
Collegiate Social Networks.)
Modularity has known problems, but we need to go further
and move on from partitioning.
Facebook
Traud et al’s five university networks.
Average of 7 communities per node.
Community finding
A general-purpose community finding algorithm must allow:
Each node to be assigned to any number of communities.
Pervasive overlap. Ahn et al. Link communities reveal
multiscale complexity in networks. (Nature).
The intersection (number of shared nodes) between a pair of
communities can vary. It can be small, even when the number
of communities-per-node is high.
MOSES
MOSES deals only with undirected, unweighted networks.
No attributes/weights associated with nodes or edges.
The MOSES model
The model assumes that:
Every pair of nodes has a chance of having an edge.
Edges are independent for each pair of nodes, given the communities,
but the probability is higher for pairs that share one or more communities.
(This is an overlapping stochastic block model, OSBM: Latouche et al.,
Annals of Applied Statistics,
http://www.imstat.org/aoas/next_issue.html.)
MOSES model
Ignore the observed edges for now. Just consider the
nodes and a (proposed) set of communities.
MOSES model
These communities create
probabilities for the edges.
P(v1 ∼ v2) = pout where the two vertices do NOT share a community.
P(v1 ∼ v2) = 1 − (1 − pout)(1 − pin) where the two vertices do share 1 community.
MOSES model
These communities create
probabilities for the edges.
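Write \nsim for "not joined by an edge", with q_{out} = 1 − p_{out} and q_{in} = 1 − p_{in}.
P(v_1 \nsim v_2) = q_{out} where the two vertices do NOT share a community.
P(v_1 \nsim v_2) = q_{out} q_{in} where the two vertices do share 1 community.
P(v_1 \nsim v_2) = q_{out} q_{in}^{s(v_1, v_2)} in general,
where s(v_1, v_2) is the number of communities shared by v_1 and v_2.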
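The rule above can be sketched in a few lines. This is an illustration rather than the authors' code: the function name `edge_probability` is hypothetical, it takes q_out = 1 − p_out and q_in = 1 − p_in, and the parameter values are those used in the synthetic evaluation.

```python
def edge_probability(shared, p_in, p_out):
    """Probability that two nodes sharing `shared` communities are joined.

    Assumes the non-edge probability is q_out * q_in ** shared,
    with q_out = 1 - p_out and q_in = 1 - p_in.
    """
    if shared < 0:
        raise ValueError("shared must be non-negative")
    q_out = 1.0 - p_out
    q_in = 1.0 - p_in
    return 1.0 - q_out * q_in ** shared


# The edge probability rises with the number of shared communities.
for s in range(4):
    print(s, round(edge_probability(s, p_in=0.4, p_out=0.005), 4))
```

With no shared community the function reduces to p_out, and with one shared community it reduces to 1 − (1 − p_out)(1 − p_in).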
MOSES model
We now have a model that, for a given set of communities,
assigns probabilities for edges.
P(g|z, pin, pout)
g is the observed graph of nodes and edges. z is the proposed
set of communities.
How do we match that with the observed edges to get a good
estimate of the set of communities?
Naive approach: find (z, pin, pout) that maximizes
P(g|z, pin, pout).
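To make P(g|z, pin, pout) concrete, here is a minimal sketch of the log-likelihood for a small graph. It assumes the per-pair probabilities of the model above and loops over every pair of nodes, so it is only suitable for toy examples. The names `shared_count` and `log_likelihood`, and the toy graph, are hypothetical.

```python
import math
from itertools import combinations


def shared_count(u, v, communities):
    """Number of communities that contain both u and v."""
    return sum(1 for c in communities if u in c and v in c)


def log_likelihood(nodes, edges, communities, p_in, p_out):
    """log P(g | z, p_in, p_out); requires 0 < p_in < 1 and 0 < p_out < 1."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for u, v in combinations(nodes, 2):
        s = shared_count(u, v, communities)
        p_no_edge = (1.0 - p_out) * (1.0 - p_in) ** s
        if frozenset((u, v)) in edge_set:
            total += math.log(1.0 - p_no_edge)
        else:
            total += math.log(p_no_edge)
    return total


# Toy graph: two triangles sharing node 2, plus an isolated node 5.
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
two_triangles = [{0, 1, 2}, {2, 3, 4}]
one_blob = [{0, 1, 2, 3, 4}]
print(log_likelihood(nodes, edges, two_triangles, 0.9, 0.05))
print(log_likelihood(nodes, edges, one_blob, 0.9, 0.05))
```

On this toy graph the two overlapping triangles score higher than a single five-node community, because the single community has to explain four missing edges between its members.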
MOSES model
P(g|z, pin, pout) is maximized when pin = 1, pout = 0, and
when z is defined as exactly one community around each edge:
every observed edge and every observed non-edge then has probability 1.
i.e. we don’t want to maximize P(g|z, pin, pout).
MOSES model
Instead, we consider the posterior P(z, pin, pout|g).
MOSES model
Apply Bayes’ Theorem:
P(z, pin, pout|g) ∝ P(g|z, pin, pout) P(z) P(pin, pout)
P(z) \sim k! \prod_{1 \le i \le k} \frac{1}{N+1} \cdot \frac{1}{\binom{N}{n_i}}
where k is the number of communities, n_i is the number
of nodes in community i, and N is the number of nodes in the network.
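A minimal sketch of the log of this prior follows. It reads the formula as k! times, for each community, 1/(N + 1) times 1/C(N, n_i), and it assumes N is the total number of nodes. The function name `log_prior` is hypothetical.

```python
import math


def log_prior(community_sizes, n_nodes):
    """log of k! * prod_i [1 / (N + 1)] * [1 / C(N, n_i)].

    community_sizes: list of n_i, one entry per community.
    n_nodes: N, assumed here to be the number of nodes in the network.
    """
    k = len(community_sizes)
    total = math.lgamma(k + 1)  # log(k!)
    for n_i in community_sizes:
        total -= math.log(n_nodes + 1)             # community size chosen uniformly
        total -= math.log(math.comb(n_nodes, n_i)) # members chosen uniformly
    return total


# Each extra community carries a cost, which the likelihood must outweigh.
print(log_prior([], 2000))
print(log_prior([20], 2000))
print(log_prior([20, 20], 2000))
```

Working in log space avoids overflow from the binomial coefficients, which become very large for networks of a few thousand nodes.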
MOSES model
We can correctly integrate out the number of communities, k,
and search across the resulting varying-dimensional space.
No need for model selection, e.g. BIC.
MOSES Algorithm
For the MOSES algorithm, we chose to look at the joint
distribution over (z, pin, pout) and aim to maximize it.
The algorithm is a heuristic approximate algorithm, and we do
not claim that it finds the maximum.
MOSES Algorithm
Choose an edge at random to form a seed, and expand.
Accept/reject those expanded seeds that contribute positively
to the objective.
Update pin, pout based on the graph and the current set of
communities.
Delete communities that don’t make a positive contribution to
the objective.
Final fine-tuning that moves nodes one at a time.
It is neither a Markov chain nor an EM algorithm, so we can make
none of the guarantees those methods offer.
The algorithm reaches a local maximum, and may
well have strong biases.
Evaluation
Synthetic benchmarks
Networks created randomly by software.
Ground truth communities are built in to these networks.
Check if the algorithms can discover the correct communities
when fed the network.
To measure the similarity between the found communities and
the ground truth communities, overlapping NMI is used.
(Lancichinetti et al. Detecting the overlapping and
hierarchical community structure in complex networks)
Evaluation
2000 nodes
Define hundreds of communities.
Each community contains 20 nodes chosen at random from
the 2000 nodes.
Some nodes may be assigned to many communities. Some
may not be assigned to a community.
pin = 0.4. About 40% of the pairs of nodes that share a
community are then joined.
pout = 0.005. Finally, a small amount of background noise is
added.
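The benchmark described above can be sketched as follows. The slide says only "hundreds" of communities, so the default `n_communities=200` is an assumption, as are the function name `make_benchmark` and the random seed. A pair that shares several communities gets one chance at an edge per shared community, which matches the model's edge probabilities.

```python
import random
from itertools import combinations


def make_benchmark(n_nodes=2000, n_communities=200, size=20,
                   p_in=0.4, p_out=0.005, seed=0):
    """Return (communities, edges) for a random overlapping benchmark."""
    rng = random.Random(seed)
    nodes = range(n_nodes)
    # Each community is `size` nodes chosen at random from all nodes.
    communities = [set(rng.sample(nodes, size)) for _ in range(n_communities)]
    edges = set()
    # Join each pair inside a community with probability p_in.
    for c in communities:
        for u, v in combinations(sorted(c), 2):
            if rng.random() < p_in:
                edges.add((u, v))
    # Background noise: join any pair with probability p_out.
    for u, v in combinations(nodes, 2):
        if rng.random() < p_out:
            edges.add((u, v))
    return communities, edges


communities, edges = make_benchmark()
memberships = sum(len(c) for c in communities)
print("average communities per node:", memberships / 2000)
print("number of edges:", len(edges))
```

Varying `n_communities` while keeping the other parameters fixed changes the average overlap, which is the quantity on the horizontal axis of the evaluation plot.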
Evaluation
[Figure: NMI against average overlap (1 to 15) for 20-node communities, pin = 0.4, pout = 0.005. Methods compared: MOSES, LFM (default), LFM (last Collection), GCE, Louvain method, copra, 5-clique percolation, 4-clique percolation (dashed), Iterative Scan (dashed).]
Evaluation, LFR benchmarks
[Figure: NMI against communities per node (1 to 10), degree = 15, 15 ≤ c ≤ 60. Methods compared: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, SCP-3, Louvain method, copra, SCP-4.]
Evaluation, LFR benchmarks
[Figure: NMI against communities per node (1 to 10), degree ∼ 15, maxdegree = 45, 15 ≤ c ≤ 60. Methods compared: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, Louvain method, copra, SCP-4.]
Facebook
[Figure: density plot of node degree.]
[Figure: density plot of communities per person.]
[Figure: density plot of community size for Oklahoma, Princeton, UNC, Georgetown and Caltech.]
[Figure: communities per node against degree, shaded by counts.]
Facebook
Table: Summary of Traud et al’s five university Facebook datasets, and
of MOSES’s output.

                      Caltech  Princeton  Georgetown      UNC  Oklahoma
Edges                   16656     293320      425638   766800    892528
Nodes                     769       6596        9414    18163     17425
Average Degree           43.3       88.9        90.4     84.4     102.4
Communities found          62        832        1284     2725      3073
Average Overlap          3.29       6.28        6.67     6.96      7.46
MOSES runtime (s)          41        553         839     1585      2233
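As a quick consistency check on the table, the average degree of an undirected network is twice the number of edges divided by the number of nodes. The sketch below recomputes it from the Edges and Nodes rows; the dictionary name `datasets` is just for illustration.

```python
# (edges, nodes, reported average degree), copied from the table.
datasets = {
    "Caltech": (16656, 769, 43.3),
    "Princeton": (293320, 6596, 88.9),
    "Georgetown": (425638, 9414, 90.4),
    "UNC": (766800, 18163, 84.4),
    "Oklahoma": (892528, 17425, 102.4),
}

for name, (n_edges, n_nodes, reported) in datasets.items():
    # Each undirected edge contributes to the degree of two nodes.
    average_degree = 2 * n_edges / n_nodes
    print(f"{name}: computed {average_degree:.1f}, reported {reported}")
```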
Scalability
[Figure: running time in seconds against communities per node (1 to 10), degree ∼ 15, maxdegree = 45, 15 ≤ c ≤ 60. Methods compared: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, blondel, copra, SCP-4.]
Scalability
In general, community finding means overlapping community
finding (in my interpretation).
Partitioning breaks communities.
So, partitioning is scalable, but partitioning doesn’t help with
community finding.
Challenge: a very scalable algorithm that can credibly claim to
be a community-finding algorithm.
Other/future research
Markov Chain Monte Carlo
Working with Prof. Brendan Murphy on an MCMC method.
Very different algorithm, which allows us to investigate the
model directly.
The MOSES algorithm may have many biases we’ll never fully
grasp.
Different model (still an OSBM) where each community has
its own internal-connection probability.
MOSES breaks down on synthetic data if the communities are
not equally dense (pin).
Draw from this distribution: P(z, pout, p1, p2, p3, ...|g)
Multiple MCMC chains, where chains propose splits/merges to
each other.
(Modern) statisticians are innovative about scalability, e.g.
Hybrid Monte Carlo.
Take home messages
Community finding should be about discovering structure, not
forcing the structure. Overlapping, hierarchy, et cetera.
MOSES is a proof-of-concept: we show that quality results,
overlapping communities, and scalability are not incompatible.
Very-scalable community finding algorithms don’t exist. This
is an interesting challenge.
Acknowledgments
This research was supported by Science Foundation Ireland (SFI)
Grant No. 08/SRC/I1407.
http://clique.ucd.ie/software
http://www.aaronmcdaid.com
aaronmcdaid@gmail.com , neil.hurley@ucd.ie
August 7, 2010 35

More Related Content

Viewers also liked

presentation 1
presentation 1 presentation 1
presentation 1 stoliros
 
Presentation1
Presentation1Presentation1
Presentation1stoliros
 
Textual analysis mark scheme
Textual analysis mark schemeTextual analysis mark scheme
Textual analysis mark schemestoliros
 
Uk presentation
Uk presentationUk presentation
Uk presentationmojahid123
 
[STP]"10 corso como"
[STP]"10 corso como"[STP]"10 corso como"
[STP]"10 corso como"kdg1020
 
Matiro event rubicon keynote
Matiro event   rubicon keynoteMatiro event   rubicon keynote
Matiro event rubicon keynoteMatiro
 
YAPC::Asia 2012 CPANに恩返ししよう
YAPC::Asia 2012 CPANに恩返ししようYAPC::Asia 2012 CPANに恩返ししよう
YAPC::Asia 2012 CPANに恩返ししようazuma satoshi
 
Textual analysis action adventure
Textual analysis action adventureTextual analysis action adventure
Textual analysis action adventurestoliros
 
LU SZF SP Jauno biedru seminārs 2010
LU SZF SP Jauno biedru seminārs  2010LU SZF SP Jauno biedru seminārs  2010
LU SZF SP Jauno biedru seminārs 2010Peteris Jurcenko
 
What can communication do for me
What can communication do for meWhat can communication do for me
What can communication do for meamandaemery
 
Instalacion de sistemas operativos
Instalacion de sistemas operativosInstalacion de sistemas operativos
Instalacion de sistemas operativosJavier Santos
 

Viewers also liked (19)

presentation 1
presentation 1 presentation 1
presentation 1
 
Presentation1
Presentation1Presentation1
Presentation1
 
Textual analysis mark scheme
Textual analysis mark schemeTextual analysis mark scheme
Textual analysis mark scheme
 
Uk presentation
Uk presentationUk presentation
Uk presentation
 
draw aurreratua
draw aurreratuadraw aurreratua
draw aurreratua
 
Developing the organziation
Developing the organziationDeveloping the organziation
Developing the organziation
 
[STP]"10 corso como"
[STP]"10 corso como"[STP]"10 corso como"
[STP]"10 corso como"
 
Matiro event rubicon keynote
Matiro event   rubicon keynoteMatiro event   rubicon keynote
Matiro event rubicon keynote
 
sabrina
sabrinasabrina
sabrina
 
YAPC::Asia 2012 CPANに恩返ししよう
YAPC::Asia 2012 CPANに恩返ししようYAPC::Asia 2012 CPANに恩返ししよう
YAPC::Asia 2012 CPANに恩返ししよう
 
Textual analysis action adventure
Textual analysis action adventureTextual analysis action adventure
Textual analysis action adventure
 
LU SZF SP Jauno biedru seminārs 2010
LU SZF SP Jauno biedru seminārs  2010LU SZF SP Jauno biedru seminārs  2010
LU SZF SP Jauno biedru seminārs 2010
 
Picnik
PicnikPicnik
Picnik
 
What can communication do for me
What can communication do for meWhat can communication do for me
What can communication do for me
 
Instalacion de sistemas operativos
Instalacion de sistemas operativosInstalacion de sistemas operativos
Instalacion de sistemas operativos
 
Ka taisa ertas lapas
Ka taisa ertas lapasKa taisa ertas lapas
Ka taisa ertas lapas
 
Sitio movil
Sitio movilSitio movil
Sitio movil
 
Sare sozialak
Sare sozialakSare sozialak
Sare sozialak
 
Iimk news vol.5
Iimk news vol.5Iimk news vol.5
Iimk news vol.5
 

Similar to Model-Based Overlapping Seed Expansion (MOSES

Higher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoIHigher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoIAustin Benson
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...Daniel Katz
 
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach IJECEIAES
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors ijbbjournal
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors ijbbjournal
 
Community structure in social and biological structures
Community structure in social and biological structuresCommunity structure in social and biological structures
Community structure in social and biological structuresMaxim Boiko Savenko
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Tin180 VietNam
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK csandit
 
Higher-order clustering coefficients
Higher-order clustering coefficientsHigher-order clustering coefficients
Higher-order clustering coefficientsAustin Benson
 
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGraph-TA
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Michael Mathioudakis
 
Greedy Incremental approach for unfolding of communities in massive networks
Greedy Incremental approach for unfolding of communities in massive networksGreedy Incremental approach for unfolding of communities in massive networks
Greedy Incremental approach for unfolding of communities in massive networksIJCSIS Research Publications
 
Community detection in social networks[1]
Community detection in social networks[1]Community detection in social networks[1]
Community detection in social networks[1]sdnumaygmailcom
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSIJDKP
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksIJDKP
 
Mining and Supporting Community Structures in Sensor Network Research
Mining and Supporting Community Structures in Sensor Network ResearchMining and Supporting Community Structures in Sensor Network Research
Mining and Supporting Community Structures in Sensor Network ResearchMarko Rodriguez
 
Netwoks icml09
Netwoks icml09Netwoks icml09
Netwoks icml09zhangzhao
 
Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)Behrang Mehrparvar
 

Similar to Model-Based Overlapping Seed Expansion (MOSES (20)

Higher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoIHigher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoI
 
Node similarity
Node similarityNode similarity
Node similarity
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
 
03 Communities in Networks (2017)
03 Communities in Networks (2017)03 Communities in Networks (2017)
03 Communities in Networks (2017)
 
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors
 
Community structure in social and biological structures
Community structure in social and biological structuresCommunity structure in social and biological structures
Community structure in social and biological structures
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
 
Higher-order clustering coefficients
Higher-order clustering coefficientsHigher-order clustering coefficients
Higher-order clustering coefficients
 
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologies
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020
 
Greedy Incremental approach for unfolding of communities in massive networks
Greedy Incremental approach for unfolding of communities in massive networksGreedy Incremental approach for unfolding of communities in massive networks
Greedy Incremental approach for unfolding of communities in massive networks
 
Community detection in social networks[1]
Community detection in social networks[1]Community detection in social networks[1]
Community detection in social networks[1]
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large Networks
 
Mining and Supporting Community Structures in Sensor Network Research
Mining and Supporting Community Structures in Sensor Network ResearchMining and Supporting Community Structures in Sensor Network Research
Mining and Supporting Community Structures in Sensor Network Research
 
Netwoks icml09
Netwoks icml09Netwoks icml09
Netwoks icml09
 
Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Model-Based Overlapping Seed Expansion (MOSES)

  • 1. Model-based Overlapping Seed ExpanSion (MOSES) Aaron McDaid and Neil Hurley. This research was supported by Science Foundation Ireland (SFI) Grant No. 08/SRC/I1407. Clique: Graph & Network Analysis Cluster School of Computer Science & Informatics University College Dublin, Ireland
  • 2. Overview: Community finding; The MOSES model; The MOSES algorithm; Evaluation; Scalability; Other/future work. August 7, 2010
  • 4. Facebook: Traud et al., Community Structure in Online Collegiate Social Networks; M. Salter-Townshend and T.B. Murphy, Variational Bayesian Inference for the Latent Position Cluster Model; Marlow et al., Maintained relationships on Facebook.
  • 5. Communities: Some nodes are assigned to multiple communities. Most edges are assigned to just one community. Multiple researchers have found Facebook members belonging to 6 or 7 communities.
  • 6. Communities: A partition will break some of the communities in that simple example. Graclus breaks synthetic communities with low levels of overlap (A. Lancichinetti and S. Fortunato, Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities). Graclus breaks communities found by MOSES in Facebook networks (Traud et al., Community Structure in Online Collegiate Social Networks). Modularity has known problems, but we need to go further and move on from partitioning.
  • 7. Facebook: Traud et al.'s five university networks. Average of 7 communities per node.
  • 8. Community finding: A general-purpose community finding algorithm must allow: each node to be assigned to any number of communities; pervasive overlap (Ahn et al., Link communities reveal multiscale complexity in networks, Nature); the intersection (number of shared nodes) between a pair of communities to vary. The intersection can be small, even when the number of communities per node is high.
  • 9. MOSES: MOSES deals only with undirected, unweighted networks. No attributes/weights associated with nodes or edges.
  • 10. The MOSES model: A model in which every pair of nodes has a chance of having an edge. Edges are independent for each pair of nodes, given the communities, but the probability is higher for pairs that share one or more communities. (This is an overlapping stochastic block model, OSBM: Latouche et al., Annals of Applied Statistics, http://www.imstat.org/aoas/next_issue.html.)
  • 11. MOSES model: Ignore the observed edges for now. Just consider the nodes and a (proposed) set of communities.
  • 12. MOSES model: These communities create probabilities for the edges. $P(v_1 \sim v_2) = p_{out}$ where the two vertices do NOT share a community. $P(v_1 \sim v_2) = 1 - (1 - p_{out})(1 - p_{in})$ where the two vertices share 1 community.
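  • 13. MOSES model: These communities create probabilities for the edges. Writing $q_{out} = 1 - p_{out}$ and $q_{in} = 1 - p_{in}$ for the probabilities that an edge is absent: $P(v_1 \not\sim v_2) = q_{out}$ where the two vertices do NOT share a community. $P(v_1 \not\sim v_2) = q_{out} q_{in}$ where the two vertices share 1 community. $P(v_1 \not\sim v_2) = q_{out} q_{in}^{s(v_1, v_2)}$ in general, where $s(v_1, v_2)$ is the number of communities shared by $v_1$ and $v_2$.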
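
As a rough illustration of the edge probabilities on slides 12 and 13, the Python sketch below computes the chance that two nodes are joined, given how many communities they share. The function name `edge_probability` and its argument names are invented for this example, and the values 0.4 and 0.005 are borrowed from the evaluation slides.

```python
def edge_probability(shared, p_in, p_out):
    """P(v1 ~ v2) for a pair of nodes sharing `shared` communities.

    Hypothetical helper: the pair stays unconnected only if the background
    chance (p_out) and every shared community (p_in each) all fail to
    produce an edge.
    """
    q_out = 1.0 - p_out
    q_in = 1.0 - p_in
    return 1.0 - q_out * q_in ** shared


# Probability of an edge for pairs sharing 0, 1, 2 and 3 communities.
for s in range(4):
    print(s, round(edge_probability(s, p_in=0.4, p_out=0.005), 4))
```

With `shared = 0` the function reduces to `p_out`, and each additional shared community multiplies the chance of a missing edge by `1 - p_in`.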
  • 15. MOSES model: We now have a model that, for a given set of communities, assigns probabilities for edges: P(g | z, p_in, p_out), where g is the observed graph of nodes and edges and z is the proposed set of communities. How do we match that with the observed edges to get a good estimate of the set of communities? Naive approach: find (z, p_in, p_out) that maximizes P(g | z, p_in, p_out).
  • 16. MOSES model: P(g | z, p_in, p_out) is maximized when p_in = 1, p_out = 0, and z is defined as exactly one community around each edge, because every observed edge and every observed non-edge then has probability 1. That solution tells us nothing, i.e. we don't want to maximize P(g | z, p_in, p_out).
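
To make the problem with the naive approach concrete, the following Python sketch evaluates log P(g | z, p_in, p_out) on a toy graph. It places one two-node community around each edge with p_in = 1 and p_out = 0, the setting in which every edge and every non-edge gets probability 1, and compares that with a single plausible community. The graph, the helper names `safe_log` and `log_likelihood`, and the parameter values 0.9 and 0.1 are all assumptions made for illustration.

```python
import math
from itertools import combinations


def safe_log(p):
    """log(p), with log(0) treated as minus infinity instead of an error."""
    return math.log(p) if p > 0.0 else float("-inf")


def log_likelihood(nodes, edges, communities, p_in, p_out):
    """log P(g | z, p_in, p_out), summed over every pair of nodes."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for u, v in combinations(nodes, 2):
        shared = sum(1 for c in communities if u in c and v in c)
        p_no_edge = (1.0 - p_out) * (1.0 - p_in) ** shared
        if frozenset((u, v)) in edge_set:
            total += safe_log(1.0 - p_no_edge)
        else:
            total += safe_log(p_no_edge)
    return total


# Toy graph (an assumption for illustration): a triangle plus a pendant node.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# One two-node community per edge: the degenerate "perfect" fit.
one_per_edge = [set(e) for e in edges]
degenerate = log_likelihood(nodes, edges, one_per_edge, p_in=1.0, p_out=0.0)

# A single community around the triangle, with non-extreme probabilities.
triangle_only = [{0, 1, 2}]
plausible = log_likelihood(nodes, edges, triangle_only, p_in=0.9, p_out=0.1)

print(degenerate, plausible)
```

The degenerate assignment reaches the largest possible log-likelihood for any graph, which is why the likelihood alone cannot be the objective.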
  • 17. MOSES model: Instead, consider the posterior P(z, p_in, p_out | g).
  • 19. MOSES model: Apply Bayes' Theorem: $P(z, p_{in}, p_{out} \mid g) \propto P(g \mid z, p_{in}, p_{out}) \, P(z) \, P(p_{in}, p_{out})$, with $P(z) \sim k! \prod_{1 \le i \le k} \frac{1}{N+1} \frac{1}{\binom{N}{n_i}}$, where k is the number of communities, and n_i is the number of nodes in community i.
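
A minimal sketch of the prior on slide 19, reading the formula as k! multiplied by a factor 1/(N+1) · 1/C(N, n_i) for each community. It assumes that N is the total number of nodes in the network, which the slide does not state, and it works in log space and ignores the normalizing constant. The function name `log_prior` is hypothetical.

```python
import math


def log_prior(community_sizes, n_total):
    """Unnormalized log P(z) for communities with the given sizes.

    Assumption: n_total plays the role of N on the slide.
    """
    k = len(community_sizes)
    log_p = math.lgamma(k + 1)  # log(k!)
    for n_i in community_sizes:
        log_p -= math.log(n_total + 1)
        log_p -= math.log(math.comb(n_total, n_i))
    return log_p


# Every extra community, and every community far from size 0 or N, costs prior mass.
print(log_prior([2], 10), log_prior([5], 10), log_prior([2, 2], 10))
```

Under this reading, the prior is what penalizes the one-community-per-edge solution: each additional community lowers P(z), so a community has to earn its place through the likelihood.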
  • 20. MOSES model: We can correctly integrate out the number of communities, k, and search across the resulting varying-dimensional space. No need for model selection, e.g. BIC.
  • 21. MOSES Algorithm: For the MOSES algorithm, we chose to look at the joint distribution over (z, p_in, p_out) and aim to maximize it. The algorithm is a heuristic, approximate algorithm, and we do not claim that it finds the maximum.
  • 23. MOSES Algorithm: (1) Choose an edge at random to form a seed, and expand. (2) Accept those expanded seeds that contribute positively to the objective, and reject the rest. (3) Update p_in, p_out based on the graph and the current set of communities. (4) Delete communities that don't make a positive contribution to the objective. (5) Final fine-tuning that moves nodes one at a time. It is neither a Markov chain nor an EM algorithm, so we can make none of the guarantees those methods offer. The algorithm will reach a local maximum, and may well have strong biases.
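
The seed-expansion step can be sketched roughly as below. This is not the authors' implementation: it assumes a simple greedy rule (repeatedly add the neighbouring node that most improves the objective, and stop when no node improves it), it assumes the objective combines the edge model with a 1/C(N, n) prior term per community, and it leaves out the acceptance test, the p_in/p_out update, community deletion and fine-tuning. All function names, the toy graph and the parameter values are hypothetical.

```python
import math


def pair_log_prob(connected, shared, p_in, p_out):
    """Log-probability of one node pair; requires 0 < p_in, p_out < 1."""
    q = (1.0 - p_out) * (1.0 - p_in) ** shared  # P(no edge)
    return math.log(1.0 - q) if connected else math.log(q)


def gain_of_adding(node, community, others, adj, n_total, p_in, p_out):
    """Change in the log-objective if `node` joins `community`."""
    gain = 0.0
    for member in community:
        # Communities the pair already shares, apart from the one being grown.
        shared = sum(1 for c in others if node in c and member in c)
        connected = member in adj.get(node, set())
        gain += pair_log_prob(connected, shared + 1, p_in, p_out)
        gain -= pair_log_prob(connected, shared, p_in, p_out)
    n = len(community)
    # Assumed prior term: 1/C(N, n) becomes 1/C(N, n + 1).
    gain += math.log(math.comb(n_total, n)) - math.log(math.comb(n_total, n + 1))
    return gain


def expand_seed(seed_edge, others, adj, n_total, p_in, p_out):
    """Greedily grow a community from a single edge."""
    community = set(seed_edge)
    while True:
        frontier = set()
        for v in community:
            frontier |= adj.get(v, set())
        frontier -= community
        if not frontier:
            break
        gains = {
            w: gain_of_adding(w, community, others, adj, n_total, p_in, p_out)
            for w in sorted(frontier)
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0.0:
            break
        community.add(best)
    return community


# Toy graph: a 5-clique on nodes 0..4 plus a path 4-5-6-7, in a 30-node network.
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
edges += [(4, 5), (5, 6), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

found = expand_seed((0, 1), others=[], adj=adj, n_total=30, p_in=0.5, p_out=0.05)
print(sorted(found))
```

In this toy case the expansion absorbs the rest of the clique and stops at the path, because adding node 5 would create four unconnected pairs inside the community.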
  • 24. Evaluation: Synthetic benchmarks. Networks are created randomly by software. Ground-truth communities are built into these networks. Check whether the algorithms can discover the correct communities when fed the network. To measure the similarity between the found communities and the ground-truth communities, overlapping NMI is used (Lancichinetti et al., Detecting the overlapping and hierarchical community structure in complex networks).
  • 25. Evaluation: 2000 nodes. Define hundreds of communities. Each community contains 20 nodes chosen at random from the 2000 nodes. Some nodes may be assigned to many communities. Some may not be assigned to any community. p_in = 0.4: about 40% of the pairs of nodes that share a community are then joined. p_out = 0.005: finally, a small amount of background noise is added.
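
The benchmark described on slide 25 could be generated along the following lines. This is a sketch under assumptions: the slide says only "hundreds of communities", so the default of 300 is a guess, and the function name `make_benchmark`, the `seed` argument and the edge representation are invented for this example.

```python
import random
from itertools import combinations


def make_benchmark(n_nodes=2000, n_communities=300, size=20,
                   p_in=0.4, p_out=0.005, seed=0):
    """Return (communities, edges) for a random overlapping benchmark."""
    rng = random.Random(seed)

    # Each community is `size` nodes drawn at random, so nodes may end up
    # in several communities or in none.
    communities = [set(rng.sample(range(n_nodes), size))
                   for _ in range(n_communities)]

    edges = set()
    # Join pairs inside each community with probability p_in. A pair that
    # shares several communities gets one chance per shared community.
    for community in communities:
        for u, v in combinations(sorted(community), 2):
            if rng.random() < p_in:
                edges.add((u, v))

    # Background noise: every pair gets an extra chance p_out of an edge.
    for u, v in combinations(range(n_nodes), 2):
        if rng.random() < p_out:
            edges.add((u, v))

    return communities, edges


communities, edges = make_benchmark()
print(len(communities), len(edges))
```

Edges are stored as `(u, v)` tuples with `u < v`, so each undirected edge appears once.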
  • 26. Evaluation: [Figure: overlapping NMI (y-axis, 0 to 1) against average overlap (x-axis, 1 to 15) for 20-node communities, p_in = 0.4, p_out = 0.005. Methods compared: MOSES, LFM (default), LFM (last collection), GCE, Louvain method, COPRA, 5-clique percolation, 4-clique percolation (dashed), Iterative Scan (dashed).]
  • 27. Evaluation, LFR benchmarks: [Figure: NMI (y-axis, 0 to 1) against communities per node (x-axis, 1 to 10) for degree = 15, 15 ≤ c ≤ 60. Methods compared: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, SCP-3, Louvain method, COPRA, SCP-4.]
  • 28. Evaluation, LFR benchmarks: [Figure: NMI (y-axis, 0 to 1) against communities per node (x-axis, 1 to 10) for degree ∼ 15, maxdegree = 45, 15 ≤ c ≤ 60. Methods compared: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, Louvain method, COPRA, SCP-4.]
  • 29. Facebook: [Figure: density (y-axis) against degree (x-axis, 1 to 500).]
  • 30. Facebook: [Figure: density (y-axis) against communities per person (x-axis, 1 to 100).]
  • 31. Facebook: [Figure: density (y-axis) against size of community (x-axis, 1 to 500), one curve each for Oklahoma, Princeton, UNC, Georgetown and Caltech.]
  • 32. Facebook: [Figure: communities per node (y-axis, 0 to 70) against degree (x-axis, 0 to 1200), shaded by counts (1 to 1142).]
  • 33. Facebook: Table: Summary of Traud et al.'s five university Facebook datasets, and of MOSES's output.

                            Caltech  Princeton  Georgetown     UNC  Oklahoma
      Edges                   16656     293320      425638  766800    892528
      Nodes                     769       6596        9414   18163     17425
      Average degree           43.3       88.9        90.4    84.4     102.4
      Communities found          62        832        1284    2725      3073
      Average overlap          3.29       6.28        6.67    6.96      7.46
      MOSES runtime (s)          41        553         839    1585      2233
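
As a quick consistency check on the table, the average degree of an undirected network is twice the number of edges divided by the number of nodes. The short Python sketch below recomputes it from the edge and node counts on the slide; the dictionary layout and the function name `average_degree` are choices made for this example.

```python
# (edges, nodes) pairs copied from the table on slide 33.
datasets = {
    "Caltech": (16656, 769),
    "Princeton": (293320, 6596),
    "Georgetown": (425638, 9414),
    "UNC": (766800, 18163),
    "Oklahoma": (892528, 17425),
}


def average_degree(n_edges, n_nodes):
    """Each undirected edge adds one to the degree of two nodes."""
    return 2.0 * n_edges / n_nodes


for name, (n_edges, n_nodes) in datasets.items():
    print(name, round(average_degree(n_edges, n_nodes), 1))
```

The recomputed values agree with the "Average degree" row to one decimal place.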
  • 34. Scalability: [Figure: time in seconds (y-axis, 1e−02 to 1e+02) against communities per node (x-axis, 1 to 10) for degree ∼ 15, maxdegree = 45, 15 ≤ c ≤ 60. Methods compared: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, blondel, COPRA, SCP-4.]
  • 38. Scalability: In general, community finding means overlapping community finding (in my interpretation). Partitioning breaks communities. So partitioning is scalable, but it doesn't help with community finding. Challenge: a very scalable algorithm that can credibly claim to be a community-finding algorithm.
  • 43. Other/future research: Markov Chain Monte Carlo. Working with Prof. Brendan Murphy on an MCMC method. It is a very different algorithm, which allows us to investigate the model directly. The MOSES algorithm may have many biases we'll never fully grasp. A different model (still an OSBM) where each community has its own internal-connection probability. MOSES breaks down on synthetic data if the communities are not equally dense (p_in). Draw from this distribution: P(z, p_out, p_1, p_2, p_3, ... | g). Multiple MCMC chains, where chains propose splits/merges to each other. (Modern) statisticians are innovative about scalability, e.g. Hybrid Monte Carlo.
  • 46. Take home messages: Community finding should be about discovering structure, not forcing the structure: overlapping, hierarchy, et cetera. MOSES is a proof of concept: we show that quality results, overlapping communities, and scalability are not incompatible. Very scalable community-finding algorithms don't exist. This is an interesting challenge.
  • 47. Acknowledgments: This research was supported by Science Foundation Ireland (SFI) Grant No. 08/SRC/I1407. http://clique.ucd.ie/software http://www.aaronmcdaid.com aaronmcdaid@gmail.com, neil.hurley@ucd.ie