Artem Lutov, Mourad Khayati and Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg, Switzerland
https://github.com/eXascaleInfolab/daoc
https://bit.ly/daoc-slides
Stability includes both robustness and determinism in our paper:
● Robustness means that the forming clusters should evolve gracefully
and without any surges on minor perturbations (i.e., some
changes in the links or nodes) of the input network (i.e., graph).
● Determinism represents both non-stochastic and input
order-independent (i.e., reshuffling-resistant) results.
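The input order-independence part of determinism can be probed empirically. The sketch below is only an illustration: `is_order_independent`, `connected_components` and `first_edge_only` are hypothetical helpers, not part of DAOC, and the check tries just the reversed edge list and its rotations, so a positive result is evidence rather than proof.

```python
def canonical(clusters):
    """Order-free view of a clustering: a set of frozensets."""
    return frozenset(frozenset(c) for c in clusters)


def is_order_independent(cluster_fn, edges):
    """Check that cluster_fn yields the same clusters for reordered input.

    Only the reversed list and all rotations are tried (an assumption made
    to keep the check deterministic and cheap).
    """
    reference = canonical(cluster_fn(edges))
    variants = [edges[::-1]]
    variants += [edges[i:] + edges[:i] for i in range(1, len(edges))]
    return all(canonical(cluster_fn(v)) == reference for v in variants)


def connected_components(edges):
    """Toy order-independent 'clustering': connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def first_edge_only(edges):
    """Toy order-dependent 'clustering': depends on which edge comes first."""
    return [set(edges[0])]
```

A clustering function that passes this check for many inputs is reshuffling-resistant in the sense used above; robustness to perturbations needs a separate test.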
Stable (i.e., robust and deterministic) clustering of large networks:
DAOC - Deterministic and Agglomerative Overlapping Clustering
● Mutual Maximal Gain, to ensure robustness while being capable of
identifying micro-scale clusters
● Overlap Decomposition, to identify fine-grained clusters in a
deterministic way, while capturing multiple optima even for the
algorithms whose optimization function does not support overlaps
Human perception-adapted Taxonomy construction
for large Evolving Networks by Incremental Clustering
Taxonomy requirement → clustering property:
● Stable → Robust + Deterministic
● Fully-automatic → Parameter-free
● Browsable → Hierarchical
● Large → Near-linear runtime
● Multi-viewpoint → Overlapping
● Narrow (7 ± 2 rule) → Fine-grained
Louvain
Modularity:
Modularity gain:
Mutual Maximal(⬦) Gain:
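The formulas of this slide did not survive extraction. As a hedged illustration of the idea, the sketch below assumes the standard modularity gain for merging two clusters i and j of an undirected weighted graph, ΔQ = (w_ij − d_i·d_j / (2m)) / m, where w_ij is the weight between them, d_i and d_j are their weighted degrees and m is the total edge weight. A pair is treated as a mutual maximal gain pair when its gain is the maximum over the neighbors of both endpoints. The function names and the edge-dictionary input are assumptions, and only single-node clusters are handled; this is not DAOC's implementation.

```python
def modularity_gain(w_ij, d_i, d_j, m):
    """Standard modularity gain of merging two clusters (assumed formula)."""
    return (w_ij - d_i * d_j / (2.0 * m)) / m


def mutual_maximal_pairs(edges):
    """Return node pairs whose gain is maximal for BOTH endpoints.

    edges: dict {(u, v): weight} of an undirected graph without self-loops,
    each edge listed once.
    """
    degree = {}
    for (u, v), w in edges.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    m = sum(edges.values())  # total edge weight

    gains = {}  # gain of each candidate merge
    best = {}   # best gain seen for each node
    for (u, v), w in edges.items():
        g = modularity_gain(w, degree[u], degree[v], m)
        gains[(u, v)] = g
        for node in (u, v):
            best[node] = max(best.get(node, float("-inf")), g)

    # Keep a pair only if its gain is the maximum for both of its nodes.
    return sorted(
        tuple(sorted(pair))
        for pair, g in gains.items()
        if g == best[pair[0]] and g == best[pair[1]]
    )
```

On a weighted path a-b-c-d with a weak middle link, the two strong end links are selected and the middle one is not, because its gain is maximal for neither endpoint.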
Overlap Decomposition (OD) of a node of degree d=3 into K=3 fragments,
subject to the OD constraints.
[Figure and constraint formulas not recoverable from the slide.]
The following algorithms are evaluated on synthetic (multiple instances of each category) and
real-world networks using the open-source benchmarking framework Clubmark:
[Table of the evaluated algorithms and their features not recoverable from the slide.]
* the feature is partially available; parameter tuning might be required for specific cases
◦ the feature is not supported by the original implementation of the algorithm
F1h (average value and deviation) for subsequent perturbations (link
removals) of a synthetic network. Stable algorithms (among the
non-stochastic and ensemble ones) are outlined with a bold line and are
expected to have a gracefully decreasing F1h without surges.
DAOC is a novel clustering algorithm designed for stable (both robust
and deterministic) clustering of large networks, aiming to construct
human perception-adapted taxonomies without any manual tuning.
● DAOC is 25% more accurate on average than state-of-the-art
non-stochastic clustering algorithms, while being on par with the most
accurate existing (including stochastic) clustering algorithms
● DAOC is one of the least memory-consuming and fastest
state-of-the-art clustering algorithms, which makes it applicable to large networks
Artem Lutov <artem.lutov@unifr.ch>
https://github.com/eXascaleInfolab/daoc
[Figure: two panels, "Overlapping Clusters" and "Clusters on Various Resolutions", illustrated with clusters labeled Racing cars, Blue cars, Jeeps, Cars, Bikes, and Racing & blue cars.]
Matching the clusterings (unordered sets of elements), even with the
elements having a single membership, may yield multiple best matches:
=> Strict cluster labeling is not always possible and is undesirable.
Many dedicated accuracy metrics have been designed, but few of them are
applicable to elements with multiple membership.
[Figure: produced clusters versus ground-truth clusters; the "Yellow" cluster can be matched to either the "Dark" or the "Cyan" one.]
● Applicable for the elements having multiple membership
● Applicable for Large Datasets: ideally O(N), runtime up to O(N^2)
Families with the accuracy metrics satisfying our requirements:
● Pair Counting Based Metrics: Omega Index [Collins,1988]
● Cluster Matching Based Metrics: Average F1 score [Yang,2013]
● Information Theory Based Metrics: Generalized NMI [Esquivel,2012]
Problem: interpretability of the accuracy values and selection of the metric.
Omega Index (𝛀) counts the pairs of elements that occur together in
exactly the same number of produced clusters as ground-truth categories,
adjusted by the expected number of such pairs.
Here C’ is the ground-truth clustering (categories) and C is the
produced clustering.
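The slide's formula is missing, so the following is a minimal sketch based on the commonly cited definition of the Omega Index, Ω = (Obs − Exp) / (1 − Exp): Obs is the fraction of node pairs that co-occur in the same number of clusters in both clusterings, and Exp is the agreement expected by chance from the two co-occurrence histograms. The function names and the convention of returning 1.0 when Exp equals 1 are assumptions. The loop over all pairs makes the O(N^2) cost explicit.

```python
from collections import Counter
from itertools import combinations


def pair_counts(clusters):
    """Map each unordered node pair to the number of clusters sharing it."""
    counts = Counter()
    for cluster in clusters:
        for pair in combinations(sorted(cluster), 2):
            counts[pair] += 1
    return counts


def omega_index(produced, truth, nodes):
    """Omega Index of two (possibly overlapping) clusterings over `nodes`."""
    nodes = sorted(nodes)
    n_pairs = len(nodes) * (len(nodes) - 1) // 2
    cp, ct = pair_counts(produced), pair_counts(truth)

    agree = 0
    hist_p, hist_t = Counter(), Counter()  # pairs per co-occurrence count
    for pair in combinations(nodes, 2):
        a, b = cp[pair], ct[pair]
        hist_p[a] += 1
        hist_t[b] += 1
        agree += (a == b)

    observed = agree / n_pairs
    expected = sum(hist_p[j] * hist_t[j] for j in hist_p) / n_pairs ** 2
    if expected == 1:  # degenerate case, e.g. both are one big cluster
        return 1.0
    return (observed - expected) / (1 - expected)
```

Identical clusterings score 1, while clusterings that agree less often than chance score below 0.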
Soft Omega Index also takes into account the pairs present in a different
number of clusters, by normalizing the smaller number of occurrences of
each pair of elements in all clusters of one clustering by the larger
number of occurrences in the other clustering.
F1a is defined as the average of the weighted F1 scores of a) the best
matching ground-truth clusters to the formed clusters and b) the best
matching formed clusters to the ground-truth clusters.
Here F1 is the F1-measure [Rijsbergen, 1974].
F1h uses the harmonic mean instead of the arithmetic mean to address F1a ≳ 0.5
for the clusters produced from all combinations of the nodes
($F1_{C',C} = 1$, since for each category there exists an exactly matching cluster, while
$F1_{C,C'} \to 0$, since the majority of the clusters have low similarity to the
categories).
The formulas on the slide are given for the contribution m of the nodes.
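The difference between the two means can be sketched in a few lines. This is a simplified illustration: it uses an unweighted average of the best-match F1 scores (the slides mention weighting, whose exact form is not recoverable here), and all function names are assumptions.

```python
from itertools import combinations


def f1(a, b):
    """F1-measure of two clusters given as sets of nodes."""
    common = len(a & b)
    return 2.0 * common / (len(a) + len(b)) if common else 0.0


def avg_best_f1(xs, ys):
    """Average over clusters in xs of the F1 with their best match in ys."""
    return sum(max(f1(x, y) for y in ys) for x in xs) / len(xs)


def f1a(produced, truth):
    """Arithmetic mean of the two directional best-match averages."""
    return (avg_best_f1(truth, produced) + avg_best_f1(produced, truth)) / 2


def f1h(produced, truth):
    """Harmonic mean of the two directional best-match averages."""
    t2p = avg_best_f1(truth, produced)
    p2t = avg_best_f1(produced, truth)
    return 2 * t2p * p2t / (t2p + p2t) if t2p + p2t else 0.0


def all_node_subsets(nodes):
    """Degenerate 'clustering': every non-empty subset of the nodes."""
    nodes = sorted(nodes)
    return [set(c) for r in range(1, len(nodes) + 1)
            for c in combinations(nodes, r)]
```

For the degenerate clustering built from all node subsets, the ground-truth-to-produced direction is a perfect 1, so F1a can never drop below 0.5, whereas F1h is pulled towards the weaker direction.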
F1p is the harmonic mean of the averages, taken over each clustering, of the
best local probabilities (f1 ➞ pprob) for each cluster.
Purpose: O(N(|C’| + |C|)) ➞ O(N)

Cluster
    mbs      # Member nodes, const
    cont     # Members' contribution, const
    counter  # Contributions counter
Counter
    orig     # Originating cluster
    ctr      # Raw counter, <= mbs

for a in g2.mbs:
    for c in cls(C.a):
        cc = c.counter
        if cc.orig != g2:
            cc.ctr = 0; cc.orig = g2
        cc.ctr += 1 / |C.a| if ovp else 1
        fmatch(cc.ctr, c.cont, g2.cont)

[Figure: member node a of the ground-truth cluster g2 in C’ and the produced clusters c1, c3 in C.]
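The counter trick above can be sketched as runnable code. This is a simplified illustration under stated assumptions: each node contributes 1 to every cluster it belongs to (the non-`ovp` branch), the match function is the plain F1-measure, and the names `best_f1_per_cluster`, `node_cls`, `ctr` and `orig` are hypothetical stand-ins for the structures on the slide.

```python
from collections import defaultdict


def best_f1_per_cluster(truth, produced):
    """For every ground-truth cluster, the F1 of its best produced match.

    Only produced clusters that share a node with the ground-truth cluster
    are visited, and counters are reset lazily via `orig`, so no
    all-pairs comparison of clusters is needed.
    """
    node_cls = defaultdict(list)  # node -> indices of its produced clusters
    for ci, cluster in enumerate(produced):
        for node in cluster:
            node_cls[node].append(ci)

    ctr = [0] * len(produced)      # raw intersection counters
    orig = [None] * len(produced)  # truth cluster that last used the counter
    best = []
    for gi, g in enumerate(truth):
        top = 0.0
        for node in g:
            for ci in node_cls.get(node, ()):
                if orig[ci] != gi:  # lazy reset for a new truth cluster
                    ctr[ci] = 0
                    orig[ci] = gi
                ctr[ci] += 1
                # F1 grows with the counter, so the running max is final
                match = 2.0 * ctr[ci] / (len(produced[ci]) + len(g))
                top = max(top, match)
        best.append(top)
    return best
```

Each membership of each node is touched once per clustering, which is the source of the near-linear cost when the number of memberships per node is bounded.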
SNAP DBLP (Nodes: 317,080; Edges: 1,049,866; Clusters: 13,477):
ground-truth vs clustering by Louvain.
Evaluation on Intel Xeon E5-2620 (32 logical CPUs) @ 2.10 GHz,
apps compiled using GCC 5.4 with the -O3 flag.
NMI is the Mutual Information I(C’:C) normalized by the max or mean
value of the unconditional entropy H of the clusterings C’, C.
GNMI [Esquivel,2012] uses a stochastic process to compute MI.
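For non-overlapping clusterings the definition above can be computed directly. The sketch below is an illustration only: it uses the identity I(C’:C) = H(C’) + H(C) − H(C’, C), takes one cluster label per node as input, and returns 1.0 when both entropies are zero; the function names and that convention are assumptions. It is not the stochastic GNMI procedure, which targets overlapping clusters.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (in bits) of a distribution given by raw counts."""
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def nmi(labels_a, labels_b, norm="max"):
    """NMI of two partitions given as per-node cluster labels.

    norm="max" divides by max(H_a, H_b); any other value divides by
    the mean of the two entropies.
    """
    n = len(labels_a)
    h_a = entropy(Counter(labels_a).values(), n)
    h_b = entropy(Counter(labels_b).values(), n)
    h_ab = entropy(Counter(zip(labels_a, labels_b)).values(), n)
    mutual_info = h_a + h_b - h_ab
    denom = max(h_a, h_b) if norm == "max" else (h_a + h_b) / 2
    return mutual_info / denom if denom else 1.0
```

Identical partitions give 1 and statistically independent ones give 0, whichever normalization is chosen.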
(Soft) 𝛀
● Pros: values are not affected by the number of clusters.
● Cons: O(N^2); performs poorly for the multi-resolution clusterings.
MF1
● Pros: O(N); F1p satisfies more formal constraints than others.
● Cons: evaluates the best-matching clusters only (unfair advantage
for the larger clusters).
GNMI
● Pros: highly parallelized, evaluates full matches, well-grounded
theoretically.
● Cons: biased to the number of clusters, non-deterministic results, the
convergence is not guaranteed in the stochastic implementation.

DAOC: Stable Clustering of Large Networks
