Community Detection Method for Multi-Label Classification

COMMUNITY DETECTION FOR MULTI-LABEL CLASSIFICATION
Elaine Cecília Gatto | Alan Valejo | Mauri Ferrandin | Ricardo Cerri

20/09/2023 | 12th Brazilian Conference on Intelligent Systems | Community Detection for Multi-Label Classification | BioMaL
Researchers
• Elaine Cecília Gatto (Cissa): PhD Candidate, UFSCar
• Ricardo Cerri: Main Supervisor, UFSCar
• Mauri Ferrandin: Co-Supervisor, UFSC
• Alan Demétrius Baria Valejo: Collaborator, UFSCar

CONTENTS
• Introduction
• Proposal
• Experiments
• Results and Discussion
• Conclusion and Future Works

INTRODUCTION
• Multi-Label Classification
• Label Correlations
• Multi-label Approaches:
  • Global:
    • New models or adaptations of existing models;
    • Learns all labels at once;
    • Does not correctly learn correlations;
    • Induces a single model (one classifier);
  • Local:
    • Divides the original problem into binary problems;
    • Learns each label individually;
    • Does not learn the correlations;
    • Induces one model per label (many classifiers);
  • A different approach:
    • Uses the advantages of both;
    • Mitigates their disadvantages;
    • Lies between the global and local approaches.

HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
Figure 1 – Types of partitions considered in this paper.
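
The three partition types can be illustrated with a minimal Python sketch. The label names and the particular hybrid grouping below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical label names; the groupings below are illustrative only.
labels = ["rock", "pop", "jazz", "blues", "folk"]

# Global partition: a single cluster with every label (one multi-label model).
global_partition = [set(labels)]

# Local partition: one cluster per label (one binary model per label).
local_partition = [{label} for label in labels]

# Hybrid partition: disjoint clusters of labels, between the two extremes.
hybrid_partition = [{"rock", "pop"}, {"jazz", "blues", "folk"}]


def is_valid_partition(partition, all_labels):
    """Return True if clusters are non-empty, disjoint, and cover all labels."""
    seen = set()
    for cluster in partition:
        if not cluster or seen & cluster:
            return False
        seen |= cluster
    return seen == set(all_labels)


def models_needed(partition):
    """One classifier is induced per cluster of the partition."""
    return len(partition)
```

Under this view, the global and local partitions are the two extreme cases (1 model and 5 models here), and any hybrid partition needs a number of models in between.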

Figure 2 – Flowchart of HPML

Stratification for Multi-Label Classification

- Similarity Measures
  - Jaccard Index
  - Rogers-Tanimoto
- Similarity Matrices
- Vector-based data (Label Co-Occurrence Graphs)
- Sparsification
  - cut edges with small weights
  - kNN: k=1, k=2, k=3
  - Threshold: self-loops and 10%
- 5 label co-occurrence graphs for each similarity measure (10 in total)
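
As a rough illustration of this step, the sketch below computes the two similarity measures between binary label columns and applies a kNN sparsification. It assumes the standard textbook definitions of the Jaccard index and the Rogers-Tanimoto coefficient; the toy label matrix `Y`, the function names, and the exact kNN rule (keep each label's k most similar neighbours, drop zero-weight edges) are assumptions made here, and the threshold variant is not sketched because the slides do not define it precisely.

```python
def contingency(col_i, col_j):
    """Count co-presence (a), mismatches (b, c) and co-absence (d)."""
    a = sum(1 for x, y in zip(col_i, col_j) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(col_i, col_j) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(col_i, col_j) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(col_i, col_j) if x == 0 and y == 0)
    return a, b, c, d


def jaccard(col_i, col_j):
    """Jaccard index: a / (a + b + c); ignores co-absence."""
    a, b, c, _ = contingency(col_i, col_j)
    denom = a + b + c
    return a / denom if denom else 0.0


def rogers_tanimoto(col_i, col_j):
    """Rogers-Tanimoto: (a + d) / (a + d + 2(b + c)); counts co-absence."""
    a, b, c, d = contingency(col_i, col_j)
    denom = a + d + 2 * (b + c)
    return (a + d) / denom if denom else 0.0


def similarity_matrix(Y, measure):
    """Label-by-label similarity matrix from an instances-by-labels matrix."""
    n_labels = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(n_labels)]
    return [[measure(cols[i], cols[j]) for j in range(n_labels)]
            for i in range(n_labels)]


def knn_sparsify(sim, k):
    """Keep, for each label, the edges to its k most similar other labels."""
    n = len(sim)
    edges = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            if sim[i][j] > 0:  # zero-weight edges carry no co-occurrence
                edges[(min(i, j), max(i, j))] = sim[i][j]
    return edges


# Toy label matrix (assumption): 4 instances, 3 labels.
Y = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 1]]

jaccard_sim = similarity_matrix(Y, jaccard)
rt_sim = similarity_matrix(Y, rogers_tanimoto)
graph_k1 = knn_sparsify(jaccard_sim, k=1)
```

The resulting `graph_k1` is a weighted, undirected edge list over label indices, which is the kind of label co-occurrence graph a community detection method could then take as input.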
Complex Networks
Community Detection Methods
- systematically encode interactions between data and find relationships between them;
- correlations and partitioning;
- community: a set of vertices with many edges inside and few edges outside

- Hierarchical Methods (dendrograms): several hybrid partitions for each
- Non-Hierarchical Methods: only one partition for each
- Several partitions in general

- Modularity measure as a criterion for choosing a method
- Measures the separation among vertices
- Quantify the density of links within communities compared to links between communities
- Build the corresponding datasets
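
To make the modularity criterion concrete, here is a minimal sketch that scores a partition of a weighted, undirected graph. It assumes the standard Newman-Girvan definition of modularity and a graph without self-loops; the function name, the edge-dictionary representation and the toy graph are assumptions, not the implementation used in HPML.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity of a partition.

    edges: dict {(u, v): weight} for an undirected graph without self-loops.
    communities: list of disjoint sets that together cover every vertex.
    """
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    membership = {v: idx for idx, com in enumerate(communities) for v in com}
    intra = [0.0] * len(communities)   # weight of edges inside each community
    degree = [0.0] * len(communities)  # summed vertex strength per community
    for (u, v), w in edges.items():
        cu, cv = membership[u], membership[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            intra[cu] += w
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Toy graph (assumption): two triangles joined by a single edge.
toy_edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
             (2, 3): 1.0}

q_two = modularity(toy_edges, [{0, 1, 2}, {3, 4, 5}])
q_one = modularity(toy_edges, [{0, 1, 2, 3, 4, 5}])
```

Splitting the toy graph into its two triangles gives a higher modularity than keeping all vertices together, which is the sense in which the measure rewards dense links within communities relative to links between them.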

- Validates all hybrid partitions from hierarchical methods
- Highest silhouette coefficient as criterion for choosing a hybrid partition
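
The selection step can be sketched as follows: each candidate partition is scored with the mean silhouette coefficient and the highest-scoring one is kept. The sketch assumes labels are compared through a precomputed distance matrix (for example, 1 minus a similarity), that singleton clusters score 0, and that every candidate has at least two clusters; the distance values and function names are hypothetical.

```python
def silhouette(dist, partition):
    """Mean silhouette coefficient of a partition given a distance matrix."""
    if len(partition) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for cluster in partition:
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # common convention for singletons
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist[i][j] for j in cluster if j != i) / (len(cluster) - 1)
            # b: smallest mean distance to any other cluster
            b = min(sum(dist[i][j] for j in other) / len(other)
                    for other in partition if other is not cluster)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_partition(dist, candidates):
    """Pick the candidate partition with the highest silhouette."""
    return max(candidates, key=lambda p: silhouette(dist, p))


# Hypothetical distances between 4 labels: {0, 1} and {2, 3} are close pairs.
dist = [[0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0]]

candidates = [
    [{0, 1}, {2, 3}],
    [{0}, {1}, {2, 3}],
    [{0, 2}, {1, 3}],
]
chosen = best_partition(dist, candidates)
```

For a hierarchical method, the candidates would be the partitions obtained at the different levels of the dendrogram.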

- CLUS framework
- PCTs
- hierarchical multi-label classification
- binary and multi-label versions
- Same classifier for all types of partitions
- Compare partitions, not methods
- Investigate the improvement of hybrid partitions over local and global partitions

Datasets
• 10-fold cross-validation
• 20 datasets
• 5 domains: audio, music, biology, image and text
• Instances: from 194 to 10k
• Labels: from 4 to 178

Methods
Measures:
- MLP (missing label problem): calculates the proportion of labels that are never predicted
- MACRO-F1: considers the individual performance on each class
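
Both measures can be sketched in a few lines of Python. The sketch assumes binary indicator matrices (rows are instances, columns are labels) and the usual convention that a label with no true or predicted positives gets an F1 of 0; the function names and the toy predictions are hypothetical.

```python
def missing_label_proportion(Y_pred):
    """MLP: fraction of labels that are never predicted for any instance."""
    n_labels = len(Y_pred[0])
    never = sum(1 for j in range(n_labels)
                if not any(row[j] for row in Y_pred))
    return never / n_labels


def macro_f1(Y_true, Y_pred):
    """Macro-F1: unweighted mean of the per-label F1 scores."""
    n_labels = len(Y_true[0])
    f1_scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n_labels


# Toy example (assumption): the third label is never predicted.
Y_true = [[1, 0, 1],
          [0, 1, 1],
          [1, 1, 0],
          [0, 0, 1]]
Y_pred = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]

mlp_value = missing_label_proportion(Y_pred)
macro_f1_value = macro_f1(Y_true, Y_pred)
```

In the toy example one of three labels is never predicted, so MLP is 1/3, and that same label contributes an F1 of 0 to the macro average, which shows how the two measures are linked.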

Community Detection Methods
[Figure: community detection methods (C.D.M.) chosen for each similarity measure (Jaccard Index, Rogers-Tanimoto, Random), split into hierarchical and non-hierarchical methods and by sparsification (KNN, TR). Methods shown: Edge Betweenness, WalkTrap, Infomap.]

Best Chosen Hybrid Partition
Most chosen Hybrid Partition in general:
A hybrid partition with 2 clusters is close to a global partition, which is composed of a single cluster.
This may be one reason why our results are competitive with the other partitions: they surpass the global partition but are not superior to the local ones for some datasets.

Performance
RANDOM PARTITIONS
- Competitive with or superior to local for some datasets;
- Superior to global for most datasets;
HYBRID PARTITIONS
- Competitive with or superior to local for some datasets;
- Superior to global for most datasets;
HYBRID – RANDOM – LOCAL
Competitive with one another!
LOCAL PARTITIONS
Best results
GLOBAL PARTITIONS
Worst results
Performance Values
In general, for all datasets, partitions and measures
Performance
MACRO-F1
- Range: 0.0 to 1.0
- Low performance values
MLP
- Range: 1.0 to 0.0
- High values = high prediction error
In general, for all datasets and partitions
• HPML managed to obtain hybrid partitions that can improve the classifier.
• When the correlations between labels are low, random partitions perform better
• Global and local approaches may not be correctly learning the label correlations
• Our approach worked!!!

Statistical Tests
Nemenyi + Friedman
MACRO-F1
No differences:
Lo – NHRa
Lo – H-HPML
G – H-Ra – NH – H-HPML
Different:
Lo – G
MLP
No differences:
Lo - Random – H-HPML - NH
H-HPML - NH
Different:
Lo - G
Left Side: best methods
Right Side: worst methods
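
The test procedure can be outlined in Python as below. This is a sketch under assumptions: it uses the usual formulation of the Friedman statistic over average ranks and the Nemenyi critical difference, the score table is made up, and the critical value `q_alpha` must be supplied by the caller from published tables (2.343 is the commonly tabulated value for 3 methods at a 0.05 significance level).

```python
import math


def average_ranks(scores, higher_is_better=True):
    """Average rank of each method; rows are datasets, columns are methods.

    Rank 1 is the best method on a dataset; ties share the mean rank.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda j: row[j],
                       reverse=higher_is_better)
        pos = 0
        while pos < n_methods:
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1
            for idx in order[pos:end + 1]:
                totals[idx] += shared_rank
            pos = end + 1
    return [total / len(scores) for total in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))
            * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4))


def nemenyi_critical_difference(q_alpha, n_methods, n_datasets):
    """Two methods differ if their average ranks differ by more than this."""
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6 * n_datasets))


# Made-up Macro-F1 scores: 4 datasets (rows) x 3 methods (columns).
toy_scores = [[0.90, 0.80, 0.70],
              [0.85, 0.80, 0.60],
              [0.70, 0.75, 0.50],
              [0.90, 0.60, 0.65]]

ranks = average_ranks(toy_scores, higher_is_better=True)
chi2 = friedman_statistic(ranks, n_datasets=len(toy_scores))
cd = nemenyi_critical_difference(2.343, n_methods=3, n_datasets=len(toy_scores))
```

For a measure where lower is better, such as MLP, `higher_is_better=False` would be passed so that rank 1 still denotes the best method.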

Conclusion and Future Works
• Hybrid partitions obtained better or competitive results in several datasets;
• The average performance remained competitive for most methods and datasets;
• Independently of the partitioning used:
o There is no large improvement beyond our competitive results;
o Most labels were not learned by the classifier, even by traditional approaches;
o The classifier still has difficulties learning several labels and predicting them correctly;
• The local and global approaches still need improvements:
o They may not correctly learn label correlations;
• Multi-label classification methods need to improve because:
o Regardless of the partitioning used, and whether or not the correlations were explored, we cannot state with absolute certainty that the labels are being learned correctly.
• Still, it is better to use a partition composed of disjoint clusters of correlated labels, or even a random partition, than a global partition;
• Explore other multi-label evaluation measures;
• Use other classifiers and datasets;

https://sites.google.com/view/cissagatto
THANKS!
“Whoever goes up the stairs should start at the
bottom. To be good at something you have to take it
one step at a time” (Haruichi Furudate – Haikyuu!!)
