2. 20/09/2023 12th Brazilian Conference on Intelligent Systems | Community Detection for Multi-Label Classification | BioMaL 2
Researchers
• PhD Candidate (UFSCar): Elaine Cecília Gatto (Cissa)
• Main Supervisor (UFSCar): Ricardo Cerri
• Co-Supervisor (UFSC): Mauri Ferrandin
• Collaborator (UFSCar): Alan Demétrius Baria Valejo
3. CONTENTS
• Introduction
• Proposal
• Experiments
• Results and Discussion
• Conclusion and Future Works
5. INTRODUCTION
• Multi-Label Classification
• Label Correlations
• Multi-label Approaches:
  • Global:
    • New models or adaptations of existing models;
    • Learns all labels at once;
    • Does not correctly learn correlations;
    • Induces a single model (one classifier).
  • Local:
    • Divides the original problem into binary problems;
    • Learns each label individually;
    • Does not learn the correlations;
    • Induces one model per label (many classifiers).
  • A different approach, between global and local:
    • Use the advantages of both;
    • Mitigate the disadvantages.
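The three kinds of label-space partition contrasted above can be pictured as plain lists of label clusters. The sketch below is only an illustration: the function names, the label names, and the example hybrid grouping are hypothetical and do not come from the presentation.

```python
def global_partition(labels):
    """One cluster holding every label: a single multi-label model."""
    return [list(labels)]


def local_partition(labels):
    """One cluster per label: one binary model per label."""
    return [[label] for label in labels]


def is_valid_partition(partition, labels):
    """Check that the clusters are disjoint and together cover every label."""
    flat = [label for cluster in partition for label in cluster]
    return len(flat) == len(set(flat)) and set(flat) == set(labels)


labels = ["rock", "pop", "jazz", "blues"]  # hypothetical label names

# A hybrid partition sits in between: a few clusters of (assumed) correlated labels.
hybrid = [["rock", "pop"], ["jazz", "blues"]]

for name, part in [("global", global_partition(labels)),
                   ("hybrid", hybrid),
                   ("local", local_partition(labels))]:
    print(name, "->", len(part), "model(s), valid:", is_valid_partition(part, labels))
```

The number of clusters equals the number of models to induce, which is why a hybrid partition lies between the one-model global case and the one-model-per-label local case.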
7. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
Figure 1 – Types of partitions considered in this paper.
8. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
Figure 2 – FlowChart HPML
9. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
Stratification for Multi-Label Classification
10. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
- Similarity Measures
  - Jaccard Index
  - Rogers-Tanimoto
- Similarity Matrices
  - Vector-based data (Label Co-Occurrence Graphs)
- Sparsification
  - Cut edges with small weights
  - kNN: k = 1, k = 2, k = 3
  - Threshold: self-loops and 10%
- 5 label co-occurrence graphs for each similarity measure (10 in total)
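As a rough illustration of this step, the sketch below computes the two similarity measures between binary label columns using their standard textbook definitions, builds a similarity matrix, and applies a kNN sparsification. The function names, the toy label columns, and the choice to keep an edge when it is in the top k of either endpoint are assumptions; the threshold variant is not shown because the slide does not specify it fully.

```python
def pair_counts(u, v):
    """Count co-presence (a), mismatches (b, c) and co-absence (d) of two binary columns."""
    a = sum(1 for x, y in zip(u, v) if x and y)
    b = sum(1 for x, y in zip(u, v) if x and not y)
    c = sum(1 for x, y in zip(u, v) if not x and y)
    d = sum(1 for x, y in zip(u, v) if not x and not y)
    return a, b, c, d


def jaccard(u, v):
    """Jaccard index: a / (a + b + c)."""
    a, b, c, _ = pair_counts(u, v)
    return a / (a + b + c) if (a + b + c) else 0.0


def rogers_tanimoto(u, v):
    """Rogers-Tanimoto: (a + d) / (a + d + 2 (b + c))."""
    a, b, c, d = pair_counts(u, v)
    denom = a + d + 2 * (b + c)
    return (a + d) / denom if denom else 0.0


def similarity_matrix(columns, measure):
    """Pairwise label similarity; entry [i][j] compares label i with label j."""
    return [[measure(u, v) for v in columns] for u in columns]


def knn_sparsify(sim, k):
    """Keep, for every label, only the edges to its k most similar labels."""
    n = len(sim)
    edges = {}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            edges[(min(i, j), max(i, j))] = sim[i][j]
    return edges


# Toy label columns: one list per label, one entry per instance (hypothetical data).
columns = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
sim = similarity_matrix(columns, jaccard)
graph = knn_sparsify(sim, k=1)
print(sorted(graph))
```

Swapping `jaccard` for `rogers_tanimoto` and varying `k` gives one graph per combination, which mirrors the idea of several co-occurrence graphs per similarity measure.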
Complex Networks
Community Detection Methods
- Systematically encode interactions between data items and find relationships between them;
- Capture correlations and support partitioning;
- A community is a set of vertices with many edges inside and few edges to the outside.
11. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
- Hierarchical Methods (dendrograms): several hybrid partitions for each
- Non-Hierarchical Methods: only one partition for each
- Several partitions in general
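To see why a hierarchical method yields several hybrid partitions while a non-hierarchical one yields a single partition, the sketch below replays the merges of a dendrogram and records the partition at every level. The merge encoding (each merge creates a new cluster with the next unused index) and the function name are assumptions made for illustration.

```python
def partitions_from_merges(n_labels, merges):
    """Replay a dendrogram's merges and return the partition at every level.

    `merges` is a list of (cluster_a, cluster_b) index pairs. Each merge
    creates a new cluster whose index is the next unused integer.
    """
    clusters = {i: [i] for i in range(n_labels)}
    levels = [sorted(clusters.values())]
    next_id = n_labels
    for a, b in merges:
        clusters[next_id] = sorted(clusters.pop(a) + clusters.pop(b))
        next_id += 1
        levels.append(sorted(clusters.values()))
    return levels


# Four labels; merge 0 with 1, then 2 with 3, then the two resulting clusters.
levels = partitions_from_merges(4, [(0, 1), (2, 3), (4, 5)])

local, hybrids, global_ = levels[0], levels[1:-1], levels[-1]
print("local  :", local)
print("hybrids:", hybrids)
print("global :", global_)
```

The first level is the local partition, the last is the global one, and every level in between is a candidate hybrid partition.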
12. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
- Modularity is the criterion for choosing a community detection method
- It measures the separation among vertices
- It quantifies the density of links within communities compared to links between communities
- Build the corresponding datasets
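A minimal sketch of the selection criterion follows, using the standard Newman modularity for a weighted, undirected graph without self-loops. The edge list, the candidate partitions, and the names `method_a` and `method_b` are hypothetical placeholders, not results from the presentation.

```python
def modularity(edges, partition):
    """Newman modularity Q of a partition of a weighted, undirected graph.

    `edges` maps (u, v) pairs to positive weights (no self-loops assumed);
    `partition` is a list of vertex clusters.
    Q = sum over clusters of  w_inside / m - (degree_sum / (2 m)) ** 2.
    """
    m = sum(edges.values())
    community = {v: idx for idx, cluster in enumerate(partition) for v in cluster}
    inside = [0.0] * len(partition)
    degree = [0.0] * len(partition)
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(partition)))


# Two triangles joined by a single edge (toy graph).
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1,
         (2, 3): 1}

# Partitions returned by two hypothetical community detection methods.
candidates = {
    "method_a": [[0, 1, 2], [3, 4, 5]],
    "method_b": [[0, 1], [2, 3], [4, 5]],
}
best = max(candidates, key=lambda name: modularity(edges, candidates[name]))
print(best, round(modularity(edges, candidates[best]), 4))
```

A partition that puts every vertex in one community scores Q = 0, so a positive value indicates more within-community weight than expected by chance.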
13. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
- Validate all hybrid partitions produced by the hierarchical methods
- The hybrid partition with the highest silhouette coefficient is chosen
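The choice by silhouette coefficient can be sketched as below. The implementation follows the usual definition (mean within-cluster distance `a`, smallest mean distance to another cluster `b`, score `(b - a) / max(a, b)`, and 0 for singleton clusters). Treating the label distance as something like 1 minus the similarity is an assumption here, and the distance matrix and candidate partitions are toy values.

```python
def silhouette(dist, partition):
    """Mean silhouette coefficient of a partition (needs at least two clusters)."""
    cluster_of = {p: idx for idx, cluster in enumerate(partition) for p in cluster}
    scores = []
    for i in range(len(dist)):
        own = partition[cluster_of[i]]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for idx, other in enumerate(partition) if idx != cluster_of[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy distances between four labels (assumed to be 1 - similarity).
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]

# Candidate hybrid partitions, e.g. different cuts of one dendrogram.
candidates = [[[0, 1], [2, 3]], [[0, 1, 2], [3]]]
chosen = max(candidates, key=lambda part: silhouette(dist, part))
print(chosen, round(silhouette(dist, chosen), 4))
```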
14. HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
- CLUS framework
- Predictive Clustering Trees (PCTs)
- hierarchical multi-label classification
- binary and multi-label versions
- Same classifier for all types of partitions
- Compare partitions, not methods
- Investigate the improvements of hybrid partitions over local and global partitions
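One way to picture "same classifier, different partitions" is the sketch below: the label matrix is split by cluster, one model is trained per cluster, and the per-cluster predictions are written back into the original label order. `MajorityModel` is a deliberately trivial, hypothetical stand-in for the real base learner (the slides use PCTs from CLUS), and all names and data are illustrative.

```python
class MajorityModel:
    """Hypothetical stand-in learner: predicts each label's majority value."""

    def fit(self, X, Y):
        n = len(Y)
        self.pred_ = [1 if sum(col) * 2 > n else 0 for col in zip(*Y)]
        return self

    def predict(self, X):
        return [list(self.pred_) for _ in X]


def fit_predict_partition(X_train, Y_train, X_test, partition, make_model):
    """Train one model per label cluster and merge predictions by label index."""
    n_labels = len(Y_train[0])
    merged = [[0] * n_labels for _ in X_test]
    for cluster in partition:
        # Build the dataset for this cluster: same features, only its labels.
        Y_sub = [[row[j] for j in cluster] for row in Y_train]
        preds = make_model().fit(X_train, Y_sub).predict(X_test)
        for r, pred_row in enumerate(preds):
            for pos, j in enumerate(cluster):
                merged[r][j] = pred_row[pos]
    return merged


X_train = [[0.0], [1.0], [2.0]]
Y_train = [[1, 0, 1], [1, 0, 0], [1, 1, 0]]
X_test = [[5.0]]

for name, partition in [("global", [[0, 1, 2]]),
                        ("hybrid", [[0, 2], [1]]),
                        ("local", [[0], [1], [2]])]:
    print(name, fit_predict_partition(X_train, Y_train, X_test, partition, MajorityModel))
```

Because the stand-in ignores label interactions, all three partitions give the same output here; with a real multi-label learner, only the partition changes between runs, which is what makes the comparison about partitions rather than methods.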
16. Datasets
• 10-fold cross-validation
• 20 datasets
• 5 domains: audio, music, biology, image and text
• Instances: from 194 to 10k
• Labels: from 4 to 178
17. Methods
Measures:
- MLP (missing label problem): the proportion of labels that are never predicted
- MACRO-F1: averages the F1 score obtained individually on each class
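Both measures can be computed directly from binary label matrices, as in the sketch below. The function names and the toy matrices are hypothetical; the convention of scoring F1 as 0 when a label has no true or predicted positives is an assumption.

```python
def macro_f1(Y_true, Y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for t_col, p_col in zip(zip(*Y_true), zip(*Y_pred)):
        tp = sum(1 for t, p in zip(t_col, p_col) if t and p)
        fp = sum(1 for t, p in zip(t_col, p_col) if not t and p)
        fn = sum(1 for t, p in zip(t_col, p_col) if t and not p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # assumed convention
    return sum(scores) / len(scores)


def missing_label_proportion(Y_pred):
    """Proportion of labels that are never predicted for any instance."""
    columns = list(zip(*Y_pred))
    return sum(1 for col in columns if not any(col)) / len(columns)


# Rows are instances, columns are labels (toy data).
Y_true = [[1, 0, 1],
          [0, 1, 1],
          [1, 1, 0]]
Y_pred = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 0]]

print("Macro-F1:", round(macro_f1(Y_true, Y_pred), 4))
print("MLP     :", round(missing_label_proportion(Y_pred), 4))
```

In this toy case the third label is never predicted, so it contributes an F1 of 0 to the macro average and raises MLP to one third; higher Macro-F1 is better, while lower MLP is better.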
19. Community Detection Methods
[Diagram: for each source of label graphs (Jaccard Index, Rogers-Tanimoto, Random), hierarchical and non-hierarchical community detection methods (C.D.M.) are applied to the kNN and TR graphs. Methods shown: Edge Betweenness, WalkTrap, Info Map.]
20. Best Chosen Hybrid Partition
Most frequently chosen hybrid partition in general:
A hybrid partition with 2 clusters is close to a global partition, which is composed of a single cluster.
This may be one reason why our results are competitive with the other partitions, outperform the global partition, and do not surpass the local partition for some datasets.
22. Performance
RANDOM PARTITIONS
- Superior to local for some datasets;
- Superior to global for most datasets.
HYBRID PARTITIONS
- Superior to local for some datasets;
- Superior to global for most datasets.
HYBRID – RANDOM – LOCAL
- Competitive with one another!
LOCAL PARTITIONS
- Best results
GLOBAL PARTITIONS
- Worst results
Performance values: in general, for all datasets, partitions and measures
23. Performance
MACRO-F1
- Range: 0.0 to 1.0
- Low performance values
MLP
- Range: 1.0 to 0.0
- High values = high prediction error
In general, for all datasets and partitions:
• HPML managed to obtain hybrid partitions that can improve the classifier.
• Low level of correlation between the labels: random partitions perform better.
• Global and local approaches may not be learning the label correlations correctly.
• Our approach worked!
24. Statistical Tests
Nemenyi + Friedman
MACRO-F1
No differences:
Lo – NHRa
Lo – H-HPML
G – H-Ra – NH – H-HPML
Different:
Lo – G
MLP
No differences:
Lo – Random – H-HPML – NH
H-HPML – NH
Different:
Lo – G
Left side: best methods
Right side: worst methods
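For readers unfamiliar with the procedure, the sketch below shows how average ranks, the Friedman chi-square statistic, and the Nemenyi critical difference are commonly computed. The scores are toy values, the function names are hypothetical, and the value `q_alpha = 2.343` (three methods, significance level 0.05) is taken from published studentized-range tables and should be checked against such a table before use.

```python
import math


def average_ranks(scores, higher_is_better=True):
    """scores[d][m] is the score of method m on dataset d; rank 1 is best."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: row[m],
                       reverse=higher_is_better)
        pos = 0
        while pos < n_methods:
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # tied methods share the mean rank
            for m in order[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))
            * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4))


def nemenyi_cd(k, n_datasets, q_alpha):
    """Critical difference: rank gaps larger than this are significant."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Toy Macro-F1 scores: 3 datasets (rows) x 3 methods (columns).
scores = [[0.9, 0.8, 0.7],
          [0.6, 0.5, 0.4],
          [0.8, 0.7, 0.3]]
ranks = average_ranks(scores)
chi2 = friedman_statistic(ranks, n_datasets=len(scores))
cd = nemenyi_cd(k=3, n_datasets=len(scores), q_alpha=2.343)
print(ranks, chi2, round(cd, 4))
```

For a measure where lower is better, such as MLP, `higher_is_better=False` would be passed so that rank 1 still denotes the best method.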
26. Conclusion and Future Works
• Hybrid partitions obtained better or competitive results on several datasets;
• The average performance remained competitive for most methods and datasets;
• Regardless of the partitioning used:
o There is no large improvement, despite our competitive results;
o Most labels were not learned by the classifier, even with traditional approaches;
o The classifier still has difficulty learning several labels and predicting them correctly;
• The local and global approaches still need improvement:
o They may not learn label correlations correctly;
• Multi-label classification methods need to improve because:
o Regardless of the partitioning used, and whether or not the correlations were explored, we cannot state with absolute certainty that they are learning the labels correctly.
• Still, it is better to use a partition composed of disjoint clusters of correlated labels, or even a random partition, than a global partition;
• Explore other multi-label evaluation measures;
• Use other classifiers and datasets.