The area of multi-label classification has developed rapidly in recent years. It is now well known that the baseline binary relevance approach suffers from class imbalance and a restricted hypothesis space that degrade its predictive performance, and that it is easily outperformed by methods that learn labels together. A number of methods have grown around the label powerset approach, which models label combinations as class values in a multi-class problem. We describe label-powerset-based solutions under a general framework of \emph{meta-labels}. We provide the theoretical justification for this framework that has so far been lacking, by viewing meta-labels as a hidden layer in an artificial neural network. We explain how meta-labels essentially allow a random projection into a space where non-linearities can easily be tackled with established linear learning algorithms. The proposed framework enables the comparison and combination of related approaches to different multi-label problems. Indeed, we present a novel model in this framework and evaluate it empirically against several high-performing methods, with respect to predictive performance and scalability, on a number of datasets and evaluation metrics. Our ensemble of meta-label classifiers obtains competitive accuracy for a fraction of the computation required by current meta-label methods for multi-label classification.
Multi-label Classification with Meta-labels
1. Multi-Label Classification with
Meta Labels
Jesse Read(1), Antti Puurula(2), Albert Bifet(3)
(1) Aalto University and HIIT, Finland
(2) University of Waikato, New Zealand
(3) Huawei Noah’s Ark Lab, Hong Kong
17 December 2014
2. Overview
Multi-label classification:
[example image] ⊆ {beach, sunset, foliage, field, mountain, urban}
• Most multi-label classification methods can be expressed
in a general framework of meta-label classification
• Our work combines labels into meta-labels, so as to learn
dependence efficiently and effectively.
3. Multi-label Learning
With input variables X, produce predictions for multiple
output variables Y . The basic binary relevance approach,
[Diagram: labels YA, YB, YC, each predicted independently from inputs X1, . . . , X5]
• does not capture label dependence (among the Y -variables)
• does not scale to a large number of labels
The label powerset approach models label combinations as
class values in a multi-class problem.
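The two baseline transformations just described can be sketched on a toy label matrix (illustrative code, not the authors' MEKA implementation):

```python
# Toy label matrix: rows are instances, columns are labels A, B, C.
Y = [
    [0, 1, 0],  # instance 1: {B}
    [0, 1, 1],  # instance 2: {B, C}
    [1, 0, 1],  # instance 3: {A, C}
]

# Binary relevance: one independent binary target per label column.
binary_relevance_targets = [[row[j] for row in Y] for j in range(3)]

# Label powerset: each distinct label combination becomes one class value
# of a single multi-class target.
powerset_targets = [tuple(row) for row in Y]
powerset_classes = set(powerset_targets)

print(binary_relevance_targets)  # [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
print(len(powerset_classes))     # 3 distinct label combinations
```

Binary relevance trains one classifier per column; label powerset trains a single multi-class classifier whose number of classes can grow with the number of distinct combinations.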
4. Meta Labels
We introduce a layer of meta-labels,
[Diagram: meta-labels YA,B and YB,C form a hidden layer between inputs X1, . . . , X5 and labels YA, YB, YC]
• captures label dependence
• there can be fewer meta-labels than original labels
• and they are deterministically decodable into the labels
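The decodability claim can be sketched as follows; the subsets {A, B} and {B, C} follow the diagram above, and the function names are illustrative:

```python
# Encode a label set as one meta-label value per subset, then decode by
# taking the union of the predicted values: with subsets that cover all
# labels, the decoding is deterministic (a sketch, not the authors' code).
subsets = [("A", "B"), ("B", "C")]

def encode(labels, subsets):
    """Project the label set onto each subset of the partition."""
    return [frozenset(labels) & frozenset(s) for s in subsets]

def decode(meta_values):
    """Recover the label set as the union of the meta-label values."""
    out = set()
    for v in meta_values:
        out |= v
    return out

print(encode({"A", "C"}, subsets))          # [frozenset({'A'}), frozenset({'C'})]
print(sorted(decode(encode({"A", "C"}, subsets))))  # ['A', 'C']
```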
5. Producing Meta Labels
dataset            binary relevance   label powerset   meta labels
instance  labels   YA  YB  YC         YA,B,C           YA,C  YB,C
1         B        0   1   0          B                ∅     B
2         B,C      0   1   1          BC               ∅     BC
3         C        0   0   1          C                ∅     C
4         B        0   1   0          B                ∅     B
5         A,C      1   0   1          AC               AC    C
6         A,C      1   0   1          AC               AC    C
7         A,C      1   0   1          AC               AC    C
8         A,B,C    1   1   1          ABC              ∅     BC
9         C        0   0   1          C                ∅     C
• binary relevance: 9 exs, 3 × 2 binary classes
• label powerset: 9 exs, 1 × 5 multi-class
6. Producing Meta Labels
• pruned meta labels: 9 exs, 2 meta labels, of 2 and 3 values
YAC ∈ {∅, AC}, YBC ∈ {B, C, BC}
(one possible formulation)
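Such pruning can be sketched as follows; the frequency threshold and the ∅ fallback are illustrative assumptions, one possible formulation rather than the paper's exact rule:

```python
from collections import Counter

# Prune infrequent values of a meta-label column so the meta-label keeps
# only a small number of class values; rare values fall back to the empty
# set (min_count and the fallback are assumptions for illustration).
def prune_values(column, min_count=2):
    counts = Counter(column)
    keep = {v for v, c in counts.items() if c >= min_count}
    return [v if v in keep else frozenset() for v in column]

AC = frozenset({"A", "C"})
ABC = frozenset({"A", "B", "C"})
column = [frozenset(), AC, AC, AC, ABC]  # ABC occurs only once
pruned = prune_values(column)
print(pruned.count(AC), len(set(pruned)))  # 3 2
```

After pruning, the meta-label has only two class values (∅ and AC), so the multi-class problem it induces stays small.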
7. Example 2
There is no need to model labels together if there is no strong
dependence between them,
[Diagram: meta-label YA,B and single label YC, each predicted from inputs X1, . . . , X5]
YA,B ∈ {∅, B, AB}, YC ∈ {∅, C}
(e.g., no strong relation between C and the other labels)
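One simple way to look for such weakly related labels is to count pairwise co-occurrences in the training set; this is a crude illustrative heuristic, not the method used in the paper:

```python
from collections import Counter
from itertools import combinations

# Count how often each pair of labels appears together; pairs that never
# (or rarely) co-occur are candidates for being modelled separately.
data = [{"A", "B"}, {"A", "B"}, {"B"}, {"C"}, {"A", "C"}]
cooc = Counter(frozenset(pair)
               for labels in data
               for pair in combinations(sorted(labels), 2))

print(sorted((tuple(sorted(k)), v) for k, v in cooc.items()))
# [(('A', 'B'), 2), (('A', 'C'), 1)]
```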
8. General process for
classification with meta-labels
1 Split the label set into subsets (overlapping or disjoint)
2 Relabel each subset as a meta-label, deciding how many values it can take (i.e., possibly pruning some)
3 Train classifiers to predict the meta-labels from the input instances
4 Make predictions in the meta-label space
5 Recombine the predictions into the label space
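The five steps can be sketched end-to-end; the tiny dataset, the fixed partition, and the 1-nearest-neighbour base classifier are all illustrative stand-ins for whatever base learner one would actually use:

```python
def train(X, label_sets, subsets):
    """Steps 1-3: the partition is given as `subsets`; relabel each
    instance by projecting its label set onto each subset; 'training'
    here just stores the data for a 1-NN lookup per meta-label."""
    return [(s, [(x, frozenset(labels) & frozenset(s))
                 for x, labels in zip(X, label_sets)])
            for s in subsets]

def predict(models, x):
    """Steps 4-5: predict each meta-label with 1-NN, then recombine the
    meta-label values into a label set by taking their union."""
    out = set()
    for subset, examples in models:
        nearest = min(examples,
                      key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        out |= nearest[1]
    return out

X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
labels = [{"B"}, {"A", "C"}, {"B", "C"}]
models = train(X, labels, [("A", "C"), ("B", "C")])
print(sorted(predict(models, (0.9, 1.1))))  # ['A', 'C']
```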
9. Voting Using Meta Labels
Table: Meta-label Vote, e.g., for YAB ∈ {∅, B, AB}

    v    A    B    p(YAB = v | x̃)
    ∅    0    0    0.0
    B    0    1    0.9
    AB   1    1    0.1
    PAB  0.1  1.0

Table: Labelset Voting: From Meta-labels to Labels

                 A    B    C
    PAB         0.1  1.0   –
    PBC          –   0.7  0.3
    Σk Pk,j     0.1  1.7  0.3
    ŷ (> 0.5)    0    1    0
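The voting computation can be reproduced as follows (a sketch of the calculation with the numbers from the tables, not the authors' code). Each meta-label classifier outputs a distribution over its values; a label receives the summed probability of the values containing it, and the final prediction thresholds the accumulated votes:

```python
# Distributions output by the two meta-label classifiers (from the tables).
dist_AB = {frozenset(): 0.0, frozenset("B"): 0.9, frozenset("AB"): 0.1}
dist_BC = {frozenset("B"): 0.7, frozenset("C"): 0.3}

def votes(dist):
    """Marginal vote per label: sum p over meta-label values containing it."""
    out = {}
    for value, p in dist.items():
        for label in value:
            out[label] = out.get(label, 0.0) + p
    return out

# Accumulate votes across meta-label classifiers.
total = {}
for dist in (dist_AB, dist_BC):
    for label, p in votes(dist).items():
        total[label] = total.get(label, 0.0) + p

prediction = {label for label, p in total.items() if p > 0.5}
print({l: round(p, 2) for l, p in sorted(total.items())})  # {'A': 0.1, 'B': 1.7, 'C': 0.3}
print(prediction)                                          # {'B'}
```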
10. Results
[Two plots vs. subset size k (horizontal axis, 0–40): Accuracy (top) and Running time (bottom), comparing RAkEL and RAkELp]

Figure: On Enron (1700 emails, 53 labels). RAkEL: random ensembles of the label-powerset method on subsets of size k (horizontal axis) vs. RAkELp: with pruned meta-labels
11. Summary
• General framework of meta-labels for multi-label
classification
• Unifies various approaches from the literature
• New models RAkELp and EpRd achieve a large improvement
in running time
• Part of solution for LSHTC4 (1st place) and WISE (2nd
Place) Kaggle challenges
• Code available at
http://meka.sourceforge.net