Label Propagation Using Amendable Clamping
○ Tatsurou Miyazaki
Tokyo University of Science
Yasunobu Sumikawa
Tokyo University of Science
Motivation
• Mass murder and Suicide bombings are not assigned as labels.
Motivation
• Problem 1: manual labeling incurs a high cost.
• Semi-supervised learning is known as a better approach.
• Ex.) Label Propagation (LP)
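As a concrete illustration of standard LP (not our LPAC), the technique can be run with scikit-learn's implementation; the toy points and kernel settings below are illustrative, not the paper's data:

```python
# Minimal sketch of Label Propagation (LP): labels spread from labeled
# points to unlabeled points (marked -1) over a similarity graph.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0, 0.1], [0.1, 0.0],    # cluster around (0, 0)
              [1.0, 1.1], [1.1, 1.0],    # cluster around (1, 1)
              [0.05, 0.05], [1.05, 1.05]])
y = np.array([0, 0, 1, 1, -1, -1])       # -1 marks unlabeled data

model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y)
print(model.transduction_)               # labels inferred for every point
```

Each unlabeled point receives the label of the cluster it sits in.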
Motivation
[Figure: label propagation over a graph of labeled and unlabeled data; example propagated scores: 0.81, 0.62, 0.07]
Motivation
• Problem 2: the quality of the dataset falls.
• Wrong labels: handled by Linear Neighborhood Propagation (LNP)
• Missing labels: our proposed approach
Effect of missing labels
[Figure: micro-averaged F-scores vs. the number of missing labels]
• Missing labels are a serious issue for accuracy.
Our approach
• Label propagation using amendable clamping (LPAC).
• Objective: decreasing the impact of missing labels on accuracy.
• Our approach scores up to 45% higher than comparative approaches
  • when 70% of documents have missing labels
  • and 50% of each selected document's labels are removed.
Proposed algorithm: LPAC
[Figure: labels propagate from labeled data to the top-k most similar unlabeled data]
• Labels are propagated only along the top-k similar data.
• At the nth iteration, instead of clamping labeled data back to their initial values, we set average values.
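The propagation-and-averaging loop above can be sketched as follows. This is a hypothetical reconstruction, not the paper's exact formulation: the RBF top-k graph, the 0.5 averaging weight, and all names are our assumptions.

```python
# Sketch of an LPAC-style update: propagate over a top-k similarity graph,
# and replace hard clamping of labeled rows with averaging.
import numpy as np

def lpac(X, Y0, labeled, k=3, n_iter=50):
    """X: (n, d) features; Y0: (n, c) initial label matrix; labeled: bool mask."""
    # RBF similarity restricted to each point's top-k neighbors (assumption).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2)
    np.fill_diagonal(W, 0.0)
    for i in range(len(X)):                  # keep only the top-k similar data
        drop = np.argsort(W[i])[:-k]
        W[i, drop] = 0.0
    P = W / W.sum(1, keepdims=True)          # row-normalized propagation weights

    Y = Y0.copy()
    for _ in range(n_iter):
        Y = P @ Y                            # propagate from top-k neighbors
        # Amendable clamping (assumption: equal-weight average): labeled rows
        # blend their initial labels with the propagated values instead of
        # being reset outright, so missing labels can be amended.
        Y[labeled] = 0.5 * (Y0[labeled] + Y[labeled])
    return Y

# Toy demo: two clusters, one unlabeled point near each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9],
              [0.05, 0.05], [1.05, 0.95]])
Y0 = np.zeros((6, 2))
Y0[[0, 1], 0] = 1.0                          # first two points: class 0
Y0[[2, 3], 1] = 1.0                          # next two points: class 1
labeled = np.array([True, True, True, True, False, False])
Y = lpac(X, Y0, labeled, k=2)
print(Y.argmax(1))                           # predicted class per point
```

The averaging step is what lets an initially incomplete label vector pick up labels from its neighborhood over the iterations.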
Experimental setting

Dataset: SIAM 2007 Text Mining Competition dataset
Labeled data: 4819
Unlabeled data: 4819
Classes: 22
Average number of labels per document: 3.41

• We apply latent Dirichlet allocation (LDA) to our data.
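The LDA preprocessing step can be sketched with scikit-learn; the toy corpus and topic count below are illustrative, since the deck does not state its LDA settings:

```python
# Sketch: turn raw documents into LDA topic-distribution features,
# which can then serve as inputs to the label-propagation graph.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "aviation safety report engine failure",
    "engine maintenance inspection report",
    "passenger seat cabin smoke",
    "smoke alarm cabin crew report",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)   # (n_docs, n_topics), rows sum to 1
print(topic_features.shape)                  # (4, 2)
```

Each document is thus represented by a low-dimensional topic vector instead of a sparse bag of words.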
Experimental setting
• Extraction ratio: the ratio of documents that have missing labels.
• Ex.) If labels are removed from 2 of 5 documents, the extraction ratio is 40%.
Experimental setting
• Removal ratio: the ratio of missing labels in each selected document.
• Ex.) If 2 of a document's 4 labels (Label A, Label B, Label C, Label D) are removed, the removal ratio is 50%.
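The two corruption ratios defined above can be simulated with a short helper; the function name and the uniform sampling scheme are our assumptions, not the paper's exact procedure:

```python
# Sketch: inject missing labels into a multi-label dataset.
# extraction_ratio: fraction of documents to corrupt.
# removal_ratio: fraction of labels dropped from each selected document.
import random

def inject_missing_labels(label_sets, extraction_ratio, removal_ratio, seed=0):
    rng = random.Random(seed)
    n_pick = round(len(label_sets) * extraction_ratio)
    picked = rng.sample(range(len(label_sets)), n_pick)
    corrupted = [list(labels) for labels in label_sets]
    for i in picked:
        n_remove = round(len(corrupted[i]) * removal_ratio)
        for lab in rng.sample(corrupted[i], n_remove):
            corrupted[i].remove(lab)
    return corrupted

# Ex.) removal ratio 50% on a 4-label document leaves 2 labels.
out = inject_missing_labels([["A", "B", "C", "D"]], 1.0, 0.5)
print(out)
```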
Experimental setting
Comparative algorithms:
• LP (traditional Label Propagation)
• DLP (Dynamic Label Propagation, state-of-the-art)
• LNP (Linear Neighborhood Propagation)
• Random Forest
• SVM
Micro-averaged F-scores for six classifiers
[Figure: the x axis represents the ratio of documents that have missing labels; the y axis represents the ratio of missing labels.]
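The evaluation metric, the micro-averaged F-score for multi-label predictions, can be computed with scikit-learn; the toy label matrices below are illustrative:

```python
# Sketch: micro-averaged F-score pools true/false positives across all
# labels before computing precision and recall.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])  # gold label sets
y_pred = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0]])  # predicted label sets
score = f1_score(y_true, y_pred, average="micro")
print(score)  # 5 TP, 0 FP, 1 FN -> F1 = 10/11 ≈ 0.909
```

Micro-averaging weights every label assignment equally, which is why missing labels in the training data hurt this metric directly.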
Conclusion
• We propose a multi-label classification algorithm (LPAC) for a moderately challenging multi-labeling task.
  1. Propagating labels according to the top-k similar data.
  2. Updating labeled data based on the cluster assumption.
• Future work
  1. Effective utilization of label correlations.
  2. Evaluating how effectively our algorithm works on a real dataset.
  3. Establishing an algorithm that can be trained on datasets containing both wrong and missing labels.
