3. Why do we need pairwise similarity learning?
• Three reasons:
✓ Labels can be expensive to collect
✓ The classes may be ambiguous, or non-expert human annotators
may more easily provide information about whether
two instances are of the same class or not, rather than
identifying the specific class
✓ Different tasks depending on the available data:
✴ supervised learning — known classes
✴ cross-task unsupervised learning — unknown classes in the
target domain
✴ semi-supervised learning — mix of labeled and unlabeled
with known classes
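The pairwise idea above can be made concrete with a minimal numpy sketch (not from any of the cited papers): derive binary same-class/different-class labels from class labels, and score a predictor of pairwise similarity with binary cross-entropy. The function names are hypothetical, chosen here for illustration.

```python
import numpy as np

def pairwise_labels(y):
    """Turn class labels into binary pairwise labels:
    S[i, j] = 1 if instances i and j share a class, else 0.
    This is the kind of supervision a non-expert annotator can give
    without identifying the specific class."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def pairwise_bce(p_same, s):
    """Binary cross-entropy between predicted same-class probabilities
    p_same and pairwise labels s -- a common pairwise similarity loss."""
    eps = 1e-12
    return float(-np.mean(s * np.log(p_same + eps)
                          + (1 - s) * np.log(1 - p_same + eps)))

y = [0, 0, 1]                  # class labels (known-class case)
S = pairwise_labels(y)         # S[0, 1] = 1 (same), S[0, 2] = 0 (different)

p = np.full((3, 3), 0.5)       # an uninformative predictor
loss = pairwise_bce(p, S)      # = ln 2, the chance-level loss
```

In the cross-task setting, `S` would instead come directly from human same/different judgments, with no class labels ever observed.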
13. Π-model and Temporal ensembling
TEMPORAL ENSEMBLING FOR SEMI-SUPERVISED LEARNING - ICLR 2017
https://arxiv.org/pdf/1610.02242.pdf
• Self-ensembling relies on dropout / input augmentation
• Π-model: ensembling over different dropout and augmentation realizations within one epoch
• Temporal ensembling: ensembling over predictions from previous training epochs
Total loss = supervised loss (cross-entropy on labeled inputs) +
ramp-up weighted unsupervised consistency loss (on all inputs)
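The two unsupervised consistency losses can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the linear "model" with input dropout, its weights, and the values of `alpha` and `dropout_rate` are all hypothetical stand-ins for a real network with dropout and augmentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, dropout_rate=0.5):
    """Toy stochastic model: a fixed linear map with input dropout,
    standing in for a network with dropout/augmentation."""
    W = np.array([[1.0, -0.5], [0.3, 0.8]])   # hypothetical weights
    mask = rng.random(x.shape) > dropout_rate
    h = W @ (x * mask / (1 - dropout_rate))
    e = np.exp(h - h.max())
    return e / e.sum()                         # class probabilities

x = np.array([0.7, -0.2])                      # one unlabeled input

# Pi-model: two stochastic forward passes of the SAME input,
# penalize their disagreement (MSE consistency loss).
z1, z2 = model(x), model(x)
pi_loss = float(np.mean((z1 - z2) ** 2))

# Temporal ensembling: the target is an exponential moving average
# of predictions from previous epochs, with startup bias correction.
alpha, Z = 0.6, np.zeros(2)
for t in range(1, 6):                          # 5 "epochs"
    z = model(x)
    Z = alpha * Z + (1 - alpha) * z            # accumulate ensemble prediction
    z_hat = Z / (1 - alpha ** t)               # bias-corrected target
    te_loss = float(np.mean((z - z_hat) ** 2))
```

Temporal ensembling needs only one forward pass per input per epoch (the second "pass" is the stored EMA), which is the paper's main efficiency argument over the Π-model.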
14. VAT (Virtual Adversarial Training)
DISTRIBUTIONAL SMOOTHING WITH VIRTUAL ADVERSARIAL TRAINING - ICLR 2016
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
https://arxiv.org/pdf/1507.00677.pdf
• Purpose: to promote the smoothness of the model distribution
• Idea: minimize the KL divergence between the posterior distributions with and without input noise
• Implementation: a regularization term named LDS (Local Distributional Smoothness)
LDS: the KL divergence under the input perturbation that disperses the output distribution the most (the virtual adversarial direction)
Efficient way to compute LDS:
1. Second-order Taylor expansion of the KL divergence, 2. compute the eigenvector of its Hessian corresponding to the largest eigenvalue via power iteration
https://qiita.com/yuzupepper/items/e2d093f05adccbe1b7f1
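The power-iteration procedure above can be sketched with a toy numpy model. This is a minimal sketch under several assumptions: the softmax classifier and its weights are hypothetical, gradients are taken by finite differences (the paper uses backprop), and `xi`/`eps` values are illustrative, not tuned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy softmax classifier p(y|x) = softmax(W x) with hypothetical weights.
W = np.array([[2.0, -1.0], [-1.5, 1.0], [0.5, 0.5]])  # 3 classes, 2 features

def predict(x):
    logits = W @ x
    logits = logits - logits.max()
    e = np.exp(logits)
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def vat_perturbation(x, eps=0.5, xi=0.1, n_power=1, h=1e-5):
    """Approximate the virtual adversarial direction r_vadv by power
    iteration: d converges toward the dominant eigenvector of the
    Hessian of KL w.r.t. the input perturbation."""
    p = predict(x)
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    for _ in range(n_power):
        # gradient of KL(p(.|x) || p(.|x + xi*d)) w.r.t. d, by finite differences
        g = np.zeros_like(d)
        base = kl(p, predict(x + xi * d))
        for i in range(len(d)):
            dp = d.copy()
            dp[i] += h
            g[i] = (kl(p, predict(x + xi * dp)) - base) / h
        d = g / (np.linalg.norm(g) + 1e-12)
    return eps * d

x = np.array([1.0, 0.5])
r_vadv = vat_perturbation(x)
lds = kl(predict(x), predict(x + r_vadv))  # the LDS regularization term
```

Note the loss needs no labels at `x`, which is what makes the adversarial direction "virtual" and lets VAT regularize unlabeled data in the semi-supervised setting.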