3. Why do we need pairwise similarity learning?
• Three reasons:
✓ Labels can be expensive to collect
✓ The classes may be ambiguous, or non-expert human annotators
may more easily provide information about whether
two instances are of the same class or not, rather than
identifying the specific class
✓ Different tasks depending on the available data:
✴ supervised learning — known classes
✴ cross-task unsupervised learning — unknown classes in the
target domain
✴ semi-supervised learning — mix of labeled and unlabeled
with known classes
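The pairwise idea above can be made concrete with a minimal numpy sketch (not from any of the cited papers): derive binary same-class/different-class labels from class labels, and score a predictor of pairwise similarity with binary cross-entropy. The function names are hypothetical, chosen here for illustration.

```python
import numpy as np

def pairwise_labels(y):
    """Turn class labels into binary pairwise labels:
    S[i, j] = 1 if instances i and j share a class, else 0.
    This is the kind of supervision a non-expert annotator can give
    without identifying the specific class."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def pairwise_bce(p_same, s):
    """Binary cross-entropy between predicted same-class probabilities
    p_same and pairwise labels s -- a common pairwise similarity loss."""
    eps = 1e-12
    return float(-np.mean(s * np.log(p_same + eps)
                          + (1 - s) * np.log(1 - p_same + eps)))

y = [0, 0, 1]                  # class labels (known-class case)
S = pairwise_labels(y)         # S[0, 1] = 1 (same), S[0, 2] = 0 (different)

p = np.full((3, 3), 0.5)       # an uninformative predictor
loss = pairwise_bce(p, S)      # = ln 2, the chance-level loss
```

In the cross-task setting, `S` would instead come directly from human same/different judgments, with no class labels ever observed.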
13. Π-model and Temporal ensembling
TEMPORAL ENSEMBLING FOR SEMI-SUPERVISED LEARNING - ICLR 2017
https://arxiv.org/pdf/1610.02242.pdf
• Self-ensembling relies on dropout / input augmentation
• Π-model: ensembling over different dropout and augmentation realizations within one epoch
• Temporal ensembling: ensembling over predictions from previous training epochs
Total loss = supervised loss (cross-entropy on labeled inputs) +
ramp-up weighted unsupervised consistency loss (on all inputs)
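The two unsupervised consistency losses can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the linear "model" with input dropout, its weights, and the values of `alpha` and `dropout_rate` are all hypothetical stand-ins for a real network with dropout and augmentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, dropout_rate=0.5):
    """Toy stochastic model: a fixed linear map with input dropout,
    standing in for a network with dropout/augmentation."""
    W = np.array([[1.0, -0.5], [0.3, 0.8]])   # hypothetical weights
    mask = rng.random(x.shape) > dropout_rate
    h = W @ (x * mask / (1 - dropout_rate))
    e = np.exp(h - h.max())
    return e / e.sum()                         # class probabilities

x = np.array([0.7, -0.2])                      # one unlabeled input

# Pi-model: two stochastic forward passes of the SAME input,
# penalize their disagreement (MSE consistency loss).
z1, z2 = model(x), model(x)
pi_loss = float(np.mean((z1 - z2) ** 2))

# Temporal ensembling: the target is an exponential moving average
# of predictions from previous epochs, with startup bias correction.
alpha, Z = 0.6, np.zeros(2)
for t in range(1, 6):                          # 5 "epochs"
    z = model(x)
    Z = alpha * Z + (1 - alpha) * z            # accumulate ensemble prediction
    z_hat = Z / (1 - alpha ** t)               # bias-corrected target
    te_loss = float(np.mean((z - z_hat) ** 2))
```

Temporal ensembling needs only one forward pass per input per epoch (the second "pass" is the stored EMA), which is the paper's main efficiency argument over the Π-model.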
14. VAT (Virtual Adversarial Training)
DISTRIBUTIONAL SMOOTHING WITH VIRTUAL ADVERSARIAL TRAINING - ICLR 2016
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
https://arxiv.org/pdf/1507.00677.pdf
• Purpose: to promote the smoothness of the model distribution
• Idea: minimize the KL divergence between the posterior distributions with and without input noise
• Implementation: a regularization term named LDS (Local Distributional Smoothness)
LDS: the KL divergence under the input perturbation that disperses the output distribution the most (the virtual adversarial direction)
Efficient way to compute LDS:
1. Second-order Taylor expansion of the KL divergence, 2. compute the eigenvector of its Hessian corresponding to the largest eigenvalue via power iteration
https://qiita.com/yuzupepper/items/e2d093f05adccbe1b7f1
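The power-iteration procedure above can be sketched with a toy numpy model. This is a minimal sketch under several assumptions: the softmax classifier and its weights are hypothetical, gradients are taken by finite differences (the paper uses backprop), and `xi`/`eps` values are illustrative, not tuned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy softmax classifier p(y|x) = softmax(W x) with hypothetical weights.
W = np.array([[2.0, -1.0], [-1.5, 1.0], [0.5, 0.5]])  # 3 classes, 2 features

def predict(x):
    logits = W @ x
    logits = logits - logits.max()
    e = np.exp(logits)
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def vat_perturbation(x, eps=0.5, xi=0.1, n_power=1, h=1e-5):
    """Approximate the virtual adversarial direction r_vadv by power
    iteration: d converges toward the dominant eigenvector of the
    Hessian of KL w.r.t. the input perturbation."""
    p = predict(x)
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    for _ in range(n_power):
        # gradient of KL(p(.|x) || p(.|x + xi*d)) w.r.t. d, by finite differences
        g = np.zeros_like(d)
        base = kl(p, predict(x + xi * d))
        for i in range(len(d)):
            dp = d.copy()
            dp[i] += h
            g[i] = (kl(p, predict(x + xi * dp)) - base) / h
        d = g / (np.linalg.norm(g) + 1e-12)
    return eps * d

x = np.array([1.0, 0.5])
r_vadv = vat_perturbation(x)
lds = kl(predict(x), predict(x + r_vadv))  # the LDS regularization term
```

Note the loss needs no labels at `x`, which is what makes the adversarial direction "virtual" and lets VAT regularize unlabeled data in the semi-supervised setting.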