PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks
1. visionNoob @ PR12
Liu, Weiyang, et al. ICML 2016
https://arxiv.org/abs/1612.02295
Wang, Mei, and Weihong Deng. "Deep face recognition: A survey." arXiv preprint arXiv:1804.06655 (2018).
The Development of Loss Functions
(Figure: timeline of loss functions; today's paper, which leads toward SphereFace.)
See also PR-127: FaceNet (https://youtu.be/0k3X-9y_9S8)
Large-Margin Softmax Loss

Goal: learn features that carry more discriminative information than the original softmax loss (or other alternatives).

From the perspective of the loss function: intuitively, the learned features are good if intra-class compactness and inter-class separability are simultaneously maximized.

(Figure: Ahmed, Saeed, et al. "Covert cyber assault detection in smart grid networks utilizing feature selection and Euclidean distance-based machine learning." Applied Sciences 8.5 (2018): 772.)
Large-Margin Softmax Loss

Setup (binary classification example): x is the feature from the previous layer, and W_1 and W_2 are the weight vectors of the last fully connected layer, giving the scores f_1 = W_1^T x and f_2 = W_2^T x.

Softmax loss (biases and batches are omitted for simplicity):

    L_softmax = -log( exp(W_y^T x) / Σ_{j=1}^{C} exp(W_j^T x) )

For the binary case (if y = 1) this becomes

    L_softmax = -log( exp(W_1^T x) / ( exp(W_1^T x) + exp(W_2^T x) ) )

The original softmax forces W_1^T x > W_2^T x in order to classify x correctly, which is what drives intra-class compactness and inter-class separability.

(Figure: decision boundary between classes y_1 and y_2 with weight vectors W_1, W_2 and feature x.)
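The bias-free softmax loss above is easy to sketch in plain Python. The weights W1, W2 and feature x below are made-up numbers for illustration only:

```python
import math

def softmax_loss(logits, y):
    """Softmax (cross-entropy) loss for one sample; biases and batching
    are omitted, matching the slide's simplification."""
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(f - mx) for f in logits]
    return -math.log(exps[y] / sum(exps))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical binary example (class 1 is index 0 here):
W1, W2 = [2.0, 0.5], [0.5, 1.0]
x = [1.0, 0.2]
f1, f2 = dot(W1, x), dot(W2, x)
loss = softmax_loss([f1, f2], y=0)
# Since f1 > f2, x is classified correctly and the loss is below log(2).
```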
Large-Margin Softmax Loss

Intuition. Writing each score in angular form, W_j^T x = ||W_j|| ||x|| cos(θ_j), where θ_j is the angle between x and W_j:

1. The original softmax forces W_1^T x > W_2^T x, i.e.

    ||W_1|| ||x|| cos(θ_1) > ||W_2|| ||x|| cos(θ_2)

2. L-Softmax (proposed) instead requires

    ||W_1|| ||x|| cos(m θ_1) > ||W_2|| ||x|| cos(θ_2)

where m is a positive integer and 0 ≤ θ_1 ≤ π/m.

Because cos is monotonically decreasing on [0, π], the following inequality holds:

    ||W_1|| ||x|| cos(θ_1) ≥ ||W_1|| ||x|| cos(m θ_1) > ||W_2|| ||x|| cos(θ_2)

so the L-Softmax criterion is strictly harder to satisfy than the original softmax criterion.
Geometric interpretation: the original softmax loss requires θ_1 < θ_2 to classify the sample x as class 1, while the L-Softmax loss requires m θ_1 < θ_2 (assuming ||W_1|| = ||W_2||) to make the same decision.
-> A more rigorous classification criterion, which leaves an angular margin between the classes.
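The two angular criteria can be checked numerically. This toy snippet assumes equal weight norms (so only the angles matter); the angle values are made up:

```python
import math

m = 4
theta1, theta2 = 0.30, 0.80  # angles (radians) between x and W1, W2

# Original softmax criterion: cos(theta1) > cos(theta2), i.e. theta1 < theta2.
softmax_ok = math.cos(theta1) > math.cos(theta2)

# L-Softmax criterion: cos(m * theta1) > cos(theta2), i.e. m * theta1 < theta2.
l_softmax_ok = math.cos(m * theta1) > math.cos(theta2)

# Here theta1 < theta2 satisfies the softmax criterion, but
# m * theta1 = 1.2 > theta2 = 0.8 violates the stricter L-Softmax one.
```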
Large-Margin Softmax Loss

1. Softmax loss (biases and batches are omitted for simplicity):

    L_softmax = -log( exp(W_y^T x) / Σ_{j=1}^{C} exp(W_j^T x) )
              = -log( exp(||W_y|| ||x|| cos(θ_y)) / Σ_{j=1}^{C} exp(||W_j|| ||x|| cos(θ_j)) )

2. L-Softmax loss (Large-Margin Softmax Loss):

    L_L-softmax = -log( exp(||W_y|| ||x|| ψ(θ_y)) / ( exp(||W_y|| ||x|| ψ(θ_y)) + Σ_{j≠y} exp(||W_j|| ||x|| cos(θ_j)) ) )

where ψ is a monotonically decreasing function, ψ(θ) = (-1)^k cos(mθ) - 2k for θ ∈ [kπ/m, (k+1)π/m] and k ∈ {0, ..., m-1}, which agrees with cos(mθ) on [0, π/m].
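A minimal sketch of ψ and the per-sample L-Softmax loss, following the piecewise definition of ψ from the paper; the norms and angles passed in are made-up inputs:

```python
import math

def psi(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]:
    the monotonically decreasing extension of cos(m*theta) to [0, pi]."""
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def l_softmax_loss(w_norms, x_norm, thetas, y, m):
    """L-Softmax loss for one sample (biases omitted); thetas[j] is the
    angle between the feature x and weight vector W_j."""
    target = math.exp(w_norms[y] * x_norm * psi(thetas[y], m))
    others = sum(math.exp(w_norms[j] * x_norm * math.cos(thetas[j]))
                 for j in range(len(thetas)) if j != y)
    return -math.log(target / (target + others))

# With m = 1, psi(theta) = cos(theta), so the loss reduces to plain softmax.
loss_softmax  = l_softmax_loss([1.0, 1.0], 2.0, [0.3, 0.8], y=0, m=1)
loss_lsoftmax = l_softmax_loss([1.0, 1.0], 2.0, [0.3, 0.8], y=0, m=4)
# The margin shrinks the target-class term, so the L-Softmax loss is larger.
```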
Experiment
5.1 Experimental Settings
• Datasets
  • visual classification: MNIST, CIFAR-10, CIFAR-100
  • face verification: LFW
• VGG-like model implemented in Caffe
• PReLU nonlinearities
• Batch size: 256
• Weight decay: 0.0005
• Momentum: 0.9
• He initialization
• Batch normalization
• No dropout
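The settings above can be collected into a plain config dict. This is only a mirror of the slide's list; the key names are illustrative, not Caffe's actual configuration fields:

```python
# Training hyperparameters as listed on the slide (VGG-like model in Caffe).
train_config = {
    "batch_size": 256,
    "weight_decay": 0.0005,
    "momentum": 0.9,
    "nonlinearity": "PReLU",
    "weight_init": "He",
    "batch_norm": True,
    "dropout": None,  # no dropout was used
}
```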
Experiment
model overview
Experiment
5.2 Visual Classification
1. MNIST
Experiment
5.2 Visual Classification
2. CIFAR-10 / CIFAR-100
Experiment
5.2 Visual Classification
The softmax baseline suffers from overfitting.
Experiment
5.2 Visual Classification
The softmax baseline suffers from overfitting.
Experiment
5.3 Face Verification
1. LFW
   - Training set: CASIA-WebFace, ~490k labeled face images of over 10,000 identities
2. ROI alignment with IntraFace
3. PCA for a compact feature vector
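The verification step itself is not spelled out on the slide; a common recipe (assumed here, not taken from the paper) is to compare the two PCA-compacted face features by cosine similarity against a threshold tuned on validation data:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical pair of compact face features and a made-up threshold:
# declare "same person" when the similarity exceeds the threshold.
same = cosine_similarity([0.9, 0.1], [0.8, 0.2]) > 0.8
```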
Conclusion
1. The Large-Margin Softmax (L-Softmax) loss is proposed
2. Adjustable margin via the integer parameter m
3. Clear intuition and geometric interpretation
4. State-of-the-art CNN results on the benchmarks
Discussion
SphereFace: Deep Hypersphere Embedding for Face Recognition
SphereFace builds directly on the L-Softmax loss

    L_L-softmax = -log( exp(||W_y|| ||x|| ψ(θ_y)) / ( exp(||W_y|| ||x|| ψ(θ_y)) + Σ_{j≠y} exp(||W_j|| ||x|| cos(θ_j)) ) )

with W_1^T x = ||W_1|| ||x|| cos(θ_1) and W_2^T x = ||W_2|| ||x|| cos(θ_2), but normalizes the weights so that ||W_j|| = 1 (W is normalized!), making the decision depend only on the angles.
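Under the slide's description (weights normalized to ||W_j|| = 1), the SphereFace-style loss can be sketched by reusing ψ. This is only my sketch of the idea, not the official SphereFace implementation:

```python
import math

def psi(theta, m):
    # Monotonically decreasing extension of cos(m*theta), as in L-Softmax.
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(thetas, x_norm, y, m):
    """SphereFace-style loss for one sample: same form as L-Softmax, but with
    every ||W_j|| normalized to 1, so only ||x|| and the angles remain."""
    target = math.exp(x_norm * psi(thetas[y], m))
    others = sum(math.exp(x_norm * math.cos(t))
                 for j, t in enumerate(thetas) if j != y)
    return -math.log(target / (target + others))

# Made-up angles; a larger margin m gives a larger (harder) loss.
loss_m1 = a_softmax_loss([0.3, 0.8], 2.0, y=0, m=1)
loss_m4 = a_softmax_loss([0.3, 0.8], 2.0, y=0, m=4)
```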
Angular Margin Losses
(Figure: a line of follow-up work across CVPR 2017, CVPR 2018, and CVPR 2019; in essence, L-Softmax + normalization.)
Deng, Jiankang, et al. "ArcFace: Additive angular margin loss for deep face recognition." CVPR 2019.
Metric Learning
- Contrastive loss (NIPS 2014): Sun, Yi, et al. "Deep learning face representation by joint identification-verification." NIPS 2014.
- Triplet loss (CVPR 2015): Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "FaceNet: A unified embedding for face recognition and clustering." CVPR 2015.
- Center loss (ECCV 2016): Wen, Yandong, et al. "A discriminative feature learning approach for deep face recognition." ECCV 2016.
- Ring loss (CVPR 2018): Zheng, Yutong, Dipan K. Pal, and Marios Savvides. "Ring loss: Convex feature normalization for face recognition." CVPR 2018.
Q & A