visionNoob @ PR12
PR 171 :
Large-Margin Softmax Loss
for Convolutional Neural Networks
Liu, Weiyang, et al. ICML 2016
https://arxiv.org/abs/1612.02295
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 2
Wang, Mei, and Weihong Deng. "Deep face recognition: A survey." arXiv preprint arXiv:1804.06655 (2018).
today!
The Development of Loss Function
see also
PR-127: FaceNet (https://youtu.be/0k3X-9y_9S8)
SphereFace
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 3
Large-Margin Softmax Loss
intra-class compactness
inter-class separability
Ahmed, Saeed, et al. "Covert cyber assault detection in smart grid networks utilizing feature selection and Euclidean distance-based machine learning." Applied Sciences 8.5 (2018): 772.
Intuitively, the learned features are good if intra-class compactness and inter-class separability are simultaneously maximized.

Goal: from the perspective of the loss function, learn more discriminative information (compared to the original softmax loss or others).
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 4
Large-Margin Softmax Loss
Example: binary classification. The feature $\boldsymbol{x}$ from the previous layer is fed to the last fc layer $W$ (biases and batches are omitted for simplicity):

$\boldsymbol{y}_1 = W_1^T \boldsymbol{x}, \qquad \boldsymbol{y}_2 = W_2^T \boldsymbol{x}$

softmax loss:

$L_{\mathrm{softmax}} = -\log \frac{\exp(W_y^T \boldsymbol{x})}{\sum_{j=1}^{C} \exp(W_j^T \boldsymbol{x})} = -\log \frac{\exp(W_1^T \boldsymbol{x})}{\exp(W_1^T \boldsymbol{x}) + \exp(W_2^T \boldsymbol{x})} \quad (\text{if } y = 1)$

The original softmax only forces $W_1^T \boldsymbol{x} > W_2^T \boldsymbol{x}$ in order to classify $\boldsymbol{x}$ correctly; it does not explicitly encourage intra-class compactness or inter-class separability.
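As a quick illustration of the softmax loss written above, here is a minimal NumPy sketch for a single sample; the weight matrix and feature values are hypothetical.

```python
import numpy as np

def softmax_loss(W, x, y):
    """Softmax cross-entropy for one sample: -log softmax(W^T x)[y].

    W : (d, C) last-fc-layer weights (biases omitted, as on the slide)
    x : (d,)   feature from the previous layer
    y : int    ground-truth class index
    """
    logits = W.T @ x                      # y_j = W_j^T x for every class j
    logits -= logits.max()                # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[y])

# Hypothetical binary example; class index 0 plays the role of y = 1 on the slide.
W = np.array([[1.0, -0.5],
              [0.3,  0.8]])              # columns are W_1 and W_2
x = np.array([2.0, 1.0])
print(softmax_loss(W, x, y=0))
```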
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 5
Large-Margin Softmax Loss
Intuition
$\boldsymbol{y}_1 = W_1^T \boldsymbol{x} = \|W_1\| \|\boldsymbol{x}\| \cos(\theta_1), \qquad \boldsymbol{y}_2 = W_2^T \boldsymbol{x} = \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

1. The original softmax forces $W_1^T \boldsymbol{x} > W_2^T \boldsymbol{x}$, i.e.
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

2. L-softmax (proposed) instead requires
   $\|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$, where $m$ is a positive integer and $0 \le \theta_1 \le \frac{\pi}{m}$.
   Since $\cos$ is decreasing on $[0, \pi]$, the following inequality holds:
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) \ge \|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$
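A small numerical check of the inequality above (a sketch only; the vectors and the margin m = 2 are hypothetical):

```python
import numpy as np

def class1_scores(W1, W2, x, m):
    """Return (original criterion, margin criterion, rival score) for class 1."""
    theta1 = np.arccos(W1 @ x / (np.linalg.norm(W1) * np.linalg.norm(x)))
    theta2 = np.arccos(W2 @ x / (np.linalg.norm(W2) * np.linalg.norm(x)))
    s_orig   = np.linalg.norm(W1) * np.linalg.norm(x) * np.cos(theta1)      # softmax criterion
    s_margin = np.linalg.norm(W1) * np.linalg.norm(x) * np.cos(m * theta1)  # L-softmax, needs 0 <= theta1 <= pi/m
    s_rival  = np.linalg.norm(W2) * np.linalg.norm(x) * np.cos(theta2)
    return s_orig, s_margin, s_rival

# Hypothetical vectors: x lies close to W1 and far from W2.
W1, W2, x = np.array([1.0, 0.2]), np.array([-0.5, 1.0]), np.array([0.9, 0.3])
s_orig, s_margin, s_rival = class1_scores(W1, W2, x, m=2)
# cos is decreasing on [0, pi], so s_orig >= s_margin; whenever the stricter
# condition s_margin > s_rival holds, the original s_orig > s_rival holds with room to spare.
print(s_orig, s_margin, s_rival)
```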
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 6
Large-Margin Softmax Loss
Intuition
1. Original softmax: classify $\boldsymbol{x}$ as class 1 when
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

2. L-softmax (proposed): require
   $\|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$, with $m$ a positive integer and $0 \le \theta_1 \le \frac{\pi}{m}$, so that
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) \ge \|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

Geometric interpretation: when $\|W_1\| = \|W_2\|$, the original softmax loss requires $\theta_1 < \theta_2$ to classify the sample $\boldsymbol{x}$ as class 1, while the L-Softmax loss requires $m\theta_1 < \theta_2$ to make the same decision.
-> A more rigorous classification criterion.
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 7
Large-Margin Softmax Loss
1. softmax loss
$L_{\mathrm{softmax}} = -\log \frac{\exp(W_y^T \boldsymbol{x})}{\sum_{j=1}^{C} \exp(W_j^T \boldsymbol{x})} = -\log \frac{\exp(\|W_y\| \|\boldsymbol{x}\| \cos(\theta_y))}{\sum_{j=1}^{C} \exp(\|W_j\| \|\boldsymbol{x}\| \cos(\theta_j))}$

* biases and batches are omitted for simplicity.

2. L-softmax loss (Large-Margin Softmax Loss)

$L_{L\text{-}\mathrm{softmax}} = -\log \frac{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y))}{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y)) + \sum_{j \ne y} \exp(\|W_j\| \|\boldsymbol{x}\| \cos(\theta_j))}$

where $\psi(\theta)$ is a monotonically decreasing function; in the paper, $\psi(\theta) = (-1)^k \cos(m\theta) - 2k$ for $\theta \in [\frac{k\pi}{m}, \frac{(k+1)\pi}{m}]$, $k \in \{0, \dots, m-1\}$, which matches $\cos(m\theta)$ on $[0, \frac{\pi}{m}]$ and continues it downward.
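Below is a minimal NumPy sketch of the L-softmax loss above for a single sample, using the piecewise psi just described. The example values are hypothetical, and the annealing between softmax and L-softmax that the paper uses to ease optimization is omitted here.

```python
import numpy as np

def psi(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def l_softmax_loss(W, x, y, m=2):
    """L-softmax loss for one sample (biases omitted, as on the slide).

    W : (d, C) last-fc-layer weights, x : (d,) feature, y : target class, m : margin."""
    w_norms = np.linalg.norm(W, axis=0)
    x_norm = np.linalg.norm(x)
    cos_theta = (W.T @ x) / (w_norms * x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    logits = w_norms * x_norm * cos_theta               # ||W_j|| ||x|| cos(theta_j)
    logits[y] = w_norms[y] * x_norm * psi(theta[y], m)  # margin only on the target class
    logits -= logits.max()                              # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[y])

# Hypothetical example with 3 classes and margin m = 2.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 3)), rng.normal(size=4)
print(l_softmax_loss(W, x, y=1, m=2))
```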
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 8
Experiment
5.1 Experimental Settings (a training-setup sketch in code follows this list)
• Dataset
  • visual classification: MNIST, CIFAR-10, CIFAR-100
  • face verification: LFW
• VGG-like model with Caffe
• PReLU nonlinearities
• Batch size: 256
• Weight decay: 0.0005
• Momentum: 0.9
• He initialization
• Batch normalization
• No dropout
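The settings above map fairly directly onto a standard training loop. Here is a minimal PyTorch sketch of the optimizer and initialization choices; the original experiments used Caffe, the tiny model below is only a placeholder for the VGG-like architecture, and the learning rate is illustrative.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder for the VGG-like network (the real architecture follows the paper's tables).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),      # batch normalization, no dropout
    nn.PReLU(),              # PReLU nonlinearities
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# He initialization for conv / fc weights.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight)

# SGD with the slide's hyperparameters; batch size 256 would be set in the DataLoader.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```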
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 9
Experiment
model overview
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 10
Experiment
5.2 visual classification
1. MNIST
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 11
Experiment
5.2 visual classification
1. CIFAR-10 / CIFAR-100
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 12
Experiment
5.2 visual classification
softmax suffers from overfitting
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 13
Experiment
5.2 visual classification
softmax suffers from overfitting
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 14
Experiment
5.3 face verification (a verification sketch in code follows this list)
1. LFW benchmark
   - Training set: CASIA-WebFace, about 490k labeled face images of 10,000+ identities
2. Face/ROI alignment: IntraFace
3. PCA for a compact feature vector
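The verification step itself can be sketched as: compress the CNN features with PCA, then score a pair of faces by cosine similarity against a threshold. This is a minimal illustration, not the authors' code; the feature dimension, component count, and threshold are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def same_identity(feat_a, feat_b, pca, threshold=0.5):
    """Cosine-similarity verification on PCA-compressed face embeddings."""
    a = pca.transform(feat_a[None, :])[0]
    b = pca.transform(feat_b[None, :])[0]
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return cos > threshold, cos

# Hypothetical 512-D CNN features; PCA is fit on a held-out gallery.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))
pca = PCA(n_components=128).fit(gallery)

decision, score = same_identity(gallery[0], gallery[1], pca)
print(decision, score)
```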
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 15
Conclusion
1. The Large-Margin Softmax (L-Softmax) loss is proposed
2. Adjustable margin via the parameter m
3. Clear intuition and geometric interpretation
4. State-of-the-art CNN results on the benchmarks
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 16
Discussion
SphereFace: Deep Hypersphere Embedding for Face Recognition
$L_{L\text{-}\mathrm{softmax}} = -\log \frac{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y))}{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y)) + \sum_{j \ne y} \exp(\|W_j\| \|\boldsymbol{x}\| \cos(\theta_j))}$

$W_1^T \boldsymbol{x} = \|W_1\| \|\boldsymbol{x}\| \cos(\theta_1), \qquad W_2^T \boldsymbol{x} = \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

($W$ is normalized: SphereFace fixes $\|W_j\| = 1$, so the decision depends only on the angles.)
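A minimal sketch of that normalization step (SphereFace-style angular logits), reusing the same piecewise psi as in the L-softmax sketch; the margin m = 4 and the example values are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def psi(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def a_softmax_logits(W, x, y, m=4):
    """Normalize each weight column to unit norm so that only ||x|| and the
    angles theta_j determine the logits; apply the margin to the target angle."""
    W_unit = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)   # ||W_j|| = 1
    x_norm = np.linalg.norm(x)
    cos_theta = W_unit.T @ x / (x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    logits = x_norm * cos_theta
    logits[y] = x_norm * psi(theta[y], m)                             # angular margin on the target
    return logits

# Hypothetical example with 3 classes.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 3)), rng.normal(size=4)
print(a_softmax_logits(W, x, y=1))
```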
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 17
Deng, Jiankang, et al. "Arcface: Additive angular margin loss for deep face recognition." CVPR 2019
SphereFace (CVPR 2017) · CosFace (CVPR 2018) · ArcFace (CVPR 2019)
Angular Margin Losses
L-softmax + normalized
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 18
Metric Learning

Contrastive loss (NIPS 2014), Triplet loss (CVPR 2015), Center loss (ECCV 2016), Ring loss (CVPR 2018)

Sun, Yi, et al. "Deep learning face representation by joint identification-verification." NIPS 2014.
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "FaceNet: A unified embedding for face recognition and clustering." CVPR 2015.
Wen, Yandong, et al. "A discriminative feature learning approach for deep face recognition." ECCV 2016.
Zheng, Yutong, Dipan K. Pal, and Marios Savvides. "Ring loss: Convex feature normalization for face recognition." CVPR 2018.
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 19
Q & A


Editor's Notes

1. Let me begin. The paper I'm presenting is "Large-Margin Softmax Loss for Convolutional Neural Networks," published at ICML 2016.
2. I chose this paper because Taeoh gave an overview of face recognition in PR-127. I originally planned to present ArcFace from this year's CVPR, but today's Large-Margin Softmax paper is the base it builds on, so I'm introducing it first. Brief slides on SphereFace, CosFace, and ArcFace are included at the end for reference.
3. What this paper is ultimately after is: how should we design the loss function so that more discriminative information is learned well? If we want a classifier to separate two classes well, we should maximize intra-class compactness and inter-class separability. From the loss-design point of view, the question is how to design the loss, and what penalty to impose, so that training behaves that way.
4. First, let's look at softmax; it should be familiar. Take a simple binary classification example with a linear layer. In a deep model such as a CNN, an input vector x arrives from the previous layer, and the matrix W is a linear transformation like an fc layer; each element of the output vector is the inner product of a column of W with the input feature x. Such a layer is usually trained with a softmax cross-entropy loss, and softmax is satisfied as long as W_1^T x > W_2^T x. The paper's point is that the softmax loss does not really consider intra-class compactness or inter-class separability.
5. Now, the inner product between a column of W and the vector x can be rewritten as the product of the two magnitudes and cos(theta). What the Large-Margin Softmax Loss wants to do is ask: what if we put a margin on this cos(theta)?
6. Looking at it again through the geometric interpretation: with the original softmax, classifying sample x requires theta1 < theta2, whereas L-Softmax requires m*theta1 < theta2 for the same decision, so theta1 must be m times smaller than before. The paper's claim is that this property yields a stricter classification criterion.
7. So the Large-Margin Softmax Loss, called L-softmax, can be written as follows. The original softmax loss is the ground-truth class term over the sum over all classes; in the L-Softmax loss, the ground-truth term uses a function psi instead of cos(theta). The original cos(theta) is the blue solid curve, and psi continues the cos values monotonically depending on the margin m; for m = 2, i.e. cos(2*theta), the red solid curve is used.
8. Now the experiments. They were run on the MNIST, CIFAR10/100, and LFW datasets, with a VGG-like model and a fairly standard training setup.
9. Like this.
10. On MNIST: how training behaves depending on the size of the margin; m = 1 is just plain softmax.
11. On CIFAR10/100, their method performs better.
12. They also claim it is somewhat more robust to overfitting than softmax. The reason is not analyzed in detail; the discussion is roughly that the margin makes the model harder to fit.
13. Because it is robust to overfitting, performance keeps improving as layers are stacked deeper.
14. To evaluate whether the features were learned well.
15. As some of you may have noticed, when the magnitudes of W1 and W2 are not equal, interpreting things purely in terms of theta becomes difficult. The paper only says that this case is more complicated to interpret but that the margin still seems to work. The SphereFace paper resolves this by simply normalizing W to 1.
16. Especially in face recognition these days, methods like SphereFace, CosFace, and ArcFace keep appearing; you could almost say three papers came out of deciding where to place the margin. s is a scale factor and there are minor differences like that, but the concept is similar. It was interesting that where you place the margin changes the geometric interpretation and the performance.
17. Beyond this, there are various existing metric learning methods. In my view this line of research keeps moving toward maximizing intra-class compactness and inter-class separability in a simpler and more powerful way. Triplet loss, for example, samples three points: an anchor, a positive sample with the same label, and a negative sample with a different label. But sampling negatives becomes a surprisingly complicated problem as the dataset grows, and center loss becomes very hard to converge as the number of classes grows. So the direction seems to be toward methods that stay simple and powerful even as data and labels scale up.
18. Even so, that may be because the dataset is not curated well enough; from the standpoint of running a real service with triplet loss, the trend is toward increasingly simple yet powerful methods.
19. I'm looking at this Large-Margin Softmax paper as part of the stream of face recognition algorithms: how can we build good features for the open-set face recognition problem?
20. Thinking about this once more: y_1 is the inner product of W_1 and x, and y_2 is the inner product of W_2 and x. Trained this way, wouldn't intra-class compactness and inter-class separability be maximized? Because of the margin, theta1 has to be much smaller than theta2: with the original softmax, classifying sample x requires theta1 < theta2, whereas L-Softmax requires m*theta1 < theta2 for the same decision.