visionNoob @ PR12
PR 171 :
Large-Margin Softmax Loss
for Convolutional Neural Networks
Liu, Weiyang, et al. ICML 2016
https://arxiv.org/abs/1612.02295
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 2
Wang, Mei, and Weihong Deng. "Deep face recognition: A survey." arXiv preprint arXiv:1804.06655 (2018).
today!
The Development of Loss Function
see also
PR-127: FaceNet (https://youtu.be/0k3X-9y_9S8)
SphereFace
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 3
Large-Margin Softmax Loss
intra-class compactness
inter-class separability
Ahmed, Saeed, et al. "Covert cyber assault detection in smart grid networks utilizing feature selection and Euclidean distance-based machine learning." Applied Sciences 8.5 (2018): 772.
Intuitively, the learned features are good if intra-class compactness and inter-class separability are simultaneously maximized.

Goal: from the perspective of the loss function, learn more discriminative information (compared to the original softmax loss or others).
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 4
Large-Margin Softmax Loss
Example: binary classification. The feature $\boldsymbol{x}$ from the previous layer is fed to the last fc layer $W$ (biases and batches are omitted for simplicity):

$\boldsymbol{y}_1 = W_1^T \boldsymbol{x}, \qquad \boldsymbol{y}_2 = W_2^T \boldsymbol{x}$

softmax loss:

$L_{\mathrm{softmax}} = -\log \frac{\exp(W_y^T \boldsymbol{x})}{\sum_{j=1}^{C} \exp(W_j^T \boldsymbol{x})} = -\log \frac{\exp(W_1^T \boldsymbol{x})}{\exp(W_1^T \boldsymbol{x}) + \exp(W_2^T \boldsymbol{x})} \quad (\text{if } y = 1)$

The original softmax only forces $W_1^T \boldsymbol{x} > W_2^T \boldsymbol{x}$ in order to classify $\boldsymbol{x}$ correctly; it does not explicitly encourage intra-class compactness or inter-class separability.
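As a quick illustration of the softmax loss written above, here is a minimal NumPy sketch for a single sample; the weight matrix and feature values are hypothetical.

```python
import numpy as np

def softmax_loss(W, x, y):
    """Softmax cross-entropy for one sample: -log softmax(W^T x)[y].

    W : (d, C) last-fc-layer weights (biases omitted, as on the slide)
    x : (d,)   feature from the previous layer
    y : int    ground-truth class index
    """
    logits = W.T @ x                      # y_j = W_j^T x for every class j
    logits -= logits.max()                # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[y])

# Hypothetical binary example; class index 0 plays the role of y = 1 on the slide.
W = np.array([[1.0, -0.5],
              [0.3,  0.8]])              # columns are W_1 and W_2
x = np.array([2.0, 1.0])
print(softmax_loss(W, x, y=0))
```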
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 5
Large-Margin Softmax Loss
Intuition
$\boldsymbol{y}_1 = W_1^T \boldsymbol{x} = \|W_1\| \|\boldsymbol{x}\| \cos(\theta_1), \qquad \boldsymbol{y}_2 = W_2^T \boldsymbol{x} = \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

1. The original softmax forces $W_1^T \boldsymbol{x} > W_2^T \boldsymbol{x}$, i.e.
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

2. L-softmax (proposed) instead requires
   $\|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$, where $m$ is a positive integer and $0 \le \theta_1 \le \frac{\pi}{m}$.
   Since $\cos$ is decreasing on $[0, \pi]$, the following inequality holds:
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) \ge \|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$
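A small numerical check of the inequality above (a sketch only; the vectors and the margin m = 2 are hypothetical):

```python
import numpy as np

def class1_scores(W1, W2, x, m):
    """Return (original criterion, margin criterion, rival score) for class 1."""
    theta1 = np.arccos(W1 @ x / (np.linalg.norm(W1) * np.linalg.norm(x)))
    theta2 = np.arccos(W2 @ x / (np.linalg.norm(W2) * np.linalg.norm(x)))
    s_orig   = np.linalg.norm(W1) * np.linalg.norm(x) * np.cos(theta1)      # softmax criterion
    s_margin = np.linalg.norm(W1) * np.linalg.norm(x) * np.cos(m * theta1)  # L-softmax, needs 0 <= theta1 <= pi/m
    s_rival  = np.linalg.norm(W2) * np.linalg.norm(x) * np.cos(theta2)
    return s_orig, s_margin, s_rival

# Hypothetical vectors: x lies close to W1 and far from W2.
W1, W2, x = np.array([1.0, 0.2]), np.array([-0.5, 1.0]), np.array([0.9, 0.3])
s_orig, s_margin, s_rival = class1_scores(W1, W2, x, m=2)
# cos is decreasing on [0, pi], so s_orig >= s_margin; whenever the stricter
# condition s_margin > s_rival holds, the original s_orig > s_rival holds with room to spare.
print(s_orig, s_margin, s_rival)
```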
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 6
Large-Margin Softmax Loss
Intuition
1. Original softmax: classify $\boldsymbol{x}$ as class 1 when
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

2. L-softmax (proposed): require
   $\|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$, with $m$ a positive integer and $0 \le \theta_1 \le \frac{\pi}{m}$, so that
   $\|W_1\| \|\boldsymbol{x}\| \cos(\theta_1) \ge \|W_1\| \|\boldsymbol{x}\| \cos(m\theta_1) > \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

Geometric interpretation: when $\|W_1\| = \|W_2\|$, the original softmax loss requires $\theta_1 < \theta_2$ to classify the sample $\boldsymbol{x}$ as class 1, while the L-Softmax loss requires $m\theta_1 < \theta_2$ to make the same decision.
-> A more rigorous classification criterion.
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 7
Large-Margin Softmax Loss
1. softmax loss
$L_{\mathrm{softmax}} = -\log \frac{\exp(W_y^T \boldsymbol{x})}{\sum_{j=1}^{C} \exp(W_j^T \boldsymbol{x})} = -\log \frac{\exp(\|W_y\| \|\boldsymbol{x}\| \cos(\theta_y))}{\sum_{j=1}^{C} \exp(\|W_j\| \|\boldsymbol{x}\| \cos(\theta_j))}$

* biases and batches are omitted for simplicity.

2. L-softmax loss (Large-Margin Softmax Loss)

$L_{L\text{-}\mathrm{softmax}} = -\log \frac{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y))}{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y)) + \sum_{j \ne y} \exp(\|W_j\| \|\boldsymbol{x}\| \cos(\theta_j))}$

where $\psi(\theta)$ is a monotonically decreasing function; in the paper, $\psi(\theta) = (-1)^k \cos(m\theta) - 2k$ for $\theta \in [\frac{k\pi}{m}, \frac{(k+1)\pi}{m}]$, $k \in \{0, \dots, m-1\}$, which matches $\cos(m\theta)$ on $[0, \frac{\pi}{m}]$ and continues it downward.
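Below is a minimal NumPy sketch of the L-softmax loss above for a single sample, using the piecewise psi just described. The example values are hypothetical, and the annealing between softmax and L-softmax that the paper uses to ease optimization is omitted here.

```python
import numpy as np

def psi(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def l_softmax_loss(W, x, y, m=2):
    """L-softmax loss for one sample (biases omitted, as on the slide).

    W : (d, C) last-fc-layer weights, x : (d,) feature, y : target class, m : margin."""
    w_norms = np.linalg.norm(W, axis=0)
    x_norm = np.linalg.norm(x)
    cos_theta = (W.T @ x) / (w_norms * x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    logits = w_norms * x_norm * cos_theta               # ||W_j|| ||x|| cos(theta_j)
    logits[y] = w_norms[y] * x_norm * psi(theta[y], m)  # margin only on the target class
    logits -= logits.max()                              # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[y])

# Hypothetical example with 3 classes and margin m = 2.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 3)), rng.normal(size=4)
print(l_softmax_loss(W, x, y=1, m=2))
```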
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 8
Experiment
5.1 Experimental Settings (a training-setup sketch in code follows this list)
• Dataset
  • visual classification: MNIST, CIFAR-10, CIFAR-100
  • face verification: LFW
• VGG-like model with Caffe
• PReLU nonlinearities
• Batch size: 256
• Weight decay: 0.0005
• Momentum: 0.9
• He initialization
• Batch normalization
• No dropout
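The settings above map fairly directly onto a standard training loop. Here is a minimal PyTorch sketch of the optimizer and initialization choices; the original experiments used Caffe, the tiny model below is only a placeholder for the VGG-like architecture, and the learning rate is illustrative.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder for the VGG-like network (the real architecture follows the paper's tables).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),      # batch normalization, no dropout
    nn.PReLU(),              # PReLU nonlinearities
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# He initialization for conv / fc weights.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight)

# SGD with the slide's hyperparameters; batch size 256 would be set in the DataLoader.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```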
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 9
Experiment
model overview
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 10
Experiment
5.2 visual classification
1. MNIST
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 11
Experiment
5.2 visual classification
1. CIFAR-10 / CIFAR-100
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 12
Experiment
5.2 visual classification
softmax suffers from overfitting
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 13
Experiment
5.2 visual classification
softmax suffers from overfitting
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 14
Experiment
5.3 face verification (a verification sketch in code follows this list)
1. LFW benchmark
   - Training set: CASIA-WebFace, about 490k labeled face images of 10,000+ identities
2. Face/ROI alignment: IntraFace
3. PCA for a compact feature vector
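The verification step itself can be sketched as: compress the CNN features with PCA, then score a pair of faces by cosine similarity against a threshold. This is a minimal illustration, not the authors' code; the feature dimension, component count, and threshold are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def same_identity(feat_a, feat_b, pca, threshold=0.5):
    """Cosine-similarity verification on PCA-compressed face embeddings."""
    a = pca.transform(feat_a[None, :])[0]
    b = pca.transform(feat_b[None, :])[0]
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return cos > threshold, cos

# Hypothetical 512-D CNN features; PCA is fit on a held-out gallery.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))
pca = PCA(n_components=128).fit(gallery)

decision, score = same_identity(gallery[0], gallery[1], pca)
print(decision, score)
```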
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 15
Conclusion
1. The Large-Margin Softmax (L-Softmax) loss is proposed
2. Adjustable margin via the parameter m
3. Clear intuition and geometric interpretation
4. State-of-the-art CNN results on the benchmarks
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 16
Discussion
SphereFace: Deep Hypersphere Embedding for Face Recognition
$L_{L\text{-}\mathrm{softmax}} = -\log \frac{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y))}{\exp(\|W_y\| \|\boldsymbol{x}\| \psi(\theta_y)) + \sum_{j \ne y} \exp(\|W_j\| \|\boldsymbol{x}\| \cos(\theta_j))}$

$W_1^T \boldsymbol{x} = \|W_1\| \|\boldsymbol{x}\| \cos(\theta_1), \qquad W_2^T \boldsymbol{x} = \|W_2\| \|\boldsymbol{x}\| \cos(\theta_2)$

($W$ is normalized: SphereFace fixes $\|W_j\| = 1$, so the decision depends only on the angles.)
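A minimal sketch of that normalization step (SphereFace-style angular logits), reusing the same piecewise psi as in the L-softmax sketch; the margin m = 4 and the example values are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def psi(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def a_softmax_logits(W, x, y, m=4):
    """Normalize each weight column to unit norm so that only ||x|| and the
    angles theta_j determine the logits; apply the margin to the target angle."""
    W_unit = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)   # ||W_j|| = 1
    x_norm = np.linalg.norm(x)
    cos_theta = W_unit.T @ x / (x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    logits = x_norm * cos_theta
    logits[y] = x_norm * psi(theta[y], m)                             # angular margin on the target
    return logits

# Hypothetical example with 3 classes.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 3)), rng.normal(size=4)
print(a_softmax_logits(W, x, y=1))
```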
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 17
Deng, Jiankang, et al. "Arcface: Additive angular margin loss for deep face recognition." CVPR 2019
SphereFace (CVPR 2017) · CosFace (CVPR 2018) · ArcFace (CVPR 2019)
Angular Margin Losses
L-softmax + normalized
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 18
Metric Learning

Contrastive loss (NIPS 2014), Triplet loss (CVPR 2015), Center loss (ECCV 2016), Ring loss (CVPR 2018)

Sun, Yi, et al. "Deep learning face representation by joint identification-verification." NIPS 2014.
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "FaceNet: A unified embedding for face recognition and clustering." CVPR 2015.
Wen, Yandong, et al. "A discriminative feature learning approach for deep face recognition." ECCV 2016.
Zheng, Yutong, Dipan K. Pal, and Marios Savvides. "Ring loss: Convex feature normalization for face recognition." CVPR 2018.
PR 171: Large-Margin Softmax Loss for Convolutional Neural Networks PR12 19
Q & A


Editor's Notes

1. Let me begin. The paper I'm presenting is "Large-Margin Softmax Loss for Convolutional Neural Networks," published at ICML 2016.
2. I chose this paper because Taeoh gave an overview of face recognition in PR-127. I originally planned to present ArcFace from this year's CVPR, but today's Large-Margin Softmax paper is the base it builds on, so I'm introducing it first. Brief slides on SphereFace, CosFace, and ArcFace are included at the end for reference.
3. What this paper is ultimately after is: how should we design the loss function so that more discriminative information is learned well? If we want a classifier to separate two classes well, we should maximize intra-class compactness and inter-class separability. From the loss-design point of view, the question is how to design the loss, and what penalty to impose, so that training behaves that way.
4. First, let's look at softmax; it should be familiar. Take a simple binary classification example with a linear layer. In a deep model such as a CNN, an input vector x arrives from the previous layer, and the matrix W is a linear transformation like an fc layer; each element of the output vector is the inner product of a column of W with the input feature x. Such a layer is usually trained with a softmax cross-entropy loss, and softmax is satisfied as long as W_1^T x > W_2^T x. The paper's point is that the softmax loss does not really consider intra-class compactness or inter-class separability.
5. Now, the inner product between a column of W and the vector x can be rewritten as the product of the two magnitudes and cos(theta). What the Large-Margin Softmax Loss wants to do is ask: what if we put a margin on this cos(theta)?
6. Looking at it again through the geometric interpretation: with the original softmax, classifying sample x requires theta1 < theta2, whereas L-Softmax requires m*theta1 < theta2 for the same decision, so theta1 must be m times smaller than before. The paper's claim is that this property yields a stricter classification criterion.
7. So the Large-Margin Softmax Loss, called L-softmax, can be written as follows. The original softmax loss is the ground-truth class term over the sum over all classes; in the L-Softmax loss, the ground-truth term uses a function psi instead of cos(theta). The original cos(theta) is the blue solid curve, and psi continues the cos values monotonically depending on the margin m; for m = 2, i.e. cos(2*theta), the red solid curve is used.
8. Now the experiments. They were run on the MNIST, CIFAR10/100, and LFW datasets, with a VGG-like model and a fairly standard training setup.
9. Like this.
10. On MNIST: how training behaves depending on the size of the margin; m = 1 is just plain softmax.
11. On CIFAR10/100, their method performs better.
12. They also claim it is somewhat more robust to overfitting than softmax. The reason is not analyzed in detail; the discussion is roughly that the margin makes the model harder to fit.
13. Because it is robust to overfitting, performance keeps improving as layers are stacked deeper.
14. To evaluate whether the features were learned well.
15. As some of you may have noticed, when the magnitudes of W1 and W2 are not equal, interpreting things purely in terms of theta becomes difficult. The paper only says that this case is more complicated to interpret but that the margin still seems to work. The SphereFace paper resolves this by simply normalizing W to 1.
16. Especially in face recognition these days, methods like SphereFace, CosFace, and ArcFace keep appearing; you could almost say three papers came out of deciding where to place the margin. s is a scale factor and there are minor differences like that, but the concept is similar. It was interesting that where you place the margin changes the geometric interpretation and the performance.
17. Beyond this, there are various existing metric learning methods. In my view this line of research keeps moving toward maximizing intra-class compactness and inter-class separability in a simpler and more powerful way. Triplet loss, for example, samples three points: an anchor, a positive sample with the same label, and a negative sample with a different label. But sampling negatives becomes a surprisingly complicated problem as the dataset grows, and center loss becomes very hard to converge as the number of classes grows. So the direction seems to be toward methods that stay simple and powerful even as data and labels scale up.
18. Even so, that may be because the dataset is not curated well enough; from the standpoint of running a real service with triplet loss, the trend is toward increasingly simple yet powerful methods.
19. I'm looking at this Large-Margin Softmax paper as part of the stream of face recognition algorithms: how can we build good features for the open-set face recognition problem?
20. Thinking about this once more: y_1 is the inner product of W_1 and x, and y_2 is the inner product of W_2 and x. Trained this way, wouldn't intra-class compactness and inter-class separability be maximized? Because of the margin, theta1 has to be much smaller than theta2: with the original softmax, classifying sample x requires theta1 < theta2, whereas L-Softmax requires m*theta1 < theta2 for the same decision.