2. Current prevailing classification loss functions for deep face
recognition are mostly based on the widely-used softmax loss.
The softmax loss is typically good at optimizing the inter-class
difference (i.e., separating different classes), but not good at
reducing the intra-class variation (i.e., making features of the
same class compact).
The problem is therefore how to reduce the intra-class variation.
Two common remedies are (a) adding a regularization term that
penalizes the feature-to-center distances, as in the center loss,
and (b) normalizing the features and introducing a scale parameter
s into the softmax loss, which produces higher gradients for the
well-separated samples and thereby further shrinks the intra-class
variance; see the sketch below.
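A minimal sketch of these two ideas (assuming PyTorch; the class name NormalizedSoftmaxWithCenterReg and the hyperparameter values s and lambda_c are illustrative, not from the source):

import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedSoftmaxWithCenterReg(nn.Module):
    """Softmax on L2-normalized features with a scale s, plus a
    center-loss-style penalty on feature-to-center distances."""
    def __init__(self, feat_dim, num_classes, s=30.0, lambda_c=0.01):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s = s                # scale parameter of the softmax loss
        self.lambda_c = lambda_c  # weight of the center regularizer

    def forward(self, feats, labels):
        # Cosine logits: features and class weights are L2-normalized,
        # then rescaled by s so the softmax has enough dynamic range.
        logits = self.s * F.linear(F.normalize(feats), F.normalize(self.weight))
        ce = F.cross_entropy(logits, labels)
        # Penalize the squared distance between each feature and its class center.
        center_penalty = (feats - self.centers[labels]).pow(2).sum(dim=1).mean()
        return ce + self.lambda_c * center_penalty

After normalization the cosine logits lie in [-1, 1], so the scale s is needed to restore enough dynamic range for the softmax to converge.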
3. Comparison between the original softmax loss
and the additive margin softmax loss
The angular softmax (A-Softmax) imposes a multiplicative angular margin,
psi(theta) = cos(m * theta), whose effective size varies with the angle theta
and is therefore not fixed; the additive margin softmax replaces it with the
fixed additive margin psi(theta) = cos(theta) - m.
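For concreteness, the standard formulations with normalized features and weights (s is the scale, m the margin, and \theta_j the angle between the feature of sample i and the class-j weight vector):

L_{\mathrm{softmax}} = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s\cos\theta_{y_i}}}{e^{s\cos\theta_{y_i}}+\sum_{j\neq y_i}e^{s\cos\theta_j}}

L_{\mathrm{AM}} = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s(\cos\theta_{y_i}-m)}}{e^{s(\cos\theta_{y_i}-m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}

A-Softmax instead replaces \cos\theta_{y_i} in the numerator with \cos(m\theta_{y_i}).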
4. The angular margin pushes the classification boundary closer to the
weight vector of each class. What is still needed is theoretical guidance
for training a deep model for metric learning tasks using classification
loss functions; with such guidance, the softmax loss can be improved by
incorporating different kinds of margins.
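A two-class sketch of why the margin moves the boundary (a standard derivation; \theta_1 and \theta_2 are the angles between a feature and the weight vectors W_1 and W_2):

\cos\theta_1 = \cos\theta_2 \quad \text{(original softmax: one shared decision boundary)}

\cos\theta_1 - m = \cos\theta_2 \;\Longleftrightarrow\; \cos\theta_1 = \cos\theta_2 + m \quad \text{(class-1 boundary with the additive margin)}

Class-1 samples must now satisfy \cos\theta_1 \ge \cos\theta_2 + m, so the class-1 boundary moves toward W_1 (and symmetrically the class-2 boundary moves toward W_2), leaving a margin region between the two classes.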