3. Task
1. Face Verification
• Is this the same person?
2. Face Recognition
• Who is this person?
3. Clustering
• Find similar people among these faces
4. Key idea
An embedding f(x) maps an image x into a feature
space R^d such that the squared distance between
all faces of the same identity is small, independent
of imaging conditions, whereas the squared distance
between a pair of face images from different
identities is large.
5. Key idea
The triplet loss tries to enforce a margin between
each pair of faces from one person to all other
faces. This allows the faces for one identity to live
on a manifold, while still enforcing the distance
and thus discriminability to other identities.
7. Face Verification
• With a threshold of 1.1: if
the distance between two
embeddings is lower than the
threshold, we verify that
both faces are the same ID
8. Triplet Loss
• Minimize the distance between faces of the same ID and maximize the distance
to faces of different IDs, enforcing a margin alpha
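The per-triplet loss described above can be sketched as follows; the function name and default margin value here are illustrative, and the full objective sums this over all triplets in a batch.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Per-triplet loss: pull same-ID embeddings together and push
    different-ID embeddings apart by at least a margin alpha."""
    pos_dist = np.sum((anchor - positive) ** 2)  # squared L2 to same identity
    neg_dist = np.sum((anchor - negative) ** 2)  # squared L2 to other identity
    return max(pos_dist - neg_dist + alpha, 0.0)  # hinge: zero once margin met
```

The hinge means the loss vanishes as soon as the negative is farther from the anchor than the positive by at least alpha.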
9. Prepare data
• Correct triplet selection is crucial for fast convergence
• Triplet constraint: the negative sample must be farther
from the anchor image than the positive sample, by at
least the margin alpha
• A triplet is formed by (anchor, positive, negative)
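As a sketch, the constraint on a single triplet can be checked like this (the function name and margin value are illustrative):

```python
import numpy as np

def satisfies_triplet_constraint(anchor, positive, negative, alpha=0.2):
    """True when the negative is farther from the anchor than the
    positive by at least the margin alpha (squared L2 distances)."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return bool(neg_dist >= pos_dist + alpha)
```

Triplets that violate this check are the informative ones: they still produce a nonzero loss.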
10. Hard data
• In order to ensure fast convergence, it is crucial to
select triplets that violate the triplet constraint
• It is infeasible to compute the hard positives and
hard negatives across the whole training set
11. Visualization of hard data
[Figure: embedding location of the anchor within the positive and negative distributions]
The green area is the distribution of positive faces over the embedding space;
red is the distribution of negative faces over the same space.
A hard positive is the farthest sample (from the anchor location) in the
positive space, and a hard negative is the closest sample (to the anchor
location) among the negative faces. Hard samples should help training
converge faster.
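The hard-sample definitions above can be sketched as a small mining helper; the array names and shapes are illustrative:

```python
import numpy as np

def hardest_pair(anchor, positives, negatives):
    """Return indices of the hard positive (farthest same-ID face from the
    anchor) and the hard negative (closest different-ID face).
    positives/negatives: (n, d) arrays of embeddings."""
    pos_dist = np.sum((positives - anchor) ** 2, axis=1)
    neg_dist = np.sum((negatives - anchor) ** 2, axis=1)
    return int(np.argmax(pos_dist)), int(np.argmin(neg_dist))
```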
12. Triplet generation
• Generate triplets online.
• This can be done by selecting the hard positive/negative exemplars from
within a mini-batch
• Mini-batch: a few thousand exemplars
• Around 40 faces are selected per ID per mini-batch, with negative
faces added randomly
• Only consider argmin and argmax within a mini-batch
• Mislabeled and poorly imaged faces might lead to poor training
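The argmin/argmax-within-a-mini-batch idea can be sketched with an all-pairs distance matrix; this is an illustrative reconstruction, not the paper's exact pipeline:

```python
import numpy as np

def batch_hard_indices(embeddings, labels):
    """For each sample in a mini-batch, return the index of its farthest
    same-ID embedding (hard positive) and its nearest different-ID
    embedding (hard negative). embeddings: (n, d); labels: (n,)."""
    # all-pairs squared L2 distances, shape (n, n)
    d = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=2)
    same = labels[:, None] == labels[None, :]
    hard_pos = np.where(same, d, -np.inf).argmax(axis=1)  # farthest positive
    hard_neg = np.where(same, np.inf, d).argmin(axis=1)   # nearest negative
    return hard_pos, hard_neg
```

Restricting the search to the mini-batch keeps mining tractable while still surfacing useful hard examples.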
13. Semi-hard samples
• Selecting the hardest negatives can in
practice lead to bad local minima early on
in training
• Instead, select semi-hard negatives:
negatives that are farther from the anchor
than the positive sample. In other words,
the negative cannot look more similar to
the anchor than the positive does
• Compared with the full triplet constraint,
these negatives are still hard, because the
semi-hard condition drops the margin alpha,
so they may still lie inside the margin
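The semi-hard condition, with the margin dropped, can be sketched as a filter over candidate negatives (names are illustrative):

```python
import numpy as np

def semi_hard_negatives(anchor, positive, negatives):
    """Indices of negatives farther from the anchor than the positive.
    The margin alpha is dropped from the condition, so these negatives
    may still lie inside the margin and yield a positive loss."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((negatives - anchor) ** 2, axis=1)
    return np.where(neg_dist > pos_dist)[0]
```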
14. Settings
• Batch size: around 1800 samples
• Optimizer: SGD or AdaGrad
• Hardware: CPU cluster, for 1000~2000 hours
• Learning rate: starts at 0.05, then decreased
• Activation: ReLU
• Evaluation datasets: Labeled Faces in the Wild, YouTube Faces
• Training data: 100M~200M images of 8M different IDs
15. Performance of different
CNN models
NNS2 is the smallest model the paper tests. It is capable of
running on a mobile phone, and its performance is still reasonable
19. Concerns
• Sampling good triplets can be hard
• Huge batch size
• Sensitive to mislabeled and low-quality images (needs
good-quality image data)
• Might need ~100M training images
• Multiple experiments needed to decide the embedding size
• Long training time
20. Opportunities
• Possible to train a small CNN that still achieves a good
embedding
• Could be applied for other datasets
• One embedding can be used for multiple
different tasks