FaceNet:
A Unified Embedding for Face Recognition and Clustering
Florian Schroff
Dmitry Kalenichenko
James Philbin
Google [2015]
[0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.13726683, -0.05780432, -0.00541799, 0.01251621, -0.08900651, -0.15897971, -0.07564467, 0.16574059, …]
EMBEDDING
MOTIVATION
PROBLEMS WITH OLD APPROACHES
Is this Tess? This is Anders and Tess? Who diz?
VERIFICATION IDENTIFICATION CLUSTERING
TIM TESS TEMPEST YANA
TESS UNKNOWN?
SIMON CHRIS
We have lots and lots of people…
We need lots and lots of selfies…
FRANCOIS
Let’s not hire him; I don’t want to retrain the network.
TIM TESS TEMPEST YANA SIMON CHRIS
LET’S TRAIN A SIMILARITY FUNCTION INSTEAD
EMBEDDINGS
APRICOT PEACH BEACH
MOVIE RECOMMENDATION
Axes: Adult/Child, Blockbuster/Arthouse
* (-1, -0.95)
* (0.65, 0.2)
DISTANCE
Embedding on the d-dimensional hypersphere
$\lVert f(X) \rVert_2^2 = 1$
$d(P,Q) = \lVert f(P) - f(Q) \rVert_2^2$ (squared $L_2$ norm)
0 = Identical
4 = As different as can be
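The distance above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the helper names `l2_normalize` and `squared_l2` are made up for this example, and the two 2-D points from the movie recommendation slide stand in for real face embeddings.

```python
import math

def l2_normalize(v):
    """Scale v to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(p, q):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Toy 2-D "embeddings" (assumption: reused from the movie example).
p = l2_normalize([-1.0, -0.95])
q = l2_normalize([0.65, 0.2])
opposite = [-x for x in p]  # the point diametrically opposite to p

print(squared_l2(p, p))         # identical embeddings give 0.0
print(squared_l2(p, q))         # somewhere between 0 and 4
print(squared_l2(p, opposite))  # approximately 4, the maximum on the unit sphere
```

Because every embedding has unit length, the squared distance is bounded by 4, which is why a single fixed threshold can be meaningful across all pairs.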
USING EMBEDDINGS TO SOLVE FACE TASKS
Is this Tess?
VERIFICATION
Verification = distance threshold
$d(\text{Tess}, \text{Test}) \le \tau$
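As a rough sketch, verification reduces to one comparison. The function name `is_same_person`, the toy 2-D embeddings, and the threshold value `tau=1.0` are assumptions for illustration; in practice the threshold would be tuned on held-out pairs.

```python
def squared_l2(p, q):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def is_same_person(emb_known, emb_test, tau=1.0):
    """Accept the pair when the embedding distance is at most tau.

    tau=1.0 is a placeholder value, not a recommended setting.
    """
    return squared_l2(emb_known, emb_test) <= tau

# Toy unit-length embeddings (hypothetical values).
tess = [1.0, 0.0]
test_close = [0.96, 0.28]   # distance about 0.08
test_far = [-1.0, 0.0]      # distance 4.0

print(is_same_person(tess, test_close))  # True
print(is_same_person(tess, test_far))    # False
```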
This is <unknown> and Tess?
IDENTIFICATION
Identification = search for the lowest distance, or k-NN/SVM classification
Example distances: 0.3, 1.2, 1.4, 2.3, 1.7, 1.1, 1.6, 0.8
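The lowest-distance search could look roughly like this. It is a sketch under assumptions: `identify` is a hypothetical helper, the gallery embeddings are toy 2-D values, and returning `None` above the threshold `tau` is one possible way to model the "unknown" case.

```python
def squared_l2(p, q):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def identify(query, gallery, tau):
    """Return (label, distance) of the nearest gallery embedding.

    The label is None when even the best match is farther than tau,
    i.e. the face is treated as unknown.
    """
    best_label, best_dist = None, float("inf")
    for label, emb in gallery.items():
        d = squared_l2(query, emb)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > tau:
        return None, best_dist
    return best_label, best_dist

# Hypothetical gallery with one unit-length embedding per person.
gallery = {"Tess": [1.0, 0.0], "Simon": [0.0, 1.0]}

print(identify([0.96, 0.28], gallery, tau=1.0))  # nearest match is "Tess"
print(identify([-1.0, 0.0], gallery, tau=1.0))   # no match within tau: label is None
```

Adding a new person only means adding one entry to `gallery`; nothing has to be retrained.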
Who diz?
CLUSTERING
DEEPFACE
DeepFace by Facebook
Output: 1 (same) or 0 (different)
FACENET
TRIPLET LOSS AND TRIPLET SELECTION
TRIPLET LOSS
ANCHOR (A) POSITIVE (P) ANCHOR (A) NEGATIVE (N)
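$d(A,P) \le d(A,N)$
$d(A,P) - d(A,N) \le 0$
$\lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 \le 0$
$\lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 + \alpha \le 0$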
0.3 0.32
COST FUNCTION
$\lVert f(a) - f(p) \rVert_2^2 - \lVert f(a) - f(n) \rVert_2^2 + \alpha \le 0$
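One common way to turn this constraint into a per-triplet cost is a hinge: the loss is zero when the constraint holds and grows with the size of the violation. The sketch below assumes that form; the function names and the margin `alpha=0.2` are illustrative choices, and reading the 0.3 and 0.32 from the triplet loss slide as d(A,P) and d(A,N) is also an assumption.

```python
def squared_l2(p, q):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def triplet_loss_from_distances(d_ap, d_an, alpha):
    """Hinge on the constraint d_ap - d_an + alpha <= 0."""
    return max(d_ap - d_an + alpha, 0.0)

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) embedding triple."""
    return triplet_loss_from_distances(
        squared_l2(f_a, f_p), squared_l2(f_a, f_n), alpha
    )

# d(A,P) = 0.3 and d(A,N) = 0.32: the ordering is right, but the gap is
# smaller than the margin, so the triplet still contributes a loss.
print(triplet_loss_from_distances(0.3, 0.32, alpha=0.2))  # about 0.18

# A well-separated triplet satisfies the constraint and costs nothing.
print(triplet_loss_from_distances(0.3, 1.0, alpha=0.2))   # 0.0
```

The first case shows why the margin matters: without it, a negative that is only marginally farther away than the positive would count as solved.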
FINDING THE RIGHT TRIPLETS
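We need triplets that violate the constraint to ensure fast convergence
$\lVert f(a) - f(p) \rVert_2^2 - \lVert f(a) - f(n) \rVert_2^2 + \alpha \le 0$
Hard positive: $\operatorname{argmax}_{p} \lVert f(a) - f(p) \rVert_2^2$
Hard negative: $\operatorname{argmin}_{n} \lVert f(a) - f(n) \rVert_2^2$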
Triplet selection options
A) Generate triplets offline every N steps from a subset of data
B) Generate triplets online from a minibatch
The paper uses option B, with large minibatches of thousands of examples
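Option B can be sketched as a search inside one minibatch: for each anchor, pick the farthest same-identity example and the nearest different-identity example. This follows the argmax/argmin rule stated above and is not a reproduction of the paper's training code; `mine_hard_triplets` and the toy data are hypothetical.

```python
def squared_l2(p, q):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mine_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triples for one minibatch.

    Anchors without a positive or without a negative in the batch are skipped.
    """
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        pos = [i for i, lab in enumerate(labels) if lab == lab_a and i != a]
        neg = [i for i, lab in enumerate(labels) if lab != lab_a]
        if not pos or not neg:
            continue
        hard_p = max(pos, key=lambda i: squared_l2(emb_a, embeddings[i]))
        hard_n = min(neg, key=lambda i: squared_l2(emb_a, embeddings[i]))
        triplets.append((a, hard_p, hard_n))
    return triplets

# Toy minibatch: three examples of identity "x", two of identity "y".
embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
labels = ["x", "x", "x", "y", "y"]

for triplet in mine_hard_triplets(embeddings, labels):
    print(triplet)
```

The brute-force search is quadratic in the batch size, which is acceptable for a sketch but would normally be done with a vectorized pairwise distance matrix.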
CNN ARCHITECTURE
OpenFace – Open Source FaceNet implementation
GoogLeNet Inception Network
DATASETS
PREPARATION AND RESULTS
Labeled Faces in the Wild (LFW)
http://vis-www.cs.umass.edu/lfw/
13,233 images
5,749 people
1,680 people with two or more images
77.5% male
83.5% white (Han & Jain, 2014)
BEYOND THE PAPER
https://www.researchgate.net/publication/317649493_Analysis_and_Automation_of_Deep_Face_Recognition
DATA PREPARATION WITH DLIB

Facenet - Paper Review

Editor's Notes

  • #8 Classification with very few images. New people are added all the time and we don’t want to retrain. One-shot learning. Common strategy for face verification, face identification, and face clustering.
  • #11 Classification with very few images. New people are added all the time and we don’t want to retrain. One-shot learning. Common strategy for face verification, face identification, and face clustering.
  • #26 Pairs of inputs with 1 or 0 as output: try to minimize the distance for 1s (same) and maximize the distance for 0s (different). Assumes it will generalize. 1000s of features in the embedding.