Face Recognition Based
on Deep Learning
recognizzit
www.recognizz.it
Face recognition approaches
Verification
Identification
Similarity
Attributes
Benchmarks
ORL
FERET
Labeled Faces in the Wild (LFW)
YouTube Faces (YTF)
ORL
Images taken between
April 1992 and April 1994
at the lab
There are 10 different
images of each of 40
distinct subjects
The size of each image
is 92x112 pixels, with 256
grey levels per pixel
Size ~10 MB
* http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
FERET
Images taken in 2 steps:
1993-1997
1998-2001
1200 subjects,
14051 face images
Image sets: fa, fb, duplicate 1, fc, duplicate 2
Heads with views ranging from
frontal to left and right profiles
Different lighting
Size ~ 8GB
* http://www.itl.nist.gov/iad/humanid/feret/feret_master.html
Labeled Faces in the Wild
Faces collected
from the web
5 749 subjects (1680 with two or more images),
13 000 face images
Size ~200 MB
* http://vis-www.cs.umass.edu/lfw/
LFW unrestricted with labeled outside
data protocol
10 batches
Each batch:
300 matched
300 mismatched
Huang, G.B., Learned-Miller, E.: Labeled faces in the wild: Updates and new reporting procedures.
Technical Report UM-CS-2014-003, UMass Amherst (2014)
[Figure: cross-validation rounds 1 to 10, each split into a validation set and a training set]
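The reporting procedure above (10 batches, accuracy ± SE) can be sketched as follows. This is a minimal illustration, not the official LFW evaluation code: the function names, the fixed threshold grid, and the assumption that each pair already has a similarity score in [0, 1] are all hypothetical. In each round one batch is held out, the decision threshold is chosen on the other nine, and the standard error is the sample standard deviation of the ten accuracies divided by √10.

```python
import math

def accuracy(scores, labels, threshold):
    """Fraction of pairs classified correctly: score >= threshold means 'matched'."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

def best_threshold(scores, labels, steps=100):
    """Pick the grid threshold with the highest accuracy (assumes scores lie in [0, 1])."""
    best_t, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        t = i / steps
        acc = accuracy(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def ten_fold_accuracy(folds):
    """folds: list of (scores, labels), one entry per batch. Returns (mean, standard error)."""
    accs = []
    for i, (test_scores, test_labels) in enumerate(folds):
        # Training data for this round: every batch except the held-out one.
        train_scores = [s for j, (sc, _) in enumerate(folds) if j != i for s in sc]
        train_labels = [y for j, (_, lb) in enumerate(folds) if j != i for y in lb]
        t = best_threshold(train_scores, train_labels)
        accs.append(accuracy(test_scores, test_labels, t))
    n = len(accs)
    mean = sum(accs) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in accs) / (n - 1))
    return mean, std / math.sqrt(n)
```

With real data, each of the 10 entries in `folds` would hold 600 scores (300 matched, 300 mismatched) and their 0/1 labels.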
Face Recognition Algorithm
Face localization → Normalization → Feature extraction → Comparing (verification metric)
Output shown: FALSE
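The four stages above can be wired together roughly like this. It is only a sketch: `localize`, `normalize`, and `extract` are hypothetical callables standing in for whatever detector, aligner, and network are used, and cosine similarity with a fixed threshold is one possible verification metric (a later slide lists it for CASIA), not the only one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify(image_a, image_b, localize, normalize, extract, threshold=0.5):
    """Run both images through the pipeline and compare the resulting features.

    localize, normalize, extract: hypothetical stage functions supplied by the caller.
    threshold: assumed decision threshold; in practice it is tuned on training pairs.
    Returns True when the two faces are judged to be the same person.
    """
    feat_a = extract(normalize(localize(image_a)))
    feat_b = extract(normalize(localize(image_b)))
    return cosine_similarity(feat_a, feat_b) >= threshold
```

Keeping the stages as parameters makes it easy to swap one component (for example the alignment step) and measure its effect on verification accuracy.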
What came before deep convolutional networks (DCN)
Method                     Accuracy ± SE
combined Joint Bayesian    0.9242 ± 0.0108
Tom-vs-Pete + Attribute    0.9330 ± 0.0128
High-dim LBP               0.9517 ± 0.0113
TL Joint Bayesian          0.9633 ± 0.0108
Human, cropped             0.975
DeepFace
Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance
in face verification. In CVPR, 2014.
Training data
Dataset: Social Face
Classification (4030
subjects, ~4.4 million
face images)
Alignment:
2D
3D
Architecture
Input: RGB image 152x152
Output feature size: 4096
Parameters: ~ 120 million
Verification metric:
weighted chi-squared distance
siamese
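A minimal sketch of the weighted chi-squared distance named above, assuming the form χ²(f1, f2) = Σ w_i (f1_i − f2_i)² / (f1_i + f2_i) over non-negative, normalized features. The `weights` argument is a placeholder: in the DeepFace paper the weights are learned, which is not shown here, and the `eps` guard for empty bins is an added assumption.

```python
def weighted_chi_squared(f1, f2, weights, eps=1e-12):
    """Weighted chi-squared distance between two non-negative feature vectors.

    weights: per-dimension weights (placeholder values; learning them is out of scope).
    eps: dimensions where f1[i] + f2[i] <= eps are skipped to avoid division by zero.
    """
    total = 0.0
    for a, b, w in zip(f1, f2, weights):
        denom = a + b
        if denom > eps:
            total += w * (a - b) ** 2 / denom
    return total
```

A smaller distance means the two faces are more likely the same person; the decision threshold would be chosen on training pairs.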
Results on LFW
Method                     Accuracy ± SE
combined Joint Bayesian    0.9242 ± 0.0108
Tom-vs-Pete + Attribute    0.9330 ± 0.0128
High-dim LBP               0.9517 ± 0.0113
TL Joint Bayesian          0.9633 ± 0.0108
DeepFace-align2D           0.943 ± 0.0043
DeepFace-Siamese           0.9617 ± 0.0038
DeepFace-ensemble          0.9735 ± 0.0025
Human, cropped             0.975
DeepID
Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR, 2014.
Training data
Dataset: CelebFaces
(10 177 subjects,
202 599 face images)
Alignment:
2D
Patch
Architecture
Input: 39x31 RGB or grayscale
Output feature size: 160
Additional algorithms:
Joint Bayesian
PCA
Only 1 patch
Results on LFW
Method                      #Net   Accuracy ± SE
DeepFace-Siamese            1      0.9617 ± 0.0038
DeepFace-ensemble           7      0.9735 ± 0.0025
DeepID on CelebFaces        60     0.9605 ± …
DeepID on CelebFaces+       100    0.972 ± …
DeepID on CelebFaces+ TL    100    0.9745 ± 0.0026
Human, cropped                     0.975
CASIA
D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. Technical report, arXiv:1411.7923, 2014.
Training data
Dataset: CASIA WebFace
(10 575 subjects and
494 414 face images)
Alignment: 2D
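One common way to do 2D alignment is a similarity transform (rotation, uniform scale, translation) that moves detected eye centers to fixed positions in the output crop. The sketch below assumes exactly that. The two-eye landmark choice, the function names, and the target coordinates in the usage note are illustrative assumptions, not the procedure used in the papers cited here.

```python
import math

def eye_alignment_transform(left_eye, right_eye, target_left, target_right):
    """Similarity transform (c, s, tx, ty) mapping the detected eyes onto the targets.

    A point (x, y) maps to (c*x - s*y + tx, s*x + c*y + ty),
    where c = scale*cos(angle) and s = scale*sin(angle).
    """
    src_dx, src_dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    dst_dx, dst_dy = target_right[0] - target_left[0], target_right[1] - target_left[1]
    src_len = math.hypot(src_dx, src_dy)
    if src_len == 0.0:
        raise ValueError("eye positions must be distinct")
    scale = math.hypot(dst_dx, dst_dy) / src_len
    angle = math.atan2(dst_dy, dst_dx) - math.atan2(src_dy, src_dx)
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    # Translation chosen so that the left eye lands exactly on its target.
    tx = target_left[0] - (c * left_eye[0] - s * left_eye[1])
    ty = target_left[1] - (s * left_eye[0] + c * left_eye[1])
    return c, s, tx, ty

def apply_transform(point, transform):
    """Apply the similarity transform to a single (x, y) point."""
    c, s, tx, ty = transform
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)
```

For a 100x100 crop one might place the eyes at, say, (30, 40) and (70, 40); the same transform is then used to warp every pixel of the face image.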
Architecture
Input: gray image 100x100
Output feature vector size: 320
Parameters:~ 5 million
Verification metric: Cosine
Additional algorithms:
Joint Bayes
PCA
Combined identification
and verification loss
Very deep architecture
[Figure: network diagram. Conv11, Conv12, Pool1 (2x2+2(S)); Conv21, Conv22, Pool2 (2x2+2(S)); Conv31, Conv32, Pool3 (2x2+2(S)); Conv41, Conv42, Pool4 (2x2+2(S)); Conv51, Conv52, Pool5 (7x7+1(S)); Dropout 40%; Fc6. Cost1: Softmax (face labels). Cost2: Contrastive (verification labels [0,1]).]
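The combined identification and verification loss on this slide pairs a softmax cost on face labels with a contrastive cost on pair labels. A minimal sketch for one pair of images follows, assuming the standard margin-based contrastive form; `margin` and `weight` are hypothetical hyperparameters, and the exact formulation and weighting in the paper may differ.

```python
import math

def softmax_cross_entropy(logits, label):
    """Identification cost: negative log-probability of the true identity."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def contrastive_loss(feat_a, feat_b, same, margin=1.0):
    """Verification cost: pull same-identity features together and push
    different-identity features at least `margin` apart."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)))
    if same:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2

def joint_loss(logits_a, label_a, logits_b, label_b, feat_a, feat_b,
               weight=1.0, margin=1.0):
    """Softmax cost on both images plus a weighted contrastive cost on the pair."""
    identification = (softmax_cross_entropy(logits_a, label_a)
                      + softmax_cross_entropy(logits_b, label_b))
    verification = contrastive_loss(feat_a, feat_b, label_a == label_b, margin)
    return identification + weight * verification
```

The verification label is derived from the two face labels here, so no separate pair annotation is needed during training.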
Results on LFW
Method                                   #Net   Accuracy ± SE
DeepFace                                 7      0.9735 ± 0.0025
DeepID                                   100    0.9745 ± 0.0026
DR + Cosine                              1      0.9613 ± 0.003
DR + PCA on CASIA-WebFace + Cosine       1      0.963 ± 0.0035
DR + Joint Bayes on CASIA-WebFace        1      0.973 ± 0.0031
DR + PCA on LFW training set + Cosine    1      0.9633 ± 0.0042
DR + Joint Bayes on LFW training set     1      0.9773 ± 0.0031
* DR – CASIA-WebFace
CASIA replication
Training data    Accuracy on LFW
normalized       97.27%
no alignment     94%
our alignment    96.2%
Database size
Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, and Chang Huang (Baidu). Targeting Ultimate Accuracy: Face Recognition via Deep Embedding.
Identities    Faces    Error rate
1.5K          150K     3.1%
9K            450K     1.35%
18K           1.2M     0.87%
Database availability
Dataset          #Subjects    #Images      Availability
LFW              5 749        13 233       Public
WDRef            2 995        99 773       Public (feature only)
CelebFaces       10 177       202 599      Private
SFC              4 030        4 400 000    Private
CACD             2 000        163 446      Public (partial annotated)
CASIA-WebFace    10 575       494 414      Public
D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch.
Technical report, arXiv:1411.7923, 2014.
Recommendations
Large dataset with a large number of subjects
Careful face extraction and alignment
Deep architecture
Joint Identification-Verification
K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition”.
arXiv preprint arXiv:1409.1556, 2014
Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS, 2014.
THANK YOU
Yurii Pashchenko
george.pashchenko@gmail.com
www.recognizz.it

Face Recognition Based on Deep Learning (Yurii Pashchenko Technology Stream)
