
Lecture 10, Ming Yang: Face Recognition Systems


ICVSS 2014 slides



  1. (Title slide.)
  2. FaceRec at the intersection of Machine Learning, Biometric Systems, Computer Vision, and Image Processing.
  3. Facial recognition market forecast: http://www.marketsandmarkets.com/PressReleases/facial-recognition.asp , 5/2013
  4. • Security and access control – ATM, buildings, airports/border control, smartphones
     • Law enforcement and criminal justice system – mug-shot systems, post-event analysis, forensics, missing children
     • General identity verification (smart cards) – driver licenses, passports, voter registration, welfare fraud
     • Advanced video surveillance (CCTV control)
     • Entertainment and multimedia HCI
       – Smart TV, context-aware systems, VIP customers
       – Video indexing and celebrity search
       – Refining search-engine results
       – Photo tagging in social media
  5. • Face verification (1:1)
     • Face identification (1:N)
     • Face search/retrieval
     • Still image
     • Video sequence
     • 3D/depth sensing or infrared imagery
  6. • Facial key-point / landmark detection – face editing, automatic face beautification of selfie photos
     • Face pose estimation – Amazon Fire Phone, rendering 3D scenes w.r.t. the viewer's head pose
     • Face, eye, or gaze tracking – fatigue detection for drivers, HCI, sports video analysis
     • Facial attribute recognition – gender/age/ethnicity estimation, business intelligence
     • Facial expression recognition – sentiment/emotion analysis, perception of advertising
     • Face liveness detection
     • Face hallucination or super-resolution
     • Face synthesis (3D morphable models)
  7. • Faces belong to a single category – subtle intrapersonal and interpersonal variations
     • Intrinsic factors – aging, facial expressions, hair styles, accessories, makeup, etc.
     • Extrinsic factors – illumination, pose, resolution, scaling, noise, occlusion, etc.
     (Image: Boston Marathon bombing suspects in surveillance footage, 4/15/2013)
  8. property     | constrained   | unconstrained
     resolution   | ~2000x2000    | 50x50
     viewpoint    | fully frontal | rotated, loose
     illumination | controlled    | arbitrary
     occlusion    | disallowed    | allowed
  9. Detect → Align → Represent → Classify
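A minimal end-to-end sketch of these four stages in Python, assuming OpenCV and NumPy; the file name, the 64x64 crop size, the crude crop-and-resize alignment, and the cosine nearest-neighbor classifier are illustrative stand-ins, not the design of any particular system:

```python
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect(gray):                                    # Detect
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def align(gray, box, size=64):                       # Align (crude crop-and-resize)
    x, y, w, h = box
    return cv2.resize(gray[y:y + h, x:x + w], (size, size))

def represent(face):                                 # Represent (raw pixels, l2-normalized)
    v = face.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def classify(feature, gallery):                      # Classify (nearest neighbor, cosine)
    return max(gallery, key=lambda name: float(gallery[name] @ feature))

gray = cv2.imread("probe.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
gallery = {}  # name -> feature, built the same way from enrolled images
for box in detect(gray):
    feature = represent(align(gray, box))
    if gallery:
        print(classify(feature, gallery))
```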
  10. • W.W. Bledsoe, "Man-machine facial recognition", Tech. Report PRI 22, Panoramic Research Inc., 1966
      • T. Kanade, "Picture processing system by computer complex and recognition of human faces", Doctoral Dissertation, Kyoto University, 1973
      • Almost all successful algorithms in pattern recognition / machine learning / computer vision have been applied to FaceRec!
  11. Courtesy of A.K. Jain, Intl. Conf. on Biometrics 2013.
  12. • Feature-based structural matching approaches
        – Kanade, 1973: 16 geometrical facial parameters
        – Brunelli and Poggio, 1993: 35 geometric features
        – Wiskott et al., 1997: elastic bunch graph matching (EBGM)
        – …
        – Chen et al., Blessing of dimensionality, high-dim LBP, CVPR 2013
      • Appearance-based holistic approaches
        – Turk and Pentland, 1991: Eigenfaces (PCA)
        – Belhumeur et al., 1997: Fisherfaces (LDA)
        – He et al., 2003: Laplacianfaces (LPP)
        – Wright et al., 2009: sparse representation (SRC)
        – …
        – Taigman and Yang et al., DeepFace, CVPR 2014
      • Hybrid approaches
  13. Face: key-points
      • 35 geometric features (Brunelli and Poggio)
      • A graph of key-points with a bunch of Gabor features (EBGM)
      Face recognition by elastic bunch graph matching, Wiskott, et al., TPAMI 1997
      Face recognition: features versus templates, Brunelli and Poggio, TPAMI 1993
      Feature-based face recognition using mixture-distance, Cox, et al., CVPR 1996
  14. • LBP: 59 uniform patterns for the (8,1) circular neighborhood
      • Face: spatially enhanced histogram of LBP descriptors
      Face recognition with local binary patterns, Ahonen, et al., ECCV 2004
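A sketch of the spatially enhanced LBP histogram, assuming scikit-image is available; the 7x7 grid is an illustrative choice (the paper experimented with several grid sizes). The 'nri_uniform' mapping yields exactly the 59 uniform patterns for an (8,1) neighborhood mentioned above:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_face_descriptor(gray_face, grid=(7, 7)):
    """Spatially enhanced LBP histogram: LBP codes per pixel, one
    59-bin histogram per grid cell, all histograms concatenated."""
    codes = local_binary_pattern(gray_face, P=8, R=1, method="nri_uniform")
    h, w = gray_face.shape
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = codes[i * h // grid[0]:(i + 1) * h // grid[0],
                         j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(cell, bins=59, range=(0, 59), density=True)
            hists.append(hist)
    return np.concatenate(hists)
```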
  15. Face: a 2D array of intensities → projections on Eigenfaces, or projections on Fisherfaces (LDA on scatter matrices).
      "the variations between the images of the same face due to lighting are almost always larger than image variations due to a change in face identity"
      Face recognition using Eigenfaces, Turk and Pentland, CVPR 1991
      Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, Belhumeur, Hespanha, Kriegman, TPAMI 1997
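A minimal Eigenfaces sketch with NumPy, assuming aligned faces flattened into rows; recognition then reduces to nearest neighbor among projected gallery faces:

```python
import numpy as np

def eigenfaces(train, k=50):
    """Eigenfaces (PCA on vectorized faces), a minimal sketch.

    train: (n_samples, n_pixels) flattened, aligned gray faces.
    Returns the mean face and the top-k principal components.
    """
    mean = train.mean(axis=0)
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, Vt[:k]                      # rows of Vt are the eigenfaces

def project(face, mean, components):
    return components @ (face - mean)        # coefficients in eigenface space

# Recognition: nearest neighbor between probe and gallery projections.
```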
  16. • Isometric Feature Mapping (ISOMAP)
      • Locally Linear Embedding (LLE)
      • Locality Preserving Projection (LPP)
      • Other metric learning methods …
      Face: projections on Laplacianfaces, a linear embedding preserving the local manifold structure of an adjacency graph; the Laplacian L = D - S of the nearest-neighbor graph yields the low-dimensional embedding.
      Face recognition using Laplacianfaces, He, Yan, Hu, Niyogi, Zhang, TPAMI 2005
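A small dense LPP sketch with NumPy/SciPy/scikit-learn; in practice a PCA step usually precedes this to keep the generalized eigenproblem well conditioned, and the neighbor count and ridge term below are illustrative:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=30, n_neighbors=5):
    """Locality Preserving Projection, a minimal sketch.

    X: (n_samples, n_features) faces. Builds the nearest-neighbor
    adjacency S, forms L = D - S as on the slide, and solves
    X^T L X a = lambda X^T D X a for the smallest eigenvalues.
    """
    S = kneighbors_graph(X, n_neighbors, mode="connectivity").toarray()
    S = np.maximum(S, S.T)                       # symmetrize the graph
    D = np.diag(S.sum(axis=1))
    L = D - S                                    # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
    _, vecs = eigh(A, B)                         # ascending eigenvalues
    return vecs[:, :n_components]                # embed with X @ lpp(X)
```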
  17. Face: a 2D array of intensities → a sparse representation in a sufficiently large feature space (Eigenfaces, Laplacianfaces, downsampled images, random projections).
      Robust face recognition via sparse representation, Wright, et al., TPAMI 2009
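A sketch of SRC using scikit-learn's Lasso as a stand-in for the paper's l1-minimization; alpha and the dictionary layout are illustrative choices. The probe is classified by the class whose training faces reconstruct it with the smallest residual:

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(gallery, labels, probe, alpha=0.01):
    """Sparse representation-based classification, a sketch.

    gallery: (n_features, n_atoms) dictionary whose columns are
    l2-normalized training faces; labels: class of each column;
    probe: (n_features,) test face.
    """
    coef = Lasso(alpha=alpha, fit_intercept=False,
                 max_iter=5000).fit(gallery, probe).coef_
    labels = np.asarray(labels)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        part = np.where(labels == c, coef, 0.0)       # keep class-c coefficients
        res = np.linalg.norm(probe - gallery @ part)  # class-wise residual
        if res < best_res:
            best, best_res = c, res
    return best
```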
  18. • Integral images
      • Fast Haar features
      • Cascaded boosting classifier
      • Bootstrapping hard negatives
      • Sliding-window search on an image pyramid
      • Non-maximum suppression
      Rapid object detection using a boosted cascade of simple features, Viola & Jones, CVPR 2001
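The O(1) rectangle-sum trick behind fast Haar features, as a NumPy sketch; the two-rectangle feature shown is just one member of the feature pool:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row/column:
    ii[y, x] = sum of img[:y, :x]."""
    return np.pad(img.astype(np.int64).cumsum(0).cumsum(1), ((1, 0), (1, 0)))

def rect_sum(ii, y, x, h, w):
    # Any rectangle sum in O(1) via four corner lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, y, x, h, w):
    """One two-rectangle Haar feature: top half minus bottom half."""
    return (rect_sum(ii, y, x, h // 2, w)
            - rect_sum(ii, y + h // 2, x, h - h // 2, w))
```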
  19. • In-plane/out-of-plane rotations (roll, yaw, pitch)
      • Width-first-search tree: multi-class vector boosting, rejecting non-faces at each node
      High-performance rotation invariant multiview face detection, Huang, et al., PAMI 2007
  20. • Pixel features in a granular space
      • Weak classifier: stump function / piece-wise function
      • 30K frontal, 25K half-profile, 20K full-profile faces
      High-performance rotation invariant multiview face detection, Huang, et al., PAMI 2007
  21. • Define and label facial landmarks
      • Build landmark detectors – templates, SVM/regression
      • Constrain shape variations – point distribution models over all landmarks
      • EM-like algorithm
        – M-step: find the optimal location for each landmark individually
        – E-step: smooth the shape using the point distribution model
      Active shape models: their training and application, Cootes, et al., CVIU 1995
      Active appearance models, Cootes, et al., ECCV 1998
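A sketch of the point distribution model and the "smooth the shape" step, assuming the landmark shapes are already aligned (e.g. by Procrustes analysis); the 10 modes and the 3-sigma limit are the usual illustrative choices:

```python
import numpy as np

def fit_pdm(shapes):
    """Point distribution model: PCA over aligned landmark shapes,
    shapes: (n_shapes, 2 * n_landmarks)."""
    mean = shapes.mean(axis=0)
    _, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var = s ** 2 / (len(shapes) - 1)       # variance of each shape mode
    return mean, Vt, var

def constrain(shape, mean, Vt, var, k=10, limit=3.0):
    """Project a candidate shape onto the first k modes and clamp each
    coefficient to +/- limit * sigma, keeping the shape plausible."""
    b = Vt[:k] @ (shape - mean)
    b = np.clip(b, -limit * np.sqrt(var[:k]), limit * np.sqrt(var[:k]))
    return mean + Vt[:k].T @ b
```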
  22. • Efficient joint optimization
      • Efficient landmark detectors / regressors
      • New models of local appearance and global shape
      Face alignment at 3000 FPS via regressing local binary features, Ren, Cao, Wei, Sun, CVPR 2014
      Face alignment through subspace constrained mean-shifts, Saragih, et al., ICCV 2009
      Face alignment by explicit shape regression, Cao, et al., CVPR 2012
      Face detection, pose estimation, and landmark localization in the wild, Zhu and Ramanan, CVPR 2012
      Deep convolutional network cascade for facial point detection, Sun, et al., CVPR 2013
      Detecting and aligning faces by image retrieval, Shen, et al., CVPR 2013
      Head pose estimation in computer vision: A survey, Murphy-Chutorian and Trivedi, TPAMI 2009
  23. • CMU Multi-PIE
        – 750,000+ faces of 337 individuals
        – 9 view points, 19 lighting conditions
        – Facial expressions in 4 sessions
      • Extended Yale B dataset
        – 21,800+ faces of 38 individuals
        – 9 poses, 64 lighting conditions
      • CAS-PEAL face dataset
        – 30,900 faces of 1,040 individuals
        – Pose, lighting, accessories, etc.
      • MORPH dataset
        – 55,000 faces of 13,000 individuals
        – Mug-shot images, biased distribution
  24. Organized by the National Institute of Standards and Technology (NIST), USA
      • FERET (Face Recognition Technology), 1993-1997
        – 316 individuals (1993), growing to 14,051 images of 1,199 individuals (2000)
      • FRVT (Face Recognition Vendor Test) 2000, 2002, 2006
      • FRVT 2010 (still-image track of MBE 2010)
      • FRVT 2013 (completed 5/2014)
        – 1.6 million faces
        – frontal faces with ambient lighting
  25. J. Phillips, FRVT 2010 Report by NIST
      • FRR = 0.3% at FAR = 0.1%; the error rate drops by 3 orders of magnitude in 20 years!
      • Test 1: 9,240 true-mates vs. 10K imposters
      • Test 2: 12x3000x2 = 72K genuine scores vs. 18M imposter scores
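Operating points like FRR at FAR = 0.1% come from thresholding the genuine and imposter score distributions; a minimal sketch, assuming higher score means more similar:

```python
import numpy as np

def frr_at_far(genuine, imposter, target_far=0.001):
    """FRR at a fixed FAR from verification scores, a sketch."""
    genuine, imposter = np.asarray(genuine), np.asarray(imposter)
    thr = np.quantile(imposter, 1.0 - target_far)  # passes ~target_far of imposters
    far = np.mean(imposter >= thr)                 # realized false accept rate
    frr = np.mean(genuine < thr)                   # genuine pairs rejected
    return frr, far, thr
```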
  26. FRVT 2013 Report from NIST
      • A large gallery: 1.6 million faces
      • Probe set: 171K mug-shots and 10.7K webcam images
      • Evaluation metric: rank-1 miss rate
      Vendor    | Mug-shot | Webcam
      NEC       | 4.1%     | 11.3%
      Morpho    | 9.1%     | 36.4%
      Toshiba   | 10.7%    | 23.7%
      Cognitec  | 13.6%    | 57.6%
      3M        | 17.2%    | 36.4%
      Neurotech | 20.5%    | 66.9%
  27. FRVT 2013 Report from NIST
      • A large gallery: 1.6 million faces
      • Probe set: 40K mated and 40K non-mated searches
      • Evaluation metrics: rank-1 miss rate; FNIR (false negative identification rate) at FPIR (false positive identification rate) = 0.002
      Vendor    | FRVT 2013 | FRVT 2010 | FNIR (FPIR = 0.2%)
      NEC       | 6.4%      | 8.9%      | 10.8%
      Morpho    | 12.1%     | 13.5%     | 19.4%
      Cognitec  | 17.0%     | 18.7%     | 34.2%
      Neurotech | 23.1%     | 25.8%     | 68.4%
  28. • Gallery images: 1 million mug-shots + 6 web images
      • Probe images: 5 faces
      • Rank of the correct match, with and without demographic filtering
      A case study of automated face recognition: the Boston Marathon bombing suspects, J. C. Klontz and A. K. Jain, IEEE Computer, 2013
  29. • What is the state-of-the-art TPR (true positive rate) at FAR (false alarm rate) = 0.1% for constrained face verification in FRVT 2010?
        ( a ) 99.7%  ( b ) 97.5%  ( c ) 95.9%
      • What is the state-of-the-art rank-1 accuracy for probe-gallery search among 1.6 million faces for constrained face identification in FRVT 2013?
        ( a ) 99.7%  ( b ) 97.5%  ( c ) 95.9%
  30. CVPR 2014
  31. No automatic face recognition service in EU countries
  32. (Recap of slide 8.)
      property     | constrained   | unconstrained
      resolution   | ~2000x2000    | 50x50
      viewpoint    | fully frontal | rotated, loose
      illumination | controlled    | arbitrary
      occlusion    | disallowed    | allowed
  33. • Data collection
        – 13,233 web photos of 5,749 celebrities
        – 6,000 face pairs in 10 splits
      • Metric: mean recognition accuracy over 10 folds
        – Restricted protocol: only same/not-same labels
        – Unrestricted protocol: face identities, additional training pairs
        – Unsupervised setting: no training whatsoever on LFW images
      Labeled faces in the wild: A database for studying face recognition in unconstrained environments, Huang, Ramesh, Berg, Learned-Miller, ECCVW 2008
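A sketch of the 10-fold LFW protocol given precomputed pair distances; tuning the distance threshold on the nine training folds is one common choice under the restricted protocol:

```python
import numpy as np

def lfw_mean_accuracy(distances, labels, n_folds=10):
    """Mean verification accuracy over the 10 LFW folds, a sketch.

    distances: (6000,) pair distances laid out fold by fold;
    labels: 1 = same person, 0 = not same.
    """
    distances, labels = np.asarray(distances), np.asarray(labels)
    folds = np.array_split(np.arange(len(distances)), n_folds)
    accs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # pick the threshold maximizing accuracy on the training folds
        thresholds = np.unique(distances[train])
        best = max(thresholds, key=lambda t: np.mean(
            (distances[train] < t) == labels[train]))
        accs.append(np.mean((distances[test] < best) == labels[test]))
    return float(np.mean(accs)), float(np.std(accs))
```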
  34. • User study on Mechanical Turk
        – 10 different workers per face pair
        – Average human performance: 99.20% on original images, 97.53% on tight crops, 94.27% on inverse crops
      Attribute and simile classifiers for face verification, Kumar, et al., ICCV 2009
  36. (Chart: LFW verification accuracy by year: 60.02%, 73.93%, 78.47%, 85.54%, 88.00%, 92.58%, 95.17%, 96.33%, vs. human 97.53%; reduction of error w.r.t. human per year: 37.08%, 19.24%, 37.09%, 20.52%, 48.06%, 52.32%, 49.15%.)
  37. • Accurate (27) dense facial landmarks
      • Concatenate multi-scale descriptors – ~100K-dim LBP, SIFT, Gabor, etc.
      • Transfer learning: Joint Bayesian – likelihood ratio test; EM updates of the between-/within-class covariances
      • WDRef dataset – 99,773 images of 2,995 individuals – 95.17% => 96.33% on LFW (unrestricted protocol)
      Face alignment by explicit shape regression, Cao, et al., CVPR 2012
      Bayesian face revisited: A joint formulation, Chen, et al., ECCV 2012
      Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, Chen, et al., CVPR 2013
      A practical transfer learning algorithm for face verification, Cao, et al., ICCV 2013
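The Joint Bayesian score is a log-likelihood ratio between "same person" and "different person" Gaussian models. A direct block-covariance sketch (the paper derives an equivalent closed form; S_mu and S_eps are assumed to have been fit by the EM updates mentioned above):

```python
import numpy as np

def joint_bayesian_score(x1, x2, S_mu, S_eps):
    """Joint Bayesian verification score, a sketch.

    Model: x = mu + eps with mu ~ N(0, S_mu) (identity) and
    eps ~ N(0, S_eps) (intra-personal variation). Returns
    log P(x1, x2 | same) - log P(x1, x2 | different).
    """
    d = len(x1)
    C_same = np.block([[S_mu + S_eps, S_mu],
                       [S_mu,         S_mu + S_eps]])
    C_diff = np.block([[S_mu + S_eps, np.zeros((d, d))],
                       [np.zeros((d, d)), S_mu + S_eps]])
    x = np.concatenate([x1, x2])
    score = -0.5 * x @ (np.linalg.inv(C_same) - np.linalg.inv(C_diff)) @ x
    score -= 0.5 * (np.linalg.slogdet(C_same)[1] - np.linalg.slogdet(C_diff)[1])
    return score  # threshold this for a same/not-same decision
```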
  38. • 12x5 Siamese ConvNets x8 + RBM classification
      • 12 face regions, 8 pairs of inputs
      Hybrid deep learning for computing face similarities, Sun, Wang, Tang, ICCV 2013
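A toy-scale PyTorch sketch of one Siamese branch with shared weights; the layer sizes and the 31x31 grayscale input are illustrative assumptions, and the paper's 12-region x 5-scale ensemble and RBM fusion are omitted:

```python
import torch
import torch.nn as nn

class SiameseBranch(nn.Module):
    """One shared-weight Siamese branch for a pair of face crops."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 20, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 40, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(40 * 4 * 4, 80))  # sized for 31x31 input

    def forward(self, a, b):
        # The same weights embed both faces of a pair (Siamese sharing);
        # the elementwise difference feeds a downstream classifier
        # (an RBM in the paper, omitted here).
        return torch.abs(self.net(a) - self.net(b))

pair_feat = SiameseBranch()(torch.randn(4, 1, 31, 31), torch.randn(4, 1, 31, 31))
```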
  39. Detect → Align → Represent → Classify
  40. (Figure, DeepFace architecture: localization front-end, ConvNet, local (untied) convolutions, globally connected layers.)
  41. (Figure: Siamese pair of DeepFace replicas.)
  42. (Chart: LFW verification accuracy: 2010 88.00%, 2011 92.58%, 2012 95.17%, 2013 96.33%, DeepFace 97.35%, human 97.53%, new* 98.4%; reduction of error w.r.t. human per year: 20.52%, 48.06%, 52.32%, 49.15%, 85.00%.)
  43. • Data collection
        – 3,425 YouTube videos of 1,595 celebrities (a subset of LFW subjects)
        – 5,000 video pairs in 10 splits
        – Detected and roughly aligned face frames available
      • Metric: mean recognition accuracy over 10 folds
        – Restricted protocol: only same/not-same labels
        – Unrestricted protocol: face identities, additional training pairs
      Face recognition in unconstrained videos with matched background similarity, Wolf, Hassner, Maoz, ICCV 2011
  44. (Charts: LFW accuracy (%) by alignment: No Alignment 87.9, 3D Perturbation 93.7, 2D Alignment 94.3, 3D Alignment 97.35, 3D Alignment + LBP 91.3; float vs. binarized descriptors: 4096-d 97 / 4096 bits 96.07, 1024-d 96.72 / 1024 bits 95.53, 256-d 97.17 / 256 bits 95.87; plus an ROC curve.)
  45. (Charts: DNN test error (%) vs. training-set size: 100% of the data 8.74, 50% 10.9, 20% 15.1, 10% 20.7; and vs. network depth: C1+M2+C3+L4+L5+L6+F7 8.74, -C3 11.2, -L4 -L5 12.6, -C3 -L4 -L5 13.5.)
  46. • Naïve binarization: verification accuracy (%) on LFW (restricted protocol)
      dim    | 4096  | 1024  | 512   | 256   | 128   | 64    | 32    | 16    | 8
      float  | 97    | 96.72 | 96.78 | 97.17 | 96.42 | 96.1  | 94.5  | 92.75 | 89.8
      binary | 96.07 | 95.53 | 95.5  | 95.87 | 93.38 | 91.45 | 87.15 | n/a   | n/a
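A sketch of naive binarization plus Hamming-distance matching; the per-dimension thresholding rule is an assumption, since the slide only labels the operation "naive":

```python
import numpy as np

def binarize(desc, thr=0.0):
    # Keep one bit per dimension; thr is an assumed rule, not the
    # paper's exact binarization scheme.
    return (np.asarray(desc) > thr).astype(np.uint8)

def hamming(b1, b2):
    return int(np.count_nonzero(b1 != b2))  # distance between bit vectors

# Usage: compare two 4096-d descriptors via their 4096-bit signatures.
d1, d2 = np.random.rand(4096), np.random.rand(4096)
print(hamming(binarize(d1, 0.5), binarize(d2, 0.5)))
```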
  47. • All false negatives on LFW (1%)
  48. • All false positives on LFW (0.65%)
  49. • Sample false negatives on YTF
  50. • Sample false positives on YTF
  51. • Coupling 3D alignment with large-capacity locally connected networks
      • On the brink of human-level performance for face verification
  52. (Chart: accuracy / year, y-axis from 50% to 100%.)
  53. Baseline | Rank-1 rate (%) | Rank-1 rate (%) @ 1% false alarm rate | Verification (%)
      [1]      | 25              | 56.7                                  | NA
      [2]      | 44.5            | 64.9                                  | 97.35
      [3]      | 61.9            | 82.5                                  | 98.4
  54. • What is the state-of-the-art rank-1 accuracy searching 3K probe faces against a 4K-face gallery on the unconstrained LFW dataset?
        ( a ) ~80%  ( b ) ~60%  ( c ) ~40%
  55. • Questions • Comments • Suggestions
      • We are recruiting! https://www.facebook.com/careers/
      • Locations: MPK/Seattle/NYC/London/Dublin
