Face Recognition
&
Deep Learning
sanparith.marukatat@nectec.or.th
Standard procedure
• Image capturing: camera, webcam, surveillance
• Face detection: locate faces in the image
• Face alignment: normalize size, rectify rotation
• Face matching
• 1:1 Face verification
• 1:N Face recognition
Viola-Jones Haar-like detector

(OpenCV haarcascade_frontalface_alt2.xml)
[Detection example: detected face sizes ~35x35 to 80x80 pixels; misses occur when a face is too small, occluded, or rotated]
Recognition = compare these faces to known faces
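As a rough illustration of the detection step, here is a minimal sketch using OpenCV's Python bindings and the cascade file named above. The input filename and the detector parameters are illustrative assumptions, not values from the talk.

```python
import cv2

# Minimal Viola-Jones detection sketch; "group.jpg" is a hypothetical input.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_alt2.xml")

img = cv2.imread("group.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# minSize/maxSize roughly match the 35x35 to 80x80 range mentioned above
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(35, 35), maxSize=(80, 80))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```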
Controlled environment
• Detected face: 218x218 pixels
• Viola-Jones eye detector: eye distance = 81 pixels, eye angle = -0.7 degrees
• After alignment: face size = 180x200 pixels, eye distance = 100 pixels, eye angle = 0 degrees
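A minimal alignment sketch, assuming the eye centers have already been located (e.g. by the Viola-Jones eye detector). The helper name align_face and the rotation-about-the-eye-midpoint choice are ours; a real pipeline would also translate the eyes to fixed canonical positions.

```python
import numpy as np
import cv2

def align_face(gray, left_eye, right_eye,
               target_dist=100.0, out_size=(180, 200)):
    """Rotate and scale so the eyes are horizontal and target_dist apart.
    left_eye/right_eye are (x, y) pixel coordinates."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))      # e.g. -0.7 degrees
    dist = np.hypot(dx, dy)                     # e.g. 81 pixels
    scale = target_dist / dist                  # bring eyes 100 px apart
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    return cv2.warpAffine(gray, M, out_size)    # 180x200 output
```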
Comparing faces
• Face image
• Bitmap of size 180x200 pixels
• Grayscale (0-255)
• 36,000 values/face image
• Given 2 face images x1 and x2, candidate pixel-wise differences:
• x1(x,y) - x2(x,y)
• | x1(x,y) - x2(x,y) |
• (x1(x,y) - x2(x,y))²
• What should be used?
Basic Maths
• 1 Face image = 1 vector
• 36,000 dimensions (d)
• matrix with 1 column
• Distance
• Euclidean distance
• Norm-p distance
• Norm-1 distance
• Norm-infinity distance
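These distances are one-liners in numpy; a short sketch with random stand-in vectors in place of real face images:

```python
import numpy as np

# x1, x2: two face images flattened to d = 36,000-dim vectors
x1 = np.random.rand(36000)   # stand-ins for real face vectors
x2 = np.random.rand(36000)

d1   = np.sum(np.abs(x1 - x2))        # norm-1: sum of absolute differences
d2   = np.sqrt(np.sum((x1 - x2)**2))  # Euclidean (norm-2) distance
dinf = np.max(np.abs(x1 - x2))        # norm-infinity: largest single difference
# General norm-p: np.linalg.norm(x1 - x2, ord=p)
```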
Pixel importance and projection
• Not all pixels have the same importance
• Pixel with low variation -> not important
• Pixel with large variation -> could be important
Projection
• When ||w|| = 1, wᵀx is the projection of x onto the axis w
Subspace projection
• What should be the axis w?
• How many axes do we need?
Principal Component Analysis
PCA (1)
• Basic idea
• Measure of information = variance
• Variance of real numbers z1,…,zN: Var = (1/N) Σt (zt - m)², where m is the mean of the zt
• Given a set of face vectors x1,…,xN and an axis w, the variance of wᵀx1,…,wᵀxN is wᵀCw
• C is the covariance matrix: C = (1/N) Σt (xt - μ)(xt - μ)ᵀ, with μ the mean face vector
Principal Component Analysis
PCA (2)
• The best axis w is obtained by maximizing wᵀCw under the constraint ||w|| = 1
• w is an eigenvector of C: Cw = a w
• The variance wᵀCw = a is the eigenvalue corresponding to w
• PCA
• Construct Covariance matrix C
• Eigen-decompose C
• Select the m eigenvectors with the largest eigenvalues (see the sketch below)
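A minimal numpy sketch of these three steps; the function name pca and the use of eigh are our choices.

```python
import numpy as np

def pca(X, m):
    """X: N x d matrix with one face vector per row.
    Returns the mean face and the m axes with largest variance."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # center the data
    C = Xc.T @ Xc / len(X)             # d x d covariance matrix
    vals, vecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(vals)[::-1]     # eigenvalues in descending order
    return mean, vecs[:, order[:m]]    # columns are the best axes w
```

Note that with d = 36,000 the d x d matrix C has over a billion entries, which is exactly the problem the next slide raises.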
Eigenface (1)
• What is the problem with face data? The covariance matrix C is dxd = 36,000x36,000, far too large to eigendecompose
• Solution: eigendecompose the NxN dot-product matrix of the N training images instead (N << d), then map its eigenvectors back to d dimensions (sketched below)
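A sketch of that trick, assuming the N training images are stacked as rows of X: if Gv = av for the NxN matrix G = (1/N) Xc Xcᵀ, then Xcᵀv is an eigenvector of C with the same eigenvalue.

```python
import numpy as np

def eigenfaces(X, m):
    """Eigenface trick: eigendecompose the N x N dot-product matrix
    instead of the d x d covariance (here N << d = 36,000)."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # N x d, centered
    G = Xc @ Xc.T / len(X)                 # N x N dot-product matrix
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1][:m]
    U = Xc.T @ vecs[:, order]              # map back to d-dim eigenvectors
    U /= np.linalg.norm(U, axis=0)         # renormalize each axis to ||w|| = 1
    return mean, U
```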
Eigenface (2)
• We work with vectors of projected values: the coefficients of a face on the eigenfaces x1, x2, …, x40
• Enrollment: project the face image onto the eigenfaces and store the coefficient vector as the template
Eigenface (3)
• Vector of raw intensities: 36,000 dimensions
• Vector of Eigenface coefficients: 10 dimensions
• Eigenfaces with large eigenvalues capture large variation
• Eigenfaces with small eigenvalues mostly capture noise
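A small sketch of enrollment and 1:1 matching on the coefficient vectors. Here mean and U are assumed to come from the eigenfaces sketch above, and choosing the threshold is the subject of the Evaluation slide below.

```python
import numpy as np

def enroll(face, mean, U):
    """Project a raw 36,000-dim face onto the eigenfaces; the small
    coefficient vector is stored as the template."""
    return U.T @ (face - mean)

def match(probe, template, mean, U, threshold):
    """1:1 verification: compare a live face to a stored template."""
    d = np.linalg.norm(enroll(probe, mean, U) - template)
    return d < threshold   # accept if the distance is small enough
```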
Related techniques
• Fisherface (LDA)
• Nullspace LDA
• Laplacianface
• Locality Sensitive Discriminant Analysis
• 2DPCA
• 2DLDA
• 2DPCA+2DLDA
Results on ORL (~10 years ago)

Technique        Accuracy (%)   #dim
Eigenface        90-95          200
Fisherface       91-97          50
NLDA             92-97          40
Laplacianface    89-95          50
LSDA             91-97          50
2DPCA            91.5           -
2DLDA            90.5           -
2DPCA+2DLDA      93.5           -
Limitations
• Occlusion: glasses, beard
• Lighting condition
• Facial expression
• Pose
• Make-up
Evaluation
• Accuracy: find closest template and check the ID
• Verification (access control)
• Live captured image vs. stored image
• We have a distance -> should we accept or not?
• False Accept (FA) vs. False Reject (FR)
• From a set of face images
• Compute distances between all pairs
• Select the threshold T that gives 0 FA and X FR
[Figure: histogram of number of tries vs. distance, with threshold T]
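A minimal sketch of counting FA and FR at a threshold T, given precomputed genuine (same person) and impostor (different people) pair distances; the variable names are ours.

```python
import numpy as np

def fa_fr(genuine, impostor, T):
    """genuine: distances between image pairs of the SAME person,
    impostor: distances between pairs of DIFFERENT people.
    Returns (false accepts, false rejects) at threshold T."""
    fa = np.sum(np.asarray(impostor) < T)   # impostors wrongly accepted
    fr = np.sum(np.asarray(genuine) >= T)   # genuine pairs wrongly rejected
    return fa, fr

# "0 FA and X FR": take the largest T for which fa_fr(...)[0] == 0,
# i.e. T just below the smallest impostor distance, and report the FR count.
```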
Labeled Faces in the Wild
• Large number of subjects (>5,000)
• Unconstrained conditions
• Human performance 97-99%
• Traditional methods fail
• New alignment technique: funneling
LFW results
Use outside data to train the model
Deep Learning
Neural Network timeline
• 1943: McCulloch & Pitts neuron model
• 1969: Perceptron limitations
• 1970s-80s: Backprop algorithm
• 1992: SVM
• 2006: Deep Learning
• Return of Neural Network
• Focus on Deep Structure
• Take advantage of today's computing power
Neural Networks (1)
• Neurons are connected via synapses
• A neuron receives signals from other neurons
• When its activation reaches a threshold, it fires a signal to other neurons
http://en.wikipedia.org/wiki/Neuron
Neural Networks (2)
• Universal approximator
• Classical structure: MLP
• #hidden nodes, learning rate
• Backprop algorithm (a numeric sketch follows this list)
• Gradient
• Direction of change that increases the value of the objective function
• Vector of partial derivatives with respect to each parameter
• Works on all structures and all objective functions
• Issues: stopping criteria, local optima, vanishing/exploding gradients
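A minimal numeric sketch of one backprop/gradient-descent step for a tiny one-hidden-layer MLP with a squared-error objective; the shapes and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(4, 1)), np.array([[1.0]])
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(1, 3))
lr = 0.1

h = np.tanh(W1 @ x)                  # forward pass: hidden layer
y_hat = W2 @ h                       # forward pass: output
err = y_hat - y                      # d(objective)/d(y_hat)

dW2 = err @ h.T                      # backprop: chain rule, layer by layer
dh = W2.T @ err
dW1 = (dh * (1 - h**2)) @ x.T        # tanh'(a) = 1 - tanh(a)^2

W2 -= lr * dW2                       # step AGAINST the gradient,
W1 -= lr * dW1                       # since the gradient points uphill
```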
Deep Learning
• 2006 Hinton et al.: layer by layer construction -> pre-training
• Stack of RBMs, Stack of Autoencoders
• Convolutional NN (CNN)
• Shared weights
• Take advantage of GPU
CNN today
• Common components
• Convolution layer, Max-pooling layer
• ReLU
• Drop-out; data augmentation (sampling + flipping training data)
• GPU
• Tools: Caffe, TensorFlow, Theano, Torch
• Structure: LeNet, AlexNet, GoogLeNet
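A minimal LeNet-style sketch in tf.keras (TensorFlow is one of the tools listed) combining the components above; the layer sizes and the 40-class output are illustrative assumptions, not the talk's model.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, activation="relu",      # convolution + ReLU
                           input_shape=(180, 200, 1)),
    tf.keras.layers.MaxPooling2D(2),                      # max-pooling
    tf.keras.layers.Conv2D(64, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),                         # drop-out
    tf.keras.layers.Dense(40, activation="softmax")       # e.g. 40 subjects
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```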
LeNet
AlexNet
GoogLeNet
Microsoft deep residual network: 150 layers!
DeepID

(Sun et al. CVPR 2014)
• 160-dim feature per region, 60 regions, original + flipped crops
• 160 x 60 x 2 = 19,200 dimensions!!
• Input to other model
• CelebFace
• Refine training
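A sketch of the dimension arithmetic only; deepid_net is a hypothetical stand-in for the trained network, not the authors' code.

```python
import numpy as np

def deepid_features(regions, deepid_net):
    """regions: 60 face-region crops. Concatenate a 160-dim feature per
    crop, for the original and horizontally flipped versions."""
    feats = []
    for patch in regions:
        feats.append(deepid_net(patch))           # 160-dim feature
        feats.append(deepid_net(patch[:, ::-1]))  # flipped copy
    return np.concatenate(feats)                  # 160 * 60 * 2 = 19,200 dims
```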
• Learning techniques for deep structures
• Big data
• Computing power: GPU, etc.

Face recognition and deep learning, by Dr. Sanparith Marukatat, NECTEC
