Real-time Face Recognition & Detection Systems 1

Topics Covered
Brief History
Definitions
Methods
Demo
References

What is Face Detection?
• Given an image, decide whether it contains any human face.
• If it does, find the location and size of each human face in the image.
• In essence, a two-class classification problem: face versus non-face.
[Figure: detection results for Training_1.jpg]

Why is Face Detection Important?
• First step in any automatic face recognition system.
• First step in many human-computer interaction (man-machine interaction) systems:
  • Expression recognition
  • Cognitive state / emotional state recognition
• First step in many surveillance systems.
• Tracking: the face is a highly non-rigid object.
• A step towards Automatic Target Recognition (ATR) or generic object detection/recognition.

WHAT IS FACE RECOGNITION?
“Face recognition is the task of identifying an already detected face as a KNOWN or UNKNOWN face and, in more advanced cases, TELLING EXACTLY WHOSE FACE IT IS!”
Pipeline: FACE DETECTION → FEATURE EXTRACTION → FACE RECOGNITION

Difference between Face Detection (FD) and Face Recognition (FR)
• FD: only two classes
  • face or non-face
• FR: multiple classes, one per individual to be recognized
  • one person vs. all the others
  • classification based on face shape and the form of the eyes, nose, mouth, etc.
• FR requires FD as its first step

Methods for Face Detection
• Knowledge-based methods:
  • Encode what constitutes a typical face, e.g., the relationships between facial features.
• Feature-invariant approaches:
  • Aim to find structural features of a face that exist even when pose, viewpoint, or lighting conditions vary.
• Template matching:
  • Several standard patterns are stored to describe the face as a whole or the facial features separately.
• Appearance-based methods:
  • The models are learned from a set of training images that capture the representative variability of faces.

Three goals & a conclusion
1. Feature computation: which features, and how can they be computed as quickly as possible?
2. Feature selection: select the most discriminating features.
3. Real-timeliness: focus on potentially positive areas (those that contain faces).
4. Conclusion: presentation of results and discussion of detection issues.
How did Viola & Jones deal with these challenges?

Three solutions
• Feature computation: the “integral image” representation
• Feature selection: the AdaBoost training algorithm
• Real-timeliness: a cascade of classifiers

Features
• Can a simple feature (i.e., a single value) indicate the existence of a face?
• All faces share some similar properties:
  • The eye region is darker than the upper cheeks.
  • The nose bridge region is brighter than the eyes.
  That is useful domain knowledge.
• This domain knowledge needs to be encoded:
  • Location and size: eyes and nose bridge region
  • Value: darker / brighter

Rectangle features
• Value = ∑ (pixels in black area) − ∑ (pixels in white area)
• Three types, all used by Viola & Jones: two-, three-, and four-rectangle features.
• Example: the difference in brightness between the white and black rectangles over a specific area.
• Each feature is tied to a specific location in the sub-window.
• Each feature may have any size.
• Why features rather than raw pixels?
  • Features encode domain knowledge.
  • Feature-based systems operate faster.
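To make the definition of a rectangle feature concrete, here is a minimal sketch of one two-rectangle feature. It assumes the image is a list of rows of grey levels and that the upper half of the region is the "black" area and the lower half the "white" area; this layout and the function names are illustrative assumptions, not taken from the slides.

```python
def region_sum(img, top, left, height, width):
    """Sum of the pixel values inside one rectangle (naive, O(height * width))."""
    return sum(img[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))


def two_rect_feature(img, top, left, height, width):
    """Value = sum(black area) - sum(white area).

    Assumed layout: upper half of the region is black, lower half is white.
    """
    half = height // 2
    black = region_sum(img, top, left, half, width)
    white = region_sum(img, top + half, left, half, width)
    return black - white


# Toy 4x4 patch: a dark band (eyes) above a bright band (upper cheeks).
patch = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
value = two_rect_feature(patch, 0, 0, 4, 4)   # 8 * 10 - 8 * 200
```

A strongly negative value signals the dark-over-bright pattern, while a uniformly grey patch gives 0.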

Rapid computation of rectangular features
• Using the integral image representation, the sum of the pixels inside any rectangle (the building block of every feature) can be computed in constant time.
• Label four adjacent regions A (top-left), B (top-right), C (bottom-left), and D (bottom-right), and let a, b, c, d be their bottom-right corners. Then:
  ii(a) = A
  ii(b) = A + B
  ii(c) = A + C
  ii(d) = A + B + C + D
  so the sum inside rectangle D is D = ii(d) + ii(a) − ii(b) − ii(c).
• Two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively.
• As a result, feature computation takes very little time.
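The constant-time rectangle sum can be sketched as follows. The sketch assumes an integral image padded with one extra row and column of zeros, so that the four corner look-ups never fall outside the array; the function names are hypothetical.

```python
def integral_image(img):
    """Return ii with a zero border: ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the rectangle using four look-ups: ii(d) + ii(a) - ii(b) - ii(c)."""
    bottom, right = top + height, left + width
    return (ii[bottom][right]      # ii(d)
            + ii[top][left]        # ii(a)
            - ii[top][right]       # ii(b)
            - ii[bottom][left])    # ii(c)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = rect_sum(ii, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

The integral image is built once per frame; after that, every rectangle costs the same four look-ups regardless of its size.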

Feature selection
Adaboost Algorithm

Feature selection
• Problem: too many features.
  • A 24×24 sub-window contains ~160,000 features (all possible combinations of orientation, location, and scale of the feature types).
  • Computing all of them is impractical (computationally expensive).
• We have to select a subset of relevant, informative features to model a face.
• Hypothesis: “A very small subset of features can be combined to form an effective classifier.”
• How? Solution: the AdaBoost algorithm.
[Figure: a relevant feature vs. an irrelevant feature]
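The figure of roughly 160,000 can be checked by brute-force counting. The sketch below assumes five base shapes (horizontal and vertical two-rectangle, horizontal and vertical three-rectangle, and one four-rectangle shape), each scaled by whole multiples of its base size and placed at every position in the window; the shape list and the function name are assumptions made for illustration.

```python
def count_rectangle_features(window=24,
                             shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every scaled and shifted copy of each base shape in the window.

    shapes: (base width, base height) of each feature type, in cells.
    """
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):       # all widths
            for h in range(base_h, window + 1, base_h):   # all heights
                # number of positions where a w x h feature fits
                total += (window - w + 1) * (window - h + 1)
    return total


n_features = count_rectangle_features(24)
```

Under these assumptions the count is 162,336, far more than the 576 pixels of the sub-window itself, which is why a selection step is needed.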

AdaBoost
• Stands for “Adaptive Boosting”.
• Constructs a “strong” classifier as a weighted linear combination of simple “weak” classifiers.
[Figure: the strong classifier as a sum of weak classifiers, each multiplied by its weight and applied to the image]

AdaBoost - Characteristics
• Features as weak classifiers
  • Each single rectangle feature may be regarded as a simple weak classifier.
• An iterative algorithm
  • AdaBoost performs a series of trials, each time selecting a new weak classifier.
• Weights are applied over the set of example images
  • During each iteration, each example image receives a weight determining its importance.

AdaBoost - Getting the idea…
• Given: example images labeled +/−
  • Initially, all weights are set equally.
• Repeat T times:
  • Step 1: choose the most efficient weak classifier, which becomes a component of the final strong classifier (Problem! Remember the huge number of features…).
  • Step 2: update the weights to emphasize the examples that were incorrectly classified.
    • This makes the next weak classifier focus on “harder” examples.
• The final (strong) classifier is a weighted combination of the T “weak” classifiers, weighted according to their accuracy:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
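The loop above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each weak classifier is a threshold on one feature value with a polarity (output 1 when p·f(x) < p·θ), searches all candidates exhaustively, and uses the weights α_t = log(1/β_t) with β_t = ε_t/(1 − ε_t). The function names and the toy data are hypothetical.

```python
import math


def train_adaboost(X, y, T):
    """X: list of samples (lists of feature values); y: labels (1 = face, 0 = non-face)."""
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n                      # initially, all weights are equal
    strong = []                            # (alpha, feature index, threshold, polarity)
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]       # normalise the weights
        best = None
        # Step 1: pick the weak classifier with the lowest weighted error.
        for j in range(m):
            for theta in sorted({x[j] for x in X}):
                for p in (1, -1):
                    preds = [1 if p * x[j] < p * theta else 0 for x in X]
                    err = sum(wi for wi, pr, yi in zip(w, preds, y) if pr != yi)
                    if best is None or err < best[0]:
                        best = (err, j, theta, p, preds)
        err, j, theta, p, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        beta = err / (1.0 - err)
        strong.append((math.log(1.0 / beta), j, theta, p))
        # Step 2: shrink the weights of correctly classified examples, so the
        # misclassified ones count for more after the next normalisation.
        w = [wi * (beta if pr == yi else 1.0) for wi, pr, yi in zip(w, preds, y)]
    return strong


def predict(strong, x):
    """Strong classifier: weighted vote compared with half the total weight."""
    score = sum(a for a, j, theta, p in strong if p * x[j] < p * theta)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0


X = [[1, 9], [2, 7], [3, 8], [6, 9], [7, 8], [8, 7]]   # two feature values per sample
y = [1, 1, 1, 0, 0, 0]
model = train_adaboost(X, y, T=3)
predictions = [predict(model, x) for x in X]
```

Each entry of `model` names one selected feature, so training the classifier and selecting features happen in the same loop.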

AdaBoost – Feature Selection
Problem
• On each round there is a large set of possible weak classifiers (each simple classifier consists of a single feature). Which one to choose?
  • Choose the most efficient one: the one that best separates the examples, i.e., has the lowest error.
  • The choice of a classifier corresponds to the choice of a feature.
• At the end, the “strong” classifier consists of T features.
Conclusion
• AdaBoost searches for a small number of good classifiers, i.e., features (feature selection).
• It adaptively constructs a final strong classifier, taking into account the failures of each of the chosen weak classifiers (by applying weights).
• AdaBoost is used both to select a small set of features and to train a strong classifier.

A cascade of classifiers
(for realtimeness)

Now we have a good face detector
• We can thus build a 200-feature classifier.
• Experiments showed that a 200-feature classifier achieves:
  • a 95% detection rate
  • a false positive (FP) rate of 1 in 14084 (about 7.1×10⁻⁵)
  • a scan of all sub-windows of a 384×288 pixel image in 0.7 seconds (on an Intel PIII 700 MHz)
• The more features, the better (?)
  • Gain in classifier performance
  • Loss in CPU time
• Verdict: good and fast, but not enough.
  • Competitors achieve an FP rate close to 1 in 1,000,000!
  • 0.7 sec/frame IS NOT real-time.

Training a cascade of classifiers
Strong classifier definition:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise,} \end{cases}$$
where $\alpha_t = \log\left(\frac{1}{\beta_t}\right)$ and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$, with $\epsilon_t$ the weighted error of the t-th weak classifier.
• Keep in mind:
  • Competitors achieved a 95% TP rate and a 10⁻⁶ FP rate.
  • These are the goals. The final cascade must do better!
• Given the goals, to design a cascade we must choose:
  • the number of layers in the cascade (strong classifiers)
  • the number of features of each strong classifier (the $T$ in the definition)
  • the threshold of each strong classifier (the $\frac{1}{2} \sum_{t=1}^{T} \alpha_t$ in the definition)
• Optimization problem:
  • Can we find the optimum combination?

A simple framework for cascade training
• Viola & Jones suggested a heuristic algorithm for cascade training:
  • it does not guarantee optimality,
  • but it produces an “effective” cascade that meets the previous goals.
• Manual tweaking:
  • the overall training outcome is highly dependent on the user’s choices
  • select f_i (maximum acceptable false positive rate per layer)
  • select d_i (minimum acceptable true positive rate per layer)
  • select F_target (target overall FP rate)
  • possibly repeat the trial-and-error process for a given training set
• Until F_target is met:
  • Add a new layer:
    • Until the f_i, d_i rates are met for this layer:
      • increase the number of features and train a new strong classifier with AdaBoost
      • determine the rates of the layer on a validation set
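A short worked example shows why fairly loose per-layer rates are enough. A window is accepted only if every layer accepts it, so the overall rates of a cascade with $L$ layers are the products of the per-layer rates. The values used here (10 layers, $f_i = 0.3$, $d_i = 0.99$) are illustrative assumptions, not figures from the slides.

$$F = \prod_{i=1}^{L} f_i, \qquad D = \prod_{i=1}^{L} d_i$$

$$F = 0.3^{10} \approx 5.9 \times 10^{-6}, \qquad D = 0.99^{10} \approx 0.904$$

Each layer may therefore let through 30% of the non-faces, as long as it keeps almost all of the faces.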

Viola & Jones Algorithm
• The user selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
• The user selects the target overall false positive rate F_target.
• P = set of positive examples
• N = set of negative examples
• F_0 = 1.0; D_0 = 1.0; i = 0
• While F_i > F_target:
  • i ← i + 1
  • n_i = 0; F_i = F_{i−1}
  • While F_i > f × F_{i−1}:
    • n_i ← n_i + 1
    • Use P and N to train a classifier with n_i features using AdaBoost.
    • Evaluate the current cascaded classifier on the validation set to determine F_i and D_i.
    • Decrease the threshold for the i-th classifier until the current cascaded classifier has a detection rate of at least d × D_{i−1} (this also affects F_i).
  • N ← ∅
  • If F_i > F_target, evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N.

[Figure: training pipeline]
Training phase: training set (sub-windows) → integral representation → feature computation → AdaBoost feature selection → cascade trainer
Classifier cascade framework: Strong Classifier 1 (cascade stage 1) → Strong Classifier 2 (cascade stage 2) → … → Strong Classifier N (cascade stage N) → FACE IDENTIFIED
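At detection time the cascade is what makes the system fast: a sub-window is discarded as soon as one stage rejects it. The sketch below illustrates that early exit. The stages, the weak classifiers, and the reduction of a "window" to two brightness values are all hypothetical stand-ins for trained classifiers.

```python
def make_stage(weak_classifiers, threshold):
    """weak_classifiers: list of (alpha, function(window) -> 0 or 1)."""
    def stage(window):
        score = sum(alpha * h(window) for alpha, h in weak_classifiers)
        return score >= threshold
    return stage


def cascade_detect(stages, window):
    """Return (is_face, number of stages evaluated)."""
    for k, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, k          # rejected early: later stages never run
    return True, len(stages)


# Hypothetical two-stage cascade; each threshold is half the sum of the alphas.
stage1 = make_stage([(1.0, lambda w: int(w["eyes"] < w["cheeks"]))], 0.5)
stage2 = make_stage([(1.0, lambda w: int(w["eyes"] < w["cheeks"])),
                     (2.0, lambda w: int(w["cheeks"] - w["eyes"] > 50))], 1.5)
stages = [stage1, stage2]

face_like = {"eyes": 40, "cheeks": 180}      # dark eye band, bright cheeks
background = {"eyes": 120, "cheeks": 110}    # no such contrast

face_result = cascade_detect(stages, face_like)
background_result = cascade_detect(stages, background)
```

Most sub-windows of a real image behave like `background` and are rejected by the first, cheapest stage.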

PCA ALGORITHM
STEP 1: Convert the images of the training set to image vectors
• The training set consists of M images in total.
• Each image is of size N × N.
• Each N × N image is converted to a vector T_i of length N² (one vector for each image in the training set).
• The vectors T_1, …, T_M are points in the face vector space.

STEP 2: Normalize the face vectors
1. Calculate the average face vector U of all the face vectors T_i.
2. Subtract the mean (average) face vector from EACH face vector to get the normalized face vector:
   $\Phi_i = T_i - U$
   E.g., if $T_1 = (a_1, a_2, \ldots, a_{N^2})^\top$ and $U = (m_1, m_2, \ldots, m_{N^2})^\top$, then
   $\Phi_1 = (a_1 - m_1,\ a_2 - m_2,\ \ldots,\ a_{N^2} - m_{N^2})^\top$
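Steps 1 and 2 can be sketched with NumPy. The image sizes and the random pixel data are assumptions standing in for a real training set; the variable names follow the slides (T for the face vectors, U for the average face, A for the matrix of normalized faces).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 8                                  # assumed toy sizes: 6 images of 8 x 8 pixels
images = rng.random((M, N, N))               # random stand-ins for real face images

# STEP 1: flatten each N x N image into a column vector T_i of length N^2.
T = images.reshape(M, N * N).T               # shape (N^2, M); column i is T_i

# STEP 2: average face U, then normalized faces Phi_i = T_i - U.
U = T.mean(axis=1, keepdims=True)            # shape (N^2, 1)
A = T - U                                    # shape (N^2, M); column i is Phi_i
```

After the subtraction every row of `A` sums to zero, i.e. the normalized faces are centred on the average face.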

STEP 3: Calculate the eigenvectors
(The eigenvectors represent the variations in the faces.)
• To calculate the eigenvectors, we need the covariance matrix C:
  $C = A A^\top$, where $A = [\Phi_1, \Phi_2, \Phi_3, \ldots, \Phi_M]$ is of size $N^2 \times M$.
• C is therefore $(N^2 \times M)(M \times N^2) = N^2 \times N^2$: a very large matrix with N² eigenvectors.
• But we only need K eigenvectors out of these N², where K ≤ M.
  E.g., if N = 50 and K = 100, we need to find 100 eigenvectors out of 2500 (i.e., N²): VERY TIME CONSUMING.
• SOLUTION: “DIMENSIONALITY REDUCTION”, i.e., calculate the eigenvectors from a covariance matrix of reduced dimensionality.

STEP 4: Calculate the eigenvectors from the reduced covariance matrix
• New $C = A^\top A$, of size $(M \times N^2)(N^2 \times M) = M \times M$.
• This matrix has only M eigenvectors, each of size M × 1.
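The reason the reduced matrix is a valid shortcut is that $A^\top A$ and $A A^\top$ share their non-zero eigenvalues. The sketch below checks this numerically on random stand-in data; the sizes (20×20 images, 10 faces) are assumptions chosen so that the large matrix is still cheap to build for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
N2, M = 400, 10                              # assumed: 20 x 20 images, 10 training faces
T = rng.random((N2, M))                      # random stand-ins for face vectors
A = T - T.mean(axis=1, keepdims=True)        # columns are the normalized faces Phi_i

small = A.T @ A                              # reduced M x M matrix
eigvals, V = np.linalg.eigh(small)           # ascending eigenvalues; columns of V are v_i

large = A @ A.T                              # full N^2 x N^2 matrix, built only to compare
large_vals = np.linalg.eigvalsh(large)       # N^2 eigenvalues, ascending
```

The M largest eigenvalues of the 400×400 matrix match the eigenvalues of the 10×10 matrix; all remaining eigenvalues of the large matrix are numerically zero.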

STEP 5: Select the K best eigenfaces, such that K ≤ M and they can represent the whole training set
• The selected K eigenfaces MUST be in the ORIGINAL dimensionality of the face vector space.

STEP 6: Convert the lower-dimensional K eigenvectors to the original face dimensionality
• $u_i = A v_i$
  • $u_i$ = i-th eigenvector in the higher-dimensional space (each 2500 × 1 when N = 50)
  • $v_i$ = i-th eigenvector in the lower-dimensional space (each 100 × 1 when M = 100)
• The K selected $u_i$ are the eigenfaces.
[Figure: the 100 eigenvectors v_i are mapped by A to 2500-dimensional eigenvectors u_i; the K selected eigenfaces are highlighted in yellow]
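The mapping $u_i = A v_i$ can be checked numerically: if $v_i$ is an eigenvector of $A^\top A$ with eigenvalue $\lambda_i$, then $A v_i$ is an eigenvector of $A A^\top$ with the same eigenvalue. The sketch below uses random stand-in data and assumed toy sizes, and additionally scales each $u_i$ to unit length, which is a common convention rather than something stated on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N2, M, K = 400, 10, 4                        # assumed toy sizes
T = rng.random((N2, M))                      # random stand-ins for face vectors
A = T - T.mean(axis=1, keepdims=True)        # normalized faces as columns

eigvals, V = np.linalg.eigh(A.T @ A)         # eigenvectors v_i of the reduced matrix
order = np.argsort(eigvals)[::-1][:K]        # indices of the K largest eigenvalues
lam, Vk = eigvals[order], V[:, order]

Uk = A @ Vk                                  # u_i = A v_i, shape (N^2, K)
Uk /= np.linalg.norm(Uk, axis=0)             # scale each eigenface to unit length

# (A A^T) u_i, computed without ever forming the N^2 x N^2 matrix
lhs = A @ (A.T @ Uk)
```

Here `lhs` equals `Uk * lam` column by column, and the columns of `Uk` are mutually orthogonal.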

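STEP 7: Represent each face image as a linear combination of all K eigenvectors
• Each face is approximately the mean face plus a weighted sum of the eigenfaces, with weights $w_1, w_2, \ldots, w_K$:
  $T_i \approx U + \sum_{j=1}^{K} w_j u_j$
• In other words, each face image contains a small proportion of every eigenface.
• The weights form the weight vector $\Omega = (w_1, w_2, \ldots, w_K)^\top$.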

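Calculating the weight of each eigenface
The weight of eigenface $u_j$ (scaled to unit length) for a normalized face $\Phi_i$ is their dot product:
$w_j = u_j^\top \Phi_i, \quad j = 1, \ldots, K$
E.g., for the first face:
• $w_1 = u_1^\top \Phi_1$
• $w_2 = u_2^\top \Phi_1$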
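The weight computation and the linear-combination view can be sketched together. The data are random stand-ins and the sizes are assumptions; K is set to M − 1 here because subtracting the mean leaves at most M − 1 independent directions, so with that choice the reconstruction of the training faces is exact.

```python
import numpy as np

rng = np.random.default_rng(0)
N2, M, K = 400, 10, 9                        # assumed toy sizes, K = M - 1
T = rng.random((N2, M))                      # random stand-ins for face vectors
mean_face = T.mean(axis=1, keepdims=True)    # the average face U
A = T - mean_face                            # normalized faces Phi_i as columns

eigvals, V = np.linalg.eigh(A.T @ A)
top = np.argsort(eigvals)[::-1][:K]          # K largest eigenvalues
eigenfaces = A @ V[:, top]                   # u_j = A v_j
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)

# w_j = u_j^T Phi_i for every face at once: column i is the weight vector Omega_i.
weights = eigenfaces.T @ A                   # shape (K, M)

# Mean face plus the weighted sum of eigenfaces gives back each face.
reconstructed = mean_face + eigenfaces @ weights
```

With a smaller K the reconstruction is only approximate, which is the trade-off behind selecting the K best eigenfaces.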

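Recognizing an unknown face
1. Convert the input image of the unknown face to a face vector $(r_1, r_2, \ldots, r_{N^2})^\top$.
2. Normalize the face vector by subtracting the average face: $\Phi = (r_1 - m_1,\ r_2 - m_2,\ \ldots,\ r_{N^2} - m_{N^2})^\top$.
3. Project the normalized face onto the eigenspace to get the weight vector of the input image, $\Omega = (w_1, w_2, \ldots, w_K)^\top$.
4. Calculate the distance between the input weight vector and the weight vectors of all the training images:
   $\epsilon_i = \lVert \Omega - \Omega_i \rVert^2, \quad i = 1, \ldots, M$
5. Compare the smallest distance with a threshold ∂: if it is within the threshold, the input is RECOGNIZED AS the corresponding training face; otherwise it is an UNKNOWN FACE.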
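The recognition flow can be sketched end to end. Everything concrete here is an assumption for illustration: the random training data, the sizes, the threshold value of 1.0, and the helper names `train` and `recognize`. The "stranger" is built artificially by pushing the mean face far along the first eigenface so that it is guaranteed to lie far from every training face.

```python
import numpy as np


def train(T, K):
    """T: (N^2, M) matrix whose columns are face vectors."""
    mean_face = T.mean(axis=1, keepdims=True)
    A = T - mean_face
    eigvals, V = np.linalg.eigh(A.T @ A)
    top = np.argsort(eigvals)[::-1][:K]
    eigenfaces = A @ V[:, top]
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    return mean_face, eigenfaces, eigenfaces.T @ A   # last item: training weight vectors


def recognize(face, mean_face, eigenfaces, train_weights, threshold):
    """Return the index of the closest training face, or None for an unknown face."""
    omega = eigenfaces.T @ (face.reshape(-1, 1) - mean_face)    # weight vector of the input
    distances = np.sum((train_weights - omega) ** 2, axis=0)    # squared distance to each Omega_i
    best = int(np.argmin(distances))
    return best if distances[best] <= threshold else None


rng = np.random.default_rng(0)
T = rng.random((400, 10))                    # random stand-ins for 10 training faces
mean_face, eigenfaces, train_weights = train(T, K=5)

brighter_copy = T[:, 3] + 0.01               # training face 3, slightly brighter
stranger = mean_face.ravel() + 100.0 * eigenfaces[:, 0]

known = recognize(brighter_copy, mean_face, eigenfaces, train_weights, threshold=1.0)
unknown = recognize(stranger, mean_face, eigenfaces, train_weights, threshold=1.0)
```

In practice the threshold has to be tuned on real data: too low and known faces are rejected, too high and strangers are accepted.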

