4. What is Face Detection?
Given an image, decide whether it contains any human face.
If it does, find the location and size of each
human face in the image.
In essence: classification between face and non-face.
Results for Training_1.jpg
5. Why is Face Detection Important?
First step for any automatic face recognition system.
First step in many Human-Computer Interaction
and man-machine interaction systems.
• Expression Recognition
• Cognitive State/Emotional State Recognition
First step in many surveillance systems.
Tracking: the face is a highly non-rigid object.
A step towards Automatic Target Recognition (ATR) or
generic object detection/recognition.
6. WHAT IS FACE RECOGNITION?
“Face recognition is the task of identifying an already
detected face as a KNOWN or UNKNOWN face and, in more
advanced cases, telling exactly whose face it is.”
Pipeline: FACE DETECTION → FEATURE EXTRACTION → FACE RECOGNITION
7. Difference between Face Detection and
Face Recognition
FD: only two classes,
face or non-face.
FR: multiple classes, as many as the
number of individuals to be recognized.
One person vs. all the others.
Classification is based on face shape and on the form of the eyes, nose,
mouth, etc.
FR requires FD as its first step.
9. Methods for Face Detection
Knowledge-based methods:
Encode what constitutes a typical face.
e.g., the relationship between facial features
Feature invariant approaches:
Aim to find structural features of a face that exist even when
pose, viewpoint or lighting conditions vary
Template Matching:
Several standard patterns stored to describe the face as a whole
or the facial features separately
Appearance-based methods:
The models are learned from a set of training images that
capture the representative variability of faces.
10. Three goals & a conclusion
1. Feature Computation: what features? And how can they be computed as quickly as possible?
2. Feature Selection: select the most discriminating features
3. Real-timeliness: must focus on potentially positive areas (that contain faces)
4. Conclusion: presentation of results and discussion of detection issues.
How did Viola & Jones deal with these challenges?
11. Three solutions
Feature Computation
~The “Integral” image representation
Feature Selection
~The AdaBoost training algorithm
Real-timeliness
~A cascade of classifiers
13. Features
Overview | Integral Image | AdaBoost | Cascade
Can a simple feature (i.e. a value) indicate the existence of a face?
All faces share some similar properties
The eyes region is darker than the upper cheeks.
The nose bridge region is brighter than the eyes.
That is useful domain knowledge
Need for encoding of domain knowledge:
Location - Size: eyes & nose bridge region
Value: darker / brighter
14. Rectangle features
Value = ∑ (pixels in black area) - ∑ (pixels in white area)
Three types: two-, three-, and four-rectangle features; Viola & Jones used all three
For example: the difference in brightness between the white & black rectangles over a specific area
Each feature is tied to a specific location in the sub-window
Each feature may have any size
Why not pixels instead of features?
Features encode domain knowledge
Feature-based systems operate faster
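As a rough illustration of the "black minus white" value above, the sketch below computes one horizontal two-rectangle feature directly from pixel sums. The function name `two_rect_feature`, the choice of the left half as the black area, and the toy 4x4 window are assumptions made for this example only.

```python
import numpy as np

def two_rect_feature(window, top, left, height, width):
    """Value of a horizontal two-rectangle feature.

    The feature covers height x (2 * width) pixels starting at (top, left).
    Assumption for this sketch: the left half is the black area and the
    right half is the white area.
    """
    black = window[top:top + height, left:left + width]
    white = window[top:top + height, left + width:left + 2 * width]
    # Value = sum(pixels in black area) - sum(pixels in white area)
    return float(black.sum() - white.sum())

# Toy 4x4 window: large values in the left half, small values in the right half.
window = np.array([[9, 9, 1, 1],
                   [9, 9, 1, 1],
                   [9, 9, 1, 1],
                   [9, 9, 1, 1]], dtype=np.int64)

value = two_rect_feature(window, top=0, left=0, height=4, width=2)
print(value)
```

Summing the pixels directly like this costs time proportional to the rectangle area, which is what the integral image is meant to avoid.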
15. Rapid computation of rectangular features
Using the integral image representation we can compute the value of any rectangular sum (part of a feature) in constant time.
The integral image ii at a point holds the sum of all pixels above and to the left of that point. With regions A, B, C, D and their corner points a, b, c, d:
ii(a) = A
ii(b) = A+B
ii(c) = A+C
ii(d) = A+B+C+D
For example, the sum inside rectangle D can be computed as:
D = ii(d) + ii(a) - ii(b) - ii(c)
Two-, three-, and four-rectangle features can be computed with 6, 8 and 9 array references respectively.
As a result: feature computation takes less time.
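The four-reference lookup can be sketched with NumPy as below. This is a minimal sketch, not the authors' implementation: the helper names `integral_image` and `rect_sum` and the zero-padded first row and column (which avoids special cases at the image border) are assumptions of this example.

```python
import numpy as np

def integral_image(img):
    """Return ii where ii[y, x] is the sum of img[:y, :x].

    A zero row and column are prepended so that rectangles touching
    the image border need no special handling.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle using four array references.

    Corresponds to D = ii(d) + ii(a) - ii(b) - ii(c), with
    a = top-left, b = top-right, c = bottom-left, d = bottom-right.
    """
    bottom, right = top + height, left + width
    return int(ii[bottom, right] + ii[top, left]
               - ii[top, right] - ii[bottom, left])

img = np.arange(1, 17).reshape(4, 4)      # toy 4x4 "image" with values 1..16
ii = integral_image(img)
inner = rect_sum(ii, top=1, left=1, height=2, width=2)
print(inner, int(img[1:3, 1:3].sum()))    # both sums should agree
```

Once `ii` is built in a single pass over the image, every rectangle sum costs the same four lookups regardless of the rectangle's size.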
17. Feature selection
Problem: Too many features
In a sub-window (24x24) there are ~160,000 features (all possible combinations of orientation, location and scale of these feature types)
It is impractical to compute all of them (computationally expensive)
We have to select a subset of relevant, informative features to model a face
Hypothesis: “A very small subset of features can be combined to form an effective classifier”
How? SOLUTION:
AdaBoost algorithm
[Figure: a relevant feature next to an irrelevant feature]
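To get a feel for where a number of this magnitude comes from, the sketch below counts every position and scale of a feature inside a 24x24 sub-window. It assumes five feature prototypes (two-rectangle and three-rectangle features in both orientations, plus one four-rectangle feature) that are scaled in whole multiples of their base size; the slides do not spell out these details, so treat the prototype list as an assumption.

```python
def count_features(window=24,
                   shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all placements of the given feature prototypes in a window.

    Each prototype is (base_width, base_height) in cells. Widths and
    heights grow in multiples of the base size, and every position at
    which the scaled feature fits inside the window is counted.
    """
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                total += (window - w + 1) * (window - h + 1)
    return total

total = count_features()
print(total)
```

Under these assumptions the count is 162,336, which matches the "~160,000" order of magnitude quoted above.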
18. AdaBoost
Short for “Adaptive Boosting”
Constructs a “strong” classifier as a weighted linear combination of simple “weak” classifiers
[Figure: strong classifier = sum of weight x weak classifier(image)]
19. AdaBoost - Characteristics
Features as weak classifiers
Each single rectangle feature may be regarded as a simple weak classifier
An iterative algorithm
AdaBoost performs a series of trials, each time selecting a new weak classifier
Weights are applied over the set of example images
During each iteration, each example/image receives a weight determining its importance
20. AdaBoost - Getting the idea…
Given: example images labeled +/-
Initially, all weights set equally
Repeat T times
Step 1: choose the most efficient weak classifier, which will become a component of the final strong classifier (Problem! Remember the huge number of features…)
Step 2: update the weights to emphasize the examples which were incorrectly classified
This makes the next weak classifier focus on “harder” examples
The final (strong) classifier is a weighted combination of the T “weak” classifiers,
weighted according to their accuracy:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
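The loop above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the original training code: it assumes the feature values are already computed into a matrix `F` (one row per example, one column per feature), uses a brute-force threshold search for each weak classifier, and uses the weight update and vote weights alpha = log(1/beta) with beta = err/(1 - err). All function names are hypothetical.

```python
import numpy as np

def stump_predict(F, j, thr, pol):
    """Weak classifier: 1 if pol * feature_j < pol * thr, else 0."""
    return (pol * F[:, j] < pol * thr).astype(int)

def best_stump(F, y, w):
    """Step 1: pick the single-feature classifier with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(F.shape[1]):
        for thr in np.unique(F[:, j]):
            for pol in (1, -1):
                err = float(np.sum(w * (stump_predict(F, j, thr, pol) != y)))
                if err < best[0]:
                    best = (err, j, float(thr), pol)
    return best

def adaboost(F, y, T):
    """F: (n_examples, n_features) feature values; y: labels in {0, 1}."""
    w = np.full(len(y), 1.0 / len(y))          # initially all weights equal
    stumps, alphas = [], []
    for _ in range(T):
        w = w / w.sum()                        # keep the weights a distribution
        err, j, thr, pol = best_stump(F, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by 0
        beta = err / (1.0 - err)
        correct = (stump_predict(F, j, thr, pol) == y).astype(float)
        w = w * beta ** correct                # Step 2: shrink weights of correct examples
        stumps.append((j, thr, pol))
        alphas.append(np.log(1.0 / beta))
    return stumps, np.array(alphas)

def strong_classify(F, stumps, alphas):
    """Weighted vote: 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t)."""
    votes = np.zeros(F.shape[0])
    for (j, thr, pol), a in zip(stumps, alphas):
        votes += a * stump_predict(F, j, thr, pol)
    return (votes >= 0.5 * alphas.sum()).astype(int)

# Toy data: feature 0 separates the classes, feature 1 does not.
F = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]])
y = np.array([1, 1, 0, 0])
stumps, alphas = adaboost(F, y, T=2)
pred = strong_classify(F, stumps, alphas)
print(stumps, pred)
```

The exhaustive search in `best_stump` is exactly the "huge number of features" problem the slide warns about; it is tolerable here only because the toy data has two features.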
21. AdaBoost - Feature Selection
Problem
On each round there is a large set of possible weak classifiers (each simple classifier consists of a single feature). Which one to choose?
Choose the most efficient (the one that best separates the examples, i.e. has the lowest error)
The choice of a classifier corresponds to the choice of a feature
At the end, the ‘strong’ classifier consists of T features
Conclusion
AdaBoost searches for a small number of good classifiers, i.e. features (feature selection)
It adaptively constructs a final strong classifier, taking into account the failures of each of the chosen weak classifiers (through the weights applied to the examples)
AdaBoost is used both to select a small set of features and to train a strong classifier
23. Now we have a good face detector
Thus, we can build a 200-feature classifier.
Experiments showed that a 200-feature classifier achieves:
95% detection rate
about 0.07x10^-3 FP rate (1 in 14084)
Scans all sub-windows of a 384x288 pixel image in 0.7 seconds (on Intel PIII 700MHz)
The more features the better (?)
Gain in classifier performance
Loss in CPU time
Verdict: good & fast, but not enough
Competitors achieve close to a 1 in 1,000,000 FP rate!
0.7 sec / frame IS NOT real-time.
24. Training a cascade of classifiers
Strong classifier definition:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
where $\alpha_t = \log\left(\frac{1}{\beta_t}\right)$, $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$, and $\epsilon_t$ is the weighted error of the $t$-th weak classifier.
Keep in mind:
Competitors achieved a 95% TP rate and a 10^-6 FP rate
These are the goals. The final cascade must do better!
Given the goals, to design a cascade we must choose:
Number of layers in the cascade (strong classifiers)
Number of features of each strong classifier (the ‘T’ in the definition)
Threshold of each strong classifier (the $\frac{1}{2} \sum_{t=1}^{T} \alpha_t$ in the definition)
Optimization problem:
Can we find the optimum combination?
25. A simple framework for cascade training
Viola & Jones suggested a heuristic algorithm for the cascade training:
it does not guarantee optimality
but produces an “effective” cascade that meets the previous goals
Manual Tweaking:
the overall training outcome is highly dependent on the user’s choices
select fi (Maximum Acceptable False Positive rate / layer)
select di (Minimum Acceptable True Positive rate / layer)
select Ftarget (Target Overall FP rate)
possibly repeat the trial & error process for a given training set
Until Ftarget is met:
  Add new layer:
  Until fi, di rates are met for this layer:
    Increase feature number & train new strong classifier with AdaBoost
    Determine rates of layer on validation set
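A short worked calculation may help show how the per-layer rates combine into the overall rates. It assumes that a window is accepted only if every layer accepts it, so the overall rates are products of the per-layer rates; K (the number of layers) and the values 0.3 and 0.99 are illustrative choices, not figures taken from these slides.

```latex
% Overall false positive rate F and detection rate D of a K-layer cascade
F = \prod_{i=1}^{K} f_i , \qquad D = \prod_{i=1}^{K} d_i

% Illustrative example: K = 10 layers, each with f_i = 0.3 and d_i = 0.99
F = 0.3^{10} \approx 5.9 \times 10^{-6} , \qquad D = 0.99^{10} \approx 0.904
```

Each layer on its own may therefore be quite permissive about false positives, as long as it almost never rejects a true face.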
26. Viola & Jones Algorithm
User selects values for f, the maximum acceptable false positive rate per layer, and d,
the minimum acceptable detection rate per layer.
User selects the target overall false positive rate Ftarget.
P = set of positive examples
N = set of negative examples
F0 = 1.0; D0 = 1.0; i = 0
While Fi > Ftarget
  i++
  ni = 0; Fi = Fi-1
  While Fi > f x Fi-1
    ni++
    Use P and N to train a classifier with ni features using AdaBoost
    Evaluate the current cascaded classifier on the validation set to determine Fi and Di
    Decrease the threshold for the ith classifier until the current cascaded classifier has
    a detection rate of at least d x Di-1 (this also affects Fi)
  N = ∅
  If Fi > Ftarget then evaluate the current cascaded detector on the set of non-face
  images and put any false detections into the set N.
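Once trained, the cascade is applied layer by layer, and a sub-window is dropped as soon as one layer rejects it. The sketch below shows only that evaluation step. The representation of a layer as a `(score_fn, threshold)` pair and the toy feature dictionaries are assumptions made for this illustration.

```python
def cascade_classify(x, stages):
    """Run a sub-window through the cascade.

    stages: list of (score_fn, threshold) pairs, one per layer.
    Returns (is_face, layers_evaluated). The window is rejected at the
    first layer whose score falls below that layer's threshold.
    """
    for depth, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(x) < threshold:
            return False, depth          # early rejection: later layers never run
    return True, len(stages)

# Hypothetical three-layer cascade; each layer uses more features than the last.
stages = [
    (lambda x: x["f1"], 0.5),
    (lambda x: x["f1"] + x["f2"], 1.5),
    (lambda x: x["f1"] + x["f2"] + x["f3"], 2.5),
]

face = {"f1": 1.0, "f2": 1.0, "f3": 1.0}        # toy values standing in for a face window
background = {"f1": 0.1, "f2": 1.0, "f3": 1.0}  # toy values standing in for background

print(cascade_classify(face, stages))
print(cascade_classify(background, stages))
```

Because most sub-windows in an image are background, most of them exit after the first, cheapest layer, which is where the speed-up comes from.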
30. PCA ALGORITHM
STEP 1: Convert the images of the training set to
image vectors
The training set consists of M images in total.
Each image is of size N x N.
31. STEP 1: Convert the images of the training set to image vectors (Contd.)
For each image in the training set:
the N x N image is converted to a vector Ti with N^2 entries.
All M vectors Ti live in the same face vector space.
32. STEP 2: Normalize the face vectors
1. Calculate the average face vector U
from the face vectors Ti of all M training images:
U = (T1 + T2 + … + TM) / M
33. STEP 2: Normalize the face vectors (Contd.)
1. Calculate the average face vector U.
2. Subtract the mean (average) face
vector from EACH face vector
to get the normalized face vector:
Øi = Ti - U
34. STEP 2: Normalize the face vectors (Contd.)
Øi = Ti - U
E.g., for a face vector with entries a1, a2, …, an and a mean face with entries m1, m2, …, mn (n = N^2):
Ø1 = (a1 - m1, a2 - m2, …, an - mn)
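Steps 1 and 2 can be sketched with NumPy as follows. Random arrays stand in for real face images, and the sizes M = 5 and N = 4 are placeholders chosen for this example; the variable names `T`, `U` and `A` follow the symbols used on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 4                             # placeholder sizes: 5 images of 4 x 4 pixels
images = rng.random((M, N, N))          # stand-in for a real training set

# STEP 1: flatten each N x N image into a vector with N^2 entries.
# Columns of T are the face vectors Ti, so T has shape (N^2, M).
T = images.reshape(M, N * N).T

# STEP 2: average face vector U, then normalized faces Øi = Ti - U.
U = T.mean(axis=1, keepdims=True)       # shape (N^2, 1)
A = T - U                               # columns are Ø1 ... ØM

print(T.shape, U.shape, A.shape)
```

After the subtraction every row of `A` averages to zero, which is what makes `A` suitable for the covariance computation in the next step.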
35. STEP 3: Calculate the Eigenvectors
(Eigenvectors represent the variations in the faces.)
To calculate the eigenvectors,
we need to calculate the
covariance matrix C:
C = A.AT
where A = [Ø1, Ø2, Ø3, … ØM]
is an N^2 x M matrix.
36. STEP 3: Calculate the Eigenvectors (Contd.)
C = A.AT
(N^2 x M)(M x N^2) = N^2 x N^2
A very large
matrix, with
N^2 eigenvectors.
37. STEP 3: Calculate the Eigenvectors (Contd.)
C has N^2 eigenvectors,
but we need to find only K
eigenvectors out of these
N^2 eigenvectors, where K < M.
E.g., if N = 50 and K = 100, we
need to find 100 eigenvectors
out of 2500 (i.e. N^2): VERY TIME
CONSUMING.
39. STEP 4: Calculating eigenvectors from the
reduced covariance matrix
New C = AT.A
(M x N^2)(N^2 x M) = M x M
matrix, which has only
M eigenvectors (each with M entries).
40. STEP 5: Select the K best eigenfaces, with
K <= M, that can represent the whole training set
• The selected K eigenfaces MUST be in the ORIGINAL
dimensionality of the face vector space
41. STEP 6: Convert the lower-dimensional K
eigenvectors to the original face dimensionality
ui = A vi
ui = ith eigenvector in the
higher-dimensional space (N^2 entries)
vi = ith eigenvector in the
lower-dimensional space (M entries)
E.g., K = 100 eigenvectors.
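Steps 3 to 6 can be sketched together: compute the eigenvectors of the small M x M matrix AT.A, keep the K with the largest eigenvalues, and map them back with ui = A vi. The random matrix `A`, the sizes, and the final normalization of each eigenface to unit length are assumptions of this sketch (the slides do not mention normalization).

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, M, K = 2500, 10, 3            # placeholder sizes: N = 50, 10 images, keep 3

# Stand-in for the matrix of normalized faces A = [Ø1 ... ØM].
A = rng.standard_normal((n_pixels, M))
A -= A.mean(axis=1, keepdims=True)      # subtract the average face from each column

# STEP 4: eigenvectors of the reduced M x M matrix AT.A
small_C = A.T @ A
eigvals, V = np.linalg.eigh(small_C)    # eigh returns eigenvalues in ascending order

# STEP 5: keep the K eigenvectors with the largest eigenvalues
order = np.argsort(eigvals)[::-1][:K]
eigvals, V = eigvals[order], V[:, order]

# STEP 6: map back to the original dimensionality, ui = A vi
eigenfaces = A @ V                                   # shape (N^2, K)
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)     # unit length (assumption)

# Check: each ui is an eigenvector of the large matrix A.AT with the same
# eigenvalue, without ever forming the N^2 x N^2 matrix explicitly.
lhs = A @ (A.T @ eigenfaces)
print(np.allclose(lhs, eigenfaces * eigvals))
```

The check works because AT.A vi = λ vi implies A.AT (A vi) = λ (A vi), so the expensive N^2 x N^2 eigenproblem is never solved directly.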
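44. STEP 7: Represent each face image as a linear
combination of all K eigenvectors
face image = mean face + w1 x eigenface 1 + w2 x eigenface 2 + … + wk x eigenface k
We can say that the image above contains a small proportion of each of these
eigenfaces.
The weights form the weight vector of the image:
Ω = [w1, w2, …, wk]T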
45. Calculating the weight of each eigenface
The weight of the jth eigenface for a normalized face Øi is the projection of Øi onto that eigenface:
wj = ujT . Øi
For example:
w1 = u1T . Øi
w2 = u2T . Øi
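The weight calculation is one matrix-vector product. In the sketch below the eigenfaces are random orthonormal columns obtained from a QR decomposition, purely as a stand-in for real eigenfaces, and `phi` stands for one normalized face Øi; both are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, K = 16, 3                      # placeholder sizes

# Stand-in eigenfaces: K orthonormal columns, each with n_pixels entries.
eigenfaces, _ = np.linalg.qr(rng.standard_normal((n_pixels, K)))
phi = rng.standard_normal(n_pixels)      # stand-in for a normalized face Øi

# wj = ujT . Øi for every eigenface at once: the weight vector Ω
omega = eigenfaces.T @ phi

# The weighted sum of eigenfaces approximates the normalized face.
reconstruction = eigenfaces @ omega
print(omega)
```

With orthonormal eigenfaces, whatever is left over in `phi - reconstruction` is orthogonal to every eigenface, so the K weights capture everything the eigenfaces can represent.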
46. Recognizing an unknown face
1. Take the input image of the UNKNOWN FACE and convert it to a face vector (r1, r2, …, rk).
2. Normalize the face vector by subtracting the mean face: Ø = (a1 - m1, a2 - m2, …).
3. Project the normalized face onto the eigenspace to get the weight vector of the input image: Ω = [w1, w2, …, wk]T.
4. Calculate the distance between the input weight vector and all the weight vectors Ωi of the training set:
€i = |Ω - Ωi|^2, i = 1…M
5. Is the smallest distance below the threshold ∂?
YES: the face is RECOGNIZED AS the closest training face.
NO: UNKNOWN FACE.
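The recognition steps can be sketched as a nearest-neighbour search in weight space. The function name `recognize`, the tiny 4-pixel toy data, the labels and the threshold value are all hypothetical; in practice the threshold has to be tuned on real data.

```python
import numpy as np

def recognize(face_vec, mean_face, eigenfaces, train_weights, labels, threshold):
    """Return (label or None, smallest squared distance).

    eigenfaces:    (n_pixels, K) matrix with one eigenface per column
    train_weights: (M, K) matrix with one weight vector Ωi per training image
    """
    phi = face_vec - mean_face                       # normalize the face vector
    omega = eigenfaces.T @ phi                       # project onto the eigenspace
    dists = np.sum((train_weights - omega) ** 2, axis=1)   # |Ω - Ωi|^2 for every i
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return labels[best], float(dists[best])      # recognized as the closest face
    return None, float(dists[best])                  # unknown face

# Toy setup: 4-pixel "images", K = 2 eigenfaces, M = 2 training faces.
eigenfaces = np.eye(4)[:, :2]
mean_face = np.zeros(4)
train_weights = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["person_a", "person_b"]                    # hypothetical identities

known = recognize(np.array([0.9, 0.1, 0.0, 0.0]), mean_face,
                  eigenfaces, train_weights, labels, threshold=0.5)
unknown = recognize(np.array([5.0, 5.0, 0.0, 0.0]), mean_face,
                    eigenfaces, train_weights, labels, threshold=0.5)
print(known, unknown)
```

A lower threshold makes the system more likely to report an unknown face, while a higher one makes it more likely to match the input to some training face.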