Face recognition techniques and Haar cascade detection explained
1. Face recognition (FR) has been an active research area over the past two decades, with many encouraging results reported in the literature (Chellappa, Wilson and Sirohey 1995, 705-740). However, it remains a difficult problem that is far from solved. Among the various FR methodologies, appearance-based approaches, which treat the face image as a holistic pattern, seem to be the most successful (Brunelli and Poggio n.d.), with methods such as the well-known principal component analysis (PCA) (Turk and Pentland 1991), an unsupervised learning technique, and linear discriminant analysis (LDA) (Belhumeur, Hespanha and Kriegman 1997), a supervised learning technique.
LDA
Linear discriminant analysis (LDA) is a classification method developed in 1936 by R. A. Fisher. It is simple, mathematically robust, and often produces models whose accuracy is as good as that of more complex methods. LDA is based on the concept of searching for a linear combination of variables that best separates two classes. Haar detection, by contrast, makes use of weak classifiers.
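Fisher's idea can be sketched in a few lines. The following is a minimal illustration in Python with NumPy (the function name and the toy data are invented for this example, not part of any library): it finds the direction w = Sw^{-1}(m1 - m0) that best separates two classes relative to their within-class scatter.

```python
import numpy as np

# Minimal two-class Fisher LDA sketch (illustrative, not a library API).
def fisher_lda_direction(X0, X1):
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: unnormalised sum of the two class scatters
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Direction maximising between-class separation over within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Two toy Gaussian classes; project onto w and split at the midpoint
rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 0.5, size=(100, 2))
X1 = rng.normal([2, 2], 0.5, size=(100, 2))
w = fisher_lda_direction(X0, X1)
t = (X0 @ w).mean() / 2 + (X1 @ w).mean() / 2
pred0 = (X0 @ w) > t   # should be mostly False (class 0)
pred1 = (X1 @ w) > t   # should be mostly True (class 1)
```

Projecting onto a single well-chosen direction is what makes LDA so cheap at classification time.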
2. What goes on during Haar cascade face detection?
A square window is used to identify the face, because a square region is what gets matched against the face. The Viola-Jones (V-J) algorithm works by looking for features, and these features are rectangles; inside those rectangles are smaller rectangles, and within those are the individual pixels. The window that frames a face in V-J is quite small, because the face itself occupies only a small square region of the frame.
The algorithm has finely tuned profiles for what makes up a face, and four are common: left eye, right eye, nose, and mouth. Other trained profiles include full body, glasses, clocks, and so on.
[Figure: Haar cascade detection pipeline. Haar-like features are applied to each detection region of the input image; the region then passes through a chain of AdaBoost classifier stages F1, F2, F3, ..., Fn, and only candidates that survive every stage are reported as a detected object.]
3. The algorithm makes use of numbers that define the shapes of what are called Haar features. Haar features are basically black-and-white rectangle patterns, and they can be visualised exactly that way. The reason they work is this: you take the sum of the pixels under the white rectangle, subtract the sum under the black rectangle, and ask whether the difference is greater or less than a threshold. If it passes, that feature's contribution is added to a running sum, and if the sum in turn crosses another threshold, the stage decides that this was, or was not, what it was looking for.
There are stages to the cascade file; the name implies that it has a cascading structure. That structure is what makes it efficient enough to run on real-time video. It is a very efficient algorithm, and it is the one that led to the explosive growth of face detection across the globe; it is lightweight and fast. Its drawback, though, is that it cannot handle profile views very well, that is, it is mostly targeted at frontal images.
For the cascade stages, it uses the black-and-white rectangles on the face to detect the faces at each stage of the detection phase (up to 30 stages, to be precise).
In Fig. 1, there is a black rectangle (the darkness of the eye region, i.e. its shadow) on top of a white rectangle (the highlight of the cheeks).
In Fig. 1, there is a black rectangle (the edges, which are darker, roughly where one would find the hair) on top of a white rectangle (the middle of the forehead, which is lighter).
In Fig. 1, there is a black rectangle (the same as the second) on top of a white rectangle.
One might ask: are you comparing each of these rectangular patterns with the pixels of the face underneath to see how well that part of the image matches the pattern? What is actually computed is called the integral sum of that part of the image, and all you do is sum pixel values. Every image is first converted to greyscale (values 0-255), and from those values you form the sum of all the pixels in each area; you then compare the sum of the greyscale values under the white region with the sum under the black region. Basically, you test whether the sum of the white minus the sum of the black lies within a certain threshold, and if it does, the region leaves that pass/fail stage. What makes the algorithm efficient is that it can move past areas of an image that are not faces, rejecting them quickly, and focus on areas of the image that look more like a face.
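The "integral sum" trick above can be sketched as follows (a minimal NumPy illustration; the helper names are invented for the example). After one pass building the integral image, the sum of any rectangle is just four lookups, so each white-minus-black feature costs constant time.

```python
import numpy as np

def integral_image(grey):
    # ii[y, x] = sum of all pixels above and to the left (inclusive),
    # padded with a zero row and column so corner lookups need no special cases
    return np.pad(grey.astype(np.int64).cumsum(0).cumsum(1), ((1, 0), (1, 0)))

def rect_sum(ii, y, x, h, w):
    # Sum of the h-by-w rectangle with top-left corner (y, x): four lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# Two-rectangle Haar feature on a tiny 4x4 "image": white on top, black below
grey = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = integral_image(grey)
white = rect_sum(ii, 0, 0, 2, 4)   # rows 0-1: sum is 28
black = rect_sum(ii, 2, 0, 2, 4)   # rows 2-3: sum is 92
feature = white - black            # -64, then compared against a threshold
```

The detector evaluates thousands of such features per window, which is only feasible because each one reduces to a handful of array lookups.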
If stage 1 passes, it goes on to stage 2. Stage 2 gets a little more detailed, stage 3 is a harder test to pass, stage 4 harder still, and as it goes further into the cascade the window scaling is increased, the features get more complex, they take more time to compute, and the threshold margins become smaller. What the algorithm does next is look for overlapping detection regions, compute their average, and display the result, which is roughly 90-99% accurate.
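That final overlap-and-average step can be sketched like this (a simplified illustration; real implementations, such as OpenCV's `groupRectangles`, are more involved):

```python
import numpy as np

# Merge overlapping (x, y, w, h) detections by greedy IoU clustering,
# then average each cluster into a single reported box.
def average_overlapping(boxes, iou_threshold=0.3):
    def iou(a, b):
        ax2, ay2 = a[0] + a[2], a[1] + a[3]
        bx2, by2 = b[0] + b[2], b[1] + b[3]
        ix = max(0, min(ax2, bx2) - max(a[0], b[0]))
        iy = max(0, min(ay2, by2) - max(a[1], b[1]))
        inter = ix * iy
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    clusters = []
    for b in boxes:
        for c in clusters:
            if iou(b, c[0]) > iou_threshold:
                c.append(b)
                break
        else:
            clusters.append([b])
    return [tuple(np.mean(c, axis=0).astype(int)) for c in clusters]

# Two boxes around the same face plus one elsewhere -> two merged boxes
detections = [(100, 100, 50, 50), (102, 98, 52, 50), (300, 40, 60, 60)]
merged = average_overlapping(detections)
```

Requiring several nearby windows to agree before reporting a face is also what suppresses isolated false positives.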
4. To get correct results, a classifier has to be trained. Training involves taking a group of positive images and a group of negative images, labelling all the faces in the positives, and then running a learning algorithm, AdaBoost, which accelerates the search. It is a really large computational task to explore the whole feature space, because these profiles consist of rectangles inside a 24-by-24 window (some use different sizes, but 24 by 24 is the most common). If you think about a 24-by-24 greyscale window, there is an astronomically huge number of possible rectangle features within that space, which is very hard to compute exhaustively; to accelerate it, AdaBoost is used to steer the search, keeping the weak classifiers that best separate face from non-face on each test pattern.
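A toy sketch of the AdaBoost idea over decision stumps follows (the 2-D data and round count are invented for illustration; Viola-Jones boosts stumps over Haar-feature responses in the 24-by-24 window rather than raw coordinates). Each round picks the single threshold test with the lowest weighted error, then re-weights the examples it got wrong so the next round focuses on them.

```python
import numpy as np

def adaboost_train(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1 / n)          # start with uniform example weights
    learners = []
    for _ in range(rounds):
        best = None
        for f in range(X.shape[1]):          # candidate feature
            for t in np.unique(X[:, f]):     # candidate threshold
                for s in (1, -1):            # polarity of the test
                    pred = np.where(s * (X[:, f] - t) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, s, pred)
        err, f, t, s, pred = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this stump
        w *= np.exp(-alpha * y * pred)          # boost misclassified examples
        w /= w.sum()
        learners.append((alpha, f, t, s))
    return learners

def adaboost_predict(learners, X):
    score = sum(a * np.where(s * (X[:, f] - t) > 0, 1, -1)
                for a, f, t, s in learners)
    return np.where(score > 0, 1, -1)

# Toy data: label +1 only when both coordinates exceed 0.5, a rule no
# single stump can express but a weighted vote of stumps can
X = np.array([[0.2, 0.9], [0.9, 0.2], [0.8, 0.8], [0.1, 0.1],
              [0.7, 0.9], [0.9, 0.7], [0.3, 0.6], [0.6, 0.3]])
y = np.array([-1, -1, 1, -1, 1, 1, -1, -1])
model = adaboost_train(X, y, rounds=10)
acc = (adaboost_predict(model, X) == y).mean()
```

The same mechanism, applied to hundreds of thousands of candidate Haar features, is how training selects the handful that actually matter at each cascade stage.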
Now, how do you tell when one particular Haar feature matches a part of an image? There are up to 1,000 features within a full profile to test. In code, each Haar feature is evaluated by loading all the pixel values of the image into an array and looking only at the greyscale values. Within each rectangle, you take the sum of every pixel inside the black area, do the same for the white area, and subtract the black sum from the white sum; the leftover value is then compared against thresholds in the system to see whether it is within the threshold (pass) or not (fail). So, at each stage, the detector takes the sums of the white and black pixels and does this computation. At the end of each stage, you are left with a value accumulated over all the passing and failing features, and if that value is within another threshold, you either stop or go on to the next stage. That is one reason it is such an efficient algorithm: it simply ignores regions whose features are never going to form a face.
What happens in a Haar cascade is pattern recognition over a subset of the picture. It is looking for patterns of darks and lights: if you look at your face, you have a dark spot under a lighter spot, because your eyes, when open, have colour, and there is a shadow on your skin. It then looks for a lighter stripe, another darker spot or line, and then a lighter blob: the area under the eyes, the eyes themselves, and the nose between them. When it gets to a face, it slows down, cascading over little bits at a time, running over the entire frame at least three more times, and if it gets the same or a similar result within about a few millimetres each time, it stores that region as a face.
A lot of techniques were tried before this is what was finally settled on. Polygon matching was tried, but if we look at each other, there are a lot of different face polygons: some faces are long, some wide, some round, some with small eyes, and all sorts of variation. What was eventually figured out is that if you take a small cascade over a small subset of features, you can determine a light-and-dark pattern which, put together, creates a cascade of recognition that really works, and works really well.
5. Facial recognition is taking the faces you find and checking them against a database. When people make adjustments to their face (e.g. plastic surgery), or happen to have a distorted face due to accidents, disease, ailments, or other factors, the pattern of light and dark on the face changes too, and this can seriously degrade an FR algorithm.