AI (Artificial Intelligence) is the future of software, and the industry is already making strides in machine learning and its applications on mobile. Here is my view on what is currently being implemented and what may come in the future.
2. WHAT COMES IN THE FUTURE
➤ Deeper personalisation - Personalised recommendations and less annoying ads
➤ Neural networks running on mobile devices - Mobiles may start running machine learning tasks locally, e.g. face detection, speech recognition and other innovations
➤ Mobile experience automation - Automating the work of different connected apps, e.g. IFTTT
➤ NLP will begin to make sense - Today it can barely beat rule-based engines; the field is still in its infancy
➤ Real-time speech translation - If this technology continues to develop, it could significantly improve the quality of international communication or even eradicate language barriers
3. ➤ Face Detection
➤ Natural Language Processing
➤ Barcode Detection
➤ Text Detection (OCR)
What is machine learning?
Machine learning is the idea that there are generic algorithms that can
tell you something interesting about a set of data without you having to
write any custom code specific to the problem. Instead of writing code,
you feed data to the generic algorithm and it builds its own logic based
on the data.
For example:
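A minimal sketch of that idea in Python, assuming scikit-learn is installed; the housing numbers below are made up for illustration:

```python
from sklearn.linear_model import LinearRegression

# Instead of hand-writing pricing rules, feed example data to a generic algorithm.
# Features: [bedrooms, square metres]; targets: sale prices (made-up numbers).
X = [[3, 200], [2, 150], [4, 300], [3, 180]]
y = [250_000, 180_000, 400_000, 230_000]

model = LinearRegression()
model.fit(X, y)                   # the algorithm builds its own logic from the data

print(model.predict([[3, 250]]))  # estimate the price of a house it has never seen
```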
4. ML BASICS
➤ Supervised and unsupervised learning
➤ Neural networks
➤ Stateless vs. stateful models
➤ Deep learning
➤ Convolution (small neural networks, max pooling)
➤ Precision (true positives vs. all positive guesses) and recall (true positives vs. all actual positives in the dataset)
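As a quick illustration of the precision/recall bullet (the counts below are invented):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)  # of everything we guessed positive, how much was right
    recall = tp / (tp + fn)     # of everything actually positive, how much we found
    return precision, recall

# e.g. a face detector that finds 8 real faces, raises 2 false alarms and misses 2 faces:
print(precision_recall(tp=8, fp=2, fn=2))  # -> (0.8, 0.8)
```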
5. FACE DETECTION
➤ The process of locating human faces in visual media (digital images or video).
➤ Once a face is detected, it can be searched for landmarks such as the eyes, nose, etc.
➤ A landmark is a point of interest within a face. The left eye, right eye and nose base are examples of landmarks.
➤ We can see its common usage in cameras: they detect faces to make sure all of them are in focus before taking the picture.
➤ Social networking sites like Facebook use it!
6. HOW TO DETECT FACES?
Face Recognition is a series of several related problems:
➤ First, look at a frame and find all faces in it.
➤ Second, focus on each face and understand that even if a face is turned in a weird direction or is in bad lighting, it is still the same person.
As a human, your brain is wired to do all of this automatically and instantly. In fact, humans are so good at recognising faces that they end up seeing faces in everyday objects.
7. STEP 1: FINDING ALL THE FACES
➤ The first step in the pipeline is going to be finding faces in a photograph
before we try to tell them apart.
➤ More reliable solutions exist these days for detecting a face in a given frame. We are going to look at the HOG (Histogram of Oriented Gradients) approach.
[Image: mobile cameras using face detection]
8. ➤ We’ll look at every single pixel in our image one at a time (after converting it to black and white, since colour data isn’t needed to find faces). For every single pixel, we want to look at the pixels directly surrounding it:
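A tiny numpy sketch of what “the pixels directly surrounding it” means (edge handling is omitted to keep the sketch short):

```python
import numpy as np

def neighbourhood(gray, y, x):
    """Return the 3x3 patch of pixels directly surrounding pixel (y, x).

    `gray` is a 2-D numpy array of pixel brightness values; pixels on the
    image border are ignored here for brevity.
    """
    return gray[y - 1:y + 2, x - 1:x + 2]
```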
9. ➤ Our goal is to figure out how dark the current pixel is compared to the pixels directly surrounding it. Then we want to draw an arrow showing in which direction the image is getting darker:
➤ If you repeat that process for every single pixel in the image, you end up with every pixel being replaced by an arrow. These arrows are called gradients, and they show the flow from light to dark across the entire image:
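A minimal numpy sketch of that per-pixel step (real HOG implementations, e.g. in dlib or OpenCV, do this plus block normalisation for you):

```python
import numpy as np

def pixel_gradients(gray):
    """Per-pixel gradient direction and strength for a 2-D grayscale array."""
    dy, dx = np.gradient(gray.astype(float))  # brightness change down and across
    direction = np.arctan2(dy, dx)            # which way brightness changes fastest
    magnitude = np.hypot(dx, dy)              # how strong the change is (arrow length)
    return direction, magnitude
```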
10. ➤ We’ll break up the image into small squares of 16x16 pixels each. In each square, we’ll count up how many gradients point in each major direction (how many point up, point up-right, point right, etc.). Then we’ll replace that square in the image with the arrow direction that was the strongest.
➤ The end result is that we turn the original image into a very simple representation that captures the basic structure of a face in a simple way:
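A hedged sketch of that 16x16 counting step, reusing `direction` and `magnitude` from the previous snippet (8 major directions is a typical choice, not a requirement):

```python
import numpy as np

def dominant_direction(direction, magnitude, bins=8):
    """Pick the strongest gradient direction inside one 16x16 cell.

    `direction` and `magnitude` are the 16x16 arrays for that cell, as
    produced by the pixel_gradients sketch above.
    """
    # Fold opposite angles together (a light-to-dark edge counts the same as
    # dark-to-light) and quantise into `bins` major directions.
    idx = np.round((direction % np.pi) / (np.pi / bins)).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=magnitude.ravel(), minlength=bins)
    return int(np.argmax(hist))  # this single direction replaces the whole square
```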
11. ➤ To find faces in this HOG image, all we have to do is find the part of our image that looks most similar to a known HOG pattern that was extracted from a bunch of other training faces:
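dlib ships exactly this kind of HOG-pattern face detector; a minimal sketch, assuming dlib is installed (the image filename is hypothetical):

```python
import dlib

# dlib's frontal face detector is a HOG pattern learned from many training
# faces, slid over the image at multiple scales.
detector = dlib.get_frontal_face_detector()

image = dlib.load_rgb_image("people.jpg")  # hypothetical input file
faces = detector(image, 1)                 # 1 = upsample once to catch smaller faces
for rect in faces:
    print(rect.left(), rect.top(), rect.right(), rect.bottom())
```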
12. STEP 2: POSING AND PROJECTING FACES
➤ Now we have to deal with the problem that faces turned in different directions look totally different to a computer.
➤ To account for this, we will warp each picture so that the eyes and lips are always in the same place in the image. This makes face comparison a lot easier.
➤ To do this, we are going to use an algorithm called face landmark estimation.
➤ There are lots of ways to do this, but we are going to use the approach invented in 2014 by Vahid Kazemi and Josephine Sullivan.
13. ➤ The basic idea is that we will come up with 68 specific points (called landmarks) that exist on every face: the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, etc. Then we will train a machine learning algorithm to be able to find these 68 specific points on any face.
[Images: the 68 landmarks we will locate on every face; locating the 68 face landmarks on our test image]
➤ You can also use this same technique to implement your own version of Snapchat’s real-time 3D face filters!
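A minimal dlib sketch of face landmark estimation; dlib’s bundled 68-point model implements the Kazemi–Sullivan approach (the model file must be downloaded separately from dlib.net, and the filenames here are assumptions):

```python
import dlib

detector = dlib.get_frontal_face_detector()
# Pre-trained 68-point landmark model (path is an assumption).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("person.jpg")  # hypothetical input file
for rect in detector(image, 1):
    landmarks = predictor(image, rect)     # 68 (x, y) points: chin, eyes, eyebrows...
    print([(p.x, p.y) for p in landmarks.parts()])
```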
14. ➤ Now that we know where the eyes and mouth are, we’ll simply rotate, scale and shear the image so that the eyes and mouth are centered as well as possible. We won’t do any fancy 3D warps because that would introduce distortions into the image. We are only going to use basic image transformations like rotation and scaling that preserve parallel lines (called affine transformations):
➤ Code to find landmarks: https://gist.github.com/ageitgey/ae340db3e493530d5e1f9c15292e5c74
➤ Now, no matter how the face is turned, we are able to center the eyes and mouth in roughly the same position in the image.
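A hedged OpenCV sketch of that affine alignment step; the template coordinates below are an illustrative choice, not a standard:

```python
import cv2
import numpy as np

def align_face(image, left_eye, right_eye, mouth, size=160):
    """Rotate/scale/shear a face so the eyes and mouth land on fixed template points.

    left_eye, right_eye and mouth are (x, y) landmark centres taken from the
    68 points found in the previous step.
    """
    src = np.float32([left_eye, right_eye, mouth])
    dst = np.float32([[0.3 * size, 0.35 * size],   # where we want the left eye
                      [0.7 * size, 0.35 * size],   # the right eye
                      [0.5 * size, 0.75 * size]])  # and the mouth
    M = cv2.getAffineTransform(src, dst)           # affine: preserves parallel lines
    return cv2.warpAffine(image, M, (size, size))
```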