Finding Faces: a Kaggle Competition
Facial Keypoint Detection Using R
May 12th, 2016
Geneva Porter
Porter, Porter & Porter Inc.
270 C Street Suite B18
(619) 376-9793
geneva.porter@gmail.com
Abstract
There is currently demand for advances in facial recognition software. This
report focuses on the submission process for the Kaggle competition “Facial Keypoints
Detection.” This competition asks for an algorithm that can detect keypoints around the
eyes, nose, and mouth in images of faces. After using a training set of images to
establish a baseline for keypoint locations, we used R to extract a vector that combined
these averages into a composite image. By comparing this vector to the area around
each keypoint, specifically the tone of individual pixels, we were able to approximately
predict the locations of these same keypoints in test images.
While our results were adequate for the purposes of this Kaggle competition, far
more complex analysis would be needed for application in larger software such as facial
tracking, expression analysis, medical diagnosis, and face recognition. Although such an
algorithm can be a useful tool for these applications, for the purposes of this report we
will keep it simple. Looking at small vectors and evaluating similarities within each
image is just a starting point for a more advanced facial keypoint detection method.
Kaggle Project & Background
The problem presented here is to form an algorithm that accurately predicts
keypoints on a face given an image containing one. We can use mathematical models to
detect subtle changes in tone variation and location of facial features in order to
formulate a process that can detect the eyes, brows, nose and mouth present in an
image. Given training data and test data, we can “train” our algorithm to evaluate an
average location and tonal area for each keypoint, and see how well it predicts keypoints
in our “test” data. Since facial features are significantly different from one person to
another, finding these needed keypoints can be very challenging. Here, we will create a
simple algorithm for detecting these keypoints based on the competition outlined on
Kaggle.com.
Data, Method & Results
Our algorithm will be “trained” using basic data provided by Dr. Yoshua Bengio
from the University of Montreal, via the Kaggle Competitions website. The data consists
of 7,049 images that have already been tagged for facial keypoints, and is considered
highly reliable. In addition, there is a series of “test” images that can be used to
evaluate the effectiveness of our constructed algorithm. Our objective is to use this
data to form a model that can identify the presence of faces as well as distinguish unique
faces from one another. This project has many applications, such as social media, law
enforcement, and genealogy.
To formulate our model, we must first establish which keypoints we will use
for facial detection. We will begin with a simple model of 15 keypoints, using the
following descriptors:
left_eye_center
right_eye_center
left_eye_inner_corner
left_eye_outer_corner
right_eye_inner_corner
right_eye_outer_corner
left_eyebrow_inner_end
left_eyebrow_outer_end
right_eyebrow_inner_end
right_eyebrow_outer_end
nose_tip
mouth_left_corner
mouth_right_corner
mouth_center_top_lip
mouth_center_bottom_lip
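Each keypoint name above corresponds to a pair of coordinate columns in the training data. As a minimal sketch of that naming convention (the abbreviated keypoint list here is illustrative; the real data has all 15):

```r
# Sketch: each keypoint name yields an _x and a _y coordinate column,
# so 15 keypoints give 30 columns (list abbreviated here for brevity).
keypoints <- c("left_eye_center", "right_eye_center",
               "nose_tip", "mouth_center_bottom_lip")
cols <- c(rbind(paste(keypoints, "x", sep = "_"),
                paste(keypoints, "y", sep = "_")))
head(cols, 2)  # "left_eye_center_x" "left_eye_center_y"
```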
Here is an example of an image that highlights these keypoints. Note that green
points indicate the right eye, blue points indicate the left eye, yellow points indicate the
right brow, purple points indicate the left brow, the red point indicates the nose, and
white points indicate the mouth.
Fig.[1] A 96 × 96 pixel test image for keypoint detection
Our “training data” has already identified these points on our sample images, and
we will start by finding the average location of each feature. This gives us an
approximation of where to expect these features when processing images for facial
recognition. Here is a visual representation of where all the eyes (center), noses, and
mouths (center bottom) in our sample images are located:
Fig.[2] Keypoints for eyes, nose, and mouth for all training images
As we can see, the results form an image of a scary clown. Although these results
are quite widespread, they give us valuable information: we now have the mean location
of each feature. Once we have established this baseline, we must adapt our algorithm
to detect outliers. Few real-world images will feature a forward-facing, centered face
that is easy to identify. Notice that in Fig.[2], there is a clear outlier face in the bottom
left corner. This face corresponds to the following image (the keypoint for the mouth is
in black, for clarity).
Fig.[3] Keypoints for an outlier face
Clearly, the locations of these facial features are far from the mean. Our solution
will be to evaluate facial features relative to average tone, rather than relative to
average location. To do this, we will use the R language to evaluate images.
Our first step in creating an algorithm for reading facial recognition keypoints is
to construct a data frame in R. This is a matrix that organizes our 15 facial keypoints
into 30 coordinate columns, with one row for each of the 7,049 images. We begin by
implementing this process with our “training” file, which has keypoints already set.
We create a variable “im.train” that holds the values of each image row when
split into 9,216 columns (96 × 96 = 9,216), each representing a single pixel and its
grayscale tone for each 96 × 96 pixel image.
library(foreach)
d.train  <- read.csv(train.file, stringsAsFactors = FALSE)
im.train <- d.train$Image            # pixel strings, one per image
d.train$Image <- NULL                # keep only the keypoint columns
im.train <- foreach(im = im.train, .combine = rbind) %do% {
    as.integer(unlist(strsplit(im, " "))) }
We can convert the 9,216 numbers from each row into a 96 × 96 matrix to give
us a visual picture of what we are working with. Using the “image” function in R, we
plotted keypoints on one of our training images (Fig.[1,2]) to show an example and the
average location of important keypoints. Our basic algorithm simply uses these
average locations as a guide for finding facial keypoints in all other images. Using the
“colMeans” function, we can get these averages from our training images (see
Table[1]). However, these averages will not be very accurate: they will guess the
position of the eyes, nose, and mouth regardless of the image at hand. To go further, we
must use more information, like the expected tone of each keypoint pixel, to improve
our recognition algorithm.
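As a minimal sketch of the colMeans step (using a tiny made-up d.train here, since the real 7,049-row training frame is not reproduced in this report):

```r
# Minimal sketch: average keypoint locations via colMeans.
# This toy d.train stands in for the real training frame,
# which has 7,049 rows and 30 coordinate columns.
d.train <- data.frame(
    nose_tip_x = c(47.0, 49.5, 48.5),
    nose_tip_y = c(62.0, 63.5, 62.5)
)
# na.rm = TRUE skips images where a keypoint was not tagged
p <- colMeans(d.train, na.rm = TRUE)
p["nose_tip_x"]  # mean x-coordinate across the training images
```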
Table [1] Average Locations of Facial Keypoints
Keypoint                        X-axis Location    Keypoint                        Y-axis Location
left_eye_center_x               66.35902           left_eye_center_y               37.65123
right_eye_center_x              30.30610           right_eye_center_y              37.97694
left_eye_inner_corner_x         59.15934           left_eye_inner_corner_y         37.94475
left_eye_outer_corner_x         73.33048           left_eye_outer_corner_y         37.70701
right_eye_inner_corner_x        36.65261           right_eye_inner_corner_y        37.98990
right_eye_outer_corner_x        22.38450           right_eye_outer_corner_y        38.03350
left_eyebrow_inner_end_x        56.06851           left_eyebrow_inner_end_y        29.33268
left_eyebrow_outer_end_x        79.48283           left_eyebrow_outer_end_y        29.73486
right_eyebrow_inner_end_x       39.32214           right_eyebrow_inner_end_y       29.50300
right_eyebrow_outer_end_x       15.87118           right_eyebrow_outer_end_y       30.42817
nose_tip_x                      48.37419           nose_tip_y                      62.71588
mouth_left_corner_x             63.28574           mouth_left_corner_y             75.97071
mouth_right_corner_x            32.90040           mouth_right_corner_y            76.17977
mouth_center_top_lip_x          47.97541           mouth_center_top_lip_y          72.91944
mouth_center_bottom_lip_x       48.56947           mouth_center_bottom_lip_y       78.97015
Our next building block uses image patches to add precision to our method.
Using each keypoint as an anchor, we can look at a field of pixels around an average
keypoint location and extract this area as a vector. This gives us an expected image for
that keypoint, and we can compare this amalgamation with keypoint areas on our test
images.
coord      <- "nose_tip"
patch_size <- 10
coord_x <- paste(coord, "x", sep = "_")
coord_y <- paste(coord, "y", sep = "_")
patches <- foreach(i = 1:nrow(d.train), .combine = rbind) %do% {
    im <- matrix(data = im.train[i, ], nrow = 96, ncol = 96)
    x  <- d.train[i, coord_x]
    y  <- d.train[i, coord_y]
    x1 <- (x - patch_size)
    x2 <- (x + patch_size)
    y1 <- (y - patch_size)
    y2 <- (y + patch_size)
    if ((!is.na(x)) && (!is.na(y)) && (x1 >= 1) && (x2 <= 96) &&
        (y1 >= 1) && (y2 <= 96)) {
        as.vector(im[x1:x2, y1:y2])
    } else { NULL }
}
mean.patch <- matrix(data = colMeans(patches), nrow = 2 * patch_size + 1,
                     ncol = 2 * patch_size + 1)
This means that we can tell R to “look” for a similar vector around each average
keypoint location in our test images and adjust the keypoint to better resemble the
defined vector. We can call this our “average patch,” representing the average keypoint
image. For example, averaging the patch of 10 pixels on each side of the nose keypoint
(giving us a 21 × 21 pixel square, since 10 + 1 + 10 = 21) for each training image, we get
Fig.[4].
Fig.[4] Average image for “nose tip” keypoints
This looks like a good approximation of an average nose. By finding the position that
best matches our average patch, our algorithm becomes much more precise. For our desired
algorithm, each keypoint on each image must go through this process. Luckily, we need
not do this task for each of the 15 keypoints individually. We can make use of our
“foreach” function to compare each image patch to each test image. (See Appendix 2 for
code used.) After computing the average patches and applying them to our test images,
the submission to Kaggle is ready.
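The search step itself can be sketched as follows. Everything here is an illustrative assumption, not the competition code: the image is synthetic, the search window size is invented, and candidates are scored by squared pixel difference for simplicity (the tutorial scores candidates by correlation with the average patch instead).

```r
# Hedged sketch of the patch-search step. Names (mean.patch, patch_size)
# follow the conventions above; the 96 x 96 image is synthetic.
patch_size  <- 2    # patch extends 2 pixels each way (5 x 5)
search_size <- 3    # scan +/- 3 pixels around the mean keypoint location

# Synthetic image: mostly dark, with a bright 5 x 5 spot centered at (50, 40).
im <- matrix(0, nrow = 96, ncol = 96)
im[48:52, 38:42] <- 255

# Stand-in "average patch": a uniformly bright 5 x 5 block.
mean.patch <- matrix(255, nrow = 2 * patch_size + 1, ncol = 2 * patch_size + 1)

# Scan a window around an assumed mean keypoint location (49, 41),
# keeping the candidate center whose patch best matches mean.patch.
mean_x <- 49; mean_y <- 41
best <- c(x = NA, y = NA, score = -Inf)
for (x in (mean_x - search_size):(mean_x + search_size)) {
    for (y in (mean_y - search_size):(mean_y + search_size)) {
        p <- im[(x - patch_size):(x + patch_size),
                (y - patch_size):(y + patch_size)]
        # Negated sum of squared differences: 0 is a perfect match.
        score <- -sum((as.vector(p) - as.vector(mean.patch))^2)
        if (score > best["score"]) best <- c(x = x, y = y, score = score)
    }
}
best[c("x", "y")]  # predicted keypoint location
```

The same loop, run once per keypoint and per test image with the real average patches, is what produces the adjusted predictions submitted to Kaggle.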
Solution & Conclusions
This method proves effective for this simple Kaggle Competition, but does not
take into account more complex situations. Profiles, out-of-focus images, or partial
faces cannot be detected. While the test images were all correctly identified, real-world
images may not be as clear. Images that have no faces, multiple faces, or faces of similar
species may not be correctly sorted, depending on the content. This project turned out
to be moderately successful. Given another test file, I would be curious to see how this
algorithm would measure up. I could definitely improve my results by tinkering with more
variations in tonal averages for keypoints, and perhaps by implementing an algorithm
that takes into account the positions of keypoints relative to one another in a single
image. There are many options to explore with this data in the future.
Kaggle Experience
When first logging onto Kaggle Competitions, I was very interested in several of
the challenges: satellite image chronology, predicting the artist of paintings, Reddit
comments, and learning the programming language Julia. However, since I had a
limited amount of time and I wanted to further improve my knowledge of R, I chose the
project “Facial Keypoints Detection.” This project had a tutorial for using R in this
context, and it seemed manageable to complete in 2 weeks.
Going through the tutorial was simple enough. Unfortunately, I don’t have a GPU
at my disposal, so some of the calculations took several minutes. At the end of the
tutorial, there were several links that offered “next steps” for taking this competition
further. I explored these sites and found that many required more powerful computers
or programming languages unknown to me, and involved lengthy discussions of bugs and
the merits of alternative methods. As much as I would have liked to dive deeper into some of these
topics, I simply did not have the hardware, software, or knowledge to implement these
more complex approaches. I did get to explore a few forums that elaborated on this
project, and borrowed from users’ comments and feedback.
Overall, I’m very happy to have been introduced to Kaggle. Perhaps this summer,
I will have the time (and the use of a programmer friend’s computer) to learn a bit more
about the competitions that interest me and gain experience in a few more
programming languages like C++ and Python. Kaggle will definitely be a page that I visit
regularly to see updates and work through tutorials.
References
1. Kaggle Competitions, “Facial Keypoints Detection,” www.kaggle.com
2. Dr. Yoshua Bengio, University of Montreal
3. James Petterson, “Facial Keypoints Detection” (tutorial), Kaggle
4. GitHub Gist, “Code for Kaggle Getting Started with R: Facial Keypoint Detection
Competition,” gist.github.com