Detecting facial keypoints is a very challenging problem. Facial features vary greatly from one individual to another, and even for a single individual there is a large amount of variation due to 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but there remain many opportunities for improvement.
In this presentation we apply several methods to detect facial keypoints and compare their root mean square errors (RMSE) to assess their accuracy.
2. Introduction
The goal of this project is to accurately locate the keypoints on a
greyscale photograph of a human face.
We are given labelled training data consisting of 7049 images.
We used a variety of methods to learn from this data and make predictions on a
test set of 1783 images.
The test data was also labelled, which allowed us to measure the accuracy of each
method used.
Implemented in R
3. Format of the Data
Each image is a 96 x 96 greyscale image.
This means that each pixel is described by a value which indicates the
intensity of grey, with 0 being pure black and 255 being pure white.
Each training image is labelled with the (x, y) coordinates of 15 facial
keypoints, which include the centres and corners of the eyes, the eyebrows,
the lips, the tip of the nose, etc.
In each row, these labels are followed by 9216 integers, which are the pixel
values of the greyscale image itself.
The entire data is given in a CSV file.
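As a sketch of what this layout implies in R, the image column of one row can be split into its 9216 pixel values and reshaped into the 96 x 96 grid. The space-separated pixel format matches the Kaggle CSV; here a random string stands in for a real row read via `read.csv`:

```r
# Sketch: parsing one row's image, assuming the layout where the last
# CSV column holds 9216 space-separated pixel values per row.
# A random string stands in for a real row of the training file.
set.seed(42)
row_image <- paste(sample(0:255, 96 * 96, replace = TRUE), collapse = " ")

pixels <- as.integer(strsplit(row_image, " ")[[1]])  # split into 9216 integers
img <- matrix(pixels, nrow = 96, ncol = 96)          # reshape to the 96 x 96 grid
```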
5. Evaluation of a Predictor
We compare the generated results with the labelled test data and calculate
the root mean square error of the results.
The root mean square error penalizes large errors heavily and gives a good
reflection of the accuracy of the predictor used.
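A minimal RMSE helper along these lines (the handling of missing labels via `na.rm` is our assumption, since not every keypoint is present in every image):

```r
# Root mean square error between predicted and true keypoint coordinates.
# NAs in the labels are skipped, as some keypoints are missing in some images.
rmse <- function(pred, truth) {
  sqrt(mean((pred - truth)^2, na.rm = TRUE))
}

rmse(c(1, 2, 3), c(1, 2, 5))  # the single error of 2 dominates the score
```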
6. Simple Means
We calculated the mean of every feature in the training data.
We predicted this mean for every test image.
No real analysis of the data is done; it is a very simplistic method.
Resulted in an RMSE of 3.96244
Obviously not a very refined approach
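The baseline can be sketched in a few lines; the 3 x 2 toy label matrix below stands in for the real 7049 x 30 matrix of training labels:

```r
# Simple-means baseline: the column means of the training labels become the
# prediction for every test image. A tiny toy matrix stands in for the real
# label matrix (x and y coordinates of a single keypoint here).
train_labels <- matrix(c(65, 30,
                         67, 31,
                         66, 29), nrow = 3, byrow = TRUE)
mean_pred <- colMeans(train_labels, na.rm = TRUE)   # one mean per coordinate

n_test <- 5                                         # toy test-set size
predictions <- matrix(mean_pred, nrow = n_test, ncol = 2, byrow = TRUE)
```

Every row of `predictions` is identical, which is exactly why this baseline ignores all the information in the test images.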
8. Image Patches
This method is similar to the simplistic means method, but instead of taking
the mean only of the point, it considers a patch of the image centered around
the keypoint.
We can consider a patch size of about 10 or 15 pixels as reasonable.
Using this method, we can better aggregate and generalize the results
for every image, since the predictor now searches for an entire area around
the keypoint that roughly matches the average patch.
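One way the search can be sketched is below; the patch size, search radius, and correlation score are illustrative assumptions, and the toy test image is a copy of the training image, so the best match should land back on the labelled keypoint:

```r
# Sketch of the mean-patch search on toy data: take the patch around the
# training keypoint, then slide it over a search window in the test image
# and keep the position with the highest correlation.
set.seed(1)
patch_half <- 5                       # patch is 11 x 11 pixels
img_train <- matrix(runif(96 * 96), 96, 96)
kx <- 48; ky <- 48                    # labelled keypoint in the training image
mean_patch <- img_train[(ky - patch_half):(ky + patch_half),
                        (kx - patch_half):(kx + patch_half)]

img_test <- img_train                 # toy test image: identical to training
search <- 3                           # search radius around the mean position
best <- c(NA, NA); best_score <- -Inf
for (dy in -search:search) for (dx in -search:search) {
  y <- ky + dy; x <- kx + dx
  cand <- img_test[(y - patch_half):(y + patch_half),
                   (x - patch_half):(x + patch_half)]
  score <- cor(as.vector(cand), as.vector(mean_patch))
  if (score > best_score) { best_score <- score; best <- c(x, y) }
}
best  # on this toy example the best match is the original keypoint (48, 48)
```

In the real method, `mean_patch` would be the average of patches across all training images rather than one image's patch.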
10. Evaluation of Mean Patches
Depending on the size of the patch, we got different results for the RMSE.
Testing patch sizes between 10 and 15, we found the optimal size to be 14,
which gave an RMSE of 3.75538.
11. Artificial Neural Networks
We then used neural network based classification for the data.
However, the data was too massive to use in full: 9216 pixel values had
to be processed for each of over 7000 images. This led to infeasible
execution times when training the neural network.
Using a decimation filter, we reduced the 96x96 size images down to a 24x24
size, and considered only half of the original training set.
However, as the training data was still sizable, the plots of the neural
networks remained unreadable, though execution time was reduced considerably.
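The decimation step can be sketched as a 4 x 4 block average, which shrinks each 96 x 96 image to 24 x 24 (576 inputs instead of 9216) before a network is trained, e.g. with the `nnet` package (the block-average implementation is our assumption about the filter):

```r
# 4x decimation filter: average each 4 x 4 block so a 96 x 96 image
# becomes 24 x 24, cutting the network's input size by a factor of 16.
decimate4 <- function(img) {
  out <- matrix(0, 24, 24)
  for (i in 1:24) for (j in 1:24) {
    rows <- (4 * (i - 1) + 1):(4 * i)
    cols <- (4 * (j - 1) + 1):(4 * j)
    out[i, j] <- mean(img[rows, cols])
  }
  out
}

small <- decimate4(matrix(1, 96, 96))  # toy constant image
dim(small)                             # 24 x 24; constant image stays all ones
```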
14. Conclusion
The improvement of the RMSE over simplistic methods when using neural
networks indicates that the features in the data are not independent of each
other.
Our earlier methods did not consider the inter-dependency of the features.
It makes intuitive sense that the data proved to be interdependent as the
features of the face generally follow a certain pattern.
Facial keypoint recognition is a very important field, as it forms the initial
step towards more advanced applications such as facial recognition and facial
expression identification.
15. References
All the project data was obtained from kaggle.com.
The dataset was in turn obtained from Dr. Yoshua Bengio, University of
Montreal.
R packages and tutorials from the official site: http://www.r-project.org