Caricatures are facial drawings by artists that exaggerate certain facial parts. The exaggerations often go beyond realism, yet the caricatures remain recognizable to humans. With the advent of deep learning, computer recognition performance on real-world faces has become comparable to human performance even in unconstrained settings. However, a gap remains between computer and human performance on caricature recognition, mainly due to the lack of large-scale, publicly available caricature datasets. Our project addresses two tasks: identifying a public figure's name and gender from a caricature, and generating a caricature drawing from a frontal face photograph. For the former task, we use the IIIT-CFW dataset and report the accuracy obtained by our model. For the latter task, the system learns how to exaggerate features from training examples drawn from a particular caricaturist.
2. Introduction
● Caricatures are facial drawings by artists that exaggerate certain facial parts. A caricature can be rendered in two ways: it can be either exaggerated or oversimplified.
3. Our Project In a Nutshell:
• Cartoon gender identification: a binary classification problem where, given a cartoon face, the system must decide whether it is male or female.
• Cartoon face recognition: the problem of recognizing a given cartoon face as one of C classes.
• Cartoon face generation: given the real face of a public figure, generate a caricature in various moods that has exaggerated, identifiable facial features.
To further improve the accuracy of the above tasks, we ran three experiments for landmark extraction from caricature images:
● Dlib facial landmark extractor
● Flood-fill algorithm
● Manual-annotation-assisted keypoint extraction
4. Applications
● Can help visually impaired people understand cartoon images or movies.
● Can be used to automatically censor communal or politically incorrect cartoons on social media.
6. Summary of roles/responsibilities
● Pratik Parwal (20148093): Problem Analysis, Landmark
Detection using manually annotated dataset, report
making.
● Nikhil Agarwal (20144092): Problem Analysis, Face
averaging, Training and Transformation for caricature
generation, report making.
● Richa Pandey (20144126): Problem Analysis, Dataset parsing, detecting face boundary using flood-fill algorithm, report making.
● Saurav Jha (20144007): Problem Analysis, Experiments with CNN modules, Dataset augmentation, report making.
● Pramee Chowdhury (20144136): Problem Analysis, Face detection, alignment, normalization, Java app for manual annotation of caricature landmarks, report making.
8. Dataset Used
The recently released Cartoon Faces in the Wild (IIIT-CFW) [11] database contains a total of 8,928 annotated images of cartoon faces of 100 public figures. The real human faces of these public figures are also provided.
A number of attributes are provided for each cartoon image.
9. Attributes in XML form
● Type of cartoon: cartoon, cartoon
sketch, caricature
● Pose: frontal, non frontal
● Expression: happy, sad, thoughtful, seductive, sorrow, angry, serious, frightened, crying, shocked
● Age group: young, old
● Gender: male, female
● Glass: yes, no
10. Data Pre-processing
● We parse the XML for each caricature image to extract its name and gender.
● Each image was first converted to grayscale, with pixel values in the range (0, 255); the face bounding box was then extracted and resized to 96x96.
● Finally, we obtain a list of pixel values along with the corresponding classes (1-100 for the face recognition task and 0/1 for the gender identification task).
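A minimal NumPy-only sketch of these pre-processing steps; the actual pipeline may have used an image library such as OpenCV, and the function name and random test image are illustrative:

```python
# Sketch of the pre-processing pipeline: grayscale conversion,
# face-box crop, then a 96x96 resize. The bounding box (x, y, w, h)
# would come from the dataset's XML annotations.
import numpy as np

def preprocess(rgb_img, box, out_size=96):
    """rgb_img: HxWx3 uint8 array; box: (x, y, w, h) face bounding box."""
    # 1. Grayscale via standard luminance weights; values stay in 0..255.
    gray = (0.299 * rgb_img[..., 0] +
            0.587 * rgb_img[..., 1] +
            0.114 * rgb_img[..., 2])
    # 2. Crop the face bounding box.
    x, y, w, h = box
    face = gray[y:y + h, x:x + w]
    # 3. Nearest-neighbour resize to out_size x out_size.
    rows = np.arange(out_size) * face.shape[0] // out_size
    cols = np.arange(out_size) * face.shape[1] // out_size
    return face[rows][:, cols]

img = np.random.randint(0, 256, (200, 180, 3), dtype=np.uint8)
face96 = preprocess(img, (20, 30, 120, 140))
print(face96.shape)  # (96, 96)
```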
11. Facial Landmark Extraction
L1: Face Region Candidate Extraction
using Flood-Fill.
We make the following assumptions:
• The skin color within a caricature face is nearly even.
• There is a clear distinction between a caricature face and its surroundings.
The algorithm [14] starts from a skin-color pixel (generally the center of the image) and gradually spreads to neighboring pixels whenever the difference between two adjacent pixels is less than a certain threshold.
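The flood-fill step above can be sketched as breadth-first region growing from the centre pixel; the threshold value and 4-connectivity here are our assumptions:

```python
# Minimal sketch of the flood-fill face-candidate step: start from the
# centre pixel and grow the region through 4-connected neighbours whose
# grey value differs from the current pixel by less than `thresh`.
from collections import deque
import numpy as np

def flood_fill_region(gray, thresh=10):
    h, w = gray.shape
    seed = (h // 2, w // 2)              # assume the face covers the centre
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    q = deque([seed])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(int(gray[nr, nc]) - int(gray[r, c])) < thresh):
                mask[nr, nc] = True
                q.append((nr, nc))
    return mask

# A bright square "face" on a dark background is fully recovered:
img = np.zeros((40, 40), dtype=np.uint8)
img[10:30, 10:30] = 200
print(flood_fill_region(img).sum())  # 400 pixels: the 20x20 face region
```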
12. L2 : Dlib Face Landmark Detector
Detects a total of 68 facial landmark points
13. L3: Manual Key-points annotations
of caricature images.
We chose a subset of 20 of the 100 personalities in the dataset and manually annotated 15 images of each. To do this, a Java Swing based application was built to manually record the (x, y) coordinates of the 15 selected landmark points.
The system was then trained to predict the landmark points of an image given its pixel intensities.
14. Architecture of the CNN model. Input and output are normalised. The loss function used was ‘mse’ with an SGD optimizer.
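A hedged Keras sketch of such a landmark-regression model: normalised 96x96 grayscale input, 30 outputs (x and y for the 15 landmark points), ‘mse’ loss and an SGD optimizer. The layer sizes are our assumption, not taken from the slide's figure:

```python
# Sketch of a landmark-regression CNN along the lines described.
# Filter counts and dense sizes are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(96, 96, 1)),          # pixel values scaled to [0, 1]
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(30)                          # normalised (x, y) of 15 points
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01,
                                             momentum=0.9),
              loss='mse')
print(model.output_shape)  # (None, 30)
```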
16. System Description
● For evaluating the accuracy of the sub-tasks for
gender identification and face verification, we
experimented with different model configurations of
Convolutional Neural Networks(CNNs).
● CNNs [3] replace full connectivity with convolutions using filters; they perform much better on applications with structured inputs that can be filtered by such convolutions.
1. System1: 1-D Convolutional Neural Nets
2. System2: 2-D Convolutional Neural Nets
3. System 3: 2-D CNN with Data Augmentation
17. SYSTEM 1: 1-D CNNs
● Filters move in just one direction; 1-D CNNs are generally used for problems such as text classification and signal smoothing.
18. SYSTEM 2: 2-D CNNs
● Believed to be better suited to tasks, such as ours, that rely on the inherent two-dimensional spatial patterns of images.
19. 2-D CNN based model configuration used by us for the tasks of gender
identification and face verification.
20. SYSTEM 3: 2D CNN + Data
Augmentation
● We carried out horizontal flips of the training images followed by a 5-degree anti-clockwise rotation of the images.
● Horizontal flipping preserves the symmetry of human-like faces while doubling the training-set size.
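The two augmentations can be sketched in NumPy; the nearest-neighbour rotation here is a stand-in for whatever image library was actually used:

```python
# Sketch of the two augmentations described: a horizontal flip and a
# 5-degree anti-clockwise rotation (nearest-neighbour sampling).
import numpy as np

def hflip(img):
    return img[:, ::-1]

def rotate(img, deg=5):
    """Rotate about the image centre with nearest-neighbour sampling."""
    h, w = img.shape
    t = np.deg2rad(deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rr, cc = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel.
    src_r = cy + (rr - cy) * np.cos(t) - (cc - cx) * np.sin(t)
    src_c = cx + (rr - cy) * np.sin(t) + (cc - cx) * np.cos(t)
    src_r = np.clip(np.rint(src_r), 0, h - 1).astype(int)
    src_c = np.clip(np.rint(src_c), 0, w - 1).astype(int)
    return img[src_r, src_c]

img = np.arange(96 * 96, dtype=np.uint16).reshape(96, 96)
augmented = [hflip(img), rotate(hflip(img), 5)]  # flip, then rotated flip
print(augmented[0].shape, augmented[1].shape)
```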
21. RESULTS AND ANALYSIS
● Gender Identification and Face verification
1. Using Only the List of Pixel Values
- We report our accuracy on all 100 classes using the entire list of pixel values.
- All three systems described above were first trained on a total of 7,138 caricature images (with a validation split of 0.2) and then tested on 1,785 images.
- With 5,242 training instances for male faces against 1,896 for female faces, the dataset suffers from a class-imbalance problem.
- Because of this gender class imbalance, we report the results of our systems separately for the male and female classes.
23. Using the Landmark Points of L3 Combined with Pixel Values
● We study the effect of the landmark points on the accuracy of four machine learning classifiers: LinearSVC, Random Forest, Gradient Boosting Classifier, and a voting ensemble of the three.
● The classifiers were trained and tested on a total of 2,339 caricature images belonging to 20 classes of the dataset.
● First run: the (x, y) coordinates of the 15 landmark points were fed to the classifiers along with the list of pixel values, resulting in 2,946 features.
● Second run: only the list of pixel values was fed to the classifiers, resulting in 2,916 features.
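The classifier comparison above can be sketched with scikit-learn on synthetic stand-in features; the pixel dimensionality is reduced here for speed, and all data is illustrative:

```python
# Sketch of the four-classifier setup: LinearSVC, Random Forest,
# Gradient Boosting, and a hard vote of the three, on feature vectors
# concatenating the 30 landmark coordinates with flattened pixel values.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, VotingClassifier)

rng = np.random.default_rng(0)
n, n_pix = 200, 100                      # pixel dims reduced from 2,916
X = np.hstack([rng.random((n, 30)),      # 15 (x, y) landmark points
               rng.random((n, n_pix))])  # flattened grayscale pixels
y = rng.integers(0, 20, n)               # 20 personality classes

clfs = [('svc', LinearSVC()),
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
        ('gb', GradientBoostingClassifier(n_estimators=10, random_state=0))]
vote = VotingClassifier(estimators=clfs, voting='hard')
vote.fit(X, y)
print(vote.predict(X[:5]).shape)  # (5,)
```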
25. Accuracy Report
● Face verification on 100 classes: 30.96%
● Gender identification on 100 classes:
● Male: 79.33%
● Female: 66.17%
● Face verification using landmark points on 20 classes: 33.8%
27. Face Alignment and Normalization
For better results, all images must conform to certain standards, so all the original image-caricature pairs were subsequently aligned and normalized.
28. Mesh Warping
Mesh Warping is the module we created to warp an image according to the final landmark points. We used the Dlib landmark detector to obtain landmark points, and used these 68 points plus 8 points on the boundary of the original face to compute a Delaunay triangulation.
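The triangulation step can be sketched with SciPy; the random interior points stand in for real dlib landmark output:

```python
# Sketch of the mesh construction: Delaunay triangulation over the
# 68 facial landmarks plus 8 points on the image boundary.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
landmarks = rng.random((68, 2)) * 0.8 + 0.1    # 68 interior landmark points
w = h = 1.0                                     # normalised image extent
boundary = np.array([[0, 0], [w / 2, 0], [w, 0], [w, h / 2],
                     [w, h], [w / 2, h], [0, h], [0, h / 2]])
points = np.vstack([landmarks, boundary])       # 76 points in total
tri = Delaunay(points)                          # triangles as index triples
print(len(points), tri.simplices.shape[1])      # 76 points, 3 vertices each
```

Each triangle of this mesh can then be warped independently with an affine transform to move the source landmarks onto the target landmarks.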
29. Face Averaging
To calculate the mean face, we first need the average of all landmark coordinates across the real faces, obtained by simply averaging the x and y values of the landmark coordinates. We then used our Mesh Warping module to compute the average face.
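The landmark-averaging step is just a per-point mean over all faces; a small NumPy sketch with synthetic landmark sets standing in for real detections:

```python
# Each face contributes a (68, 2) array of landmark coordinates; the
# mean shape is the element-wise average. Each face would then be
# mesh-warped onto this mean shape before averaging the pixels.
import numpy as np

faces = [np.random.rand(68, 2) * 96 for _ in range(10)]  # 10 faces' landmarks
mean_landmarks = np.stack(faces).mean(axis=0)            # (68, 2) mean shape
print(mean_landmarks.shape)  # (68, 2)
```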
30. Relationships among original image,
corresponding caricature and mean
face
When a caricaturist sees a face, he or she can identify its distinctive facial features by comparing it with the mean face hidden in his or her mind [7].
32. Neural Network Model
The network is capable of learning from the training set by automatically constructing an input-output mapping for the problem.
33. Result for Task 2:
Initial approach result (not much exaggeration).
Second approach result, obtained by decreasing the training data.
34. Training data given as input to obtain a sad caricature.
Sad face of Smriti Irani obtained after training on Mulayam's image.
35. CONCLUSION
In our project, the 2-D Convolutional Neural Network model fed with augmented data performed best, pushing accuracy to 30.96% on the face verification task, and to 66.17% and 79.33%, respectively, on the identification of female and male faces. To further improve accuracy, we conducted three independent experiments to capture the facial keypoints of the caricature faces. Their detailed results were not satisfactory enough to be used further as a feature set for the tasks, so we report them only for face verification on 20 classes, where the best accuracy was 33.8% using the Gradient Boosting Classifier.
36. FUTURE WORK
The task of landmark detection on cartoon faces has substantial scope for improvement given the results we achieved. Improvement in face alignment would also significantly affect the accuracy of the results. Recent CNN advances such as Multi-task Cascaded Convolutional Neural Networks (MTCNN) could be trained and used for joint face detection and alignment of such cartoon faces.
37. References :
● [1] Bharadwaj, S., Bhatt, H. S., Vatsa, M., and Singh, R. Domain specific learning for newborn face recognition. IEEE Transactions on Information Forensics and Security 11 (2016), 1630–1641.
● [2] Hsu, R.-L., and Jain, A. K. Generating discriminating cartoon faces using interacting snakes. IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003), 1388–1398.
● [3] Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117.
● [4] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014).
● [5] Klare, B., Burge, M., Klontz, J. C., Bruegge, R. W. V., and Jain, A. K. Face recognition performance: Role of demographic information. IEEE Transactions on Information Forensics and Security 7 (2012), 1789–1801.
● [6] Kumar, N., Berg, A. C., Belhumeur, P. N., and Nayar, S. K. Attribute and simile
classifiers for face verification. 2009 IEEE 12th International Conference on Computer
Vision (2009), 365–372.
38. ● [7] Lai, K., Chung, P., and Edirisinghe, E. Novel approach to neural network based caricature generation.
● [8] LeCun, Y. Gradient-based learning applied to document recognition.
● [9] Liang, L., Chen, H., Xu, Y.-Q., and Shum, H.-Y. Example-based caricature generation with exaggeration. In Computer Graphics and Applications, 2002. Proceedings. 10th Pacific Conference on (2002), IEEE, pp. 386–393.
● [10] Liao, S., Jain, A. K., and Li, S. Z. Partial face recognition: Alignment-free approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 1193–1205.
● [11] Mishra, A., Rai, S. N., Mishra, A., and Jawahar, C. V. Iiit-cfw: A benchmark database
of cartoon faces in the wild. In ECCV Workshops (2016).
● [12] Shan, C., Gong, S., and McOwan, P. W. Facial expression recognition based on local binary patterns: A comprehensive study. Image Vision Comput. 27 (2009), 803–816.
● [13] Sun, Y., Liang, D., Wang, X., and Tang, X. Deepid3: Face recognition with very
deep neural networks. CoRR abs/1502.00873 (2015).
● [14] Takayama, K., Johan, H., and Nishita, T. Face detection and face recognition of
cartoon characters using feature extraction. In Image, Electronics and Visual
Computing Workshop (2012), p. 48.