Component-Based Ethnicity Identification from Facial Images
A. Boyseens and S. Viriri(B)
School of Maths, Statistics and Computer Science,
University of KwaZulu-Natal, Durban, South Africa
{210501411,viriris}@ukzn.ac.za
Abstract. This paper presents an exhaustive component-based analysis to identify the ethnicity from facial images. The different ethnic groups identified are Asian, African, African American, Asian Middle East, Caucasian and Other. The classification techniques investigated include Decision Trees, Naïve Bayes, Random Forest and K-Nearest Neighbor. Naïve Bayes achieved 84.7 % and 85.6 % accuracy rates for African and Asian ethnicity identification, respectively. The Decision Trees achieved an 85.8 % African American ethnicity identification rate, while K-Nearest Neighbor achieved 86.8 % for Asian Middle East ethnicity and Random Forest achieved 90.8 % for Caucasian ethnicity. This research work achieved an overall ethnicity identification rate of 86.6 %.
1 Introduction
This paper investigates methods of analysing facial images and identifying the ethnicity of the subject. The ethnicities considered are Asian, African, African American, Asian Middle East, Caucasian and Other. The aim of this research work is to investigate a model for efficient ethnicity identification using facial components.
Ethnicity is a socially defined category of people who identify with each other based on common social, cultural and ancestral backgrounds [1]. Ethnic facial recognition is an important biometric authentication technology. Facial recognition has various applications, including law enforcement, security systems and biometric systems, so an ethnicity identification system is an essential complement to them.
2 Related Work
Lu and Jain [2] described techniques which use Nearest Neighbor (NN) and Linear Discriminant Analysis (LDA) for ethnicity identification from facial images. The two classes identified are Asian and non-Asian. The feature extraction techniques used were Hu Moments and Zernike Moments. The experimental results achieved an average accuracy rate of 86 %. The shortfalls of this work are that it only classified two classes, Asian and non-Asian, and that the Product Rule was used as the integrated strategy to combine outputs, so the reported results are estimates due to the extensive cross-validation that was used.

© Springer International Publishing AG 2016. L.J. Chmielewski et al. (Eds.): ICCVG 2016, LNCS 9972, pp. 293–303, 2016. DOI: 10.1007/978-3-319-46418-3_26
Buchala et al. [3] discussed the effects that Principal Component Analysis (PCA) has on the identification of gender, ethnicity, age and identity from facial images. Three classes are identified: Caucasian, African American and East Asian. The feature extraction technique was the Elastic Bunch Graph. The experimental results achieved an average accuracy for ethnicity of 81.67 %. The shortfalls of this paper are that it only classified three classes (Caucasian, African American and East Asian), and that the authors assumed that using Linear Discriminant Analysis (LDA) would achieve better results than PCA.
Tin and Sein [4] analysed the use of Nearest Neighbor (NN) and Principal Component Analysis (PCA) to achieve ethnicity identification from facial images. The paper distinguishes between two classes, Myanmar and non-Myanmar ethnicities. The feature extraction technique used was Hu Moments. The experimental results obtained were, on average, 92 % ethnicity identification accuracy. The insufficiency of this work is that it only classified between two classes, Myanmar and non-Myanmar. The authors concluded that more images are needed to achieve better results and that the system needs to be more sensitive to features that are close together.
3 Methods and Techniques
The component-based ethnicity identification system is depicted in Fig. 1.
3.1 Facial Components
The facial components are extracted using the Haar Transformation [5], which is real and orthogonal. The Haar Transformation HT_n(f) of an n-input function X_n(f) is described in Eq. (1). The Haar Transformation multiplies a function with the Haar matrix, which contains Haar functions of different widths at different locations. It is calculated at two levels, which decompose the discrete signal into two components at half of the original length [5].

    HT_n(f) = H_n X_n(f)    (1)

where n is the number of elements in the function, H_n is the Haar matrix and X_n(f) is the 2^n-element vector. The components that are extracted are the left eye, right eye, mouth, nose, chin, forehead, left cheek and right cheek.
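The single-level decomposition described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: one Haar level splits an even-length signal into averages (approximation) and differences (detail), each at half the original length.

```python
import numpy as np

def haar_level(signal):
    """One level of the Haar decomposition: split an even-length
    discrete signal into approximation (averages) and detail
    (differences) components at half the original length."""
    x = np.asarray(signal, dtype=float)
    assert x.size % 2 == 0, "signal length must be even"
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass half
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass half
    return approx, detail
```

Applying the function again to the approximation half gives the second level of the two-level scheme used here.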
3.2 Feature Extraction
From each component a feature vector is obtained. Analysis was done on differ-
ent textural and structural or geometrical feature extractions, 7 Hu Moments,
Zernike Moments, Local Binary Pattern (LBP), Gabor Filter and Haralick
Texture Moments, in order to obtain the correct feature extraction for that
component.
Fig. 1. Overview of ethnicity identification system
The Hu Moments are a set of invariant moments that characterize an image regardless of its scale, position, size and orientation. They are computed by normalizing the central moments through order 3 [6].
The Zernike Moments are used to overcome the redundancy that certain geometric moments exhibit [6]. They are a class of orthogonal moments which are rotation invariant and effective in image representation. The Zernike polynomials are a set of complex, orthogonal polynomials defined on the interior of the unit circle. The general form of the Zernike Moments is defined in Eq. (2).

    Z_nm(x, y) = Z_nm(ρ, θ) = R_nm(ρ) e^(jmθ)    (2)

where (x, y) and (ρ, θ) correspond to Cartesian and polar coordinates respectively, n ∈ Z+ and m ∈ Z, constrained to n − |m| even and |m| ≤ n.

    R_nm(ρ) = Σ_{k=0}^{(n−|m|)/2} (−1)^k (n − k)! / [ k! ((n+|m|)/2 − k)! ((n−|m|)/2 − k)! ] ρ^(n−2k)    (3)

where R_nm(ρ) is the radial polynomial and k is the summation index.
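The radial polynomial of Eq. (3) translates directly into code. This is a minimal sketch of that formula alone (not the full Zernike moment computation, which also requires integrating against the image):

```python
from math import factorial

def zernike_radial(n, m, rho):
    """Radial polynomial R_nm(rho) from Eq. (3); n >= |m| and
    n - |m| must be even for a non-trivial result."""
    m = abs(m)
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total
```

For example, R_00(ρ) = 1, R_11(ρ) = ρ and R_20(ρ) = 2ρ² − 1, which the function reproduces.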
The Haralick Texture Moments are texture features that can be used to analyze the spatial distribution of the image's texture [6] at different spatial positions and angles. Four of these Haralick Texture Moments are computed: Energy, Entropy, Correlation and Homogeneity.
Entropy reflects the disorder and complexity of the image's texture. It is defined in Eq. (4).

    Entropy = − Σ_{i,j} f̂(i, j) log f̂(i, j)    (4)

where f̂(i, j) is the (i, j) entry of the gray-level matrix of the image and i and j are positions in that matrix.
Energy is the measure of local homogeneity and is the opposite of Entropy. It shows the uniformity of the image's texture and is computed using Eq. (5).

    Energy = Σ_{i,j} f̂(i, j)²    (5)

where f̂(i, j) is the (i, j) entry of the gray-level matrix of the image.
Homogeneity reflects the evenness of the image's texture and the scale of local changes in texture. High homogeneity indicates no change between regions with regard to the image's texture. It is defined in Eq. (6).

    Homogeneity = Σ_i Σ_j f̂(i, j) / (1 + (i − j)²)    (6)

where f̂(i, j) is the (i, j) entry of the gray-level matrix of the image.
Correlation is the consistency of the image's texture, as described in Eq. (7).

    Correlation = Σ_{i,j} (i − μ_i)(j − μ_j) f̂(i, j) / (σ_i σ_j)    (7)

in which μ_i, μ_j, σ_i and σ_j are defined as:

    μ_i = Σ_{i=1}^{n} Σ_{j=1}^{n} i f̂(i, j),    μ_j = Σ_{i=1}^{n} Σ_{j=1}^{n} j f̂(i, j)    (8)

    σ_i = √( Σ_{i=1}^{n} Σ_{j=1}^{n} (i − μ_i)² f̂(i, j) ),    σ_j = √( Σ_{i=1}^{n} Σ_{j=1}^{n} (j − μ_j)² f̂(i, j) )    (9)
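The four Haralick features above reduce to a few array operations. A minimal sketch, assuming the input is a normalized gray-level matrix whose entries sum to 1 (function and key names are illustrative, not from the paper):

```python
import numpy as np

def haralick_features(p):
    """Compute Energy, Entropy, Homogeneity and Correlation
    (Eqs. 4-9) from a normalized gray-level matrix p."""
    p = np.asarray(p, dtype=float)
    i, j = np.indices(p.shape)
    energy = np.sum(p ** 2)                          # Eq. (5)
    nz = p[p > 0]                                    # skip log(0) terms
    entropy = -np.sum(nz * np.log(nz))               # Eq. (4)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))   # Eq. (6)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)        # Eq. (8)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))      # Eq. (9)
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    corr = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)  # Eq. (7)
    return {"energy": energy, "entropy": entropy,
            "homogeneity": homogeneity, "correlation": corr}
```

On a perfectly uniform matrix, for instance, the correlation is 0 and the energy equals 1 divided by the number of entries.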
The Gabor Filters are built from the product of an elliptical Gaussian and a sinusoid [7]. The Gabor elementary function can be defined as the product of a Gaussian pulse with a harmonic oscillation of a given frequency, Eq. (10).

    g(t) = e^(−α²(t−t₀)²) · e^(−i(2πf₀t + φ))    (10)

where α determines the time duration of the Gaussian envelope, t₀ denotes the centroid, f₀ is the frequency of the sinusoid and φ denotes the phase shift.
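Eq. (10) can be evaluated directly with complex arithmetic. A brief sketch under the parameterization above (the function name is illustrative):

```python
import cmath

def gabor_elementary(t, alpha, t0, f0, phi):
    """Gabor elementary function of Eq. (10): a Gaussian pulse
    modulated by a complex harmonic oscillation."""
    envelope = cmath.exp(-(alpha ** 2) * (t - t0) ** 2)
    carrier = cmath.exp(-1j * (2 * cmath.pi * f0 * t + phi))
    return envelope * carrier
```

At the centroid t = t₀ the magnitude is 1, and it decays with the Gaussian envelope away from it.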
The Local Binary Pattern (LBP) is an operator that describes the surroundings of a pixel by obtaining a bit code for that pixel [8]; it is defined in Eq. (11).

    C = Σ_{k=0}^{7} 2^k b_k    (11)

    b_k = 1 if t_k ≥ t_c, and b_k = 0 otherwise

where t_k is the gray-scale value of the k-th neighboring pixel, t_c is the gray-scale value of the center pixel, b_k are binary variables taking values 0 or 1, and C is the resulting LBP code.
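The bit-code computation of Eq. (11) for a 3×3 neighborhood is a one-liner once the neighbors are enumerated. A minimal sketch (neighbor ordering is an assumption; any fixed ordering yields a valid code):

```python
def lbp_code(patch):
    """8-bit LBP code (Eq. 11) of the center pixel of a 3x3
    gray-level neighborhood, given as a list of 3 rows."""
    center = patch[1][1]
    # 8 neighbors, clockwise from the top-left corner
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum(1 << k for k, t in enumerate(neighbors) if t >= center)
```

A neighborhood whose neighbors are all brighter than the center yields 255; one whose neighbors are all darker yields 0.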
3.3 Classification
A number of different supervised and unsupervised machine learning algorithms
are used in order to identify ethnicity of the facial images. The classification
techniques used are Naı̈ve Bayes, K-Nearest Neighbor and Decision Tree.
The Decision Trees use recursive partitioning to separate the dataset by finding the best variable [9], and using the selected variable to split the data. The entropy, defined in Eqs. (12) and (13), measures the difference that a variable would make on the results if it were chosen. If the entropy is 0 then that variable is a perfect split; otherwise a new variable is evaluated.

    H(D) = − Σ_{i=1}^{k} P(C_i|D) log_k P(C_i|D)    (12)

where H(D) is the entropy of a sample D with respect to a target variable of k possible classes C_i.

    P(C_i|D) = (number of observations of class C_i in D) / (total number of observations in D)    (13)

where the probability of class C_i in D is obtained directly from the dataset.
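Eqs. (12) and (13) together give the split-quality measure. A short sketch that computes H(D) from a list of class labels (using log base k, as in Eq. (12)):

```python
import math
from collections import Counter

def entropy(labels, k):
    """H(D) of Eq. (12): entropy of a list of class labels with
    respect to a target variable of k possible classes."""
    n = len(labels)
    if n == 0 or k <= 1:
        return 0.0
    # P(Ci|D) from Eq. (13) is count / n for each observed class
    return -sum((c / n) * math.log(c / n, k)
                for c in Counter(labels).values())
```

A perfectly balanced two-class sample has entropy 1, and a pure sample has entropy 0, matching the stopping rule above.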
Naïve Bayes classifies an instance by assuming that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable [9,10]. Classification is done by calculating the probability of the class given the attributes, as defined in Eq. (14).

    P(y | x_1, …, x_n) = P(y) Π_{i=1}^{n} P(x_i|y) / P(x_1, …, x_n)    (14)

where y is a class value, x_1, …, x_n are the attributes and n is the number of attributes.
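For discrete attributes, the probabilities in Eq. (14) can be estimated by simple counting; since the denominator is the same for every class, comparing numerators suffices. A minimal counting-based sketch (function names are illustrative, not the paper's code, and no smoothing is applied):

```python
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Estimate P(y) and P(x_i|y) by counting over discrete
    attributes, returning a predict function that picks the
    class with the largest Eq. (14) numerator."""
    prior = Counter(y)
    cond = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, cls in zip(X, y):
        for i, v in enumerate(row):
            cond[(i, cls)][v] += 1
    n = len(y)

    def numerator(row, cls):
        p = prior[cls] / n                      # P(y)
        for i, v in enumerate(row):             # product of P(x_i|y)
            p *= cond[(i, cls)][v] / prior[cls]
        return p

    def predict(row):
        return max(prior, key=lambda cls: numerator(row, cls))

    return predict
```

In practice Laplace smoothing would be added so that an unseen attribute value does not zero out the whole product.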
K-Nearest Neighbor (KNN) classifies by a majority vote of a sample's neighbors, assigning the sample to the class most common amongst its k nearest neighbors. Nearness is measured by a distance function, for example the Euclidean distance defined in Eq. (15).

    d = √( Σ_{i=1}^{n} (x_i − q_i)² )    (15)

where n is the dimensionality of the data, x_i is a coordinate of a data point and q_i is the corresponding coordinate of the query point.
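The vote-among-nearest-neighbors rule can be sketched in a few lines using the Euclidean distance of Eq. (15) (a minimal sketch; function and parameter names are illustrative):

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Majority vote among the k training points nearest to the
    query, using the Euclidean distance of Eq. (15)."""
    ranked = sorted(zip(points, labels),
                    key=lambda pl: math.dist(pl[0], query))
    votes = Counter(lbl for _, lbl in ranked[:k])
    return votes.most_common(1)[0][0]
```

With two well-separated clusters, a query near one cluster is assigned that cluster's label.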
4 Results and Discussion
This paper used a union of four different facial image databases. The total dataset contained 1300 facial images of 900 subjects, divided into six ethnic groups (Asian, African, African American, Asian Middle East, Caucasian and Other). The Asian dataset was composed of Yale [11] and FERET [12]; the African dataset of MUCT [13]; the African American dataset of Yale [11], ORL [14] and FERET [12]; the Caucasian dataset of Yale [11], ORL [14], FERET [12] and MUCT [13]; the Asian Middle East dataset of ORL [14], Yale [11], FERET [12] and MUCT [13]; and the Other dataset of FERET [12].
Analysis was done to determine which feature extraction technique achieves the highest True Positive Rate (TPR) for ethnicity identification. These results were obtained by taking the feature vector for each component and classifying it using K-Nearest Neighbor, observing which feature vector obtained the highest True Positive Rate; the results are shown in Table 1.
Table 1. Accuracy rates per component per feature extraction technique

Component   | 7 Hu moments | Zernike moments | LBP    | Gabor filter | Haralick texture moments
Nose        | 85.0 %       | 80.1 %          | 62.7 % | 82.5 %       | 66.5 %
Left eye    | 57.2 %       | 76.4 %          | 83.7 % | 73.4 %       | 77.1 %
Right eye   | 62.7 %       | 76.3 %          | 37.6 % | 71.8 %       | 63.4 %
Mouth       | 89.5 %       | 72.1 %          | 76.9 % | 88.2 %       | 70.1 %
Forehead    | 45.2 %       | 79.5 %          | 81.0 % | 70.6 %       | 82.3 %
Chin        | 80.5 %       | 72.6 %          | 47.5 % | 66.8 %       | 69.8 %
Left cheek  | 60.4 %       | 52.6 %          | 30.5 % | 66.7 %       | 72.8 %
Right cheek | 62.5 %       | 71.5 %          | 22.7 % | 81.8 %       | 84.2 %
Zernike Moments is a geometric feature extraction technique which achieved a TPR of 76.3 % for the right eye. LBP is a geometric feature extraction technique which achieved a TPR of 83.7 % for the left eye. These two geometric feature extraction techniques achieved high results because the eyes are structurally different in size and shape between ethnic groups.
Gabor Filter (88.2 %) and 7 Hu Moments (89.5 %) achieved high TPRs for the mouth component. Both feature extraction techniques are texture based. They achieved high results because the mouth differs in colour, shape and coarseness between ethnic groups.
The forehead (82.3 % TPR), left cheek (72.8 % TPR) and right cheek (84.2 % TPR) have different textures, colours and gradients. Haralick Texture Moments achieved high results on these components because they calculate the correlation and homogeneity for each pixel.
7 Hu Moments is a textural feature extraction technique; it achieved 80.5 % for the chin and 85.0 % for the nose. This is because these components need to be identified by their texture, which the 7 Hu Moments capture.
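The per-component choice discussed above can be read directly off Table 1. A small sketch that transcribes the table and selects the highest-TPR technique per component (the dictionary names are illustrative labels, not the paper's code):

```python
# TPR per component per extractor, transcribed from Table 1
TABLE1 = {
    "nose":        {"7 Hu": 85.0, "Zernike": 80.1, "LBP": 62.7, "Gabor": 82.5, "Haralick": 66.5},
    "left eye":    {"7 Hu": 57.2, "Zernike": 76.4, "LBP": 83.7, "Gabor": 73.4, "Haralick": 77.1},
    "right eye":   {"7 Hu": 62.7, "Zernike": 76.3, "LBP": 37.6, "Gabor": 71.8, "Haralick": 63.4},
    "mouth":       {"7 Hu": 89.5, "Zernike": 72.1, "LBP": 76.9, "Gabor": 88.2, "Haralick": 70.1},
    "forehead":    {"7 Hu": 45.2, "Zernike": 79.5, "LBP": 81.0, "Gabor": 70.6, "Haralick": 82.3},
    "chin":        {"7 Hu": 80.5, "Zernike": 72.6, "LBP": 47.5, "Gabor": 66.8, "Haralick": 69.8},
    "left cheek":  {"7 Hu": 60.4, "Zernike": 52.6, "LBP": 30.5, "Gabor": 66.7, "Haralick": 72.8},
    "right cheek": {"7 Hu": 62.5, "Zernike": 71.5, "LBP": 22.7, "Gabor": 81.8, "Haralick": 84.2},
}

# Pick the technique with the highest TPR for each component
best_extractor = {comp: max(scores, key=scores.get)
                  for comp, scores in TABLE1.items()}
```

This reproduces the pairings argued for in the text, e.g. LBP for the left eye and Haralick Texture Moments for the forehead and cheeks.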
4.1 African Ethnicity Results
In Buchala et al. [3] it was shown that for African ethnicity the accuracy achieved was on average 80 % for Principal Component Analysis (PCA). Those results were obtained from a dataset of 320 images from the FERET dataset.
In this work, Naïve Bayes produced the best results for the whole database, while K-Nearest Neighbour also produced high results, as shown in Fig. 2.
Fig. 2. Accuracy achieved for African ethnicity
4.2 African American Ethnicity Results
Buchala et al. [3] showed that for African American ethnicity the accuracy achieved was on average 80 % for Principal Component Analysis (PCA). The results obtained in this work are shown in Fig. 3; the Decision Trees achieved the best result of 85.8 %.
4.3 Asian Ethnicity Results
Lu and Jain [2] obtained an average of 97 % for Nearest Neighbour and 95 % for Linear Discriminant Analysis (LDA) for Asian identification. In this work, Naïve Bayes produced the best accuracy rate of 86 %, and K-Nearest Neighbor produced an accuracy rate of 75 %, as shown in Fig. 4.
Fig. 3. Accuracy achieved for African American ethnicity
Fig. 4. Accuracy achieved for Asian ethnicity
4.4 Asian Middle East Ethnicity Results
Buchala et al. [3] obtained results for Asian Middle East ethnicity of, on average, 83 % for Principal Component Analysis (PCA). Those results were obtained from a dataset of 363 images from the FERET dataset. In this work, K-Nearest Neighbor was the best machine learning algorithm for identifying the Asian Middle East ethnicity across all images, producing 86.8 %, as shown against the other techniques in Fig. 5.
Fig. 5. Accuracy achieved for Asian Middle East ethnicity
4.5 Caucasian Ethnicity Results
In Buchala et al. [3], results showed that for Caucasian ethnicity the accuracy achieved was 82 % for Principal Component Analysis (PCA). Those results were obtained using 1758 images from the FERET dataset. In this work, Random Forest was best at identifying the Caucasian ethnicity across all images, producing 90.8 %. Random Forest performed well here because most of the testing data consisted of Caucasian images and the decision rules it produced easily classified the Caucasian ethnicity. After testing, it was found that all datasets produced on average 90 % ethnicity identification accuracy, as shown in Fig. 6.
Fig. 6. Accuracy achieved for Caucasian ethnicity
4.6 Other Ethnicity Results
Tin and Sein [4] achieved, on average, 93 % for Nearest Neighbour (NN) and 96 % for Principal Component Analysis (PCA) for other ethnicities. Those results were obtained from 250 images collected from the Internet. In this work, K-Nearest Neighbor produced the best accuracy rate of 80 %, as shown in Fig. 7.
Fig. 7. Accuracy achieved for other ethnicity
4.7 Results of All Six Ethnicities
A number of empirical experiments were carried out to investigate which machine learning algorithm would produce the best results for ethnicity identification. The training dataset sizes ranged from 10 % to 100 % of the original dataset and were tested against the different machine learning algorithms; these training datasets were filled with randomly chosen images from the original dataset.
The tests showed that the Decision Tree machine learning algorithm achieved an 86.6 % ethnicity identification rate. The worst machine learning algorithm was Random Forest, which achieved a 70 % ethnicity identification rate. This is a difference of 16.6 % between the worst and best ethnicity identification rates, as shown in Fig. 8.
Fig. 8. Accuracy achieved for all six ethnicity
Table 2 shows the comparison between the results achieved by related works for ethnicity identification and the results of this work. The results obtained for Asian ethnicity identification were lower than those of Lu and Jain [2], possibly due to the larger dataset used in this work.
Table 2. Comparison of related works' results to our results for ethnicity identification

                   | Asian  | African | African American | Asian Middle East | Caucasian | Other
Lu and Jain [2]    | 97.7 % | -       | -                | -                 | -         | -
Buchala et al. [3] | -      | 80.2 %  | 80.0 %           | 83.1 %            | 82.0 %    | -
Tin and Sein [4]   | -      | -       | -                | -                 | -         | 96.0 %
Our Work           | 85.6 % | 84.7 %  | 85.8 %           | 86.8 %            | 90.8 %    | 82.3 %
5 Conclusion
This paper presented component-based ethnicity identification from facial images using machine learning algorithms. The ethnicities identified were Asian, African, African American, Asian Middle East, Caucasian and Other. The feature vectors obtained were Haralick Texture Moments for the forehead, left cheek and right cheek; Zernike Moments for the right eye; LBP for the left eye; Gabor Filter for the mouth; and 7 Hu Moments for the chin, mouth and nose. These were fused and normalized to obtain the results. The African ethnicity identification rate achieved 84.7 % with Naïve Bayes, while 85.8 % was achieved with Decision Trees for the African American ethnicity identification rate. Naïve Bayes achieved 85.6 % for the Asian ethnicity identification rate and K-Nearest Neighbor achieved 86.8 % for the Asian Middle East ethnicity identification rate. Random Forest achieved 90.8 % for the Caucasian ethnicity identification rate and K-Nearest Neighbor achieved 82.3 % for the Other ethnicity identification rate. This research achieved a total ethnicity identification rate of 86.6 %.
References
1. Lu, X.: Image analysis for face recognition. Personal notes, 5 May (2003)
2. Lu, X., Jain, A.K.: Ethnicity identification from face images. In: Defense and Secu-
rity, International Society for Optics and Photonics, pp. 114–123 (2004)
3. Buchala, S., Davey, N., Gale, T.M., Frank, R.J.: Principal component analysis of
gender, ethnicity, age, and identity of face images. In: Proceedings of IEEE ICMI
(2005)
4. Tin, H.H.K., Sein, M.M.: Race identification for face images. ACEEE Int. J. Inform.
Tech. 1(02) (2011)
5. Mulcahy, C.: Image compression using the Haar wavelet transform. Spelman Sci.
Math. J. 1(1), 22–31 (1997)
6. Teague, M.R.: Image analysis via the general theory of moments. JOSA 70(8),
920–930 (1980)
7. Berisha, S.: Image classification using Gabor filters and machine learning (2009)
8. Salah, S.H., Du, H., Al-Jawad, N.: Fusing local binary patterns with wavelet fea-
tures for ethnicity identification. In: Proceedings of IEEE International Conference
on Signal Image Process, vol. 21, pp. 416–422 (2013)
9. Domingos, P.: A few useful things to know about machine learning. Commun.
ACM 55(10), 78–87 (2012)
10. Lowd, D., Domingos, P.: Naive Bayes models for probability estimation. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 529–536.
ACM (2005)
11. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell.
19(7), 711–720 (1997)
12. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.J.: The FERET database and evaluation procedure for face-recognition algorithms. Image Vis. Comput. 16(5), 295–306
(1998)
13. Milborrow, S., Morkel, J., Nicolls, F.: The MUCT landmarked face database. In:
Pattern Recognition Association of South Africa (2010)
14. Samaria, F.S., Harter, A.C.: Parameterisation of a stochastic model for human face
identification. In: Proceedings of the Second IEEE Workshop on Applications of
Computer Vision, pp. 138–142. IEEE (1994)