Signal & Image Processing : An International Journal (SIPIJ) Vol.8, No.1, February 2017
DOI : 10.5121/sipij.2017.8101
MULTIMODAL BIOMETRICS RECOGNITION FROM
FACIAL VIDEO VIA DEEP LEARNING
Sayan Maity1, Mohamed Abdel-Mottaleb2 and Shihab S. Asfour1
1 Industrial Engineering (IE) Department, University of Miami, 1251 Memorial Drive, Coral Gables, Florida
2 Electrical and Computer Engineering Department, University of Miami, 1251 Memorial Drive, Coral Gables, Florida
ABSTRACT
Biometrics identification using multiple modalities has attracted the attention of many researchers as it
produces more robust and trustworthy results than single modality biometrics. In this paper, we present a
novel multimodal recognition system that trains a Deep Learning Network to automatically learn features
after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing
different modalities, i.e., left ear, left profile face, frontal face, right profile face, and right ear, present in
the facial video clips, we train supervised denoising auto-encoders to automatically extract robust and non-
redundant features. The automatically learned features are then used to train modality specific sparse
classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video
dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and
97.14% rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the
superiority and robustness of the proposed approach irrespective of the illumination, non-planar
movement, and pose variations present in the video clips.
KEYWORDS
Multimodal Biometrics, Auto-encoder, Deep Learning, Sparse Classification.
1. INTRODUCTION
There are several motivations for building robust multimodal biometric systems that extract
multiple modalities from a single source of biometrics, i.e., facial video clips. Firstly, acquiring
video clips of facial data is straightforward using conventional video cameras, which are
ubiquitous. Secondly, the nature of data collection is non-intrusive and the ear, frontal, and profile
face can appear in the same video. The proposed system, shown in Figure 1, consists of three
distinct components to perform the task of efficient multimodal recognition from facial video
clips. First, the object detection technique proposed by Viola and Jones [1], was adopted for the
automatic detection of modality specific regions from the video frames. Unconstrained facial
video clips contain significant head pose variations due to non-planar movements, and sudden
changes in facial expressions. This results in an uneven number of detected modality specific
video frames for the same subject in different video clips, and also a different number of modality
specific images for different subjects. From the aspect of building a robust and accurate model, it
is always preferable to use the entire available training data. However, classification through
sparse representation (SRC) is vulnerable in the presence of an uneven number of modality specific
training samples for different subjects. Thus, to overcome the vulnerability of SRC whilst using
all of the detected modality specific regions, in the model building phase we train supervised
denoising sparse auto-encoders to construct a mapping function. This mapping function is used to
automatically extract the discriminative features, preserving robustness to the possible variances,
using the uneven number of detected modality specific regions. Therefore, applying the Deep
Learning Network as the second component in the pipeline results in an equal number of training
sample features for the different subjects. Finally, using the modality specific recognition results,
score level multimodal fusion is performed to obtain the multimodal recognition result.

Figure 1. System Block Diagram: Multimodal Biometrics Recognition from Facial Video

Due to the unavailability of proper datasets for multimodal recognition studies [2], virtual
multimodal databases are often synthetically obtained by pairing modalities of different subjects
from different databases. To the best of our knowledge, the proposed approach is the first study
where multiple modalities are extracted from a single data source that belongs to the same subject.
The main contribution of the proposed approach is the training of a Deep Learning Network for
automatic feature learning in multimodal biometrics recognition using a single source of
biometrics, i.e., facial video data, irrespective of the illumination, non-planar movement, and pose
variations present in the face video clips.

The remainder of this paper is organized as follows: Section 2 details the modality specific frame
detection from the constrained and unconstrained facial video clips. Section 3 describes the
automatic feature learning using supervised denoising sparse auto-encoders (deep learning).
Section 4 presents the modality specific classification using sparse representation and multimodal
fusion. Section 5 provides the experimental results on the constrained facial video dataset (WVU
[3]) and the unconstrained facial video dataset (HONDA/UCSD [4]) to demonstrate the
performance of the proposed framework. Finally, conclusions and future research directions are
presented in Section 6.
2. MODALITY SPECIFIC IMAGE FRAME DETECTION
To perform multimodal biometric recognition, we first need to detect the images of the different
modalities from the facial video. The facial video clips in the constrained dataset are collected in
a controlled environment, where the camera rotates around the subject's head. The video
sequences start with the left profile of each subject (0 degrees) and proceed to the right profile
(180 degrees). Each of these video sequences contains image frames of different modalities, e.g.,
left ear, left profile face, frontal face, right profile face, and right ear, respectively. The video
sequences in the unconstrained dataset contain uncontrolled and non-uniform head rotations and
changing facial expressions. Thus, the appearance of a specific modality in a certain frame of the
unconstrained video clip is random compared with the constrained video clips.
To automate the detection of the modality specific image frames, we adopt the Adaboost object
detection technique proposed by Viola and Jones [1]. The algorithm is trained to detect frontal and
profile faces in the video frames using manually cropped frontal face images from the color
FERET database and profile face images from the University of Notre Dame Collection J2
database. Moreover, it is trained using cropped ear images from the UND color ear database to
detect ears in the video frames. These modality specific trained detectors are then applied to the
entire video sequence to detect the face and ear regions in the video frames.
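As a minimal sketch of this detection stage, the snippet below runs a bank of Haar cascade detectors over a single frame with OpenCV. The cascade file names are placeholders for our modality specific detectors trained on the FERET and UND crops, and the detectMultiScale parameters are illustrative assumptions, not values from the paper.

```python
# Sketch: modality specific detection with OpenCV Haar cascades.
# The XML file names are placeholders for detectors trained on the
# color FERET, UND Collection J2, and UND color ear crops.
import cv2

detectors = {
    "frontal_face": cv2.CascadeClassifier("frontal_face_cascade.xml"),
    "profile_face": cv2.CascadeClassifier("profile_face_cascade.xml"),
    "ear":          cv2.CascadeClassifier("ear_cascade.xml"),
}

def detect_modalities(frame):
    """Run every modality specific detector on one video frame and
    return the cropped grayscale regions, keyed by modality."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    regions = {}
    for name, det in detectors.items():
        boxes = det.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        regions[name] = [gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]
    return regions
```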
Before using the detected modality specific regions from the video frames for extracting features,
some pre-processing steps are performed. The facial video clips recorded in the unconstrained
environment contain variations in illumination and low contrast. Histogram equalization is
performed to enhance the contrast of the images. Finally, all detected modality specific regions
from the facial video clips were resized: ear images to 110 X 70 pixels and face images (frontal
and profile) to 128 X 128 pixels.
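A sketch of this pre-processing, assuming 8-bit grayscale crops such as those produced by the detectors above:

```python
import cv2

EAR_SIZE  = (70, 110)   # (width, height): ear regions resized to 110 X 70
FACE_SIZE = (128, 128)  # frontal and profile faces resized to 128 X 128

def preprocess(region, modality):
    """Contrast enhancement followed by modality specific resizing."""
    region = cv2.equalizeHist(region)            # histogram equalization
    size = EAR_SIZE if modality == "ear" else FACE_SIZE
    return cv2.resize(region, size)              # cv2.resize expects (w, h)
```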
3. AUTOMATIC FEATURE LEARNING USING DEEP NEURAL NETWORK
Even though the modality specific sparse classifiers result in relatively high recognition accuracy
on the constrained face video clips, the accuracy suffers in the case of unconstrained video because
the sparse classifier is vulnerable to the bias in the number of training images from different
subjects. For example, subjects in the HONDA/UCSD dataset [4] randomly change their head
pose. This results in a non-uniform number of detected modality specific video frames across
different video clips, which is not ideal to perform classification through sparse representation.
In the subsequent sections we first describe the Gabor feature extraction technique. Then, we
describe the supervised denoising sparse auto-encoders, which we use to automatically learn an
equal number of feature vectors for each subject from the uneven number of modality specific
detected regions.
3.1. Feature Extraction
2D Gabor filters [5] are used in a broad range of applications to extract scale and rotation
invariant feature vectors. In our feature extraction step, uniform down-sampled Gabor wavelets
are computed for the detected regions:
\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{s^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2s^2}\right) \left[ e^{i k_{\mu,\nu} z} - e^{-\frac{s^2}{2}} \right], (1)
where z = (x, y) represents each pixel in the 2D image, k_{\mu,\nu} is the wave vector, defined as
k_{\mu,\nu} = k_\nu e^{i\phi_\mu}, where k_\nu = k_{\max}/f^\nu, k_{\max} is the maximum frequency, f is the
spacing factor between kernels in the frequency domain, \phi_\mu = \pi\mu/2, and the value of s
determines the ratio of the Gaussian window width to wavelength. Using Equation 1, Gabor
kernels can be generated from one filter using different scaling and rotation factors. In this paper,
we used five scales, \nu \in \{0, \ldots, 4\}, and eight orientations, \mu \in \{0, \ldots, 7\}. The other
parameter values used are s = 2\pi, k_{\max} = \pi/2, and f = \sqrt{2}.
Before computing the Gabor features, all detected ear regions are resized to the average size of all
the ear images, i.e., 110 X 70 pixels, and all face images (frontal and profile) are resized to the
average size of all the face images, i.e., 128 X 128 pixels. Gabor features are computed by
convolving each Gabor wavelet with the detected 2D region, as follows:

C_{\mu,\nu}(z) = T(z) * \psi_{\mu,\nu}(z), (2)

where T(z) is the detected 2D region, and z = (x, y) represents the pixel location. The feature
vector is constructed out of C_{\mu,\nu} by concatenating its rows.
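The sketch below builds the 40 Gabor kernels of Equation 1 with the stated parameters (five scales, eight orientations, s = 2π, k_max = π/2, f = √2) and applies the convolution of Equation 2. The 32 X 32 kernel support and the 4x spatial down-sampling are assumptions; the paper states only that the wavelets are uniformly down-sampled.

```python
# Sketch of the Gabor feature extraction of Equations (1)-(2).
import numpy as np
from scipy.signal import fftconvolve

S, KMAX, F = 2 * np.pi, np.pi / 2, np.sqrt(2)   # s, k_max, f from the text

def gabor_kernel(mu, nu, size=32):
    """Kernel psi_{mu,nu} of Eq. (1) on an assumed size x size support."""
    k = (KMAX / F**nu) * np.exp(1j * np.pi * mu / 2)   # wave vector k_{mu,nu}
    xs = np.arange(-size // 2, size // 2)
    x, y = np.meshgrid(xs, xs)
    z = x + 1j * y
    k2, z2 = np.abs(k) ** 2, np.abs(z) ** 2
    phase = np.real(np.conj(k) * z)                    # k . z
    return (k2 / S**2) * np.exp(-k2 * z2 / (2 * S**2)) * (
        np.exp(1j * phase) - np.exp(-S**2 / 2))

def gabor_features(region):
    """Concatenate the down-sampled magnitude responses of all 40 kernels."""
    region = region.astype(float)
    feats = []
    for nu in range(5):                                # five scales
        for mu in range(8):                            # eight orientations
            resp = fftconvolve(region, gabor_kernel(mu, nu), mode="same")
            feats.append(np.abs(resp)[::4, ::4].ravel())  # assumed 4x down-sampling
    return np.concatenate(feats)
```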
3.2. Supervised Stacked Denoising Auto-encoder
The application of neural networks to supervised learning [6] is well proven in different
applications, including computer vision and speech recognition. An auto-encoder neural network
is an unsupervised learning algorithm, one of the commonly used building blocks in deep neural
networks, that applies backpropagation to set the target values to be equal to the inputs. The
reconstruction error between the input and the output of the network is used to adjust the weights
of each layer. An auto-encoder tries to learn a function x_i \approx \hat{x}_i, where x_i belongs to the
unlabelled training set \{x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(n)}\} and x_i \in \mathbb{R}^n. In other
words, it tries to learn an approximation to the identity function, to produce an output \hat{x} that is
similar to x, in two subsequent stages: (i) an encoder that maps the input x to the hidden nodes
through some deterministic mapping function f: h = f(x); then (ii) a decoder that maps the hidden
nodes back to the original input space through another deterministic mapping function
g: \hat{x} = g(h). For real-valued input, the parameters of the encoder and decoder can be learned
by minimizing the reconstruction error \|x - g(f(x))\|^2.
To learn features that are robust to illumination, viewing angle, pose, etc., from the modality
specific image regions, we adopted the supervised auto-encoder [7]. The supervised auto-encoder
is trained using features extracted from image regions (\hat{x}_i) containing variations in
illumination, viewing angle, and pose, whereas the features of selected image regions (x_i) with
similar illumination and without pose variations are used as the target. By minimizing the
objective criterion given in Equation 3 (subject to the constraint that the modality specific features
of the same person are similar), the supervised auto-encoders learn to capture a robust modality
specific representation.
\min_{W, b_e, b_d} \frac{1}{N} \sum_i \left( \|x_i - g(f(\hat{x}_i))\|_2^2 + \lambda \|f(x_i) - f(\hat{x}_i)\|_2^2 \right), (3)

where the output of the hidden layer, h, is defined as h = f(x) = \tanh(Wx + b_e), the decoder as
g(h) = \tanh(W^T h + b_d), N is the total number of training samples, and \lambda is the weight of the
preservation term.
The first term in Equation 3 minimizes the reconstruction error, i.e., after passing through the
encoder and the decoder, the variations (illumination, viewing angle, and pose) of the features
extracted from the unconstrained images will be repaired. The second term in Equation 3 enforces
the similarity of modality specific features corresponding to the same person.
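A minimal numpy sketch of evaluating the objective in Equation 3, where x is a feature vector from a well-lit region without pose variation and x_hat the corresponding variant; only the loss is shown, not the gradient-based minimization over W, b_e, and b_d:

```python
# Sketch of the supervised auto-encoder objective of Eq. (3).
import numpy as np

def forward(x, W, b_e, b_d):
    h = np.tanh(W @ x + b_e)            # encoder: f(x) = tanh(Wx + b_e)
    return np.tanh(W.T @ h + b_d), h    # decoder: g(h) = tanh(W^T h + b_d)

def supervised_ae_loss(X, X_hat, W, b_e, b_d, lam=0.1):   # lam = lambda (assumed value)
    loss = 0.0
    for x, x_hat in zip(X, X_hat):
        recon, h_hat = forward(x_hat, W, b_e, b_d)
        _, h_clean = forward(x, W, b_e, b_d)
        loss += np.sum((x - recon) ** 2)                  # reconstruction term
        loss += lam * np.sum((h_clean - h_hat) ** 2)      # feature similarity term
    return loss / len(X)
```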
After training a stack of encoders, its highest level output representation can be used as input to a
stand-alone supervised learning algorithm. A logistic regression (LR) layer was added on top of
the encoders as the final output layer, which enables the deep neural network to perform
supervised learning. By performing gradient descent on a supervised cost function, the Supervised
Stacked Denoising Auto-encoder (SDAE) automatically learns fine-tuned network weights. Thus,
the parameters of the entire SDAE network are fine-tuned to minimize the error in predicting the
supervised target (e.g., class labels).
3.3. Training the Deep Learning Network

We adopt a two stage training of the Deep Learning Network, where we begin with a better
initialization and obtain fine-tuned network weights that lead to a more accurate high-level
representation of the dataset. The steps of the two stage training are as follows (a sketch follows
the list):

Step 1. Stacked Denoising Auto-encoders are used to train the initial network weights one layer at
a time in a greedy fashion using a Deep Belief Network (DBN).
Step 2. The initial weights of the Deep Learning network are initialized using the learned
parameters from the DBN.
Step 3. Labelled training data are used as input, and their predicted classification labels obtained
from the logistic regression layer, along with the initial weights, are used to apply backpropagation
on the SDAE.
Step 4. Backpropagation is applied on the network to optimize the objective function (given in
Equation 5), which fine-tunes the weights and biases of the entire network.
Step 5. Finally, the learned network weights and biases are used to extract image features to train
the sparse classifier.
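A runnable approximation of this recipe using scikit-learn, in which BernoulliRBM layers stand in for the greedy DBN pre-training and a logistic regression head for the supervised output layer; unlike the full method, this sketch does not backpropagate through the pretrained layers, and the toy data and layer sizes are assumptions (the 0.001 pre-training rate follows Section 5.1):

```python
# Approximate two stage training: greedy layer-wise pre-training (Steps 1-2),
# then a supervised logistic regression head (Steps 3-5).
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.rand(100, 64)                  # toy stand-ins for Gabor feature vectors
y = rng.randint(0, 5, 100)             # toy subject labels

net = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.001, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.001, random_state=0)),
    ("lr",   LogisticRegression(max_iter=1000)),   # supervised output layer
])
net.fit(X, y)                          # pre-train layer by layer, then fit the head
features = net[:-1].transform(X)       # Step 5: encoder output fed to the SRC stage
```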
Figure 2. Supervised Stacked Denoising Auto-encoder
The network is illustrated in Figure 2, which shows a two-category classification problem (there
are two output values); the decoding part of the SDAE is removed and the encoding part is
retained to produce the initial features. In addition, the output layer of the whole network, also
called the logistic regression layer, is added. The following sigmoid function is used as the
activation function of the logistic regression layer:
h(x) = \frac{1}{1 + e^{-(Wx + b)}}, (4)

where x is the output of the last encoding layer, y_l; in other words, the features are pre-trained by
the SDAE network. The output of the sigmoid function is between 0 and 1, which denotes the
classification result in the case of a two class classification problem. Therefore, we can use the
errors between the predicted classification results and the true labels associated with the training
data points to fine-tune the whole network weights. The cost function is defined as the following
cross-entropy function:

Cost = -\frac{1}{m} \left[ \sum_{i=1}^{m} l_i \log\left(h(x_i)\right) + (1 - l_i) \log\left(1 - h(x_i)\right) \right], (5)

where l_i denotes the label of the sample x_i. By minimizing the cost function, we update the
network weights.
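In code, the output layer of Equation 4 and the cost of Equation 5 reduce to a few lines; this numpy sketch assumes a single weight vector, i.e., the two class case described above:

```python
# Sketch of the logistic regression output layer (Eq. 4) and its
# cross-entropy cost (Eq. 5).
import numpy as np

def h(x, W, b):
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))     # sigmoid of Eq. (4)

def cross_entropy(preds, labels):
    preds = np.clip(preds, 1e-12, 1 - 1e-12)      # numerical safety (assumed)
    return -np.mean(labels * np.log(preds)
                    + (1 - labels) * np.log(1 - preds))   # Eq. (5)
```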
4. MODALITY SPECIFIC AND MULTIMODAL RECOGNITION
The modality specific sub-dictionaries (d_j^i) contain the feature vectors generated by the Deep
Learning Network using the modality specific training data of each individual subject, where i
represents the modality, i \in \{1, 2, \ldots, 5\}, and j stands for the number of training video
sequences.

Later, we concatenate the modality specific learned sub-dictionaries (d_j^i) of all the subjects in
the dataset to obtain the modality specific (i.e., left ear, left profile face, frontal face, right profile
face, and right ear) dictionary D_i, as follows:

D_i = [d_1^i; d_2^i; \ldots; d_j^i], \quad \forall i \in \{1, 2, \ldots, 5\} (6)
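A sketch of the dictionary construction of Equation 6 together with classification through sparse representation. The sub-dictionaries are stacked column-wise (each column one training feature vector, the usual SRC convention), an l1 (Lasso) solver from scikit-learn stands in for whatever sparse solver was actually used, and the alpha value is an assumption:

```python
# Sketch: build D_i from per-subject sub-dictionaries and classify a query
# by the class with the smallest reconstruction residual.
import numpy as np
from sklearn.linear_model import Lasso

def build_dictionary(sub_dicts):
    """sub_dicts: list of (n_features, n_samples) arrays, one per subject."""
    return np.hstack(sub_dicts)               # D_i = [d_1^i, d_2^i, ..., d_j^i]

def src_identify(D, atom_labels, query):
    coef = Lasso(alpha=0.01, max_iter=10000).fit(D, query).coef_   # sparse code
    atom_labels = np.asarray(atom_labels)
    residuals = {}
    for c in set(atom_labels):
        x_c = np.where(atom_labels == c, coef, 0.0)   # keep class-c coefficients
        residuals[c] = np.linalg.norm(query - D @ x_c)
    return min(residuals, key=residuals.get)          # identity with least residual
```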
4.1. Multimodal Recognition
The recognition results from the five modalities (left ear, left profile face, frontal face, right
profile face, and right ear) are combined using score level fusion. Score level fusion has the
flexibility of fusing various modalities upon their availability. To prepare for fusion, the matching
scores obtained from the different matchers are transformed into a common domain using a score
normalization technique. Later, the weighted sum technique is used to fuse the results at the score
level. We have adopted the tanh score normalization technique [8], which is both robust and
efficient. The normalized match scores are then fused using the weighted sum technique:

S_p = \sum_{i=1}^{M} w_i \cdot s_i^n, (7)

where w_i and s_i^n are the weight and normalized match score of the i-th modality specific
classifier, respectively, such that \sum_{i=1}^{M} w_i = 1. In this study, the weights w_i,
i = 1, 2, 3, 4, 5, correspond to the left ear, left profile face, frontal face, right profile face, and
right ear modalities, respectively. These weights can be obtained by exhaustive search or based on
the individual performance of the classifiers [8]. In this work, the weights for the modality specific
classifiers in the score level fusion were determined using a separate training set, with the goal of
maximizing the fused multimodal recognition accuracy.
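A sketch of the normalization and fusion steps. For the tanh rule we use the form s' = 0.5[tanh(0.01 (s - mu)/sigma) + 1] from [8]; estimating mu and sigma directly from the scores at hand is a simplifying assumption:

```python
# Sketch of tanh score normalization and the weighted sum fusion of Eq. (7).
import numpy as np

def tanh_normalize(scores):
    mu, sigma = scores.mean(), scores.std()
    return 0.5 * (np.tanh(0.01 * (scores - mu) / sigma) + 1.0)

def fuse(scores_per_modality, weights):
    """scores_per_modality: five raw score arrays (one per modality);
    weights: five floats summing to 1, found on a separate training set."""
    normalized = [tanh_normalize(s) for s in scores_per_modality]
    return sum(w * s for w, s in zip(weights, normalized))   # S_p of Eq. (7)
```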
5. EXPERIMENTAL RESULTS
In this section we describe the results of the modality specific and multimodal recognition
experiments on both datasets. The feature vectors automatically learned using the trained Deep
Learning network have a length of 9600 for the frontal and profile faces and 4160 for the ears. In
order to decrease the computational complexity and to find the most effective feature vector
length to maximize the recognition accuracy, the dimensionality of the feature vectors is reduced
using Principal Component Analysis (PCA) [9]. Using PCA, the number of features is reduced to
500 and 1000. Table 1 shows the modality specific recognition accuracy obtained for the reduced
feature vectors of length 500 and 1000. Feature vectors of length 1000 resulted in the best
recognition accuracy for both the modality specific and the multimodal recognition.
Table 1. Modality Specific and Multimodal Rank-1 Recognition Accuracy.

Gabor Feature Length   Frontal face   Left profile face   Right profile face   Left ear   Right ear   Multimodal
No feature reduction   91.43%         71.43%              71.43%               85.71%     85.71%      88.57%
1000                   91.43%         71.43%              74.29%               88.57%     88.57%      97.14%
500                    88.57%         68.57%              68.57%               85.71%     82.86%      91.42%
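As a sketch of the dimensionality reduction described above, with toy data standing in for the learned feature vectors:

```python
# Sketch: reduce the learned 9600-dimensional face features to 1000 with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
train_features = rng.rand(1200, 9600)   # toy stand-in for learned face features
test_features  = rng.rand(300, 9600)

pca = PCA(n_components=1000)            # 1000 components gave the best accuracy
train_reduced = pca.fit_transform(train_features)
test_reduced  = pca.transform(test_features)   # same projection for test data
```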
The best rank-1 recognition rates, using the ear, frontal and profile face modalities for multimodal
recognition, compared with the results reported in [10-12], are shown in Table 2.
Table 2. Comparison of 2D multimodal (frontal face, profile face and ear) rank-1 recognition accuracy
with the state-of-the-art techniques.

Approaches           Modalities                      Fusion Performed In   Best Reported Rank-1 Accuracy
Kisku et al. [11]    Ear and Frontal Face            Decision Level        Ear: 93.53%; Frontal Face: 91.96%; Profile Face: NA; Fusion: 95.53%
Pan et al. [12]      Ear and Profile Face            Feature Level         Ear: 91.77%; Frontal Face: NA; Profile Face: 93.46%; Fusion: 96.84%
Boodoo et al. [10]   Ear and Frontal Face            Decision Level        Ear: 90.7%; Frontal Face: 94.7%; Profile Face: NA; Fusion: 96%
This Work            Ear, Frontal and Profile Face   Score Level           Ear: 95.04%; Frontal Face: 97.52%; Profile Face: 93.39%; Fusion: 99.17%
5.1. Parameter selection for the Deep Neural Network
We have tested the performance of the proposed multimodal recognition framework against
different parameters of the Deep Neural Network. We varied the number of hidden layers from
three to seven; the best performance was achieved with five hidden layers. The pre-training
learning rate of the DBN was set to 0.001 and the fine-tuning learning rate of the SDAE to 0.1 to
achieve the optimal performance. The number of nodes in the input layer of the SDAE is equal to
the size of the modality specific detected image, i.e., 7700 nodes for the left and right ear (size of
ear images: 110 X 70 pixels) and 16384 nodes for the frontal and profile face images (size of
frontal and profile face images: 128 X 128 pixels). While training the SDAE network on a
Windows PC with a Core i7-2600K CPU clocked at 3.40 GHz, using the Theano library (Python),
pre-training of the DBN takes approximately 600 minutes, and fine-tuning of the SDAE network
converged within 48 epochs in 560.2 minutes.
6. CONCLUSIONS
We proposed a system for multimodal recognition using a single biometrics data source, i.e.,
facial video clips. Using the Adaboost detector, we automatically detect modality specific regions.
We use Gabor features extracted from the detected regions to automatically learn robust and
non-redundant features by training a Supervised Stacked Denoising Auto-encoder (Deep
Learning) network. Classification through sparse representation is used for each modality. Then,
the multimodal recognition result is obtained through the fusion of the results from the modality
specific recognition.
REFERENCES
[1] Viola, P. & Jones, M. (2001) "Rapid object detection using a boosted cascade of simple features",
Computer Vision and Pattern Recognition, pp 511-518.
[2] Huang, Zengxi & Liu, Yiguang & Li, Chunguang & Yang, Menglong & Chen, Liping (2013) "A
robust face and ear based multimodal biometric system using sparse representation", Pattern
Recognition, pp 2156-2168.
[3] Fahmy, Gamal & El-sherbeeny, Ahmed & M., Susmita & Abdel-Mottaleb, Mohamed & Ammar,
Hany (2006) "The effect of lighting direction/condition on the performance of face recognition
algorithms", SPIE Conference on Biometrics for Human Identification, pp 188-200.
[4] Lee, K.C. & Ho, J. & Yang, M.H. & Kriegman, D. (2005) "Visual Tracking and Recognition Using
Probabilistic Appearance Manifolds", Computer Vision and Image Understanding.
[5] Liu, Chengjun & Wechsler, H. (2002) "Gabor feature based classification using the enhanced fisher
linear discriminant model for face recognition", IEEE Transactions on Image Processing, pp 467-476.
[6] Rumelhart, David E. & McClelland, James L. (1986) "Parallel Distributed Processing: Explorations
in the Microstructure of Cognition", MIT Press, Cambridge, MA, USA.
[7] Gao, Shenghua & Zhang, Yuting & Jia, Kui & Lu, Jiwen & Zhang, Yingying (2015) "Single Sample
Face Recognition via Learning Deep Supervised Autoencoders", IEEE Transactions on Information
Forensics and Security, pp 2108-2118.
[8] Ross, A. A. & Nandakumar, K. & Jain, A. K. (2006) "Handbook of Multibiometrics", Springer.
[9] Turk, Matthew & Pentland, Alex (1991) "Eigenfaces for recognition", J. Cognitive Neuroscience,
MIT Press, pp 71-86.
[10] Boodoo, Nazmeen Bibi & Subramanian, R. K. (2009) "Robust Multi biometric Recognition Using
Face and Ear Images", J. CoRR.
[11] Kisku, Dakshina Ranjan & Sing, Jamuna Kanta & Gupta, Phalguni (2010) "Multibiometrics Belief
Fusion", J. CoRR.
[12] Pan, Xiuqin & Cao, Yongcun & Xu, Xiaona & Lu, Yong & Zha, Yue (2008) "Ear and face based
multimodal recognition based on KFDA", International Conference on Audio, Language and Image
Processing, pp 965-969.
AUTHORS
Sayan Maity received the M.E. in Electronics and Telecommunication Engineering
from Jadavpur University, India in 2011. He is currently working towards his PhD
degree in Industrial Engineering at the University of Miami. His research interests
include biometrics, computer vision and pattern recognition.
Mohamed Abdel-Mottaleb received the Ph.D. degree in computer science from the
University of Maryland, College Park, in 1993. From 1993 to 2000, he was with Philips
Research, Briarcliff Manor, NY, where he was a Principal Member of the Research Staff
and a Project Leader. At Philips Research, he led several projects in image processing
and content-based multimedia retrieval. He joined the University of Miami in 2001. He
is currently a Professor and the Chairman of the Department of Electrical and Computer
Engineering. He represented Philips in the standardization activity of ISO for MPEG-7,
where some of his work was included in the standard. He holds 22 U.S. patents and over 30 international
patents. He has authored over 120 journal and conference papers in the areas of image processing,
computer vision, and content-based retrieval. His research focuses on 3-D face and ear biometrics, dental
biometrics, visual tracking, and human activity recognition. He is an Editorial Board Member of the Pattern
Recognition journal.
Dr. Shihab Asfour received his Ph.D. in Industrial Engineering from Texas Tech
University, Lubbock, Texas. He has served as the Associate Dean of the College of
Engineering at the University of Miami since August 15, 2007, and as the Chairman of
the Department of Industrial Engineering at the University of Miami since June 1, 1999.
In addition to his appointment in Industrial Engineering, he holds the position of
Professor in the Biomedical Engineering and the Orthopedics and Rehabilitation
Departments. Dr. Asfour has published over 250 articles in national and international
journals, proceedings and books. His publications have appeared in the Ergonomics, Human Factors,
Spine, and IIE journals. He has also edited a two-volume book titled Trends in Ergonomics/Human
Factors IV, published by Elsevier Science Publishers in 1987, and the book titled Computer Aided
Ergonomics, published by Taylor and Francis, 1990.
Infrastructure Challenges in Scaling RAG with Custom AI models
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 

MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING

  • 1. Signal & Image Processing : An International Journal (SIPIJ) Vol.8, No.1, February 2017 DOI : 10.5121/sipij.2017.8101 1 MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING Sayan Maity1 , Mohamed Abdel-Mottaleb2 and Shihab S. Asfour1 1 Industrial Engineering (IE) Department, University of Miami; 1251 Memorial Drive; Coral Gables; Florida 2 Electrical and Computer Engineering Department, University of Miami; 1251 Memorial Drive; Coral Gables; Florida ABSTRACT Biometrics identification using multiple modalities has attracted the attention of many researchers as it produces more robust and trustworthy results than single modality biometrics. In this paper, we present a novel multimodal recognition system that trains a Deep Learning Network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing different modalities, i.e., left ear, left profile face, frontal face, right profile face, and right ear, present in the facial video clips, we train supervised denoising auto-encoders to automatically extract robust and non- redundant features. The automatically learned features are then used to train modality specific sparse classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD), resulted in a 99.17% and 97.14% rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips. KEYWORDS Multimodal Biometrics, Auto-encoder, Deep Learning, Sparse Classification. 1. INTRODUCTION There are several motivations for building robust multimodal biometric systems that extract multiple modalities from a single source of biometrics, i.e., facial video clips. Firstly, acquiring video clips of facial data is straight forward using conventional video cameras, which are ubiquitous. Secondly, the nature of data collection is non-intrusive and the ear, frontal, and profile face can appear in the same video. The proposed system, shown in Figure 1, consists of three distinct components to perform the task of efficient multimodal recognition from facial video clips. First, the object detection technique proposed by Viola and Jones [1], was adopted for the automatic detection of modality specific regions from the video frames. Unconstrained facial video clips contain significant head pose variations due to non-planar movements, and sudden changes in facial expressions. This results in an uneven number of detected modality specific video frames for the same subject in different video clips, and also a different number of modality specific images for different subject. From the aspect of building a robust and accurate model, it is always preferable to use the entire available training data. However, classification through sparse representation (SRC) is vulnerable in the presence of uneven number of modality specific training samples for different subjects. Thus, to overcome the vulnerability of SRC whilst using all of the detected modality specific regions, in the model building phase we train supervised
  • 2. Signal & Image Processing : An International Journal (SIPIJ) Vol.8, No.1, February 2017 denoising sparse auto-encoder to construct a mapping function. This automatically extract the discriminative features variances using the uneven number of Deep Learning Network as the second component i training sample features for the different subjects. Finally, using the modality results, score level multimodal fusion is performed to obtain Figure 1. System Block Diagram: Multimodal Biometrics Recognition from Facial Video Due to the unavailability of proper datasets f multimodal databases are synthetically obtained by pair different databases. To the best of our multiple modalities are extracted from a single data source that belongs to the same subject. The main contributions of the proposed a Network for automatic feature learning in multimodal biometrics source of biometrics i.e., facial video data, irrespective and pose variations present in the The remainder of this paper is organized as follows: Section 2 details the detection from the constrained and unconstrained automatic feature learning using supervised denoising sparse auto Section 4 presents the modality specific classification using fusion. Section 5 provides the experimental [3]) and the unconstrained facial video dataset (HONDA/UCSD [4]) to demonstrate the performance of the proposed framework. Finally, conclusions and future research directions are presented in Section 6. Signal & Image Processing : An International Journal (SIPIJ) Vol.8, No.1, February 2017 encoder to construct a mapping function. This mapping function is used to automatically extract the discriminative features preserving the robustness to the possible variances using the uneven number of detected modality specific regions. Therefore, by applying work as the second component in the pipeline results in an equal number of training sample features for the different subjects. Finally, using the modality specific recognition results, score level multimodal fusion is performed to obtain the multimodal recognition result. System Block Diagram: Multimodal Biometrics Recognition from Facial Video Due to the unavailability of proper datasets for multimodal recognition studies [2], often virtual synthetically obtained by pairing modalities of different subjects from different databases. To the best of our knowledge, the proposed approach is the first study where are extracted from a single data source that belongs to the same subject. The main contributions of the proposed approach is the application of training a Deep Learning Network for automatic feature learning in multimodal biometrics recognition using a single , facial video data, irrespective of the illumination, non-planar movement, variations present in the face video clips. The remainder of this paper is organized as follows: Section 2 details the modality specific frame constrained and unconstrained facial video clips. Section 3 de earning using supervised denoising sparse auto-encoder (deep Section 4 presents the modality specific classification using sparse representation and multimodal fusion. Section 5 provides the experimental results on the constrained facial video d facial video dataset (HONDA/UCSD [4]) to demonstrate the the proposed framework. Finally, conclusions and future research directions are Signal & Image Processing : An International Journal (SIPIJ) Vol.8, No.1, February 2017 2 function is used to preserving the robustness to the possible detected modality specific regions. 
Therefore, by applying n the pipeline results in an equal number of specific recognition the multimodal recognition result. System Block Diagram: Multimodal Biometrics Recognition from Facial Video ies [2], often virtual different subjects from knowledge, the proposed approach is the first study where are extracted from a single data source that belongs to the same subject. The Deep Learning recognition using a single planar movement, modality specific frame acial video clips. Section 3 describes the coder (deep-learning). sparse representation and multimodal results on the constrained facial video dataset (WVU facial video dataset (HONDA/UCSD [4]) to demonstrate the the proposed framework. Finally, conclusions and future research directions are
2. MODALITY SPECIFIC IMAGE FRAME DETECTION

To perform multimodal biometric recognition, we first need to detect the images of the different modalities in the facial video. The facial video clips in the constrained dataset are collected in a controlled environment, where the camera rotates around the subject's head. Each video sequence starts at the left profile of the subject (0 degrees) and proceeds to the right profile (180 degrees), and therefore contains image frames of the different modalities, i.e., left ear, left profile face, frontal face, right profile face, and right ear. The video sequences in the unconstrained dataset contain uncontrolled, non-uniform head rotations and changing facial expressions; compared with the constrained clips, the appearance of a specific modality in a given frame of an unconstrained clip is therefore random.

To automate the detection of the modality specific image frames, we adopt the Adaboost object detection technique proposed by Viola and Jones [1]. The detectors are trained to find frontal and profile faces in the video frames using manually cropped frontal face images from the color FERET database and profile face images from the University of Notre Dame Collection J2 database, respectively. A further detector is trained with cropped ear images from the UND color ear database to find ears in the video frames. These modality specific trained detectors are applied to the entire video sequence to detect the face and ear regions in the video frames.

Before the detected modality specific regions are used for feature extraction, some pre-processing steps are performed. Facial video clips recorded in an unconstrained environment contain illumination variations and low contrast, so histogram equalization is applied to enhance the contrast of the images. Finally, all detected modality specific regions are resized: ear images to 110 x 70 pixels, and face images (frontal and profile) to 128 x 128 pixels.
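As a rough illustration of this detection and pre-processing stage, the sketch below runs cascade detectors frame by frame with OpenCV and applies the same contrast enhancement and resizing. The frontal and profile face cascades ship with OpenCV; the ear cascade file name is a placeholder, since ear detectors (as in the paper) must be trained separately, and the left/right distinction between ears and profile faces is omitted here for brevity.

```python
import cv2

# Modality specific Viola-Jones (Adaboost) detectors. The face cascades ship
# with OpenCV; 'haarcascade_ear.xml' is a placeholder for a separately trained
# ear cascade.
detectors = {
    'frontal_face': cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'),
    'profile_face': cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_profileface.xml'),
    'ear':          cv2.CascadeClassifier('haarcascade_ear.xml'),  # placeholder path
}

# Target crop sizes; cv2.resize takes (width, height), so 110 x 70 pixel ears
# become (70, 110) and 128 x 128 faces become (128, 128).
SIZES = {'frontal_face': (128, 128), 'profile_face': (128, 128), 'ear': (70, 110)}

def detect_modalities(video_path):
    """Run every modality detector on each frame; equalize and resize the crops."""
    regions = {name: [] for name in detectors}
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for name, det in detectors.items():
            for (x, y, w, h) in det.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
                crop = cv2.equalizeHist(gray[y:y + h, x:x + w])   # contrast enhancement
                regions[name].append(cv2.resize(crop, SIZES[name]))
    cap.release()
    return regions
```

Calling, say, `detect_modalities('subject01.mp4')` (an illustrative file name) returns, per modality, the pre-processed crops that feed the feature extraction of Section 3.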
3. AUTOMATIC FEATURE LEARNING USING DEEP NEURAL NETWORK

Even though the modality specific sparse classifiers yield relatively high recognition accuracy on the constrained face video clips, the accuracy suffers on unconstrained video because the sparse classifier is vulnerable to bias in the number of training images per subject. For example, subjects in the HONDA/UCSD dataset [4] randomly change their head pose, which results in a non-uniform number of detected modality specific video frames across video clips; this is not ideal for classification through sparse representation. In the following subsections we first describe the Gabor feature extraction technique, and then the supervised denoising sparse auto-encoders, which we use to automatically learn an equal number of feature vectors for each subject from the uneven numbers of detected modality specific regions.

3.1. Feature Extraction

2D Gabor filters [5] are used in a broad range of applications to extract scale and rotation invariant feature vectors. In our feature extraction step, uniformly down-sampled Gabor wavelets are computed for the detected regions:

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{s^2}\, e^{-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2 s^2}} \left[ e^{i k_{\mu,\nu} z} - e^{-\frac{s^2}{2}} \right], \qquad (1)$$
where $z = (x, y)$ represents each pixel in the 2D image and $k_{\mu,\nu}$ is the wave vector, defined as $k_{\mu,\nu} = k_\nu e^{i\phi_\mu}$ with $k_\nu = k_{max}/f^\nu$, where $k_{max}$ is the maximum frequency, $f$ is the spacing factor between kernels in the frequency domain, and $\phi_\mu = \pi\mu/8$; the value of $s$ determines the ratio of the Gaussian window width to the wavelength. Using Equation (1), Gabor kernels can be generated from one filter using different scaling and rotation factors. In this paper we use five scales, $\nu \in \{0, \dots, 4\}$, and eight orientations, $\mu \in \{0, \dots, 7\}$. The other parameter values are $s = 2\pi$, $k_{max} = \pi/2$, and $f = \sqrt{2}$.

Before computing the Gabor features, all detected ear regions are resized to the average size of all the ear images, i.e., 110 x 70 pixels, and all face images (frontal and profile) are resized to the average size of all the face images, i.e., 128 x 128 pixels. Gabor features are computed by convolving each Gabor wavelet with the detected 2D region:

$$C_{\mu,\nu}(z) = T(z) * \psi_{\mu,\nu}(z), \qquad (2)$$

where $T(z)$ is the detected 2D region and $z = (x, y)$ represents the pixel location. The feature vector is constructed from $C_{\mu,\nu}$ by concatenating its rows.
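A compact numpy sketch of Equations (1) and (2) follows. The kernel window size, the down-sampling step, and the use of the response magnitude are illustrative assumptions, since the paper does not spell out its exact down-sampling scheme.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, kmax=np.pi / 2, f=np.sqrt(2), s=2 * np.pi, size=31):
    """Gabor kernel of Equation (1) for orientation mu and scale nu (window size assumed)."""
    k = (kmax / f**nu) * np.exp(1j * np.pi * mu / 8)        # wave vector k_{mu,nu}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2                                        # ||z||^2
    k2 = np.abs(k)**2                                       # ||k_{mu,nu}||^2
    envelope = (k2 / s**2) * np.exp(-k2 * r2 / (2 * s**2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-s**2 / 2)  # DC-compensated
    return envelope * carrier

def gabor_features(region, step=4):
    """Equation (2): convolve, uniformly down-sample, and concatenate the
    magnitude responses of the 5 x 8 = 40 wavelets into one feature vector."""
    feats = []
    for nu in range(5):
        for mu in range(8):
            resp = fftconvolve(region, gabor_kernel(mu, nu), mode='same')
            feats.append(np.abs(resp)[::step, ::step].ravel())
    return np.concatenate(feats)
```

With a step of 4, a 128 x 128 face region yields 40 x 32 x 32 = 40960 values; in practice the step would be tuned to match the 9600- and 4160-dimensional vectors reported in Section 5.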
3.2. Supervised Stacked Denoising Auto-encoder

The effectiveness of neural networks in supervised learning [6] is well proven in applications such as computer vision and speech recognition. An auto-encoder neural network is an unsupervised learning algorithm, one of the common building blocks of deep neural networks, that applies backpropagation with the target values set equal to the inputs; the reconstruction error between the input and the output of the network is used to adjust the weights of each layer. An auto-encoder tries to learn a function such that $\hat{x}_i \approx x_i$, where $x_i$ belongs to the set of unlabelled training examples $\{x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(n)}\}$ and $x_i \in \mathbb{R}^n$. In other words, it learns an approximation to the identity function, producing an output $\hat{x}$ similar to $x$, in two subsequent stages: (i) an encoder that maps the input $x$ to the hidden nodes through a deterministic mapping function $f$: $h = f(x)$; and (ii) a decoder that maps the hidden nodes back to the original input space through another deterministic mapping function $g$: $\hat{x} = g(h)$. For real-valued input, the parameters of the encoder and decoder can be learned by minimizing the reconstruction error $\|x - g(f(x))\|^2$.

To learn features that are robust to illumination, viewing angle, pose, etc. from the modality specific image regions, we adopt the supervised auto-encoder [7]. The supervised auto-encoder is trained using features extracted from image regions ($\hat{x}_i$) containing variations in illumination, viewing angle, and pose, whereas the features of selected image regions ($x_i$) with similar illumination and without pose variations are used as the target. By minimizing the objective criterion in Equation (3), subject to the constraint that the modality specific features of the same person are similar, the supervised auto-encoder learns a robust modality specific representation:

$$\min_{W, b_e, b_d} \frac{1}{N} \sum_i \left( \|x_i - g(f(\hat{x}_i))\|_2^2 + \lambda \|f(x_i) - f(\hat{x}_i)\|_2^2 \right), \qquad (3)$$

where the output of the hidden layer is $h = f(x) = \tanh(Wx + b_e)$, the decoder is $g(h) = \tanh(W^T h + b_d)$, $N$ is the total number of training samples, and $\lambda$ weights the similarity preservation term.

The first term in Equation (3) minimizes the reconstruction error; i.e., after passing through the encoder and the decoder, the variations (illumination, viewing angle, and pose) of the features extracted from the unconstrained images are repaired. The second term in Equation (3) enforces the similarity of the modality specific features corresponding to the same person.
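The numpy fragment below spells out the objective of Equation (3) for a single tied-weight layer; it is a sketch, and the value of `lam` (the $\lambda$ of Equation (3)) is illustrative.

```python
import numpy as np

def supervised_ae_objective(W, b_e, b_d, X_target, X_variant, lam=1.0):
    """Objective of Equation (3). Columns of X_target are features from regions
    with similar illumination and no pose variation; the matching columns of
    X_variant come from regions of the same subjects with illumination,
    viewing-angle, and pose variation."""
    f = lambda X: np.tanh(W @ X + b_e)      # encoder h = f(x)
    g = lambda H: np.tanh(W.T @ H + b_d)    # decoder with tied weights W^T
    recon = np.sum((X_target - g(f(X_variant)))**2, axis=0)    # ||x - g(f(x_hat))||^2
    similar = np.sum((f(X_target) - f(X_variant))**2, axis=0)  # ||f(x) - f(x_hat)||^2
    return np.mean(recon + lam * similar)
```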
After training a stack of encoders, the highest-level output representation can be used as input to a stand-alone supervised learning algorithm. A logistic regression (LR) layer is added on top of the encoders as the final output layer, which enables the deep neural network to perform supervised learning. By performing gradient descent on a supervised cost function, the Supervised Stacked Denoising Auto-encoder (SDAE) automatically learns fine-tuned network weights; the parameters of the entire SDAE network are thus tuned to minimize the error in predicting the supervised target (e.g., class labels).

3.3. Training the Deep Learning Network

We adopt a two-stage training of the Deep Learning Network, which provides a better initialization to begin with and fine-tuned network weights that lead to a more accurate high-level representation of the dataset. The steps are as follows:

Step 1. Stacked denoising auto-encoders are used to train the initial network weights one layer at a time, in a greedy fashion, using a Deep Belief Network (DBN); a sketch of this greedy layer-wise stage is given below, after Figure 2.
Step 2. The initial weights of the Deep Learning network are set to the parameters learned by the DBN.
Step 3. Labelled training data are used as input, and the classification labels predicted by the logistic regression layer, together with the initial weights, are used to apply backpropagation to the SDAE.
Step 4. Backpropagation optimizes the objective function (given in Equation (5)), fine-tuning the weights and biases of the entire network.
Step 5. Finally, the learned network weights and biases are used to extract the image features that train the sparse classifier.

Figure 2. Supervised Stacked Denoising Auto-encoder

The network is illustrated in Figure 2 for a two-category classification problem (there are two output values): the decoding part of the SDAE is removed, the encoding part is retained to produce the initial features, and the output layer of the whole network, also called the logistic regression layer, is added.
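A minimal numpy sketch of the greedy layer-wise stage (Step 1) follows, with a denoising auto-encoder standing in for the DBN's contrastive-divergence training; the corruption level, learning rate, and tied weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrain_stack(X, layer_sizes, lr=0.001, epochs=10):
    """Greedy layer-wise pre-training: each layer is trained as a denoising
    auto-encoder on the output of the layer below. X has one sample per column."""
    weights, inp = [], X
    for n_hidden in layer_sizes:
        W = rng.normal(0, 0.01, (n_hidden, inp.shape[0]))
        b_e = np.zeros((n_hidden, 1))
        b_d = np.zeros((inp.shape[0], 1))
        for _ in range(epochs):
            noisy = inp + rng.normal(0, 0.1, inp.shape)       # denoising corruption
            h = np.tanh(W @ noisy + b_e)                      # encoder
            x_rec = np.tanh(W.T @ h + b_d)                    # tied-weight decoder
            # Backpropagate the squared reconstruction error through both halves.
            d_dec = (x_rec - inp) * (1 - x_rec**2)            # decoder pre-activation grad
            d_enc = (W @ d_dec) * (1 - h**2)                  # encoder pre-activation grad
            gW = d_enc @ noisy.T + h @ d_dec.T                # tied weights: both paths
            W -= lr * gW / inp.shape[1]
            b_e -= lr * d_enc.mean(axis=1, keepdims=True)
            b_d -= lr * d_dec.mean(axis=1, keepdims=True)
        weights.append((W, b_e))
        inp = np.tanh(W @ inp + b_e)                          # feed the next layer
    return weights
```

The returned encoder parameters initialize the deep network (Step 2) before supervised fine-tuning.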
The following sigmoid function is used as the activation function of the logistic regression layer:

$$h(x) = \frac{1}{1 + e^{-Wx - b}}, \qquad (4)$$

where $x$ is the output of the last encoding layer $y_l$, i.e., the features pre-trained by the SDAE network. The output of the sigmoid function lies between 0 and 1, which gives the classification result in the two-class case. We can therefore use the errors between the predicted classification results and the true labels of the training data to fine-tune the whole network's weights. The cost function is defined as the following cross-entropy:

$$Cost = -\frac{1}{m} \left[ \sum_{i=1}^{m} l^i \log\!\left(h(x^i)\right) + \left(1 - l^i\right) \log\!\left(1 - h(x^i)\right) \right], \qquad (5)$$

where $l^i$ denotes the label of the sample $x^i$. By minimizing the cost function, we update the network weights.
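To make Equations (4) and (5) concrete, here is a small numpy sketch of gradient descent on the cross-entropy cost. It updates only the output layer, whereas the fine-tuning described above backpropagates the same error signal through every encoder layer; the learning rate and epoch count are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_output_layer(H, labels, lr=0.1, epochs=100):
    """Gradient descent on the cross-entropy cost of Equation (5).
    H holds the top encoder outputs (one column per sample); labels are 0/1."""
    W = np.zeros(H.shape[0])
    b = 0.0
    m = H.shape[1]
    for _ in range(epochs):
        p = sigmoid(W @ H + b)      # Equation (4) for every sample
        grad = p - labels           # gradient of Eq. (5) w.r.t. the pre-activation
        W -= lr * (H @ grad) / m
        b -= lr * grad.mean()
    return W, b
```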
4. MODALITY SPECIFIC AND MULTIMODAL RECOGNITION

The modality specific sub-dictionaries $d_j^i$ contain the feature vectors generated by the Deep Learning Network from the modality specific training data of each individual subject, where $i$ denotes the modality, $i \in \{1, 2, \dots, 5\}$, and $j$ indexes the training video sequences. We then concatenate the modality specific learned sub-dictionaries $d_j^i$ of all the subjects in the dataset to obtain the modality specific (i.e., left ear, left profile face, frontal face, right profile face, and right ear) dictionary $D^i$:

$$D^i = \left[ d_1^i;\, d_2^i;\, \dots;\, d_j^i \right], \quad \forall i \in \{1, 2, \dots, 5\}. \qquad (6)$$

4.1. Multimodal Recognition

The recognition results from the five modalities (left ear, left profile face, frontal face, right profile face, and right ear) are combined using score level fusion, which has the flexibility of fusing various modalities upon their availability. To prepare for fusion, the matching scores obtained from the different matchers are transformed into a common domain using a score normalization technique; we adopt the tanh score normalization technique [8], which is both robust and efficient. The normalized match scores are then fused using the weighted sum technique:

$$S_p = \sum_{i=1}^{M} w_i \, s_i^n, \qquad (7)$$

where $w_i$ and $s_i^n$ are the weight and the normalized match score of the $i$-th modality specific classifier, respectively, such that $\sum_{i=1}^{M} w_i = 1$. In this study, the weights $w_i$, $i = 1, 2, 3, 4, 5$, correspond to the left ear, left profile face, frontal face, right profile face, and right ear modalities, respectively. Such weights can be obtained by exhaustive search or from the individual performance of the classifiers [8]; here, the weights of the modality specific classifiers were determined on a separate training set with the goal of maximizing the fused multimodal recognition accuracy.
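A brief sketch of this normalization-and-fusion step follows; it assumes the tanh estimator's genuine-score mean and standard deviation have been computed on a training set, and the weight values are whatever the exhaustive search produced.

```python
import numpy as np

def tanh_normalize(scores, mu, sigma):
    """Tanh score normalization [8]: mu and sigma are the mean and standard
    deviation of the genuine score distribution, estimated on training data."""
    return 0.5 * (np.tanh(0.01 * (scores - mu) / sigma) + 1.0)

def fuse(score_lists, weights, stats):
    """Weighted-sum fusion of Equation (7) over the M modality classifiers;
    score_lists, weights, and stats are ordered left ear, left profile face,
    frontal face, right profile face, right ear."""
    assert abs(sum(weights) - 1.0) < 1e-9      # Eq. (7) requires the weights to sum to 1
    fused = 0.0
    for s, w, (mu, sigma) in zip(score_lists, weights, stats):
        fused += w * tanh_normalize(np.asarray(s), mu, sigma)
    return fused
```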
5. EXPERIMENTAL RESULTS

In this section we describe the results of the modality specific and multimodal recognition experiments on both datasets. The feature vectors automatically learned by the trained Deep Learning network have length 9600 for the frontal and profile faces and 4160 for the ears. To decrease the computational complexity and to find the most effective feature vector length for maximizing the recognition accuracy, the dimensionality of the feature vectors is reduced using Principal Component Analysis (PCA) [9], to 500 and to 1000 features. Table 1 shows the modality specific and multimodal recognition accuracy for the reduced feature vectors of length 500 and 1000; feature vectors of length 1000 give the best accuracy for both the modality specific and the multimodal recognition.

Table 1. Modality Specific and Multimodal Rank-1 Recognition Accuracy.

| Feature length       | Frontal face | Left profile face | Right profile face | Left ear | Right ear | Multimodal |
|----------------------|--------------|-------------------|--------------------|----------|-----------|------------|
| No feature reduction | 91.43%       | 71.43%            | 71.43%             | 85.71%   | 85.71%    | 88.57%     |
| 1000                 | 91.43%       | 71.43%            | 74.29%             | 88.57%   | 88.57%    | 97.14%     |
| 500                  | 88.57%       | 68.57%            | 68.57%             | 85.71%   | 82.86%    | 91.42%     |

Table 2 compares the best rank-1 recognition rates obtained using the ear, frontal face, and profile face modalities for multimodal recognition with the results reported in [10-12].

Table 2. Comparison of 2D multimodal (frontal face, profile face, and ear) rank-1 recognition accuracy with the state-of-the-art techniques.

| Approach           | Modalities                    | Fusion performed in | Best reported rank-1 accuracy                                           |
|--------------------|-------------------------------|---------------------|-------------------------------------------------------------------------|
| Kisku et al. [11]  | Ear and frontal face          | Decision level      | Ear: 93.53%; Frontal face: 91.96%; Fusion: 95.53%                       |
| Pan et al. [12]    | Ear and profile face          | Feature level       | Ear: 91.77%; Profile face: 93.46%; Fusion: 96.84%                       |
| Boodoo et al. [10] | Ear and frontal face          | Decision level      | Ear: 90.7%; Frontal face: 94.7%; Fusion: 96%                            |
| This work          | Ear, frontal and profile face | Score level         | Ear: 95.04%; Frontal face: 97.52%; Profile face: 93.39%; Fusion: 99.17% |

5.1. Parameter Selection for the Deep Neural Network

We tested the performance of the proposed multimodal recognition framework with different parameters of the Deep Neural Network. We varied the number of hidden layers from three to seven and achieved the best performance with five hidden layers. A pre-training learning rate of 0.001 for the DBN and a fine-tuning learning rate of 0.1 for the SDAE give the optimal performance. The number of nodes in the input layer of the SDAE equals the size of the modality specific detected image, i.e., 7700 nodes for the left and right ear (ear image size: 110 x 70 pixels) and 16384 nodes for the frontal and profile face images (face image size: 128 x 128 pixels). Training the SDAE network on a Core i7-2600K CPU clocked at 3.40 GHz under Windows, using the Theano library (Python), pre-training of the DBN takes approximately 600 minutes, and fine-tuning of the SDAE network converged within 48 epochs in 560.2 minutes.

6. CONCLUSIONS

We proposed a system for multimodal recognition from a single biometrics data source, i.e., facial video clips. Using the Adaboost detector, we automatically detect the modality specific regions. From the Gabor features extracted from the detected regions, we automatically learn robust and non-redundant features by training a Supervised Stacked Denoising Auto-encoder (Deep Learning) network. Classification through sparse representation is used for each modality, and the multimodal recognition result is then obtained by fusing the modality specific recognition results.

REFERENCES

[1] Viola, P. & Jones, M. (2001) "Rapid object detection using a boosted cascade of simple features", Computer Vision and Pattern Recognition, pp 511-518.
[2] Huang, Zengxi & Liu, Yiguang & Li, Chunguang & Yang, Menglong & Chen, Liping (2013) "A robust face and ear based multimodal biometric system using sparse representation", Pattern Recognition, pp 2156-2168.
[3] Fahmy, Gamal & El-Sherbeeny, Ahmed & M., Susmita & Abdel-Mottaleb, Mohamed & Ammar, Hany (2006) "The effect of lighting direction/condition on the performance of face recognition algorithms", SPIE Conference on Biometrics for Human Identification, pp 188-200.
[4] Lee, K. C. & Ho, J. & Yang, M. H. & Kriegman, D. (2005) "Visual tracking and recognition using probabilistic appearance manifolds", Computer Vision and Image Understanding.
[5] Liu, Chengjun & Wechsler, H. (2002) "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition", IEEE Transactions on Image Processing, pp 467-476.
[6] Rumelhart, David E. & McClelland, James L. (1986) "Parallel Distributed Processing: Explorations in the Microstructure of Cognition", MIT Press, Cambridge, MA, USA.
[7] Gao, Shenghua & Zhang, Yuting & Jia, Kui & Lu, Jiwen & Zhang, Yingying (2015) "Single sample face recognition via learning deep supervised autoencoders", IEEE Transactions on Information Forensics and Security, pp 2108-2118.
[8] Ross, A. A. & Nandakumar, K. & Jain, A. K. (2006) "Handbook of Multibiometrics", Springer.
[9] Turk, Matthew & Pentland, Alex (1991) "Eigenfaces for recognition", J. Cognitive Neuroscience, MIT Press, pp 71-86.
[10] Boodoo, Nazmeen Bibi & Subramanian, R. K. (2009) "Robust multi biometric recognition using face and ear images", J. CoRR.
[11] Kisku, Dakshina Ranjan & Sing, Jamuna Kanta & Gupta, Phalguni (2010) "Multibiometrics belief fusion", J. CoRR.
[12] Pan, Xiuqin & Cao, Yongcun & Xu, Xiaona & Lu, Yong & Zha, Yue (2008) "Ear and face based multimodal recognition based on KFDA", International Conference on Audio, Language and Image Processing, pp 965-969.

AUTHORS

Sayan Maity received the M.E. in Electronics and Telecommunication Engineering from Jadavpur University, India, in 2011. He is currently working towards his PhD degree in Industrial Engineering at the University of Miami. His research interests include biometrics, computer vision, and pattern recognition.
Mohamed Abdel-Mottaleb received the Ph.D. degree in computer science from the University of Maryland, College Park, in 1993. From 1993 to 2000, he was with Philips Research, Briarcliff Manor, NY, where he was a Principal Member of the Research Staff and a Project Leader, leading several projects in image processing and content-based multimedia retrieval. He joined the University of Miami in 2001, where he is currently a Professor and Chairman of the Department of Electrical and Computer Engineering. He represented Philips in the ISO standardization activity for MPEG-7, where some of his work was included in the standard. He holds 22 U.S. patents and over 30 international patents, and has authored over 120 journal and conference papers in the areas of image processing, computer vision, and content-based retrieval. His research focuses on 3-D face and ear biometrics, dental biometrics, visual tracking, and human activity recognition. He is an Editorial Board Member of the Pattern Recognition journal.

Shihab Asfour received his Ph.D. in Industrial Engineering from Texas Tech University, Lubbock, Texas. He has served as the Associate Dean of the College of Engineering at the University of Miami since August 15, 2007, and as the Chairman of the Department of Industrial Engineering since June 1, 1999. In addition to his appointment in Industrial Engineering, he holds the position of Professor in the Biomedical Engineering and the Orthopedics and Rehabilitation Departments. Dr. Asfour has published over 250 articles in national and international journals, proceedings, and books; his publications have appeared in the Ergonomics, Human Factors, Spine, and IIE journals. He also edited the two-volume book Trends in Ergonomics/Human Factors IV (Elsevier Science Publishers, 1987) and the book Computer Aided Ergonomics (Taylor and Francis, 1990).