Human face detection is a significant problem in image processing and is usually the first step in face recognition and visual surveillance. This paper presents the details of a face detection approach for accurate face detection in group color images, based on facial features and a Support Vector Machine. In the first step, the proposed approach quickly separates skin-color regions from the background and from non-skin-color regions using a YCbCr color space transformation. After the skin regions are detected, the images are processed with wavelet transforms (WT) and discrete cosine transforms (DCT), from which 30×30-pixel sub-images are obtained. These sub-images are then given to an SVM classifier as input. The SVM is used to separate face regions from the remaining non-face regions more accurately, exploiting the large difference between face and non-face regions. Experimental results on different types of group color images show that this approach improves detection speed, reduces the false detection rate, and detects faces in a variety of color images.
This paper presents the development of a camera-based assistive text-reading framework to help blind persons read text labels and product packaging on hand-held objects in their daily lives. Recent developments in computer vision, digital cameras, and portable computers make it feasible to assist these individuals with camera-based products that combine computer vision technology with existing commercial products such as optical character recognition (OCR) systems. To automatically extract the text regions from the object, we propose an artificial neural network algorithm that learns gradient features of stroke orientations and distributions of edge pixels in an AdaBoost model. Text characters in the localized text regions are binarized for processing, recognized by off-the-shelf OCR, and then converted to audible signals. The working principle is as follows: first the image is captured and converted to binary signals; the image is then analyzed to determine whether text is present; if text is present, the object of interest is detected; finally, the text in the image is recognized and converted to audible signals, so the recognized text is read out as speech to the user.
Face Recognition Methods Based on Convolutional Neural Networks (Elaheh Rashedi)
Convolutional neural networks (CNN) have improved the state of the art in many applications, especially in face recognition. In this work, we present a review of the latest face verification techniques based on convolutional neural networks. In addition, we compare these techniques with respect to their architecture, depth, number of network parameters, and the accuracy obtained in identification and/or verification. Furthermore, since the availability of large-scale training datasets has significantly affected the performance of CNN-based recognition methods, we give an overview of the most common large-scale face datasets and describe some successful automatic data collection procedures.
Face detection is one of the most suitable applications for image processing and biometric programs. Artificial neural networks have been used in many fields such as image processing, pattern recognition, sales forecasting, customer research, and data validation. Face detection and recognition have become among the most popular biometric techniques over the past few years, yet there is a lack of literature providing an overview of studies related to artificial neural network face detection. Therefore, this study reviews facial recognition studies and systems based on various artificial neural network methods and algorithms.
The following resources come from the 2009/10 BSc (Hons) in Multimedia Technology (course number 2ELE0075) from the University of Hertfordshire. All the mini projects are designed as level two modules of the undergraduate programmes.
The objectives of this project are to demonstrate abilities to:
• Handle camera setup, calibrate and capture still and video faces
• Pre-process images and extract features
• Perform face recognition by a) using existing methods and b) trying new techniques.
This project requires the students to apply their abilities to handle image capture hardware and software. Since this is an active area of research, students will need to perform a literature survey and discuss (through brainstorming sessions) performance characteristics. In addition, they will need to design and implement pre-processing and recognition code leading to face recognition.
FUSION BASED MULTIMODAL AUTHENTICATION IN BIOMETRICS USING CONTEXT-SENSITIVE ... (cscpconf)
Biometrics is a key concept in real application domains such as Aadhaar cards, passports, and PAN cards. In such applications a user can provide two or three biometric patterns such as face, fingerprint, palm, signature, and iris data. We consider face and fingerprint patterns for encoding and for verification. Using these data, we propose a novel model for authentication in multimodal biometrics called the Context-Sensitive Exponent Associative Memory Model (CSEAM), which provides several stages of security for biometric patterns. In stage 1, the face and fingerprint patterns are fused through Principal Component Analysis (PCA); in stage 2, SVD decomposition is applied to generate keys from the fused data and the preprocessed face pattern; and in stage 3, the generated keys are encoded using the CSEAM model. The final key is stored on a smart card. In the CSEAM model, the exponential Kronecker product plays a critical role in encoding and in verifying the samples provided by users. The paper evaluates the approach on realistic biometric data in terms of time and space.
Face Detection and Recognition System (FDRS) is a physical-characteristics recognition technology that uses the inherent physiological features of humans for ID recognition. The technology does not need to be carried around and cannot be lost, so it is convenient and safe to use.
A hybrid approach for face recognition using a convolutional neural network c... (IAES IJAI)
Facial recognition technology has been used in many fields such as security, biometric identification, robotics, video surveillance, health, and commerce due to its ease of implementation and minimal data processing time. However, this technology is affected by variations such as pose, lighting, and occlusion. In this paper, we propose a new approach to improve the accuracy of face recognition in the presence of variation or occlusion by combining feature extraction with histogram of oriented gradients (HOG), scale-invariant feature transform (SIFT), Gabor, and Canny contour detector techniques, together with a convolutional neural network (CNN) architecture, tested with several combinations of activation function (Softmax and Sigmoid) and training optimizer (Adam, Adamax, RMSprop, and stochastic gradient descent (SGD)). For this, preprocessing was performed on the two face databases used, ORL and Sheffield; we then perform feature extraction with the mentioned techniques and pass the results to our CNN architecture. Our simulation results show high performance for the SIFT+CNN combination, with an accuracy of up to 100% in the presence of variations.
Deep hypersphere embedding for real-time face recognition (TELKOMNIKA Journal)
With the advancement of the human-computer interaction capabilities of robots, computer vision surveillance systems for security have had a large impact on the research industry by helping to digitalize certain security processes. Recognizing a face in computer vision involves identifying and classifying which faces belong to the same person by comparing face embedding vectors. In an organization with a large and diverse labelled dataset trained over many epochs, training difficulties often arise from incompatibilities between different versions of face embeddings, which lead to poor face recognition accuracy. In this paper, we design and implement a robotic vision security surveillance system incorporating a hybrid combination of MTCNN for face detection and FaceNet as the unified embedding for face recognition and clustering.
Innovative Analytic and Holistic Combined Face Recognition and Verification M... (ijbuiiir1)
Automatic recognition and verification of human faces is a significant problem in the development and application of Human-Computer Interaction (HCI). In addition, the demand for reliable personal identification in computerized access control has resulted in increased interest in biometrics to replace passwords and identification (ID) cards. Over the last couple of years, face recognition researchers have been developing new techniques fuelled by advances in computer vision, computer design, sensors, and fast-emerging face recognition systems. In this paper, a Face Recognition and Verification System has been designed that is robust to variations in illumination, pose, and facial expression but very sensitive to variations in the features of the face. The design takes into account the holistic (global) as well as the analytic (geometric) features of the human face. The global structure of the face is analysed by Principal Component Analysis, while the local structure is computed from geometric features of the face such as the eyes, nose, and mouth. The extracted local features are trained and later tested using an Artificial Neural Network (ANN). This combined approach using the global and local structure of the face image has proved very effective in the system we designed, with a correct recognition rate of over 90%.
Real Time Face Recognition Based on Face Descriptor and Its Application (TELKOMNIKA Journal)
This paper presents real-time face recognition based on a face descriptor and its application to door locking. The face descriptor is represented by both local and global information. The local information, the dominant frequency content of sub-faces, is extracted by zoned discrete cosine transforms (DCT), while the global information, the dominant frequency content and shape information of the whole face, is extracted by DCT and by Hu moments. The face descriptor therefore carries rich information about a face image, which tends to give good performance for real-time face recognition. To reduce the dimensionality of the face descriptor, predictive linear discriminant analysis (PDLDA) is employed, and face classification is done by kNN. The experimental results show that the proposed real-time face recognition performs well, with an accuracy of 98.30%, an FPR of 21.99%, and an FNR of 1.8%. In addition, it needs only a short computation time (1 second).
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ... (Hadi Santoso)
Face detection is a very important biometric application in the field of image analysis and computer vision. The basic face detection method is the AdaBoost algorithm with cascaded Haar-like feature classifiers, based on the framework proposed by Viola and Jones. Real-time multiple-face detection, for instance on high-resolution CCTV, is a computation-intensive procedure, and if it is performed sequentially, optimal real-time performance will not be achieved. In this paper we propose an architectural design for a parallel multiple-face detection technique based on Viola and Jones' framework. To do this systematically, we look at the problem from four points of view: data processing taxonomy, parallel memory architecture, the parallel programming model, and the design of the parallel program. We also build a prototype of the proposed parallel technique and conduct a series of experiments to investigate the speed-up gained.
Face Recognition Based Intelligent Door Control System (ijtsrd)
This paper presents an intelligent door control system based on face detection and recognition. The system avoids the need for keys, security cards, passwords, or patterns to open the door. The main objective is to develop a simple and fast recognition system for personal identification and face recognition to provide security. A face is a complex multidimensional structure and needs good computing techniques for recognition. The system is composed of two main parts: face recognition and automatic door access control. The face must be detected before it can be recognized; in the face detection step, the Viola-Jones face detection algorithm is applied to detect the human face. Face recognition is implemented using Principal Component Analysis (PCA) and a neural network. The Image Processing Toolbox in MATLAB 2013a is used for the recognition process in this research. A PIC microcontroller, programmed in MikroC, controls the automatic door access system. The door opens automatically for a known person according to the result of the verification in MATLAB; for an unknown person, the door remains closed. San San Naing, Thiri Oo Kywe, Ni Ni San Hlaing, "Face Recognition Based Intelligent Door Control System", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23893.pdf
Paper URL: https://www.ijtsrd.com/engineering/electrical-engineering/23893/face-recognition-based-intelligent-door-control-system/san-san-naing
Facial recognition based on enhanced neural network (IAES IJAI)
Accurate automatic face recognition (FR) has only recently become a practical goal of biometrics research. Detection and recognition are the primary steps for identifying faces in this research, and the Viola-Jones algorithm is implemented to find faces in images. This paper presents a neural network solution called modified bidirectional associative memory (MBAM). The basic idea is to recognize the image of a human face: extract the face image, enter it into the MBAM, and identify it. The output ID for the face image from the network should match the ID for the image entered previously in the training phase. The tests were conducted on the suggested model using 100 images. Results show that FR accuracy is 100% for all images used, and the accuracy after adding noise varies between images according to the noise ratio. Recognition results for the mobile camera images were more satisfactory than those for the Face94 dataset.
Realtime human face tracking and recognition system on uncontrolled environment (IJECE)
Recently, one of the most important biometric tasks has been automatically recognizing human faces from dynamic facial images with different rotations and backgrounds. This paper presents a real-time system for human face tracking and recognition with various facial expressions, poses, and rotations in an uncontrolled environment (dynamic background). Several steps are taken to enhance, detect, and recognize the faces in the image frames captured by a web camera. The system has three steps: first, faces are detected using the Viola-Jones algorithm for frontal and profile face detection; second, a color space algorithm is used to track the detected face from the previous step; third, the principal component analysis (eigenfaces) algorithm is used to recognize faces. The results show the effectiveness and robustness of the approach on the training and testing data. The real-time system is compared with the results of previous papers and gives an effective and robust recognition rate of 91.12% with a low execution time. However, the execution time is not fixed, as it depends on the frame background and the specification of the web camera and computer.
The goal of this report is the presentation of our biometry and security course’s project: Face recognition for Labeled Faces in the Wild dataset using Convolutional Neural Network technology with Graphlab Framework.
orientation in localized portions of an image. Finally, the HOG descriptor uses a Support Vector Machine (SVM) as a classifier to decide whether a region is a face or a non-face.
Several studies have shown that neural networks can improve the performance of face detection, but they increase the execution time. Farfade et al. [6] proposed a method called Deep Dense Face Detector (DDFD) that is able to detect faces in a wide range of orientations using a single model based on deep convolutional neural networks. The method does not require segmentation, bounding-box regression, or SVM classifiers. The study found that the method can detect faces from different angles and that there appears to be a correlation between the distribution of positive examples in the training set and the scores of the proposed face detector.
Zhang et al. [7] proposed a framework for joint face detection and alignment using unified cascaded Convolutional Neural Networks (CNN) trained with multi-task learning. The framework consists of three stages. First, a convolutional network called the Proposal Network (P-Net) produces candidate bounding boxes with bounding-box regression. Non-maximum suppression (NMS) is then employed to discard candidates that significantly overlap each other. The remaining candidates are fed to the Refine Network (R-Net), which rejects a large number of false candidates, performs calibration with bounding-box regression, and merges candidates with NMS. Finally, the Output Network (O-Net) describes the face in more detail and outputs the positions of five facial landmarks. On a 2.60 GHz CPU the framework achieves a frame rate of 16 frames per second (fps), and with an Nvidia Titan Black GPU the frame rate can reach up to 99 fps. The approach achieved 95% accuracy on average at each stage across several challenging benchmarks while keeping real-time performance.
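To make the NMS step concrete, here is a minimal greedy implementation of the kind of overlap suppression described above; the 0.5 IoU threshold and the [x1, y1, x2, y2] box format are illustrative assumptions, not values taken from [7].

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top-scoring box with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Keep only candidates that do not significantly overlap the kept box.
        order = rest[iou <= iou_threshold]
    return keep
```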
B. Face Recognition
The most important part of a face recognition system is learning features of a face that are unique enough to differentiate distinct faces. It is also important to design the face recognition algorithm to be resilient under poorly illuminated conditions. Conventional face recognition approaches based on geometric features [8], principal component analysis (PCA) [9], and linear discriminant analysis [10] can achieve high recognition rates under certain conditions. Recently, CNN-based face recognition algorithms with higher accuracy have been proposed. We briefly review some state-of-the-art methods below.
Taigman et al. [3] proposed a face recognition algorithm called DeepFace, which combines 3D alignment and a similarity metric with a deep neural network. The network is learned from raw RGB pixel values. 67 fiducial points are extracted by a Support Vector Regressor (SVR) trained to predict point configurations from an image descriptor based on Local Binary Pattern (LBP) histograms. DeepFace runs at 0.33 seconds per image on a single-core 2.2 GHz Intel CPU and achieves an accuracy of 97.35%. Sun et al. [2] proposed a method called DeepID3, which introduces two deep network architectures based on stacked convolution and inception layers adapted to face recognition. DeepID2+, on which it improves, applies joint identification-verification supervisory signals to the last fully connected layer as well as to a few fully connected branches from the pooling layers; DeepID3 uses the same supervision to better guide early feature extraction. In addition, the DeepID3 networks are significantly deeper than DeepID2+, using ten to fifteen non-linear feature extraction layers. As a result, DeepID3 achieved 99.53% face verification accuracy and 96.0% face identification accuracy.
A state-of-the-art face recognition method, FaceNet, was proposed by Schroff et al. [1] and is based on a deep convolutional network. The network is trained to directly optimize the embedding of the input image, learning straight from the pixels of the face image. FaceNet uses the triplet loss to minimize the distance in the embedding space between an anchor (reference face) and a positive (the same person) while maximizing the distance between the anchor and a negative (a different person); faces of the same person should have small distances, whereas faces of distinct persons should have large distances. The network is trained such that the squared L2 distances in the embedding space directly correspond to face similarity. FaceNet is trained with Stochastic Gradient Descent (SGD) using standard backpropagation and AdaGrad, and it achieved an accuracy of 99.63%. In addition, the system uses only 128 bytes per face, which makes it suitable for embedded computer vision.
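As a rough illustration of the triplet objective just described (a numpy sketch of the loss itself, not the authors' training pipeline; the 0.2 margin is an assumed value):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Squared-L2 triplet loss on L2-normalized embeddings (FaceNet-style objective)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)   # anchor vs. same person
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)   # anchor vs. different person
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy usage with random 128-D embeddings.
rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(4, 128)) for _ in range(3))
a, p, n = (x / np.linalg.norm(x, axis=1, keepdims=True) for x in (a, p, n))
print(triplet_loss(a, p, n))
```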
C. Tracking
Both the face detection and recognition processes can consume a significant amount of processing time, and a tracking algorithm is essential to reduce it. Danelljan et al. [11] proposed an approach that learns discriminative correlation filters on a scale-pyramid representation. The method improves performance compared to an exhaustive scale search under significant scale variation. A closely related approach, proposed by Bolme et al. [12], finds an adaptive correlation filter by minimizing the output sum of squared error (MOSSE). The method of Danelljan et al. trains a classifier on a scale pyramid, which allows the target scale to be estimated independently after the optimal translation is found. The correlation score y is computed using (1).
Given an image patch z of size M×N in a new frame, the correlation score y is computed as

y = \mathcal{F}^{-1}\{ \bar{H}_t Z \}   (1)

where capital letters denote the discrete Fourier transforms (DFT) of the corresponding quantities, the bar denotes complex conjugation, \mathcal{F}^{-1} denotes the inverse DFT operator, and H_t is the correlation filter learned up to frame t. The new target location is estimated at the position of the maximum correlation score y.
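A minimal numpy sketch of evaluating the correlation response in (1) is shown below, assuming a single-channel filter h already learned from earlier frames; the full tracker of [11] additionally searches over a scale pyramid.

```python
import numpy as np

def correlation_response(h, z):
    """Correlation score y = F^-1{ conj(H) * Z } for filter h and image patch z of equal size."""
    H = np.fft.fft2(h)
    Z = np.fft.fft2(z)
    y = np.real(np.fft.ifft2(np.conj(H) * Z))
    # The new target location is the position of the maximum response.
    dy, dx = np.unravel_index(np.argmax(y), y.shape)
    return y, (dy, dx)
```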
III. PROPOSED FRAMEWORK
An overview of the framework is shown in Figure 1. First, we apply a face detection and alignment algorithm to localize faces in the scene, and the detected faces are used as input to the recognition step. Finally, we apply the tracking algorithm to follow each recognized face.
Figure 1. Overview of the proposed framework.
A. Face Detection and Alignment
Face detection is an essential part of the framework, since it provides the input to the recognition step. We utilize the face detection algorithm of [7], a CNN-based algorithm that can detect faces under many variations and illumination conditions. Moreover, the detector outputs both a bounding box and facial landmarks for alignment, which increases recognition accuracy. Given an input image, the algorithm resizes it to different scales, forming an image pyramid. These resized images are fed to the three-stage cascaded framework of P-Net, R-Net, and O-Net for face detection and alignment. The classification stage decides whether each candidate is a face or a non-face, and finally the bounding-box regression and facial landmark localization are extracted. All face images from the detector are resized to 160x160 pixels before the recognition step.
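To make the pyramid step concrete, a small sketch of building the scaled copies fed to the cascade is shown below; the 0.709 scale factor and 12-pixel minimum size are common choices for this type of detector but are assumptions here, not values quoted from [7].

```python
import cv2

def build_image_pyramid(image, min_size=12, scale_factor=0.709):
    """Return progressively downscaled copies of `image` until the smaller side
    drops below `min_size` pixels."""
    pyramid = []
    h, w = image.shape[:2]
    scale = 1.0
    while min(h * scale, w * scale) >= min_size:
        resized = cv2.resize(image, (int(w * scale), int(h * scale)))
        pyramid.append((scale, resized))
        scale *= scale_factor        # shrink for the next pyramid level
    return pyramid
```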
B. Recognition Module
Most CNN-based face recognition methods are designed for high-performance computing because of the computational cost of the network layers. In this paper, we implement the framework on an embedded computer vision system with low power consumption, so the architecture must run in real time without access to the cloud and recognize multiple faces at the same time on the board. We therefore apply a face recognition method based on FaceNet [1], which uses the Inception architecture [13]. The goal is to keep the embeddings of face images of the same person close together in the feature space while pushing the embeddings of different persons far apart.
To decrease the number of network parameters and the model size, and thereby save GPU memory, we utilize the SqueezeNet architecture [14] instead. This leads to faster training and face prediction, but it may cause a slight drop in accuracy. Moreover, the model size also decreases. The training process, results, and training parameters are described in Section IV.
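As an illustration of this kind of lightweight embedding network, the PyTorch sketch below pairs torchvision's SqueezeNet feature extractor with a 128-dimensional, L2-normalized embedding head; it is an assumed stand-in for the authors' modified architecture, not their exact network.

```python
import torch
import torch.nn as nn
import torchvision

class SqueezeEmbedder(nn.Module):
    """SqueezeNet features followed by a small embedding head."""
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.features = torchvision.models.squeezenet1_1().features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, embedding_dim)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        x = self.fc(x)
        return nn.functional.normalize(x, p=2, dim=1)   # unit-length embedding

# One 160x160 RGB face crop -> one 128-D embedding.
emb = SqueezeEmbedder()(torch.randn(1, 3, 160, 160))
print(emb.shape)  # torch.Size([1, 128])
```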
C. Adaptive Face Tracking
Processing time is the main problem for a real-time face recognition system because of the cost of each step: face detection, face alignment, and face recognition itself. In our experiments, applying the face detection algorithm to the whole image of every video frame consumed a large amount of processing time per frame. We assume that once a face has been recognized, it remains in the scene for a short period. Thus, we apply an algorithm to track each face using a correlation filter based on the algorithm of [11].
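One plausible way to wire the detector, recognizer, and tracker together in a real-time loop is sketched below; the re-detection interval and the detect_faces, recognize, and track callables are illustrative stand-ins for the MTCNN, FaceNet, and correlation-filter components, not the authors' implementation.

```python
def process_video(frames, detect_faces, recognize, track, redetect_every=30):
    """Run full detection+recognition only periodically; track in between."""
    tracked = []                        # list of (tracker_state, label)
    for i, frame in enumerate(frames):
        if i % redetect_every == 0 or not tracked:
            # Expensive path: detect, align and recognize every face.
            tracked = [(track.init(frame, box), recognize(face))
                       for box, face in detect_faces(frame)]
        else:
            # Cheap path: update the correlation filters only.
            tracked = [(track.update(state, frame), label)
                       for state, label in tracked]
        yield tracked
```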
IV. EXPERIMENTS AND RESULTS
A. Experimental Setup
The choice of hardware for face detection and recognition is crucial. The main challenge is the computational cost of the CNN architecture when a deep learning algorithm is deployed on a board with limited computational resources. To overcome this, the board should support parallel programming so that multiple processes can be computed at the same time. In this experiment, we implemented the framework on an NVIDIA Jetson TX2, a powerful embedded board. The Jetson TX2 has an NVIDIA Pascal GPU with 256 CUDA cores and packs a high-performance processor into a small, power-efficient design suitable for intelligent devices such as robots, drones, and portable devices that rely on computer vision.
To measure the performance of the framework, we use the Recognition Rate (RR) [15], precision, recall, F-score [16], and computation time as metrics. The computation time of face recognition is reported in seconds, and RR is calculated as

RR = \frac{\text{number of correctly recognized images}}{\text{number of total test images}} \times 100.   (2)

Precision, recall, and F-score for the face detection evaluation are calculated as

\text{Precision} = \frac{\#TP}{\#TP + \#FP},   (3)

\text{Recall} = \frac{\#TP}{\#TP + \#FN},   (4)

\text{F-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},   (5)

where TP (True Positive) counts detection bounding boxes present in both the system under test (SUT) and the ground truth (GT), FP (False Positive) counts detections present in the SUT but not in the GT, and FN (False Negative) counts detections present in the GT but not in the SUT.
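Equations (2)-(5) translate directly into code; the small helpers below assume the detections have already been matched against the ground truth so that the TP, FP, and FN counts are available.

```python
def detection_metrics(num_tp, num_fp, num_fn):
    """Precision, recall and F-score from matched detection counts, as in (3)-(5)."""
    precision = num_tp / (num_tp + num_fp)
    recall = num_tp / (num_tp + num_fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

def recognition_rate(num_correct, num_total):
    """Recognition Rate (RR) in percent, as in (2)."""
    return 100.0 * num_correct / num_total
```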
B. Datasets
The data used in this paper are divided into three parts: training, validation, and testing, where each set contains faces with different variations, illuminations, poses, and occlusions. The datasets are described in detail below.
Figure 2. Examples of VGGFace2 [17].
To achieve the highest accuracy while keeping the number of CNN parameters small, the dataset must contain enough variation and a good representation of faces. If the dataset has too little training data, the model underfits; conversely, with a very large dataset the training model can overfit. Training therefore requires a good model architecture to keep the training weight manageable, as well as careful tuning of parameters such as the learning rate and the number of iterations. The VGGFace2 dataset [17] is used for training the network. It contains 3.31 million images of 9,131 subjects; the images were downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity, and profession, as shown in Figure 2.
The validation set is used to limit overfitting and to tune parameters such as weights and biases so that accuracy improves on data that was not seen during training. The LFW dataset [18] is used to validate the model in this work. It contains 13,233 face images of 5,749 different subjects, each with a resolution of 250x250 pixels.
Figure 3. Examples from the LFW dataset.
The test sets are used to measure the performance of the face recognition algorithm in terms of recognition rate and recognition time. The YouTube Faces database [19] is used to measure tracking performance, and the FEI face database [20] is used to measure the recognition rate.
The YouTube Faces database contains 3,425 face videos of 1,595 different people. We use it to measure tracking performance because the videos contain people moving within the scene against unconstrained backgrounds. The FEI face database is used to measure the recognition rate of the algorithm. It contains 2,800 images of 200 people, 14 images per person. All images are RGB, rotated by up to about 180 degrees around the frontal position, with an original size of 640x480 pixels. The frontal face image and the label of each person were used to train a classifier in the experiment.
To measure the processing time of multiple-face recognition, we generated the CUFace dataset by capturing video of people standing in front of the camera at a resolution of 640x360 pixels and a frame rate of 25 fps. The distance between the people and the camera is 1.00-1.20 meters. Examples from the CUFace dataset are shown in Figure 4.
C. Model and Classifier Training
The model was trained on the SqueezeNet architecture with VGGFace2, using an input image size of 224x224 pixels. Training uses an initial learning rate of 0.1, random cropping and flipping of the input images, and 2,000 epochs.
Figure 5. Example of 40 images generated with the image augmentation technique.
To train a classifier, we crop and align each image, resize it to 160x160, and then use an image augmentation technique to generate 40 faces (as shown in Figure 5) from the original face image with variations such as added noise, illumination changes, rotation, and shearing. These 40 images are used as input to train the classifier. The recognition accuracy varies with the number of training samples, as we discuss in the results.
Figure 4. Example of CUFace dataset.
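A rough OpenCV sketch of this style of augmentation is shown below; the specific noise level, brightness range, rotation, and shear amounts are illustrative guesses rather than the values used to generate Figure 5.

```python
import cv2
import numpy as np

def augment_face(face, n_variants=40, seed=0):
    """Generate jittered copies of a face crop (noise, brightness, rotation, shear)."""
    rng = np.random.default_rng(seed)
    h, w = face.shape[:2]
    out = []
    for _ in range(n_variants):
        img = face.astype(np.float32)
        img += rng.normal(0, 8, img.shape)                 # additive noise
        img *= rng.uniform(0.7, 1.3)                       # illumination change
        angle = rng.uniform(-15, 15)                       # small rotation (degrees)
        shear = rng.uniform(-0.1, 0.1)                     # slight shear
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        m[0, 1] += shear
        img = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)
        out.append(np.clip(img, 0, 255).astype(np.uint8))
    return out
```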
A Support Vector Machine (SVM) classifier is used to classify the person in the feature space. The output of the prediction is a probability distribution over the persons in the database, which is calculated with (6) following [21]. The prediction with the maximum probability is taken as the label of the person if its value is greater than a threshold; otherwise, the face is considered an unknown person. The SVM provides stable results and faster training, whereas a softmax classifier can become bogged down by the computation when the training data are complex and contain many labels.
P(\text{label} \mid \text{input}) = \frac{1}{1 + \exp(A \cdot f(\text{input}) + B)}   (6)

where P(label | input) is the probability that the input belongs to the label, f(input) is the signed distance of the embedding in the feature space, and A and B are parameters obtained when training the classifier.
Figure 6. (a) Probability distribution when predicting an unknown face. (b) Probability distribution when predicting a face that is in the database.
In the experiment, we found that the maximum probability varies with the number of persons in the database: as the number of persons increases, the maximum probability decreases. The threshold therefore has to change as the database grows. As shown in Figure 6, when a face in the database is recognized, the maximum probability is clearly higher than the other probabilities, compared with the case of an unknown face.
In this experiment, we use (7) to calculate the prediction threshold δ, under the assumption that the maximum probability differs from the other probabilities by a fixed ratio:

\delta = C \times \frac{\sum_{i=1}^{N} x_i - x_{max}}{N - 1}   (7)

where C is the ratio between the maximum probability and the others (C = 1, 2, ..., n), x_i denotes the probability of the i-th person, x_max is the maximum probability of the prediction, and N is the total number of persons in the database. In the experiment, the optimal value of C is 10, which achieves the highest recognition rate.
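Putting (6) and (7) together, the decision rule can be sketched as follows; in practice the per-class probabilities come from the trained SVM, so the hand-written sigmoid below is only for illustration.

```python
import numpy as np

def platt_probability(f_input, A, B):
    """Platt-scaled probability P(label | input) from an SVM decision value, as in (6)."""
    return 1.0 / (1.0 + np.exp(A * f_input + B))

def classify_with_unknown(probs, C=10):
    """Return the predicted label index, or None for an unknown person, using (7)."""
    probs = np.asarray(probs, dtype=float)
    n = probs.size
    x_max = probs.max()
    delta = C * (probs.sum() - x_max) / (n - 1)     # threshold from eq. (7)
    return int(probs.argmax()) if x_max > delta else None
```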
D. Experimental Results
The experimental results in this paper are divided into three parts: tracking performance, recognition rate for multiple faces, and recognition rate and processing time on the CUFace dataset.
1. Tracking performance
To evaluate the tracking algorithm, we tested detection on the YouTube Faces database, which contains video clips of people moving. The performance with tracking is comparable to detection alone while the processing time is lower. The results are shown in Table 1.
Table 1. Comparison of average precision, recall, and F-score for detection and detection with tracking.

Method                     Precision   Recall   F-Score
Detection                  0.75        0.82     0.76
Detection with tracking    0.69        0.99     0.75
2. Recognition rate for multiple faces
The recognition rate depends on the number of trained faces. We test the algorithm on the FEI database, whose test set contains face images taken from different directions, which are difficult to recognize when the facial features are not visible. The result in Figure 7 indicates that 40 trained face images achieve the highest recognition rate.
Figure 7. Comparison of recognition rate versus the number of training faces on the FEI database.
3. Recognition rate and processing time
To measure the recognition rate and the processing time of the recognition algorithm for multiple faces, we tested the algorithm on the CUFace dataset, comparing normal recognition against recognition with the tracking algorithm. The processing time is reduced by 56.10% on average. The frame rate varies with the number of faces being recognized; our algorithm can recognize multiple faces at around 5-10 fps with a recognition rate of around 90%.
Table 2. Comparison of processing time between recognition and recognition with tracking.

Number of faces   Recognition (s)   Recognition with tracking (s)   Speed up (%)   RR (%) with tracking
1                 0.19              0.05                            75.79          96.00
2                 0.23              0.07                            69.78          98.67
3                 0.25              0.09                            63.39          95.11
4                 0.28              0.12                            57.30          94.50
5                 0.32              0.16                            50.31          84.73
6                 0.35              0.20                            44.13          84.11
7                 0.39              0.22                            43.78          83.67
8                 0.41              0.23                            44.31          85.50
Average           0.30              0.14                            56.10          90.29
The outputs of the recognition are shown in Figure 8: each bounding box locates a face and is annotated with the label (name) of the person.
Figure 8. Output of the recognition. Each bounding box locates a face and shows the name of the person.
V. CONCLUSIONS
In this paper, we proposed a framework for real-time multiple-face recognition. The recognition algorithm is based on a state-of-the-art CNN. The framework combines a tracking technique with a lightweight model, which reduces both the processing time and the number of network parameters needed to learn multiple-face features in real time. The framework is implemented on an NVIDIA Jetson TX2 board, which supports CUDA and can therefore run the processing in parallel. As a result, the processing time is short enough for real-time multiple-face recognition with an acceptable recognition rate.
VI. ACKNOWLEDGEMENT
This work is supported in part by AUN/SEED-Net (JICA),
Collaborative Research Project entitled Video Processing and
Transmission. This support is gratefully acknowledged.
REFERENCES
[1] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815-823.
[2] Y. Sun, D. Liang, X. Wang, and X. Tang, "DeepID3: Face recognition with very deep neural networks," arXiv preprint arXiv:1502.00873, 2015.
[3] Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701-1708.
[4] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[5] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005, vol. 1, pp. 886-893.
[6] S. S. Farfade, M. J. Saberian, and L.-J. Li, "Multi-view face detection using deep convolutional neural networks," in Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015, pp. 643-650.
[7] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, 2016.
[8] L. Wiskott, N. Krüger, N. Kuiger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, 1997.
[9] J. Yang, D. Zhang, A. F. Frangi, and J.-y. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, 2004.
[10] W. Zhao, A. Krishnaswamy, R. Chellappa, D. L. Swets, and J. Weng, "Discriminant analysis of principal components for face recognition," in Face Recognition, Springer, 1998, pp. 73-85.
[11] M. Danelljan, G. Häger, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in British Machine Vision Conference, Nottingham, September 1-5, 2014. BMVA Press, 2014.
[12] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2544-2550.
[13] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[14] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016.
[15] B. Dai and D. Zhang, "Evaluation of face recognition techniques," in PIAGENG 2009: Image Processing and Photonics for Agricultural Engineering, vol. 7489, p. 74890M. International Society for Optics and Photonics, 2009.
[16] A. Baumann et al., "A review and comparison of measures for automatic video surveillance systems," EURASIP Journal on Image and Video Processing, vol. 2008, no. 1, p. 824726, 2008.
[17] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, "VGGFace2: A dataset for recognising faces across pose and age," arXiv preprint arXiv:1710.08092, 2017.
[18] G. B. Huang and E. Learned-Miller, "Labeled Faces in the Wild: Updates and new reporting procedures," Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Tech. Rep. 14-003, 2014.
[19] L. Wolf, T. Hassner, and I. Maoz, "Face recognition in unconstrained videos with matched background similarity," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 529-534.
[20] C. E. Thomaz and G. A. Giraldi, "A new ranking method for principal components analysis and its application to face image analysis," Image and Vision Computing, vol. 28, no. 6, pp. 902-913, 2010.
[21] J. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Advances in Large Margin Classifiers, vol. 10, no. 3, pp. 61-74, 1999.