This paper presents a study of the efficiency and performance speedup achieved by applying graphics processing units (GPUs) to face recognition. We explore one approach to parallelizing and optimizing a well-known face recognition algorithm, Principal Component Analysis (PCA) with Eigenfaces. In recent years, GPUs have been the subject of extensive research, and their computation speed has been increasing rapidly.
Background Estimation Using Principal Component Analysis Based on Limited Mem... (IJECEIAES)
Given a video of M frames of size h × w, the background components are the matrix elements that remain relatively constant over the M frames. In principal component analysis (PCA), these elements are referred to as "principal components". In video processing, background subtraction means removing the background component from the video, and PCA is used to obtain it. The method transforms the three-dimensional video (h × w × M) into a two-dimensional matrix (N × M), where N = h × w is the length of each flattened frame. The principal components are the dominant eigenvectors, which form the basis of an eigenspace. A limited-memory block Krylov subspace optimization is then proposed to improve the performance of the computation. The background estimate is obtained by projecting each input image (the first frame of each sequence) onto the space spanned by the principal components. The procedure was run on a standard dataset, the SBI (Scene Background Initialization) dataset, consisting of 8 videos with resolutions ranging from 146 × 150 to 352 × 240 and frame counts from 258 to 500. Performance is reported with 8 metrics; on average over the 8 videos, the percentage of error pixels is 0.24%, the percentage of clustered error pixels is 0.21%, the multiscale structural similarity index is 0.88 (out of a maximum of 1), and the running time is 61.68 seconds.
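The pipeline above (flatten frames into an N × M matrix, extract the dominant eigenvector, project a frame onto it) can be sketched in plain Python; the power-iteration routine and tiny frame sizes are illustrative stand-ins for the paper's limited-memory block Krylov solver:

```python
# Hypothetical sketch of PCA-based background estimation: frames are
# flattened, mean-centred, the leading eigenvector of the covariance is
# found by power iteration, and a frame is projected onto it.

def dominant_eigenvector(cov, iters=200):
    """Power iteration for the leading eigenvector of a symmetric matrix."""
    n = len(cov)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def estimate_background(frames):
    """frames: list of M flattened frames, each a list of N pixel values."""
    n, m = len(frames[0]), len(frames)
    mean = [sum(f[i] for f in frames) / m for i in range(n)]
    centred = [[f[i] - mean[i] for i in range(n)] for f in frames]
    # N x N covariance (fine for tiny N; real videos use the M x M trick)
    cov = [[sum(c[i] * c[j] for c in centred) / m for j in range(n)]
           for i in range(n)]
    v = dominant_eigenvector(cov)
    # project the first frame onto the principal component
    coeff = sum(centred[0][i] * v[i] for i in range(n))
    return [mean[i] + coeff * v[i] for i in range(n)]
```

For a nearly static scene the projection reduces to the temporal mean, which is the expected background.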
A broad ranging open access journal Fast and efficient online submission Expe... (ijceronline)
The International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Reconstructing the Path of the Object based on Time and Date OCR in Surveilla... (ijtsrd)
The recognition of time and date stamps in CCTV video enables time-based queries in video indexing applications. In this paper, we propose a system for reconstructing the path of an object in surveillance cameras based on optical character recognition of the time and date. Since the time and date region has no explicit boundary, the discrete cosine transform (DCT) is applied to locate it. Once the region is located, it is segmented and features for the time and date symbols are extracted. A back-propagation neural network recognizes the features, and the results are stored in a database. Using the resulting database, the system reconstructs the object's path over time. The proposed system is implemented in MATLAB. Pyae Phyo Thu | Mie Mie Tin | Ei Phyu Win | Cho Thet Mon, "Reconstructing the Path of the Object based on Time and Date OCR in Surveillance System", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd27981.pdf Paper URL: https://www.ijtsrd.com/home-science/education/27981/reconstructing-the-path-of-the-object-based-on-time-and-date-ocr-in-surveillance-system/pyae-phyo-thu
This work proposes a feed-forward neural network combined with the symmetric table addition method to design a neuron-synapse algorithm for approximating the sine function, based on its Taylor series expansion. MATLAB code and LabVIEW are used to build the neural network, which is designed and trained on a database set to improve its performance, achieving global convergence with a small MSE and 97.22% accuracy.
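The table-plus-Taylor idea behind such sine approximations can be sketched as follows; the table size and first-order correction are illustrative assumptions, not the paper's exact neuron-synapse design:

```python
import math

# Illustrative sketch of a table-addition-style approximation: a coarse
# lookup table of sin/cos samples plus a first-order Taylor correction.

TABLE_SIZE = 256
STEP = (math.pi / 2) / TABLE_SIZE
SIN_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]
COS_TABLE = [math.cos(i * STEP) for i in range(TABLE_SIZE + 1)]

def sin_approx(x):
    """Approximate sin(x) for x in [0, pi/2] via table + Taylor term."""
    i = int(x / STEP)
    d = x - i * STEP              # small residual angle
    # sin(x0 + d) ~ sin(x0) + d * cos(x0)   (first-order Taylor expansion)
    return SIN_TABLE[i] + d * COS_TABLE[i]
```

The residual error is bounded by roughly d²/2, so a 256-entry table already keeps the approximation error below 1e-4 on the quarter period.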
An empirical assessment of different kernel functions on the performance of s... (riyaniaes)
Artificial intelligence (AI) and machine learning (ML) have influenced every part of our day-to-day activities in this era of technological advancement, making life on earth more comfortable. Among the many AI and ML algorithms, the support vector machine (SVM) has become one of the most widely used for data mining, prediction, and other tasks across several domains. The SVM's performance depends significantly on the kernel function (KF); nonetheless, there is no universally accepted basis for selecting an optimal KF for a specific domain. In this paper, we empirically investigate the effect of different KFs on SVM performance in various fields, illustrated through extensive experimental results. Our results show that no single KF is always suitable for achieving high accuracy and generalisation in all domains; however, the Gaussian radial basis function (RBF) kernel is often the default choice. Moreover, if the KF parameters of the RBF and exponential RBF kernels are optimised, they outperform the linear and sigmoid KF-based SVM methods in terms of accuracy, while the linear KF is more suitable for linearly separable datasets.
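For reference, the kernel functions being compared have simple closed forms; this sketch uses illustrative default values for gamma and coef0, not the optimised parameters from the experiments:

```python
import math

# The three kernels discussed above, written out explicitly.

def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF: K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid: K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * linear_kernel(x, y) + coef0)
```

Note how the RBF kernel always returns 1 for identical inputs, which is one reason it behaves as a reasonable default similarity measure.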
Black-box modeling of nonlinear system using evolutionary neural NARX model (IJECEIAES)
Nonlinear systems with uncertainty and disturbance are very difficult to model using mathematical approaches. Therefore, a black-box modeling approach requiring no prior knowledge is necessary. Several approaches have been used to develop black-box models, such as fuzzy logic, neural networks, and evolutionary algorithms. In this paper, an evolutionary neural network, combining a neural network with a modified differential evolution algorithm, is applied to model a nonlinear system. The feasibility and effectiveness of the proposed modeling approach are tested on a piezoelectric actuator SISO system and an experimental quadruple-tank MIMO system.
A complete user adaptive antenna tutorial demonstration. a gui based approach... (Pablo Velarde A)
This paper is aimed at the creation of an easy-to-use antenna graphical user interface (GUI) tutorial demonstration, written in Java in the BlueJ environment, which gives the user ample opportunity to feed in desired input parameters and study the basic antenna patterns and parameters for those inputs through MATLAB® interfacing. It also presents three-dimensional physical appearances using the Java 3D utility.
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in... (ijwmn)
Antenna arrays promise significant increases in system capacity and performance in wireless systems. In this paper, a simplified, near-optimum array receiver is proposed, based on the angular gain of the spatial filter. The detection scheme is then analyzed by calculating the exact error probability. The proposed model confirms the benefits of adaptive antennas in reducing the overall interference level (intercell/intracell) and yields an accurate approximation of the error probability. We extend the method to propagation over Nakagami-m fading channels; the model shows good agreement with simulation results.
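As context for the error-probability analysis, the standard textbook BER expressions for BPSK over AWGN and Rayleigh fading can be evaluated directly (these are generic formulas, not the paper's near-optimum array-receiver result):

```python
import math

def ber_bpsk_awgn(snr):
    """BPSK bit error rate over AWGN: Q(sqrt(2*Eb/N0)) = 0.5*erfc(sqrt(snr))."""
    return 0.5 * math.erfc(math.sqrt(snr))

def ber_bpsk_rayleigh(snr):
    """BPSK BER averaged over Rayleigh fading (closed-form textbook result)."""
    return 0.5 * (1.0 - math.sqrt(snr / (1.0 + snr)))
```

At any given SNR the Rayleigh curve lies far above the AWGN curve, which is exactly the gap that array gain and interference reduction help close.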
A Novel Algorithm for Watermarking and Image Encryption (cscpconf)
Digital watermarking is a method of copyright protection for audio, images, video, and text. We propose a new robust watermarking technique based on the contourlet transform and singular value decomposition. The paper also proposes a novel encryption algorithm to store a signed double matrix as an RGB image. The entropy of the watermarked image and the correlation coefficient of the extracted watermark image are very close to their ideal values, demonstrating the correctness of the proposed algorithm. Experimental results also show the scheme's resilience against strong blurring attacks such as mean and Gaussian filtering, linear filtering (high-pass and low-pass), non-linear filtering (median filtering), addition of a constant offset to pixel values, and local exchange of pixels, thus demonstrating the security, effectiveness, and robustness of the proposed watermarking algorithm.
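The two quality measures cited, image entropy and the correlation coefficient of the extracted watermark, can be computed as follows (a generic sketch, not the authors' evaluation code):

```python
import math

def entropy(pixels, levels=256):
    """Shannon entropy (bits) of a list of grayscale pixel values."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return -sum((h / n) * math.log2(h / n) for h in hist if h)

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)
```

A correlation near 1 between the embedded and extracted watermark is the "close to ideal" condition the abstract refers to.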
ROI Based Image Compression in Baseline JPEG (IJERA Editor)
To improve the efficiency of the standard JPEG compression algorithm, an adaptive quantization technique supporting region-of-interest compression is introduced. Since JPEG is a lossy technique, the less important bits are discarded and are not restored during decompression. Adaptive quantization is carried out by applying two different quantization levels to the picture, as specified by the user, who can select any part of the image and enter the required compression quality. If the subject matters more to the user than the background, higher quality is allotted to the subject, and vice versa. Adaptive quantization in baseline sequential JPEG proceeds by applying the forward discrete cosine transform (FDCT) and the two user-specified quantization levels for compression, thereby achieving region-of-interest compression, and the inverse discrete cosine transform (IDCT) for decompression. This technique ensures that memory is used efficiently. Moreover, we have specifically designed it for clearly identifying defects in leather samples.
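The core of ROI-based adaptive quantization is simply using a finer quantization step inside the region of interest; a minimal sketch, with illustrative step sizes rather than JPEG's standard tables:

```python
# Hedged sketch: ROI-based adaptive quantization reduces to choosing a
# smaller quantization step inside the region of interest. The step
# values below are illustrative, not JPEG's standard quantization tables.

def quantize(block, step):
    return [round(v / step) for v in block]

def dequantize(block, step):
    return [v * step for v in block]

def roi_compress(coeffs, roi_mask, fine_step=2, coarse_step=16):
    """Quantize DCT-like coefficients: a fine step inside the ROI,
    a coarse step elsewhere (roi_mask[i] is True inside the ROI)."""
    return [round(c / (fine_step if m else coarse_step))
            for c, m in zip(coeffs, roi_mask)]
```

The coarse step discards more low-order bits in the background, which is where the memory savings come from.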
Network anomaly detection technology based on the support vector machine (SVM) can efficiently detect unknown attacks or variants of known attacks. However, it cannot be used to detect large-scale intrusion scenarios because of the computation time required. The graphics processing unit (GPU) is multi-threaded and offers powerful parallel processing capability; hence, a parallel computing framework is used to accelerate SVM-based classification.
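The data-parallel idea can be illustrated on a CPU by splitting the test set into chunks and evaluating SVM decision values concurrently; the thread pool here stands in for GPU threads, and the model values are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
import math

# CPU sketch of the parallel SVM classification described above: each
# chunk of test samples gets its RBF decision values computed by a
# separate worker. The support vectors, alphas, and bias are made up.

def decision(x, support, alphas, bias, gamma=0.5):
    s = sum(a * math.exp(-gamma * sum((xi - si) ** 2
                                      for xi, si in zip(x, sv)))
            for a, sv in zip(alphas, support))
    return s + bias

def classify_parallel(samples, support, alphas, bias, workers=4):
    def work(chunk):
        return [1 if decision(x, support, alphas, bias) > 0 else -1
                for x in chunk]
    step = max(1, len(samples) // workers)
    chunks = [samples[i:i + step] for i in range(0, len(samples), step)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(work, chunks)
    return [label for part in results for label in part]
```

Because each sample's decision value is independent, this mapping is embarrassingly parallel, which is what makes GPU acceleration effective.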
This seminar is created for families who need assistance getting their reluctant learner back into the swing of things. I will cover topics from the importance of keeping it simple when planning your routines to homework tips and setting up positive expectations for next year.
In this paper, a PDE-based hybrid method for image denoising is introduced. The method is a two-stage filter: anisotropic diffusion followed by wavelet-based Bayesian shrinkage. Efficient denoising is achieved by reducing the convergence time of the anisotropic diffusion.
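The first stage of such a filter, anisotropic (Perona-Malik) diffusion, can be sketched in one dimension; kappa and the time step are illustrative, and the wavelet shrinkage stage is omitted:

```python
import math

# One explicit Perona-Malik diffusion step on a 1-D signal: smoothing is
# strong where gradients are small and suppressed across large gradients
# (edges). A minimal sketch of the PDE stage, not the authors' code.

def diffuse_step(signal, kappa=10.0, dt=0.2):
    out = list(signal)
    for i in range(1, len(signal) - 1):
        east = signal[i + 1] - signal[i]
        west = signal[i - 1] - signal[i]
        # edge-stopping conductance g(d) = exp(-(d / kappa)^2)
        ge = math.exp(-(east / kappa) ** 2)
        gw = math.exp(-(west / kappa) ** 2)
        out[i] = signal[i] + dt * (ge * east + gw * west)
    return out
```

Iterating this step smooths noise inside regions while largely preserving edges; the hybrid method's point is that fewer iterations are needed when a wavelet shrinkage stage follows.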
In this paper, we discuss speckle reduction in images with the recently proposed Wavelet Embedded Anisotropic Diffusion (WEAD) and Wavelet Embedded Complex Diffusion (WECD). Both methods improve on anisotropic and complex diffusion by adding wavelet-based Bayes shrinkage as a second stage. Both WEAD and WECD produce excellent results compared with existing speckle reduction filters.
Test automation is an accepted technique adopted by industry to increase the effectiveness of the testing phase. Recurring tasks are automated by tools, reducing human effort and increasing the quality of the product under test. Yet a study of test automation programmes in industry reveals that a good percentage of them fail to deliver the intended results.
The improved hybrid model for molecular image denoising, proposed by NeST Software, can produce molecular image output with a better SNR. Read more on the proposed hybrid model.
Software Defined Networking (SDN) is an emerging trend in the networking and communication industry and promises to deliver enormous benefits, from reduced costs to more efficient network operations. It is a new approach that gives network operators and owners more control of their infrastructure, allowing the optimization, customization, and virtualization that enable new types of network services. This is done by decoupling the control plane, which makes decisions about where traffic is sent, from the data plane, the underlying hardware that forwards traffic to the selected destination.
Identification of focal cortical dysplasia (FCD) can be difficult because the MRI changes are subtle. Although sequences such as FLAIR (fluid-attenuated inversion recovery) can detect the large majority of these lesions, smaller lesions without signal changes can easily go unnoticed by the naked eye. The aim of this study is to improve the visibility of FCD lesions in T1-weighted brain MRI images. In the proposed method, we use a complex-diffusion-based approach to calculate the FCD-affected areas.
This paper provides an overview of Universal Plug and Play (UPnP) and how it works to build a digital home network. UPnP network technology allows personal computer and consumer electronics devices to advertise and offer their services to network clients. UPnP can be viewed as the technological foundation of the digital home, enabling innovative usage models, higher levels of automation, and easier integration of devices from different manufacturers. UPnP technology is all about making home networking simple and affordable for users.
An analog-to-digital converter (ADC) is an integral part of high-speed signal processing applications. This paper discusses a 10-bit SAR-based ADC that enables very low power consumption and sampling rates as high as 165 MSPS.
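The successive-approximation principle behind a SAR ADC is a per-bit binary search against a DAC level; a behavioral sketch (illustrative, not the paper's circuit):

```python
# Behavioral model of a SAR ADC: starting from the MSB, each trial bit
# is kept only if the corresponding DAC output does not exceed the
# input voltage. One bit is resolved per comparison cycle.

def sar_convert(vin, vref=1.0, bits=10):
    """Return the digital code for vin in [0, vref) after `bits` trials."""
    code = 0
    for i in range(bits - 1, -1, -1):
        trial = code | (1 << i)
        # comparator decision: keep the bit if DAC level <= vin
        if (trial / (1 << bits)) * vref <= vin:
            code = trial
    return code
```

A 10-bit conversion therefore needs only 10 comparator decisions per sample, which is why SAR architectures achieve high sample rates at low power.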
The complex digital and analog circuits and multiple clock signals used in the design and development of modern systems usually make the job of engineers and designers a tedious one. While working with complex circuits and signals, a designer may encounter problems with circuit validation due to long simulation times. These complexities adversely affect development time, increasing time to market and incurring higher production costs. By applying a new methodology in their digital phase-locked loop (digital PLL) design, the engineers at QuEST reduced the simulation effort to one-third.
A set-top box (STB) is a very common name in the consumer electronics market. It is a device attached to a television to enhance its functions or their quality. On the other side, the STB is connected to an external signal source, such as satellite, cable, terrestrial, or the internet. The STB processes the signal it receives and turns it into content, which is then displayed on the television screen or other display device. There are different types of STBs, based on what kinds of signals they can receive and what kind of processing they can do. The most widely used are DVB STBs, which receive DVB (Digital Video Broadcast) transmissions.
Ground-breaking innovations like the Advanced Driver Assistance System (ADAS) make driving easier and safer on congested roads. This whitepaper details how FPGA technology is emerging as a complete solution for ADAS.
Reusable video IP cores give software engineering service providers flexibility and shorter time to market while catering to the ever-increasing demands of customers. Read on to learn more about the reusable IP cores developed by NeST Software.
Template matching is a basic method in image analysis for extracting useful information from images. In this paper, we suggest a new method for pattern matching. Our method transforms the two-dimensional template image into a one-dimensional vector; likewise, all sub-windows (of the same size as the template) in the reference image are transformed into one-dimensional vectors. Three similarity measures, SAD, SSD, and Euclidean distance, are used to compute the likeness between the template and every sub-window in the reference image to find the best match. The experimental results show the superior performance of the proposed method over conventional methods on various templates of different sizes.
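The three similarity measures operate on the flattened one-dimensional vectors described above; a minimal sketch, where smaller scores mean a closer match:

```python
# The three similarity measures from the abstract, applied to flattened
# (1-D) template and sub-window vectors.

def sad(t, w):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(t, w))

def ssd(t, w):
    """Sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(t, w))

def euclidean(t, w):
    """Euclidean distance (square root of SSD)."""
    return ssd(t, w) ** 0.5

def best_match(template, windows):
    """Index of the sub-window minimizing SSD against the template."""
    scores = [ssd(template, w) for w in windows]
    return scores.index(min(scores))
```

Since Euclidean distance is a monotone function of SSD, both measures always pick the same best-matching sub-window; SAD can differ in the presence of outlier pixels.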
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Abstract: Face recognition is a form of computer vision that uses faces to identify a person or verify a person's claimed identity. In this paper, a neural-network-based algorithm is presented to detect frontal views of faces. The dimensionality of the input face image is reduced by principal component analysis, and classification is performed by a back-propagation neural network. The method is robust on a dataset of 300 face images and achieves a recognition rate of 80-90%.
This is a report on Bangla handwritten digit recognition; you may find it a helpful reference.
**Bengali is the world's fifth most spoken language, with 265 million native and non-native speakers accounting for 4% of the global population.
**Despite the large number of Bengali speakers, very little research has been conducted on Bangla handwritten digit recognition.
**Applications of the BHwDR system are wide-ranging, from postal code and license plate recognition to digit recognition on bank cheques and exam paper registration numbers.
Eigenfaces, Fisherfaces and Dimensionality Reduction (mostafayounes012)
Eigenfaces, Fisherfaces, and dimensionality reduction are explained easily and clearly. These topics cover two face recognition methods and the mathematics behind them. Hope it helps!
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA... (csandit)
Recognizing faces helps to name the various subjects present in an image. This work focuses on labeling faces in an image containing human faces of various age groups (a heterogeneous set). Principal component analysis finds the mean of the data set and subtracts it from the data in order to normalize it; normalization with respect to an image is the removal of features common to the data set. This work brings in the novel idea of deploying the median, another measure of central tendency, for normalization instead of the mean. The work was implemented in MATLAB. Results show that the median is the best measure for normalizing a heterogeneous data set, which gives rise to outliers.
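The advantage of the median over the mean in the presence of outliers is easy to demonstrate; this toy example uses made-up values, not the paper's face data:

```python
import statistics

# With an outlier in the data set, the median is a more robust centre
# than the mean, so median-centred data keeps typical samples near zero.

def centre(data, use_median=False):
    c = statistics.median(data) if use_median else statistics.mean(data)
    return [x - c for x in data]

values = [1.0, 2.0, 3.0, 100.0]           # 100.0 plays the outlier
by_mean = centre(values)                   # typical samples land far from 0
by_median = centre(values, use_median=True)
```

The mean (26.5 here) is dragged toward the outlier, while the median (2.5) stays with the bulk of the data, which is the paper's motivation for median-based normalization.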
Quality Prediction in Fingerprint Compression (IJTET Journal)
A new algorithm for fingerprint compression based on sparse representation is introduced. First, a dictionary is constructed from a sparse combination of a set of fingerprint patches. Dictionaries can be designed either by selecting one from a prespecified set or by adapting a dictionary to a set of training signals; in this paper, we use the K-SVD algorithm to construct the dictionary. After computation of the dictionary, the image is quantized, filtered, and encoded. The resulting image may be of three qualities: Good, Bad, or Ugly (the GBU problem). In this paper, we overcome the GBU problem by predicting the quality of the image.
Multimodal Biometrics Recognition by Dimensionality Diminution Method (IJERA Editor)
A multimodal biometric system utilizes two or more biometric modalities, e.g., face, ear, fingerprint, signature, and palmprint, to improve the recognition accuracy of conventional unimodal methods. We propose a new dimensionality reduction method called Dimension Diminish Projection (DDP) in this paper. DDP can not only preserve local information by capturing the intra-modal geometry, but also effectively extract between-class structures relevant for classification. Experimental results show that our proposed method performs better than other algorithms, including PCA, LDA, and MFA.
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION (cscpconf)
Image reconstruction is the process of obtaining the original image from corrupted data. Applications of image reconstruction include computed tomography, radar imaging, weather forecasting, etc. Recently, the steering kernel regression method has been applied to image reconstruction [1]. There are two major drawbacks to this technique: first, it is computationally intensive; second, the output of the algorithm suffers from spurious edges (especially in the case of denoising). We propose a modified version of steering kernel regression called the Median-Based Parallel Steering Kernel Regression technique. In the proposed algorithm, the first problem is overcome by implementing it on GPUs and multi-cores; the second is addressed by a gradient-based suppression in which a median filter is used. Our algorithm gives better output than steering kernel regression; the results are compared using the root mean square error (RMSE). Our algorithm has also shown a speedup of 21x on GPUs and 6x on multi-cores.
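The gradient-suppression idea, using a median filter to remove the impulsive responses that cause spurious edges, can be sketched in one dimension (a generic median filter, not the authors' GPU implementation):

```python
# A sliding 3-sample median filter: it removes isolated impulsive spikes
# (the source of spurious edges) while preserving a genuine step edge.

def median3(signal):
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = sorted(signal[i - 1:i + 2])[1]
    return out
```

A lone spike is replaced by its neighbours' value, while a sustained step survives unchanged, which is exactly the selectivity wanted when suppressing spurious gradients.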
compared using Root Mean Square Error(RMSE). Our algorithm has also shown a speedup of
21x using GPUs and shown speedup of 6x using multi-cores.
Semantic Image Retrieval Using Relevance Feedback dannyijwest
This paper presents optimized interactive content-based image retrieval framework based on AdaBoost
learning method. As we know relevance feedback (RF) is online process, so we have optimized the learning
process by considering the most positive image selection on each feedback iteration. To learn the system we
have used AdaBoost. The main significances of our system are to address the small training sample and to
reduce retrieval time. Experiments are conducted on 1000 semantic colour images from Corel database to
demonstrate the effectiveness of the proposed framework. These experiments employed large image
database and combined RCWFs and DT-CWT texture descriptors to represent content of the images.
The complexity of Medical image reconstruction requires tens to hundreds of billions of computations per second. Until few years ago, special purpose processors designed especially for such applications were used. Such processors require significant design effort and are thus difficult to change as new algorithms in reconstructions evolve and have limited parallelism. Hence the demand for flexibility in medical applications motivated the use of stream processors with massively parallel architecture. Stream processing architectures offers data parallel kind of parallelism.
As data processing requirements increased with new applications, new processing technologies like Stream computing and parallel execution came into being. This write‐up briefly compares two competing performance architectures for data parallelism – Cell Broadband Engine (Cell BE in short) and the GPU (Graphics Processing Unit). The Cell BE Processor architecture was developed in collaboration between IBM, Sony and Toshiba. Development started in 2001 and first set of products based on this architecture started appearing in 2005.
Fast and robust tracking of multiple faces is receiving increased attention from computer vision researchers as it finds potential applications in many fields like video surveillance and computer mediated video conferencing. Real-time tracking of multiple faces in high resolution videos involve three basic tasks namely initialization, tracking and display. Among these, tracking is quite compute intensive as it involves particle filtering that won’t yield a real time performance if we use a conventional CPU based system alone.
In today’s competitive software development scenario, the customer demands a testing coverage which not only ensures the stated requirements but also the implied ones. This situation calls for an exhaustive testing which may not be always possible due to various reasons. Testing, due to its last position in SDLC, often gets crunched due to the cumulative schedule slippages. Hence Tester is faced with a challenge to make testing as efficient as possible within a short time span due to cost constraints. With selective testing an only option, test leads usually go for the age-old approach of Random Testing. Random testing does not ensure coverage in a scientific manner.
SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. For efficient construction of large maps searching the best-matching unit is usually the computationally heaviest operation in the SOM. The parallel nature of the algorithm and the huge computations involved makes it a good target for GPU based parallel implementation. This paper presents an overall idea of the optimization strategies used for the parallel implementation of Basic-SOM on GPU using CUDA programming paradigm.
In this paper we present a recently developed tool named BrainAssist, which can be used for the study and analysis of brain abnormalities like Focal Cortical Dysplasia (FCD), Heterotopia and Multiple Sclerosis (MS). For the analysis of FCD and Heterotopia we used T1 weighted MR images and for the analysis of Multiple Sclerosis we used Proton Density (PD) images. 52 patients were studied. Out of 52 cases 36 were affected with FCDs, 6 with MS lesions and 10 normal cases. Preoperative MR images were acquired on a 1.5-T scanner (Siemens Medical Systems, Germany).
Software Testing is the last phase in software development lifecycle which has high impact on the quality of the final product delivered to the customer. Even after being a critical phase, it was not given the importance as it actually deserves. The schedule constraints and slippage carry forwarded from the previous phase also make the testing phase more torrent. History reveals that the situation has changed with time, wherein testing is now visualized as one of the most critical, phase of software development. This makes software testing a discipline which demands for continuous and systematic growth. Software testing is a trade-off between Cost, Time and Quality.
In software industry, test automation is a key solution for achieving volume verification and validation with optimal costs. Picking up the right automation tool and underlying scripting language has always been a challenge, balancing between cost factors and team’s expertise levels in various tools and scripting languages. A real solution would be one that allows full flexibility for team on these two core concern areas – test automation tool and scripting language. Flexi any Script any Tool (FaSaT) is a test automation framework which provides interoperability among multiple test automation tools and multiple scripting languages.
More from QuEST Global (erstwhile NeST Software) (8)
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3
CUDA Accelerated Face Recognition
Numaan. A
Third Year Undergraduate
Department of Electrical Engineering
Indian Institute of Technology, Madras, India
ee08b044@smail.iitm.ac.in
Sibi. A
NeST-NVIDIA Centre for GPU Computing
NeST, India
sibi.a@nestgroup.net
July 26, 2010
Abstract
This paper presents a study of the efficiency and
performance speedup achieved by applying Graph-
ics Processing Units for Face Recognition Solutions.
We explore one of the possibilities of parallelizing and
optimizing a well-known Face Recognition algorithm,
Principal Component Analysis (PCA) with Eigen-
faces.
1 Introduction
In recent years, the Graphics Processing Unit (GPU) has been the subject of extensive research, and the computation speed of GPUs has been rapidly increasing. The computational power of the latest generation of GPUs, measured in FLOPS (floating-point operations per second), is several times that of a high-end CPU, and for this reason they are increasingly used for non-graphics applications, or general-purpose computing (GPGPU). Traditionally, this power could only be harnessed through graphics APIs and was used primarily by professionals familiar with those APIs.
CUDA (Compute Unified Device Architecture), developed by NVIDIA, addresses this issue by introducing a familiar C-like development environment for GPGPU programming, allowing programmers to launch hundreds of concurrent threads on the "massively" parallel NVIDIA GPUs with very little software overhead. This paper portrays our efforts to use this power to tame a computationally intensive, yet highly parallelizable, PCA-based algorithm used in face recognition solutions. We developed both serial CPU code and parallel GPU code to compare the execution time in each case and measure the speedup achieved by the GPU over the CPU.
2 PCA Theory
2.1 Introduction
Principal Component Analysis (PCA) is one of the earliest and most successful techniques used for face recognition. PCA aims to reduce the dimensionality of data so that it can be economically represented and processed. The information contained in a human face is highly redundant, with each pixel highly correlated to its neighboring pixels, and the main idea of using PCA for face recognition is to remove this redundancy and extract the features required for comparing faces. To increase the accuracy of the algorithm, we use a slightly modified method called Improved PCA, in which the images used for training are grouped into different classes, each class containing multiple images of a single person with different facial expressions. The mathematics behind PCA is described in the following subsections.
2.2 Mathematics of PCA
Firstly, the 2-D facial images are resized using bilinear interpolation to reduce the dimension of the data and increase the speed of computation. Each resized image is represented as a 1-D vector by concatenating its rows into one long vector. Suppose we have M training samples per class, each of size N (the total number of pixels in the resized image). Let the training vectors be represented as xi, where the pj are pixel values:

xi = [p1 ... pN]^T,  i = 1, ..., M
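The resize-and-flatten step described above can be sketched in plain C as follows. This is a minimal serial illustration, not the paper's code: the function name, the row-major layout and the boundary handling are our own choices.

```c
#include <stdlib.h>

/* Resize a grayscale image (src_h x src_w) to (dst_h x dst_w) using
 * bilinear interpolation, returning it as a flattened 1-D vector with
 * rows concatenated, matching the x_i = [p_1 ... p_N]^T layout. */
float *resize_and_flatten(const float *src, int src_h, int src_w,
                          int dst_h, int dst_w) {
    float *dst = malloc((size_t)dst_h * dst_w * sizeof(float));
    for (int y = 0; y < dst_h; y++) {
        for (int x = 0; x < dst_w; x++) {
            /* Map the destination pixel back into source coordinates. */
            float sy = (float)y * (src_h - 1) / (dst_h > 1 ? dst_h - 1 : 1);
            float sx = (float)x * (src_w - 1) / (dst_w > 1 ? dst_w - 1 : 1);
            int y0 = (int)sy, x0 = (int)sx;
            int y1 = y0 + 1 < src_h ? y0 + 1 : y0;
            int x1 = x0 + 1 < src_w ? x0 + 1 : x0;
            float fy = sy - y0, fx = sx - x0;
            /* Weighted average of the four neighboring source pixels. */
            float top = src[y0 * src_w + x0] * (1 - fx) + src[y0 * src_w + x1] * fx;
            float bot = src[y1 * src_w + x0] * (1 - fx) + src[y1 * src_w + x1] * fx;
            dst[y * dst_w + x] = top * (1 - fy) + bot * fy;
        }
    }
    return dst;
}
```

In the actual pipeline the target size is the standard 16 x 16 resolution used throughout the paper.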
The training vectors are then normalized by subtracting the class mean image m from them:

m = (1/M) Σ_{i=1}^{M} xi

Let wi be the normalized images:

wi = xi − m

The matrix W is composed by placing the normalized image vectors wi side by side. The eigenvectors and eigenvalues of the covariance matrix C are then computed:

C = W W^T
The size of C is N × N, which can be enormous. For example, images of size 16 × 16 give a covariance matrix of size 256 × 256. It is not practical to solve for the eigenvectors of C directly. Hence the eigenvectors of the surrogate matrix W^T W, of size M × M, are computed, and the first M − 1 eigenvectors and eigenvalues of C are given by W di and µi, where di and µi are the eigenvectors and eigenvalues of the surrogate matrix, respectively.
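A small numerical sketch of this surrogate-matrix trick (our own illustration, not code from the paper): if d is an eigenvector of W^T W with eigenvalue µ, then W d is an eigenvector of W W^T with the same eigenvalue. The helper below checks that property for small hand-picked matrices.

```c
#include <math.h>

/* y = A x, where A is rows x cols, row-major. */
static void matvec(const double *A, const double *x, double *y,
                   int rows, int cols) {
    for (int r = 0; r < rows; r++) {
        y[r] = 0.0;
        for (int c = 0; c < cols; c++)
            y[r] += A[r * cols + c] * x[c];
    }
}

/* Given an eigenpair (d, mu) of the M x M surrogate matrix W^T W,
 * returns max |C u - mu u| where C = W W^T and u = W d. A residual
 * near zero confirms u is an eigenvector of C with eigenvalue mu.
 * W is N x M; small fixed buffers keep the sketch self-contained. */
double surrogate_residual(const double *W, int N, int M,
                          const double *d, double mu) {
    double u[16], Wt_u[16], Cu[16];
    matvec(W, d, u, N, M);            /* u = W d */
    for (int c = 0; c < M; c++) {     /* W^T u */
        Wt_u[c] = 0.0;
        for (int r = 0; r < N; r++)
            Wt_u[c] += W[r * M + c] * u[r];
    }
    matvec(W, Wt_u, Cu, N, M);        /* C u = W (W^T u) */
    double worst = 0.0;
    for (int r = 0; r < N; r++) {
        double e = fabs(Cu[r] - mu * u[r]);
        if (e > worst) worst = e;
    }
    return worst;
}
```

For instance, with W = [[1,0],[0,1],[1,1]], the surrogate W^T W = [[2,1],[1,2]] has eigenpair (d = [1,1], µ = 3), and W d = [1,1,2] is indeed an eigenvector of W W^T with eigenvalue 3.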
The eigenvectors corresponding to non-zero eigenvalues of the covariance matrix form an orthonormal basis for the subspace within which most of the image data can be represented with minimum error. The eigenvector associated with the highest eigenvalue reflects the greatest variance in the image. These eigenvectors are known as eigenimages or eigenfaces and, when normalized, look like faces. The eigenvectors of all the classes are computed similarly, and all of them are placed side by side to make up the eigenspace S.
The mean image of the entire training set, m, is computed and each training vector xi is normalized. The normalized training vectors wi are projected onto the eigenspace S, yielding the projected feature vectors yi of the training samples:

wi = xi − m
yi = S^T wi
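The feature extraction step above amounts to a mean subtraction followed by inner products with each eigenface. A minimal serial sketch (the function name and row-major layout are our own choices for illustration):

```c
/* Normalize a vector against the mean image and project it onto the
 * eigenspace: y = S^T (x - m). S is N x K, row-major, column j being
 * the j-th eigenface; x and m have N entries; y receives K entries. */
void project_onto_eigenspace(const double *S, const double *x,
                             const double *m, double *y, int N, int K) {
    for (int j = 0; j < K; j++) {             /* one feature per eigenface */
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += S[i * K + j] * (x[i] - m[i]);  /* inner product with w = x - m */
        y[j] = sum;
    }
}
```

This same loop is what both the CPU implementation and the GPU projection kernels compute; the implementations differ only in how the work is distributed.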
The simplest method for determining which class a test face falls under is to find the class k that minimizes the Euclidean distance. The test image is projected onto the eigenspace, and the Euclidean distance between the projected test image and each of the projected training samples is computed. If the minimum Euclidean distance falls below a predefined threshold θ, the face is classified as belonging to the class that contained the feature vector yielding the minimum Euclidean distance.
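The matching rule above can be sketched as a nearest-neighbor search in feature space. This is our own illustrative version (names and the use of a squared threshold are assumptions, not the paper's code); comparing squared distances avoids an unnecessary square root.

```c
/* Nearest-neighbor matching in feature space. feats holds n feature
 * vectors of length k (row-major). Returns the index of the closest
 * training sample, or -1 if even the best squared distance exceeds
 * theta2 (the squared rejection threshold). */
int match_face(const double *feats, int n, int k,
               const double *test, double theta2) {
    int best = -1;
    double best_d2 = theta2;
    for (int i = 0; i < n; i++) {
        double d2 = 0.0;
        for (int j = 0; j < k; j++) {
            double diff = feats[i * k + j] - test[j];
            d2 += diff * diff;          /* squared Euclidean distance */
        }
        if (d2 <= best_d2) {            /* keep the minimum under threshold */
            best_d2 = d2;
            best = i;
        }
    }
    return best;
}
```

A return value of -1 corresponds to the "no match under θ" case, i.e., the test face is rejected as unknown.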
3 CPU Implementation
3.1 Database
For our implementation, we are using one of the most common databases used for testing face recognition solutions, the ORL Face Database (formerly known as the AT&T Database). The ORL Database contains 400 grayscale images of resolution 112 × 92, covering 40 subjects with 10 images per subject. The images are taken under various conditions, such as different times, different angles, different expressions (happy, angry, surprised, etc.) and different facial details (with/without spectacles, with/without beard, different hair styles, etc.). To truly show the power of a GPU, the image database has to be large, so in our implementation we scaled the ORL Database multiple times by copying and concatenating it to create bigger databases. This allowed us to measure GPU performance for very large databases, as large as 15,000 images.
3.2 Training phase
The most time-consuming operation in the training phase is the extraction of feature vectors from the training samples by projecting each one of them onto the eigenspace. The computation of the eigenfaces and eigenspace is relatively less intensive, since we have used the surrogate-matrix workaround to decrease the size of the matrix and compute the eigenvectors with ease. Hence we decided to parallelize only the projection step of the training process. The steps up to the projection of the training samples are done in MATLAB, and the projection is written in C.

The MATLAB routine acquires the images from files, resizes them to a standard 16 × 16 resolution and computes the eigenfaces and eigenspace. The data required for the projection step, namely the resized training samples, the database mean image and the eigenvectors, are then dumped into a binary file which is later read by the C routine to complete the training process. The C routine reads the data dumped by the MATLAB routine and extracts the feature vectors by normalizing the resized training samples and projecting each of them onto the eigenspace. With this, the training process is complete, and the feature vectors are dumped into a binary file to be used in the testing process.
3.3 Testing phase
The entire testing process is written in C. OpenCV is used to acquire the test images from file and resize them to the standard 16 × 16 resolution using bilinear interpolation. The resized image is normalized with the database mean image and projected onto the eigenspace computed in the training phase. The Euclidean distances between the test image feature vector and the training sample feature vectors are computed, and the index of the feature vector yielding the minimum Euclidean distance is found. The face that yielded this feature vector is the most probable match for the input test face.
4 GPU Implementation
4.1 Training phase
4.1.1 Introduction
As mentioned in Section 3.2, only the final feature extraction step of the training phase is parallelized. Before the training samples can be projected, all the data required for the projection process is copied to the device's global memory, and the time taken for copying is noted. As all the data is read-only, it is bound as texture to take advantage of the cached texture memory.
4.1.2 Kernel
The projection process is highly parallelizable and can be parallelized in two ways. Threads can be launched to parallelize the computation of a particular feature vector, wherein each thread computes a single element of the feature vector. Alternatively, threads can be launched to parallelize the projection of multiple training samples, wherein each thread projects and computes the feature vector of a particular training sample. Since the number of training samples is large, the latter is adopted for the projection operation in the training phase. We adopted the former in the testing phase, where only one image has to be projected; the details are explained in Section 4.2.

Before the projection kernel is called, the execution configuration is set. The number of threads per block, T1, is set to a standard of 256, and the total number of blocks is B1 = ⌈N1/T1⌉, where N1 is the total number of training samples.
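The block count above is the usual ceiling division, which in host code is typically written in pure integer arithmetic rather than with a floating-point ceil; a one-line sketch (the function name is ours):

```c
/* Number of thread blocks needed so that B * T >= N, i.e. B = ceil(N / T),
 * computed without floating point to avoid rounding issues. */
unsigned int num_blocks(unsigned int n_items, unsigned int threads_per_block) {
    return (n_items + threads_per_block - 1) / threads_per_block;
}
```

The same pattern sets the execution configurations B2 and B3 for the testing-phase kernels described in Section 4.2.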
Each thread projects and computes the feature vector of a particular training sample by serially computing each element of the feature vector, one by one. Each element of the feature vector is obtained by taking the inner product of the training image vector and the corresponding eigenvector in the eigenspace. The training sample is normalized with the database mean image, element by element, as it is fetched from texture memory, and the intermediate sum of the inner product with the eigenvector is stored in shared memory. After each element of the feature vector is computed, the data is written back into global memory and the next element is computed. All the data is aligned in a columnar fashion to avoid uncoalesced memory accesses and shared memory bank conflicts.
Figure 1: Threads in Projection Kernel (Training)
After the kernel has finished running on the device
the entire data is copied back to the host memory
and dumped as a binary file to be used in the testing
phase.
4.2 Testing Phase
4.2.1 Introduction
The testing process runs entirely on the GPU and is handled by three kernels. The first kernel normalizes the test image, projects it onto the eigenspace and extracts the feature vector. The second kernel computes, in parallel, the Euclidean distances between the feature vector of the test image and those of the training images. The final kernel finds the minimum of the Euclidean distances and the index of the training sample which yielded that minimum. The resized test image, database mean image, eigenvectors and projected training samples are first copied to device memory, and the test image, mean image and eigenvectors are bound as texture to take advantage of the cached texture memory. Due to the relatively larger size of the projected training sample data and the limitation on maximum texture memory, the projected training samples are not bound as texture.
Figure 2: Recognition pipeline
4.2.2 Projection Kernel
As mentioned in Section 4.1.2, the projection kernel in the testing process is parallelized to concurrently compute each element of the feature vector. The number of threads per block, T2, is set to a standard of 256, and the total number of blocks is B2 = ⌈N2/T2⌉, where N2 is the size of the feature vector.

Each thread computes one element of the feature vector, which is obtained by taking the inner product of the test image vector and the corresponding eigenvector in the eigenspace. The test image is normalized with the database mean image, element by element, as it is fetched from texture memory, and the intermediate sum of the inner product with the eigenvector is stored in shared memory.

Figure 3: Threads in Projection Kernel (Testing)

After the entire feature vector is computed, the data is written back into global memory. The columnar alignment of all the eigenvectors avoids uncoalesced memory accesses and shared memory bank conflicts.
4.2.3 Euclidean Distance Kernel
The kernel for computing the Euclidean distance is very similar to the projection kernel used in the training phase. Threads are launched to concurrently compute the Euclidean distances between the test image feature vector and the training sample feature vectors.
tors. The number of threads per block, T3, is set to
a standard of 256 and the total number of blocks is
B3, where B3 = (ceil) (N3/T3), N3 = total number
of training samples.
Each thread computes a particular Euclidean distance serially. The differences between corresponding elements of the two vectors are computed, squared and summed; the intermediate sum is stored in shared memory. After all the Euclidean distances are computed, the data from shared memory is written to global memory. The columnar alignment of the training sample feature vectors avoids uncoalesced memory accesses and shared memory bank conflicts.
Figure 4: Threads in Euclidean Distance Kernel
4.2.4 Minimum Kernel
The minimum kernel computes the minimum Euclidean distance and its index. The vector containing the Euclidean distances is divided into smaller blocks, and each thread serially finds the value and index of the minimum in a particular block. The kernel is called iteratively with fewer and fewer threads until only one block is left. After execution of the kernel, the minimum value and its index are copied back to host memory. The training sample at the index computed by the kernel is the most probable match for the test image.
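The iterative reduction above can be sketched serially as follows (our own illustration; on the GPU each block in the inner loop would be handled by one thread per pass). Each pass shrinks the array of (distance, index) pairs by taking the minimum within fixed-size blocks, repeating until one pair remains.

```c
/* Iterative block-wise minimum reduction over an array of distances,
 * tracking the original index of the winner. Each pass reduces every
 * block of `blk` elements to its minimum and compacts the winners to
 * the front; buffers are modified in place. Returns the original index
 * of the overall minimum. */
int argmin_reduce(double *dist, int *idx, int n, int blk) {
    for (int i = 0; i < n; i++)
        idx[i] = i;                      /* start with original indices */
    while (n > 1) {
        int out = 0;
        for (int start = 0; start < n; start += blk) {
            int end = start + blk < n ? start + blk : n;
            int best = start;
            for (int i = start + 1; i < end; i++)   /* min within one block */
                if (dist[i] < dist[best])
                    best = i;
            dist[out] = dist[best];      /* compact winners to the front */
            idx[out] = idx[best];
            out++;
        }
        n = out;                         /* next pass runs on the winners */
    }
    return idx[0];
}
```

With 15,000 distances and blocks of 256, the array collapses in only a few passes, which is why this kernel contributes little to the overall pipeline time.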
5 Performance Statistics
To test the performance of the CPU and GPU, 5 images per subject from the ORL Database were selected as the training set. This set of 200 images was then replicated and concatenated to create databases ranging in size from 1,000 to 15,000 images. The eigenvectors corresponding to the 4 highest eigenvalues per class were selected to form the eigenspace. This led to feature vectors which grew in size as the database grew. The replicated database was trained with the CPU and the GPU, the execution time was noted, and the GPU speedup for the training process was calculated.

For testing, one image per subject from the ORL Database was selected, and the total time taken by the CPU and GPU to test all 40 test images was noted and used to calculate the GPU speedup for the testing process.

To get accurate performance statistics, the training and testing processes were run multiple times on different CPUs and GPUs. The following graphs were plotted with the data obtained from the performance tests. All the CPU times are based on single-core performance.
Fig. 5 shows the time taken by three different CPUs to execute the projection process during training.
Figure 5: Training time for different CPUs
Fig. 6 shows the time taken by different NVIDIA GPUs to execute the projection process during training. It includes the time taken for data transfers to and from the device.
Figure 6: Training time for different GPUs
Fig. 7 shows the performance speedup of different GPUs over the Intel Core 2 Quad Q9550 CPU when training databases of varying sizes.
Figure 7: Training Speedup
Fig. 8 shows the total time taken by different CPUs for testing 40 images.

Figure 8: Testing using CPU
Fig. 9 shows the total time taken by GPUs to test 40 images. For this test, the trained database was copied to device memory once and the 40 images were tested one by one. It includes the time taken for transferring each test image to the device and getting the match index back from the device.
Figure 9: Testing using GPU
Fig. 10 shows the performance speedup of different GPUs over the Intel Core 2 Quad Q9550 CPU when testing 40 images.
Figure 10: Testing Speedup
Fig. 11 shows the execution time of the recognition pipeline on the CPU for varying database sizes.

Figure 11: Recognition pipeline on CPU
Fig. 12 shows the execution time of the recognition pipeline on the GPU for varying database sizes. It is the time taken to transfer the test image to the device, find the match and transfer its index back to the host.
Figure 12: Recognition Pipeline on GPU
Fig. 13 shows the performance speedup of different GPUs over the Intel Core 2 Quad Q9550 CPU when executing the recognition pipeline.
Figure 13: Recognition Pipeline Speedup
6 Conclusion
The recognition rate of a PCA-based face recognition solution depends heavily on the exhaustiveness of the training samples: the higher the number of training samples, the higher the recognition rate. But as the number of training samples increases, CPUs become highly strained and the training process takes several minutes to complete (refer Fig. 5). The same process, when run on a GPU, completes in a matter of seconds (refer Fig. 6). The highest speedups achieved were 207x for the training process, 330x for the recognition pipeline and 165x for the overall testing process, on the latest GeForce GTX 480 GPU, for a database of 15,000 images.

The execution time of the recognition pipeline on the GPU is on the order of a few milliseconds even for very large databases, and this allows GPU-based testing to be integrated with real-time video and used for other applications involving large volumes of test images. Our primary purpose in writing this paper is to make clear the high performance boosts that can be obtained by developing GPU-based face recognition solutions.
7 Future Work

Our future plans in this field include the parallelization of other face recognition algorithms, like LDA (Linear Discriminant Analysis), and replacing the Euclidean distance based matching process with a neural network based one. We feel that algorithms with a high degree of parallelism, like neural networks, will benefit immensely if implemented on the GPU. We are also working on integrating the GPU recognition pipeline with real-time video.