This document summarizes a research paper that presents a real-time face detection algorithm using local binary patterns (LBP) as features. The algorithm is implemented on a GPU using OpenCL to achieve faster processing than CPU implementations. LBP features are extracted in parallel on the GPU from the blocks into which each image is divided, and histograms of the LBP codes from each block are concatenated to form an overall feature vector for classification. Experimental results show the GPU implementation achieves a 5x speedup over the CPU for 640x480 images.
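A minimal serial sketch of the block-wise LBP pipeline described above (function names, the 8-neighbour coding, and the block size are illustrative assumptions; the paper's version runs these loops in parallel on the GPU):

```python
def lbp_code(img, x, y):
    """8-neighbour LBP code for pixel (x, y); img is a 2-D list of grey values."""
    center = img[y][x]
    neighbours = [img[y-1][x-1], img[y-1][x], img[y-1][x+1], img[y][x+1],
                  img[y+1][x+1], img[y+1][x], img[y+1][x-1], img[y][x-1]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= center:          # neighbour at least as bright as the center
            code |= 1 << bit     # sets one bit of the 8-bit pattern
    return code

def block_histograms(img, block):
    """Split the image into block x block regions, histogram the LBP codes of
    each region, and concatenate the histograms into one feature vector."""
    h, w = len(img), len(img[0])
    feature = []
    for by in range(1, h - 1, block):
        for bx in range(1, w - 1, block):
            hist = [0] * 256
            for y in range(by, min(by + block, h - 1)):
                for x in range(bx, min(bx + block, w - 1)):
                    hist[lbp_code(img, x, y)] += 1
            feature.extend(hist)
    return feature
```

On the GPU, each block's histogram is computed by an independent work-group, which is what makes the concatenation step embarrassingly parallel.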
Accelerating Real Time Applications on Heterogeneous Platforms (IJMER)
In this paper we describe novel implementations of depth estimation from stereo images using feature extraction algorithms that run on the graphics processing unit (GPU), making them suitable for real-time applications such as analyzing video in real-time vision systems. Modern graphics cards contain a large number of parallel processors and high-bandwidth memory for accelerating data computation. In this paper we give a general idea of how to accelerate real-time applications using heterogeneous platforms. We propose to use the added resources to apply more computationally involved optimization methods. This approach will indirectly accelerate a database by producing better plan quality.
A Robust Object Recognition using LBP, LTP and RLBP (Editor IJMTER)
The document proposes two new feature sets, Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP), for object recognition. It summarizes the drawbacks of existing features such as Local Binary Pattern (LBP), Local Ternary Pattern (LTP), and Robust LBP (RLBP), which fail to differentiate weak from strong contrast patterns or are confused by brightness reversal. The proposed DRLBP and DRLTP combine edge and texture information into a single representation to better analyze objects and address the issues with the prior features. They are designed to improve object recognition performance.
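For contrast with LBP's binary thresholding, a one-line sketch of the ternary coding LTP uses: each neighbour is coded +1, 0, or -1 against a tolerance band around the center pixel, which is what gives LTP its noise resistance (the threshold name `t` and the list interface are assumptions, not the paper's notation):

```python
def ltp_codes(neighbours, center, t=5):
    """Local Ternary Pattern: +1 if clearly brighter than the center,
    -1 if clearly darker, 0 inside the tolerance band [center-t, center+t]."""
    return [1 if n >= center + t else -1 if n <= center - t else 0
            for n in neighbours]
```

In practice the ternary pattern is usually split into an "upper" and a "lower" binary pattern before histogramming.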
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY (csandit)
This paper presents a parallel approach to address the time-complexity problem associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique for hiding a secret message in an image. With the parallel implementation, a large message can be hidden in a large image, since it does not take much processing time. It is implemented on GPU systems; parallel programming is done using OpenCL on CUDA cores from NVIDIA. The speed-up obtained is very good, with reasonably good output signal quality, when a large amount of data is processed.
IJCER (www.ijceronline.com) International Journal of Computational Engineering Research (ijceronline)
The document discusses video compression using the Set Partitioning in Hierarchical Trees (SPIHT) algorithm and neural networks. It presents the principles of SPIHT coding and the backpropagation algorithm for neural networks. Various neural network training algorithms are tested for compressing video frames; among them, gradient descent with momentum and adaptive learning achieved the best compression ratio, 1.1737089:1, while maintaining image clarity.
Performance Analysis of Various Activation Functions in Generalized MLP Architectures (Waqas Tariq)
This document compares the performance of various activation functions in multilayer perceptron (MLP) neural networks. It analyzes MLP architectures using different activation functions, including bi-polar sigmoid, uni-polar sigmoid, hyperbolic tangent, conic section, and radial basis functions. Based on experiments, hyperbolic tangent performed the best in terms of accuracy, requiring fewer iterations than other functions to solve nonlinear problems. While conic section had the lowest training error, hyperbolic tangent produced the most accurate results during testing. In general, the hyperbolic tangent function achieved high accuracy and is a good choice for most MLP applications.
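Illustrative definitions of three of the compared activation functions (generic forms; the paper's exact parameterizations may differ). Note that the bi-polar sigmoid is just a rescaled hyperbolic tangent, which is why the two behave similarly in training:

```python
import math

def uni_polar_sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bi_polar_sigmoid(x):
    """Output in (-1, 1); algebraically equal to tanh(x / 2)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    """Output in (-1, 1), with a steeper gradient near 0 than the
    bi-polar sigmoid, which tends to speed up convergence."""
    return math.tanh(x)
```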
COUPLED FPGA/ASIC IMPLEMENTATION OF ELLIPTIC CURVE CRYPTO-PROCESSOR (IJNSA Journal)
In this paper, we propose an elliptic curve key generation processor over GF(2^163) based on the Montgomery scalar multiplication algorithm. The new architecture uses a polynomial basis. The finite field operations use a cellular automata multiplier and Fermat's algorithm for inversion. For real-time implementation, the architecture has been tested using ISE 9.1 software on a Xilinx Virtex II Pro FPGA as well as in 45 nm ASIC CMOS technology. The proposed implementation achieves a time of 2.07 ms and uses 38 percent of the slices of the Xilinx Virtex II Pro FPGA. Such features reveal the high efficiency of this design.
This document describes an FPGA-based human detection system with an embedded platform. Key points:
- The system uses HOG features, SVM classification, and AdaBoost algorithms for human detection in images and video.
- FPGA circuits are designed to accelerate the computationally intensive HOG feature extraction, including modules for gradient calculation, histogram accumulation, and more.
- The full system is implemented on an embedded platform to achieve a real-time human detection system running at 15 frames per second.
- Experimental results show the FPGA-based system has similar detection accuracy to a PC-based software implementation but significantly faster speed, suitable for real-time embedded applications.
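The gradient-calculation and histogram-accumulation step that those FPGA modules implement can be sketched in software as follows (cell size, bin count, and unsigned orientation binning are assumptions, not the paper's parameters):

```python
import math

def hog_cell_histogram(img, x0, y0, cell=8, bins=9):
    """Unsigned-orientation gradient histogram for one HOG cell.
    img is a 2-D list of grey values; (x0, y0) is the cell's top-left
    corner, chosen so that all central differences stay in bounds."""
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            gx = img[y][x + 1] - img[y][x - 1]   # central difference in x
            gy = img[y + 1][x] - img[y - 1][x]   # central difference in y
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned, [0, 180)
            hist[int(ang / (180.0 / bins)) % bins] += mag   # vote by magnitude
    return hist
```

The FPGA pipeline computes gx/gy and the bin index with fixed-point arithmetic, but the dataflow is the same.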
This document discusses parallelizing graph algorithms on GPUs for optimization. It summarizes previous work on parallel Breadth-First Search (BFS), All Pair Shortest Path (APSP), and Traveling Salesman Problem (TSP) algorithms. It then proposes implementing BFS, APSP, and TSP on GPUs using optimization techniques like reducing data transfers between CPU and GPU and modifying the algorithms to maximize GPU computing power and memory usage. The paper claims this will improve performance and speedup over CPU implementations. It focuses on optimizing graph algorithms for parallel GPU processing to accelerate applications involving large graph analysis and optimization problems.
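The level-synchronous BFS formulation usually targeted by such GPU work, sketched serially (the inner loop over the frontier is what maps onto one GPU thread per vertex; names are illustrative):

```python
def bfs_levels(adj, source):
    """adj: dict mapping vertex -> list of neighbours.
    Returns a dict mapping each reachable vertex to its BFS depth."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:           # on a GPU: one thread per frontier vertex
            for v in adj[u]:
                if v not in level:   # first visit fixes the depth
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

Keeping the frontier resident in GPU memory between iterations is exactly the kind of data-transfer reduction the paper proposes.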
The objective of this paper is to present a hybrid approach for edge detection. Under this technique, edge detection is performed in two phases: in the first phase, the Canny algorithm is applied for image smoothing, and in the second phase a neural network detects the actual edges. A neural network is a good tool for edge detection, as it is a non-linear network with built-in thresholding capability. A neural network can be trained with the backpropagation technique using few training patterns, but the most important and difficult part is identifying a correct and proper training set.
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm (Sunny Kr)
Cardinality estimation has a wide range of applications and
is of particular importance in database systems. Various
algorithms have been proposed in the past, and the HyperLogLog algorithm is one of them
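A minimal sketch of the core HyperLogLog estimator (the cited paper's contribution adds bias correction and a sparse representation on top of this; the register count, hash choice, and constant are assumptions for illustration):

```python
import hashlib

M = 1 << 11            # 2^11 = 2048 registers
registers = [0] * M

def add(item):
    """Fold one string item into the sketch."""
    h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
    idx = h >> 53                      # top 11 bits select a register
    rest = h & ((1 << 53) - 1)         # remaining 53 bits
    rank = 53 - rest.bit_length() + 1  # position of the leftmost 1-bit
    registers[idx] = max(registers[idx], rank)

def estimate():
    """Raw HyperLogLog estimate (no small/large-range corrections)."""
    alpha = 0.7213 / (1 + 1.079 / M)   # standard bias-correction constant
    z = 1.0 / sum(2.0 ** -r for r in registers)
    return alpha * M * M * z
```

With 2048 registers the relative standard error is about 1.04/sqrt(2048), roughly 2.3%, while the sketch itself stays a few kilobytes regardless of cardinality.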
FPGA Based Efficient Multiplier for Image Processing Applications Using Recursive Error Free Mitchell Log Multiplier (VLSICS Design)
Digital image processing applications such as medical imaging, satellite imaging, and biometric trait images rely on multipliers to improve image quality. However, existing multiplication techniques introduce errors in the output and consume more time; hence error-free high-speed multipliers have to be designed. In this paper we propose an FPGA-based Recursive Error-Free Mitchell Log Multiplier (REFMLM) for image filters. The 2x2 error-free Mitchell log multiplier, designed with zero error by introducing an error-correction term, is used in higher-order Karatsuba-Ofman Multiplier (KOM) architectures. The higher-order KOM multipliers are decomposed radix-2 into lower-order multipliers, down to the basic 2x2 block implemented by the error-free Mitchell log multiplier. The 8x8 REFMLM is tested with a Gaussian filter for removing noise from a fingerprint image. The multiplier is synthesized for a Spartan 3 FPGA family device, XC3S1500-5fg320. It is observed that performance parameters such as area utilization, speed, error, and PSNR are better for the proposed architecture than for existing architectures.
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modified Hausdorff Distance (IRJET Journal)
This document discusses the Iterative Closest Point (ICP) algorithm, which is commonly used for 3D shape registration. It first provides background on ICP and describes some variants like Comprehensive ICP and Trimmed ICP. It then focuses on the Comprehensive ICP algorithm, explaining how it uses a lookup matrix to ensure unique point correspondences between shapes. Finally, it introduces the Modified Hausdorff Distance metric for evaluating similarity between registered shapes, which is more robust than other metrics. The document aims to analyze ICP variant performance using this distance metric.
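The Modified Hausdorff Distance mentioned above replaces the classical metric's worst-case point distance with per-set averages, which makes it less sensitive to outliers. A small sketch under the assumption of 2-D point sets given as lists of tuples (names are illustrative):

```python
import math

def _avg_nearest(a, b):
    """Average, over points of a, of the distance to the nearest point of b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def modified_hausdorff(a, b):
    """Modified Hausdorff Distance: the larger of the two directed
    average-nearest-neighbour distances."""
    return max(_avg_nearest(a, b), _avg_nearest(b, a))
```

The classical Hausdorff distance uses max instead of the average in `_avg_nearest`, so a single stray point can dominate it; averaging removes that failure mode.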
CycleGAN is a framework that uses generative adversarial networks to perform image-to-image translation without paired training examples. It consists of two generators that map images from one domain to another and back, along with two discriminators that classify images as real or fake. The generators are trained to translate images such that they are classified as real by the discriminators, while also remaining consistent when translated back to the original domain. The authors demonstrate it for the task of colorizing grayscale images without paired color-grayscale image samples.
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB (Journal For Research)
Image compression is used in many applications, for example satellite imaging, medical imaging, and video, where the size of the image requires more space to store; in such applications image compression can be used effectively. There are two types of image compression techniques, lossy and lossless. Both are used for compressing images, but neither is fast: they take considerable time for compression and decompression. For fast and efficient image compression, a parallel computing technique is used in MATLAB. In this paper we discuss a regular image compression technique, three alternatives for parallel computing using MATLAB, and a comparison of image compression with and without parallel computing.
Motion planning and controlling algorithm for grasping and manipulating moving objects (ijscai)
Much robotic grasping research has focused on stationary objects. For dynamically moving objects, researchers have used images captured in real time to locate the objects. However, this approach to controlling the grasping process is quite costly, requiring a lot of resources and image processing… In this paper, we detail the requirements to manipulate a humanoid robot arm with 7 degrees of freedom to grasp and handle any moving object in a 3-D environment, in the presence or absence of obstacles and without using cameras. We use the OpenRAVE simulation environment, as well as a robot arm instrumented with the Barrett hand. We also describe a randomized planning algorithm: an extension of RRT-JT that combines exploration, using a Rapidly-exploring Random Tree, with exploitation, using Jacobian-based gradient descent, to instruct a 7-DoF WAM robotic arm to grasp a moving target while avoiding obstacles it encounters. We present a simulation of a scenario that starts with tracking a moving mug, then grasping it, and finally placing the mug in a determined position, assuring a maximum rate of success in a reasonable time.
This document proposes a new image encryption scheme based on chaotic encryption. It provides a fast encryption algorithm using a pseudorandom key stream generator based on coupled chaotic maps. Only the most important image components identified using discrete wavelet transform are encrypted. Statistical analysis shows the encrypted images have uniform histograms and negligible pixel correlations, resisting cryptanalysis attacks. The partial encryption also reduces computation time for applications with bandwidth and power constraints like mobile devices.
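The chaotic-map keystream idea can be illustrated with a single logistic map (the paper uses *coupled* maps and encrypts only the wavelet components it selects; this toy single-map XOR cipher is purely an assumption for illustration):

```python
def logistic_keystream(x0, n, r=3.99):
    """Generate n key bytes by iterating the logistic map x -> r*x*(1-x),
    which is chaotic for r near 4, and quantizing each state to a byte."""
    x = x0
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)            # chaotic iteration
        out.append(int(x * 256) & 0xFF)  # quantize state to a key byte
    return out

def xor_encrypt(data, x0):
    """XOR data with the keystream; the same call decrypts, since the
    keystream is fully determined by the initial condition x0 (the key)."""
    ks = logistic_keystream(x0, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))
```

Sensitivity to the initial condition x0 is what makes the stream hard to reproduce without the key, though a real scheme needs far more care than this sketch.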
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures (MLAI2)
MetaPerturb is a meta-learned perturbation function that can enhance generalization of neural networks on different tasks and architectures. It proposes a novel meta-learning framework involving jointly training a main model and perturbation module on multiple source tasks to learn a transferable perturbation function. This meta-learned perturbation function can then be transferred to improve performance of a target model on an unseen target task or architecture, outperforming baselines on various datasets and architectures.
Manifold learning with application to object recognition (zukun)
This document discusses manifold learning techniques for dimensionality reduction that can uncover the intrinsic structure of high-dimensional data. It introduces Isomap and Locally Linear Embedding (LLE) as two popular manifold learning algorithms. Isomap uses graph-based distances to preserve global structure, while LLE aims to preserve local linear relationships between neighbors. Both techniques find low-dimensional embeddings that best represent the high-dimensional data. Manifold learning provides data compression and enables techniques like object recognition by discovering the underlying manifold structure.
Molecular dynamics (MD) is a very useful tool to understand various phenomena in atomistic detail. In MD, we can overcome the size- and time-scale problems by efficient parallelization. In this lecture, I’ll explain various parallelization methods of MD with some examples of GENESIS MD software optimization on Fugaku.
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
This document summarizes a research paper about developing a new set of low-complexity features for detecting steganography in JPEG images. The proposed features, called DCTR features, are computed by taking the discrete cosine transform (DCT) of non-overlapping 8x8 blocks of the image, resulting in 64 feature maps. Histograms are formed from the quantized noise residuals in these feature maps. This approach has lower computational complexity than previous rich models used for steganalysis and provides competitive detection accuracy across different steganographic algorithms while using fewer features. The paper introduces the concept of an undecimated DCT and explains how it relates to previous work in JPEG steganalysis.
A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN METHOD (ijdpsjournal)
In this paper, a new progressive mesh algorithm is introduced in order to perform fast physical simulations using a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. This algorithm is able to mesh the simulation domain automatically according to the propagation of fluids, and the method can also be used to perform several other types of physical simulations. In this paper, we associate the algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC-LBM), because it is able to perform various types of simulations on complex geometries. The use of this algorithm combined with the massive parallelism of GPUs [5] yields very good performance compared with the static-mesh method used in the literature. Several simulations are shown in order to evaluate the algorithm.
AN EFFICIENT FPGA IMPLEMENTATION OF MRI IMAGE FILTERING AND TUMOUR CHARACTERISATION (VLSICS Design)
This paper presents an efficient architecture for various image filtering algorithms and tumor characterization using Xilinx System Generator (XSG). This architecture offers an alternative through a graphical user interface that combines MATLAB, Simulink, and XSG, and explores important aspects concerned with hardware implementation. The performance of this architecture, implemented on a SPARTAN-3E Starter Kit (XC3S500E-FG320), exceeds that of architectures with similar or greater resources. The proposed architecture reduces resource usage on the target device by 50%.
The document is a report on implementing and testing a radial basis function neural network for clustering iris flower data. It introduces RBF networks and the methodology used, which involved locating RBF nodes as cluster centers, calculating Gaussian functions, training the RBF layer unsupervised and a perceptron layer supervised. Results show the network accurately clustered most iris flowers into the three expected categories when trained on the iris data set.
REAL TIME FACE DETECTION ON GPU USING OPENCL (NARMADA NAIK)
This paper presents a novel approach for real-time face detection using heterogeneous computing. The algorithm uses the local binary pattern (LBP) as the feature vector for face detection, and OpenCL is used to accelerate the code on the GPU [1]. Illuminance invariance is achieved using gamma correction and a Difference of Gaussians (DoG) filter to make the algorithm robust against varying lighting conditions. This implementation is compared with a previous parallel implementation and is found to perform faster.
Performance analysis of Sobel edge filter on heterogeneous system using OpenCL (eSAT Publishing House)
This document discusses performance analysis of the Sobel edge detection filter on heterogeneous systems using OpenCL. It begins with an introduction to OpenCL and describes its architecture, including the platform model, execution model, memory model, and programming model. It then provides an overview of GPUs and CPUs, comparing their architectures and number of cores. It also gives mathematical representations of image convolution and describes how the Sobel filter works. The document analyzes the performance of implementing Sobel edge detection using OpenCL on CPUs and GPUs and finds that GPUs provide much higher performance compared to CPUs.
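The per-pixel computation the OpenCL kernels parallelize, written serially for reference (layout and names are assumptions, not the paper's code; on the GPU each output pixel is one work-item):

```python
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel

def sobel(img):
    """img: 2-D list of grey values. Returns the gradient magnitude for
    interior pixels (the one-pixel border is left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

Because every output pixel depends only on a 3x3 input neighbourhood, the two nested loops map directly onto a 2-D OpenCL NDRange, which is where the GPU's advantage over the CPU comes from.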
The complexity of medical image reconstruction requires tens to hundreds of billions of computations per second. Until a few years ago, special-purpose processors designed especially for such applications were used. Such processors require significant design effort, are thus difficult to change as new reconstruction algorithms evolve, and have limited parallelism. Hence the demand for flexibility in medical applications motivated the use of stream processors with massively parallel architectures, which offer data-parallel parallelism.
International Journal of Computational Engineering Research (IJCER) (ijceronline)
This document describes a system for implementing an artificial neuron using an FPGA. The system first converts analog signals from electrochemical sensors to digital signals using a 12-bit analog-to-digital converter (ADC). It then implements the mathematical operations of a neuron in digital logic on the FPGA, including multiplication, accumulation, and an activation function. Simulation and chipscope results are presented which verify the design and operation of the artificial neuron on the FPGA board. The system provides a modular design that could be expanded to create a complete artificial neural network for processing electrochemical sensor data.
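The neuron datapath described above (multiply, accumulate, activation) can be sketched as follows; the sigmoid is chosen as an illustrative activation, and the interface is an assumption, since the FPGA design implements the same steps in fixed-point digital logic:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: multiply-accumulate over the inputs,
    then a sigmoid activation, mirroring the FPGA datapath's
    multiplier, accumulator, and activation-function stages."""
    acc = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-acc))
```

In the hardware version, the 12-bit ADC samples feed the multiplier stage directly, and the activation function is typically a lookup table rather than a computed exponential.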
The objective of this paper is to present the hybrid approach for edge detection. Under this technique, edge
detection is performed in two phase. In first phase, Canny Algorithm is applied for image smoothing and in
second phase neural network is to detecting actual edges. Neural network is a wonderful tool for edge
detection. As it is a non-linear network with built-in thresholding capability. Neural Network can be trained
with back propagation technique using few training patterns but the most important and difficult part is to
identify the correct and proper training set.
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...Sunny Kr
Cardinality estimation has a wide range of applications and
is of particular importance in database systems. Various
algorithms have been proposed in the past, and the HyperLogLog algorithm is one of them
Fpga based efficient multiplier for image processing applications using recur...VLSICS Design
The Digital Image processing applications like medical imaging, satellite imaging, Biometric trait images
etc., rely on multipliers to improve the quality of image. However, existing multiplication techniques
introduce errors in the output with consumption of more time, hence error free high speed multipliers has
to be designed. In this paper we propose FPGA based Recursive Error Free Mitchell Log Multiplier
(REFMLM) for image Filters. The 2x2 error free Mitchell log multiplier is designed with zero error by
introducing error correction term is used in higher order Karastuba-Ofman Multiplier (KOM)
Architectures. The higher order KOM multipliers is decomposed into number of lower order multipliers
using radix 2 till basic multiplier block of order 2x2 which is designed by error free Mitchell log multiplier.
The 8x8 REFMLM is tested for Gaussian filter to remove noise in fingerprint image. The Multiplier is
synthesized using Spartan 3 FPGA family device XC3S1500-5fg320. It is observed that the performance
parameters such as area utilization, speed, error and PSNR are better in the case of proposed architecture
compared to existing architectures.
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...IRJET Journal
This document discusses the Iterative Closest Point (ICP) algorithm, which is commonly used for 3D shape registration. It first provides background on ICP and describes some variants like Comprehensive ICP and Trimmed ICP. It then focuses on the Comprehensive ICP algorithm, explaining how it uses a lookup matrix to ensure unique point correspondences between shapes. Finally, it introduces the Modified Hausdorff Distance metric for evaluating similarity between registered shapes, which is more robust than other metrics. The document aims to analyze ICP variant performance using this distance metric.
CycleGAN is a framework that uses generative adversarial networks to perform image-to-image translation without paired training examples. It consists of two generators that map images from one domain to another and back, along with two discriminators that classify images as real or fake. The generators are trained to translate images such that they are classified as real by the discriminators, while also remaining consistent when translated back to the original domain. The authors demonstrate it for the task of colorizing grayscale images without paired color-grayscale image samples.
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABJournal For Research
Image compression technique is used in many applications for example, satellite imaging, medical imaging, video where the size of the iamge requires more space to store, in such application image compression effectively can be used. There are two types in image compression techniques Lossy and Lossless comression. Both these techniques are used for compression of images, but these techniques are not fast. The image compression techniques both lossy and lossless image compression techniques are not fast, they take more time for compression and decompression. For fast and efficient image compression a parallel computing technique is used in matlab. Matlab is used in this project for parallel computing of images. In this paper we will discuss Regular image compression technique, three alternatives of parallel computing using matlab, comparison of image compression with and without parallel computing.
Motion planning and controlling algorithm for grasping and manipulating movin...ijscai
Much robotic grasping research has focused on stationary objects. For dynamic moving objects, researchers have used real-time captured images to locate objects dynamically. However, this approach to controlling the grasping process is quite costly, requiring substantial resources and image processing. It is therefore worthwhile to seek a simpler method. In this paper, we detail the requirements to manipulate a humanoid robot arm with 7 degrees of freedom to grasp and handle moving objects in a 3-D environment, with or without obstacles, and without using cameras. We use the OpenRAVE simulation environment and a robot arm instrumented with the Barrett hand. We also describe a randomized planning algorithm, an extension of RRT-JT, that combines exploration, using a Rapidly-exploring Random Tree, with exploitation, using Jacobian-based gradient descent, to instruct a 7-DoF WAM robotic arm to grasp a moving target while avoiding obstacles encountered along the way. We present a simulation of a scenario that starts with tracking a moving mug, then grasping it, and finally placing the mug in a determined position, ensuring a maximum rate of success in a reasonable time.
This document proposes a new image encryption scheme based on chaotic encryption. It provides a fast encryption algorithm using a pseudorandom key stream generator based on coupled chaotic maps. Only the most important image components identified using discrete wavelet transform are encrypted. Statistical analysis shows the encrypted images have uniform histograms and negligible pixel correlations, resisting cryptanalysis attacks. The partial encryption also reduces computation time for applications with bandwidth and power constraints like mobile devices.
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
MetaPerturb is a meta-learned perturbation function that can enhance generalization of neural networks on different tasks and architectures. It proposes a novel meta-learning framework involving jointly training a main model and perturbation module on multiple source tasks to learn a transferable perturbation function. This meta-learned perturbation function can then be transferred to improve performance of a target model on an unseen target task or architecture, outperforming baselines on various datasets and architectures.
Manifold learning with application to object recognitionzukun
This document discusses manifold learning techniques for dimensionality reduction that can uncover the intrinsic structure of high-dimensional data. It introduces Isomap and Locally Linear Embedding (LLE) as two popular manifold learning algorithms. Isomap uses graph-based distances to preserve global structure, while LLE aims to preserve local linear relationships between neighbors. Both techniques find low-dimensional embeddings that best represent the high-dimensional data. Manifold learning provides data compression and enables techniques like object recognition by discovering the underlying manifold structure.
Molecular dynamics (MD) is a very useful tool to understand various phenomena in atomistic detail. In MD, we can overcome the size- and time-scale problems by efficient parallelization. In this lecture, I’ll explain various parallelization methods of MD with some examples of GENESIS MD software optimization on Fugaku.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
This document summarizes a research paper about developing a new set of low-complexity features for detecting steganography in JPEG images. The proposed features, called DCTR features, are computed by taking the discrete cosine transform (DCT) of non-overlapping 8x8 blocks of the image, resulting in 64 feature maps. Histograms are formed from the quantized noise residuals in these feature maps. This approach has lower computational complexity than previous rich models used for steganalysis and provides competitive detection accuracy across different steganographic algorithms while using fewer features. The paper introduces the concept of an undecimated DCT and explains how it relates to previous work in JPEG steganalysis.
A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN ME...ijdpsjournal
In this paper, a new progressive mesh algorithm is introduced to perform fast physical simulations using a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. The algorithm automatically meshes the simulation domain according to the propagation of fluids, and can also be useful for several other types of physical simulations. In this paper, we associate the algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC-LBM), because it is able to perform various types of simulations on complex geometries. The use of this algorithm, combined with the massive parallelism of GPUs[5], yields very good performance compared with the static mesh method used in the literature. Several simulations are shown in order to evaluate the algorithm.
AN EFFICIENT FPGA IMPLEMENTATION OF MRI IMAGE FILTERING AND TUMOUR CHARACTERI...VLSICS Design
This paper presents an efficient architecture for various image filtering algorithms and tumour characterization using Xilinx System Generator (XSG). The architecture offers an alternative through a graphical user interface that combines MATLAB, Simulink, and XSG, and explores important aspects of hardware implementation. The performance of this architecture, implemented on a SPARTAN-3E Starter Kit (XC3S500E-FG320), exceeds that of architectures with similar or greater resources. The proposed architecture reduces resource utilization on the target device by 50%.
The document is a report on implementing and testing a radial basis function neural network for clustering iris flower data. It introduces RBF networks and the methodology used, which involved locating RBF nodes as cluster centers, calculating Gaussian functions, training the RBF layer unsupervised and a perceptron layer supervised. Results show the network accurately clustered most iris flowers into the three expected categories when trained on the iris data set.
REAL TIME FACE DETECTION ON GPU USING OPENCLNARMADA NAIK
This paper presents a novel approach for real-time face detection using heterogeneous computing. The algorithm uses the local binary pattern (LBP) as the feature vector for face detection. OpenCL is used to accelerate the code on the GPU[1]. Illuminance invariance is achieved using gamma correction and a Difference of Gaussian (DOG) filter to make the algorithm robust against varying lighting conditions. This implementation is compared with a previous parallel implementation and is found to perform faster.
Performance analysis of sobel edge filter on heterogeneous system using opencleSAT Publishing House
This document discusses performance analysis of the Sobel edge detection filter on heterogeneous systems using OpenCL. It begins with an introduction to OpenCL and describes its architecture, including the platform model, execution model, memory model, and programming model. It then provides an overview of GPUs and CPUs, comparing their architectures and number of cores. It also gives mathematical representations of image convolution and describes how the Sobel filter works. The document analyzes the performance of implementing Sobel edge detection using OpenCL on CPUs and GPUs and finds that GPUs provide much higher performance compared to CPUs.
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY cscpconf
This paper presents a parallel approach to the time-complexity problem associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique for hiding a secret message in an image. With the parallel implementation, a large message can be hidden in a large image without excessive processing time. It is implemented on GPU systems; parallel programming is done using OpenCL on CUDA cores from NVIDIA. The speed-up obtained is very good, with reasonably good output signal quality, when a large amount of data is processed.
The complexity of medical image reconstruction requires tens to hundreds of billions of computations per second. Until a few years ago, special-purpose processors designed specifically for such applications were used. Such processors require significant design effort, are difficult to change as new reconstruction algorithms evolve, and have limited parallelism. The demand for flexibility in medical applications has therefore motivated the use of stream processors, whose massively parallel architectures offer data parallelism.
International Journal of Computational Engineering Research (IJCER) ijceronline
This document describes a system for implementing an artificial neuron using an FPGA. The system first converts analog signals from electrochemical sensors to digital signals using a 12-bit analog-to-digital converter (ADC). It then implements the mathematical operations of a neuron in digital logic on the FPGA, including multiplication, accumulation, and an activation function. Simulation and chipscope results are presented which verify the design and operation of the artificial neuron on the FPGA board. The system provides a modular design that could be expanded to create a complete artificial neural network for processing electrochemical sensor data.
The document describes the design and development of a floating-point co-processor to accelerate graphics functions such as geometric transformations in OpenGL. It details the optimization of the architecture for executing nested loop programs common in graphics, including matrix-vector multiplication. The controller was optimized for executing typical geometric transformation algorithms through techniques like strength reduction and exploiting induction variables. The complete processor was tested hierarchically and demonstrated accelerating the model-view transformation in OpenGL.
An OpenCL Method of Parallel Sorting Algorithms for GPU ArchitectureWaqas Tariq
In this paper, we present a comparative performance analysis of two parallel sorting algorithms: bitonic sort and parallel radix sort. To study the interaction between the algorithms and the architecture, we implemented both algorithms in OpenCL and compared their performance with the quicksort algorithm, one of the fastest sequential sorting algorithms. In our simulation, we used an Intel Core 2 Duo CPU at 2.67 GHz and an NVIDIA Quadro FX 3800 as the graphics processing unit.
This document summarizes a research paper that implemented Levenberg-Marquardt artificial neural network training using graphics processing unit (GPU) hardware acceleration. The key points are:
1) This appears to be the first description of implementing artificial neural networks using the Levenberg-Marquardt training method on a GPU.
2) The paper describes their approach for implementing the Levenberg-Marquardt algorithm on a GPU, which involves solving the matrix inversion operation that is typically computationally expensive.
3) Results show that training networks using the GPU implementation can be up to 10 times faster than using a CPU-only implementation on the same hardware.
On Implementation of Neuron Network(Back-propagation)Yu Liu
This document outlines Yu Liu's work implementing and comparing different parallel versions of a neural network using backpropagation. It discusses motivations for parallel programming practice and library study. It provides an introduction to neural networks and backpropagation algorithms. Three implementations are compared: sequential C++ STL, Skelton library, and Intel TBB. Benchmark results show improved speedups from parallel versions. Remaining challenges are also noted, like addressing local minima problems and testing on larger data.
IRJET- Latin Square Computation of Order-3 using Open CLIRJET Journal
This document discusses using OpenCL parallel programming to compute Latin squares of order 3 more efficiently than sequential algorithms. It proposes dividing the input matrix into sub-matrices that are processed concurrently by multiple processing elements in the GPU. This parallel approach reduces the computation time compared to performing the operations sequentially on the CPU. First, the input matrix is divided based on task or data parallelism. Then the sub-matrices are computed simultaneously by different processing elements. The results are combined and stored in GPU memory before being transferred to CPU memory and output. Implementing the Latin square computation with OpenCL exploits parallelism to improve efficiency over the traditional sequential approach.
IRJET- Digital Image Forgery Detection using Local Binary Patterns (LBP) and ...IRJET Journal
This document proposes a method to detect digital image forgeries using local binary patterns (LBP) and histogram of oriented gradients (HOG). It extracts LBP features from the input image, then applies HOG to the LBP features. These combined features are classified using a support vector machine (SVM) as authentic or tampered. Testing on CASIA datasets achieved detection rates of 92.3% for CASIA-1 and 96.1% for CASIA-2, outperforming other existing methods. The method is effective at forgery detection while having reduced time complexity.
International Refereed Journal of Engineering and Science (IRJES)irjes
International Refereed Journal of Engineering and Science (IRJES) is a leading international journal for the publication of new ideas, state-of-the-art research results, and fundamental advances in all aspects of engineering and science. IRJES is an open-access, peer-reviewed international journal whose primary objective is to provide the academic community and industry a venue for the submission of original research and applications.
This document discusses single-GPU and multi-GPU implementations of the MAD IQA algorithm to improve its computational performance. A single-GPU implementation achieved a 24x speedup over the CPU version, bringing the runtime down to 40ms. A multi-GPU implementation using 3 GPUs achieved an additional speedup of 33x over the CPU version, bringing the runtime to 28.9ms, but required many more data transfers between GPUs and system memory. While parallelizing tasks across GPUs improved performance, latency from data transfers between devices limited gains.
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...CSCJournals
The document compares the Levenberg-Marquardt and Scaled Conjugate Gradient algorithms for training a multilayer perceptron neural network for image compression. It finds that while both algorithms performed comparably in terms of accuracy and speed, the Levenberg-Marquardt algorithm achieved slightly better accuracy as measured by average training accuracy and mean squared error, while the Scaled Conjugate Gradient algorithm was faster as measured by average training iterations. The document compresses a standard test image called Lena using both algorithms and analyzes the results.
Real-Time Implementation and Performance Optimization of Local Derivative Pat...IJECEIAES
Pattern-based texture descriptors are widely used in content-based image retrieval (CBIR) for efficient retrieval of matching images. The Local Derivative Pattern (LDP), a higher-order local pattern operator originally proposed for face recognition, encodes the distinctive spatial relationships contained in a local region of an image as the feature vector. LDP efficiently extracts finer details and provides efficient retrieval; however, it was proposed for images of limited resolution. Over time, developments in digital image sensors have paved the way for capturing images at very high resolution. The LDP algorithm, though very efficient in content-based image retrieval, does not scale well when extracting features from such high-resolution images, as it becomes computationally very expensive. This paper proposes how to efficiently extract parallelism from the LDP algorithm, along with strategies for implementing it optimally by exploiting inherent general-purpose graphics processing unit (GPGPU) characteristics. By optimally configuring the GPGPU kernels, image retrieval was performed at a much faster rate. The LDP algorithm was ported to a Compute Unified Device Architecture (CUDA) supported GPGPU, and a maximum speed-up of around 240x was achieved compared with its sequential counterpart.
This project required me to find the optimum architecture configuration that would run the 'eeg' benchmark application using the SimpleScalar simulator.
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSIRJET Journal
The document discusses face counting using OpenCV and Python by analyzing unusual events in crowds. It proposes using the Haar cascade algorithm for face detection and counting. Feature extraction is performed using gray-level co-occurrence matrix (GLCM) to extract texture and edge features. Discriminant analysis is then used to differentiate between samples accurately. The system aims to correctly detect and count faces in images using Python tools like OpenCV for digital image processing tasks and feature extraction algorithms like GLCM and discrete wavelet transform (DWT). It is intended to have good recognition accuracy compared to previous methods.
This document presents a benchmark for deep learning algorithms developed by identifying basic operations that account for most CPU usage. Three algorithms were implemented - sparse autoencoder, convolutional neural network, and FISTA optimization. The operations were abstracted into an API for easier optimization. Results showed the Theano GPU implementation was 3-15 times faster than Numpy. Challenges included choosing array dimensions and memory allocation to optimize performance. Convolution was identified as the most expensive operation for CNNs in terms of CPU usage.
OpenGL Based Testing Tool Architecture for Exascale ComputingCSCJournals
1) The document proposes an OpenGL based testing tool architecture for exascale computing to improve performance and accuracy of OpenGL programs.
2) It identifies common errors that occur when programming shaders in OpenGL Shading Language (GLSL) such as errors in file reading, compilation, linking, and rendering.
3) The proposed testing architecture divides the GLSL programming process into four stages - file reading, compilation, pre-linking/linking, and rendering - and validates each stage to detect errors and enforce error-free code.
Computer Science & Information Technology (CS & IT)
Section 4 discusses the implementation details on the GPU. Experimental results are shown in Section 5. In Section 6, we give a brief conclusion of the paper.
2. HETEROGENEOUS COMPUTING WITH OPENCL
OpenCL[3] is an industry standard for writing parallel programs targeting heterogeneous platforms. In this section, a brief overview of heterogeneous computing with the OpenCL programming model is given. OpenCL is described by four models[4]: the platform model, the execution model, the memory model, and the programming model.
2.1. Platform model
The OpenCL platform model defines a high-level representation of any heterogeneous platform
used with OpenCL. This model is shown in Fig. 1. The host can be connected to one or more
OpenCL devices (DSP, FPGA, GPU, CPU, etc.); the device is where the kernels execute. OpenCL
devices are further divided into compute units, which are in turn divided into processing
elements (PEs), and computation occurs in these PEs. Each PE typically executes in SIMD fashion.
Fig. 1. OpenCL Platform Model.
2.2. Execution model
OpenCL execution consists of two parts: a host program and a collection of kernels. OpenCL
abstracts away the exact steps for processing a kernel on the various platforms (CPU, GPU, etc.).
Kernels execute on OpenCL devices; they are simple functions that transform input memory
objects into output memory objects. OpenCL defines two types of kernels: OpenCL kernels and
native kernels.
Execution of a kernel on an OpenCL device proceeds as follows:
1. The kernel is defined in the host program.
2. The host issues a command for execution of the kernel on an OpenCL device.
3. As a result, the OpenCL runtime creates an index space.
4. An instance of the kernel, called a work item, is identified by its coordinates in the index space (NDRange), as shown in Fig. 2.
Fig. 2. Block diagram of NDRange.
2.3. Memory model
OpenCL defines two types of memory objects: buffer objects and image objects. A buffer object is
a contiguous block of memory made available to the kernel, whereas image objects are restricted
to holding images; to use the image format, the OpenCL device must support it. In this paper
buffer objects are used for face detection. The OpenCL memory model defines five memory regions:
• Host memory: this memory is visible to the host only.
• Global memory: this memory region permits read/write access to all work items in all work groups.
• Local memory: this memory region is local to a work group and can be accessed by the work items within that work group.
• Constant memory: this region of global memory remains constant during execution of the kernel; work items have read-only access to these objects.
• Private memory: this memory region is private to a work item, i.e. variables defined as private in one work item are not visible to other work items.
A block diagram of the memory model is shown in Fig. 3.
Fig. 3. Block diagram for memory model
2.4. Programming model
The programming model is where the programmer parallelizes the algorithm. OpenCL is designed
for both data and task parallelism. In this paper we use data parallelism, which is discussed
in Section 4.
The basic work flow of an application in the OpenCL framework is shown in the block diagram of Fig. 4.
Fig. 4. Work flow diagram of OpenCL.
Here we start with a host program that defines the context. The context contains two OpenCL
devices, a CPU and a GPU. Next, the command queues are defined, one for the GPU and the other for
the CPU. The host program then defines the program object, which is compiled to generate the
kernel objects for both OpenCL devices. After that, the host program defines the memory objects
required by the program and maps them to the arguments of the kernel. Finally, the host program
enqueues commands to the command queues to execute the kernels, and the results are read back.
3. OVERVIEW OF THE ALGORITHM
This section discusses how the image is captured from the camera and converted to grey
scale, preprocessed using a gamma operation and a DoG operation [1], and then processed by
LBP feature extraction, histogram calculation, and classification.
3.1 Face detection using LBP:
The LBP operator labels each pixel by thresholding its 3x3 neighbourhood with the
centre value. This generic formulation of the operator puts no limitation on the size of the
neighbourhood [5]. Here each pixel is compared to its 8 neighbours, traversed in clockwise
order starting from the top-left. Wherever a neighbour's value is greater than or equal to the
centre pixel value, a 1 is written for that neighbour; otherwise a 0 is written. This gives an
8-digit binary number, as shown in Fig. 5, which is then converted to a decimal value.
Fig. 5. LBP Thresholding
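The thresholding step above can be sketched in plain C as follows (an illustrative sketch, not the paper's kernel code; the function name and neighbour traversal order are our assumptions):

```c
#include <assert.h>
#include <stdint.h>

/* Compute the LBP code of the pixel at (x, y) in a row-major grayscale
 * image. Each of the 8 neighbours contributes one bit, visited clockwise
 * from the top-left: 1 if the neighbour >= the centre value, else 0. */
static uint8_t lbp_code(const uint8_t *img, int width, int x, int y)
{
    static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    uint8_t centre = img[y * width + x];
    uint8_t code = 0;
    for (int p = 0; p < 8; p++) {
        uint8_t neighbour = img[(y + dy[p]) * width + (x + dx[p])];
        code = (uint8_t)((code << 1) | (neighbour >= centre ? 1 : 0));
    }
    return code;   /* the 8-bit pattern read as a decimal value */
}
```

A centre pixel darker than all its neighbours yields code 255; one brighter than all of them yields 0.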
After the LBP codes of a block are obtained, the histogram of each block is calculated in
parallel and the histograms are concatenated as shown in Fig. 6. This gives the feature vector
used for training the classifier. The LBP operator [6] is defined as

LBP_{P,R} = sum_{p=0}^{P-1} s(g_p - g_c) 2^p,   s(x) = 1 if x >= 0, 0 otherwise,

where g_p is the intensity of the image at the p-th sample point and P is the total number of
sample points at a radius R, denoted (P, R). The P evenly spaced sampling points of the window
are used to compute the difference between the centre g_c and its surrounding pixels [5]. The
feature vector of the image obtained after cascading the histograms is

H_{k,i} = sum_{(x,y) in block k} I{ f(x, y) = i },   i = 0, ..., 255,

where k = 1, 2, ..., K indexes the sub-histogram obtained from each block, K is the total
number of histograms, and f(x, y) is the LBP code calculated at pixel (x, y).
Fig. 6. LBP in Face detection
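The cascading of per-block histograms into one feature vector can be sketched as follows (an illustrative sketch under our own naming; the paper uses 256-bin histograms per block):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BINS 256   /* one bin per possible 8-bit LBP code */

/* Build one 256-bin histogram per block of LBP codes and concatenate
 * them: codes holds num_blocks blocks of block_size codes each, and
 * feature must have room for num_blocks * BINS entries. */
static void lbp_feature_vector(const uint8_t *codes, int num_blocks,
                               int block_size, int *feature)
{
    memset(feature, 0, (size_t)num_blocks * BINS * sizeof(int));
    for (int k = 0; k < num_blocks; k++) {
        const uint8_t *block = codes + k * block_size;
        int *hist = feature + k * BINS;   /* sub-histogram k */
        for (int i = 0; i < block_size; i++)
            hist[block[i]]++;
    }
}
```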
6. 446
Computer Science & Information Technology (CS & IT)
3.2. Classifier
There are many methods to determine the dissimilarity between LBP patterns; here the chi-square
method is used, and further work is in progress on using an SVM for training and classifying the
LBP features. The chi-square distance used to measure the dissimilarity between a sample LBP
image S and a model image M is given by

chi^2(S, M) = sum_{x=1}^{L} (S_x - M_x)^2 / (S_x + M_x),

where L is the length of the feature vector of the image and S_x and M_x are, respectively, the
values of the sample and model histograms in bin x.
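The chi-square distance can be sketched directly from its definition (an illustrative sketch; bins where both histograms are zero are skipped to avoid division by zero, a convention we assume here):

```c
#include <assert.h>
#include <math.h>

/* Chi-square distance between two feature vectors s and m of length len:
 * sum over bins of (s - m)^2 / (s + m), skipping empty bins. */
static double chi_square(const double *s, const double *m, int len)
{
    double d = 0.0;
    for (int x = 0; x < len; x++) {
        double sum = s[x] + m[x];
        if (sum > 0.0) {
            double diff = s[x] - m[x];
            d += diff * diff / sum;
        }
    }
    return d;
}
```

Identical histograms give distance 0; nearest-neighbour classification picks the model image with the smallest distance.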
4. IMPLEMENTATION
The goal of this paper is to implement LBP algorithm on a cpu, gpu based heterogeneous
platform using OpenCL, to reduce computation time. The process of LBP feature extraction and
histogram calculation from the image is computationally expensive (N2xW2, where NxN is size of
image and WxW is size of LBP block) and it is easy to figure out that the extraction in different
parts are independent as discussed in section 3. Thus it can be efficiently parallelized [7].
For the real-time implementation, the image is first captured using OpenCV and converted to a
grey-scale image, and preprocessing is applied. For parallel processing, the image is subdivided
into smaller parts; in this case it is divided into 16 x 16-pixel blocks. The task of calculating
the LBP for each block is assigned to a work item, so different work items process different
blocks in parallel. Each work item processes 256 pixels, and since the work items run in
parallel the processing time is reduced significantly. To calculate the LBP of the 16 x 16-pixel
blocks, the image is converted to a one-dimensional array. Each work item uses a global
reference to the image to build an 18 x 18 two-dimensional matrix for its block. From this
18 x 18 matrix, the LBP is calculated as discussed in Section 3 for the inner 16 x 16 pixels,
not considering the boundary pixels of the 18 x 18 matrix. The calculated LBP values are then
accumulated into a histogram with bins ranging from 0 to 255. Each work item builds the
histogram for one block, and the different histograms are cascaded to form the histogram for the
complete image, giving the LBP feature vector as shown in Fig. 7. This feature vector is then
classified using the nearest-neighbour method.
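The per-block tiling described above can be sketched as follows (an illustrative sketch of what each work item would do; the paper does not specify how borders at the image edge are handled, so clamping to the nearest pixel is our assumption):

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 16
#define TILE  (BLOCK + 2)   /* 18x18: a 16x16 block plus a 1-pixel border */

/* Copy one 16x16 block plus its 1-pixel border from the flattened image
 * into an 18x18 tile, clamping coordinates at the image edge. (bx, by)
 * is the block index, mirroring one work item's view of the image. */
static void extract_tile(const uint8_t *img, int width, int height,
                         int bx, int by, uint8_t tile[TILE][TILE])
{
    for (int ty = 0; ty < TILE; ty++) {
        for (int tx = 0; tx < TILE; tx++) {
            int x = bx * BLOCK + tx - 1;   /* -1 accounts for the border */
            int y = by * BLOCK + ty - 1;
            if (x < 0) x = 0;
            if (x >= width) x = width - 1;
            if (y < 0) y = 0;
            if (y >= height) y = height - 1;
            tile[ty][tx] = img[y * width + x];
        }
    }
}
```

LBP codes are then computed only for the inner 16 x 16 pixels of the tile, so every pixel of the block has all 8 neighbours available.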
Fig. 7. Histogram Calculation on GPU
The block diagram of the overall method is shown in Fig. 8.
Fig. 8. Block diagram of the overall method
5. RESULTS
In the proposed implementation, each block is processed in a different compute unit. Since the
calculation of a histogram depends on all the pixels within a block, it is better to do the
whole calculation within one compute unit. Additionally, the amount of computation per compute
unit should not be too small; otherwise the overhead of managing the compute unit exceeds the
actual computation. Since the whole computation is done on the GPU, and only the input image and
the final histogram are transferred between the CPU and the GPU, the overhead associated with
data transfer is minimal. As a result, the computation time is 20 ms. The performance of the
implementation is shown in Table 1.
Table 1. Performance Table

Input Resolution | Sub Histograms | CPU (i5 3rd generation) | AMD (7670M)
640x480          | 256            | 109 ms                  | 20 ms
Table 2. Comparison With Previous Work

                  | Image size | Sub Histograms | Feature Extraction
Previous work [8] | 512x512    | 256            | 36.3 ms
Our work          | 640x480    | 256            | 20 ms
Compared to previous work [8], the image is grabbed from a camera and processed in real time. As
can be seen from Table 2, the computation time of our implementation is lower. The performance
of the proposed method was tested on a system with an AMD 7670M GPU and a 3rd-generation Intel
i5 CPU. The total time for feature extraction on a 640x480 input image was 109 ms on the CPU and
20 ms on the GPU. Thus we get a roughly 5x improvement in speed with the GPU implementation.
6. CONCLUSION
In this paper, real-time face detection using LBP feature extraction is performed, and
classification is done using the nearest-neighbour method. We have parallelized the existing LBP
algorithm to make it suitable for implementation on SIMD architectures such as GPGPUs. A
performance gain has been achieved over previous implementations.
REFERENCES
[1] Xiaoyang Tan; Triggs, B., "Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1635-1650, June 2010. doi: 10.1109/TIP.2010.2042645
[2] Pietikäinen, M.; Hadid, A.; Zhao, G.; Ahonen, T., Computer Vision Using Local Binary Patterns, Computational Imaging and Vision, Vol. 40, Springer, 2011, XVI, 212 p.
[3] Khronos Group, OpenCL overview web page, http://www.khronos.org/opencl/, 2009.
[4] Aaftab Munshi; Benedict Gaster; Timothy G. Mattson; James Fung; Dan Ginsburg, OpenCL Programming Guide, ISBN: 978-0-321-74964-2.
[5] Shan, Caifeng; Gong, Shaogang; McOwan, Peter W., "Facial expression recognition based on Local Binary Patterns: A comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803-816, 2009. doi: 10.1016/j.imavis.2008.08.005
[6] Shan, Caifeng; Gong, Shaogang; McOwan, Peter W., "Robust facial expression recognition using local binary patterns," IEEE International Conference on Image Processing (ICIP 2005), vol. 2, pp. II-370-3, 11-14 Sept. 2005. doi: 10.1109/ICIP.2005.1530069
[7] Miguel Bordallo López; Henri Nykänen; Jari Hannuksela; Olli Silvén; Markku Vehviläinen, "Accelerating image recognition on mobile devices using GPGPU," Proc. SPIE 7872, Parallel Processing for Imaging Applications, 78720R, January 25, 2011. doi: 10.1117/12.872860
[8] Dwith, C.Y.N.; Rathna, G.N., "Parallel Implementation of LBP Based Face Recognition on GPU Using OpenCL," 13th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012, pp. 755-760. doi: 10.1109/PDCAT.2012.107