In recent years, with the development of graphics processors, graphics cards have been widely
used to perform general-purpose calculations. Especially since the release of the CUDA C
programming language in 2007, many researchers have used CUDA C for processes that
require high-performance computing.
In this paper, a scaling approach for image segmentation using level sets is carried out with
GPU programming techniques. Level-set approaches are mainly based on the solution of partial
differential equations; the proposed method does not require the solution of a partial differential
equation. Instead, a scaling approach that uses basic geometric transformations is employed,
which reduces the required computational cost. Using CUDA on the GPU outperforms classic
CPU programming in both execution time and overall performance, so results are obtained
faster, and the GPU enables real-time processing. The application developed in this study is
used to find tumors in MRI brain images.
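The abstract does not detail the geometric transformation it uses, but the core operation of a scaling approach can be sketched as uniformly scaling a contour about its centroid. This is an illustrative reading under that assumption, not the paper's actual method:

```python
def scale_contour(points, factor):
    """Scale a closed contour (list of (x, y) tuples) about its centroid.

    A factor > 1 grows the contour outward, a factor < 1 shrinks it inward,
    which is the kind of evolution a PDE-free scaling scheme could iterate.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n  # centroid x
    cy = sum(y for _, y in points) / n  # centroid y
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]

# A unit step: grow a square contour by 50% about its centroid (1, 1).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
grown = scale_contour(square, 1.5)
```

Because every point is transformed independently, this per-point loop is trivially parallel, which matches the abstract's motivation for moving the computation to the GPU.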
This document discusses single-GPU and multi-GPU implementations of the MAD IQA algorithm to improve its computational performance. A single-GPU implementation achieved a 24x speedup over the CPU version, bringing the runtime down to 40 ms. A multi-GPU implementation using 3 GPUs achieved a 33x speedup over the CPU version, bringing the runtime to 28.9 ms, but required many more data transfers between the GPUs and system memory. While parallelizing tasks across GPUs improved performance, latency from data transfers between devices limited the gains.
This document discusses accelerating the seam carving algorithm for image resizing using CUDA (Compute Unified Device Architecture) on a GPU (graphics processing unit). Seam carving is a content-aware image resizing technique that identifies paths of least importance (seams) through an image that can be removed or inserted to change the image size. The document explains that seam carving involves large matrix calculations that can be significantly accelerated by implementing them in parallel on a CUDA-enabled GPU. It presents the seam carving algorithm, describes CUDA and GPU architecture, proposes implementing seam carving using CUDA to achieve speed-ups, and concludes that parallelizing seam carving calculations on a GPU exploits its massive parallelism for faster execution compared to a CPU-only implementation.
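The seam search the summary describes is a standard dynamic program over a per-pixel energy map. A minimal pure-Python sketch (the CUDA version would parallelize each row's cost computation across threads):

```python
def min_vertical_seam(energy):
    """Return the vertical seam (one column index per row) of minimum total
    energy, via the standard seam-carving dynamic program.

    `energy` is a list of rows of non-negative numbers."""
    h, w = len(energy), len(energy[0])
    # Cumulative minimum cost: each cell extends the cheapest of the three
    # cells above it (left-up, up, right-up).
    cost = [list(energy[0])]
    for y in range(1, h):
        row = []
        for x in range(w):
            lo, hi = max(0, x - 1), min(w - 1, x + 1)
            row.append(energy[y][x] + min(cost[y - 1][lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest bottom cell.
    x = min(range(w), key=lambda i: cost[-1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(0, x - 1), min(w - 1, x + 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    return list(reversed(seam))
```

Removing one pixel per row along the returned seam shrinks the image width by one while avoiding high-energy (important) content.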
This document describes a hardware implementation of the discrete cosine transform (DCT) using an FPGA for image compression. It presents the theory behind the DCT and describes implementing a 2D DCT using the Lee algorithm on an FPGA. Experimental results show the FPGA implementation achieves a maximum error of 8% compared to MATLAB and uses only 14% of FPGA resources while allowing real-time processing for video compression.
Cuda Based Performance Evaluation Of The Computational Efficiency Of The Dct ... - acijjournal
Recent advances in computing, such as massively parallel GPUs (Graphical Processing Units), coupled
with the need to store and deliver large quantities of digital data, especially images, have brought a number
of challenges for computer scientists, the research community, and other stakeholders. These challenges,
such as the prohibitively large cost of manipulating digital data, have been the focus of the
research community in recent years and have led to the investigation of image compression techniques that
can achieve excellent results. One such technique is the Discrete Cosine Transform (DCT), which separates
an image into parts of differing frequencies and has the advantage of excellent energy compaction.
This paper investigates the use of the Compute Unified Device Architecture (CUDA) programming model
to implement the DCT-based, CORDIC-based Loeffler algorithm for efficient image compression. The
computational efficiency is analyzed and evaluated on both the CPU and the GPU. The PSNR (Peak Signal
to Noise Ratio) is used to evaluate image reconstruction quality. The results are presented
and discussed.
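For reference, the 2D DCT that separates an image into parts of differing frequencies can be written directly from the DCT-II definition. This naive sketch is the baseline that fast variants such as the CORDIC-based Loeffler algorithm optimize; it is not the paper's implementation:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of a square block (list of rows).

    O(n^4) by definition; fast algorithms factor this into far fewer
    multiplications, which is what Loeffler-style designs exploit."""
    n = len(block)

    def c(k):  # orthonormal scaling factors
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out
```

A constant block compacts all its energy into the single DC coefficient `out[0][0]`, which is the energy-compaction property the abstract mentions.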
This document summarizes a research paper that proposes a parallel molecular dynamics simulation algorithm for carbon nanotubes (CNTs) using GPU computing. The algorithm divides the CNT system into computational blocks that can be processed independently and in parallel on the GPU. Experimental results showed the GPU-based algorithm achieved over 10 times speedup compared to a serial CPU implementation, demonstrating the effectiveness of the parallel approach for simulating large CNT systems. Keywords included CNTs, molecular dynamics simulation, CUDA, parallel computing, and performance improvements.
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION - csandit
Image reconstruction is the process of recovering an original image from corrupted data. Applications of image reconstruction include computed tomography, radar imaging, weather forecasting, etc. Recently, the steering kernel regression method has been applied to image reconstruction [1]. This technique has two major drawbacks: it is computationally intensive, and its output suffers from spurious edges (especially in the case of denoising). We propose a modified version of steering kernel regression called the Median Based Parallel Steering Kernel Regression Technique. In the proposed algorithm, the first problem is overcome by implementing it on GPUs and multi-cores. The second problem is addressed by gradient-based suppression in which a median filter is used. Our algorithm gives better output than steering kernel regression; the results are compared using Root Mean Square Error (RMSE). Our algorithm also shows a speedup of 21x using GPUs and of 6x using multi-cores.
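A 3x3 median filter of the kind the abstract applies for suppressing spurious edges can be sketched in a few lines. This is illustrative only; the paper combines the filter with gradient-based suppression, which is not shown here:

```python
import statistics

def median_filter(img, radius=1):
    """Apply a (2*radius+1)^2 median filter with edge clamping.

    `img` is a list of rows of numbers; radius=1 gives the common 3x3 window.
    The median rejects impulse-like outliers that a mean would smear."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)]
            out[y][x] = statistics.median(window)
    return out
```

Each output pixel depends only on its local window, so the double loop parallelizes directly across GPU threads or CPU cores, matching the speedups the abstract reports.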
Orthogonal Matching Pursuit in 2D for Java with GPGPU Prospectives - Matt Simons
This document summarizes a project that implemented the Orthogonal Matching Pursuit algorithm in two dimensions (OMP2D) in Java and created an ImageJ plugin to apply it. It discusses the algorithm, details the Java implementation and optimizations, and proposes methods for accelerating it using GPUs. The author created a fully functional OMP2D ImageJ plugin with good performance compared to other implementations. The open source software and documentation are publicly available. The document outlines how further speed improvements could be achieved through mass parallelization on GPUs.
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr... - IOSRJVSP
Most digital image processing applications use a domain-transformation technique to convert time-domain information into a transform domain, which helps simplify the mathematical modeling. The Discrete Wavelet Transform is one of the best such techniques: its time-frequency resolution makes the transform sensitive to both time and frequency, which gives very good compression and decompression. In this paper, we propose an FPGA implementation of the multiplier-less CDF-5/3 wavelet transform for image processing applications using the System Generator tool. To maintain low area and high frequency, we use a multiplier-less architecture for the CDF-5/3 DWT. The VHDL code for the multiplier-less structure is fed to the System Generator tool using the standard procedure, and the structure is synthesized to obtain its area and frequency.
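The "multiplier-less" property of the CDF-5/3 transform comes from its integer lifting form, in which every division is by a power of two and maps to a hardware shift. A one-level sketch, assuming an even-length input and simple symmetric edge handling:

```python
def cdf53_forward(signal):
    """One level of the integer CDF-5/3 lifting transform.

    Predict:  d[i] = x[2i+1] - ((x[2i] + x[2i+2]) >> 1)   (high-pass)
    Update:   s[i] = x[2i] + ((d[i-1] + d[i] + 2) >> 2)   (low-pass)
    The shifts (>> 1, >> 2) replace multipliers in hardware."""
    x = list(signal)
    n = len(x)
    # Predict step: detail coefficients from odd samples; the last odd sample
    # mirrors its left even neighbour (symmetric extension).
    d = [x[2 * i + 1] - ((x[2 * i] + x[min(2 * i + 2, n - 2)]) >> 1)
         for i in range(n // 2)]
    # Update step: approximation coefficients from even samples; d[-1] mirrors d[0].
    s = [x[2 * i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(n // 2)]
    return s, d
```

A constant signal produces all-zero detail coefficients, which is exactly why smooth image regions compress so well under this transform.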
HARDWARE/SOFTWARE CO-DESIGN OF A 2D GRAPHICS SYSTEM ON FPGA - ijesajournal
This document describes the hardware/software co-design of a 2D graphics system implemented on an FPGA. It discusses the hardware design which includes developing Bresenham and BitBLT IP cores to accelerate computationally intensive 2D graphics operations. It also discusses the software design which includes graphics drivers and APIs running on a CPU core to initialize and manage the graphics creation process by driving the IP cores. The system is aimed to benefit low-end embedded applications by providing reconfigurable 2D graphics capabilities on FPGA.
The complexity of medical image reconstruction requires tens to hundreds of billions of computations per second. Until a few years ago, special-purpose processors designed specifically for such applications were used. Such processors require significant design effort, are thus difficult to change as new reconstruction algorithms evolve, and have limited parallelism. Hence, the demand for flexibility in medical applications motivated the use of stream processors with massively parallel architectures. Stream processing architectures offer data parallelism.
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim... - inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of Engineering, Science and Technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected for publication through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
This document summarizes the performance analysis of a resin infusion flow modeling application on two multiple CPU/GPU computing systems (System A and System B). The application models resin flow during liquid composite molding and involves solving large sparse linear equation systems using an iterative preconditioned conjugate gradient method. The performance of the application is analyzed on the two systems which have different CPU and GPU specifications. Key factors like local CPU-GPU communication and their effect on overall performance are examined.
Performance analysis of real-time and general-purpose operating systems for p... - IJECEIAES
In general, modern operating systems can be divided into two essential categories: real-time operating systems (RTOS) and general-purpose operating systems (GPOS). The main difference between a GPOS and an RTOS is whether the system is time-critical. In a GPOS, a high-priority thread cannot preempt a kernel call; in an RTOS, a low-priority task is preempted by a high-priority task if necessary, even while it is executing a kernel call. Most Linux distributions can be used as both a GPOS and an RTOS with kernel modifications. In this study, two Linux distributions, Ubuntu and Pardus, were analyzed and their performance compared both as GPOS and as RTOS for path planning of multi-robot systems. Robot groups with different numbers of members performed path-tracking tasks using both Ubuntu and Pardus as GPOS and RTOS. In this way, the performance of the two Linux distributions in robotic applications was observed and compared in both forms, GPOS and RTOS.
Real-time traffic sign detection and recognition using Raspberry Pi - IJECEIAES
This document presents a real-time traffic sign detection and recognition system developed using a Raspberry Pi 3 processor. The system uses a Raspberry Pi camera to record real-time video and the TensorFlow machine learning algorithm to detect and identify traffic signs based on a dataset of 500 labeled images across 5 sign classes. The system's accuracy, delay, and reliability were evaluated during testbed implementation considering different environmental and sign conditions. Results showed the system achieved over 90% accuracy on average with a maximum detection delay of 3.44 seconds, demonstrating reliable performance even for broken, faded, or low-light signs. This real-time traffic sign recognition system developed with affordable hardware has potential to increase road safety.
Performance and Analysis of Video Compression Using Block Based Singular Valu... - IJMER
This document presents an analysis of low-complexity video compression using block-based singular value decomposition (SVD) algorithms. It begins with an introduction to video compression and its importance for reducing storage and transmission costs. Current video compression standards like MPEG and H.26x are computationally expensive, making them unsuitable for real-time applications. The document then discusses block SVD algorithms as an alternative that can provide higher quality compression at lower computational complexity. It analyzes reducing the time complexity of video compression using block SVD and compares it to other compression methods. The document outlines the SVD decomposition process and how a 2D version can be applied to groups of image blocks for more efficient compression than 1D SVD.
This document summarizes a research paper that implemented Levenberg-Marquardt artificial neural network training using graphics processing unit (GPU) hardware acceleration. The key points are:
1) This appears to be the first description of implementing artificial neural networks using the Levenberg-Marquardt training method on a GPU.
2) The paper describes their approach for implementing the Levenberg-Marquardt algorithm on a GPU, which involves solving the matrix inversion operation that is typically computationally expensive.
3) Results show that training networks using the GPU implementation can be up to 10 times faster than using a CPU-only implementation on the same hardware.
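The Levenberg-Marquardt update the summary refers to solves the damped normal equations (JᵀJ + λI)δ = Jᵀr, where the matrix inversion is the expensive step moved to the GPU. For a one-parameter model this collapses to scalars, which makes the structure easy to see; this is an illustrative sketch, not the paper's GPU code:

```python
def lm_fit_slope(xs, ys, a=0.0, lam=1e-3, iters=50):
    """Fit y = a * x by Levenberg-Marquardt.

    With one parameter, J^T J and J^T r are scalars, so the damped solve
    delta = (J^T r) / (J^T J + lam) stands in for the matrix inversion."""
    def sse(a):
        return sum((y - a * x) ** 2 for x, y in zip(xs, ys))

    for _ in range(iters):
        r = [y - a * x for x, y in zip(xs, ys)]    # residuals
        jtj = sum(x * x for x in xs)               # J^T J (scalar here)
        jtr = sum(x * ri for x, ri in zip(xs, r))  # J^T r (scalar here)
        delta = jtr / (jtj + lam)
        if sse(a + delta) < sse(a):
            a += delta
            lam *= 0.5   # step accepted: behave more like Gauss-Newton
        else:
            lam *= 2.0   # step rejected: behave more like gradient descent
    return a
```

The adaptive damping λ is what distinguishes Levenberg-Marquardt from plain Gauss-Newton and keeps the iteration stable far from the optimum.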
Design and development of DrawBot using image processing - IJECEIAES
Extracting text from an image and reproducing it can often be a laborious task, and our work aims to solve that problem. We designed a robot that can perceive an image shown to it and reproduce it on any given area as directed. It does so by first taking an input image and performing image processing operations on it to improve its readability. The text in the image is then recognized by the program. Points are sampled for each letter, inverse kinematics is computed for each point with MATLAB/Simulink, and the angles to which the servo motors should be moved are determined and stored on the Arduino. Using these angles, the control algorithm running on the Arduino draws the letters.
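For a planar two-link arm, the per-point inverse kinematics step described above has a closed form. A sketch under that assumption (the abstract does not specify the arm geometry, and the actual MATLAB/Simulink model is not shown):

```python
import math

def two_link_ik(x, y, l1, l2):
    """Closed-form inverse kinematics for a planar 2-link arm (elbow-down).

    Returns the two joint angles (radians) that place the end effector at
    (x, y), given link lengths l1 and l2."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))  # clamp for numerical safety
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

def forward(theta1, theta2, l1, l2):
    """Forward kinematics, used to verify an IK solution."""
    return (l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2),
            l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2))
```

Running each sampled letter point through `two_link_ik` yields the servo angle sequence that a controller would then replay to draw the stroke.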
On the Computational Performance of Single-GPU and Multi-GPU Implementations of the MAD IQA Algorithm
The document describes implementations of the MAD image quality assessment algorithm on single and multiple GPUs. The single GPU version achieved speedups over a CPU implementation by parallelizing kernels. The multi-GPU version further improved performance by distributing tasks across GPUs, hiding PCIe transfer latency through asynchronous memory copies. Experimental results showed the multi-GPU version was 30% faster than the single GPU implementation due to improved GPU utilization and overlap of computation and data transfers.
Novel hybrid framework for image compression for supportive hardware design o... - IJECEIAES
Performing image compression on resource-constrained hardware is quite a challenging task. Although various approaches to image compression have taken the hardware into account, problems remain with the memory load of the overall operation, which degrades the performance of the hardware device. The proposed approach therefore presents a cost-effective image compression mechanism that offers lossless compression using a unique combination of non-linear filtering, segmentation, and contour detection, followed by optimization. The compression mechanism adopts an analytical approach to achieve significant compression. Its execution yields faster response time, reduced mean square error, improved signal quality, and a significant compression ratio.
This document discusses using CUDA on GPUs to accelerate map projection calculations. It presents a method for implementing the Universal Transverse Mercator projection on a GPU using CUDA. Experiments show the GPU implementation provides a 6-8x speedup over a CPU version when including data transfer times, and a 70-90x speedup when only considering calculation times. Two task assignment approaches are evaluated, with striped partitioning performing slightly better than a matrix distribution method. Future work is proposed to implement other GIS algorithms on GPUs to take advantage of the significant speed increases possible.
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS - cseij
This document summarizes a survey of GPU systems and their performance on different applications. It discusses how GPUs can be used for general-purpose computing due to their high parallel processing capability. Several computationally intensive applications that achieve speedups when implemented on GPUs are described, including video decoding, matrix multiplication, parallel AES encryption, and password recovery for MS Office documents. The GPU architecture and Nvidia's CUDA programming model are also summarized. While GPUs provide significant performance benefits, some limitations for non-graphics applications are noted. The conclusion is that GPUs are a good alternative for computationally intensive tasks to reduce CPU load and improve performance compared to CPU-only implementations.
Medical imaging computing based on graphical processing units for high perfor... - eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars, and Students of related fields of Engineering and Technology.
Comparative study to realize an automatic speaker recognition system - IJECEIAES
This document presents a comparative study between an adaptive orthogonal transform method and mel-frequency cepstral coefficients (MFCCs) for automatic speaker recognition. The adaptive orthogonal transform method uses an adaptive operator to extract informative features from input speech signals with minimum dimensions. Experimental results show the adaptive orthogonal transform method achieved 96.8% accuracy using Fourier transform and 98.1% accuracy using correlation, outperforming MFCCs which achieved 49.3% and 53.1% accuracy respectively. The proposed method successfully identified speakers with a recognition rate of 98.1% compared to 53.1% for MFCCs, demonstrating the efficiency of the adaptive orthogonal transform approach.
The document discusses visualization systems and proposes concepts for their future development. It summarizes:
1) The "Visual Realityware" visualization software development environment, which uses an abstraction layer to allow developers to freely select mainstream graphics technologies and expand applications across multiple platforms with minimal bugs.
2) An application called "Virtual Anatomia" developed using Visual Realityware to visualize 3D biological data in real-time.
3) The concept of "Visionize" which is defined as a risk management methodology using visual communication to allow sharing of goals and visions in order to identify and prevent risks before issues arise.
IMPLEMENTATION OF THE DEVELOPMENT OF A FILTERING ALGORITHM TO IMPROVE THE SYS... - csandit
In this paper, we present the denoising stage implemented in the coding strategy of cochlear
implants; the technique used is the Bionic Wavelet Transform (BWT). We implemented the
algorithm for denoising the speech signal by the hybrid BWT method on an FPGA (Field
Programmable Gate Array), a Xilinx Virtex5 XC5VLX110T. In our study, we first demonstrate
the features of this technique, then present the implementation of the algorithm we propose,
and finally present simulation results and the performance of this technique in terms of
improvement of the SNR (Signal to Noise Ratio). The proposed implementations are realized in
VHDL (Very High Speed Integrated Circuits Hardware Description Language). Different
speech-processing algorithms, including the CIS (Continuous Interleaved Sampling) strategy,
have been implemented on this processor and tested successfully.
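The SNR figure of merit used above is 10·log10 of the ratio of signal power to residual noise power; a minimal sketch of how an SNR improvement would be measured:

```python
import math

def snr_db(clean, processed):
    """SNR in dB of `processed` against the `clean` reference signal.

    Signal power is the energy of the clean signal; noise power is the
    energy of the difference between clean and processed samples."""
    signal = sum(c * c for c in clean)
    noise = sum((c - p) ** 2 for c, p in zip(clean, processed))
    return 10.0 * math.log10(signal / noise)
```

Comparing `snr_db(clean, noisy)` before denoising with `snr_db(clean, denoised)` after gives the improvement figure a denoising stage like the one above would report.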
This document presents a project portfolio from an engineering student named Yandry Apolo Renda. It includes his curriculum vitae with personal information and completed studies, as well as several field journals on topics such as the definition and classification of problems, problem-solving strategies, and the use of tables and graphical representations to analyze problems with multiple variables.
Background and philosophical stances keny galindez - Keny Kira'
This document presents summaries of different systems of labor organization throughout history, such as the tribal system, the slavery system, the feudal system, the guild system, and the industrial system. It also describes different philosophical stances on work, such as the Catholic, classical-traditional, Marxist, and postmodern positions.
The European Union has announced new sanctions against Russia over its invasion of Ukraine. The sanctions include travel bans and asset freezes for additional Russian officials, as well as restrictions on imports of Russian steel products and technology. EU leaders hope these additional measures will increase the pressure on Russia to end its war against Ukraine.
Orthogonal Matching Pursuit in 2D for Java with GPGPU ProspectivesMatt Simons
This document summarizes a project that implemented the Orthogonal Matching Pursuit algorithm in two dimensions (OMP2D) in Java and created an ImageJ plugin to apply it. It discusses the algorithm, details the Java implementation and optimizations, and proposes methods for accelerating it using GPUs. The author created a fully functional OMP2D ImageJ plugin with good performance compared to other implementations. The open source software and documentation are publicly available. The document outlines how further speed improvements could be achieved through mass parallelization on GPUs.
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...IOSRJVSP
Most of the digital image processing application uses various domain transformation technique to convert time domain information to transform domain which will help to simplify the mathematical modeling. Discrete Wavelet Transform is one of the best transformation techniques. The time-frequency resolution makes this transform sensitive to both time and frequency which will give very good compression and decompression. In this paper, we propose FPGA implementation of multiplier-less CDF-5/3 wavelet transform for image processing application using System-Generator tool.To maintain low area and high frequency we use multiplier-less architecture for CDF-5/3 DWT for our implementation. The VHDL code for multiplier-less structure is fed to system generator tool using standard procedure and synthesis the structure to get the area and frequency
HARDWARE/SOFTWARE CO-DESIGN OF A 2D GRAPHICS SYSTEM ON FPGA (ijesajournal)
This document describes the hardware/software co-design of a 2D graphics system implemented on an FPGA. It discusses the hardware design which includes developing Bresenham and BitBLT IP cores to accelerate computationally intensive 2D graphics operations. It also discusses the software design which includes graphics drivers and APIs running on a CPU core to initialize and manage the graphics creation process by driving the IP cores. The system is aimed to benefit low-end embedded applications by providing reconfigurable 2D graphics capabilities on FPGA.
The complexity of medical image reconstruction requires tens to hundreds of billions of computations per second. Until a few years ago, special-purpose processors designed specifically for such applications were used. Such processors require significant design effort, are therefore difficult to change as new reconstruction algorithms evolve, and have limited parallelism. Hence the demand for flexibility in medical applications motivated the use of stream processors with massively parallel architectures, which offer data-parallel parallelism.
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim... (inventionjournals)
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science, and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it continues to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
This document summarizes the performance analysis of a resin infusion flow modeling application on two multiple CPU/GPU computing systems (System A and System B). The application models resin flow during liquid composite molding and involves solving large sparse linear equation systems using an iterative preconditioned conjugate gradient method. The performance of the application is analyzed on the two systems which have different CPU and GPU specifications. Key factors like local CPU-GPU communication and their effect on overall performance are examined.
Performance analysis of real-time and general-purpose operating systems for p... (IJECEIAES)
In general, modern operating systems can be divided into two essential classes: real-time operating systems (RTOS) and general-purpose operating systems (GPOS). The main difference between them is whether the system is time-critical. In a GPOS, a high-priority thread cannot preempt a kernel call; in an RTOS, a low-priority task is preempted by a high-priority task when necessary, even while executing a kernel call. Most Linux distributions can be used as both a GPOS and an RTOS with kernel modifications. In this study, two Linux distributions, Ubuntu and Pardus, were analyzed and their performance compared both as a GPOS and as an RTOS for path planning of multi-robot systems. Robot groups with different numbers of members performed path-tracking tasks using both Ubuntu and Pardus in each configuration. In this way, the performance of the two distributions in robotic applications was observed and compared in both forms, GPOS and RTOS.
Real-time traffic sign detection and recognition using Raspberry Pi (IJECEIAES)
This document presents a real-time traffic sign detection and recognition system developed using a Raspberry Pi 3 processor. The system uses a Raspberry Pi camera to record real-time video and the TensorFlow machine learning algorithm to detect and identify traffic signs based on a dataset of 500 labeled images across 5 sign classes. The system's accuracy, delay, and reliability were evaluated during testbed implementation considering different environmental and sign conditions. Results showed the system achieved over 90% accuracy on average with a maximum detection delay of 3.44 seconds, demonstrating reliable performance even for broken, faded, or low-light signs. This real-time traffic sign recognition system developed with affordable hardware has potential to increase road safety.
Performance and Analysis of Video Compression Using Block Based Singular Valu... (IJMER)
This document presents an analysis of low-complexity video compression using block-based singular value decomposition (SVD) algorithms. It begins with an introduction to video compression and its importance for reducing storage and transmission costs. Current video compression standards like MPEG and H.26x are computationally expensive, making them unsuitable for real-time applications. The document then discusses block SVD algorithms as an alternative that can provide higher quality compression at lower computational complexity. It analyzes reducing the time complexity of video compression using block SVD and compares it to other compression methods. The document outlines the SVD decomposition process and how a 2D version can be applied to groups of image blocks for more efficient compression than 1D SVD.
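The block-based SVD compression analyzed above can be sketched as a rank-k truncation applied independently to each image block; keeping k singular values stores far fewer numbers per block than the full block. The block size and rank below are illustrative choices, not the paper's parameters:

```python
import numpy as np

def compress_block_svd(img, block=8, k=2):
    """Rank-k SVD approximation applied independently to each block.

    Keeping k singular values stores 2*block*k + k numbers per block
    instead of block*block values -- the source of the compression.
    """
    h, w = img.shape
    out = np.empty_like(img, dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            B = img[i:i + block, j:j + block].astype(float)
            U, S, Vt = np.linalg.svd(B, full_matrices=False)
            # Reconstruct from the k largest singular triplets only.
            out[i:i + block, j:j + block] = (U[:, :k] * S[:k]) @ Vt[:k]
    return out
```

A block whose true rank is at most k is reconstructed exactly; smoother blocks compress with little visible error, which is why block SVD trades quality against k.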
This document summarizes a research paper that implemented Levenberg-Marquardt artificial neural network training using graphics processing unit (GPU) hardware acceleration. The key points are:
1) This appears to be the first description of implementing artificial neural networks using the Levenberg-Marquardt training method on a GPU.
2) The paper describes their approach for implementing the Levenberg-Marquardt algorithm on a GPU, which involves solving the matrix inversion operation that is typically computationally expensive.
3) Results show that training networks using the GPU implementation can be up to 10 times faster than using a CPU-only implementation on the same hardware.
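The matrix operation mentioned in point 2 sits at the heart of each Levenberg-Marquardt weight update, delta_w = (J^T J + mu*I)^{-1} J^T e. A minimal NumPy sketch of that single step (not the paper's GPU code; the damping parameter mu is the usual LM knob):

```python
import numpy as np

def lm_step(jacobian, errors, mu):
    """One Levenberg-Marquardt update: solve (J^T J + mu*I) delta = J^T e.
    This linear solve / inversion is the computationally expensive part
    that a GPU implementation accelerates."""
    J = np.asarray(jacobian, float)
    e = np.asarray(errors, float)
    A = J.T @ J + mu * np.eye(J.shape[1])
    return np.linalg.solve(A, J.T @ e)
```

With mu = 0 the step reduces to Gauss-Newton; a large mu shrinks the step toward a small gradient-descent move, which is how LM interpolates between the two.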
Design and development of DrawBot using image processing (IJECEIAES)
Extracting text from an image and reproducing it can often be a laborious task, and our work aims to solve that problem. We designed a robot that can perceive an image shown to it and reproduce it on any given area as directed. It first takes an input image and performs image processing operations to improve its readability; the text in the image is then recognized by the program. Points are taken for each letter, inverse kinematics is computed for each point in MATLAB/Simulink, and the angles through which the servo motors should move are found and stored on the Arduino. Using these angles, the control algorithm runs on the Arduino and the letters are drawn.
On the Computational Performance of Single-GPU and Multi-GPU Implementations of the MAD IQA Algorithm
The document describes implementations of the MAD image quality assessment algorithm on single and multiple GPUs. The single GPU version achieved speedups over a CPU implementation by parallelizing kernels. The multi-GPU version further improved performance by distributing tasks across GPUs, hiding PCIe transfer latency through asynchronous memory copies. Experimental results showed the multi-GPU version was 30% faster than the single GPU implementation due to improved GPU utilization and overlap of computation and data transfers.
Novel hybrid framework for image compression for supportive hardware design o... (IJECEIAES)
Performing image compression on resource-constrained hardware is quite a challenging task. Although various approaches to image compression have considered the hardware aspect, problems remain with the memory overheads of the overall operation, which degrade the performance of the hardware device. The proposed approach therefore presents a cost-effective image compression mechanism that offers lossless compression using a unique combination of non-linear filtering, segmentation, and contour detection, followed by optimization. The mechanism adopts an analytical approach to achieve significant compression, and its execution yields faster response times, reduced mean square error, improved signal quality, and a significant compression ratio.
This document discusses using CUDA on GPUs to accelerate map projection calculations. It presents a method for implementing the Universal Transverse Mercator projection on a GPU using CUDA. Experiments show the GPU implementation provides a 6-8x speedup over a CPU version when including data transfer times, and a 70-90x speedup when only considering calculation times. Two task assignment approaches are evaluated, with striped partitioning performing slightly better than a matrix distribution method. Future work is proposed to implement other GIS algorithms on GPUs to take advantage of the significant speed increases possible.
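Map projections parallelize so well because each point is transformed independently of all others. As a simplified stand-in for the ellipsoidal UTM formulas the paper implements, here is the forward spherical Mercator projection (standard spherical formulas and Earth radius, not the paper's code):

```python
import math

def mercator(lon_deg, lat_deg, radius=6378137.0):
    """Forward spherical Mercator projection of one point.

    x = R * lambda,  y = R * ln(tan(pi/4 + phi/2))

    Each point depends only on its own coordinates, so a batch of points
    maps directly onto one GPU thread per point.
    """
    x = radius * math.radians(lon_deg)
    y = radius * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y
```

The real UTM forward transform is a longer series expansion on the ellipsoid, but it has the same per-point independence, which is what the reported 70-90x calculation-only speedups exploit.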
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS (cseij)
This document summarizes a survey on GPU systems and their performance on different applications. It discusses how GPUs can be used for general purpose computing due to their high parallel processing capabilities. Several computational intensive applications that achieve speedups when implemented on GPUs are described, including video decoding, matrix multiplication, parallel AES encryption, and password recovery for MS office documents. The GPU architecture and Nvidia's CUDA programming model are also summarized. While GPUs provide significant performance benefits, some limitations for non-graphics applications are noted. The conclusion is that GPUs are a good alternative for computational intensive tasks to reduce CPU load and improve performance compared to CPU-only implementations.
Medical imaging computing based on graphical processing units for high perfor... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields.
Comparative study to realize an automatic speaker recognition system (IJECEIAES)
This document presents a comparative study between an adaptive orthogonal transform method and mel-frequency cepstral coefficients (MFCCs) for automatic speaker recognition. The adaptive orthogonal transform method uses an adaptive operator to extract informative features from input speech signals with minimum dimensions. Experimental results show the adaptive orthogonal transform method achieved 96.8% accuracy using Fourier transform and 98.1% accuracy using correlation, outperforming MFCCs which achieved 49.3% and 53.1% accuracy respectively. The proposed method successfully identified speakers with a recognition rate of 98.1% compared to 53.1% for MFCCs, demonstrating the efficiency of the adaptive orthogonal transform approach.
The document discusses visualization systems and proposes concepts for their future development. It summarizes:
1) The "Visual Realityware" visualization software development environment, which uses an abstraction layer to allow developers to freely select mainstream graphics technologies and expand applications across multiple platforms with minimal bugs.
2) An application called "Virtual Anatomia" developed using Visual Realityware to visualize 3D biological data in real-time.
3) The concept of "Visionize" which is defined as a risk management methodology using visual communication to allow sharing of goals and visions in order to identify and prevent risks before issues arise.
IMPLEMENTATION OF THE DEVELOPMENT OF A FILTERING ALGORITHM TO IMPROVE THE SYS... (csandit)
In this paper, we present the denoising section implemented in the coding strategy of cochlear implants; the technique used is the Bionic Wavelet Transform (BWT). We implemented the algorithm for denoising the speech signal by the hybrid BWT method on an FPGA (Field Programmable Gate Array), the Xilinx Virtex5 XC5VLX110T. In our study, we first demonstrate the features of this technique, then present the proposed algorithm implementation, the simulation results, and the performance of the technique in terms of SNR (Signal to Noise Ratio) improvement. The proposed implementations are realized in VHDL (Very High Speed Integrated Circuits Hardware Description Language). Different speech processing algorithms, including the CIS (Continuous Interleaved Sampling) strategy, have been implemented on this processor and tested successfully.
This document presents a project portfolio by an engineering student named Yandry Apolo Renda. It includes his curriculum vitae with personal information and completed studies, as well as several field diaries on topics such as the definition and classification of problems, problem-solving strategies, and the use of tables and graphical representations to analyze problems with multiple variables.
Background and philosophical stances - Keny Galindez (Keny Kira')
This document summarizes different systems of labor organization throughout history, such as the tribal system, slavery, the feudal system, the guild system, and the industrial system. It also describes different philosophical stances on work, including Catholic, classical-traditional, Marxist, and postmodern positions.
ESSENTIAL MODIFICATIONS ON BIOGEOGRAPHY-BASED OPTIMIZATION ALGORITHM (csandit)
Biogeography-based optimization (BBO) is a population-based evolutionary algorithm grounded in the classic theory of island biogeography, which explains the geographical distribution of biological organisms. BBO was introduced in 2008, and many modifications have since been employed to enhance its performance. This paper proposes two modifications: first, modifying the probabilistic selection process of the migration and mutation stages to give a fairly randomized selection across all features of the islands; second, sizing the duplicate-clearing process after the mutation stage to avoid any corruption of the suitability index variables. Results obtained over a wide range of test functions with different dimensions and complexities show that BBO performance can be enhanced effectively without using any complicated form of the immigration and emigration rates. This essential modification should be considered an initial step for any further modification.
This document appears to be about color palettes and silhouettes for baby and toddler products in 2016. It contains the title "2016 S/S Baby-Toddler" and mentions Vivian Warman, suggesting it is a design document focused on colors and shapes for babies and toddlers in a particular year.
LATTICE-CELL: HYBRID APPROACH FOR TEXT CATEGORIZATION (csandit)
The document proposes a new text categorization framework called LATTICE-CELL, based on concept lattices and cellular automata. It models concept structures using a Cellular Automaton for Symbolic Induction (CASI) in order to reduce the time complexity of categorization caused by concept lattices. The framework consists of a preprocessing module that creates a vector representation of documents and a categorization module that generates the categorization model by representing the concept lattice structure as a cellular lattice. Experiments show the approach improves performance while reducing categorization time compared with algorithms such as Naive Bayes and k-nearest neighbors.
This infographic shows car insurance rates across the world: country-wise rates and how Ontario stacks up against these countries. It also gives the factors that contribute to such high rates and how they might be brought down.
The document summarizes the steps for making a keychain: 1) prepare materials such as fabric, thread, and glue; 2) draw a pattern on the fabric and cut it out; 3) attach the pattern using glue; 4) sew the sides of the pattern and leave an opening for stuffing with cotton; 5) attach the key ring using a needle and glue.
The document describes a study on how budget control influences the improvement of sales management at the company Distribuidora Raíces S.A.C. The study found that budget control within the company had a significant influence on improving sales management, because the staff has experience in budget preparation and the control system allows corrective measures to be applied in a timely manner. The comparison of actual sales with budgeted sales also showed that budget control allows the differ…
SMWSantiago - Marcelo del Pino and Dana Hermosilla (danidron)
The document covers an event called #SMWSantiago, held from November 16 to 20, 2015. The event focused on teenagers and celebrities. It was discussed that teenagers are now more empowered, participative, and selective about content, and have more access to information, which makes them more influential. It was also noted that influencers are now icons for groups with common interests and help form communities.
COMMERCIALIZATION OF INNOVATIONS IN UKRAINE: PROBLEMS AND PROSPECTS (Alex Grebeshkov)
Alina Nasibovych
Faculty of Economics and Management, 4th year, EEP-408, a.nasibovych@gmail.com
(academic supervisor: S. S. Danylchenko, M.Sc., senior lecturer)
http://conference.spkneu.org/2015/12/komertsializatsiya-innovatsij-v-ukrayini-problemi-ta-perspektivi/
This document provides a summary of the housing market outlook for Edmonton, Canada. It finds that total housing starts will decrease in 2016 and 2017 after increasing in 2015. Specifically, single-detached housing starts will decline in 2015-2016 and rise modestly in 2017, while multi-family starts will rise in 2015 before falling in 2016-2017. MLS home sales are projected to decrease in 2015 before posting small gains in 2016-2017. The apartment vacancy rate is expected to rise over the forecast period as supply outpaces demand.
Validation of an instrument to measure service quality in programs of... (Universidad de Santander)
This article reports the results of a study to validate an instrument for measuring service quality in university programs. It is based on the SERVQUAL model, adapted to the university setting, and considers only perceptions of service quality.
HUMAN BODY DETECTION AND SAFETY CARE SYSTEM FOR A FLYING ROBOT (csandit)
Image processing is one of the challenging issues in robotics as well as electrical engineering research. This study proposes a system for extracting and tracking objects from a quadcopter flying robot, in particular for extracting the human body. In images taken from the real-time camera embedded in the bottom of the quadcopter, the tracked human varies in position and size, so the paper investigates an image-processing method for tracking human bodies. An extraction method that defines features to distinguish a human body is proposed: it creates a virtual shape of the body for recognition and generates an extractor from its edge information. Experimentally, the method shows good performance in terms of precision as well as speed.
The document contains a series of reflections on topics such as love, friendship, mental attitude, nature, and the environment. Among its main ideas are that to stop loving someone is to question whether you love them, that mental attitude can overcome any circumstance, and that true friendship accepts friends as they are.
Fields of research and application of NTIC in the teaching-learning process (Anthony Gomez)
This document discusses the field of learning and the application of ICT in the teaching-learning process. It explains that the teaching-learning process must cover the distance between the current situation and the desired solution, achieving a change in the student's behavior. It also describes how ICT can be integrated into higher education in a way that respects the idiosyncrasy of each institution, and how the constructivist pedagogical model emphasizes the active construction of knowledge.
GPU-BASED IMAGE SEGMENTATION USING LEVEL SET METHOD WITH SCALING APPROACH (cscpconf)
In recent years, with the development of graphics processors, graphics cards have been widely used for general-purpose computation. Especially since the release of the CUDA C programming language in 2007, most researchers have used CUDA C for processes that need high-performance computing. In this paper, a scaling approach for image segmentation using level sets is implemented with GPU programming techniques. Level-set methods are typically based on solving partial differential equations; the proposed method does not require solving a PDE. Instead, a scaling approach that uses basic geometric transformations is applied, which reduces the required computational cost. Using CUDA on the GPU provides advantages over classic programming in runtime and performance, so results are obtained faster, and the GPU enables real-time processing. The application developed in this study is used to find tumors in MRI brain images.
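A minimal sketch of the scaling idea: a polygonal contour is shrunk about its centroid by a basic geometric transformation instead of evolving a level-set PDE. The stopping test against a binary mask below is a stand-in assumption of mine; the abstract does not specify the paper's actual stopping criterion on MRI data:

```python
import numpy as np

def scale_contour(points, factor):
    """Scale a closed contour about its centroid -- the basic geometric
    transformation that replaces the PDE evolution step.
    `points` is an (N, 2) array of (x, y) vertices."""
    pts = np.asarray(points, float)
    c = pts.mean(axis=0)
    return c + factor * (pts - c)

def evolve(points, stop_mask, step=0.98):
    """Shrink the contour until every vertex lies inside the object mask
    (hypothetical stopping criterion for illustration only)."""
    pts = np.asarray(points, float)
    for _ in range(200):
        idx = np.clip(pts.round().astype(int), 0,
                      np.array(stop_mask.shape[::-1]) - 1)
        if stop_mask[idx[:, 1], idx[:, 0]].all():
            break
        pts = scale_contour(pts, step)
    return pts
```

Each iteration is a handful of arithmetic operations per vertex, which is far cheaper than a PDE solve and maps trivially onto GPU threads.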
Abstract
The purpose of this review paper is to show the difference between executing the seam carving algorithm sequentially on a traditional CPU (central processing unit) and in parallel on a modern CUDA (compute unified device architecture) enabled GPU (graphics processing unit). Seam carving is a content-aware image resizing method proposed by Avidan and Shamir of MERL.[1] It functions by identifying seams, or paths of least importance, through an image; these seams can be removed or inserted in order to change the size of the image. The success of this algorithm depends on several factors: the number of objects in the picture, the size of the monotonous background, and the energy function. The purpose of the algorithm is to reduce image distortion in applications where images cannot be displayed at their original size. CUDA is a parallel architecture for GPUs, developed in 2007 by the Nvidia Corporation. Besides their primary function, rendering graphics, GPUs can also be used for general-purpose computing (GPGPU); a CUDA-enabled GPU lets its user harness massive parallelism in regular computations. If an algorithm can be parallelized, the use of GPUs significantly improves performance and reduces the load on the central processing units (CPUs). The implementation of seam carving uses massive matrix calculations that can be performed in parallel to speed up the execution of the algorithm as a whole. The entire algorithm cannot be run in parallel, so some part of it necessarily needs a CPU for sequential computation.
Keywords: Seam Carving, CUDA, Parallel Processing, GPGPU, CPU, GPU, Parallel Computing.
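The seam-finding step that dominates the computation can be sketched as the standard Avidan-Shamir dynamic program over a per-pixel energy map (this is the generic recurrence, not the reviewed CUDA code; the energy map itself is assumed precomputed):

```python
import numpy as np

def find_vertical_seam(energy):
    """Minimum-energy vertical seam via dynamic programming.

    M[i, j] = energy[i, j] + min(M[i-1, j-1], M[i-1, j], M[i-1, j+1])

    Each DP row depends only on the previous row, so within a row all
    columns can be computed in parallel -- the property GPU versions use.
    Returns one column index per row."""
    e = np.asarray(energy, float)
    h, w = e.shape
    M = e.copy()
    for i in range(1, h):
        left = np.r_[np.inf, M[i - 1, :-1]]
        right = np.r_[M[i - 1, 1:], np.inf]
        M[i] += np.minimum(np.minimum(left, M[i - 1]), right)
    # Backtrack from the cheapest bottom cell.
    seam = np.empty(h, int)
    seam[-1] = int(M[-1].argmin())
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(M[i, lo:hi].argmin())
    return seam
```

The backtracking pass is inherently sequential, which illustrates the review's point that part of the algorithm still needs the CPU (or a serial kernel).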
Image Processing Application on Graphics processors (CSCJournals)
In this work, we introduce real-time image processing techniques using modern programmable graphics processing units (GPUs). GPUs are SIMD (Single Instruction, Multiple Data) devices that are inherently data-parallel. By utilizing NVIDIA's GPU programming framework, the Compute Unified Device Architecture (CUDA), as a computational resource, we realize significant acceleration of image processing computations. We show that a range of computer vision algorithms maps readily to CUDA with significant performance gains. Specifically, we demonstrate the efficiency of our approach through the parallelization and optimization of image processing, morphology applications, and integral images.
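The integral image mentioned above is a summed-area table; on a GPU it maps to parallel prefix sums over rows then columns, and the same dependency structure can be sketched with cumulative sums:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: I[y, x] = sum of img[:y+1, :x+1]."""
    return np.cumsum(np.cumsum(np.asarray(img, np.int64), axis=0), axis=1)

def box_sum(I, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] in O(1) lookups from the table:
    I[y1,x1] - I[y0-1,x1] - I[y1,x0-1] + I[y0-1,x0-1]."""
    s = int(I[y1, x1])
    if y0 > 0:
        s -= int(I[y0 - 1, x1])
    if x0 > 0:
        s -= int(I[y1, x0 - 1])
    if y0 > 0 and x0 > 0:
        s += int(I[y0 - 1, x0 - 1])
    return s
```

Once the table is built, every box filter of any size costs four lookups, which is why integral images are a staple of fast vision pipelines.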
The document discusses the evolution of GPU architecture and capabilities over time. It describes how GPUs have become massively parallel processors with programmable capabilities beyond just graphics. The document outlines the core components of a GPU including the graphics pipeline and programming model. It also discusses how GPUs are well suited for parallel, data-intensive applications and how their capabilities have expanded into general purpose computing through technologies like CUDA.
Satellite image processing is an intricate task that requires vast computation and data processing, which cannot be handled by a single computer. Furthermore, the processing of the massive amount of data accumulated by the satellite is a huge challenge for the end user. Hence, grid computing is the essential platform to provide high computing performance at the user end. This article reviews the grid services used for satellite image processing and significant data processing.
This document provides an overview of a project that implemented image filtering using VHDL on an FPGA board. It discusses designing filters like average, Sobel, Gaussian, and Laplacian filters. Cache memory and a processing unit were developed to hold pixel values and apply filter kernels. Different methods for multiplication in the convolution process were evaluated. Results showed the output images after applying each filter both in software and on the FPGA board. In conclusion, FPGAs provide reconfigurable, accelerated processing for image applications like filtering compared to general purpose computers.
SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. For efficient construction of large maps searching the best-matching unit is usually the computationally heaviest operation in the SOM. The parallel nature of the algorithm and the huge computations involved makes it a good target for GPU based parallel implementation. This paper presents an overall idea of the optimization strategies used for the parallel implementation of Basic-SOM on GPU using CUDA programming paradigm.
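The best-matching-unit search described as the heaviest SOM operation can be sketched as a distance computation over all map nodes, followed by a neighborhood-weighted update; the grid size, learning rate, and Gaussian neighborhood below are illustrative, not the paper's settings:

```python
import numpy as np

def best_matching_unit(weights, x):
    """Best-matching unit: the node whose weight vector is closest
    (squared Euclidean distance) to input x.
    `weights` has shape (rows, cols, dim). This all-nodes search is the
    step a GPU implementation parallelises; here it is vectorised."""
    d2 = ((weights - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(d2), d2.shape)

def som_update(weights, x, bmu, lr=0.5, sigma=1.0):
    """One training step: pull every node toward x, weighted by a
    Gaussian of its grid distance to the BMU."""
    rows, cols, _ = weights.shape
    gy, gx = np.mgrid[:rows, :cols]
    dist2 = (gy - bmu[0]) ** 2 + (gx - bmu[1]) ** 2
    h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
    return weights + lr * h * (x - weights)
```

Both steps are independent per node, which is exactly the structure that makes the SOM a good target for a CUDA kernel.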
A Novel Image Compression Approach: Inexact Computing (ijtsrd)
This work proposes a novel approach to digital image processing that relies on faulty computation to address some of the issues with discrete cosine transform (DCT) compression. The proposed system has three processing stages: the first employs an approximated DCT for picture compression to eliminate all compute-demanding floating-point multiplications and to execute DCT processing with integer additions and, in certain cases, logical right/left shifts. The second stage reduces the amount of data passed on from the first stage by removing frequencies that cannot be perceived by human senses. Finally, in order to reduce power consumption and delay, the third stage employs erroneous circuit-level adders for the DCT computation. A collection of structured pictures is compressed with the suggested three-level method for measurement. Various figures of merit, such as energy consumption, delay, peak signal-to-noise ratio, average difference, and absolute maximum difference, are compared with current compression techniques, and an error analysis is carried out to substantiate the simulation findings. The results indicate significant gains in energy and time reduction while retaining acceptable accuracy for image processing applications. Sonam Kumari | Manish Rai "A Novel Image Compression Approach-Inexact Computing" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-6, October 2022, URL: https://www.ijtsrd.com/papers/ijtsrd52197.pdf Paper URL: https://www.ijtsrd.com/engineering/electrical-engineering/52197/a-novel-image-compression-approachinexact-computing/sonam-kumari
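As a generic illustration of a multiplication-free transform stage (not the paper's specific DCT approximation), the 8-point Walsh-Hadamard transform uses only additions and subtractions, arranged as butterfly stages:

```python
import numpy as np

def wht8(x):
    """8-point Walsh-Hadamard transform via in-place butterflies.
    No multiplications at all -- the kind of arithmetic an approximate,
    multiplier-less DCT stage reduces to. Self-inverse up to a factor
    of 8 (H @ H = 8 * I)."""
    a = np.asarray(x, np.int64).copy()
    h = 1
    while h < 8:
        for i in range(0, 8, h * 2):
            for j in range(i, i + h):
                # Butterfly: sum and difference, evaluated before assignment.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a
```

Applying the transform twice recovers the input scaled by 8, so the inverse also needs no multiplier, only a final shift.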
Graphics processing units (GPUs) are increasingly being used for general-purpose computing applications due to their highly parallel and programmable nature. GPU computing uses the GPU alongside the CPU in a heterogeneous model, with the sequential CPU portion handling control flow and passing data to the GPU for parallel intensive computations. GPUs have evolved from fixed-function processors into fully programmable parallel processors. Many applications that require large amounts of parallelism and throughput can benefit from offloading work to the GPU. GPU architectures provide a high degree of parallelism through multiple stream processors that can execute the same instructions on different data sets. Software environments like CUDA and OpenCL allow general-purpose programming of GPUs for applications beyond graphics. Future improvements may include
The Graphics Processing Unit (GPU) is a processor or electronic chip for graphics. GPUs are massively parallel processors widely used for 3D graphics and many non-graphics applications. As the demand for graphics applications increases, the GPU has become indispensable, and its use has now matured to a point where there are countless industrial applications. This paper provides a brief introduction to GPUs, their properties, and their applications. Matthew N. O. Sadiku | Adedamola A. Omotoso | Sarhan M. Musa "Graphics Processing Unit: An Introduction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-1, December 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29647.pdf Paper URL: https://www.ijtsrd.com/engineering/electrical-engineering/29647/graphics-processing-unit-an-introduction/matthew-n-o-sadiku
A Review on Image Compression in Parallel using CUDA (IJERD Editor)
Nowadays images are very large, so they do not easily fit into applications, and image compression is required. Image compression algorithms are resource-intensive and take considerable time to complete. This problem can be overcome with a parallel implementation of the compression algorithm. CUDA (Compute Unified Device Architecture), NVIDIA's parallel computing platform, provides parallel execution through multi-threading on the GPU (Graphical Processing Unit), which has many cores to support parallel execution. Image compression can likewise be implemented in parallel using CUDA. Among the available algorithms, the DWT (Discrete Wavelet Transform) is best suited to parallel implementation because of its heavy mathematical computation and good compression results compared with other methods. This paper surveys different parallel techniques for image compression. Implementing an image compression algorithm on the GPU using CUDA performs the operations in parallel, so a large reduction in processing time is possible and the performance of image compression algorithms can be improved.
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY cscpconf
This paper presents a parallel approach to improve the time complexity problem associated with sequential algorithms. An image steganography algorithm in transform domain is considered for implementation. Image steganography is a technique to hide secret message in an image. With the parallel implementation, large message can be hidden in large image since it does not take much processing time. It is implemented on GPU systems. Parallel programming is done using OpenCL in CUDA cores from NVIDIA. The speed-up improvement
obtained is very good with reasonably good output signal quality, when large amount of data is processed
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHYcsandit
This paper presents a parallel approach to improve the time complexity problem associated
with sequential algorithms. An image steganography algorithm in transform domain is
considered for implementation. Image steganography is a technique to hide secret message in
an image. With the parallel implementation, large message can be hidden in large image since
it does not take much processing time. It is implemented on GPU systems. Parallel
programming is done using OpenCL in CUDA cores from NVIDIA. The speed-up improvement
obtained is very good with reasonably good output signal quality, when large amount of data is
processed
This document evaluates the performance of parallel computing systems using Pthreads and CUDA. It discusses:
1) Pthreads and CUDA are techniques for parallel processing on CPUs and GPUs respectively that can improve system performance over increasing clock frequency alone.
2) The paper assesses the performance behavior of Pthreads and CUDA in different conditions, finding CUDA provides better performance.
3) Suggestions are provided for optimizing performance with Pthreads and CUDA parallel programming approaches.
Real-Time Implementation and Performance Optimization of Local Derivative Pat...IJECEIAES
Pattern based texture descriptors are widely used in Content Based Image Retrieval (CBIR) for efficient retrieval of matching images. Local Derivative Pattern (LDP), a higher order local pattern operator, originally proposed for face recognition, encodes the distinctive spatial relationships contained in a local region of an image as the feature vector. LDP efficiently extracts finer details and provides efficient retrieval however, it was proposed for images of limited resolution. Over the period of time the development in the digital image sensors had paid way for capturing images at a very high resolution. LDP algorithm though very efficient in content-based image retrieval did not scale well when capturing features from such high-resolution images as it becomes computationally very expensive. This paper proposes how to efficiently extract parallelism from the LDP algorithm and strategies for optimally implementing it by exploiting some inherent General-Purpose Graphics Processing Unit (GPGPU) characteristics. By optimally configuring the GPGPU kernels, image retrieval was performed at a much faster rate. The LDP algorithm was ported on to Compute Unified Device Architecture (CUDA) supported GPGPU and a maximum speed up of around 240x was achieved as compared to its sequential counterpart.
Aerial image semantic segmentation based on 3D fits a small dataset of 1DIAESIJAI
Time restrictions and lack of precision demand that the initial technique be abandoned. Even though the remaining datasets had fewer identified classes than initially planned for the study, the labels were more accurate. Because of the need for additional data, a single network cannot categorize all the essential elements in a picture, including bodies of water, roads, trees, buildings, and crops. However, the final network gains some invariance in detecting these classes with environmental changes due to the different geographic positions of roads and buildings discovered in the final datasets, which could be valuable in future navigation research. At the moment, binary classifications of a single class are the only datasets that can be used for the semantic segmentation of aerial images. Even though some pictures have more than one classification, images of roads and buildings were only found in a significant number of samples. Then, the building datasets were pooled to produce a larger dataset and for the constructed models to gain some invariance on image location. Because of the massive disparity in sample size, road datasets needed to be integrated.
Nowadays modern computer GPU (Graphic Processing Unit) became widely used to improve the
performance of a computer, which is basically for the GPU graphics calculations, are now used not only
for the purposes of calculating the graphics but also for other application. In addition, Graphics
Processing Unit (GPU) has high computation and low price. This device can be treat as an array of SIMD
processor using CUDA software. This paper talks about GPU application, CUDA memory and efficient
CUDA memory using Reduction kernel. High-performance GPU application requires reuse of data inside
the streaming multiprocessor (SM). The reason is that onboard global memory is simply not fast enough to
meet the needs of all the streaming multiprocessor on the GPU. In addition, CUDA exposes the memory
space within the SM and provides configurable caches to give the developer the greatest opportunity of
data reuse.
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERcscpconf
This paper proposes about motion estimation in H.264/AVC encoder. Compared with standards
such as MPEG-2 and MPEG-4 Visual, H.264 can deliver better image quality at the same
compressed bit rate or at a lower bit rate. The increase in compression efficiency comes at the
expense of increase in complexity, which is a fact that must be overcome. An efficient Co-design
methodology is required, where the encoder software application is highly optimized and
structured in a very modular and efficient manner, so as to allow its most complex and time
consuming operations to be offloaded to dedicated hardware accelerators. The Motion
Estimation algorithm is the most computationally intensive part of the encoder which is simulated using MATLAB. The hardware/software co-simulation is done using system generator tool and implemented using Xilinx FPGA Spartan 3E for different scanning methods.
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDAIRJET Journal
This document discusses parallelizing genetic algorithms on GPUs using CUDA. It begins with an abstract and introduction on using GPUs for parallel computing due to their large number of cores compared to CPUs. The document then discusses using CUDA to program Nvidia GPUs and implement genetic algorithms and numeric analysis algorithms in parallel. It analyzes the performance of these algorithms by calculating a speedup factor from comparing serial CPU and parallel GPU computation times. Graphs of the results are plotted to visually compare CPU and GPU performance, helping determine where parallelization provides better results.
This document discusses parallelizing graph algorithms on GPUs for optimization. It summarizes previous work on parallel Breadth-First Search (BFS), All Pair Shortest Path (APSP), and Traveling Salesman Problem (TSP) algorithms. It then proposes implementing BFS, APSP, and TSP on GPUs using optimization techniques like reducing data transfers between CPU and GPU and modifying the algorithms to maximize GPU computing power and memory usage. The paper claims this will improve performance and speedup over CPU implementations. It focuses on optimizing graph algorithms for parallel GPU processing to accelerate applications involving large graph analysis and optimization problems.
Similar to GPU-BASED IMAGE SEGMENTATION USING LEVEL SET METHOD WITH SCALING APPROACH (20)
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Computer Science & Information Technology (CS & IT)
The level set method is well suited to physically-based simulations such as mud and water [4], but it requires solving partial differential equations (PDEs), and this solution carries a high computational cost. Many methods have been developed to deal with this problem. The two major approaches are the narrow band [5] and sparse field [6] techniques. With these methods, the PDE updates are calculated only around the zero level set, which yields a speed improvement, but sufficient acceleration could not be achieved. Later studies therefore focused on running the level set method in parallel, and parallel algorithms have been developed [7]. Multiprocessor architectures emerged in the last ten years and are now widely used in both industrial and academic research; many difficult problems can be solved, and implemented easily, on these architectures. Especially with the rapid advances in graphics cards, running the level set method in parallel has become an important factor in increasing speed.
Using the graphics processing unit (GPU) for general-purpose applications is not an approach that emerged only in recent years, but with the development of the CUDA architecture in 2007 it expanded rapidly. The CUDA architecture makes it possible to execute general-purpose applications without detailed knowledge of the graphics processor. GPU-based applications are used not only in the scientific field, but also in other fields that require high performance, such as image and video processing and fluid dynamics simulation [8].
The level set method began to run on the GPU in the early 2000s. The first GPU implementation of the level set method was presented by Rumpf and Strzodka [18] in 2001, and researchers continued to study this area until 2007 [19, 11]. With the release of the CUDA C programming language in 2007, high performance solutions have been obtained [12-15]. Two important studies on the level set method were presented by Robert et al. [16] and Jalba et al. [17]: Robert's method implements narrow-band approaches on the GPU, while Jalba's method implements sparse approaches on the GPU.
In this paper, a novel GPU-based level set method is presented. The proposed method does not include a PDE solution, so our algorithm significantly increases the speed of calculation. We use basic geometric transformations for curve evolution. This paper is organized as follows. Section 2 provides information on the new generation of GPUs and memory management. Section 3 explains the major details of the developed GPU-based algorithm. Finally, results and time measurements are presented.
2. CUDA ARCHITECTURE
Graphics processors have developed rapidly and become available for general-purpose applications; thus they have come to be used in many areas to improve application speed. The CUDA architecture was developed by the NVIDIA company, and it allows a large increase in computing performance. This performance difference is due to the fact that the GPU architecture is designed for parallel operations. Simple CPU (central processing unit) and GPU architectures are illustrated in figure 1. As shown in the illustration, the GPU has a large number of arithmetic logic units (ALUs), but its cache memory is small [18].
Figure 1. CPU and GPU architecture [18].
GPUs execute a large number of threads on a set of data at the same time; therefore they are only appropriate for data-parallel workloads, where successful results can be obtained. For example, if a program contains a lot of flow control, its calculation speed may decrease rather than increase [7].
2.1. CUDA C Programming Language
The CUDA C programming language introduces a small number of extensions to the C language and allows us to define C functions called kernels. With a kernel we can execute N functions in parallel. When a kernel is called, a unique thread ID is available; with this ID we can determine which thread is currently running within the kernel. Threads are organized in blocks and, in the same way as threads, blocks have an identification number, so which block and which thread are actually running can easily be determined in large data sets. However, the number of threads in a block is limited: current GPUs support up to 1024 threads per block. Threads and blocks can be one, two or three dimensional. Blocks are organized in grids. The general structure is illustrated in figure 2 [18].
Figure 2. Grid of the blocks [18].
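As a minimal sketch of this structure (the kernel and buffer names are illustrative, not from the paper), a vector-addition kernel shows how the block ID, block size and thread ID combine into a global index:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global index from the block ID, block size and thread ID.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1024;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    // One-dimensional grid: 4 blocks of 256 threads (within the 1024 limit).
    vecAdd<<<4, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[10] = %f\n", c[10]);   // 10 + 20 = 30
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```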
2.2. Memory Management
The CUDA architecture supports several memory types. Threads, blocks and the memory types are illustrated in figure 3. Each thread can access registers, local memory, shared memory, constant memory and global memory. The CPU part of the application can access global memory and constant memory. The memory types used are listed below.

Registers: Registers are defined within a thread and cannot be accessed from outside it. They are generally used to store local variables in a function and require no extra programming extension [8, 9, 10].

Local Memory: As with registers, local memory is valid only within the thread. It is defined by the __local__ keyword and is slower than registers [8].

Shared Memory: All of the threads in a block can access shared memory, so the values in the block can be read by any of its threads. In general, it is as fast as registers. It is defined by the __shared__ keyword [8, 9].

Global Memory: The whole application (both the CPU and GPU parts) can access global memory. It is defined by the __device__ keyword. Its data transfer speed is very slow [18].

Constant Memory: Constant memory can only be read by the GPU. Every running thread can read from this memory at the same time, so very fast data transfer is obtained [8, 10].
Figure 3. GPU hardware architecture and memory types [20].
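A short sketch of how these memory spaces are declared in CUDA C (the names are illustrative; __constant__, __shared__ and __device__ are the standard qualifiers, while per-thread locals are ordinary automatic variables):

```cuda
#include <cuda_runtime.h>

#define N 256

// Constant memory: read-only for the GPU, broadcast efficiently to threads.
__constant__ float coeff[5];

// Global memory declared with __device__: visible to every kernel,
// and to the host through the CUDA copy functions.
__device__ float bias[N];

__global__ void demo(const float *in, float *out)
{
    // Shared memory: one copy per block, accessible to all its threads.
    __shared__ float tile[N];

    int i = threadIdx.x;     // 'i' lives in a register (per-thread)
    tile[i] = in[i];         // stage global data in fast shared memory
    __syncthreads();         // wait until the whole tile is loaded

    // Each thread combines its value with a constant-memory coefficient.
    out[i] = tile[i] * coeff[i % 5] + bias[i];
}
```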
3. DEVELOPED METHODS
In this paper, a novel level set method based on a scaling approach for extracting objects from images is proposed. In this section, the developed GPU algorithm is discussed. The application is implemented in the Visual Studio 2012 environment with the CUDA C language. The developed method consists of the following five stages; these stages and the devices they run on are presented in figure 4.
Figure 4. General algorithm of the developed method.
3.1. Preparation Stage
In this stage, the user can select any picture by means of the developed user interface. After the image is obtained from the user, it is sent to the graphics processor. The OpenGL library is used to display the picture to the user. This process is performed by the CPU.
3.2. Pre-processing Stage
In this stage some preprocessing techniques are applied to the image. The first is noise reduction; we use Gaussian smoothing for this purpose [21]. The Gaussian smoothing operator is a 2-D convolution operator that is used to remove detail and noise, and we use a 5x5 Gaussian filter. The second technique is conversion from color to grayscale: the developed method operates on gray level images, so a gray level transformation is applied to color images at this point. The final technique is the Sobel operator, which is used in image processing, particularly in edge detection algorithms. This technique supports the "EDGE" algorithm mentioned in section 3.4. These three techniques execute on the GPU.
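As an illustration of this stage (a sketch, not the paper's exact code; the luminance weights are the common ITU-R BT.601 values), the grayscale conversion can be written as a per-pixel kernel:

```cuda
#include <cuda_runtime.h>

// Convert an RGB image (3 bytes per pixel) to a single-channel gray
// image; one thread per pixel, launched over a two-dimensional grid.
__global__ void rgbToGray(const unsigned char *rgb, unsigned char *gray,
                          int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    unsigned char r = rgb[3 * idx + 0];
    unsigned char g = rgb[3 * idx + 1];
    unsigned char b = rgb[3 * idx + 2];

    // Standard luminance weighting.
    gray[idx] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
}
```

The 5x5 Gaussian filter and the Sobel operator can be written as similar per-pixel kernels over the gray image.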
3.3. Determination of Contour and Center
Sometimes the initial contour is very important for finding object boundaries quickly. The developed application has been tested on MRI brain tumor images; here, we expect the tumor area to be roughly identified by an expert. The determined initial contour is drawn as a circle, and this curve is then sent to the graphics processor for curve evolution.

The center of the object is the main element of the curve evolution algorithm, but we have no prior knowledge of the object's center. Fortunately, it can be set by the system automatically or by an expert. In this paper, we prefer to have an expert determine the center, which provides a more accurate result. We also allow more than one center to be selected, so that concave shapes are determined successfully. An example is illustrated in figure 5: a U-shaped object with a concave region at the top. Figure 5a shows the initial condition, figure 5b shows the boundary found with one center, and figure 5c shows the boundary found with three centers. As shown in the figure, using more than one center is more accurate.
(a) Initial Condition
(b) Result with one center
(c) Result with three centers
Figure 5. Comparison of the use of a single center and multiple centers on a U-shaped object.
3.4. Contour Evolution
In the level set method, curve evolution is based on the solution of a partial differential equation, which causes the algorithm to run slowly. The algorithm developed for this paper uses a different approach for curve evolution.

The developed curve evolution method uses two-dimensional geometric transformations, namely scaling, together with the logic of binary search. Binary search is normally applied to a series of numbers, and it significantly reduces search time. In the developed algorithm, the next position of each point on the curve is determined individually. Instead of a set of numbers, we define a set of pixels, and approximately half of this set is eliminated in each cycle. The process continues until the edge is found or until there are no elements left in the set. The elements of the set are the pixels between a point on the contour and the shape center.
Figure 6. The movement of a pixel on the curve. (a) Initial state. (b) After first step. (c) After second step. (d) After third step. (e) After fourth step - edge found.
The algorithm is illustrated for a single pixel in figure 6. If we moved a contour point pixel by pixel, the running time of the edge detection process would be O(n); with this algorithm it is O(log n), where n is the number of pixels between the initial point and the center point. This process is repeated for all points on the curve, and in this way the evolution of the initial curve is obtained.
The algorithm uses scaling to move the pixels. At the beginning of the evolution, two scaling variables called "sr1" and "sr2" are defined and set to 1 and 2, respectively. The variable "sr1" holds the scaling factor applied in the previous cycle and "sr2" holds the current scaling factor. At the end of each cycle, the variables are updated by the following formulas according to the movement of each pixel.
    sr1 = sr2                                                              (1)

    sr2 = sr2 - |sr1 - sr2| / 2,   if the pixel is inside the shape
    sr2 = sr2 + |sr1 - sr2| / 2,   if the pixel is outside the shape       (2)
The developed GPU algorithm is given in Figure 7. The algorithm uses two functions. The first is the "EDGE" function, which reports whether or not a pixel is close to the edge. The other is the "INSIDE" function, which determines whether a pixel is inside or outside the shape.
Figure 7. The GPU algorithm of the proposed method.
The "EDGE" function uses the edge information calculated in the preprocessing stage and returns true or false. In the same way, the "INSIDE" function returns true or false; it uses the center values, the initial contour and the average density of the contour area to decide. Every pixel on the contour is processed by its own CUDA thread, which accelerates the application. The curve evolution algorithm applied to an MRI liver image is illustrated in Figure 8. The image size is 512x512 and, as the illustration shows, the contour evolution is very fast.
(a) Initial state (b) Cycle-1 (c) Cycle-2 (d) Cycle-3 (e) Cycle-5 (f) Cycle-7
Figure 8. An example of the curve evolution algorithm.
3.5. Presentation
At this stage, the entire calculation has been completed. Final contour and preprocessed image
copy from GPU to CPU memory. Then, contour points and image are combined and presented to
the user.
4. RESULTS
In this section we present the experimental results obtained with the proposed methods. All experiments were performed on a machine equipped with an Intel Core i7-3770 CPU, 8GB of RAM and a GeForce GTX 660 Ti GPU. The GPU has 7 streaming multiprocessors (SMs) and each SM has 192 CUDA cores, which means 1344 computing cores per chip. The number of registers per multiprocessor is 65536 and the total amount of constant memory is 64KB. The amount of shared memory per multiprocessor is 48KB, organized into 32 banks. The 2GB of global memory is accessed through a GDDR5 interface. The architecture supports double-precision floating-point arithmetic.
In this study, we use MRI brain tumor images. The resolution of the images is 512x512. The results are illustrated in Figure 9. As shown in the figure, the tumor has been found with a high success rate.
Figure 9. The results of the proposed method.
The developed application has been tested with different settings on images with different resolutions. First, the preprocessing operation is tested with three different parameters: block number, thread number and image resolution. The results are shown in Table 1, which indicates that the execution time depends mainly on the image resolution, but block number and thread number are also very important for performance optimization. As shown in the second and last rows of Table 1, when the thread number is selected very low, the execution time increases.
Table 1. Execution time of the preprocessing stage.
Image Resolution   Block Number   Thread Number   Execution Time (ms)
256x256            16x16          16x16 (256)     0.36
256x256            32x32          8x8 (64)        0.70
256x256            8x8            32x32 (1024)    0.36
256x256            16x8           16x32 (512)     0.36
512x512            16x16          32x32 (1024)    1.34
512x512            32x32          16x16 (256)     1.33
512x512            16x32          32x16 (512)     1.32
512x512            32x64          16x8 (128)      1.33
1024x1024          64x64          16x16 (256)     5.14
1024x1024          64x32          16x32 (512)     5.19
1024x1024          32x32          32x32 (1024)    5.19
1024x1024          128x64         8x16 (128)      5.36
Second, the curve evolution algorithm is tested with different settings using four parameters: block number, thread number, image resolution and number of contour points. The results are shown in Table 2. Since the curve evolution algorithm is iterative, we give only one cycle's execution time; the times in Table 2 are in milliseconds and correspond to the slowest cycle. As shown in Table 2, the execution time does not change with the image resolution, because the image is used only for lookups: threads access image data in global memory only when necessary. We use the contour points as the primary data, but as Table 2 shows, their number generally has no effect on the execution time. There are two reasons for this. First, the complexity of each thread is O(1): each thread executes its own code, which contains no loop. Second, and more importantly, the capacity of the video card is not exhausted in this process; as Table 2 shows, the thread and block numbers are very small. The important factor for the execution time is to select a sufficient number of threads. For example, in the second, fourth and sixth rows of Table 2 we select a 4x4 thread number with no effect on performance. On the other hand, when the number of contour points increases, a 4x4 thread number is no longer sufficient and the execution time increases: in the seventh row of Table 2, we use 4096 contour points and a 4x4 thread number is not enough.
A key decision for performance optimization in CUDA programming is not only the choice of block and thread numbers but also the choice of memory type. With only a small change in the type of memory used, great acceleration can be achieved. We use shared and constant memory in this study; the main aim is to maximize memory bandwidth. For this, in the preprocessing stage we use shared memory for the image convolution, and constant memory is used in all GPU-based functions.
Table 2. Execution time of the curve evolution algorithm (for one cycle).
Image Resolution   Block Number   Thread Number   Contour Point Number   Execution Time (ms)
256x256            4x4            8x8             1024                   0.058
256x256            8x8            4x4             1024                   0.055
256x256            4x2            4x8             256                    0.057
256x256            4x4            4x4             256                    0.054
512x512            4x4            8x8             1024                   0.058
512x512            8x8            4x4             1024                   0.056
512x512            16x16          4x4             4096                   0.090
512x512            8x8            8x8             4096                   0.059
1024x1024          4x8            16x8            4096                   0.060
1024x1024          4x4            16x16           4096                   0.060
5. CONCLUSIONS
In this paper, a novel GPU-based level set method with a scaling approach has been presented. We followed a heterogeneous programming technique for this study: some steps were performed on the CPU side and others on the GPU side. Minor procedures and user inputs were handled on the CPU side, while operations that require heavy computation, such as curve evolution and preprocessing, were performed on the GPU side. Thus, we achieved very fast execution times.
In the developed method, the curve evolution algorithm uses basic geometric transformations such as scaling, so the algorithm runs much faster. As a result, all operations, including all memory operations, were completed in about 2 milliseconds. This shows that the developed method can perform in real time.
Authors
Zafer Güler is a research assistant in the Department of Software Engineering at Firat University, Elazig, where he has been a faculty member since 2010. He completed his master's degree in the Computer Engineering department of Firat University. His research interests are GPU programming, level set methods and image segmentation.
Ahmet Çınar was born in Elazig in 1972. He received his BSc in 1993 and the PhD degree in Electrical-Electronics Engineering in 2003 from Firat University. He works in the Department of Computer Engineering at Firat University as an Assistant Professor. His research interests include the development and improvement of mesh generation methods, and applications of virtual reality, augmented reality, artificial intelligence and game programming.