In this work, we introduce real-time image processing techniques using modern programmable Graphics Processing Units (GPUs). GPUs are SIMD (Single Instruction, Multiple Data) devices that are inherently data-parallel. By utilizing NVIDIA's GPU programming framework, “Compute Unified Device Architecture” (CUDA), as a computational resource, we achieve significant acceleration in image processing computations. We show that a range of computer vision algorithms maps readily to CUDA with significant performance gains. Specifically, we demonstrate the efficiency of our approach through the parallelization and optimization of image processing, morphology operations, and the integral image.
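One of the workloads mentioned above, the integral image (summed-area table), illustrates why such algorithms map well to data-parallel hardware: it is a pair of prefix sums, and any box sum then costs four lookups. A minimal sequential NumPy reference (an illustrative sketch, not the paper's CUDA code):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0:y+1, 0:x+1].

    On a GPU this is typically a row-wise parallel prefix sum followed
    by a column-wise one; here NumPy's cumsum serves as a reference.
    """
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] in O(1) via four corner lookups."""
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total
```

Because every box sum becomes constant-time after the table is built, filters such as box blurs and Haar-like features become memory-bound lookups, which is where the GPU's bandwidth pays off.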
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS (cseij)
In this paper we study the NVIDIA graphics processing unit (GPU) along with its computational power and applications. Although these units are specially designed for graphics applications, we can employ their computational power for non-graphics applications too. The GPU has high parallel processing power, a low cost of computation, and low time utilization; it gives a good performance-per-energy ratio. This property of deploying the GPU for heavy computation over similar small sets of instructions plays a significant role in reducing CPU overhead. The GPU has several key advantages over the CPU architecture, as it provides high parallelism, intensive computation, and significantly higher throughput. It consists of thousands of hardware threads that execute programs in a SIMD fashion; hence the GPU can be an alternative to the CPU in high-performance and supercomputing environments. The bottom line is that GPU-based general-purpose computing is a hot topic of research, and there is much to explore beyond graphics processing applications.
Nowadays the modern computer GPU (Graphics Processing Unit) has become widely used to improve the performance of a computer. GPUs, designed basically for graphics calculations, are now used not only for graphics but also for other applications. In addition, the Graphics Processing Unit (GPU) offers high computational power at a low price. This device can be treated as an array of SIMD processors using CUDA software. This paper discusses GPU applications, CUDA memory, and efficient CUDA memory use with a reduction kernel. A high-performance GPU application requires reuse of data inside the streaming multiprocessor (SM), because onboard global memory is simply not fast enough to meet the needs of all the streaming multiprocessors on the GPU. In addition, CUDA exposes the memory space within the SM and provides configurable caches to give the developer the greatest opportunity for data reuse.
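The reduction kernel and shared-memory reuse described above follow a classic tree-shaped pattern. The Python model below (the name BLOCK and the two-pass structure are illustrative assumptions, mirroring a typical CUDA sum reduction rather than any specific code from the paper) shows why the data fits in fast per-SM memory: each block touches only its own small slice.

```python
BLOCK = 8  # illustrative stand-in for a CUDA thread-block size

def block_reduce(data):
    """Two-pass sum reduction modelled on a classic CUDA kernel.

    Pass 1: each 'block' copies its slice into a local list (the
    stand-in for shared memory) and performs a tree reduction,
    halving the number of active elements at every step.
    Pass 2: the per-block partial sums are combined; a real kernel
    would launch a second reduction over the partials.
    """
    partials = []
    for start in range(0, len(data), BLOCK):
        shared = list(data[start:start + BLOCK])  # 'shared memory'
        n, stride = len(shared), 1
        while stride < n:
            for i in range(0, n - stride, 2 * stride):
                shared[i] += shared[i + stride]
            stride *= 2
        partials.append(shared[0])
    return sum(partials)
```

In the real kernel, every `shared[i] += shared[i + stride]` step runs across the block's threads in parallel, so each block finishes in O(log BLOCK) steps instead of O(BLOCK).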
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE (ijcsity)
It has been shown that neural networks (NNs) achieve excellent performance in image compression and reconstruction. However, there are still many shortcomings in practical applications, which eventually limit the image processing ability of neural networks. Based on this, a joint framework combining a neural network with scale compression is proposed in this paper. The framework first encodes the incoming PNG image information; the image is then converted into binary and fed into a decoder to reconstruct an intermediate-state image; next, the intermediate-state image is imported into a zooming compressor and recompressed to reconstruct the final image. The experimental results show that this method processes digital images better and suppresses the reverse-expansion problem, and the compression effect can be improved by 4 to 10 times compared with using an RNN alone, showing better ability in application. When the method is applied to transmitted digital images, the effect is far better than that of existing compression methods alone, and the human visual system cannot perceive the change.
This project deals with the warehouse-scale computers (WSCs) that power all the internet services we use today. It covers the hardware blocks used in a Google WSC, as well as the architecture of hardware accelerators such as the Graphics Processing Unit and the Tensor Processing Unit, which are highly useful for warehouse-scale machines to run heavy tasks and to support application-specific machine learning and deep learning workloads. The project also explains the energy efficiency of the processors used by the Google WSC to achieve high performance, and the performance-enhancement mechanisms the WSC employs.
The Graphics Processing Unit (GPU) is a processor or electronic chip for graphics. GPUs are massively parallel processors widely used for 3D graphics and many non-graphics applications. As the demand for graphics applications increases, the GPU has become indispensable. The use of GPUs has now matured to a point where there are countless industrial applications. This paper provides a brief introduction to GPUs, their properties, and their applications. Matthew N. O. Sadiku, Adedamola A. Omotoso, and Sarhan M. Musa, "Graphics Processing Unit: An Introduction", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume 4, Issue 1, December 2019. URL: https://www.ijtsrd.com/papers/ijtsrd29647.pdf Paper URL: https://www.ijtsrd.com/engineering/electrical-engineering/29647/graphics-processing-unit-an-introduction/matthew-n-o-sadiku
CUDA Based Performance Evaluation of the Computational Efficiency of the DCT ... (acijjournal)
Recent advances in computing, such as massively parallel GPUs (Graphical Processing Units), coupled with the need to store and deliver large quantities of digital data, especially images, have brought a number of challenges for computer scientists, the research community, and other stakeholders. These challenges, such as the prohibitively large cost of manipulating digital data, have been the focus of the research community in recent years and have led to the investigation of image compression techniques that can achieve excellent results. One such technique is the Discrete Cosine Transform (DCT), which separates an image into parts of differing frequencies and has the advantage of excellent energy compaction. This paper investigates the use of the Compute Unified Device Architecture (CUDA) programming model to implement the Cordic-based Loeffler DCT algorithm for efficient image compression. The computational efficiency is analyzed and evaluated on both the CPU and the GPU. PSNR (Peak Signal-to-Noise Ratio) is used to evaluate image reconstruction quality. The results are presented and discussed.
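The two quantities the abstract above relies on, the 2-D DCT and PSNR, can be sketched compactly in NumPy. Note this is the plain orthonormal DCT-II expressed as a matrix product, not the Cordic-based Loeffler variant the paper implements:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that Y = C @ X @ C.T."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(n)       # DC row scaling
    C[1:] *= np.sqrt(2 / n)      # AC row scaling
    return C

def dct2(block):
    """Forward 2-D DCT of a square block (separable: rows then columns)."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse 2-D DCT; C is orthonormal, so the inverse is C.T."""
    C = dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C

def psnr(original, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for a perfect match."""
    mse = np.mean((original - reconstructed) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```

Energy compaction shows up directly: for smooth blocks, most of the magnitude in `dct2(block)` concentrates in the low-frequency (top-left) coefficients, which is what makes coefficient truncation an effective compression step.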
The complexity of medical image reconstruction requires tens to hundreds of billions of computations per second. Until a few years ago, special-purpose processors designed especially for such applications were used. Such processors require significant design effort, are thus difficult to change as new reconstruction algorithms evolve, and have limited parallelism. Hence the demand for flexibility in medical applications motivated the use of stream processors with massively parallel architectures. Stream processing architectures offer a data-parallel kind of parallelism.
GPU-BASED IMAGE SEGMENTATION USING LEVEL SET METHOD WITH SCALING APPROACH (csandit)
In recent years, with the development of graphics processors, graphics cards have been widely used to perform general-purpose calculations. Especially since the release of the CUDA C programming language in 2007, most researchers have used CUDA C for processes that need high-performance computing. In this paper, a scaling approach for image segmentation using level sets is carried out with GPU programming techniques. Level-set approaches are mainly based on the solution of partial differential equations; the proposed method does not require the solution of a partial differential equation. Instead, a scaling approach, which uses basic geometric transformations, is used, so the required computational cost is reduced. The use of CUDA programming on the GPU has an advantage over classic programming in both runtime and performance, so results are obtained faster, and the use of the GPU enables real-time processing. The application developed in this study is used to find tumors in MRI brain images.
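The abstract gives no pseudocode for the scaling step. Purely as an illustration of the idea of replacing PDE time-stepping with a basic geometric transformation, the hypothetical sketch below grows or shrinks the zero contour of a signed-distance level set by resampling it under a scaling about a chosen centre; every name and detail here is an assumption, not the authors' method:

```python
import numpy as np

def circle_sdf(shape, center, radius):
    """Signed-distance level set: negative inside, zero on the contour."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.hypot(ys - center[0], xs - center[1]) - radius

def scale_level_set(phi, center, s):
    """Grow (s > 1) or shrink (s < 1) the zero contour by resampling phi
    under the inverse scaling about `center`: a geometric transform in
    place of a PDE time step.  Nearest-neighbour sampling keeps it short;
    each output pixel is independent, hence one GPU thread per pixel.
    """
    ys, xs = np.mgrid[0:phi.shape[0], 0:phi.shape[1]]
    src_y = np.clip(np.round(center[0] + (ys - center[0]) / s),
                    0, phi.shape[0] - 1).astype(int)
    src_x = np.clip(np.round(center[1] + (xs - center[1]) / s),
                    0, phi.shape[1] - 1).astype(int)
    return phi[src_y, src_x]
```

The appeal of such a step is cost: a resampling pass is one gather per pixel, whereas a PDE-based update needs finite-difference stencils and many small time steps.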
Accelerating Real Time Applications on Heterogeneous Platforms (IJMER)
In this paper we describe novel implementations of depth estimation from stereo images using feature extraction algorithms that run on the graphics processing unit (GPU), which is suitable for real-time applications such as analyzing video in real-time vision systems. Modern graphics cards contain a large number of parallel processors and high-bandwidth memory for accelerating data computation operations. We give a general idea of how to accelerate real-time applications using heterogeneous platforms, and we propose using the added resources to apply more computationally involved optimization methods. This approach can also indirectly accelerate a database by producing better plan quality.
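As context for the depth estimation mentioned above: once a pixel's disparity d is known, depth follows from Z = f·B/d (focal length times stereo baseline over disparity). A brute-force block-matching sketch is shown below; the window size and SAD cost are illustrative assumptions, not the paper's feature-based method. Every pixel's search is independent, which is exactly what makes it GPU-friendly:

```python
import numpy as np

def disparity_sad(left, right, max_disp, win=1):
    """Per-pixel disparity by sum-of-absolute-differences matching.

    For each pixel in the left image, compare its (2*win+1)^2 window
    against horizontally shifted windows on the same scanline of the
    right image, and keep the shift with the lowest SAD.  A GPU version
    assigns one thread per pixel, since no pixel depends on another.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(win, h - win):
        for x in range(win + max_disp, w - win):
            patch = left[y - win:y + win + 1, x - win:x + win + 1]
            costs = [np.abs(patch - right[y - win:y + win + 1,
                                          x - d - win:x - d + win + 1]).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

Feature-based pipelines like the paper's replace the dense window search with matching of sparse descriptors, but the per-element independence, and hence the parallelization strategy, is the same.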
Benchmark of common AI accelerators: NVIDIA GPU vs. Intel Movidius (byteLAKE)
The document summarizes byteLAKE's basic benchmark results for two different setups of example edge devices: one with an NVIDIA GPU and one with Intel's Movidius cards.
Key takeaway: the comparison of Movidius and NVIDIA as two competing accelerators for AI workloads leads to the conclusion that the two are meant for different tasks.
Abstract
The purpose of this review paper is to show the difference between executing the seam carving algorithm with a sequential approach on a traditional CPU (central processing unit) and with a parallel approach on a modern CUDA (Compute Unified Device Architecture) enabled GPU (graphics processing unit). Seam carving is a content-aware image resizing method proposed by Avidan and Shamir of MERL [1]. It functions by identifying seams, or paths of least importance, through an image; these seams can be removed or inserted in order to change the size of the image. The success of this algorithm depends on several factors: the number of objects in the picture, the size of the monotonous background, and the energy function. The purpose of the algorithm is to reduce image distortion in applications where images cannot be displayed at their original size. CUDA is a parallel architecture for GPUs, developed in 2007 by the Nvidia Corporation. Besides their primary function, i.e. rendering of graphics, GPUs can also be used for general-purpose computing (GPGPU). A CUDA-enabled GPU helps its user harness massive parallelism in regular computations: if an algorithm can be parallelized, the use of GPUs significantly improves performance and reduces the load on the central processing units (CPUs). The implementation of seam carving uses massive matrix calculations, which can be performed in parallel to speed up the execution of the algorithm as a whole. The entire algorithm itself cannot be run in parallel, so some part of it necessarily needs a CPU for sequential computations.
Keywords: Seam Carving, CUDA, Parallel Processing, GPGPU, CPU, GPU, Parallel Computing.
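The "massive matrix calculations" in seam carving are the energy map and the dynamic-programming pass that finds the minimum-energy seam. A compact sequential sketch follows; within each DP row the entries are independent (that per-row parallelism is what a GPU version exploits), while the rows themselves must proceed in order, which is the sequential part the abstract refers to:

```python
import numpy as np

def energy(img):
    """Simple gradient-magnitude energy (the e1 function of Avidan and
    Shamir): absolute forward differences in both directions."""
    gy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    gx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    return gx + gy

def min_vertical_seam(e):
    """DP: M[y, x] = e[y, x] + min of the three neighbours above,
    then backtrack from the cheapest bottom cell."""
    h, w = e.shape
    M = e.astype(float)
    for y in range(1, h):
        above = M[y - 1]
        left = np.r_[np.inf, above[:-1]]
        right = np.r_[above[1:], np.inf]
        M[y] += np.minimum(above, np.minimum(left, right))
    seam = [int(np.argmin(M[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam.append(lo + int(np.argmin(M[y, lo:hi])))
    return seam[::-1]  # one column index per row, top to bottom
```

Removing the seam then means deleting one pixel per row, and repeating the whole pass shrinks the image by one column at a time.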
Nvidia (History, GPU Architecture and New Pascal Architecture), by Saksham Tanwar
This presentation focuses on Nvidia GPUs and explores what a GPU is, its basic architecture, how it differs from a CPU, its basic working, and what Nvidia has to offer in the consumer as well as the server market.
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY (cscpconf)
This paper presents a parallel approach to improve the time-complexity problem associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique to hide a secret message in an image. With the parallel implementation, a large message can be hidden in a large image, since it does not take much processing time. It is implemented on GPU systems; parallel programming is done using OpenCL on CUDA cores from NVIDIA. The speed-up improvement obtained is very good, with reasonably good output signal quality, when a large amount of data is processed.
Real-Time Implementation and Performance Optimization of Local Derivative Pat... (IJECEIAES)
Pattern-based texture descriptors are widely used in Content-Based Image Retrieval (CBIR) for efficient retrieval of matching images. The Local Derivative Pattern (LDP), a higher-order local pattern operator originally proposed for face recognition, encodes the distinctive spatial relationships contained in a local region of an image as the feature vector. LDP efficiently extracts finer details and provides efficient retrieval; however, it was proposed for images of limited resolution. Over time, developments in digital image sensors have paved the way for capturing images at very high resolution. The LDP algorithm, though very efficient in content-based image retrieval, did not scale well when extracting features from such high-resolution images, as it becomes computationally very expensive. This paper proposes how to efficiently extract parallelism from the LDP algorithm, and strategies for implementing it optimally by exploiting some inherent General-Purpose Graphics Processing Unit (GPGPU) characteristics. By optimally configuring the GPGPU kernels, image retrieval was performed at a much faster rate. The LDP algorithm was ported onto a Compute Unified Device Architecture (CUDA) supported GPGPU, and a maximum speed-up of around 240x was achieved compared to its sequential counterpart.
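For orientation, the first-order relative of LDP, the Local Binary Pattern (LBP), can be sketched in a few lines; LDP extends this idea by comparing directional derivatives rather than raw intensities. The sketch below is illustrative, not the paper's implementation:

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 Local Binary Pattern.

    Each interior pixel gets an 8-bit code: bit i is 1 when neighbour i
    is >= the centre pixel.  Every pixel's code is independent, which is
    the same per-pixel parallelism a GPGPU port of LDP exploits.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    centre = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= centre).astype(int) << bit
    return codes
```

The retrieval feature vector is then typically a histogram of these codes over the image or over local regions, which is also a parallel-friendly reduction.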
Highlighted notes on Hybrid Multicore Computing
Written while doing research work under Prof. Dip Banerjee and Prof. Kishore Kothapalli.
In this comprehensive report, Prof. Dip Banerjee describes the benefit of utilizing both multicore systems (CPUs with vector instructions) and manycore systems (GPUs with a large number of low-speed ALUs). Such hybrid systems benefit several algorithms, since a single accelerator cannot be optimal for all parts of an algorithm (some computations are very regular, while others are very irregular).
International Journal of Image Processing (IJIP), Volume (8) : Issue (3) : 2014 66
Image Processing Application on Graphics processors
Chouchene Marwa ch.marwa.84@gmail.com
Laboratory of Electronics and Microelectronics (EμE)
Faculty of Sciences Monastir
Monastir, 5000, Tunisia
Bahri Haythem bahri.haythem@hotmail.com
Laboratory of Electronics and Microelectronics (EμE)
Faculty of Sciences Monastir
Monastir, 5000, Tunisia
Sayadi Fatma Ezahra sayadi_fatma@yahoo.fr
Laboratory of Electronics and Microelectronics (EμE)
Faculty of Sciences Monastir
Monastir, 5000, Tunisia
Atri Mohamed mohamed.atri@fsm.rnu.tn
Laboratory of Electronics and Microelectronics (EμE)
Faculty of Sciences Monastir
Monastir, 5000, Tunisia
Abstract
In this work, we introduce real-time image processing techniques using modern programmable Graphics Processing Units (GPUs). The GPU is a SIMD (Single Instruction, Multiple Data) device and is therefore inherently data-parallel. By utilizing NVIDIA's GPU programming framework, the Compute Unified Device Architecture (CUDA), as a computational resource, we achieve significant acceleration of image processing computations. We show that a range of computer vision algorithms maps readily to CUDA with significant performance gains. Specifically, we demonstrate the efficiency of our approach through the parallelization and optimization of basic image processing, morphology applications, and the integral image.
Keywords: Image Processing, GPU, CUDA.
1. INTRODUCTION
The graphics processing unit (GPU) has become an integral part of today’s mainstream
computing systems. The modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory bandwidth that
substantially outpaces its CPU counterpart.
The increasing speed and programmability of the GPU make it possible to tackle larger and more complex problems. This computational capability has positioned the GPU as an alternative to traditional microprocessors in high-performance computing systems.
CUDA (Compute Unified Device Architecture) is a general architecture for parallel computing introduced by NVIDIA in November 2007 [1]. It includes a new programming model, a new architecture, and a different instruction set.
Most image processing algorithms apply the same operation to every pixel, and with this new technology it is possible to process all pixels in parallel. Graphics cards are well suited to this task because they are capable of executing the same function a large number of times in parallel. In this paper, we implement several image processing algorithms on the GPU.
The paper is organized as follows: Section 2 presents an overview of the GPU, Section 3 introduces CUDA programming, Section 4 deals with image processing on the GPU, and Section 5 concludes the paper.
2. GPU OVERVIEW
It has been about ten years since the birth of hardware dedicated to graphics processing that offloads the CPU: the GPU, or Graphics Processing Unit. GPUs sit on a dedicated card that takes charge of image and 3D data processing as well as display. Because their architecture is designed for massively parallel data processing, they have become a co-processor usable by many applications.
Graphics processors are now used to accelerate graphics and some general-purpose applications with high data parallelism (GPGPU). To achieve good performance, the parallel data are processed by a large number of processing units integrated within the GPU, organized according to a single instruction, multiple data (SIMD) or single program, multiple data (SPMD) model. These include arithmetic units, load and interpolation units, data pre-processing units, and, finally, evaluation units for basic functions.
Indeed, these architectures have evolved toward more programmable systems: first through shaders, which remained very specific to 2D or 3D graphics, and more recently through general-purpose kernels. In addition, a cluster of GPUs can be both faster and less energy-hungry than a cluster of CPUs [2], which makes GPU clusters particularly attractive in the field of high-performance computing.
3. NVIDIA CUDA PROGRAMMING
Traditionally, general-purpose GPU programming was accomplished by using a shader-based
framework [3]. This framework has a steep learning curve that requires in-depth knowledge of
specific rendering pipelines and graphics programming. Algorithms have to be mapped into
vertex transformations or pixel illuminations. Data have to be cast into texture maps and operated on as if they were texture data. Because shader-based programming was originally intended for
graphics processing, there is little programming support for control over data flow; and, unlike a
CPU program, a shader-based program cannot have random memory access for writing data.
There are limitations on the number of branches and loops a program can have. All of these
limitations hindered the use of the GPU for general-purpose computing. In 2007, NVIDIA released CUDA, a new GPU programming model, to assist developers in general-purpose computing [4].
In the CUDA programming framework, the GPU is viewed as a compute device that is a co-processor to the CPU. The GPU has its own DRAM, referred to as device memory, and executes a very high number of threads in parallel. More precisely, data-parallel portions of an application
are executed on the device as kernels which run in parallel on many threads. In order to organize
threads running in parallel on the GPU, CUDA organizes them into logical blocks. Each block is
mapped onto a multiprocessor in the GPU. All the threads in one block can be synchronized
together and communicate with each other. Because there is a limited number of threads that a
block can contain, these blocks are further organized into grids allowing for a larger number of
threads to run concurrently as illustrated in Figure 1. Threads in different blocks cannot be
synchronized, nor can they communicate even if they are in the same grid. All the threads in the
same grid run the same GPU code.
FIGURE 1: Thread and Block Structure of CUDA.
CUDA has several advantages over the shader-based model. Because CUDA is an extension of
C, there is no longer a need to understand shader-based graphics APIs. This reduces the learning curve for most C/C++ programmers. CUDA also supports the use of memory pointers, which enables random memory read and write access. In addition, the CUDA framework
provides a controllable memory hierarchy which allows the program to access the cache (shared
memory) between GPU processing cores and GPU global memory. As an example, the
architecture of the GeForce 8 Series, the eighth generation of NVIDIA’s graphics cards, based on
CUDA is shown in Figure 2.
FIGURE 2: GeForce 8 Series GPU Architecture.
The GeForce 8 GPU is a collection of multiprocessors, each of which has 16 SIMD (Single
Instruction, Multiple Data) processing cores. The SIMD processor architecture allows each
processor in a multiprocessor to run the same instruction on different data, making it ideal for
data-parallel computing. Each multiprocessor has a set of 32-bit registers per processor, 16KB
of shared memory, 8KB of read-only constant cache, and 8KB of read-only texture cache. As
depicted in Figure 2, shared memory and cache memory are on-chip. The global memory and
texture memory that can be read from or written to by the CPU are also in the regions of device
memory. The global and texture memory spaces are persistent across all the multiprocessors.
4. GPU COMPUTATION IN IMAGE PROCESSING
In recent years, graphics card manufacturers have been highlighting the capacity of their GPUs to do more than run games; among these uses are image processing applications. In this section we implement several image processing applications on the GPU.
4.1 Transformation RGB to Gray
Our first application converts a color image to grayscale. Tests are performed on an RGB color image: we load the image, send it directly to memory as a texture, and then process it into a grayscale image, first on the CPU and then on the GPU.
For the remainder of our graphics processor tests, we ran the program while varying the image size. We measured the time taken to carry out this processing on both the CPU and the GPU; the results are as follows:
[Figure: processing time (ms) versus image size (128² to 2048²), with one curve for CPU time and one for GPU time.]
FIGURE 3: Image Processing of Various Sizes.
It should be noted that the execution time on the GPU remains lower than on the CPU. As the image size increases, the GPU time grows as well, which is due to the data transfer time.
4.2 Morphology Applications
In the following, we present the results of applying morphological operators (Sobel, Erode) to different images, implemented using the GPUCV library and compared with the OpenCV library. We compared the execution time of the native CPU operators with their GPU counterparts: GPUCV is up to several times faster than the native CPU version, as shown in Figure 4.
FIGURE 4: Sobel and Erode Transformation of Various Sizes.
4.3 Integral Image Calculation
The integral image was proposed by Paul Viola and Michael Jones in 2001 and has been used in real-time object detection [5]. The integral image is constructed from a grayscale version of the original image.
In [6], the integral image was extended and used to compute Haar-like features and center-surround characteristics. In the SURF algorithm [7][8], the integral image accelerates the computation of first- and second-order Gaussian derivatives. The CenSurE algorithm [9] and its improved version SUSurE [10] use two levels of filters to approximate the Laplace operator by means of the integral image.
We present our algorithm to calculate the integral image. The implementation steps are shown in
figure 5:
FIGURE 5: Algorithm as Implemented on Hardware.
In order to evaluate our parallel integral image implementation, we executed our algorithm for different image sizes on both the CPU and the GPU (Figure 6).
FIGURE 6: Integral Image For Various Sizes.
As can be seen from the results, compared to the corresponding CPU-based serial algorithm, our algorithm achieves a considerable reduction in running time. As the image size used in the experiment increases, the speedup increases as well. However, due to memory limitations on the video chip, the image size processed at one time cannot be increased without bound.
5. CONCLUSIONS
In this work, we focused on the use of general-purpose graphics cards (GPUs) as parallel computing machines, explaining their structure and the characteristics of their programming. Their increasing use in this context was illustrated through a wide range of application areas.
With the GPU, we responded to the needs generated by different computational applications. First, we accelerated a simple image processing operation. Second, we proposed a GPU adaptation of morphological algorithms (Sobel, Erode) and obtained clear time gains over a CPU implementation.
Finally, a parallel integral image algorithm was presented, implemented on the GPU, and compared with sequential CPU-based implementations. Performance results indicate that a significant speedup can be achieved.
6. REFERENCES
[1] NVIDIA Corporation, "NVIDIA CUDA Compute Unified Device Architecture Programming
Guide", Version 3, 2013.
[2] L. Abbas-Turki, S. Vialle, B. Lapeyre, and P. Mercier, "High Dimensional Pricing of Exotic European Contracts on a GPU Cluster, and Comparison to a CPU Cluster", in Second International Workshop on Parallel and Distributed Computing in Finance, May 2009.
[3] J. Allard and B. Raffin, "A shader-based parallel rendering framework", in IEEE Visualization 2005 (VIS 05), pp. 127-134.
[4] NVIDIA, CUDA Programming Guide Version 1.1. 2007, NVIDIA Corporation: Santa Clara,
California.
[5] P Viola, M Jones, "Rapid object detection using a boosted cascade of simple features",
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.2001.
[6] R Lienhart, A Kuranov, V Pisarevsky, "Empirical Analysis of Detection Cascades of Boosted
Classifiers for Rapid Object Detection", Pattern Recognition. 2003, vol. 2781, pp. 297-304.
[7] H Bay, A Ess, T Tuytelaars, L Van Gool, "Speeded-Up Robust Features (SURF)", Computer Vision and Image Understanding, 2008, vol. 110, no. 3, pp. 346–359.
[8] H Bay, T Tuytelaars, L Van Gool, "SURF: Speeded up robust features", Proceedings of the
European Conference on Computer Vision, 2006, Springer LNCS volume 3951, part 1, pp 404–
417.
[9] M Agrawal, K Konolige, M R Blas "Censure: Center surround extremas for realtime feature
detection and matching", 2008, ECCV (4), volume 5305 of Lecture Notes in Computer Science,
Springer. pp 102– 115.
[10] M Ebrahimi, W W Mayol-Cuevas, "SUSurE: Speeded Up Surround Extrema feature detector
and descriptor for realtime applications", IEEE Computer Society Conference on Computer Vision
and Pattern Recognition Workshops, 2009, CVPRW, pp.9-14.