SlideShare a Scribd company logo
1 of 4
Download to read offline
ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012



 Simple and Fast Implementation of Segmented Matrix
    Algorithm for Haar DWT on a Low Cost GPU
                                     Madeena Sultana1 and Nurul Muntasir Mamun2
                1
                 Dept. of Computer Science and Engineering, Jahangirnagar University, Dhaka, Bangladesh
                                              Email: deena.sultana@gmail.com
      2
        Dept. of Applied Physics Electronics and Communication Engineering, Dhaka University, Dhaka, Bangladesh
                                            Email: mamun.muntasir@yahoo.com


Abstract— Haar discrete wavelet transform (DWT), the                   sorts of high performance computers, for special purpose
simplest among all DWTs, has diverse applications in signal            hardware [2]-[4], for FPGAs [5][6] and for SIMD architectures
and image processing fields. A traditional approach for 2D             [7].Considerable amount of speedup is also achieved by
Haar DWT is 1D row operation followed by and 1D column                 employing GPUs with OpenGL and Cg-based implementations
operation. In 2002, Chen and Liao presented a fast algorithm
                                                                       for DWT computations [8]-[10].
for 2D Haar DWT based on segmented matrix. However, this
method is infeasible for its high computational requirements               However, GPU accelerated computation became especially
for processing large sized images. In this paper, we have              interesting since early 2007 when NVIDIA introduced CUDA
implemented the segmented matrix algorithm on a low cost               (Compute Unified Device Architecture) enabled GPUs, which
NVIDIA’s GPU to achieve speedup in computation. The                    offer massive parallel computation power. Providing many
efficiency of our GPU based implementation is measured and             hundreds of gigaflops of processing power current GPUs are
compared with CPU based algorithms. Our experimental                   leveraging the parallel computation in a more efficient way
results show performance improvement over a factor of 28.5             than on a CPU [11].
compared with Chen and Liao’s CPU based segmented matrix                    Being harnessed by many researches, these commodity
algorithm and a factor of 8 compared to MATLAB’s wavelet
                                                                       and readily available GPUs are providing dramatic
function for an image of size 2560×2560.
                                                                       computation speedup in various research fields. Joaquín
Index Terms—Haar discrete wavelet transform (DWT), CUDA,               Franco, Gregorio Bernabé, Juan Fernández and Manuel E.
GPU, segmented matrix algorithm, parallel discrete wavelet             Acacio [12] achieved significant speed up with NVIDIA’s
transform                                                              Tesla C870 over Intel’s Core 2 Quad Q6700 (2.66GHz). Vaclav
                                                                       Simek and Ram Rakesh Asn [13] used CUDA enabled GPU
             I. BACKGROUND    AND INTRODUCTION                         for accelerated 2D wavelet based image compression.
                                                                       Recently, Wladimir J. van der Laan, Andrei C. Jalba and Jos
    Discrete wavelet transforms (DWTs) has been used in a
                                                                       B.T.M. Roerdink [14] implemented a fast hybrid method for
wide range of signal and image processing applications such
                                                                       2D DWT on CUDA for both 2D images and 3D volume data.
as – image and video coding (MPEG-4 or JPEG 2000), pattern
                                                                           In this paper we have implemented the segmented matrix
recognition, image watermarking, medical image analysis etc.
                                                                       algorithm for 2D Haar wavelet transform on a low cost,
In traditional approach, 2D (two-dimensional) Haar DWT is
                                                                       commodity GPU. Our objective is to achieve computation
performed in two phase- one row operation, one column
                                                                       speed up to process large scaled images without increasing
operation, and column operation cannot be performed until
                                                                       computational complexity and cost.
the row operation is completed. Therefore, the speed of
computation degrades significantly. To address this problem,
                                                                                       II. TRADITIONAL COMPUTATION
Chen and Liao [1] proposed the segmented matrix algorithm
where computation is performed by data rearrangements and                  Haar DWT is the simplest since it only uses two low pass
one matrix multiplication. Therefore, this simple algorithm can        filter coefficients (1,1) and two high pass filter coefficients
produce the same results as traditional 2D Haar DWT with a             (1,-1). Haar wavelet transform in frequency domain can be
much faster speed. Moreover, it is highly suitable for parallel        obtained by addition and subtraction of the pixels of images.
implementation as only two rows are involved in computation            2D haar DWT decomposes an input image into four sub-
at a time.                                                             bands, one average component (WLL ) and three detail
    Nowadays large size images are common due to the                   components (WLH, WHL, WHH).
availability and advancement of image capturing technology.                Traditionally, 2D Haar wavelet transform can be
Therefore many wavelet based applications have to manage               accomplished by one row and one column operations where
large scaled image processing. Parallel computing is a direct          the result of row transform is the input of column transform.
way of speeding up these high computation requirements. A              Fig. 1 represents the 2D Haar wavelet transforms of a 4×4
significant amount of works have already been done for all             image.



© 2012 ACEEE                                                      32
DOI: 01.IJSIP.03.01.117
ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012


                                                                       The rearrangements are as follows -
                                                                       a) The elements in the first column of H are filled in WLL row
                                                                       by row.
                                                                       b) The elements in the second column of H are filled in WHL
                                                                       row by row.
                                                                       c) The elements in the third column of H are filled in WLH row
                                                                       by row.
                                                                       d) The elements in the fourth column of H are filled in WHH
                                                                       row by row.

                                                                                         IV. CUDA IMPLEMENTATION
                                                                           The CUDA platform is currently concentrating an
                                                                       enormous attention due to its tremendous potential of parallel
                                                                       processing. In November 2006, NVIDIA introduced CUDA
 Figure.1. 2D Haar DWT of a 4×4 image by traditional approach.
                                                                       with a new parallel programming model and instruction set
                                                                       architecture to solve many complex computational problems
             III. SEGMENTED MATRIX ALGORITHM
                                                                       very efficiently [11]. Each CUDA complainant device is a set
    Chen and Liao [1] proposed a computationally fast                  of multiprocessor cores where each core has SIMT (Single
algorithm called “segmented matrix algorithm” where 2D Haar            Instruction, Multiple Thread) architecture. Today four quad-
DWT can be performed by only one matrix multiplication                 core CPUs can run only 16 threads concurrently, whereas the
instead of two separate 1D transforms. The step by step                smallest executable parallel unit on a CUDA device comprised
process of this algorithm is as follows.                               of 32 threads. All CUDA enabled NVIDIA GPUs support at
Step 1: Consider I as the input image of size m×n. Form Bij=2×2        least 768 concurrently active threads per multiprocessor.
sub-blocks from original image I where i=1…m/2 and j=1…n/              Moreover, some GPUs can support 1,024 or more active
2. For example,                                                        threads per multiprocessor [11]. Devices comprise of 30
                                                                       multiprocessors (e.g. NVIDIAGeForce GTX 280), can support
                                                                       more than 30,000 active threads [15]. A good parallel
                                                                       implementation of an application on a GPU can achieve more
                                                                       than 100 times speedup over sequential execution [16].
Step 2: Z-scan each Bij and generate m×n row vectors Aij. For              In SIMT architecture of CUDA, a portion of a parallel
example,                                                               application executed many times independently on different
                                                                       data, by many threads running on different processors, at
                                                                       any given clock cycle. This parallel portion can be isolated
                                                                       into a function which is called kernel. A kernel is organized as
Step 3: Express these row matrices as an intermediate matrix           a set of thread blocks and each thread block is, in turn,
M.                                                                     organized as a three-dimensional array of threads. Threads
                                                                       within the same block can efficiently cooperate through shared
                                                                       memory and can synchronize with each other. Each thread
                                                                       has its own unique thread ID which is defined by the three
                                                                       thread indices: threadIdx.x, threadIdx.y, and threadIdx.z. Each
                                                                       block is identified by a unique two-dimensional coordinate
                                                                       given by the CUDA specific keywords blockIdx.x and
Step 4: Consider filter coefficient matrix                             blockIdx.y. All blocks must have the equal number of threads
                                                                       organized exactly in the same manner. The use of
                                                                       multidimensional identifiers simplifies memory addressing of
                                                                       multidimensional data. The block and grid dimensions,
                                                                       collectively known as execution configuration, can be set at
Find H= M×C.                                                           run-time.
                                                                           In our implementation we have used blocks each having
Step 5: Haar wavelet transform can be divided into four sub-           16×16 threads. The grid size is set at run-time according to
matrices of size
                   m
                     
                       n
                           ,                                           the size of input image. Our CUDA implementation consists
                   2   2                                               of the following steps:
                                                                       1. Copy image data from host memory to GPU memory.
   The rearrangement of the elements of H into four sub-
                                                                       2. Determine the execution configuration.
matrices will produce the resultant Haar wavelet transform
                                                                       3. GPU executes kernel to compute the elements of the
matrix W.
                                                                       intermediate matrix M on each core in a parallel fashion.

© 2012 ACEEE                                                      33
DOI: 01.IJSIP.03.01.117
ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012


4. The resulted matrix H is computed simultaneously in GPU.               Table I shows that the performance of CPU based segmented
5. Copy the result from GPU memory to host memory.                        matrix algorithm declined noticeably for large sized images,
    The CPU-based algorithm is implemented on Intel Pentium               although it performed better for small sized images. In contrast,
IV, 3.00GHz processor equipped with 512MB DDR2 RAM.                       GPU based implementation of this algorithm improved the
The GPU based algorithm is tested on NVIDIA GeForce                       performance for large over a factor of 10 to 28 for images
8500GT graphics card containing 16 cores, maximum 512                     sized 1024×1024 to 2560×2560. Moreover, it performed better
threads per block and 512 MB global memory.                               than MATLAB’s wavelet function for all small and large sized
                                                                          images. Therefore, among the three algorithms our GPU based
                  V. RESULTS AND DISCUSSION                               segmented matrix algorithm performed the best for high
                                                                          resolution images.
    To test the computational efficiency of our GPU based
                                                                              However, the main drawback of GPU computation is the
segmented matrix algorithm, we have taken images of different
                                                                          transfer time between the host memory and device memory.
sizes as inputs. Fig. 2 shows one level 2D Haar DWT of
                                                                          The time needed to copy data from the host’s memory to
256×256 lena image using CPU based and GPU based
                                                                          GPU’s global memory requires a large fraction of total
segmented matrix algorithm. For comparison we also have
                                                                          execution time. Therefore, if we exclude the data transfer time
considered MATLAB’s dwt2() function from wavelet toolbox
                                                                          from execution time, we would get significant speedup for
and the CPU implementation of segmented matrix algorithm.
                                                                          large sized images.

                                                                                                    CONCLUSIONS
                                                                              The widespread usage of the Haar Discrete Wavelet
                                                                          Transform (DWT) has motivated the implementation of a
                                                                          simple and low cost GPU based DWT algorithm. Our
                                                                          experimental results show that for an image of size 2560×2560,
                                                                          the GPU based segmented matrix algorithm is more than 28.5
                                                                          times faster than CPU computation including data transfer.
                                                                          Moreover, this GPU based method achieved approximately 8x
                                                                          speedup than the CPU based computation of MATLAB’s dwt2()
                                                                          for the same image. Due to the speedy calculations we believe
                  (a)                                 (b)                 that the ideas presented in this paper will have widespread
   Figure.2. One level 2D Haar DWT using (a) CPU based and (b)            applications in processing large sized images.
              GPU based segmented matrix algorithm.
 Table I represents the comparison of computing time of                                              REFERENCES
 MATLAB’s dwt2(), segmented matrix algorithm on CPU and
                                                                          [1] P. Y. Chen and E. C. Liao, “A new algorithm for Haar discrete
 on GPU with increasing size of input images.
                                                                          wavelet transform,” IEEE International Symposium on Intelligent
      TABLE.I. COMPUTATION TIME COMPARISON RELATIVE TO IMAGE SIZE         Signal Processing and Communication Systems, pp. 453-457, 2002.
                                                                          [2] M. Martina, G.Masera, G.Piccinini and M.Zamboni, “A VLSI
                                                                          Architecture for IWT (Integer Wavelet Transform),” Proc. of 43rd
                                                                          Midwest Symposium on Circuits and Systems, pp. 1174-1177,
                                                                          August 2000.
                                                                          [3]K. Haapala, P. Kolinummi, T. Hamalainen, and J.
                                                                          Saarinen, ”Parallel  DSP  implementation  of  wavelet  transform  in
                                                                          image compression,” Proc. of ISCAS IEEE International Symposium
                                                                          on Circuits and Systems, vol. 5. pp. 89-92, 2000.
                                                                          [4] Matthias Hopf and Thomas Ertl., “Hardware Accelerated
                                                                          Wavelet Transformations,” Proc. of EG/IEEE TCVG Symposium
                                                                          on Visualization VisSym, pp. 93–103, 2000.
                                                                          [5] C. Graves and C. Gloster, “Use of dynamically reconfigurable
                                                                          logic in adaptive wavelet packet applications,” Proc. of the 5th
                                                                          Canadian Workshop on Field-Programmable Devices, June 1998.
                                                                          [6] Baofeng Li, Yong Dou, Haifang Zhou, and Xingming Zhou,
                                                                          “FPGA accelerator for wavelet-based automated global image
                                                                          registration,” EURASIP J. Embedded Syst., pp. 1–10, 2009.
                                                                          [7] Mats Holmström, “Parallelizing the fast wavelet transform,”
                                                                          Parallel Computing, vol. 11(21), pp. 837-1848, April 1995.
                                                                          [8] T. T. Wong, C. S. Leung, P. A. Heng, and J. Wang, “Discrete
                                                                          wavelet transform on consumer-level graphics hardware,” IEEE
                                                                          Transactions on Multimedia, vol. 9(3), pp. 668–673, April 2007.


© 2012 ACEEE                                                         34
DOI: 01.IJSIP.03.01.117
ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012

[9] C. Tenllado, J. Setoain, M. Prieto, L. Piñuel, and F. Tirado,           [13] Vaclav Simek and Ram Rakesh Asn, “GPU Acceleration of
“Parallel Implementation of the 2D Discrete Wavelet Transform               2D-DWT Image Compression in MATLAB with CUDA,” Second
on Graphics Processing Units: Filter Bank versus Lifting,” IEEE             UKSIM European Symposium on Computer Modeling and
Trans. Parallel Distrib. Syst., vol. 19(3), pp. 299-310, 2008.              Simulation, pp.274-277, 2008.
[10] Antonio Garcia and Han-Wei Shen, “GPU-based 3D wavelet                 [14] Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M.
reconstruction with tileboarding,” The Visual Computer , vol. 21(8),        Roerdink, “Accelerating Wavelet Lifting on Graphics Hardware
pp. 755-763, September 2005.                                                Using CUDA,” IEEE Transactions on Parallel and Distributed
[11] “NVIDIA CUDA C Programming Guide 4.0,” Available at                    Systems, vol. 22(1), pp. 132-146, January 2011.
http://developer.nvidia.com/cuda-toolkit-3.1-downloads, accessed            [15] “CUDA C best practices guide,” Available at http://
August 02, 2011.                                                            developer.nvidia.com/cuda-toolkit-31-downloads, accessed August
[12] Joaquín Franco, Gregorio Bernabé, Juan Fernández, and Manuel           05, 2011.
E. Acacio, “A Parallel Implementation of the 2D Wavelet Transform           [16] David B. Kirk and Wen-mei W. Hwu, Programming massively
Using CUDA,” Proc. of International Conf. on Parallel, Distributed          parallel processors- a hands-on approach, Elsevier Inc., USA,
and Network-based Processing, pp.111-118, 2009.                             January 22, 2010.




© 2012 ACEEE                                                           35
DOI: 01.IJSIP.03.01.117

More Related Content

What's hot

FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...IJERA Editor
 
Lifting Scheme Cores for Wavelet Transform
Lifting Scheme Cores for Wavelet TransformLifting Scheme Cores for Wavelet Transform
Lifting Scheme Cores for Wavelet TransformDavid Bařina
 
IRJET- Digital Watermarking using Integration of DWT & SVD Techniques
IRJET- Digital Watermarking using Integration of DWT & SVD TechniquesIRJET- Digital Watermarking using Integration of DWT & SVD Techniques
IRJET- Digital Watermarking using Integration of DWT & SVD TechniquesIRJET Journal
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467IJRAT
 
Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...
Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...
Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...IJSRD
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Editor IJMTER
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABJournal For Research
 
A High Performance Modified SPIHT for Scalable Image Compression
A High Performance Modified SPIHT for Scalable Image CompressionA High Performance Modified SPIHT for Scalable Image Compression
A High Performance Modified SPIHT for Scalable Image CompressionCSCJournals
 
Satellite Image Resolution Enhancement Technique Using DWT and IWT
Satellite Image Resolution Enhancement Technique Using DWT and IWTSatellite Image Resolution Enhancement Technique Using DWT and IWT
Satellite Image Resolution Enhancement Technique Using DWT and IWTEditor IJCATR
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
0 nidhi sethi_finalpaper--1-5
0 nidhi sethi_finalpaper--1-50 nidhi sethi_finalpaper--1-5
0 nidhi sethi_finalpaper--1-5Alexander Decker
 
High Speed and Area Efficient 2D DWT Processor Based Image Compression
High Speed and Area Efficient 2D DWT Processor Based Image CompressionHigh Speed and Area Efficient 2D DWT Processor Based Image Compression
High Speed and Area Efficient 2D DWT Processor Based Image Compressionsipij
 
Iaetsd wavelet transform based latency optimized image compression for
Iaetsd wavelet transform based latency optimized image compression forIaetsd wavelet transform based latency optimized image compression for
Iaetsd wavelet transform based latency optimized image compression forIaetsd Iaetsd
 
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...VLSICS Design
 

What's hot (20)

An35225228
An35225228An35225228
An35225228
 
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
 
Lifting Scheme Cores for Wavelet Transform
Lifting Scheme Cores for Wavelet TransformLifting Scheme Cores for Wavelet Transform
Lifting Scheme Cores for Wavelet Transform
 
IRJET- Digital Watermarking using Integration of DWT & SVD Techniques
IRJET- Digital Watermarking using Integration of DWT & SVD TechniquesIRJET- Digital Watermarking using Integration of DWT & SVD Techniques
IRJET- Digital Watermarking using Integration of DWT & SVD Techniques
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
Ppt
PptPpt
Ppt
 
476 479
476 479476 479
476 479
 
Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...
Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...
Implementation of Vedic Multiplier in Image Compression Using Discrete Wavele...
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
 
A High Performance Modified SPIHT for Scalable Image Compression
A High Performance Modified SPIHT for Scalable Image CompressionA High Performance Modified SPIHT for Scalable Image Compression
A High Performance Modified SPIHT for Scalable Image Compression
 
Satellite Image Resolution Enhancement Technique Using DWT and IWT
Satellite Image Resolution Enhancement Technique Using DWT and IWTSatellite Image Resolution Enhancement Technique Using DWT and IWT
Satellite Image Resolution Enhancement Technique Using DWT and IWT
 
Ceis 4
Ceis 4Ceis 4
Ceis 4
 
R044120124
R044120124R044120124
R044120124
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
0 nidhi sethi_finalpaper--1-5
0 nidhi sethi_finalpaper--1-50 nidhi sethi_finalpaper--1-5
0 nidhi sethi_finalpaper--1-5
 
High Speed and Area Efficient 2D DWT Processor Based Image Compression
High Speed and Area Efficient 2D DWT Processor Based Image CompressionHigh Speed and Area Efficient 2D DWT Processor Based Image Compression
High Speed and Area Efficient 2D DWT Processor Based Image Compression
 
Iaetsd wavelet transform based latency optimized image compression for
Iaetsd wavelet transform based latency optimized image compression forIaetsd wavelet transform based latency optimized image compression for
Iaetsd wavelet transform based latency optimized image compression for
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
 

Similar to Simple and Fast Implementation of Segmented Matrix Algorithm for Haar DWT on a Low Cost GPU

nternational Journal of Computational Engineering Research(IJCER)
nternational Journal of Computational Engineering Research(IJCER)nternational Journal of Computational Engineering Research(IJCER)
nternational Journal of Computational Engineering Research(IJCER)ijceronline
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDAIJERA Editor
 
Reversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachReversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachIAEME Publication
 
Springer base paper
Springer base paperSpringer base paper
Springer base paperchndu
 
A robust combination of dwt and chaotic function for image watermarking
A robust combination of dwt and chaotic function for image watermarkingA robust combination of dwt and chaotic function for image watermarking
A robust combination of dwt and chaotic function for image watermarkingijctet
 
Fpga sotcore architecture for lifting scheme revised
Fpga sotcore architecture for lifting scheme revisedFpga sotcore architecture for lifting scheme revised
Fpga sotcore architecture for lifting scheme revisedijcite
 
International Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & EngineeringInternational Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & Engineeringpriyanka singh
 
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...INFOGAIN PUBLICATION
 
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...INFOGAIN PUBLICATION
 

Similar to Simple and Fast Implementation of Segmented Matrix Algorithm for Haar DWT on a Low Cost GPU (20)

nternational Journal of Computational Engineering Research(IJCER)
nternational Journal of Computational Engineering Research(IJCER)nternational Journal of Computational Engineering Research(IJCER)
nternational Journal of Computational Engineering Research(IJCER)
 
Ceis 5
Ceis 5Ceis 5
Ceis 5
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
 
Reversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachReversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approach
 
Ijetr011837
Ijetr011837Ijetr011837
Ijetr011837
 
imagefiltervhdl.pptx
imagefiltervhdl.pptximagefiltervhdl.pptx
imagefiltervhdl.pptx
 
Ju3417721777
Ju3417721777Ju3417721777
Ju3417721777
 
Hv2514131415
Hv2514131415Hv2514131415
Hv2514131415
 
Hv2514131415
Hv2514131415Hv2514131415
Hv2514131415
 
Springer base paper
Springer base paperSpringer base paper
Springer base paper
 
145 153
145 153145 153
145 153
 
A robust combination of dwt and chaotic function for image watermarking
A robust combination of dwt and chaotic function for image watermarkingA robust combination of dwt and chaotic function for image watermarking
A robust combination of dwt and chaotic function for image watermarking
 
K42016368
K42016368K42016368
K42016368
 
Fpga sotcore architecture for lifting scheme revised
Fpga sotcore architecture for lifting scheme revisedFpga sotcore architecture for lifting scheme revised
Fpga sotcore architecture for lifting scheme revised
 
Bf36342346
Bf36342346Bf36342346
Bf36342346
 
A280108
A280108A280108
A280108
 
International Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & EngineeringInternational Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & Engineering
 
Gh2411361141
Gh2411361141Gh2411361141
Gh2411361141
 
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
 
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
4 ijaems jun-2015-5-hybrid algorithmic approach for medical image compression...
 

More from IDES Editor

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A ReviewIDES Editor
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...IDES Editor
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...IDES Editor
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...IDES Editor
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCIDES Editor
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...IDES Editor
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingIDES Editor
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...IDES Editor
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsIDES Editor
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...IDES Editor
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...IDES Editor
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkIDES Editor
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetIDES Editor
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’sIDES Editor
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...IDES Editor
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance AnalysisIDES Editor
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesIDES Editor
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...IDES Editor
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...IDES Editor
 

More from IDES Editor (20)

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A Review
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFC
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive Thresholds
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability Framework
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

Simple and Fast Implementation of Segmented Matrix Algorithm for Haar DWT on a Low Cost GPU

  • 1. ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012 Simple and Fast Implementation of Segmented Matrix Algorithm for Haar DWT on a Low Cost GPU Madeena Sultana1 and Nurul Muntasir Mamun2 1 Dept. of Computer Science and Engineering, Jahangirnagar University, Dhaka, Bangladesh Email: deena.sultana@gmail.com 2 Dept. of Applied Physics Electronics and Communication Engineering, Dhaka University, Dhaka, Bangladesh Email: mamun.muntasir@yahoo.com Abstract— Haar discrete wavelet transform (DWT), the sorts of high performance computers, for special purpose simplest among all DWTs, has diverse applications in signal hardware [2]-[4], for FPGAs [5][6] and for SIMD architectures and image processing fields. A traditional approach for 2D [7].Considerable amount of speedup is also achieved by Haar DWT is 1D row operation followed by and 1D column employing GPUs with OpenGL and Cg-based implementations operation. In 2002, Chen and Liao presented a fast algorithm for DWT computations [8]-[10]. for 2D Haar DWT based on segmented matrix. However, this method is infeasible for its high computational requirements However, GPU accelerated computation became especially for processing large sized images. In this paper, we have interesting since early 2007 when NVIDIA introduced CUDA implemented the segmented matrix algorithm on a low cost (Compute Unified Device Architecture) enabled GPUs, which NVIDIA’s GPU to achieve speedup in computation. The offer massive parallel computation power. Providing many efficiency of our GPU based implementation is measured and hundreds of gigaflops of processing power current GPUs are compared with CPU based algorithms. Our experimental leveraging the parallel computation in a more efficient way results show performance improvement over a factor of 28.5 than on a CPU [11]. compared with Chen and Liao’s CPU based segmented matrix Being harnessed by many researches, these commodity algorithm and a factor of 8 compared to MATLAB’s wavelet and readily available GPUs are providing dramatic function for an image of size 2560×2560. computation speedup in various research fields. Joaquín Index Terms—Haar discrete wavelet transform (DWT), CUDA, Franco, Gregorio Bernabé, Juan Fernández and Manuel E. GPU, segmented matrix algorithm, parallel discrete wavelet Acacio [12] achieved significant speed up with NVIDIA’s transform Tesla C870 over Intel’s Core 2 Quad Q6700 (2.66GHz). Vaclav Simek and Ram Rakesh Asn [13] used CUDA enabled GPU I. BACKGROUND AND INTRODUCTION for accelerated 2D wavelet based image compression. Recently, Wladimir J. van der Laan, Andrei C. Jalba and Jos Discrete wavelet transforms (DWTs) has been used in a B.T.M. Roerdink [14] implemented a fast hybrid method for wide range of signal and image processing applications such 2D DWT on CUDA for both 2D images and 3D volume data. as – image and video coding (MPEG-4 or JPEG 2000), pattern In this paper we have implemented the segmented matrix recognition, image watermarking, medical image analysis etc. algorithm for 2D Haar wavelet transform on a low cost, In traditional approach, 2D (two-dimensional) Haar DWT is commodity GPU. Our objective is to achieve computation performed in two phase- one row operation, one column speed up to process large scaled images without increasing operation, and column operation cannot be performed until computational complexity and cost. the row operation is completed. Therefore, the speed of computation degrades significantly. To address this problem, II. TRADITIONAL COMPUTATION Chen and Liao [1] proposed the segmented matrix algorithm where computation is performed by data rearrangements and Haar DWT is the simplest since it only uses two low pass one matrix multiplication. Therefore, this simple algorithm can filter coefficients (1,1) and two high pass filter coefficients produce the same results as traditional 2D Haar DWT with a (1,-1). Haar wavelet transform in frequency domain can be much faster speed. Moreover, it is highly suitable for parallel obtained by addition and subtraction of the pixels of images. implementation as only two rows are involved in computation 2D haar DWT decomposes an input image into four sub- at a time. bands, one average component (WLL ) and three detail Nowadays large size images are common due to the components (WLH, WHL, WHH). availability and advancement of image capturing technology. Traditionally, 2D Haar wavelet transform can be Therefore many wavelet based applications have to manage accomplished by one row and one column operations where large scaled image processing. Parallel computing is a direct the result of row transform is the input of column transform. way of speeding up these high computation requirements. A Fig. 1 represents the 2D Haar wavelet transforms of a 4×4 significant amount of works have already been done for all image. © 2012 ACEEE 32 DOI: 01.IJSIP.03.01.117
  • 2. ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012 The rearrangements are as follows - a) The elements in the first column of H are filled in WLL row by row. b) The elements in the second column of H are filled in WHL row by row. c) The elements in the third column of H are filled in WLH row by row. d) The elements in the fourth column of H are filled in WHH row by row. IV. CUDA IMPLEMENTATION The CUDA platform is currently concentrating an enormous attention due to its tremendous potential of parallel processing. In November 2006, NVIDIA introduced CUDA Figure.1. 2D Haar DWT of a 4×4 image by traditional approach. with a new parallel programming model and instruction set architecture to solve many complex computational problems III. SEGMENTED MATRIX ALGORITHM very efficiently [11]. Each CUDA complainant device is a set Chen and Liao [1] proposed a computationally fast of multiprocessor cores where each core has SIMT (Single algorithm called “segmented matrix algorithm” where 2D Haar Instruction, Multiple Thread) architecture. Today four quad- DWT can be performed by only one matrix multiplication core CPUs can run only 16 threads concurrently, whereas the instead of two separate 1D transforms. The step by step smallest executable parallel unit on a CUDA device comprised process of this algorithm is as follows. of 32 threads. All CUDA enabled NVIDIA GPUs support at Step 1: Consider I as the input image of size m×n. Form Bij=2×2 least 768 concurrently active threads per multiprocessor. sub-blocks from original image I where i=1…m/2 and j=1…n/ Moreover, some GPUs can support 1,024 or more active 2. For example, threads per multiprocessor [11]. Devices comprise of 30 multiprocessors (e.g. NVIDIAGeForce GTX 280), can support more than 30,000 active threads [15]. A good parallel implementation of an application on a GPU can achieve more than 100 times speedup over sequential execution [16]. Step 2: Z-scan each Bij and generate m×n row vectors Aij. For In SIMT architecture of CUDA, a portion of a parallel example, application executed many times independently on different data, by many threads running on different processors, at any given clock cycle. This parallel portion can be isolated into a function which is called kernel. A kernel is organized as Step 3: Express these row matrices as an intermediate matrix a set of thread blocks and each thread block is, in turn, M. organized as a three-dimensional array of threads. Threads within the same block can efficiently cooperate through shared memory and can synchronize with each other. Each thread has its own unique thread ID which is defined by the three thread indices: threadIdx.x, threadIdx.y, and threadIdx.z. Each block is identified by a unique two-dimensional coordinate given by the CUDA specific keywords blockIdx.x and Step 4: Consider filter coefficient matrix blockIdx.y. All blocks must have the equal number of threads organized exactly in the same manner. The use of multidimensional identifiers simplifies memory addressing of multidimensional data. The block and grid dimensions, collectively known as execution configuration, can be set at Find H= M×C. run-time. In our implementation we have used blocks each having Step 5: Haar wavelet transform can be divided into four sub- 16×16 threads. The grid size is set at run-time according to matrices of size m  n , the size of input image. Our CUDA implementation consists 2 2 of the following steps: 1. Copy image data from host memory to GPU memory. The rearrangement of the elements of H into four sub- 2. Determine the execution configuration. matrices will produce the resultant Haar wavelet transform 3. GPU executes kernel to compute the elements of the matrix W. intermediate matrix M on each core in a parallel fashion. © 2012 ACEEE 33 DOI: 01.IJSIP.03.01.117
  • 3. ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012 4. The resulted matrix H is computed simultaneously in GPU. Table I shows that the performance of CPU based segmented 5. Copy the result from GPU memory to host memory. matrix algorithm declined noticeably for large sized images, The CPU-based algorithm is implemented on Intel Pentium although it performed better for small sized images. In contrast, IV, 3.00GHz processor equipped with 512MB DDR2 RAM. GPU based implementation of this algorithm improved the The GPU based algorithm is tested on NVIDIA GeForce performance for large over a factor of 10 to 28 for images 8500GT graphics card containing 16 cores, maximum 512 sized 1024×1024 to 2560×2560. Moreover, it performed better threads per block and 512 MB global memory. than MATLAB’s wavelet function for all small and large sized images. Therefore, among the three algorithms our GPU based V. RESULTS AND DISCUSSION segmented matrix algorithm performed the best for high resolution images. To test the computational efficiency of our GPU based However, the main drawback of GPU computation is the segmented matrix algorithm, we have taken images of different transfer time between the host memory and device memory. sizes as inputs. Fig. 2 shows one level 2D Haar DWT of The time needed to copy data from the host’s memory to 256×256 lena image using CPU based and GPU based GPU’s global memory requires a large fraction of total segmented matrix algorithm. For comparison we also have execution time. Therefore, if we exclude the data transfer time considered MATLAB’s dwt2() function from wavelet toolbox from execution time, we would get significant speedup for and the CPU implementation of segmented matrix algorithm. large sized images. CONCLUSIONS The widespread usage of the Haar Discrete Wavelet Transform (DWT) has motivated the implementation of a simple and low cost GPU based DWT algorithm. Our experimental results show that for an image of size 2560×2560, the GPU based segmented matrix algorithm is more than 28.5 times faster than CPU computation including data transfer. Moreover, this GPU based method achieved approximately 8x speedup than the CPU based computation of MATLAB’s dwt2() for the same image. Due to the speedy calculations we believe (a) (b) that the ideas presented in this paper will have widespread Figure.2. One level 2D Haar DWT using (a) CPU based and (b) applications in processing large sized images. GPU based segmented matrix algorithm. Table I represents the comparison of computing time of REFERENCES MATLAB’s dwt2(), segmented matrix algorithm on CPU and [1] P. Y. Chen and E. C. Liao, “A new algorithm for Haar discrete on GPU with increasing size of input images. wavelet transform,” IEEE International Symposium on Intelligent TABLE.I. COMPUTATION TIME COMPARISON RELATIVE TO IMAGE SIZE Signal Processing and Communication Systems, pp. 453-457, 2002. [2] M. Martina, G.Masera, G.Piccinini and M.Zamboni, “A VLSI Architecture for IWT (Integer Wavelet Transform),” Proc. of 43rd Midwest Symposium on Circuits and Systems, pp. 1174-1177, August 2000. [3]K. Haapala, P. Kolinummi, T. Hamalainen, and J. Saarinen, ”Parallel  DSP  implementation  of  wavelet  transform  in image compression,” Proc. of ISCAS IEEE International Symposium on Circuits and Systems, vol. 5. pp. 89-92, 2000. [4] Matthias Hopf and Thomas Ertl., “Hardware Accelerated Wavelet Transformations,” Proc. of EG/IEEE TCVG Symposium on Visualization VisSym, pp. 93–103, 2000. [5] C. Graves and C. Gloster, “Use of dynamically reconfigurable logic in adaptive wavelet packet applications,” Proc. of the 5th Canadian Workshop on Field-Programmable Devices, June 1998. [6] Baofeng Li, Yong Dou, Haifang Zhou, and Xingming Zhou, “FPGA accelerator for wavelet-based automated global image registration,” EURASIP J. Embedded Syst., pp. 1–10, 2009. [7] Mats Holmström, “Parallelizing the fast wavelet transform,” Parallel Computing, vol. 11(21), pp. 837-1848, April 1995. [8] T. T. Wong, C. S. Leung, P. A. Heng, and J. Wang, “Discrete wavelet transform on consumer-level graphics hardware,” IEEE Transactions on Multimedia, vol. 9(3), pp. 668–673, April 2007. © 2012 ACEEE 34 DOI: 01.IJSIP.03.01.117
  • 4. ACEEE Int. J. on Signal & Image Processing, Vol. 03, No. 01, Jan 2012 [9] C. Tenllado, J. Setoain, M. Prieto, L. Piñuel, and F. Tirado, [13] Vaclav Simek and Ram Rakesh Asn, “GPU Acceleration of “Parallel Implementation of the 2D Discrete Wavelet Transform 2D-DWT Image Compression in MATLAB with CUDA,” Second on Graphics Processing Units: Filter Bank versus Lifting,” IEEE UKSIM European Symposium on Computer Modeling and Trans. Parallel Distrib. Syst., vol. 19(3), pp. 299-310, 2008. Simulation, pp.274-277, 2008. [10] Antonio Garcia and Han-Wei Shen, “GPU-based 3D wavelet [14] Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. reconstruction with tileboarding,” The Visual Computer , vol. 21(8), Roerdink, “Accelerating Wavelet Lifting on Graphics Hardware pp. 755-763, September 2005. Using CUDA,” IEEE Transactions on Parallel and Distributed [11] “NVIDIA CUDA C Programming Guide 4.0,” Available at Systems, vol. 22(1), pp. 132-146, January 2011. http://developer.nvidia.com/cuda-toolkit-3.1-downloads, accessed [15] “CUDA C best practices guide,” Available at http:// August 02, 2011. developer.nvidia.com/cuda-toolkit-31-downloads, accessed August [12] Joaquín Franco, Gregorio Bernabé, Juan Fernández, and Manuel 05, 2011. E. Acacio, “A Parallel Implementation of the 2D Wavelet Transform [16] David B. Kirk and Wen-mei W. Hwu, Programming massively Using CUDA,” Proc. of International Conf. on Parallel, Distributed parallel processors- a hands-on approach, Elsevier Inc., USA, and Network-based Processing, pp.111-118, 2009. January 22, 2010. © 2012 ACEEE 35 DOI: 01.IJSIP.03.01.117