GPU Based Image Compression and Interpolation with Anisotropic
Diffusion
Vartika Sharma, Electronics and Communication Engineering, LNM Institute of Information Technology, Jaipur, India, vartika.y12@lnmiit.ac.in
Umang Sehgal, Computer Science and Engineering, LNM Institute of Information Technology, Jaipur, India, umangsehgal.y12@lnmiit.ac.in
December 25, 2014
Abstract
Image compression reduces the irrelevance and redundancy of image data so that the data can be stored or transmitted efficiently. Its main goal is the best image quality at a given bit rate or compression rate. Methods based on partial differential equations (PDEs) have been used in the past for inpainting and for reconstruction from digital image features. We choose a PDE method because the optimal point set for image compression and interpolation depends on the PDE: good PDEs can cope with poorly chosen points, and well-chosen points allow simple (suboptimal) PDEs. A suboptimal point set can still pay off if it is coded efficiently. The basic idea of the encoding step is to store only a few relevant pixel coordinates. We use an adaptive triangulation method based on binary tree coding to remove less significant pixels from the image. Decoding is done by the Perona and Malik diffusion process, for which the remaining points serve as scattered interpolation data. Our goal in this paper is to analyse the potential of differential equations for image compression and interpolation, and to compare the speed of the algorithm on the CPU and on the GPU. Graphics Processing Units (GPUs) are used in image processing because they accelerate parallel computation and are affordable and energy efficient. Research has also shown that GPU code can achieve good performance even at lower occupancies. We also consider what is gained in productivity and maintainability when the implementation is designed around the hardware. Our experiments show that the computation time of the CPU code increases significantly with image size, whereas on the GPU the run time grows only slightly for larger images.
Keywords: Image Compression, Partial Differential Equations (PDEs), Binary Tree Triangulation, GPU Computing
1 Introduction
Image compression aims to reduce an image to the smallest possible size without a significant loss of information. Its main purpose is to reduce the file size of an image so that it can be transferred quickly over a communication network. Fast algorithms are desirable because both the compression step and the interpolation step used for decoding can be computationally expensive.
The first part of our paper deals with image compression using Binary Tree Triangular Coding (BTTC). The retained compression points are the vertices of the leaves of a binary tree produced by the triangular coding. Further, we aim to show that optimizing the hierarchical tree traversal can result in substantial performance gains on the GPU.
In the next part, we aim at filling in missing information in certain corrupted image areas by means of second-order PDEs. The basic idea is to interpolate the data in the inpainting regions by solving appropriate boundary value problems. We take the Perona and Malik nonlinear diffusion method [5] as an example and show its implementation on both the CPU and the GPU.
Many image processing tasks are inherently parallel. Owing to their large number of cores, GPUs hold a high potential for such computations. The paper focuses on the difference in performance between the CPU (Matlab and C++) and the GPU (CUDA C++) when interpolating greyscale images of different sizes with Perona and Malik diffusion.
The paper is organized as follows. Section 2 explains the B-tree triangular coding scheme and tree construction on the GPU. Section 3 discusses PDE-based image interpolation and its GPU implementation using CUDA. Results are shown in Section 4, and Section 5 gives the conclusion.
2 Image Compression using B-tree triangular coding
We now discuss an image compression algorithm called B-tree triangular coding (BTTC) [6]. It is based on the recursive decomposition of the image into right-angled isosceles triangles arranged in a binary tree. The method is attractive because encoding and decoding are fast, and because it is easy to implement and to parallelize.
The image to be encoded is regarded as a discrete surface: we consider a non-negative discrete function F(x, y) of two discrete variables and establish a correspondence between the image and the surface

$$A = \{(x, y, c) \mid c = F(x, y)\},$$

so that each point in A corresponds to a pixel in the image, where c is the pixel's grey value (intensity).
Our goal is to approximate A by a discrete surface

$$B = \{(x, y, d) \mid d = G(x, y)\},$$

defined by means of a finite set of polyhedra. Each polyhedron has a right-angled triangle (RAT) face on the XY plane and an upper triangular face, lying above that RAT, which approximates A. The surface B is formed by the upper faces of the polyhedra.
To build the binary tree from the image, let T be a generic RAT on the XY plane with vertices

$$P_1 = (x_1, y_1),\quad P_2 = (x_2, y_2),\quad P_3 = (x_3, y_3), \tag{1}$$

and let

$$c_1 = F(x_1, y_1),\quad c_2 = F(x_2, y_2),\quad c_3 = F(x_3, y_3), \tag{2}$$

so that

$$(x_1, y_1, c_1),\ (x_2, y_2, c_2),\ (x_3, y_3, c_3) \in A. \tag{3}$$
Figure 1: Image partition process using Binary Tree Triangulation Coding. A leaf of the binary tree marked with a small triangle indicates that the corresponding triangle satisfies the uniformity predicate.
Inside T, the approximating function G is given by the linear interpolation

$$G(x, y) = c_1 + \alpha\,(c_2 - c_1) + \beta\,(c_3 - c_1), \tag{4}$$

where α and β are defined by the two relations

$$\alpha = \frac{(x - x_1)(y_3 - y_1) - (y - y_1)(x_3 - x_1)}{(x_2 - x_1)(y_3 - y_1) - (y_2 - y_1)(x_3 - x_1)}, \tag{5}$$

$$\beta = \frac{(x_2 - x_1)(y - y_1) - (y_2 - y_1)(x - x_1)}{(x_2 - x_1)(y_3 - y_1) - (y_2 - y_1)(x_3 - x_1)}. \tag{6}$$
At P1 we have α = β = 0, at P2 we have (α, β) = (1, 0), and at P3 we have (α, β) = (0, 1). Therefore the values of F and G coincide on the vertices of T:

$$F(P_1) = G(P_1),\quad F(P_2) = G(P_2),\quad F(P_3) = G(P_3). \tag{7}$$
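As a quick check of (4) to (7), the following C++ sketch evaluates G directly from the three formulas. The `Vertex` type and the `interpolate` function are hypothetical names introduced only for this illustration; a degenerate triangle (zero denominator) is rejected by an assertion.

```cpp
#include <cassert>

// Hypothetical helper type: a triangle vertex (x, y) with its grey value c = F(x, y).
struct Vertex {
    double x, y, c;
};

// Linear interpolation G(x, y) inside the triangle (p1, p2, p3), following (4)-(6).
double interpolate(const Vertex& p1, const Vertex& p2, const Vertex& p3, double x, double y) {
    // Common denominator of (5) and (6); it is zero only for a degenerate triangle.
    double d = (p2.x - p1.x) * (p3.y - p1.y) - (p2.y - p1.y) * (p3.x - p1.x);
    assert(d != 0.0);
    double alpha = ((x - p1.x) * (p3.y - p1.y) - (y - p1.y) * (p3.x - p1.x)) / d;  // (5)
    double beta = ((p2.x - p1.x) * (y - p1.y) - (p2.y - p1.y) * (x - p1.x)) / d;   // (6)
    return p1.c + alpha * (p2.c - p1.c) + beta * (p3.c - p1.c);                   // (4)
}
```

For example, with vertices (0, 0), (4, 0) and (0, 4) carrying grey values 10, 30 and 50, `interpolate` returns exactly 10, 30 and 50 at the vertices, as (7) requires, and 40 at the hypotenuse midpoint (2, 2).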
To measure the quality of the approximation G, we define the error

$$\operatorname{err}(x, y) = F(x, y) - G(x, y) \tag{8}$$

and check whether

$$|\operatorname{err}(x, y)| \le \varepsilon \quad \text{for every pixel } (x, y) \in T, \tag{9}$$

where ε > 0 is an adjustable quality factor.
If the condition does not hold, T is divided along its height relative to the hypotenuse into two smaller RATs, and the test is repeated on each of them. If the subdivision is repeated often enough, we eventually obtain minimal triangles whose only pixels are their three vertices; these always satisfy the condition, since err(x, y) = 0 on each vertex by (7).
Figure 2: Implementation of Binary Tree structure on the GPU. There is one thread for each node at each level.
The topology of all subdivisions is stored in a hierarchical structure, the B-tree.
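A minimal sketch of the encoder, under assumptions that the text does not fix: the image is square with side 2^k + 1, it is covered by two root RATs split along a diagonal, the tree is serialized depth-first with one bit per node (1 = split, 0 = leaf), and the vertices of every leaf are marked in a vertex mask. These choices and all names (`Image`, `uniform`, `encode`, `bttc_encode`) are illustrative, not necessarily the exact layout used in [6]. Splitting a RAT along its height means cutting at the midpoint of the hypotenuse, which becomes the right-angle vertex of both children; the inside test uses the integer numerators of (5) and (6).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct Pt { int x, y; };

// Hypothetical square greyscale image of side n = 2^k + 1, stored row-major.
struct Image {
    int n;
    std::vector<double> v;
    double at(Pt p) const { return v[std::size_t(p.y) * n + p.x]; }
};

// Uniformity predicate (9) for the RAT with right-angle vertex r and hypotenuse h1-h2.
bool uniform(const Image& f, Pt r, Pt h1, Pt h2, double eps) {
    long d = long(h1.x - r.x) * (h2.y - r.y) - long(h1.y - r.y) * (h2.x - r.x);
    long s = d > 0 ? 1 : -1;  // makes s * d = |d| for either vertex order
    int xmin = std::min({r.x, h1.x, h2.x}), xmax = std::max({r.x, h1.x, h2.x});
    int ymin = std::min({r.y, h1.y, h2.y}), ymax = std::max({r.y, h1.y, h2.y});
    for (int y = ymin; y <= ymax; ++y)
        for (int x = xmin; x <= xmax; ++x) {
            // Numerators of alpha (5) and beta (6), with P1 = r, P2 = h1, P3 = h2.
            long na = s * (long(x - r.x) * (h2.y - r.y) - long(y - r.y) * (h2.x - r.x));
            long nb = s * (long(h1.x - r.x) * (y - r.y) - long(h1.y - r.y) * (x - r.x));
            if (na < 0 || nb < 0 || na + nb > s * d) continue;  // pixel outside T
            double a = double(na) / (s * d), b = double(nb) / (s * d);
            double g = f.at(r) + a * (f.at(h1) - f.at(r)) + b * (f.at(h2) - f.at(r));  // (4)
            if (std::fabs(f.at({x, y}) - g) > eps) return false;
        }
    return true;
}

// Depth-first BTTC: emit 1 and split at the hypotenuse midpoint, or emit 0 and mark the leaf.
void encode(const Image& f, Pt r, Pt h1, Pt h2, double eps,
            std::vector<bool>& bits, std::vector<bool>& mask) {
    // A triangle can be split only if its hypotenuse midpoint is a pixel.
    bool splittable = (h1.x + h2.x) % 2 == 0 && (h1.y + h2.y) % 2 == 0;
    bool split = splittable && !uniform(f, r, h1, h2, eps);
    bits.push_back(split);
    if (!split) {
        for (Pt p : {r, h1, h2}) mask[std::size_t(p.y) * f.n + p.x] = true;
        return;
    }
    Pt m{(h1.x + h2.x) / 2, (h1.y + h2.y) / 2};  // foot of the height on the hypotenuse
    encode(f, m, r, h1, eps, bits, mask);        // both children have their right angle at m
    encode(f, m, h2, r, eps, bits, mask);
}

// Encodes the two root triangles that tile the square; returns the bit stream.
std::vector<bool> bttc_encode(const Image& f, double eps, std::vector<bool>& mask) {
    assert(f.n >= 3 && ((f.n - 1) & (f.n - 2)) == 0);  // side must be 2^k + 1
    assert(f.v.size() == std::size_t(f.n) * f.n);
    int s = f.n - 1;
    mask.assign(std::size_t(f.n) * f.n, false);
    std::vector<bool> bits;
    encode(f, {0, 0}, {s, 0}, {0, s}, eps, bits, mask);
    encode(f, {s, s}, {0, s}, {s, 0}, eps, bits, mask);
    return bits;
}
```

For a 3x3 image that is zero except for a bright centre pixel, with ε = 1 the sketch emits the bits 1 0 0 1 0 0 and keeps five vertices: the four corners and the centre. This sketch spends a bit even on minimal triangles; a more compact coder could omit those bits, since a decoder can infer them from the geometry.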
2.1 Tree Construction on the GPU
In recent years, general-purpose GPU computing has given rise to a number of methods for constructing bounding volume hierarchies (BVHs). In our case, what matters most is the speed of construction.
For a parallel implementation of B-tree triangular coding, the idea is to process the levels of the tree sequentially, starting from the root [1]. Every level of the binary tree hierarchy then corresponds to a linear range of nodes, and on a given level we launch one thread for each node that falls into this range.
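The level-by-level strategy can be sketched on the CPU as follows. Each pass over a level stands for one kernel launch with one thread per node; an exclusive prefix sum over the per-node child counts assigns every child a slot, so that each new level again occupies a contiguous range of the node array. The `Node` layout, the `needs_split` callback and the use of `std::exclusive_scan` (C++17) are assumptions of this sketch; on the GPU the scan would be a parallel primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical node record: its depth and the index of its first child (-1 for a leaf).
struct Node {
    int depth;
    int first_child;
};

// Builds the tree one level at a time and returns the start offset of every level,
// followed by the total node count.
std::vector<std::size_t> build_levels(std::vector<Node>& nodes, std::size_t roots,
                                      const std::function<bool(const Node&)>& needs_split) {
    nodes.assign(roots, Node{0, -1});
    std::vector<std::size_t> level_start{0};
    std::size_t begin = 0, end = roots;
    while (begin < end) {
        std::size_t n = end - begin;
        // Step 1 (one thread per node on the GPU): how many children does each node get?
        std::vector<std::size_t> count(n), offset(n);
        for (std::size_t i = 0; i < n; ++i) count[i] = needs_split(nodes[begin + i]) ? 2 : 0;
        // Step 2: an exclusive prefix sum gives each node its slot in the next level.
        std::exclusive_scan(count.begin(), count.end(), offset.begin(), std::size_t{0});
        std::size_t next = offset[n - 1] + count[n - 1];
        nodes.resize(end + next);
        // Step 3 (again one thread per node): write the children into the contiguous range.
        for (std::size_t i = 0; i < n; ++i) {
            if (count[i] == 0) continue;
            Node& parent = nodes[begin + i];
            parent.first_child = int(end + offset[i]);
            nodes[end + offset[i]] = Node{parent.depth + 1, -1};
            nodes[end + offset[i] + 1] = Node{parent.depth + 1, -1};
        }
        level_start.push_back(end);
        begin = end;
        end += next;
    }
    return level_start;
}
```

With two root nodes and a predicate that splits every node of depth less than 2, the returned offsets are 0, 2, 6 and 14, i.e. levels of 2, 4 and 8 nodes. The small top levels illustrate the limitation discussed next: the first passes keep only a handful of threads busy.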
However, this process fully employs the GPU only when there are millions of objects. The main shortcoming of existing methods that aim to maximize construction speed is that they generate the node hierarchy sequentially, usually one level at a time. This limits the amount of parallelism that can be achieved at the top levels of the tree and can lead to poor utilization of the available parallel cores.
3 PDE based image interpolation
For decompression, the vertex mask is first recovered from the binary tree representation, and the stored grey values are placed at the appropriate pixel positions to give the sparse image. To recover the vertex mask, the tree is regenerated in the same order in which it was stored; while the nodes are generated, the vertex positions are calculated and marked in the vertex mask.
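A minimal decoding sketch, assuming the tree was serialized depth-first with one bit per node (1 = split, 0 = leaf), that the square image has side 2^k + 1 and is covered by two root RATs split along a diagonal, and that the stored grey values follow the raster order of the marked pixels. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct Pt { int x, y; };

// Replays the depth-first bit stream starting at bits[pos] and marks the vertices of every
// leaf triangle (right-angle vertex r, hypotenuse h1-h2) in 'mask' (n x n, row-major).
void decode(const std::vector<bool>& bits, std::size_t& pos, int n,
            Pt r, Pt h1, Pt h2, std::vector<bool>& mask) {
    assert(pos < bits.size());
    if (!bits[pos++]) {
        for (Pt p : {r, h1, h2}) mask[std::size_t(p.y) * n + p.x] = true;
        return;
    }
    // Split at the hypotenuse midpoint; the child order must match the order used when encoding.
    Pt m{(h1.x + h2.x) / 2, (h1.y + h2.y) / 2};
    decode(bits, pos, n, m, r, h1, mask);
    decode(bits, pos, n, m, h2, r, mask);
}

// Vertex mask of an n x n image (n = 2^k + 1) covered by two root triangles.
std::vector<bool> bttc_vertex_mask(const std::vector<bool>& bits, int n) {
    assert(n >= 3 && ((n - 1) & (n - 2)) == 0);
    int s = n - 1;
    std::vector<bool> mask(std::size_t(n) * n, false);
    std::size_t pos = 0;
    decode(bits, pos, n, {0, 0}, {s, 0}, {0, s}, mask);
    decode(bits, pos, n, {s, s}, {0, s}, {s, 0}, mask);
    assert(pos == bits.size());  // every stored bit has been consumed
    return mask;
}

// Places the stored grey values into a sparse image; unknown pixels stay 0 until interpolation.
std::vector<float> sparse_image(const std::vector<bool>& mask, const std::vector<float>& values) {
    std::vector<float> img(mask.size(), 0.0f);
    std::size_t k = 0;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) {
            assert(k < values.size());
            img[i] = values[k++];
        }
    assert(k == values.size());
    return img;
}
```

For a 3x3 image, the bit stream 1 0 0 1 0 0 splits both root triangles once and marks the four corners and the centre pixel.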
The second step consists of interpolating the image, where the vertex mask becomes the interpolation mask. The image is recovered by treating it as the steady state of a diffusion process in which the data points (the subset of pixels selected by BTTC) are kept fixed. Once this subset of pixels has been chosen, we interpolate between the points to approximate the original image.
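The interpolation principle, diffusion running towards its steady state while the data points stay fixed, can be sketched as a small driver. The diffusion step is passed in as a callback, so any explicit scheme (for instance one step of (16)) can be plugged in; resetting the masked pixels after every step is one common way to impose the data and is assumed here. The name `inpaint` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Runs 'steps' iterations of an arbitrary diffusion step. After each iteration the known
// pixels (mask == true) are reset to their stored values, so they act as fixed data points.
std::vector<float> inpaint(const std::vector<float>& sparse, const std::vector<bool>& mask,
                           int steps, const std::function<void(std::vector<float>&)>& step) {
    assert(sparse.size() == mask.size());
    std::vector<float> u = sparse;
    for (int s = 0; s < steps; ++s) {
        step(u);
        for (std::size_t i = 0; i < u.size(); ++i)
            if (mask[i]) u[i] = sparse[i];
    }
    return u;
}
```

As a usage example with homogeneous (linear) diffusion instead of Perona and Malik: a 1-D step that replaces each interior value by the mean of its two neighbours, with the end values 0 and 4 fixed, converges to the ramp 0, 1, 2, 3, 4, which is the steady state of that diffusion.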
In this section we consider the following nonlinear diffusion scheme:

$$\partial_t u = \operatorname{div}\big(c(|\nabla u|^2)\,\nabla u\big), \tag{10}$$

where c is the conductivity function introduced by Perona and Malik.
Written out in two dimensions, this diffusion filter reads

$$\partial_t u = \partial_x(c\,\partial_x u) + \partial_y(c\,\partial_y u). \tag{11}$$
We discretize the PDE on a pixel grid with unit spacing, using a symmetric scheme for the first-order derivatives (within a 3x3 stencil), as given by Weickert [2] [3]. First we consider the term ∂x(c∂xu), which we discretize with a forward difference:

$$\partial_x(c\,\partial_x u)\big|_{i,j} \approx [c\,\partial_x u]_{i+1,j} - [c\,\partial_x u]_{i,j}. \tag{12}$$
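The term c∂xu itself is discretized with a backward difference, averaging the conductivities of the two pixels involved:

$$[c\,\partial_x u]_{i,j} \approx \frac{c_{i,j} + c_{i-1,j}}{2}\,(u_{i,j} - u_{i-1,j}). \tag{13}$$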
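Substituting (13) into (12), we obtain

$$\partial_x(c\,\partial_x u)\big|_{i,j} \approx \frac{c_{i+1,j} + c_{i,j}}{2}\,(u_{i+1,j} - u_{i,j}) - \frac{c_{i,j} + c_{i-1,j}}{2}\,(u_{i,j} - u_{i-1,j}). \tag{14}$$

Similarly, for ∂y(c∂yu):

$$\partial_y(c\,\partial_y u)\big|_{i,j} \approx \frac{c_{i,j+1} + c_{i,j}}{2}\,(u_{i,j+1} - u_{i,j}) - \frac{c_{i,j} + c_{i,j-1}}{2}\,(u_{i,j} - u_{i,j-1}). \tag{15}$$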
The explicit discretization of the PDE then becomes

$$u^{t+dt}_{i,j} = u^{t}_{i,j} + \frac{dt}{2}\Big(\partial_x(c\,\partial_x u) + \partial_y(c\,\partial_y u)\Big)_{i,j}, \tag{16}$$

where dt is the step size.
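One explicit step of (16), with the flux differences of (13) and (14) and their y counterpart, can be written in C++ as follows. The sketch assumes unit grid spacing, a row-major float image, a precomputed conductivity array `c`, and zero flux across the image border (reflecting boundary), which the text does not specify; `pm_step` is a hypothetical name. Note that (16) uses the factor dt/2, so the effective time step is half of dt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit step of (16) on a w x h image u with conductivity c (both row-major).
// x plays the role of i and y the role of j in (13)-(14).
void pm_step(std::vector<float>& u, const std::vector<float>& c, int w, int h, float dt) {
    assert(u.size() == std::size_t(w) * h && c.size() == u.size());
    std::vector<float> old = u;
    auto at = [w](int x, int y) { return std::size_t(y) * w + x; };
    // Flux between neighbouring pixels p and q as in (13): mean conductivity times difference.
    auto flux = [&](std::size_t p, std::size_t q) {
        return 0.5f * (c[p] + c[q]) * (old[p] - old[q]);
    };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            std::size_t p = at(x, y);
            float div = 0.0f;  // missing neighbours contribute no flux
            if (x + 1 < w) div += flux(at(x + 1, y), p);  // [c dx u] at (i+1, j)
            if (x > 0)     div -= flux(p, at(x - 1, y));  // [c dx u] at (i, j)
            if (y + 1 < h) div += flux(at(x, y + 1), p);  // [c dy u] at (i, j+1)
            if (y > 0)     div -= flux(p, at(x, y - 1));  // [c dy u] at (i, j)
            u[p] = old[p] + 0.5f * dt * div;              // (16) with factor dt/2
        }
}
```

With c = 1 everywhere (homogeneous diffusion) and dt = 0.2, a unit impulse in the centre of a 3x3 image drops to 0.6 while each of its four edge neighbours rises to 0.1; the corners stay 0 and the total grey value is preserved, as expected from a flux-based scheme with zero boundary flux.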
We take an image and iterate the scheme n times (n = 80). In every step, the image gradient is computed with the derivative of a Gaussian kernel, and the conductivity c is evaluated on the squared gradient magnitude with the Matlab command
c = exp(-grad2/(k^2));
where grad2 is the square of the gradient norm and k is the contrast parameter. We then apply the nonlinear diffusion step (16) to compute the change of the image in each iteration. The resulting diffused image is our interpolated image.
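The conductivity computation can be sketched in C++ as below, mirroring the Matlab command c = exp(-grad2/(k^2)). Instead of convolving with derivative-of-Gaussian kernels directly, this sketch smooths the image with a sampled Gaussian (standard deviation `sigma`, truncated at 3 sigma) and then takes central differences, a common approximation of the Gaussian derivative. The replicated-border handling and the function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sampled 1-D Gaussian of standard deviation sigma, truncated at 3*sigma, normalised to sum 1.
std::vector<float> gaussian_kernel(float sigma) {
    int r = std::max(1, int(std::ceil(3.0f * sigma)));
    std::vector<float> g(2 * r + 1);
    float sum = 0.0f;
    for (int t = -r; t <= r; ++t) {
        g[t + r] = std::exp(-0.5f * float(t * t) / (sigma * sigma));
        sum += g[t + r];
    }
    for (float& v : g) v /= sum;
    return g;
}

// c = exp(-grad2 / k^2), where grad2 = |grad u_sigma|^2 is taken with central differences on
// the Gaussian-smoothed image u_sigma (borders replicated). u is w x h, row-major.
std::vector<float> conductivity(const std::vector<float>& u, int w, int h, float sigma, float k) {
    assert(u.size() == std::size_t(w) * h && sigma > 0.0f && k > 0.0f);
    std::vector<float> g = gaussian_kernel(sigma), tmp(u.size()), s(u.size()), c(u.size());
    int r = int(g.size()) / 2;
    auto cx = [w](int x) { return std::clamp(x, 0, w - 1); };
    auto cy = [h](int y) { return std::clamp(y, 0, h - 1); };
    auto at = [w](int x, int y) { return std::size_t(y) * w + x; };
    for (int y = 0; y < h; ++y)  // horizontal smoothing pass
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int t = -r; t <= r; ++t) acc += g[t + r] * u[at(cx(x + t), y)];
            tmp[at(x, y)] = acc;
        }
    for (int y = 0; y < h; ++y)  // vertical smoothing pass
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int t = -r; t <= r; ++t) acc += g[t + r] * tmp[at(x, cy(y + t))];
            s[at(x, y)] = acc;
        }
    for (int y = 0; y < h; ++y)  // squared gradient magnitude and conductivity
        for (int x = 0; x < w; ++x) {
            float gx = 0.5f * (s[at(cx(x + 1), y)] - s[at(cx(x - 1), y)]);
            float gy = 0.5f * (s[at(x, cy(y + 1))] - s[at(x, cy(y - 1))]);
            c[at(x, y)] = std::exp(-(gx * gx + gy * gy) / (k * k));
        }
    return c;
}
```

On a constant image every c equals 1, so diffusion acts freely; across a vertical step edge of height 100 with k = 10, c at the edge drops far below 0.5, so diffusion is inhibited there while it stays close to 1 in flat regions.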
3.1 GPU implementation of Perona
and Malik using CUDA
Because our algorithm requires many floating-point computations per pixel, it can run slowly even on the fastest CPUs, which is a serious hindrance to productivity. Using CUDA, we can spawn exactly one thread per pixel, and each thread is responsible for computing the final grey value of exactly one pixel.
Since images are naturally two dimensional, it makes
sense to have each block be two dimensional. (32x16
is a good size because it allows each thread block to
run 512 threads). Then, we spawn as many thread
blocks in the x and y dimension as necessary to
cover the entire image. For example, for a 1024x768
image, the grid of thread blocks is 32x48, with each
thread block having 32x16 threads.
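The grid size follows from a ceiling division of the image size by the block size. The small helper below is a hypothetical sketch; `Dim2` stands in for CUDA's `dim3` so that the example compiles as plain C++.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3 in this host-side sketch.
struct Dim2 {
    unsigned x, y;
};

// Number of thread blocks needed to cover a width x height image with the given block size.
// Ceiling division includes partially filled blocks at the right and bottom edges.
Dim2 grid_for(unsigned width, unsigned height, Dim2 block) {
    assert(block.x > 0 && block.y > 0);
    return {(width + block.x - 1) / block.x, (height + block.y - 1) / block.y};
}
```

For a 1024x768 image with 32x16 blocks it returns the 32x48 grid given above; for a 1000x700 image it returns 32x44, so the last column and row of blocks are only partly used.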
CUDA executes code on the GPU (the device). A function that executes on the device is called a kernel and is marked with the __global__ qualifier. A kernel is launched with the syntax kernel_name<<<blocks, threads>>>(arguments) [4].
Inside the kernel, each thread computes the row i and the column j of the pixel it is responsible for:
int i = blockIdx.y * blockDim.y + threadIdx.y;
int j = blockIdx.x * blockDim.x + threadIdx.x;
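Because the grid is rounded up to whole blocks, some threads map to coordinates outside the image, so a kernel normally returns early for such threads before touching memory. The CPU emulation below is an illustrative sketch (its name and the row-major layout are assumptions): it walks over every block and thread, applies the two index formulas, and counts how often each pixel is visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of a width x height launch with bdx x bdy threads per block.
// Each (block, thread) pair computes its pixel with the same formulas as the kernel.
std::vector<int> visit_counts(int width, int height, int bdx, int bdy) {
    int gdx = (width + bdx - 1) / bdx, gdy = (height + bdy - 1) / bdy;  // grid size
    std::vector<int> count(std::size_t(width) * height, 0);
    for (int by = 0; by < gdy; ++by)              // blockIdx.y
        for (int bx = 0; bx < gdx; ++bx)          // blockIdx.x
            for (int ty = 0; ty < bdy; ++ty)      // threadIdx.y
                for (int tx = 0; tx < bdx; ++tx) {  // threadIdx.x
                    int i = by * bdy + ty;  // row
                    int j = bx * bdx + tx;  // column
                    if (i >= height || j >= width) continue;  // guard ('return' in a kernel)
                    ++count[std::size_t(i) * width + j];       // row-major pixel index
                }
    return count;
}
```

For a 100x70 image with 32x16 blocks, every pixel is visited exactly once; without the guard, the threads of the partial blocks would write outside the image.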
Because the kernel runs on the CUDA device, the image data must first be copied to the GPU. The image is copied to the GPU and then duplicated to a second buffer on the GPU with a device-to-device memory copy. The kernel is then launched, and finally the resulting image is copied back to the host.
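#include <cstdio>
#include <cstdlib>

// Minimal stand-in for the HANDLE_ERROR macro of CUDA by Example [4].
#define HANDLE_ERROR(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s (%s:%d)\n", cudaGetErrorString(e_), __FILE__, __LINE__); \
    exit(EXIT_FAILURE); } } while (0)

// Kernel defined elsewhere; this signature is inferred from the launch below.
__global__ void pm_diffusion(unsigned int *image, float k, float timestep, int nsteps, float w);

// h_image: host image of w x h pixels (names assumed for this sketch).
void run_pm_diffusion(unsigned int *h_image, int w, int h) {
    size_t bytes = (size_t)w * h * sizeof(unsigned int);
    unsigned int *d_image, *d_imageCopy;
    HANDLE_ERROR(cudaMalloc((void **)&d_image, bytes));
    HANDLE_ERROR(cudaMalloc((void **)&d_imageCopy, bytes));
    // Copy the data to the device, then duplicate it on the device.
    HANDLE_ERROR(cudaMemcpy(d_image, h_image, bytes, cudaMemcpyHostToDevice));
    HANDLE_ERROR(cudaMemcpy(d_imageCopy, d_image, bytes, cudaMemcpyDeviceToDevice));
    // Function call: 32x16 threads per block, enough blocks to cover the image.
    dim3 threads(32, 16);
    dim3 blocks((w + threads.x - 1) / threads.x, (h + threads.y - 1) / threads.y);
    pm_diffusion<<<blocks, threads>>>(d_image, 0.001f, 0.2f, 80, w);
    HANDLE_ERROR(cudaGetLastError());       // reports "pm_diffusion() execution failed" cases
    HANDLE_ERROR(cudaDeviceSynchronize());  // host-side wait; __syncthreads() is device-only
    // Copy the data back to the host.
    HANDLE_ERROR(cudaMemcpy(h_image, d_image, bytes, cudaMemcpyDeviceToHost));
    HANDLE_ERROR(cudaFree(d_image));
    HANDLE_ERROR(cudaFree(d_imageCopy));
}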
We then ran our algorithms on both the CPU (Matlab and C++) and the GPU (CUDA C++).
Figure 3: Test of Perona and Malik function. On the top, original images of three different sizes (128x128, 512x512, 1024x1024) are shown. On the bottom, the result of Perona and Malik diffusion is shown (after 80 iterations).
4 Results
To analyse the performance, we took three images of different sizes and ran our Perona and Malik code in Matlab, C++ and CUDA C++, the latter on NVIDIA GPUs.
We ran each benchmark five times and report the average time. The results shown are for 80 iterations of the Perona and Malik diffusion algorithm.
Image dimensions | Matlab time (s) | C++ time (s) | CUDA C++ time (s)
---|---|---|---
128x128 | 3.766 | 1.009 | 0.069
512x512 | 12.745 | 3.126 | 0.081
1024x1024 | 40.737 | 6.309 | 0.106

Table 1: Test of Perona and Malik function. Comparison of processing speed between Matlab, C++ and CUDA C++.
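Averaging over repeated runs can be sketched with `std::chrono::steady_clock`. For the CUDA version, the timed callable should end with `cudaDeviceSynchronize()`, since kernel launches return before the GPU has finished; the helper name `average_seconds` is hypothetical, and this is not necessarily how the timings in Table 1 were taken.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs 'work' 'runs' times and returns the mean wall-clock time in seconds.
double average_seconds(const std::function<void()>& work, int runs = 5) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by system clock changes
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        auto start = Clock::now();
        work();
        total += std::chrono::duration<double>(Clock::now() - start).count();
    }
    return total / runs;
}
```

A single untimed warm-up run before calling this helper is often useful on the GPU, because the first CUDA call also pays for context creation.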
5 Conclusion
In our comparison, the speed advantage of CUDA C++ over both Matlab and C++ is large: for the 1024x1024 image, the CUDA C++ version was about 384 times faster than Matlab and about 60 times faster than C++ (Table 1). Processing speed is especially important for large images, since many calculations involve demanding optimization with complex equations and algorithms, or a large number of iterations. As the amount of data increases, the computation time of both the Matlab and the C++ code increases significantly (from 3.766 s to 40.737 s and from 1.009 s to 6.309 s, respectively, between 128x128 and 1024x1024), which can make CPU code impractical for such calculations. On the GPU, in contrast, the run time grew only from 0.069 s to 0.106 s over the same range. However, it must also be kept in mind that the development time in C++ and CUDA C++ is much higher than in Matlab.
References
[1] Tero Karras. "Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees". 2012.
[2] Irena Galić, Joachim Weickert, Martin Welk, et al. "Image Compression with Anisotropic Diffusion". 2008.
[3] Irena Galić, Joachim Weickert, Martin Welk, et al. "Towards PDE-Based Image Compression". 2005.
[4] Jason Sanders and Edward Kandrot. CUDA by Example. 2010.
[5] P. Perona and J. Malik. "Scale-Space and Edge Detection Using Anisotropic Diffusion". 1990.
[6] Riccardo Distasi, Michele Nappi and Sergio Vitulano. "Image Compression by B-Tree Triangular Coding". 1997.

More Related Content

What's hot

IRJET- LS Chaotic based Image Encryption System Via Permutation Models
IRJET- LS Chaotic based Image Encryption System Via Permutation ModelsIRJET- LS Chaotic based Image Encryption System Via Permutation Models
IRJET- LS Chaotic based Image Encryption System Via Permutation ModelsIRJET Journal
 
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEMULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEijcsity
 
An Efficient Multiplierless Transform algorithm for Video Coding
An Efficient Multiplierless Transform algorithm for Video CodingAn Efficient Multiplierless Transform algorithm for Video Coding
An Efficient Multiplierless Transform algorithm for Video CodingCSCJournals
 
3 d discrete cosine transform for image compression
3 d discrete cosine transform for image compression3 d discrete cosine transform for image compression
3 d discrete cosine transform for image compressionAlexander Decker
 
RTL Implementation of image compression techniques in WSN
RTL Implementation of image compression techniques in WSNRTL Implementation of image compression techniques in WSN
RTL Implementation of image compression techniques in WSNIJECEIAES
 
Substitution-diffusion based Image Cipher
Substitution-diffusion based Image CipherSubstitution-diffusion based Image Cipher
Substitution-diffusion based Image CipherIJNSA Journal
 
D0325016021
D0325016021D0325016021
D0325016021theijes
 
PIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHM
PIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHMPIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHM
PIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHMijcsit
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
2013 1-2-07-salina
2013 1-2-07-salina2013 1-2-07-salina
2013 1-2-07-salinashruti jain
 
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...inventionjournals
 
Az2419511954
Az2419511954Az2419511954
Az2419511954IJMER
 
MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...
MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...
MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...cscpconf
 
Selective image encryption using
Selective image encryption usingSelective image encryption using
Selective image encryption usingcsandit
 

What's hot (17)

IRJET- LS Chaotic based Image Encryption System Via Permutation Models
IRJET- LS Chaotic based Image Encryption System Via Permutation ModelsIRJET- LS Chaotic based Image Encryption System Via Permutation Models
IRJET- LS Chaotic based Image Encryption System Via Permutation Models
 
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEMULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
 
20120140504016
2012014050401620120140504016
20120140504016
 
An Efficient Multiplierless Transform algorithm for Video Coding
An Efficient Multiplierless Transform algorithm for Video CodingAn Efficient Multiplierless Transform algorithm for Video Coding
An Efficient Multiplierless Transform algorithm for Video Coding
 
3 d discrete cosine transform for image compression
3 d discrete cosine transform for image compression3 d discrete cosine transform for image compression
3 d discrete cosine transform for image compression
 
RTL Implementation of image compression techniques in WSN
RTL Implementation of image compression techniques in WSNRTL Implementation of image compression techniques in WSN
RTL Implementation of image compression techniques in WSN
 
Substitution-diffusion based Image Cipher
Substitution-diffusion based Image CipherSubstitution-diffusion based Image Cipher
Substitution-diffusion based Image Cipher
 
D0325016021
D0325016021D0325016021
D0325016021
 
PIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHM
PIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHMPIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHM
PIXEL SIZE REDUCTION LOSS-LESS IMAGE COMPRESSION ALGORITHM
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
2013 1-2-07-salina
2013 1-2-07-salina2013 1-2-07-salina
2013 1-2-07-salina
 
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...
 
Bg044357364
Bg044357364Bg044357364
Bg044357364
 
Cc24529533
Cc24529533Cc24529533
Cc24529533
 
Az2419511954
Az2419511954Az2419511954
Az2419511954
 
MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...
MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...
MULTIFOCUS IMAGE FUSION USING MULTIRESOLUTION APPROACH WITH BILATERAL GRADIEN...
 
Selective image encryption using
Selective image encryption usingSelective image encryption using
Selective image encryption using
 

Viewers also liked

Colours through favourite cartoons
Colours through favourite cartoonsColours through favourite cartoons
Colours through favourite cartoonsSunny Day
 
Tics en la carrera
Tics en la carreraTics en la carrera
Tics en la carreraEric Garcia
 
Mahrukh jinyoung-bookshare
Mahrukh jinyoung-bookshareMahrukh jinyoung-bookshare
Mahrukh jinyoung-bookshareTriciaMowat
 
Michael Zapytowski _SFLLC
Michael Zapytowski _SFLLCMichael Zapytowski _SFLLC
Michael Zapytowski _SFLLCMike Zapytowski
 
070415 -Upasana Arora - Fellowship
070415 -Upasana Arora - Fellowship070415 -Upasana Arora - Fellowship
070415 -Upasana Arora - FellowshipUpasana Arora
 
PRES ACE Coaching and CCS R06 06-27-16
PRES ACE Coaching and CCS R06 06-27-16PRES ACE Coaching and CCS R06 06-27-16
PRES ACE Coaching and CCS R06 06-27-16Kasey Cummings
 
Metaheuristics Using Agent-Based Models for Swarms and Contagion
Metaheuristics Using Agent-Based Models for Swarms and ContagionMetaheuristics Using Agent-Based Models for Swarms and Contagion
Metaheuristics Using Agent-Based Models for Swarms and ContagionLingge Li, PhD
 
Chapter 5 - Basics
Chapter 5 - BasicsChapter 5 - Basics
Chapter 5 - BasicsMarleneDJ
 
Test case design_the_basicsv0.4
Test case design_the_basicsv0.4Test case design_the_basicsv0.4
Test case design_the_basicsv0.4guest31fced
 
The church in Philadelphia
The church in PhiladelphiaThe church in Philadelphia
The church in Philadelphiapegbaker
 
Ride the roller coaster - Bing Ads, Yahoo & Polyvore
Ride the roller coaster - Bing Ads, Yahoo & PolyvoreRide the roller coaster - Bing Ads, Yahoo & Polyvore
Ride the roller coaster - Bing Ads, Yahoo & PolyvoreCommerceHubOfficial
 
Test Case Design
Test Case DesignTest Case Design
Test Case Designacatalin
 
Presentacion del ph
Presentacion del phPresentacion del ph
Presentacion del phDuvan Tafur
 

Viewers also liked (19)

Trees
TreesTrees
Trees
 
Texas v. johnson
Texas v. johnsonTexas v. johnson
Texas v. johnson
 
Colours through favourite cartoons
Colours through favourite cartoonsColours through favourite cartoons
Colours through favourite cartoons
 
52 ways to_lose_weight_all_year
52 ways to_lose_weight_all_year52 ways to_lose_weight_all_year
52 ways to_lose_weight_all_year
 
Story board
Story board Story board
Story board
 
Tics en la carrera
Tics en la carreraTics en la carrera
Tics en la carrera
 
Mahrukh jinyoung-bookshare
Mahrukh jinyoung-bookshareMahrukh jinyoung-bookshare
Mahrukh jinyoung-bookshare
 
Michael Zapytowski _SFLLC
Michael Zapytowski _SFLLCMichael Zapytowski _SFLLC
Michael Zapytowski _SFLLC
 
070415 -Upasana Arora - Fellowship
070415 -Upasana Arora - Fellowship070415 -Upasana Arora - Fellowship
070415 -Upasana Arora - Fellowship
 
PRES ACE Coaching and CCS R06 06-27-16
PRES ACE Coaching and CCS R06 06-27-16PRES ACE Coaching and CCS R06 06-27-16
PRES ACE Coaching and CCS R06 06-27-16
 
Cv - Anil
Cv - AnilCv - Anil
Cv - Anil
 
Metaheuristics Using Agent-Based Models for Swarms and Contagion
Metaheuristics Using Agent-Based Models for Swarms and ContagionMetaheuristics Using Agent-Based Models for Swarms and Contagion
Metaheuristics Using Agent-Based Models for Swarms and Contagion
 
Chapter 5 - Basics
Chapter 5 - BasicsChapter 5 - Basics
Chapter 5 - Basics
 
Test case design_the_basicsv0.4
Test case design_the_basicsv0.4Test case design_the_basicsv0.4
Test case design_the_basicsv0.4
 
The church in Philadelphia
The church in PhiladelphiaThe church in Philadelphia
The church in Philadelphia
 
Ride the roller coaster - Bing Ads, Yahoo & Polyvore
Ride the roller coaster - Bing Ads, Yahoo & PolyvoreRide the roller coaster - Bing Ads, Yahoo & Polyvore
Ride the roller coaster - Bing Ads, Yahoo & Polyvore
 
Test Case Design
Test Case DesignTest Case Design
Test Case Design
 
Shashikant Mane Resume
Shashikant Mane ResumeShashikant Mane Resume
Shashikant Mane Resume
 
Presentacion del ph
Presentacion del phPresentacion del ph
Presentacion del ph
 

Similar to GPU_Based_Image_Compression_and_Interpolation_with_Anisotropic_Diffusion

Survey paper on image compression techniques
Survey paper on image compression techniquesSurvey paper on image compression techniques
Survey paper on image compression techniquesIRJET Journal
 
Evaluation of graphic effects embedded image compression
Evaluation of graphic effects embedded image compression Evaluation of graphic effects embedded image compression
Evaluation of graphic effects embedded image compression IJECEIAES
 
Paper id 25201490
Paper id 25201490Paper id 25201490
Paper id 25201490IJRAT
 
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEMULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEijcsity
 
7419ijcsity01.pdf
7419ijcsity01.pdf7419ijcsity01.pdf
7419ijcsity01.pdfijcsity
 
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEMULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEijcsity
 
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...VLSICS Design
 
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...VLSICS Design
 
An improved image compression algorithm based on daubechies wavelets with ar...
An improved image compression algorithm based on daubechies  wavelets with ar...An improved image compression algorithm based on daubechies  wavelets with ar...
An improved image compression algorithm based on daubechies wavelets with ar...Alexander Decker
 
Jpeg image compression using discrete cosine transform a survey
Jpeg image compression using discrete cosine transform   a surveyJpeg image compression using discrete cosine transform   a survey
Jpeg image compression using discrete cosine transform a surveyIJCSES Journal
 
Color image compression based on spatial and magnitude signal decomposition
Color image compression based on spatial and magnitude signal decomposition Color image compression based on spatial and magnitude signal decomposition
Color image compression based on spatial and magnitude signal decomposition IJECEIAES
 
2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptxJAEMINJEONG5
 
ROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEGROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEGIJERA Editor
 
A Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWTA Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWTIJSRD
 

Similar to GPU_Based_Image_Compression_and_Interpolation_with_Anisotropic_Diffusion (20)

Survey paper on image compression techniques
Survey paper on image compression techniquesSurvey paper on image compression techniques
Survey paper on image compression techniques
 
Evaluation of graphic effects embedded image compression
Evaluation of graphic effects embedded image compression Evaluation of graphic effects embedded image compression
Evaluation of graphic effects embedded image compression
 
B017120611
B017120611B017120611
B017120611
 
Paper id 25201490
Paper id 25201490Paper id 25201490
Paper id 25201490
 
Medial axis transformation based skeletonzation of image patterns using image...
Medial axis transformation based skeletonzation of image patterns using image...Medial axis transformation based skeletonzation of image patterns using image...
Medial axis transformation based skeletonzation of image patterns using image...
 
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEMULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
 
7419ijcsity01.pdf
7419ijcsity01.pdf7419ijcsity01.pdf
7419ijcsity01.pdf
 
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGEMULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE
 
Jv2517361741
Jv2517361741Jv2517361741
Jv2517361741
 
Jv2517361741
Jv2517361741Jv2517361741
Jv2517361741
 
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
 
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
 
H010315356
H010315356H010315356
H010315356
 
An improved image compression algorithm based on daubechies wavelets with ar...
An improved image compression algorithm based on daubechies  wavelets with ar...An improved image compression algorithm based on daubechies  wavelets with ar...
An improved image compression algorithm based on daubechies wavelets with ar...
 
B070306010
B070306010B070306010
B070306010
 
Jpeg image compression using discrete cosine transform a survey
Jpeg image compression using discrete cosine transform   a surveyJpeg image compression using discrete cosine transform   a survey
Jpeg image compression using discrete cosine transform a survey
 
Color image compression based on spatial and magnitude signal decomposition
Color image compression based on spatial and magnitude signal decomposition Color image compression based on spatial and magnitude signal decomposition
Color image compression based on spatial and magnitude signal decomposition
 
2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx
 
ROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEGROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEG
 
A Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWTA Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWT
 

GPU_Based_Image_Compression_and_Interpolation_with_Anisotropic_Diffusion

  • 1. GPU Based Image Compression and Interpolation with Anisotropic Diffusion Vartika Sharma Umang Sehgal Electronics and Communication Engineering Computer Science and Engineering LNM Institute of Information Technology LNM Institute of Information Technology Jaipur, India Jaipur, india vartika.y12@lnmiit.ac.in umangsehgal.y12@lnmiit.ac.in December 25, 2014 Abstract Image compression is used to reduce irrelevance and redundancy of the image data in order to be able to store or transmit data in an efficient form. The best image quality at a given bit-rate or compression rate is the main goal of image compression. Methods based on partial differential equation (PDEs) have been used in the past for inpainting and reconstruc- tion from digital image features. We go for PDE method because optimal set for image compression and interpolation depends on PDE, i.e., good PDEs can cope with bad points and good points allow sim- ple (suboptimal) PDEs. Suboptimal point set can pay off if coded efficiently. During encoding, the basic idea is to store only a few relevant pixel coordinates in the encoding step. We use an adaptive triangulation method based on binary tree coding for removing less significant pixels from the image. Decoding is done by the Perona and Malik diffusion process for which the remaining points serve as scattered interpolation data. Our goal in this paper is to analyse the poten- tial of differential equations for image compression and interpolation and analyse the performance speed of the algorithm both on CPU and GPU. Graphics Processing Units (GPUs) are used in image process- ing because they accelerate parallel computing, are affordable and energy efficient. Research has also proved that GPUs perform better even at lower occu- pancies. In this paper, we will see the advantage we achieve with respect to the productivity and main- tainability when applying concepts of the hardware system. 
Our experiment shows that the computation time of the CPU code increases significantly as the image size grows, whereas larger images are processed with nearly equal ease using GPU computing.

Keywords: Image Compression, Partial Differential Equations (PDEs), Binary Tree Triangulation, GPU Computing

1 Introduction

Image compression is concerned with taking an image and compressing it down to its smallest possible size without much loss of data. Its main purpose is to reduce the file size of an image so that it can be transferred quickly over a communication network. Fast algorithms are desirable, since both compression and interpolation benefit from fast coding.

The first part of our paper deals with image compression using Binary Tree Triangular Coding. The points we keep for compression are the vertices of the leaves of a binary tree built by triangular coding. Further, we aim to demonstrate that optimizing a hierarchical tree traversal can result in substantial performance gains on the GPU.

In the next part, we fill in the missing information in the image areas whose pixels were discarded, by means of second-order PDEs. The basic idea is to interpolate the data in the inpainting regions by solving appropriate boundary value problems. We take the example of the Perona and Malik nonlinear diffusion method [5] and show implementations on both the CPU and the GPU.

Many image processing problems are inherently parallel, and owing to their large number of cores, GPUs hold a potential for high-performance computing. The paper focuses on the variation in the performance obtained on the CPU (Matlab and C++) and the GPU (CUDA C++) architectures for image interpolation using Perona and Malik diffusion of greyscale images of different sizes.

The paper is organized as follows. Section 2 explains the B-tree triangulation coding scheme and tree construction on the GPU. PDE-based image interpolation and its GPU implementation using CUDA are discussed in Section 3. Results are shown in Section 4, and Section 5 gives the conclusion.

2 Image Compression using B-tree triangular coding

We will discuss an algorithm for image compression called B-tree triangular coding (BTTC) [6]. It is based on the recursive decomposition of the image into isosceles right-angled triangles arranged in a binary tree. The method is attractive because of its fast encoding and decoding, and because it is easy to implement and to parallelize.

The image to be encoded is regarded as a discrete surface: we consider a non-negative discrete function of two discrete variables F(x, y) and establish a correspondence between the image and the surface

$$A = \{(x, y, c) \mid c = F(x, y)\},$$

so that each point in A corresponds to a pixel in the image, where c gives the pixel's intensity (grey value). Our goal is to approximate A by a discrete surface

$$B = \{(x, y, d) \mid d = G(x, y)\},$$

defined by means of a finite set of polyhedrons. Each polyhedron has a right-angled triangle (RAT) face on the XY plane and a RAT upper face approximating A. The surface B is made of the upper faces of the polyhedrons.

To show how we build the binary tree from the image, let T be a generic RAT on the XY plane with vertices

$$P_1 = (x_1, y_1), \quad P_2 = (x_2, y_2), \quad P_3 = (x_3, y_3), \tag{1}$$

and let

$$c_1 = F(x_1, y_1), \quad c_2 = F(x_2, y_2), \quad c_3 = F(x_3, y_3), \tag{2}$$

so that

$$(x_1, y_1, c_1),\ (x_2, y_2, c_2),\ (x_3, y_3, c_3) \in A. \tag{3}$$

Figure 1: Image partition process using Binary Tree Triangulation Coding.
A leaf of the binary tree marked with a small triangle indicates that the corresponding triangle satisfies the uniformity predicate.

Inside T, the approximating function G is given by the linear interpolation

$$G(x, y) = c_1 + \alpha\,(c_2 - c_1) + \beta\,(c_3 - c_1), \tag{4}$$

where α and β are defined by the two relations

$$\alpha = \frac{(x - x_1)(y_3 - y_1) - (y - y_1)(x_3 - x_1)}{(x_2 - x_1)(y_3 - y_1) - (y_2 - y_1)(x_3 - x_1)}, \tag{5}$$

$$\beta = \frac{(x_2 - x_1)(y - y_1) - (y_2 - y_1)(x - x_1)}{(x_2 - x_1)(y_3 - y_1) - (y_2 - y_1)(x_3 - x_1)}. \tag{6}$$

By the definition of linear interpolation, the values of F and G coincide on the vertices of T:

$$F(P_1) = G(P_1), \quad F(P_2) = G(P_2), \quad F(P_3) = G(P_3). \tag{7}$$

We now measure the quality of the approximation G by defining

$$\mathrm{err}(x, y) = \lvert F(x, y) - G(x, y) \rvert \tag{8}$$

and checking whether

$$\mathrm{err}(x, y) \le \varepsilon \quad \text{for all } (x, y) \in T. \tag{9}$$
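Equations (4) to (9) can be checked with a minimal C++ sketch. It assumes a row-major greyscale image with integer pixel coordinates; the names `Image`, `alpha_beta`, `interpolate` and `is_uniform` are illustrative, not taken from the paper or from [6]. Treating a pixel as part of T when α ≥ 0, β ≥ 0 and α + β ≤ 1 (so that pixels on the edges count) is also an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { int x, y; };

// Greyscale image F(x, y), stored row-major (illustrative type).
struct Image {
    int w, h;
    std::vector<double> v;
    double at(int x, int y) const { return v[y * w + x]; }
};

// alpha and beta of eqs. (5)-(6) for pixel (x, y) and triangle P1 P2 P3.
void alpha_beta(Point p1, Point p2, Point p3, int x, int y,
                double& alpha, double& beta) {
    const double den = double(p2.x - p1.x) * (p3.y - p1.y)
                     - double(p2.y - p1.y) * (p3.x - p1.x);
    alpha = (double(x - p1.x) * (p3.y - p1.y) - double(y - p1.y) * (p3.x - p1.x)) / den;
    beta  = (double(p2.x - p1.x) * (y - p1.y) - double(p2.y - p1.y) * (x - p1.x)) / den;
}

// Linear interpolant G(x, y) of eq. (4); c1, c2, c3 are F at P1, P2, P3.
double interpolate(Point p1, Point p2, Point p3,
                   double c1, double c2, double c3, int x, int y) {
    double alpha, beta;
    alpha_beta(p1, p2, p3, x, y, alpha, beta);
    return c1 + alpha * (c2 - c1) + beta * (c3 - c1);
}

// Uniformity predicate, eqs. (8)-(9): |F(x, y) - G(x, y)| <= eps for every
// pixel of T. A pixel lies in T when alpha >= 0, beta >= 0, alpha + beta <= 1.
bool is_uniform(const Image& F, Point p1, Point p2, Point p3, double eps) {
    const double c1 = F.at(p1.x, p1.y), c2 = F.at(p2.x, p2.y), c3 = F.at(p3.x, p3.y);
    const int xlo = std::min({p1.x, p2.x, p3.x}), xhi = std::max({p1.x, p2.x, p3.x});
    const int ylo = std::min({p1.y, p2.y, p3.y}), yhi = std::max({p1.y, p2.y, p3.y});
    const double tol = 1e-9;  // keeps pixels lying exactly on an edge of T
    for (int y = ylo; y <= yhi; ++y)
        for (int x = xlo; x <= xhi; ++x) {
            double a, b;
            alpha_beta(p1, p2, p3, x, y, a, b);
            if (a < -tol || b < -tol || a + b > 1 + tol) continue;  // outside T
            const double g = c1 + a * (c2 - c1) + b * (c3 - c1);
            if (std::fabs(F.at(x, y) - g) > eps) return false;
        }
    return true;
}
```

For a planar image such as F(x, y) = 2x + 3y + 1 the interpolation is exact, so `is_uniform` accepts the triangle even with a tiny tolerance ε, while a single outlier pixel inside T makes the test fail unless ε exceeds the outlier's deviation.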
Here ε > 0 is an adjustable quality factor. If the condition does not hold, T is divided along its height relative to the hypotenuse, producing two smaller RATs to which the same test is applied. If the subdivision is repeated as often as necessary, we eventually obtain minimal triangles comprising only three pixels (their vertices), which satisfy the above condition because err(x, y) = 0 on each vertex.

The topological information relative to all subdivisions is stored in a hierarchical structure, the B-tree.

2.1 Tree Construction on the GPU

In recent years, general-purpose GPU computing has given rise to a number of methods for constructing bounding volume hierarchies (BVHs). In our case, what matters most is the speed of construction.

For a parallel implementation of binary tree triangulation coding, the idea is to process the levels of the tree sequentially, starting from the root [1]. Every level in the binary tree hierarchy then corresponds to a linear range of nodes, and on a given level we launch one thread for each node that falls into this range. However, this process is fast only when there are enough nodes (millions of objects) to fully employ the GPU.

Figure 2: Implementation of Binary Tree structure on the GPU. There is one thread for each node at each level.

The main shortcoming of the existing methods that aim to maximize construction speed is that they generate the node hierarchy sequentially, usually one level at a time. This limits the amount of parallelism that can be achieved at the top levels of the tree and can lead to poor utilization of the available parallel cores.

3 PDE based image interpolation

For decompression, the vertex mask is first recovered from the binary tree representation, and the stored grey values are placed at the appropriate pixel positions to give a sparse image. To recover the vertex mask, the tree is regenerated in the same order in which it was stored.
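The subdivision rule of Section 2 and the decoding step just described can be sketched together in C++. This is an illustrative reconstruction, not the authors' code, and it rests on several assumptions: the image is square with a side of 2^m + 1 pixels, level 0 consists of the two RATs obtained by cutting the square along its diagonal, one bit per splittable node is stored in breadth-first (level) order, and the vertices of all leaves form the vertex mask. Storing the grey values of the marked pixels is omitted. The inner loop over one level is the part that Section 2.1 maps to one GPU thread per node; here it runs sequentially (a GPU version would also need a prefix sum to place the bits). All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { int x, y; };
// Isosceles right-angled triangle: r is the right-angle vertex,
// a and b are the endpoints of the hypotenuse.
struct Tri { Point r, a, b; };

// Square greyscale image, n x n pixels with n = 2^m + 1 (assumption).
struct Image {
    int n;
    std::vector<double> v;  // row-major values F(x, y)
    double at(int x, int y) const { return v[y * n + x]; }
};

// Uniformity predicate (9): |F - G| <= eps on every pixel of t, with G from eq. (4).
bool is_uniform(const Image& F, const Tri& t, double eps) {
    const Point p1 = t.r, p2 = t.a, p3 = t.b;
    const double c1 = F.at(p1.x, p1.y), c2 = F.at(p2.x, p2.y), c3 = F.at(p3.x, p3.y);
    const double den = double(p2.x - p1.x) * (p3.y - p1.y) - double(p2.y - p1.y) * (p3.x - p1.x);
    const int xlo = std::min({p1.x, p2.x, p3.x}), xhi = std::max({p1.x, p2.x, p3.x});
    const int ylo = std::min({p1.y, p2.y, p3.y}), yhi = std::max({p1.y, p2.y, p3.y});
    for (int y = ylo; y <= yhi; ++y)
        for (int x = xlo; x <= xhi; ++x) {
            const double a = (double(x - p1.x) * (p3.y - p1.y) - double(y - p1.y) * (p3.x - p1.x)) / den;
            const double b = (double(p2.x - p1.x) * (y - p1.y) - double(p2.y - p1.y) * (x - p1.x)) / den;
            if (a < -1e-9 || b < -1e-9 || a + b > 1 + 1e-9) continue;  // pixel outside t
            if (std::fabs(F.at(x, y) - (c1 + a * (c2 - c1) + b * (c3 - c1))) > eps) return false;
        }
    return true;
}

// t can be halved only if the midpoint of its hypotenuse is a pixel.
bool can_split(const Tri& t) {
    return (t.a.x + t.b.x) % 2 == 0 && (t.a.y + t.b.y) % 2 == 0;
}

// Halve t along the height to its hypotenuse; the foot m is the new right angle.
void split(const Tri& t, std::vector<Tri>& out) {
    const Point m{(t.a.x + t.b.x) / 2, (t.a.y + t.b.y) / 2};
    out.push_back({m, t.r, t.a});
    out.push_back({m, t.b, t.r});
}

// Mark the three vertices of a leaf triangle in the vertex mask.
void mark(const Tri& t, int n, std::vector<bool>& mask) {
    for (Point p : {t.r, t.a, t.b}) mask[p.y * n + p.x] = true;
}

// Level-by-level traversal. Encoding (F != nullptr) appends one bit per
// splittable node (1 = split); decoding (F == nullptr) replays those bits.
// Either way, the vertices of all leaves are marked in the returned mask.
std::vector<bool> bttc(const Image* F, double eps, int n, std::vector<bool>& bits) {
    std::vector<bool> mask(std::size_t(n) * n, false);
    const int N = n - 1;
    // Level 0: the square cut along its diagonal into two RATs.
    std::vector<Tri> level{Tri{{0, 0}, {N, 0}, {0, N}}, Tri{{N, N}, {0, N}, {N, 0}}};
    std::size_t pos = 0;
    while (!level.empty()) {
        std::vector<Tri> next;
        for (const Tri& t : level) {  // independent nodes: one GPU thread each in Sec. 2.1
            bool s = false;
            if (can_split(t)) {
                if (F) { s = !is_uniform(*F, t, eps); bits.push_back(s); }
                else   { s = bits[pos++]; }
            }
            if (s) split(t, next); else mark(t, n, mask);
        }
        level.swap(next);
    }
    return mask;
}
```

Decoding calls the same traversal with `F == nullptr`, so the tree is regenerated in exactly the order in which it was stored and the decoded vertex mask matches the encoder's. For a planar image, both level-0 triangles are uniform and only the four image corners are kept.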
While the nodes are regenerated, the vertex positions are computed and marked in the vertex mask. The second step is the interpolation of the image, in which the vertex mask becomes the interpolation mask. The image is recovered by treating it as the steady state of a diffusion process for which the data points (the subset of pixels selected through BTTC) serve as interpolation data. Once this subset of pixels has been chosen, we interpolate between the points to recreate the original image. In this section we consider the following nonlinear diffusion scheme:

$$\partial_t u = \operatorname{div}\big(c(\lvert\nabla u\rvert^2)\,\nabla u\big), \tag{10}$$

where c is a conductivity function introduced by Perona and Malik.
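Why the conductivity c makes the filter edge-preserving can be seen from a short C++ sketch. It uses the exponential conductivity c = exp(-|∇u|²/k²) from the Matlab command given later in this section, with k as the contrast parameter; the function names are illustrative. In one dimension the flux c(s²)·s grows with the slope s up to s = k/√2 and then decays, so steep edges receive far less smoothing than moderate gradients.

```cpp
#include <cmath>

// Perona-Malik conductivity c(|grad u|^2) = exp(-|grad u|^2 / k^2), matching the
// Matlab command c = exp(-grad2/(k^2)); k is the contrast parameter.
double conductivity(double grad2, double k) {
    return std::exp(-grad2 / (k * k));
}

// 1-D diffusive flux c(s^2) * s for a slope s = du/dx. It grows up to
// |s| = k / sqrt(2) and decays beyond it, so steep edges are barely smoothed.
double flux(double s, double k) {
    return conductivity(s * s, k) * s;
}
```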
Now we discretize this diffusion filter, written in components as

$$\partial_t u = \partial_x(c\,\partial_x u) + \partial_y(c\,\partial_y u). \tag{11}$$

We discretize the PDE using a symmetric scheme for the first-order derivatives (in a 3×3 stencil), as given by Weickert [2] [3]. First consider the term ∂x(c∂xu). With unit grid size, we discretize the outer derivative using forward differences:

$$\partial_x(c\,\partial_x u)\big|_{i,j} \approx [c\,\partial_x u]_{i+1,j} - [c\,\partial_x u]_{i,j}. \tag{12}$$

Next, we discretize the flux c∂xu using backward differences, averaging the conductivities of the two neighbouring pixels:

$$[c\,\partial_x u]_{i,j} \approx \frac{c_{i,j} + c_{i-1,j}}{2}\,(u_{i,j} - u_{i-1,j}). \tag{13}$$

Substituting (13) into (12), we obtain

$$\partial_x(c\,\partial_x u)\big|_{i,j} \approx \frac{c_{i+1,j} + c_{i,j}}{2}\,(u_{i+1,j} - u_{i,j}) - \frac{c_{i,j} + c_{i-1,j}}{2}\,(u_{i,j} - u_{i-1,j}). \tag{14}$$

Similarly, for the y direction,

$$\partial_y(c\,\partial_y u)\big|_{i,j} \approx \frac{c_{i,j+1} + c_{i,j}}{2}\,(u_{i,j+1} - u_{i,j}) - \frac{c_{i,j} + c_{i,j-1}}{2}\,(u_{i,j} - u_{i,j-1}). \tag{15}$$

The discretization of the PDE then becomes

$$u^{t+dt}_{i,j} = u^{t}_{i,j} + dt\,\big(\partial_x(c\,\partial_x u) + \partial_y(c\,\partial_y u)\big)\big|_{i,j}, \tag{16}$$

where dt is the step size. We take an image and iterate the scheme n times (n = 80). In every step, the image gradient is computed with derivatives of a Gaussian kernel, and the conductivity c is evaluated on its squared norm with the Matlab command

    c = exp(-grad2/(k^2));

where grad2 is the square of the gradient norm. We then compute the nonlinear diffusion step of equation (16), which gives the change of the image in each iteration. The image obtained after the last iteration is the diffused, i.e. interpolated, image.

3.1 GPU implementation of Perona and Malik using CUDA

Because our algorithm requires many floating-point computations per pixel, it can run slowly even on the fastest CPUs, which is a serious hindrance to productivity. Using CUDA, we can spawn exactly one thread per pixel, and each thread is responsible for computing the final grey value of exactly one pixel. Since images are naturally two-dimensional, it makes sense for each block to be two-dimensional as well; 32×16 is a good size because it gives each thread block 512 threads. We then spawn as many thread blocks in the x and y dimensions as necessary to cover the entire image.
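The discretization (12)-(16) and the one-thread-per-pixel mapping can be sketched on the CPU in C++. This is a sketch under stated assumptions, not the authors' kernel: the gradient inside c uses plain central differences with replicated borders (instead of the derivative-of-Gaussian filters mentioned above), no flux crosses the image border, and pixels flagged in the BTTC vertex mask are held fixed as interpolation data. `pm_pixel` is the work one CUDA thread would do, and `blocks_for` computes the number of thread blocks per dimension; all names are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Row-major greyscale image u(x, y): x is the column, y the row.
struct Field {
    int w, h;
    std::vector<float> u;
    // Value with replicated borders, used only for the gradient estimate.
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return u[y * w + x];
    }
};

// Conductivity c = exp(-|grad u|^2 / k^2) from central differences
// (assumption: no Gaussian presmoothing).
float conductivity(const Field& f, int x, int y, float k) {
    const float ux = 0.5f * (f.at(x + 1, y) - f.at(x - 1, y));
    const float uy = 0.5f * (f.at(x, y + 1) - f.at(x, y - 1));
    return std::exp(-(ux * ux + uy * uy) / (k * k));
}

// Explicit update (16) at pixel (x, y): each neighbour contributes the flux
// ((c_nb + c_xy) / 2) * (u_nb - u_xy), as in eqs. (14)-(15).
// Neighbours outside the image contribute no flux.
float pm_pixel(const Field& f, const std::vector<float>& c, int x, int y, float dt) {
    const int id = y * f.w + x;
    auto flux = [&](int nb) { return 0.5f * (c[nb] + c[id]) * (f.u[nb] - f.u[id]); };
    float div = 0.0f;
    if (x + 1 < f.w) div += flux(id + 1);
    if (x > 0)       div += flux(id - 1);
    if (y + 1 < f.h) div += flux(id + f.w);
    if (y > 0)       div += flux(id - f.w);
    return f.u[id] + dt * div;
}

// One diffusion step over the whole image; pixels flagged in `known`
// (the data kept by BTTC) are never modified.
void pm_step(Field& f, const std::vector<bool>& known, float k, float dt) {
    std::vector<float> c(f.u.size());
    for (int y = 0; y < f.h; ++y)
        for (int x = 0; x < f.w; ++x) c[y * f.w + x] = conductivity(f, x, y, k);
    std::vector<float> next(f.u);
    for (int y = 0; y < f.h; ++y)  // on the GPU: one thread per (x, y)
        for (int x = 0; x < f.w; ++x)
            if (!known[y * f.w + x]) next[y * f.w + x] = pm_pixel(f, c, x, y, dt);
    f.u.swap(next);
}

// Thread blocks needed along one dimension: ceil(size / block).
int blocks_for(int size, int block) { return (size + block - 1) / block; }
```

With a very large k, c is practically 1 and the scheme reduces to linear diffusion: on a five-pixel row whose end values 0 and 100 are fixed, it converges to the ramp 25, 50, 75. Because c ≤ 1, every explicit step is a convex combination of neighbouring values for dt ≤ 0.25, which covers the time step 0.2 passed to the kernel in the paper.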
For example, for a 1024×768 image the grid of thread blocks is 32×48, with each thread block having 32×16 threads.

CUDA uses the GPU (device) to execute code. A function that executes on the device is called a kernel and is qualified with __global__. A kernel is launched with kernel_name<<<blocks, threads>>>(arguments) [4]. Each thread computes the row i and the column j of its pixel from its block and thread indices:

    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column

Because our function runs on a CUDA device, the image data must be copied to the GPU. The image is first copied from the host to the GPU and then copied to a second buffer on the GPU with a device-to-device memory copy. The kernel (our function) is called, and finally the resulting image is copied back to the host:

    // h_image, d_image, d_imageCopy and bytes are placeholder names
    // Copy the data to the device
    cudaMemcpy(d_image, h_image, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_imageCopy, d_image, bytes, cudaMemcpyDeviceToDevice);

    // Kernel call: k = 0.001, timestep = 0.2, nsteps = 80, image width w
    pm_diffusion<<<blocks, threads>>>(d_image, 0.001f, 0.2f, 80, w);
    HANDLE_ERROR(cudaGetLastError());       // report launch errors
    HANDLE_ERROR(cudaDeviceSynchronize());  // host waits for the kernel to finish

    // Copy the data back to the host
    cudaMemcpy(h_image, d_image, bytes, cudaMemcpyDeviceToHost);

We then run our algorithms on both the CPU (Matlab and C++) and the GPU (CUDA C++).

Figure 3: Test of Perona and Malik function. On the top, original images of three different sizes (128×128, 512×512, 1024×1024) are shown. On the bottom, the result of Perona and Malik diffusion is shown (after 80 iterations).

4 Results

To analyse performance, we took three images of different sizes and ran our Perona and Malik code in Matlab, C++ and CUDA C++ (on NVIDIA GPUs). We ran each benchmark five times and averaged the time consumed. The results are for 80 iterations of the Perona and Malik diffusion algorithm.

    Image dimensions   Matlab time (s)   C++ time (s)   CUDA C++ time (s)
    128×128             3.766            1.009          0.069
    512×512            12.745            3.126          0.081
    1024×1024          40.737            6.309          0.106

Table 1: Test of Perona and Malik function. Comparison of processing speed between Matlab, C++ and CUDA C++.

5 Conclusion

CUDA C++ is far faster than both Matlab and C++: for the 1024×1024 image it is about 60 times faster than C++ and about 380 times faster than Matlab. Processing speed is especially important for large images, since many computations involve optimization with complex equations or a large number of iterations. As the amount of data increases, the computation time of both the Matlab and the C++ code increases significantly, which makes CPU code impractical for such calculations. Larger images, however, are processed with nearly equal ease in CUDA: going from 128×128 to 1024×1024 pixels (64 times as many) raises the GPU time only from 0.069 s to 0.106 s. It must also be kept in mind that the development time in C++ and CUDA C++ is much higher than in Matlab.

References

[1] T. Karras. “Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees.” 2012.
[2] I. Galić, J. Weickert, M. Welk, et al. “Image Compression with Anisotropic Diffusion.” 2008.
[3] I. Galić, J. Weickert, M. Welk, et al. “Towards PDE-Based Image Compression.” 2005.
[4] J. Sanders and E. Kandrot. CUDA by Example. 2010.
[5] P. Perona and J. Malik. “Scale-Space and Edge Detection Using Anisotropic Diffusion.” 1990.
[6] R. Distasi, M. Nappi, and S. Vitulano. “Image Compression by B-Tree Triangular Coding.” 1997.