Fast dct algorithm using winograd’s method

International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976-6464 (Print), ISSN 0976-6472 (Online), Volume 3, Issue 1, January-June (2012), © IAEME

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

ISSN 0976-6464 (Print)
ISSN 0976-6472 (Online)
Volume 3, Issue 1, January-June (2012), pp. 98-110
© IAEME: www.iaeme.com/ijecet.html
Journal Impact Factor (2011): 0.8500 (Calculated by GISI), www.jifactor.com

FAST DCT ALGORITHM USING WINOGRAD’S METHOD
Ch. Ramesh¹, Dr. N.B. Venkateswarlu², Dr. J.V.R. Murthy³
¹ Professor, Dept. of CSE, AITAM, Tekkali, A.P., India, chappa_ramesh01@yahoo.co.in
² Professor, Dept. of CSE, AITAM, Tekkali, A.P., India, venkat_ritch@yahoo.com
³ Professor, Dept. of CSE, College of Engineering, JNTUK, A.P., India, mjonnalagedda@yahoo.com

ABSTRACT

Applications of digital image communication have grown rapidly in recent years, and
discrete cosine transform (DCT) based algorithms are widely used to reduce
communication cost. Forward and inverse DCT computations are reported to take a long
time, which can impede real-time response in some applications. In this paper, we apply
Winograd’s matrix multiplication method to forward and inverse DCT computation to
reduce their CPU time. Experiments are carried out with standard and synthetic images.

Keywords: DCT, IDCT, Winograd’s method, JPEG, MATLAB

I. INTRODUCTION

Discrete cosine transform (DCT) based algorithms, such as JPEG and MP3, are among the
most widely used in audio, image and video data compression. The DCT was introduced
by Ahmed, Natarajan and Rao (1974), and its application to image compression was
pioneered by Chen and Pratt (1984). The DCT converts a signal into elementary
frequency components: it represents an image as a sum of sinusoids of varying
magnitudes and frequencies.

The DCT has the property that, for a typical image, most of the visually significant
information is concentrated in just a few DCT coefficients; for this reason the DCT is
often used in image compression applications [3]. The cosine transform converts each
block of spatial information into a frequency-space representation that is better suited
for compression. Specifically, the transform produces an array of coefficients for
real-valued basis functions that represent each block of data in frequency space. The
magnitudes of the DCT coefficients exhibit a distinct pattern within the array:
coefficients corresponding to the lowest-frequency basis functions usually have the
highest magnitudes and are the most perceptually significant, while coefficients
corresponding to the highest-frequency basis functions usually have the lowest
magnitudes and are the least perceptually significant. DCT-based compression methods
retain only the important DCT coefficients, and this is how the data is compressed.

The 2D-DCT equation (Eq-1) computes the (u, v)th entry of the DCT of an image [5]:

$$C(u,v) = \alpha(u)\,\alpha(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\cos\left[\frac{(2x+1)u\pi}{2N}\right]\cos\left[\frac{(2y+1)v\pi}{2N}\right], \qquad u, v = 0, 1, \dots, N-1 \tag{1}$$

$$\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u > 0 \end{cases} \tag{2}$$

$$\alpha(v) = \begin{cases} \sqrt{1/N}, & v = 0 \\ \sqrt{2/N}, & v > 0 \end{cases} \tag{3}$$

f(x, y) is the (x, y)th element of the image block represented by the matrix f, and N is
the size of the block on which the DCT is computed. The equation calculates one entry,
(u, v), of the transformed block from the pixel values of the original block.

The first coefficient, C(0, 0), is termed the “DC coefficient” and the remaining
coefficients are called the “AC coefficients”. After the DCT, the remaining operations at
the sender side are quantization, zigzag scanning and encoding; the reverse operations at
the receiving side are decoding, inverse zigzag scanning, de-quantization and IDCT. As
these steps are widely reported elsewhere, we do not discuss them here for brevity.

The IDCT converts a set of frequency coefficients back into image samples: it is
performed on a two-dimensional array of coefficients and yields a two-dimensional
array of samples.

The 2D-IDCT equation (Eq-4) computes the (x, y)th entry of an image [5]:

$$f(x,y) = \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\,C(u,v)\cos\left[\frac{(2x+1)u\pi}{2N}\right]\cos\left[\frac{(2y+1)v\pi}{2N}\right], \qquad x, y = 0, 1, \dots, N-1 \tag{4}$$

C(u, v) is the (u, v)th DCT coefficient of the block represented by the matrix C, and N is
the size of the block on which the IDCT is computed. The equation calculates one entry,
(x, y), of the image block from the DCT coefficients.

This paper is organized as follows. Section II gives a brief overview of the
computational cost of the conventional DCT and IDCT algorithms. The proposed
Winograd-based DCT and IDCT algorithms are described in Section III. Experimental
results are presented in Section IV, and concluding remarks are given in Section V.

II. Computational Complexity of Conventional DCT/IDCT

In the 2D-DCT (Eq-1), the cosine functions $\cos\frac{(2x+1)u\pi}{2N}$ and
$\cos\frac{(2y+1)v\pi}{2N}$ are computationally very expensive. Arranged as 8 x 8 matrices,
the values of $\cos\frac{(2y+1)v\pi}{2N}$ form the transpose of the values of
$\cos\frac{(2x+1)u\pi}{2N}$. Calculating $\cos\frac{(2x+1)u\pi}{2N}$ once requires 4
multiplications, 1 addition and 1 division. For each element of the DCT matrix the double
loop in Eq-1 iterates 64 times, so this cosine alone requires 256 multiplications, 64
additions and 64 divisions per element. For all elements of the DCT matrix it requires
16384 multiplications, 4096 additions and 4096 divisions, and both cosine functions
together require 32768 multiplications, 8192 additions and 8192 divisions. The way to
improve performance is therefore to precompute the cosine coefficients and read them
during the DCT. With this approach, each element of the DCT matrix requires 130
multiplications, 63 additions and 2 divisions according to Eq-1, and all the elements of
the 8 x 8 DCT matrix require 8320 multiplications, 4032 additions and 2 divisions.
Similarly, each element of the IDCT matrix requires 256 multiplications, 63 additions and
2 divisions according to Eq-4, and all the elements of the 8 x 8 IDCT matrix require 16384
multiplications, 4032 additions and 2 divisions. The IDCT therefore requires more
arithmetic operations than the DCT.

III. Winograd’s Approach

Consider the calculation of the scalar (dot) product of two vectors X and Y:

$$X = [x_1, x_2, \dots, x_N] \tag{5}$$

$$Y = [y_1, y_2, \dots, y_N] \tag{6}$$

$$X^{T}Y = x_1y_1 + x_2y_2 + \dots + x_Ny_N \tag{7}$$

This calculation usually requires N multiplications and N - 1 additions. Winograd’s
algorithm [2, 5] has been used in the literature to reduce these computations in
applications such as classification. According to [2],

$$X^{T}Y = \left[(x_1+y_2)(x_2+y_1) + (x_3+y_4)(x_4+y_3) + \dots + (x_{N-1}+y_N)(x_N+y_{N-1})\right] - \left[x_1x_2 + x_3x_4 + \dots + x_{N-1}x_N\right] - \left[y_1y_2 + y_3y_4 + \dots + y_{N-1}y_N\right] \tag{8}$$

Writing k = N/2, this can also be expressed as:

$$X^{T}Y = \sum_{i=1}^{2k} x_i y_i = \sum_{u=1}^{k}\left(x_{2u-1}+y_{2u}\right)\left(x_{2u}+y_{2u-1}\right) - \sum_{u=1}^{k} x_{2u}x_{2u-1} - \sum_{u=1}^{k} y_{2u}y_{2u-1} \tag{9}$$

In this form, if the last two terms are precomputed, the dot product needs only N/2
multiplications. In some applications the last two terms can be reused across many dot
products, which yields a computational benefit. We propose to exploit this in our
DCT/IDCT algorithms by extending it to matrix multiplication. Here N is assumed to be
even, so that N/2 pairs are available; if N is odd, X and Y can simply be padded with one
trailing 0.
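The dot-product form of Eq-9 can be sketched in C as follows. The function names `pair_term` and `winograd_dot` are hypothetical, and indices are 0-based, so the 1-based pair (x_{2u-1}, x_{2u}) becomes (x[k], x[k+1]) with k = 0, 2, 4, …. The pair terms are passed in precomputed, so `winograd_dot` itself performs only N/2 multiplications.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of x_{2u-1} * x_{2u} over u (the last two sums of Eq-9),
   written with 0-based indices. n must be even (pad with a 0 if not). */
double pair_term(const double *x, size_t n)
{
    assert(n % 2 == 0);
    double s = 0.0;
    for (size_t k = 0; k < n; k += 2)
        s += x[k] * x[k + 1];
    return s;
}

/* Eq-9: dot product of x and y given their precomputed pair terms
   xi = pair_term(x, n) and eta = pair_term(y, n); n/2 multiplications. */
double winograd_dot(const double *x, const double *y, size_t n,
                    double xi, double eta)
{
    assert(n % 2 == 0);
    double s = 0.0;
    for (size_t k = 0; k < n; k += 2)
        s += (x[k] + y[k + 1]) * (x[k + 1] + y[k]);
    return s - xi - eta;
}
```

For X = [1, 2, 3, 4] and Y = [5, 6, 7, 8], the pair terms are 14 and 86, and `winograd_dot` returns 49 + 121 - 14 - 86 = 70, the same as the ordinary dot product.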

Although Winograd’s algorithm [2, 5] reduces the actual number of computations, its
asymptotic computational complexity is the same as that of the naïve matrix
multiplication algorithm. Our DCT algorithm requires a series of matrix multiplications,
and we propose to reduce its CPU time by applying Winograd’s method carefully.

For the multiplication of two N x N square matrices A and B, the product is defined
element-wise by Eq-10, and Winograd’s algorithm computes it using equations (11), (12)
and (13) below.

$$C_{i,j} = \sum_{k=1}^{N} a_{i,k}\,b_{k,j} \tag{10}$$

$$C_{i,j} = \sum_{k=1}^{N/2}\left(a_{i,2k-1} + b_{2k,j}\right)\left(a_{i,2k} + b_{2k-1,j}\right) - A_i - B_j \tag{11}$$

$$A_i = \sum_{k=1}^{N/2} a_{i,2k-1}\,a_{i,2k} \tag{12}$$

$$B_j = \sum_{k=1}^{N/2} b_{2k-1,j}\,b_{2k,j} \tag{13}$$

$A_i$ → sum of the pairwise products of consecutive elements in the ith row of A.
$B_j$ → sum of the pairwise products of consecutive elements in the jth column of B.
$C_{i,j}$ → element in the ith row and jth column of the product matrix C.

Since $A_i$ and $B_j$ are computed only once for each row of A and each column of B, they
require only $N^2$ multiplications in total: the pairwise product of any row or column of
an N x N matrix needs N/2 multiplications, so the N rows of A need $N \cdot N/2$
multiplications and the N columns of B another $N \cdot N/2$. The total number of
multiplications needed for the matrix product therefore becomes $\frac{1}{2}N^3 + N^2$.
However, the number of additions and subtractions increases to $\frac{3}{2}N^3 + 2N^2 - 2N$.
Winograd’s algorithm can therefore be faster than the naïve matrix multiplication
algorithm when additions take much less CPU time than multiplications. In DCT and
IDCT computations the matrices are of size 8 x 8. For an 8 x 8 matrix, each $A_i$
calculation requires 4 multiplications and 3 additions, and each $B_j$ calculation requires
4 multiplications and 3 additions. Each $C_{i,j}$ calculation requires 4 multiplications, 11
additions (8 to form the paired sums and 3 to accumulate the four products) and 2
subtractions. Multiplying two 8 x 8 matrices thus requires 320 multiplications, 752
additions and 128 subtractions.
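Eqs. 11 to 13 translate directly into C. In this sketch (our illustration, not the authors' code, with hypothetical names `row_terms`, `col_terms` and `winograd_matmul`, 0-based indices, and N = 8 assumed), the $A_i$ and $B_j$ terms are computed by separate functions so that they can be reused across products, which the DCT and IDCT schemes in this section rely on.

```c
#include <assert.h>

#define N 8   /* assumed matrix size */
static_assert(N % 2 == 0, "Winograd pairing needs an even N");

/* A_i (Eq-12): sum of pairwise products along each row of a. */
void row_terms(double a[N][N], double rt[N])
{
    for (int i = 0; i < N; i++) {
        rt[i] = 0.0;
        for (int k = 0; k < N; k += 2)
            rt[i] += a[i][k] * a[i][k + 1];
    }
}

/* B_j (Eq-13): sum of pairwise products down each column of b. */
void col_terms(double b[N][N], double ct[N])
{
    for (int j = 0; j < N; j++) {
        ct[j] = 0.0;
        for (int k = 0; k < N; k += 2)
            ct[j] += b[k][j] * b[k + 1][j];
    }
}

/* Eq-11: c = a * b, given the A_i of a (rt) and the B_j of b (ct).
   Each element uses N/2 multiplications. */
void winograd_matmul(double a[N][N], double b[N][N],
                     const double rt[N], const double ct[N],
                     double c[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k += 2)
                s += (a[i][k] + b[k + 1][j]) * (a[i][k + 1] + b[k][j]);
            c[i][j] = s - rt[i] - ct[j];
        }
    }
}
```

Once `row_terms` has been called for a matrix, its $A_i$ can be reused in every later product with the same left operand, which is what makes the scheme attractive when one factor, such as a cosine matrix, is fixed.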

Now let us discuss how Winograd’s matrix multiplication method can be used with the
DCT. In matrix notation, Eq-1 can be written as [12]

$$C(u,v) = \alpha(u)\,\alpha(v)\left[c_1\, f\, c_1^{T}\right]_{u,v} \tag{14}$$

f(x, y) → 8 x 8 image block.
$c_1$ → 8 x 8 matrix of the first cosine function in Eq-1, with entries
$c_1(u, x) = \cos\frac{(2x+1)u\pi}{2N}$; $c_1$ is constant for all blocks.
$c_1^T$ → 8 x 8 matrix of the second cosine function in Eq-1; $c_1^T$ is the transpose of $c_1$.

We propose the following steps to calculate C(u, v):
1. Calculate the matrix product of $c_1$ and f(x, y).
2. Calculate the product of the result of Step 1 and $c_1^T$.
3. Multiply each element (u, v) of the matrix from Step 2 by α(u)α(v).

For the product of $c_1$ and f(x, y), the $A_i$ calculation requires 32 multiplications and 24
additions, the $B_j$ calculation requires 32 multiplications and 24 additions, and the
$C_{i,j}$ calculation requires 256 multiplications, 704 additions and 128 subtractions, for a
total of 320 multiplications, 752 additions and 128 subtractions. For the product of the
resulting matrix $c_1 f$ with $c_1^T$, the $A_i$ calculation requires 32 multiplications and 24
additions; the $B_j$ terms need no calculation, because the $B_j$ of $c_1^T$ are the same as the
$A_i$ of $c_1$; and the $C_{i,j}$ calculation requires 256 multiplications, 704 additions and 128
subtractions. The term $c_1 f c_1^T$ therefore requires 608 multiplications, 1480 additions
and 256 subtractions. The $A_i$ of $c_1$ (equivalently, the $B_j$ of $c_1^T$) are constant,
irrespective of the image block. Computing all the elements of the 8 x 8 DCT matrix
according to Eq-14 thus requires 736 multiplications (the additional 128 come from the
scaling in Step 3), 1480 additions, 256 subtractions and 4 divisions, fewer arithmetic
operations than the conventional approach.

Now let us discuss how Winograd’s method can be used with the IDCT. In matrix notation,
Eq-4 is

$$f(x,y) = \left[c_1^{T}\, S\, c_1\right]_{x,y}, \qquad S(u,v) = \alpha(u)\,\alpha(v)\,C(u,v) \tag{15}$$

$c_1^T$ → 8 x 8 matrix of the first cosine function in Eq-4.
$c_1$ → 8 x 8 matrix of the second cosine function in Eq-4; $c_1^T$ is the transpose of $c_1$, and
both are constant for all blocks.

We propose the following steps to calculate f(x, y):
1. Multiply each element C(u, v) by α(u)α(v) to obtain the matrix S.
2. Calculate the product of $c_1^T$ and the matrix S from Step 1.
3. Calculate the product of the result of Step 2 and $c_1$.

Computing $\alpha(u)\alpha(v)\,C(u,v)$ requires 4 divisions and 128 multiplications. For the
product of $c_1^T$ and S, the $A_i$ calculation requires 32 multiplications and 24 additions, the
$B_j$ calculation requires 32 multiplications and 24 additions, and the $C_{i,j}$ calculation
requires 256 multiplications, 704 additions and 128 subtractions, for a total of 320
multiplications, 752 additions and 128 subtractions. For the product of the resulting
matrix $c_1^T S$ with $c_1$, the $A_i$ calculation requires 32 multiplications and 24 additions; the
$B_j$ terms need no calculation, because the $B_j$ of $c_1$ are the same as the $A_i$ of $c_1^T$; and the
$C_{i,j}$ calculation requires 256 multiplications, 704 additions and 128 subtractions.
Computing all the elements of the 8 x 8 IDCT matrix according to Eq-15 therefore
requires 736 multiplications, 1480 additions, 256 subtractions and 4 divisions, fewer
arithmetic operations than the conventional approach.

IV. EXPERIMENTAL WORK

In this study a number of images in TIFF format are used, including the widely used
Lena, Mandrill and Peppers images. Table 1 gives the complete details of the images
used in our study.

S. No  Fig No  Image         Size       Type
1      1(a)    Chess         128x128    Gray
2      1(b)    Helmet        128x128    Gray
3      1(c)    X-ray         128x128    Gray
4      1(d)    Clock         256x256    Gray
5      1(e)    Moon surface  256x256    Gray
6      1(f)    Cameraman     256x256    Gray
7      1(g)    Lena          512x512    Gray
8      1(h)    Mandrill      512x512    Gray
9      1(i)    Peppers       512x512    Gray
10     1(j)    Man           1024x1024  Gray
11     1(k)    Airplane2     1024x1024  Gray
12     1(l)    Airport       1024x1024  Gray
13     1(m)    Flowers       2048x2048  Gray
14     1(n)    Flowers1      2048x2048  Gray
15     1(o)    City          2048x2048  Gray
16     2(a)    Couple        128x128    Color
17     2(b)    House         128x128    Color
18     2(c)    Jennybeans1   128x128    Color
19     2(d)    Girl1         256x256    Color
20     2(e)    Jennybeans    256x256    Color
21     2(f)    Tree          256x256    Color
22     2(g)    Girl2         512x512    Color
23     2(h)    Sailboat      512x512    Color
24     2(i)    Splash        512x512    Color
25     2(j)    Oakland       1024x1024  Color
26     2(k)    Richmond      1024x1024  Color
27     2(l)    Shreport      1024x1024  Color
28     2(m)    Flowers       2048x2048  Color
29     2(n)    Flowers1      2048x2048  Color
30     2(o)    City          2048x2048  Color

Table 1: Details of images used in our study

All the above images are taken from the USC-SIPI image database
(http://sipi.usc.edu/database) [6].


Experiments are carried out on MS Windows XP (version 2002, SP3) and on Fedora 10
(Linux kernel 2.6.27.5-117.fc10.i686). The system is equipped with an Intel Core 2 Duo
2.60 GHz processor and 1 GB of RAM. Under Windows XP, the programs are written in C
using Microsoft Visual Studio 2005 (version 8.0); under Linux, we use GNU g++ 4.3.2.
MATLAB is a popular numerical computing environment and fourth-generation
programming language developed by MathWorks. The dct2() function in its Image
Processing Toolbox computes the two-dimensional discrete cosine transform (DCT) of an
image, and the idct2() function computes the two-dimensional inverse discrete cosine
transform (IDCT). We use these functions as a baseline for comparing the performance of
our algorithms. In the Windows environment, the CPU time for DCT and IDCT is
measured using the function GetSystemTime(); in the UNIX environment it is measured
using gettimeofday(); and in MATLAB it is measured using cputime().
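The UNIX timing procedure can be sketched as below. It is an illustration under stated assumptions, not the authors' harness: `time_blockwise`, `copy_block` and `block_transform` are hypothetical names, the image is assumed to be a row-major array of doubles whose sides are multiples of 8, and any per-block DCT or IDCT routine can be passed in. Note that gettimeofday() reports elapsed wall-clock time rather than processor time; the ISO C clock() function from <time.h> is one alternative when process CPU time is wanted. The Windows GetSystemTime() and MATLAB cputime() measurements are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define N 8   /* assumed block size */

/* Any per-block transform, e.g. a DCT or IDCT routine. */
typedef void (*block_transform)(double in[N][N], double out[N][N]);

/* Identity transform: measures the block copying overhead on its own. */
void copy_block(double in[N][N], double out[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[i][j] = in[i][j];
}

/* Current wall-clock time in seconds (since the Epoch), from gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Apply fn to every N x N block of a row-major width x height image
   (both assumed to be multiples of N) and return the elapsed seconds. */
double time_blockwise(const double *img, double *out, int width, int height,
                      block_transform fn)
{
    double in_blk[N][N], out_blk[N][N];
    assert(width % N == 0 && height % N == 0);

    double t0 = now_seconds();
    for (int by = 0; by < height; by += N) {
        for (int bx = 0; bx < width; bx += N) {
            for (int r = 0; r < N; r++)
                for (int c = 0; c < N; c++)
                    in_blk[r][c] = img[(size_t)(by + r) * width + bx + c];
            fn(in_blk, out_blk);
            for (int r = 0; r < N; r++)
                for (int c = 0; c < N; c++)
                    out[(size_t)(by + r) * width + bx + c] = out_blk[r][c];
        }
    }
    return now_seconds() - t0;
}
```

Timing `copy_block` first gives the overhead of block extraction alone, which can be subtracted when comparing transforms.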

S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach

1 1(a) 0.0313 0.0167 0.0035 8.942 4.771
2 1(b) 0.0313 0.0168 0.0035 8.942 4.8
3 1(c) 0.0312 0.0167 0.0035 8.914 4.771
4 1(d) 0.1250 0.0686 0.0140 8.928 4.9
5 1(e) 0.1248 0.0685 0.0141 8.851 4.858
6 1(f) 0.1250 0.0689 0.0141 8.865 4.886
7 1(g) 0.4688 0.2715 0.0550 8.523 4.936
8 1(h) 0.4683 0.2714 0.0530 8.835 5.120
9 1(i) 0.4688 0.2716 0.0550 8.523 4.938
10 1(j) 1.8288 1.0949 0.2250 8.128 4.866
11 1(k) 1.8290 1.0949 0.2250 8.128 4.866
12 1(l) 1.8281 1.0948 0.2240 8.161 4.8875
13 1(m) 7.1242 4.4053 0.9030 7.889 4.878
14 1(n) 7.1240 4.4051 0.9030 7.889 4.878
15 1(o) 7.1250 4.4055 0.9040 7.881 4.873

Table 2: CPU time in Secs for DCT Calculation (Windows Environment)-gray level images



S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach

1 2(a) 0.0939 0.051 0.0105 8.942 4.857
2 2(b) 0.0939 0.0512 0.0105 8.942 4.876
3 2(c) 0.0939 0.0512 0.0105 8.942 4.876
4 2(d) 0.3273 0.2035 0.0406 8.061 5.012
5 2(e) 0.3280 0.2049 0.0407 8.058 5.034
6 2(f) 0.3282 0.205 0.042 7.814 4.880
7 2(g) 1.3124 0.8124 0.1672 7.849 4.858
8 2(h) 1.3125 0.8124 0.1673 7.845 4.855
9 2(i) 1.3125 0.8126 0.1673 7.845 4.857
10 2(j) 5.2521 3.3 0.68 7.723 4.852
11 2(k) 5.2560 3.301 0.68 7.729 4.854
12 2(l) 5.2500 3.2846 0.672 7.8125 4.887
13 2(m) 21.842 12.3595 2.7037 8.078 4.571
14 2(n) 21.6093 12.3592 2.7036 7.992 4.571
15 2(o) 21.8439 12.3633 2.7037 8.079 4.572

Table 3: CPU time in Secs for DCT Calculation (Windows Environment)-color images

S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach
1 1(a) 0.0313 0.014 0.0029 10.793 4.827
2 1(b) 0.0313 0.014 0.0029 10.793 4.827
3 1(c) 0.0312 0.014 0.0029 10.793 4.827
4 1(d) 0.125 0.0573 0.0117 10.683 4.897
5 1(e) 0.1248 0.0568 0.0116 10.758 4.896
6 1(f) 0.1250 0.0575 0.0117 10.683 4.914
7 1(g) 0.4688 0.2267 0.0466 10.060 4.864
8 1(h) 0.4683 0.2262 0.0465 10.070 4.864
9 1(i) 0.4688 0.2268 0.0468 10.017 4.846
10 1(j) 1.8288 0.9142 0.1871 9.774 4.886
11 1(k) 1.8290 0.9153 0.1873 9.765 4.886
12 1(l) 1.8281 0.9139 0.1870 9.775 4.887
13 1(m) 7.1242 3.6676 0.7500 9.498 4.890
14 1(n) 7.1240 3.6515 0.7479 9.525 4.882
15 1(o) 7.1250 3.6774 0.7546 9.442 4.873

Table 4: CPU time in Secs for DCT Calculation (UNIX Environment)-gray level images

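S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach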
1 2(a) 0.0939 0.0426 0.0087 10.793 4.896
2 2(b) 0.0939 0.0424 0.0087 10.793 4.873
3 2(c) 0.0939 0.0423 0.0086 10.918 4.918
4 2(d) 0.3273 0.1699 0.0339 9.654 5.011
5 2(e) 0.3280 0.1707 0.0349 9.398 4.891
6 2(f) 0.3282 0.1708 0.0350 9.377 4.88
7 2(g) 1.3124 0.6782 0.1382 9.496 4.907
8 2(h) 1.3125 0.6798 0.1396 9.401 4.869
9 2(i) 1.3125 0.6803 0.1412 9.295 4.817
10 2(j) 5.2521 2.7466 0.5618 9.348 4.888
11 2(k) 5.2560 2.7561 0.5620 9.352 4.904
12 2(l) 5.2500 2.7418 0.5610 9.358 4.887
13 2(m) 21.842 11.00 2.2393 9.753 4.912
14 2(n) 21.6093 10.9607 2.1900 9.867 5.004
15 2(o) 21.8439 11.0320 2.2569 9.678 4.888

Table 5: CPU time in Secs for DCT Calculation (UNIX Environment)-color images

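S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach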
1 1(a) 0.0313 0.0203 0.0031 10.09 6.54
2 1(b) 0.0313 0.0202 0.0032 9.78 6.31
3 1(c) 0.0313 0.0204 0.0032 9.78 6.37
4 1(d) 0.1406 0.0837 0.0129 10.89 6.48
5 1(e) 0.1407 0.0838 0.0129 10.90 6.48
6 1(f) 0.1406 0.0837 0.0128 10.98 6.53
7 1(g) 0.5314 0.3278 0.049 10.84 6.82
8 1(h) 0.5469 0.3279 0.050 10.93 6.55
9 1(i) 0.5313 0.3276 0.049 10.84 6.68
10 1(j) 2.1716 1.3184 0.2076 10.46 6.35
11 1(k) 2.1710 1.3183 0.2076 10.45 6.35
12 1(l) 2.1719 1.3185 0.2078 10.45 6.34
13 1(m) 8.6406 5.24 0.8292 10.42 6.31
14 1(n) 8.6094 5.241 0.8292 10.38 6.32
15 1(o) 8.5 5.239 0.8291 10.25 6.31

Table 6: CPU time in Seconds for IDCT Calculation (Windows Environment)-gray level images

S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach

1 2(a) 0.0938 0.0608 0.0092 10.19 6.60
2 2(b) 0.0938 0.0609 0.0091 10.30 6.69
3 2(c) 0.0938 0.0608 0.0092 10.19 6.60
4 2(d) 0.4218 0.2511 0.0388 10.87 6.47
5 2(e) 0.4216 0.251 0.0387 10.89 6.48
6 2(f) 0.424 0.253 0.0388 10.92 6.52
7 2(g) 1.5942 0.9833 0.147 10.84 6.68
8 2(h) 1.5944 0.9835 0.148 10.77 6.64
9 2(i) 1.5939 0.9857 0.148 10.76 6.66
10 2(j) 6.6578 3.9552 0.6227 10.69 6.35
11 2(k) 6.6589 3.9573 0.6229 10.69 6.35
12 2(l) 6.6564 3.9570 0.6226 10.69 6.35
13 2(m) 25.926 15.720 2.4876 10.42 6.31
14 2(n) 25.7641 15.692 2.4869 10.35 6.30
15 2(o) 25.7814 15.696 2.4871 10.36 6.31

Table 7: CPU time in Secs for IDCT Calculation (Windows Environment)-color images

S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach

1 1(a) 0.0313 0.0177 0.0028 11.17 6.32
2 1(b) 0.0313 0.0180 0.0028 11.17 6.42
3 1(c) 0.0313 0.0179 0.0028 11.17 6.35
4 1(d) 0.1406 0.0735 0.0114 12.33 6.44
5 1(e) 0.1407 0.0736 0.0115 12.23 6.40
6 1(f) 0.1406 0.0735 0.0115 12.22 6.39
7 1(g) 0.5314 0.2852 0.041 12.96 6.95
8 1(h) 0.5469 0.2856 0.044 12.42 6.94
9 1(i) 0.5313 0.2851 0.041 12.95 6.95
10 1(j) 2.1716 1.1466 0.1807 12.02 6.34
11 1(k) 2.1710 1.1462 0.1806 12.02 6.34
12 1(l) 2.1719 1.1467 0.1809 12.00 6.33
13 1(m) 8.6406 4.5770 0.7211 11.98 6.34
14 1(n) 8.6094 4.5640 0.7210 11.94 6.33
15 1(o) 8.5 4.5636 0.7210 11.78 6.32

Table 8: CPU time in Secs for IDCT Calculation (UNIX Environment)-gray level images

S. No  Fig No  MATLAB  Conventional Approach  Winograd’s  Speed-up of Winograd’s over MATLAB  Speed-up of Winograd’s over Conventional Approach

1 2(a) 0.0938 0.5320 0.0084 11.16 6.33
2 2(b) 0.0938 0.5310 0.0083 11.30 6.39
3 2(c) 0.0938 0.5310 0.0083 11.30 6.39
4 2(d) 0.4218 0.2206 0.0343 12.29 6.43
5 2(e) 0.4216 0.2205 0.0342 12.32 6.44
6 2(f) 0.424 0.2205 0.0342 12.39 6.44
7 2(g) 1.5942 0.8557 0.124 12.85 6.90
8 2(h) 1.5944 0.8558 0.124 12.45 6.90
9 2(i) 1.5939 0.8560 0.132 12.95 6.95
10 2(j) 6.6578 3.4397 0.5422 12.27 6.34
11 2(k) 6.6589 3.4399 0.5421 12.28 6.34
12 2(l) 6.6564 3.4392 0.5320 12.59 6.46
13 2(m) 25.926 13.692 2.163 11.98 6.33
14 2(n) 25.7641 13.679 2.162 11.91 6.32
15 2(o) 25.7814 13.679 2.163 11.91 6.32

Table 9: CPU time in Secs for IDCT Calculation (UNIX Environment)-color images

Figure 3: CPU time in Secs for DCT Calculation (Windows Environment)-gray level images
Figure 4: CPU time in Secs for DCT Calculation (Windows Environment)-color images
Figure 5: CPU time in Secs for DCT Calculation (UNIX Environment)-gray level images
Figure 6: CPU time in Secs for DCT Calculation (UNIX Environment)-color images
Figure 7: CPU time in Secs for IDCT Calculation (Windows Environment)-gray level images
Figure 8: CPU time in Secs for IDCT Calculation (Windows Environment)-color images
Figure 9: CPU time in Secs for IDCT Calculation (UNIX Environment)-gray level images
Figure 10: CPU time in Secs for IDCT Calculation (UNIX Environment)-color images
[Figures 3 to 10 plot CPU time against image size (128x128 to 2048x2048) for the MATLAB, Conventional and Winograd approaches; the chart images are not reproduced here.]

Our Winograd-based DCT algorithm consistently takes less CPU time than both the
conventional algorithm and the MATLAB function, and our IDCT algorithm likewise takes
much less CPU time than the MATLAB function and the conventional approach.
Tables 2 and 3 show the CPU time for the DCT under Windows XP for gray-level and color
images. Our algorithm consistently gives better results than the MATLAB routine and the
conventional algorithm, with a speed-up of about 8 (7.72 to 8.94) over MATLAB and more
than 4 (4.57 to 5.12) over the conventional algorithm.
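The speed-up columns in Tables 2 to 9 appear to be ratios of CPU times: for example, row 1 of Table 2 gives 0.0313/0.0035 ≈ 8.94 against MATLAB and 0.0167/0.0035 ≈ 4.77 against the conventional approach, matching the listed 8.942 and 4.771. A minimal helper, with a hypothetical name, makes that assumed definition explicit:

```c
#include <assert.h>

/* Speed-up of the Winograd-based routine over a baseline, assumed to be
   the ratio of the baseline CPU time to the Winograd CPU time. */
double speedup(double baseline_secs, double winograd_secs)
{
    assert(winograd_secs > 0.0);
    return baseline_secs / winograd_secs;
}
```

With this definition, `speedup(0.0313, 0.0035)` is about 8.94, consistent with the first row of Table 2.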

Tables 4 and 5 show the CPU time for the DCT under UNIX for gray-level and color images.
Our algorithm consistently gives better results than the MATLAB routine and the
conventional algorithm, with a speed-up of more than 9 over MATLAB and more than 4
over the conventional algorithm.

Tables 6 and 7 show the CPU time for the IDCT under Windows XP for gray-level and color
images. Our algorithm consistently gives better results than the MATLAB routine and the
conventional algorithm, with a speed-up of about 10 (9.78 to 10.98) over MATLAB and
more than 6 over the conventional algorithm.

Tables 8 and 9 show the CPU time for the IDCT under UNIX for gray-level and color
images. Our algorithm again consistently gives better results than the MATLAB routine
and the conventional algorithm, with a speed-up of more than 11 over MATLAB and more
than 6 over the conventional algorithm.

For both DCT and IDCT, the CPU time for a color image is around 3 times that for a
gray-level image of the same size.

Compared with the Windows environment, DCT and IDCT calculations in the UNIX
environment are around 15% faster.
V. CONCLUSIONS
In this paper, Winograd-based fast DCT and IDCT algorithms were proposed and
compared with conventional and MATLAB implementations. Our experiments show that
the Winograd-based DCT and IDCT algorithms consume much less CPU time than both
the conventional implementation and MATLAB, which makes them the preferred choice
among the approaches tested. Our approach can also be employed to compress video
sequences.
REFERENCES
[1] N. B. Venkateswarlu and P. S. V. S. K. Raju, “Winograd’s method: A perspective for some pattern recognition problems”, Pattern Recognition Letters, Vol. 15, No. 2, pp. 105-109, 1994.
[2] N. B. Venkateswarlu and P. S. V. S. K. Raju, “Winograd’s inequality: A perspective for some PR problems”, Pattern Recognition Letters, 1991.
[3] R. C. Gonzalez and R. E. Woods, “Digital Image Processing”, 2nd Edition, Addison Wesley, USA, 1993, ISBN: 0-201-60078.
[4] The USC-SIPI Image Database (http://sipi.usc.edu/database), Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering.
[5] Rudra Pratap, “Getting Started with MATLAB: A Quick Introduction for Scientists and Engineers”, Version 6, Oxford University Press, 2003.
[6] Andrew B. Watson, “Image Compression Using the Discrete Cosine Transform”, Mathematica Journal, 4(1), 1994, pp. 81-88.
[7] D. L. Lee and M. A. Aboelaze, “Linear speedup of Winograd’s matrix multiplication algorithm using an array processor”, Proceedings of the Distributed Memory Computing Conference, IEEE, 1991, pp. 427-430.
[8] R. P. Brent, “Algorithms for matrix multiplication”, Technical Report TR-CS-70-157, DCS, Stanford University, March 1970.
[9] Boyko Kakaradov, “Ultra-fast Matrix Multiplication: An Empirical Analysis of Highly Optimized Vector Algorithms”, Stanford Undergraduate Research Journal, 2004.
[10] Ken Cabeen and Peter Gent, “Image Compression and the Discrete Cosine Transform”, Math 45, College of the Redwoods.
