4 - Simulation and analysis of different DCT techniques on MATLAB (presented in a Malaysian conference)
Simulation and Analysis of Different
Discrete Cosine Transform (DCT)
Techniques on MATLAB®
Muhammad Munim Zabidi 1, Youness Lahdili 2*
1, 2 Faculty of Electrical Engineering, VeCAD Lab, Universiti Teknologi Malaysia, Malaysia
* corresponding author: y.lahdili@gmail.com
Abstract— Empirical approaches can sometimes be
the only guarantee that a prototype will be
functional, seamless, and practically feasible.
The Discrete Cosine Transform (DCT) is one such
system that must pass experimental tests, because of
the central role it plays in redundancy removal
in video codecs. Prior simulation and fine-tuning
are therefore always required.
We also address the problem of integrating the DCT
as an element within the video
COmpression/DECompression (codec) framework.
In this experimental paper, the DCT is simulated
in four different forms: the Chen form [2], the
Loeffler form [10], the 8x8 Basic Pattern-based
form, and finally the MATLAB® built-in function
form, even though the latter is not intended to be
abstracted to RTL and is therefore not suitable for
FPGA deployment. For each simulation run, we
measure the MSE, the PSNR and the approximate
execution speed of the algorithm. We then compare
the algorithms against each other and put forward
our own proposals for improvement.
Keywords— Discrete Cosine Transform, Video
Compression, MPEG, MATLAB Simulation, Image
Processing, Color Plan, Quantization
I. INTRODUCTION
A. Overview on Discrete Cosine Transform
The 8 x 8 DCT is used to decorrelate the 8 x 8
blocks of original pixels or motion-compensated
difference pixels and to compact their energy into
as few coefficients as possible.
The basic computation element in a DCT-based
system is the transformation of an NxN image
block from the spatial domain to the DCT domain.
For the video compression standards, N is usually 8.
The 8 x 8 DCT is simple, efficient and well suited
for hardware and software implementations. The
mathematical formulation for the (two-dimensional)
2-D DCT is shown in this equation:
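For reference, the 8 x 8 forward transform is usually written as follows. This is the common orthonormal convention, which is an assumption here, since scaling factors differ between references:

```latex
y_{k,l} = \frac{c(k)\,c(l)}{4}
          \sum_{i=0}^{7} \sum_{j=0}^{7}
          x_{i,j}\,
          \cos\!\left(\frac{(2i+1)\,k\,\pi}{16}\right)
          \cos\!\left(\frac{(2j+1)\,l\,\pi}{16}\right),
\qquad
c(n) = \begin{cases} 1/\sqrt{2}, & n = 0 \\ 1, & n \neq 0 \end{cases}
```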
The above 2-D DCT transforms an 8 x 8 block of
picture samples x_{i,j} into spatial frequency
components y_{k,l} for 0 ≤ k, l ≤ 7. Conversely,
the 2-D IDCT below performs the inverse
transform for 0 ≤ i, j ≤ 7:
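In the same orthonormal convention (again an assumption, as normalization varies between references), the 8 x 8 inverse transform reads:

```latex
x_{i,j} = \frac{1}{4}
          \sum_{k=0}^{7} \sum_{l=0}^{7}
          c(k)\,c(l)\,y_{k,l}\,
          \cos\!\left(\frac{(2i+1)\,k\,\pi}{16}\right)
          \cos\!\left(\frac{(2j+1)\,l\,\pi}{16}\right),
\qquad
c(n) = \begin{cases} 1/\sqrt{2}, & n = 0 \\ 1, & n \neq 0 \end{cases}
```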
Since the straightforward implementation of the
above equations is computationally expensive (4096
multiplications per 8 x 8 block), much research
has been devoted to reducing the DCT/IDCT
computational effort with fast algorithms such as
those of Chen and Loeffler. Most of this effort has
aimed at reducing the number of operations, mainly
multiplications and additions.
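The 4096 figure can be checked with a short MATLAB sketch. It evaluates the 2-D DCT of one block directly, counting one multiplication per input sample for each output coefficient (the cosine products are assumed to be tabulated), and compares the result with the separable row-column form. The test block, the variable names and the orthonormal scaling are illustrative assumptions, not taken from the authors' code.

```matlab
% Direct 2-D DCT of one 8x8 block versus the separable row-column form.
N = 8;
x = double(magic(N));              % arbitrary test block (assumption)
c = [1/sqrt(2), ones(1, N-1)];     % c(0) = 1/sqrt(2), c(n) = 1 otherwise

% Direct evaluation: every output coefficient visits all 64 samples.
y_direct = zeros(N);
mults_direct = 0;
for k = 0:N-1
    for l = 0:N-1
        acc = 0;
        for ii = 0:N-1
            for jj = 0:N-1
                acc = acc + x(ii+1, jj+1) ...
                    * cos((2*ii + 1) * k * pi / (2*N)) ...
                    * cos((2*jj + 1) * l * pi / (2*N));
                mults_direct = mults_direct + 1;  % one product per sample
            end
        end
        y_direct(k+1, l+1) = (2/N) * c(k+1) * c(l+1) * acc;
    end
end

% Row-column form: 1-D DCT matrix applied to the columns, then the rows.
[q, p] = meshgrid(0:N-1, 0:N-1);   % p = frequency (row), q = sample (column)
T = sqrt(2/N) * cos(pi * (2*q + 1) .* p / (2*N));
T(1, :) = 1/sqrt(N);
y_rowcol = T * x * T';
mults_rowcol = 2 * N^3;            % two 8x8 by 8x8 matrix products

fprintf('direct: %d multiplications, row-column: %d\n', ...
        mults_direct, mults_rowcol);
```

With a fast 1-D kernel that needs 11 multiplications per 8-point transform, the same row-column scheme would take 16 x 11 = 176 multiplications per block.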
To introduce the DCT/IDCT, we illustrate it with
the butterfly diagram of an 8-point one-dimensional
(1-D) DCT/IDCT algorithm proposed by van
Eijndhoven and Sijstermans (see Ref. [15]). It
was selected because of the minimum number of
operations it requires (11 multiplications and 29
additions). This algorithm is obtained by a
slight modification of the original Loeffler
Algorithm, which provides one of the most
computationally efficient 1-D DCT/IDCT
calculations.
The modified Loeffler Algorithm for calculating
8-point 1-D DCT is depicted below:
where:
B. Methodology of Analysis
To set the stage for simulation, we could have
written a routine in any programming language
capable of expressing our video compression
algorithms, but the computational power of
MATLAB® alone is sufficient to provide all the
utilities we need to realize a so-called “beta”
version of the video compression codec we are
prototyping. The MATLAB® deliverables can later
be released in SystemVerilog format for deployment
on FPGA as a concluding phase. SystemVerilog is
preferred here because of its robustness and its
emergence as a modern HDL.
Computer experimentation constitutes an
important portion of this paper. Its purpose is to
ensure that the DCT engine operates at an
acceptable level and communicates responsively
with the other co-located video compression
components, before they are validated for
compilation and bundling on the FPGA platform.
Off-the-shelf volume commercialization on SoC may
eventually follow.
The acceptable level of operation is a commonly
agreed threshold, set somewhat arbitrarily, mainly
by looking at the properties and parameters of the
output video: MSE (mean squared error), PSNR
(peak signal-to-noise ratio), elapsed compression
time, chip footprint and backward compatibility
vis-à-vis other video compression standards.
These benchmarking criteria will be determined
empirically using MATLAB® or the EDA simulators
from FPGA vendors.
II. DESIGN & SIMULATION
A. The Block Diagram Construction
At this early stage, before preliminary results, we
designed the core framework of a typical video
compression/decompression chain in MATLAB®
code. This code comprises the essential elements
of compression, namely the RGB to YUV color space
transformation, image fragmentation (macroblock
dissection), DCT transformation, and
Quantization.
Fig. 1 The black-box overview of our video codec framework
We tailored our code so that our candidate DCT
algorithms can be inserted into it, allowing a fair
verdict on which DCT algorithm is best adopted.
The black-box overview of the framework we coded
is presented in Fig. 1 above.
This preliminary code will naturally be
supplemented by other components, such as
intraframe and interframe prediction in the form of
a Motion Compensation module, and entropy coding
such as Huffman coding, RLC and VLC. The control
units recommended by MPEG-2 can also be added in
due time, so that the resulting stream can be read
by MPEG-2 decoders.
The anatomy of the framework we coded is
presented in Fig. 2 below:
Fig. 2 The concrete anatomy of our video encoder framework
B. The Code Running in Workspace
Having set the structure of the code, we can now
lay out the actual code that implements this
framework and turns it into an output image that
can be measured. The test image we used for this
purpose is the standard “Mandrill” image. We could
also have tested our code on the “Lena” or
“Cameraman” photos. These standard photos contain
recognizable content and coloration that make our
measurements comparable with those of other
video compression codecs, past or future.
1) Running DCT Chen Algorithm
We translated the topology of the DCT/IDCT Chen
Algorithm into MATLAB® code inserted within the
video codec framework. The complete code is
available in our supplementary listing; we provide
an excerpt here to give an idea of it.
Here are the code header and footer as seen in the
MATLAB® editor window:
clear;
% Start a timer
tic;
% Read the image file into a matrix
array_input_rgb = imread('mandrill.jpg');
% Read the file information to get the dimensions
array_info = imfinfo('mandrill.jpg');
width = array_info.Width;
height = array_info.Height;
% Colour conversion from RGB to YCbCr
array_input = rgb2ycbcr(array_input_rgb);
% Create a matrix whose height and width are divisible by 8.
% If they are not, pad with zeros
W = ceil(width/8);
H = ceil(height/8);
input = zeros(H*8, W*8, 3, 'uint8');
% Preallocate the output matrix 'trunk' (recommended practice but not mandatory)
trunk = input;
for I = 1:3
    input(1:height, 1:width, I) = array_input(1:height, 1:width, I);
end
% Divide input matrix into WxH 8x8 matrices
.
.
.
% Colour conversion from YCbCr to RGB
array_output_rgb = ycbcr2rgb(array_output);
subplot(2,2,1), imshow(uint8(array_input_rgb)), title('Original Image');
subplot(2,2,2), imshow(uint8(array_output_rgb)), title('Compressed Image');
% MSE calculation: accumulate the squared error over every pixel and channel
count = 0;
for I = 1:3
    for J = 1:height
        for K = 1:width
            reg = (double(array_output_rgb(J,K,I)) - double(array_input_rgb(J,K,I)))^2;
            count = count + reg;
        end
    end
end
MSE = count/(width*height*3)
PSNR = 10*log10((255)^2/MSE)
imwrite(array_output_rgb, 'mandrill_new.jpg', 'jpg');
% Stop the timer and report the elapsed time
elapsed = toc;
fprintf('Time spent for execution is: %.4f seconds\n', elapsed);
MSE is calculated at the end of the code using the
algebraic formula that accounts for the discrepancy
of each individual pixel between the two images.
Such slight chromatic disparities between pixels
would otherwise be imperceptible to the naked eye.
PSNR is a direct logarithmic function of MSE,
although this is not the only way to obtain it.
The whole code is bracketed by the tic and toc
commands to measure the elapsed time.
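The triple loop used for the MSE can also be written in vectorized form. The sketch below is a minimal illustration on two small synthetic arrays that stand in for the original and reconstructed images; the variable names and the test data are assumptions, not part of the authors' listing.

```matlab
% Vectorized MSE and PSNR between two 8-bit RGB images of equal size.
orig  = uint8(100 * ones(4, 4, 3));   % synthetic stand-in for the input image
recon = orig;
recon(1, 1, 1) = 110;                 % one sample off by 10 grey levels

% Cast to double first: uint8 subtraction saturates at 0.
err     = double(recon) - double(orig);
mse     = mean(err(:) .^ 2);          % mean over all pixels and channels
psnr_db = 10 * log10(255^2 / mse);    % peak value 255 for 8-bit data

fprintf('MSE = %.4f, PSNR = %.2f dB\n', mse, psnr_db);
```

The result is named psnr_db rather than psnr so that it does not shadow the Image Processing Toolbox function of that name.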
Executing the above code produces the following
output in MATLAB®:
Fig. 3 The screenshots after completion of Chen Algorithm-based code
2) Running DCT Loeffler Algorithm
Fig. 4 The screenshots after completion of Loeffler Algorithm-based code
To adapt the Chen Algorithm code to the Loeffler
Algorithm, we had to replace the whole DCT/IDCT
block with a call to a function defined in an
external file, which contains the actual
Loeffler Algorithm for any N-point 2D DCT.
The external file, called “Loeffler_DCT.m”, is
located in the same working folder as the
framework code, and was built by following a
paper on binDCT by Liang and Tran (see Ref. [14]).
The result after execution is illustrated in
Fig. 4 above. It is worth mentioning that,
from the subjective viewpoint of the viewer, the
“Mandrill” image compressed using Loeffler shows
no perceptible difference from the one compressed
using Chen.
3) Running DCT 8x8 Basic Pattern-based
Third, we modified our framework code again to
try the method that employs the Predefined DCT 8x8
Basic Pattern, which, unlike the previous two, does
not require any rotation algorithm.
This also means shorter code, easier debugging,
and presumably faster execution, since the 2D DCT
is carried out in one shot and all operations are
parallelized.
As explained in the MATLAB® documentation, the
Predefined DCT 8x8 Basic Pattern is generated
directly by the MATLAB® function dctmtx(8)
(8 being the desired number of points) and then
assigned to a matrix variable we call “T”. We then
remove the whole DCT/IDCT block of the original
Chen Algorithm framework and add in its place the
single line: myDCT =
T*temp*T'.
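As a minimal sketch of what this line computes, the following MATLAB fragment builds the 8 x 8 DCT matrix by hand, so that it runs without the Image Processing Toolbox that provides dctmtx, then applies the forward and inverse transforms to one block. The test block and the inverse line are illustrative assumptions; only T, temp and myDCT are names used in the text.

```matlab
% Hand-built equivalent of T = dctmtx(8), then forward and inverse 2-D DCT.
N = 8;
[q, p] = meshgrid(0:N-1, 0:N-1);   % p = frequency (row), q = sample (column)
T = sqrt(2/N) * cos(pi * (2*q + 1) .* p / (2*N));
T(1, :) = 1/sqrt(N);               % first row is the constant (DC) basis vector

temp  = double(magic(N));          % arbitrary 8x8 block (assumption)
myDCT = T * temp * T';             % forward 2-D DCT, as in the text
rec   = T' * myDCT * T;            % inverse 2-D DCT: T is orthogonal

fprintf('DC coefficient: %.4f\n', myDCT(1, 1));
fprintf('max round-trip error: %.2e\n', max(abs(rec(:) - temp(:))));
```

Because T is orthogonal, the inverse needs no separate matrix: transposing T is enough.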
The Quantization and Dequantization parts
change accordingly, as the scaling is now done at
the matrix level rather than at the level of each
coefficient as before.
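One way to read "scaling at the matrix level" is an element-wise division of the whole coefficient block by a step matrix, followed by rounding. The sketch below illustrates that reading only. The flat step matrix Qstep is hypothetical, and the authors' actual quantization table is not given in the text.

```matlab
% Matrix-level quantization and dequantization of one DCT block.
N = 8;
[q, p] = meshgrid(0:N-1, 0:N-1);
T = sqrt(2/N) * cos(pi * (2*q + 1) .* p / (2*N));
T(1, :) = 1/sqrt(N);               % same matrix as dctmtx(8)

temp  = double(magic(N)) - 32;     % arbitrary test block (assumption)
myDCT = T * temp * T';

Qstep       = 16 * ones(N);        % hypothetical flat step matrix
quantized   = round(myDCT ./ Qstep);   % whole block quantized in one operation
dequantized = quantized .* Qstep;      % whole block rescaled in one operation
recon       = T' * dequantized * T;    % back to the spatial domain

fprintf('max coefficient error: %.4f\n', ...
        max(abs(dequantized(:) - myDCT(:))));
```

With rounding to the nearest integer, the error on each coefficient is bounded by half the step, here 8.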
Fig. 5 The screenshots after completion of DCT 8x8 Basic Pattern-based code
4) Running DCT MATLAB® built-in function
At this point, we sought to compare the results of
our three previous 2D DCT experiments with the
MATLAB® built-in 2D DCT function dct2.
We do not recommend this built-in MATLAB®
function as the 2D DCT computational method of
choice, for two reasons: (1) Anyone who uses it in
a publicly targeted SoC prototype while ignoring
the patent law that underlies this function,
especially since MATLAB® is a proprietary
language, might face patent litigation from the
owners. (2) We have no clear description of the
algorithm behind this function, and there is a
chance that the algorithm used in dct2 is not
hardware friendly and may be unstable or erratic
when used on FPGA, where essentially everything is
converted to LUTs and add/shift operations. The
internals of dct2 are not disclosed, partly because
of reason (1).
The aim of using the native MATLAB® 2D DCT
function is rather to see to what extent our three
major algorithms surpass the software 2D DCT
method.
To adapt our previous code to the built-in
MATLAB® 2D DCT function dct2, we just have
to delete any mention of dctmtx and replace the
line: myDCT = T*temp*T' with: myDCT =
dct2(temp);
Fig. 6 The screenshots after completion of the native MATLAB® 2D DCT code
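The two forms can be checked against each other numerically, since dct2 and the matrix product T*temp*T' use the same orthonormal definition of the 2-D DCT. The sketch below is guarded so that it still runs when dct2, which ships with the Image Processing Toolbox, is not installed. The test block and the variable maxDiff are illustrative assumptions.

```matlab
% Compare dct2(temp) with the matrix form T*temp*T' on one 8x8 block.
N = 8;
[q, p] = meshgrid(0:N-1, 0:N-1);
T = sqrt(2/N) * cos(pi * (2*q + 1) .* p / (2*N));
T(1, :) = 1/sqrt(N);               % same matrix as dctmtx(8)

temp = double(magic(N));           % arbitrary test block (assumption)

if exist('dct2', 'file')
    maxDiff = max(max(abs(dct2(temp) - T * temp * T')));
    fprintf('max |dct2 - T*temp*T''| = %.2e\n', maxDiff);
else
    maxDiff = NaN;                 % dct2 not available on this installation
    fprintf('dct2 not found; comparison skipped\n');
end
```

Any difference in measured quality between the two runs would therefore come from the surrounding quantization settings rather than from the transform itself.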
III. CONCLUSION
A class of Fast Discrete Cosine Transform
algorithms has been developed that provides a
factor-of-six improvement in computational
complexity compared with conventional
Discrete Cosine Transform algorithms based on the
Fast Fourier Transform.
Our test-bench algorithms were derived in the
form of matrices and illustrated by signal-flow
graphs, which can be readily translated into
hardware or even software implementations. This
transform has been successfully applied to the
coding of high-resolution imagery.
Analyzing our preliminary results, we were
surprised to find that the Chen Algorithm had the
fastest execution time and even the lowest MSE,
which also signifies better preservation of the
original image quality. As expected, the native
MATLAB® function for 2D DCT calculation is the
slowest and does not pay off in quality either. We
still believe, however, that the Predefined DCT 8x8
Basic Pattern multiplication could be further
refined to yield faster code and a faithful
compression system.
That said, we postulate that there is still room
for improvement in our DCT units if we skip the
computation of the last 3 or 4 rows of the
macroblocks, as they would be truncated to “0”
anyway in the Quantization stage. This would save
a large amount of computation without
noticeably affecting the output image frame.
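In the matrix form, the proposed shortcut can be sketched by using only the first rows of the DCT matrix in the forward transform. The fragment below keeps 5 rows of coefficients, which is an illustrative choice within the "last 3 or 4 rows" mentioned above, and checks that the reconstruction matches the one obtained by computing the full block and zeroing the discarded rows afterwards. All names other than T and temp are assumptions.

```matlab
% Skip the last rows of the coefficient block instead of zeroing them later.
N = 8;
keep = 5;                          % rows of coefficients kept (assumption)
[q, p] = meshgrid(0:N-1, 0:N-1);
T = sqrt(2/N) * cos(pi * (2*q + 1) .* p / (2*N));
T(1, :) = 1/sqrt(N);               % same matrix as dctmtx(8)

temp = double(magic(N));           % arbitrary test block (assumption)

% Reference: full transform, then discard rows keep+1..N.
Yfull  = T * temp * T';
Ytrunc = Yfull;
Ytrunc(keep+1:end, :) = 0;
recon_ref = T' * Ytrunc * T;

% Shortcut: the discarded rows are never computed.
Tk         = T(1:keep, :);         % keep-by-N slice of the DCT matrix
Ypart      = Tk * temp * T';       % keep-by-N coefficient block
recon_part = Tk' * Ypart * T;

fprintf('max difference between the two reconstructions: %.2e\n', ...
        max(abs(recon_part(:) - recon_ref(:))));
```

For the plain matrix products, the forward transform drops from 1024 to 640 multiplications per block with keep = 5. The saving for a butterfly-based fast algorithm would have to be worked out separately.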
The conventional method of implementing the
DCT utilized a double-size Fast Fourier Transform
(FFT) algorithm employing complex arithmetic
throughout the computation. The use of the DCT in
a wide variety of applications has not been as
extensive as its properties would imply, due to the
lack of an efficient algorithm. This paper described
more efficient algorithms involving only real
operations for computing the Fast Discrete Cosine
Transform (FDCT) of a set of 8 points.
These algorithms can be extended to any desired
value of N = 2^m. The generalization consists of
alternating cosine/sine butterfly matrices with
binary matrices to reorder the matrix elements to a
form which preserves a recognizable bit-reversed
pattern at every other node. The method described
herein appears to be the simplest to interpret. It
is not necessarily the most efficient FDCT that
could be constructed, but it represents one
technique for methodical extension.
IV. REFERENCES
[1] N. Ahmed, T. Natarajan, and K. R. Rao (1974).
"Discrete Cosine Transform", IEEE Trans. Computers, vol.
C-23, no. 1, pp. 90-93.
[2] W. H. Chen, C. H. Smith and S. C. Fralick (1977). "A
Fast Computational Algorithm for the Discrete Cosine
Transform", IEEE Transactions on
Communications, vol. COM-25, no. 9, pp. 1004-1009.
[3] B. Tseng. and W. Miller (1978). "On computing the
discrete cosine transform" IEEE Trans. Comput. C-27,
pp. 966-968.
[4] Z. Wang and B. R. Hunt (1983). "The discrete cosine
transform-A new Version", in Proc. Int. Conf. Acoust.,
Speech, Signal Processing.
[5] B. Lee (1984). "A new algorithm to compute the discrete
cosine transform" IEEE Trans. Acoust. Speech Signal
Process, 32, pp. 1243-1245.
[6] M. Vetterli and H. Nussbaumer (1984). "Simple FFT and
DCT Algorithms with Reduced Number of
Operations", Signal Processing (North Holland), vol.
6, no. 4, pp.267-278
[7] M. Vetterli and A. Ligtenberg (1986). "A Discrete
Fourier-Cosine Transform Chip", IEEE Journal on
Selected Areas of Comm., vol. SAC-4, no. 1, pp.49-61
[8] H. Hou (1987). "A fast recursive algorithm for
computing the discrete cosine transform" IEEE Trans.
Acoust. Speech Signal Process, 35, pp. 1455-1461.
[9] P. Duhamel and H. H’Mida (1987). "New 2^n DCT
Algorithms Suitable for VLSI Implementation". In
Proceedings of the IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP ’87),
Boston, MA, USA, Volume 12, pp. 1805-1808.
[10] C. Loeffler, A. Ligtenberg and G. Moschytz (1989).
"Practical Fast 1-D DCT Algorithms with 11
Multiplications". In Proceedings of the IEEE Int’l
Conference on Acoustics, Speech, and Signal Processing
(ICASSP ’89), Glasgow, Scotland, vol. 2, pp. 988-991.
[11] W.H. Chen , C.H. Smith and S.C. Fralick (1991). "A Fast
Computational Algorithm for the Discrete Cosine
Transform", IEEE Trans. Circuit and System, vol. Com.
25, no. 9, pp.1004 -1009.
[12] S. A. Martucci (1994). "Symmetric Convolution and the
Discrete Sine and Cosine Transforms," IEEE Trans. on
Signal Processing, vol. 42, pp. 1038-1051.
[13] Z. Mohd-Yusof, I. Suleiman, and Z. Aspar (2000).
"Implementation of two dimensional forward DCT and
inverse DCT using FPGA", in TENCON 2000.
Proceedings, vol. 3, pp. 242-245.
[14] J. Liang and T. D. Tran (2001). "Fast Multiplierless
Approximations of the DCT with the Lifting Scheme"
IEEE Trans. on Signal Processing, vol. 49, no. 12, pp.
3032-3044.
[15] Josephus T.J. van Eijndhoven, Franciscus W.
Sijstermans (Sept. 23, 1999). "Data processing device and
method of computing the cosine transform of a matrix",
International patent WO-99/48025 A2.
[16] N.J. August and Dong Sam Ha (2004). "Low power
design of DCT and IDCT for low bit rate video codecs"
IEEE Trans. Multimedia, Vol. 6, no.3, pp. 414- 422.
[17] S. Ghosh, S. Venigalla and M. Bayoumi (2005). "Design
and Implementation of a 2D-DCT Architecture Using
Coefficient Distributed". In Proceedings of the IEEE
Computer Society Annual Symposium on VLSI, Tampa,
FL, USA, pp. 162-166.
[18] A. Ben Atitallah, P. Kadionik, F. Ghozzi, P.Nouel, N.
Masmoudi and H. Levi (2011). “An FPGA
Implementation of HW/SW Codesign Architecture for
H.263 Video Coding”, Effective Video Coding for
Multimedia Applications, Dr Sudhakar Radhakrishnan
(Ed.), ISBN: 978-953-307-177-0.
[19] K. R. Rao, and P. Yip (1990). "Discrete Cosine
Transform: Algorithms, Advantages, Applications",
Academic Press.
[20] V. Bhaskaran and K. Konstantinides (1997). "Image and
Video Compression Standards: Algorithms and
Architectures", Kluwer.