Gulnar Perveen / International Journal of Engineering Research and Applications (IJERA)
      ISSN: 2248-9622 www.ijera.com Vol. 2, Issue5, September- October 2012, pp.1413-1415

      Low Power DCT Implementation In An Image Compression
                           System
                                             Gulnar Perveen
         Solar Energy Centre, Ministry of New & Renewable Energy, CGO Complex, New Delhi, India


Abstract
         Low power is one of the most important challenges in maximizing
battery life and saving energy in many signal-processing system designs,
particularly in multimedia cellular applications and multimedia
system-on-chip designs. The 2-D DCT is a commonly used frequency
transformation in compression algorithms. In this paper, an efficient
Baseline 2-D DCT Architecture is compared with the Row/Column Approach (a
Distributed Arithmetic Architecture) and a Fully Pipelined DCT Architecture
for wireless image compression systems. It is observed that the Row/Column
DA DCT Architecture provides a power saving of 24.4% and the Fully Pipelined
Architecture provides a power saving of 16.4% as compared to the 2-D DCT
Baseline Architecture. Speed is also measured, and it is observed that the
Fully Pipelined Architecture exploits the principles of pipelining and
parallelism to obtain a throughput of 4.703 GHz.

Keywords: DCT (Discrete Cosine Transform), 2-D (Two-Dimensional),
DA (Distributed Arithmetic).

I.     ROW/COLUMN ARCHITECTURE
         In this approach, an 8-point 1-D DCT is applied to each of the 8
rows, and then again to each of the 8 columns. The 1-D algorithm applied to
the rows and to the columns is the same. Therefore, identical pieces of
hardware can be used for the row computation as well as the column
computation.

         The bulk of the design and computation is in the 8-point 1-D DCT
block, which can potentially be reused 16 times: 8 times for the rows and 8
times for the columns. Therefore, a fast algorithm for computing the 1-D DCT
is usually selected. The DCT core processor implements a Row-Column
Distributed Arithmetic fast DCT algorithm enhanced with activity reduction
methods. The Row/Column Architecture has two 1-D DCT units connected through
a transposition matrix memory.

         The design is synchronous, with a single positive clock edge and no
internal tri-state buffers. It has RAM for storing the product results after
the first DCT stage for maximized performance. This way both 1-D DCT units
can work in parallel. This architecture takes 8-bit input data and produces
12-bit output using 12-bit DCT matrix coefficients [4].

Figure 1. Row/Column Approach Distributed Arithmetic DCT Architecture,
D0-D7: Registers.

         In the above architecture, while the second DCT stage reads out
data from transposition memory 1, the first DCT stage can populate the
second transposition memory with new data. The 1-D DCT units use Distributed
Arithmetic with butterfly computation to compute DCT values. Because the DA
is parallel, they need a considerable amount of ROM to compute one DCT
value.
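The following is a minimal sketch of how a butterfly stage followed by bit-serial Distributed Arithmetic can produce an 8-point 1-D DCT without multipliers. It is an illustration under stated assumptions, not the paper's circuit: the inputs are assumed to be level-shifted signed 8-bit values, the ROM contents are kept as floating-point numbers rather than 12-bit fixed-point words, and the names `dct_coeff`, `build_rom`, `ROMS` and `da_dct_1d` are hypothetical.

```python
import math

N = 8      # transform length
BITS = 9   # assumed two's-complement width after the butterfly (8 bits + 1)

def dct_coeff(k, n):
    """Orthonormal 8-point DCT-II coefficient C[k][n]."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

def build_rom(k):
    """16-entry ROM for output k: entry addr holds the sum of C[k][n]
    over every bit position n (0..3) that is set in addr."""
    return [sum(dct_coeff(k, n) for n in range(4) if (addr >> n) & 1)
            for addr in range(16)]

# One small ROM per output coefficient (parallel DA).
ROMS = [build_rom(k) for k in range(N)]

def da_dct_1d(x):
    """8-point DCT using a butterfly stage and ROM lookups only."""
    # Butterfly: even outputs depend on sums, odd outputs on differences,
    # because C[k][7-n] = (-1)^k * C[k][n].
    sums = [x[n] + x[N - 1 - n] for n in range(4)]
    diffs = [x[n] - x[N - 1 - n] for n in range(4)]
    y = []
    for k in range(N):
        operands = sums if k % 2 == 0 else diffs
        acc = 0.0
        for j in range(BITS):
            # ROM address = bit j of each of the four operands.
            addr = 0
            for n, v in enumerate(operands):
                addr |= ((v >> j) & 1) << n
            term = ROMS[k][addr] * (1 << j)  # a shift in hardware
            # The sign bit has negative weight in two's complement.
            acc += -term if j == BITS - 1 else term
        y.append(acc)
    return y
```

Calling `da_dct_1d` on a row of eight level-shifted pixels returns the eight DCT values. In this sketch the butterfly reduces each ROM from 256 entries to 16, but one ROM is still required per output coefficient.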



Figure 2. RAC unit of Row/Column Approach.

         A design based on Distributed Arithmetic does not use any
multipliers for computing the MAC (multiply and accumulate); instead, it
stores precomputed results in ROM. There is a buffer, essentially a memory
arbiter, between the 1-D DCT stages. The DCT architecture implements a
Row-Column Distributed Arithmetic fast DCT algorithm enhanced with activity
reduction methods [4].

II. BASELINE 2-D DCT ARCHITECTURE
         The Baseline 2-D DCT Architecture provides a reference design for
the application of our low power techniques and for the computation of the
power savings. It is based on the Chen algorithm. This approach requires
three steps: eight 1-D DCT/IDCTs along the rows, a memory transposition, and
another eight 1-D DCT/IDCTs along the transposed columns.

Figure 3. Baseline 2-D DCT Architecture.

         The block diagram of the Baseline Architecture (Figure 3) includes
the controller, which enables input of the first row of data (DIN) through
the ser2par unit under the SEN signal. It then activates the 1-D DCT unit,
with the SEL and REN signals determining the data path. The first row of the
transposition memory stores the results, and the process repeats for the
remaining seven rows of the input block. Next, the ISEL and COLACK signals
enable the 1-D DCT unit to receive the input data from the columns of the
transposition memory. The final results of the column-wise 1-D DCT are
available at the output [1].

III. FULLY PIPELINED ARCHITECTURE
         In this architecture, a row output vector is computed using
multipliers, multiplexers, accumulators, and registers. The elements of the
input vector X are fed into the circuit one at a time. The 8 output elements
are computed simultaneously and are shifted out serially. An input vector is
multiplied by the coefficient matrix M’ to get the output. The second
element of Y is computed through some additions and subtractions of the
elements of X, followed by multiplication by the constant a. Permutation is
done before the result goes into the accumulator.

Figure 4. Fully Pipelined 2-D DCT Architecture.

         The circuit accepts one pixel per clock cycle, and the entire
processing is performed as a linear pipe. When the left column of register
set RS is filled with eight data elements, the entire column is copied onto
the corresponding registers in the right column. A similar process occurs in
each of the partitions simultaneously. The transpose buffer consists of an
8x8 array of register pairs. The data is input to the transpose buffer in
row-wise fashion until all 64 registers are loaded. The data in those
registers is then copied in parallel onto the corresponding adjacent
registers, which are connected in column-wise fashion. While the data is
being read out from the column registers, the row registers keep receiving
further data from the DCT module. Thus, the output of the row-wise DCT
computation is transposed for the column-wise DCT computation [2].
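The dataflow of this section (one input element per step, eight accumulators updated together, then a transposition between the row and column passes) can be modelled in software roughly as follows. This is a simplified sketch, not the circuit of [2]: it multiplies by the full orthonormal DCT matrix rather than the factored matrix M’ with permutation, it uses floating-point arithmetic, and the names `mac_dct_1d`, `transpose`, `dct_2d` and `dct_2d_direct` are hypothetical.

```python
import math

N = 8

# Orthonormal 8-point DCT-II matrix; C[k][n] weights input n for output k.
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def mac_dct_1d(samples):
    """Consume one sample per step; all eight accumulators update together."""
    acc = [0.0] * N
    for n, x in enumerate(samples):   # one input element per clock cycle
        for k in range(N):            # eight multiply-accumulate units
            acc[k] += C[k][n] * x
    return acc                        # shifted out serially in hardware

def transpose(block):
    """Model of the transpose buffer: written row-wise, read column-wise."""
    return [[block[r][c] for r in range(N)] for c in range(N)]

def dct_2d(block):
    """Row-wise 1-D DCTs, a transposition, then column-wise 1-D DCTs."""
    row_pass = [mac_dct_1d(row) for row in block]
    col_pass = [mac_dct_1d(col) for col in transpose(row_pass)]
    # Transpose back so the result is indexed [vertical][horizontal].
    return transpose(col_pass)

def dct_2d_direct(block):
    """Reference: the 2-D DCT definition evaluated as a direct sum."""
    return [[sum(C[u][r] * C[v][c] * block[r][c]
                 for r in range(N) for c in range(N))
             for v in range(N)] for u in range(N)]
```

Comparing `dct_2d(block)` with `dct_2d_direct(block)` on any 8x8 block is a quick check that the row pass, transposition, and column pass together reproduce the 2-D transform.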


IV. SIMULATION RESULTS

TABLE I. Comparison of Speed & Power for DCT Architectures

Architecture                          Speed       Power consumption   Power saving
2D DCT Baseline Architecture          2.934 GHz   9 mW                ---
Row Column DA Approach                3.23 GHz    6.3 mW              24.4%
Fully Pipelined 2D-DCT Architecture   4.703 GHz   7.5 mW              16.6%

TABLE II. Performance Analysis of JPEG Compressor

Units                Logic cells   Memory bits   Frequency (MHz)
DCT 2D               3,749         1,528         50.485
Quantization         309           700           85.700
Zigzag Buffer        84            1,380         181.285
Run Length Encoder   510           3,901         78.500
JPEG Compressor      4,652         7,509         395.970

V. CONCLUSION
         The comparison of power consumption and speed for the three DCT
architectures, and the performance analysis of the JPEG compressor discussed
above, were carried out using Verilog, with synthesis using Xilinx.

         The Row Column (Distributed Arithmetic) DCT Architecture was
selected for implementation after comparing its power savings against the
Fully Pipelined Architecture and the Baseline 2-D DCT Architecture. This
architecture gives the maximum power savings because it uses the butterfly
operation, which results in minimum data path bit widths; fewer flip-flops
are needed between stages, hence the reduction in power consumption.

         The Fully Pipelined 2-D DCT architecture exploits the principles of
pipelining and parallelism to the maximum extent so as to obtain high speed
and throughput when compared with the Row/Column (DA) DCT approach and the
Baseline 2-D DCT approach.

REFERENCES
  [1]    Nathaniel J. August and Dong Sam Ha, “Low power Design of DCT and
         IDCT for low bit rate video codecs”, IEEE Transactions on
         Multimedia, vol. 6, No. 3, June 2004.
  [2]    Jim Li and Shih-Lien Lu, “Low Power Design of Two-Dimensional DCT”,
         IEEE Transactions on communication, vol. 14, No. 4, April 1996.
  [3]    F. Bensaali, A. Amira and A. Bouridane, “An efficient Architecture
         for color space conversion using Distributed Arithmetic”, The 10th
         IEEE International conference on Electronics, Circuits & Systems
         (ICECS 2003), Sharjah, UAE, December 14-17, 2004.
  [4]    Thucydides Xanthopoulos and Anantha P. Chandrakasan, “Low power DCT
         core using Adaptive Bit width and Arithmetic Activity Exploiting
         signal correlations and Quantization”, IEEE Journal of Solid state
         Circuits, vol. 35, No. 5, May 2000.
  [5]    Jie Chen and K.J. Ray Liu, “Low-Power Architectures for Compressed
         Domain Video Coding Co-Processor”, IEEE transaction on circuits &
         systems, vol. 7, pp. 459-467, June 2000.
  [6]    L. Fanucci, S. Saponara, “Data Driven VLSI Computation for Low
         Power DCT-based Video Coding”, Proceedings of IEEE, vol. 83, No. 2,
         pp. 220-246, May 2002.
  [7]    Hyeonuk Jeong, Jinsang Kim, and Won-kyung Cho, “Low-Power
         Multiplier less DCT Architecture using Image Data Correlation”,
         IEEE Transactions on Consumer Electronics, Vol. 50, No. 1,
         February 2004.


What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Low Power DCT Implementations Compared for Image Compression Systems

Gulnar Perveen / International Journal of Engineering Research and Applications (IJERA)
ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September-October 2012, pp. 1413-1415

Low Power DCT Implementation In An Image Compression System
Gulnar Perveen
Solar Energy Centre, Ministry of New & Renewable Energy, CGO Complex, New Delhi, India

Abstract
Low power is one of the most important challenges in maximizing battery life and saving energy in many signal-processing system designs, particularly in multimedia cellular applications and multimedia system-on-chip design. The 2-D DCT is a commonly used frequency transformation in compression algorithms. In this paper, an efficient Baseline 2-D DCT Architecture is compared with the Row/Column approach (a Distributed Arithmetic architecture) and the Fully Pipelined DCT Architecture for wireless image compression systems. It is observed that the Row/Column DA DCT Architecture provides a power saving of 24.4% and the Fully Pipelined Architecture provides a power saving of 16.4% as compared to the Baseline 2-D DCT Architecture. Speed is also measured, and it is observed that the Fully Pipelined Architecture exploits the principles of pipelining and parallelism to obtain a throughput of 4.703 GHz.

Keywords- DCT (Discrete Cosine Transform), 2-D (Two-Dimensional), DA (Distributed Arithmetic).

I.     ROW/COLUMN ARCHITECTURE
In this approach, an 8-point 1-D DCT is applied to each of the 8 rows, and then again to each of the 8 columns. The 1-D algorithm applied to the rows and to the columns is the same, so identical hardware can be used for the row computation and the column computation. The bulk of the design and computation is in the 8-point 1-D DCT block, which can potentially be reused 16 times: 8 times for the rows and 8 times for the columns. Therefore, a fast algorithm for computing the 1-D DCT is usually selected. The DCT core processor implements a Row-Column Distributed Arithmetic fast DCT algorithm enhanced with activity reduction methods.

The Row/Column Architecture has two 1-D DCT units connected through a transposition matrix memory. The design is synchronous, with a single positive clock edge and no internal tri-state buffers. It has RAM for storing the product results after the first DCT stage, so that both 1-D DCT units can work in parallel for maximized performance. This architecture takes 8-bit input data and produces 12-bit output using 12-bit DCT matrix coefficients [4].

Figure 1. Row/Column Approach Distributed Arithmetic DCT Architecture, D0-D7: Registers.

In the above architecture, while the second DCT stage reads data out of transposition memory 1, the first DCT stage can populate the second transposition memory with new data. The 1-D DCT units use Distributed Arithmetic with butterfly computation to compute the DCT values. Because of the parallel DA, they need a considerable amount of ROM to compute one DCT value.
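The row/column decomposition described above can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: it assumes the orthonormal DCT-II definition, and the function names (dct_1d, transpose, dct_2d_row_column, dct_2d_direct) are hypothetical names chosen for this illustration. It applies the same 1-D routine to the rows, transposes, applies it again, and checks the result against the direct 2-D double sum.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence (assumed definition, 8 points in the paper)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(v[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for i in range(n))
        out.append(scale * s)
    return out


def transpose(block):
    """Software stand-in for the transposition memory between the two stages."""
    return [list(col) for col in zip(*block)]


def dct_2d_row_column(block):
    """2-D DCT as: 1-D DCT on rows, transpose, same 1-D DCT again, transpose back."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


def dct_2d_direct(block):
    """Direct 2-D double sum, used only as a reference for comparison."""
    n = len(block)

    def a(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [[a(u) * a(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


if __name__ == "__main__":
    # Arbitrary 8x8 test block (hypothetical data, not taken from the paper).
    block = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(8)]
    fast = dct_2d_row_column(block)
    ref = dct_2d_direct(block)
    worst = max(abs(fast[u][v] - ref[u][v]) for u in range(8) for v in range(8))
    print("max difference between row/column and direct 2-D DCT:", worst)
```

The same dct_1d routine is called 16 times per 8x8 block (8 rows plus 8 columns), which is the reuse that the row/column approach relies on.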
Figure 2. RAC unit of Row/Column Approach.

A design based on Distributed Arithmetic does not use any multipliers for computing the MAC (multiply and accumulate); instead, it stores precomputed results in ROM. A buffer, which is essentially a memory arbiter, sits between the 1-D DCT stages. The DCT architecture implements a Row-Column Distributed Arithmetic fast DCT algorithm enhanced with activity reduction methods [4].

II.    BASELINE 2-D DCT ARCHITECTURE
The Baseline 2-D DCT Architecture is based on the Chen algorithm and serves as the reference design against which the low power techniques and their power savings are evaluated. This approach requires three steps: eight 1-D DCT/IDCTs along the rows, a memory transposition, and another eight 1-D DCT/IDCTs along the transposed columns.

Figure 3. Baseline 2-D DCT Architecture.

The block diagram of the Baseline Architecture in Figure 3 includes a controller, which enables input of the first row of data (DIN) through the ser2par unit under the SEN signal. It then activates the 1-D DCT unit, with the SEL and REN signals determining the data path. The first row of the transposition memory stores the results, and the process repeats for the remaining seven rows of the input block. Next, the ISEL and COLACK signals enable the 1-D DCT unit to receive its input data from the columns of the transposition memory. The final results of the column-wise 1-D DCT are available at the output [1].

III.   FULLY PIPELINED ARCHITECTURE
In this architecture, a row output vector is computed using multipliers, multiplexers, accumulators, and registers. The elements of the input vector X are fed into the circuit one at a time. The 8 output elements are computed simultaneously and are shifted out serially. The input vector is multiplied by the coefficient matrix M’ to get the output. The second element of Y is computed through some additions and subtractions of the elements of X, followed by multiplication by the constant a. Permutation is done before the result goes into the accumulator.

Figure 4. Fully Pipelined 2-D DCT Architecture.

The circuit accepts one pixel per clock cycle, and the entire processing is performed as a linear pipe. When the left column of register set RS is filled with eight data elements, the entire column is copied onto the corresponding registers in the right column. A similar process occurs in each of the partitions simultaneously. The transpose buffer consists of an 8x8 array of register pairs. The data is input to the transpose buffer in row-wise fashion until all 64 registers are loaded. The data in those registers is then copied in parallel onto the corresponding adjacent registers, which are connected in column-wise fashion. While the data is being read out from the column registers, the row registers keep receiving further data from the DCT module. Thus, the output of the row-wise DCT computation is transposed for the column-wise DCT computation [2].

IV.    SIMULATION RESULTS

TABLE I. Comparison of Speed & Power for DCT Architectures

Architecture                          Speed       Power consumption   Power saving
2D DCT Baseline Architecture          2.934 GHz   9 mW                ---
Row Column DA Approach                3.23 GHz    6.3 mW              24.4%
Fully Pipelined 2D-DCT Architecture   4.703 GHz   7.5 mW              16.6%

TABLE II. Performance Analysis of JPEG Compressor

Units                Logic cells   Memory bits   Frequency (MHz)
DCT 2D               3,749         1,528         50.485
Quantization         309           700           85.700
Zigzag Buffer        84            1,380         181.285
Run Length Encoder   510           3,901         78.500
JPEG Compressor      4,652         7,509         395.970

V.     CONCLUSION
The comparison of power consumption and speed for the three DCT architectures, and the performance analysis of the JPEG compressor discussed above, were carried out using Verilog with synthesis in Xilinx. The Row/Column (Distributed Arithmetic) DCT Architecture was selected for implementation after comparing its power savings against the Fully Pipelined Architecture and the Baseline 2-D DCT Architecture. This architecture results in the maximum power savings because it uses the butterfly operation, which minimizes data path bit widths; fewer flip-flops are needed between stages, hence the reduction in power consumption. The Fully Pipelined 2-D DCT architecture exploits the principles of pipelining and parallelism to the maximum extent so as to obtain high speed and throughput when compared with the Row/Column (DA) DCT approach and the Baseline 2-D DCT approach.

REFERENCES
[1] Nathaniel J. August and Dong Sam Ha, “Low power Design of DCT and IDCT for low bit rate video codecs”, IEEE Transactions on Multimedia, vol. 6, No. 3, June 2004.
[2] Jim Li and Shih-Lien Lu, “Low Power Design of Two-Dimensional DCT”, IEEE Transactions on communication, vol. 14, No. 4, April 1996.
[3] F. Bensaali, A. Amira and A. Bouridane, “An efficient Architecture for color space conversion using Distributed Arithmetic”, The 10th IEEE International conference on Electronics, Circuits & Systems (ICECS 2003), Sharjah, UAE, December 14-17, 2004.
[4] Thucydides Xanthopoulos and Anantha P. Chandraprakasan, “Low power DCT core using Adaptive Bit width and Arithmetic Activity Exploiting signal correlations and Quantization”, IEEE Journal of Solid state Circuits, vol. 35, No. 5, May 2000.
[5] Jie Chen and K.J. Ray Liu, “Low-Power Architectures for Compressed Domain Video Coding Co-Processor”, IEEE transaction on circuits & systems, vol. 7, pp. 459-467, June 2000.
[6] L. Fanucci, S. Saponara, “Data Driven VLSI Computation for Low Power DCT-based Video Coding”, Proceedings of IEEE, vol. 83, No. 2, pp. 220-246, May 2002.
[7] Hyeonuk Jeong, Jinsang Kim, and Won-kyung Cho, “Low-Power Multiplier less DCT Architecture using Image Data Correlation”, IEEE Transactions on Consumer Electronics, Vol. 50, No. 1, February 2004.
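As a supplementary illustration of the multiplier-free MAC attributed to Distributed Arithmetic in Section I, the following is a minimal software sketch of the general textbook technique, not the RAC unit of the paper. It assumes 8-bit two's complement inputs processed bit-serially (least significant bit first) and integer coefficients; the coefficient values and the names build_rom and da_inner_product are hypothetical and chosen only for this example.

```python
def build_rom(coeffs):
    """Precompute the sum of coefficients selected by each address bit pattern.

    Bit k of the address says whether coeffs[k] takes part in the partial sum,
    so N coefficients need a ROM with 2**N entries.
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(rom, xs, bits=8):
    """Compute sum(coeffs[k] * xs[k]) with ROM lookups, shifts and adds only.

    xs are signed integers that fit in `bits`-bit two's complement.
    """
    acc = 0
    for j in range(bits):
        # Form the ROM address from bit j of every input sample.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> j) & 1) << k
        term = rom[addr] << j  # weight of bit position j
        # The most significant bit carries negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


if __name__ == "__main__":
    # Hypothetical integer coefficients, not the paper's 12-bit DCT matrix.
    coeffs = [91, 126, 118, 106, 91, 71, 49, 25]
    rom = build_rom(coeffs)
    xs = [-128, 127, 5, -7, 0, 1, -1, 64]
    print("DA result:      ", da_inner_product(rom, xs))
    print("direct multiply:", sum(c * x for c, x in zip(coeffs, xs)))
```

With 8 inputs the lookup table has 2^8 = 256 entries per output, which is consistent with the remark in Section I that parallel DA needs a considerable amount of ROM.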