Design of 4:2 Compressor for Parallel Distributed Arithmetic FIR Filter

Indo-Iranian Journal of Scientific Research (IIJSR)
(Peer Reviewed Quarterly International Journal) Volume 2, Issue 4, Pages 50-57, Oct-Dec 2018
50 | P a g e ISSN: 2581-4362 Website: www.iijsr.com
Design of 4:2 Compressor for Parallel Distributed Arithmetic FIR Filter
S.Priyadharsini1
, S.Gnanamurugan2
1
PG Scholar, Department of ECE, Vivekanandha College of Engineering for Women, Namakkal, India.
2
Assistant Professor, Department of ECE, Vivekanandha College of Engineering for Women, Namakkal, India.
Article Received: 29 May 2018 Article Accepted: 22 September 2018 Article Published: 06 October 2018
1. INTRODUCTION
Among various number-crunching obstructs, the multiplier is one of the fundamental squares, which is broadly
utilized in different applications particularly flag preparing applications. There are two general structures for the
multipliers, which are consecutive and parallel. While consecutive designs are low control, their dormancy is
expansive. On the other hand, parallel designs, (for example, Wallace tree and Dadda) are fast while having
high-control utilizations. The parallel multipliers are utilized in elite applications where their large control
utilizations may make problem area areas on the kick the bucket. Since the power utilization and speed are critical
parameters in the plan of computerized circuits, the advancements of these parameters for multipliers turn out to be
fundamentally important. Very frequently, the enhancement of one parameter is performed considering a
requirement for the other parameter.
Specifically, achieving the coveted execution (speed) considering the restricted power spending plan of convenient
frameworks is testing task .In expansion, having a given level of unwavering quality might be another obstacle in
achieving the framework target performance. To meet the power and speed particulars, a variety of techniques at
various outline reflection levels have been suggested. Surmised registering approaches are based on achieving the
objective particulars at the expense of diminishing the computation exactness.
The methodology might be utilized for applications where there is certifiably not an exceptional answer and
additionally a set of replies close to the precise outcome can be viewed as satisfactory. These applications
incorporate sight and sound handling, machine learning, flag preparing, and other blunder resilient computations.
Inexact math units are chiefly based on the improvement of the number-crunching units circuits. There are
ABSTRACT
Distributed arithmetic (DA) calculation is generally utilized for FIR channel execution. In the starting, DA was proposed as successive DA (SDA),
and at that point was stretched out to parallel DA (PDA) for higher throughput. This paper introduces a novel PDA FIR channel design in view of 4:2
compressors which can be mapped on Xilinx FPGAs effectively. Overall, our proposed FIR models accomplish 17.5% decrease in asset use and
20.7% change in execution contrasted with the cutting edge PDA FIR channel. Additionally, overall, there is 57.9% decrease in asset utilization and
23.0% change in execution contrasted with PDA FIR. Another 4:2 compressor design in light of changing some inward conditions are proposed.
Furthermore, utilizing an efficient full-snake (FA) square is considered to have a fast blower. Three 4:2 compressors are considered for examination.
The proposed engineering is contrasted and the best existing plans exhibited in the best in class writing regarding force, deferral and territory. The
paper presents compressors that are broadly utilized as building squares of multipliers.
Keywords: 4:2 compressor, Distributed arithmetic, PDA FIR channel.

numerous earlier works concentrating on estimated multipliers, which give higher velocities and lower control
utilizations at the expense of lower correctness.
Nearly, the majority of the proposed surmised multipliers depend on having a settled level of exactness amid the
runtime. The runtime exactness reconfigurability, be that as it may, is considered as a valuable component for
giving diverse levels of nature of administration amid the framework activity. Here, by reducing the quality
(exactness), the deferral and additionally control consumption of the unit might be lessened.
2. PRELIMINARIES
2.1 PARRALLEL FIR FILTER
Parallel FIR computerized channel are planned utilizing three structures for 2*2 parallel channels. The 2*2 parallel
FIR channel comprise of two channel inputs (X0, X1), two channel coefficient (H0, H1), and two channel yield
(Y0, Y1).
Fig 1 Parallel FIR digital filter
Customary parallel FIR channel structure Y0=H0X0+Z−2H1X1 Y1=H0X1+H1X0 This condition gives the yield
of 2*2 conventional parallel FIR channel structure .This conventional channel requires four sub channel squares of
length N/2, 4 multiplier and 2 adders
2.2. FIR FILTER IMPLEMENTATION USING DA TECHNIQUE
Distributed arithmetic (DA) gives a way to deal with multiplier-less usage of DSP frameworks. It replaces all
increases and augmentations by a LUT and a shifter-aggregator. It can spare extensive equipment assets through
utilizing LUT. DA can be utilized to register whole of items. The channel coefficients are as of now known, in this
way augmentation of every one of the channel coefficients by the information grouping turns into a consistent.
These incomplete items are recomputed and saved a look into table.
According to the LSB of input sequence, we can conserve the coefficient values in LUT unit. With the increase of
filter order, the scale of LUT will increase dramatically which will cost more time to look table and more memory
to store the values. Therefore, we can divide the LUT unit into small LUT units in to solve this problem. Then the
values of the divided LUT units are added as the final value expense of increase in adders. By using distributed
arithmetic, we can completely replace all multipliers by using 303 adders. In 27 tap 6 parallel FIR filter, the device
utilization is more compared to the 3 parallel FIR filter.

Fig 2FIR filter implementation using DA technique
2.3 4:2 COMPRESSOR
Fig 3 4:2 compressor
To include parallel numbers with negligible convey spread we utilize blower snake rather than other viper. Blower
is a advanced present day circuit which is utilized for fast with least entryways requires outlining method. This
blower turns into the fundamental instrument for quick increase including procedure by watching out for quick
processor and lesser area.4:2 blowers are equipped for including 4 bits and one convey, thusly delivering a 3 bit
yield. The 5:3 blowers has 4inputs G1, G2, G3 and G4 and 2 yields Sum and Carry alongside a Carry-in (Cin) and a
Carry-out (Cout). The information Cin is the yield from the past lower critical compressor. The Cout is the yield to
the blower in the following huge stage. The basic way is littler in correlation with an equivalent circuit to include 5
bits utilizing full adders and half adders.
3. LITERATURE SURVEY
Parallel Distributed Arithmetic FIR Filter Design Based on 4:2 Compressors on Xilinx FPGAs (2017) says that
Distributed arithmetic (DA) algorithm is widely used for finite impulse response (FIR) filter implementation. In the
beginning, DA was proposed as sequential DA (SDA), and then was extended to parallel DA (PDA) for higher
throughput. This paper presents a novel PDA FIR filter architecture based on 4:2 compressors which can be
mapped on Xilinx FPGAs efficiently. On average, our proposed FIR architectures achieve 17.5% reduction in
resource usage and 20.7% [1-24] improvement in performance compared to the state-of-the-art PDA FIR filter.
Also, on average, there is 57.9% reduction in resource usage and 23.0% improvement in performance compared to
PDA FIR filters generated by Xilinx Coregen.
An energy and area efficient 4:2 compressor based on FinFETs (2017) The proposed compressor is designed
efficiently based on multiplexer and XOR modules. The efficiency of the 4:2 compressor, as a fundamental

arithmetic block, directly affects the performance of large multipliers. In the same way, the performance of
multipliers impacts the efficiency of digital signal processing (DSP) units. On the other hand FinFET has been
successfully utilized in industry as a substitute device for conventional bulk MOSFETs in sub 32 nm technologies,
due to its superior gate control, lower short channel effects and higher scalability.
Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers (2017) In this paper,
we propose four 4:2 compressors, which have the flexibility of switching between the exact and approximate
operating modes. In the surmised mode, these double quality blowers give higher speeds and lower control
utilizations at the expense of lower exactness. Every one of these blowers has its very own level of precision in the
surmised mode and also extraordinary postponements and power disseminations in the inexact and correct modes.
Utilizing these blowers in the structures of parallel multipliers gives configurable multipliers whose exactnesses
(and additionally their forces and speeds) may change powerfully amid the runtime. The efficiencies of these
blowers in a 32-bit Dadda multiplier are assessed in a 45-nm standard CMOS innovation by contrasting their
parameters and those of the best in class surmised multipliers. The after effects of correlation show, by and large,
46% and 68% lower deferral and power utilization in the inexact mode. Likewise, the viability of these blowers is
evaluated in some picture handling applications.
Structured Approach for Designing 4:2 Compressor (2017) 4:2 compressors play a large role in the design of
arithmetic units, especially in the case of multipliers. In this study we compare different implementations of 4:2
compressors in terms of their power, delay, power delay product and the number of transistors used. A total of 36
different implementations are compared and simulated using LTSPICE, and the power consumption was found to
vary from 0.12μw to 0.38μw, and the power-delay product from 0.12fJ to 0.30fJ at 0.6V using 45nm BICMOS
technology. The transistor counts vary from 30 to 38 for different implementations of the 4:2 compressor.
Urdhwa Multiplier using XOR-XNOR based 4:2 and 7:2 Compressors (2015) With the advent of new technology
in the fields of VLSI and communication, there is also an ever growing demand for high speed processing and low
area design. It is also a well-known fact that the multiplier unit forms an integral part of processor design. Due to
this regard, high speed multiplier architectures become the need of the day. In this paper, we have developed three
designs for Urdhwa Multiplier. In first design we have developed 5:3compressors based on full adder and
utilization in term of 42 delays and 21 areas. In second design we have developed5:3 compressors based on XOR
gate and utilization in term of 36 delays and 24 areas. In third design we have developed 5:3 compressors based on
full adder and utilization in term of 28 delays and 18 areas.
4. EXISTING METHOD
The 4:2 compressor has five input signals, including four main inputs and an input carry bit (Cin), coming from the
previous stage, and three output signals including Sum and Carry main outputs and an output carry signal (Cout)
which serves as the Cin of the next neighbouring block. It is worth mentioning that the Cout signal is independent

of the Cin input due to the elimination of carry propagation through the multiplier tree. A 4:2 compressor cell
generates Sum, Carry and Cout outputs from×1, ×2, ×3, ×4 and Cin inputs.
Sum = x⨁x⨁x ⨁x ⨁C1 2 3 4 in (1)
Carry = Majority[(x⨁x⨁x ), x , C ] 1 2 3 4 in (2)
C = Majority(x ,x , x ) out 1 2 3 (3)
The gate-level and the transistor-level designs are two different approaches for designing blocks such as 4:2
compressors. Each method has its own merit and demerit which should be considered based on design issues and
requirements. The main advantages of the gate-leveldesign approach are acceptable time to market and more
facilitated hardware development processes under various platforms. The major drawback of this approach is its
limited optimization capability.
5. PROPOSED DESIGN
The most different part from the common PDA FIR filter architecture shown in Fig. 2 is that the adder tree using
2-input adders is replaced by a compressor tree using 4:2 compressors. Moreover, registers (all registers are not
shown) are used after pre-adders, 4:2 compressors and LUTs to reduce the critical path delay. A compressor tree
usually has several levels: for the first level, the inputs are the original operands; for other levels, the inputs are the
outputs from previous level. Each level contains several columns with different ranks, and in each column three
basic units, 4:2 compressors, 2-input adders, and pipeline registers are used. The input bits are considered to map
onto 4:2 compressors in priority, and after 4:2 compressor mapping, if one bit is left, a pipeline register unit is used;
if two bits are left, a 2-input adder unit is used; if three bits are left, a 4:2 compressor unit with an input connected to
constant 0 is used.
We also proposed a synthesis heuristic to construct the compressor tree, Parameter M represents for the number of
operands, and parameter Input Number Array includes a series of operands to be added. The output is a compressor
tree described by Verilog. Then, the summation bit width for six 4-bit signed operands is calculated to 7 bits. After
that, the heuristic begins to construct the first compress level.
The six operands are divided into 2 groups, one group has 4 operands with 2-bit sign extension, and the other group
has 2 operands with 1-bit sign extension. Note that the bits in brackets are the extended sign bits. After sign
extension, column inputs are mapped onto the basic units. For example, 4inputs in level 1 column 0 are mapped
onto a 4:2 compressor and the rest 2 inputs in the same column are connected to a 2-input adder.
The process will repeat until all first level columns are covered. Next, the inputs of basic units in first level are
connected to original inputs, and the outputs of first level are generated. After that, the second compress level is
constructed in the same manner. Finally, the outputs produced by the second compress level are connected to the
CPA.

Fig 5 Proposed block diagram
6. RESULTS AND DISCUSSION
Graph 1 shows the used slice number for the proposed PDA FIR filters and slice reduction in percentage compared
to Kumm and Xilinx Coregen PDA FIR filters. As expected, the proposed PDA uses fewer slices in all cases.
Compared to Kumm’s PDA, the proposed PDA achieves reduction of slice 17.5% on average, and the highest
reduction of slice is 32.9% when the tap number is 10. On average, the proposed PDA achieves 57.9% reduction of
slice compared to Xilinx Codegen. Furthermore, the proposed PDA achieves at least 53% slice reduction in all
cases compared to Xilinx Codegen.

Graph 1 Slice Resource Usage Results
Graph II shows the proposed PDA FIR filter performance and improvements in percentage measured by maximum
clock frequency, compared to Kumm and Xilinx Coregen PDA FIR filters. On average, the proposed PDAs are
20.7% and 23.0% faster than the PDA generated by Kumm and Xilinx Coregen respectively. Moreover, in most of
the cases (except for N = 119 and N = 151), the clock frequency of the proposed PDA FIR filter can be over
500MHz. The main reasons are: 1) different from the 3-input adder, the 4:2 compressor does not have routing delay
between adjacent LUTs; 2) the 4:2 compressor has a more compact structure. In addition, the reduction of resource
usage can ease the place & route burden, and the fully pipeline structure can reduce critical path delay.
7. CONCLUSION AND FUTURE WORK
This paper presented an improved parallel distributed arithmetic FIR filter architecture based on compressor trees
using 4:2 compressors that can be mapped to low level structure of Xilinx FPGAs efficiently. We analyzed the
structure of compressor trees and designed a synthesis heuristic to construct the compressor trees. What’s more, a
code generator implemented by C++ was designed to generate the Verilog code automatically. The pipeline method
is also improved to reduce the critical path delay. Several benchmarks were used to validate the improvement. The
experimental results showed the proposed PDA FIR filter architecture could be well mapped to Xilinx FPGAs. On
average, our proposed PDA FIR filters achieve 17.5% resource usage reduction and 20.7% performance
improvement, compared to the state-of the- art PDA FIR filter. Also, on average, there are 57.9% resource usage
reduction and 23.0% performance improvement, compared to PDA FIR filters generated by Xilinx Coregen. In
future enchancement, we use if 7:2 compressor in parallel fir filter, we can reduce the power, memory size of the
filter.
REFERENCES
1. Xiaoqiang Zhang, “Parallel Distributed Arithmetic FIR Filter Design Based on 4:2 Compressors on Xilinx FPGAs” 4th
IEEE International Conference on Signal Processing, Computing and Control (ISPCC 2k17), Sep21-23, 2017, Solan,
India
0
500
1000
1500
2000
2500
1 2 3 4 5
FIR Complier 5.0
PDA Slices
PDA Kumm PDA
Slices
Proposed Slices
Filter Slices

2. Nader Bagherzadeh “An energy and area efficient 4:2 compressor based on FinFETs” INTEGRATION the VLSI journal
60 (2018) 224–231
3. Anil Lamba, "Uses Of Artificial Intelligent Techniques To Build Accurate Models For Intrusion Detection System”,
International Journal for Technological Research in Engineering, Volume 2, Issue 12, pp.5826-5830, 2015.
4. Anil Lamba, "Mitigating Zero-Day Attacks In IOT Using A Strategic Framework", International Journal for
Technological Research in Engineering, Volume 4, Issue 1, pp.5711-5714, 2016.
5. Anil Lamba, "Identifying & Mitigating Cyber Security Threats In Vehicular Technologies", International Journal for
6. Anil Lamba, "S4: A Novel & Secure Method For Enforcing Privacy In Cloud Data Warehouses", International Journal for
7. Anil Lamba, “Cyber Attack Prevention Using VAPT Tools (Vulnerability Assessment & Penetration Testing)”, Cikitusi
Journal for Multidisciplinary Research, Volume 1, Issue 2, July - December, pp.64-71, 2014.
8. Anil Lamba, “A Through Analysis on Protecting Cyber Threats and Attacks on CPS Embedded Subsystems”,
International Journal of Current Engineering and Scientific Research (IJCESR), Volume-1, Issue-3, pp.48-55, 2014.
9. Anil Lamba, “Analysing Sanitization Technique of Reverse Proxy Framework for Enhancing Database-Security”,
International Journal of Information and Computing Science, Volume 1, Issue 1, pp.30-44, 2014.
10. Anil Lamba, “Enhancing Awareness of Cyber-Security and Cloud Computing using Principles of Game Theory”,
International Journal of Advanced in Management, Technology and Engineering Sciences, Volume III, Issue I, pp.71-82,
2013.
11. Anil Lamba, “Resolve Security Policies Conflicts Through Semantics Matching Alignment”, International Journal of
Scientific Research and Review, Volume 2, Issue 2, pp.43-58, 2013.
12. Anil Lamba, “A Detailed Analysis of Data Security in a cloud Environment”, SURAJ PUNJ Journal for Multidisciplinary
Research, Volume 3, Issue 2, pp.43-51, 2013.
13. D. Bhat, A. Kaur, and S. Singh, "Wireless sensor network specific low power FIR filter design and implementation on
FPGA," 2015 2nd
International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi,
pp. 1534-1536, 2015.
A. Martins, M. Fonseca, and E. Costa, "Optimal combination of dedicated multiplication blocks and adder trees schemes for
optimizedradix-2m array multipliers realization," 2015 IEEE International Conference on Electronics, Circuits, and
Systems (ICECS), Cairo, pp.352-355, 2015.
14. S. Y. Park, and P. K. Meher, "Efficient FPGA and ASIC Realizations of a DA-Based Reconfigurable FIR Digital Filter,"
in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 7, pp. 511-515, July 2014
15. M. Kumm, and P. Zipf, “Efficient high speed compression trees onxilinx FPGAs,” in Methodenund Beschreibungss
prachenzur Modellierung und Verifikation von Schaltungen und Systemen, Böblingen, Germany, March 2014
16. M. Kumm, K. Möller, and P. Zipf, "Reconfigurable FIR filter using distributed arithmetic on FPGAs," 2013 IEEE
International Symposium on Circuits and Systems (ISCAS2013), Beijing, pp. 2058-2061, 2013.
17. M. Kumm, K. Möller, and P. Zipf, "Partial LUT size analysis in distributed arithmetic FIR Filters on FPGAs," 2013 IEEE
International Symposium on Circuits and Systems (ISCAS2013), Beijing, pp. 2054-2057, 2013.
18. H. Ariyadoost, Y. S. Kavian, and K. Ansari-Asl, "Two dimensional systolic adaptive DLMS FIR filters for image
processing on FPGA,"20th Iranian Conference on Electrical Engineering (ICEE2012), Tehran, pp. 243-248, 2012.
A. Pishvaie, G. Jaberipur, A. Jahanian, Improved CMOS (4;2) compressor designs for parallel multipliers, Comput. Electr.
Eng. 38 (6) (2012) 1703–1716.
19. Anil Lamba, "Uses Of Cluster Computing Techniques To Perform Big Data Analytics For Smart Grid Automation
System", International Journal for Technological Research in Engineering, Volume 1 Issue 7, pp.5804-5808, 2014.
20. Anil Lamba, “Uses Of Different Cyber Security Service To Prevent Attack On Smart Home Infrastructure", International
Journal for Technological Research in Engineering, Volume 1, Issue 11, pp.5809-5813, 2014.
21. Anil Lamba, "A Role Of Data Mining Analysis To Identify Suspicious Activity Alert System”, International Journal for
Technological Research in Engineering, Volume 2 Issue 3, pp.5814-5825, 2014.
22. Anil Lamba, "To Classify Cyber-Security Threats In Automotive Doming Using Different Assessment Methodologies”,
International Journal for Technological Research in Engineering, Volume 3, Issue 3, pp.5831-5836, 2015.
23. Anil Lamba, “A Study Paper On Security Related Issue Before Adopting Cloud Computing Service Model”, International
Journal for Technological Research in Engineering, Volume 3, Issue 4, pp.5837-5840, 2015.
24. H. Parandeh-Afshar, A. Neogy, P. Brisk, and P. Ienne, “Compressor Tree Synthesis on Commercial High-Performance
FPGAs,” ACM Transactions on Reconfigurable Technology and Systems, Vol. 4, No.4, Article 39, December 2011.

Design of 4:2 Compressor for Parallel Distributed Arithmetic FIR Filter

Recommended

Recommended

More Related Content

What's hot

What's hot (17)

Similar to Design of 4:2 Compressor for Parallel Distributed Arithmetic FIR Filter

Similar to Design of 4:2 Compressor for Parallel Distributed Arithmetic FIR Filter (20)

More from Associate Professor in VSB Coimbatore

More from Associate Professor in VSB Coimbatore (20)

Recently uploaded

Recently uploaded (20)

Design of 4:2 Compressor for Parallel Distributed Arithmetic FIR Filter