The paper mainly concentrates on the development of the new architecture for BCD parallel multiplier that
exploits some properties of two different redundant BCD codes to speed up its computation: the redundant
BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In this we have developed a 16
bit BCD multiplier using some new techniques to reduce significantly the latency and area of previous
representative high-performance implementations. The key role plays by the Partial product generation in
parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5], and a set of
positive multiplicand multiples (1X, 2X, 3X, 4X, 5X) coded in XS-3.By using the above approach of encoding
there are several advantages like mainly it is a self-complementing code, so that a negative multiplicand
multiple can be obtained by just inverting the bits of the corresponding positive one. Also, the available
redundancy allows a fast and simple generation of multiplicand multiples in a carry-free way and finally, the
partial products can be recoded to the ODDS representation by just adding a constant factor into the partial
product reduction tree. Since the ODDS uses a similar 4-bit binary encoding as non-redundant BCD,
conventional binary VLSI circuit techniques. We had developed a new approach of BCD addition for the final
stage. The above developed architecture of 4X4 has been synthesized a RTL model and given better
performance compared to old version multipliers.
Implementation of High Speed & Area Efficient Modified Booth Recoder for Effi...IJMTST Journal
Many communication applications require multifaceted arithmetic operation are used in many digital
signal processing (DSP) relevance. Mainly in the reduction of multiplier power and area consumption it can
play an important role in high performance of any digital indication processing system. within this paper,
mainly centre of attention on optimizing and increased performance by reduction in power consumption in
propose of the fused Add-Multiply (FAM) operator. This implements a new technique by straight recoding of
sum two numbers in Modified Booth (MB) form. In this paper implemented a new and efficient structured
technique by straight recoding of sum of two numbers by considering existing modified booth (MB)
technique. The new technique is implemented by three new dissimilar schemes by integrating them within
existing FAM plans. The performance of the proposed three different schemes with the implementation of
new model carry select adder (K-adders) gives reduction in conditions of critical delay, hardware
complication and power utilization while comparing with the existing AM design.
1. The document presents a design for a modified Booth recoder using a fused add-multiply (FAM) operator to implement digital signal processing applications more efficiently.
2. It proposes a new recoding technique to decrease the critical path delay and reduce area and power consumption of the FAM unit compared to existing recoding schemes.
3. The technique is also applied to the implementation of finite impulse response (FIR) filters to further optimize hardware usage and achieve faithfully rounded outputs within tight area and power constraints for mobile applications.
This document presents the implementation of an optimized floating point adder on an FPGA that follows the IEEE 754-2008 standard for decimal floating point numbers. It uses a densely packed decimal encoding scheme. The design uses a low power equal bypass adder to reduce power consumption and delay. Testing showed the design has a maximum delay of 45ns and operates in a single clock cycle on a Virtex-5 FPGA. The optimized adder design can help reduce power usage in larger floating point arithmetic units.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
This document describes a proposed area efficient mixed decimation Multipath Delay Feedback (MDF) architecture for radix parallel Fast Fourier Transform (FFT) computation. The proposed architecture aims to improve efficiency in utilization of arithmetic resources in MDF architectures by integrating Decimation-In-Time (DIT) operations into Decimation-In-Frequency (DIF) operated computing units. This activates idle periods of arithmetic units in MDF architectures. The proposed architecture, called a mixed decimation MDF or DF architecture, is compared theoretically and experimentally to conventional MDF architectures. Results show the proposed DF architecture achieves improved efficiency in consumption of arithmetic resources.
Comparative Design of Regular Structured Modified Booth MultiplierVLSICS Design
Multiplication is a crucial function and plays a vital role for practically any DSP system. Several DSP
algorithms require different types of multiplications, specifically modified booth multiplication algorithm.
In this paper, a simple approach is proposed for generating last partial product row for reducing extra
sign (negative bit) bit to achieve more regular structure. As compared to the conventional multipliers these
proposed modified Booth’s multipliers can achieve improved reduction in area 5.9%, power 3.2%, and
delay 0.5% for 8 x 8 multipliers. We can also observe that achievable improvement for 16 x 16 multiplier
in area, power, delay are 4.0%, 2.3%, 0.3% respectively. These multipliers are implemented using verilog
HDL and synthesized by using synopsis design compiler with an Artisan TSMC 90nm Technology
This document describes the design and implementation of binary coded decimal (BCD) digit adders and multipliers on an FPGA platform. Specifically, it proposes a correction-free BCD digit adder that avoids the need for a correction circuit, improving speed and efficiency. It also proposes a BCD digit multiplier based on the Wallace tree architecture. The designs are described in VHDL, simulated, synthesized using Quartus II, and compared. Results show the correction-free BCD adder has the fastest delay and lowest power compared to other proposed adders. The BCD Wallace tree multiplier also achieves better performance than an array-based multiplier design.
This document describes an efficient algorithm for two's complement multiplication. The algorithm focuses on reducing the number of partial product rows generated in the first step of multiplication. It presents a method called "sign extension prevention" that removes unequal row lengths from the partial product array. However, there is still one additional partial product row generated from the last "neg" signal of the modified Booth encoding. The algorithm then describes a quick method to calculate the two's complement of a number by selectively complementing bits based on "conversion signals". This allows generating the two's complement while also producing the other partial products, removing the need for the extra row from the last "neg" signal.
Implementation of High Speed & Area Efficient Modified Booth Recoder for Effi...IJMTST Journal
Many communication applications require multifaceted arithmetic operation are used in many digital
signal processing (DSP) relevance. Mainly in the reduction of multiplier power and area consumption it can
play an important role in high performance of any digital indication processing system. within this paper,
mainly centre of attention on optimizing and increased performance by reduction in power consumption in
propose of the fused Add-Multiply (FAM) operator. This implements a new technique by straight recoding of
sum two numbers in Modified Booth (MB) form. In this paper implemented a new and efficient structured
technique by straight recoding of sum of two numbers by considering existing modified booth (MB)
technique. The new technique is implemented by three new dissimilar schemes by integrating them within
existing FAM plans. The performance of the proposed three different schemes with the implementation of
new model carry select adder (K-adders) gives reduction in conditions of critical delay, hardware
complication and power utilization while comparing with the existing AM design.
1. The document presents a design for a modified Booth recoder using a fused add-multiply (FAM) operator to implement digital signal processing applications more efficiently.
2. It proposes a new recoding technique to decrease the critical path delay and reduce area and power consumption of the FAM unit compared to existing recoding schemes.
3. The technique is also applied to the implementation of finite impulse response (FIR) filters to further optimize hardware usage and achieve faithfully rounded outputs within tight area and power constraints for mobile applications.
This document presents the implementation of an optimized floating point adder on an FPGA that follows the IEEE 754-2008 standard for decimal floating point numbers. It uses a densely packed decimal encoding scheme. The design uses a low power equal bypass adder to reduce power consumption and delay. Testing showed the design has a maximum delay of 45ns and operates in a single clock cycle on a Virtex-5 FPGA. The optimized adder design can help reduce power usage in larger floating point arithmetic units.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
This document describes a proposed area efficient mixed decimation Multipath Delay Feedback (MDF) architecture for radix parallel Fast Fourier Transform (FFT) computation. The proposed architecture aims to improve efficiency in utilization of arithmetic resources in MDF architectures by integrating Decimation-In-Time (DIT) operations into Decimation-In-Frequency (DIF) operated computing units. This activates idle periods of arithmetic units in MDF architectures. The proposed architecture, called a mixed decimation MDF or DF architecture, is compared theoretically and experimentally to conventional MDF architectures. Results show the proposed DF architecture achieves improved efficiency in consumption of arithmetic resources.
Comparative Design of Regular Structured Modified Booth MultiplierVLSICS Design
Multiplication is a crucial function and plays a vital role for practically any DSP system. Several DSP
algorithms require different types of multiplications, specifically modified booth multiplication algorithm.
In this paper, a simple approach is proposed for generating last partial product row for reducing extra
sign (negative bit) bit to achieve more regular structure. As compared to the conventional multipliers these
proposed modified Booth’s multipliers can achieve improved reduction in area 5.9%, power 3.2%, and
delay 0.5% for 8 x 8 multipliers. We can also observe that achievable improvement for 16 x 16 multiplier
in area, power, delay are 4.0%, 2.3%, 0.3% respectively. These multipliers are implemented using verilog
HDL and synthesized by using synopsis design compiler with an Artisan TSMC 90nm Technology
This document describes the design and implementation of binary coded decimal (BCD) digit adders and multipliers on an FPGA platform. Specifically, it proposes a correction-free BCD digit adder that avoids the need for a correction circuit, improving speed and efficiency. It also proposes a BCD digit multiplier based on the Wallace tree architecture. The designs are described in VHDL, simulated, synthesized using Quartus II, and compared. Results show the correction-free BCD adder has the fastest delay and lowest power compared to other proposed adders. The BCD Wallace tree multiplier also achieves better performance than an array-based multiplier design.
This document describes an efficient algorithm for two's complement multiplication. The algorithm focuses on reducing the number of partial product rows generated in the first step of multiplication. It presents a method called "sign extension prevention" that removes unequal row lengths from the partial product array. However, there is still one additional partial product row generated from the last "neg" signal of the modified Booth encoding. The algorithm then describes a quick method to calculate the two's complement of a number by selectively complementing bits based on "conversion signals". This allows generating the two's complement while also producing the other partial products, removing the need for the extra row from the last "neg" signal.
This document discusses optimizing the gate-level area of digit-serial finite impulse response (FIR) filter designs using multiple constant multiplication (MCM) blocks. It introduces the problem of designing a digit-serial MCM operation with minimal area and presents algorithms to formalize the area optimization problem. Results show the proposed algorithms and digit-serial MCM architectures efficiently design digit-serial MCM operations and FIR filters with reduced area compared to existing approaches.
Implementation of Low-Complexity Redundant Multiplier Architecture for Finite...ijcisjournal
In the present work, a low-complexity Digit-Serial/parallel Multiplier over Finite Field is proposed. It is
employed in applications like cryptography for data encryption and decryptionto deal with discrete
mathematical andarithmetic structures. The proposedmultiplier utilizes a redundant representation because
of their free squaring and modular reduction. The proposed 10-bit multiplier is simulated and synthesized
using Xilinx VerilogHDL. It is evident from the simulation results that the multiplier has significantly low
area and power when compared to the previous structures using the same representation.
A comparative study of different multiplier designsHoopeer Hoopeer
This document compares four different multiplier designs from the perspectives of power consumption, delay, and area usage. It finds that a carry save adder (CSA) multiplier achieves the lowest power consumption of 0.198mW and delay of 6psec, making it suitable for low-power applications. A modified Booth encoder (MBE) multiplier requires the smallest area of 4030μm2. In general, the four multiplier designs show tradeoffs between power, speed, and area depending on the specific implementation and modification techniques used.
IRJET- The RTL Model of a Reconfigurable Pipelined MCMIRJET Journal
This document presents two Register Transfer Level (RTL) models for a reconfigurable pipelined multiple constant multiplier (MCM) architecture. The first model uses a ripple carry adder, while the second model replaces it with a carry lookahead adder. Both models are constructed using Verilog and simulated using ModelSim. Simulation results show the second model has a 3.53% higher maximum clock frequency and throughput due to the faster carry lookahead adder. It also has 3.53% lower latency but uses 1.65% more power. The second model provides faster performance with moderate increases in resource usage and power consumption.
IRJET- MAC Unit by Efficient Grouping of Partial Products along with Circular...IRJET Journal
This document describes a proposed MAC (multiply and accumulate) unit design that aims to improve performance and reduce resource usage compared to conventional pipeline MAC unit designs. The proposed design uses Booth encoding to reduce the number of partial products, groups the partial products into blocks that are added using multi-operand adders, and implements circular convolution by rearranging the partial products. Simulation results show that the proposed design achieves higher performance and lower resource usage than conventional pipeline and redundant carry-save MAC unit designs. The design is synthesized on an Altera Stratix III FPGA to take advantage of fast carry chains.
Design and implementation of address generator for wi max deinterleaver on fpgaeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document summarizes a research paper on designing low power digit serial finite impulse response (FIR) filters using multiple constant multiplication (MCM) techniques. It proposes an architecture that optimizes the area of digit serial MCM operations at the gate level by considering the implementation costs of digit serial addition, subtraction and shift operations. An algorithm is presented to design digit serial FIR filters under a shift-adds architecture to reduce area compared to designs using generic digit serial multipliers. Experimental results show the technique leads to lower complexity digit serial MCM designs.
Reduced Energy Min-Max Decoding Algorithm for Ldpc Code with Adder Correction...ijceronline
In this paper, high linear architectures for analysing the first two maximum or minimum values are of paramount importance in several uses, including iterative decoders. We proposed the adder and LDPC. The min-sum processing step that it gives only two different output magnitude values irrespective of the number of incoming bit-to check messages. These new micro-architecture layouts would employ the minimum number of comparators by exploiting the concept of survivors in the search. These would result in reduced number of comparisons and consequently reduced energy use. Multipliers are complex units and play an important role in finding the overall area, speed and power consumption of digital designs. By using the multiplier we can minimize the parameters like latency, complexity and power consumption.
IRJET- Efficient Design of Radix Booth MultiplierIRJET Journal
The document proposes a method to optimize binary radix-16 Booth multipliers by reducing the maximum height of the partial product columns from (n + 1)/4 to n/4 for n-bit operands. This is achieved by performing a short carry-propagate addition in parallel to the regular partial product generation, which reduces the maximum height by one row. The method allows further optimizations in the partial product array reduction stage in terms of area, delay, and power. It can also allow additional partial products to be included without increasing delay. The method is generally applicable but provides the most benefit for 64-bit radix-16 Booth multipliers.
Efficient implementation of bit parallel finite field multiplierseSAT Publishing House
This document summarizes an article that presents an efficient implementation of a bit parallel Karatsuba finite field multiplier on FPGA. It begins by introducing finite field arithmetic and multiplication, which is the most resource intensive operation. It then discusses different multiplier designs, including the classical multiplier and Karatsuba multiplier. The Karatsuba multiplier has lower complexity than the classical multiplier by reducing the number of gates required. Experimental results on FPGAs show that the bit parallel Karatsuba multiplier consumes the fewest resources among known FPGA implementations. In summary, the document presents an efficient Karatsuba finite field multiplier design with lower complexity than alternative designs.
Efficient implementation of bit parallel finite eSAT Journals
Abstract Arithmetic in Finite/Galois field is a major aspect for many applications such as error correcting code and cryptography. Addition and multiplication are the two basic operations in the finite field GF (2m).The finite field multiplication is the most resource and time consuming operation. In this paper the complexity (space) analysis and efficient FPGA implementation of bit parallel Karatsuba Multiplier over GF (2m) is presented. This is especially interesting for high performance systems because of its carry free property. To reduce the complexity of Classical Multiplier, multiplier with less complexity over GF (2m) based on Karatsuba Multiplier is used. The LUT complexity is evaluated on FPGA by using Xilinx ISE 8.1i.Furthermore,the experimental results on FPGAs for bit parallel Karatsuba Multiplier and Classical Multiplier were shown and the comparison table is provided. To the best of our knowledge, the bit parallel karatsuba multiplier consumes least resources among the known FPGA implementation. Keywords: Classical Multiplier, Cryptograph, FPGA, Galois field, Karatsuba Multiplier
Arithmetic Operations in Multi-Valued LogicVLSICS Design
This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to onsideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.
A Spurious-Power Suppression technique for a Low-Power MultiplierIOSR Journals
Abstract: This paper presents the design exploration of a spurious-power suppression technique (SPST) which can dramatically reduce the power dissipation of combinational VLSI designs for multimedia/DSP purposes. The proposed SPST separates the target designs into two parts, i.e., the most significant part and least significant part (MSP and LSP), and turns off the MSP when it does not affect the computation results to save power. The objective of a good multiplier is to provide a physically compact, good speed and low power consuming chip. To save significant power consumption of a VLSI design, it is a good direction to reduce its dynamic power that is the major part of total power dissipation. In this paper, we propose a high speed low-power multiplier adopting the new SPST implementing approach. This multiplier is designed by equipping the Spurious Power Suppression Technique (SPST) on a modified Booth encoder which is controlled by a detection unit using an AND gate. The modified booth encoder will reduce the number of partial products generated by a factor of 2. The SPST adder will avoid the unwanted addition and thus minimize the switching power dissipation. Keywords-Booth encoder; low power; spurious power suppression technique(SPST); SPST-Adder
The document presents a new reversible logic gate called BBCDC (Binary to BCD conversion) and a more effective realization of a BCD adder circuit using the proposed BBCDC gate. The BBCDC is a 5x5 reversible gate that converts binary numbers to BCD format. The proposed BCD adder uses DKFG reversible gates for addition and the BBCDC gate for binary to BCD conversion. A comparison shows the proposed design uses fewer gates and garbage outputs than existing BCD adder designs. The efficient design of the BCD adder depends on the reversible ripple carry adder and the reversible binary to BCD converter used.
IRJET- Design and Implementation of Code Converters for High Speed Multiplier...IRJET Journal
This document describes the design and implementation of code converters for high speed multiplier applications. It proposes an 8-bit and 16-bit BCD to binary code converter using shifters and adders to reduce delay compared to existing converters. A 4-bit binary multiplier is also developed using multiplexers as the base for an 8-bit BCD multiplication. The converters and multiplier are used in a decimal multiplication architecture based on the 'Urdhava Triyagbhyam' sutra. Simulation results on FPGA and using a 45nm Cadence technology library show the 16-bit converter has lower delay than the 8-bit converter. The proposed designs aim to improve performance of decimal arithmetic operations.
This document presents a comparative analysis of low power 4-bit multipliers designed using 120nm CMOS technology. It discusses different low power multiplier designs that utilize row bypassing, column bypassing, and 2-dimensional bypassing techniques to reduce power consumption by disabling operations in rows or columns where the partial products are zero. Simulation results show that the Braun 2-dimensional bypassing multiplier has the lowest power consumption of 0.31mw and shortest delay of 11ns, making it the most effective design in terms of power and area for a 1GHz multiplier. Layout designs of components like full adders, tri-state buffers, and multiplexers used in the bypassing multiplier architectures are also presented.
REDUCTION OF BUS TRANSITION FOR COMPRESSED CODE SYSTEMSVLSICS Design
Low power VLSI circuit design is one of the most important issues in present day technology. One of the ways of reducing power is to reduce the number of transitions on the bus. The main focus here is to present a method for reducing the power consumption of compressed-code systems by inverting the bits that are transmitted on the bus. Compression will generally increase bit-toggling, as it removes redundancies from the code transmitted on the bus. Arithmetic coding technique is used for compression /decompression and bit-toggling reduction is done by using shift invert coding technique. Therefore, there is also an additional challenge, to find the right balance between compression ratio and the bit-toggling reduction. This technique requires only 2 extra bits for the low Power coding, irrespective of the bit-width of the bus for compressed data.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
The document presents a generic architecture for an area-efficient 4-input binary coded decimal (BCD) adder implemented on an FPGA. It modifies a previously proposed area-efficient 3-input decimal adder to support a generic number of inputs. The proposed 4-input adder has four stages: 1) carry save addition and propagation/generation signal generation, 2) carry network, 3) correction stage, and 4) final addition. Simulation results on a Xilinx FPGA for different number of bits and inputs are presented, showing the adder has reduced delay and area compared to previous approaches. The generic approach can support addition of any number of inputs.
This document discusses binary coded decimal (BCD) and decimal decoders. It begins by introducing the team members and then provides information about BCD, including that each decimal digit is represented by 4 bits. It describes how BCD is commonly used in electronic displays. It then discusses decimal decoders, including how they work to decode a BCD input into one of ten decimal outputs. It provides examples of 2-4 and BCD to decimal decoders. It concludes by discussing other types of decoders and how decoders can be used in logic design.
This document contains an assignment given to students with 17 questions covering various digital logic design topics. The questions cover converters between BCD and excess-3 code and BCD and gray code using K-maps, magnitude comparators, encoders and decoders, 7-segment displays, multiplexers, demultiplexers, adders including look ahead carry and BCD adders, shift registers, counters including asynchronous and Johnson counters, and sequential circuits. The assignment was prepared by Prashant Kumar and verified by the head of department.
This document presents a comparative study of implementing single precision floating point division using different computational algorithms on FPGAs. It describes two commonly used division algorithms: Goldschmidt and Newton-Raphson. The Goldschmidt algorithm continually multiplies the numerator and denominator by a common factor to converge the denominator to 1. The Newton-Raphson algorithm calculates the multiplicative inverse of the denominator through iterative processing. A 32-bit floating point divider is designed using a 32-bit floating point multiplier module based on 24-bit Vedic multiplication and a 32-bit floating point subtractor module. Synthesis results on a Xilinx Spartan 6 FPGA show the resource utilization and propagation delay for the proposed design, which is
This document discusses optimizing the gate-level area of digit-serial finite impulse response (FIR) filter designs using multiple constant multiplication (MCM) blocks. It introduces the problem of designing a digit-serial MCM operation with minimal area and presents algorithms to formalize the area optimization problem. Results show the proposed algorithms and digit-serial MCM architectures efficiently design digit-serial MCM operations and FIR filters with reduced area compared to existing approaches.
Implementation of Low-Complexity Redundant Multiplier Architecture for Finite...ijcisjournal
In the present work, a low-complexity Digit-Serial/parallel Multiplier over Finite Field is proposed. It is
employed in applications like cryptography for data encryption and decryptionto deal with discrete
mathematical andarithmetic structures. The proposedmultiplier utilizes a redundant representation because
of their free squaring and modular reduction. The proposed 10-bit multiplier is simulated and synthesized
using Xilinx VerilogHDL. It is evident from the simulation results that the multiplier has significantly low
area and power when compared to the previous structures using the same representation.
A comparative study of different multiplier designsHoopeer Hoopeer
This document compares four different multiplier designs from the perspectives of power consumption, delay, and area usage. It finds that a carry save adder (CSA) multiplier achieves the lowest power consumption of 0.198mW and delay of 6psec, making it suitable for low-power applications. A modified Booth encoder (MBE) multiplier requires the smallest area of 4030μm2. In general, the four multiplier designs show tradeoffs between power, speed, and area depending on the specific implementation and modification techniques used.
IRJET- The RTL Model of a Reconfigurable Pipelined MCMIRJET Journal
This document presents two Register Transfer Level (RTL) models for a reconfigurable pipelined multiple constant multiplier (MCM) architecture. The first model uses a ripple carry adder, while the second model replaces it with a carry lookahead adder. Both models are constructed using Verilog and simulated using ModelSim. Simulation results show the second model has a 3.53% higher maximum clock frequency and throughput due to the faster carry lookahead adder. It also has 3.53% lower latency but uses 1.65% more power. The second model provides faster performance with moderate increases in resource usage and power consumption.
IRJET- MAC Unit by Efficient Grouping of Partial Products along with Circular...IRJET Journal
This document describes a proposed MAC (multiply and accumulate) unit design that aims to improve performance and reduce resource usage compared to conventional pipeline MAC unit designs. The proposed design uses Booth encoding to reduce the number of partial products, groups the partial products into blocks that are added using multi-operand adders, and implements circular convolution by rearranging the partial products. Simulation results show that the proposed design achieves higher performance and lower resource usage than conventional pipeline and redundant carry-save MAC unit designs. The design is synthesized on an Altera Stratix III FPGA to take advantage of fast carry chains.
Design and implementation of address generator for wi max deinterleaver on fpgaeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document summarizes a research paper on designing low power digit serial finite impulse response (FIR) filters using multiple constant multiplication (MCM) techniques. It proposes an architecture that optimizes the area of digit serial MCM operations at the gate level by considering the implementation costs of digit serial addition, subtraction and shift operations. An algorithm is presented to design digit serial FIR filters under a shift-adds architecture to reduce area compared to designs using generic digit serial multipliers. Experimental results show the technique leads to lower complexity digit serial MCM designs.
Reduced Energy Min-Max Decoding Algorithm for Ldpc Code with Adder Correction...ijceronline
In this paper, high linear architectures for analysing the first two maximum or minimum values are of paramount importance in several uses, including iterative decoders. We proposed the adder and LDPC. The min-sum processing step that it gives only two different output magnitude values irrespective of the number of incoming bit-to check messages. These new micro-architecture layouts would employ the minimum number of comparators by exploiting the concept of survivors in the search. These would result in reduced number of comparisons and consequently reduced energy use. Multipliers are complex units and play an important role in finding the overall area, speed and power consumption of digital designs. By using the multiplier we can minimize the parameters like latency, complexity and power consumption.
IRJET- Efficient Design of Radix Booth MultiplierIRJET Journal
The document proposes a method to optimize binary radix-16 Booth multipliers by reducing the maximum height of the partial product columns from (n + 1)/4 to n/4 for n-bit operands. This is achieved by performing a short carry-propagate addition in parallel to the regular partial product generation, which reduces the maximum height by one row. The method allows further optimizations in the partial product array reduction stage in terms of area, delay, and power. It can also allow additional partial products to be included without increasing delay. The method is generally applicable but provides the most benefit for 64-bit radix-16 Booth multipliers.
Efficient implementation of bit parallel finite field multiplierseSAT Publishing House
This document summarizes an article that presents an efficient implementation of a bit parallel Karatsuba finite field multiplier on FPGA. It begins by introducing finite field arithmetic and multiplication, which is the most resource intensive operation. It then discusses different multiplier designs, including the classical multiplier and Karatsuba multiplier. The Karatsuba multiplier has lower complexity than the classical multiplier by reducing the number of gates required. Experimental results on FPGAs show that the bit parallel Karatsuba multiplier consumes the fewest resources among known FPGA implementations. In summary, the document presents an efficient Karatsuba finite field multiplier design with lower complexity than alternative designs.
Efficient implementation of bit parallel finite eSAT Journals
Abstract Arithmetic in Finite/Galois field is a major aspect for many applications such as error correcting code and cryptography. Addition and multiplication are the two basic operations in the finite field GF (2m).The finite field multiplication is the most resource and time consuming operation. In this paper the complexity (space) analysis and efficient FPGA implementation of bit parallel Karatsuba Multiplier over GF (2m) is presented. This is especially interesting for high performance systems because of its carry free property. To reduce the complexity of Classical Multiplier, multiplier with less complexity over GF (2m) based on Karatsuba Multiplier is used. The LUT complexity is evaluated on FPGA by using Xilinx ISE 8.1i.Furthermore,the experimental results on FPGAs for bit parallel Karatsuba Multiplier and Classical Multiplier were shown and the comparison table is provided. To the best of our knowledge, the bit parallel karatsuba multiplier consumes least resources among the known FPGA implementation. Keywords: Classical Multiplier, Cryptograph, FPGA, Galois field, Karatsuba Multiplier
Arithmetic Operations in Multi-Valued LogicVLSICS Design
This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to onsideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.
A Spurious-Power Suppression technique for a Low-Power MultiplierIOSR Journals
Abstract: This paper presents the design exploration of a spurious-power suppression technique (SPST) which can dramatically reduce the power dissipation of combinational VLSI designs for multimedia/DSP purposes. The proposed SPST separates the target designs into two parts, i.e., the most significant part and least significant part (MSP and LSP), and turns off the MSP when it does not affect the computation results to save power. The objective of a good multiplier is to provide a physically compact, good speed and low power consuming chip. To save significant power consumption of a VLSI design, it is a good direction to reduce its dynamic power that is the major part of total power dissipation. In this paper, we propose a high speed low-power multiplier adopting the new SPST implementing approach. This multiplier is designed by equipping the Spurious Power Suppression Technique (SPST) on a modified Booth encoder which is controlled by a detection unit using an AND gate. The modified booth encoder will reduce the number of partial products generated by a factor of 2. The SPST adder will avoid the unwanted addition and thus minimize the switching power dissipation. Keywords-Booth encoder; low power; spurious power suppression technique(SPST); SPST-Adder
The document presents a new reversible logic gate called BBCDC (Binary to BCD conversion) and a more effective realization of a BCD adder circuit using the proposed BBCDC gate. The BBCDC is a 5x5 reversible gate that converts binary numbers to BCD format. The proposed BCD adder uses DKFG reversible gates for addition and the BBCDC gate for binary to BCD conversion. A comparison shows the proposed design uses fewer gates and garbage outputs than existing BCD adder designs. The efficient design of the BCD adder depends on the reversible ripple carry adder and the reversible binary to BCD converter used.
IRJET- Design and Implementation of Code Converters for High Speed Multiplier...IRJET Journal
This document describes the design and implementation of code converters for high speed multiplier applications. It proposes an 8-bit and 16-bit BCD to binary code converter using shifters and adders to reduce delay compared to existing converters. A 4-bit binary multiplier is also developed using multiplexers as the base for an 8-bit BCD multiplication. The converters and multiplier are used in a decimal multiplication architecture based on the 'Urdhava Triyagbhyam' sutra. Simulation results on FPGA and using a 45nm Cadence technology library show the 16-bit converter has lower delay than the 8-bit converter. The proposed designs aim to improve performance of decimal arithmetic operations.
This document presents a comparative analysis of low power 4-bit multipliers designed using 120nm CMOS technology. It discusses different low power multiplier designs that utilize row bypassing, column bypassing, and 2-dimensional bypassing techniques to reduce power consumption by disabling operations in rows or columns where the partial products are zero. Simulation results show that the Braun 2-dimensional bypassing multiplier has the lowest power consumption of 0.31mw and shortest delay of 11ns, making it the most effective design in terms of power and area for a 1GHz multiplier. Layout designs of components like full adders, tri-state buffers, and multiplexers used in the bypassing multiplier architectures are also presented.
REDUCTION OF BUS TRANSITION FOR COMPRESSED CODE SYSTEMSVLSICS Design
Low power VLSI circuit design is one of the most important issues in present day technology. One of the ways of reducing power is to reduce the number of transitions on the bus. The main focus here is to present a method for reducing the power consumption of compressed-code systems by inverting the bits that are transmitted on the bus. Compression will generally increase bit-toggling, as it removes redundancies from the code transmitted on the bus. Arithmetic coding technique is used for compression /decompression and bit-toggling reduction is done by using shift invert coding technique. Therefore, there is also an additional challenge, to find the right balance between compression ratio and the bit-toggling reduction. This technique requires only 2 extra bits for the low Power coding, irrespective of the bit-width of the bus for compressed data.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
The document presents a generic architecture for an area-efficient 4-input binary coded decimal (BCD) adder implemented on an FPGA. It modifies a previously proposed area-efficient 3-input decimal adder to support a generic number of inputs. The proposed 4-input adder has four stages: 1) carry save addition and propagation/generation signal generation, 2) carry network, 3) correction stage, and 4) final addition. Simulation results on a Xilinx FPGA for different number of bits and inputs are presented, showing the adder has reduced delay and area compared to previous approaches. The generic approach can support addition of any number of inputs.
This document discusses binary coded decimal (BCD) and decimal decoders. It begins by introducing the team members and then provides information about BCD, including that each decimal digit is represented by 4 bits. It describes how BCD is commonly used in electronic displays. It then discusses decimal decoders, including how they work to decode a BCD input into one of ten decimal outputs. It provides examples of 2-4 and BCD to decimal decoders. It concludes by discussing other types of decoders and how decoders can be used in logic design.
This document contains an assignment given to students with 17 questions covering various digital logic design topics. The questions cover converters between BCD and excess-3 code and BCD and gray code using K-maps, magnitude comparators, encoders and decoders, 7-segment displays, multiplexers, demultiplexers, adders including look ahead carry and BCD adders, shift registers, counters including asynchronous and Johnson counters, and sequential circuits. The assignment was prepared by Prashant Kumar and verified by the head of department.
This document presents a comparative study of implementing single precision floating point division using different computational algorithms on FPGAs. It describes two commonly used division algorithms: Goldschmidt and Newton-Raphson. The Goldschmidt algorithm continually multiplies the numerator and denominator by a common factor to converge the denominator to 1. The Newton-Raphson algorithm calculates the multiplicative inverse of the denominator through iterative processing. A 32-bit floating point divider is designed using a 32-bit floating point multiplier module based on 24-bit Vedic multiplication and a 32-bit floating point subtractor module. Synthesis results on a Xilinx Spartan 6 FPGA show the resource utilization and propagation delay for the proposed design, which is
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
High Performance FPGA Based Decimal-to-Binary Conversion SchemesSilicon Mentor
Here we represent high performance FPGA based decimal to binary conversion scheme to support BCD arithmetic based on binary hardware .The architecture presented here requires less LUTs as compare to others and delay is also reduced by the help of shifters in place of multipliers.
For more info visit us at:
http://www.siliconmentor.com/
Design and Implementation of Encoder for (15, k) Binary BCH Code Using VHDL a...IOSR Journals
Abstract: In this paper we have designed and implemented(15, k) a BCH Encoder on FPGA using VHDL for reliable data transfers in AWGN channel with multiple error correction control. The digital logic implementation of binary encoding of multiple error correcting BCH code (15, k) of length n=15 over GF (24) with irreducible primitive polynomial x4+x+1 is organized into shift register circuits. Using the cyclic codes, the reminder b(x) can be obtained in a linear (15-k) stage shift register with feedback connections corresponding to the coefficients of the generated polynomial. Three encoder are designed using VHDL to encode the single, double and triple error correcting BCH code (15, k) corresponding to the coefficient of generated polynomial. Information bit is transmitted in unchanged form up to k clock cycles and during this period parity bits are calculated in the LFSR then the parity bits are transmitted from k+1 to 15 clock cycles. Total 15-k numbers of parity bits with k information bits are transmitted in 15 code word. Here we have implemented (15, 5, 3), (15, 7, 2) and (15, 11, 1) BCH code encoder on Xilinx Spartan 3 FPGA using VHDL and the simulation & synthesis are done using Xilinx ISE 13.3. BCH encoders are conventionally implemented by linear feedback shift register architecture. Encoders of long BCH codes may suffer from the effect of large fan out, which may reduce the achievable clock speed. The data rate requirement of optical applications require parallel implementations of the BCH encoders. Also a comparative performance based on synthesis & simulation on FPGA is presented. Keywords: BCH, BCH Encoder, FPGA, VHDL, Error Correction, AWGN, LFSR cyclic redundancy checking, fan out .
This document describes a proposed VLSI implementation of a high-speed DCT architecture for H.264 video codec design. It presents a Booth radix-8 multiplier-based multiply-accumulate (MAC) unit to improve throughput and minimize area complexity for 8x8 2D DCT computation. The proposed MAC architecture achieves a maximum operating frequency of 129.18MHz while reducing area by 64% compared to a regular merged MAC unit with a conventional multiplier. FPGA implementation and performance analysis demonstrate the suitability of the proposed DCT architecture for applications in HDTV systems.
1. The document discusses various types of code conversions including binary to BCD, BCD to excess-3, and gray to excess-3. Circuit designs and truth tables are provided for each conversion.
2. Applications of different codes are covered. BCD is used for 7-segment displays. Gray code is used in shaft encoders. Excess-3 is used for subtraction.
3. The document concludes by designing a BCD to 7-segment display decoder circuit and explaining the application of gray code in shaft encoders.
The document describes a design for a low power 32x32 multiplier that combines Booth and Vedic multiplication architectures. It partitions each 32-bit input into two 16-bit blocks, uses 16x16 Booth multipliers to generate partial products for each block, and employs 16x16 Vedic multipliers and carry select adders to add the partial products. This combined architecture achieves lower power and faster performance than individual Booth or Vedic multipliers. The design is implemented using Xilinx Vivado and evaluated for applications such as floating point multiplication.
33 8951 suseela g suseela paper8 (edit)new2IAESIJEECS
The contemporary advancements in CMOS technology and Multimedia systems enabled image communication over resource constrained Visual Sensor Network (VSN). However, the size of image data is huge. It is wise to send the compressed image over the bandwidth limited network. Compression algorithms also should be of energy efficient with low complexity and should offer acceptable image quality. In this work an image coder offering low bit rate less than 0.5bpp is aimed. In this paper, a low memory and energy efficient skipped zonal based Binary DCT is proposed that codes the transformed samples using Golomb Rice code. Simulation results offered low bitrate 0.39 bpp with PSNR of 27.34 dB for the standard gray scale test image of Lena size 512 x512 pixels.
34 8951 suseela g suseela paper8 (edit)newIAESIJEECS
The document proposes a low memory and energy efficient image compression technique called Skipped Zonal based Binary DCT (SZBinDCT) for transmitting images over resource constrained visual sensor networks. SZBinDCT first removes alternating rows and columns of pixels to reduce redundancy. It then applies a low complexity Binary DCT transform to the remaining pixels. Only the top left zone of transformed coefficients containing most energy are considered. These coefficients are quantized and entropy encoded using Golomb Rice coding. Simulation results show the technique compresses images to 0.39 bpp with 27.34 dB PSNR while consuming only 11.352 microjoules of energy per 8x8 block, significantly less than other techniques. The low complexity compression allows extended
33 8951 suseela g suseela paper8 (edit)new2IAESIJEECS
Modern manufacturing methods permit the study and prediction of surface roughness since the acquisition of signals and its processing is made instantaneously. With the availability of better computing facilities and newer algorithms in the machine learning domain, online surface roughness prediction will lead to the manufacture of intelligent machines that alert the operator when the process crosses the specified range of roughness. Prediction of surface roughness by multiple linear regression, regression tree and M5P tree methods using multivariable predictors and a single response dependent variable Ra (surface roughness) is attempted. Vibration signal from the boring operation has been acquired for the study that predicts the surface roughness on the inner face of the workpiece. A machine learning approach was used to extract the statistical features and analyzed by four different cases to achieve higher predictability, higher accuracy, low computing effort and reduction of the root mean square error. One case among them was carried out upon feature reduction using Principle Component Analysis (PCA) to examine the effect of feature reduction.
Fpga implementation of (15,7) bch encoder and decoder for text messageeSAT Journals
Abstract In a communication channel, noise and interferences are the two main sources of errors occur during the transmission of the message. Thus, to get the error free communication error control codes are used. This paper discusses, FPGA implementation of (15, 7) BCH Encoder and Decoder for text message using Verilog Hardware Description Language. Initially each character in a text message is converted into binary data of 7 bits. These 7 bits are encoded into 15 bit codeword using (15, 7) BCH encoder. If any 2 bit error in any position of 15 bit codeword, is detected and corrected. This corrected data is converted back into an ASCII character. The decoder is implemented using the Peterson algorithm and Chine’s search algorithm. Simulation was carried out by using Xilinx 12.1 ISE simulator, and verified results for an arbitrarily chosen message data. Synthesis was successfully done by using the RTL compiler, power and area is estimated for 180nm Technology. Finally both encoder and decoder design is implemented on Spartan 3E FPGA. Index Terms: BCH Encoder, BCH Decoder, FPGA, Verilog, Cadence RTL compiler
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Designing of Adders and Vedic Multiplier using Gate Diffusion InputIRJET Journal
This document discusses the design of adders and Vedic multipliers using Gate Diffusion Input (GDI) logic. GDI logic is introduced as an alternative to traditional CMOS logic for low power VLSI design. Various adders including Ripple Carry Adder, Kogge Stone Adder, and Brent Kung Adder are designed using GDI logic. Vedic multipliers are also designed using the Urdhva-Tiryagbhyam multiplication technique along with the GDI-based adders. Comparative analysis shows that designs using GDI logic have lower power consumption, transistor count, and area compared to equivalent CMOS designs.
Design and Implementation of Different types of Carry skip adderIRJET Journal
The document describes the design and implementation of different types of carry skip adders. It begins with an introduction to carry skip adders and their advantages over other adder types in terms of speed, area usage, and transistor count. It then reviews existing carry skip adder designs and their limitations. A new design called the Common Boolean Logic (CBL) carry skip adder is proposed that aims to reduce area and power consumption by eliminating redundant adder cells through shared logic. Simulation results show that an 8-bit CBL carry skip adder has 64.6% lower power and 18.7% smaller area than a conventional carry skip adder. In conclusion, the CBL carry skip adder achieves improved performance and efficiency.
Increasing data storage of coloured QR code using compress, multiplexing and ...journalBEEI
Quick response (QR) code is a printed code of black and white squares that is able to store data without the use of any of the electronic devices. There are many existing researches on coloured QR code to increase the storage capacity but from time to time the storage capacity still need to be improved. This paper proposes the use ofcompress, multiplexing and multilayered techniques, as an integrated technique known as CoMM, to increase the storage of the existing QR code. The American Standard Code for Information Interchange (ASCII) text characters are used as an input and performance is measured by the number of characters that can be stored in a single black and white QR code version 40. The experiment metrics also include percentage of missing characters, number of produced QR code, and elapsed time to create the QR code. Simulation results indicate that the proposed algorithm stores 24 times more characters than the black and white QR code and 9 times more than other coloured QR code. Hence, this shows that the coloured QR code has the potential of becoming useful mini-data storage as it does not rely on internet connection.
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...Hari M
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTIPLIER:
Reduce the maximum height of the partial product columns to [n/4] for n = 64-bit unsigned
operand. This is in contrast to the conventional maximum height of [(n + 1)/4].
The multiplier algorithm is normally used for higher bit length applications and ordinary multiplier is good for lower order bits.
Simulation of Turbo Convolutional Codes for Deep Space MissionIJERA Editor
In satellite communication deep space mission are the most challenging mission, where system has to work at very low Eb/No. Concatenated codes are the ideal choice for such deep space mission. The paper describes simulation of Turbo codes in SIMULINK . The performance of Turbo code is depend upon various factor. In this paper ,we have consider impact of interleaver design in the performance of Turbo code. A details simulation is presented and compare the performance with different interleaver design .
Similar to Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes (20)
This presentation was provided by Rebecca Benner, Ph.D., of the American Society of Anesthesiologists, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
Temple of Asclepius in Thrace. Excavation resultsKrassimira Luka
The temple and the sanctuary around were dedicated to Asklepios Zmidrenus. This name has been known since 1875 when an inscription dedicated to him was discovered in Rome. The inscription is dated in 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
The chapter Lifelines of National Economy in Class 10 Geography focuses on the various modes of transportation and communication that play a vital role in the economic development of a country. These lifelines are crucial for the movement of goods, services, and people, thereby connecting different regions and promoting economic activities.
2. 37 International Journal for Modern Trends in Science and Technology
K. Swamiji, N. Praveen Kumar : Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes
Moreover, the aggressive cycle time of these
processors puts an additional constraint on the
use of parallel techniques [6], [19], [30] for reducing
the latency of DFP multiplication in
high-performance DFPUs. Thus, efficient
algorithms for accelerating DFP multiplication
should result in regular VLSI layouts that allow an
aggressive pipelining.
Hardware implementations normally use BCD
instead of binary to manipulate decimal fixed-point
operands and integer significands of DFP numbers
for easy conversion between machine and user
representations [21], [25]. BCD encodes a number
X in decimal (non-redundant radix-10) format,
with each decimal digit Xi ; represented in a 4-bit
binary number system. However, BCD is less
efficient for encoding integers than binary, since
codes 10 to 15 are unused. Moreover, the
implementation of BCD arithmetic has more
complications than binary, which lead to area and
delay penalties in the resulting arithmetic units. A
variety of redundant decimal formats and
arithmetics have been proposed to improve the
performance of BCD multiplication. The BCD
carry-save format [9] represents a radix-10
operand using a BCD digit and a carry bit at each
decimal position. It is intended for carry-free
accumulation of BCD partial products using rows
of BCD digit adders arranged in linear [9], [20] or
tree-like configurations [19]. Decimal signed-digit
(SD) representations [10], [14], rely on a redundant
digit set to
allow decimal carry-free addition.
Furthermore, these codes are
self-complementing, so that the 9’s complement of
a digit, required for negation, is easily obtained by
bit-inversion of its 4-bit representation. A
disadvantage of 4221 and 5211 codes, is the use of
a non-redundant radix-10 digit set [0, 9] as BCD.
Thus, the redundancy is constrained to the digit
bounds, so that complex decimal multiples, such
as X, cannot be obtained in a carry-freeway.
In this work, we focus on the improvement of
parallel decimal multiplication by exploiting the
redundancy of two decimal representations: the
ODDS and the redundant BCD excess-3 (XS-3)
representation, a self-complementing code with the
digit set [ 3, 12]. We use a minimally redundant
digit set for the recoding of the BCD multiplier
digits, the signed-digit radix-10 recoding [30], that
is, the recoded signed digits are in the set
For this digit
set, the main issue is to perform the multiple
without long carry-propagation (note that and are
easy multiples for decimal [30] and that is
generated as two consecutive operations). We
propose the use of a general redundant BCD
arithmetic (that includes the ODDS, For this digit
set, the main issue is to perform the multiple
without long carry-propagation (note that and are
easy multiples for decimal [30] and that is
generated as two consecutive operations). We
propose the use of a general redundant BCD
arithmetic (that includes the ODDS,XS-3 and BCD
representations) to accelerate parallel BCD
multiplication in two ways:
Partial product generation (PPG). By generating
positive multiplicand multiples coded in XS-3 in a
carry-free form. An advantage of the XS-3
representation over non-redundant decimal codes
(BCD and 4221/5211 [30]) is that all the
interesting multiples for decimal partial product
generation, including the X multiple, can be
implemented in constant time with an equivalent
delay of about three XOR gate levels. Moreover,
since XS-3 is a self-complementing code, The 9’s
complement of a positive multiple can be obtained
by just inverting its bits as in binary. Partial
product reduction (PPR). By performing the
reduction of partial products coded in ODDS via
binary carry-save arithmetic. Partial products can
be recoded from the XS-3 representation to the
ODDS representation by just adding a constant
factor into the partial product reduction tree. The
resultant partial product reduction tree is
implemented using regular structures of binary
carry-save adders or compressors. The 4-bit binary
encoding of ODDS operands allows a more efficient
mapping of decimal algorithms into binary
techniques. By contrast signed-digit radix-10 and
BCD carry-save redundant representations require
specific radix-10 digit adders [14], [22], [27].
The paper is organized as follows. Section 2
introduces formally the redundant BCD
representations used in this work. Section 3
outlines the high level implementation (algorithm
and architecture) of the proposed BCD parallel
multiplier. In Section 4 we describe the techniques
developed for the generation of decimal partial
products. Decimal partial product reduction and
the final conversion to a non-redundant BCD
product are detailed in Sections 5 and 6
respectively.
II. REDUNDANT BCD REPRESENTATIONS
The proposed decimal multiplier uses internally
a redundant BCD arithmetic to speed up and
simplify the implementation. This arithmetic deals
3. 38 International Journal for Modern Trends in Science and Technology
K. Swamiji, N. Praveen Kumar : Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes
with radix-10 ten’s complement integers of
theform:
where d is the number of digits, sz is the sign bit,
On the other hand, the binary value of the 4-bit
vector representation of Zi is given by
zi;j being the jth bit of the ith digit. Therefore, the
value of digit Zi can be obtained by subtracting the
excess e of the representation from the binary value
of its 4-bit encoding, that is,
Note that bit-weighted code such as BCD
and ODDS use the 4-bit binary encoding (or BCD
encoding) defined in Expression (2). Thus, Zi Zi for
operands Z represented in BCD or ODDS. This
binary encoding simplifies the hardware
implementation of decimal arithmetic units, since
we can make use of state-of-the-art binary logic
and binary arithmetic techniques to implement
digit operations. In particular, the ODDS
representation presents interesting properties
(redundancy and binary encoding of its digit set) for
a fast and efficientimplementation of
multi-operand addition. Moreover, conversions
from BCD to the ODDS representation are straight-
forward, since the digit set of BCD is a subset of the
ODDS representation. In our work we use a SD
radix-10 recoding of the BCD multiplier [30], which
requires to compute a set of decimal multiples
of the BCD
multiplicand The main issue is to perform the X3
multiple without long carry-propagation.
For input digits of the multiplicand in
conventional BCD (i.e., in the range [0, 9], e , r ), the
multiplication by 3 leads to a maximum decimal
carry to the next position of 2 and to a maximum
value of the interim digit (the result digit before
adding the carry from the lower position) of 9.
Therefore the resultant maximum digit (after
adding the decimal carry and the interim digit) is
11. Thus, the range of the digits after the 3
multiplication is in the range [0, 11]. Therefore the
redundant BCD representations can host the
resultant digits with just one decimal carry
propagation. An important issue for this
representation is the ten’s complement operation.
Since after the recoding of the multiplier digits,
negative multiplication digits may result, it is
necessary to negate (ten’s complement) the
multiplicand to obtain the negative partial
products. This operation is usually done by
computing the nine’s complement of the
multiplicand and adding a one in the proper place
on the digit array. The nine’s complement of a
positive decimal operand is given by
The implementation of 9-Zi leads to a complex
implementation, since the Zi digits of the multiples
generated may take values higher than 9. A simple
implementation is obtained by observing that the
excess-3 of the nine’s complement of an operand is
equal to the bit-complement of the operand coded
in excess-3.
Table 1: Nine’s Complement for the XS -3 Representation
In Table 1 we show how the nine’s complement
can be performed by simply inverting the bits of a
digit Zi coded in XS-3. At the decimal digit level,
this is due to the fact that:
( 9 _ Zi ) + 3 = 15 – ( Zi + 3 )
for the ranges Therefore to
have a simple negation for partial product
generation we produce the decimal multiples in an
excess-3 code. The negation is performed by simple
bit inversion, that corresponds to the excess-3 of
the nine’s complement of the multiple. Moreover, to
simplify the implementation we combine the
multiple generation stage and the digit increment
by 3(to produce the excess-3) into a single module
by using the XS-3 code (more details in Section
4.1).In summary, the main reasons for using the
redundant XS-3 code are: (1) to avoid long
carry-propagations in the generation of decimal
positive multiplicand multiples, (2) to obtain the
negative multiples from the corresponding positive
ones easily, (3) simple conversion of the partial
products generated in XS-3 to the ODDS
representation
4. 39 International Journal for Modern Trends in Science and Technology
K. Swamiji, N. Praveen Kumar : Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes
III. HIGH-LEVEL ARCHITECTURE
The high-level block diagram of the proposed
parallel architecture for d d-digit BCD decimal
integer and fixed-point multiplication is shown in
Fig. 1. This architecture accepts conventional
(non-redundant) BCD inputs X, Y , generates
redundant BCD partial products PP, and computes
the BCD product P X Y . It consists of the following
three stages (1) Parallel generation of partial
products coded in XS-3, including generation of
multiplicand multiples and recoding of the
multiplier operand, (2) recoding of partial products
from XS-3 to the ODDS representation and
subsequent reduction, and (3) final conversion to a
non-redundant d-digit BCD product..
Stage 1) Decimal partial product generation. A
SDradix-10 recoding of the BCD multiplier has
been used. This recoding produces a reduced
number of partial products that leads to a
significant reduction in the overall multiplier area
[29]. Therefore, the recoding of the d-digit
multiplier Y into SD radix-10 digits Y ; ;Yb
,produces d partial products PP d ; ;PP ,one per
digit; note that each Ybk recoded digit is
represented in a 6–bit hot-one code to be used as
control input of the multiplexers for selecting the
proper multiplicand multiple, An additional partial
product PP d is produced by the most significant
multiplier digit after the recoding, so that the total
number of partial roducts generated is d .
Stage 2)Decimal partial product reduction. In this
stage, the array of d+1 ODDS partial products are
reduced to two 2d -digit words (A, B). Our proposal
relies on a binary carrysave adder tree to perform
carry-free additions of the decimal partial
products. The array of d +1 ODDS partial products
can be viewed as adjacent digit columns of height
h<d + 1.
Since ODDS digits are encoded in binary, the
rules for binary arithmetic apply within the digit
bounds, and only carries generated between
radix-10 digits (4-bit columns) contribute to the
decimal correction of the binary sum. That is, If a
carry out is produced as a result of a 4-bit (modulo
16) binary addition, the binary sum must be
incremented by 6 at the appropriate position to
obtain the correct decimal sum (modulo 10
addition).
Stage 3) Conversion to (non-redundant) BCD. We
consider the use of a BCD carry-propagate adder
[29] to perform the final conversion to a
non-redundant BCD product P= A+ B. The
proposed architecture is a d-digit hybrid parallel
prefix/carry-select adder, the BCD Quaternary
Tree adder (see Section 6). The sum of input digits
Ai, Bi at each position i has to be in the range
[0,18]; so that at most one decimal carry is
propagated to the next position i+1 [22].
Furthermore, to generate the correct decimal carry,
the BCD addition algorithm implemented requires
Ai+ Bi to be obtained in excess-6. Several choices
are possible. We opt for representing operand A in
BCD excess-6
IV. DECIMAL PARTIAL PRODUCT GENERATION
The partial product generation stage comprises
the recoding of the multiplier to a SD radix-10
representation, the calculation of the multiplicand
multiples in XS-3 code and the generation of the
ODDS partial products.
The negative multiples are obtained by ten’s
complementing the positive ones. This is
equivalent to taking the nine’s complement of the
positive multiple and then adding 1. As we have
shown in Section 2, the nine’s complement can be
obtained simply by bit inversion. This needs the
positive multiplicand multiples to be coded in
XS-3,with digits in ; .The d least significant partial
products PP d ; PP are generated from digits Ybk by
using a set of 5:1 muxes, as shown in Fig. 2. The
xor gates at the output of the mux invert the
multiplicand multiple, to obtain its 9’s
complement, if the SD radix-10 digit is negative
(Ysk =1 ).
On the other hand, if the signals
are all zero then PP k , but it
has to be coded in XS-3 (bit encoding 0011). Then,
to set the two least significant bits to 1, the input to
the XOR gate is Ysk Ysk Ybk is zero ( denotes the
boolean OR operator), where Ybk iszero equals 1 if
5. 40 International Journal for Modern Trends in Science and Technology
K. Swamiji, N. Praveen Kumar : Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes
all the signals (Y1 k;Y2 k;Y3 k;Y4 k;Y5 k) are zero.
In addition, the partial product signs are encoded
into their MSDs (see Section 4.2). The generation of
the most significant partial product PPand only
depends on Ysd , the sign of the most significant
SD radix-10 digit.
4.1 Generation of the Multiplicand Multiples
We denote by NX X; X; X; X; X , the set of
multiplicand multiples coded in the XS-3
representation, with digits NXi ; , being NXi NXi ;
the corresponding value of the 4-bit binary
encoding of NXi given by Equation (2).Fig. 3 shows
the high-level block diagram of the multiples
generation with just one carry propagation. This is
performed in two steps
1) digit recoding of the BCD multiplicand digits Xi
into a decimal carry and a digit
such as
being Tmax the maximum possible value for the
decimal carry.
2) The decimal carries transferred between
adjacent digits are assimilated obtaining the
correct 4-bit representation of XS-3 digits NXi, that
is
4.2 Most-Significant Digit Encoding
The MSD of each PP k , PPd k , is directly
obtained in the ODDS representation. Note that
these digits store the carries generated in the
computation of the multiplicand multiples and the
sign bit of the partial product.
4.3 Correction Term
The resultant partial product sum has to be
corrected off the-critical-path by adding a
precomputed term, fc which only depends on the
format precision d. This term has to gather: (a) the
constants that have not been included in the MSD
encoding and (b) a constant for every XS-3 partial
product digit (introduced to simplify the nine’s
complement operation). Actually, the addition of
these constants is equivalent to convert the XS-3
digits of the partial products to the ODDS
representation. Note that the 4-bit encoding of a
XS-3 digit.
6. 41 International Journal for Modern Trends in Science and Technology
K. Swamiji, N. Praveen Kumar : Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes
4.4.Product Array
Fig. 4 illustrates the shape of the partial product
array, particularizing for d =16. Note that the
maximum digit column height
Is d+1.
V. DECIMAL PARTIAL PRODUCT REDUCTION
The PPR tree consists of three parts: (1) a regular
binary CSA tree to compute an estimation of the
decimal partial product sum in a binary carry-save
form (S, C), (2) a sum correction block to count the
carries generated between the digit columns, and
(3) a decimal digit 3:2 compressor which
increments the carry-save sum according to the
carries count to obtain the final double-word
product (A;B), A being represented with excess-6
BCD digits and B being represented with BCD
digits. The PPR tree can be viewed as adjacent
columns of h ODDS digits each, h being the
column height (see Fig. 4), and h < = d+1.Fig. 5
shows the high-level architecture of a column of
the PPR tree (the ith column) with h ODDS digits in
[0, 15]. (4 bits per digit). Each digit column of the
binary CSA tree (the gray colored box in Fig. 5)
reduces the h input digits and n cin input carry
bits, transferred from the previous
VI. FINAL CONVERSION TO BCD
The selected architecture is a 2d -digit hybrid
parallel prefix/ carry-select adder, the BCD
Quaternary Tree adder. The delay of this adder is
slightly higher to the delay of a binary adder of 8d
bits with a similar topology. The decimal carries are
computed using a carry prefix tree, while two
conditional BCD digit sums are computed out of
the critical path using 4-bit digit adders which
implements [Ai] + Bi+ 0 and [A] + Bi+1.These
conditional sums correspond to each one of the
carry input values. If the conditional carry out from
a digit isone, the digit adder performs a -6
subtraction. The selection of the appropriate
conditional BCD digit sums is implemented with a
final level of 2 : 1 multiplexers. To design the carry
prefix tree we analyzed the signal arrival profile
from the PPRT tree, and considered the use of
different prefix tree topologies to optimize the area
for the minimum delay adder.
VII. RESULTS AND CONCLUSION
We had verified this by writing the VHDL code ,
simulated and synthesized on FPG board. The
following results have been shown below in these
two examples we have given two different values
and seen the correct values. We have taken two 4
bit BCD number and performed multiplication.
Conclusion:
Finally we have observed that this product is
better than older BCD multipliers. We have
implemented with VHDL and simulated along with
synthesis on Sparton -3 FPGA board. We have
dumped into Xilinx Chip (XCV3S400E-6s). The
area has been minimized by 24% which shows the
decrease of power consumption by 32%.
REFERENCES
[1] Alvaro Vazquez, Member, IEEE, Elisardo Antelo, and
Javier D. Bruguera, Member, IEEE “Fast Radix-10
Multiplication Using Redundant BCD Codes “IEEE
7. 42 International Journal for Modern Trends in Science and Technology
K. Swamiji, N. Praveen Kumar : Implementation of High Speed Low Power 16 Bit BCD Multiplier Using Excess-3 Codes
TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 8,
AUGUST 2014
[2] A. Aswal, M. G. Perumal, and G. N. S. Prasanna, “On
basic finanial decimal operations on binary
machines,” IEEE Trans. Comput.,vol. 61, no. 8, pp.
1084–1096, Aug. 2012.
[3] M. F. Cowlishaw, E. M. Schwarz, R. M. Smith, and C.
F. Webb, “A decimal floating-point specification,” in
Proc. 15th IEEE Symp.Comput. Arithmetic, Jun.
2001, pp. 147–154.
[4] M. F. Cowlishaw, “Decimal floating-point: Algorism
for computers,” in Proc. 16th IEEE Symp. Comput.
Arithmetic, Jul. 2003,pp. 104–111.
[5] S. Carlough and E. Schwarz, “Power6 decimal
divide,” in Proc. 18th IEEE Symp. Appl.-Specific
Syst., Arch., Process., Jul. 2007, pp. 128–133.
[6] S. Carlough, S. Mueller, A. Collura, and M. Kroener,
“The IBM zEnterprise-196 decimal floating point
accelerator,” in Proc. 20th IEEE Symp. Comput.
Arithmetic, Jul. 2011, pp. 139–146.
[7] L. Dadda, “Multioperand parallel decimal adder: A
mixed binary and BCD approach,” IEEE Trans.
Comput., vol. 56, no. 10, pp. 1320–1328, Oct. 2007.
[8] L. Dadda and A. Nannarelli, “A variant of a Radix-10
combinational multiplier,” in Proc. IEEE Int. Symp.
Circuits Syst., May 2008, pp. 3370–3373.
[9] L. Eisen, J. W. Ward, H.-W. Tast, N. Mading, J.
Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M.
Schwarz, and S. R. Carlough, “IBM POWER6
accelerators: VMX and DFU,” IBM J. Res. Dev., vol.
51, no. 6, pp. 663–684, Nov. 2007.
[10]M. A. Erle and M. J. Schulte, “Decimal multiplication
via carry- save addition,” in Proc. IEEE Int. Conf
Appl.-Specific Syst., Arch., Process., Jun. 2003, pp.
348–358
[11]M. A. Erle, E. M. Schwarz, and M. J. Schulte,
“Decimal multiplication with efficient partial product
generation,” in Proc. 17th IEEE
[12]Faraday Tech. Corp. (2004). 90nm UMC L90
standard performance low-K library (RVT). [Online].
Available: http://freelibrary.faraday-tech.com/
[13]S. Gorgin and G. Jaberipur, “A fully redundant
decimal adder and its application in parallel decimal
multipliers,” Microelectron. J., vol. 40, no. 10, pp.
1471–1481, Oct. 2009.