This document describes a proposed modification to the carry select adder (CSLA) circuit to reduce its area and power consumption. Specifically, it replaces the ripple carry adder that assumes a carry input of 1 with a binary to excess-1 converter (BEC). The modification is analyzed through logic effort methodology and custom layout simulations in a 0.18 µm CMOS process. The results show that the modified CSLA design reduces area by 113 gates compared to the regular CSLA, with only an 11 gate delay increase. Overall, the proposed BEC-based modification achieves lower area and power for the CSLA.
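The BEC idea can be illustrated with a minimal Python sketch. It models one carry-select group behaviourally: the carry-in-0 result is computed once, and the carry-in-1 result is derived from it with an incrementer built from XOR and AND operations. The function names `bec` and `csla_group`, the 4-bit group width in the usage note, and the use of `a + b` as a stand-in for the ripple carry adder are assumptions made for illustration, not details taken from the document.

```python
def bec(value, width):
    """Binary to excess-1 converter: returns (value + 1) mod 2**width.

    Bit i of the output is bit i of the input XORed with the AND of all
    lower input bits (bit 0 is simply inverted).
    """
    out = 0
    all_lower_ones = 1  # AND of input bits below position i
    for i in range(width):
        bit = (value >> i) & 1
        out |= (bit ^ all_lower_ones) << i
        all_lower_ones &= bit
    return out


def csla_group(a, b, carry_in, width):
    """One carry-select group: returns (sum, carry_out) for width-bit a and b."""
    # Stand-in for the ripple carry adder with carry-in 0 (sum bits plus carry).
    result_cin0 = a + b
    # The BEC spans width + 1 bits so that the carry-out is covered as well.
    result_cin1 = bec(result_cin0, width + 1)
    # Multiplexer: the real carry-in selects one of the two candidates.
    selected = result_cin1 if carry_in else result_cin0
    return selected & ((1 << width) - 1), selected >> width
```

For example, `csla_group(9, 7, 1, 4)` selects the incremented result and returns `(1, 1)`, since 9 + 7 + 1 = 17.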
Dynamic Texture Coding using Modified Haar Wavelet with CUDA - IJERA Editor
A texture is an image containing repeated patterns. There are two types: static and dynamic. A static texture is an image whose patterns repeat in the spatial domain. A dynamic texture is a sequence of frames whose patterns repeat in both the spatial and temporal domains. This paper introduces a novel method for dynamic texture coding that achieves a higher compression ratio using a 2D modified Haar wavelet transform. Dynamic texture video contains highly redundant parts in the spatial and temporal domains, which can be removed to achieve high compression ratios with better visual quality. The modified Haar wavelet is used to exploit spatial and temporal correlations among the pixels. The YCbCr color model is used to exploit the chromatic components, since the human visual system (HVS) is less sensitive to chrominance. To decrease the execution time of the algorithm, it is parallelized using CUDA (Compute Unified Device Architecture). A GPU contains many more cores than a CPU, which is exploited to reduce the execution time of the algorithm.
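To make the wavelet step concrete, here is a minimal Python sketch of one level of the standard (unmodified) 2D Haar transform, using pairwise averages and differences. The abstract does not describe the paper's modified variant, so this shows only the textbook transform; the function names are hypothetical.

```python
def haar_1d(values):
    """One Haar level: pairwise averages followed by pairwise half-differences."""
    half = len(values) // 2
    averages = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    details = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return averages + details


def inverse_haar_1d(coeffs):
    """Invert haar_1d: each (average, detail) pair restores two samples."""
    half = len(coeffs) // 2
    out = []
    for s, d in zip(coeffs[:half], coeffs[half:]):
        out += [s + d, s - d]
    return out


def haar_2d(block):
    """One 2D Haar level: transform every row, then every column."""
    rows = [haar_1d(row) for row in block]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

A smooth region produces zero detail coefficients, which is the redundancy a coder can discard. Each row and each column is transformed independently of the others, which is the kind of data parallelism a GPU implementation can exploit.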
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f... - Editor IJMTER
This document describes a proposed low-power CORDIC-based DCT architecture that prioritizes processing of low-frequency DCT coefficients over high-frequency coefficients to reduce power consumption with minimal image quality degradation. It uses a look-ahead CORDIC approach to allow varying the number of CORDIC iterations for different coefficients. Experimental results show the proposed architecture achieves 38.1% area and power savings compared to DA-based DCT, with comparable power to MCM-based DCT but using 100% less area and a minor 0.04dB quality loss.
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Design and implementation a prototype system for fusion image by using SWT-PC... - IJECEIAES
Image fusion has been a dominant research area in recent years, with real-time applications in fields such as military systems and remote sensing, and it is an effective tool in digital image processing. Multi-sensor image fusion combines the relevant information from two or more images into a single image. FPGAs are among the most widely used implementation technologies; modern devices offer a very large number of logic elements, which allows complex algorithms to be implemented. In this paper, filters are designed and implemented on an FPGA for detecting specific diseases in CT/MRI scans of the human brain, and the fusion is performed using the Stationary Wavelet Transform and Principal Component Analysis (SWT-PCA). This technique increases the accuracy of the output image because it eliminates down-sampling, so blurring and artifacts do not affect the result. The SWT-PCA algorithm is evaluated using quality measures such as NCC, MSE, PSNR, coefficients, and eigenvalues. The significant advantages of this system are real-time operation, rapid time to market, and portability, in addition to continuous parametric change in the DWT transform. The proposed system was designed and simulated using MATLAB Simulink and Xilinx System Generator blocks, synthesized with the Xilinx Synthesis Tool (XST), and implemented on a Xilinx Spartan-6 SP605 device.
This document describes the design of a Parallel Self-Timed Adder (PASTA). PASTA is an asynchronous adder circuit that operates in parallel for bits that do not require carry propagation. It uses half adders and multiplexers in a regular structure with minimal interconnections. The document discusses the architecture and operation of PASTA through state diagrams. It also reviews related work on asynchronous and synchronous adder designs. Simulation results using MICROWIND and DSCH tools in 45nm CMOS technology verify the performance advantages of PASTA over existing asynchronous adders in terms of area, power, delay.
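The half-adder recursion behind this kind of self-timed addition can be sketched in a few lines of Python. This is a behavioural model under stated assumptions, not the circuit from the document: the function name `pasta_add` and the returned iteration count are illustrative, and Python's unbounded integers stand in for the bit-parallel hardware.

```python
def pasta_add(a, b):
    """Add two non-negative integers by iterating half adders on every bit.

    Returns (sum, iterations). The loop ends when no carries remain,
    which models the completion signal of a self-timed adder.
    """
    sum_bits = a ^ b              # half-adder sum for every bit position
    carries = (a & b) << 1        # half-adder carry, moved to the next bit
    iterations = 0
    while carries:
        # Both new values are computed from the old ones, as in parallel hardware.
        sum_bits, carries = sum_bits ^ carries, (sum_bits & carries) << 1
        iterations += 1
    return sum_bits, iterations
```

When no bit position generates a carry, as in `pasta_add(8, 4)`, the result is available with zero extra iterations; `pasta_add(5, 3)` needs three iterations because the carry has to travel through three positions.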
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
International Refereed Journal of Engineering and Science (IRJES) - irjes
International Refereed Journal of Engineering and Science (IRJES) is a leading international journal for the publication of new ideas, state-of-the-art research results, and fundamental advances in all aspects of engineering and science. IRJES is an open access, peer-reviewed international journal whose primary objective is to provide the academic community and industry with a venue for the submission of original research and applications.
The document provides an overview of the Dynamic Reconfigurability in Embedded System Design (DRESD) project. It describes the MicroLAB organization involved in DRESD, various motivations for reconfiguration, definitions of reconfiguration, examples of reconfiguration in everyday life, new frontiers for reconfigurable computing, the DRESD philosophy, involvement of DRESD at Politecnico di Milano and internationally, main DRESD projects, and opportunities to get involved.
International Journal of Engineering and Science Invention (IJESI) - inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science, and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
Design and Implementation of Multiplier using Advanced Booth Multiplier and R... - IRJET Journal
This document describes a proposed design for a high-speed multiplier that uses a modified Booth algorithm and adaptive hold logic (AHL) along with razor flip-flops to mitigate performance degradation due to aging effects like NBTI and PBTI. The proposed design uses a radix-4 Booth multiplier in place of traditional row/column bypass multipliers to increase throughput. It also includes an AHL circuit that can dynamically determine if an operation requires one or two cycles to complete and can adjust this determination over time to account for aging. Simulation results showed that the proposed design achieved higher performance than fixed-latency multipliers and was more resistant to aging effects.
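As a point of reference for the radix-4 Booth multiplier mentioned above, the following Python sketch shows textbook radix-4 Booth recoding: the multiplier is scanned in overlapping 3-bit windows, each producing a digit in {-2, -1, 0, 1, 2}, so only half as many partial products are needed. The function names and the even `width` requirement are assumptions for illustration; the adaptive hold logic and razor flip-flops of the proposed design are not modelled.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    width must be even. The least significant digit comes first.
    """
    bits = multiplier & ((1 << width) - 1)   # two's-complement bit pattern
    digits = []
    previous = 0                             # implicit bit to the right of bit 0
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + previous)
        previous = b1
    return digits


def booth_multiply(multiplicand, multiplier, width):
    """Multiply by summing one partial product per radix-4 digit."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, width)):
        product += digit * multiplicand * (4 ** i)
    return product
```

For instance, the 4-bit multiplier 7 recodes to the digits `[-1, 2]`, that is -1 + 2 * 4 = 7, so two partial products replace four.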
IRJET- Energy Efficient One Bit Subtractor Circuits for Computing Application... - IRJET Journal
The document describes a proposed 10-transistor one-bit full subtractor circuit for computing applications. It begins with an abstract discussing the need for low-power VLSI circuits as technology scales down. It then reviews prior work on subtractor circuit designs and leakage power reduction techniques. The document goes on to propose a new 10-transistor full subtractor design and compares it to 20-transistor and 14-transistor designs in terms of area, delay, power consumption and energy efficiency. Simulation results indicate the proposed 10-transistor design is more energy efficient.
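The logic function that any one-bit full subtractor must realise, whatever its transistor count, can be written down directly. The sketch below gives the standard Boolean equations in Python; the function name is hypothetical, and nothing here reflects the specific 10-transistor topology, which the summary does not detail.

```python
def full_subtractor(a, b, borrow_in):
    """One-bit full subtractor: computes a - b - borrow_in.

    Returns (difference, borrow_out), both 0 or 1.
    """
    difference = a ^ b ^ borrow_in
    # A borrow is needed when b exceeds a, or when a equals b and a borrow arrives.
    borrow_out = ((a ^ 1) & b) | (((a ^ b) ^ 1) & borrow_in)
    return difference, borrow_out
```

The two outputs satisfy `a - b - borrow_in == difference - 2 * borrow_out` for all eight input combinations.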
Energy efficiency is one of the most critical issues in the design of a System on Chip. In a
Network on Chip (NoC) based system, energy consumption is influenced dramatically by the
mapping of Intellectual Property (IP) cores, which affects the performance of the system. In this
paper we test previously proposed algorithms and introduce a new energy-efficient algorithm
for 3D NoC architectures. In addition, a hybrid method has been implemented using a
bio-inspired optimization technique (particle swarm optimization). The proposed algorithm has
been implemented and evaluated on randomly generated benchmarks and real-life applications
such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and compared with the existing spiral and crinkle algorithms, showing a greater reduction in
communication energy consumption and an improvement in the performance of the system.
Experimental results show that the average reduction in communication energy consumption is
19% relative to spiral and 17% relative to crinkle, the reduction in communication cost is 24%
and 21%, and the reduction in latency is 24% and 22%, respectively. When our work and the
existing methods are optimized using the bio-inspired technique and compared, the average
energy reduction is found to be 18% and 24%.
D-Flip Flop Layout: Efficient in Terms of Area and Power - IJEEE
The flip-flop is the basic element of synchronous sequential circuits. This paper discusses a D flip-flop that has been made area and power efficient with the aid of the software tools DSCH 3.1 and Microwind 3.1. The D flip-flop is implemented using NAND gates. Layouts of the D flip-flop produced by auto-generated and semi-custom design are compared and analysed, and the results show a 57% improvement in area and approximately a 2% reduction in power. CMOS 90nm technology has been used, and efforts are made to reduce area and power.
Layout Design Analysis of SR Flip Flop using CMOS Technology - IJEEE
This paper presents an area, delay and power efficient design of an SR flip-flop. As chip manufacturing technology is on the threshold of an evolution that shrinks chips in size and enhances their performance, the flip-flop is implemented here at the layout level, producing an optimized design using recent CMOS layout tools. The proposed SR flip-flop has been designed and simulated using 45nm technology, after which a parametric analysis has been carried out. The flip-flop has been developed using both a fully automatic design flow and a semi-custom design flow. The performance of the SR flip-flop layouts from the different design flows has been analyzed and compared in terms of area, delay and power consumption. The simulation results show that the semi-custom design flow reduces the area occupied by 46.9% and the power consumption by 38.4%.
High performance parallel prefix adders with fast carry chain logic - iaemedu
This document discusses and compares different types of parallel prefix adders. It begins by introducing binary adders and describing the three stages of prefix addition: pre-computation, prefix-computation, and post-computation. It then describes various parallel prefix adders like Brent-Kung, Kogge-Stone, Ladner-Fischer, and Han-Carlson adders. For FPGA implementation, simple adders typically perform better than parallel prefix adders due to fast carry chains. The document proposes modifying the Kogge-Stone adder using fast carry logic to make it more suitable for FPGAs. Simulation results show that for higher bit widths, the modified Kogge-Stone adder provides better delay than a simple ad
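The three stages of prefix addition named above can be sketched for the Kogge-Stone case as follows. This is a bit-parallel behavioural model in Python with no carry-in, written for illustration; the function name is hypothetical, and the fast-carry-chain modification proposed in the document is not modelled.

```python
def kogge_stone_add(a, b, width):
    """Add two width-bit unsigned integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Pre-computation: per-bit propagate and generate signals.
    propagate = a ^ b
    generate = a & b

    # Prefix computation: the span of each group doubles at every level.
    group_g, group_p = generate, propagate
    distance = 1
    while distance < width:
        # group_g uses the previous level's group_p, so it is updated first.
        group_g = (group_g | (group_p & (group_g << distance))) & mask
        group_p = group_p & (group_p << distance)
        distance *= 2

    # Post-computation: the carry into bit i is the group generate of bits i-1..0.
    carries = (group_g << 1) & mask
    total = propagate ^ carries
    carry_out = (group_g >> (width - 1)) & 1
    return total, carry_out
```

The loop runs about log2(width) times, which is why the prefix network has logarithmic depth, in contrast with the linear carry chain of a ripple carry adder.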
Design of 32 bit Parallel Prefix Adders - IOSR Journals
In this paper, we propose 32-bit Kogge-Stone, Brent-Kung, and Ladner-Fischer parallel prefix adders. In
the past, general N-bit adders such as Ripple Carry Adders (slow compared to other adders) and Carry Look Ahead
adders (area-consuming) were used. Most of the industry now uses parallel prefix
adders because of their advantages over other adders: they are faster and more area
efficient. Parallel prefix addition is a technique for increasing the speed of addition in DSP
processors. We simulate and synthesize different types of 32-bit prefix adders using the Xilinx ISE 10.1i tool. From
the synthesis results, we note performance parameters such as the number of LUTs and the delay. We compare these three adders in terms of LUTs (representing area) and delay values.
Fpga based efficient multiplier for image processing applications using recur... - VLSICS Design
Digital image processing applications such as medical imaging, satellite imaging, and biometric trait
images rely on multipliers to improve the quality of the image. However, existing multiplication techniques
introduce errors in the output and consume more time, hence error-free high-speed multipliers have
to be designed. In this paper we propose an FPGA-based Recursive Error Free Mitchell Log Multiplier
(REFMLM) for image filters. The 2x2 error-free Mitchell log multiplier is designed with zero error by
introducing an error correction term, and it is used in higher-order Karatsuba-Ofman Multiplier (KOM)
architectures. The higher-order KOM multipliers are decomposed into a number of lower-order multipliers
using radix 2, down to a basic 2x2 multiplier block, which is implemented by the error-free Mitchell log multiplier.
The 8x8 REFMLM is tested with a Gaussian filter to remove noise from a fingerprint image. The multiplier is
synthesized using the Spartan 3 FPGA family device XC3S1500-5fg320. It is observed that the performance
parameters such as area utilization, speed, error and PSNR are better for the proposed architecture
than for existing architectures.
Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are
well-suited for VLSI implementations. In this paper, a novel framework is introduced, which allows
the design of parallel-prefix Ling adders. The proposed approach saves one logic level of
implementation compared to the parallel-prefix structures proposed for the traditional definition of
carry-lookahead equations and reduces the fan-out requirements of the design. Experimental results
reveal that the proposed adders achieve delay reductions of up to 14 percent when compared to the
fastest parallel-prefix architectures presented for the traditional definition of carry equations.
A High Performance Modified SPIHT for Scalable Image Compression - CSCJournals
In this paper, we present a novel extension technique to the Set Partitioning in Hierarchical Trees (SPIHT) based image compression with spatial scalability. The present modification and the preprocessing techniques provide significantly better quality (both subjectively and objectively) reconstruction at the decoder with little additional computational complexity. There are two proposals for this paper. Firstly, we propose a pre-processing scheme, called Zero-Shifting, that brings the spatial values in signed integer range without changing the dynamic ranges, so that the transformed coefficient calculation becomes more consistent. For that reason, we have to modify the initialization step of the SPIHT algorithms. The experiments demonstrate a significant improvement in visual quality and faster encoding and decoding than the original one. Secondly, we incorporate the idea to facilitate resolution scalable decoding (not incorporated in original SPIHT) by rearranging the order of the encoded output bit stream. During the sorting pass of the SPIHT algorithm, we model the transformed coefficient based on the probability of significance, at a fixed threshold of the offspring. Calling it a fixed context model and generating a Huffman code for each context, we achieve comparable compression efficiency to that of arithmetic coder, but with much less computational complexity and processing time. As far as objective quality assessment of the reconstructed image is concerned, we have compared our results with popular Peak Signal to Noise Ratio (PSNR) and with Structural Similarity Index (SSIM). Both these metrics show that our proposed work is an improvement over the original one.
Dg source allocation by fuzzy and sa in distribution system for loss reductio... - Alexander Decker
This document presents a new methodology using fuzzy logic and simulated annealing (SA) for optimally placing distributed generators (DGs) in distribution systems. The goal is to reduce real power losses and improve voltage profiles. A two-stage approach is used: (1) Fuzzy logic determines optimal DG locations based on minimizing losses and maintaining voltages. (2) SA determines the optimal DG sizes at these locations to maximize loss reduction. The method is tested on the IEEE 33-bus system and outperforms other approaches in solution quality and computation efficiency.
International Journal of Computational Engineering Research (IJCER) - ijceronline
This document describes a system for implementing an artificial neuron using an FPGA. The system first converts analog signals from electrochemical sensors to digital signals using a 12-bit analog-to-digital converter (ADC). It then implements the mathematical operations of a neuron in digital logic on the FPGA, including multiplication, accumulation, and an activation function. Simulation and chipscope results are presented which verify the design and operation of the artificial neuron on the FPGA board. The system provides a modular design that could be expanded to create a complete artificial neural network for processing electrochemical sensor data.
Comparative analysis of multi stage cordic using micro rotation technique - IAEME Publication
This document presents a comparative analysis of multi-stage CORDIC algorithms using a micro-rotation technique. It proposes a novel pipelined CORDIC architecture for efficient hardware implementation of trigonometric functions. The architecture reduces critical path delay by dividing the CORDIC rotations into multiple pipeline stages. It also suggests a generalized micro-rotation selection technique to reduce the number of iterations needed. Simulation results on a Xilinx FPGA show the proposed design has 51.9% lower slice-delay product and consumes 0.192 watts of power, demonstrating improved efficiency over other CORDIC implementations.
Iaetsd finger print recognition by cordic algorithm and pipelined fft - Iaetsd Iaetsd
This document proposes an efficient CORDIC pipelined FFT algorithm for fingerprint recognition on FPGAs. The CORDIC algorithm uses only shift and add operations, making it suitable for replacing multipliers in the butterfly operations of an FFT. This reduces computational complexity. The proposed system takes a fingerprint, processes it with the CORDIC pipelined FFT, extracts features which are stored and then matched against a test fingerprint for recognition. The algorithm aims to provide an efficient hardware implementation of FFT and fingerprint recognition using minimal computations.
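To show why CORDIC needs only shifts and additions, here is a minimal fixed-point sketch of the rotation mode in Python. It computes sine and cosine, the quantities an FFT butterfly needs for its twiddle factors. The word length (`FRAC_BITS`), the iteration count, and all names are assumptions chosen for illustration; the pipelining described in the document is not modelled.

```python
import math

FRAC_BITS = 16                    # fixed-point fractional bits (assumed)
ITERATIONS = 16                   # number of micro-rotations (assumed)
ONE = 1 << FRAC_BITS

# Precomputed table of atan(2**-i) in fixed point; in hardware this is a small ROM.
ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation scales the vector; start from 1/gain to compensate.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
X_START = round(ONE / GAIN)


def cordic_sin_cos(angle_rad):
    """Return (sin, cos) of an angle in roughly [-1.74, 1.74] radians."""
    x, y, z = X_START, 0, round(angle_rad * ONE)
    for i in range(ITERATIONS):
        # Rotate towards z == 0 using only shifts, additions and subtractions.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return y / ONE, x / ONE
```

The floating-point calls appear only when building the constant table and when converting the result for display; the loop itself contains no multiplication. Angles outside the stated range would need a quadrant reduction step first.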
Qualitative Analysis of Optical Interleave Division Multiple Access using Spe... - IRJET Journal
This document analyzes the effect of seed length on the performance of an optical interleave division multiple access (IDMA) system using prime inter-leavers. It compares the bit error rate (BER) performance of prime inter-leavers with different seed lengths ranging from 2 to 13 for a fixed number of users and data length. The simulation results show that increasing the seed length from 2 to the maximum single digit prime number 7 decreases the BER significantly, with the optimal BER achieved for a seed length of 7. However, BER increases again for seed lengths in double digits. Similarly for a longer data length, the optimal BER is obtained for a seed length of 7. In conclusion, using prime inter
IJCER (www.ijceronline.com) International Journal of computational Engineerin... - ijceronline
The document discusses video compression using the Set Partitioning in Hierarchical Trees (SPIHT) algorithm and neural networks. It presents the principles of SPIHT coding and the backpropagation algorithm for neural networks. Various neural network training algorithms are tested for compressing video frames, including gradient descent with momentum and adaptive learning. The results show the compressed frames with different algorithms, and gradient descent with momentum and adaptive learning achieved the best compression ratio of 1.1737089:1 while maintaining image clarity.
“FIELD PROGRAMMABLE DSP ARRAYS” - A NOVEL RECONFIGURABLE ARCHITECTURE FOR EFF... - sipij
Digital Signal Processing functions are widely used in real-time high-speed applications. These functions
are generally implemented either on ASICs, which are inflexible, or on FPGAs, which suffer from a relatively
low utilization factor or lower speed compared to ASICs. The proposed reconfigurable DSP processor
resembles an FPGA, but uses basic fixed Common Modules (CMs) (such as adders, subtractors, multipliers,
scaling units, and shifters) instead of CLBs. This paper introduces the development of a reconfigurable DSP
processor that integrates different filter and transform functions. Switching between DSP functions is
achieved by reconfiguring the interconnection between CMs. The proposed reconfigurable
architecture has been validated on a Virtex5 FPGA. The architecture provides a sufficient amount of flexibility,
parallelism and scalability.
This document summarizes the delay and area analysis of a regular 16-bit square root carry select adder (SQRT CSLA) architecture and a proposed modified architecture. The regular design contains five groups of ripple carry adders of different sizes. The delay and area of each group is evaluated based on the delays of basic blocks like full adders and multiplexers. The proposed design aims to reduce area and power by replacing one ripple carry adder in each group with a binary to excess-1 converter, which requires fewer logic gates. Implementation results show the proposed design has lower area and power with a slight increase in delay compared to the regular SQRT CSLA architecture.
FPGA IMPLEMENTATION OF PRIORITY-ARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS - IAEME Publication
An efficient Priority-Arbiter based Router is designed, along with 2X2 and 3X3 mesh
topology based NOC architectures. The Priority-Arbiter based Router
design includes input registers, a priority arbiter, and the XY routing algorithm. The
Priority-Arbiter based Router and the NOC 2X2 and 3X3 Router designs are synthesized
and implemented using the Xilinx ISE tool and simulated using Modelsim 6.5f. The
implementation is done on an Artix-7 FPGA device, and the NOC 2X2 Router design is
physically debugged and verified using the Chipscope Pro tool. The performance results
are analyzed in terms of area (slices, LUTs), timing period, and maximum
operating frequency. The Priority-Arbiter based Router is compared with
previous similar architectures and shows improvements.
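The XY routing algorithm mentioned above is simple enough to sketch directly: a packet first travels along the X dimension until its column matches the destination, then along Y. The Python sketch below is illustrative only; the port names, the coordinate orientation (x grows to the east, y to the north), and the function names are assumptions, and the priority arbiter and input registers are not modelled.

```python
def xy_next_port(current, destination):
    """Pick the output port at one router using dimension-ordered XY routing."""
    cx, cy = current
    dx, dy = destination
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"   # the packet has arrived; deliver it to the attached core


def route(source, destination):
    """List the routers visited after the source on the way to the destination."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path = []
    current = source
    port = xy_next_port(current, destination)
    while port != "LOCAL":
        current = (current[0] + step[port][0], current[1] + step[port][1])
        path.append(current)
        port = xy_next_port(current, destination)
    return path
```

On a 3X3 mesh, `route((0, 0), (2, 1))` visits `(1, 0)`, `(2, 0)` and then `(2, 1)`. Because every packet finishes its X movement before turning into Y, the turns that could close a cycle never occur, which is the usual argument for XY routing being deadlock-free on a mesh.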
Implementation of Area Effective Carry Select Adders - Kumar Goud
Abstract: In integrated circuit design, area occupancy plays a vital role because of the increasing need for portable systems. The Carry Select Adder (CSLA) is a fast adder used in data processing processors for performing fast arithmetic functions. The structure of the CSLA leaves scope for reducing its area through an efficient gate-level modification. In this paper, 16-bit, 32-bit, 64-bit and 128-bit Regular Linear CSLA, Modified Linear CSLA, Regular Square-root CSLA (SQRT CSLA) and Modified SQRT CSLA architectures have been developed and compared. The Regular CSLA is area-consuming due to its dual Ripple-Carry Adder (RCA) structure. To reduce area, the CSLA can be implemented using a single RCA and an add-one circuit instead of dual RCAs. Comparing the Regular Linear CSLA with the Regular SQRT CSLA, the Regular SQRT CSLA has reduced area; likewise, comparing the Modified Linear CSLA with the Modified SQRT CSLA, the Modified SQRT CSLA has reduced area. The results and analysis show that the Modified Linear CSLA and Modified SQRT CSLA provide better outcomes than the Regular Linear CSLA and Regular SQRT CSLA respectively. This project was aimed at implementing a high-performance optimized FPGA architecture. Modelsim 10.0c is used for simulating the CSLA, and the design is synthesized using Xilinx PlanAhead 13.4. The implementation is then done on a Virtex5 FPGA kit.
Keywords: Field Programmable Gate Array (FPGA), efficient, Carry Select Adder (CSLA), Square-root CSLA (SQRT CSLA).
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Design and Implementation of Multiplier using Advanced Booth Multiplier and R...IRJET Journal
This document describes a proposed design for a high-speed multiplier that uses a modified Booth algorithm and adaptive hold logic (AHL) along with razor flip-flops to mitigate performance degradation due to aging effects like NBTI and PBTI. The proposed design uses a radix-4 Booth multiplier in place of traditional row/column bypass multipliers to increase throughput. It also includes an AHL circuit that can dynamically determine if an operation requires one or two cycles to complete and can adjust this determination over time to account for aging. Simulation results showed that the proposed design achieved higher performance than fixed-latency multipliers and was more resistant to aging effects.
IRJET- Energy Efficient One Bit Subtractor Circuits for Computing Application...IRJET Journal
The document describes a proposed 10-transistor one-bit full subtractor circuit for computing applications. It begins with an abstract discussing the need for low-power VLSI circuits as technology scales down. It then reviews prior work on subtractor circuit designs and leakage power reduction techniques. The document goes on to propose a new 10-transistor full subtractor design and compares it to 20-transistor and 14-transistor designs in terms of area, delay, power consumption and energy efficiency. Simulation results indicate the proposed 10-transistor design is more energy efficient.
Energy efficiency is one of the most critical issues in the design of a System on Chip. In a Network on
Chip (NoC) based system, energy consumption is influenced dramatically by the mapping of
Intellectual Property (IP), which affects the performance of the system. In this paper we test
previously proposed algorithms and introduce a new energy-efficient algorithm
for 3D NoC architecture. In addition, a hybrid method has also been implemented using a
bio-inspired optimization (particle swarm optimization) technique. The proposed algorithm has
been implemented and evaluated on randomly generated benchmarks and real-life applications
such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and compared with the existing algorithms (spiral and crinkle), and it has shown a better
reduction in communication energy consumption and an improvement in the
performance of the system. Compared with spiral and crinkle, experimental results
show that the average reduction in communication energy consumption is 19% against spiral and
17% against crinkle mapping algorithms, the reduction in communication cost is 24% and 21%,
and the reduction in latency is 24% and 22%, respectively. When our work
and the existing methods are optimized using the bio-inspired technique and compared with one another,
an average energy reduction of 18% and 24% is found.
D-Flip Flop Layout: Efficient in Terms of Area and Power IJEEE
The flip-flop is the most basic element of synchronous sequential circuits. This paper discusses a D flip-flop that has been made area and power efficient with the aid of the software tools DSCH 3.1 and Microwind 3.1. The D flip-flop is implemented with NAND gates. Layouts of the DFF designed through the auto-generated and semi-custom flows are compared and analysed, and the results show a 57% improvement in area and approximately a 2% reduction in power. CMOS 90nm technology has been used, and efforts are made to reduce area and power.
Layout Design Analysis of SR Flip Flop using CMOS TechnologyIJEEE
This paper presents an area, delay and power efficient design of SR flip flop. As the chip manufacturing technology is on the threshold of evaluation, which shrinks a chip in size and enhances its performance, here the flip flop is implemented in a layout level which develops an optimized design using recent CMOS layout tools. The proposed SR flip flop has been designed and simulated using 45nm technology. After that, parametric analysis has been done. In this paper, flip flop has been developed using full automatic design flow and semi-custom design flow. The performance of SR flip flop layouts using different design flows has been analyzed and compared in terms of area, delay and power consumption. The simulation results show that the design of SR flip flop using semi-custom design flow improved the area occupied by 46.9% and power consumption is reduced by 38.4%.
High performance parallel prefix adders with fast carry chain logiciaemedu
This document discusses and compares different types of parallel prefix adders. It begins by introducing binary adders and describing the three stages of prefix addition: pre-computation, prefix-computation, and post-computation. It then describes various parallel prefix adders like Brent-Kung, Kogge-Stone, Ladner-Fischer, and Han-Carlson adders. For FPGA implementation, simple adders typically perform better than parallel prefix adders due to fast carry chains. The document proposes modifying the Kogge-Stone adder using fast carry logic to make it more suitable for FPGAs. Simulation results show that for higher bit widths, the modified Kogge-Stone adder provides better delay than a simple ad
Design of 32 bit Parallel Prefix Adders IOSR Journals
In this paper, we propose 32-bit Kogge-Stone, Brent-Kung, and Ladner-Fischer parallel prefix adders. In
general, N-bit adders such as Ripple Carry Adders (slow compared to other adders) and Carry Look Ahead
adders (area-consuming) were used in earlier days. Now most industries use parallel prefix
adders because of their advantages over other adders. Parallel prefix adders are faster and area
efficient. The parallel prefix adder is a technique for increasing the speed of addition in DSP
processors. We simulate and synthesize different types of 32-bit prefix adders using the Xilinx ISE 10.1i tool. Using
these synthesis results, we note performance parameters such as the number of LUTs and delay. We compare these three adders in terms of LUTs (representing area) and delay values.
Fpga based efficient multiplier for image processing applications using recur...VLSICS Design
Digital image processing applications such as medical imaging, satellite imaging, and biometric trait images
rely on multipliers to improve the quality of the image. However, existing multiplication techniques
introduce errors in the output and consume more time; hence error-free high-speed multipliers have
to be designed. In this paper we propose an FPGA-based Recursive Error Free Mitchell Log Multiplier
(REFMLM) for image filters. The 2x2 error-free Mitchell log multiplier is designed with zero error by
introducing an error correction term and is used in higher-order Karatsuba-Ofman Multiplier (KOM)
architectures. The higher-order KOM multipliers are decomposed into a number of lower-order multipliers
using radix 2 down to the basic multiplier block of order 2x2, which is designed as an error-free Mitchell log multiplier.
The 8x8 REFMLM is tested with a Gaussian filter to remove noise in a fingerprint image. The multiplier is
synthesized using the Spartan 3 FPGA family device XC3S1500-5fg320. It is observed that the performance
parameters such as area utilization, speed, error and PSNR are better in the case of the proposed architecture
compared to existing architectures.
Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are
well-suited for VLSI implementations. In this paper, a novel framework is introduced, which allows
the design of parallel-prefix Ling adders. The proposed approach saves one-logic level of
implementation compared to the parallel-prefix structures proposed for the traditional definition of
carry look ahead equations and reduces the fan out requirements of the design. Experimental results
reveal that the proposed adders achieve delay reductions of up to 14 percent when compared to the
fastest parallel-prefix architectures presented for the traditional definition of carry equations
A High Performance Modified SPIHT for Scalable Image CompressionCSCJournals
In this paper, we present a novel extension technique to the Set Partitioning in Hierarchical Trees (SPIHT) based image compression with spatial scalability. The present modification and the preprocessing techniques provide significantly better quality (both subjectively and objectively) reconstruction at the decoder with little additional computational complexity. There are two proposals for this paper. Firstly, we propose a pre-processing scheme, called Zero-Shifting, that brings the spatial values in signed integer range without changing the dynamic ranges, so that the transformed coefficient calculation becomes more consistent. For that reason, we have to modify the initialization step of the SPIHT algorithms. The experiments demonstrate a significant improvement in visual quality and faster encoding and decoding than the original one. Secondly, we incorporate the idea to facilitate resolution scalable decoding (not incorporated in original SPIHT) by rearranging the order of the encoded output bit stream. During the sorting pass of the SPIHT algorithm, we model the transformed coefficient based on the probability of significance, at a fixed threshold of the offspring. Calling it a fixed context model and generating a Huffman code for each context, we achieve comparable compression efficiency to that of arithmetic coder, but with much less computational complexity and processing time. As far as objective quality assessment of the reconstructed image is concerned, we have compared our results with popular Peak Signal to Noise Ratio (PSNR) and with Structural Similarity Index (SSIM). Both these metrics show that our proposed work is an improvement over the original one.
Dg source allocation by fuzzy and sa in distribution system for loss reductio...Alexander Decker
This document presents a new methodology using fuzzy logic and simulated annealing (SA) for optimally placing distributed generators (DGs) in distribution systems. The goal is to reduce real power losses and improve voltage profiles. A two-stage approach is used: (1) Fuzzy logic determines optimal DG locations based on minimizing losses and maintaining voltages. (2) SA determines the optimal DG sizes at these locations to maximize loss reduction. The method is tested on the IEEE 33-bus system and outperforms other approaches in solution quality and computation efficiency.
International Journal of Computational Engineering Research (IJCER) ijceronline
This document describes a system for implementing an artificial neuron using an FPGA. The system first converts analog signals from electrochemical sensors to digital signals using a 12-bit analog-to-digital converter (ADC). It then implements the mathematical operations of a neuron in digital logic on the FPGA, including multiplication, accumulation, and an activation function. Simulation and chipscope results are presented which verify the design and operation of the artificial neuron on the FPGA board. The system provides a modular design that could be expanded to create a complete artificial neural network for processing electrochemical sensor data.
Comparative analysis of multi stage cordic using micro rotation techniqueIAEME Publication
This document presents a comparative analysis of multi-stage CORDIC algorithms using a micro-rotation technique. It proposes a novel pipelined CORDIC architecture for efficient hardware implementation of trigonometric functions. The architecture reduces critical path delay by dividing the CORDIC rotations into multiple pipeline stages. It also suggests a generalized micro-rotation selection technique to reduce the number of iterations needed. Simulation results on a Xilinx FPGA show the proposed design has 51.9% lower slice-delay product and consumes 0.192 watts of power, demonstrating improved efficiency over other CORDIC implementations.
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd Iaetsd
This document proposes an efficient CORDIC pipelined FFT algorithm for fingerprint recognition on FPGAs. The CORDIC algorithm uses only shift and add operations, making it suitable for replacing multipliers in the butterfly operations of an FFT. This reduces computational complexity. The proposed system takes a fingerprint, processes it with the CORDIC pipelined FFT, extracts features which are stored and then matched against a test fingerprint for recognition. The algorithm aims to provide an efficient hardware implementation of FFT and fingerprint recognition using minimal computations.
Qualitative Analysis of Optical Interleave Division Multiple Access using Spe...IRJET Journal
This document analyzes the effect of seed length on the performance of an optical interleave division multiple access (IDMA) system using prime inter-leavers. It compares the bit error rate (BER) performance of prime inter-leavers with different seed lengths ranging from 2 to 13 for a fixed number of users and data length. The simulation results show that increasing the seed length from 2 to the maximum single digit prime number 7 decreases the BER significantly, with the optimal BER achieved for a seed length of 7. However, BER increases again for seed lengths in double digits. Similarly for a longer data length, the optimal BER is obtained for a seed length of 7. In conclusion, using prime inter
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
The document discusses video compression using the Set Partitioning in Hierarchical Trees (SPIHT) algorithm and neural networks. It presents the principles of SPIHT coding and the backpropagation algorithm for neural networks. Various neural network training algorithms are tested for compressing video frames, including gradient descent with momentum and adaptive learning. The results show the compressed frames with different algorithms, and gradient descent with momentum and adaptive learning achieved the best compression ratio of 1.1737089:1 while maintaining image clarity.
“FIELD PROGRAMMABLE DSP ARRAYS” - A NOVEL RECONFIGURABLE ARCHITECTURE FOR EFF...sipij
Digital Signal Processing functions are widely used in real-time high-speed applications. Those functions
are generally implemented either on ASICs, which are inflexible, or on FPGAs, which suffer from a relatively
smaller utilization factor or lower speed compared to ASICs. The proposed reconfigurable DSP processor is
reminiscent of an FPGA, but with basic fixed Common Modules (CMs) (such as adders, subtractors, multipliers,
scaling units, and shifters) instead of CLBs. This paper introduces the development of a reconfigurable DSP
processor that integrates different filter and transform functions. Switching between DSP functions is
achieved by reconfiguring the interconnection between CMs. The proposed reconfigurable
architecture has been validated on a Virtex5 FPGA. The architecture provides a sufficient amount of flexibility,
parallelism and scalability.
This document summarizes the delay and area analysis of a regular 16-bit square root carry select adder (SQRT CSLA) architecture and a proposed modified architecture. The regular design contains five groups of ripple carry adders of different sizes. The delay and area of each group is evaluated based on the delays of basic blocks like full adders and multiplexers. The proposed design aims to reduce area and power by replacing one ripple carry adder in each group with a binary to excess-1 converter, which requires fewer logic gates. Implementation results show the proposed design has lower area and power with a slight increase in delay compared to the regular SQRT CSLA architecture.
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMSIAEME Publication
An efficient Priority-Arbiter based Router is designed, along with NOC architectures
based on 2X2 and 3X3 mesh topologies. The Priority-Arbiter based Router
design includes input registers, a priority arbiter, and the XY routing algorithm. The
Priority-Arbiter based Router and the NOC 2X2 and 3X3 Router designs are synthesized
and implemented using the Xilinx ISE tool and simulated using Modelsim 6.5f. The
designs are implemented on an Artix-7 FPGA device, and the NOC 2X2 Router design is
physically debugged and verified using the ChipScope Pro tool. The performance results
are analyzed in terms of area (slices, LUTs), timing period, and maximum
operating frequency. The Priority-Arbiter based Router is compared
with previous similar architectures and shows improvements.
Design of Multiplier Less 32 Tap FIR Filter using VHDLIJMER
This paper presents the principles of Distributed Arithmetic and introduces them into FIR
filter design, and then presents a 32-tap FIR low-pass filter using Distributed Arithmetic, which saves
considerable MAC blocks to decrease the circuit scale; a pipeline structure is also used to increase the
system speed. Implementing FIR filters on an FPGA with the traditional method costs considerable
hardware resources, which works against decreasing the circuit scale and increasing the system speed.
It is well known that the FIR filter consists of delay elements, multipliers and adders. The
use of multipliers in early designs gives rise to two demerits:
(i) Increase in area and
(ii) Increase in delay, which ultimately results in low performance (less speed).
So Distributed Arithmetic for FIR filter design and implementation is provided in this work to solve
this problem. The Distributed Arithmetic structure is used to improve resource usage and a pipeline
structure is used to increase the system speed. Distributed Arithmetic can save considerable hardware
resources by using LUTs in place of MAC units.
Top Read Articles---January 2024--VLSICSVLSICS Design
International Journal of VLSI design & Communication Systems (VLSICS) is a bimonthly open-access peer-reviewed journal that publishes articles which contribute new results in all areas of VLSI Design & Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & communication concepts and establishing new collaborations in these areas.
Authors are solicited to contribute to this journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the VLSI design & Communications.
Regular clocking scheme based design of cost-efficient comparator in QCAnooriasukmaningtyas
Quantum-dot cellular automata (QCA) have gained notable attention in emerging nanotechnology as a way to overcome the power consumption, density, nano-scale design, and performance limitations of present CMOS technology. Many designs have been proposed in QCA for arithmetic circuits such as the adder, divider, parity checker and comparator. Most of these designs face challenges of cost efficiency, power dissipation, device density, etc. However, design automation, the underlying clocking layout and the integration of sub-modules are the most important considerations, as they have a direct impact on the fabrication of the design. This work proposes a novel cost-effective and power-aware comparator design, which is an essential segment of the central processing unit (CPU). The noticeable novelty of the design is the use of an underlying regular clocking scheme. A new scalable, regular clocking scheme has been utilized in the coplanar design of the comparator, which enables a regular or uniform cell layout of the QCA circuit. It also exhibits significant improvement over existing counterparts with irregular clocking in terms of area and latency. QCADesigner was used to test and verify the functionality of the circuit, and the power dissipation has been analyzed using QCAPro.
Many intellectual property (IP) modules are present in contemporary system on chips (SoCs). This could provide an issue with interconnection among different IP modules, which would limit the system's ability to scale. Traditional bus-based SoC architectures have a connectivity bottleneck, and network on chip (NoC) has evolved as an embedded switching network to address this issue. The interconnections between various cores or IP modules on a chip have a significant impact on communication and chip performance in terms of power, area latency and throughput. Also, designing a reliable fault tolerant NoC became a significant concern. In fault tolerant NoC it becomes critical to identify faulty node and dynamically reroute the packets keeping minimum latency. This study provides an insight into a domain of NoC, with intention of understanding fault tolerant approach based on the XY routing algorithm for 4×4 mesh architecture. The fault tolerant NoC design is synthesized on field programmable gate array (FPGA).
Parallel Processing Technique for Time Efficient Matrix MultiplicationIJERA Editor
The document proposes a parallel-parallel input single output (PPI-SO) design for matrix multiplication that reduces hardware resources compared to existing designs. It uses fewer multipliers and registers than existing designs, trading off increased completion time. Simulation results show the PPI-SO design uses 30% less energy and involves 70% less area-delay product than other designs.
PERFORMANCE EVALUATION OF LOW POWER CARRY SAVE ADDER FOR VLSI APPLICATIONSVLSICS Design
This document discusses the performance evaluation of a low power carry save adder (CSA) for VLSI applications. It begins with an abstract that examines subthreshold leakage in CSA circuits and how reducing threshold voltage can lower power consumption. The document then reviews previous work on CSA design. It presents the architecture of a proposed 4-bit CSA designed using gate diffusion input cells to reduce area and power. Simulation results show the CSA has total average power of 4.93μW, propagation delay of 16.3ns, and 37% reduced area due to using GDI cells. The CSA operates as intended in subthreshold regions with low static leakage current.
PERFORMANCE EVALUATION OF LOW POWER CARRY SAVE ADDER FOR VLSI APPLICATIONSVLSICS Design
This report examines subthreshold leakage in the carry save adder. When the gate-to-source voltage falls to the threshold voltage, some current still flows in the circuit, and that is undesired. As process technology advances rapidly, the threshold voltage of MOS devices reduces drastically, and this must be taken into account in low-power devices, since it contributes to a small amount of leakage current, which in turn increases the power consumption of the devices. Adders are the basic building blocks of any digital circuit design and are used in almost all arithmetic operations. The CSA proves to be an efficient adder due to its quick and precise computations. Hence this paper performs subthreshold analysis on the CSA, and the results show that the total average power is around 4.93µW, the propagation delay for the complete operation is 16.3ns, and, since this design uses GDI cells, there is a 37% reduction in area.
Due to the advancement of new technology in the field of VLSI and embedded systems, there is an increasing demand for high-speed, low-power processors. The speed of a processor greatly depends on the performance of its multiplier as well as its adder, which makes high-speed adder architectures important. Several adder architecture designs have been developed to increase the efficiency of the adder. In this paper, we introduce an architecture that performs high-speed modified carry select addition using the Booth encoder (BEC) technique. Booth encoder, Mathematics is an ancient Indian system of Mathematics. Here we introduce two carry select based designs. These designs are implemented on the Xilinx Virtex device family.
A STUDY OF ENERGY-AREA TRADEOFFS OF VARIOUS ARCHITECTURAL STYLES FOR ROUTING ...VLSICS Design
This document presents a study exploring various architectural styles for routing inputs in a coarse-grained reconfigurable fabric. It examines integrated constants, inputs coming from the side of the fabric, and combinations with dedicated pass gates. It implements these styles using a 90nm ASIC process and analyzes the area and energy tradeoffs. It finds that an approach combining inputs coming from the side with 50% dedicated pass gates provides the best results, with 31% energy savings and 62% area savings over the baseline architecture.
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. Therefore, careful optimization of the adder is of the greatest importance. This optimization can be attained at two levels: circuit optimization or logic optimization. In circuit optimization the sizes of transistors are manipulated, whereas in logic optimization the Boolean equations are rearranged (or manipulated) to optimize speed, area and power consumption. This paper focuses on the optimization of the adder through technology-independent mapping. The work presents 20 different logical constructions of a 1-bit adder cell in CMOS logic, and their performance is analyzed in terms of transistor count, delay and power dissipation. These performance issues are analyzed through Tanner EDA with TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexers. These logic-optimized multiplexer-based adders are incorporated in selected existing adders such as the ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder, and their performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was Xilinx ISE 12.1 Spartan3E XC3S500-5FG320. Each adder type was implemented with bit sizes of 8, 16, 32, and 64 bits. This variety of sizes provides more insight into the performance of each adder in terms of area and delay as a function of size.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET Journal
1) The document discusses a review paper on implementing a radix-2 Decimation-In-Time (DIT) Fast Fourier Transform (FFT) using reversible DKG gates.
2) The proposed design uses a 4x4 reversible DKG gate that functions as both an adder and subtractor.
3) The design is synthesized using Xilinx ISE software and simulated using VHDL test benches to evaluate performance.
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...CSCJournals
An ideal Network Processor, that is, a programmable multi-processor device, must be capable of offering both the flexibility and the speed required for packet processing. But current Network Processor systems generally fall short of these benchmarks because of the traffic fluctuations inherent in packet networks, and the resulting workload variation on individual pipeline stages over a period of time ultimately affects the overall performance of even an otherwise sound system. One potential solution would be to change the code running at these stages so as to adapt to the fluctuations; a near-robust system withstanding traffic fluctuations is the dynamic adaptive processor, which reconfigures the entire system and which we introduce and study to some extent in this paper. We achieve this by using a crucial decision-making model, transferring the binary code to the processor through the SOAP protocol.
Design of high speed adders for efficient digital design blocksBharath Chary
This document discusses the design of high-speed adders for efficient digital design blocks. It compares parallel-prefix adders like Kogge-Stone, Brent-Kung, Sklansky, and Kogge-Stone Ling adders. The Kogge-Stone Ling adder is found to perform most efficiently. Different tree adder structures are designed using CMOS logic and transmission gate logic and compared in terms of area, delay, and power consumption. Ling adders are analyzed, and it is shown that they reduce dependency on previous bit additions compared to other adders.
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
2. 372 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 2, FEBRUARY 2012
TABLE I
DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA
Fig. 1. Delay and Area evaluation of an XOR gate.
TABLE II
FUNCTION TABLE OF THE 4-b BEC
III. BEC
As stated above the main idea of this work is to use BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n + 1)-bit BEC is required. A structure and the function table of a 4-b BEC are shown in Fig. 2 and Table II, respectively.
Fig. 2. 4-b BEC.
Fig. 3. 4-b BEC with 8:4 mux.
Fig. 3 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux gets as its input (B3, B2, B1, and B0) and another input of the mux is the BEC output. This produces the two possible partial results in parallel and the mux is used to select either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction when CSLAs with a large number of bits are designed. The Boolean expressions of the 4-bit BEC are listed as (note the functional symbols ~ NOT, & AND, ^ XOR)
X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2).
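The behaviour of the 4-b BEC and the mux of Fig. 3 can be checked with a short Python sketch. It assumes that X0 is the complement of B0 and that the bracketed terms are ANDed, which is the reading under which the BEC output equals the input plus one. The function names bec4 and select are illustrative and do not come from the paper.

```python
def bec4(b: int) -> int:
    """4-bit binary to excess-1 converter built from the Boolean expressions."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    x0 = b0 ^ 1                 # X0 = NOT B0
    x1 = b0 ^ b1                # X1 = B0 XOR B1
    x2 = b2 ^ (b0 & b1)         # X2 = B2 XOR (B0 AND B1)
    x3 = b3 ^ (b0 & b1 & b2)    # X3 = B3 XOR (B0 AND B1 AND B2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


def select(b: int, cin: int) -> int:
    """Model of the 8:4 mux: pass B when Cin = 0, the BEC output when Cin = 1."""
    return bec4(b) if cin else b


# The gate-level result should match B + 1 modulo 16 for every 4-bit input.
for b in range(16):
    assert bec4(b) == (b + 1) % 16
    print(f"{b:04b} -> {bec4(b):04b}")
```

Both candidate results (B and B + 1) exist before Cin arrives, so only the mux delay is added once Cin is known.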
II. DELAY AND AREA EVALUATION METHODOLOGY OF THE BASIC ADDER BLOCKS
The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig. 1. The gates between the dotted lines are performing the operations in parallel and the numeric representation of each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter, each having delay equal to 1 unit and area equal to 1 unit. We then add up the number of gates in the longest path of a logic block that contributes to the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I.
IV. DELAY AND AREA EVALUATION METHODOLOGY OF REGULAR 16-B SQRT CSLA
The structure of the 16-b regular SQRT CSLA is shown in Fig. 4. It has five groups of different size RCA. The delay and area evaluation of each group are shown in Fig. 5, in which the numerals within [] specify the delay values, e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are as follows.
1) The group2 [see Fig. 5(a)] has two sets of 2-b RCA. Based on the consideration of delay values of Table I, the arrival time of selection input c1[time(t) = 7] of 6:3 mux is earlier than s3[t = 8] and later than s2[t = 6]. Thus, sum3[t = 11] is summation of s3 and mux[t = 3] and sum2[t = 10] is summation of c1 and mux.
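The arrival-time reasoning for group2 can be sketched in Python. This is a minimal illustration, assuming a mux output settles one mux delay (3 units) after the later of its data and select inputs. The helper name mux_out is hypothetical, and the arrival times are the ones quoted above for the regular group2.

```python
MUX_DELAY = 3  # 2:1 mux delay in AOI gate units, quoted in the text as mux[t = 3]


def mux_out(data_arrival: int, select_arrival: int, mux_delay: int = MUX_DELAY) -> int:
    """Time at which a mux output settles: the later input plus the mux delay."""
    return max(data_arrival, select_arrival) + mux_delay


c1 = 7   # select input of the 6:3 mux
s2 = 6   # arrives before c1, so c1 sets the output time
s3 = 8   # arrives after c1, so s3 sets the output time

sum2 = mux_out(s2, c1)
sum3 = mux_out(s3, c1)
print(sum2, sum3)  # 10 11
```

When the select input arrives after all data inputs, the same function reduces to the select arrival time plus the mux delay.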
Fig. 4. Regular 16-b SQRT CSLA.
TABLE III
DELAY AND AREA COUNT OF REGULAR SQRT CSLA GROUPS
Fig. 5. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. F is a Full Adder.
2) Except for group2, the arrival time of mux selection input is always greater than the arrival time of data outputs from the RCAs. Thus, the delay of group3 to group5 is determined, respectively, as follows:
{c6, sum[6:4]} = c3[t = 10] + mux
{c10, sum[10:7]} = c6[t = 13] + mux
{cout, sum[15:11]} = c10[t = 16] + mux.
3) The one set of 2-b RCA in group2 has 2 FA for Cin = 1 and the other set has 1 FA and 1 HA for Cin = 0. Based on the area count of Table I, the total number of gate counts in group2 is determined as follows:
Gate count = 57 (FA + HA + Mux)
FA = 39 (3 × 13)
HA = 6 (1 × 6)
Mux = 12 (3 × 4).
4) Similarly, the estimated maximum delay and area of the other groups in the regular SQRT CSLA are evaluated and listed in Table III.
V. DELAY AND AREA EVALUATION METHODOLOGY OF MODIFIED 16-B SQRT CSLA
The structure of the proposed 16-b SQRT CSLA using BEC for RCA with Cin = 1 to optimize the area and power is shown in Fig. 6. We again split the structure into five groups. The delay and area estimation of each group are shown in Fig. 7. The steps leading to the evaluation are given here.
1) The group2 [see Fig. 7(a)] has one 2-b RCA which has 1 FA and 1 HA for Cin = 0. Instead of another 2-b RCA with Cin = 1, a 3-b BEC is used which adds one to the output from the 2-b RCA. Based on the consideration of delay values of Table I, the arrival time of selection input c1[time(t) = 7] of 6:3 mux is earlier than the s3[t = 9] and c3[t = 10] and later than the s2[t = 4]. Thus, the sum3 and final c3 (output from mux) depend on s3 and mux and on partial c3 (input to mux) and mux, respectively. The sum2 depends on c1 and mux.
2) For the remaining groups the arrival time of mux selection input is always greater than the arrival time of data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of mux selection input and the mux delay.
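The group2 gate counts can be tallied with a short Python sketch, covering both the regular design (57 gates) and the BEC-based design, whose count is worked out in step 3 of this section. It is only an illustration. The unit areas (FA = 13, HA = 6, 2:1 mux = 4, XOR = 5, AND = NOT = 1) are the AOI gate counts used in the paper's group2 evaluations, while the dictionary layout and the names AREA and gate_count are assumptions made for this example.

```python
# AOI gate counts per block, as used in the group2 evaluations
AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}


def gate_count(blocks: dict) -> int:
    """Total AOI gates for a group described as {block name: instances}."""
    return sum(AREA[name] * n for name, n in blocks.items())


# Regular group2: two 2-b RCAs (3 FA + 1 HA in total) and a 6:3 mux (three 2:1 muxes).
regular_group2 = {"FA": 3, "HA": 1, "MUX": 3}

# Modified group2: one 2-b RCA (1 FA + 1 HA), a 3-b BEC (1 AND, 1 NOT, 2 XOR), same mux.
modified_group2 = {"FA": 1, "HA": 1, "AND": 1, "NOT": 1, "XOR": 2, "MUX": 3}

print(gate_count(regular_group2))   # 57
print(gate_count(modified_group2))  # 43
```

Under these counts, replacing the second RCA with a BEC saves 14 gates in group2 alone.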
Fig. 6. Modified 16-b SQRT CSLA. The parallel RCA with Cin = 1 is replaced with BEC.
TABLE IV
DELAY AND AREA COUNT OF MODIFIED SQRT CSLA
Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is a Half Adder.
3) The area count of group2 is determined as follows:
Gate count = 43 (FA + HA + Mux + BEC)
FA = 13 (1 × 13)
HA = 6 (1 × 6)
AND = 1
NOT = 1
XOR = 10 (2 × 5)
Mux = 12 (3 × 4).
4) Similarly, the estimated maximum delay and area of the other groups of the modified SQRT CSLA are evaluated and listed in Table IV.
Comparing Tables III and IV, it is clear that the proposed modified SQRT CSLA saves 113 gate areas compared with the regular SQRT CSLA, with an increase of only 11 gate delays. To further evaluate the performance, we have resorted to ASIC implementation and simulation.
VI. ASIC IMPLEMENTATION RESULTS
The design proposed in this paper has been developed using Verilog-HDL and synthesized in Cadence RTL compiler using typical libraries of TSMC 0.18 um technology. The synthesized Verilog netlist and its design constraints file (SDC) are imported into Cadence SoC Encounter and are used to generate an automated layout from standard cells through placement and routing [7]. Parasitic extraction is performed using Encounter's Native RC extraction tool and the extracted parasitic RC (SPEF format) is back annotated to the Common Timing Engine in the Encounter platform for static timing analysis. For each word size of the adder, the same value change dump (VCD) file is generated for all possible input conditions and is imported into Cadence Encounter Power Analysis to perform the power simulations. The same design flow is followed for both the regular and modified SQRT CSLA.
Table V exhibits the simulation results of both the CSLA structures in terms of delay, area and power. The area indicates the total cell area of the design and the total power is the sum of the leakage power, internal power and switching power. The percentage reduction in the cell area, total power, power-delay product and the area-delay product as a function of the bit size are shown in Fig. 8(a). Also plotted is the percentage delay overhead in Fig. 8(b). It is clear that the area of the 8-, 16-, 32-, and 64-b proposed SQRT CSLA is reduced by 9.7%, 15%, 16.7%, and 17.4%, respectively. The total power consumed shows a similar trend of increasing reduction in power consumption (7.6%, 10.56%, 13.63%, and 15.46%) with the bit size. Interestingly, the delay overhead also
exhibits a similarly decreasing trend with bit size. The delay overhead for the 8-, 16-, and 32-b is 14%, 9.8%, and 6.7% respectively, whereas for the 64-b it reduces to only 3.76%. The power-delay product of the proposed 8-b is higher than that of the regular SQRT CSLA by 5.2% and the area-delay product is lower by 2.9%. However, the power-delay product of the proposed 16-b SQRT CSLA reduces by 1.76% and for the 32-b and 64-b by as much as 8.18% and 12.28% respectively. Similarly the area-delay product of the proposed design for 16-, 32-, and 64-b is also reduced by 6.7%, 11%, and 14.4% respectively.
TABLE V
COMPARISON OF THE REGULAR AND MODIFIED SQRT CSLA
Fig. 8. (a) Percentage reduction in the cell area, total power, power-delay product, and area-delay product. (b) Percentage of delay overhead.
VII. CONCLUSION
A simple approach is proposed in this paper to reduce the area and power of SQRT CSLA architecture. The reduced number of gates of this work offers a great advantage in the reduction of area and also the total power. The compared results show that the modified SQRT CSLA has a slightly larger delay (only 3.76%), but the area and power of the 64-b modified SQRT CSLA are significantly reduced by 17.4% and 15.4% respectively. The power-delay product and also the area-delay product of the proposed design show a decrease for 16-, 32-, and 64-b sizes, which indicates the success of the method and not a mere tradeoff of delay for power and area. The modified CSLA architecture is therefore low area, low power, simple, and efficient for VLSI hardware implementation. It would be interesting to test the design of the modified 128-b SQRT CSLA.
ACKNOWLEDGMENT
The authors would like to thank S. Sivanantham, P. MageshKannan, and S. Ravi of the VLSI Division, VIT University, Vellore, India, for their contributions to this work.
REFERENCES
[1] O. J. Bedrij, “Carry-select adder,” IRE Trans. Electron. Comput., pp. 340–344, 1962.
[2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, “ASIC implementation of modified faster carry save adder,” Eur. J. Sci. Res., vol. 42, no. 1, pp. 53–58, 2010.
[3] T. Y. Ceiang and M. J. Hsiao, “Carry-select adder using single ripple carry adder,” Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.
[4] Y. Kim and L.-S. Kim, “64-bit carry-select adder with reduced area,” Electron. Lett., vol. 37, no. 10, pp. 614–615, May 2001.
[5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
[6] Y. He, C. H. Chang, and J. Gu, “An area efficient 64-bit square root carry-select adder for low power applications,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–4085.
[7] Cadence, “Encounter user guide,” Version 6.2.4, March 2008.