Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42% as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA.
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
IRJET- Flexible DSP Accelerator Architecture using Carry Lookahead TreeIRJET Journal
This document presents a flexible DSP accelerator architecture using a carry lookahead tree. It aims to improve on existing flexible accelerator designs by enabling computations to be efficiently performed using carry lookahead formatted data. The proposed architecture contains flexible computational units that can efficiently execute a wide range of DSP operation patterns. It differs from prior work by using a carry lookahead tree instead of a carry save tree. Simulation results show the proposed design achieves faster execution and a larger reduction in delay time compared to existing flexible accelerator approaches.
IRJET- A Review of Approximate Adders for Energy-Efficient Digital Signal Pro...IRJET Journal
The document reviews recent progress in approximate adders for energy-efficient digital signal processing. It summarizes various types of approximate adders that have been proposed, including speculative adders, segmented adders, carry select adders, and adders using approximate full adders. The document provides details on the design and operation of several specific approximate adder circuits. It also compares the delay and area complexity of different approximate adder designs.
128 bit low power and area efficient carry select adder amit bakshi academiagopi448
The document describes a 128-bit low power and area efficient carry select adder design. It proposes using a binary to excess-1 converter instead of a ripple carry adder for the carry select adder, which can reduce area and power consumption. Simulation results show the modified 128-bit carry select adder design achieves a 15.48% reduction in area and 7.41% reduction in power compared to a regular carry select adder design.
IRJET - A Speculative Approximate Adder for Error Recovery UnitIRJET Journal
This document presents a speculative approximate adder for error recovery. The adder is partitioned into non-overlapping blocks whose carries are predicted based on input signals of the current and next blocks. This reduces the critical path delay on average to one block. An error recovery unit is also proposed to further reduce output error rates. Simulation results show the proposed adder achieves lower error rates and error metrics compared to state-of-the-art approximate adders, with only a small increase in delay and area due to the error recovery unit.
This document presents a design for a simple accuracy reconfigurable adder (SARA) to improve the accuracy-power-delay efficiency of approximate adders. SARA is proposed as an alternative to existing accuracy configurable adders that require large areas due to redundancy and error correction circuitry. SARA uses a simple carry prediction approach without redundancy. It is evaluated as part of a 16-bit adder and shown to reduce delay compared to a ripple carry adder. SARA is also applied in an 8x8 Wallace multiplier, demonstrating reduced delay compared to using a ripple carry adder. The proposed SARA design achieves improvements in area, power, and delay efficiency for approximate arithmetic circuits.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
IRJET- Flexible DSP Accelerator Architecture using Carry Lookahead TreeIRJET Journal
This document presents a flexible DSP accelerator architecture using a carry lookahead tree. It aims to improve on existing flexible accelerator designs by enabling computations to be efficiently performed using carry lookahead formatted data. The proposed architecture contains flexible computational units that can efficiently execute a wide range of DSP operation patterns. It differs from prior work by using a carry lookahead tree instead of a carry save tree. Simulation results show the proposed design achieves faster execution and a larger reduction in delay time compared to existing flexible accelerator approaches.
IRJET- A Review of Approximate Adders for Energy-Efficient Digital Signal Pro...IRJET Journal
The document reviews recent progress in approximate adders for energy-efficient digital signal processing. It summarizes various types of approximate adders that have been proposed, including speculative adders, segmented adders, carry select adders, and adders using approximate full adders. The document provides details on the design and operation of several specific approximate adder circuits. It also compares the delay and area complexity of different approximate adder designs.
128 bit low power and area efficient carry select adder amit bakshi academiagopi448
The document describes a 128-bit low power and area efficient carry select adder design. It proposes using a binary to excess-1 converter instead of a ripple carry adder for the carry select adder, which can reduce area and power consumption. Simulation results show the modified 128-bit carry select adder design achieves a 15.48% reduction in area and 7.41% reduction in power compared to a regular carry select adder design.
IRJET - A Speculative Approximate Adder for Error Recovery UnitIRJET Journal
This document presents a speculative approximate adder for error recovery. The adder is partitioned into non-overlapping blocks whose carries are predicted based on input signals of the current and next blocks. This reduces the critical path delay on average to one block. An error recovery unit is also proposed to further reduce output error rates. Simulation results show the proposed adder achieves lower error rates and error metrics compared to state-of-the-art approximate adders, with only a small increase in delay and area due to the error recovery unit.
This document presents a design for a simple accuracy reconfigurable adder (SARA) to improve the accuracy-power-delay efficiency of approximate adders. SARA is proposed as an alternative to existing accuracy configurable adders that require large areas due to redundancy and error correction circuitry. SARA uses a simple carry prediction approach without redundancy. It is evaluated as part of a 16-bit adder and shown to reduce delay compared to a ripple carry adder. SARA is also applied in an 8x8 Wallace multiplier, demonstrating reduced delay compared to using a ripple carry adder. The proposed SARA design achieves improvements in area, power, and delay efficiency for approximate arithmetic circuits.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
Distributed Video Coding (DVC) has become increasingly popular in recent times among the researchers in video coding due to its attractive and promising features. DVC primarily has a modified complexity balance between the encoder and decoder, in contrast to conventional video codecs. However, Most of the reported DVC schemes have a high time-delay in decoder which hinders its practical application in real-time systems. In this work, we focus on speed up the Side Information(SI) generation module in DVC, which is a major function in the DVC coding algorithm and one of the time-consuming factor at the decoder. By applied it through Compute Unified Device Architecture (CUDA) based on General-Purpose Graphics Processing Unit (GPGPU), the experimental results show that a considerable speedup can be obtained by using the proposed parallelized SI generation algorithm.
Design and Implementation of an Efficient Carry Skip AdderIRJET Journal
1) The document describes a design for an efficient 32-bit carry skip adder to achieve high speed and low area consumption.
2) It proposes using a Knowles adder in the middle stage and an optimized ripple carry adder instead of a Brent-Kung adder and normal RCA to improve speed and reduce area.
3) Simulation results show the proposed hybrid carry skip adder has a 38% reduced delay compared to a conventional carry skip adder and 16% reduced delay compared to a previous hybrid design. It is coded in Verilog and tested on a Xilinx FPGA.
This document describes the design of a 4-bit SFQ (Single Flux Quantum) multiplier circuit using a modified Booth encoding technique. It begins with background on multipliers and the advantages of the Booth encoding method over an AND array for reducing the number of partial products. The document then presents the design and analysis of the proposed 4-bit SFQ multiplier using a modified 2-bit Booth encoder, Carry Save Adder tree, and 6-bit carry lookahead adder. Simulation results show the modified Booth encoder approach reduces power consumption by 22.24% and delay by 23.96% compared to a conventional Booth encoder design.
AN EFFICIENT DSP ARCHITECTURE DESIGN IN FPGA USING LOOP BACK ALGORITHMIJARIDEA Journal
Abstract— Advanced flag processors assume a noteworthy part in electronic gadgets, bio restorative applications, correspondence conventions. Effective IC configuration is a key variable to accomplish low power and high throughput IP center improvement for convenient gadgets. Computerized flag processors assume a huge parts progressively figuring and preparing yet region overhead and power utilization are real disadvantages to accomplish productive outline requirements. Adaptable DSP engineering utilizing circle back calculation is a proposed way to deal with beat existing outline limitations. For instance, plan of 8 point FFT engineering requires 3 phases for butterfly calculation unit that 48 adders and 12 multipliers prompts high power and territory utilization. To lessen range and power, Loop back calculation is proposed and it requires 16 adders and 4 multipliers for general outline. Likewise outline of various DSP layouts like FFT, First request FIR channel and Second request FIR channel is acquainted and mapping in with the design as preparing component and applying the circle back calculation. In the outline of FFT,FIR formats adders, for example, parallel prefix viper, move and include multiplier, baugh-wooley multiplier are utilized to break down effective plan of DSP engineering. Recreation and examination of inactivity, territory, control productivity with the current structures are happens utilizing model sim 6.4a and combine utilizing Xilinx 14.3 ISE.
Keywords— Baugh-UWooley Multiplier, Loop Back Algorithm, Parallel Prefix Adders.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IRJET- Design and HW/SW Implementation of a Nonlinear Interpolator for Border...IRJET Journal
This document describes the design and hardware/software implementation of a nonlinear interpolator for border preserving. It presents a reconfigurable hardware architecture using an FPGA to accelerate the interpolation process compared to a software-only solution. The proposed interpolator calculates pixel values using a weighted average that considers edge information to preserve image quality during resizing. Validation results showed the hardware/software solution improved interpolation speed by 258 times compared to software while maintaining high image quality.
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
This document proposes a flexible accelerator architecture that exploits carry-save arithmetic to efficiently implement digital signal processing kernels. The architecture includes flexible computational units that can be configured to perform chained addition, multiplication, and addition operations directly on carry-save formatted data without intermediate conversions. Experimental results show the proposed architecture delivers average gains of 61.91% in area-delay product and 54.43% in energy consumption compared to state-of-the-art flexible data paths.
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
This document discusses the performance evaluation of FPGA-based runtime dynamic partial reconfiguration for matrix multiplication. The author implements matrix multiplication using a Virtex-5 FPGA, with the ability to reconfigure modules for the multiplication at runtime. Simulation results show the multiplication of 2x2 and 4x4 matrices using this partial reconfiguration method. The results demonstrate that dynamic partial reconfiguration allows changing modules while keeping other parts of the design static, improving resource utilization and reducing power consumption compared to a fully static implementation.
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
This document discusses performance evaluation of FPGA-based runtime dynamic partial reconfiguration for matrix multiplication. It implements matrix multiplication using a Virtex-5 FPGA, with the design reconfigured at runtime by changing partial modules. Results show that dynamic partial reconfiguration increases the availability of reconfigurable resources on the FPGA. The paper also outlines the hardware implementation methodology, proposed architecture using partial reconfiguration, and Xilinx design flow.
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...IRJET Journal
This document proposes modifying the standard AMBA bus architecture to support simultaneous read and write operations to memory, which is needed for some applications but not supported in the original AMBA design. It describes designing a high-speed AMBA bus architecture for image transmission and face detection using FPGA. The proposed design uses a finite state machine-based memory controller and pipeline data transfer to reduce wait states and improve system performance compared to existing AMBA designs. It aims to implement this modified AMBA architecture on FPGA for real-time image processing applications.
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
An Improved Optimization Techniques for Parallel Prefix Adder using FPGAIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
The document summarizes a research paper on the design of high-speed low-power Viterbi decoders for trellis coded modulation systems. It proposes a pre-computation architecture incorporated with the T-algorithm to effectively reduce power consumption without degrading decoding speed. The key aspects are:
1) A pre-computation approach is presented to calculate the optimal path metric in advance, allowing it to be determined faster and avoiding delays in the feedback loop.
2) A method is provided to systematically determine the optimal number of pre-computation steps needed to achieve the theoretical iteration bound for the Viterbi decoder.
3) Implementation results show the proposed pre-computation architecture can reduce power consumption by up to 70% without loss
Energy efficiency is one of the most critical issue in design of System on Chip. In Network On
Chip (NoC) based system, energy consumption is influenced dramatically by mapping of
Intellectual Property (IP) which affect the performance of the system. In this paper we test the
antecedently extant proposed algorithms and introduced a new energy proficient algorithm
stand for 3D NoC architecture. In addition a hybrid method has also been implemented using
bioinspired optimization (particle swarm optimization) technique. The proposed algorithm has
been implemented and evaluated on randomly generated benchmark and real life application
such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and has been compared with the existing algorithm (spiral and crinkle) and has shown better
reduction in the communication energy consumption and shows improvement in the
performance of the system. Comparing our work with spiral and crinkle, experimental result
shows that the average reduction in communication energy consumption is 19% with spiral and
17% with crinkle mapping algorithms, while reduction in communication cost is 24% and 21%
whereas reduction in latency is of 24% and 22% with spiral and crinkle. Optimizing our work
and the existing methods using bio-inspired technique and having the comparison among them
an average energy reduction is found to be of 18% and 24%.
This document presents a new technique to reduce power consumption and increase speed in Content Addressable Memory (CAM) using memory partition and clock gating. The proposed CAM design partitions the memory into segments based on the most significant bits to check for matches. Since most words fail to match in their segments, the search can be discontinued for those segments, reducing power and increasing speed. Clock gating is also used to power gate unused portions of the CAM, further reducing static and dynamic power. Simulation and analysis using Quartus II and ModelSim show the proposed design reduces total power dissipation from 369.9mW to 46.3mW compared to an existing design.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET Journal
This document presents a novel architecture for a high-speed inexact speculative adder using carry lookahead adder and Brent Kung adder. The proposed adder splits the critical path into shorter paths using fine-grain pipelining to improve speed. It uses a carry lookahead adder and Brent Kung adder in 4-bit blocks to speculate the carry and calculate sums, with error compensation. The design is implemented using a 5-stage pipeline to further reduce delay. Implementation in MAGIC shows the proposed adder achieves higher speed through optimized pipelining of key components in the speculative and compensation blocks.
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAIRJET Journal
This document discusses the implementation of a lightweight VLSI design for the HIGHT cipher on an FPGA. It begins with an introduction to lightweight VLSI architecture and its applications in low-resource devices. It then provides background on the HIGHT cipher and discusses prior work implementing cryptographic algorithms on FPGAs. The document goes on to describe the proposed VLSI design for the HIGHT cipher, which is optimized for size, power, and speed. It achieves a throughput of 25 Mbps with an encryption/decryption delay of 0.64 ms. Evaluation results demonstrate the effectiveness and suitability of the design for low-power applications.
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET Journal
This document presents a review of FPGA-based architectures for image capturing, processing, and display using a VGA monitor. It discusses using the Xilinx AccelDSP tool to develop the system on a Spartan 3E FPGA. The AccelDSP tool allows converting a MATLAB design into HDL for implementation on the FPGA. It summarizes the FPGA-based system architecture, which includes units for initialization, data transfer, image processing, and memory management. It then outlines the Xilinx AccelDSP design flow, which verifies the functionality at each stage of converting the floating-point MATLAB model to a fixed-point hardware implementation on the FPGA. The goal is to accelerate image processing applications using the parallel
Design and Implementation of Different types of Carry skip adderIRJET Journal
The document describes the design and implementation of different types of carry skip adders. It begins with an introduction to carry skip adders and their advantages over other adder types in terms of speed, area usage, and transistor count. It then reviews existing carry skip adder designs and their limitations. A new design called the Common Boolean Logic (CBL) carry skip adder is proposed that aims to reduce area and power consumption by eliminating redundant adder cells through shared logic. Simulation results show that an 8-bit CBL carry skip adder has 64.6% lower power and 18.7% smaller area than a conventional carry skip adder. In conclusion, the CBL carry skip adder achieves improved performance and efficiency.
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. Therefore, careful optimization of the adder is of the greatest importance. This optimization can be attained
in two levels; it can be circuit or logic optimization. In circuit optimization the size of transistors are manipulated, where as in logic optimization the Boolean equations are rearranged (or manipulated) to optimize speed, area and power consumption. This paper focuses the optimization of adder through technology independent mapping. The work presents 20 different logical construction of 1-bit adder cell in CMOS logic and its performance is analyzed in terms of transistor count, delay and power dissipation. These performance issues are analyzed through Tanner EDA with TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexer. This logic optimized multiplexer based adders are incorporated in selected existing adders like ripple carry
adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder and its performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was Xilinx ISE 12.1 Spartan3E XC3S500-5FG320. Each adder type was implemented with bit sizes of: 8, 16, 32, 64 bits. This variety of sizes will provide with more insight about the performance of each adder in terms of area and delay as a function of size.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
Distributed Video Coding (DVC) has become increasingly popular in recent times among the researchers in video coding due to its attractive and promising features. DVC primarily has a modified complexity balance between the encoder and decoder, in contrast to conventional video codecs. However, Most of the reported DVC schemes have a high time-delay in decoder which hinders its practical application in real-time systems. In this work, we focus on speed up the Side Information(SI) generation module in DVC, which is a major function in the DVC coding algorithm and one of the time-consuming factor at the decoder. By applied it through Compute Unified Device Architecture (CUDA) based on General-Purpose Graphics Processing Unit (GPGPU), the experimental results show that a considerable speedup can be obtained by using the proposed parallelized SI generation algorithm.
Design and Implementation of an Efficient Carry Skip AdderIRJET Journal
1) The document describes a design for an efficient 32-bit carry skip adder to achieve high speed and low area consumption.
2) It proposes using a Knowles adder in the middle stage and an optimized ripple carry adder instead of a Brent-Kung adder and normal RCA to improve speed and reduce area.
3) Simulation results show the proposed hybrid carry skip adder has a 38% reduced delay compared to a conventional carry skip adder and 16% reduced delay compared to a previous hybrid design. It is coded in Verilog and tested on a Xilinx FPGA.
This document describes the design of a 4-bit SFQ (Single Flux Quantum) multiplier circuit using a modified Booth encoding technique. It begins with background on multipliers and the advantages of the Booth encoding method over an AND array for reducing the number of partial products. The document then presents the design and analysis of the proposed 4-bit SFQ multiplier using a modified 2-bit Booth encoder, Carry Save Adder tree, and 6-bit carry lookahead adder. Simulation results show the modified Booth encoder approach reduces power consumption by 22.24% and delay by 23.96% compared to a conventional Booth encoder design.
AN EFFICIENT DSP ARCHITECTURE DESIGN IN FPGA USING LOOP BACK ALGORITHMIJARIDEA Journal
Abstract— Advanced flag processors assume a noteworthy part in electronic gadgets, bio restorative applications, correspondence conventions. Effective IC configuration is a key variable to accomplish low power and high throughput IP center improvement for convenient gadgets. Computerized flag processors assume a huge parts progressively figuring and preparing yet region overhead and power utilization are real disadvantages to accomplish productive outline requirements. Adaptable DSP engineering utilizing circle back calculation is a proposed way to deal with beat existing outline limitations. For instance, plan of 8 point FFT engineering requires 3 phases for butterfly calculation unit that 48 adders and 12 multipliers prompts high power and territory utilization. To lessen range and power, Loop back calculation is proposed and it requires 16 adders and 4 multipliers for general outline. Likewise outline of various DSP layouts like FFT, First request FIR channel and Second request FIR channel is acquainted and mapping in with the design as preparing component and applying the circle back calculation. In the outline of FFT,FIR formats adders, for example, parallel prefix viper, move and include multiplier, baugh-wooley multiplier are utilized to break down effective plan of DSP engineering. Recreation and examination of inactivity, territory, control productivity with the current structures are happens utilizing model sim 6.4a and combine utilizing Xilinx 14.3 ISE.
Keywords— Baugh-UWooley Multiplier, Loop Back Algorithm, Parallel Prefix Adders.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IRJET- Design and HW/SW Implementation of a Nonlinear Interpolator for Border...IRJET Journal
This document describes the design and hardware/software implementation of a nonlinear interpolator for border preserving. It presents a reconfigurable hardware architecture using an FPGA to accelerate the interpolation process compared to a software-only solution. The proposed interpolator calculates pixel values using a weighted average that considers edge information to preserve image quality during resizing. Validation results showed the hardware/software solution improved interpolation speed by 258 times compared to software while maintaining high image quality.
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
This document proposes a flexible accelerator architecture that exploits carry-save arithmetic to efficiently implement digital signal processing kernels. The architecture includes flexible computational units that can be configured to perform chained addition, multiplication, and addition operations directly on carry-save formatted data without intermediate conversions. Experimental results show the proposed architecture delivers average gains of 61.91% in area-delay product and 54.43% in energy consumption compared to state-of-the-art flexible data paths.
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
This document discusses the performance evaluation of FPGA-based runtime dynamic partial reconfiguration for matrix multiplication. The author implements matrix multiplication using a Virtex-5 FPGA, with the ability to reconfigure modules for the multiplication at runtime. Simulation results show the multiplication of 2x2 and 4x4 matrices using this partial reconfiguration method. The results demonstrate that dynamic partial reconfiguration allows changing modules while keeping other parts of the design static, improving resource utilization and reducing power consumption compared to a fully static implementation.
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
This document discusses performance evaluation of FPGA-based runtime dynamic partial reconfiguration for matrix multiplication. It implements matrix multiplication using a Virtex-5 FPGA, with the design reconfigured at runtime by changing partial modules. Results show that dynamic partial reconfiguration increases the availability of reconfigurable resources on the FPGA. The paper also outlines the hardware implementation methodology, proposed architecture using partial reconfiguration, and Xilinx design flow.
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...IRJET Journal
This document proposes modifying the standard AMBA bus architecture to support simultaneous read and write operations to memory, which is needed for some applications but not supported in the original AMBA design. It describes designing a high-speed AMBA bus architecture for image transmission and face detection using FPGA. The proposed design uses a finite state machine-based memory controller and pipeline data transfer to reduce wait states and improve system performance compared to existing AMBA designs. It aims to implement this modified AMBA architecture on FPGA for real-time image processing applications.
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
An Improved Optimization Techniques for Parallel Prefix Adder using FPGAIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
The document summarizes a research paper on the design of high-speed low-power Viterbi decoders for trellis coded modulation systems. It proposes a pre-computation architecture incorporated with the T-algorithm to effectively reduce power consumption without degrading decoding speed. The key aspects are:
1) A pre-computation approach is presented to calculate the optimal path metric in advance, allowing it to be determined faster and avoiding delays in the feedback loop.
2) A method is provided to systematically determine the optimal number of pre-computation steps needed to achieve the theoretical iteration bound for the Viterbi decoder.
3) Implementation results show the proposed pre-computation architecture can reduce power consumption by up to 70% without loss
Energy efficiency is one of the most critical issue in design of System on Chip. In Network On
Chip (NoC) based system, energy consumption is influenced dramatically by mapping of
Intellectual Property (IP) which affect the performance of the system. In this paper we test the
antecedently extant proposed algorithms and introduced a new energy proficient algorithm
stand for 3D NoC architecture. In addition a hybrid method has also been implemented using
bioinspired optimization (particle swarm optimization) technique. The proposed algorithm has
been implemented and evaluated on randomly generated benchmark and real life application
such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and has been compared with the existing algorithm (spiral and crinkle) and has shown better
reduction in the communication energy consumption and shows improvement in the
performance of the system. Comparing our work with spiral and crinkle, experimental result
shows that the average reduction in communication energy consumption is 19% with spiral and
17% with crinkle mapping algorithms, while reduction in communication cost is 24% and 21%
whereas reduction in latency is of 24% and 22% with spiral and crinkle. Optimizing our work
and the existing methods using bio-inspired technique and having the comparison among them
an average energy reduction is found to be of 18% and 24%.
This document presents a new technique to reduce power consumption and increase speed in Content Addressable Memory (CAM) using memory partition and clock gating. The proposed CAM design partitions the memory into segments based on the most significant bits to check for matches. Since most words fail to match in their segments, the search can be discontinued for those segments, reducing power and increasing speed. Clock gating is also used to power gate unused portions of the CAM, further reducing static and dynamic power. Simulation and analysis using Quartus II and ModelSim show the proposed design reduces total power dissipation from 369.9mW to 46.3mW compared to an existing design.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET Journal
This document presents a novel architecture for a high-speed inexact speculative adder using carry lookahead adder and Brent Kung adder. The proposed adder splits the critical path into shorter paths using fine-grain pipelining to improve speed. It uses a carry lookahead adder and Brent Kung adder in 4-bit blocks to speculate the carry and calculate sums, with error compensation. The design is implemented using a 5-stage pipeline to further reduce delay. Implementation in MAGIC shows the proposed adder achieves higher speed through optimized pipelining of key components in the speculative and compensation blocks.
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAIRJET Journal
This document discusses the implementation of a lightweight VLSI design for the HIGHT cipher on an FPGA. It begins with an introduction to lightweight VLSI architecture and its applications in low-resource devices. It then provides background on the HIGHT cipher and discusses prior work implementing cryptographic algorithms on FPGAs. The document goes on to describe the proposed VLSI design for the HIGHT cipher, which is optimized for size, power, and speed. It achieves a throughput of 25 Mbps with an encryption/decryption delay of 0.64 ms. Evaluation results demonstrate the effectiveness and suitability of the design for low-power applications.
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET Journal
This document presents a review of FPGA-based architectures for image capturing, processing, and display using a VGA monitor. It discusses using the Xilinx AccelDSP tool to develop the system on a Spartan 3E FPGA. The AccelDSP tool allows converting a MATLAB design into HDL for implementation on the FPGA. It summarizes the FPGA-based system architecture, which includes units for initialization, data transfer, image processing, and memory management. It then outlines the Xilinx AccelDSP design flow, which verifies the functionality at each stage of converting the floating-point MATLAB model to a fixed-point hardware implementation on the FPGA. The goal is to accelerate image processing applications using the parallel
Design and Implementation of Different types of Carry skip adderIRJET Journal
The document describes the design and implementation of different types of carry skip adders. It begins with an introduction to carry skip adders and their advantages over other adder types in terms of speed, area usage, and transistor count. It then reviews existing carry skip adder designs and their limitations. A new design called the Common Boolean Logic (CBL) carry skip adder is proposed that aims to reduce area and power consumption by eliminating redundant adder cells through shared logic. Simulation results show that an 8-bit CBL carry skip adder has 64.6% lower power and 18.7% smaller area than a conventional carry skip adder. In conclusion, the CBL carry skip adder achieves improved performance and efficiency.
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. Therefore, careful optimization of the adder is of the greatest importance. This optimization can be attained
in two levels; it can be circuit or logic optimization. In circuit optimization the size of transistors are manipulated, where as in logic optimization the Boolean equations are rearranged (or manipulated) to optimize speed, area and power consumption. This paper focuses the optimization of adder through technology independent mapping. The work presents 20 different logical construction of 1-bit adder cell in CMOS logic and its performance is analyzed in terms of transistor count, delay and power dissipation. These performance issues are analyzed through Tanner EDA with TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexer. This logic optimized multiplexer based adders are incorporated in selected existing adders like ripple carry
adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder and its performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was Xilinx ISE 12.1 Spartan3E XC3S500-5FG320. Each adder type was implemented with bit sizes of: 8, 16, 32, 64 bits. This variety of sizes will provide with more insight about the performance of each adder in terms of area and delay as a function of size.
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is an open access journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.
This document proposes power efficient sum of absolute difference (SAD) algorithms for video compression. It describes:
1. Developing low power 1-bit full adder architectures including a proposed design using NAND, AND, and OR gates that improves power over existing designs.
2. Implementing 4x4 and 8x8 SAD architectures using the proposed low power full adder, ripple carry adders, and carry save adders.
3. Synthesizing the SAD designs in a 180nm technology and finding the proposed 4x4 SAD improves total power by 61% compared to an existing design.
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERcscpconf
This paper proposes about motion estimation in H.264/AVC encoder. Compared with standards
such as MPEG-2 and MPEG-4 Visual, H.264 can deliver better image quality at the same
compressed bit rate or at a lower bit rate. The increase in compression efficiency comes at the
expense of increase in complexity, which is a fact that must be overcome. An efficient Co-design
methodology is required, where the encoder software application is highly optimized and
structured in a very modular and efficient manner, so as to allow its most complex and time
consuming operations to be offloaded to dedicated hardware accelerators. The Motion
Estimation algorithm is the most computationally intensive part of the encoder which is simulated using MATLAB. The hardware/software co-simulation is done using system generator tool and implemented using Xilinx FPGA Spartan 3E for different scanning methods.
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...IRJET Journal
The document describes a novel design for a multiplier and accumulator (MAC) unit using the modified Booth algorithm and parallel self-timed adder (PASTA). The modified Booth algorithm reduces the number of partial products compared to a regular multiplication process, lowering delay. A carry save adder design is also proposed to further improve performance in terms of computation speed, power consumption, and area compared to a conventional design using the modified Booth algorithm. Simulation results show the proposed MAC design with PASTA has better performance and reduced area overhead and critical path delay compared to conventional methods.
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
This document describes several designs for high-speed matrix multiplication. It proposes a new design called Parallel-Parallel Input Multi-Output (PPI-MO) that uses multiple multipliers and registers to perform matrix multiplication in parallel. This increases throughput. For an n×n matrix multiplication, it uses n2 multipliers, n2 registers, and n(n-2) adders. Elements of the first matrix are input to rows of multipliers simultaneously, while elements of the second matrix are input to columns. The partial products from each column are added to calculate elements of the output matrix.
Run time dynamic partial reconfiguration using microblaze soft core processor...eSAT Journals
aydeshmukh@gmail.com
Abstract
DSP Application requires a fast computations & flexibility of the design. Partial Reconfiguration (PR) is an advanced technique,
which improves the flexibility of FPGAs by allowing portions of a design to be reconfigured at runtime by overwriting parts of the
configuration memory. In this paper we are using microblaze soft core processor & ICAP Port to reconfigure the FPGA at runtime.
ICAP is accessed through a light-weight custom IP which requires bit stream length, go, and done signal to interface to a system that
provides partial bit stream data. The partial bit stream is provided by the processor system by reading the partial bit files from the
compact flash card. Our targeted DSP application is matrix multiplication; we are reconfiguring design by changing partial modules
at run time. To change the partial bit stream we interfaces a microblaze Soft processor & using a UART interface.ISE13.1 &
PlanAhead is used for partial reconfiguration of FPGA .EDK is used for microblaze soft processor design & ICAP Interface .The
simulation is done using Chip Scope Logic Analyzer & the complete hardware implementation is done on Xilinx VIRTEX -6 ML605
Platform.
Keywords — PlanAhead, EDK, Dynamic partial reconfiguration, ICAP, Matrix multiplication, Chipscope pro analysis,
DSP application, Microblaze processor
Coarse grained hybrid reconfigurable architecture with no c routerDhiraj Chaudhary
The document describes a coarse-grained reconfigurable architecture with a Network-on-Chip router that is designed to perform variable block size motion estimation for video compression. The architecture uses intelligent NoC routers to direct reference block data between processing elements, reducing data interactions with external memory and decreasing execution time. The paper also proposes two enhancements that reduce the architecture's area by 4.8% and router power consumption by 42%.
This document describes a coarse-grained reconfigurable architecture with a Network-on-Chip (NoC) router designed for variable block size motion estimation. The architecture contains 16 processing elements arranged in a 2D array that can calculate Sum of Absolute Differences (SAD) for different block sizes. An NoC with intelligent routers is used to direct reference block data between processing elements to reduce memory interactions and improve execution time. The architecture supports fast search algorithms like diamond search that further improve efficiency over full search.
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...Dhiraj Chaudhary
This document describes a coarse-grained reconfigurable architecture with a Network-on-Chip (NoC) router designed for variable block size motion estimation. The architecture contains 16 processing elements arranged in a 2D array that can calculate Sum of Absolute Differences (SAD) for different block sizes. An NoC with intelligent routers is used to direct reference block data between processing elements to reduce memory interactions and increase computation efficiency. The architecture supports fast search algorithms like diamond search that further improve performance over full search.
Coarse grained hybrid reconfigurable architecture with noc router for variabl...Dhiraj Chaudhary
The document describes a coarse-grained reconfigurable architecture with a Network-on-Chip router that is designed to perform variable block size motion estimation for video compression. The architecture uses intelligent NoC routers to direct reference block data between processing elements, reducing data interactions with external memory and decreasing execution time. The paper also proposes two enhancements that reduce the architecture's area by 4.8% and router power consumption by 42%.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMVLSICS Design
This paper is devoted to the design of dual core crypto processor for executing both Prime field and binary field instructions. The proposed design is specifically optimized for Field programmable gate array (FPGA) platform. Combination of two different field (prime field GF(p) and Binary field GF(2m)) instructions execution is analysed.The design is implemented in Spartan 3E and virtex5. Both the performance results are compared. The implementation result shows the execution of parallelism using dual field instructions
Designing of Adders and Vedic Multiplier using Gate Diffusion InputIRJET Journal
This document discusses the design of adders and Vedic multipliers using Gate Diffusion Input (GDI) logic. GDI logic is introduced as an alternative to traditional CMOS logic for low power VLSI design. Various adders including Ripple Carry Adder, Kogge Stone Adder, and Brent Kung Adder are designed using GDI logic. Vedic multipliers are also designed using the Urdhva-Tiryagbhyam multiplication technique along with the GDI-based adders. Comparative analysis shows that designs using GDI logic have lower power consumption, transistor count, and area compared to equivalent CMOS designs.
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET Journal
This document describes hardware co-simulation of classical edge detection algorithms using Xilinx System Generator. It presents designs for Robert, Prewitt, and Sobel edge detection operators using minimum FPGA resources. The designs were tested on a Spartan-3E FPGA board using JTAG hardware co-simulation. Results show the Sobel operator provides the best edge detection, followed by Prewitt and Robert. Compared to prior work, the proposed designs utilize significantly fewer FPGA slices, flip flops, and lookup tables, demonstrating more efficient implementation of the classical edge detection algorithms.
Design of efficient reversible floating-point arithmetic unit on field progr...IJECEIAES
The reversible logic gates are used to improve the power dissipation in modern computer applications. The floating-point numbers with reversible features are added advantage to performing complex algorithms with highperformance computations. This manuscript implements an efficient reversible floating-point arithmetic (RFPA) unit, and its performance metrics are realized in detail. The RFP adder/subtractor (A/S), RFP multiplier, and RFP divider units are designed as a part of the RFP arithmetic unit. The RFPA unit is designed by considering basic reversible gates. The mantissa part of the RFP multiplier is created using a 24x24 Wallace tree multiplier. In contrast, the reciprocal unit of the RFP divider is designed using Newton Raphson’s method. The RFPA unit and its submodules are executed in parallel by utilizing one clock cycle individually. The RFPA unit and its submodules are synthesized separately on the Vivado IDE environment and obtained the implementation results on Artix-7 field programmable gate array (FPGA). The RFPA unit utilizes only 18.44% slice look-up tables (LUTs) by consuming the 0.891 W total power on Artix-7 FPGA. The RFPA unit sub-models are compared with existing approaches with better performance metrics and chip resource utilization improvements.
Similar to EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA (20)
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELijaia
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Transcat
Join us for this solutions-based webinar on the tools and techniques for commissioning and maintaining PV Systems. In this session, we'll review the process of building and maintaining a solar array, starting with installation and commissioning, then reviewing operations and maintenance of the system. This course will review insulation resistance testing, I-V curve testing, earth-bond continuity, ground resistance testing, performance tests, visual inspections, ground and arc fault testing procedures, and power quality analysis.
Fluke Solar Application Specialist Will White is presenting on this engaging topic:
Will has worked in the renewable energy industry since 2005, first as an installer for a small east coast solar integrator before adding sales, design, and project management to his skillset. In 2022, Will joined Fluke as a solar application specialist, where he supports their renewable energy testing equipment like IV-curve tracers, electrical meters, and thermal imaging cameras. Experienced in wind power, solar thermal, energy storage, and all scales of PV, Will has primarily focused on residential and small commercial systems. He is passionate about implementing high-quality, code-compliant installation techniques.
Accident detection system project report.pdfKamal Acharya
The Rapid growth of technology and infrastructure has made our lives easier. The
advent of technology has also increased the traffic hazards and the road accidents take place
frequently which causes huge loss of life and property because of the poor emergency facilities.
Many lives could have been saved if emergency service could get accident information and
reach in time. Our project will provide an optimum solution to this draw back. A piezo electric
sensor can be used as a crash or rollover detector of the vehicle during and after a crash. With
signals from a piezo electric sensor, a severe accident can be recognized. According to this
project when a vehicle meets with an accident immediately piezo electric sensor will detect the
signal or if a car rolls over. Then with the help of GSM module and GPS module, the location
will be sent to the emergency contact. Then after conforming the location necessary action will
be taken. If the person meets with a small accident or if there is no serious threat to anyone’s
life, then the alert message can be terminated by the driver by a switch provided in order to
avoid wasting the valuable time of the medical rescue team.
Open Channel Flow: fluid flow with a free surfaceIndrajeet sahu
Open Channel Flow: This topic focuses on fluid flow with a free surface, such as in rivers, canals, and drainage ditches. Key concepts include the classification of flow types (steady vs. unsteady, uniform vs. non-uniform), hydraulic radius, flow resistance, Manning's equation, critical flow conditions, and energy and momentum principles. It also covers flow measurement techniques, gradually varied flow analysis, and the design of open channels. Understanding these principles is vital for effective water resource management and engineering applications.
Build the Next Generation of Apps with the Einstein 1 Platform.
Rejoignez Philippe Ozil pour une session de workshops qui vous guidera à travers les détails de la plateforme Einstein 1, l'importance des données pour la création d'applications d'intelligence artificielle et les différents outils et technologies que Salesforce propose pour vous apporter tous les bénéfices de l'IA.
Generative AI Use cases applications solutions and implementation.pdfmahaffeycheryld
Generative AI solutions encompass a range of capabilities from content creation to complex problem-solving across industries. Implementing generative AI involves identifying specific business needs, developing tailored AI models using techniques like GANs and VAEs, and integrating these models into existing workflows. Data quality and continuous model refinement are crucial for effective implementation. Businesses must also consider ethical implications and ensure transparency in AI decision-making. Generative AI's implementation aims to enhance efficiency, creativity, and innovation by leveraging autonomous generation and sophisticated learning algorithms to meet diverse business challenges.
https://www.leewayhertz.com/generative-ai-use-cases-and-applications/
Generative AI Use cases applications solutions and implementation.pdf
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
1. DOI : 10.5121/vlsic.2019.10201 1
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR
SAD COMPUTATION ON FPGA
Jaya Koshta, Kavita Khare and M.K Gupta
Maulana Azad National Institute of Technology, Bhopal
ABSTRACT
Video Compression is very essential to meet the technological demands such as low power, less memory
and fast transfer rate for different range of devices and for various multimedia applications. Video
compression is primarily achieved by Motion Estimation (ME) process in any video encoder which
contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric
in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung
Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder
scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42
% as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using
Virtex 7 FPGA.
KEYWORDS
HEVC, motion estimation, sum of absolute difference, parallel prefix adders, Brent Kung Adder.
1. INTRODUCTION
To meet the technological demands such as low power, less memory and fast transfer rate for a
wide range of applications including the growing demand of High Definition(HD)(1080p) to
Ultra-High Definition(UHD)(4K and 8K), resulted in creation of stronger needs for better video
compression efficiency. HEVC or H.265is a video compression standard designed to substantially
improve coding efficiency when compared to its precedent, the Advanced Video Coding (AVC)
or H.264.HEVC is a block-based video compression designed to support higher resolutions and it
can achieve 50% bit rate saving compared to H.264/MPEG-4 AVC for the same video quality
[1][2].This considerable increase in performance is due to the many enhanced techniques and
methodologies that have been introduced in H.265/HEVC. Some of these enhancements are
included in the motion estimation process, which is one the most complex and time consuming
block in video encoding. The objective of the ME unit is to find the best matched block in the
reference (past/future) frame search window (region of interest), for every block of the current
frame such that there constructed frame contributes to the lowest residual information [3].The
bottleneck for the ME system design lies in the implementation of an appropriate SAD
architecture. For block based ME, the Sum of Absolute Difference(SAD) is the generally used
metric which adds up the absolute differences between corresponding elements in a candidate and
reference block in video frames. However with the increase in the coding block size to 64x64 in
HEVC, compared to 16x16 in H.264/AVC, results in greater complexity in ME process. Thus in
HEVC for ME process, the number of SAD computations vastly increases resulting in increase in
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
2. 2
the processing delay, power consumption and hardware complexity. Thus, a pragmatic strategy to
elevate the performance of H.265/HEVC relies on enhancing the performance of the core
component of the motion estimation engine which is the block matching unit.
Different VLSI architectures for SAD computations are available in the literature where there is a
lot of tradeoffs between the speed, power and area during the hardware implementation. Due to
the increasing demand for the portable devices with low power consumption and high
performance, the circuits are optimized according to the applications. In this paper,
implementation of absolute difference circuit on FPGA is proposed to increase speed
performance and to minimize the amount of occupied resources on FPGA for SAD calculation.
2. RELATED WORK
SAD is one of the most popular techniques used for motion estimation in digital video encoding
systems. There are large number of methods available for SAD computation where there are
tradeoffs between speed, power and area during the hardware implementation. Typically, the
SAD computation consists of first computing the absolute difference between corresponding
pixels in current and reference video frame.
The calculation of SAD from the current and reference block is performed using equation (1),
SAD = -------------(1)
Where,
CB - Current Block,
RB –Reference Block,
N X M - block size of current and reference block,
i,j - two dimensional coordinates of block.
Generally, the SAD operation basically consists of first computing the absolute difference |CB(i,j)
– RB(i,j)| and then summing up these in a multi-input addition.
For absolute difference calculation, one method is to detect the smaller operand in the absolute
difference computation |CB−RB| and to subtract it from the larger operand [4],[5].The other
method comprises of complimenting the smaller of the two numbers and then performing
addition of two numbers followed by plus one to compute the absolute difference[6]. To compute
the absolute difference for SAD, in [7] a novel architecture is optimized for realizing efficient
absolute difference circuits in Virtex-5 FPGA devices which uses the 6-input look-up tables
available within the chosen devices family to maximize speed performance and to minimize the
amount of occupied resources. In [8] an improved architecture for efficiently computing the sum
of absolute differences (SAD) on FPGAs is proposed based on a configurable adder/subtractor
implementation in which each adder input can be negated at runtime. The SAD architecture
proposed in this paper provides a significant resource reduction on current FPGAs. In [9] FPGA
design for fast computing of the minimum SAD is proposed. The hardware unit proposed is
intended to augment a general-purpose core and with the use online arithmetic (OLA) it is
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
3. 3
possible to implement a full 16 X 16 macroblock SAD in a single FPGA device and it permits to
speed up the computation by early truncation of the SAD calculation when the involved candidate
is bigger than the current reference SAD. In [10] pipelined SAD architecture for efficient SAD
calculations is proposed where consecutive pipeline stages performs the addition of absolute
differences to obtain SAD of 8 X 8 block. In [11] Different architectures for binary addition were
proposed for SAD computation on FPGA
Table 1 summarises the above mentioned related work on SAD computation. Review work
motivates for a proposal of high speed compact AD circuits on FPGA for SAD calculation. In this
paper, AD circuit on FPGA is proposed to increase speed performance and to minimize the
amount of occupied resources on FPGA for SAD calculation.
Table 1: Related work on SAD computation
3. PROPOSED ARCHITECTURE
Hardware architecture for computing the absolute difference between corresponding pixels in
current and reference video block is proposed. The method used for AD calculation is as used in
[6] where adder and comparator form the basic component as shown in Figure 1.The 8-bit
comparator compares two numbers and returns the 1’s complement of the smaller number and the
larger number as it is.
8-bit Comparator
8-bit Adder
8-bit Adder
8B’1
Reg
Current Pixel Reference Pixel
Figure 1: AD circuit block diagram
Ref. Architecture Advantages Disadvantages
[7] 6-input look-up tables on
Virtex 5
High speed and compact
Optimized for only
Virtex 5
[8] Configurable adder/subtractor
on Virtex 6
Resource reduction on current
FPGAs
SAD 1X2
implemented
[9] Online arithmetic for 16 X 16
SAD on Virtex 2
High speed Large Area
[10]
Pipelined SAD architecture High speed and low power Large area
[11] Different architectures for
binary addition
Low power
Implemented on
FPGA Spartan
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
4. 4
Proposed Architecture: In this architecture:
Ripple Carry Adders in AD circuit (Figure 1) are replaced by Brent Kung Adder (Type of
Parallel Prefix Adders-PPA).
8-bit conventional comparator in Figure 1 is replaced by comparator based on modified 1’s
complement principle and conditional sum adder scheme.
These replacements results in reduction in delay and reduces LUTs count used in the circuit.
PPA:
PPA (Parallel Prefix Adder) circuits use a tree network to reduce the latency to O (log2n)where
‘n’ represents the number of bits.PPA employs 3-structural stages as shown in Figure 2. The first
stage at the top is used for computing generate and propagate signals as given in equation (2) and
(3)exactly as in Carry Look ahead Adder (CLA) where a and b represents binary digits.
g=a.b--------------------------------------------(2)
p=a⊕b------------------------------------------(3)
The carry bits are calculated in the second stage. In this stage, to formulate the operation of the
prefix adders, “prefix operator” which is represented as “.” is used. The “prefix operator” function
has two essential properties to keep the computational operation faster. The first one is called the
associative property and the second one is idempotency property. Using the advantages of these
properties, carry-out can be found at a depth proportional to log2(n) [12]. The final summation is
obtained in the last stage.The associative and idempotency properties of "." operator allow carry
output to be computed in a different number of levels or simply depth. Therefore, various
topologies of prefix adders can be designed which are mentioned in the literature and are
inspiring to VLSI designers because of their minimum depth and delay.
The logical structures used in prefix adders scheme consists of black cell, gray cell and white cell
as defined in Figure 3[13].
Calculation of the carries
This part is parallelizable to reduce time
Pre-calculation of pi ,gi terms
Simple adder to generate the sum
Prefix graphs can
be used to describe
the structure that
performs this part
Straight forward
as in the CLA
adder
Figure 2: Parallel Prefix Adder stages
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
5. 5
The logical structure of black cell can propagate and generate signals while the logical structure
of gray cell can only generate signals. The white cell (buffer)is used for loading the signal out. In
parallel prefix scheme, generate and propagate signals can be grouped in multiple ways to get the
same correct carry signals. Based on different methodologies of grouping these signals, different
prefix architectures can be created. The parallel form of obtaining the carry bit makes PPAs
perform addition arithmetic faster. There are many types of PPA such as Brent Kung, Kogge
Stone, Ladner Fisher, Hans Carlson and Knowles[12].
i : k i : k
i : j i : j
k-1: j k-1: j i : j
Black cell Gray cell Buffer
i : j
p k-1:j
g i:k
g i:j
p i:k
g k-1:j
p i:j
g i:k
p i:k
g k-i:j
g i:j
g i:j g i:j
p i:jp i:j
Both propagate & generate Generate only Different load
Figure 3 Cell definitions for parallel-prefix scheme [10]
Out of many types of PPA, Brent Kung Adder(BKA)is used as for 8-bit implementation it has
less delay compared to other types of PPA[12][13].
In BKA propagate signals and generate signals are combined into groups of two by using the
associative property. The first stage in BKA includes computation of generate and propagate
signals corresponding to each pair of bits in a and bas given by equation (2) and (3). The second
stage ie. prefix carry tree includes computation of carries corresponding to each bit. Equations (4)
and (5)below shows how propagate and generate signals are calculated in BKA,
pi:j = pi:k. pk-1:j--------------------------------------(4)
gi:j = gi:k + (pi:k. gk-1:j)-----------------------------(5)
These propagate and generate signals given in equation (4) and (5) are calculated using black cell,
gray cell and white cell as defined in Figure 3. The last stage ie.post processing stage includes
computation of sum bits which is given by the equation (6).
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
6. 6
si = pi⊕ci------------------------------------------(6)
7 6 5 4 3 2 1 0
Stage 1
Stage 2
Stage 3
Stage 4
Stage 5
7 : 0 6 : 0 5 : 0 4 : 0 3 : 0 2 : 0 1 : 0 0 : 0
Input
Output
c 3 : 2 = g 3 : 2 + p 3 : 3 g 2 : 2
c 4 : 0 = g 4: 4 + p 4 : 4 g 3 : 0
Figure 4 : 8-bit Brent-Kung adder(BKA) tree [13]
Figure 4 shows 8-bit BKA tree adder which has lower delay as compared to Ripple Carry
Adder(RCA).Also, apart from using BKA, a new efficient comparator based on modified 1’s
complement principle and conditional sum adder scheme is used in this architecture as discussed
below.
Modified Comparator:
The proposed architecture apart from using BKA, also uses comparator (based on modified 1’s
complement principle and conditional sum adder scheme). In this modified 1's complement
scheme, if A>B, bit Cout = 1 and if A≤B bit Cout =0 so the only concern is about carry out bit
information as shown in Figure 5.This method always adds a fixed carry after modification, so if
A ≥ B, bit Comp = 1 and if A < B, bit Comp = 0 [15].For realization of this comparator
conditional sum adder scheme has been used that provides a logarithmic increase in speed for
addition[16].
The principle behind this scheme is to generate two sets of outputs for a given group of k bits
operands. Each set includes k sum bits and an outgoing carry. The one set assumes that the
eventual incoming carry will be zero, while the other assumes that it will be one. Depending upon
the incoming carry correct set of outputs (out of the two sets) is selected without waiting for the
carry to further propagate through the k positions as shown in Figure 6.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
7. 7
Y = A – A = 0 1 0 1 0 1 0 0 - 0 1 0 1 0 1 0 0 = (84 = 84)
0 1 0 1 0 1 0 0 = A
1 0 1 0 1 0 1 1 = 1's compliment of A
Z = B – A = 0 1 0 0 0 0 1 1 - 0 1 0 1 0 1 0 0 = (67 < 84)
0 1 0 0 0 0 1 1 = B
1 0 1 0 1 0 1 1 = 1's compliment of A
X = A - B = 0 1 0 1 0 1 0 0 - 0 1 0 0 0 0 1 1 = (84 > 67)
0 1 0 1 0 1 0 0 = A
1 0 1 1 1 1 0 0 = 1's compliment of B
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1Cout 0
+ 1 = Fixed carry-in bit
1Comp
1 1 1 1 1 1 1 1Cout 0
+ 1 = Fixed carry-in bit
0Comp 1 1 1 0 1 1 1 1
0 0 0 1 0 0 0 0
A = 0 1 0 1 0 1 0 0 = 84
B = 0 1 0 0 0 0 1 1 = 67
Cout 1
+ 1 = Fixed carry-in bit
1Comp 0 0 0 1 0 0 0 0
Status bit Cout Comp
A>B 1 1
A=B 0 1
A<B 0 0
Figure 5 Modified l's complement method for improved comparator design
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
8. 8
k – bit adder
k – bit adder
Multiplexer
1
CinCout
0
k k
k
Figure 6 Conditional sum adder scheme
However, this scheme is generally not applied to long n-bits operand at the beginning of the add
operation, since it will add delay as the carry propagates through all n positions before making the
selection for correct output. Generally, the given n bits are divided into smaller groups to apply
this conditional adder scheme separately. Thus, the serial carry-propagation inside the separate
groups can be done in parallel, reducing the overall execution time. The outputs of the subgroups
are then combined to generate the output of the final output.
A7
B7
A6
B6
A5
B5
A4
B4
A3
B3
A2
B2
B1
A1
B0
A0
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
Comp = 1 ,A>= B,
= 0, A<B
Comp
Stage 0 Stage 1 Stage 2 Stage 3
Figure 7 Modified comparator architecture
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
9. 9
Figure 7 shows comparator architecture using modified 1’s complement and conditional sum
adder design. The 8-bit comparator needs 11(=1+3+7) 2-to-1 multiplexers and eight inverters to
generate complementary values of input B. Originally Carry = AB + AC + BC = AB + (A + B) C
and if C= 0 then Carry =AB or if C=l then Carry = AB + (A+B) =A + B. The sum of MUX gates
of N-bit comparator is,
-----------------------(7)
This architecture reduces delay and also provides significant reduction in resource utilization in
FPGA.
4. SIMULATION AND RESULTS
Simulations of the conventional and proposed architecture for AD circuit implementation have
been carried out using Verilog HDL programming in Xilinx ISE 14.2 platform and implemented
on Virtex7 FPGA. Adequate testing of each design was done to verify correct operation. Figure 8
shows the simulation result of proposed AD circuit.
Figure 8 Simulation result of Absolute Difference Circuit
Figure 9 Graphical representation of delays in proposed architecture with conventional architecture
0
1
2
3
4
5
6
7
Logic
delay(ns)
Routing
delay(ns)
Total
Delay(ns)
Conventional
Proposed
Architecture
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
10. 10
Figure 10 Implementation of proposed AD architecture on FPGA
Architectu
re
Logic
delay(ns)
Routing
delay(ns)
Totalde
lay(ns)
Numbe
r of
slice
LUTs
Convention
al
0.484
(8.0%
logic)
5.673
(92.0%rout
e)
6.157 50
Proposed
Archiecture
0.391
(7.5%
logic)
4.849
(92.5%
route)
5.240 29
Table 2 Comparison of proposed AD architecture architecture with conventional architecture
Figure 9 shows graphical representation of delays in proposed architecture with conventional
architecture and Figure 10 shows implementation and area occupied of proposed AD architecture
on FPGA. Table 2 shows comparison of proposed AD circuit architecture in terms of time delay
and area. Referring to the above table, it can be seen that there is reduction in delay and number
of occupied resources on FPGA in the proposed architecture for AD circuit resulting in improved
performance for SAD computation in motion estimation.
5. CONCLUSIONS
VLSI architecture for SAD computations is proposed in this paper for reducing delay and area
and is implemented on Virtex7 FPGA. The proposed architecture reduces delay and provides
significant reduction in resource utilization in FPGA. Synthesis results shows that proposed
architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to
conventional architecture.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
11. 11
REFERENCES:
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video
coding (HEVC)standard,” IEEE Trans. Circuits Syst. Video Technol., vol.22, no. 12, pp. 1649-1668,
December 2012.
[2] I. Richardson, “HEVC: An introduction to high efficiency video coding,” 2001,
https://www.vcodex.com/h265.html
[3] N. Purnachand, L. N. Alves, and A. Navarro, “Fast Motion Estimation Algorithm for HEVC”, IEEE
International Conference on Consumer Electronics-Berlin (ICCE-Berlin), September 2012.
[4] S.Wong, B. Stougie, and S. Cotofana, “Alternatives in FPGA-based SAD Implementations,” in IEEE
International Conference on Field-Programmable Technology (FPT),IEEE, 2002, pp. 449–452.
[5] S. Vassiliadis, E. A. Hakkennes, J. S. S. M. Wong, and G.G.Pechanek, “The Sum-Absolute-
Difference Motion Estimation Accelerator,” in Euromicro Conference, 1998. Proceedings, 24th.
IEEE, 1998, pp. 559–566.
[6] Ahmed Medhat, Ahmed Shalaby, Mohammed S. Sayed, Maha Elsabrouty and Farhad Mehdipour. “A
Highly Parallel SAD Architecture for Motion Estimation in HEVC Encoder”, Circuits and Systems
(APCCAS), IEEE Asia Pacific Conference, 280 - 283, 2014.
[7] Stefania Perri , Paolo Zicari , Pasquale Corsonello, “Efficient Absolute Difference Circuits in Virtex-
5 FPGAs”, IEEE, 2010.
[8] Martin Kumm, Marco Kleinlein and Peter Zipf, “Efficient Sum of Absolute Difference Computation
on FPGAs”,26thInternational Conference on Field Programmable Logic and Applications.
[9] Joaquin Olivares, Ignacio Benavides and et.al., “Minimum Sum of Absolute Differences
implementation in a single FPGA device”, Dept. of Electro-technics and Electronics, University of
Cordoba, Spain.
[10] P.Jayakrishnan and Harish. M. Kittur, “Pipelined Arch.for Motion Estimation in HEVC Video
Coding”, Indian Journal of Science and Tech,August 2016.
[11] D. V Manjunatha, Pradeep Kumar and R. Karthik, “FPGA Implementation of Sum of Absolute
Difference (SAD) for video applications”, ARPN Journal of Engineering and Applied Sciences, Vol.
12, No. 24, December 2017
[12] Geeta Rani and Sachin Kumar, “Delay analysis of parallel-prefix adders”, International Journal of
Science and Research (IJSR), 3(6):2339-2342, 2014.
[13] Nurdiani Zamhari, Peter Voon, Kuryati Kipli, Kho Lee Chin, Maimun Huja Husin,“Comparison of
Parallel Prefix Adder (PPA)”, Proceedings of the World Congress on Engineering 2012,Vol II.
[14] R. P Brent & H. T. Kung, “A Regular Layout for Parallel Adders”, IEEE Trans. Computers, Vol. C-
31, pp 260-264, 1982.
[15] Shun-Wen Cheng, “A High-Speed Magnitude Comparator with Small Transistor Count”, in
Proceedings of IEEE internationalconference ICECS, 1168 - 1171 Vol.3, Dec 2003.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
12. 12
[16] J.Sklansky, “Conditional-Sum Addition Logic,” IRE Transactions on Electronic Computers, Vol. EC-
9, No. 2, pp. 226-231, June,1960
[17] S. Rehman; R. Young;C. Chatwin;P. Birch, “An FPGA Based Generic Framework for High Speed
Sum of Absolute Difference Implementation”, Europ. Jour Scient. Res., vol.33, no.1, 2009.
[18] Manjunatha, D. V., and G. Sainarayanan. “Low-Power Sum of Absolute Difference Architecture for
Video Coding”, Emerging Research in Electronics, Computer Science and Technology. Springer
India, 2014. 335-341.
[19] LiYufei,Feng Xiubo and Wang Q in, “A High-Performance Low Cost SAD Architecture for Video
Coding”, IEEE Transactions on Consumer Electronics, pp. 535-541, Vol. 53, No. 2, May 2007.
[20] Jarno, Vanne, Eero Aho, Timo D. Hamalainen and Kimmo Kuusilinna, “A High-Performance Sum of
Absolute Difference Implementation Motion Estimation”, IEEE Transactions on Circuits and Systems
for Video Technology, pp. 876-883, Vol. 16, No. 7, 2006.
[21] Elhamzi W., Dubois J., Miteran J “An efficient low-cost FPGA implementation of a configurable
motion estimation for H.264 video coding”, Springer Journal of Real-Time Processing,Vol:9, No:1,
pp. 19–30,2014.
[22] Moorthy T., Ye A, “A scalable architecture for variable block size motion estimation on field-
programmable gate arrays” , IEEE Canadian Conference of Electrical and Computer Engineering
(CCECE),Niagara Falls, May, pp.1303–1308,2008.
[23] Davis P., Sangeetha M. ,“Implementation of Motion Estimation Algorithm for H.265/HEVC”,
International Journal of Advanced Research in Electrical, Electronics
and Instrumentation Engineering. Vol:3,No:3, pp. 122–126,2014.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019