Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
IRJET- Flexible DSP Accelerator Architecture using Carry Lookahead TreeIRJET Journal
This document presents a flexible DSP accelerator architecture using a carry lookahead tree. It aims to improve on existing flexible accelerator designs by enabling computations to be efficiently performed using carry lookahead formatted data. The proposed architecture contains flexible computational units that can efficiently execute a wide range of DSP operation patterns. It differs from prior work by using a carry lookahead tree instead of a carry save tree. Simulation results show the proposed design achieves faster execution and a larger reduction in delay time compared to existing flexible accelerator approaches.
A Dynamic Programming Approach to Energy-Efficient Scheduling on Multi-FPGA b...TELKOMNIKA JOURNAL
This paper has been studied an important issue of energy-efficient scheduling on multi-FPGA systems. The main challenges are integral allocation, reconfiguration overhead and exclusiveness and energy minimization with deadline constraint. To tackle these challenges, based on the theory of dynamic programming, we have designed and implemented an energy-efficient scheduling on multi-FPGA systems. Differently, we have presented a MLPF algorithm for task placement on FPGAs. Finally, the experimental results have demonstrated that the proposed algorithm can successfully accommodate all tasks without violation of the deadline constraint. Additionally, it gains higher energy reduction 13.3% and 26.3% than that of Particle Swarm Optimization and fully balanced algorithm, respectively.
128 bit low power and area efficient carry select adder amit bakshi academiagopi448
The document describes a 128-bit low power and area efficient carry select adder design. It proposes using a binary to excess-1 converter instead of a ripple carry adder for the carry select adder, which can reduce area and power consumption. Simulation results show the modified 128-bit carry select adder design achieves a 15.48% reduction in area and 7.41% reduction in power compared to a regular carry select adder design.
IRJET- A Review of Approximate Adders for Energy-Efficient Digital Signal Pro...IRJET Journal
The document reviews recent progress in approximate adders for energy-efficient digital signal processing. It summarizes various types of approximate adders that have been proposed, including speculative adders, segmented adders, carry select adders, and adders using approximate full adders. The document provides details on the design and operation of several specific approximate adder circuits. It also compares the delay and area complexity of different approximate adder designs.
A Review of Different Methods for Booth MultiplierIJERA Editor
In this review paper, different type of implementation of Booth multiplier has been studied. Multipliers has great importance in digital signal processor, so designing a high-speed multiplier is the need of the hour. Advantages of using modified booth multiplier algorithm is that the number of partial product is reduced. Different types of addition algorithms are also discussed which are used for addition operation of multiplier.
IRJET - A Speculative Approximate Adder for Error Recovery UnitIRJET Journal
This document presents a speculative approximate adder for error recovery. The adder is partitioned into non-overlapping blocks whose carries are predicted based on input signals of the current and next blocks. This reduces the critical path delay on average to one block. An error recovery unit is also proposed to further reduce output error rates. Simulation results show the proposed adder achieves lower error rates and error metrics compared to state-of-the-art approximate adders, with only a small increase in delay and area due to the error recovery unit.
This document presents a design for a simple accuracy reconfigurable adder (SARA) to improve the accuracy-power-delay efficiency of approximate adders. SARA is proposed as an alternative to existing accuracy configurable adders that require large areas due to redundancy and error correction circuitry. SARA uses a simple carry prediction approach without redundancy. It is evaluated as part of a 16-bit adder and shown to reduce delay compared to a ripple carry adder. SARA is also applied in an 8x8 Wallace multiplier, demonstrating reduced delay compared to using a ripple carry adder. The proposed SARA design achieves improvements in area, power, and delay efficiency for approximate arithmetic circuits.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
IRJET- Flexible DSP Accelerator Architecture using Carry Lookahead TreeIRJET Journal
This document presents a flexible DSP accelerator architecture using a carry lookahead tree. It aims to improve on existing flexible accelerator designs by enabling computations to be efficiently performed using carry lookahead formatted data. The proposed architecture contains flexible computational units that can efficiently execute a wide range of DSP operation patterns. It differs from prior work by using a carry lookahead tree instead of a carry save tree. Simulation results show the proposed design achieves faster execution and a larger reduction in delay time compared to existing flexible accelerator approaches.
A Dynamic Programming Approach to Energy-Efficient Scheduling on Multi-FPGA b...TELKOMNIKA JOURNAL
This paper has been studied an important issue of energy-efficient scheduling on multi-FPGA systems. The main challenges are integral allocation, reconfiguration overhead and exclusiveness and energy minimization with deadline constraint. To tackle these challenges, based on the theory of dynamic programming, we have designed and implemented an energy-efficient scheduling on multi-FPGA systems. Differently, we have presented a MLPF algorithm for task placement on FPGAs. Finally, the experimental results have demonstrated that the proposed algorithm can successfully accommodate all tasks without violation of the deadline constraint. Additionally, it gains higher energy reduction 13.3% and 26.3% than that of Particle Swarm Optimization and fully balanced algorithm, respectively.
128 bit low power and area efficient carry select adder amit bakshi academiagopi448
The document describes a 128-bit low power and area efficient carry select adder design. It proposes using a binary to excess-1 converter instead of a ripple carry adder for the carry select adder, which can reduce area and power consumption. Simulation results show the modified 128-bit carry select adder design achieves a 15.48% reduction in area and 7.41% reduction in power compared to a regular carry select adder design.
IRJET- A Review of Approximate Adders for Energy-Efficient Digital Signal Pro...IRJET Journal
The document reviews recent progress in approximate adders for energy-efficient digital signal processing. It summarizes various types of approximate adders that have been proposed, including speculative adders, segmented adders, carry select adders, and adders using approximate full adders. The document provides details on the design and operation of several specific approximate adder circuits. It also compares the delay and area complexity of different approximate adder designs.
A Review of Different Methods for Booth MultiplierIJERA Editor
In this review paper, different type of implementation of Booth multiplier has been studied. Multipliers has great importance in digital signal processor, so designing a high-speed multiplier is the need of the hour. Advantages of using modified booth multiplier algorithm is that the number of partial product is reduced. Different types of addition algorithms are also discussed which are used for addition operation of multiplier.
IRJET - A Speculative Approximate Adder for Error Recovery UnitIRJET Journal
This document presents a speculative approximate adder for error recovery. The adder is partitioned into non-overlapping blocks whose carries are predicted based on input signals of the current and next blocks. This reduces the critical path delay on average to one block. An error recovery unit is also proposed to further reduce output error rates. Simulation results show the proposed adder achieves lower error rates and error metrics compared to state-of-the-art approximate adders, with only a small increase in delay and area due to the error recovery unit.
This document presents a design for a simple accuracy reconfigurable adder (SARA) to improve the accuracy-power-delay efficiency of approximate adders. SARA is proposed as an alternative to existing accuracy configurable adders that require large areas due to redundancy and error correction circuitry. SARA uses a simple carry prediction approach without redundancy. It is evaluated as part of a 16-bit adder and shown to reduce delay compared to a ripple carry adder. SARA is also applied in an 8x8 Wallace multiplier, demonstrating reduced delay compared to using a ripple carry adder. The proposed SARA design achieves improvements in area, power, and delay efficiency for approximate arithmetic circuits.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
Design and implementation of Parallel Prefix Adders using FPGAsIOSR Journals
Abstract: Adders are known to have the frequently used in VLSI designs. In digital design we have half adder and full adder, main adders by using these adders we can implement ripple carry adders. RCA use to perform any number of addition. In this RCA is serial adder and it has commutation delay problem. If increase the ha&fa simultaneously delay also increase. That’s why we go for parallel adders(parallel prefix adders). IN the parallel prefix adder are ks adder(kogge-stone),sks adder(sparse kogge-stone),spaning tree and brentkung adder. These adders are designd and implemented on FPGA sparton3E kit. Simulated and synthesis by model sim6.4b, Xilinx ise10.1.
This paper introduces two architectures for modulo 2n+1 adders. The first architecture is based on a sparse carry computation unit that computes only some carries, enabled by a new inverted circular idempotency property. This sparse approach reduces area and power compared to prior proposals while maintaining speed. The second architecture unifies the design of modulo 2n+1 and 2n-1 adders by showing 2n+1 addition can be treated as a subcase of 2n-1 addition with minor extra logic.
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
This document describes a proposed area efficient mixed decimation Multipath Delay Feedback (MDF) architecture for radix parallel Fast Fourier Transform (FFT) computation. The proposed architecture aims to improve efficiency in utilization of arithmetic resources in MDF architectures by integrating Decimation-In-Time (DIT) operations into Decimation-In-Frequency (DIF) operated computing units. This activates idle periods of arithmetic units in MDF architectures. The proposed architecture, called a mixed decimation MDF or DF architecture, is compared theoretically and experimentally to conventional MDF architectures. Results show the proposed DF architecture achieves improved efficiency in consumption of arithmetic resources.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
Distributed Video Coding (DVC) has become increasingly popular in recent times among the researchers in video coding due to its attractive and promising features. DVC primarily has a modified complexity balance between the encoder and decoder, in contrast to conventional video codecs. However, Most of the reported DVC schemes have a high time-delay in decoder which hinders its practical application in real-time systems. In this work, we focus on speed up the Side Information(SI) generation module in DVC, which is a major function in the DVC coding algorithm and one of the time-consuming factor at the decoder. By applied it through Compute Unified Device Architecture (CUDA) based on General-Purpose Graphics Processing Unit (GPGPU), the experimental results show that a considerable speedup can be obtained by using the proposed parallelized SI generation algorithm.
Multipliers play an important role in today’s digital signal processing (DSP) and various other
applications. Multiplication is the most time consuming process in various signal processing operations like
convolution, circular convolution, auto-correlation and cross-correlation. With advances in technology, many
researchers have tried and are trying to design multipliers which offer either of the following- high speed, low
power consumption, regularity of layout and hence less area or even combination of them in multiplier. However
area and speed are two conflicting constraints. So improving speed results always in larger areas. So here we try
to find out the best trade off solution among the both of them. To have features like high speed and low power
consumption multipliers several algorithms have been introduced .In this paper, we describes Multipliers by using
various algorithm in VLSI technology.
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
This document proposes a flexible accelerator architecture that exploits carry-save arithmetic to efficiently implement digital signal processing kernels. The architecture includes flexible computational units that can be configured to perform chained addition, multiplication, and addition operations directly on carry-save formatted data without intermediate conversions. Experimental results show the proposed architecture delivers average gains of 61.91% in area-delay product and 54.43% in energy consumption compared to state-of-the-art flexible data paths.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Presented approaches for generation of multiple clock gating domain parameterized PVT independent power abstracts for large IP blocks. We accomplish the gating domain parameterization through separation of the attribution of switching due to each single domain through a marking and tracing process, thereby precluding the need for separate domain by domain simulation to achieve the parameterization.
Experimental results comparing proposed approach on IP blocks of varying sizes from a real industry strength microprocessor design clearly highlight accuracy impact while keeping run time and model size increase in an acceptable range. In terms of extensions, we are exploring approaches where we could preserve each of the domains independently, for which we are looking into formulations based on constructing clock gating domain conflict hyper graphs and coloring them to determine domain interactions.
The document summarizes a research paper on the design of high-speed low-power Viterbi decoders for trellis coded modulation systems. It proposes a pre-computation architecture incorporated with the T-algorithm to effectively reduce power consumption without degrading decoding speed. The key aspects are:
1) A pre-computation approach is presented to calculate the optimal path metric in advance, allowing it to be determined faster and avoiding delays in the feedback loop.
2) A method is provided to systematically determine the optimal number of pre-computation steps needed to achieve the theoretical iteration bound for the Viterbi decoder.
3) Implementation results show the proposed pre-computation architecture can reduce power consumption by up to 70% without loss
This document summarizes a research paper on designing low power digit serial finite impulse response (FIR) filters using multiple constant multiplication (MCM) techniques. It proposes an architecture that optimizes the area of digit serial MCM operations at the gate level by considering the implementation costs of digit serial addition, subtraction and shift operations. An algorithm is presented to design digit serial FIR filters under a shift-adds architecture to reduce area compared to designs using generic digit serial multipliers. Experimental results show the technique leads to lower complexity digit serial MCM designs.
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...IRJET Journal
This document proposes modifying the standard AMBA bus architecture to support simultaneous read and write operations to memory, which is needed for some applications but not supported in the original AMBA design. It describes designing a high-speed AMBA bus architecture for image transmission and face detection using FPGA. The proposed design uses a finite state machine-based memory controller and pipeline data transfer to reduce wait states and improve system performance compared to existing AMBA designs. It aims to implement this modified AMBA architecture on FPGA for real-time image processing applications.
This document presents a new technique to reduce power consumption and increase speed in Content Addressable Memory (CAM) using memory partition and clock gating. The proposed CAM design partitions the memory into segments based on the most significant bits to check for matches. Since most words fail to match in their segments, the search can be discontinued for those segments, reducing power and increasing speed. Clock gating is also used to power gate unused portions of the CAM, further reducing static and dynamic power. Simulation and analysis using Quartus II and ModelSim show the proposed design reduces total power dissipation from 369.9mW to 46.3mW compared to an existing design.
This document describes the design of a 4-bit SFQ (Single Flux Quantum) multiplier circuit using a modified Booth encoding technique. It begins with background on multipliers and the advantages of the Booth encoding method over an AND array for reducing the number of partial products. The document then presents the design and analysis of the proposed 4-bit SFQ multiplier using a modified 2-bit Booth encoder, Carry Save Adder tree, and 6-bit carry lookahead adder. Simulation results show the modified Booth encoder approach reduces power consumption by 22.24% and delay by 23.96% compared to a conventional Booth encoder design.
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
This document discusses the performance evaluation of FPGA-based runtime dynamic partial reconfiguration for matrix multiplication. The author implements matrix multiplication using a Virtex-5 FPGA, with the ability to reconfigure modules for the multiplication at runtime. Simulation results show the multiplication of 2x2 and 4x4 matrices using this partial reconfiguration method. The results demonstrate that dynamic partial reconfiguration allows changing modules while keeping other parts of the design static, improving resource utilization and reducing power consumption compared to a fully static implementation.
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET Journal
This document presents a novel architecture for a high-speed inexact speculative adder using carry lookahead adder and Brent Kung adder. The proposed adder splits the critical path into shorter paths using fine-grain pipelining to improve speed. It uses a carry lookahead adder and Brent Kung adder in 4-bit blocks to speculate the carry and calculate sums, with error compensation. The design is implemented using a 5-stage pipeline to further reduce delay. Implementation in MAGIC shows the proposed adder achieves higher speed through optimized pipelining of key components in the speculative and compensation blocks.
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAIRJET Journal
This document discusses the implementation of a lightweight VLSI design for the HIGHT cipher on an FPGA. It begins with an introduction to lightweight VLSI architecture and its applications in low-resource devices. It then provides background on the HIGHT cipher and discusses prior work implementing cryptographic algorithms on FPGAs. The document goes on to describe the proposed VLSI design for the HIGHT cipher, which is optimized for size, power, and speed. It achieves a throughput of 25 Mbps with an encryption/decryption delay of 0.64 ms. Evaluation results demonstrate the effectiveness and suitability of the design for low-power applications.
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET Journal
This document presents a review of FPGA-based architectures for image capturing, processing, and display using a VGA monitor. It discusses using the Xilinx AccelDSP tool to develop the system on a Spartan 3E FPGA. The AccelDSP tool allows converting a MATLAB design into HDL for implementation on the FPGA. It summarizes the FPGA-based system architecture, which includes units for initialization, data transfer, image processing, and memory management. It then outlines the Xilinx AccelDSP design flow, which verifies the functionality at each stage of converting the floating-point MATLAB model to a fixed-point hardware implementation on the FPGA. The goal is to accelerate image processing applications using the parallel
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. Therefore, careful optimization of the adder is of the greatest importance. This optimization can be attained
in two levels; it can be circuit or logic optimization. In circuit optimization the size of transistors are manipulated, where as in logic optimization the Boolean equations are rearranged (or manipulated) to optimize speed, area and power consumption. This paper focuses the optimization of adder through technology independent mapping. The work presents 20 different logical construction of 1-bit adder cell in CMOS logic and its performance is analyzed in terms of transistor count, delay and power dissipation. These performance issues are analyzed through Tanner EDA with TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexer. This logic optimized multiplexer based adders are incorporated in selected existing adders like ripple carry
adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder and its performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was Xilinx ISE 12.1 Spartan3E XC3S500-5FG320. Each adder type was implemented with bit sizes of: 8, 16, 32, 64 bits. This variety of sizes will provide with more insight about the performance of each adder in terms of area and delay as a function of size.
Design and Implementation of Different types of Carry skip adderIRJET Journal
The document describes the design and implementation of different types of carry skip adders. It begins with an introduction to carry skip adders and their advantages over other adder types in terms of speed, area usage, and transistor count. It then reviews existing carry skip adder designs and their limitations. A new design called the Common Boolean Logic (CBL) carry skip adder is proposed that aims to reduce area and power consumption by eliminating redundant adder cells through shared logic. Simulation results show that an 8-bit CBL carry skip adder has 64.6% lower power and 18.7% smaller area than a conventional carry skip adder. In conclusion, the CBL carry skip adder achieves improved performance and efficiency.
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
This document describes several designs for high-speed matrix multiplication. It proposes a new design called Parallel-Parallel Input Multi-Output (PPI-MO) that uses multiple multipliers and registers to perform matrix multiplication in parallel. This increases throughput. For an n×n matrix multiplication, it uses n2 multipliers, n2 registers, and n(n-2) adders. Elements of the first matrix are input to rows of multipliers simultaneously, while elements of the second matrix are input to columns. The partial products from each column are added to calculate elements of the output matrix.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
Design and implementation of Parallel Prefix Adders using FPGAsIOSR Journals
Abstract: Adders are known to have the frequently used in VLSI designs. In digital design we have half adder and full adder, main adders by using these adders we can implement ripple carry adders. RCA use to perform any number of addition. In this RCA is serial adder and it has commutation delay problem. If increase the ha&fa simultaneously delay also increase. That’s why we go for parallel adders(parallel prefix adders). IN the parallel prefix adder are ks adder(kogge-stone),sks adder(sparse kogge-stone),spaning tree and brentkung adder. These adders are designd and implemented on FPGA sparton3E kit. Simulated and synthesis by model sim6.4b, Xilinx ise10.1.
This paper introduces two architectures for modulo 2n+1 adders. The first architecture is based on a sparse carry computation unit that computes only some carries, enabled by a new inverted circular idempotency property. This sparse approach reduces area and power compared to prior proposals while maintaining speed. The second architecture unifies the design of modulo 2n+1 and 2n-1 adders by showing 2n+1 addition can be treated as a subcase of 2n-1 addition with minor extra logic.
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
This document describes a proposed area efficient mixed decimation Multipath Delay Feedback (MDF) architecture for radix parallel Fast Fourier Transform (FFT) computation. The proposed architecture aims to improve efficiency in utilization of arithmetic resources in MDF architectures by integrating Decimation-In-Time (DIT) operations into Decimation-In-Frequency (DIF) operated computing units. This activates idle periods of arithmetic units in MDF architectures. The proposed architecture, called a mixed decimation MDF or DF architecture, is compared theoretically and experimentally to conventional MDF architectures. Results show the proposed DF architecture achieves improved efficiency in consumption of arithmetic resources.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
Distributed Video Coding (DVC) has become increasingly popular in recent times among the researchers in video coding due to its attractive and promising features. DVC primarily has a modified complexity balance between the encoder and decoder, in contrast to conventional video codecs. However, Most of the reported DVC schemes have a high time-delay in decoder which hinders its practical application in real-time systems. In this work, we focus on speed up the Side Information(SI) generation module in DVC, which is a major function in the DVC coding algorithm and one of the time-consuming factor at the decoder. By applied it through Compute Unified Device Architecture (CUDA) based on General-Purpose Graphics Processing Unit (GPGPU), the experimental results show that a considerable speedup can be obtained by using the proposed parallelized SI generation algorithm.
Multipliers play an important role in today’s digital signal processing (DSP) and various other
applications. Multiplication is the most time consuming process in various signal processing operations like
convolution, circular convolution, auto-correlation and cross-correlation. With advances in technology, many
researchers have tried and are trying to design multipliers which offer either of the following- high speed, low
power consumption, regularity of layout and hence less area or even combination of them in multiplier. However
area and speed are two conflicting constraints. So improving speed results always in larger areas. So here we try
to find out the best trade off solution among the both of them. To have features like high speed and low power
consumption multipliers several algorithms have been introduced .In this paper, we describes Multipliers by using
various algorithm in VLSI technology.
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
This document proposes a flexible accelerator architecture that exploits carry-save arithmetic to efficiently implement digital signal processing kernels. The architecture includes flexible computational units that can be configured to perform chained addition, multiplication, and addition operations directly on carry-save formatted data without intermediate conversions. Experimental results show the proposed architecture delivers average gains of 61.91% in area-delay product and 54.43% in energy consumption compared to state-of-the-art flexible data paths.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Presented approaches for generation of multiple clock gating domain parameterized PVT independent power abstracts for large IP blocks. We accomplish the gating domain parameterization through separation of the attribution of switching due to each single domain through a marking and tracing process, thereby precluding the need for separate domain by domain simulation to achieve the parameterization.
Experimental results comparing proposed approach on IP blocks of varying sizes from a real industry strength microprocessor design clearly highlight accuracy impact while keeping run time and model size increase in an acceptable range. In terms of extensions, we are exploring approaches where we could preserve each of the domains independently, for which we are looking into formulations based on constructing clock gating domain conflict hyper graphs and coloring them to determine domain interactions.
The document summarizes a research paper on the design of high-speed low-power Viterbi decoders for trellis coded modulation systems. It proposes a pre-computation architecture incorporated with the T-algorithm to effectively reduce power consumption without degrading decoding speed. The key aspects are:
1) A pre-computation approach is presented to calculate the optimal path metric in advance, allowing it to be determined faster and avoiding delays in the feedback loop.
2) A method is provided to systematically determine the optimal number of pre-computation steps needed to achieve the theoretical iteration bound for the Viterbi decoder.
3) Implementation results show the proposed pre-computation architecture can reduce power consumption by up to 70% without loss
This document summarizes a research paper on designing low power digit serial finite impulse response (FIR) filters using multiple constant multiplication (MCM) techniques. It proposes an architecture that optimizes the area of digit serial MCM operations at the gate level by considering the implementation costs of digit serial addition, subtraction and shift operations. An algorithm is presented to design digit serial FIR filters under a shift-adds architecture to reduce area compared to designs using generic digit serial multipliers. Experimental results show the technique leads to lower complexity digit serial MCM designs.
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...IRJET Journal
This document proposes modifying the standard AMBA bus architecture to support simultaneous read and write operations to memory, which is needed for some applications but not supported in the original AMBA design. It describes designing a high-speed AMBA bus architecture for image transmission and face detection using FPGA. The proposed design uses a finite state machine-based memory controller and pipeline data transfer to reduce wait states and improve system performance compared to existing AMBA designs. It aims to implement this modified AMBA architecture on FPGA for real-time image processing applications.
This document presents a new technique to reduce power consumption and increase speed in Content Addressable Memory (CAM) using memory partition and clock gating. The proposed CAM design partitions the memory into segments based on the most significant bits to check for matches. Since most words fail to match in their segments, the search can be discontinued for those segments, reducing power and increasing speed. Clock gating is also used to power gate unused portions of the CAM, further reducing static and dynamic power. Simulation and analysis using Quartus II and ModelSim show the proposed design reduces total power dissipation from 369.9mW to 46.3mW compared to an existing design.
This document describes the design of a 4-bit SFQ (Single Flux Quantum) multiplier circuit using a modified Booth encoding technique. It begins with background on multipliers and the advantages of the Booth encoding method over an AND array for reducing the number of partial products. The document then presents the design and analysis of the proposed 4-bit SFQ multiplier using a modified 2-bit Booth encoder, Carry Save Adder tree, and 6-bit carry lookahead adder. Simulation results show the modified Booth encoder approach reduces power consumption by 22.24% and delay by 23.96% compared to a conventional Booth encoder design.
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
This document discusses the performance evaluation of FPGA-based runtime dynamic partial reconfiguration for matrix multiplication. The author implements matrix multiplication using a Virtex-5 FPGA, with the ability to reconfigure modules for the multiplication at runtime. Simulation results show the multiplication of 2x2 and 4x4 matrices using this partial reconfiguration method. The results demonstrate that dynamic partial reconfiguration allows changing modules while keeping other parts of the design static, improving resource utilization and reducing power consumption compared to a fully static implementation.
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET Journal
This document presents a novel architecture for a high-speed inexact speculative adder using carry lookahead adder and Brent Kung adder. The proposed adder splits the critical path into shorter paths using fine-grain pipelining to improve speed. It uses a carry lookahead adder and Brent Kung adder in 4-bit blocks to speculate the carry and calculate sums, with error compensation. The design is implemented using a 5-stage pipeline to further reduce delay. Implementation in MAGIC shows the proposed adder achieves higher speed through optimized pipelining of key components in the speculative and compensation blocks.
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAIRJET Journal
This document discusses the implementation of a lightweight VLSI design for the HIGHT cipher on an FPGA. It begins with an introduction to lightweight VLSI architecture and its applications in low-resource devices. It then provides background on the HIGHT cipher and discusses prior work implementing cryptographic algorithms on FPGAs. The document goes on to describe the proposed VLSI design for the HIGHT cipher, which is optimized for size, power, and speed. It achieves a throughput of 25 Mbps with an encryption/decryption delay of 0.64 ms. Evaluation results demonstrate the effectiveness and suitability of the design for low-power applications.
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET Journal
This document presents a review of FPGA-based architectures for image capturing, processing, and display using a VGA monitor. It discusses using the Xilinx AccelDSP tool to develop the system on a Spartan 3E FPGA. The AccelDSP tool allows converting a MATLAB design into HDL for implementation on the FPGA. It summarizes the FPGA-based system architecture, which includes units for initialization, data transfer, image processing, and memory management. It then outlines the Xilinx AccelDSP design flow, which verifies the functionality at each stage of converting the floating-point MATLAB model to a fixed-point hardware implementation on the FPGA. The goal is to accelerate image processing applications using the parallel
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. Therefore, careful optimization of the adder is of the greatest importance. This optimization can be attained
in two levels; it can be circuit or logic optimization. In circuit optimization the size of transistors are manipulated, where as in logic optimization the Boolean equations are rearranged (or manipulated) to optimize speed, area and power consumption. This paper focuses the optimization of adder through technology independent mapping. The work presents 20 different logical construction of 1-bit adder cell in CMOS logic and its performance is analyzed in terms of transistor count, delay and power dissipation. These performance issues are analyzed through Tanner EDA with TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexer. This logic optimized multiplexer based adders are incorporated in selected existing adders like ripple carry
adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder and its performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was Xilinx ISE 12.1 Spartan3E XC3S500-5FG320. Each adder type was implemented with bit sizes of: 8, 16, 32, 64 bits. This variety of sizes will provide with more insight about the performance of each adder in terms of area and delay as a function of size.
Design and Implementation of Different types of Carry skip adderIRJET Journal
The document describes the design and implementation of different types of carry skip adders. It begins with an introduction to carry skip adders and their advantages over other adder types in terms of speed, area usage, and transistor count. It then reviews existing carry skip adder designs and their limitations. A new design called the Common Boolean Logic (CBL) carry skip adder is proposed that aims to reduce area and power consumption by eliminating redundant adder cells through shared logic. Simulation results show that an 8-bit CBL carry skip adder has 64.6% lower power and 18.7% smaller area than a conventional carry skip adder. In conclusion, the CBL carry skip adder achieves improved performance and efficiency.
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
This document describes several designs for high-speed matrix multiplication. It proposes a new design called Parallel-Parallel Input Multi-Output (PPI-MO) that uses multiple multipliers and registers to perform matrix multiplication in parallel. This increases throughput. For an n×n matrix multiplication, it uses n2 multipliers, n2 registers, and n(n-2) adders. Elements of the first matrix are input to rows of multipliers simultaneously, while elements of the second matrix are input to columns. The partial products from each column are added to calculate elements of the output matrix.
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERcscpconf
This paper proposes about motion estimation in H.264/AVC encoder. Compared with standards
such as MPEG-2 and MPEG-4 Visual, H.264 can deliver better image quality at the same
compressed bit rate or at a lower bit rate. The increase in compression efficiency comes at the
expense of increase in complexity, which is a fact that must be overcome. An efficient Co-design
methodology is required, where the encoder software application is highly optimized and
structured in a very modular and efficient manner, so as to allow its most complex and time
consuming operations to be offloaded to dedicated hardware accelerators. The Motion
Estimation algorithm is the most computationally intensive part of the encoder which is simulated using MATLAB. The hardware/software co-simulation is done using system generator tool and implemented using Xilinx FPGA Spartan 3E for different scanning methods.
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...IRJET Journal
The document describes a novel design for a multiplier and accumulator (MAC) unit using the modified Booth algorithm and parallel self-timed adder (PASTA). The modified Booth algorithm reduces the number of partial products compared to a regular multiplication process, lowering delay. A carry save adder design is also proposed to further improve performance in terms of computation speed, power consumption, and area compared to a conventional design using the modified Booth algorithm. Simulation results show the proposed MAC design with PASTA has better performance and reduced area overhead and critical path delay compared to conventional methods.
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is an open access journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.
This document proposes power efficient sum of absolute difference (SAD) algorithms for video compression. It describes:
1. Developing low power 1-bit full adder architectures including a proposed design using NAND, AND, and OR gates that improves power over existing designs.
2. Implementing 4x4 and 8x8 SAD architectures using the proposed low power full adder, ripple carry adders, and carry save adders.
3. Synthesizing the SAD designs in a 180nm technology and finding the proposed 4x4 SAD improves total power by 61% compared to an existing design.
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMVLSICS Design
This paper is devoted to the design of dual core crypto processor for executing both Prime field and binary field instructions. The proposed design is specifically optimized for Field programmable gate array (FPGA) platform. Combination of two different field (prime field GF(p) and Binary field GF(2m)) instructions execution is analysed.The design is implemented in Spartan 3E and virtex5. Both the performance results are compared. The implementation result shows the execution of parallelism using dual field instructions
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
Coarse grained hybrid reconfigurable architecture with no c routerDhiraj Chaudhary
The document describes a coarse-grained reconfigurable architecture with a Network-on-Chip router that is designed to perform variable block size motion estimation for video compression. The architecture uses intelligent NoC routers to direct reference block data between processing elements, reducing data interactions with external memory and decreasing execution time. The paper also proposes two enhancements that reduce the architecture's area by 4.8% and router power consumption by 42%.
This document describes a coarse-grained reconfigurable architecture with a Network-on-Chip (NoC) router designed for variable block size motion estimation. The architecture contains 16 processing elements arranged in a 2D array that can calculate Sum of Absolute Differences (SAD) for different block sizes. An NoC with intelligent routers is used to direct reference block data between processing elements to reduce memory interactions and improve execution time. The architecture supports fast search algorithms like diamond search that further improve efficiency over full search.
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...Dhiraj Chaudhary
This document describes a coarse-grained reconfigurable architecture with a Network-on-Chip (NoC) router designed for variable block size motion estimation. The architecture contains 16 processing elements arranged in a 2D array that can calculate Sum of Absolute Differences (SAD) for different block sizes. An NoC with intelligent routers is used to direct reference block data between processing elements to reduce memory interactions and increase computation efficiency. The architecture supports fast search algorithms like diamond search that further improve performance over full search.
Coarse grained hybrid reconfigurable architecture with noc router for variabl...Dhiraj Chaudhary
The document describes a coarse-grained reconfigurable architecture with a Network-on-Chip router that is designed to perform variable block size motion estimation for video compression. The architecture uses intelligent NoC routers to direct reference block data between processing elements, reducing data interactions with external memory and decreasing execution time. The paper also proposes two enhancements that reduce the architecture's area by 4.8% and router power consumption by 42%.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Run time dynamic partial reconfiguration using microblaze soft core processor...eSAT Journals
aydeshmukh@gmail.com
Abstract
DSP Application requires a fast computations & flexibility of the design. Partial Reconfiguration (PR) is an advanced technique,
which improves the flexibility of FPGAs by allowing portions of a design to be reconfigured at runtime by overwriting parts of the
configuration memory. In this paper we are using microblaze soft core processor & ICAP Port to reconfigure the FPGA at runtime.
ICAP is accessed through a light-weight custom IP which requires bit stream length, go, and done signal to interface to a system that
provides partial bit stream data. The partial bit stream is provided by the processor system by reading the partial bit files from the
compact flash card. Our targeted DSP application is matrix multiplication; we are reconfiguring design by changing partial modules
at run time. To change the partial bit stream we interfaces a microblaze Soft processor & using a UART interface.ISE13.1 &
PlanAhead is used for partial reconfiguration of FPGA .EDK is used for microblaze soft processor design & ICAP Interface .The
simulation is done using Chip Scope Logic Analyzer & the complete hardware implementation is done on Xilinx VIRTEX -6 ML605
Platform.
Keywords — PlanAhead, EDK, Dynamic partial reconfiguration, ICAP, Matrix multiplication, Chipscope pro analysis,
DSP application, Microblaze processor
Design of efficient reversible floating-point arithmetic unit on field progr...IJECEIAES
The reversible logic gates are used to improve the power dissipation in modern computer applications. The floating-point numbers with reversible features are added advantage to performing complex algorithms with highperformance computations. This manuscript implements an efficient reversible floating-point arithmetic (RFPA) unit, and its performance metrics are realized in detail. The RFP adder/subtractor (A/S), RFP multiplier, and RFP divider units are designed as a part of the RFP arithmetic unit. The RFPA unit is designed by considering basic reversible gates. The mantissa part of the RFP multiplier is created using a 24x24 Wallace tree multiplier. In contrast, the reciprocal unit of the RFP divider is designed using Newton Raphson’s method. The RFPA unit and its submodules are executed in parallel by utilizing one clock cycle individually. The RFPA unit and its submodules are synthesized separately on the Vivado IDE environment and obtained the implementation results on Artix-7 field programmable gate array (FPGA). The RFPA unit utilizes only 18.44% slice look-up tables (LUTs) by consuming the 0.891 W total power on Artix-7 FPGA. The RFPA unit sub-models are compared with existing approaches with better performance metrics and chip resource utilization improvements.
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...ijcsit
In Network on Chip (NoC) rooted system, energy consumption is affected by task scheduling and allocation
schemes which affect the performance of the system. In this paper we test the pre-existing proposed
algorithms and introduced a new energy skilled algorithm for 3D NoC architecture. An efficient dynamic
and cluster approaches are proposed along with the optimization using bio-inspired algorithm. The
proposed algorithm has been implemented and evaluated on randomly generated benchmark and real life
application such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and has been compared with the existing mapping algorithm spiral and crinkle and has shown better
reduction in the communication energy consumption and shows improvement in the performance of the
system. On performing experimental analysis of proposed algorithm results shows that average reduction
in energy consumption is 49%, reduction in communication cost is 48% and average latency is 34%.
Cluster based approach is mapped onto NoC using Dynamic Diagonal Mapping (DDMap), Crinkle and
Spiral algorithms and found DDmap provides improved result. On analysis and comparison of mapping of
cluster using DDmap approach the average energy reduction is 14% and 9% with crinkle and spiral.
Similar to EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA (20)
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
1. DOI : 10.5121/vlsic.2019.10201 1
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR
SAD COMPUTATION ON FPGA
Jaya Koshta, Kavita Khare and M.K Gupta
Maulana Azad National Institute of Technology, Bhopal
ABSTRACT
Video Compression is very essential to meet the technological demands such as low power, less memory
and fast transfer rate for different range of devices and for various multimedia applications. Video
compression is primarily achieved by Motion Estimation (ME) process in any video encoder which
contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric
in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung
Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder
scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42
% as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using
Virtex 7 FPGA.
KEYWORDS
HEVC, motion estimation, sum of absolute difference, parallel prefix adders, Brent Kung Adder.
1. INTRODUCTION
To meet the technological demands such as low power, less memory and fast transfer rate for a
wide range of applications including the growing demand of High Definition(HD)(1080p) to
Ultra-High Definition(UHD)(4K and 8K), resulted in creation of stronger needs for better video
compression efficiency. HEVC or H.265is a video compression standard designed to substantially
improve coding efficiency when compared to its precedent, the Advanced Video Coding (AVC)
or H.264.HEVC is a block-based video compression designed to support higher resolutions and it
can achieve 50% bit rate saving compared to H.264/MPEG-4 AVC for the same video quality
[1][2].This considerable increase in performance is due to the many enhanced techniques and
methodologies that have been introduced in H.265/HEVC. Some of these enhancements are
included in the motion estimation process, which is one the most complex and time consuming
block in video encoding. The objective of the ME unit is to find the best matched block in the
reference (past/future) frame search window (region of interest), for every block of the current
frame such that there constructed frame contributes to the lowest residual information [3].The
bottleneck for the ME system design lies in the implementation of an appropriate SAD
architecture. For block based ME, the Sum of Absolute Difference(SAD) is the generally used
metric which adds up the absolute differences between corresponding elements in a candidate and
reference block in video frames. However with the increase in the coding block size to 64x64 in
HEVC, compared to 16x16 in H.264/AVC, results in greater complexity in ME process. Thus in
HEVC for ME process, the number of SAD computations vastly increases resulting in increase in
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
2. 2
the processing delay, power consumption and hardware complexity. Thus, a pragmatic strategy to
elevate the performance of H.265/HEVC relies on enhancing the performance of the core
component of the motion estimation engine which is the block matching unit.
Different VLSI architectures for SAD computations are available in the literature where there is a
lot of tradeoffs between the speed, power and area during the hardware implementation. Due to
the increasing demand for the portable devices with low power consumption and high
performance, the circuits are optimized according to the applications. In this paper,
implementation of absolute difference circuit on FPGA is proposed to increase speed
performance and to minimize the amount of occupied resources on FPGA for SAD calculation.
2. RELATED WORK
SAD is one of the most popular techniques used for motion estimation in digital video encoding
systems. There are large number of methods available for SAD computation where there are
tradeoffs between speed, power and area during the hardware implementation. Typically, the
SAD computation consists of first computing the absolute difference between corresponding
pixels in current and reference video frame.
The calculation of SAD from the current and reference block is performed using equation (1),
SAD = -------------(1)
Where,
CB - Current Block,
RB –Reference Block,
N X M - block size of current and reference block,
i,j - two dimensional coordinates of block.
Generally, the SAD operation basically consists of first computing the absolute difference |CB(i,j)
– RB(i,j)| and then summing up these in a multi-input addition.
For absolute difference calculation, one method is to detect the smaller operand in the absolute
difference computation |CB−RB| and to subtract it from the larger operand [4],[5].The other
method comprises of complimenting the smaller of the two numbers and then performing
addition of two numbers followed by plus one to compute the absolute difference[6]. To compute
the absolute difference for SAD, in [7] a novel architecture is optimized for realizing efficient
absolute difference circuits in Virtex-5 FPGA devices which uses the 6-input look-up tables
available within the chosen devices family to maximize speed performance and to minimize the
amount of occupied resources. In [8] an improved architecture for efficiently computing the sum
of absolute differences (SAD) on FPGAs is proposed based on a configurable adder/subtractor
implementation in which each adder input can be negated at runtime. The SAD architecture
proposed in this paper provides a significant resource reduction on current FPGAs. In [9] FPGA
design for fast computing of the minimum SAD is proposed. The hardware unit proposed is
intended to augment a general-purpose core and with the use online arithmetic (OLA) it is
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
3. 3
possible to implement a full 16 X 16 macroblock SAD in a single FPGA device and it permits to
speed up the computation by early truncation of the SAD calculation when the involved candidate
is bigger than the current reference SAD. In [10] pipelined SAD architecture for efficient SAD
calculations is proposed where consecutive pipeline stages performs the addition of absolute
differences to obtain SAD of 8 X 8 block. In [11] Different architectures for binary addition were
proposed for SAD computation on FPGA
Table 1 summarises the above mentioned related work on SAD computation. Review work
motivates for a proposal of high speed compact AD circuits on FPGA for SAD calculation. In this
paper, AD circuit on FPGA is proposed to increase speed performance and to minimize the
amount of occupied resources on FPGA for SAD calculation.
Table 1: Related work on SAD computation
3. PROPOSED ARCHITECTURE
Hardware architecture for computing the absolute difference between corresponding pixels in
current and reference video block is proposed. The method used for AD calculation is as used in
[6] where adder and comparator form the basic component as shown in Figure 1.The 8-bit
comparator compares two numbers and returns the 1’s complement of the smaller number and the
larger number as it is.
8-bit Comparator
8-bit Adder
8-bit Adder
8B’1
Reg
Current Pixel Reference Pixel
Figure 1: AD circuit block diagram
Ref. Architecture Advantages Disadvantages
[7] 6-input look-up tables on
Virtex 5
High speed and compact
Optimized for only
Virtex 5
[8] Configurable adder/subtractor
on Virtex 6
Resource reduction on current
FPGAs
SAD 1X2
implemented
[9] Online arithmetic for 16 X 16
SAD on Virtex 2
High speed Large Area
[10]
Pipelined SAD architecture High speed and low power Large area
[11] Different architectures for
binary addition
Low power
Implemented on
FPGA Spartan
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
4. 4
Proposed Architecture: In this architecture:
Ripple Carry Adders in AD circuit (Figure 1) are replaced by Brent Kung Adder (Type of
Parallel Prefix Adders-PPA).
8-bit conventional comparator in Figure 1 is replaced by comparator based on modified 1’s
complement principle and conditional sum adder scheme.
These replacements results in reduction in delay and reduces LUTs count used in the circuit.
PPA:
PPA (Parallel Prefix Adder) circuits use a tree network to reduce the latency to O (log2n)where
‘n’ represents the number of bits.PPA employs 3-structural stages as shown in Figure 2. The first
stage at the top is used for computing generate and propagate signals as given in equation (2) and
(3)exactly as in Carry Look ahead Adder (CLA) where a and b represents binary digits.
g=a.b--------------------------------------------(2)
p=a⊕b------------------------------------------(3)
The carry bits are calculated in the second stage. In this stage, to formulate the operation of the
prefix adders, “prefix operator” which is represented as “.” is used. The “prefix operator” function
has two essential properties to keep the computational operation faster. The first one is called the
associative property and the second one is idempotency property. Using the advantages of these
properties, carry-out can be found at a depth proportional to log2(n) [12]. The final summation is
obtained in the last stage.The associative and idempotency properties of "." operator allow carry
output to be computed in a different number of levels or simply depth. Therefore, various
topologies of prefix adders can be designed which are mentioned in the literature and are
inspiring to VLSI designers because of their minimum depth and delay.
The logical structures used in prefix adders scheme consists of black cell, gray cell and white cell
as defined in Figure 3[13].
Calculation of the carries
This part is parallelizable to reduce time
Pre-calculation of pi ,gi terms
Simple adder to generate the sum
Prefix graphs can
be used to describe
the structure that
performs this part
Straight forward
as in the CLA
adder
Figure 2: Parallel Prefix Adder stages
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
5. 5
The logical structure of black cell can propagate and generate signals while the logical structure
of gray cell can only generate signals. The white cell (buffer)is used for loading the signal out. In
parallel prefix scheme, generate and propagate signals can be grouped in multiple ways to get the
same correct carry signals. Based on different methodologies of grouping these signals, different
prefix architectures can be created. The parallel form of obtaining the carry bit makes PPAs
perform addition arithmetic faster. There are many types of PPA such as Brent Kung, Kogge
Stone, Ladner Fisher, Hans Carlson and Knowles[12].
i : k i : k
i : j i : j
k-1: j k-1: j i : j
Black cell Gray cell Buffer
i : j
p k-1:j
g i:k
g i:j
p i:k
g k-1:j
p i:j
g i:k
p i:k
g k-i:j
g i:j
g i:j g i:j
p i:jp i:j
Both propagate & generate Generate only Different load
Figure 3 Cell definitions for parallel-prefix scheme [10]
Out of many types of PPA, Brent Kung Adder(BKA)is used as for 8-bit implementation it has
less delay compared to other types of PPA[12][13].
In BKA propagate signals and generate signals are combined into groups of two by using the
associative property. The first stage in BKA includes computation of generate and propagate
signals corresponding to each pair of bits in a and bas given by equation (2) and (3). The second
stage ie. prefix carry tree includes computation of carries corresponding to each bit. Equations (4)
and (5)below shows how propagate and generate signals are calculated in BKA,
pi:j = pi:k. pk-1:j--------------------------------------(4)
gi:j = gi:k + (pi:k. gk-1:j)-----------------------------(5)
These propagate and generate signals given in equation (4) and (5) are calculated using black cell,
gray cell and white cell as defined in Figure 3. The last stage ie.post processing stage includes
computation of sum bits which is given by the equation (6).
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
6. 6
si = pi⊕ci------------------------------------------(6)
7 6 5 4 3 2 1 0
Stage 1
Stage 2
Stage 3
Stage 4
Stage 5
7 : 0 6 : 0 5 : 0 4 : 0 3 : 0 2 : 0 1 : 0 0 : 0
Input
Output
c 3 : 2 = g 3 : 2 + p 3 : 3 g 2 : 2
c 4 : 0 = g 4: 4 + p 4 : 4 g 3 : 0
Figure 4 : 8-bit Brent-Kung adder(BKA) tree [13]
Figure 4 shows 8-bit BKA tree adder which has lower delay as compared to Ripple Carry
Adder(RCA).Also, apart from using BKA, a new efficient comparator based on modified 1’s
complement principle and conditional sum adder scheme is used in this architecture as discussed
below.
Modified Comparator:
The proposed architecture apart from using BKA, also uses comparator (based on modified 1’s
complement principle and conditional sum adder scheme). In this modified 1's complement
scheme, if A>B, bit Cout = 1 and if A≤B bit Cout =0 so the only concern is about carry out bit
information as shown in Figure 5.This method always adds a fixed carry after modification, so if
A ≥ B, bit Comp = 1 and if A < B, bit Comp = 0 [15].For realization of this comparator
conditional sum adder scheme has been used that provides a logarithmic increase in speed for
addition[16].
The principle behind this scheme is to generate two sets of outputs for a given group of k bits
operands. Each set includes k sum bits and an outgoing carry. The one set assumes that the
eventual incoming carry will be zero, while the other assumes that it will be one. Depending upon
the incoming carry correct set of outputs (out of the two sets) is selected without waiting for the
carry to further propagate through the k positions as shown in Figure 6.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
7. 7
Y = A – A = 0 1 0 1 0 1 0 0 - 0 1 0 1 0 1 0 0 = (84 = 84)
0 1 0 1 0 1 0 0 = A
1 0 1 0 1 0 1 1 = 1's compliment of A
Z = B – A = 0 1 0 0 0 0 1 1 - 0 1 0 1 0 1 0 0 = (67 < 84)
0 1 0 0 0 0 1 1 = B
1 0 1 0 1 0 1 1 = 1's compliment of A
X = A - B = 0 1 0 1 0 1 0 0 - 0 1 0 0 0 0 1 1 = (84 > 67)
0 1 0 1 0 1 0 0 = A
1 0 1 1 1 1 0 0 = 1's compliment of B
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1Cout 0
+ 1 = Fixed carry-in bit
1Comp
1 1 1 1 1 1 1 1Cout 0
+ 1 = Fixed carry-in bit
0Comp 1 1 1 0 1 1 1 1
0 0 0 1 0 0 0 0
A = 0 1 0 1 0 1 0 0 = 84
B = 0 1 0 0 0 0 1 1 = 67
Cout 1
+ 1 = Fixed carry-in bit
1Comp 0 0 0 1 0 0 0 0
Status bit Cout Comp
A>B 1 1
A=B 0 1
A<B 0 0
Figure 5 Modified l's complement method for improved comparator design
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
8. 8
k – bit adder
k – bit adder
Multiplexer
1
CinCout
0
k k
k
Figure 6 Conditional sum adder scheme
However, this scheme is generally not applied to long n-bits operand at the beginning of the add
operation, since it will add delay as the carry propagates through all n positions before making the
selection for correct output. Generally, the given n bits are divided into smaller groups to apply
this conditional adder scheme separately. Thus, the serial carry-propagation inside the separate
groups can be done in parallel, reducing the overall execution time. The outputs of the subgroups
are then combined to generate the output of the final output.
A7
B7
A6
B6
A5
B5
A4
B4
A3
B3
A2
B2
B1
A1
B0
A0
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
Comp = 1 ,A>= B,
= 0, A<B
Comp
Stage 0 Stage 1 Stage 2 Stage 3
Figure 7 Modified comparator architecture
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
9. 9
Figure 7 shows comparator architecture using modified 1’s complement and conditional sum
adder design. The 8-bit comparator needs 11(=1+3+7) 2-to-1 multiplexers and eight inverters to
generate complementary values of input B. Originally Carry = AB + AC + BC = AB + (A + B) C
and if C= 0 then Carry =AB or if C=l then Carry = AB + (A+B) =A + B. The sum of MUX gates
of N-bit comparator is,
-----------------------(7)
This architecture reduces delay and also provides significant reduction in resource utilization in
FPGA.
4. SIMULATION AND RESULTS
Simulations of the conventional and proposed architecture for AD circuit implementation have
been carried out using Verilog HDL programming in Xilinx ISE 14.2 platform and implemented
on Virtex7 FPGA. Adequate testing of each design was done to verify correct operation. Figure 8
shows the simulation result of proposed AD circuit.
Figure 8 Simulation result of Absolute Difference Circuit
Figure 9 Graphical representation of delays in proposed architecture with conventional architecture
0
1
2
3
4
5
6
7
Logic
delay(ns)
Routing
delay(ns)
Total
Delay(ns)
Conventional
Proposed
Architecture
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
10. 10
Figure 10 Implementation of proposed AD architecture on FPGA
Architectu
re
Logic
delay(ns)
Routing
delay(ns)
Totalde
lay(ns)
Numbe
r of
slice
LUTs
Convention
al
0.484
(8.0%
logic)
5.673
(92.0%rout
e)
6.157 50
Proposed
Archiecture
0.391
(7.5%
logic)
4.849
(92.5%
route)
5.240 29
Table 2 Comparison of proposed AD architecture architecture with conventional architecture
Figure 9 shows graphical representation of delays in proposed architecture with conventional
architecture and Figure 10 shows implementation and area occupied of proposed AD architecture
on FPGA. Table 2 shows comparison of proposed AD circuit architecture in terms of time delay
and area. Referring to the above table, it can be seen that there is reduction in delay and number
of occupied resources on FPGA in the proposed architecture for AD circuit resulting in improved
performance for SAD computation in motion estimation.
5. CONCLUSIONS
VLSI architecture for SAD computations is proposed in this paper for reducing delay and area
and is implemented on Virtex7 FPGA. The proposed architecture reduces delay and provides
significant reduction in resource utilization in FPGA. Synthesis results shows that proposed
architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to
conventional architecture.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
11. 11
REFERENCES:
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video
coding (HEVC)standard,” IEEE Trans. Circuits Syst. Video Technol., vol.22, no. 12, pp. 1649-1668,
December 2012.
[2] I. Richardson, “HEVC: An introduction to high efficiency video coding,” 2001,
https://www.vcodex.com/h265.html
[3] N. Purnachand, L. N. Alves, and A. Navarro, “Fast Motion Estimation Algorithm for HEVC”, IEEE
International Conference on Consumer Electronics-Berlin (ICCE-Berlin), September 2012.
[4] S.Wong, B. Stougie, and S. Cotofana, “Alternatives in FPGA-based SAD Implementations,” in IEEE
International Conference on Field-Programmable Technology (FPT),IEEE, 2002, pp. 449–452.
[5] S. Vassiliadis, E. A. Hakkennes, J. S. S. M. Wong, and G.G.Pechanek, “The Sum-Absolute-
Difference Motion Estimation Accelerator,” in Euromicro Conference, 1998. Proceedings, 24th.
IEEE, 1998, pp. 559–566.
[6] Ahmed Medhat, Ahmed Shalaby, Mohammed S. Sayed, Maha Elsabrouty and Farhad Mehdipour. “A
Highly Parallel SAD Architecture for Motion Estimation in HEVC Encoder”, Circuits and Systems
(APCCAS), IEEE Asia Pacific Conference, 280 - 283, 2014.
[7] Stefania Perri , Paolo Zicari , Pasquale Corsonello, “Efficient Absolute Difference Circuits in Virtex-
5 FPGAs”, IEEE, 2010.
[8] Martin Kumm, Marco Kleinlein and Peter Zipf, “Efficient Sum of Absolute Difference Computation
on FPGAs”,26thInternational Conference on Field Programmable Logic and Applications.
[9] Joaquin Olivares, Ignacio Benavides and et.al., “Minimum Sum of Absolute Differences
implementation in a single FPGA device”, Dept. of Electro-technics and Electronics, University of
Cordoba, Spain.
[10] P.Jayakrishnan and Harish. M. Kittur, “Pipelined Arch.for Motion Estimation in HEVC Video
Coding”, Indian Journal of Science and Tech,August 2016.
[11] D. V Manjunatha, Pradeep Kumar and R. Karthik, “FPGA Implementation of Sum of Absolute
Difference (SAD) for video applications”, ARPN Journal of Engineering and Applied Sciences, Vol.
12, No. 24, December 2017
[12] Geeta Rani and Sachin Kumar, “Delay analysis of parallel-prefix adders”, International Journal of
Science and Research (IJSR), 3(6):2339-2342, 2014.
[13] Nurdiani Zamhari, Peter Voon, Kuryati Kipli, Kho Lee Chin, Maimun Huja Husin,“Comparison of
Parallel Prefix Adder (PPA)”, Proceedings of the World Congress on Engineering 2012,Vol II.
[14] R. P Brent & H. T. Kung, “A Regular Layout for Parallel Adders”, IEEE Trans. Computers, Vol. C-
31, pp 260-264, 1982.
[15] Shun-Wen Cheng, “A High-Speed Magnitude Comparator with Small Transistor Count”, in
Proceedings of IEEE internationalconference ICECS, 1168 - 1171 Vol.3, Dec 2003.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
12. 12
[16] J.Sklansky, “Conditional-Sum Addition Logic,” IRE Transactions on Electronic Computers, Vol. EC-
9, No. 2, pp. 226-231, June,1960
[17] S. Rehman; R. Young;C. Chatwin;P. Birch, “An FPGA Based Generic Framework for High Speed
Sum of Absolute Difference Implementation”, Europ. Jour Scient. Res., vol.33, no.1, 2009.
[18] Manjunatha, D. V., and G. Sainarayanan. “Low-Power Sum of Absolute Difference Architecture for
Video Coding”, Emerging Research in Electronics, Computer Science and Technology. Springer
India, 2014. 335-341.
[19] LiYufei,Feng Xiubo and Wang Q in, “A High-Performance Low Cost SAD Architecture for Video
Coding”, IEEE Transactions on Consumer Electronics, pp. 535-541, Vol. 53, No. 2, May 2007.
[20] Jarno, Vanne, Eero Aho, Timo D. Hamalainen and Kimmo Kuusilinna, “A High-Performance Sum of
Absolute Difference Implementation Motion Estimation”, IEEE Transactions on Circuits and Systems
for Video Technology, pp. 876-883, Vol. 16, No. 7, 2006.
[21] Elhamzi W., Dubois J., Miteran J “An efficient low-cost FPGA implementation of a configurable
motion estimation for H.264 video coding”, Springer Journal of Real-Time Processing,Vol:9, No:1,
pp. 19–30,2014.
[22] Moorthy T., Ye A, “A scalable architecture for variable block size motion estimation on field-
programmable gate arrays” , IEEE Canadian Conference of Electrical and Computer Engineering
(CCECE),Niagara Falls, May, pp.1303–1308,2008.
[23] Davis P., Sangeetha M. ,“Implementation of Motion Estimation Algorithm for H.265/HEVC”,
International Journal of Advanced Research in Electrical, Electronics
and Instrumentation Engineering. Vol:3,No:3, pp. 122–126,2014.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019