This document presents a comparative study of implementing single precision floating point division using different computational algorithms on FPGAs. It describes two commonly used division algorithms: Goldschmidt and Newton-Raphson. The Goldschmidt algorithm continually multiplies the numerator and denominator by a common factor to converge the denominator to 1. The Newton-Raphson algorithm calculates the multiplicative inverse of the denominator through iterative processing. A 32-bit floating point divider is designed using a 32-bit floating point multiplier module based on 24-bit Vedic multiplication and a 32-bit floating point subtractor module. Synthesis results on a Xilinx Spartan 6 FPGA show the resource utilization and propagation delay for the proposed design, which is
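The two iterations summarized above can be sketched in a few lines of Python. This is a minimal software illustration, not the FPGA design from the document: the function names, the iteration counts, the use of `math.frexp` to scale the denominator into [0.5, 1), and the linear initial estimate 48/17 - 32/17 * m for the reciprocal are all assumptions chosen for the sketch.

```python
import math


def _normalize(n, d):
    """Return (sign, |n|, m, e) where |d| = m * 2**e and 0.5 <= m < 1."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (n < 0) != (d < 0) else 1.0
    m, e = math.frexp(abs(d))
    return sign, abs(n), m, e


def goldschmidt_divide(n, d, iterations=6):
    """Multiply numerator and denominator by a common factor until den -> 1."""
    sign, num, den, e = _normalize(n, d)
    for _ in range(iterations):
        factor = 2.0 - den      # common factor for this step
        num *= factor           # numerator converges to the quotient
        den *= factor           # denominator converges to 1
    return sign * math.ldexp(num, -e)   # undo the 2**e scaling of d


def newton_raphson_divide(n, d, iterations=4):
    """Refine x ~ 1/m with x <- x * (2 - m * x), then multiply by n."""
    sign, num, m, e = _normalize(n, d)
    x = 48.0 / 17.0 - (32.0 / 17.0) * m   # assumed initial estimate of 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)             # error roughly squares each step
    return sign * math.ldexp(num * x, -e)


if __name__ == "__main__":
    print(goldschmidt_divide(7.0, 3.0))
    print(newton_raphson_divide(7.0, 3.0))
```

Both loops use only multiplication and subtraction, which is why a divider of this kind can be assembled from a floating point multiplier and a subtractor module.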
Design and implementation of complex floating point processor using FPGA - VLSICS Design
This paper presents complete processor hardware with three arithmetic units. The first arithmetic unit
performs 32-bit integer arithmetic operations. The second unit performs addition, subtraction,
multiplication, division, and square root on 32-bit floating point numbers. The third unit performs
addition, subtraction, and multiplication on complex numbers.
The specific advancement in this processor is the new architecture introduced for the complex arithmetic unit.
In general, complex floating point arithmetic hardware relies on floating-to-fixed and fixed-to-floating
conversions, but such hardware forces a compromise between accuracy and the number of bits used
to represent the fixed point equivalent of floating point numbers. The proposed architecture avoids that
compromise and is implemented with fewer look-up tables, saving around 5500 logic gates. The
complex numbers are represented using a subset of the IEEE 754 standard floating point format, with 16 bits
for the real part and 16 bits for the imaginary part. The floating point arithmetic unit works on 32-bit
IEEE 754 single precision numbers. The instruction set is specially designed to support integer, floating
point, and complex floating point arithmetic operations. The on-chip RAM is 8 kBytes and is extendable up
to 64 kBytes. As the processor is designed for implementation on an FPGA, the embedded block RAMs are utilized as RAM.
FPGA based efficient multiplier for image processing applications using recur... - VLSICS Design
Digital image processing applications such as medical imaging, satellite imaging, and biometric trait images
rely on multipliers to improve image quality. However, existing multiplication techniques
introduce errors in the output and consume more time, hence error free high speed multipliers have
to be designed. In this paper we propose an FPGA based Recursive Error Free Mitchell Log Multiplier
(REFMLM) for image filters. The 2x2 error free Mitchell log multiplier is designed with zero error by
introducing an error correction term, and it is used in higher order Karatsuba-Ofman Multiplier (KOM)
architectures. The higher order KOM multipliers are decomposed into a number of lower order multipliers
using radix 2, down to the basic 2x2 multiplier block, which is the error free Mitchell log multiplier.
The 8x8 REFMLM is tested with a Gaussian filter to remove noise in a fingerprint image. The multiplier is
synthesized using the Spartan 3 FPGA family device XC3S1500-5fg320. It is observed that the performance
parameters such as area utilization, speed, error, and PSNR are better for the proposed architecture
than for existing architectures.
A Fast Floating Point Double Precision Implementation on FPGA - IJERA Editor
In modern digital systems, floating point units are an important component in many signal and image
processing applications. Many floating point unit designs have been proposed and compared with
their counterparts in recent years. The IEEE 754 floating point standard allows two types of precision for
floating point operations, single and double. In the proposed architecture, a double precision floating point unit is
used and basic arithmetic operations are performed. A parallel architecture is proposed along with a high
speed adder, which is shared among other operations and can also operate independently as a separate
unit. To improve the area efficiency of the unit, a carry select adder is designed with a novel resource sharing
technique that performs the operations with minimum resource usage while computing
the carry and sum for '0' and '1'. The design is implemented on the Xilinx Spartan 6 FPGA, and the results
show a 23% improvement in the speed of the designed circuit.
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi... - IRJET Journal
This document describes several designs for high-speed matrix multiplication. It proposes a new design called Parallel-Parallel Input Multi-Output (PPI-MO) that uses multiple multipliers and registers to perform matrix multiplication in parallel. This increases throughput. For an n×n matrix multiplication, it uses n² multipliers, n² registers, and n(n-2) adders. Elements of the first matrix are input to rows of multipliers simultaneously, while elements of the second matrix are input to columns. The partial products from each column are added to calculate elements of the output matrix.
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w... - IRJET Journal
The document describes a novel design for a multiplier and accumulator (MAC) unit using the modified Booth algorithm and parallel self-timed adder (PASTA). The modified Booth algorithm reduces the number of partial products compared to a regular multiplication process, lowering delay. A carry save adder design is also proposed to further improve performance in terms of computation speed, power consumption, and area compared to a conventional design using the modified Booth algorithm. Simulation results show the proposed MAC design with PASTA has better performance and reduced area overhead and critical path delay compared to conventional methods.
This document presents a VHDL implementation of an IEEE 754 floating point unit using a carry look ahead adder and radix-4 modified Booth encoder multiplier. The floating point unit performs single precision floating point multiplication. It consists of blocks to calculate the sign bit, add the exponents, multiply the significands using the modified Booth encoding technique, normalize the result, detect overflow/underflow, and implement pipelining. VHDL simulation results show that this floating point multiplier design has lower delay and power consumption compared to an array multiplier implementation.
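The block structure listed above (sign bit, exponent addition, significand multiplication, normalization, overflow/underflow detection) can be traced in a short Python sketch that works on raw 32-bit patterns. It is an illustration under stated assumptions, not the VHDL design: the function names are hypothetical, a plain integer product stands in for the Booth-encoded multiplier, the result is truncated rather than rounded, only normal operands are accepted, and underflow is flushed to zero.

```python
import struct

BIAS = 127  # IEEE 754 single precision exponent bias


def f32_bits(x):
    """Bit pattern of x as an IEEE 754 single precision value."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(bits):
    """Float value of a 32-bit IEEE 754 single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_multiply(a_bits, b_bits):
    """Return (result_bits, status) with status 'ok', 'overflow' or 'underflow'."""
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1        # sign block: XOR
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    if exp_a in (0, 255) or exp_b in (0, 255):
        raise ValueError("sketch handles normal operands only")
    sig_a = (a_bits & 0x7FFFFF) | 0x800000               # restore hidden 1
    sig_b = (b_bits & 0x7FFFFF) | 0x800000
    exp = exp_a + exp_b - BIAS                           # exponent adder
    prod = sig_a * sig_b                                 # 24x24 -> 48 bits
    if prod & (1 << 47):                                 # product in [2, 4)
        prod >>= 1                                       # normalize
        exp += 1
    frac = (prod >> 23) & 0x7FFFFF                       # truncate, no rounding
    if exp >= 255:
        return (sign << 31) | (0xFF << 23), "overflow"   # signed infinity
    if exp <= 0:
        return sign << 31, "underflow"                   # flushed to signed zero
    return (sign << 31) | (exp << 23) | frac, "ok"


if __name__ == "__main__":
    bits, status = fp32_multiply(f32_bits(1.5), f32_bits(2.0))
    print(bits_f32(bits), status)
```

A pipelined hardware version would place registers between these stages; the sketch only shows the data flow through them.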
IRJET- Design of 16 Bit Low Power Vedic Architecture using CSA & UTS - IRJET Journal
The document proposes a 16-bit low power Vedic architecture using the Urdhava Tiryakbhyam Sutra (UTS) method of Vedic mathematics and carry select adders. UTS allows for the parallel generation of partial products in multiplication, improving speed. The proposed design decomposes 16-bit numbers into pairs of 8-bit blocks, uses 8-bit UTS multipliers, and adds results with carry select adders. This structure is expected to reduce power consumption and delay compared to other multipliers.
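The decomposition described above can be sketched in Python as follows. The function names are hypothetical, the 8-bit block forms each column of the "vertically and crosswise" pattern in a loop (in hardware all columns are generated in parallel), and ordinary integer addition stands in for the carry select adders.

```python
def ut_multiply_8bit(a, b):
    """8x8 multiply by column sums: column k collects every a_i AND b_j with i + j = k."""
    total = 0
    for k in range(15):                                  # columns 0..14
        column = sum(((a >> i) & 1) & ((b >> (k - i)) & 1)
                     for i in range(max(0, k - 7), min(7, k) + 1))
        total += column << k                             # weight the column sum
    return total


def vedic_multiply_16bit(a, b):
    """16x16 multiply built from four 8x8 blocks."""
    if not (0 <= a < (1 << 16) and 0 <= b < (1 << 16)):
        raise ValueError("operands must be unsigned 16-bit values")
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    low = ut_multiply_8bit(a_lo, b_lo)
    cross = ut_multiply_8bit(a_lo, b_hi) + ut_multiply_8bit(a_hi, b_lo)
    high = ut_multiply_8bit(a_hi, b_hi)
    # '+' stands in for the carry select adders of the hardware design
    return low + (cross << 8) + (high << 16)


if __name__ == "__main__":
    print(vedic_multiply_16bit(1234, 5678) == 1234 * 5678)
```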
This document describes a proposed VLSI implementation of a high-speed DCT architecture for H.264 video codec design. It presents a Booth radix-8 multiplier-based multiply-accumulate (MAC) unit to improve throughput and minimize area complexity for 8x8 2D DCT computation. The proposed MAC architecture achieves a maximum operating frequency of 129.18MHz while reducing area by 64% compared to a regular merged MAC unit with a conventional multiplier. FPGA implementation and performance analysis demonstrate the suitability of the proposed DCT architecture for applications in HDTV systems.
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization... - IRJET Journal
This document presents four proposed VLSI realization architectures for implementing a high-speed and low-power 64-bit binary division. The architectures use digit-recurrence division algorithms with radix-16 representations and signed digit number systems to represent partial remainders and quotient digits. This allows carry-free addition for calculating next partial remainders and reduces the number of required division iterations. Two representations - static and semidynamic - are used for generating divisor multiples. Radix-16 signed digit sets between [-9,9] are used to represent quotient digits. Carry save and maximally redundant number systems are explored for representing partial remainders to reduce power dissipation. Simulation results show the proposed methods achieve 26-35% lower power
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition... - IRJET Journal
This document describes the design of a low power serial-parallel multiplier that uses a modified radix-4 Booth algorithm. It aims to improve performance over a standard serial-parallel Booth multiplier in terms of area, delay, and power. The proposed multiplier generates partial products conditionally, adding only non-zero Booth encodings and skipping zero operations to reduce transitions and increase throughput. It is implemented using FPGA technology to evaluate its utility for applications like digital signal processing and machine learning that require high performance multiplication.
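The conditional partial product generation described above can be illustrated with a small Python model of radix-4 Booth recoding. This is a behavioral sketch only: the function names and the returned addition counter are assumptions made for illustration, and the multiplier operand is assumed to be a two's complement value that fits in an even number of `bits`.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's complement multiplier into digits in {-2,...,2}, LSB first."""
    m = multiplier & ((1 << bits) - 1)
    digits = []
    prev = 0                                  # implicit bit right of the LSB
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)    # overlapping 3-bit group
        prev = b1
    return digits


def booth_multiply(a, b, bits=16):
    """Return (a * b, number_of_additions), skipping zero Booth digits."""
    product = 0
    additions = 0
    for k, digit in enumerate(booth_radix4_digits(b, bits)):
        if digit == 0:
            continue                          # zero encoding: no add, no transition
        product += (digit * a) << (2 * k)     # partial product weighted by 4**k
        additions += 1
    return product, additions


if __name__ == "__main__":
    print(booth_multiply(7, -3, bits=8))
    print(booth_multiply(5, 0x00F0))
```

For a multiplier with long runs of identical bits, most digits are zero, so few additions are performed; the counter makes that saving visible.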
Design and Implementation of FPGA Based Low Power Pipelined 64 Bit Risc Proce... - IJEEE
This paper presents an efficient design and implementation of a 64-bit RISC processor for a data logging system. RISC is a design approach that reduces instruction complexity in order to reduce space, time, cost, power, and heat. The processor is designed for both fixed and floating point arithmetic. A data logger is an electronic instrument that records environmental parameters such as temperature, humidity, wind speed, light intensity, water level, and water quality. Data loggers find their key application where automation and control are required. The necessary code is written in the hardware description language Verilog HDL.
The document describes a design for a low power 32x32 multiplier that combines Booth and Vedic multiplication architectures. It partitions each 32-bit input into two 16-bit blocks, uses 16x16 Booth multipliers to generate partial products for each block, and employs 16x16 Vedic multipliers and carry select adders to add the partial products. This combined architecture achieves lower power and faster performance than individual Booth or Vedic multipliers. The design is implemented using Xilinx Vivado and evaluated for applications such as floating point multiplication.
VHDL Implementation of High Speed and Low Power BIST Based Vedic Multiplier - IRJET Journal
This document describes a VHDL implementation of a built-in self-test (BIST) based Vedic multiplier circuit that aims to achieve high speed and low power consumption. A linear feedback shift register (LFSR) based test pattern generator (TPG) is used to generate random test vectors for the circuit under test, which is a 4-bit Vedic multiplier. The proposed design is simulated using Xilinx tools and VHDL. Simulation results show the BIST-based Vedic multiplier operating along with the test vectors from the TPG. Power analysis on different FPGAs shows the design has low dynamic power consumption.
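The LFSR-based test pattern generation described above can be modeled in Python as follows. The sketch makes several assumptions that are not in the document: an 8-bit Fibonacci LFSR whose feedback corresponds to the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1, a pattern split into two 4-bit operands, a hypothetical `multiplier_4bit` standing in for the circuit under test, and a direct comparison against `a * b` in place of a hardware response analyzer.

```python
def lfsr8_patterns(seed=0x01):
    """Yield the states of an 8-bit Fibonacci LFSR until the seed recurs."""
    if not 0 < seed < 256:
        raise ValueError("seed must be a nonzero 8-bit value")
    state = seed
    while True:
        yield state
        # feedback bit: XOR of the tapped state bits
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)
        if state == seed:
            return


def multiplier_4bit(a, b):
    """Stand-in circuit under test: 4x4 multiply by vertical and crosswise column sums."""
    total = 0
    for k in range(7):
        column = sum(((a >> i) & 1) & ((b >> (k - i)) & 1)
                     for i in range(max(0, k - 3), min(3, k) + 1))
        total += column << k
    return total


def run_bist(seed=0x01):
    """Apply every LFSR pattern to the multiplier; return (patterns, failures)."""
    patterns = 0
    failures = 0
    for pattern in lfsr8_patterns(seed):
        a, b = pattern >> 4, pattern & 0xF    # split pattern into two operands
        if multiplier_4bit(a, b) != a * b:
            failures += 1
        patterns += 1
    return patterns, failures


if __name__ == "__main__":
    print(run_bist())
```

Because the feedback polynomial is primitive, the generator visits every nonzero 8-bit pattern once per period, so all operand pairs except (0, 0) are exercised.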
MAC units are in demand in most digital signal processing applications. The MAC unit performs multiplication and addition and operates in two stages: first, the multiplier computes the product of the given numbers, and the result is then forwarded to the second stage, which performs the addition/accumulation. The speed of the multiplier is important in a MAC unit because it determines the critical path, and area is also of great importance in MAC unit design. Multipliers play an important role in many digital signal processing (DSP) applications such as convolution, digital filters, and other data processing units. Much research has been performed on MAC implementation. This paper provides an analysis of the research and investigations carried out so far.
This document presents the implementation of an optimized floating point adder on an FPGA that follows the IEEE 754-2008 standard for decimal floating point numbers. It uses a densely packed decimal encoding scheme. The design uses a low power equal bypass adder to reduce power consumption and delay. Testing showed the design has a maximum delay of 45ns and operates in a single clock cycle on a Virtex-5 FPGA. The optimized adder design can help reduce power usage in larger floating point arithmetic units.
High-Speed and Energy-Efficient MAC Design using Vedic Multiplier and Carry S... - IRJET Journal
The document describes a proposed design for a high-speed and energy-efficient multiply-accumulate (MAC) unit using a Vedic multiplier and various carry-skip adders. It first examines the area and delays of three types of carry-skip adder designs. It then describes the design of a 16x16-bit Vedic multiplier based on the Urdhva-Tiryagbhyam multiplication algorithm. Finally, it presents the proposed MAC unit architecture which integrates the Vedic multiplier with the different carry-skip adder designs to achieve improved speed and energy efficiency compared to existing MAC units. The goal is to reduce multiplication delay and improve performance for applications like digital signal processing.
Design and testing of systolic array multiplier using fault injecting schemes - CSITiaesprime
Low power circuit design is of major importance for data transmission and information processing among various system designs. One of the major multipliers used for synchronizing data transmission is the systolic array multiplier; low power designs are mostly used to increase performance and reduce hardware complexity. Among all mathematical operations, the multiplier plays a major role, as it processes more information and has high circuit complexity in the existing irreversible design. We develop a systolic array multiplier using reversible gates for low power appliances, and the faults and coverage of the reversible logic are calculated in this paper. To improve further, we introduce a reversible logic gate and test the reversible systolic array multiplier using the fault injection method of a built-in self-test block observer (BILBO), in which all corner cases are covered, showing 97% coverage compared with existing designs. Finally, Xilinx ISE 14.7 was used for synthesis and simulation, and the parameters were compared with existing designs, showing greater efficiency.
This document discusses the implementation of a Radix-4 Booth multiplier using VHDL. It begins with an introduction to multipliers and their importance in digital circuits. It then provides background on Booth multiplication algorithms and related work that has been done to improve multiplier speed and efficiency. The methodology section describes the design of a configurable Booth multiplier that can detect the bit range of the operands and perform the multiplication accordingly in fewer cycles to reduce delay. Simulation results are provided to verify the operation of the Radix-4 Booth multiplier design for different input values.
International Journal of Engineering Research and Development - IJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Design and Implementation of Low Power DSP Core with Programmable Truncated V... - ijsrd.com
Programmable truncated Vedic multiplication uses a Vedic multiplier together with programmable truncation control bits, reducing part of the area and power required by multipliers by computing only the most significant bits of the product. The basic process of truncation includes physical reduction of the partial product matrix and compensation for the reduced bits via different hardware compensation subcircuits. This results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed, where a full precision Vedic multiplier is implemented, but the active section of the truncation is selected dynamically at run time by truncation control bits. Such an architecture brings together the power reduction benefits of truncated multipliers and the flexibility of reconfigurable and general purpose devices. An efficient implementation of such a multiplier is presented in a custom digital signal processor, where the concept of software compensation is introduced and analyzed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a high speed Vedic truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction with minimum impact on functionality for a number of applications. Compared with previous parallel multipliers, the Vedic multiplier is expected to be much faster with reduced area. The programmable truncated Vedic multiplier (PTVM) is intended to be the basic building block of the arithmetic and PTMAC units.
A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier us... - IJERA Editor
Due to the advancement of new technology in the field of VLSI and embedded systems, there is an increasing
demand for high speed, low power processors. The speed of a processor greatly depends on the performance of its
multiplier and adder. In spite of the complexity involved in floating point arithmetic, its
implementation is increasing day by day, which makes high speed adder architectures important. Several
adder architecture designs have been developed to increase the efficiency of the adder. In this paper, we
introduce an architecture that implements a high speed IEEE 754 floating point multiplier using a modified carry
select adder (CSA). The modified CSA depends on the booth encoder (BEC) technique. Booth encoder, Mathematics is
an ancient Indian system of Mathematics. Here we introduce two carry select based designs. These designs
are implemented on the Xilinx Virtex device family.
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS - VLSICS Design
In this paper, we propose an approximate multiplier that is high speed yet energy efficient. The approach is to
round the operands to the nearest power of 2. This way the computationally intensive part of the
multiplication is omitted, improving speed and energy consumption. The efficiency of the proposed multiplier is
evaluated by comparing its performance with that of some approximate and exact multipliers using
different design parameters. The proposed approach combines the conventional RoBA multiplier with a
Kogge-Stone parallel prefix adder. The results revealed that, in most (all) cases, the newly designed RoBA
multiplier architectures outperformed the corresponding approximate (exact) multipliers. The improved
RoBA multiplier can thus be used in voice or image smoothing applications in
DSP.
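The rounding idea in the abstract above can be illustrated with a short Python sketch. The abstract does not spell out the formula, so the sketch assumes the commonly cited rounding-based form A × B ≈ Ar × B + Br × A − Ar × Br, where Ar and Br are the operands rounded to the nearest power of two and the omitted term is (Ar − A)(Br − B). The function names, the tie-breaking rule (ties round up), and the restriction to non-negative integers are assumptions.

```python
def round_to_power_of_two(x):
    """Nearest power of two to a positive integer (ties round up, an assumption)."""
    if x <= 0:
        raise ValueError("x must be positive")
    low = 1 << (x.bit_length() - 1)           # largest power of two <= x
    high = low << 1
    return low if (x - low) < (high - x) else high


def roba_multiply(a, b):
    """Approximate a * b for non-negative integers using only shift-like products."""
    if a < 0 or b < 0:
        raise ValueError("sketch handles non-negative operands only")
    if a == 0 or b == 0:
        return 0
    ar = round_to_power_of_two(a)
    br = round_to_power_of_two(b)
    # Each product has a power-of-two factor, so hardware would use shifters.
    return ar * b + br * a - ar * br


if __name__ == "__main__":
    exact = 100 * 200
    approx = roba_multiply(100, 200)
    print(approx, exact, abs(approx - exact) / exact)
```

The result is exact whenever either operand is already a power of two, since the omitted term is then zero; otherwise the relative error of this sketch stays within 1/9.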
Implementation of an Effective Self-Timed Multiplier for Single Precision Flo... - IRJET Journal
This document describes the implementation of an effective self-timed multiplier for single precision floating point values using a carry-look ahead adder. It begins by introducing floating point representation and the need for floating point arithmetic in applications requiring a large dynamic range. It then discusses the IEEE 754 standard for single precision floating point format and the steps to multiply two floating point values. The key aspects of the proposed self-timed multiplier are that it uses a carry-look ahead adder to add the exponents, making the operation faster than a traditional ripple carry adder. VHDL is used to design and simulate the self-timed multiplier, which is shown to correctly perform multiplications under normal, overflow, and underflow conditions.
A Review of Different Methods for Booth Multiplier - IJERA Editor
In this review paper, different types of Booth multiplier implementations are studied. Multipliers are of great importance in digital signal processors, so designing a high-speed multiplier is the need of the hour. The advantage of the modified Booth multiplier algorithm is that the number of partial products is reduced. Different addition algorithms used for the addition step of the multiplier are also discussed.
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique - IJMER
In this paper we have implemented a radix 8 high speed low power binary multiplier using the
Modified Gate Diffusion Input (M.G.D.I) technique. Here we have used the "Urdhva-tiryakbhyam"
(vertically and crosswise) algorithm because, compared to other multiplication algorithms, it requires
less computation and less complexity, since it reduces the total number of partial products by half.
This multiplier can be designed at gate level using any technique such as CMOS, PTL, and TG, but a design
with the new MGDI technique gives far better results in terms of area, switching delay, and power
dissipation. The radix 8 high speed low power pipelined multiplier is designed with the MGDI technique
in DSCH 3.5, and the layout is generated in the Microwind tool. The simulation is done using 0.12 μm technology
at a 1.2 V supply voltage, and the results are compared with the conventional CMOS technique. The simulation results
show great improvement in terms of area, switching delay, and power dissipation.
Because of the rapid growth in technology breakthroughs, including
multimedia and cell phones, Telugu character recognition (TCR) has recently
become a popular study area. It is still necessary to construct automated and
intelligent online TCR models, even if many studies have focused on offline
TCR models. The Telugu character dataset construction and validation using
an Inception and ResNet-based model are presented. The collection of 645
letters in the dataset includes 18 Achus, 38 Hallus, 35 Othulu, 34×16
Guninthamulu, and 10 Ankelu. The proposed technique aims to efficiently
recognize and identify distinctive Telugu characters online. This model's main
pre-processing steps to achieve its goals include normalization, smoothing,
and interpolation. Improved recognition performance can be attained by using
stochastic gradient descent (SGD) to optimize the model's hyperparameters.
Scientific workload execution on a distributed computing platform such as a
cloud environment is time-consuming and expensive. The scientific workload
has task dependencies with different service level agreement (SLA)
prerequisites at different levels. Existing workload scheduling (WS) designs
are not efficient in assuring SLA at the task level. They also induce higher
costs, as the majority of scheduling mechanisms reduce either time or energy.
To reduce cost, both energy and makespan must be optimized together when
allocating resources. No prior work has considered optimizing energy and
processing time together in meeting task level SLA requirements. This paper
presents task level energy and performance assurance-workload scheduling
(TLEPA-WS) algorithm for the distributed computing environment. The
TLEPA-WS guarantees energy minimization with the performance
requirement of the parallel application under a distributed computational
environment. Experiment results show a significant reduction in using energy
and makespan; thereby reducing the cost of workload execution in comparison
with various standard workload execution models.
Similar to Comparative study of single precision floating point division using different computational algorithms
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET Journal
This document presents four proposed VLSI realization architectures for implementing a high-speed and low-power 64-bit binary division. The architectures use digit-recurrence division algorithms with radix-16 representations and signed digit number systems to represent partial remainders and quotient digits. This allows carry-free addition for calculating next partial remainders and reduces the number of required division iterations. Two representations - static and semidynamic - are used for generating divisor multiples. Radix-16 signed digit sets between [-9,9] are used to represent quotient digits. Carry save and maximally redundant number systems are explored for representing partial remainders to reduce power dissipation. Simulation results show the proposed methods achieve 26-35% lower power
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...IRJET Journal
This document describes the design of a low power serial-parallel multiplier that uses a modified radix-4 Booth algorithm. It aims to improve performance over a standard serial-parallel Booth multiplier in terms of area, delay, and power. The proposed multiplier generates partial products conditionally, adding only non-zero Booth encodings and skipping zero operations to reduce transitions and increase throughput. It is implemented using FPGA technology to evaluate its utility for applications like digital signal processing and machine learning that require high performance multiplication.
Design and Implementation of FPGA Based Low Power Pipelined 64 Bit Risc Proce...IJEEE
This paper presents an efficient design and implementation of a 64-bit RISC processor for a data logging system. RISC is a design approach that reduces instruction complexity and, with it, space, time, cost, power and heat. The processor is designed for both fixed and floating point arithmetic. A data logger is an electronic instrument that records environmental parameters such as temperature, humidity, wind speed, light intensity, water level and water quality. Data loggers find their key application where automation and control are required. The necessary code is written in the hardware description language Verilog HDL.
The document describes a design for a low power 32x32 multiplier that combines Booth and Vedic multiplication architectures. It partitions each 32-bit input into two 16-bit blocks, uses 16x16 Booth multipliers to generate partial products for each block, and employs 16x16 Vedic multipliers and carry select adders to add the partial products. This combined architecture achieves lower power and faster performance than individual Booth or Vedic multipliers. The design is implemented using Xilinx Vivado and evaluated for applications such as floating point multiplication.
VHDL Implementation of High Speed and Low Power BIST Based Vedic MultiplierIRJET Journal
This document describes a VHDL implementation of a built-in self-test (BIST) based Vedic multiplier circuit that aims to achieve high speed and low power consumption. A linear feedback shift register (LFSR) based test pattern generator (TPG) is used to generate random test vectors for the circuit under test, which is a 4-bit Vedic multiplier. The proposed design is simulated using Xilinx tools and VHDL. Simulation results show the BIST-based Vedic multiplier operating along with the test vectors from the TPG. Power analysis on different FPGAs shows the design has low dynamic power consumption.
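The LFSR-based test pattern generation described here can be modelled in a few lines. The sketch assumes an 8-bit Fibonacci LFSR with the primitive feedback polynomial x^8 + x^4 + x^3 + x^2 + 1, whose state is split into two 4-bit operands for a 4-bit multiplier; the polynomial, the seed and the operand split are illustrative choices, not details taken from the paper.

```python
from itertools import islice


def lfsr8_patterns(seed: int = 0x01):
    """Yield successive states of an 8-bit maximal-length Fibonacci LFSR."""
    if not 0 < seed < 256:
        raise ValueError("seed must be a non-zero 8-bit value")
    state = seed
    while True:
        yield state
        # Feedback is the XOR of state bits 7, 5, 4 and 3.
        feedback = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | feedback) & 0xFF


def run_bist(multiplier, count: int = 255) -> bool:
    """Apply `count` LFSR patterns to a 4x4 multiplier model and check each product."""
    for pattern in islice(lfsr8_patterns(), count):
        a, b = pattern >> 4, pattern & 0xF   # split into two 4-bit operands
        if multiplier(a, b) != a * b:        # compare against the reference product
            return False
    return True
```

Calling `run_bist(lambda a, b: a * b)` passes, while a faulty model such as `lambda a, b: (a * b) & 0x7F`, which drops the top product bit, is rejected. Because the all-zero state is never reached, the operand pair (0, 0) is not exercised by this generator.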
At present, a MAC unit is required in most digital signal processing. The MAC unit performs multiplication and addition, and operates in two stages. First, the multiplier computes the product of the given numbers, and the result is forwarded to the second stage, where the addition/accumulation is performed. The speed of the multiplier is important in a MAC unit because it determines the critical path; area is also of great importance in MAC unit design. The multiplier plays an important role in many digital signal processing (DSP) applications such as convolution, digital filters and other data processing units. Much research has been performed on MAC implementation. This paper provides an analysis of the research and investigations carried out so far.
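The two-stage operation (multiply, then accumulate) and its use in convolution can be sketched as follows. This is a behavioural illustration only; the function names are hypothetical and the code says nothing about the timing or area of a hardware MAC.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One MAC step: stage 1 multiplies, stage 2 adds the product to the accumulator."""
    product = a * b        # stage 1: multiplier
    return acc + product   # stage 2: adder/accumulator


def convolve(samples: list[int], coeffs: list[int]) -> list[int]:
    """Full discrete convolution computed with repeated MAC steps."""
    out = []
    for n in range(len(samples) + len(coeffs) - 1):
        acc = 0
        for k, c in enumerate(coeffs):
            if 0 <= n - k < len(samples):
                acc = mac(acc, c, samples[n - k])
        out.append(acc)
    return out
```

Each output sample takes one MAC step per filter coefficient, which is why the multiplier delay dominates the throughput of such filters.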
This document presents the implementation of an optimized floating point adder on an FPGA that follows the IEEE 754-2008 standard for decimal floating point numbers. It uses a densely packed decimal encoding scheme. The design uses a low power equal bypass adder to reduce power consumption and delay. Testing showed the design has a maximum delay of 45ns and operates in a single clock cycle on a Virtex-5 FPGA. The optimized adder design can help reduce power usage in larger floating point arithmetic units.
High-Speed and Energy-Efficient MAC Design using Vedic Multiplier and Carry S...IRJET Journal
The document describes a proposed design for a high-speed and energy-efficient multiply-accumulate (MAC) unit using a Vedic multiplier and various carry-skip adders. It first examines the area and delays of three types of carry-skip adder designs. It then describes the design of a 16x16-bit Vedic multiplier based on the Urdhva-Tiryagbhyam multiplication algorithm. Finally, it presents the proposed MAC unit architecture which integrates the Vedic multiplier with the different carry-skip adder designs to achieve improved speed and energy efficiency compared to existing MAC units. The goal is to reduce multiplication delay and improve performance for applications like digital signal processing.
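The Urdhva-Tiryagbhyam ("vertically and crosswise") scheme forms each product column from the crosswise bit products that share the same weight and passes the carry to the next column. The following sketch models that column-wise procedure in software for 16-bit operands; it is an assumption-laden illustration with a hypothetical function name, not the paper's circuit.

```python
def urdhva_multiply(a: int, b: int, bits: int = 16) -> int:
    """Multiply two unsigned `bits`-wide integers column by column."""
    if not (0 <= a < (1 << bits) and 0 <= b < (1 << bits)):
        raise ValueError("operands must be unsigned and fit in `bits` bits")
    a_bits = [(a >> i) & 1 for i in range(bits)]
    b_bits = [(b >> i) & 1 for i in range(bits)]
    result = 0
    carry = 0
    for col in range(2 * bits - 1):
        lo = max(0, col - bits + 1)
        hi = min(col, bits - 1)
        # Crosswise products: all bit pairs (i, col - i) with the same weight.
        col_sum = carry + sum(a_bits[i] & b_bits[col - i] for i in range(lo, hi + 1))
        result |= (col_sum & 1) << col   # keep one result bit per column
        carry = col_sum >> 1             # pass the rest to the next column
    result |= carry << (2 * bits - 1)    # final carry is the top product bit
    return result
```

In hardware all column sums are generated in parallel, so the adders that combine them (the carry-skip adders in the summarized design) largely determine the delay.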
Design and testing of systolic array multiplier using fault injecting schemesCSITiaesprime
Nowadays, low power circuit design is of major importance for data transmission and information processing among various system designs. One of the major multipliers used for synchronizing data transmission is the systolic array multiplier; low power designs are mostly used for increasing performance and reducing hardware complexity. Among all the mathematical operations, the multiplier plays a major role, as it processes more information and has high circuit complexity in the existing irreversible design. We develop a systolic array multiplier using reversible gates for low power appliances; the faults and coverage of the reversible logic are calculated in this paper. To improve further, we introduce a reversible logic gate and test the reversible systolic array multiplier using the fault injection method of built-in self-test block observer (BILBO), in which all corner cases are covered, showing 97% coverage compared with existing designs. Finally, Xilinx ISE 14.7 was used for synthesis and simulation, and the parameters were compared with existing designs, showing improved efficiency.
This document discusses the implementation of a Radix-4 Booth multiplier using VHDL. It begins with an introduction to multipliers and their importance in digital circuits. It then provides background on Booth multiplication algorithms and related work that has been done to improve multiplier speed and efficiency. The methodology section describes the design of a configurable Booth multiplier that can detect the bit range of the operands and perform the multiplication accordingly in fewer cycles to reduce delay. Simulation results are provided to verify the operation of the Radix-4 Booth multiplier design for different input values.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Design and Implementation of Low Power DSP Core with Programmable Truncated V...ijsrd.com
The programmable truncated Vedic multiplication is the method which uses Vedic multiplier and programmable truncation control bits and which reduces part of the area and power required by multipliers by only computing the most-significant bits of the product. The basic process of truncation includes physical reduction of the partial product matrix and a compensation for the reduced bits via different hardware compensation sub circuits. These results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed, where a full precision vedic multiplier is implemented, but the active section of the truncation is selected by truncation control bits dynamically at run-time. Such architecture brings together the power reduction benefits from truncated multipliers and the flexibility of reconfigurable and general purpose devices. Efficient implementation of such a multiplier is presented in a custom digital signal processor where the concept of software compensation is introduced and analyzed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a high speed Vedic truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction, with minimum impact on functionality for a number of applications. On comparison with the previous parallel multipliers Vedic should be much more fast and area should be reduced. Programmable truncated Vedic multiplier (PTVM) should be the basic part implemented for the arithmetic and PTMAC units.
A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier us...IJERA Editor
Due to the advancement of new technology in the field of VLSI and embedded systems, there is an increasing
demand for high speed and low power consumption processors. The speed of a processor greatly depends on its
multiplier as well as adder performance. In spite of the complexity involved in floating point arithmetic, its
implementation is increasing day by day, which makes high speed adder architectures important. Several
adder architecture designs have been developed to increase the efficiency of the adder. In this paper, we
introduce an architecture for a high speed IEEE 754 floating point multiplier using a modified carry
select adder (CSA). The modified CSA depends on the booth encoder (BEC) technique. Booth encoder, Mathematics is
an ancient Indian system of Mathematics. Here we introduce two carry-select-based designs. These designs
are implemented on the Xilinx Virtex device family.
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSVLSICS Design
This paper proposes an approximate multiplier that is high speed yet energy efficient. The approach is to
round the operands to the nearest exponent of 2. This way the computationally intensive part of the
multiplication is omitted, improving speed and energy consumption. The efficiency of the proposed multiplier is
evaluated by comparing its performance with those of some approximate and exact multipliers using
different design parameters. The proposed approach combines the conventional RoBA multiplier with a
Kogge-Stone parallel prefix adder. The results revealed that, in most (all) cases, the newly designed RoBA
multiplier architectures outperformed the corresponding approximate (exact) multipliers. The parameters of
the RoBA multiplier are thus improved, and it can be used in voice or image smoothing applications in
DSP.
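The rounding idea can be sketched for positive integers. The approximation used below, A×B ≈ Ar×B + Br×A − Ar×Br with Ar and Br the operands rounded to a power of two, is the usual rounding-based formulation and is assumed here rather than quoted from the abstract; the tie-breaking rule (round up) and the function names are also illustrative choices.

```python
def nearest_pow2_exponent(a: int) -> int:
    """Return k such that 2**k is the power of two nearest to a (ties round up)."""
    if a <= 0:
        raise ValueError("sketch handles positive integers only")
    k = a.bit_length() - 1
    lower = 1 << k
    return k if a - lower < (lower << 1) - a else k + 1


def roba_multiply(a: int, b: int) -> int:
    """Approximate a*b using only shifts, one addition and one subtraction."""
    ka = nearest_pow2_exponent(a)
    kb = nearest_pow2_exponent(b)
    # Ar*B + Br*A - Ar*Br, where Ar = 2**ka and Br = 2**kb.
    return (b << ka) + (a << kb) - (1 << (ka + kb))
```

The dropped term is (Ar − A)(Br − B), so the result is exact whenever either operand is already a power of two; for example, `roba_multiply(10, 7)` gives 72 against an exact product of 70.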
Investigating human subjects is the goal of predicting human emotions in the
real world scenario. A significant number of psychological effects require
(feelings) to be produced, directly releasing human emotions. The
development of effect theory leads one to believe that one must be aware of
one's sentiments and emotions to forecast one's behavior. The proposed line
of inquiry focuses on developing a reliable model incorporating
neurophysiological data into actual feelings. Any change in emotional affect
will directly elicit a response in the body's physiological systems. This
approach is named after the notion of Gaussian mixture models (GMM). The
statistical reaction following data processing, quantitative findings on emotion
labels, and coincidental responses with training samples all directly impact the
outcomes that are accomplished. In terms of statistical parameters such as
population mean and standard deviation, the suggested method is evaluated
compared to a technique considered to be state-of-the-art. The proposed
system determines an individual's emotional state after a minimum of 6
iterative learning using the Gaussian expectation-maximization (GEM)
statistical model, in which the iterations tend to continue to zero error. Perhaps
each of these improves predictions while simultaneously increasing the
amount of value extracted.
Early diagnosis of cancers is a major requirement for patients and a
complicated job for the oncologist. If it is diagnosed early, it could have made
the patient more likely to live. For a few decades, fuzzy logic emerged as an
emphatic technique in the identification of diseases like different types of
cancers. The recognition of cancer diseases mostly operated with inexactness,
inaccuracy, and vagueness. This paper aims to design the fuzzy expert system
(FES) and its implementation for the detection of prostate cancer. Specifically,
prostate-specific antigen (PSA), prostate volume (PV), age, and percentage
free PSA (%FPSA) are used to determine prostate cancer risk (PCR), while
PCR serves as an output parameter. Mamdani fuzzy inference method is used
to calculate a range of PCR. The system provides a scale of risk of prostate
cancer and clears the path for the oncologist to determine whether their
patients need a biopsy. This system is fast as it requires minimum calculation
and hence comparatively less time which reduces mortality and morbidity and
is more reliable than other economic systems and can be frequently used by
doctors.
The biomedical profession has gained importance due to the rapid and accurate diagnosis of clinical patients using computer-aided diagnosis (CAD) tools.
The diagnosis and treatment of Alzheimer’s disease (AD) using complementary multimodalities can improve the quality of life and mental state of patients.
In this study, we integrated a lightweight custom convolutional neural network
(CNN) model and nature-inspired optimization techniques to enhance the performance, robustness, and stability of progress detection in AD. A multi-modal
fusion database approach was implemented, including positron emission tomography (PET) and magnetic resonance imaging (MRI) datasets, to create a fused
database. We compared the performance of custom and pre-trained deep learning models with and without optimization and found that employing nature-inspired algorithms like the particle swarm optimization (PSO) algorithm significantly improved system performance. The proposed methodology,
which includes a fused multimodality database and optimization strategy, improved performance metrics such as training, validation, test accuracy, precision, and recall. Furthermore, PSO was found to improve the performance of
pre-trained models by 3-5% and custom models by up to 22%. Combining different medical imaging modalities improved the overall model performance by
2-5%. In conclusion, a customized lightweight CNN model and nature-inspired
optimization techniques can significantly enhance progress detection, leading to
better biomedical research and patient care.
Class imbalance is a pervasive issue in the field of disease classification from
medical images. It is necessary to balance out the class distribution while training a model. However, in the case of rare medical diseases, images from affected
patients are much harder to come by compared to images from non-affected
patients, resulting in unwanted class imbalance. Various processes of tackling
class imbalance issues have been explored so far, each having its fair share of
drawbacks. In this research, we propose an outlier detection based image classification technique which can handle even the most extreme case of class imbalance. We have utilized a dataset of malaria parasitized and uninfected cells. An
autoencoder model titled AnoMalNet is trained with only the uninfected cell images at the beginning and then used to classify both the affected and non-affected
cell images by thresholding a loss value. We have achieved an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively,
performing better than large deep learning models and other published works.
As our proposed approach can provide competitive results without needing the
disease-positive samples during training, it should prove to be useful in binary
disease classification on imbalanced datasets.
Recently, plant identification has become an active trend due to encouraging
results achieved in plant species detection and plant classification fields
among numerous available plants using deep learning methods. Therefore,
plant classification analysis is performed in this work to address the problem
of accurate plant species detection in the presence of multiple leaves together,
flowers, and noise. Thus, a convolutional neural network based deep feature
learning and classification (CNN-DFLC) model is designed to analyze
patterns of plant leaves and perform classification using generated fine-grained feature weights. The proposed CNN-DFLC model precisely estimates
which plant species a given image belongs to. Several layers and
blocks are utilized to design the proposed CNN-DFLC model. Fine-grained
feature weights are obtained using convolutional and pooling layers. The
obtained feature maps in training are utilized to predict labels and model
performance is tested on the Vietnam plant image (VPN-200) dataset. This
dataset consists of a total number of 20,000 images and testing results are
achieved in terms of classification accuracy, precision, recall, and other
performance metrics. The mean classification accuracy obtained using the
proposed CNN-DFLC model is 96.42% considering all 200 classes from the
VPN-200 dataset.
Big data as a service (BDaaS) platform is widely used by various
organizations for handling and processing the high volume of data generated
from different internet of things (IoT) devices. Data generated from these IoT
devices are kept in the form of big data with the help of cloud computing
technology. Researchers are putting efforts into providing a more secure and
protected access environment for the data available on the cloud. In order to
create a safe, distributed, and decentralised environment in the cloud,
blockchain technology has emerged as a useful tool. In this research paper, we
have proposed a system that uses blockchain technology as a tool to regulate
data access that is provided by BDaaS platforms. We are securing the access
policy of data by using a modified form of ciphertext policy-attribute based
encryption (CP-ABE) technique with the help of blockchain technology. For
secure data access in BDaaS, algorithms have been created using a mix of CP-ABE with blockchain technology. The proposed smart contract algorithms are
implemented using Eclipse 7.0 IDE and the cloud environment has been
simulated on the CloudSim tool. Results for key generation time, encryption time,
and decryption time have been calculated and compared with an access control
mechanism without blockchain technology.
Internet of things (IoT) has become one of the eminent phenomena in human
life along with its collaboration with wireless sensor networks (WSNs), due
to enormous growth in the domain; there has been a demand to address the
various issues regarding it such as energy consumption, redundancy, and
overhead. Data aggregation (DA) is considered the basic mechanism to
minimize energy consumption and communication overhead; however,
security plays an important role where node security is essential due to the
volatile nature of WSN. Thus, we design and develop proximate node aware
secure data aggregation (PNA-SDA). In the PNA-SDA mechanism, additional
data is used to secure the original data, and further information is shared with
the proximate node; moreover, further security is achieved by updating the
state each time. Moreover, the node that does not have updated information is
considered as the compromised node and discarded. PNA-SDA is evaluated
considering the different parameters like average energy consumption, and
average deceased node; also, comparative analysis is carried out with the
existing model in terms of throughput and correct packet identification.
Drones provide an alternative progression in protection submissions since
they are capable of conducting autonomous seismic investigations. Recent
advancement in unmanned aerial vehicle (UAV) communication is an internet
of a drone combined with 5G networks. Because of the quick utilization of
rapidly progressed registering frameworks besides 5G officialdoms, the
information from the user is consistently refreshed and pooled. Thus, safety
or confidentiality is vital among clients, and a proficient substantiation
methodology utilizing a vigorous sanctuary key. Conventional procedures
ensure a few restrictions however taking care of the assault arrangements in
information transmission over the internet of drones (IOD) environmental
frameworks. A unique hyperelliptical curve (HEC) cryptographically based
validation system is proposed to provide protected data facilities among
drones. The proposed method has been compared with the existing methods
in terms of packet loss rate, computational cost, and delay and thereby
provides better insight into efficient and secure communication. Finally, the
simulation results show that our strategy is efficient in both computation and
communication.
Monitoring behavior, numerous actions, or any such information is considered
as surveillance and is done for information gathering, influencing, managing,
or directing purposes. Citizens employ surveillance to safeguard their
communities. Governments do this for the purposes of intelligence collection,
including espionage, crime prevention, the defense of a method, a person, a
group, or an item; or the investigation of criminal activity. Using an internet
of things (IoT) rover instead of humans, the area will be secured with better
secrecy and efficiency, providing an additional safety step. In this
paper, there is a discussion about an IoT rover for remote surveillance based
around a Raspberry Pi microprocessor which will be able to monitor a
closed/open space. This rover will allow safer survey operations and would
help to reduce the risks involved with it.
In a world where climate change looms large the spotlight often shines on
greenhouse gases, but the shadow of man-made aerosols should not be
underestimated. These tiny particles play a pivotal role in disrupting Earth's
radiative equilibrium, yet many mysteries surround their influence on various
physical aspects of our planet. The root of these mysteries lies in the limited
data we have on aerosol sources, formation processes, conversion dynamics,
and collection methods. Aerosols, composed of particulate matter (PM),
sulfates, and nitrates, hold significant sway across the hemisphere. Accurate
measurement demands the refinement of in-situ, satellite, and ground-based
techniques. As aerosols interact intricately with the environment, their full
impact remains an enigma. Enter a groundbreaking study in Morocco that
dared to compare an internet of thing (IoT) system with satellite-based
atmospheric models, with a focus on fine particles below 10 and 2.5
micrometers in diameter. The initial results, particularly in regions abundant
with extraction pits, shed light on the IoT system's potential to decode
aerosols' role in the grand narrative of climate change. These findings inspire
hope as we confront the formidable global challenge of climate change.
The use of technology has a significant impact to reduce the consequences of
accidents. Sensors, small components that detect interactions experienced by
various components, play a crucial role in this regard. This study focuses on
how the MPU6050 sensor module can be used to detect the movement of
people who are falling, defined as the inability of the lower body, including
the hips and feet, to support the body effectively. An airbag system is
proposed to reduce the impact of a fall. The data processing method in this
study involves the use of a threshold value to identify falling motion. The
results of the study have identified a threshold value for falling motion,
including an acceleration relative (AR) value of less than or equal to 0.38 g,
an angle slope of more than or equal to 40 degrees, and an angular velocity
of more than or equal to 30 °/s. The airbag system is designed to inflate
faster than the time of impact, with a gas flow rate of 0.04876 m3
/s and an
inflating time of 0.05 s. The overall system has a specificity value of 100%,
a sensitivity of 85%, and an accuracy of 94%.
The fundamental principle of the paper is that the soil moisture sensor obtains
the moisture content level of the soil sample. The water pump is automatically
activated if the moisture content is insufficient, which causes water to flow
into the soil. The water pump is immediately turned off when the moisture
content is high enough. Smart home, smart city, smart transportation, and
smart farming are just a few of the new intelligent ideas that internet of things
(IoT) includes. The goal of this method is to increase productivity and
decrease manual labour among farmers. In this paper, we present a system for
monitoring and regulating water flow that employs a soil moisture sensor to
keep track of soil moisture content as well as the land’s water level to keep
track of and regulate the amount of water supplied to the plant. The device
also includes an automated led lighting system.
In order to provide sensing services to low-powered IoT devices, wireless sensor networks (WSNs) organize specialized transducers into networks. Energy usage is one of the most important design concerns in WSN because it is very hard to replace or recharge the batteries in sensor nodes. For an energy-constrained network, the clustering technique is crucial in preserving battery life. By strategically selecting a cluster head (CH), a network's load can be balanced, resulting in decreased energy usage and extended system life. Although clustering has been predominantly used in the literature, the concept of chain-based clustering has not yet been explored. As a result, in this paper, we employ a chain-based clustering architecture for data dissemination in the network. Furthermore, for CH selection, we employ the coati optimisation algorithm, which was recently proposed and has demonstrated significant improvement over other optimization algorithms. In this method, the parameters considered for selecting the CH are energy, node density, distance, and the network’s average energy. The simulation results show tremendous improvement over the competitive cluster-based routing algorithms in the context of network lifetime, stability period (first node dead), transmission rate, and the network's power reserves.
The construction industry is an industry that is always surrounded by
uncertainties and risks. The industry is always associated with a threatindustry which has a complex, tedious layout and techniques characterized by
unpredictable circumstances. It comprises a variety of human talents and the
coordination of different areas and activities associated with it. In this
competitive era of the construction industry, delays and cost overruns of the
project are often common in every project and the causes of that are also
common. One of the problems which we are trying to cater to is the improper
handling of materials at the construction site. In this paper, we propose
developing a system that is capable of tracking construction material on site
that would benefit the contractor and client for better control over inventory
on-site and to minimize loss of material that occurs due to theft and misplacing
of materials.
Today, health monitoring relies heavily on technological advancements. This
study proposes a low-power wide-area network (LPWAN) based, multinodal
health monitoring system to monitor vital physiological data. The suggested
system consists of two nodes, an indoor node, and an outdoor node, and the
nodes communicate via long range (LoRa) transceivers. Outdoor nodes use an
MPU6050 module, heart rate, oxygen pulse, temperature, and skin resistance
sensors and transmit sensed values to the indoor node. We transferred the data
received by the master node to the cloud using the Adafruit cloud service. The
system can operate with a coverage of 4.5 km, where the optimal distance
between outdoor sensor nodes and the indoor master node is 4 km. To further
predict fall detection, various machine learning classification techniques have
been applied. Upon comparing various classifier techniques, the decision tree
method achieved an accuracy of 0.99864 with a training and testing ratio of
70:30. By developing accurate prediction models, we can identify high-risk
individuals and implement preventative measures to reduce the likelihood of
a fall occurring. Remote monitoring of the health and physical status of elderly
people has proven to be the most beneficial application of this technology.
The effectiveness of adaptive filters are mainly dependent on the design
techniques and the algorithm of adaptation. The most common adaptation
technique used is least mean square (LMS) due its computational simplicity.
The application depends on the adaptive filter configuration used and are well
known for system identification and real time applications. In this work, a
modified delayed μ-law proportionate normalized least mean square
(DMPNLMS) algorithm has been proposed. It is the improvised version of the
µ-law proportionate normalized least mean square (MPNLMS) algorithm.
The algorithm is realized using Ladner-Fischer type of parallel prefix
logarithmic adder to reduce the silicon area. The simulation and
implementation of very large-scale integration (VLSI) architecture are done
using MATLAB, Vivado suite and complementary metal–oxide–
semiconductor (CMOS) 90 nm technology node using Cadence RTL and
Genus Compiler respectively. The DMPNLMS method exhibits a reduction
in mean square error, a higher rate of convergence, and more stability. The
synthesis results demonstrate that it is area and delay effective, making it
practical for applications where a faster operating speed is required.
The increasing demand for faster, robust, and efficient device development of enabling technology to mass production of industrial research in circuit design deals with challenges like size, efficiency, power, and scalability. This paper, presents a design and analysis of low power high speed full adder using negative capacitance field effecting transistors. A comprehensive study is performed with adiabatic logic and reversable logic. The performance of full adder is studied with metal oxide field effect transistor (MOSFET) and negative capacitance field effecting (NCFET). The NCFET based full adder offers a low power and high speed compared with conventional MOSFET. The complete design and analysis are performed using cadence virtuoso. The adiabatic logic offering low delay of 0.023 ns and reversable logic is offering low power of 7.19 mw.
The global agriculture system faces significant challenges in meeting the
growing demand for food production, particularly given projections that the
world's population will reach 70% by 2050. Hydroponic farming is an
increasingly popular technique in this field, offering a promising solution to
these challenges. This paper will present the improvement of the current
traditional hydroponic method by providing a system that can be used to
monitor and control the important element in order to help the plant grow up
smoothly. This proposed system is quite efficient and user-friendly that can
be used by anyone. This is a combination of a traditional hydroponic system,
an automatic control system and a smartphone. The primary objective is to
develop a smart system capable of monitoring and controlling potential
hydrogen (pH) levels, a key factor that affects hydroponic plant growth.
Ultimately, this paper offers an alternative approach to address the challenges
of the existing agricultural system and promote the production of clean,
disease-free, and healthy food for a better future.
Comparative study of single precision floating point division using different computational algorithms
International Journal of Reconfigurable and Embedded Systems (IJRES)
Vol. 12, No. 3, November 2023, pp. 336~344
ISSN: 2089-4864, DOI: 10.11591/ijres.v12.i3pp336-344
Journal homepage: http://ijres.iaescore.com
Comparative study of single precision floating point division
using different computational algorithms
Naginder Singh, Kapil Parihar
Department of Electronics and Communication Engineering, M.B.M. University, Jodhpur, India
Article Info
Article history:
Received Jun 25, 2022
Revised Dec 10, 2022
Accepted Mar 20, 2023
ABSTRACT
This paper presents different computational algorithms for implementing single
precision floating point division on field programmable gate arrays (FPGAs).
Fast division algorithms can be applied to all division cases and give
efficient results in terms of delay time and power consumption. A 24-bit Vedic
multiplication (Urdhva-Tiryagbhyam-sutra) technique enhances the computational
speed of the mantissa module, and this module is used to design a 32-bit
floating point multiplier, which is the crucial feature of the proposed design
and yields a higher computational speed and reduced delay time. The proposed
floating-point divider, which uses fast computational algorithms and is
described in Verilog hardware description language, consists of a 32-bit
floating point multiplier module and a 32-bit floating point subtractor module.
The Xilinx Spartan 6 SP605 evaluation platform is used to verify the proposed
design on FPGA. Synthesis results provide the device utilization and
propagation delay of the proposed design, and a comparison is made with
previous work. Inputs to the divider are provided in IEEE 754 32-bit format.
Keywords:
Field programmable gate array
Floating point division
Goldschmidt algorithm
Newton-Raphson algorithm
Vedic-multiplication
This is an open access article under the CC BY-SA license.
Corresponding Author:
Naginder Singh
Department of Electronics and Communication Engineering, M.B.M. University
Jodhpur, Rajasthan, India
Email: naginder.singh0110@gmail.com
1. INTRODUCTION
Field programmable gate arrays (FPGAs) provide a versatile, high-performance computing environment
and play an important role today because their architecture can be modified rapidly as process technology
advances. For the proposed design, the cost per logic cell is reduced by the enhanced features of the
Spartan 6. Floating-point dividers play a crucial role in the arithmetic and logic units of many digital
processing systems. In designing a divider, power consumption and throughput are the major factors. This
paper uses computational algorithms to design a single precision floating point divider in which
multiplication and subtraction are the basic mathematical operations, as described in [1]-[3].
Multiplication and subtraction modules are used to implement the Goldschmidt and Newton-Raphson
algorithms for the proposed 32-bit floating point division [4]. Urdhva-Tiryagbhyam-sutra is an ancient
Vedic mathematics method for multiplication; it reduces the number of steps required and can increase the
speed of multiplication [5]. In the proposed divider, the module that computes the mantissa in the
multiplication unit is designed using Vedic mathematics (Urdhva-Tiryagbhyam-sutra). A 16*16 Vedic
multiplier module has been implemented using the Urdhva-Tiryagbhyam-sutra and Nikhilam-sutra
multiplication techniques [6]-[9]. In this paper, the exponent module of the floating-point multiplier uses
a ripple carry adder to provide a low area and simple layout [10]. A binary floating point multiplier
implementation and its blocks are described in [11], where Urdhva-Tiryagbhyam-sutra is used to achieve
high speed. In processor design, the arithmetic operations of the arithmetic-logic unit (ALU) are addition,
subtraction, and multiplication, and Vedic algorithms play an important role [12]. A Nikhilam-sutra based
multiplier designed for high speed, with the computational speed gained from Vedic mathematics, is
described in [13]. A high speed squaring circuit for binary numbers based on Vedic mathematics is proposed
in [14]. For signal processing, division based on the Goldschmidt computational algorithm provides high
computational speed and throughput [15] and is suitable for computationally demanding applications. Power
consumption also plays an important role; designing a multiplier using Vedic multiplication reduces the
required look up tables and slices and therefore the power consumption [16]. A floating-point multiplier
designed for efficient power consumption and reduced device utilization is implemented in [17].
In digital signal processing applications and in the encryption and decryption algorithms of
cryptography, the divider is one of the most important hardware blocks [18]; it is also used in the
floating-point divide-add fused (DAF) architecture [19]. The floating-point DAF architecture has a
dedicated unit that performs floating point division followed by addition or subtraction as one combined
operation [20]. Delay time and device utilization, which are important factors to optimize, are improved by
a division architecture based on the Vedic mathematics algorithm Paravartya sutra [21]. Restoring division
is described for sixteen-bit division in [22], together with its implementation in an encryption system.
Floating point architectures are characterized by low frequency, large area, and high latency [23]. A
reversible Vedic multiplier designed using reversible logic gates results in less delay and less power
consumption [24]. The Newton-Raphson and Goldschmidt division algorithms provide high computational speed
and throughput: starting from a close approximation of the final quotient, each iteration doubles the number
of correct digits, so both converge quickly [25]. The IEEE 754 standard format is used to represent 32-bit
and 64-bit floating point numbers [26]. In [27], Booth-based multiplication is used to pre-calculate the
carry, and the presented design improves speed and dissipates less power than existing multipliers. A fast
floating-point unit is designed and implemented for FPGAs in [28]. A floating-point number is separated into
three parts, sign, exponent, and mantissa, each represented by a fixed number of bits. The structure of the
IEEE 754 format for single and double precision is shown in Table 1. In the single precision format, 23 bits
represent the mantissa, 8 bits represent the exponent, and the most significant bit (MSB) is the sign bit:
the floating-point number is negative when the MSB is 1 and positive when the MSB is 0. Many signal
processing algorithms perform division inside complex iterative processes, so precision must be maintained
over very large data intervals; this is achieved by using the Newton-Raphson and Goldschmidt computational
algorithms.
This paper presents the design and implementation of a single precision (32-bit) floating point
divider on FPGA using the Goldschmidt and Newton-Raphson computational algorithms. The divider uses a
32-bit floating point multiplication module and a floating-point subtraction module, and a 24-bit Vedic
multiplier computes the mantissa in the 32-bit floating point multiplier module of the proposed design.
The basics of IEEE 754 floating-point representation are discussed in section 2, and the division
algorithms, i) Newton-Raphson and ii) Goldschmidt, are discussed in section 3. The implementation of the
32-bit floating point multiplier based on the Vedic multiplication technique is discussed in section 4. The
floating point subtractor is described in section 5. The simulation results are given in section 6, and the
conclusion is in section 7.
2. FLOATING POINT STANDARD
IEEE 754 is the standard representation for binary floating-point numbers [26]. It defines single
and double precision formats of 32 bits and 64 bits, respectively. The structure of both formats is shown in
Table 1; each consists of three fields, sign, exponent, and mantissa, with a fixed number of bits. In single
precision, the mantissa is represented by 23 bits, and for normalization 1 bit is added as its MSB; the
exponent is represented by 8 bits in excess-127 format, that is, biased by 127; and the sign is represented
by 1 bit, the MSB of the word. The floating-point number is negative when the sign bit is 1 and positive
when it is 0.
Table 1. IEEE 754 format representation of 32-bit and 64-bit
Precision                 | Sign  | Exponent | Mantissa
Single precision (32 bit) | 1-bit | 8-bit    | 23-bit
Double precision (64 bit) | 1-bit | 11-bit   | 52-bit
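The single precision field layout can be illustrated with a short Python sketch. It is not part of the proposed design; the function names `decode_float32` and `value_of` are hypothetical, and only normalized numbers are handled (zero, subnormals, infinities, and NaN are outside its scope).

```python
import struct

BIAS = 127  # excess-127 exponent bias of the single precision format


def decode_float32(x):
    """Round x to single precision and return its (sign, exponent, mantissa) fields."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # bit 31 (MSB)
    exponent = (bits >> 23) & 0xFF  # bits 30..23, stored with the bias
    mantissa = bits & 0x7FFFFF      # bits 22..0, stored without the leading 1
    return sign, exponent, mantissa


def value_of(sign, exponent, mantissa):
    """Rebuild the value of a normalized number from its three fields."""
    significand = 1 + mantissa / 2**23  # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - BIAS)


fields = decode_float32(-6.5)
print(fields)             # (1, 129, 5242880)
print(value_of(*fields))  # -6.5
```

Here -6.5 equals -1.625 x 2^2, so the sign bit is 1, the stored exponent is 2 + 127 = 129, and the mantissa field holds the fraction 0.625 scaled by 2^23.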
3. DIVISION ALGORITHMS
The division operation requires a very high number of iterations when it is implemented as the
inverse process of the multiplication operation. When combined with shift operations, the number of
iterations is reduced to the number of valid result bits. More precise results require more iterations, and
fast algorithms were developed for this purpose, among them the Goldschmidt and Newton-Raphson division
algorithms. Each iteration of these algorithms converges much faster; however, several addition or
subtraction and multiplication operations are needed. The Goldschmidt division algorithm converges the
denominator to 1 by continual multiplication of the numerator and denominator by the same factor 𝐺𝑖; at the
same time, the numerator converges directly to the quotient. The Newton-Raphson division algorithm
calculates the multiplicative inverse of the denominator and multiplies it by the numerator to produce the
result.
3.1. Goldschmidt algorithm
The Goldschmidt computational division algorithm uses a complex initialization followed by
continual iterations. To obtain the quotient, the algorithm iteratively multiplies both the dividend and the
divisor by a common factor 𝐺𝑖 so that the divisor (𝐷) converges to one [25]. In the proposed floating point
divider, the arithmetic operations of the Goldschmidt algorithm are performed by a 32-bit floating point
multiplier module and a subtractor module. The numerator and denominator are scaled so that the divisor lies
in the interval (0, 1); a shifting operation is used for scaling the divisor. More precise results require
more iterations, which is why fast division algorithms are used. Each iteration of the Goldschmidt algorithm
converges much faster, but the continual iteration process needs more multiplication and subtraction
operations. This computational algorithm produces an optimized result by computing the iterations in one
cycle. The quotient is given by (1), where 𝑁 is the numerator, 𝑄 is the quotient, 𝐷 is the denominator,
and 𝐺𝑖 is the factor at iteration 𝑖.
$$Q = \frac{N \prod_{i}^{n} G_i}{D \prod_{i}^{n} G_i} \qquad (1)$$
The factor for the next iteration, 𝐺𝑖+1, is calculated using (2) and substituted into (1) for the
further iterations. The numerator and denominator for the next iteration are then calculated using (3).
$$G_{i+1} = 2 - D_i \qquad (2)$$
$$\frac{N_{i+1}}{D_{i+1}} = \frac{N_i}{D_i} \cdot \frac{G_{i+1}}{G_{i+1}} \qquad (3)$$
In the floating point divider based on the Goldschmidt computational algorithm, the inputs 𝑁 and 𝐷
are the numerator and the denominator. To calculate one iteration, the algorithm uses one multiplier and one
subtraction module. It scales the numerator and denominator by the common factor 𝐺𝑖 and produces the
quotient directly in four iterations: the numerator converges to the quotient as the denominator converges
to one. For more refined results, more iterations are performed. With three floating point inputs F1, F2,
and F3, the proposed Goldschmidt-based floating-point divider divides F1 by F2 and then adds or subtracts
F3, as shown in the simulation results.
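The iteration in (2) and (3) can be traced in software with the following minimal Python sketch. It is an illustration under stated assumptions, not the proposed hardware: Python double precision floats stand in for the 32-bit multiplier and subtractor modules, the function name `goldschmidt_divide` is hypothetical, the divisor is assumed positive, and `math.frexp` plays the role of the shift that scales the divisor (here into [0.5, 1), a subinterval of (0, 1)).

```python
import math


def goldschmidt_divide(n, d, iterations=4):
    """Approximate n / d for d > 0 with the Goldschmidt iteration."""
    if d <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    # Scale both operands by the same power of two (a shift in hardware)
    # so that the divisor lies in [0.5, 1).
    d_i, exp = math.frexp(d)   # d == d_i * 2**exp with 0.5 <= d_i < 1
    n_i = math.ldexp(n, -exp)  # n * 2**(-exp), so n_i / d_i == n / d
    for _ in range(iterations):
        g = 2.0 - d_i          # equation (2): next common factor
        n_i *= g               # equation (3): numerator moves toward the quotient
        d_i *= g               # equation (3): denominator moves toward 1
    return n_i


print(goldschmidt_divide(7.0, 2.0))
```

With this scaling the distance of the denominator from 1 is at most 0.5 and is squared by every iteration, so four iterations leave a relative error of at most 2^-16 in exact arithmetic. A closer starting point would be needed to reach full single precision accuracy in the same number of iterations.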
3.2. Newton-Raphson algorithm
The Newton-Raphson division algorithm starts with the computation of the multiplicative inverse of
the divisor, which is calculated by an iterative process; the resultant quotient is then found by
multiplying the dividend by the calculated multiplicative inverse. A shifting operation is used for scaling
the divisor. More precise results require more iterations, and fast division algorithms are used for this
purpose. With the Newton-Raphson computational algorithm, each iteration converges much faster; however,
obtaining a precise result still takes a number of iterations, and the continual iterations require several
multiplication and subtraction operations.
In this computational algorithm, an optimized result can be achieved in one cycle by computing
three iterations, each of which uses two multipliers and a subtraction module. More iterations are performed
to refine the multiplicative inverse. The Newton-Raphson division algorithm calculates the reciprocal of the
denominator and produces the result by multiplying this reciprocal by the numerator. The multiplicative
inverse of the denominator is calculated using (4), where 𝑍𝑖 is the multiplicative inverse at iteration 𝑖
and 𝐷 is the denominator.
𝑍𝑖+1 = 𝑍𝑖 + 𝑍𝑖(1 − 𝐷𝑍𝑖) = 𝑍𝑖(2 − 𝐷𝑍𝑖) (4)
In Newton-Raphson division, the denominator is scaled into the interval [0.5, 1] so that the maximal
relative error of the initial estimate can be minimized. The Newton-Raphson division uses a complex
initialization which requires addition, multiplication and subtraction operations, and this initialization
is therefore classified as the first iteration cycle. $Z_0$ represents the initial estimate, and it is
computed by (5).
$Z_0 = \frac{48}{17} - \frac{32}{17} D$ (5)
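Equations (4) and (5) can be checked numerically. The sketch below is an illustration under stated assumptions, not the hardware design: it uses Python floats, scales the denominator with `math.frexp` (standing in for the shifting operation), and defaults to three iterations as in the text. The function name `newton_raphson_divide` is hypothetical.

```python
import math


def newton_raphson_divide(n, d, iterations=3):
    """Approximate n / d via the reciprocal iteration of (4) and (5) (sketch)."""
    if d == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    sign = -1.0 if (n < 0) != (d < 0) else 1.0

    # Scale the denominator into [0.5, 1): |d| = m * 2**e.
    m, e = math.frexp(abs(d))

    # Initial estimate of 1/m, equation (5).
    z = 48.0 / 17.0 - (32.0 / 17.0) * m

    # Refinement, equation (4): two multiplications and one subtraction each.
    for _ in range(iterations):
        z = z * (2.0 - m * z)

    # Quotient = numerator * reciprocal, with the scaling undone.
    return sign * math.ldexp(abs(n) * z, -e)


if __name__ == "__main__":
    print(newton_raphson_divide(3.14, 8.56))  # compare with 3.14 / 8.56
```

Since 1 - D·Z is squared by every application of (4) and the estimate in (5) starts with a relative error of at most 1/17 on [0.5, 1], three iterations bring the error below (1/17)^8, which is about 1.4e-10 and smaller than the 2^-24 resolution of a single precision mantissa.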
4. FLOATING POINT MULTIPLIER
Figure 1 shows the architecture of the proposed multiplier module, which is based on Vedic multiplication:
a (24x24) Vedic multiplier calculates the mantissa part, which enhances the overall performance
of the proposed 32-bit multiplier. The two inputs, A [31-0] and B [31-0], are 32-bit floating point numbers
in IEEE 754 format, which assigns a fixed number of bits to the sign (s1, s2), exponent (e1, e2) and
mantissa (m1, m2). The multiplication module is accordingly divided into three units.
Figure 1. The proposed design for 32-bit floating point multiplier (FPM)
4.1. Mantissa unit
To achieve high computation speed, the mantissa unit multiplies the lower bits of the inputs, m1 and m2,
that is A [22-0] and B [22-0], each extended to 24 bits by the implicit leading 1. The calculation is done
using a (24x24)-bit Vedic multiplier and produces a 24-bit normalized output with a leading 1 as its MSB.
The use of Vedic mathematics in this multiplier module also provides higher throughput.
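This section does not spell out the internals of the Vedic multiplier. As an illustration only, the sketch below assumes the Urdhva Tiryakbhyam ("vertically and crosswise") scheme named in reference [8]: each result column is the sum of the crosswise bit products for that column plus the carry from the previous column. The normalization by truncation and the names `urdhva_multiply` and `multiply_mantissas` are assumptions made for this example.

```python
def urdhva_multiply(a, b, width=24):
    """Multiply two width-bit integers column by column (sketch)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        lo = max(0, k - width + 1)
        hi = min(k, width - 1)
        # Crosswise products a_i * b_(k-i) that fall into column k.
        column = carry + sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1))
        result |= (column & 1) << k
        carry = column >> 1
    return result | (carry << (2 * width - 1))


def multiply_mantissas(frac_a, frac_b):
    """Return (24-bit normalized mantissa, exponent increment).

    frac_a and frac_b are the stored 23-bit fractions A[22-0] and B[22-0].
    """
    m1 = (1 << 23) | frac_a  # restore the implicit leading 1
    m2 = (1 << 23) | frac_b
    product = urdhva_multiply(m1, m2)  # up to 48 bits
    if product >> 47:                  # product in [2, 4): shift one extra bit
        return product >> 24, 1
    return product >> 23, 0            # product in [1, 2)


if __name__ == "__main__":
    print(hex(multiply_mantissas(0x400000, 0x200000)[0]))  # 1.5 * 1.25
```

The low-order product bits are simply discarded here; a hardware unit would normally use them for rounding.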
5. ISSN: 2089-4864
Int J Reconfigurable & Embedded Syst, Vol. 12, No. 3, November 2023: 336-344
340
4.2. Exponent unit
A ripple carry adder is used in the exponent unit for the exponent calculation. The exponents e1 and e2,
that is A [30-23] and B [30-23], are the inputs to the 8-bit ripple carry adder unit, which adds them; the
result is then adjusted by the bias of 127. The overflow and underflow cases are carefully handled in this
unit to obtain a correct result.
4.3. Sign unit
The two inputs s1 and s2, that is A [31] and B [31], are bit 31 of the 32-bit floating point numbers and
are given to the sign unit. The sign of the floating point product is the output of an exclusive-OR (XOR)
gate applied to s1 and s2. Both the Goldschmidt and Newton-Raphson computational algorithms
were designed using this multiplier module, which is implemented using Vedic mathematics for
efficiency. For the 32-bit single precision IEEE 754 format, a 24-bit Vedic multiplier calculates the
mantissa and an 8-bit ripple carry adder calculates the exponent; this implementation results in low area,
ease of implementation and a simple layout. The inputs to the multiplier are provided in the standard
IEEE 754 single precision format.
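The interaction of the three units can be sketched in a few lines. This is a behavioural illustration under stated assumptions, not the proposed hardware: it handles normal operands only, truncates instead of rounding, and uses Python's built-in integer multiplication in place of the Vedic multiplier. The names `fields` and `fp32_multiply` are hypothetical.

```python
import struct

BIAS = 127


def fields(x):
    """Split a value, rounded to float32, into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp32_multiply(a, b):
    """Multiply two normal float32 values field by field (sketch)."""
    s1, e1, f1 = fields(a)
    s2, e2, f2 = fields(b)
    if e1 in (0, 255) or e2 in (0, 255):
        raise ValueError("sketch handles normal operands only")

    sign = s1 ^ s2                # sign unit: XOR of the sign bits
    exponent = e1 + e2 - BIAS     # exponent unit: add, then remove one bias

    # Mantissa unit: 24x24-bit product of the mantissas with implicit 1.
    product = ((1 << 23) | f1) * ((1 << 23) | f2)
    if product >> 47:             # product in [2, 4): renormalize
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF  # truncate to 23 stored bits

    if exponent >= 255:
        raise OverflowError("exponent overflow")
    if exponent <= 0:
        raise ArithmeticError("exponent underflow")

    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    print(fp32_multiply(1.5, 2.5), fp32_multiply(-3.0, 0.5))
```

Because the sketch truncates, results can differ from a correctly rounded IEEE 754 product in the last bit when the exact product is not representable.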
5. FLOATING POINT SUBTRACTOR
In the subtractor module, X [31-0] and Y [31-0] are the inputs given to the floating point
subtractor. The floating-point inputs are represented in IEEE 754 format and separated into sign, exponent
and mantissa; the subtraction operation is then done in a stepwise manner. First, the floating-point
numbers are unpacked so that the sign (b1, b2), exponent (e1, e2) and mantissa (m1, m2) are identified for
both inputs. Then alignment is done to equalize the exponents, and normalization is performed on the
mantissa part. If neither of the operands is infinite, the operands are compared to determine the relation
between the exponents e1 and e2, that is X [30-23] and Y [30-23]. After this, the mantissa is shifted
until the exponents e1 and e2 become equal, that is X [30-23] = Y [30-23]. Next, the mantissas m1 and m2,
that is X [22-0] and Y [22-0], are subtracted after the alignment and normalization process, and the
resulting mantissa is rounded. Finally, the sign, exponent and mantissa parts are concatenated.
The result of this concatenation is the single precision floating point number. The complete architecture of
the 32-bit floating point subtractor module is shown in Figure 2, where X and Y are the inputs given to the
subtractor. This module is used in the iterative computations of the fast division algorithms, that is the
Goldschmidt algorithm and the Newton-Raphson algorithm, to obtain the optimized division result.
Figure 2. Architecture of floating point subtractor (FPS)
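The unpack, align, subtract, normalize and concatenate steps can be illustrated as follows. This is a simplified behavioural sketch, not the architecture of Figure 2: it assumes normal operands, keeps no guard bits, and truncates instead of rounding. The names `fields` and `fp32_subtract` are hypothetical.

```python
import struct


def fields(x):
    """Split a value, rounded to float32, into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp32_subtract(x, y):
    """Compute x - y for normal float32 operands, step by step (sketch)."""
    b1, e1, f1 = fields(x)
    b2, e2, f2 = fields(y)
    if e1 in (0, 255) or e2 in (0, 255):
        raise ValueError("sketch handles normal operands only")

    b2 ^= 1  # x - y is computed as x + (-y)
    m1 = (1 << 23) | f1  # mantissas with the implicit leading 1
    m2 = (1 << 23) | f2

    # Alignment: shift the mantissa of the smaller operand right
    # until both exponents are equal.
    if e1 >= e2:
        m2 >>= e1 - e2
        exponent = e1
    else:
        m1 >>= e2 - e1
        exponent = e2

    # Add magnitudes for equal signs, otherwise subtract the smaller one.
    if b1 == b2:
        mantissa, sign = m1 + m2, b1
    elif m1 >= m2:
        mantissa, sign = m1 - m2, b1
    else:
        mantissa, sign = m2 - m1, b2
    if mantissa == 0:
        return 0.0

    # Normalization: bring the leading 1 back to bit 23.
    while mantissa >= (1 << 24):
        mantissa >>= 1
        exponent += 1
    while mantissa < (1 << 23):
        mantissa <<= 1
        exponent -= 1
    if not 0 < exponent < 255:
        raise ArithmeticError("result outside the normal range")

    # Concatenation of sign, exponent and fraction.
    bits = (sign << 31) | (exponent << 23) | (mantissa & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    print(fp32_subtract(5.5, 2.25), fp32_subtract(2.25, 5.5))
```

In the division algorithms this operation supplies the 2 - D·Z term of the Newton-Raphson iteration; a production design would add guard, round and sticky bits before the final rounding step.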
6. Int J Reconfigurable & Embedded Syst ISSN: 2089-4864
Comparative study of single precision floating point division using … (Naginder Singh)
341
6. SIMULATION RESULT
The ISim simulator is used for the simulations, and the Xilinx Spartan 6 SP605 evaluation
platform is used to synthesize and implement the proposed floating point divider design, which is based on
Vedic mathematics. Two numbers (N, D) are given as inputs to the 32-bit floating point
divider. Table 2 shows the sample inputs (N, D) and the resulting output quotient (Q) for the
simulation of the 32-bit floating point division.
Table 2. The sample inputs (N, D) and the resultant output (Q) of the sample inputs
Parameter | Decimal | Sign | Exponent | Mantissa
N | 3.14 | 0 | 10000000 | 10010001111010111000011
D | 8.56 | 0 | 10000010 | 00010001111010111000011
Q | 0.366822 | 0 | 01111101 | 01110111101000000100110
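The bit fields in Table 2 can be reproduced in software. The short sketch below uses Python's standard `struct` module to round each value to single precision and split it into sign, exponent and mantissa fields; the helper names `to_float32` and `ieee754_fields` are illustrative and not part of the proposed design.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def ieee754_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x as float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = format(bits >> 31, "b")
    exponent = format((bits >> 23) & 0xFF, "08b")
    mantissa = format(bits & 0x7FFFFF, "023b")
    return sign, exponent, mantissa


if __name__ == "__main__":
    n = to_float32(3.14)
    d = to_float32(8.56)
    q = to_float32(n / d)  # reference quotient in single precision
    for name, value in (("N", n), ("D", d), ("Q", q)):
        print(name, *ieee754_fields(value))
```

Running the script prints the same three field triples as the N, D and Q rows of Table 2.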
The device utilization parameters reported by the Xilinx power estimator (XPE) 14.3 are shown in Table 3
and Table 4 for the Goldschmidt and Newton-Raphson computational algorithm-based 32-bit floating point
division implemented on the Xilinx Spartan 6 SP605 evaluation platform FPGA. The device utilization
parameters, namely the number of slice registers, look-up tables (LUTs), occupied slices and bonded
input/output blocks (IOBs), are all lower for the Goldschmidt-based single precision floating-point
divider than for the Newton-Raphson-based single precision floating point divider, as shown in Table 3
and Table 4.
Table 3. Device utilization parameters for Goldschmidt computational algorithm
Logic utilization | Used | Available | Utilization
Number of slice registers | 408 | 54,576 | 0.7%
Number of slice LUTs | 9,185 | 27,288 | 33%
Number of occupied slices | 2,913 | 6,822 | 42%
Number of bonded IOBs | 193 | 296 | 65%
Table 4. Device utilization parameters for Newton-Raphson computational algorithm
Logic utilization | Used | Available | Utilization
Number of slice registers | 556 | 54,576 | 1%
Number of slice LUTs | 11,584 | 27,288 | 42%
Number of occupied slices | 3,719 | 6,822 | 54%
Number of bonded IOBs | 257 | 296 | 86%
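The utilization columns follow directly from the used and available counts. The short sketch below recomputes them and the relative saving of the Goldschmidt design over the Newton-Raphson design; the dictionary and function names are illustrative, and the counts are copied from Table 3 and Table 4.

```python
AVAILABLE = {"slice registers": 54576, "slice LUTs": 27288,
             "occupied slices": 6822, "bonded IOBs": 296}
GOLDSCHMIDT = {"slice registers": 408, "slice LUTs": 9185,
               "occupied slices": 2913, "bonded IOBs": 193}
NEWTON_RAPHSON = {"slice registers": 556, "slice LUTs": 11584,
                  "occupied slices": 3719, "bonded IOBs": 257}


def utilization(used, available):
    """Return the utilization as a percentage of the available resources."""
    return 100.0 * used / available


def saving(goldschmidt, newton_raphson):
    """Return how many percent fewer resources the Goldschmidt design uses."""
    return 100.0 * (newton_raphson - goldschmidt) / newton_raphson


if __name__ == "__main__":
    for name, total in AVAILABLE.items():
        g, nr = GOLDSCHMIDT[name], NEWTON_RAPHSON[name]
        print(f"{name}: {utilization(g, total):.1f}% vs "
              f"{utilization(nr, total):.1f}%, {saving(g, nr):.1f}% fewer")
```

The whole-number percentages in the tables correspond to these values with the fractional part dropped (for example, 3,719 of 6,822 occupied slices is about 54.5%, listed as 54%).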
Table 5 compares the performance of the proposed DAF units using the two computational
algorithms, Newton-Raphson and Goldschmidt, with existing designs in terms of power consumption and
latency time. The table shows that the Goldschmidt-based DAF has the lowest latency and the lowest power
among the listed designs, and it therefore gives better performance in terms of latency and power than
the others.
Table 5. Comparative analysis of DAF unit in terms of power and latency time
Parameter | Jeevan et al. [9] | Nagendra et al. [10] | Using DAF, Newton-Raphson based | Using DAF, Goldschmidt based
Latency time | 175.49 ns | 130.8 ns | 110 ns | 75 ns
Power | - | 0.050 W | 0.057 W | 0.031 W
The device summary reports of XPE 14.3 for the proposed 32-bit floating point division using the
Goldschmidt and Newton-Raphson computational algorithms are shown in Table 6, where the Goldschmidt-based
division parameters, namely junction temperature, on-chip power, thermal margin and effective thermal
resistance (ϴJA), are lower than those of the Newton-Raphson-based division. The proposed 32-bit floating
point divider is synthesized and implemented on the Spartan-6 (SP605) evaluation platform FPGA.
Figures 3 and 4 show the simulation results, where Q represents the quotient and N and D represent the
numerator and denominator given as inputs in IEEE-754 format to the proposed 32-bit floating-point divider,
which is designed using the Goldschmidt and Newton-Raphson computational algorithms.
Table 6. Device summary report of XPE-14.3 on Spartan-6 SP605 evaluation platform FPGA of floating point divider
Specifications | Using Goldschmidt algorithm | Using Newton-Raphson algorithm
Junction temperature (ºC) | 22.5 | 25.5
On-chip power (total) | 0.031 W | 0.037 W
Thermal margin | 58.5 ºC (3.8 W) | 59.5 ºC (4.0 W)
Effective ϴJA | 13.3 ºC/W | 14.3 ºC/W
Figure 3. Goldschmidt computational algorithm based floating point division simulation result
In Figure 3, for the Goldschmidt-based single precision division, G1, G2 and G3 represent the first,
second and third iteration results, and G4 represents the final iteration result. In Figure 4, for the
Newton-Raphson-based single precision division, Z0 represents the initial value, Z1 and Z2 represent the
first and second iteration results, and Z3 represents the final iteration result.
Figure 4. Newton-Raphson computational algorithm based floating point division simulation result
7. CONCLUSION
Goldschmidt and Newton-Raphson computational algorithm-based single precision floating point
dividers were synthesized and implemented on the Xilinx Spartan 6 SP605 FPGA evaluation platform, and
the proposed module was simulated using the ISim simulator. The design presented in this paper is useful
for computationally intensive arithmetic operations. The performance of the floating point divider is
improved by designing its multiplier module using Vedic mathematics, so that high computational speed and
throughput are achieved. The device utilization parameters are optimized, which results in power
consumption reduced by 26% and latency time reduced by 42.6% with respect to the existing DAF when using
Goldschmidt-based single precision floating point division.
REFERENCES
[1] A. Parker and J. O. Hamblen, “Optimal value for the Newton-Raphson division algorithm,” Information Processing Letters, vol.
42, no. 3, pp. 141–144, May 1992, doi: 10.1016/0020-0190(92)90137-K.
[2] I. Kong and E. E. Swartzlander, “A rounding method to reduce the required multiplier precision for Goldschmidt division,” IEEE
Transactions on Computers, vol. 59, no. 12, pp. 1703–1708, Dec. 2010, doi: 10.1109/TC.2010.86.
[3] A. Rodriguez-Garcia, L. Pizano-Escalante, R. Parra-Michel, O. Longoria-Gandara, and J. Cortez, “Fast fixed-point divider based
on Newton-Raphson method and piecewise polynomial approximation,” in 2013 International Conference on Reconfigurable
Computing and FPGAs, ReConFig 2013, Dec. 2013, pp. 1–6, doi: 10.1109/ReConFig.2013.6732291.
[4] A. Rathor and L. Bandil, “Design Of 32 bit floating point addition and subtraction units based on IEEE 754 standard,”
International Journal of Engineering Research and Technology, vol. 2, no. 6, pp. 2708–2712, 2013.
[5] S. B. K. Tirtha and V. S. Agrawala, Vedic Mathematics: Sixteen Simple Mathematical Formulae from the Vedas, Motilal
Banarsidass Publishing House, 1992.
[6] S. S. Mahakalkar and S. L. Haridas, “Design of high performance IEEE754 floating point multiplier using Vedic mathematics,” in
Proceedings - 2014 6th International Conference on Computational Intelligence and Communication Networks, CICN 2014, Nov.
2014, pp. 985–988, doi: 10.1109/CICN.2014.207.
[7] M. Pradhan and R. Panda, “Speed comparison of 16x16 vedic multipliers,” International Journal of Computer Applications, vol.
21, no. 6, pp. 12–19, May 2011, doi: 10.5120/2516-3417.
[8] S. K. Panda, R. Das, and T. R. Sahoo, “VLSI implementation of vedic multiplier using Urdhva–Tiryakbhyam sutra in VHDL
environment: A novelty,” IOSR Journal of VLSI and Signal Processing, vol. 5, no. 1, pp. 17–24, 2015, doi: 10.9790/4200-05131724.
[9] B. Jeevan, S. Narender, C. V. K. Reddy, and K. Sivani, “A high speed binary floating point multiplier using Dadda algorithm,” in
Proceedings - 2013 IEEE International Multi Conference on Automation, Computing, Control, Communication and Compressed
Sensing, iMac4s 2013, Mar. 2013, pp. 455–460, doi: 10.1109/iMac4s.2013.6526454.
[10] C. Nagendra, R. M. Owens, and M. J. Irwin, “Power-delay characteristics of CMOS adders,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 2, no. 3, pp. 377–381, Sep. 1994, doi: 10.1109/92.311649.
[11] A. Gupta, “Arithmetic Unit implementation using delay optimized vedic multiplier with BIST capability,” International Journal of
Engineering and Innovative Technology (IJEIT), vol. 1, no. 5, pp. 59–62, 2012.
[12] A. Sahu, S. K. Panda, and S. Jena, “HDL implementation and performance comparison of an optimized high speed multiplier,”
IOSR Journal of VLSI and Signal Processing, vol. 5, no. 2, pp. 10–19, 2015, doi: 10.9790/4200-05211019.
[13] M. Pradhan and R. Panda, “High speed multiplier using Nikhilam Sutra algorithm of Vedic mathematics,” International Journal of
Electronics, vol. 101, no. 3, pp. 300–307, 2014, doi: 10.1080/00207217.2013.780298.
[14] K. Sethi and R. Panda, “An improved squaring circuit for binary numbers,” International Journal of Advanced Computer Science
and Applications, vol. 3, no. 2, 2012, doi: 10.14569/ijacsa.2012.030220.
[15] P. Malik, “High throughput floating-point dividers implemented in FPGA,” in Proceedings - 2015 IEEE 18th International
Symposium on Design and Diagnostics of Electronic Circuits and Systems, DDECS 2015, Apr. 2015, pp. 291–294, doi:
10.1109/DDECS.2015.66.
[16] A. Kanhe, S. K. Das, and A. K. Singh, “Design and implementation of low power multiplier using vedic multiplication technique,”
International Journal of Computer Science and Communication (IJCSC), vol. 3, no. 1, pp. 131–132, 2012.
[17] M. Al-Ashrafy, A. Salem, and W. Anis, “An efficient implementation of floating point multiplier,” in Saudi International
Electronics, Communications and Photonics Conference 2011, SIECPC 2011, Apr. 2011, pp. 1–5, doi:
10.1109/SIECPC.2011.5876905.
[18] R. Bhaskar, G. Hegde, and P. R. Vaya, “An efficient hardware model for RSA encryption system using Vedic mathematics,”
Procedia Engineering, vol. 30, pp. 124–128, 2012, doi: 10.1016/j.proeng.2012.01.842.
[19] A. Amaricai, M. Vladutiu, and O. Boncalo, “Design issues and implementations for floating-point divide-add fused,” IEEE
Transactions on Circuits and Systems II: Express Briefs, vol. 57, no. 4, pp. 295–299, Apr. 2010, doi:
10.1109/TCSII.2010.2043473.
[20] K. Pande, A. Parkhi, S. Jaykar, and A. Peshattiwar, “Design and implementation of floating point divide-add fused architecture,” in
Proceedings - 2015 5th International Conference on Communication Systems and Network Technologies, CSNT 2015, Apr. 2015,
pp. 797–800, doi: 10.1109/CSNT.2015.179.
[21] S. K. Panda and A. Sahu, “A novel vedic divider architecture with reduced delay for VLSI applications,” International Journal of
Computer Applications, vol. 120, no. 17, pp. 31–36, Jun. 2015, doi: 10.5120/21322-4350.
[22] M. Der Shieh, J. H. Chen, H. H. Wu, and W. C. Lin, “A new modular exponentiation architecture for efficient design of RSA
cryptosystem,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 9, pp. 1151–1161, Sep. 2008, doi:
10.1109/TVLSI.2008.2000524.
[23] A. P. Ramesh, A. V. N. Tilak, and A. M. Prasad, “FPGA based implementation of high speed double precision floating point
multiplier with tiling technique using verilog,” International Journal of Computer Applications, vol. 58, no. 21, pp. 17–25, 2012,
doi: 10.5120/9407-3814.
[24] G. Srikanth and N. S. Kumar, “Design of high speed low power reversible vedic multiplier and reversible divider,” International
Journal of Engineering Research and Applications, vol. 4, no. 9, pp. 70–74, 2014.
[25] I. Kong and E. E. Swartzlander, “A goldschmidt division method with faster than quadratic convergence,” IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 19, no. 4, pp. 696–700, 2011, doi: 10.1109/TVLSI.2009.2036926.
[26] K. Underwood, “FPGAs vs. CPUs: Trends in peak floating-point performance,” ACM/SIGDA International Symposium on Field
Programmable Gate Arrays - FPGA, vol. 12, pp. 171–180, 2004, doi: 10.1145/968280.968305.
[27] C. V. S. Chaitanya, C. Sundaresan, P. R. Venkateswaran, and K. Prasad, “Design of modified booth based multiplier with carry
pre-computation,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 13, no. 3, pp. 1048–1055,
Mar. 2019, doi: 10.11591/ijeecs.v13.i3.pp1048-1055.
[28] M. F. Hassan, K. F. Hussein, and B. Al-Musawi, “Design and implementation of fast floating point units for FPGAs,” Indonesian
Journal of Electrical Engineering and Computer Science (IJEECS), vol. 19, no. 3, pp. 1480–1489, Sep. 2020, doi:
10.11591/ijeecs.v19.i3.pp1480-1489.
BIOGRAPHIES OF AUTHORS
Naginder Singh completed his Master of Technology from the National Institute
of Technology, Kurukshetra (Haryana) and Bachelor of Technology from College of
Engineering and Technology, Bikaner. He is currently serving as an Assistant Professor in the
Department of Electronics and Communication Engineering, M.B.M. University, Jodhpur,
Rajasthan (India). His research interests are in FPGA-based design, embedded systems and
sensors, embedded system design, and VLSI. He can be contacted at email:
naginder.ece@mbm.ac.in.
Kapil Parihar received his B.Tech. degree in Electronics and Communication
Engineering from Bikaner Engineering College, Bikaner and his M.E. in Digital Communication from
M.B.M. Engineering College, Jodhpur, Rajasthan (India). He is a full-time Assistant Professor at
M.B.M. University, Jodhpur, India. His research lines are optical communication and photonic
crystal fiber. He is pursuing his Ph.D. in the Electronics and Communication Engineering Department
at M.B.M. University, Jodhpur, and his fields of interest are terahertz antennas, electromagnetic
engineering and antennas/sensors for communication, radar, imaging and sensing systems,
UWB wireless communication, antennas for portable devices, and antennas for base stations in
wireless communications. He can be contacted at email: kapil.ece@mbm.ac.in.