This paper presents a compact design of a Montgomery modular multiplier
(MMM). The MMM is a building block commonly required in security
protocols that rely on public-key encryption. The proposed design is intended
for hardware implementations of lightweight cryptographic modules used in
system-on-chip (SoC) and Internet of Things (IoT) devices. It enhances the
original MMM and requires no multiplication or subtraction operations. The
modification aims to improve the performance and reduce the area of the MMM
hardware module. The operands and internal variables of the proposed circuit
are bounded to the smallest efficient size to minimize the area and the
critical path delay. The proposed design was coded in VHDL, implemented on the
Virtex-6 FPGA, and its performance was analyzed using the Xilinx ISE
tools. Our design occupies the smallest area compared with other
implementations on the same FPGA type. It saves between 60.0% and 99.0%
of the resources compared with other relevant designs.
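For readers unfamiliar with the operation, the following is a minimal sketch of the textbook bit-serial (radix-2) Montgomery multiplication, written in Python rather than VHDL. It is not the paper's circuit: the function name `mont_mul_radix2` and its parameters are illustrative assumptions, and this textbook form still ends with one conditional subtraction, which the proposed design is stated to avoid.

```python
def mont_mul_radix2(x, y, m, n):
    """Textbook radix-2 Montgomery product: x * y * 2^(-n) mod m.

    Assumes m is odd, m < 2^n, and 0 <= x, y < m.
    Hypothetical helper for illustration, not the paper's design.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(n):
        # Add y when bit i of x is set (an AND-gated add in hardware).
        if (x >> i) & 1:
            s += y
        # Make s even by adding the odd modulus when needed.
        if s & 1:
            s += m
        # Exact division by two: a one-bit right shift.
        s >>= 1
    # s stays below 2m, so one conditional subtraction suffices.
    if s >= m:
        s -= m
    return s
```

The loop body uses only additions and shifts, which is why the algorithm maps well to small hardware. The result carries a factor of 2^(-n), so operands are normally converted into and out of the Montgomery domain around a chain of such products.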
An Area-efficient Montgomery Modular Multiplier for Cryptosystems IJERA Editor
RSA is one of the most widely adopted public key algorithms at present, and it requires repeated modular
multiplications to accomplish the computation of modular exponentiation. A well-known approach to implementing
modular multiplication in hardware circuits is based on the Montgomery modular multiplication algorithm, since
it has many advantages. To speed up the encryption/decryption process, many high-speed Montgomery modular
multiplication algorithms and hardware architectures employ carry-save addition (CSA). However, a CSA-based architecture
increases the area. In this paper, in order to reduce the area of the CSA-based multiplier, an area-efficient
algorithm called the Double Add Reduce algorithm is introduced. A performance analysis then compares the new design
with the previous one.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
An fpga implementation of the lms adaptive filter eSAT Journals
This document describes an FPGA implementation of the Least Mean Square (LMS) adaptive filter algorithm for active vibration control. It compares fixed-point and floating-point implementations in terms of area usage and performance. The LMS algorithm is implemented using a finite state machine model with separate modules for operations like filtering, error estimation, and weight adaptation. Both implementations utilize this structural model. The fixed-point version uses 16-bit integers and fractions, while the floating-point version leverages IP cores. Results show the floating-point implementation has better accuracy and resource utilization than the fixed-point version for active vibration control applications on FPGAs.
Modified montgomery modular multiplier for cryptosystems IAEME Publication
The document summarizes a new area-efficient algorithm called the Double Add Reduce (DAR) algorithm for Montgomery modular multiplication. The DAR algorithm modifies an existing carry-save addition (CSA) based Montgomery multiplier to reduce area. It replaces the CSA block with a double adder reduce block that computes the Montgomery product using two adders instead of CSA, lowering the overall area. The DAR block first doubles the multiplicand Y, then iteratively calculates the Montgomery product by adding the current value, the LSB of the current value times the modulus M, and the multiplicand times the doubled Y, before dividing the result by two.
This document summarizes and compares the performance of three asymmetric cryptographic algorithms (RSA, ECC, and MQQ) on ARM processor-based embedded systems. It provides background on each algorithm, including how they work and their computational complexities. The document then describes simulations conducted using the SimpleScalar tool to analyze the processing time, memory usage, and processor usage of each algorithm. The results showed that the MQQ algorithm performed better than ECC and RSA on embedded systems in terms of these metrics.
Faster Interleaved Modular Multiplier Based on Sign Detection VLSICS Design
Data security is the most important issue nowadays. Many cryptosystems have been introduced to provide security. Public key cryptosystems are the most common cryptosystems used for securing data communication. The common drawback of applying such cryptosystems is the heavy computation, which degrades the performance of a system. Modular multiplication is the basic operation of common public key cryptosystems such as RSA, Diffie-Hellman key agreement (DH), ElGamal and ECC. Much research is now directed at reducing the overall time consumed by the modular multiplication operation. Abd-el-fatah et al. introduced an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given M. In this paper, a modification of that architecture is introduced. The proposed design computes modular multiplication by scanning two bits per iteration instead of one bit. The proposed design for 1024-bit precision reduced the overall time by 38% compared to the design of Abd-el-fatah et al.
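The idea of scanning two multiplier bits per iteration can be sketched in Python as below. This is only an illustration of interleaved modular multiplication under stated assumptions: the function name is hypothetical, and the reduction step uses plain repeated subtraction instead of the sign-detection technique of the cited architecture.

```python
def interleaved_modmul_2bit(x, y, m):
    """Return (x * y) % m, scanning x two bits at a time, MSB first.

    Assumes 0 <= x, y < m. Illustrative only; the reduction here is
    repeated subtraction, not sign detection.
    """
    if not (0 <= x < m and 0 <= y < m):
        raise ValueError("operands must satisfy 0 <= x, y < m")
    ndigits = (max(x.bit_length(), 1) + 1) // 2  # number of 2-bit digits
    p = 0
    for i in reversed(range(ndigits)):
        d = (x >> (2 * i)) & 0b11  # current 2-bit digit of x
        # Shift the partial product by two bits and add d * y.
        # In hardware, d selects one of 0, y, 2y, 3y.
        p = (p << 2) + d * y
        # Bring the partial product back below m (p < 7m at this point).
        while p >= m:
            p -= m
    return p
```

Compared with a one-bit scan, the loop runs half as many iterations, at the cost of a wider reduction step in each one.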
This document discusses matrix inversion techniques for MIMO wireless communication systems. It begins by introducing how matrix inversion is used in algorithms for MIMO systems and standards like 802.11n. Existing matrix inversion approaches cannot achieve the performance needed for real-time 802.11n systems. The document then presents a new matrix inversion algorithm based on modified squared Givens rotations (MSGR) that enables real-time implementation with high throughput and low latency. This algorithm overcomes limitations of other QR decomposition techniques. Finally, the document evaluates this algorithm integrated into a MIMO receiver and demonstrates it can support the requirements of modern wireless standards like 802.11n.
Efficient implementation of bit parallel finite eSAT Journals
Abstract Arithmetic in finite (Galois) fields is a major aspect of many applications such as error-correcting codes and cryptography. Addition and multiplication are the two basic operations in the finite field GF(2^m). Finite field multiplication is the most resource- and time-consuming operation. In this paper, the space complexity analysis and an efficient FPGA implementation of a bit-parallel Karatsuba multiplier over GF(2^m) are presented. This is especially interesting for high-performance systems because of its carry-free property. To reduce the complexity of the classical multiplier, a multiplier with less complexity over GF(2^m) based on the Karatsuba multiplier is used. The LUT complexity is evaluated on FPGA using Xilinx ISE 8.1i. Furthermore, experimental results on FPGAs for the bit-parallel Karatsuba multiplier and the classical multiplier are shown, and a comparison table is provided. To the best of our knowledge, the bit-parallel Karatsuba multiplier consumes the least resources among the known FPGA implementations. Keywords: Classical Multiplier, Cryptography, FPGA, Galois field, Karatsuba Multiplier
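As a rough software illustration of why the Karatsuba approach needs fewer sub-multiplications, the sketch below multiplies two binary polynomials (stored as Python integers, bit i being the coefficient of x^i) both classically and with Karatsuba recursion. The function names and the 8-bit base-case threshold are assumptions for this example, and the final reduction by the field's irreducible polynomial is deliberately left out.

```python
def clmul(a, b):
    """Classical carry-less (GF(2)[x]) product of two bit-vector polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR, so no carries occur
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, nbits):
    """Karatsuba product in GF(2)[x] of operands that fit in nbits bits.

    Uses three half-size products instead of four. The result is the
    unreduced polynomial product (no modular reduction is applied).
    """
    if nbits <= 8:  # assumed threshold for falling back to the classical method
        return clmul(a, b)
    h = nbits // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_gf2(a0, b0, h)
    hi = karatsuba_gf2(a1, b1, nbits - h)
    # (a0 + a1)(b0 + b1) - lo - hi, where + and - are both XOR.
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, nbits - h) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))
```

Because every addition is an XOR, the recombination step has no carry chain, which is the carry-free property mentioned in the abstract.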
Efficient implementation of bit parallel finite field multipliers eSAT Publishing House
This document summarizes an article that presents an efficient implementation of a bit parallel Karatsuba finite field multiplier on FPGA. It begins by introducing finite field arithmetic and multiplication, which is the most resource intensive operation. It then discusses different multiplier designs, including the classical multiplier and Karatsuba multiplier. The Karatsuba multiplier has lower complexity than the classical multiplier by reducing the number of gates required. Experimental results on FPGAs show that the bit parallel Karatsuba multiplier consumes the fewest resources among known FPGA implementations. In summary, the document presents an efficient Karatsuba finite field multiplier design with lower complexity than alternative designs.
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET csandit
Residue Number System is generally supposed to use co-prime moduli set. Non-coprime moduli sets are a field in RNS which is little studied. That's why this work was devoted to them. The resources that discuss non-coprime in RNS are very limited. For the previous reasons, this paper analyses the RNS conversion using suggested non-coprime moduli set.
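A small Python sketch can make the effect of a non-coprime moduli set concrete. The moduli (6, 10, 15) below are an arbitrary example chosen for illustration, not the set proposed in the paper, and the helper names are hypothetical. Forward (binary-to-residue) conversion is simply a remainder per modulus; with non-coprime moduli, the range of uniquely representable values is the least common multiple of the moduli rather than their product.

```python
from functools import reduce
from math import gcd


def to_residues(x, moduli):
    """Binary-to-residue (forward) conversion: one remainder per modulus."""
    return tuple(x % m for m in moduli)


def dynamic_range(moduli):
    """Count of uniquely representable values: the lcm of the moduli."""
    return reduce(lambda a, b: a * b // gcd(a, b), moduli, 1)


# Example moduli, assumed for illustration only.
EXAMPLE_MODULI = (6, 10, 15)
```

For the example set, the product of the moduli is 900 but the dynamic range is only 30, which shows the redundancy that a non-coprime set introduces.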
Design and testing of systolic array multiplier using fault injecting schemes CSITiaesprime
Low-power circuit design is now of major importance for data transmission and information processing across system designs. The systolic array multiplier is one of the main multipliers used for synchronizing data transmission, and low-power designs are mostly used to increase performance and reduce hardware complexity. Among the mathematical operations, multiplication plays a major role: it processes more information, and existing irreversible designs have high circuit complexity. We develop a systolic array multiplier using reversible gates for low-power appliances, and the faults and coverage of the reversible logic are calculated in this paper. To improve the design further, we introduce a reversible logic gate and test the reversible systolic array multiplier using the fault injection method of the built-in self-test block observer (BILBO), in which all corner cases are covered, showing 97% coverage compared with existing designs. Finally, Xilinx ISE 14.7 was used for synthesis and simulation, and the resulting parameters were compared with those of existing designs, showing higher efficiency.
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn... IJERA Editor
Real time motion estimation for tracking is a challenging task. Several techniques can transform an image into frequency domain, such as DCT, DFT and wavelet transform. Direct implementation of 2-D DCT takes N^4 multiplications for an N x N image which is impractical. The proposed architecture for implementation of 2-D DCT uses look up tables. They are used to store pre-computed vector products that completely eliminate the multiplier. This makes the architecture highly time efficient, and the routing delay and power consumption is also reduced significantly. Another approach, 2-D discrete wavelet transform based motion estimation (DWT-ME) provides substantial improvements in quality and area. The proposed architecture uses Haar wavelet transform for motion estimation. In this paper, we present the comparison of the performance of discrete cosine transform, discrete wavelet transform for implementation in motion estimation.
Towards Efficient Modular Adders based on Reversible Circuits IJSRED
This document discusses the design of efficient modular adders based on reversible circuits. It proposes combining residue number systems (RNS) with reversible computing to achieve an ultra-efficient computing paradigm for emerging applications. RNS allows for parallel and carry-free arithmetic, which is well-suited to take advantage of the properties of reversible circuits. However, existing RNS architectures were designed for ASIC implementation and need to be adapted for reversible circuits. Modular addition is a key operation in RNS since addition is used in forward and reverse conversion as well as other operations. The document presents implementations of modular adders using reversible gates like Feynman, Peres and HNG gates. Experimental results show modulo adders can be designed using reversible
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil... IRJET Journal
This document describes hardware co-simulation of classical edge detection algorithms using Xilinx System Generator. It presents designs for Robert, Prewitt, and Sobel edge detection operators using minimum FPGA resources. The designs were tested on a Spartan-3E FPGA board using JTAG hardware co-simulation. Results show the Sobel operator provides the best edge detection, followed by Prewitt and Robert. Compared to prior work, the proposed designs utilize significantly fewer FPGA slices, flip flops, and lookup tables, demonstrating more efficient implementation of the classical edge detection algorithms.
Application of smart antenna interference suppression techniques in tdscdma marwaeng
This document discusses the application of smart antenna interference suppression techniques in TD-SCDMA systems. It first provides background on TD-SCDMA and smart antenna technology. It then describes two adaptive beamforming algorithms - LMS and RLS - and simulates their performance at interference suppression in MATLAB. The simulation results show that both algorithms can effectively form a main beam towards the desired user and nulls towards interferers. RLS converges faster than LMS, though it has higher computational complexity. The study demonstrates that smart antenna techniques can improve signal quality in TD-SCDMA systems by suppressing interference.
FPGA Implementation of Soft Output Viterbi Algorithm Using Memoryless Hybrid ... VLSICS Design
The importance of convolutional codes is well established. They are widely used to encode digital data before transmission through noisy or error-prone communication channels to reduce occurrence of errors and memory. This paper presents a novel decoding technique, memoryless Hybrid Register Exchange, with simulation and FPGA implementation results. It requires a single register, as compared to the Register Exchange Method (REM) and Hybrid Register Exchange Method (HREM); therefore the data transfer operations, and ultimately the switching activity, are reduced.
In the OFDM-IDMA scheme, intersymbol interference (ISI) is resolved by the OFDM layer and multiple access interference (MAI) is suppressed by the IDMA layer at low cost. However, the OFDM-IDMA scheme suffers from a high peak-to-average power ratio (PAPR). To remove the high PAPR problem, a hybrid multiple access scheme, SC-FDM-IDMA, has been proposed. In this paper, a bit error rate (BER) performance comparison of the SC-FDM-IDMA, OFDM-IDMA and IDMA schemes is presented. Moreover, the BER performance of various subcarrier mapping methods for the SC-FDM-IDMA scheme, as well as other results with variation of different parameters, are demonstrated. Finally, simulation results for BER performance improvement employing a BCH code are shown. All the simulation results demonstrate the suitability of the SC-FDM-IDMA scheme for wireless communication in an AWGN channel environment.
International Journal of Engineering Research and Development IJERD Editor
This document describes a proposed VLSI architecture for an optimized low power digit serial finite impulse response (FIR) filter using multiple constant multiplications (MCM). It introduces an algorithm to optimize the area of digit serial MCM operations at the gate level by considering implementation costs of digit serial addition, subtraction, and shift operations. The proposed filter architecture aims to reduce area and power compared to designs using generic digit serial multipliers through the use of MCM blocks optimized for area. Experimental results indicate the algorithm leads to lower complexity digit serial MCM designs.
This document summarizes a research paper on designing low power digit serial finite impulse response (FIR) filters using multiple constant multiplication (MCM) techniques. It proposes an architecture that optimizes the area of digit serial MCM operations at the gate level by considering the implementation costs of digit serial addition, subtraction and shift operations. An algorithm is presented to design digit serial FIR filters under a shift-adds architecture to reduce area compared to designs using generic digit serial multipliers. Experimental results show the technique leads to lower complexity digit serial MCM designs.
This document describes a proposed modular multiplication algorithm that divides the computation into two steps:
1) A multiplication step that uses Toom-Cook multiplication to split the inputs into five parts
2) A modular multiplication step that uses Barrett and Montgomery modular multiplication algorithms in parallel to compute the results of the five parts from the first step.
The algorithm is designed to minimize the number of single-precision multiplications and enable more than three-way parallel computation, improving efficiency over other modular multiplication methods.
International Journal of Computational Engineering Research(IJCER) ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
The document describes a system that uses AI technologies like optical character recognition, text summarization, and speech synthesis to automatically recognize text from images, summarize it, and generate an audio podcast of the summary. The system segments text from images using a neural network with differentiable binarization. It recognizes words using a temporal recognition network with connectionist temporal classification. It summarizes the text using either an abstractive transformer model or extractive PageRank algorithm. Finally, it generates a mel-spectrogram from the summary and synthesizes speech from the spectrogram using generative adversarial networks. The system aims to quickly digest lengthy publications for users.
Basic signal processing system design on fpga using lms based adaptive filter eSAT Journals
Abstract
Adaptive digital filters based on LMS algorithms are widely used in the area of digital signal processing to iteratively estimate the
statistics of an unknown signal. The design of an adaptive filter is based on three major computing elements, namely the multiplier, adder
and delay unit, to realize the Finite Impulse Response (FIR) filter. The filter weights (coefficients) of the FIR filter are adjusted
automatically by the Least Mean Square of the error so as to match the adapted output to the desired input. This paper explains the
design of an adaptive filter by two approaches. One is a model-based approach and the other uses Field Programmable Gate Arrays
(FPGAs). The model-based design approach is developed around the MATLAB, SIMULINK and SYSTEM GENERATOR tools, which
provide a virtual FPGA platform. Modern FPGAs include the resources needed to design efficient filtering structures. The LMS
algorithm has been implemented on CYCLONE II EP2C35F672C8 FPGA device, using ALTERA QUARTUS II development
platform. The three major demonstrable applications cited in the present work are System Identification, Noise reduction and
Echo cancellation.
Keywords – Signal Processing; FPGA; Adaptive Filter; DE2KIT
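The weight adaptation described in this abstract can be sketched in a few lines of Python for the system identification case. This is a floating-point software illustration, not the FPGA design: the function names, the step size `mu`, the 3-tap unknown system and the random test signal are all assumptions made for the example.

```python
import random


def fir_filter(x, h):
    """Filter the sequence x with the FIR coefficients h (direct form)."""
    buf = [0.0] * len(h)
    out = []
    for sample in x:
        buf = [sample] + buf[:-1]  # delay line
        out.append(sum(hi * bi for hi, bi in zip(h, buf)))
    return out


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights so the filter output tracks d; returns the weights."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps
    for sample, desired in zip(x, d):
        buf = [sample] + buf[:-1]                   # delay units
        y = sum(wi * bi for wi, bi in zip(w, buf))  # multipliers and adders
        e = desired - y                             # error estimation
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # LMS weight update
    return w


def demo_system_identification(seed=0):
    """Identify an assumed 3-tap system from noise-free input/output data."""
    rng = random.Random(seed)
    unknown = [0.5, -0.3, 0.1]  # hypothetical system to be identified
    x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
    d = fir_filter(x, unknown)
    return unknown, lms_identify(x, d, num_taps=3, mu=0.05)
```

Calling `demo_system_identification()` returns the assumed system and the adapted weights, which should end up close to each other. The same update loop underlies the noise reduction and echo cancellation applications; only the choice of input and desired signals changes.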
This document discusses optimizing the gate-level area of digit-serial finite impulse response (FIR) filter designs using multiple constant multiplication (MCM) blocks. It introduces the problem of designing a digit-serial MCM operation with minimal area and presents algorithms to formalize the area optimization problem. Results show the proposed algorithms and digit-serial MCM architectures efficiently design digit-serial MCM operations and FIR filters with reduced area compared to existing approaches.
FPGA based Efficient Interpolator design using DALUT Algorithm cscpconf
The document describes the design and implementation of an efficient interpolator for wireless communication systems using FPGA. It proposes a multiplier-less technique using distributed arithmetic look-up tables (DALUT) that replaces multiply-accumulate operations with LUT accesses. A 66th-order half-band polyphase FIR structure is implemented using the DALUT approach on Spartan-3E and Virtex2Pro FPGAs. Results show the proposed design achieves maximum frequencies of 92.859MHz on Virtex Pro and 61.6MHz on Spartan 3E while consuming fewer resources than a traditional MAC-based design.
FPGA based Efficient Interpolator design using DALUT Algorithm cscpconf
Interpolator is an important sampling device used for multirate filtering to
provide signal processing in wireless communication system. There are many
applications in which sampling rate must be changed. Interpolators and decimators are
utilized to increase or decrease the sampling rate. In this paper an efficient method has
been presented to implement high speed and area efficient interpolator for wireless
communication systems. A multiplier-less technique is used which substitutes multiply-and-accumulate
operations with look up table (LUT) accesses. Interpolator has been
implemented using Partitioned distributed arithmetic look up table (DALUT)
technique. This technique has been used to take an optimal advantage of embedded
LUTs of the target FPGA. This method is useful to enhance the system performance in
terms of speed and area. The proposed interpolator has been designed using half band
poly phase FIR structure with Matlab, simulated with ISE, synthesized with Xilinx
Synthesis Tools (XST) and implemented on Spartan-3E and Virtex2pro device. The
proposed LUT based multiplier less approach has shown a maximum operating
frequency of 92.859 MHz with Virtex Pro and 61.6 MHz with Spartan 3E by
consuming considerably less resources to provide cost effective solution for wireless
communication systems.
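How a look-up table can stand in for multiply-and-accumulate operations is easiest to see in a small Python sketch of plain (unpartitioned) distributed arithmetic. The helper names are hypothetical, and the sketch assumes unsigned input samples of a fixed bit width; the partitioning and half-band polyphase structure of the actual design are not modelled.

```python
def build_da_lut(coeffs):
    """Precompute the sum of coefficients for every pattern of input bits.

    Entry `addr` holds the sum of coeffs[j] over all j whose bit is set
    in addr, so the table has 2**len(coeffs) entries.
    """
    k = len(coeffs)
    return [
        sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
        for addr in range(1 << k)
    ]


def da_dot(lut, samples, width):
    """Compute sum(coeffs[j] * samples[j]) using only LUT reads, shifts, adds.

    Assumes every sample is an unsigned integer below 2**width.
    """
    acc = 0
    for bit in range(width):
        # Gather bit `bit` of every sample into one LUT address.
        addr = 0
        for j, s in enumerate(samples):
            addr |= ((s >> bit) & 1) << j
        # Weight the looked-up partial sum by 2**bit and accumulate.
        acc += lut[addr] << bit
    return acc
```

The table grows exponentially with the number of taps, which is the usual motivation for splitting a long filter into several smaller tables, as a partitioned scheme does.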
Development of depth map from stereo images using sum of absolute differences... nooriasukmaningtyas
This article proposes a framework for depth map reconstruction using stereo images. Fundamentally, this map provides important information that is commonly used in essential applications such as autonomous vehicle navigation, drone navigation and 3D surface reconstruction. To develop an accurate depth map, the framework must be robust against the challenging regions of low texture, plain color and repetitive pattern in the input stereo image. The development of this map requires several stages: matching cost calculation, cost aggregation, optimization and refinement. Hence, this work develops a framework with the sum of absolute differences (SAD) and a combination of two edge-preserving filters to increase the robustness against the challenging regions. The SAD is applied using a block matching technique to increase the efficiency of the matching process in the low-texture and plain-color regions. Moreover, the two edge-preserving filters increase the accuracy in the repetitive-pattern region. The results, obtained on the Middlebury standard dataset, show that the proposed method is accurate and capable of handling the challenging regions. The framework is also efficient and can be applied to 3D surface reconstruction. Moreover, this work is highly competitive with previously available methods.
Model predictive controller for a retrofitted heat exchanger temperature cont... nooriasukmaningtyas
This paper aims to demonstrate the practical aspects of process control theory for undergraduate students at the Department of Chemical Engineering at the University of Bahrain. Both, the ubiquitous proportional integral derivative (PID) as well as model predictive control (MPC) and their auxiliaries were designed and implemented in a real-time framework. The latter was realized through retrofitting an existing plate-and-frame heat exchanger unit that has been operated using an analog PID temperature controller. The upgraded control system consists of a personal computer (PC), low-cost signal conditioning circuit, national instruments USB 6008 data acquisition card, and LabVIEW software. LabVIEW control design and simulation modules were used to design and implement the PID and MPC controllers. The performance of the designed controllers was evaluated while controlling the outlet temperature of the retrofitted plate-and-frame heat exchanger. The distinguished feature of the MPC controller in handling input and output constraints was perceived in real-time. From a pedagogical point of view, realizing the theory of process control through practical implementation was substantial in enhancing the student’s learning and the instructor’s teaching experience.
Similar to A compact FPGA-based montgomery modular multiplier
Efficient implementation of bit parallel finite field multipliers (eSAT Publishing House)
This document summarizes an article that presents an efficient implementation of a bit parallel Karatsuba finite field multiplier on FPGA. It begins by introducing finite field arithmetic and multiplication, which is the most resource intensive operation. It then discusses different multiplier designs, including the classical multiplier and Karatsuba multiplier. The Karatsuba multiplier has lower complexity than the classical multiplier by reducing the number of gates required. Experimental results on FPGAs show that the bit parallel Karatsuba multiplier consumes the fewest resources among known FPGA implementations. In summary, the document presents an efficient Karatsuba finite field multiplier design with lower complexity than alternative designs.
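To make the Karatsuba idea concrete, here is a minimal Python sketch of carry-less (GF(2) polynomial) multiplication, where integers stand for bit vectors of polynomial coefficients. It shows only the polynomial product; the reduction by an irreducible polynomial that completes a finite field multiplication is left out. The function names, the `bits` parameter and the 8-bit base-case threshold are assumptions for illustration, not details of the summarized design.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x] by shift-and-XOR (the classical method)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, bits: int = 64) -> int:
    """Carry-less product of operands of at most `bits` bits using Karatsuba.

    Three half-size products replace the four needed by the classical
    split, which is where the gate-count saving comes from.
    """
    if bits <= 8:  # small operands: fall back to the classical method
        return clmul_schoolbook(a, b)
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half)
    hi = clmul_karatsuba(a_hi, b_hi, half)
    # (a_lo + a_hi)(b_lo + b_hi) = lo + hi + cross terms, so XOR them out.
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half) ^ lo ^ hi
    return (hi << (2 * half)) ^ (mid << half) ^ lo
```

In hardware each recursion level becomes a layer of XOR trees and smaller multipliers operating in parallel; the software recursion here only mirrors that structure.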
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET (csandit)
Residue number systems (RNS) are generally assumed to use co-prime moduli sets. Non-coprime moduli sets are a little-studied area of RNS, and the resources that discuss them are very limited. For these reasons, this paper analyses RNS conversion using a suggested non-coprime moduli set.
Design and testing of systolic array multiplier using fault injecting schemes (CSITiaesprime)
Low-power circuit design is now of major importance for data transmission and information processing across various system designs. One of the main multipliers used for synchronizing data transmission is the systolic array multiplier; low-power designs are mostly used to increase performance and reduce hardware complexity. Among all mathematical operations, multiplication plays a major role because it processes more information, and the existing irreversible designs have high circuit complexity. We develop a systolic array multiplier using reversible gates for low-power appliances, and the faults and coverage of the reversible logic are calculated in this paper. As a further improvement, we introduce a reversible logic gate and test the reversible systolic array multiplier using the fault injection method of a built-in self-test block observer (BILBO), in which all corner cases are covered, showing 97% coverage compared with existing designs. Finally, Xilinx ISE 14.7 was used for synthesis and simulation, and the resulting parameters were compared with existing designs, showing higher efficiency.
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn... (IJERA Editor)
Real-time motion estimation for tracking is a challenging task. Several techniques can transform an image into the frequency domain, such as the DCT, DFT and wavelet transform. Direct implementation of the 2-D DCT takes N^4 multiplications for an N x N image, which is impractical. The proposed architecture for implementing the 2-D DCT uses look-up tables to store pre-computed vector products, which completely eliminates the multiplier. This makes the architecture highly time efficient, and the routing delay and power consumption are also reduced significantly. Another approach, 2-D discrete wavelet transform based motion estimation (DWT-ME), provides substantial improvements in quality and area. The proposed architecture uses the Haar wavelet transform for motion estimation. In this paper, we compare the performance of the discrete cosine transform and the discrete wavelet transform for implementation in motion estimation.
Towards Efficient Modular Adders based on Reversible Circuits (IJSRED)
This document discusses the design of efficient modular adders based on reversible circuits. It proposes combining residue number systems (RNS) with reversible computing to achieve an ultra-efficient computing paradigm for emerging applications. RNS allows for parallel and carry-free arithmetic, which is well-suited to take advantage of the properties of reversible circuits. However, existing RNS architectures were designed for ASIC implementation and need to be adapted for reversible circuits. Modular addition is a key operation in RNS since addition is used in forward and reverse conversion as well as other operations. The document presents implementations of modular adders using reversible gates like Feynman, Peres and HNG gates. Experimental results show modulo adders can be designed using reversible
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil... (IRJET Journal)
This document describes hardware co-simulation of classical edge detection algorithms using Xilinx System Generator. It presents designs for Robert, Prewitt, and Sobel edge detection operators using minimum FPGA resources. The designs were tested on a Spartan-3E FPGA board using JTAG hardware co-simulation. Results show the Sobel operator provides the best edge detection, followed by Prewitt and Robert. Compared to prior work, the proposed designs utilize significantly fewer FPGA slices, flip flops, and lookup tables, demonstrating more efficient implementation of the classical edge detection algorithms.
Application of smart antenna interference suppression techniques in tdscdma (marwaeng)
This document discusses the application of smart antenna interference suppression techniques in TD-SCDMA systems. It first provides background on TD-SCDMA and smart antenna technology. It then describes two adaptive beamforming algorithms - LMS and RLS - and simulates their performance at interference suppression in MATLAB. The simulation results show that both algorithms can effectively form a main beam towards the desired user and nulls towards interferers. RLS converges faster than LMS, though it has higher computational complexity. The study demonstrates that smart antenna techniques can improve signal quality in TD-SCDMA systems by suppressing interference.
FPGA Implementation of Soft Output Viterbi Algorithm Using Memoryless Hybrid ... (VLSICS Design)
The importance of convolutional codes is well established. They are widely used to encode digital data before transmission through noisy or error-prone communication channels to reduce the occurrence of errors and memory. This paper presents a novel decoding technique, memoryless Hybrid Register Exchange, with simulation and FPGA implementation results. It requires a single register, compared to the Register Exchange Method (REM) and Hybrid Register Exchange Method (HREM); therefore the data transfer operations, and ultimately the switching activity, are reduced.
In the OFDM-IDMA scheme, intersymbol interference (ISI) is resolved by the OFDM layer and multiple access interference (MAI) is suppressed by the IDMA layer at low cost. However, the OFDM-IDMA scheme suffers from a high peak-to-average power ratio (PAPR). To remove the high PAPR problem, a hybrid multiple access scheme, SC-FDM-IDMA, has been proposed. In this paper, the bit error rate (BER) performance of the SC-FDM-IDMA, OFDM-IDMA and IDMA schemes is compared. Moreover, the BER performance of various subcarrier mapping methods for the SC-FDM-IDMA scheme, as well as other results for varying parameters, is also demonstrated. Finally, simulation results showing the BER performance improvement obtained with a BCH code are presented. All the simulation results demonstrate the suitability of the SC-FDM-IDMA scheme for wireless communication in an AWGN channel environment.
International Journal of Engineering Research and Development (IJERD Editor)
This document describes a proposed VLSI architecture for an optimized low power digit serial finite impulse response (FIR) filter using multiple constant multiplications (MCM). It introduces an algorithm to optimize the area of digit serial MCM operations at the gate level by considering implementation costs of digit serial addition, subtraction, and shift operations. The proposed filter architecture aims to reduce area and power compared to designs using generic digit serial multipliers through the use of MCM blocks optimized for area. Experimental results indicate the algorithm leads to lower complexity digit serial MCM designs.
This document summarizes a research paper on designing low power digit serial finite impulse response (FIR) filters using multiple constant multiplication (MCM) techniques. It proposes an architecture that optimizes the area of digit serial MCM operations at the gate level by considering the implementation costs of digit serial addition, subtraction and shift operations. An algorithm is presented to design digit serial FIR filters under a shift-adds architecture to reduce area compared to designs using generic digit serial multipliers. Experimental results show the technique leads to lower complexity digit serial MCM designs.
This document describes a proposed modular multiplication algorithm that divides the computation into two steps:
1) A multiplication step that uses Toom-Cook multiplication to split the inputs into five parts
2) A modular multiplication step that uses Barrett and Montgomery modular multiplication algorithms in parallel to compute the results of the five parts from the first step.
The algorithm is designed to minimize the number of single-precision multiplications and enable more than three-way parallel computation, improving efficiency over other modular multiplication methods.
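The Montgomery step named above can be illustrated with a minimal Python sketch. It shows a single radix-2^k Montgomery multiplication only; the Toom-Cook splitting and the parallel Barrett path of the described algorithm are not reproduced. The function names and the parameter `k` (so that R = 2^k) are assumptions for illustration, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n: int, k: int) -> int:
    """Return n' = -n^(-1) mod R for R = 2**k (n must be odd and n < R)."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than R = 2**k")
    r = 1 << k
    return (-pow(n, -1, r)) % r


def montgomery_multiply(a_bar: int, b_bar: int, n: int, k: int, n_prime: int) -> int:
    """Return a_bar * b_bar * R^(-1) mod n without dividing by n."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask  # m = t * n' mod R
    u = (t + m * n) >> k               # t + m*n is a multiple of R, so this is exact
    return u - n if u >= n else u      # one conditional subtraction suffices


def modmul(a: int, b: int, n: int, k: int) -> int:
    """Compute a * b mod n by passing through the Montgomery domain."""
    n_prime = montgomery_setup(n, k)
    r = 1 << k
    a_bar, b_bar = (a * r) % n, (b * r) % n  # enter the domain
    prod_bar = montgomery_multiply(a_bar, b_bar, n, k, n_prime)
    return montgomery_multiply(prod_bar, 1, n, k, n_prime)  # leave the domain
```

For example, `modmul(45, 76, 97, 7)` goes through R = 128 and returns the same value as `(45 * 76) % 97`. The conversions into and out of the domain use ordinary `%` here for brevity; in a long exponentiation they are paid once, while every intermediate product uses only shifts, masks and additions.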
International Journal of Computational Engineering Research (IJCER) (ijceronline)
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
The document describes a system that uses AI technologies like optical character recognition, text summarization, and speech synthesis to automatically recognize text from images, summarize it, and generate an audio podcast of the summary. The system segments text from images using a neural network with differentiable binarization. It recognizes words using a temporal recognition network with connectionist temporal classification. It summarizes the text using either an abstractive transformer model or extractive PageRank algorithm. Finally, it generates a mel-spectrogram from the summary and synthesizes speech from the spectrogram using generative adversarial networks. The system aims to quickly digest lengthy publications for users.
Basic signal processing system design on fpga using lms based adaptive filter (eSAT Journals)
Abstract
Adaptive digital filters based on LMS algorithms are widely used in the area of digital signal processing to iteratively estimate the
statistics of an unknown signal. The design of an adaptive filter is based on three major computing elements, namely the multiplier, adder
and delay unit, to realize the Finite Impulse Response (FIR) filter. The filter weights (coefficients) of the FIR filter are adjusted
automatically by the Least Mean Square of the error so as to match the adapted output to the desired input. This paper explains the
design of the adaptive filter by two approaches: one is a model based approach and the other uses Field Programmable Gate Arrays
(FPGAs). The model based design approach is developed around the MATLAB, SIMULINK and SYSTEM GENERATOR tools, which
provide a virtual FPGA platform. Modern FPGAs include the resources needed to design efficient filtering structures. The LMS
algorithm has been implemented on a CYCLONE II EP2C35F672C8 FPGA device, using the ALTERA QUARTUS II development
platform. The three major demonstrable applications cited in the present work are System Identification, Noise reduction and
Echo cancellation.
Keywords – Signal Processing; FPGA; Adaptive Filter; DE2KIT
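The weight adjustment described in this abstract can be sketched in a few lines of Python for the system identification case. This is a software illustration under stated assumptions, not the paper's FPGA design: the function names `fir_filter` and `lms_identify`, the step size `mu`, and the three-tap example system in the usage note are all hypothetical.

```python
import random


def fir_filter(x, h):
    """Output of an FIR filter with taps h for input sequence x (zero initial state)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights with the LMS rule so the filter output tracks d.

    x: input samples, d: desired samples, mu: step size (hypothetical value
    chosen by the caller). Returns the final weights and the error history.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Tap-delay line: newest sample first, zeros before the start.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))            # multiplier + adder
        e = d[n] - y                                             # error signal
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]      # LMS weight update
        errors.append(e)
    return w, errors
```

As a usage example, feeding a random input through `fir_filter(x, [0.5, -0.3, 0.2])` to obtain `d`, then calling `lms_identify(x, d, 3, 0.1)`, drives the weights towards the taps of the unknown system. The three operations in the loop body correspond to the multiplier, adder and delay elements named in the abstract.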
This document discusses optimizing the gate-level area of digit-serial finite impulse response (FIR) filter designs using multiple constant multiplication (MCM) blocks. It introduces the problem of designing a digit-serial MCM operation with minimal area and presents algorithms to formalize the area optimization problem. Results show the proposed algorithms and digit-serial MCM architectures efficiently design digit-serial MCM operations and FIR filters with reduced area compared to existing approaches.
FPGA based Efficient Interpolator design using DALUT Algorithm (cscpconf)
The document describes the design and implementation of an efficient interpolator for wireless communication systems using FPGA. It proposes a multiplier-less technique using distributed arithmetic look-up tables (DALUT) that replaces multiply-accumulate operations with LUT accesses. A 66th-order half-band polyphase FIR structure is implemented using the DALUT approach on Spartan-3E and Virtex2Pro FPGAs. Results show the proposed design achieves maximum frequencies of 92.859MHz on Virtex Pro and 61.6MHz on Spartan 3E while consuming fewer resources than a traditional MAC-based design.
FPGA based Efficient Interpolator design using DALUT Algorithm (cscpconf)
An interpolator is an important sampling device used for multirate filtering to
provide signal processing in wireless communication systems. There are many
applications in which the sampling rate must be changed. Interpolators and decimators are
utilized to increase or decrease the sampling rate. In this paper an efficient method is
presented to implement a high-speed and area-efficient interpolator for wireless
communication systems. A multiplier-less technique is used which substitutes multiply-and-accumulate
operations with look-up table (LUT) accesses. The interpolator has been
implemented using the partitioned distributed arithmetic look-up table (DALUT)
technique. This technique has been used to take optimal advantage of the embedded
LUTs of the target FPGA. This method is useful for enhancing system performance in
terms of speed and area. The proposed interpolator has been designed using a half-band
polyphase FIR structure with Matlab, simulated with ISE, synthesized with Xilinx
Synthesis Tools (XST) and implemented on Spartan-3E and Virtex2pro devices. The
proposed LUT-based multiplier-less approach has shown a maximum operating
frequency of 92.859 MHz with Virtex Pro and 61.6 MHz with Spartan 3E while
consuming considerably fewer resources, providing a cost-effective solution for wireless
communication systems.
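The substitution of multiply-and-accumulate operations by LUT accesses can be sketched in Python as follows. This is a plain, unpartitioned distributed arithmetic example for unsigned samples and integer coefficients; the partitioning of the table and the half-band polyphase structure used in the paper are not reproduced. The function names and the `width` parameter are assumptions for illustration.

```python
def build_da_lut(coeffs):
    """Precompute LUT[addr] = sum of coeffs[i] over every set bit i of addr."""
    lut = [0] * (1 << len(coeffs))
    for addr in range(1, len(lut)):
        low = addr & -addr             # lowest set bit of the address
        i = low.bit_length() - 1       # index of the coefficient it selects
        lut[addr] = lut[addr ^ low] + coeffs[i]
    return lut


def da_dot_product(samples, lut, width):
    """Inner product of unsigned `width`-bit samples with the LUT's coefficients.

    Processes one bit position per iteration: the bits of all samples at that
    position form the LUT address, and the looked-up partial sum is weighted
    by a shift. No multiplication is performed.
    """
    acc = 0
    for bit in range(width):
        addr = 0
        for i, s in enumerate(samples):
            addr |= ((s >> bit) & 1) << i
        acc += lut[addr] << bit
    return acc
```

The table grows as 2^N for N coefficients, which is the usual motivation for splitting a long filter into several smaller tables whose outputs are added, as a partitioned design does.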
Control of a servo-hydraulic system utilizing an extended wavelet functional ... (nooriasukmaningtyas)
Servo-hydraulic systems have been extensively employed in various industrial applications. However, these systems are characterized by their highly complex and nonlinear dynamics, which complicates the control design stage of such systems. In this paper, an extended wavelet functional link neural network (EWFLNN) is proposed to control the displacement response of the servo-hydraulic system. To optimize the controller's parameters, a recently developed optimization technique, which is called the modified sine cosine algorithm (M-SCA), is exploited as the training method. The proposed controller has achieved remarkable results in terms of tracking two different displacement signals and handling external disturbances. From a comparative study, the proposed EWFLNN controller has attained the best control precision compared with those of other controllers, namely, a proportional-integral-derivative (PID) controller, an artificial neural network (ANN) controller, a wavelet neural network (WNN) controller, and the original wavelet functional link neural network (WFLNN) controller. Moreover, compared to the genetic algorithm (GA) and the original sine cosine algorithm (SCA), the M-SCA has shown better optimization results in finding the optimal values of the controller's parameters.
Decentralised optimal deployment of mobile underwater sensors for covering la... (nooriasukmaningtyas)
This paper addresses the problem of sensing coverage of layers of the ocean in three-dimensional underwater environments. We propose distributed control laws to drive mobile underwater sensors to optimally cover a given confined layer of the ocean. Under this algorithm, the mobile underwater sensors first adjust their depth to the specified depth. Then they form a triangular grid across a given area. Afterwards, they move randomly to spread across the given grid. These control laws rely only on local information; they are also easy to implement and computationally efficient because they use simple consensus rules. Exchanging information only among neighbouring mobile sensors keeps information exchange across the whole network to a minimum and makes this algorithm a practicable option for undersea use. The efficiency of the presented control laws is confirmed via mathematical proof and numerical simulations.
Evaluation quality of service for internet of things based on fuzzy logic: a ... (nooriasukmaningtyas)
The development of internet of things (IoT) technology has made the sustainability of quality of service (SQoS) a major concern in terms of the efficiency, measurement, and evaluation of services, as in our smart home case study. This article deals with quality of service (QoS) based on several ambiguous linguistic and standard criteria. We used fuzzy logic to select the most appropriate and efficient services, and to that end we introduce a new paradigmatic approach to assessing QoS. To measure SQoS, linguistic terms were collected to identify the ambiguous criteria. This paper collects the results of other work to compare traditional assessment methods and techniques in IoT. The comparison shows that traditional evaluation methods and techniques could not deal effectively with these metrics. Therefore, fuzzy logic is a worthwhile method for providing a good measure of QoS with ambiguous linguistic terms and criteria. The proposed model, which is constantly being improved, addresses all the main axes of QoS for a smart home. The results obtained also indicate that the model, with its fuzzy performance importance index (FPII), efficiently evaluates the multiple services of SQoS.
Low power architecture of logic gates using adiabatic techniques (nooriasukmaningtyas)
The growing need for portable systems to limit power consumption in very high-density ultra-large-scale-integration chips has recently led to rapid and inventive progress in low-power design. The most effective technique in energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the basic components of any digital circuit design; by improving the performance of basic gates, one can improve the performance of the whole system. In this paper, low-power OR/NOR, AND/NAND, and XOR/XNOR gate designs using these approaches are presented, and their results are analyzed for power dissipation, delay, power-delay product and rise time and compared with other adiabatic techniques as well as the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. The designs using the DC-DB PFAL technique outperform the modified PFAL designs at 10 MHz, with improvements of 65% for the NOR gate, 7% for the NAND gate and 34% for the XNOR gate.
A review on techniques and modelling methodologies used for checking electrom... (nooriasukmaningtyas)
The proper functioning of the integrated circuit (IC) in a hostile electromagnetic environment has been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, are confronting design issues such as susceptibility to electromagnetic interference (EMI). Because of EMI, electronic control devices calculate incorrect outputs and sensors give misleading values, which can prove fatal in automotive applications. In this paper, the authors review, non-exhaustively, research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Smart monitoring system using NodeMCU for maintenance of production machines (nooriasukmaningtyas)
Maintenance is an activity that helps to reduce risk, increase productivity, improve quality, and minimize production costs. Maintenance actions increase efficiency and enhance the safety and quality of products and processes. To achieve these conditions, it is necessary to implement a monitoring system that observes machine conditions over time, especially the machine parts that often experience problems. This paper presents a low-cost intelligent monitoring system using NodeMCU to continuously monitor machine conditions and provide warnings in the case of machine failure. Besides providing alerts, the monitoring system also records historical data on machine conditions in the Google Cloud (Google Sheet), including which machines were down, the downtime, the issues that occurred, the repairs made, and the technician who handled them. As a result, machine operators no longer lose a relatively long time calling the technician. Likewise, the technicians are assisted in carrying out machine maintenance activities and online reporting, so that mistakes that often occur due to human error do not happen again. The system succeeded in reducing the technician-calling time and maintenance work-reporting time by up to 50%. The availability of online, real-time historical maintenance data will support further maintenance strategy.
Design and simulation of a software defined networking-enabled smart switch, f... (nooriasukmaningtyas)
Using sustainable energy is the future of our planet; it has become not only economically efficient but also a necessity for the preservation of life on earth. Because of this necessity, smart grids have become a very important research topic. Many publications have discussed this topic, and with the development of the internet of things (IoT) and smart sensors, smart grids have been developed even further. Software defined networking (SDN), on the other hand, is a technology that separates the control plane from the data plane of the network. It centralizes the management and orchestration of network tasks by using a network controller. The network controller is the heart of the SDN-enabled network, and it can control other networking devices using SDN protocols such as OpenFlow. A smart switching mechanism for the smart grid, called SDN-smgrid-sw, is modeled and controlled using SDN. We modeled the environment that interacts with the sensors for the sun and wind elements. The algorithm is modeled and programmed for smart, efficient power sharing that is managed centrally and monitored using the SDN controller. In addition, all of the smart grid elements (power sources) are connected to the IP network using IoT protocols.
Efficient wireless power transmission to remote the sensor in restenosis coro... (nooriasukmaningtyas)
In this study, the researchers have proposed an alternative technique for designing an asymmetric 4 coil-resonance coupling module based on the series-to-parallel topology at 27 MHz industrial scientific medical (ISM) band to avoid the tissue damage, for the constant monitoring of the in-stent restenosis coronary artery. This design consisted of 2 components, i.e., the external part that included 3 planar coils that were placed outside the body and an internal helical coil (stent) that was implanted into the coronary artery in the human tissue. This technique considered the output power and the transfer efficiency of the overall system, coil geometry like the number of coils per turn, and coil size. The results indicated that this design showed an 82% efficiency in the air if the transmission distance was maintained as 20 mm, which allowed the wireless power supply system to monitor the pressure within the coronary artery when the implanted load resistance was 400 Ω.
Grid reactive voltage regulation and cost optimization for electric vehicle p... (nooriasukmaningtyas)
Because large-scale electric vehicle (EV) usage is expected in the future due to environmental issues, state subsidies, and incentives, the impact of EV charging on the power grid needs to be closely analyzed and studied for power quality, stability, and infrastructure planning. When a large number of energy storage batteries are connected to the grid as a capacitive load, the power factor of the power grid is inevitably reduced, causing power losses and voltage instability. In this work, a large-scale 18K EV charging model is implemented on the IEEE 33 network. Optimization methods are described to search for the nodes that are most affected by EV charging in terms of power losses and voltage instability of the network, followed by optimization of the magnitude and time duration of reactive power injection at the identified nodes. It is shown that power losses are reduced and voltage stability is improved in the grid, which also contributes to reducing the EV charging cost. The results will be useful for EV charging station infrastructure planning, grid stabilization, and reducing EV charging costs.
Topology network effects for double synchronized switch harvesting circuit on... (nooriasukmaningtyas)
Energy extraction takes place using several different technologies, depending on the type of energy and how it is used. The objective of this paper is to study the influence of topology on a smart network based on piezoelectric materials using double synchronized switch harvesting (DSSH). This work presents network topologies for the DSSH circuit (standard DSSH, independent DSSH, DSSH in parallel, mono DSSH, and DSSH in series). Using simulation based on a structure with embedded piezoelectric harvesters, the different DSSH circuit topologies are compared to determine how to connect the DSSH circuits together and how to implement this circuit strategy accurately to maximize the total output power. The DSSH network topology allows a gain in maximal power output compared with the standard network topology at the resonant frequency. The simulation results show that, with the same input parameters, the DSSH-parallel topology produces 120% more energy than the DSSH-series topology. In addition, the energy harvested by mono-DSSH exceeds that of DSSH-series by 650% and that of independent DSSH by 240%.
Improving the design of super-lift Luo converter using hybrid switching capac... (nooriasukmaningtyas)
In this article, an improvement to the positive output super-lift Luo converter (POSLC) is proposed to obtain high gain at a low duty cycle, reduce the stress on the switch and diodes, reduce the current through the inductors to lower losses, and increase efficiency. The main inductor in the elementary circuit is replaced by a hybrid switching unit composed of four inductors and two capacitors, which is charged in parallel from the same input voltage and discharged in series. The output voltage increases with the number of components. The gain equation is modeled, and the boundary condition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM) is derived. Passive components are designed to obtain a high output voltage (8 times at D=0.5) and a low ripple (about 0.004). The circuit is simulated and analyzed using MATLAB/Simulink. A maximum power point tracker (MPPT) controls the converter to extract the most benefit from solar energy.
Third harmonic current minimization using third harmonic blocking transformernooriasukmaningtyas
Zero sequence blocking transformers (ZSBTs) are used to suppress third harmonic currents in 3-phase systems. Three-phase systems where singlephase loading is present, there is every chance that the load is not balanced. If there is zero-sequence current due to unequal load current, then the ZSBT will impose high impedance and the supply voltage at the load end will be varied which is not desired. This paper presents Third harmonic blocking transformer (THBT) which suppresses only higher harmonic zero sequences. The constructional features using all windings in single-core and construction using three single-phase transformers explained. The paper discusses the constructional features, full details of circuit usage, design considerations, and simulation results for different supply and load conditions. A comparison of THBT with ZSBT is made with simulation results by considering four different cases
Power quality improvement of distribution systems asymmetry caused by power d...nooriasukmaningtyas
With an increase of non-linear load in today’s electrical power systems, the rate of power quality drops and the voltage source and frequency deteriorate if not properly compensated with an appropriate device. Filters are most common techniques that employed to overcome this problem and improving power quality. In this paper an improved optimization technique of filter applies to the power system is based on a particle swarm optimization with using artificial neural network technique applied to the unified power flow quality conditioner (PSO-ANN UPQC). Design particle swarm optimization and artificial neural network together result in a very high performance of flexible AC transmission lines (FACTs) controller and it implements to the system to compensate all types of power quality disturbances. This technique is very powerful for minimization of total harmonic distortion of source voltages and currents as a limit permitted by IEEE-519. The work creates a power system model in MATLAB/Simulink program to investigate our proposed optimization technique for improving control circuit of filters. The work also has measured all power quality disturbances of the electrical arc furnace of steel factory and suggests this technique of filter to improve the power quality.
Studies enhancement of transient stability by single machine infinite bus sys...nooriasukmaningtyas
Maintaining network synchronization is important to customer service. Low fluctuations cause voltage instability, non-synchronization in the power system or the problems in the electrical system disturbances, harmonics current and voltages inflation and contraction voltage. Proper tunning of the parameters of stabilizer is prime for validation of stabilizer. To overcome instability issues and get reinforcement found a lot of the techniques are developed to overcome instability problems and improve performance of power system. Genetic algorithm was applied to optimize parameters and suppress oscillation. The simulation of the robust composite capacitance system of an infinite single-machine bus was studied using MATLAB was used for optimization purpose. The critical time is an indication of the maximum possible time during which the error can pass in the system to obtain stability through the simulation. The effectiveness improvement has been shown in the system
Renewable energy based dynamic tariff system for domestic load managementnooriasukmaningtyas
To deal with the present power-scenario, this paper proposes a model of an advanced energy management system, which tries to achieve peak clipping, peak to average ratio reduction and cost reduction based on effective utilization of distributed generations. This helps to manage conventional loads based on flexible tariff system. The main contribution of this work is the development of three-part dynamic tariff system on the basis of time of utilizing power, available renewable energy sources (RES) and consumers’ load profile. This incorporates consumers’ choice to suitably select for either consuming power from conventional energy sources and/or renewable energy sources during peak or off-peak hours. To validate the efficiency of the proposed model we have comparatively evaluated the model performance with existing optimization techniques using genetic algorithm and particle swarm optimization. A new optimization technique, hybrid greedy particle swarm optimization has been proposed which is based on the two aforementioned techniques. It is found that the proposed model is superior with the improved tariff scheme when subjected to load management and consumers’ financial benefit. This work leads to maintain a healthy relationship between the utility sectors and the consumers, thereby making the existing grid more reliable, robust, flexible yet cost effective.
Energy harvesting maximization by integration of distributed generation based...nooriasukmaningtyas
The purpose of distributed generation systems (DGS) is to enhance the distribution system (DS) performance to be better known with its benefits in the power sector as installing distributed generation (DG) units into the DS can introduce economic, environmental and technical benefits. Those benefits can be obtained if the DG units' site and size is properly determined. The aim of this paper is studying and reviewing the effect of connecting DG units in the DS on transmission efficiency, reactive power loss and voltage deviation in addition to the economical point of view and considering the interest and inflation rate. Whale optimization algorithm (WOA) is introduced to find the best solution to the distributed generation penetration problem in the DS. The result of WOA is compared with the genetic algorithm (GA), particle swarm optimization (PSO), and grey wolf optimizer (GWO). The proposed solutions methodologies have been tested using MATLAB software on IEEE 33 standard bus system
Intelligent fault diagnosis for power distribution systemcomparative studiesnooriasukmaningtyas
Short circuit is one of the most popular types of permanent fault in power distribution system. Thus, fast and accuracy diagnosis of short circuit failure is very important so that the power system works more effectively. In this paper, a newly enhanced support vector machine (SVM) classifier has been investigated to identify ten short-circuit fault types, including single line-toground faults (XG, YG, ZG), line-to-line faults (XY, XZ, YZ), double lineto-ground faults (XYG, XZG, YZG) and three-line faults (XYZ). The performance of this enhanced SVM model has been improved by using three different versions of particle swarm optimization (PSO), namely: classical PSO (C-PSO), time varying acceleration coefficients PSO (T-PSO) and constriction factor PSO (K-PSO). Further, utilizing pseudo-random binary sequence (PRBS)-based time domain reflectometry (TDR) method allows to obtain a reliable dataset for SVM classifier. The experimental results performed on a two-branch distribution line show the most optimal variant of PSO for short fault diagnosis.
A deep learning approach based on stochastic gradient descent and least absol...nooriasukmaningtyas
More than eighty-five to ninety percentage of the diabetic patients are affected with diabetic retinopathy (DR) which is an eye disorder that leads to blindness. The computational techniques can support to detect the DR by using the retinal images. However, it is hard to measure the DR with the raw retinal image. This paper proposes an effective method for identification of DR from the retinal images. In this research work, initially the Weiner filter is used for preprocessing the raw retinal image. Then the preprocessed image is segmented using fuzzy c-mean technique. Then from the segmented image, the features are extracted using grey level co-occurrence matrix (GLCM). After extracting the fundus image, the feature selection is performed stochastic gradient descent, and least absolute shrinkage and selection operator (LASSO) for accurate identification during the classification process. Then the inception v3-convolutional neural network (IV3-CNN) model is used in the classification process to classify the image as DR image or non-DR image. By applying the proposed method, the classification performance of IV3-CNN model in identifying DR is studied. Using the proposed method, the DR is identified with the accuracy of about 95%, and the processed retinal image is identified as mild DR.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
A compact FPGA-based montgomery modular multiplier
Indonesian Journal of Electrical Engineering and Computer Science
Vol. 21, No. 2, February 2021, pp. 735~743
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v21.i2.pp735-743
Journal homepage: http://ijeecs.iaescore.com

Ahmed A. H. Abd-Elkader1, Mostafa Rashdan2, El-Sayed A. M. Hasaneen3, Hesham F. A. Hamed4
1 Qusier Telecom, Telecom Egypt, Red Sea, Egypt
2 Faculty of Energy Engineering, Aswan University, Aswan, Egypt
3 Faculty of Engineering, Aswan University, Aswan, Egypt
4 Faculty of Engineering, Minia University, Minia, Egypt, and Faculty of Engineering, Egyptian-Russian University, Cairo, Egypt
Article Info
Article history:
Received Jun 15, 2020
Revised Aug 11, 2020
Accepted Aug 21, 2020

ABSTRACT
This paper presents a compact design of a Montgomery modular multiplier (MMM). The MMM is a building block commonly required in security protocols that rely on public-key encryption. The proposed design is intended for hardware implementations of lightweight cryptographic modules used in system on chip (SoC) and internet of things (IoT) devices. It is an enhancement of the original MMM that requires no multiplication or subtraction operations. The modification aims to enhance the performance and reduce the area of the MMM hardware module. The operands and internal variables of the proposed hardware circuit are bounded to the smallest efficient size in order to minimize the area and the critical path delay. The proposed design was coded in VHDL, implemented on a Virtex-6 FPGA, and analyzed using the Xilinx ISE tools. Our design occupies the smallest area compared with other implementations on the same FPGA type, saving between 60.0% and 99.0% of the resources compared with other relevant designs.
Keywords:
FPGA
Lightweight cryptography
LUT
Modular multiplier
Virtex-6
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ahmed A. H. Abd-Elkader
Qusier Telecom
Telecom Egypt, TE Qusier, Red Sea, Egypt
Email: eng.ahmed_a_elkader@yahoo.com
1. INTRODUCTION
Cryptography is an important area of concern in the field of information security [1]. The most commonly used public-key cryptographic algorithms, such as RSA, the Digital Signature Algorithm (DSA), and Elliptic Curve Cryptography (ECC), rely primarily on modular multiplication [2]. Therefore, the performance of a cryptographic system depends on how modular multiplication is constructed. The Montgomery modular multiplier (MMM) is a widely used modular multiplier [3]. It is an effective technique for implementing modular multiplication with large operands in high-performance hardware [4, 5]. For security schemes based on public-key cryptography, it is important to use hardware modules to achieve high performance. The security provided by public-key cryptographic algorithms requires a large number of arithmetic operations on abstract algebraic structures (finite fields and groups) [2]. Furthermore, these operations are conventionally executed over large numbers (160-3072 bits), which makes them considerably time-consuming. This has motivated researchers to develop specialized hardware with reduced computation time as the main design goal, which comes at the cost of high hardware resource consumption. However, a major problem for this kind of application is that embedded systems require designs that consume fewer hardware resources. This aspect is therefore one of the greatest challenges in implementing lightweight cryptographic systems. The preferred solution is to implement cryptographic algorithms on a Field Programmable Gate Array (FPGA). The FPGA is low-cost, versatile, and robust. For cryptographic systems in particular, an FPGA can be reconfigured to meet new security requirements. Low-cost and low-power FPGAs are available on the market and are expected to become popular in applications such as the Wireless Sensor Network (WSN) and the Internet of Things (IoT) [6-9].
This research presents a new approach to the MMM algorithm with higher efficiency and lower area cost, which is an important factor in the implementation of embedded systems. The proposed algorithm modifies the radix-2 MMM structure to improve the area and the efficiency, and it performs no subtraction operations. A previous modification of the MMM that discards the final subtraction introduced a new set of parameters and more cycles [10, 11]. That method computes the Montgomery product with longer operands and more iterations, which can seriously reduce performance. Our work imposes a tighter bound on the previous assumptions, with no increase in operand size or number of iterations, which improves the efficiency of the hardware implementation. The compact structure of the proposed design, relative to other competitive designs, makes it appropriate for embedded systems and hardware applications of IoT devices. The proposed design has been coded in VHDL and targets the Virtex-6 FPGA platform. It has been synthesized using Xilinx ISE and simulated using ModelSim. For various bit lengths (160-1024), the new approach has been compared with related works in terms of area (slice LUTs), frequency (MHz), time (𝜇𝑠), throughput (Mbps), Area-Time, and efficiency (Mbps per FPGA LUT). The comparison shows that the proposed design improves on the related works in hardware resource utilization.
The rest of the paper is organized as follows. Section 2 explores the classical modular multiplication algorithm, the original MMM, and the radix-2 MMM. Section 3 describes the proposed algorithm, its hardware implementation, and the bounds on the inputs/outputs (I/O) of the proposed design. Section 4 provides the simulation and synthesis results: the design is simulated and its correctness verified using ModelSim, and the synthesis results for the Xilinx Virtex-6 FPGA are obtained using Xilinx ISE. Finally, Section 5 concludes the paper.
2. BACKGROUND
2.1. Classical modular multiplication
Algorithm-1 is a straightforward algorithm for computing the modular reduction of the product of two integers 𝒜 and ℬ. The first step obtains the product 𝛼 of the two integers. The reduction modulo 𝓂 usually involves a division operation 𝒟 of 𝛼 by the modulus 𝓂, where 𝑞 is the quotient. The computation of the quotient 𝑞 and the remainder 𝒫 when 𝛼 is divided by 𝓂 is demonstrated in [12]. The third step returns the residue 𝒫 of the division as the result of the modular multiplication. The division is a very time-consuming operation on both hardware and software platforms.
Algorithm-1 Classical Modular Multiplier
INPUT: Integers (𝒜, ℬ , 𝓂 )
OUTPUT: 𝒫 = 𝒜 × ℬ 𝑚𝑜𝑑 𝓂.
1. 𝛼 = 𝒜 × ℬ;
2. 𝒟: divide 𝛼 by 𝓂, giving 𝛼 = 𝑞𝓂 + 𝒫;
3. Return (𝒫).
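The three steps of Algorithm-1 can be sketched in a few lines of Python. This is only an illustrative software model, not the authors' implementation; the function name `classical_modmul` is hypothetical, and Python's built-in `divmod` stands in for the division operation 𝒟.

```python
def classical_modmul(a: int, b: int, m: int) -> int:
    """Software sketch of Algorithm-1: multiply, then reduce by division."""
    if m <= 0:
        raise ValueError("the modulus must be positive")
    alpha = a * b              # step 1: full-width product
    q, p = divmod(alpha, m)    # step 2: alpha = q*m + p with 0 <= p < m
    assert alpha == q * m + p  # the identity behind step 2
    return p                   # step 3: the residue is the result


# Example with an 8-bit odd modulus (values chosen only for illustration).
print(classical_modmul(123, 200, 239))
```

The division in step 2 is the costly part that the Montgomery technique of the next subsection is designed to avoid.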
2.2. Montgomery modular multiplier algorithm
Montgomery introduced a technique that avoids the division in the modular reduction of the product of two integers [3]. For the three integers (𝒜, ℬ, 𝓂), Montgomery replaced the division by the modulus 𝓂 with a division by 𝑅. The Montgomery approach for calculating the product of two integers 𝒜 and ℬ reduced modulo 𝓂 is as follows:
a) Choose 𝑅 > 𝓂, where 𝑅 is an integral power of 2. The integer 𝓂 must be odd to satisfy the condition $\gcd(R, m) = 1$.
b) Precompute the integers $R^{-1}$ and $\grave{m}$ such that:
$$R R^{-1} - m\,\grave{m} = 1 \tag{1}$$
$$R R^{-1} \equiv 1 \pmod{m} \tag{2}$$
$$\grave{m} = -m^{-1} \bmod R \tag{3}$$
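c) Transform the operands to the Montgomery domain using the REDC function [3]:
$$\operatorname{REDC}(x, y) = x \times y \times R^{-1} \bmod m \tag{4}$$
$$\bar{\mathcal{A}} = \operatorname{REDC}(\mathcal{A}, R^{2}) = \mathcal{A} \times R \bmod m \tag{5}$$
$$\bar{\mathcal{B}} = \operatorname{REDC}(\mathcal{B}, R^{2}) = \mathcal{B} \times R \bmod m \tag{6}$$
The product in the Montgomery domain is then:
$$\bar{\mathcal{P}} = \operatorname{REDC}(\bar{\mathcal{A}}, \bar{\mathcal{B}}) = \mathcal{A} \times \mathcal{B} \times R \bmod m \tag{7}$$
$$\text{If } \bar{\mathcal{P}} > m \text{ then } \bar{\mathcal{P}} = \bar{\mathcal{P}} - m \tag{8}$$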
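As a rough numerical check of steps a) to c), the following Python sketch precomputes the constants and verifies identities (1) and (7) for one small example. It assumes Python 3.8 or later, where the built-in `pow` accepts a negative exponent with a modulus; the names `montgomery_constants`, `to_montgomery`, `r_inv`, and `m_grave` are hypothetical, and the reductions use an explicit `% m` rather than the optimized form.

```python
def montgomery_constants(m, k):
    """Return (R, R^-1 mod m, m_grave) for an odd modulus m < R = 2**k."""
    if m < 3 or m % 2 == 0 or m >= (1 << k):
        raise ValueError("m must be odd, at least 3, and smaller than 2**k")
    R = 1 << k                       # step a): R is a power of 2, R > m
    r_inv = pow(R, -1, m)            # R * r_inv = 1 (mod m), eq. (2)
    m_grave = (-pow(m, -1, R)) % R   # m_grave = -m^-1 mod R, eq. (3)
    return R, r_inv, m_grave


def to_montgomery(x, m, R):
    """Map x to the Montgomery domain: x * R mod m, as in eqs. (5)-(6)."""
    return (x * R) % m


m, k = 239, 8                        # illustrative 8-bit odd modulus
R, r_inv, m_grave = montgomery_constants(m, k)
assert R * r_inv - m * m_grave == 1  # eq. (1)

a, b = 123, 200
a_bar = to_montgomery(a, m, R)
b_bar = to_montgomery(b, m, R)
p_bar = (a_bar * b_bar * r_inv) % m  # eq. (4) applied to (a_bar, b_bar)
assert p_bar == (a * b * R) % m      # eq. (7)
assert (p_bar * r_inv) % m == (a * b) % m  # leaving the Montgomery domain
```

Every line marked with `% m` still performs a division by the modulus, so this sketch only checks the algebra.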
The previous step is expensive because of its reduction modulo 𝓂. Therefore, Montgomery applied a more efficient way to calculate it:
$$\operatorname{REDC}(x, y) = \frac{\mathcal{S} + (\mathcal{S}\,\grave{m} \bmod R)\, m}{R} \tag{9}$$
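where $\mathcal{S} = x \times y$. This equation is applied to compute $\bar{\mathcal{A}}$, $\bar{\mathcal{B}}$, and $\bar{\mathcal{P}}$. To obtain the final result, the product must be transformed from the Montgomery domain back to its original form:
$$\operatorname{REDC}(\bar{\mathcal{P}}, 1) = \frac{\bar{\mathcal{P}} + (\bar{\mathcal{P}}\,\grave{m} \bmod R)\, m}{R} \tag{10}$$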
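Equations (9) and (10) can be exercised with a short Python sketch in which the division by 𝑅 becomes a right shift and the reduction modulo 𝑅 becomes a bit mask. The function names `redc` and `montgomery_modmul` are hypothetical, 𝑅 = 2^𝑘 is assumed, and Python 3.8 or later is needed for the modular inverse via `pow`. The sketch subtracts when the intermediate value is greater than or equal to the modulus, so that a value equal to the modulus maps to zero.

```python
def redc(x, y, m, k, m_grave):
    """Eq. (9): (S + (S * m_grave mod R) * m) / R with S = x*y and R = 2**k."""
    r_mask = (1 << k) - 1
    s = x * y
    u = (s * m_grave) & r_mask      # (S * m_grave) mod R
    t = (s + u * m) >> k            # exact: s + u*m is a multiple of R
    return t - m if t >= m else t   # conditional subtraction, cf. eq. (8)


def montgomery_modmul(a, b, m, k):
    """Compute a*b mod m through the Montgomery domain (illustrative only)."""
    R = 1 << k
    m_grave = (-pow(m, -1, R)) % R          # eq. (3)
    r2 = (R * R) % m                        # R^2 mod m, precomputed once
    a_bar = redc(a, r2, m, k, m_grave)      # eq. (5)
    b_bar = redc(b, r2, m, k, m_grave)      # eq. (6)
    p_bar = redc(a_bar, b_bar, m, k, m_grave)  # eq. (7)
    return redc(p_bar, 1, m, k, m_grave)    # eq. (10): back-transformation


assert montgomery_modmul(123, 200, 239, 8) == (123 * 200) % 239
```

Four REDC calls are needed for a single product here, which is why the Montgomery domain pays off mainly when many multiplications are chained, as in modular exponentiation.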
The preceding approach is slow because it requires many addition and multiplication operations. A more efficient approach is given in Algorithm-2, the Montgomery modular multiplier (MMM) algorithm.
Algorithm-2 Montgomery Modular Multiplier (MMM)
Input: Integers (𝒜, ℬ, 𝓂) [𝑘 bits] in radix-𝑟 representation,
where 0 ≤ (𝒜, ℬ) < 𝓂, R = 𝑟^𝑘, gcd(𝓂, 𝑟) = 1, 𝓂̀ = −𝓂^(−1) mod R
Output: 𝒫 = 𝒜 × ℬ × R^(−1) mod 𝓂
1. 𝒫 = 0;
2. For (𝑖 from 0 to 𝑘 − 1, 𝑖 = 𝑖 + 1) do
3.    𝑡𝑖 = ((𝒫0 + 𝒜𝑖 × ℬ0) × 𝓂̀) mod 𝑟;
4.    𝒫 = (𝒫 + 𝒜𝑖 × ℬ + 𝑡𝑖 × 𝓂)/𝑟;
5. Loop;
6. If 𝒫 > 𝓂 then
      𝒫 = 𝒫 − 𝓂;
   end If
7. Return (𝒫).
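Algorithm-2 applies the REDC function directly to the two integer operands 𝒜 and ℬ, where $\mathcal{S} = \mathcal{A} \times \mathcal{B}$:
$$\mathcal{P} = \operatorname{REDC}(\mathcal{A}, \mathcal{B}) = \frac{\mathcal{S} + (\mathcal{S}\,\grave{m} \bmod R)\, m}{R} \tag{11}$$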
The MMM algorithm performs 𝑘 iterations, where 𝑘 is the bit length of the modulus 𝓂. The bits of the multiplier 𝒜 are scanned from LSB to MSB; 𝒫0 and ℬ0 are the LSBs of 𝒫 and ℬ, respectively. Steps 3 and 4 are repeated in every iteration for the corresponding 𝒜𝑖 and the accumulated 𝒫; depending on whether the result of step 3 is 1 or 0, the modulus 𝓂 is or is not added in step 4. To recover the final result, the MMM algorithm is used once more:
$$\operatorname{REDC}(\mathcal{P}, 1) = \frac{\mathcal{P} + (\mathcal{P}\,\grave{m} \bmod R)\, m}{R} \tag{12}$$
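The digit-serial loop of Algorithm-2 can be modelled in Python as below. The sketch assumes the radix is a power of two, 𝑟 = 2^𝑤, which the text does not require (it only requires gcd(𝓂, 𝑟) = 1), and it treats 𝑘 as the number of radix-𝑟 digits so that R = 𝑟^𝑘. Only the least significant digit of 𝓂̀ affects step 3, so the sketch computes it modulo 𝑟. The function name `mmm_radix_r` and the parameter `w` are hypothetical, and the final comparison uses "greater than or equal" so that a result equal to the modulus maps to zero.

```python
def mmm_radix_r(a, b, m, k, w):
    """Sketch of Algorithm-2 with radix r = 2**w and k digits (R = r**k)."""
    r_mask = (1 << w) - 1
    m_grave = (-pow(m, -1, 1 << w)) % (1 << w)   # low digit of -m^-1 mod R
    p = 0                                        # step 1
    for i in range(k):                           # step 2
        a_i = (a >> (w * i)) & r_mask            # i-th digit of the multiplier
        # step 3: t_i = ((P_0 + A_i * B_0) * m_grave) mod r
        t_i = (((p & r_mask) + a_i * (b & r_mask)) * m_grave) & r_mask
        # step 4: the sum is a multiple of r, so the shift is an exact division
        p = (p + a_i * b + t_i * m) >> w
    if p >= m:                                   # step 6
        p -= m
    return p                                     # step 7


m = 239
R_inv = pow(256, -1, m)                          # R = 16**2 = 2**8
assert mmm_radix_r(123, 200, m, k=2, w=4) == (123 * 200 * R_inv) % m
assert mmm_radix_r(123, 200, m, k=8, w=1) == (123 * 200 * R_inv) % m
```

With `w=1` the loop degenerates to the radix-2 case, where the digit 𝑡𝑖 is a single bit.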
Algorithm-3 Radix-2 (MMM)
Input: Integers (𝒜, ℬ, 𝓂) [𝑘 bits] in radix-2 representation,
where 0 ≤ (𝒜, ℬ) < 𝓂, R = 2^𝑘, gcd(𝓂, 2) = 1
Output: 𝒫 = 𝒜 × ℬ × 2^(−𝑘) mod 𝓂
1. 𝒫 = 0;
2. For (𝑖 from 0 to 𝑘 − 1, 𝑖 = 𝑖 + 1) do
3.    𝑡𝑖 = (𝒫0 + 𝒜𝑖 × ℬ0) mod 2;
4.    𝒫 = (𝒫 + 𝒜𝑖 × ℬ + 𝑡𝑖 × 𝓂)/2;
5. Loop;
6. If 𝒫 > 𝓂 then
      𝒫 = 𝒫 − 𝓂;
   end If
7. Return (𝒫).
Algorithm-3 shows the radix-2 version of the MMM, where 𝑟 = 2 [13, 14]. Therefore, 𝓂̀ = 1. The division by 2 in step-4 is a simple one-bit right shift. Therefore, each loop iteration requires two additions, followed by the final subtraction (step-6).
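A bit-serial Python model of Algorithm-3 is sketched below to make the two additions per iteration and the one-bit shift explicit. The function name `mmm_radix2` is hypothetical. The final step here subtracts when the value is greater than or equal to the modulus; the small composite example at the end shows a case where the loop ends exactly at the modulus.

```python
def mmm_radix2(a, b, m, k):
    """Sketch of Algorithm-3: radix-2 Montgomery product a*b*2^-k mod m."""
    p = 0                                     # step 1
    for i in range(k):                        # step 2: scan A from LSB to MSB
        a_i = (a >> i) & 1
        t_i = ((p & 1) + a_i * (b & 1)) & 1   # step 3: parity of P + A_i*B
        addend_b = b if a_i else 0            # A_i * B is a select, not a multiply
        addend_m = m if t_i else 0            # t_i * m likewise
        p = (p + addend_b + addend_m) >> 1    # step 4: two additions, one shift
    if p >= m:                                # step 6: final subtraction
        p -= m
    return p                                  # step 7


m, k = 239, 8
expected = (123 * 200 * pow(1 << k, -1, m)) % m
assert mmm_radix2(123, 200, m, k) == expected

# With m = 15, A = 3, B = 5 the loop ends with P equal to m, which reduces to 0.
assert mmm_radix2(3, 5, 15, 4) == 0
```

The intermediate value stays below 2𝓂 throughout, so a single conditional subtraction is enough to bring the result into [0, 𝓂).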
3. RESEARCH METHOD
3.1. Proposed algorithm of the modular multiplier
The modified radix-2 MMM algorithm is presented in Algorithm-4. It is a modification of Algorithm-3 that omits the final subtraction step and restructures the computation to reduce the area and enhance the efficiency. In Algorithm-3, the corresponding multiplier bit 𝒜𝑖 is multiplied by the Least Significant Bit (LSB) ℬ0 of the multiplicand in step-3 and by the multiplicand ℬ in step-4. In hardware, the multiplication in step-3 of Algorithm-3 is an AND operation and the multiplication in step-4 is a multiplexer.
Algorithm-4 Proposed Modular Multiplier
Input: Integers (𝒜, ℬ, 𝓂) [𝑘 bits] in radix-2 representation,
where 0 ≤ (𝒜, ℬ) < 𝓂, gcd(𝓂, 2) = 1.
Output: 𝒫 = 𝒜 × ℬ × 2^(−𝑘) mod 𝓂
1. α = 0; 𝒗 = 0;
2. For (𝑖 from 0 to 𝑘 − 1, 𝑖 = 𝑖 + 1) do
3.    If 𝒜𝑖 = '1' then
         γ = ℬ;
      else
         γ = 0;
      End If;
4.    If α1 xor γ0 = '1' then
         Ȥ = γ + 𝓂;
      else
         Ȥ = γ;
      End If;
5.    α = Ȥ + 𝒗;
6.    𝒗 = α >> 1;
7. Loop;
8. 𝒫 = 𝒗;
9. Return 𝒫.
Nevertheless, in step-3 of Algorithm-4 the value of 𝒜𝑖 is used only once, to select between two values: the multiplicand ℬ or zero. The final subtraction in Algorithm-3 is a source of leakage and of additional hardware consumption. To remove the extra subtraction and obtain a uniform input and output range, Walter [10] proposed altering the range of 𝒜, ℬ, 𝓂 to be within [0, 2𝑘), increasing the number of iterations from 𝑘 to 𝑘 + 2, and setting the value of 𝑅 to 2^(𝑘+2) 𝑚𝑜𝑑 𝓂 [15, 16]. Algorithm-4 likewise has no subtraction operation, which decreases the leakage and increases the efficiency, and its output is 𝑘 + 1 bits wide. The precondition in [10] on the input operands is 𝒜 < 2𝓂 and ℬ < 2𝓂, which can seriously degrade performance [17, 18]. By contrast, the modification presented in Algorithm-4 retains the range of the operands and modulus within [0, 𝑘), which improves the performance of the proposed design relative to the related works. Furthermore, the value of 𝑅 and the number of iterations in Algorithm-4 remain 2^𝑘 and 𝑘, respectively. These modifications therefore improve the performance of the proposed algorithm. In addition, fewer hardware resources and interconnects are required, so the area consumed is also reduced.
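To illustrate how Algorithm-4 behaves, the following Python sketch follows its steps literally. It is a software model written for this explanation, not the authors' VHDL; the function name `proposed_mmm` is hypothetical, and the accumulator is assumed to start at zero, as the hardware description states for register 𝒗. Because the final subtraction is omitted, the returned value is only congruent to 𝒜 × ℬ × 2^(−𝑘) modulo 𝓂 and may be larger than 𝓂.

```python
def proposed_mmm(a, b, m, k):
    """Software sketch of Algorithm-4 (no final subtraction)."""
    alpha = 0                                  # step 1
    v = 0                                      # accumulator, assumed cleared
    for i in range(k):                         # step 2
        gamma = b if (a >> i) & 1 else 0       # step 3: select B or zero
        # step 4: alpha_1 (bit 1 of the previous alpha) equals v_0
        if ((alpha >> 1) & 1) ^ (gamma & 1):
            z = gamma + m                      # the sum z + v becomes even
        else:
            z = gamma
        alpha = z + v                          # step 5
        v = alpha >> 1                         # step 6: exact halving
    return v                                   # steps 8-9


m, k = 239, 8
p = proposed_mmm(123, 200, m, k)
assert p % m == (123 * 200 * pow(1 << k, -1, m)) % m  # congruent result
assert p < (1 << (k + 1))                             # fits in k + 1 bits
```

A caller that needs a value in [0, 𝓂) would still have to reduce the output once at the very end of a computation; that step lies outside Algorithm-4 and is an assumption of this note.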
3.2. Proposed design of radix-2 MMM
The hardware circuit for Algorithm-4 is shown in Figure 1. To reduce the area, the proposed design uses a counter instead of a loop: each iteration takes one clock cycle, and each clock increments the counter by one. The counter counts 𝑘 iterations. Because registers are used to synchronize the computed values, there is a delay of one additional clock; therefore, the counter counts from 0 to 𝑘. The counter is part of the control unit, which scans the multiplier 𝒜 from LSB to MSB and outputs one bit per iteration (𝒜𝑖). The proposed design is composed of one (𝑘 + 1)-bit adder, one (𝑘 + 2)-bit adder, and two multiplexers. Initially, register 𝒗 is loaded with zero. The corresponding bit 𝒜𝑖 selects between two values: the multiplicand ℬ if it is '1', or a 𝑘-bit zero otherwise. The output of the first multiplexer is denoted by γ.
The first adder adds the modulus 𝓂 and γ; its output drives one input of the second multiplexer, whose other input is γ. In step-4 of Algorithm-4, the inputs of the XOR gate are bit 1 of register α (α1) and the LSB of register γ (γ0). α1 is used instead of 𝒗0 to reduce the critical path delay. The output of the XOR gate is the select signal of the second multiplexer, whose output is Ȥ. Step-4 thus defines the purpose of the XOR gate and the second multiplexer. Because the modulus 𝓂 must be odd, adding 𝓂 to an odd integer gives an even integer; when the XOR output is '1', step-4 adds 𝓂 to γ so that the sum formed by the second adder is even. The second adder adds the accumulator 𝒗 and Ȥ, and its output is loaded into α. The value of α is shifted right by one bit (divided by 2) and loaded into the accumulator register 𝒗. This procedure is repeated on every clock. The architecture therefore takes 𝑘 + 1 clock cycles to complete the calculation, after which the value of register 𝒗 is loaded to the output 𝒫. The output 𝒫, which is 𝑘 + 1 bits wide, is the reduced product of the two input integers multiplied by 2^(−𝑘). The size of the hardware components depends on the size of the operands and the modulus of the multiplication. The design can easily be scaled to any number of bits.
Figure 1. Proposed modular multiplier circuit
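The datapath described above can be mimicked in Python with every signal truncated to its stated width, which gives a quick check that the 𝑘, 𝑘 + 1, and 𝑘 + 2 bit sizes lose no information. This is a behavioural sketch under assumptions: the names `mask` and `clock_datapath` are hypothetical, one loop pass stands for one clock, and the extra register-synchronization clock mentioned in the text is not modelled.

```python
def mask(bits):
    """All-ones mask of the given width, used to truncate like a fixed-width bus."""
    return (1 << bits) - 1


def clock_datapath(a, b, m, k):
    """Behavioural model of the Figure 1 datapath with explicit signal widths."""
    alpha = 0                       # (k+2)-bit output of the second adder
    v = 0                           # (k+1)-bit accumulator register
    for i in range(k):              # the counter steps through the bits of A
        a_i = (a >> i) & 1
        gamma = (b if a_i else 0) & mask(k)                # multiplexer 1
        sel = ((alpha >> 1) & 1) ^ (gamma & 1)             # XOR gate
        z = (gamma + m if sel else gamma) & mask(k + 1)    # adder 1, multiplexer 2
        alpha = (z + v) & mask(k + 2)                      # adder 2
        v = (alpha >> 1) & mask(k + 1)                     # one-bit right shift
    return v                        # loaded to the output P


# Exhaustive check for k = 5: truncation never changes the residue class.
k = 5
for m in range(3, 1 << k, 2):
    r_inv = pow(1 << k, -1, m)
    for a in range(m):
        for b in range(m):
            assert clock_datapath(a, b, m, k) % m == (a * b * r_inv) % m
```

If any of the masks were one bit narrower, the check would be expected to fail for some operands, which is one informal way to see why the two adders differ in width.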
3.3. Bounds on the I/O
The outputs of multiplications are re-used as inputs throughout a cryptographic system, so it is important to keep those numbers bounded. In particular, we show that $\mathcal{P} < 2^{k+1}$ is maintained for all outputs 𝒫. The four variables in Algorithm-4 are γ, Ȥ, α, and 𝒗, whose bit lengths are bounded by 𝑘, 𝑘 + 1, 𝑘 + 2, and 𝑘 + 1, respectively (Figure 1). The maximum value of the modulus 𝓂 is:
$$m = 2^{k} - 1 \tag{13}$$
The precondition of the MMM is ℬ < 𝓂; thus, the maximum value of the variable γ is the maximum value of the multiplicand ℬ:
$$\gamma = \mathcal{B} = 2^{k} - 2 \tag{14}$$
6. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 2, February 2021 : 735 - 743
740
The value of variable Ȥ is:

$$Ȥ = \gamma + 𝓂 \tag{15}$$

The maximum value of variable α is:

$$\alpha = Ȥ + v_{h-1} \tag{16}$$

where ℎ is the iteration number. The value of the accumulated variable $v_h$ is:

$$v_h = \frac{\alpha}{2} \tag{17}$$

After the last iteration, ℎ = 𝑘, and the value of the output 𝒫 is:

$$\mathcal{P} = v_k \tag{18}$$
Table 1 verifies, for 𝑘 = 8, that the output and the other variables stay below the size of their implemented
hardware registers. The ceiling function maps α/2 to the least integer greater than or equal
to α/2. In general, the maximum values of the Algorithm-4 variables for any 𝑘-bit (𝒜, ℬ, 𝓂) are as follows:

$$\gamma = 2^{k} - 2 \tag{19}$$

$$Ȥ = \gamma + 𝓂 = 2^{k+1} - 3 \tag{20}$$

$$\alpha = Ȥ + v_{h-1} = 2^{k+2} - 2^{k-(h-2)} - 5 \tag{21}$$

$$v_h = \left\lceil \frac{\alpha}{2} \right\rceil = 2^{k+1} - 2^{k-(h-1)} - 2 \tag{22}$$

The maximum value of the output 𝒫 is:

$$\mathcal{P} = v_k = 2^{k+1} - 4$$

Therefore, the bit-length of the output 𝒫 is bounded by 𝑘 + 1 bits.
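Table 1. Bounds on the output of the proposed design

| ℎ | $v_{h-1}$ ($< 2^{k+1}$) | γ ($< 2^{k}$) | Ȥ = γ + 𝓂 ($< 2^{k+1}$) | α = Ȥ + $v_{h-1}$ ($< 2^{k+2}$) | $v_h = \lceil \alpha/2 \rceil$ ($< 2^{k+1}$) |
|---|---|---|---|---|---|
| 1 | $0$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+1} - 3$ | $2^{k} - 1$ |
| 2 | $2^{k} - 1$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k} - 4$ | $2^{k+1} - 2^{k-1} - 2$ |
| 3 | $2^{k+1} - 2^{k-1} - 2$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k-1} - 5$ | $2^{k+1} - 2^{k-2} - 2$ |
| 4 | $2^{k+1} - 2^{k-2} - 2$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k-2} - 5$ | $2^{k+1} - 2^{k-3} - 2$ |
| 5 | $2^{k+1} - 2^{k-3} - 2$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k-3} - 5$ | $2^{k+1} - 2^{k-4} - 2$ |
| 6 | $2^{k+1} - 2^{k-4} - 2$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k-4} - 5$ | $2^{k+1} - 2^{k-5} - 2$ |
| 7 | $2^{k+1} - 2^{k-5} - 2$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k-5} - 5$ | $2^{k+1} - 2^{k-6} - 2$ |
| 8 | $2^{k+1} - 2^{k-6} - 2$ | $2^{k} - 2$ | $2^{k+1} - 3$ | $2^{k+2} - 2^{k-6} - 5$ | $2^{k+1} - 2^{k-7} - 2$ |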
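The worst-case recurrence behind Table 1 can be replayed numerically. The sketch below is an added check, not part of the original analysis; the function name `worst_case_bounds` is hypothetical. It iterates Eq. (16) and the ceiling form of Eq. (22) with the maximal γ and 𝓂, and compares the results with the closed forms. As the first rows of Table 1 indicate, the closed form (22) applies from the second iteration and the closed form (21) from the third.

```python
def worst_case_bounds(k: int):
    """Iterate the worst-case recurrence v_h = ceil((Z_max + v_{h-1}) / 2)."""
    gamma_max = 2**k - 2              # Eq. (19): largest multiplicand, B < m
    m_max = 2**k - 1                  # Eq. (13): largest k-bit odd modulus
    z_max = gamma_max + m_max         # Eq. (20): 2^(k+1) - 3
    rows, v = [], 0
    for h in range(1, k + 1):
        alpha = z_max + v             # Eq. (16)
        v_next = -(-alpha // 2)       # ceiling division, as in Eq. (22)
        rows.append((h, v, alpha, v_next))
        v = v_next
    return rows


if __name__ == "__main__":
    k = 8
    for h, v_prev, alpha, v_h in worst_case_bounds(k):
        # Register widths: alpha needs k + 2 bits, v needs k + 1 bits.
        assert alpha < 2**(k + 2) and v_h < 2**(k + 1)
        if h >= 3:  # closed form (21)
            assert alpha == 2**(k + 2) - 2**(k - (h - 2)) - 5
        if h >= 2:  # closed form (22)
            assert v_h == 2**(k + 1) - 2**(k - (h - 1)) - 2
        print(h, v_prev, alpha, v_h)
```

For 𝑘 = 8 the last row ends with $v_8 = 508 = 2^{9} - 4$, which agrees with the maximum output value derived above.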
4. RESULTS AND DISCUSSION
The hardware architectures for MMM, for the existing and the proposed methods, have been presented in Sections
2 and 3, respectively. The proposed design has been implemented in VHDL. It has been simulated with ModelSim
and synthesized with the Xilinx ISE tools, targeting the Virtex-6 XC6VLX760-2FF1760 FPGA.
4.1. Simulation of the proposed algorithm
The simulation and verification of the design are carried out using ModelSim (Figure 2). As an
example, 𝑘 = 8, 𝒜 = 165, ℬ = 231, and the modulus 𝓂 = 245. As expected, the final result is 𝒫 = 280,
where $\mathcal{P} \equiv \mathcal{A} \times \mathcal{B} \times 2^{-k} \pmod{𝓂}$, that is,
$280 \equiv 165 \times 231 \times 2^{-8} \pmod{245}$. The result occupies 𝑘 + 1 bits as a
consequence of eliminating the final subtraction step.
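The congruence in this example can be checked by hand. The following worked arithmetic is an added verification, not part of the original simulation; it uses only the values stated above.

$$
\begin{aligned}
165 \times 231 &= 38115 = 155 \times 245 + 140 \equiv 140 \pmod{245},\\
2^{8} = 256 &\equiv 11 \pmod{245}, \qquad 11 \times 156 = 1716 = 7 \times 245 + 1 \;\Rightarrow\; 2^{-8} \equiv 156 \pmod{245},\\
140 \times 156 &= 21840 = 89 \times 245 + 35 \equiv 35 \pmod{245},\\
280 &= 35 + 245 \equiv 35 \pmod{245}.
\end{aligned}
$$

The simulated output 280 therefore lies in the correct residue class: it exceeds the fully reduced value 35 by one multiple of the modulus and still fits in 𝑘 + 1 = 9 bits, since 280 < 512.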
7. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
A compact FPGA-based montgomery modular multiplier (Ahmed A. H. Abd-Elkader)
741
Figure 2. Simulation of the proposed MMM
4.2. Implementation and comparison
In this section, the proposed design is compared with state-of-the-art 𝐺𝐹(𝑝) hardware modular multipliers.
Table 2 shows the performance analysis of our work against existing designs. The metrics used
to evaluate the hardware designs for various bit-lengths are area in slice LUTs, frequency in MHz,
time in μs, throughput in Mbps, area-time per bit-length (AT/b), and efficiency (Mbps per FPGA LUT).
The efficiency metric has been used in previous works to evaluate the area resources used and the performance
achieved in cryptographic hardware architectures [19].
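The relationship between these metrics can be sketched as follows. The formulas below are inferred from the numbers reported for the proposed design in Table 2 rather than stated explicitly in the text, so they should be read as an assumption; the function name `mmm_metrics` and the default of 𝑘 + 1 clock cycles (taken from the cycle count given in Section 3) are likewise illustrative.

```python
from typing import Optional


def mmm_metrics(bits: int, freq_mhz: float, luts: int,
                cycles: Optional[int] = None) -> dict:
    """Derive Table 2 style metrics from frequency, area and cycle count."""
    if cycles is None:
        cycles = bits + 1                 # assumed k + 1 cycles per multiplication
    time_us = cycles / freq_mhz           # cycles / MHz gives microseconds
    throughput_mbps = bits / time_us      # bits per microsecond equals Mbps
    at_per_bit = luts * time_us / bits    # area-time product per operand bit
    efficiency = throughput_mbps / luts   # Mbps per LUT
    return {"time_us": time_us, "throughput_mbps": throughput_mbps,
            "at_per_bit": at_per_bit, "efficiency": efficiency}


if __name__ == "__main__":
    # 256-bit proposed design: 156.86 MHz and 614 LUTs, as listed in Table 2.
    for name, value in mmm_metrics(256, 156.86, 614).items():
        print(f"{name}: {value:.3f}")
```

For designs that use DSP blocks, the same efficiency formula would be applied to the normalized LUT count instead of the raw slice LUTs.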
K. Javeed [14] presented a radix-2 implementation of the MMM and IMM algorithms on FPGA. The
results indicate that the radix-2 MMM design is more efficient in terms of computation time, FPGA slice
area, and throughput than the radix-2 IMM design. The synthesis results of the MMM implementation
are shown in Table 2. The design in [14] offers low computation time and reasonable throughput, but at the cost
of higher area. Our proposed design improves on it in all aspects: according to Table 2, its efficiency is about
2.6 times that of [14] (the efficiency of [14] is about 61% lower), while it consumes only about 40% of the
slice LUTs used by [14].
S. Ghosh [20] presented a modification of an interleaved multiplier using the Montgomery ladder and
high-speed adder circuits. The design of [20] offers higher frequency and throughput for the 224- and 256-bit
lengths. In contrast, it consumes a much larger area, has a higher AT/b, and achieves lower efficiency. A comparison
of the two works reveals that our proposed design achieves, on average, 5.25 times the efficiency of [20],
while consuming only around 19% of the slice LUTs.
Based on the interleaved multiplication algorithm, K. Javeed [21] presented modified radix-4 and
radix-8 Booth-encoded modular multipliers over general 𝐺𝐹(𝑝). Because of the higher-radix approach,
the design needs fewer clock cycles and has higher throughput for the 224- and 256-bit lengths. On the
other hand, this approach uses many more hardware resources, has a higher AT/b, and achieves lower efficiency.
Our proposed design achieves around seven times the efficiency of [21] while consuming about 13% of the
slice LUTs.
Previous works introduced the term “Normalized-LUT” to express the DSP cost in terms of
LUTs [11, 22, 23]. Yan [24] presented an implementation of a hybrid 256-bit Montgomery modular multiplier
over 𝐺𝐹(𝑝) on FPGAs, which uses the Karatsuba and Knuth multiplication algorithms in various stages. That
work provides higher frequency and higher throughput than our proposed work. By contrast, its hardware
resource utilization is very high and its efficiency is very low. Our work attains 10.5 times the efficiency
of [24] while consuming less than 1.0% of its normalized LUTs.
Based on an interleaved multiplication algorithm and the Montgomery powering ladder, Javeed [25]
presented an implementation of radix-4 modular multipliers. That work roughly doubles the frequency and
throughput of the previous work in [21]. Nevertheless, our proposed design achieves four times the
efficiency of [25] while consuming only 12% of the slice LUTs.
Liu [11] proposed a 258-bit multiplier based on the KO-3 algorithm, which is derived from the Karatsuba
algorithm. That design has a better computation time and throughput than the proposed design, but at the cost of
a much larger area, and thus a higher AT/b and lower efficiency. Our work attains 2.7 times the
efficiency of [11] while consuming only 0.33% of the slice LUTs.
J. Ding [22] introduced Broken-Karatsuba multiplication with the non-least-positive form. Based on
the modified modular multiplication algorithm, a 256-bit two-stage modular multiplier was built. That work
completes a 256-bit MMM in just 9 cycles; thus, it has a very low delay and a high
throughput. On the other hand, its area cost is much higher, giving a higher AT/b and
lower efficiency. Our work attains 1.8 times the efficiency of [22] while
consuming only 1.8% of its normalized LUTs.
J. Ding [26] implemented Montgomery modular multiplication using an approach based on the
non-least-positive (NLP) form that combines Karatsuba and schoolbook multiplication, which
saves 2 base multiplications compared with Karatsuba-only designs. That work performs a 256-bit MMM
in just 16 cycles; hence, it has a reasonable delay and a high throughput. By contrast, it
consumes a much larger area, has a higher AT/b, and achieves lower efficiency. A comparison of the
two works shows that our proposed design achieves 1.1 times the
efficiency of the 3-way technique in [26], while consuming only about 3.4% of its normalized LUTs.
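The efficiency gains and area fractions quoted in the paragraphs above can be recomputed from the 256-bit rows of Table 2. The short script below is an added sketch, with hypothetical names `TABLE2_256`, `PROPOSED_256`, and `compare`; it uses the normalized LUT counts and efficiencies as listed in the table. Where the text quotes an average over several bit-lengths, as for [20] and [21], the 256-bit figure alone will differ from the quoted average.

```python
# 256-bit rows of Table 2: (normalized LUTs, efficiency in Mbps/LUT).
TABLE2_256 = {
    "[14]": (1534, 0.099),
    "[20]": (3207, 0.054),
    "[21]": (4606, 0.037),
    "[24]": (81376, 0.024),
    "[25]": (5300, 0.063),
    "[11]": (187900, 0.093),
    "[22]": (33356, 0.137),
    "[26]": (17828, 0.230),
}
PROPOSED_256 = (614, 0.254)


def compare(reference: tuple, proposed: tuple = PROPOSED_256) -> tuple:
    """Return (efficiency gain factor, fraction of the reference LUTs used)."""
    ref_luts, ref_eff = reference
    luts, eff = proposed
    return eff / ref_eff, luts / ref_luts


if __name__ == "__main__":
    for design, row in TABLE2_256.items():
        gain, lut_fraction = compare(row)
        print(f"{design}: {gain:.2f}x efficiency, "
              f"{100 * lut_fraction:.2f}% of LUTs")
```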
Table 2. Comparison of modular multiplication implementations on FPGA

| Design | Bit-length | Frequency (MHz) | Slice LUTs | DSP | Normalized LUT | Time (μs) | Throughput (Mbps) | AT/b | Efficiency (Mbps/LUT) |
|---|---|---|---|---|---|---|---|---|---|
| [14] | 160 | 207 | 958 | 0 | 958 | 0.78 | 205.13 | 4.67 | 0.214 |
| [14] | 192 | 186 | 1150 | 0 | 1150 | 1.04 | 184.62 | 6.23 | 0.161 |
| [14] | 224 | 169 | 1342 | 0 | 1342 | 1.34 | 167.16 | 8.03 | 0.125 |
| [14] | 256 | 154 | 1534 | 0 | 1534 | 1.68 | 152.38 | 10.07 | 0.099 |
| [14] | 384 | 115 | 2302 | 0 | 2302 | 3.36 | 114.29 | 20.14 | 0.050 |
| [14] | 512 | 92.5 | 3070 | 0 | 3070 | 5.56 | 92.09 | 33.34 | 0.030 |
| [14] | 1024 | 51 | 6142 | 0 | 6142 | 20.10 | 50.95 | 120.56 | 0.008 |
| [20] | 160 | 191 | 2002 | 0 | 2002 | 0.84 | 190.48 | 10.51 | 0.095 |
| [20] | 192 | 184.6 | 2401 | 0 | 2401 | 1.04 | 184.62 | 13.01 | 0.077 |
| [20] | 224 | 179.1 | 2787 | 0 | 2787 | 1.25 | 179.20 | 15.55 | 0.064 |
| [20] | 256 | 174 | 3207 | 0 | 3207 | 1.48 | 172.97 | 18.54 | 0.054 |
| [21] | 160 | 91.6 | 2911 | 0 | 2911 | 0.88 | 181.82 | 16.01 | 0.062 |
| [21] | 192 | 89 | 3511 | 0 | 3511 | 1.08 | 177.78 | 19.75 | 0.051 |
| [21] | 224 | 87.3 | 4053 | 0 | 4053 | 1.29 | 173.64 | 23.34 | 0.043 |
| [21] | 256 | 85.5 | 4606 | 0 | 4606 | 1.50 | 170.67 | 26.99 | 0.037 |
| [24] | 256 | 206 | 16900 | 108 | 81376 | 0.13 | 1954.20 | 41.65 | 0.024 |
| [25] | 256 | 166 | 5300 | 0 | 5300 | 0.77 | 332.47 | 15.94 | 0.063 |
| [11] | 256 | 68 | 187900 | 0 | 187900 | 0.01 | 17414.97 | 10.79 | 0.093 |
| [22] | 256 | 161 | 4700 | 48 | 33356 | 0.06 | 4579.56 | 7.28 | 0.137 |
| [26] | 256 | 256 | 3500 | 24 | 17828 | 0.06 | 4096.00 | 4.35 | 0.230 |
| Proposed design | 160 | 210.82 | 379 | 0 | 379 | 0.76 | 209.51 | 1.81 | 0.553 |
| Proposed design | 192 | 189.13 | 451 | 0 | 451 | 1.02 | 188.15 | 2.40 | 0.417 |
| Proposed design | 224 | 171.49 | 525 | 0 | 525 | 1.31 | 170.73 | 3.08 | 0.325 |
| Proposed design | 256 | 156.86 | 614 | 0 | 614 | 1.64 | 156.24 | 3.93 | 0.254 |
| Proposed design | 384 | 116.94 | 914 | 0 | 914 | 3.29 | 116.64 | 7.84 | 0.128 |
| Proposed design | 512 | 93.22 | 1211 | 0 | 1211 | 5.50 | 93.04 | 13.02 | 0.077 |
| Proposed design | 1024 | 51.46 | 2408 | 0 | 2408 | 19.92 | 51.41 | 46.84 | 0.021 |

5. CONCLUSION
This paper introduces a modular multiplier with a novel algorithm and a compact architecture based
on the MMM algorithm, without any subtraction or multiplication operations. The motivation behind this work
is to present a design with lower area and higher efficiency than the relevant MMM designs. The design has
been implemented on the Xilinx Virtex-6 FPGA platform and verified with the ModelSim simulator.
Implementations of various bit-lengths have been evaluated in terms of area, frequency, throughput, AT/b, and
efficiency. The proposed multiplier performs 256-bit and 1024-bit modular multiplications in 1.64 µs and
19.92 µs, occupies 614 LUTs and 2408 LUTs, and runs at 156 MHz and 51 MHz, respectively. The synthesis
results show that the proposed multiplier consumes the lowest area among the compared
multipliers. Consequently, the developed architecture has a competitive area-time product and enhanced
efficiency, and is well suited to efficient lightweight asymmetric cryptosystems.
REFERENCES
[1] H. Delfs and H. Knebl, "Introduction to Cryptography: Principles and Applications", 3rd ed. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2015.
[2] W. Stallings, "Cryptography and Network Security: Principles and Practice", 6th ed. New Jersey: Pearson Education,
2014.
[3] P. L. Montgomery, “Modular Multiplication Without Trial Division,” Math. Comput., vol. 44, no. 170, p. 519, 1985,
doi: 10.2307/2007970.
[4] A. Parihar and S. Nakhate, “Fast Montgomery modular multiplier for Rivest-Shamir-Adleman cryptosystem,” IET
Inf. Secur., vol. 13, no. 3, pp. 231-238, 2019, doi: 10.1049/iet-ifs.2018.5191.
[5] E. Ozcan and S. S. Erdem, “A High Performance Full-Word Barrett Multiplier Designed for FPGAs with DSP
Resources,” in Digital Circuits and Sub-Systems II, vol. 2, no. 3, pp. 73-76, 2019. doi: 10.1109/prime.2019.8787740.
[6] F. G. Abdulkadhim, Z. Yi, C. Tang, M. Khalid, and S. A. Waheeb, “A survey on the applications of IoT: An
investigation into existing environments, present challenges and future opportunities,” TELKOMNIKA
(Telecommunication, Computing, Electronics and Control), vol. 18, no. 3, pp. 1447-1458, 2020, doi:
10.12928/TELKOMNIKA.v18i3.15604.
[7] R. Jyothi and N. G. Cholli, “An efficient approach for secured communication in wireless sensor networks,” Int. J.
Electr. Comput. Eng., vol. 10, no. 2, pp. 1641-1647, 2020, doi: 10.11591/ijece.v10i2.
[8] M. Gunturi, H. D. Kotha, and M. Srinivasa Reddy, “An overview of internet of things,” J. Adv. Res. Dyn. Control
Syst., vol. 10, no. 9, pp. 659-665, 2018, doi: 10.12928/telkomnika.v18i5.15911.
[9] A. Karim Mohamed Ibrahim, R. A. Rashid, A. H. F. A. Hamid, M. Adib Sarijari, and M. A. Baharudin, “Lightweight
IoT middleware for rapid application development,” TELKOMNIKA (Telecommunication, Computing, Electronics
and Control), vol. 17, no. 3, pp. 1385-1392, 2019, doi: 10.12928/TELKOMNIKA.V17I3.11793.
[10] C. D. Walter, “Montgomery exponentiation needs no final subtractions,” Electron. Lett., vol. 35, no. 21,
pp. 1831-1832, 1999, doi: 10.1049/el:19991230.
[11] R. Liu and S. Li, “A design and implementation of Montgomery modular multiplier,” in Proceedings - IEEE
International Symposium on Circuits and Systems, vol. 2019-May, no. 1, pp. 1-4, 2019, doi:
10.1109/ISCAS.2019.8702684.
[12] B. Schneier, “Applied Cryptography,” Electr. Eng., vol. 1, no. 32, pp. 429-455, 1996, doi: 10.1.1.99.2838.
[13] K. Pratibha and R. Muthaiah, “Survey on Hardware Implementation of Montgomery Modular,” Int. J. Pure Appl.
Math., vol. 119, no. 12, pp. 13437-13452, 2018.
[14] K. Javeed, D. Irwin, and X. Wang, “Design and performance comparison of modular multipliers implemented on
FPGA platform,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics),
vol. 10039 LNCS, pp. 251-260, 2016, doi: 10.1007/978-3-319-48671-0_23.
[15] G. S. Y. Baek, “Uniform Montgomery multiplier,” J. Cryptogr. Eng., 2019, doi: 10.1007/s13389-019-00213-7.
[16] M. Der Shieh, J. H. Chen, W. C. Lin, and H. H. Wu, “A new algorithm for high-speed modular multiplication design,”
IEEE Trans. Circuits Syst. I Regul. Pap., vol. 56, no. 9, pp. 2009-2019, 2009, doi: 10.1109/TCSI.2008.2011585.
[17] Z. Liu and J. Großschädl, “New speed records for Montgomery modular multiplication on 8-bit AVR
microcontrollers,” Prog. Cryptology - AFRICACRYPT 2014, vol. 8469 LNCS, pp. 215-234, 2014,
doi: 10.1007/978-3-319-06734-6_14.
[18] G. Hachez and J. J. Quisquater, “Montgomery exponentiation with no final subtractions: Improved results,” in
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 1965 LNCS, pp. 293-301, 2000, doi: 10.1007/3-540-44499-8-23.
[19] L. Rodríguez-Flores, M. Morales-Sandoval, R. Cumplido, C. Feregrino-Uribe, and I. Algredo-Badillo, “Compact
FPGA hardware architecture for public key encryption in embedded devices,” PLoS One, vol. 13, no. 1, pp. 1-21,
2018, doi: 10.1371/journal.pone.0190939.
[20] S. Ghosh, D. Mukhopadhyay, and D. R. Chowdhury, “High Speed F p Multipliers and Adders on FPGA Platform,”
in Design and Architectures for Signal and Image Processing (DASIP), pp. 21-26, 2010.
[21] K. Javeed and X. Wang, “Radix-4 and radix-8 booth encoded interleaved modular multipliers over general Fp,” Conf.
Dig. - 24th Int. Conf. F. Program. Log. Appl. FPL 2014, 2014, doi: 10.1109/FPL.2014.6927452.
[22] J. Ding and S. Li, “Broken-Karatsuba multiplication and its application to Montgomery modular multiplication,” in
2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017, pp. 5-8, 2017, doi:
10.23919/FPL.2017.8056769.
[23] G. C. T. Chow, K. Eguro, W. Luk, and P. Leong, “A Karatsuba-based Montgomery multiplier,” Proc. - 2010 Int.
Conf. F. Program. Log. Appl. FPL 2010, pp. 434-437, 2010, doi: 10.1109/FPL.2010.89.
[24] X. Yan, G. Wu, D. Wu, F. Zheng, and X. Xie, “An implementation of Montgomery modular multiplication on
FPGAs,” in Proceedings - 2013 International Conference on Information Science and Cloud Computing, ISCC 2013,
pp. 32-38, 2014, doi: 10.1109/ISCC.2013.19.
[25] K. Javeed, X. Wang, and M. Scott, “Serial and parallel interleaved modular multipliers on FPGA platform,” 25th Int.
Conf. F. Program. Log. Appl. FPL 2015, pp. 2-5, 2015, doi: 10.1109/FPL.2015.7293986.
[26] J. Ding and S. Li, “A low-latency and low-cost Montgomery modular multiplier based on NLP multiplication,” IEEE
Trans. Circuits Syst. II Express Briefs, vol. 67, no. 7, pp. 1319-1323, 2020, doi: 10.1109/TCSII.2019.2932328.