RSA is one of the most widely adopted public key algorithms at present and it requires repeated modular
multiplications to accomplish the computation of modular exponentiation. A famous approach to implement
modular multiplication in hardware circuits is based on the Montgomery modular multiplication algorithm since
it has many advantages. To speed up the encryption/decryption process, many high-speed Montgomery modular
multiplication algorithms and hardware architectures employ carry-save addition. But CSA based architecture
increases the area. In this paper, in order to reduce the area of the CSA based multiplier, an area-efficient
algorithm called Double Add Reduce algorithm is introduced. Then the performance analysis of the new design
with the previous had done for comparison.
Arithmetic Operations in Multi-Valued LogicVLSICS Design
This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to onsideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.
In this project 31 % area delay product reduction is possible with the use of the CSLA based 32 bit unsigned parallel multiplier than CLAA based 32 bit unsigned parallel multiplier
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Arithmetic Operations in Multi-Valued LogicVLSICS Design
This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to onsideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.
In this project 31 % area delay product reduction is possible with the use of the CSLA based 32 bit unsigned parallel multiplier than CLAA based 32 bit unsigned parallel multiplier
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Design of 8-Bit Comparator Using 45nm CMOS TechnologyIJMER
In this paper design of 8- bit binary comparator using 45nm CMOS technology is discussed.
This design needs less area and less number of transistors, also discussed about power and execution time. The
circuit has three output X, Y and Z. X is active high, when A>B, Y is active high when A=B and Z is active high
when both X and Y are active low. Design 1- bit comparator with the help of precharge gate.The design of 1-bit
comparator has been extended to implement an 8-bit comparator by connecting in series with pass
transistor between them. The design has been implemented in Microwind3.1, is tested successfully and
has been validated using Pspice for different measurable parameter.
Layout Design Analysis of CMOS Comparator using 180nm TechnologyIJEEE
Comparator is a very useful and basic arithmetic component of digital system. In the world of technology the demand of portable devices are increasing day by day. This paper presents CMOS design of 1-bit comparator on 180nm technology. The layout of 1-bit comparator has been developed using Automatic and semi-custom techniques. Both the layouts are compared and analyzed in terms of their Power and Area consumption. Automatic layout is generated from its equivalent schematic whereas semi-custom layout is developed manually. The result shows that semi-custom consumes less power as compared to Automatic.
Hardware Implementation of Two’s Compliment Multiplier with Partial Product b...IJERA Editor
With the emergence of portable computing and communication systems, power consumption has become one of the major objectives during VLSI design. Furthermore, the multiplication is an essential arithmetic operation for common DSP applications, such as filtering, convolution, fast Fourier Transform (FFT) etc. To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems. This paper presents an approach to reduce power consumption of 2’s compliment multiplier design, in which switching activities are reduced through dynamic by passing of partial products.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Efficient Layout Design of CMOS Full SubtractorIJEEE
Arithmetic circuits and for that matter Combinational circuit design is very important part of VLSI design process. The pertinent issues involved are layout area and power consumption. The main aim of this paper is to design a full subtractor using 90 nm technology. The proposed full subtractor has been designed and simulated using DSCH 3.1 auto-generated design and using Microwind 3.1 simulation software for semi-custom design. The results obtained show that the semi-custom design is area efficient than the auto-generated design. On the other hand, power consumption in the later is more as compared to the auto- generated design.
Area-Delay Efficient Binary Adders in QCAIJERA Editor
In this paper, a novel quantum-dot cellular automata (QCA) adder design is presented that decrease the number
of QCA cells compared to previously method designs. The proposed one-bit QCA adder is based on a new
algorithm that requires only three majority gates and two inverters for the QCA addition. A novel 128-bit adder
designed in QCA was implemented. It achieved speed performances higher than all the existing. QCA adders,
with an area requirement comparable with the low RCA and CFA established. The novel adder operates in the
RCA functional, but it could propagate a carry signal through a number of cascaded MGs significantly lower
than conventional RCA adders. In adding together, because of the adopted basic logic and layout strategy, the
number of clock cycles required for completing the explanation was limited. As transistors reduce in size more
and more of them can be accommodated in a single die, thus increasing chip computational capabilities.
However, transistors cannot find much smaller than their current size. The quantum-dot cellular automata
approach represents one of the possible solutions in overcome this physical limit, even though the design of
logic modules in QCA is not forever straightforward.
Design of 8-Bit Comparator Using 45nm CMOS TechnologyIJMER
In this paper design of 8- bit binary comparator using 45nm CMOS technology is discussed.
This design needs less area and less number of transistors, also discussed about power and execution time. The
circuit has three output X, Y and Z. X is active high, when A>B, Y is active high when A=B and Z is active high
when both X and Y are active low. Design 1- bit comparator with the help of precharge gate.The design of 1-bit
comparator has been extended to implement an 8-bit comparator by connecting in series with pass
transistor between them. The design has been implemented in Microwind3.1, is tested successfully and
has been validated using Pspice for different measurable parameter.
Layout Design Analysis of CMOS Comparator using 180nm TechnologyIJEEE
Comparator is a very useful and basic arithmetic component of digital system. In the world of technology the demand of portable devices are increasing day by day. This paper presents CMOS design of 1-bit comparator on 180nm technology. The layout of 1-bit comparator has been developed using Automatic and semi-custom techniques. Both the layouts are compared and analyzed in terms of their Power and Area consumption. Automatic layout is generated from its equivalent schematic whereas semi-custom layout is developed manually. The result shows that semi-custom consumes less power as compared to Automatic.
Hardware Implementation of Two’s Compliment Multiplier with Partial Product b...IJERA Editor
With the emergence of portable computing and communication systems, power consumption has become one of the major objectives during VLSI design. Furthermore, the multiplication is an essential arithmetic operation for common DSP applications, such as filtering, convolution, fast Fourier Transform (FFT) etc. To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems. This paper presents an approach to reduce power consumption of 2’s compliment multiplier design, in which switching activities are reduced through dynamic by passing of partial products.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Efficient Layout Design of CMOS Full SubtractorIJEEE
Arithmetic circuits and for that matter Combinational circuit design is very important part of VLSI design process. The pertinent issues involved are layout area and power consumption. The main aim of this paper is to design a full subtractor using 90 nm technology. The proposed full subtractor has been designed and simulated using DSCH 3.1 auto-generated design and using Microwind 3.1 simulation software for semi-custom design. The results obtained show that the semi-custom design is area efficient than the auto-generated design. On the other hand, power consumption in the later is more as compared to the auto- generated design.
Area-Delay Efficient Binary Adders in QCAIJERA Editor
In this paper, a novel quantum-dot cellular automata (QCA) adder design is presented that decrease the number
of QCA cells compared to previously method designs. The proposed one-bit QCA adder is based on a new
algorithm that requires only three majority gates and two inverters for the QCA addition. A novel 128-bit adder
designed in QCA was implemented. It achieved speed performances higher than all the existing. QCA adders,
with an area requirement comparable with the low RCA and CFA established. The novel adder operates in the
RCA functional, but it could propagate a carry signal through a number of cascaded MGs significantly lower
than conventional RCA adders. In adding together, because of the adopted basic logic and layout strategy, the
number of clock cycles required for completing the explanation was limited. As transistors reduce in size more
and more of them can be accommodated in a single die, thus increasing chip computational capabilities.
However, transistors cannot find much smaller than their current size. The quantum-dot cellular automata
approach represents one of the possible solutions in overcome this physical limit, even though the design of
logic modules in QCA is not forever straightforward.
This paper presents a compact design of Montgomery modular multiplier
(MMM). MMM serves as a building block commonly required in security
protocols relying on public key encryption. The proposed design is intended
for hardware applications of lightweight cryptographic modules that is utilized
for the system on chip (SoC) and internet of things (IoT) devices. The proposed
design is an enhancement of the original MMM without any multiplication or
subtraction processes. The main target of the new modification is enhancing
the performance and reducing the area of the MMM hardware module. The
operands and internal variables of the proposed hardware circuit is optimized
to be bounded to the smallest efficient size to minimize the area and the critical
path delay. The proposed design was coded in VHDL, implemented on the
Virtex-6 FPGA, and its performance was analyzed utilizing XILINX ISE
tools. Our design occupies the smallest area comparing with other
implementations on the same FPGA type. The proposed design saves in a
range between 60.0% and 99.0% of the resources compared with other relevant
designs.
Arithmetic Operations in Multi-Valued Logic VLSICS Design
This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to consideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.
Design and testing of systolic array multiplier using fault injecting schemesCSITiaesprime
Nowadays low power design circuits are major important for data transmission and processing the information among various system designs. One of the major multipliers used for synchronizing the data transmission is the systolic array multiplier, low power designs are mostly used for increasing the performance and reducing the hardware complexity. Among all the mathematical operations, multiplier plays a major role where it processes more information and with the high complexity of circuit in the existing irreversible design. We develop a systolic array multiplier using reversible gates for low power appliances, faults and coverage of the reversible logic are calculated in this paper. To improvise more, we introduced a reversible logic gate and tested the reversible systolic array multiplier using the fault injection method of built-in self-test block observer (BILBO) in which all corner cases are covered which shows 97% coverage compared with existing designs. Finally, Xilinx ISE 14.7 was used for synthesis and simulation results and compared parameters with existing designs which prove more efficiency.
Fpga based efficient multiplier for image processing applications using recur...VLSICS Design
The Digital Image processing applications like medical imaging, satellite imaging, Biometric trait images
etc., rely on multipliers to improve the quality of image. However, existing multiplication techniques
introduce errors in the output with consumption of more time, hence error free high speed multipliers has
to be designed. In this paper we propose FPGA based Recursive Error Free Mitchell Log Multiplier
(REFMLM) for image Filters. The 2x2 error free Mitchell log multiplier is designed with zero error by
introducing error correction term is used in higher order Karastuba-Ofman Multiplier (KOM)
Architectures. The higher order KOM multipliers is decomposed into number of lower order multipliers
using radix 2 till basic multiplier block of order 2x2 which is designed by error free Mitchell log multiplier.
The 8x8 REFMLM is tested for Gaussian filter to remove noise in fingerprint image. The Multiplier is
synthesized using Spartan 3 FPGA family device XC3S1500-5fg320. It is observed that the performance
parameters such as area utilization, speed, error and PSNR are better in the case of proposed architecture
compared to existing architectures.
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
In this paper the Delay Computation method for Common Sub expression Elimination algorithm is being implemented on Cyclotomic Fast Fourier Transform. The Common Sub Expression Elimination algorithm is combined with the delay computing method and is known as Gate Level Delay Computation with Common Sub expression Elimination Algorithm. Common sub expression elimination is effective
optimization method used to reduce adders in cyclotomic Fourier transform. The delay computing method is based on delay matrix and suitable for implementation with computers. The Gate level delay computation method is used to find critical path delay and it is analyzed on various finite field elements. The presented algorithm is established through a case study in Cyclotomic Fast Fourier Transform over finite field. If Cyclotomic Fast Fourier Transform is implemented directly then the system will have high additive complexities. So by using GLDC-CSE algorithm on cyclotomic fast Fourier transform, the additive
complexities will be reduced and also the area and area delay product will be reduced.
Multipliers play an important role in today’s digital signal processing (DSP) and various other
applications. Multiplication is the most time consuming process in various signal processing operations like
convolution, circular convolution, auto-correlation and cross-correlation. With advances in technology, many
researchers have tried and are trying to design multipliers which offer either of the following- high speed, low
power consumption, regularity of layout and hence less area or even combination of them in multiplier. However
area and speed are two conflicting constraints. So improving speed results always in larger areas. So here we try
to find out the best trade off solution among the both of them. To have features like high speed and low power
consumption multipliers several algorithms have been introduced .In this paper, we describes Multipliers by using
various algorithm in VLSI technology.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Low Power VLSI Design of Modified Booth Multiplieridescitation
Low power VLSI circuits became very vital criteria
for designing the energy efficient electronic designs for prime
performance and compact devices. Multipliers play a very
important role for planning energy economical processors that
decides the potency of the processor. To scale back the facility
consumption of multiplier factor booth coding methodology
is being employed to rearrange the input bits. The operation
of the booth decoder is to rearrange the given booth equivalent.
Booth decoder can increase the range of zeros in variety. Hence
the switching activity are going to be reduced that further
reduces the power consumption of the design. The input bit
constant determines the switching activity part that’s once
the input constant is zero corresponding rows or column of
the adder ought to be deactivated. When multiplicand contains
a lot of number of zeros the higher power reduction will takes
place. therefore in booth multiplier factor high power
reductions are going to be achieved.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
An Area-efficient Montgomery Modular Multiplier for Cryptosystems
1. Anizha Radhakrishnan Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 4, Issue 9( Version 1), September 2014, pp.210-214
www.ijera.com 210 | P a g e
An Area-efficient Montgomery Modular Multiplier for Cryptosystems Anizha Radhakrishnan*, Seena George** *(M.tech scholar, Department of ECE, Sree Narayana Gurukulam College of Engineering , Kolenchery, India)
** (Asst. Professor, Department of ECE, Sree Narayana Gurukulam College of Engineering , Kolenchery, India)
Abstract RSA is one of the most widely adopted public key algorithms at present and it requires repeated modular multiplications to accomplish the computation of modular exponentiation. A famous approach to implement modular multiplication in hardware circuits is based on the Montgomery modular multiplication algorithm since it has many advantages. To speed up the encryption/decryption process, many high-speed Montgomery modular multiplication algorithms and hardware architectures employ carry-save addition. But CSA based architecture increases the area. In this paper, in order to reduce the area of the CSA based multiplier, an area-efficient algorithm called Double Add Reduce algorithm is introduced. Then the performance analysis of the new design with the previous had done for comparison.
Keywords— Area-efficient architecture, Carry save addition, Cryptography, Double add reduce algorithm, Montgomery modular multiplier.
I. INTRODUCTION
The motivation for studying area-efficient and fast modular multiplication algorithms comes from their application in public key cryptography. Modular exponentiation is a critical operation in RSA [1], Diffie- Hellman key exchange etc. It is a popular operation for encryption in public key cryptosystems. The core operation in this method is repeated modular multiplication. Many algorithms are available for performing fast modular multiplication. These algorithms can be grouped in to two. That is, the first category algorithms perform modular reduction followed by multiplication. Classical, Karatsuba, Schonhage- Strassen and Barrett algorithms are examples of such multiplication and modular reduction algorithms [2], [3]. The second category performs modular reduction combined with multiplication. Montgomery algorithm [4] introduced in 1985, is a famous approach for carrying out fast modular multiplication and belongs to the second category. For a single modular multiplication Montgomery method is disadvantageous. But the advantage comes when it performs modular exponentiation. Montgomery algorithm performs modular multiplication without division. That is, it replaces division with simple bit shift operation. In many public key systems Montgomery method is used for obtaining fast modular multiplication.
Many algorithms are introduced to perform Montgomery modular multiplication in a faster way. In this paper, we are introducing a new area reducing architecture for Montgomery modular multiplier. The remainder of this paper is organized as follows: Section II briefly reviews previous algorithms and their architectures for performing Montgomery multiplication. In section III, the mathematics behind the Montgomery algorithm is explained in detail. Then we have introduced a new area efficient Montgomery algorithm in section IV. Section V gives a detailed comparison between proposed method and the previous CSA based modular multiplier. Finally we concluded this paper in section VI.
II. PREVIOUS METHODOLOGIES
Montgomery multiplication algorithm is the most efficient algorithm available. The main advantage of Montgomery algorithm is that it replaces the division operation with shift operations. During two decades many alternative forms of Montgomery algorithms are introduced. These architectures use carry save addition. The work in [5] and [13] presented two types of Montgomery algorithms which use Carry Save Adder (CSA). One of the two types used four-to-two CSA and the other used five-to-CSA. They had given a brief comparison between these two versions of Montgomery multipliers. They had found that the multiplier using four-to-two CSA architecture has shorter critical path than that of five-to-two CSA multiplier. But extra storage elements and multiplexers are required for 4-to-2 architecture which probably increases the energy consumption. The work in [6] proposed a Montgomery multiplication algorithm using pipelined carry save addition to
RESEARCH ARTICLE OPEN ACCESS
2. Anizha Radhakrishnan Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 4, Issue 9( Version 1), September 2014, pp.210-214
www.ijera.com 211 | P a g e
shorten the critical path delay of five-to-two CSA. This method also required additional pipeline registers and multiplexers which will increase the area. Ming Der Shieh presented [7] a new algorithm for high speed modular multiplication. This new Montgomery multiplier performs modular reduction in a pipelined fashion, so that the critical path delay is reduced from the four-to-two to three-to-two carry- save addition. Then it requires additional pipeline registers to store intermediate values. All the above works were not discussed about the energy consumption. Several previous works [8], [9] have developed techniques to reduce the power/energy consumption of Montgomery multipliers. In [8], some latches named glitch blockers are located at the outputs of some circuit modules to reduce the spurious transitions and the expected switching activities of high fan-out signals in the radix-4 scalable Montgomery multiplier. Shiann Rong Kuang [9] tried to reduce the energy consumption of CSAs and registers in the CSA-based Montgomery multipliers via a new technique. They modified the CSA based Montgomery multiplier algorithm and as a result of this, the number of clock cycles required to complete the multiplication is largely decreased. To achieve further energy reduction, they have adjusted the internal structure of barrel register full adder and then applied the gated clock design technique. But this energy efficient algorithm increased the total area of the design. In order to reduce the area, we are presenting a new algorithm which can modify the CSA based algorithm in [9]. Before introducing the area efficient algorithm, let us see the mathematics behind the Montgomery modular multiplication.
III. MONTGOMERY MODULAR MULTIPLICATION
Modular multiplication of two integers X and Y, simply performs, Z = X . Y mod M (1) Here X, Y and M are n - bit numbers and M should be greater than X and Y. Instead of computing X. Y, the Montgomery multiplication [10] algorithm computes, Z’ = MP (X, Y, M) = X .Y. 2-n mod M (2) MP denotes Montgomery Product and sometimes the equation (2) can also be represented as shown below. Z’ = MP (X, Y, M) = X .Y. R-1 mod M (3)
Here R = 2n and R-1 is the multiplicative inverse of R mod M. That is,
R. R-1 mod M = 1 (4) In order to perform Montgomery multiplication the numbers should be converted to the Montgomery domain. The conversion to the Montgomery domain is very simple. The conversion from integer domain to Montgomery domain is shown below. X’ = X. R mod M (5) Y’ = Y. R mod M (6) Here X’ and Y’ are the Montgomery forms of X and Y. After completing the multiplication in Montgomery domain, we have to convert it back to the integer domain to obtain the final result. The conversion from the Montgomery product Z’ to Z is shown below: Z = MP (Z’, 1, M) = Z’ .1. R-1 mod M (7) These conversions are required only at the starting point and ending point. It is important to note here that to find the modular product the Montgomery algorithm does not perform any division. On the other hand, it performs just addition and bit shift operation. The pseudo code [11], [12] for finding Montgomery product S is given below. Algorithm 1 MP1(X,Y,M) { Initially S = 0; for i = 0 to n-1 { S = S + Xi . Y ; If S is odd then S = S + M ; S = S/2; } If S ≥ M then S = S – M; } Here we can see that algorithm requires division by 2. Division by 2 is equivalent to shifting the bits to right which is fast and easy in hardware. Thus we can avoid the division. Only requirement is to perform simple conversion from Montgomery domain done before and after multiplication.
IV. PROPOSED METHODOLOGY
The work in [9], proposed an energy-efficient four-to-two CSA based Montgomery algorithm. While reducing energy, area of the multiplier is slightly increased. In this paper, we propose an area efficient, fast and low power algorithm called Double Add Reduce (DAR) algorithm which can replace the CSA based multiplier.
3. Anizha Radhakrishnan Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 4, Issue 9( Version 1), September 2014, pp.210-214
www.ijera.com 212 | P a g e
Fig.1. Block diagram of multiplier based on DAR algorithm Figure 1 shows the block diagram of Montgomery modular multiplier. Here main block is a double add reduce block.
4.1 Double Add Reduce (DAR) algorithm
The DAR block computes Montgomery product using DAR algorithm which is shown below: Algorithm 2 MP2(X,Y,M) { S-1 = 0; Y = 2 x Y; For i=0 to n { qi = Si-1 mod 2; (LSB of Si-1) Si = (Si-1 + qi.M + Xi.Y) / 2; } Return Sn; } The multiplicand X is shifted up by 1 bit to simplify the calculation of qi. This implies that the result is a factor of 2 too large, and so the for loop must be executed an additional time to eliminate this factor.
4.2 Architecture
The architecture based on DAR algorithm is shown in figure 2. That is, the algorithm can be executed with the help of two adders. To remove the dependency of qi on the addition of Xi.Y and Si-1, Y can be shifted up 1 bit, thus forcing the LSB of Si-1 + Xi. Y to always be zero, as illustrated in figure 2. Fig.2. Architecture based on DAR algorithm
However, this extra factor of 2 must be removed by an extra iteration, meaning that the number of clock cycles is equal to n+1.
V. RESULT ANALYSIS
In order to verify the area efficiency, we have captured our new design in VHDL and implemented into the Xilinx Spartan 3E series of FPGA. For performance comparison, we have also captured the CSA based multiplier design in VHDL and implemented into the FPGA. Following section gives the performance results of these implementations.
5.1 Detailed area analysis
During implementation, we have generated device utilization report using Xilinx ISE 8.1i software tool. The design summary report generated by the Xilinx ISE software tool can be used to analyze the area utilized by the two versions of the Montgomery modular multiplier. Table I gives a detailed area analysis of the two implementations.
TABLE I. AREA ANALYSIS OF THE TWO IMPLEMENTATIONS
CSA based modular multiplier (8- bit)
DAR based modular multiplier (8- bit)
Number of slice flipflop used
64
23
Number of 4 input LUTs used
144
60
Total equivalent gate count for design
1527
595
Table I shows that proposed DAR based modular multiplier (8- bit) considerably reduces the total area used by the design.
4. Anizha Radhakrishnan Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 4, Issue 9( Version 1), September 2014, pp.210-214
www.ijera.com 213 | P a g e
5.2 Timing and power analysis
From the simulation results of both CSA multiplier (8 bit) and DAR multiplier (8 bit), we have seen that after applying inputs the later one takes less time for producing the output. For same inputs Modified multiplier gives the result in less time. From the Figure 3, CSA based multiplier takes 2.185 μs for generating the output while from the Figure 4, DAR based multiplier takes only 1.745 μs. After applying inputs, DAR based modular multiplier takes less time to produce output. Fig.3. Simulation result for 4-to-2 CSA based modular multiplier Fig.4. Simulation result for DAR based modular multiplier The power summary provides information about the usage of voltage and other power resources for your entire design or for a single design module. During synthesis we have generated the power data choosing device type Spartan 3 and package tq144. These two multipliers consumes almost same power (18mW).
VI. CONCLUSION
This paper presented an efficient algorithm and its corresponding architecture to reduce the area without affecting the speed of a Montgomery modular multiplier. Experimental results showed that the proposed method is indeed capable of reducing the area of 8 bit Montgomery modular multiplier up to 61% compared to the previous CSA based design. In addition to this, our new design increased the speed to 20% and having almost same power consumption. Here we cannot observe significant change in power consumptions of the two versions of multipliers. In future, we will try to develop efficient architecture to carry out fast modular multiplication with low energy consumption.
VII. ACKNOWLEDGMENT
We would like to express our sincere gratitude to the Management and staff, SNG College of Engineering for providing the facilities.
REFERENCES
[1] Taek-Won Kwon, Chang-Seok You,‖ Two implementation methods of a 1024-bit RSA Cryptoprocessor based on modified Montgomery algorithm‖, 0-7803-6685- 9/01/$10.00 2001 IEEE.
[2] Andre Weimerskirch and Christof Paar, ―Generalizations of the Karatsuba Algorithm for Efficient Implementations‖, IEEE Int.Conf. Comput. Design, 2005.
[3] Antoon Bosselaers, Rene Govaerts and Joos Vandewalle,‖ Comparison of three modular reduction functions‖, ° Springer-Verlag 1993 oct 25.
[4] Peter L. Montgomery, ―Modular multiplication without trivial division”, Mathematics of computation, Vol.44, No. 170. (Apr 1985) ,pp. 519-522.
[5] C. McIvor, M. McLoone, and J. V. McCanny, ―Modified Montgomery modular multiplication and RSA exponentiation techniques,‖ IEE Proc. Comput. Digit. Tech., vol. 151, no. 6, pp. 402–408, Nov. 2004.
[6] K. Manochehri and S. Pourmozafari, ―Fast Montgomery modular multiplication by pipelined CSA architecture,‖ in Proc. IEEE Int. Conf. Microelectron., Dec. 2004, pp. 144–147.
[7] M.-D. Shieh, J.-H. Chen, W.-C. Lin, and H.- H.Wu, ―A new algorithm for high-speed modular multiplication design,‖ IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 9, pp. 2009–2019, Sep. 2009.
[8] H.-K. Son and S.-G. Oh, ―Design and implementation of scalable low-power Montgomery multiplier,‖ in Proc. IEEE Int. Conf. Comput. Design, 2004, pp. 524–531.
[9] Shiann-Rong Kuang, Jiun-Ping Wang,‖ Energy-Efficient High-Throughput Montgomery Modular Multipliers for RSA Cryptosystems‖, IEEE Transactions on very large scale integration (VLSI) systems, vol. 21, no. 11, november 2013.
[10] J. C. Neto, A. F. Tenca, and W. V. Ruggiero, ―A parallel k-partition method to perform Montgomery multiplication,‖ in Proc. IEEE Int. Conf. Appl.-Specif. Syst., Arch. Process., Sep. 2011, pp. 251–254.