Design of an Area-Efficient Million-Bit Integer Multiplier Using Double Modulus NTT

NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry – 605004, India.
Buy Project Online: www.nxfee.com | Contact: +91 9789443203 |
Email: nxfee.innovation@gmail.com
_________________________________________________________________
Design of an Area-Efficient Million-Bit Integer Multiplier Using Double
Modulus NTT
Abstract:
This brief proposes a double modulus number theoretic transform (NTT) method for
million-bit integer multiplication in fully homomorphic encryption. In this method, each
NTT point is processed simultaneously under two moduli, and the final result is
generated through the Chinese remainder theorem. Using two moduli enlarges the
permitted NTT sample size from 24 to 32 bits and thus improves the transform
efficiency. Based on the proposed double modulus method, we accomplish a VLSI design
of a million-bit integer multiplier with the Schönhage–Strassen algorithm.
Implementation results on an Altera Stratix-V FPGA show that the design computes a
product of two 1024k-bit integers every 4.9 ms at the cost of only 7.9k ALUTs and 3.6k
registers, which is more area-efficient than current competitors.
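The double modulus idea can be illustrated with a minimal Python sketch. It is not the brief's hardware design: the primes 257 and 769 are small stand-ins chosen only for readability, the function names are hypothetical, and a direct cyclic convolution replaces the NTT, pointwise product, and inverse NTT (by the convolution theorem these give the same residues). The helper required_modulus_bits uses a conservative bound, points × (2^w − 1)^2, on any convolution coefficient, which shows why a 64k-point transform needs about 64 modulus bits for 24-bit samples and about 80 bits for 32-bit samples.

```python
P1, P2 = 257, 769  # small illustrative primes, NOT the moduli used in the brief


def required_modulus_bits(points, sample_bits):
    """Bit length of the largest possible cyclic-convolution coefficient
    (conservative bound: points * (2**sample_bits - 1)**2)."""
    max_coeff = points * (2**sample_bits - 1) ** 2
    return max_coeff.bit_length()


def cyclic_convolution_mod(a, b, p):
    """Direct cyclic convolution modulo p; stands in for NTT -> multiply -> INTT."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = (i + j) % n
            out[k] = (out[k] + a[i] * b[j]) % p
    return out


def crt_pair(r1, p1, r2, p2):
    """Unique x in [0, p1*p2) with x % p1 == r1 and x % p2 == r2 (p1, p2 coprime)."""
    inv_p1 = pow(p1, -1, p2)  # modular inverse, Python 3.8+
    return r1 + p1 * (((r2 - r1) * inv_p1) % p2)


def double_modulus_convolution(a, b, p1=P1, p2=P2):
    """Convolve under each modulus separately, then recombine every point by CRT.
    Exact as long as each true coefficient is below p1 * p2."""
    c1 = cyclic_convolution_mod(a, b, p1)
    c2 = cyclic_convolution_mod(a, b, p2)
    return [crt_pair(x, p1, y, p2) for x, y in zip(c1, c2)]


if __name__ == "__main__":
    print(required_modulus_bits(2**16, 24))  # 64
    print(required_modulus_bits(2**16, 32))  # 80
    # 8-bit samples, zero-padded; coefficients exceed either prime alone
    print(double_modulus_convolution([200, 150, 0, 0], [180, 250, 0, 0]))
```

Neither small prime alone can represent the coefficients in this example, but their product (197633) can, which is the same reason a second modulus lets the sample width grow.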
Software Implementation:
 ModelSim
 Xilinx 14.2
Existing System:
Fully homomorphic encryption (FHE), proposed by Gentry [1], has drawn much attention
due to its great significance to cloud computing. Currently, there are three major kinds of
FHE schemes: Gentry–Halevi lattice-based schemes [2], integer-based schemes [3], and
schemes based on the learning with errors problem [4]. In both the Gentry–Halevi FHE
schemes and the integer-based FHE schemes, large integer multiplication is the most
computationally intensive operation. The high timing and area cost of large integer
multipliers has been a major restrictive factor in the feasibility of such FHE schemes.
So far, many studies have been conducted on the design of large integer multipliers. To
the best of our knowledge, Wang et al. [5] first implemented million-bit multiplication
for FHE on a graphics processing unit platform. Doröz et al. [6] then realized a
million-bit multiplier in custom hardware employing a fast Fourier transform (FFT)-based
recursive multiplication algorithm.
After that, Wang and Huang [7] presented an architecture design of a 768k-bit multiplier
on the Stratix-V FPGA based on 4k-point and 16-point combined FFT blocks. Cao et al. [8]
proposed a supersize integer-FFT multiplication architecture with different choices of
number theoretic transform (NTT) modulus. Wang et al. [9] presented a VLSI design of a
768k-bit integer-FFT multiplier based on a memory in-place architecture, and Cao et al.
[10] proposed a low-latency architecture and a low Hamming weight architecture for large
integer multiplication. However, most of the previous works focus on reducing the
multiplication time but give little consideration to area efficiency.
Area efficiency is also quite important, because implementations with high area cost
generally require a high-end field-programmable gate array (FPGA) platform or a
high-gate-count ASIC platform, both of which are too costly for practical applications.
The objective of this brief is to design a fast million-bit integer multiplier without
compromising its area efficiency. Our contributions are as follows: 1) we propose a
double modulus NTT method to enlarge the permitted word size of NTT from 24 to 32 bits;
2) a decimation-in-frequency (DIF) and decimation-in-time (DIT) hybrid NTT approach is
introduced to eliminate the reorder operations in hardware NTT; and 3) the VLSI
architecture of an area-efficient million-bit integer multiplier is designed and verified
on an Altera Stratix-V FPGA.
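Contribution 2 can be illustrated in software with a hedged sketch: a DIF forward transform takes natural-order input and leaves its output in bit-reversed order, while a DIT inverse transform accepts bit-reversed input and returns natural order. Pointwise multiplication does not care about ordering, so chaining the two removes the explicit reorder pass. The modulus 257, the 8-point size, the root 4, and all function names below are illustrative assumptions, not values taken from the brief.

```python
def ntt_dif(a, w, p):
    """In-place Gentleman-Sande (DIF) NTT.
    Natural-order input, bit-reversed-order output. w is a primitive len(a)-th root mod p."""
    n = len(a)
    half, root = n // 2, w
    while half >= 1:
        for start in range(0, n, 2 * half):
            tw = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half]
                a[k] = (u + v) % p
                a[k + half] = (u - v) * tw % p
                tw = tw * root % p
        root = root * root % p  # sub-transforms use the squared root
        half //= 2
    return a


def intt_dit(a, w, p):
    """In-place Cooley-Tukey (DIT) inverse NTT.
    Bit-reversed-order input, natural-order output."""
    n = len(a)
    w_inv = pow(w, -1, p)  # Python 3.8+
    half = 1
    while half < n:
        root = pow(w_inv, n // (2 * half), p)
        for start in range(0, n, 2 * half):
            tw = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * tw % p
                a[k] = (u + v) % p
                a[k + half] = (u - v) % p
                tw = tw * root % p
        half *= 2
    n_inv = pow(n, -1, p)
    return [x * n_inv % p for x in a]


def cyclic_multiply(a, b, w, p):
    """Cyclic convolution with no bit-reversal pass between the stages."""
    fa = ntt_dif(list(a), w, p)
    fb = ntt_dif(list(b), w, p)
    fc = [x * y % p for x, y in zip(fa, fb)]  # order-agnostic pointwise product
    return intt_dit(fc, w, p)


if __name__ == "__main__":
    # 4 is a primitive 8th root of unity modulo 257 (2**8 = 256 = -1 mod 257)
    print(cyclic_multiply([1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 0, 0, 0, 0], 4, 257))
```

Each DIT butterfly with the inverse twiddle undoes the matching DIF butterfly up to a factor of two, which is why a single scaling by the inverse of n at the end suffices.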
Disadvantages:
 Area efficiency is lower
 Transform efficiency is lower
Proposed System:

Our literature review has revealed that existing multiplier schemes lack some of the
essential features we need for FHE schemes. A few references propose FFT-based
implementations of large integer multiplication. These techniques exploit the
convolution theorem to gain significant speedup over other multiplication techniques. To
overcome the inefficiencies and the noise problem of these approaches, we chose to
develop an application-specific custom architecture based on the NTT.
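To make the link between convolution and integer multiplication concrete, the following Python sketch splits each operand into fixed-width digits, convolves the digit sequences, and propagates carries. It deliberately uses a direct quadratic convolution so that the structure stays visible; an FFT- or NTT-based multiplier would replace only that step with transform, pointwise product, and inverse transform. The 32-bit digit width is an illustrative default, and all function names are hypothetical.

```python
def to_digits(x, digit_bits, count):
    """Split a non-negative integer into `count` little-endian digits."""
    mask = (1 << digit_bits) - 1
    return [(x >> (digit_bits * i)) & mask for i in range(count)]


def acyclic_convolution(a, b):
    """Schoolbook convolution; the step an FFT/NTT multiplier accelerates."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def propagate_carries(coeffs, digit_bits):
    """Reduce oversized coefficients to proper digits and rebuild the integer."""
    mask = (1 << digit_bits) - 1
    result, carry = 0, 0
    for i, c in enumerate(coeffs):
        total = c + carry
        result |= (total & mask) << (digit_bits * i)
        carry = total >> digit_bits
    return result | (carry << (digit_bits * len(coeffs)))


def multiply_via_convolution(x, y, digit_bits=32):
    """Multiply two non-negative integers as a digit convolution plus carries."""
    bits = max(x.bit_length(), y.bit_length(), 1)
    count = -(-bits // digit_bits)  # ceiling division
    a = to_digits(x, digit_bits, count)
    b = to_digits(y, digit_bits, count)
    return propagate_carries(acyclic_convolution(a, b), digit_bits)


if __name__ == "__main__":
    x, y = 2**200 + 12345, 3**100
    print(multiply_via_convolution(x, y) == x * y)
```

Zero-padding both digit sequences to twice their length turns this acyclic convolution into a cyclic one, which is the form a transform-based multiplier computes.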

Fig. 1. Part of a fully parallel circuit realizing the Schönhage–Strassen algorithm.
The Schönhage–Strassen algorithm is a strong choice for large integer multiplication.
The algorithm has an asymptotic complexity of O(N log N log log N). Given the size of
our operands, the Schönhage–Strassen algorithm is well suited to fulfill our
performance needs. Another advantage of the algorithm is that it lends itself to a high
degree of parallelization. Part of a high-level illustration of the parallel realization
of the algorithm for short operands is shown in Fig. 1. In the figure, registers that
appear more than once are the same registers, drawn separately for ease of viewing. We
can add more reconstruction units in parallel, by a factor of the digit size. In our
implementation of the Schönhage–Strassen algorithm we have 3 × 2^15 digits, and we
compute block sizes starting from 12, so the architecture provides ample opportunities
for parallelization. However, there are limits on how far we can go in exploiting the
parallelism in a hardware realization, for two reasons. First, the stage reconstruction
computations are simple operations that take few clock cycles, whereas the amount of
data to process is huge.
To overcome this obstacle, we need an architecture that can handle a higher data
bandwidth with rapid data access to perform calculations in parallel. Otherwise, simply
adding computational blocks while the bandwidth stays restricted (low) will not improve
the performance. Second, the index range of the dependent digits doubles in each stage
of the algorithm, which requires significantly more routing among the digits.
As the number of computational blocks and the stage number increase, too many
collisions and overlaps occur in the routing for the VLSI design tools to handle. A
well-designed high-capacity cache is the key to a high-performance large integer
multiplier. Furthermore, the cache must be tailored to match the computational
elements. In our design we chose to incorporate m = 4 computation units, so that we can
gain some performance and also avoid the routing problems that might occur. Even
with m = 4 the bus width reaches 512 bits. In order to supply the computation blocks
with sufficient data, we have chosen the cache size of the architecture as N = 3 × 2^15 =
98304. Moreover, to enable parallel reads without impeding the bandwidth, we divided
the cache into 2 × m sub-caches. It is possible to incorporate more computation units,
but doing so would make the control logic harder to design and create additional
routing complications.
The VLSI design is captured in Verilog HDL. Only the Omega ROM in Fig. 2 is
implemented with FPGA block memory. The other three RAMs, namely NTT_RAM_A,
NTT_RAM_B, and INTT_RAM_C, are considered as external RAMs. This is feasible because
the performance of our pipeline structure relies little on the read or write latency of
those three RAMs.
Fig. 2. Pipeline CRT unit architecture.
Our million-bit multiplier occupies 7.9k ALUTs, 3.6k registers, 40 DSP blocks, and
5.3-Mb block memory. The simulation result for the million-bit multiplier is validated
against a software implementation using the NTL library. Simulation clock cycles were
measured for each of the six operations (namely data input, NTT, multiplication of NTT
results, INTT, CRT, and data output).

In total, our multiplier completes a single 1024k-bit multiplication in 1.6 million
clock cycles, which equals 9.7 ms at the maximum frequency of 170 MHz. Because the NTT
and INTT in our design are two-stage pipelined, our design is able to compute a
product of two 1024k-bit integers every 4.9 ms. Both of the earlier 768k-bit designs
[7], [9] perform a 64k-point finite-field FFT using a 16-point NTT engine, which is
faster than the original two-point NTT engine but uses much more area due to its wider
data bus and more complicated memory architecture. Our design employs two-point NTT
engines and accelerates the NTT through the proposed double modulus method. Although
our multiplication size is 1.3 times larger (1024k versus 768k), we still achieve
reductions of 74% and 22% in area-time product (ATP) relative to those designs. That
means our multiplier is more ALUT-efficient.
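The timing figures can be cross-checked with simple arithmetic, sketched here in Python. The sketch assumes that the ATP is taken as ALUT count times the multiplication interval, which the text suggests but does not define, and the helper names are hypothetical. The rounded figure of 1.6 million cycles gives about 9.4 ms at 170 MHz, so the reported 9.7 ms implies a cycle count closer to 1.65 million; the 4.9 ms interval is about half of that latency, consistent with two multiplications overlapping in the two-stage pipeline.

```python
def cycles_to_ms(cycles, freq_hz):
    """Execution time in milliseconds for a given cycle count and clock frequency."""
    return cycles / freq_hz * 1e3


def ms_to_cycles(ms, freq_hz):
    """Cycle count implied by a time in milliseconds at a given clock frequency."""
    return ms * 1e-3 * freq_hz


def area_time_product(area_k_aluts, interval_ms):
    """ATP in k-ALUT * ms (assumed definition: ALUT count times interval)."""
    return area_k_aluts * interval_ms


if __name__ == "__main__":
    freq = 170e6                                # maximum frequency from the text
    print(round(cycles_to_ms(1.6e6, freq), 2))  # latency from the rounded cycle count
    print(round(ms_to_cycles(9.7, freq)))       # cycles implied by the reported 9.7 ms
    print(round(9.7 / 2, 2))                    # half the latency, near the 4.9 ms interval
    print(round(area_time_product(7.9, 4.9), 2))
```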
Advantages:
 Area is efficient
 Transform efficiency is improved
References:
[1] C. Gentry, "Fully homomorphic encryption using ideal lattices," in Proc. STOC, vol. 9. 2009, pp.
169–178.
[2] C. Gentry and S. Halevi, "Implementing Gentry's fully-homomorphic encryption scheme," in
Advances in Cryptology – EUROCRYPT. Berlin, Germany: Springer, 2011, pp. 129–148.
[3] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, "Fully homomorphic encryption over the
integers," in Advances in Cryptology – EUROCRYPT. Berlin, Germany: Springer, 2010, pp. 24–43.
[4] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, "(Leveled) fully homomorphic encryption without
bootstrapping," in Proc. 3rd Innov. Theor. Comput. Sci. Conf., 2012, pp. 309–325.
[5] W. Wang, Y. Hu, L. Chen, X. Huang, and B. Sunar, "Accelerating fully homomorphic encryption
using GPU," in Proc. IEEE Conf. High Perform. Extreme Comput. (HPEC), Sep. 2012, pp. 1–5.
[6] Y. Doröz, E. Öztürk, and B. Sunar, "Evaluating the hardware performance of a million-bit
multiplier," in Proc. Euromicro Conf. Digital Syst. Design (DSD), Sep. 2013, pp. 955–962.
[7] W. Wang and X. Huang, "FPGA implementation of a large-number multiplier for fully
homomorphic encryption," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2013, pp. 2589–2592.
[8] X. Cao, C. Moore, M. O'Neill, E. O'Sullivan, and N. Hanley, "Accelerating fully homomorphic
encryption over the integers with super-size hardware multiplier and modular reduction," IACR
Cryptol. ePrint Arch., vol. 2013, p. 616, 2013.
[9] W. Wang, X. Huang, N. Emmart, and C. Weems, "VLSI design of a large-number multiplier for
fully homomorphic encryption," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 9, pp.
1879–1887, Sep. 2014.
[10] X. Cao, C. Moore, M. O'Neill, E. O'Sullivan, and N. Hanley, "Optimised multiplication
architectures for accelerating fully homomorphic encryption," IEEE Trans. Comput., vol. 65, no. 9, pp.
2794–2806, Sep. 2016.
[11] A. Schönhage and V. Strassen, "Schnelle Multiplikation großer Zahlen," Computing, vol. 7,
no. 3, pp. 281–292, 1971.
[12] A. Karatsuba and Y. Ofman, "Multiplication of multidigit numbers on automata," Soviet Phys.
Doklady, vol. 7, p. 595, Jan. 1963.
