38 116-1-pb
Upcoming SlideShare
Loading in...5

38 116-1-pb






Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

38 116-1-pb 38 116-1-pb Document Transcript

  • ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012 Tripartite Modular Multiplication using Toom-Cook Multiplication Amar Mandal, Rupali Syal compute modular multiplication these method also execution Abstract— Modular multiplication is the fundamental in parallel way.operation in most public-key cryptosystem. Therefore, theefficiency of modular multiplication directly affects the The proposed modular multiplication algorithm thatefficiency of whole crypto-system. This paper presents an efficiently integrates three existing algorithms, Barrettefficient modular multiplication algorithm for large integer. modular multiplication, Montgomery modular multiplicationThe proposed algorithm integrates with three existing and Toom-Cook multiplication, this proposed algorithmalgorithm, Barrett Algorithm and Montgomery algorithm for divide into two step multiplication and modularmodular multiplication, Toom-Cook algorithm for multiplication step. Multiplication step Toom-cook algorithmmultiplication. This algorithm execution done in parallel way sothat enhance the performance. These algorithms Analysis with is used. Modular multiplication step Barrett and Montgomeryrespect to their performance and compare to other modular algorithms are used in parallel way. The proposed algorithmmultiplication algorithms. minimizes the number of single-precision multiplication Index Terms— Barrett algorithm, Bipartite modular enable more than three way parallel computation.multiplication, Karatsuba multiplication algorithm, The remainder of this paper is structured as follows.Montgomery algorithm, Toom-Cook multiplication, Tripartite Section 2 describes Barrett algorithm, Montgomerymodular multiplication. algorithm, Bipartite algorithm and Tripartite algorithm. In Section 3, our proposed algorithm is introduced. Software implementation results are introduced in Section 4. Section 5 I. INTRODUCTION concludes the paper.Public Key Cryptography (PKC) introduced by Diffie andHellman [1] in the mid-1970s. Many cryptographic protocols,such as the RSA scheme [2], ElGamal [3], Diffie-Hellman key II. RELATED WORKexchange, and DSA [4], are based on modular arithmeticoperations. These algorithms for modular multiplication are described for use with large nonnegative integers expressed in radix b The efficiency of a particular cryptosystem will depend on notation, where b can be any integer ≥ 2. Given a modulus Ma number of factors, such as parameter size, time-memory and two elements X, Y ∈ ZM where ZM is the ring of integerstradeoffs, available processing power, parallel computing, modulo M we define the ordinary modular multiplication assoftware and/or hardware optimization, and mathematical XY mod Malgorithms. An efficient implementation of this operation is Mathematical representation of X, Y and M is inputs ofthe key to high performance. A basic operation in public key modular multiplication algorithms.cryptosystems is the modular multiplication of large numbers. k 1 M i 0 mibi 0<mk-1 < b and 0 ≤ mi <b, for i=0,1,…..,k-1 This paper deals with different modular multiplication X  i 0 xibi 0<xk-1 < b and 0 ≤ xi <b, for i=0,1,…..,k-1algorithms namely Barrett algorithm [11], Montgomery k 1algorithm [9], Bipartite algorithm [5], Tripartite algorithm Y  i 0 yibi 0<yk-1 < b and 0 ≤ yi <b, for i=0,1,…..,k-1 k 1[19] and purposed algorithm. Barrett and Montgomeryalgorithms are widely used today. Barrett algorithm output inthis algorithm is (X.Y)modM and this algorithm requiredpreprocessed value. Montgomery modular multiplication These algorithms for performing the modular multiplicationalgorithm output is (X.Y)R-1modM and also required and analyze their time and space requirements. . The analysispreprocessed value . Bipartite modular multiplication is performed by counting the total number of multiplications,integrates Barrett and Montgomery method these methods additions, subtractions, and memories read and writeexecution in parallel way. Tripartite modular multiplication operations in terms of the input size parameter k. They areuse Karatsuba multiplication for multiplication of two large counted to calculate the proportion of the memory accessnumber and two efficient Barrett and Montgomery algorithms time in the total running time of the modular multiplication algorithm. In our analysis, loop establishment and index. computations are not taken into account. The space analysis Amar Mandal, Department of Computer Science and Engineering, PEC is performed by counting the total number of words used asUniversity of Technology, Chandigarh, India, the temporary space. However, the space required keeping the input and output values. Rupali Syal, Department of Information Technology, PEC University ofTechnology, Chandigarh, India, 100 All Rights Reserved © 2012 IJARCSEE
  • ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012A. BARRETT MODULAR MULTIPLICATION Montgomery reduction is isomorphic to the ordinary modular multiplication. The rationale behind the m-residueP. Barrett [11] introduced the idea of estimating the quotient transformation is the ability to perform a MontgomeryS/M, S=XY with operations that either are less expensive in reduction (XR−1)modM for 0 ≤ X<RM in almost the sametime than a multiprecision division by M (viz., 2 divisions by time as a multiplication. In this algorithm required one prea power of band a partial multiprecision multiplication), or compute value M’=-M-1can be done as a pre calculation for a given m (viz., U = b2k /M, i.e., U is a scaled estimate of the modulus’ reciprocal). MONTGOMERY MODULAR MULTIPLICATIONThe estimate q of S/M is obtained by replacing the floating ALGORITHAMpoint divisions in q       S / b 2 k t b 2 k / M  Input: X=(x[k-1],x[-2],..x[1],x[0])b , Y=(y[k-1],y[-2],..y[1],y[0])b,  by integer  bt  M=(m[k-1],m[-2],..m[1],m[0])b , M’=(m’[k-1],m’[-2],..m’[1],m’[0])b, b≥2divisions q     S / b b / M  2k t 2k Output: XYR-1 mod M ˆ .  bt  1. S=XY; 2. for (i = 0; i < k; i++) do {This estimate will never be too large and, if k<t≤2k, the error 3. ti= (Si M’0) mod b; ˆis at most two: S/M−2 ≤ q ≤S/M, for k<t≤2k. 4. S = S + tiMbi;The best choice for t, resulting in the least single precision 5. }multiplications and the smallest maximal error, is k+1, which 6. S = Sdiv bk; ˆalso was Barrett’s original choice. An estimate r for S mod M 7. if (S ≥ M) thenis then given by r=x-qm, or, as r < bk+1 (if b>2), by 8. S = S − M; ˆ r =((S)mod bk+1 −(qm)mod bk+1)mod bk+1, which means thatonce again only a partial multiprecision multiplication is This algorithm (with the slight improvement above) requiresneeded. At most two further subtractions of mare required to 2k2+k multiplications, 4k2+4k+2 additions, 6k2+7k+2 reads,obtain the correct remainder. and 2k2+5k+1 writes, including the final multi-precision subtraction, and uses k + 3 words of memory space. The Montgomery representation of an integer X, denoted byBARRETT MODULAR MULTIPLICATION XMont, can be computed by performing a MontgomeryALGORITHAM multiplication on X and R2, denoted by MontM(X,R2),Input: X=(x[k-1],x[-2],..x[1],x[0])b, resulting in XMont = MontM(X,R2) = (X·R2 ·R−1) mod M =Y=(y[k-1],y[-2],..y[1],y[0])b, (X·R) mod M. After computing the MontgomeryM=(m[k-1],m[-2],..m[1],m[0])b , multiplication of two operands in MontgomeryU=(u[k-1],u[-2],..u[1],u[0])b , b≥2 representation, the result is also in MontgomeryOutput: XY mod M representation and can be converted back by multiplication 1. S=XY; with R−1, which comes down to Montgomery multiplication 2. q = ((S div bk−1)U) div bk+1; with 1.Computation of the result: T = MontM(XMont, Y ) = (X · R · Y · R−1) mod M 3. S = Smod bk+1 − (qM) mod bk+1; = (X · Y ) mod M . 4. if (S < 0) then This means that two Montgomery multiplications are needed 5. S = S + bk+1; for one modular multiplication. That is why the use of 6. while (S ≥ M) do Montgomery multiplication is only interesting when many 7. S = S − M; consecutive modular multiplications need to be performed.This Algorithm requires 3k2 multiplications, 6k2+k+1 C. BIPARTITE MODULAR MULTIPLICATIONadditions, 9k2+2k+2 reads, 3k2+4k writes and uses2k+1words of memory space. In Bipartite Modular multiplication both Barrett and Montgomery algorithms are used in this algorithm X is dividing in to two parts upper parts calculate usingB. MONTGOMERY MODULAR MULTIPLICATION Montgomery algorithm and lower part calculate using BarrettThe Let R>M be an integer relatively prime to M such that algorithm .The bipartite algorithm was introduced for thecomputations modulo R are easy to process: R = bk . Notice purpose of a two-way parallel computation [6]. It uses twothat the condition gcd(M, b)=1 means that this method cannot custom modular multipliers, a Barrett modular multiplier andbe used for all moduli. In case b is a power of 2, it simply a Montgomery multiplier, in order to improve the speed. Bymeans that m should be odd. The m-residue with respect to R combining a Barrett modular multiplication withof an integer X<M is defined as XRmod M. The set {XR mod Montgomery modular multiplication, it splits the operandM | 0 ≤ x<M} clearly forms a complete residue system. The multiplier into two parts and processes them in parallel,Montgomery reduction of X is defined as XR−1 mod M, increasing the calculation speed. Parallel execution of thiswhere R−1 is the inverse of R modulo m, and it is the inverse method with the help of fork() system call in Linux operatingoperation of the m-residue transformation. It can be shown system. The calculation is performed using Montgomerythat the multiplication of two m-residues followed by residues defined by a modulus M and a Montgomery radix R, 101 All Rights Reserved © 2012 IJARCSEE
  • ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012R < M. Next, we outline the main idea of the bipartite 11/4k2 +k multiplications, 11/2k2+21/2k+8 additions,algorithm. Let R = bl for some 0 < l < k. Consider the 31/4k2+16k+10 reads, and 3/2k2+25/2k+6 writes 7k+5multiplier Y to be split into two parts Y1 and Y0 so that Y = subtraction and first multiplication step required 3/4k2Y1R + Y0. Then, the Montgomery multiplication modulo M multiplications, k2+2k additions, 9/4k2+4k reads, andof the integers X and Y can be computed as follows: 3/2k2+7/2k-4 writes k subtraction. The all above algorithms XYR-1 mod M they are slightly more number of operations in read, write, = X(Y1R + Y0)R-1 mod M multiplication, subtraction and addition but this algorithm =((XY1 mod M) + (XY0R-1 mod M))mod M computes parallel way first and second parts using Barrett = BarrettM(X,Y) +MontM(X,Y) algorithm third term using Montgomery algorithm, modularlet l =(k/2) than This algorithm (with the slight improvement multiplication step of This algorithm execution in parallelabove) requires (5/2k2 ) multiplications, (5k2+5/2k+1) way so that time consuming is less than other algorithm.additions, (15/2k2+19/2k+5) reads, and (5/2k2+17/2k+3)writes 5k+3 subtraction, and uses 2k + 1 words of memoryspace. In this algorithm use both Montgomery and Barrett III. THE PROPOSED MODULAR MULTIPLICATIONmethods execution in parallel way so that enhance the speed.D. TRIPARTITE MODULAR MULTIPLICATION The proposed modular multiplication algorithm dividesTripartite modular multiplication algorithm divides into two into two step multiplication and modular multiplication.step multiplication and modular multiplication, first Multiplication step using Toom-Cook algorithm and split inmultiplication step using Karatsuba algorithm and split in five parts and in modular multiplication step compute ofthree parts and second part modular multiplication part these five parts by using Barrett and Montgomery modularexecution of these three parts by using Barrett and multiplication in parallel way.Montgomery modular multiplication in parallel way. Multiplication step computes by Toom-CookThe first multiplication step computes by Karatsuba multiplication algorithm. Given two large integers, X and Y,algorithm. The Karatsuba algorithm is a fast multiplication Toom–Cook splits up X and Y into t smaller parts each ofalgorithm. It reduces the multiplication of two k-digit length l, and performs operations on the parts, Toom-3 isnumbers to at most 3k log2 3  3k 1.585 single-digit only a single instance of the Toom–Cook algorithm, log 3multiplications in general (and exactly k 2 when k is a where t = 3.Toom-3 reduces 9 multiplications to 5, and runs in Θ(nlog(5)/log(3)), about Θ(n1.465). In general, Toom-t runs inpower of 2). It is therefore faster than the classical algorithm,which requires k2 single-digit products. Θ(c(t) ne), where e = log(2t − 1) / log(t), ne is the time spentThe basic step of Karatsubas algorithm is a formula that on sub-multiplications, and c is the time spent on additionsallows us to compute the product of two large numbers X and multiplication by small constants. The Karatsubaand Y using three multiplications of smaller numbers, each algorithm is a special case of Toom–Cook, where the numberwith about half as many digits as X or Y, plus some additions is split into two smaller ones. It reduces 4 multiplications to 3and digit shifts. Let X and Y are represented as n-digit and so operates at Θ(nlog(3)/log(2)), which is about Θ(n1.585).strings in some base B. For any positive integer l less than k, Ordinary long multiplication is equivalent to Toom-1, withone can split the two given numbers as follows complexity Θ(n2).R=Bl In a typical large integer implementation, each integer isX=X1R+X0 represented as a sequence of digits in positional notation,Y=Y1R+Y0 with the base or radix set to some (typically large) value b, (inWhere X0 and Y0 are less than R. The product is then a computer implementation, b would typically be a power ofXY=( X1R+X0)( Y1R+Y0) =Z2R2 +Z1R+Z0 2 instead). Say the two integers being multiplied are: TheWhere base B = bi, such that the number of digits of both m and n inZ2=X1Y1 base B is at most t (e.g., 3 in Toom-3). ThenZ0=X0Y0 separate m and n into their base B digits mi, ni: Then useZ1=X1Y0+X0Y1=(X1+X0) (Y1+Y0)-Z2-Z0 these digits as coefficients in degree t−1Karatsuba observed that XY can be computed in only three polynomials p and q, with the property that p(B)multiplications, and few extra additions: = m and q(B) = n:First step multiplication of number and splitting three parts p(x)=m2x2+m1x+m0Z0, Z1, Z2 Modular multiplication step compute this three q(x)=n2x2+n1x+n0parts as follows.(XYR-1)mod M The purpose of defining these polynomials is that if = (Z2R2 +Z1R+Z0)R-1 mod M =(Z2R)mod M +Z1mod M +Z0R-1 mod M compute their product r(x) = p(x)q(x), our answer will =(Z2R)mod M +Z1mod M+X0Y0R-1 mod M be r(B) = m×n. In the case where the numbers being =BarrettM(Z2,R) + BarrettM(Z1,1) multiplied are of different sizes, its useful to use different +MontM(X0,Y0) values of t for m and n, which well call tm and tn. The To obtain a high-speed implementation, one can number of elementary operations (addition/subtraction) cancompute these three different terms in parallel. We take l=k/2 be reduced. Executed here over the first operandfor calculation. Modular multiplication step two Barrett (polynomial p) of the running example is the following:methods and one Montgomery requires. this step requires p0=m0+m2 102 All Rights Reserved © 2012 IJARCSEE
  • ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012 p(0)=m0 p(1)=p0+m1 p(-1)=p0+m1 IV. RESULT p(-2)= (p(−1) + m2)×2 − m0 Use After Software performance; Execution times for the p(∞)=m2 modular multiplication of a 2k-digit number modulo a k-digit This sequence requires five addition/subtraction modulus M for the five modular multiplication algorithmsoperations, one less than the straightforward evaluation. In compared to the execution time of a k × k-digit multiplicationpractical implementations, as the operands become smaller, (b = 23, on a 1.73 GHz Intel Celeron R based PC with gccthe algorithm will switch to the Schoolbook long 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4)).multiplication. Letting r be the product polynomial: RESULT Lenth Times in milliseconds of M in r(0)=p(0)q(0) bits Barre Montgomery Bipartite Tripartite Proposed tt (Karatsuba) Algorithm r(1)=p(1)q(1) 128 8 9 22 30 47 r(-1)=p(-1)q(-1) 256 42 50 11 50 52 r(-2)=p(-2)q(-2) 512 61 59 20 58 58 r(∞)=p(∞)q(∞) 1024 94 105 89 60 74 A difficult design challenge in Toom–Cook is to find an 2048 173 143 153 120 91efficient sequence of operations to compute this product; onesequence given by Bodrato[14] for Toom-3 is the following. r0=r(0) These observations are confirmed by a software r4=r(∞) implementation of these algorithms, see in Table. The r3=(r(−2) − r(1))/3 implementation is written in ANSI C [4] and hence should be r1=(r(1) − r(−1))/2 portable to any computer for which an implementation of the r2=r(−1) − r(0) ANSI C standard exists. All figures in this article are r3=(r2 − r3)/2 + 2r(∞) obtained on a 1.73 GHz Intel Celeron R based PC using the r2=r2 + r1 − r(∞) 32-bit compiler gcc 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4). r1=r1 − r3 Parallel execution of Bipartite, Tripartite and Proposed now product polynomial r, Algorithm with the help of fork() system call in Linux r(x)=r0+r1x+r2x2+r3x3+r4x4 operating system. Finally, evaluate r(B) to obtain our final answer. This isstraightforward since B is a power of b and so themultiplications by powers of B are all shifts by a whole V. CONCLUSIONSnumber of digits in base b. r(B)=r0+r1B+r2B2+r3B3+r4B4 This paper discusses various algorithms for modular First step multiplication of number and splitting parts, multiplication of large numbers and evaluated them withModular multiplication step compute these parts as follows. respect to their accuracy, computation performance and let B=R efficiency. Each algorithm has its own features suitable for a r(R)= r0+r1R+r2R2+r3R3+r4R4 specific field of application. No single algorithm provides a perfect solution to meet all demands; depending on the r(R)R-2 = (r0+r1R+r2R2+r3R3+r4R4)R-2 environment in which computation are to be performed, one = r0R-2+r1R-1+r2+r3R1+r4R2 algorithm may be preferable over another. Barrett and Modular multiplication with M in both sides as follows: Montgomery Algorithm are efficient for smaller modular (r(R)R-2)modM= (r0R-2+r1R-1+r2+r3R1+r4R2)modM multiplication but for large modular multiplication tripartite =MontM(MontM(r0,1),1)+MontM(r1,1)+ and proposed Algorithm are efficient as shown in result.BarrettM(r2,1)+BarrettM(r3,R)+BarrettM(r4,R2) The future work would be to use the Schönhage–Strassen To obtain a high-speed implementation, one can compute algorithm for the multiplication step instead of Toom-Cook’sthese five different terms in parallel. where B=bi for method in proposed algorithm.calculation of this algorithm we take i=k/3, Second modularmultiplication part having three Barrett algorithms and threeMontgomery algorithms requires 29/18k2+2k 2 2multiplications, 29/3k +11k+12 additions, 17/6k +34/3k+16reads, and 17/18k2+25/3k+10 writes 11/3k+8 subtraction. Inthe above all algorithms they are slightly difference in therenumber of operations in read, write, multiplication,subtraction and addition, this algorithm execution in parallelway so that time consuming is less than other algorithms. . 103 All Rights Reserved © 2012 IJARCSEE
  • ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012 REFERENCES [1] W. Diffie and M.E. Hellman, “New Directions in Cryptography,” IEEE Trans. Information Theory, vol. IT-22, no. 6, pp. 644-654, Nov. 1976 [2] R.L. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems,” Comm. ACM, vol. 21, no. 2, pp. 120-126, Feb. 1978. [3] T. ElGamal, “A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms,” IEEE Trans. Information Theory, vol. 31, no. 4, pp. 469-472, July 1985. [4] ANSI X9.30, Public Key Cryptography for the Financial Services Industry: Part 1: The Digital Signature Algorithm (DSA), Am. Nat’l Standards Inst., Am. Bankers Assoc., 1997. [5] Marcelo E. Kaihara and Naofumi Takagi, “Bipartite Modular Multiplication Method” IEEE Transactions on Computers, vol. 57, no. 2, pp. 157-164, Feb. 2008 [6] M. E. Kaihara and N. Takagi. Bipartite Modular Multiplication. In J. R. Rao and B. Sunar, editors, Proceedings of 7th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), number 3659 in Lecture Notes in Computer Science. Springer-Verlag, 2005 [7] G.R. Blakley, “A Computer Algorithm for Calculating the Product AB Modulo M,” IEEE Trans. Computers, vol. 32, no. 5, pp. 497-500, May 1983. [8] E.F. Brickell, “A Fast Modular Multiplication Algorithm with Application to Two Key Cryptography,” Advances in Cryptology Proc. CRYPTO ’82, pp. 51-60, 1983. [9] P.L. Montgomery, “Modular Multiplication without Trial Division,” Math. Computation, vol. 44, no. 170, pp. 519-521, Apr. 1985.[10] K.R. Sloan, “Comments on a Computer Algorithm for Calculating the Product AB Modulo M,” IEEE Trans. Computers, vol. 34, no. 3, pp. 290-292, Mar. 1985.[11] P.D. Barrett, “Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” Advances in Cryptology, Proc. Crypto’86, LNCS 263, A.M. Odlyzko, Ed., Springer-Verlag, pp. 311–323, 1987.[12] Bosselaers, R. Govaerts, and J. Vandewalle, "Comparison of Three Modular Reduction Functions," Proc. CRYPTO93, pp.175-186.[13] Menezes, J., van Oorschot, P. C., and Vanstone, S. A., "Handbook of Applied Cryptology," chapter 14.3.3, pp. 603-604.[14] Marco Bodrato. Towards Optimal Toom–Cook Multiplication for Univariate and Multivariate Polynomials in Characteristic 2 and 0. In WAIFI07 proceedings, volume 4547 of LNCS, pages 116–133. June 21–22, 2007[15] A. Toom. The Complexity of a Scheme of Functional Elements Realizing the Multiplication of Integers.Translations of Dokl. Adad. Nauk. SSSR, 3, 1963.[16] A. Karatsuba and Y. Ofman. Multiplication of Many-Digital Numbers by Automatic Computers. Translation in Physics-Doklady, 145:595-596, 7 1963.[17] N. Koblitz. Elliptic Curve Cryptosystem. Math. Comp., 48:203-209, 1987.[18] Ç.K. Koç, T. Acar, and BS Kaliski, “Analyzing and Comparing Montgomery Multiplication Algorithms,” IEEE Micro, vol. 16, no. 3, pp. 26-33, June 1996.[19] Kazuo Sakiyama, Miroslav Knezevic, Junfeng Fan, Bart Preneel, and Ingrid Verbauwhede. Tripartite modular multiplication. Integration, 44(4):259 269, 2011[20] Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. i-hop homomorphic encryption and rerandomizable yao circuits. In Tal Rabin, editor, CRYPTO, volume 6223 of Lecture Notes in Computer Science, pages 155 172. Springer, 2010. 104 All Rights Reserved © 2012 IJARCSEE