Composite Field Multiplier based on Look-Up Table for Elliptic Curve Cryptography Implementation

Marisa W. Paryasto#1, Budi Rahardjo#2, Fajar Yuliawan*3, Intan Muchtadi-Alamsyah*4, Kuspriyanto#5

# School of Electrical Engineering and Informatics, Institut Teknologi Bandung
Jl. Ganesha No. 10 Bandung 40132 - Indonesia
1 firstname.lastname@example.org
2 email@example.com
5 firstname.lastname@example.org

* Algebra Research Group, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung
Jl. Ganesha No. 10 Bandung 40132 - Indonesia
3 email@example.com
4 firstname.lastname@example.org

Abstract--- In this work we propose the use of a composite field to implement finite field multiplication, which will be used in an ECC implementation. We use a 299-bit key length, and GF((2^13)^23) is used instead of GF(2^299). A composite field multiplier can be implemented using a conventional multiplication operation or using a LUT (Look-Up Table). In this paper, a LUT is used for multiplication in the ground field and the Karatsuba-Ofman Algorithm for multiplication in the extension field. A generic architecture for the multiplier is presented. The implementation is done in VHDL, with the Altera DE-2 as the target device.

Keywords--- security, cryptography, elliptic curve, finite field, multiplier, composite field

I. INTRODUCTION

Elliptic Curve Cryptography (ECC) is a public-key cryptosystem that requires heavy computation to carry out complex arithmetic operations. Elliptic curves are used in cryptography because their mathematical properties fit the encryption process. An elliptic curve has its own arithmetic operations, which are very specific and unpredictable. This makes ECC cryptographically strong and the preferred candidate to replace RSA. Unfortunately, implementing ECC requires sophisticated mathematical skills. There are many layers, restrictions, and combinations that make the various ECC implementations difficult to compare. Every level of ECC offers many things to explore. In this work we focus on the lowest level: finite field operations. Since multiplication is the most frequently used operation, we investigate existing multiplier algorithms and make improvements.

The key length used in ECC determines the level of security. In general, a longer key requires more components in the corresponding hardware implementation. As internet technology grows, the demand to implement ECC on constrained devices increases. As a consequence, there is a need for efficient algorithms and architectures for implementing ECC on constrained devices. For the present security level requirement, ECC should have a key length above 160 bits. We implement 299 bits in this research. However, implementing 299-bit ECC on a constrained device, simulated with an FPGA board, cannot be done using a conventional algorithm and architecture. For example, a 299-bit binary classic multiplier cannot fit in our FPGA device.

One alternative solution is to use a composite field. A composite field is a finite field represented as an extension field over a subfield (the ground field). This is done by decomposing a long string of bits into smaller groups of bits. This representation allows arithmetic operations to be done on smaller chunks of the string, so that complex operations can be broken down into simpler operations. The focus of our paper is the implementation of a multiplier using a composite field. We focus on the multiplier because multiplication is the most frequently used operation in ECC. The benefit of using a composite field is lower memory usage and fewer components.

The composite field's characteristic of dividing one big operation into smaller ones, combined with the recursive Karatsuba-Ofman Algorithm (KOA), allows us to implement a multiplier for ECC with limited resources and an adequate level of security. Other multipliers are limited to at most 260 bits.

II. PREVIOUS WORK
The first multiplier for composite fields was proposed by Mastrovito [1]. The multiplier is called a hybrid multiplier. It performs the multiplication serially in the ground field and in parallel in the extension field. Mastrovito's multiplier basically works using a multiplication matrix that includes the reduction process. Paar, in his works [2,3,4], added some improvements to Mastrovito's multiplier. Paar implemented multiplication in the ground field using KOA and Mastrovito's method for multiplication in the extension field. Later, Rosner [5] conducted further research on the work of Mastrovito and Paar.

A Look-Up Table (LUT) for composite field operations has been implemented in [6]. The algorithm for ground field multiplication using logarithmic table lookup is proven to be fast.

III. METHODOLOGY

3.1 Look-Up Table

A LUT is used for storing the log and alog (anti-log) tables to make the multiplication operation in the ground field GF(2^n) faster. [6] concludes that n does not have to be exactly the same as a single computer word (e.g. 8, 16). It has been shown that a smaller n is more efficient, because the tables will be smaller and thus will take advantage of the first-level cache of computers.

One of the reasons why this research uses a LUT for storing the precomputed log and alog tables in GF(2^13) is that Table 2 in [6] shows that a LUT for n = 13 is efficient for polynomial basis multiplication compared to bigger n. To construct the logarithmic lookup table, a primitive element g in GF(2^n) is selected to be the generator of the field GF(2^n), so that every nonzero element A in this field can be written as a power of g as A = g^i, where 0 ≤ i ≤ 2^n - 2. Then the powers g^i of the primitive element can be computed for i = 0, 1, 2, ..., 2^n - 2 to obtain 2^n - 1 pairs of the form (A, i). Two tables holding these pairs have to be constructed, sorted in two different ways: the log table sorted with respect to A and the alog table sorted with respect to i. These tables can then be used for performing the field multiplication, squaring and inversion operations. Given two elements A, B in GF(2^n), the multiplication C = AB is performed as follows:

    1. i := log[A]
    2. j := log[B]
    3. k := i + j (mod 2^n - 1)
    4. C := alog[k]

The steps above are based on the fact that C = AB = g^i g^j = g^(i + j mod 2^n - 1). Ground field multiplication requires three memory accesses and a single addition with modulus 2^n - 1.

[6] also proposed the use of an extended alog table to eliminate the modular addition operation (step 3). The extended alog table is 2^(n+1) - 1 long, which is about twice the length of the standard alog table. It contains the values (k, g^k) sorted with respect to the index k, where k = 0, 1, 2, ..., 2^(n+1) - 2. Since the values of i and j in steps 1 and 2 of the multiplication lie in the range [0, 2^n - 2], k = i + j always lies within [0, 2^(n+1) - 2]. Thus the modular addition operation can be omitted and the ground field multiplication can be simplified as follows:

    1. i := log[A]
    2. j := log[B]
    3. k := i + j
    4. C := extended-alog[k]

3.2 Composite Field

Let GF(2^k) denote a binary extension field defined over GF(2). If the elements of the set

$$B_1 = \{1, \alpha, \alpha^2, \ldots, \alpha^{k-1}\}$$

are linearly independent, then B_1 forms a polynomial basis for GF(2^k). Given an element A in GF(2^k), it can be written as

$$A = \sum_{i=0}^{k-1} a_i \alpha^i$$

where a_0, a_1, ..., a_{k-1} in GF(2) are the coefficients.

An extension field defined over one of its subfields is known as a composite field. It is denoted as GF((2^n)^m), where GF(2^n) is known as the ground field over which the composite field is defined. For each given degree there is only one finite field of characteristic 2, so the binary and composite fields refer to the same field although their representation methods are different. To represent the elements of the composite field GF((2^n)^m), the basis that can be used is

$$B_2 = \{1, \beta, \beta^2, \ldots, \beta^{m-1}\}$$

where β is the root of a degree-m irreducible polynomial whose coefficients are in the base field GF(2^n). An element A in GF((2^n)^m) can be written as

$$A = \sum_{i=0}^{m-1} a_i \beta^i$$

where a_0, a_1, ..., a_{m-1} are in GF(2^n). The operations in the ground field are carried out using pre-calculated logarithmic lookup tables. To construct the logarithmic tables, a primitive element in GF(2^n) needs to be found.

The reason why we use GF((2^13)^23) in this implementation is that GF((2^13)^23) = GF(2^299), which complies with the security level needed. The other reason is that GCD(13, 23) = 1, so an irreducible polynomial with coefficients in GF(2) can be used for both the ground field and the extension field (a polynomial of degree 23 that is irreducible over GF(2) stays irreducible over GF(2^13)), and there are trinomials and other polynomials available. A carefully chosen irreducible polynomial will reduce the complexity of the multiplication operation.

3.3 Karatsuba-Ofman Algorithm Multiplier

The Karatsuba-Ofman Algorithm (KOA) is a recursive method for efficient polynomial multiplication. It is known that two arbitrary polynomials in one variable of degree less than or equal to m-1, with coefficients from a field GF(2^n), can be multiplied with not more than m^2 multiplications in GF(2^n) and (m-1)^2 additions in GF(2^n). The KOA provides a recursive algorithm which reduces the above multiplicative and (for large enough m) additive complexities. A KOA restricted to polynomials where m = 2^t, with t an integer, has been presented previously.

Let a(x) and b(x) be two elements of GF((2^n)^m). We want to find the product d(x) = a(x)b(x), which has degree ≤ 2m - 2. Both elements can be represented in the polynomial basis as

$$a(x) = x^{m/2}\left(x^{m/2-1} a_{m-1} + \cdots + a_{m/2}\right) + \left(x^{m/2-1} a_{m/2-1} + \cdots + a_0\right) = x^{m/2} A_H + A_L$$

$$b(x) = x^{m/2}\left(x^{m/2-1} b_{m-1} + \cdots + b_{m/2}\right) + \left(x^{m/2-1} b_{m/2-1} + \cdots + b_0\right) = x^{m/2} B_H + B_L$$

Using the equations above, the polynomial product is given as

$$d(x) = x^{m} A_H B_H + x^{m/2}\left(A_H B_L + A_L B_H\right) + A_L B_L$$

Auxiliary polynomials M^(1)(x) are defined:

$$M_0^{(1)}(x) = A_L B_L, \qquad M_1^{(1)}(x) = (A_L + A_H)(B_L + B_H), \qquad M_2^{(1)}(x) = A_H B_H$$

And then the product can be defined as:

$$d(x) = x^{m} M_2^{(1)}(x) + x^{m/2}\left(M_1^{(1)}(x) - M_0^{(1)}(x) - M_2^{(1)}(x)\right) + M_0^{(1)}(x)$$

(in characteristic 2, subtraction coincides with addition). The middle term is correct because M_1^(1) - M_0^(1) - M_2^(1) = A_H B_L + A_L B_H, so three half-size multiplications replace the four in the direct product.

The algorithm becomes recursive if it is applied again to the polynomials given above. The next iteration step splits the polynomials A_L, B_L, A_H, B_H, (A_L + A_H), and (B_L + B_H) again in half. With these newly halved polynomials, new auxiliary polynomials M^(2)(x) can be defined in a similar way. The algorithm eventually terminates after t steps. In the final step the polynomials M^(t)(x) degenerate into single coefficients. Since every step halves the number of coefficients, the algorithm terminates after t = log2 m steps.

IV. DESIGN AND IMPLEMENTATION

The generic architecture of our circuit is shown in Figure 1. On the left side there are two sets of input registers, one each for inputs A and B. The size of the input registers depends on the bit size. In our particular case, it is 13 bits. If we use off-the-shelf components, we may have to use 16-bit registers.

Fig.1 Multiplier General Architecture

Right next to the input registers are temporary registers that are used to store addition terms before they are multiplied. These terms depend on the KOA splitting. For the 23 coefficients, we can split into 12+11 or 8+8+7. The decision will affect the number of temporary registers (and adders) needed.

In the center of our circuit is the GF(2^13) multiplier. In this particular design we have only one multiplier, implemented as a LUT multiplier. Multiplication is done in serial fashion. Figure 2 shows an estimated timing diagram, which will be implemented in the sequencer. For example, when multiplying a22 and b22, the enable lines of the registers related to those elements and of the result register are activated.

If area permits, we could add more multipliers. Additional multipliers will reduce the time to perform all multiplications at the expense of more area. Careful timing consideration is needed in order to avoid race conditions if multiple multipliers are implemented.

The results of the multiplications are stored in temporary registers before they are added to create the final result. Thus, there is a network of adders on the right side.
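As a software cross-check of the arithmetic behind this architecture, the following is a minimal Python sketch (not the VHDL design) of a log/extended-alog LUT multiplier in GF(2^13), used as the coefficient multiplier inside a recursive KOA. Several things here are assumptions rather than details from the paper: the reduction polynomial x^13 + x^4 + x^3 + x + 1, the generator g = x, the explicit handling of zero operands (log[0] is undefined), and all function names. Reduction modulo the degree-23 extension-field polynomial is not shown, because the paper does not state which polynomial is used.

```python
N = 13
ORDER = (1 << N) - 1      # 8191 nonzero elements in GF(2^13)
POLY = 0x201B             # ASSUMED reduction polynomial x^13 + x^4 + x^3 + x + 1


def build_tables():
    """Build the log table and the extended alog table for generator g = x (assumed)."""
    log = [0] * (1 << N)                  # log[0] is a placeholder and is never read
    alog = [0] * ((1 << (N + 1)) - 1)     # extended table: indices 0 .. 2^(n+1) - 2
    a = 1
    for k in range(len(alog)):
        alog[k] = a
        if k < ORDER:
            log[a] = k
        a <<= 1                           # multiply by g = x
        if a >> N:
            a ^= POLY                     # reduce modulo POLY
    return log, alog


LOG, ALOG = build_tables()


def gf_mul(a, b):
    """Ground-field product using two log lookups, one add, one alog lookup."""
    if a == 0 or b == 0:                  # zero has no logarithm
        return 0
    return ALOG[LOG[a] + LOG[b]]          # no modular reduction of the index needed


def gf_mul_ref(a, b):
    """Shift-and-add reference multiplier, used only to check gf_mul."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= POLY
    return r


def poly_add(p, q):
    """Add two coefficient lists (lowest degree first); addition in GF(2^13) is XOR."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x ^ y for x, y in zip(p, q)]


def koa(a, b):
    """Unreduced product of two equal-length coefficient lists via Karatsuba-Ofman."""
    m = len(a)
    if m == 1:
        return [gf_mul(a[0], b[0])]
    h = m // 2                            # low part: h coefficients, high part: m - h
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    m0 = koa(a_lo, b_lo)                                  # A_L * B_L
    m2 = koa(a_hi, b_hi)                                  # A_H * B_H
    m1 = koa(poly_add(a_lo, a_hi), poly_add(b_lo, b_hi))  # (A_L + A_H)(B_L + B_H)
    mid = poly_add(poly_add(m1, m0), m2)                  # A_H*B_L + A_L*B_H
    d = [0] * (2 * m - 1)
    for i, c in enumerate(m0):
        d[i] ^= c
    for i, c in enumerate(mid):
        d[i + h] ^= c
    for i, c in enumerate(m2):
        d[i + 2 * h] ^= c
    return d


def schoolbook(a, b):
    """Direct m^2-multiplication product, used only to check koa."""
    d = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            d[i + j] ^= gf_mul(x, y)
    return d
```

With two lists of 23 coefficients, the top-level split in this sketch is 11 + 12, and `koa(a, b)` returns the 45 coefficients of the unreduced product. The sketch recurses down to single coefficients for simplicity; a hardware design would more likely stop splitting earlier to limit the number of temporary registers.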
Fig.2 Estimated timing diagram

Below is a snippet of the VHDL code implementing the LUT for the GF(2^13) multiplier. A 13-bit multiplier requires 2^13 = 8192 entries (about 8 K) for each table. If we implement this in general-purpose hardware, each entry should be stored in 16 bits (2 bytes). In our implementation, the log and alog tables occupy 2 × 8 K × 2 bytes = 32 Kbytes.

    process (clk)
    begin
      -- update the output on the rising edge of the clock
      if clk'event and clk = '1' then
        -- table lookup: the 13-bit input a selects a precomputed 13-bit entry i
        case a is
          when "0000000000000" => i <= "0000000000001";
          when "0000000000001" => i <= "0000000000010";
          when "0000000000010" => i <= "0000000000100";
          when "0000000000011" => i <= "0000000001000";
          when "0000000000100" => i <= "0000000010000";
          when "0000000000101" => i <= "0000000100000";
          …

V. CONCLUSIONS

We have presented a composite field multiplier using LUT and KOA. A 13-bit LUT multiplier is used in the ground field, and a KOA multiplier is used in the extension field. A general architecture of the design is presented.

REFERENCES

[1] Edoardo Mastrovito. VLSI Architecture for Computations in Galois Fields. PhD thesis, Linköping University, 1991.
[2] Christof Paar. Efficient VLSI Architectures for Bit-parallel Computation in Galois Fields. PhD thesis, 1994.
[3] Christof Paar. Fast Arithmetic Architectures for Public-Key Algorithms over Galois Fields GF((2^n)^m), pages 363–378. Number 1233 in Lecture Notes in Computer Science. Springer-Verlag, 1997.
[4] Christof Paar and Peter Fleischmann. Fast arithmetic for public-key algorithms in Galois fields with composite exponents. IEEE Transactions on Computers, 48(10):1025–1034, October 1999.
[5] Martin Christopher Rosner. Elliptic curve cryptosystems on reconfigurable hardware. Master's thesis, Worcester Polytechnic Institute, May 1998.
[6] E. Savas and C. K. Koc. Efficient methods for composite fields arithmetic. Technical report, Oregon State University, 1999.