
Cryptanalysis of the seal encryption algorithm


Cryptanalysis of the SEAL Encryption Algorithm

Helena Handschuh*
Gemplus PSI
1, Place de la Mediterranee
95200 Sarcelles
France

Henri Gilbert
France Telecom CNET PAA-TSA-SRC
38-40 Rue du General Leclerc
92131 Issy-les-Moulineaux
France

Abstract. SEAL was first introduced in [1] by Rogaway and Coppersmith as a fast software-oriented encryption algorithm. It is a pseudorandom function which stretches a short index into a much longer pseudorandom string under control of a secret key pre-processed into internal tables. In this paper we first describe an attack of a simplified version of SEAL, which provides large parts of the secret tables from approximately $2^{24}$ algorithm computations. As far as the original algorithm is concerned, we construct a test capable of distinguishing SEAL from a random function using approximately $2^{30}$ computations. Moreover, we describe how to derive some bits of information about the secret tables. These results were confirmed by computer experiments.

1 Description of the SEAL Algorithm

SEAL is a length-increasing "pseudorandom" function which maps a 32-bit string n to an L-bit string SEAL(n) under a secret 160-bit key a. The output length L is meant to be variable, but is generally limited to 64 kbytes. In this paper, we assume it is worth exactly 64 kbytes ($2^{14}$ 32-bit words), but all our results could be obtained with a smaller output length.

The key a is only used to define three secret tables R, S, and T. These tables respectively contain 256, 256 and 512 32-bit values which are derived from the Secure Hash Algorithm (SHA) [2] using a as the secret key and re-indexing the 160-bit output into 32-bit output words.

SEAL is the result of the two cascaded generators shown below.

* The study reported in this paper was performed while Helena Handschuh was working at France Telecom-CNET.
Fig. 1. The first generator of SEAL

Fig. 2. The second generator of SEAL (ith iteration)
The first generator uses a routine depending on the a-derived tables R and T depicted at figure 1. It maps the 32-bit string n and the 6-bit counter l to four 32-bit words $A^0$, $B^0$, $C^0$, $D^0$ and another four 32-bit words $n_1$, $n_2$, $n_3$, $n_4$. These eight words are to be used by the second generator.

The second generator uses a routine depending on the a-derived tables depicted at figure 2. There are 64 iterations of this routine, indexed by i = 1 to 64. $A^0 B^0 C^0 D^0$ serves as an input to the first iteration, producing an $A^1 B^1 C^1 D^1$ block. For the next iterations, the input block is alternately $(A^{i-1} + n_1, B^{i-1}, C^{i-1} + n_2, D^{i-1})$ for even i values and $(A^{i-1} + n_3, B^{i-1}, C^{i-1} + n_4, D^{i-1})$ for odd i values. At iteration i, the output block $(Y_1^i, Y_2^i, Y_3^i, Y_4^i)$ is deduced from the intermediate block $(A^i, B^i, C^i, D^i)$ using the a-derived table S as shown below in figure 3.

Fig. 3. Deriving the generator output

In the above figures:
- ⊕ stands for the XOR function;
- ⊞ stands for the sum (mod $2^{32}$);
- stands for a right rotation of 9 bits;
- N stands for a right rotation of N bits;
- $p_1$ through $p_4$ and $q_1$ through $q_4$ stand for the inputs of table T obtained from the 9 bits 2 to 11 of values A, B, C and D; for instance in figure 1, $p_1$ = A&0x7fc.

Concerning the definition of SEAL, more details can be found in [1] and in [2].

The algorithm is divided into three steps.
- First we compute the internal tables under the secret key a. The security of this step relies on SHA which is assumed to be highly secure. Therefore, R, S and T are pseudorandom tables.
- Second we compute $A^0$, $B^0$, $C^0$, $D^0$, $n_1$, $n_2$, $n_3$ and $n_4$ from n, l and table R. This is what we already called the first generator. Let us assume the output is pseudorandom as well.
- Finally, the second generator computes iteratively the $A^i B^i C^i D^i$ blocks, from which the $Y_1^i$, $Y_2^i$, $Y_3^i$, $Y_4^i$ values are derived. We change the original notations as follows:

  $Y_1^i = A^i \oplus S_1^i$
  $Y_2^i = B^i + S_2^i$
  $Y_3^i = C^i \oplus S_3^i$
  $Y_4^i = D^i + S_4^i$.

  In this part we found certain weaknesses which are investigated in Sections 3 and 4.

2 Preliminaries

2.1 Role of mod $2^{32}$ additions

Although the combined use of the + and ⊕ operations probably strengthens SEAL as compared with a situation where only one of these operations would be used, we do not believe that this represents the main ingredient of the security of SEAL, which is essentially a table-driven algorithm.

As a matter of fact, any x + y sum can be written:

$x + y = x \oplus y \oplus c(x, y)$

where the carry word c(x, y) is far from being uniformly distributed; thus + just introduces an additional, unbalanced term, as compared with ⊕.

This remark led us to assume that replacing in SEAL (more precisely in the second generator of figure 2) all + operators by xors would not fundamentally modify the nature of the algorithm, and that cryptanalytic results obtained with such a simplified version could at least partially be transposed to the real cipher. The results of our analysis of this simplified version of SEAL are summarised in Section 3 hereafter.

2.2 The three words $D^{i-1}$, $C^i$ and $D^i$ are correlated

Let us consider the function depicted at figure 2. Given a fixed value of the iteration index i (say i = 3), the input and output to this function are known from the generator outputs $(Y_1^{i-1}, Y_2^{i-1}, Y_3^{i-1}, Y_4^{i-1})$ and $(Y_1^i, Y_2^i, Y_3^i, Y_4^i)$ up to the following unknown words:
- the 8 words $(S_1^{i-1}, S_2^{i-1}, S_3^{i-1}, S_4^{i-1})$ and $(S_1^i, S_2^i, S_3^i, S_4^i)$, whose value does not depend upon the considered initial value (n, l).
- the 2 words $n_1$ and $n_2$, whose value depends upon (n, l).
The involvement of the IV-dependent words $n_1$ and $n_2$ in the function considerably complicates the analysis of the ith iteration because of the randomisation effect upon the input to output dependency.

In order to find statistics applicable to any IV value, we investigate how to "get rid" of any dependency in $n_1$ and $n_2$ in some relations induced by the equation of iteration i.

Let us consider the $D^{i-1}$ input word and the $C^i$ and $D^i$ input words. Denote the output tables involved in the right part of figure 2 by: $T_1 = T[p_2]$, $T_2 = T[q_3]$ and $T_3 = T[p_4]$. It is easy to establish the relation:

(1) (D^{i-1} + T_1) (C^i 9 + T_2) = (D^i 18) (T_3 9)

This relation does not involve $n_1$ and $n_2$. The $T_1$, $T_2$, $T_3$ terms in this relation can be seen as three random values selected from the T table. Since there are only $2^9$ values in the T table, given any two words out of the $(D^{i-1}, C^i, D^i)$ triplets, there are at most $2^{27}$ possible values for the third word of the triplet instead of $2^{32}$ if $D^{i-1}$, $C^i$ and $D^i$ were statistically independent. This gives some evidence that the $D^{i-1}$ input and the $C^i$ and $D^i$ output are statistically correlated, in a way which does not depend upon $n_1$ and $n_2$. In other words, the SEAL generator derives from an (n, l) initial value three slightly correlated output words $Y_4^{i-1}$, $Y_3^i$ and $Y_4^i$.

Relation (1) above represents the starting point for the various attacks reported in Sections 3 to 5 hereafter.

3 An Attack of a simplified version of SEAL

In this Section we present an attack of the simplified version of SEAL obtained by replacing in figure 2 all mod $2^{32}$ additions by xors. The attack is divided into four steps.

3.1 Step 1

We derive the unordered set of values of the T table, up to an unknown 32-bit constant Di. Relation (1) above represents the starting point for this derivation. After replacing + by ⊕ in (1) and $D^{i-1}$, $C^i$, $D^i$ by $X_4 = Y_4^i$, $Y_3 = Y_3^i$ and $Y_4 = Y_4^i$ respectively, we obtain the relation:

(2) Y_4 Y_3 9 X_4 18 = T_3 9 (T_1 T_2) 18 i 9
where the i constant depends upon the S table. $T_1$ and $T_2$ are 2 among 512 values of table T. Statistically speaking, once in $2^9$, $T_1 = T_2$, and $T_1 \oplus T_2 = 0$. If we compute $2^{18}$ samples, each of the 512 values of table T i should appear once in average.

We collect the combination of the generator output words given by the left term of (2) for about $2^{21}$ (n, l) samples. Whenever one value appears more than 4 times, we assume this is a value of table T i. All the other values have a probability of about $2^{21}/2^{32}$ to appear. This way, we find about 490 out of 512 values of table T up to one constant value.

3.2 Step 2

The purpose of the second step is to compute a constant i which is needed in the third step in order to find out statistics involving $B^{i-1}$, $D^i$, $A^i$ and $B^i$ (see Fig. 2.). Consider the following equation (3) established in a similar way to (2) from the relation between $B^{i-1}$ and the output words:

(3) Y_4 9 Y_2 Y_1 9 X_2 18 T_3 18 = (T'_1 T'_2) 18 (T'_3 T'_4) 9 (S_4^i 9 S_2^{i-1} 18 S_2^i S_1^i 9)

where i = S_4^i 9 S_2^{i-1} 18 S_2^i S_1^i 9 i 18

For each sample, we can find out T_3 i by searching exhaustively the right combination ($T_1$, $T_2$, $T_3$) in equation (2). In order to save time, we compute once and for all a table with the $2^{18}$ values of (T_1 T_2) 18 and search for the right third value. We perform this search as well as the computation of the left term of (3) for $2^{21}$ samples. Once in $2^{18}$, $T'_1 = T'_2$ and $T'_3 = T'_4$. This way the constant value i we are looking for appears at least 4 times.

3.3 Step 3

The purpose of this step is to find out various values of $n_1$. Once we have these values, we can make out the relation between the in and outputs of table T up to one constant value. Let us consider equation (4) established from the relation between $A^{i-1}$ and the output words:

(4) X_1 18 Y_1 Y_4 T'_2 9 T'_4 T_3 9 = n_1 18 S_1^{i-1} 18 S_4^i S_1^i
We can find out T'_2 i and T'_4 i by searching the right combination of ($T'_1$, $T'_2$, $T'_3$, $T'_4$) in equation (2) using the value i we made out in step 2. For each sample we compute, we get about 16 possibilities, as ($T'_1$, $T'_2$, $T'_3$, $T'_4$) gives $2^{36}$ possible values for a 32-bit word.

In order to find the right combination, let us consider two distinct iteration indexes i and j: we know that for a given l value, if i is even (or odd), we always xor the same $n_1$ (or $n_3$) to the input A. Let us therefore take two rounds i and j that are both odd (or even). We need to know table T i, table T j, i and j.

We collect samples of:
- n_1 18 i where i = S_1^{i-1} 18 S_4^i S_1^i;
- n_1 18 j j 9 i 9 as value $T_3$ is found through table T i and values $T'_2$ and $T'_4$ through table T j.

We xor all the samples of round i with all the samples of round j. One of these values is the right combination of i j ( i j ) 9

Then we find all the samples for rounds i and j of another value $n_1$ (i.e. of round l). We compare these two sets of samples and make out the right value of n_1 18 i.

This step can be repeated various times to collect values of $n_1$ while computing only once the tables T and the constants .

3.4 Step 4

In this step we finally derive the in and outputs of table T from equation (5):

(5) p_1 = ((X_1 n_1 S_1^{i-1}) & 0x7fc)/4

In this equation $p_1$ is the input of table T. We have seen in the first three steps that we can derive the value of $T_1$ from in and output samples of SEAL. So we finally derive several values of:

T[p i] i

where i = ((S_1^{i-1} i 18) & 0x7fc)/4.
3.5 Summary

Summing up the four steps we have just described, we can break the T table up to one constant value using about $2 \cdot 2^{21}$ samples of (n, l) for step 1, $2 \cdot 2^{21}$ samples of (n, l) for step 2 and about $2^9$ values of (n, l) for steps 3 and 4. This means, the T table can be broken using about $2^{24}$ samples of (n, l).

We could probably go on breaking the simplified version of SEAL by finding out sets of values ($n_1$, $n_2$, $n_3$, $n_4$), then trying to break the first generator and find table R, but this is not our purpose here.

4 A Test of the real version of SEAL

In this Section we use some of the ideas of Vaudenay's Statistical Cryptanalysis of Block Ciphers [3] to distinguish SEAL from a truly random function.

4.1 χ² Cryptanalysis

The purpose of Vaudenay's paper is to prove that statistical analysis on ciphers such as DES may provide as efficient attacks as linear or differential cryptanalysis. Statistical analysis makes it possible to recover very low biases and a simple χ² test can get very good results even without knowing exactly what happens inside the inner loops of the algorithm or the S-boxes.

We intend to use this property to detect low biases of a certain combination of the output words of SEAL suggested by the analysis made in Section 3 in order to prove SEAL is far from being undistinguishable from a pseudo-random function. This provides a first test of the SEAL algorithm.

4.2 Number of samples needed for the χ² test to distinguish a biased distribution from an unbiased one

We denote by N the number of samples computed. We assume the samples are drawn from a set of r values. We call $n_i$ the number of occurrences of the ith of the r values among the N samples and $S^2$ the associated indicator:

$$S^2 = \frac{\sum_{i=1}^{r} \left(n_i - \frac{N}{r}\right)^2}{\frac{N}{r}}$$

The χ² test compares the value of this indicator to the one an unbiased distribution would be likely to provide. If the $n_i$ were drawn according to an unbiased multinomial distribution of parameters $(\frac{1}{r}, \frac{1}{r}, \ldots, \frac{1}{r})$, the expectation and the standard deviation of the $S^2$ estimator would be given by:
- $E(S^2) \to \mu = r - 1$
- $\sigma(S^2) \to \sigma = \sqrt{2(r - 1)}$

If the distribution of the $n_i$ is still multinomial but biased, say with probabilities $p_1, \ldots, p_r$, then we can compute the new expected value $\mu'$ of the $S^2$:

$$\mu' = E\left(\frac{\sum_{i=1}^{r} \left(n_i - \frac{N}{r}\right)^2}{\frac{N}{r}}\right) = \frac{r}{N} \sum_{i=1}^{r} E\left(\left(n_i - p_i N + p_i N - \frac{N}{r}\right)^2\right)$$

It can be easily shown that:

$$\mu' \to \mu + r(N - 1) \sum_{i=1}^{r} \left(p_i - \frac{1}{r}\right)^2$$

An order of magnitude of the number N of samples needed by the χ² test to distinguish a biased distribution from an unbiased one with substantial probability is given by the condition:

$$\mu' - \mu \geq \sigma$$

which gives us the following order of magnitude for N:

$$N \geq \frac{\sqrt{2(r - 1)}}{r \sum_{i=1}^{r} \left(p_i - \frac{1}{r}\right)^2}$$

4.3 Model of the test

Let us consider equation (2) with the real scheme (including the sums). We can rewrite it:

(6) Y_4 Y_3 9 X_4 18 = T_3 9 (T_1 T_2) 18 (r_1 r_2 r_3) 18 i 9 r_4

where $r_1$ and $r_2$ are the carry bits created by the addition of $T_1$ and $T_2$, and $r_3$ and $r_4$ the ones of the addition of $S_4^{i-1}$ and $S_4^i$.

We apply the χ² test to the four leftmost bits of Y_4 Y_3 9 X_4 18, suspecting a slight bias in this expression. Without having carefully analysed the exact distribution of the sum of the four carries, we intend to prove that its convolution with the biased distribution of T_3 9 (T_1 T_2) 18 i 9 does result in a still slightly unbalanced distribution.

As we take 4-bit samples, we apply the χ² test with r − 1 = 15 degrees of freedom. Detailed information about this test can be found in [4].
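The following is an added example, not code from the paper or its authors: it applies the $S^2$ indicator and the sample-count estimate of Section 4.2 to the top 4 bits of carry words of 32-bit additions, alone and XORed together, to show how a carry bias survives combination while the number of samples needed to see it grows. It assumes independent, uniformly random operands (not SEAL's table values), and every function name in it is illustrative.

```python
"""Added example (not the paper's code): S^2 indicator on carry words."""
import math
import random

MASK32 = 0xFFFFFFFF
R = 16  # 4-bit samples, i.e. r - 1 = 15 degrees of freedom


def carry_word(x, y):
    """Carry word c with (x + y) mod 2^32 == x ^ y ^ c."""
    return ((x + y) & MASK32) ^ x ^ y


def s2_indicator(counts, n):
    """S^2 = sum_i (n_i - N/r)^2 / (N/r), as defined in Section 4.2."""
    expected = n / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


def samples_needed(probs):
    """Order of magnitude of N: sqrt(2(r-1)) / (r * sum_i (p_i - 1/r)^2)."""
    r = len(probs)
    bias = sum((p - 1.0 / r) ** 2 for p in probs)
    return math.sqrt(2 * (r - 1)) / (r * bias)


def top_nibble_model(k):
    """Model distribution of the 4 top bits of the XOR of k carry words.

    Assumption: operands are independent and uniform, so two adjacent bits of
    one carry word differ with probability 1/4 and, after XORing k words,
    with probability q = (1 - 2**-k) / 2 (piling-up of k independent flips).
    """
    q = (1 - 0.5 ** k) / 2
    probs = []
    for v in range(R):
        inversions = bin((v ^ (v >> 1)) & 0b111).count("1")
        probs.append(0.5 * q ** inversions * (1 - q) ** (3 - inversions))
    return probs


def experiment(k, n, rng):
    """Return (S^2, fraction of samples whose bits 31 and 30 are equal).

    k >= 1: each sample is the XOR of k carry words; k == 0: uniform words.
    """
    counts = [0] * R
    equal = 0
    for _sample in range(n):
        w = rng.getrandbits(32) if k == 0 else 0
        for _word in range(k):
            w ^= carry_word(rng.getrandbits(32), rng.getrandbits(32))
        counts[w >> 28] += 1
        equal += ((w >> 31) & 1) == ((w >> 30) & 1)
    return s2_indicator(counts, n), equal / n


if __name__ == "__main__":
    rng = random.Random(2024)  # constructed input: seeded pseudo-random operands
    print("unbiased: mu = %d, sigma = %.2f" % (R - 1, math.sqrt(2 * (R - 1))))
    for k in (0, 1, 2, 4):
        s2, eq = experiment(k, 20000, rng)
        need = samples_needed(top_nibble_model(k)) if k else float("inf")
        print("k=%d  S^2=%9.1f  P(adjacent equal)=%.3f  N needed ~ %.0f"
              % (k, s2, eq, need))
```
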
4.4 Results

Whenever we analyse at least $2^{33}$ samples, the test proves with probability $\frac{1}{1000}$ to be wrong, that SEAL has a biased distribution. We have made several tests, and each time the value of the $S^2$ estimator we obtained for this order of magnitude of N was greater than 40 for the 4 least significant bits and greater than 320 for the 8 least significant bits. In other words, the test proves that the distribution is a biased one.

Figure 4 shows the value of the $S^2$ estimator for tests made with a = 0x67452301.

| $S^2$ | $2^{23}$ | $2^{24}$ | $2^{25}$ | $2^{26}$ | $2^{27}$ | $2^{28}$ | $2^{29}$ | $2^{30}$ | $2^{31}$ | $2^{32}$ | $2^{33}$ | $2^{34}$ | $2^{35}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 bits | 14.27 | 25 | 13.97 | 9.41 | 26.96 | 16.5 | 16.78 | 29.65 | 21.05 | 30.15 | 45.74 | 44.69 | 55.96 |
| 8 bits | 261 | 293 | 274 | 238 | 229 | 227 | 246 | 225 | 278 | 313 | 331 | 378 | 453 |

Fig. 4. Results of the tests with up to $2^{35}$ samples of (n, l).

The former test can be slightly improved as follows: let us denote the four least significant bits of $S_4^i$ by $s_4^i$. For each of the 16 possible values of $s_4^i$, apply the $S^2$ test to the 4 or the 8 least significant bits of (Y_4 − s_4^i) Y_3 9 X_4 18. The test with the correct $s_4^i$ value detects a bias with $2^{30}$ (n, l) values only (see Figure 5). Note that a significant bias is also detected whenever the two least significant bits of $s_4^i$ are correct.

| $S^2$ | $2^{25}$ | $2^{26}$ | $2^{27}$ | $2^{28}$ | $2^{29}$ | $2^{30}$ | $2^{31}$ | $2^{32}$ | $2^{33}$ | $2^{34}$ | $2^{35}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 bits | 9.45 | 13.56 | 20.62 | 25.59 | 32.77 | 71.90 | 83.63 | 130 | 250 | 438 | 928 |
| 8 bits | 253 | 271 | 292 | 321 | 276 | 357 | 379 | 438 | 569 | 838 | 1520 |

Fig. 5. Results of the test with the correct value of the four $s_4^i$ bits.

4.5 Deriving first information on SEAL

The interpretation of the above improved test is straightforward: when the two least significant bits of $s_4^i$ are correct, $r_4$ is partly known and the χ² test gives much better results than with wrong bits of $s_4^i$. Therefore we can derive at least two bits of information on table S. If the test is applied to more than the four leftmost bits of the samples, more than 2 bits can be derived from secret table T. Whenever these bits are right, the χ² rises much faster than for wrong values. As the evolution of the χ² indicator is quite close to a straight line when the divergence starts, the results can be checked applying the test to $2^{20}$ through $2^{32}$ samples. Divergence becomes obvious when about $2^{30}$ samples have been computed.
5 Deriving information on the T table

In this Section we give some evidence that the initial step of the attack on the simplified version of SEAL introduced in Section 3 can be adapted to provide large parts of the T table for the real algorithm.

Let us consider relation (6).

As seen in Section 3.1, the distribution of the T_3 9 (T_1 T_2) 18 i 9 value at the right of (6) is unbalanced. The most frequent values are provided by the 512 T i words.

On the other hand the distribution of the carry words r_1 r_2 r_3 and r_4 is also unbalanced. More precisely, due to the fact that in any carry word r each bit r[j] has a $\frac{3}{4}$ probability of being equal to the next bit r[j + 1], the number of inversions, i.e. j values s.t. r[j] ≠ r[j + 1], is likely to be small when r is a carry word or an exclusive or of carry words.

Thus we can expect the 512 T i values to give rise to spread probability maxima in the distribution probability of the left term of (6).

Based on $2^{32}$ (n, l) values, we did the following experiment:

We analysed the probability distribution of the 23 lowest weight bits of the left combination of (6) in order to reduce the memory requirements. So in the rest of this Section, though we do not introduce any new notation, we implicitly refer to 23-bit words instead of 32-bit words.

For several T i values (about 25), we computed the sum of the probabilities of the neighbours of T i, i.e. the values of the form T i + r, where r is one of the approximately $2^{22}$ 23-bit words with at most 11 inversions.

We computed the same sum of approximately $2^{22}$ probabilities around arbitrarily chosen values other than the 512 T i values.

For more than half of the T i values, the obtained sum was larger than all the sums associated to the arbitrarily chosen values.

The complexity of the search of the T i values is quite high if an independent computation of a sum of $2^{22}$ probabilities is made for each of the $2^{23}$ candidate values. This approach leads to a $2^{45}$ complexity, far over the computing capabilities of the computer we used for the experiments; however, substantial gains might be achieved by reusing appropriately selected partial sums of probabilities.

Thus in summary, we believe that with slightly more than $2^{32}$ (n, l) values, it should be possible to recover a substantial part of the information on the T table, up to an unknown constant.

6 Conclusion

We have shown in some detail in Section 3 that the simpler scheme with xors instead of sums can be attacked with the generator output corresponding to about $2^{24}$ samples of (n, l), e.g. $2^{18}$ n values and $2^6$ l values for each n value. The test of Section 4 can be applied with about $2^{30}$ samples of (n, l), e.g. $2^{24}$ n values and $2^6$ l values for each n value. Moreover, information about table S can be derived from this test with $2^{30}$ samples of (n, l) as well, and large amounts of information contained in table T can be derived from approximately $2^{32}$ samples of (n, l) or slightly more.

Despite their relatively low time and space complexity, which enabled us to perform the computer simulations mentioned in Sections 4 and 5, the attacks reported in this paper do not seriously endanger the practical security of SEAL, because too large an amount of keystream samples (corresponding to more than $2^{30}$ (n, l) initialisation vectors) is required. These attacks suggest however that simple modifications of some design features of SEAL, e.g. the detail of the involvement of the IV-dependent values $n_1$ to $n_4$ in the second generator, would probably strengthen the algorithm without significant impact upon its performance.

7 Acknowledgements

We thank Thierry Baritaud and Pascal Chauvaud for the elaboration of the LaTeX version of the figures and the paper. We would like to address special thanks to Francois Allegre who gave us access to a quite powerful computer that made the tests reasonably quick and to Alain Scheiwe who helped us write the C source code of the tests.

References

1. P. Rogaway and D. Coppersmith, "A Software-Optimized Encryption Algorithm", Proceedings of the 1993 Cambridge Security Workshop, Springer-Verlag, 1994.
2. B. Schneier, Applied Cryptography, Second Edition, John Wiley & Sons, 1996.
3. S. Vaudenay, "Statistical Cryptanalysis of Block Ciphers - χ² Cryptanalysis", 1995.
4. J. Bass, Elements de Calcul des Probabilites, 3e edition, Masson Et Cie, 1974.

This article was processed using the LaTeX macro package with LLNCS style
