PredictElimLin

Predicting ElimLin Outcome on Simon
Iason Papapanagiotakis-Bousy
University College London
Author Note
This work was done as an individual project for the Cryptanalysis COMPGA18 module
of the MSc. Information Security at University College London.
April 2016

Abstract
ElimLin has been proposed as an algorithm for solving the complex systems of equations
generated by the algebraic encoding of block ciphers. In this paper, we study the behavior of
ElimLin when run against the Simon block cipher proposed by the NSA. Using linear, polynomial
and non-linear regression on the number of known plaintext/ciphertext pairs, we derive functions
that predict important quantities of the attack on 8 rounds of the cipher.
Finally, we compare the results of running ElimLin on Simon and on the Courtois Toy Cipher.
Keywords: ElimLin, Simon, Algebraic Cryptanalysis, CTC, Regression

Table of Contents
Abstract
Introduction
1 Background
1.1 ElimLin
1.2 Simon
2 Data collection
2.1 Notation
2.2 Gathering Data
3 Building predictive models
3.1 Interesting Predictions
3.2 Methodology
3.2.1 Regression
3.2.2 Evaluation method
3.2.3 Visualization
3.3 Predictors
3.3.1 The Unbroken/StartVars ratio
3.3.2 Unbroken
3.3.3 Specific ElimLin rounds
4 Comparison with CTC
4.1 Results
4.2 Comments
4.2.1 Differences
4.2.2 Similarities
4.2.3 Analysis
5 Conclusion and Future work
References
APPENDIX A: Limits of predictors
APPENDIX B: Generating graphs in WolframCloud
APPENDIX C: Getting the full predictors

Table of Figures
Figure 1: example of ElimLin execution
Figure 2: interpretation of ElimLin output
Figure 3: Data gathered by running ElimLin on 7 rounds of Simon
Figure 6: Data gathered by running ElimLin on 16 rounds of Simon
Figure 7: Polynomial of degree 5 predicting U/S
Figure 8: Polynomial of logarithms degree 4 predicting U/S
Figure 9: Power function predicting U/S
Figure 10: Polynomial of degree 4 to predict Unbroken
Figure 11: r1, r2 and r3 growing linearly with K
Figure 12: r4 grows faster than linearly
Figure 13: ElimLin on 5 rounds of CTC fixed key-bits = 10
Figure 15: ElimLin on 11 rounds of CTC fixed key-bits = 10
Figure 17: K-Unbroken relationship for 7 rounds of Simon and CTC
Figure 18: K-Unbroken/StartVars relationship for 7 rounds of Simon and CTC
Figure 19: Predictor failing after the breaking point
Figure 20: Code for WolframCloud to generate graphs

Introduction
The structure of the paper is as follows. Chapter 1 covers the background: it describes the
ElimLin algorithm and its properties and introduces the Simon cipher. Chapter 2 defines the
notation for the variables used throughout the paper and then presents the data collected by
running ElimLin on Simon. Chapter 3 introduces the quantities we try to predict, presents the
regression techniques and evaluation methods we used, and concludes with our actual predictors.
Chapter 4 presents the data acquired by running ElimLin on the Courtois Toy Cipher and compares
it with the data from Simon. Chapter 5 concludes with possible future directions, and the
appendices contain supplementary material.
This document is accompanied by data and source code files, which are referenced where
relevant.

1 Background
In this chapter we give a brief introduction to ElimLin and the Simon block cipher, which are
the focus of the following chapters.
1.1 ElimLin
The ElimLin algorithm has been proposed by Courtois et al. as a tool for algebraic
cryptanalysis that solves systems of equations generated by the algebraic encoding of block ciphers
[1], [2]. The algorithm is composed of two sequential operations:
1. Gaussian Elimination: all the linear equations in the linear span of the initial equations are
found. They form the intersection of two vector spaces: the vector space spanned by
all monomials of degree 1 and the vector space spanned by all equations.
2. Substitution: variables are iteratively eliminated from the whole system using the linear
equations, until no linear equation is left. Consequently, the remaining system has
fewer variables.
These two steps are repeated until no new linear equations are obtained. In our work we used the
implementation found in [3].
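The two steps can be sketched in a few dozen lines of Python. This is our own toy illustration over GF(2), not the implementation from [3], and all names in it (poly, gauss, substitute, elimlin, back_substitute) are hypothetical. We assume a polynomial is represented as a set of monomials and each monomial as a frozenset of variable indices, so that x·x = x holds automatically. Ordering monomials by degree first makes Gaussian elimination expose every linear equation in the span as an echelon row whose leading monomial has degree 1.

```python
"""Toy ElimLin over GF(2): an illustrative sketch, not the implementation from [3]."""

ONE = frozenset()                 # the empty monomial is the constant 1
CONTRADICTION = frozenset({ONE})  # the equation 1 = 0

def poly(*monomials):
    """Polynomial from distinct monomials, each an iterable of variable indices."""
    return frozenset(frozenset(m) for m in monomials)

def mono_key(m):
    # Degree first, so non-linear monomials are eliminated before linear ones.
    return (len(m), tuple(sorted(m)))

def lead(p):
    return max(p, key=mono_key)

def mul(p, q):
    """Product over GF(2) with x*x = x: monomials multiply by set union."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}  # XOR, so equal terms cancel in pairs
    return frozenset(out)

def gauss(system):
    """Step 1: echelon form whose rows have pairwise distinct leading monomials.

    Every linear equation in the span of the input is spanned by the
    returned rows whose leading monomial has degree <= 1.
    """
    pivots = {}
    for eq in system:
        while eq and lead(eq) in pivots:
            eq = eq ^ pivots[lead(eq)]
        if eq:
            pivots[lead(eq)] = eq
    rows = list(pivots.values())
    if CONTRADICTION in rows:
        raise ValueError("inconsistent system: 1 = 0")
    return rows

def substitute(p, v, rest):
    """Replace variable v by the polynomial `rest` everywhere in p."""
    out = set()
    for m in p:
        out ^= mul(frozenset({m - {v}}), rest) if v in m else {m}
    return frozenset(out)

def is_linear(eq):
    return len(lead(eq)) == 1

def elimlin(system):
    """Alternate Gaussian elimination and substitution until no linear equation appears."""
    system = [frozenset(eq) for eq in system if eq]
    eliminated = []  # (variable, replacement polynomial), in elimination order
    while True:
        system = gauss(system)                                  # step 1
        if not any(is_linear(eq) for eq in system):
            return system, eliminated
        while True:                                             # step 2
            lin = next((eq for eq in system if is_linear(eq)), None)
            if lin is None:
                break
            v = next(iter(lead(lin)))
            rest = lin - {lead(lin)}                            # v = rest over GF(2)
            eliminated.append((v, rest))
            system = [substitute(eq, v, rest) for eq in system if eq is not lin]
            if CONTRADICTION in system:
                raise ValueError("inconsistent system: 1 = 0")
            system = [eq for eq in system if eq]

def back_substitute(eliminated):
    """Recover variable values; assumes every variable was eliminated."""
    values = {}
    for v, rest in reversed(eliminated):
        values[v] = sum(all(values[x] for x in m) for m in rest) % 2
    return values
```

For instance, for e1 = x0x1 + x0 + x2, e2 = x0x1 + x1 and e3 = x0 + x2 + 1, the first elimination finds x0 + x1 + x2 = 0 (from e1 + e2) and x1 + 1 = 0, substitution then eliminates x2, x1 and x0 in turn, and back-substitution yields x0 = 1, x1 = 1, x2 = 0. A practical implementation would need sparse data structures and a strategy for choosing which variable to eliminate; this sketch simply takes the leading one.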
A surprising attack.
ElimLin may look like a trivial algorithm for solving systems of equations, but, surprisingly,
it exploits hidden structure to eliminate more variables in each round. As our results show,
there is a tipping point beyond which ElimLin solves the whole system of equations very quickly.
1.2 Simon
The Simon block cipher was recently introduced by the NSA alongside the Speck block
cipher [4]. Simon is a lightweight block cipher aimed at constrained devices, but it is
designed to work with a wide range of rounds.
There exist many variants of Simon with different key and block sizes. For this work we
used the open source implementation of [5], which provides the algebraic encoding of Simon with
a key size of 128 bits and a block size of 64 bits.

2 Data collection
This chapter first introduces the notation used throughout the work for the variables we are
interested in. Then we present the data acquired by running ElimLin against the Simon cipher.
2.1 Notation
When running the Simon.exe program from the terminal, several options are available. We use it
as follows: Simon.exe NR /fixkF /insK /xl0, where NR is the number of rounds of the cipher, F is
the number of fixed (guessed) key bits and K is the number of random plaintext/ciphertext pairs used.
The output of the program might look like this:
Simon.exe 16 /fixk0 /ins8 /xl0
…
10496+ 512+ 448+ 257+ 1+ 0
Elim[ 14208] 0.529 h 2494/3712 …
Figure 1: example of ElimLin execution
In this case NR = 16, F = 0 and K = 8. The output is interpreted as follows:
Simon.exe 16 /fixk0 /ins8 /xl0
…
TrivialLins+ r1+ r2+ r3+ r4+ r5
Elim[ TotalVars] 0.529 h Unbroken/StartVars …
Figure 2: interpretation of ElimLin output
• TrivialLins is the number of linear equations found in the original encoding of the
cipher; this is not an important quantity.
• r1, r2, r3, … are the numbers of linear equations found after the first, second, third,
… round of ElimLin.
• TotalVars is the total number of variables appearing in the system during the ElimLin
execution.
• Unbroken is the number of variables that remain undetermined when ElimLin terminates.
• StartVars is the number of non-trivial variables in the initial encoding of the system.
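Reading these values by hand is error-prone, so a small parser can help. The sketch below is our own hypothetical helper, not part of Simon.exe; it assumes the output contains the two lines shown in Figure 1 with exactly the layout of Figure 2, and treats the "h" suffix as hours, which is our assumption. It also checks, for the run in Figure 1, that StartVars minus Unbroken equals r1 + r2 + ….

```python
import re

# Hypothetical helper (not part of Simon.exe); field names follow Figure 2.
ELIM_LINE = re.compile(r"Elim\[\s*(\d+)\]\s+([\d.]+)\s*h\s+(\d+)/(\d+)")

def parse_elimlin(counts_line, elim_line):
    """Parse the 'TrivialLins+ r1+ r2+ ...' line and the 'Elim[...]' line."""
    counts = [int(tok) for tok in counts_line.replace("+", " ").split()]
    match = ELIM_LINE.search(elim_line)
    if match is None:
        raise ValueError(f"unrecognised line: {elim_line!r}")
    total_vars, time_h, unbroken, start_vars = match.groups()
    return {
        "TrivialLins": counts[0],
        "r": counts[1:],                 # r1, r2, ... one per ElimLin round
        "TotalVars": int(total_vars),
        "time_h": float(time_h),         # assumption: the 'h' suffix means hours
        "Unbroken": int(unbroken),
        "StartVars": int(start_vars),
    }

out = parse_elimlin("10496+ 512+ 448+ 257+ 1+ 0", "Elim[ 14208] 0.529 h 2494/3712 …")
broken = out["StartVars"] - out["Unbroken"]
print(broken, sum(out["r"]))  # 1218 1218
```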

2.2 Gathering Data
In order to build accurate predictors of how ElimLin behaves on Simon, we needed as much
data as possible. In this section we present the results of running ElimLin on Simon where we
guess none of the key bits (F = 0) and vary the number of rounds NR and the number of known
plaintext/ciphertext pairs K.
Due to time and resource constraints, only instances with up to 8 rounds have been studied
thoroughly, but we still provide some data for 9 and 16 rounds that could be useful.
A ~ in the r1, r2, … fields means that, due to an error, we were not able to collect those
values. This does not affect the other values collected.
NR K StartVars Unbroken TrivialLins r1 r2 r3 r4 r5 r6
7 2 448 248 1472 128 64 8 0
7 3 608 268 2208 192 128 20 0
7 4 768 271 2944 256 192 49 0
7 5 928 262 3680 320 256 89 1 0
7 6 1088 244 4416 384 320 139 1 0
7 7 1248 215 5152 448 384 200 1 0
7 8 1408 188 5888 512 448 259 1 0
7 9 1568 158 6624 576 512 320 2 0
7 10 1728 126 7360 640 576 384 2 0
7 11 1888 82 8096 704 640 448 14 0
7 12 2048 0 8832 768 704 512 17 47 0
Figure 3: Data gathered by running ElimLin on 7 rounds of Simon
8 2 512 317 1600 128 64 3 0
8 4 896 409 3200 256 192 39 0
8 8 1664 443 6400 512 448 261 0
8 10 2048 448 8000 640 576 384 0
8 12 2432 448 9600 768 704 512 0
8 16 3200 444 12800 1024 960 768 4 0
8 20 3968 439 16000 1280 1216 1024 9 0
8 25 4928 435 20000 1600 1536 1344 13 0
8 30 5888 425 24000 1920 1856 1664 23 0
8 32 6272 417 25600 2048 1984 1792 31 0
8 40 7808 402 32000 2560 2496 2304 46 0
8 50 9728 378 40000 3200 3136 2944 70 0
8 64 12416 307 51200 4096 4032 3840 140 1 0
8 70 13568 249 56000 4480 4416 4224 189 10 0

The full data collected can be found in the SimonResults.xlsx file.
From now on we focus on the data for 8 rounds of Simon; the rest is provided for anyone who
wishes to do further analysis.
9 2 576 373 1728 128 64 11 0
9 4 1024 533 3456 256 192 42 1 0
9 8 1920 698 6912 512 448 259 3 0
9 16 3712 947 13824 1024 960 768 13 0
9 32 7296 1424 27648 2048 1984 1792 48 0
9 64 14464 2298 ~ ~ ~ ~ ~ ~ ~
16 2 1024 824 2624 128 64 8 0
16 4 1920 1433 5248 256 192 39 0
16 8 3712 2494 10496 512 448 257 1 0
16 16 7296 4534 20992 1024 960 768 10 0
16 32 14464 8606 41984 2048 1984 1792 34 0
16 64 28800 16655 ~ ~ ~ ~ ~ ~ ~
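As a sanity check on the tables, the sketch below (our own, with the 8-round rows transcribed by hand) verifies two patterns. First, StartVars minus Unbroken equals r1 + r2 + … when the first count after Unbroken is read as TrivialLins, as in Figure 2. Second, every StartVars value checked fits 128 + 32·(NR − 2)·K. That formula is an empirical observation on the tables; reading it as 128 key variables plus 32 state bits per pair for NR − 2 rounds is our guess and has not been verified against the encoder.

```python
# (K, StartVars, Unbroken, [r1, r2, ...]) transcribed from the 8-round table
ROWS_8 = [
    (2, 512, 317, [128, 64, 3, 0]),
    (4, 896, 409, [256, 192, 39, 0]),
    (8, 1664, 443, [512, 448, 261, 0]),
    (10, 2048, 448, [640, 576, 384, 0]),
    (12, 2432, 448, [768, 704, 512, 0]),
    (16, 3200, 444, [1024, 960, 768, 4, 0]),
    (20, 3968, 439, [1280, 1216, 1024, 9, 0]),
    (25, 4928, 435, [1600, 1536, 1344, 13, 0]),
    (30, 5888, 425, [1920, 1856, 1664, 23, 0]),
    (32, 6272, 417, [2048, 1984, 1792, 31, 0]),
    (40, 7808, 402, [2560, 2496, 2304, 46, 0]),
    (50, 9728, 378, [3200, 3136, 2944, 70, 0]),
    (64, 12416, 307, [4096, 4032, 3840, 140, 1, 0]),
    (70, 13568, 249, [4480, 4416, 4224, 189, 10, 0]),
]
# (NR, K, StartVars) spot checks from the 7-, 9- and 16-round tables
OTHER_ROWS = [(7, 2, 448), (7, 12, 2048), (9, 8, 1920), (9, 64, 14464), (16, 64, 28800)]

def fitted_start_vars(nr, k):
    """Empirical fit to the tables (our observation, not a documented formula)."""
    return 128 + 32 * (nr - 2) * k

def inconsistent_rows(rows, nr):
    """Return the K values whose row breaks either pattern."""
    bad = []
    for k, start_vars, unbroken, r in rows:
        broken = start_vars - unbroken  # variables determined by ElimLin
        if broken != sum(r) or start_vars != fitted_start_vars(nr, k):
            bad.append(k)
    return bad

print(inconsistent_rows(ROWS_8, 8))  # []
```

An empty list means every 8-round row passes both checks.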

3 Building predictive models
In this chapter we use the data gathered and presented in chapter 2 to build predictive
models for different interesting variables of the output of ElimLin. We consider only data for the
8 rounds of Simon with 0 key-bits fixed. This allows us to study the relationship between the
number of ciphertext/plaintext pairs K and the variables described in 3.1.
3.1 Interesting Predictions
In this section we describe which variables are worth predicting and what they mean. We
introduce some new variables based on those defined in 2.1.
An interesting quantity to look at is the number of variables determined by ElimLin, which
equals the sum of r1, r2, …, since each linear equation found eliminates one variable. When this
quantity equals StartVars, the whole system is solved and the key is fully recovered. We call this
quantity Broken:
$$\mathit{Broken} = \sum_i r_i = \mathit{StartVars} - \mathit{Unbroken}$$
To break the system we therefore need:
$$\frac{\mathit{Broken}}{\mathit{StartVars}} = 1 \iff 1 - \frac{\mathit{Unbroken}}{\mathit{StartVars}} = 1 \iff \frac{\mathit{Unbroken}}{\mathit{StartVars}} = 0$$
This led us to choose the ratio Unbroken/StartVars as the target of our first predictor. If we
can predict the parameters needed to bring this ratio close to zero, we obtain a good estimate of
the data complexity of the attack.
Another interesting target is Unbroken itself. Predicting it directly could be even more useful
than predicting the ratio, but, because of the shape of the data, our second predictor is somewhat
less accurate. Even a rough idea of this number is still valuable.
Finally, we are also interested in predicting the individual values of r1, r2, r3 and r4. This is
interesting because r4 represents information hidden deep inside the cipher structure, and knowing
how it grows can give insights into the attack.

3.2 Methodology
In this section we present the tools and ideas used to build the predictive models for the
variables described in the previous section.
The first step when modeling data is to visualize it. This gives a general understanding of the
relation between the independent variable (K) and the dependent variable. We did this in
Microsoft Excel, in the file SimonResults.xlsx. Excel also provides built-in functionality for
creating trendlines (linear, polynomial up to degree 6, logarithmic, etc.), which allows quick
modeling and testing and gives a good idea of which models could work.
3.2.1 Regression
Regression is a statistical process for estimating the relationships among variables. It
analyzes the relationship between a dependent variable and one or more independent variables (or
'predictors'). The estimation target is a function of the independent variables called the regression
function [6].
There are different types of regression; for this work we explored:
• linear, where the regression function is a linear equation;
• polynomial, where the regression function is a polynomial of any degree;
• non-linear, where the regression function is more complex, e.g. $f(x) = (\ln(a x))^{b}$.
To do this we used Python with the SciPy library [7], specifically the curve_fit function for
non-linear regression, and NumPy [8] polyfit for linear and polynomial regression. Given a
parametrized function, curve_fit searches for the parameter values that minimize the sum of
squared residuals (least squares [9]), by default using the Levenberg-Marquardt algorithm [10];
polyfit solves the corresponding linear least-squares problem directly. The Python code can be
found in the pred.py file.
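As an illustration of the three kinds of regression, the sketch below refits the 8-round Unbroken/StartVars data from Section 2.2 with NumPy polyfit and SciPy curve_fit. It is not pred.py itself; the variable names and the starting guess p0 are our own assumptions. Since these are least-squares fits on the same data, the results should land close to the predictors reported in 3.3.1.

```python
import numpy as np
from scipy.optimize import curve_fit

# 8-round Simon, F = 0, transcribed from Section 2.2
K = np.array([2, 4, 8, 10, 12, 16, 20, 25, 30, 32, 40, 50, 64, 70], dtype=float)
START_VARS = np.array([512, 896, 1664, 2048, 2432, 3200, 3968, 4928, 5888,
                       6272, 7808, 9728, 12416, 13568], dtype=float)
UNBROKEN = np.array([317, 409, 443, 448, 448, 444, 439, 435, 425,
                     417, 402, 378, 307, 249], dtype=float)
RATIO = UNBROKEN / START_VARS

def r_squared(y, f):
    return 1.0 - np.sum((y - f) ** 2) / np.sum((y - np.mean(y)) ** 2)

def power(x, a, b, c):
    """Non-linear model a * x**b + c."""
    return a * np.power(x, b) + c

# Polynomial regression (linear least squares): degree 5 in K
poly5 = np.polyfit(K, RATIO, 5)
# A polynomial in ln K is still linear in its coefficients, so polyfit works too
logpoly4 = np.polyfit(np.log(K), RATIO, 4)
# Non-linear regression; p0 is our rough starting guess
(a, b, c), _ = curve_fit(power, K, RATIO, p0=(1.0, -0.5, 0.0))

predictions = {
    "poly5": np.polyval(poly5, K),
    "logpoly4": np.polyval(logpoly4, np.log(K)),
    "power": power(K, a, b, c),
}
for name, pred in predictions.items():
    print(f"{name:>8}: R^2 = {r_squared(RATIO, pred):.4f}")
print(f"power fit: a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

Note that the polynomial in ln K needs no non-linear solver: only the non-linear power model requires curve_fit and an initial guess.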

3.2.2 Evaluation method
To evaluate and compare the regression functions we found, we used two metrics. The first is the
standard error of the estimate, which gives the typical error of a predicted value:
$$\sigma_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N}}$$
where σ_est is the standard error of the estimate, Y is an actual value, Y' is a predicted value and N
is the number of data points. This metric is useful but can also be misleading: σ_est = 2 on its own
means nothing if we do not know the range of the data. An estimator with σ_est = 2
where y ∈ (0, 1000) is much better than one with σ_est = 0.5 where y ∈ (0, 1).
To overcome this weakness of the standard error of the estimate, we also use R². This is a
widely used, normalized metric: R² ≤ 1, and for a least-squares fit that includes a constant term it
is also at least 0. The higher the value of R², the better the model fits the data. Its definition is
$$R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2}$$
where y_i are the real values, f_i the predicted values, n the number of data points and
$\bar{y} = \frac{1}{n}\sum_i y_i$ the mean.
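Both metrics are easy to compute directly. The toy example below uses made-up numbers, not data from this work, to illustrate the point made above: scaling the data by 1000 scales σ_est by 1000 but leaves R² unchanged.

```python
import math

def sigma_est(y, f):
    """Standard error of the estimate: sqrt(sum((Y - Y')^2) / N)."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / len(y))

def r_squared(y, f):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]        # toy "actual" values
f = [1.1, 1.9, 3.2, 3.8]        # toy predictions
y_big = [1000 * v for v in y]   # the same data on a 1000x larger scale
f_big = [1000 * v for v in f]

print(sigma_est(y, f), r_squared(y, f))                  # about 0.158 and 0.98
print(sigma_est(y_big, f_big), r_squared(y_big, f_big))  # about 158.1 and 0.98
```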
3.2.3 Visualization
To visualize the data and the regression functions we used Microsoft Excel for basic plots and
WolframCloud [11] for more advanced graphics. Appendix B contains a short guide on how to use
WolframCloud. During development we also used this website for quick visualization of
functions and points.

3.3 Predictors
In this section we present the best regression functions (predictors) we found for each quantity
described in 3.1. For each predictor we give a plot of the function together with the real data
points. As a measure of quality we use the R² metric. For readability we give rounded
coefficients; the full solution as computed can be found in Appendix C.
Remarks:
• All predictors are functions of K, the number of known plaintext/ciphertext pairs.
• Where appropriate we use a logarithmic scale.
3.3.1 The Unbroken/StartVars ratio
This was our primary goal; we present three different predictors, all of which fit the data very
well (R² > 0.99).
Polynomial of degree 5
Using polynomial regression, we found the following function to predict the ratio for 8
rounds of Simon:
$$f(x) = -5.51\times10^{-9}x^5 + 1.28\times10^{-6}x^4 - 1.12\times10^{-4}x^3 + 4.58\times10^{-3}x^2 - 8.94\times10^{-2}x + 0.76$$
σ_est = 0.012, R² = 0.9949
Figure 7: Polynomial of degree 5 predicting U/S

Polynomial of logarithms degree 4
Due to the shape of the data, it can also be modeled as a polynomial of logarithms, with the
following function:
$$f(x) = -9.57\times10^{-3}(\ln x)^4 + 1.0003\times10^{-1}(\ln x)^3 - 3.19\times10^{-1}(\ln x)^2 + 1.35\times10^{-1}\ln x + 0.65$$
σ_est = 0.002, R² = 0.9998
Figure 8: Polynomial of logarithms degree 4 predicting U/S
Power function ax^b + c
Our last predictor for the U/S value is:
$$f(x) = 1.069\,x^{-0.485} - 0.127$$
σ_est = 0.012, R² = 0.9941
Figure 9: Power function predicting U/S
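Since Section 3.1 ties full key recovery to the ratio reaching zero, one way to read the power-function predictor is to solve a·K^b + c = 0 for K. The sketch below does this with the full coefficients from Appendix C and yields roughly 80 pairs. This is an extrapolation beyond the largest measured K (70), and Appendix A shows how predictors can break down past the breaking point, so the value should be read as a rough hint rather than a result.

```python
# Power-function predictor of Unbroken/StartVars for 8 rounds of Simon
# (full-precision coefficients as printed in Appendix C)
A, B, C = 1.06872499031, -0.485414642925, -0.127479547952

def predicted_ratio(k):
    return A * k ** B + C

# Solve A * k**B + C = 0 for k, i.e. k = (-C / A) ** (1 / B)
k_zero = (-C / A) ** (1.0 / B)
print(f"ratio predicted to reach 0 at K = {k_zero:.1f}")
```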

3.3.2 Unbroken
As mentioned earlier, we are interested in predicting the number of variables that ElimLin
cannot determine for a given K. Our results show that this quantity is modeled well by a
polynomial of degree 4:
$$f(x) = -1.34904349067\times10^{-4}x^4 + 2.06539\times10^{-2}x^3 - 1.117208x^2 + 22.3936x + 308.25$$
σ_est = 13.43, R² = 0.9863
It is important to note that rounding the coefficients, even to the third decimal digit,
significantly alters the resulting curve.
Figure 10: Polynomial of degree 4 to predict Unbroken
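The remark about rounding is easy to check. The sketch below evaluates the degree-4 Unbroken predictor at K = 70 with the full coefficients from Appendix C and with the same coefficients rounded to three decimals. Rounding wipes out the x^4 term, and the prediction moves from about 247 (the table reports 249) to about 3606.

```python
import numpy as np

# Degree-4 Unbroken predictor for 8 rounds, full coefficients from Appendix C,
# highest degree first as np.polyval expects
FULL = [-0.000134904349067, 0.0206538910434, -1.11720790316,
        22.3935898687, 308.249974221]
ROUNDED = [round(c, 3) for c in FULL]  # the x^4 coefficient rounds to -0.0

k = 70
full_pred = np.polyval(FULL, k)
rounded_pred = np.polyval(ROUNDED, k)
print(f"full: {full_pred:.1f}, rounded: {rounded_pred:.1f}")  # full: 246.7, rounded: 3605.5
```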

3.3.3 Specific ElimLin rounds
Here we give our findings regarding the values of r1, r2, r3 and r4.
We notice something very interesting: r1, r2 and r3 all grow linearly with (almost) the same
slope, about 64, and differ mainly in the offset c of the function f(x) = 64x + c. The fitted
trendlines are:
r1 = 64x (R² = 1)
r2 = 64x - 64 (R² = 1)
r3 = 63.296x - 224.24 (R² = 0.9995)
[Figure: r1, r2 and r3 as a function of K, with linear trendlines]
Figure 11: r1, r2 and r3 growing linearly with K
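The linear trendlines can be reproduced with an ordinary least-squares line fit. The sketch below is our own, with values transcribed from the 8-round table in Section 2.2. It should recover slopes of 64 for r1 and r2 and about 63.3 for r3. In this table r3 equals 64K - 256 exactly for K ≥ 10; the points at K ≤ 8 lie above that line and pull the fitted slope slightly below 64.

```python
import numpy as np

# 8-round Simon, F = 0, transcribed from Section 2.2
K = np.array([2, 4, 8, 10, 12, 16, 20, 25, 30, 32, 40, 50, 64, 70], dtype=float)
ROUNDS = {
    "r1": [128, 256, 512, 640, 768, 1024, 1280, 1600, 1920, 2048, 2560, 3200, 4096, 4480],
    "r2": [64, 192, 448, 576, 704, 960, 1216, 1536, 1856, 1984, 2496, 3136, 4032, 4416],
    "r3": [3, 39, 261, 384, 512, 768, 1024, 1344, 1664, 1792, 2304, 2944, 3840, 4224],
}

fits = {}
for name, values in ROUNDS.items():
    # least-squares line: value = slope * K + offset
    slope, offset = np.polyfit(K, np.array(values, dtype=float), 1)
    fits[name] = (slope, offset)
    print(f"{name} = {slope:.3f}K + {offset:.2f}")
```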

On the other hand, r4 behaves differently: it grows faster than linearly, which gives the attacker
considerable power. Initial data suggest that r5 will behave like r4, but we do not currently have
enough data to make an educated guess.
We give polynomials of degree 2 (red) and degree 4 (black) that model the growth of r4:
degree 4: y = 4E-05x^4 - 0.0048x^3 + 0.2576x^2 - 4.4612x + 26.728 (R² = 0.9994)
degree 2: y = 0.0583x^2 - 1.7869x + 21.613 (R² = 0.9933)
Appendix A shows where this predictor fails and explains why that does not matter.
[Figure: r4 as a function of K, with polynomial trendlines of degree 2 and 4]
Figure 12: r4 grows faster than linearly

4 Comparison with CTC
It is also instructive to compare the behavior of ElimLin on Simon with another cipher. The
Courtois Toy Cipher (CTC) is embedded in the same executable we used for ElimLin, which
makes it an easy choice to study. We first give the results obtained by running ElimLin on CTC
and then compare them with those of ElimLin on Simon.
4.1 Results
We tested CTC with 8 S-boxes, so the tested version has 24-bit keys. We varied the number of
rounds, the number of known plaintext/ciphertext pairs and the number of fixed key bits.
NR F K StartVars Unbroken U/S
5 10 2 206 142 0.68932
5 10 3 302 188 0.622517
5 10 4 398 226 0.567839
5 10 5 494 257 0.520243
5 10 6 590 304 0.515254
5 10 7 686 337 0.491254
5 10 8 782 376 0.480818
5 10 9 878 402 0.457859
5 10 10 999 454 0.454454
5 10 13 1262 469 0.371632
Figure 13: ElimLin on 5 rounds of CTC fixed key-bits = 10
NR F K StartVars Unbroken U/S
7 10 2 302 237 0.784768
7 10 3 446 329 0.737668
7 10 4 590 415 0.70339
7 10 5 734 505 0.688011
7 10 6 878 588 0.669704
7 10 7 1022 679 0.664384
7 10 8 1166 738 0.632933
7 10 9 1310 839 0.640458
7 10 10 1554 908 0.584299
7 10 11 1598 978 0.612015
7 10 12 1742 1028 0.590126
7 10 14 2030 1124 0.553695
7 10 16 2318 1220 0.526316

11 10 2 494 430 0.870445
11 10 3 734 617 0.840599
11 10 4 974 803 0.824435
11 10 5 1214 988 0.813839
11 10 6 1454 1174 0.807428
11 10 7 1694 1351 0.797521
11 10 8 1934 1530 0.791107
11 10 9 2174 1694 0.779209
11 10 10 2414 1871 0.775062
11 10 12 2894 2194 0.75812
11 10 20 4814 3347 0.695264
5 15 5 489 201 0.411043
5 15 7 681 211 0.309838
5 15 9 873 210 0.24055
5 15 11 1065 208 0.195305
5 15 13 1257 205 0.163087
The full data, including simulations for different numbers of rounds, is in the CTC.xlsx file.
4.2 Comments
The data from ElimLin on CTC give a different picture. Here we point out similarities and
differences with the data acquired by running ElimLin on Simon, shown in 2.2. Due to the lack of
data for larger values of K, we were not able to build predictor functions.
4.2.1 Differences
ElimLin runs significantly slower on the Courtois Toy Cipher than on Simon. This is most
likely due to the structure and the algebraic encoding of the two ciphers. Simon is known for its
very simple structure, which makes the “hidden” information that ElimLin recovers much easier to
obtain. The slowness of ElimLin on CTC is the reason why less data was collected for this
cipher. It is also why, while we studied Simon with 0 out of 128 key bits known, on CTC we fixed
10 or 15 out of 24. A final consequence of this hardness is that, with only 5 rounds and 10 fixed
key bits, we could neither recover the full key nor reach the point where the number of unbroken
variables decreases. The latter was achieved with 15 fixed key bits.
4.2.2 Similarities
Despite the differences presented in the previous section, and although our main focus is Simon,
we do see similarities between the two datasets. In both cases, the Unbroken/StartVars ratio
decreases as K grows, apart from small fluctuations in the 7-round CTC data. This indicates that
ElimLin is indeed making progress as more pairs become available. Furthermore, as seen in
Figure 16, beyond some K the number of unbroken variables starts to fall, as observed in the
Simon datasets.
4.2.3 Analysis
As a comparison, the figures below show how the number of unbroken variables and the
Unbroken/StartVars ratio evolve with K for 7 rounds of both ciphers. For CTC, 10 key bits were
fixed, while for Simon none were.
[Figure: K-U relationship for 7 rounds of Simon and CTC; series Simon-Unbroken and CTC-Unbroken]
Figure 17: K-Unbroken relationship for 7 rounds of Simon and CTC
[Figure: K-U/S relationship for 7 rounds of Simon and CTC; series Simon-U/S and CTC-U/S]
Figure 18: K-Unbroken/StartVars relationship for 7 rounds of Simon and CTC
These graphs suggest that ElimLin does not work as well on CTC as it does on Simon; more data
is needed to make a stronger claim.

5 Conclusion and Future work
With this work we demonstrate that it is possible to build functions that predict the evolution of
an ElimLin attack on the Simon cipher very accurately. Here we focused on only one variant of
the cipher, restricted to 8 rounds. The next step would be to use multivariate regression to build
a global predictor that takes into account the number of rounds and possibly even the number of
guessed key bits. Doing this effectively requires much more data. Collecting more data is mainly
a matter of computing resources, but it could also be done faster with improvements to the
ElimLin software. Furthermore, applying the same methodology to study ElimLin on other block
ciphers and comparing the results could reveal interesting information. Finally, future analysis
should also consider the time complexity, so that both the data and the time needed can be
compared against other attacks.

References
[1] N. Courtois and G. Bard, “Algebraic cryptanalysis of the data encryption standard,”
Cryptography and Coding, 2007.
[2] N. Courtois, “CTC2 and Fast Algebraic Attacks on Block Ciphers Revisited,” IACR
Cryptology ePrint Archive, 2007.
[3] “Tools for Experimental Algebraic Cryptanalysis.” [Online]. Available:
http://www.cryptosystem.net/aes/tools.html. [Accessed: 08-Apr-2016].
[4] R. Beaulieu and D. Shors, “The SIMON and SPECK lightweight block ciphers,”
Proceedings of the …, 2015.
[5] “SimonSpeck: Simon & Speck block cipher implementation open source code in C.”
[Online]. Available: https://github.com/GSongHashrate/SimonSpeck. [Accessed: 08-Apr-2016].
[6] Wikipedia, “Regression Analysis.” [Online]. Available:
https://en.wikipedia.org/wiki/Regression_analysis.
[7] “Scipy.” [Online]. Available: https://www.scipy.org/.
[8] “Numpy.” [Online]. Available: http://www.numpy.org/.
[9] A.-M. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes. Paris:
F. Didot, 1805.
[10] K. Levenberg, “A Method for the Solution of Certain Non-Linear Problems in Least
Squares,” Quarterly of Applied Mathematics, vol. 2, pp. 164-168, 1944.
[11] “WolframCloud.” [Online]. Available: https://www.wolframcloud.com/.

APPENDIX A: Limits of predictors
It is interesting to see the limits of the predictors found in this work. Here we focus on the
r4 predictor presented in 3.3.3.
The predicting functions for ElimLin on Simon are useful for determining the expected data
complexity of an attack that achieves full (or almost full) key recovery. We therefore want all
our predictors to be as accurate as possible at the “break” point and shortly before it.
But what happens to them after that point? As seen in Figure 12, if we follow the degree-4
polynomial predictor
$$f(x) = 4\times10^{-5}x^4 - 4.8\times10^{-3}x^3 + 2.576\times10^{-1}x^2 - 4.4612x + 26.728,$$
the predicted values keep growing.
However, this growth does not appear in the actual data when we run simulations for larger
values of K. This is because, firstly, the total number of variables that can be “solved” in the
fourth round is bounded by the total number of variables of the system and, secondly, as K
increases further, ElimLin finds more variables earlier in the process, leaving very few (and
eventually zero) to be found in the fourth round.
Below is the graph of r4 values for 6 rounds of Simon, before and after the “breaking” point.
The line traced is the estimated predictor based on the first four points:
y = 6x^3 - 54.5x^2 + 160.5x - 151
[Figure: R4 values after the breaking point (logarithmic vertical axis); series r4 and polynomial trendline]
Figure 19: Predictor failing after the breaking point

APPENDIX B: Generating graphs in WolframCloud
WolframCloud is Wolfram Research's online platform for the Wolfram Language and a great
environment for programming and experiments that involve a lot of mathematics. I used it
because its power and flexibility allowed me to produce exactly the graphs I wanted. That said,
other tools could do the same job equally well.
To use it, sign up at www.wolframcloud.com/, log in to the online Wolfram development
platform and upload the wolf.nb file to your files. Once the file is uploaded and loaded into the
online editor, click “Evaluation” → “Evaluate all cells”. You should see the following:
Figure 20: Code for WolframCloud to generate graphs
After execution, the code is followed by the graphs presented in 3.3.1 and 3.3.2.

APPENDIX C: Getting the full predictors
To compute the exact predictors, we used the pred.py Python script. The data is embedded in the
code to make it easier to use and more portable, but it can be changed in the loadData function.
The code is also commented to make it more readable. The program can be run from the command
line with python pred.py and outputs four functions modeling the Unbroken/StartVars ratio and
one function predicting Unbroken, each with its standard error of the estimate and R² value:
-------> Predicting Unbroken/StartVars ratio:
$$f(x) = -0.00957458507921(\ln x)^4 + 0.100033595599(\ln x)^3 - 0.319048040248(\ln x)^2 + 0.135243470687\ln x + 0.64818802088$$
standard error = 0.00203449262351, R² = 0.999855203924
$$f(x) = \frac{e^{0.350096348878(\ln x)^3 + 1.17546785847\ln x}}{e^{0.0385177598247(\ln x)^4 + 1.24895878909(\ln x)^2 + 0.80206204769}}$$
standard error = 0.00382755918962, R² = 0.999487506472
$$f(x) = 1.06872499031\,x^{-0.485414642925} - 0.127479547952$$
standard error = 0.0129444711103, R² = 0.994138440527
$$f(x) = -5.51150597571\times10^{-9}x^5 + 1.28058151563\times10^{-6}x^4 - 0.000111919894711x^3 + 0.00457854642152x^2 - 0.0894587591956x + 0.761784059032$$
standard error = 0.0119782223137, R² = 0.994980860191
-------> Predicting Unbroken:
$$f(x) = -0.000134904349067x^4 + 0.0206538910434x^3 - 1.11720790316x^2 + 22.3935898687x + 308.249974221$$
standard error = 13.4331654503, R² = 0.986337343228
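To check the printed predictors against the data, the sketch below evaluates the four Unbroken/StartVars functions on the 8-round values from Section 2.2, computing the standard error of the estimate and R² as defined in 3.2.2. It is an independent re-implementation, not pred.py, and the function names are our own. If the transcription is right, the recomputed values should be close to those printed above.

```python
import numpy as np

# 8-round Simon, F = 0, transcribed from Section 2.2
K = np.array([2, 4, 8, 10, 12, 16, 20, 25, 30, 32, 40, 50, 64, 70], dtype=float)
START_VARS = np.array([512, 896, 1664, 2048, 2432, 3200, 3968, 4928, 5888,
                       6272, 7808, 9728, 12416, 13568], dtype=float)
UNBROKEN = np.array([317, 409, 443, 448, 448, 444, 439, 435, 425,
                     417, 402, 378, 307, 249], dtype=float)
RATIO = UNBROKEN / START_VARS

def log_poly(x):
    L = np.log(x)
    return (-0.00957458507921 * L**4 + 0.100033595599 * L**3
            - 0.319048040248 * L**2 + 0.135243470687 * L + 0.64818802088)

def exp_ratio(x):
    L = np.log(x)
    return (np.exp(0.350096348878 * L**3 + 1.17546785847 * L)
            / np.exp(0.0385177598247 * L**4 + 1.24895878909 * L**2 + 0.80206204769))

def power(x):
    return 1.06872499031 * x ** -0.485414642925 - 0.127479547952

def poly5(x):
    return np.polyval([-5.51150597571e-9, 1.28058151563e-6, -0.000111919894711,
                       0.00457854642152, -0.0894587591956, 0.761784059032], x)

def metrics(y, f):
    sigma = np.sqrt(np.mean((y - f) ** 2))  # standard error of the estimate
    r2 = 1 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)
    return sigma, r2

results = {fn.__name__: metrics(RATIO, fn(K)) for fn in (log_poly, exp_ratio, power, poly5)}
for name, (sigma, r2) in results.items():
    print(f"{name:>9}: standard error = {sigma:.5f}, R^2 = {r2:.5f}")
```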
