Time, Parameters and Efficiency Analysis of the
Dual-Sieving Algorithm
Nikita Grishin
September 3rd, 2013
Abstract
This report presents some results of a time, parameter and efficiency
analysis of the Dual-Sieving algorithm, currently under development by
LACAL, EPFL. This algorithm solves exact SVP and CVP for a lattice
of dimension $n$ in time $2^{0.398n}$ using memory $2^{0.292n}$. We use the gprof Unix
tool for time profiling and the Perl scripting language for data analysis.
1 Introduction
1.1 Short introduction to the dual-sieving algorithm
1.1.1 General description
As input to the algorithm, we have a lattice $\mathcal{L} = \mathcal{L}^{(0)} \subseteq \mathbb{R}^n$ of dimension $n$ with a basis $B = B^{(0)}$, a random vector $x^{(0)} \in \mathbb{R}^n$, called the target vector, and a radius $R^{(0)} \in \mathbb{R}_{>0}$. The task is to enumerate (almost) all vectors in $C^{(0)} = x^{(0)} + \mathcal{L} \cap \mathrm{Ball}(R^{(0)})$. The radius is $R^{(0)} = \beta\, r_n \sqrt[n]{\mathrm{vol}(\mathcal{L}^{(0)})}$, where $\beta > 1$ and $r_n$ is the radius of the $n$-dimensional ball of volume 1. Assuming the Gaussian heuristic, we expect that $|C^{(0)}| \approx \beta^n \approx \frac{\mathrm{vol}(\mathrm{Ball}(R^{(0)}))}{\mathrm{vol}(\mathcal{L})}$. SVP($\mathcal{L}$) can be reduced to enumerating the bounded coset $0 + \mathcal{L} \cap \mathrm{Ball}(\lambda_1)$, and CVP($\mathcal{L}$, $t$) for $t \in \mathbb{R}^n$ can be solved by enumerating the coset $-t + \mathcal{L} \cap \mathrm{Ball}(\mathrm{dist}(t, \mathcal{L}))$.
The parameters of the algorithm are $\alpha$ and $\beta$.
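As a quick sanity check of these quantities, the following Python sketch computes $r_n$ from the standard ball-volume formula (not restated in the report) and evaluates $R^{(0)}$ and the heuristic list size $\beta^n$; the function names are illustrative and not taken from the LACAL code.

```python
import math

def unit_ball_radius(n):
    """r_n: radius of the n-dimensional ball of volume 1,
    obtained from vol(Ball(r)) = pi^(n/2) * r^n / Gamma(n/2 + 1)."""
    return math.gamma(n / 2 + 1) ** (1.0 / n) / math.sqrt(math.pi)

def enumeration_radius(n, beta, vol_L):
    """R^(0) = beta * r_n * vol(L)^(1/n), as defined above."""
    return beta * unit_ball_radius(n) * vol_L ** (1.0 / n)

# Gaussian-heuristic estimate of |C^(0)| for one of the experimental settings.
n, beta = 50, 1.09
print(enumeration_radius(n, beta, 1.0))   # radius for a volume-1 lattice
print(beta ** n)                          # expected number of points, about 74.4
```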
Let $B^{(0)}$ be an LLL-reduced basis of a lattice $\mathcal{L}^{(0)}$. We wish to enumerate all points in $C^{(0)} = x^{(0)} + \mathcal{L}^{(0)} \cap \mathrm{Ball}(R^{(0)})$ with given $x^{(0)}$ and $R^{(0)}$. If $B^{(0)}$ were quasi-orthogonal, we could simply apply the Schnorr-Euchner algorithm and obtain $C^{(0)}$ in time $\approx |C^{(0)}|$. Otherwise, we propose to decompose the problem until we reach a quasi-orthogonal case.
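For concreteness, here is a minimal recursive coset enumeration in Python/NumPy. It follows the usual Gram-Schmidt level-by-level pruning but omits the Schnorr-Euchner zig-zag ordering and every optimisation of the actual implementation, so it should be read only as a sketch of the base case.

```python
import numpy as np

def gram_schmidt(B):
    """Gram-Schmidt orthogonalisation of the rows of B: returns B* and the mu coefficients."""
    n = B.shape[0]
    Bs = np.array(B, dtype=float)
    mu = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            mu[i, j] = np.dot(B[i], Bs[j]) / np.dot(Bs[j], Bs[j])
            Bs[i] -= mu[i, j] * Bs[j]
    return Bs, mu

def enumerate_coset(B, t, R):
    """Enumerate all vectors of t + L(B) with Euclidean norm <= R (rows of B span the lattice)."""
    B, t = np.asarray(B, dtype=float), np.asarray(t, dtype=float)
    n = B.shape[0]
    Bs, mu = gram_schmidt(B)
    norms2 = np.einsum('ij,ij->i', Bs, Bs)                   # ||b*_i||^2
    c = -t                                                   # look for u in L with ||u - (-t)|| <= R
    cg = np.array([np.dot(c, Bs[i]) / norms2[i] for i in range(n)])
    out, x = [], np.zeros(n)

    def rec(i, rho):
        if i < 0:                                            # all coefficients fixed: output t + u
            out.append(t + x @ B)
            return
        center = cg[i] - sum(mu[j, i] * x[j] for j in range(i + 1, n))
        halfwidth = np.sqrt(max(R * R - rho, 0.0) / norms2[i])
        for xi in range(int(np.ceil(center - halfwidth)), int(np.floor(center + halfwidth)) + 1):
            x[i] = xi
            rec(i - 1, rho + (xi - center) ** 2 * norms2[i])
        x[i] = 0.0

    rec(n - 1, 0.0)
    return out
```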
Let $B^{(1)}$ be a basis of an overlattice $\mathcal{L}^{(1)} \supseteq \mathcal{L}^{(0)}$ and let $x^{(1)} \in \mathbb{R}^n$ be a random vector. We construct $C^{(0)}$ by merging $C^{(1)}$ and $C'^{(1)}$, where $C^{(1)} = x^{(1)} + \mathcal{L}^{(1)} \cap \mathrm{Ball}(R^{(1)})$ and $C'^{(1)} = x^{(0)} - x^{(1)} + \mathcal{L}^{(1)} \cap \mathrm{Ball}(R^{(1)})$, with $R^{(0)} = \alpha R^{(1)}$. For $(y, y') \in C^{(1)} \times C'^{(1)}$ we have $y + y' \in x^{(0)} + \mathcal{L}^{(1)}$. The probability that $y + y'$ also lies in $x^{(0)} + \mathcal{L}^{(0)}$ can be estimated by $\frac{\mathrm{vol}(\mathcal{L}^{(1)})}{\mathrm{vol}(\mathcal{L}^{(0)})}$.
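The merge step itself can be sketched as follows (a naive quadratic Python version: the membership test simply checks that $v - x^{(0)}$ has integer coordinates in the basis $B^{(0)}$; the real implementation certainly organises the pair search more cleverly, e.g. by sorting, so this is only illustrative):

```python
import numpy as np

def merge_lists(C1, C1p, x0, R0, B0):
    """Combine the two level-1 lists into candidate vectors of C^(0) = x^(0) + L^(0) cap Ball(R^(0))."""
    B0_inv = np.linalg.inv(np.asarray(B0, dtype=float))   # rows of B0 are the basis of L^(0)
    out = []
    for y in C1:
        for yp in C1p:
            v = y + yp                                     # always lies in x^(0) + L^(1)
            coords = (v - x0) @ B0_inv                     # coordinates of v - x^(0) w.r.t. B^(0)
            if np.allclose(coords, np.round(coords)) and np.linalg.norm(v) <= R0:
                out.append(v)                              # keep it only if it falls back into x^(0) + L^(0)
    return out
```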
To find optimal $\alpha$ and $\beta$ that minimize the memory, we need $|C^{(0)}| \approx |C^{(1)}| \approx |C'^{(1)}|$. Again, by the Gaussian heuristic, $|C^{(0)}| \approx \frac{\mathrm{vol}(\mathrm{Ball}(R^{(0)}))}{\mathrm{vol}(\mathcal{L}^{(0)})} \approx \frac{\mathrm{vol}(\mathrm{Ball}(R^{(1)}))}{\mathrm{vol}(\mathcal{L}^{(1)})} \Leftrightarrow \frac{\mathrm{vol}(\mathcal{L}^{(1)})}{\mathrm{vol}(\mathcal{L}^{(0)})} = \frac{\mathrm{vol}(\mathrm{Ball}(R^{(1)}))}{\mathrm{vol}(\mathrm{Ball}(R^{(0)}))} \approx \frac{1}{\alpha^n}$. The last approximation follows from $R^{(0)} = \alpha R^{(1)}$. We deduce that $\mathrm{vol}(\mathcal{L}^{(1)}) \approx \frac{\mathrm{vol}(\mathcal{L}^{(0)})}{\alpha^n}$. We also want a non-zero number of points inside the intersection $I = x^{(1)} + \mathcal{L}^{(1)} \cap \mathrm{Ball}(0, R^{(1)}) \cap \mathrm{Ball}(v, R^{(1)})$ with $v \in C^{(0)}$, which means that $\frac{\mathrm{vol}(I)}{\mathrm{vol}(\mathcal{L}^{(1)})} \geq 1$ should hold. We can estimate $\mathrm{vol}(I)$ by $(R^{(1)})^n\, \mathrm{vol}\big(\mathrm{Ball}\big(\sqrt{1 - \tfrac{\alpha^2}{4}}\big)\big) \approx \big(\tfrac{R^{(0)}}{\alpha}\big)^n \big(\sqrt{1 - \tfrac{\alpha^2}{4}}\big)^n$ in the asymptotic case. With the requirement $\mathrm{vol}(I) \geq \mathrm{vol}(\mathcal{L}^{(1)})$ and the fact that $R^{(0)} \approx \beta\, r_n \sqrt[n]{\mathrm{vol}(\mathcal{L}^{(0)})}$, we find a relation between $\beta$ and $\alpha$ that guarantees that the intersection $I$ of the two balls is not empty: $\beta^2\big(1 - \tfrac{\alpha^2}{4}\big) \geq 1$.
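Spelling out the algebra behind that last inequality (using only the approximations already introduced, and writing $\mathrm{vol}(\mathrm{Ball}(r)) = (r/r_n)^n$ by the definition of $r_n$):
\begin{align*}
\mathrm{vol}(I) &\approx \Big(\tfrac{R^{(1)}}{r_n}\sqrt{1-\tfrac{\alpha^2}{4}}\Big)^{n}
  = \Big(\tfrac{R^{(0)}}{\alpha\, r_n}\sqrt{1-\tfrac{\alpha^2}{4}}\Big)^{n}, \qquad
\mathrm{vol}(\mathcal{L}^{(1)}) \approx \frac{\mathrm{vol}(\mathcal{L}^{(0)})}{\alpha^{n}}
  \approx \frac{1}{\alpha^{n}}\Big(\tfrac{R^{(0)}}{\beta\, r_n}\Big)^{n}, \\
\mathrm{vol}(I) \ge \mathrm{vol}(\mathcal{L}^{(1)})
  &\;\Longleftrightarrow\;
  \Big(1-\tfrac{\alpha^2}{4}\Big)^{n/2} \ge \frac{1}{\beta^{n}}
  \;\Longleftrightarrow\;
  \beta^{2}\Big(1-\tfrac{\alpha^{2}}{4}\Big) \ge 1.
\end{align*}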
So, after the merge of $C^{(1)}$ and $C'^{(1)}$ we obtain on average $\frac{\beta^{2n}}{\alpha^n}$ vectors, and all of them lie in $x^{(0)} + \mathcal{L}^{(0)}$. To optimize the complexity of the enumeration, we need to minimize $\max\big(\frac{\beta^{2n}}{\alpha^n}, \beta^n\big)$ under the condition $\beta^2\big(1 - \frac{\alpha^2}{4}\big) \geq 1$. Since $\beta^2\big(1 - \frac{\alpha^2}{4}\big) \geq 1 \Rightarrow \beta^2 \geq \frac{4}{4-\alpha^2}$, we choose $\beta^2 = \frac{4}{4-\alpha^2}$ to minimize the size of the lists. Minimizing $\frac{\beta^{2n}}{\alpha^n}$, we find $\beta = \frac{4}{3}$ and $\alpha = \frac{3}{2}$, which minimize the number of vectors in the merge of $C^{(1)}$ and $C'^{(1)}$. Afterwards, we keep only the vectors with norm less than or equal to $R^{(0)}$.
To sample all vectors in $C^{(1)}$ and $C'^{(1)}$, we iterate this process $k$ times, with $k$ the smallest integer such that $\frac{\mathrm{vol}(\mathcal{L})^{1/n}}{\alpha^k} \leq \min_i \|b_i^*\|_2$, where the $b_i^*$ are the Gram-Schmidt vectors of $\mathcal{L}$.
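In code, this stopping rule amounts to a single logarithm (a hypothetical helper, not taken from the implementation):

```python
import math

def tree_depth(vol_L, n, alpha, gso_norms):
    """Smallest k with vol(L)^(1/n) / alpha^k <= min_i ||b*_i||, i.e. the number of merge levels."""
    ratio = vol_L ** (1.0 / n) / min(gso_norms)
    return max(0, math.ceil(math.log(ratio, alpha)))
```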
1.1.2 Symmetric and asymmetric merge tree, target-vector norm
To optimize the speed and the memory usage of the algorithm, we tried two different
types of merge trees: symmetric and asymmetric. We also experimented with
different norms of the target vectors.
For the asymmetric merge tree, at each merge step of the algorithm we choose a random vector $x^{(i)} \in \mathbb{R}^n$ and we wish to enumerate the vectors in the sets $C^{(i)} = x^{(i)} + \mathcal{L}^{(i)} \cap \mathrm{Ball}(R^{(i)})$ and $C'^{(i)} = x^{(i-1)} - x^{(i)} + \mathcal{L}^{(i)} \cap \mathrm{Ball}(R^{(i)})$. So at level $k$ we obtain $2^k$ different sets.
For the symmetric merge tree, we choose a random vector only once, at step 1 ($x^{(1)}$), and we construct the two sets $C^{(1)} = x^{(1)} + \mathcal{L}^{(1)} \cap \mathrm{Ball}(R^{(1)})$ and $C'^{(1)} = x^{(0)} - x^{(1)} + \mathcal{L}^{(1)} \cap \mathrm{Ball}(R^{(1)})$. Afterwards, at each subsequent step of the algorithm, we take as target vector $x^{(i)}$ half of the previous target vector $x^{(i-1)}$. So at level $k$ we obtain only 2 different sets.
The symmetric approach allows us to gain in speed and memory, but we
lose some randomness.
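The structural difference between the two trees is easiest to see in how the targets are produced at each level (a sketch; the distribution of the random targets is not specified in the report, so the Gaussian sampling below is an assumption):

```python
import numpy as np

def next_targets_asymmetric(x_prev, n, rng):
    """Asymmetric tree: draw a fresh random target x^(i) at every level;
    the two cosets are shifted by x^(i) and by x^(i-1) - x^(i)."""
    x_i = rng.standard_normal(n)          # distribution assumed, not stated in the report
    return x_i, x_prev - x_i

def next_targets_symmetric(x_prev):
    """Symmetric tree: x^(i) is half of x^(i-1); then x^(i-1) - x^(i) = x^(i),
    so both children share the same coset and only 2 lists exist per level."""
    x_i = x_prev / 2.0
    return x_i, x_i
```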
We observed (Figure 1a) that we enumerate exponentially more elements than expected if the target vector is too close to zero. In a second experiment (Figure 1b) we therefore increased the norm of $x^{(1)}$ by a factor $2^k$.
Level    (a) short target-vectors    (b) large target-vectors
1        156652                      70068.5
2        162259                      79621.5
3        173625                      89920.5
4        213778                      103614
5        406735                      155649

Both panels: n = 50, R = 1.09, $\alpha$ = 1.12; entries are the average list length $\frac{|L_1^{(i)}|+|L_2^{(i)}|}{2}$ per level.

Figure 1: Comparison of list length for short and large target vectors
1.2 Parameters and experimental environment
The Dual-Sieving algorithm has several parameters, namely the lattice dimension, a factor used to increase the sphere radius, the number of translated lattices and the number of random bases for each lattice, as well as $\alpha$ and $\beta$.
Experiments were done with the following parameters:
• n ∈ {25, 35, 40, 45, 50, 60},
• factor to increase the sphere radius: factorR ∈ {1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1},
• number of translated lattices: nbrep ∈ {1, 5},
• number of random bases per lattice: nbessais ∈ {1, 3},
• the $\alpha$ parameter was set to its default value ($\alpha = \frac{4}{3}$) for all $n$ except $n \in \{50, 60\}$.
Hardware used for experiments:
• Intel(R) Xeon(R) CPU, E5430 @ 2.66GHz
• 16 GB RAM
2 Time analysis
The results of the time analysis are as expected: the running time is exponential in the dimension and grows exponentially with factorR. An interesting observation is that for n > 40 the algorithm performs better with factorR = 1.07 than with factorR = 1.06 (Figure 2).
Figure 2: Time (in minutes) vs. dimension, one curve per factorR ∈ {1.01, ..., 1.08}.
3 Merge steps
3.1 Average and variance of the list length
For low dimensions (n < 50) we used the asymmetric merge tree, with $\alpha$ fixed to $\frac{4}{3}$ and $\beta$ fixed to $\frac{3}{2}$. The results show that the average list length drops quickly with the level. The reason for this loss of elements is the two conditions that new vectors must satisfy at the $i$-th step:
• be in $\mathcal{L}^{(i-1)}$,
• have a norm less than or equal to $R^{(i-1)}$.
These conditions are not satisfied by enough elements. To counteract this, we can decrease the value of the $\alpha$ parameter: the probability that a vector $v$ lies in both lattices is equal to $\frac{1}{\alpha^n}$.
For high dimensions (n ∈ {50, 60}), we use $\alpha \in \{1.11, 1.12\}$, factorR ∈ {1.01, 1.04, 1.05, 1.06, 1.07, 1.09}, the symmetric merge tree, and $\beta$ fixed to $\frac{4}{3}$.
Figures 3 and 4 show the average and the variance of the list length per level for low dimensions, using the asymmetric merge tree and short target vectors.
Figure 3: List length average vs. level for n = 35, 40, 45 (one curve per factorR ∈ {1.01, ..., 1.08}); asymmetric merge tree.
Figure 4: List length variance vs. level for n = 35, 40, 45 (one curve per factorR ∈ {1.01, ..., 1.08}); asymmetric merge tree.
The tables of Figure 5 give the average list length per level for high dimensions using the symmetric merge tree. Note that for the symmetric merge tree there are only two branches, with the same number of elements inside each pair of lists, so this average was calculated as $\frac{|L_1^{(i)}|+|L_2^{(i)}|}{2}$, where $|L_j^{(i)}|$ is the number of elements in list $j$ at level $i$.
Level    (a) n=50, R=1.01    (b) n=50, R=1.05    (c) n=50, R=1.09    (d) n=60, R=1.09
1        1                   2869                156652              2384024
2        179                 13600               162259              2323117
3        3604.5              33993.5             173625              2143358
4        18927               68635.5             213778              1859734
5        78280               179852              406735              1502343.5
6        -                   -                   -                   1260579
7        -                   -                   -                   1893701

All panels: $\alpha$ = 1.12, short target-vectors; entries are $\frac{|L_1^{(i)}|+|L_2^{(i)}|}{2}$.

Figure 5: List length averages for high dimensions, symmetric merge tree
3.2 Duplicates in list sorting
We counted duplicates for low dimensions with the asymmetric merge tree. The number of duplicates is not stable, as the size of the lists decreases. Figures 6 and 7 show the average and the variance of the number of duplicates per level for dimensions 35, 40 and 45.
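What we count as a duplicate is a vector that appears more than once within a list; after sorting, duplicates sit next to each other, which is essentially the following (illustrative) routine:

```python
def count_duplicates(vectors):
    """Count the entries of a list that coincide with an already-seen vector.
    Sorting brings equal vectors next to each other, so one linear pass suffices."""
    keys = sorted(tuple(v) for v in vectors)
    return sum(1 for a, b in zip(keys, keys[1:]) if a == b)
```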
Figure 6: Duplicates average vs. level for n = 35, 40, 45 (one curve per factorR ∈ {1.01, ..., 1.08}).
Figure 7: Duplicates variance vs. level for n = 35, 40, 45 (one curve per factorR ∈ {1.01, ..., 1.08}).
4 Percentage of short vectors found
We are interested in the ratio of short vectors among all found vectors of $C^{(0)}$. A short vector here is one of norm less than $1.1\,\|v\|_2$, where $v$ is a short vector of $\mathcal{L}$ that is known beforehand.
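Computed over the output list, the rate reported below is simply the following (an illustrative helper, not from the implementation):

```python
import numpy as np

def short_vector_rate(found, v_known):
    """Fraction of found vectors whose norm is below 1.1 * ||v_known||_2."""
    threshold = 1.1 * np.linalg.norm(v_known)
    return sum(np.linalg.norm(w) < threshold for w in found) / len(found)
```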
Results for low dimensions, using the asymmetric merge tree:
• n = 40, $\alpha = \frac{4}{3}$, factorR = 1.06, rate = 1.45%
• n = 40, $\alpha = \frac{4}{3}$, factorR = 1.07, rate = 1.45%
• n = 40, $\alpha = \frac{4}{3}$, factorR = 1.08, rate = 0.7%
• n = 45, $\alpha = \frac{4}{3}$, factorR = 1.07, rate = 0.57%
• n = 45, $\alpha = \frac{4}{3}$, factorR = 1.08, rate = 0.2%
Results for high dimensions, using the symmetric merge tree:
• n = 50, $\alpha$ = 1.12, factorR = 1.06, rate = 0.7%, found vectors: 282, number of short vectors: 2.
• n = 50, $\alpha$ = 1.12, factorR = 1.07, rate = 0.09%, found vectors: 13237, number of short vectors: 12.
• n = 50, $\alpha$ = 1.12, factorR = 1.08, rate = 0.015%, found vectors: 95090, number of short vectors: 14.
5 Conclusion
The number of experiments that we have performed is not sufficient to draw any rigorous conclusion about the choice of parameters, targets, or the efficiency of the dual-sieve algorithm. Further experiments need to be performed for higher dimensions, more random lattices and randomised bases.
However, the experiments already gave rise to some improvements, such as the increase of the target-vector norm, which keeps the lists at their expected average size. We also compared the symmetric merge tree (a single choice of target, halved at lower levels) with the asymmetric one (new random targets at each level). We saw that a single random target vector reduces the memory considerably while keeping enough randomness to lead to solutions.