Presentation slides for 64-bit maximally equidistributed F2-linear pseudorandom number generators MELG-64.
Article:
S. Harase and T. Kimoto, "Implementing 64-bit maximally equidistributed F2-linear generators with Mersenne prime period", ACM Transactions on Mathematical Software, Volume 44, Issue 3, April 2018, Article No. 30, 11 pp.
The code in C:
https://github.com/sharase/melg-64
1. Implementing 64-bit Maximally Equidistributed
F2-Linear Generators with Mersenne Prime Period
Shin Harase (Ritsumeikan Univ.) and Takamitsu Kimoto (Tokyo Tech)
March 25th, 2021
This work was supported by JSPS KAKENHI Grant Numbers JP18K18016,
JP26730015, JP26310211, JP15K13460, JP12J07985.
S. Harase and T. Kimoto (Rits and TIT) Implementing 64-bit MELGs March 25th, 2021 1 / 17
2. Outline
1 Introduction
2 F2-linear pseudorandom number generators (PRNGs)
3 Mersenne Twister
4 Our main result: 64-bit MELGs
5 Conclusion
3. Introduction
Background: CPUs and operating systems are moving from 32 to 64 bits,
and hence it is important to have good 64-bit pseudorandom number
generators (PRNGs) designed to fully exploit these word lengths.
Requirements for PRNGs in Monte Carlo simulation:
Fast generation;
Long periodicity;
Equidistribution property (i.e., high-dimensional uniformity);
Memory efficiency.
The 32-bit Mersenne Twister (MT) MT19937 (Matsumoto–Nishimura,
1998) is one of the most widely used PRNGs, but it is not completely
optimized in terms of high-dimensional uniformity. The WELL generators
(Panneton et al., 2006) were developed to overcome this weakness.
⇝ For 64-bit PRNGs, MT19937-64, SFMT19937 using SIMD, etc., have
been proposed, but no 64-bit MT-type generator, such as a 64-bit variant
of WELL, has been completely optimized for high-dimensional uniformity.
4. Our PRNG: MELG19937-64
Very long period $2^{19937} - 1 \approx 10^{6000}$;
High-dimensional uniformity completely optimized;
Fast generation competitive with MT19937-64;
Memory size requiring only 312 words.
We provide PRNGs with periods from $2^{607} - 1$ to $2^{44497} - 1$.
Reference:
S. Harase and T. Kimoto, “Implementing 64-bit maximally
equidistributed F2-linear generators with Mersenne prime period”,
ACM Trans. Math. Software 44 (2018), no. 3, Art. 30, 11 pp.
The code in C:
https://github.com/sharase/melg-64
5. F2-linear pseudorandom number generators (PRNGs)
Definition (F2-linear generators)
Let $\mathbb{F}_2 := \{0, 1\}$ be the two-element field;
$S := \mathbb{F}_2^p$: state space ($p = \dim(S)$, e.g., $p = 19937$);
$f : S \to S$: $\mathbb{F}_2$-linear state transition function;
$O := \mathbb{F}_2^w$: set of outputs ($w$ is the word size of the intended machine, e.g., 32 or 64);
$o : S \to O$: $\mathbb{F}_2$-linear output function.
For an initial state $s_0 \in S$, at every time step, the state is updated by the
recursion
$s_{i+1} = f(s_i) \quad (i = 0, 1, 2, \ldots),$
and the output sequence is given by $o(s_0), o(s_1), o(s_2), \ldots \in O$. We
identify O with the set of unsigned w-bit binary integers.
In this setting, we can compute some theoretical criteria, such as the
periods and dimensions of equidistribution.
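To make the definition concrete, here is a minimal sketch of a toy F2-linear generator. The sizes `P = 8` and `W = 4` and the particular shift amounts are illustrative assumptions, not parameters from the slides or from MELG-64. The state is stored as a Python integer whose bits are the coordinates of a vector in $\mathbb{F}_2^p$, so XOR plays the role of vector addition.

```python
P = 8   # toy state dimension p (assumption; the slides use e.g. p = 19937)
W = 4   # toy output word size w (assumption; the slides use 32 or 64)
STATE_MASK = (1 << P) - 1


def f(s):
    """Toy F2-linear transition, built only from shifts and XOR."""
    s ^= (s << 1) & STATE_MASK
    s ^= s >> 3
    return s


def o(s):
    """Toy F2-linear output: the W most significant bits of the state."""
    return s >> (P - W)


def outputs(s0, n):
    """Return o(s_0), o(s_1), ..., o(s_{n-1}) with s_{i+1} = f(s_i)."""
    out, s = [], s0
    for _ in range(n):
        out.append(o(s))
        s = f(s)
    return out


def is_linear(g, p):
    """Exhaustively check g(a XOR b) == g(a) XOR g(b) on all p-bit inputs."""
    return all(g(a ^ b) == g(a) ^ g(b)
               for a in range(1 << p) for b in range(1 << p))


if __name__ == "__main__":
    print(is_linear(f, P), is_linear(o, P))
    print(outputs(0b10000000, 8))
```

Because both maps are linear over $\mathbb{F}_2$, the zero state is a fixed point and always outputs zero, which is why real generators must be seeded with a nonzero state.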
6. Theoretical criteria for F2-linear PRNGs
Definition (Maximal period)
If the output sequence $o(s_0), o(s_1), o(s_2), \ldots$ has period
$2^p - 1$ (i.e., the maximal possible value), we say that the
F2-linear generator has the maximal period.
If p is a Mersenne exponent (i.e., $2^p - 1$ is a Mersenne prime), it is
easy to check whether the period is maximal.
We assume throughout that this condition holds.
Definition (Number $N_1$ of nonzero coefficients of $P(z)$)
Let $P(z)$ be the characteristic polynomial of f ($p = \deg P(z)$).
Let $N_1$ be the number of nonzero coefficients of $P(z)$.
As a criterion for F2-linear PRNGs, $N_1$ should be large enough.
This criterion ensures that the generator avoids long-lasting effects of
poor initialization, such as 0-excess states (Panneton et al., 2006).
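Both criteria can be checked by brute force on a toy example. The sketch below assumes a 7-bit linear feedback shift register with characteristic polynomial $z^7 + z + 1$, chosen only because $2^7 - 1 = 127$ is a Mersenne prime. It is not one of the MELG generators, and the helper names are hypothetical.

```python
P = 7                    # toy state size; 2**7 - 1 = 127 is a Mersenne prime
POLY = (1 << 7) | 0b11   # z^7 + z + 1; bit j holds the coefficient of z^j


def step(s):
    """One step of x_{n+7} = x_{n+1} XOR x_n; bit j of s holds x_{n+j}."""
    new_bit = (s ^ (s >> 1)) & 1
    return (s >> 1) | (new_bit << (P - 1))


def period(s0):
    """Number of steps until the state first returns to s0."""
    s, n = step(s0), 1
    while s != s0:
        s, n = step(s), n + 1
    return n


def n1(poly):
    """Number of nonzero coefficients of a polynomial over F2."""
    return bin(poly).count("1")


if __name__ == "__main__":
    print(period(1), n1(POLY))
```

Every nonzero state of this toy generator has period 127, the maximal value. Its $N_1$ is only 3, the kind of sparse characteristic polynomial that the $N_1$ criterion is meant to penalize.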
7. Theoretical criteria for F2-linear PRNGs
Let $S := \mathbb{F}_2^p$ and $O := \mathbb{F}_2^w$. Let $\mathrm{tr}_v : O \to \mathbb{F}_2^v$ be a truncation function
taking the v most significant bits from w-bit binary integers. We regard
these bits as the output with v-bit accuracy.
Definition (Dimension of equidistribution with v-bit accuracy k(v))
Assume that the initial state $s_0$ is uniformly distributed over the state
space S. If the consecutive k-tuples
$s_0 \in S \mapsto (\mathrm{tr}_v(o(s_0)), \mathrm{tr}_v(o(f(s_0))), \ldots, \mathrm{tr}_v(o(f^{k-1}(s_0)))) \in \mathbb{F}_2^{kv}$
occur with the same probability, the generator is said to be
k-dimensionally equidistributed with v-bit accuracy. The largest value of
k with this property is called the dimension of equidistribution with v-bit
accuracy, denoted by k(v).
This definition means that each of the $2^{kv}$ possible bit patterns occurs the
same number of times over the entire period $2^p - 1$ (except for the
all-zero pattern, which occurs once less often).
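For a very small state space, k(v) can be computed directly from this definition by enumerating every initial state. The sketch below assumes a toy 7-bit shift-register generator whose output is the whole state (so p = w = 7). It is an illustration only and is far too slow for realistic p, where lattice-based methods are used instead.

```python
from collections import Counter

P = 7   # toy state size (assumption)
W = 7   # toy output width: the output is the whole state (assumption)


def f(s):
    """Toy transition: shift register x_{n+7} = x_{n+1} XOR x_n."""
    new_bit = (s ^ (s >> 1)) & 1
    return (s >> 1) | (new_bit << (P - 1))


def o(s):
    """Toy output function: the identity."""
    return s


def tr(x, v):
    """Keep the v most significant bits of a W-bit value."""
    return x >> (W - v)


def is_equidistributed(k, v):
    """True if all 2^(k*v) k-tuples occur equally often over all 2^P states."""
    if k * v > P:
        return False
    counts = Counter()
    for s0 in range(1 << P):
        s, tup = s0, []
        for _ in range(k):
            tup.append(tr(o(s), v))
            s = f(s)
        counts[tuple(tup)] += 1
    return len(counts) == 1 << (k * v) and len(set(counts.values())) == 1


def k_of(v):
    """Dimension of equidistribution with v-bit accuracy, by brute force."""
    k = 0
    while is_equidistributed(k + 1, v):
        k += 1
    return k


if __name__ == "__main__":
    print([k_of(v) for v in range(1, W + 1)])
```

For this toy generator the single most significant bit is equidistributed up to dimension 7, but consecutive outputs overlap in all other bits, so k(v) = 1 for every v ≥ 2.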
8. Example: dimension of equidistribution
[Figure: 2-dimensional equidistribution with 2-bit accuracy; both axes run from 0 to 1 with ticks every 0.25.]
9. Example: dimension of equidistribution
[Figure: 2-dimensional equidistribution with 3-bit accuracy; both axes run from 0 to 1 with ticks every 0.125.]
10. Theoretical criteria for F2-linear PRNGs
The dimension of equidistribution ensures that the output values with the
v most significant bits are uniformly distributed up to dimension k(v). As
a criterion of uniformity, larger values of k(v) for each 1 ≤ v ≤ w are
desirable.
Now, we have the upper bound
$k(v) \le \lfloor p/v \rfloor$
for each $v = 1, 2, \ldots, w$. Define the sum of the gaps
$\Delta := \sum_{v=1}^{w} (\lfloor p/v \rfloor - k(v)).$
If ∆ = 0, the generator is said to be maximally equidistributed (ME).
(For comparison, the 32-bit Mersenne Twister MT19937 has ∆ = 6750.)
♠ The aim of our study is to design maximally equidistributed F2-linear
PRNGs whose speed is similar to that of 64-bit Mersenne Twisters.
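The upper bounds and the total gap ∆ are simple integer arithmetic once the values k(v) are known. The following sketch uses hypothetical helper names and takes the list of k(v) values as input. Computing k(v) itself for p = 19937 requires specialized algorithms that are not shown here.

```python
def upper_bounds(p, w):
    """Return [floor(p/v) for v = 1..w], the upper bounds on k(v)."""
    return [p // v for v in range(1, w + 1)]


def total_gap(p, k):
    """Sum of gaps, where k[v - 1] holds k(v) for v = 1..len(k)."""
    return sum(p // v - kv for v, kv in enumerate(k, start=1))


if __name__ == "__main__":
    bounds = upper_bounds(19937, 64)
    print(bounds[0], bounds[31], bounds[63])
    # A maximally equidistributed generator attains every bound, so its gap is 0.
    print(total_gap(19937, bounds))
    # Hypothetical k(v) values for a toy generator with p = w = 7.
    print(total_gap(7, [7, 1, 1, 1, 1, 1, 1]))
```

For p = 19937 and w = 64 the bounds are 19937 for v = 1, 623 for v = 32, and 311 for v = 64.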
11. Our main result: MELG19937-64
We develop the 64-bit Maximally Equidistributed F2-Linear Generator
MELG19937-64. Its period is $2^{19937} - 1$, the same as that of the
Mersenne Twister MT19937.
Comparison: we compare PRNGs for p = 19937 and w = 64.
We measure the CPU time (in seconds) taken to generate $10^9$ 64-bit
unsigned integers.
Generators      N1     ∆      CPU time 1   CPU time 2
MELG19937-64    9603   0      4.2123       6.2920
MT19937-64      285    7820   5.1002       6.6490
Platforms (64-bit CPUs and OSs):
CPU time 1: Intel Core i7-3770 (3.40 GHz), Linux, gcc compiler with -O3
CPU time 2: AMD Phenom II X6 1045T (2.70 GHz), Linux, gcc compiler with -O3
12. Mersenne Twister (Matsumoto–Nishimura, 1998)
Use an incomplete array as a state space S (by discarding r bits);
Use an output function T with a single memory reference.
13. Mersenne Twister (Matsumoto–Nishimura, 1998)
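The state transition
$(w_i^{w-r}, w_{i+1}, w_{i+2}, \ldots, w_{i+N-1}) \in S \overset{f}{\mapsto} (w_{i+1}^{w-r}, w_{i+2}, w_{i+3}, \ldots, w_{i+N}) \in S$
is implemented as
$w_{i+N} := w_{i+M} \oplus (w_i^{w-r} \mid w_{i+1}^{r}) A,$
where $w := (w_0, \ldots, w_{w-1})$, $a := (a_0, \ldots, a_{w-1}) \in \mathbb{F}_2^w$ and
$wA := \begin{cases} (w \gg 1) & \text{if } w_{w-1} = 0, \\ (w \gg 1) \oplus a & \text{if } w_{w-1} = 1. \end{cases}$
The output function $(w_{i+1}^{w-r}, w_{i+2}, \ldots, w_{i+N}) \in S \overset{o}{\mapsto} w_{i+N} T \in O$ is
$z \leftarrow w_{N-1} \oplus (w_{N-1} \gg u),$
$z \leftarrow z \oplus ((z \ll s) \mathbin{\&} b),$
$z \leftarrow z \oplus ((z \ll t) \mathbin{\&} c),$
$w_{N-1} T \leftarrow z \oplus (z \gg l).$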
Nishimura (2000) searched for specific parameters for w = 64.
♠ Note that f : S → S is simple but o : S → O is rather complicated.
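The recurrence and the tempering steps above can be sketched for w = 64 as follows. All constants are arbitrary placeholders chosen for illustration, not a published parameter set, and the sketch assumes that $w_{w-1}$ is the least significant bit of a word, so that the test "$w_{w-1} = 1$" becomes `x & 1`. Array indexing and seeding are left out.

```python
MASK64 = (1 << 64) - 1

# Placeholder parameters (assumptions for illustration only).
R = 31
A = 0x123456789ABCDEF1
U, S, T, L = 29, 17, 37, 43
B = 0x0F0F0F0F0F0F0F00
C = 0xFFFF000000000000

UPPER = (MASK64 << R) & MASK64   # selects the upper w - r bits
LOWER = (1 << R) - 1             # selects the lower r bits


def mul_A(x):
    """x -> xA: shift right by one, XOR a if the dropped bit was 1."""
    return (x >> 1) ^ (A if x & 1 else 0)


def mt_next_word(w_i, w_i1, w_iM):
    """w_{i+N} = w_{i+M} XOR (upper bits of w_i | lower bits of w_{i+1}) A."""
    return w_iM ^ mul_A((w_i & UPPER) | (w_i1 & LOWER))


def temper(x):
    """Output transformation T applied to a single state word."""
    z = x ^ (x >> U)
    z ^= (z << S) & B
    z ^= (z << T) & C
    return z ^ (z >> L)


if __name__ == "__main__":
    w_new = mt_next_word(0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x1)
    print(hex(w_new), hex(temper(w_new)))
```

The transition reads three state words and applies one conditional XOR, while the output applies four shift-and-XOR steps to a single word. This is the imbalance between f and o noted on the slide.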
14. Our PRNGs: 64-bit MELGs
Use an incomplete array as a state space (by discarding r bits);
Use an extra state variable v0 and a double feedback (Panneton
et al., 2006).
Use an output function with several memory references (Harase,
2009).
15. Our PRNGs: 64-bit MELGs
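The state transition
$(w_i^{w-r}, w_{i+1}, w_{i+2}, \ldots, w_{i+N-2}, v_i) \in S \overset{f}{\mapsto} (w_{i+1}^{w-r}, w_{i+2}, w_{i+3}, \ldots, w_{i+N-1}, v_{i+1}) \in S$
is implemented as
$v_{i+1} := (w_i^{w-r} \mid w_{i+1}^{r}) A \oplus w_{i+M} \oplus v_i B,$
$w_{i+N-1} := (w_i^{w-r} \mid w_{i+1}^{r}) \oplus v_{i+1} C.$
Here, $w := (w_0, \ldots, w_{w-1})$, $a := (a_0, \ldots, a_{w-1}) \in \mathbb{F}_2^w$ and
$wA := \begin{cases} (w \gg 1) & \text{if } w_{w-1} = 0, \\ (w \gg 1) \oplus a & \text{if } w_{w-1} = 1, \end{cases}$
$wB := w \oplus (w \ll b),$
$wC := w \oplus (w \gg c).$
The output function $o : S \to O$ is defined by
$(w_{i+1}^{w-r}, w_{i+2}, \ldots, w_{i+N-1}, v_{i+1}) \in S \overset{o}{\mapsto} w_{i+N-1} T_1 \oplus w_{i+L} T_2 \in O,$
$wT_1 \leftarrow w \oplus (w \ll t), \quad wT_2 \leftarrow (w \mathbin{\&} b).$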
♠ We shift the balance of costs in f and o without loss of speed.
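A minimal sketch of one MELG-style step for w = 64 follows. The shift directions (left shift in B and $T_1$, right shift in C) and the use of a bit mask in $T_2$ are assumptions about operators that are not legible in this transcript. All constants are arbitrary placeholders, not the published MELG19937-64 parameters. For those, see the C code linked in the reference.

```python
MASK64 = (1 << 64) - 1

# Placeholder parameters (assumptions for illustration only).
R = 31
A = 0x123456789ABCDEF1
SHIFT_B, SHIFT_C, SHIFT_T = 13, 35, 17
MASK_T2 = 0x0F0F0F0F0F0F0F00

UPPER = (MASK64 << R) & MASK64   # selects the upper w - r bits
LOWER = (1 << R) - 1             # selects the lower r bits


def mul_A(x):
    """x -> xA: shift right by one, XOR a if the dropped bit was 1."""
    return (x >> 1) ^ (A if x & 1 else 0)


def melg_step(w_i, w_i1, w_iM, v_i):
    """Return (v_{i+1}, w_{i+N-1}) from three state words and the extra word v_i."""
    x = (w_i & UPPER) | (w_i1 & LOWER)
    v_next = mul_A(x) ^ w_iM ^ v_i ^ ((v_i << SHIFT_B) & MASK64)   # ... XOR v_i B
    w_new = x ^ v_next ^ (v_next >> SHIFT_C)                       # x XOR v_{i+1} C
    return v_next, w_new


def melg_output(w_new, w_lag):
    """Output w_{i+N-1} T1 XOR w_{i+L} T2, using two memory references."""
    t1 = w_new ^ ((w_new << SHIFT_T) & MASK64)
    t2 = w_lag & MASK_T2
    return t1 ^ t2


if __name__ == "__main__":
    v, w = melg_step(0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x1, 0x2)
    print(hex(v), hex(w), hex(melg_output(w, 0xFFFF)))
```

Compared with the Mersenne Twister step, the transition does more work (a second feedback through v), while the output needs only one shift, one mask, and two XORs.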
16. Comparison to SFMT19937-64
SIMD-oriented Fast Mersenne Twister (SFMT) generators (Saito and
Matsumoto, 2008) have a function to generate 64-bit unsigned integers.
Thus, we compare 64-bit PRNGs with 64-bit integer output sequences of
SFMT19937, denoted by SFMT19937-64.
N1, ∆ and CPU time (in seconds) taken to generate $10^9$ 64-bit unsigned integers
Generators                      N1     ∆       CPU time 1   CPU time 2
MELG19937-64                    9603   0       4.2123       6.2920
MT19937-64                      285    7820    5.1002       6.6490
MT19937-64 (ID3: 5-term)        5795   7940    4.8993       6.7930
SFMT19937-64 (without SIMD)     6711   14095   4.2654       5.6123
SFMT19937-64 (with SIMD)        6711   14095   1.8457       2.8806
Platforms (64-bit CPUs and OSs):
CPU time 1: Intel Core i7-3770 (3.40 GHz), Linux, gcc compiler with -O3
CPU time 2: AMD Phenom II X6 1045T (2.70 GHz), Linux, gcc compiler with -O3
SFMT19937 is very fast but ∆ for SFMT19937 is large.
17. Conclusion
We designed 64-bit maximally equidistributed F2-linear PRNGs and
searched for specific parameters with period lengths from $2^{607} - 1$ to
$2^{44497} - 1$.
The high-dimensional uniformity is completely optimized and the
generation speed is still competitive with 64-bit Mersenne Twisters
(Nishimura, 2000) on some platforms.
We also implemented a jump-ahead algorithm to obtain disjoint streams
in parallel computing. (The default skip size is $2^{256}$.)
The design of PRNGs is a trade-off between speed and quality. Our
generators offer both high performance and computational efficiency.
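Jump-ahead relies on the linearity of f: the map $f^J$ is again linear, so it can be computed in about $\log_2 J$ squarings instead of J single steps. The sketch below shows this idea with dense matrix exponentiation on a toy 7-bit generator. This is a swapped-in illustration and not necessarily the algorithm used in the MELG-64 code, since dense matrices would be impractical for p = 19937.

```python
P = 7   # toy state size (assumption)


def step(s):
    """Toy transition: shift register x_{n+7} = x_{n+1} XOR x_n."""
    new_bit = (s ^ (s >> 1)) & 1
    return (s >> 1) | (new_bit << (P - 1))


def matrix_of(g):
    """Represent a linear map by the images of the P basis vectors."""
    return [g(1 << j) for j in range(P)]


def apply(m, s):
    """Apply matrix m to state s: XOR the columns selected by the bits of s."""
    out = 0
    for j in range(P):
        if (s >> j) & 1:
            out ^= m[j]
    return out


def compose(a, b):
    """Matrix of the map s -> a(b(s))."""
    return [apply(a, col) for col in b]


def power(m, e):
    """m raised to the e-th power by repeated squaring."""
    result = [1 << j for j in range(P)]   # identity matrix
    while e:
        if e & 1:
            result = compose(m, result)
        m = compose(m, m)
        e >>= 1
    return result


def jump(s, J):
    """State reached from s after J steps, without iterating J times."""
    return apply(power(matrix_of(step), J), s)


if __name__ == "__main__":
    print(jump(1, 2**256))
```

With a skip size such as $2^{256}$, each parallel stream can start from a state that is $2^{256}$ steps beyond the previous stream's start, and a skip of that size is only feasible through an algebraic shortcut of this kind.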
Reference:
S. Harase and T. Kimoto, “Implementing 64-bit maximally
equidistributed F2-linear generators with Mersenne prime period”,
ACM Trans. Math. Software 44 (2018), no. 3, Art. 30, 11 pp.
The code in C:
https://github.com/sharase/melg-64