Implementing 64-bit Maximally Equidistributed
F2-Linear Generators with Mersenne Prime Period
Shin Harase (Ritsumeikan Univ.) and Takamitsu Kimoto (Tokyo Tech)
March 25th, 2021
This work was supported by JSPS KAKENHI Grant Numbers JP18K18016,
JP26730015, JP26310211, JP15K13460, JP12J07985.
S. Harase and T. Kimoto (Rits and TIT) Implementing 64-bit MELGs March 25th, 2021 1 / 17
Outline
1 Introduction
2 F2-linear pseudorandom number generators (PRNGs)
3 Mersenne Twister
4 Our main result: 64-bit MELGs
5 Conclusion
Introduction
Background: CPUs and operating systems are moving from 32 to 64 bits,
and hence it is important to have good 64-bit pseudorandom number
generators (PRNGs) designed to fully exploit these word lengths.
Requirements for PRNGs in Monte Carlo simulation:
Fast generation;
Long periodicity;
Equidistribution property (i.e., high-dimensional uniformity);
Memory efficiency.
The 32-bit Mersenne Twister (MT) MT19937 (Matsumoto–Nishimura,
1998) is one of the most widely used PRNGs, but it is not completely
optimized in terms of high-dimensional uniformity. The WELL generators
(Panneton et al., 2006) were developed to overcome this weakness.
⇝ For 64-bit PRNGs, MT19937-64, SFMT19937 using SIMD, etc., have
been proposed, but there exists no 64-bit MT-type generator completely
optimized for high-dimensional uniformity, such as a variant of WELL.
Our PRNG: MELG19937-64
Very long period $2^{19937} - 1 \approx 10^{6000}$;
High-dimensional uniformity completely optimized;
Fast generation competitive with MT19937-64;
Memory size requiring only 312 words.
We provide PRNGs with periods from $2^{607} - 1$ to $2^{44497} - 1$.
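As a quick arithmetic check of the figures above, the following sketch computes the number of decimal digits of the period and the number of 64-bit words needed to hold the state. The variable names are illustrative only.

```python
import math

p, w = 19937, 64  # state dimension and word size quoted on the slide

# Number of decimal digits of 2^p - 1, obtained from logarithms
# (2^p is not a power of ten, so 2^p - 1 has the same digit count).
digits = math.floor(p * math.log10(2)) + 1

# Number of w-bit words needed to store p bits of state.
words = math.ceil(p / w)

print(digits, words)
```

The digit count is consistent with the rough order of magnitude quoted for the period, and the word count matches the stated memory size.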
Reference:
S. Harase and T. Kimoto, “Implementing 64-bit maximally
equidistributed F2-linear generators with Mersenne prime period”,
ACM Trans. Math. Software 44 (2018), no. 3, Art. 30, 11 pp.
The code in C:
https://github.com/sharase/melg-64
F2-linear pseudorandom number generators (PRNGs)
Definition (F2-linear generators)
Let $\mathbb{F}_2 := \{0, 1\}$ be the two-element field;
$S := \mathbb{F}_2^p$: state space ($p = \dim(S)$, e.g., $p = 19937$);
$f : S \to S$: $\mathbb{F}_2$-linear state transition function;
$O := \mathbb{F}_2^w$: set of outputs ($w$ is the word size of the intended machine, e.g., 32 or 64);
$o : S \to O$: $\mathbb{F}_2$-linear output function.
For an initial state $s_0 \in S$, at every time step, the state is changed by the
recursion
$$s_{i+1} = f(s_i) \quad (i = 0, 1, 2, \ldots),$$
and the output sequence is given by $o(s_0), o(s_1), o(s_2), \ldots \in O$. We
identify $O$ with the set of unsigned $w$-bit binary integers.
In this setting, we can compute some theoretical criteria, such as the
periods and dimensions of equidistribution.
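To make the definition concrete, here is a minimal sketch of a toy F2-linear generator with p = 5. The 5-bit shift register and the choice o(s) = s are assumptions made purely for illustration and are unrelated to the generators in this talk. The sketch checks linearity of f by brute force and measures the period.

```python
P = 5  # toy state dimension; 2**5 - 1 = 31 is a Mersenne prime

def f(s: int) -> int:
    """State transition: a 5-bit LFSR with recurrence a[n+5] = a[n+2] ^ a[n]."""
    new_bit = (s ^ (s >> 2)) & 1
    return (s >> 1) | (new_bit << (P - 1))

def o(s: int) -> int:
    """Output function: here simply the whole state (w = p = 5)."""
    return s

def period(s0: int) -> int:
    """Number of steps until the state returns to s0."""
    s, n = f(s0), 1
    while s != s0:
        s, n = f(s), n + 1
    return n

# F2-linearity means f(a XOR b) == f(a) XOR f(b) for all states a, b.
states = range(1 << P)
is_linear = all(f(a ^ b) == f(a) ^ f(b) for a in states for b in states)

print(is_linear, period(1))
```

Every nonzero initial state of this toy generator lies on a single cycle, which is the small-scale analogue of the maximal period discussed next.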
Theoretical criteria for F2-linear PRNGs
Definition (Maximal period)
If the output sequence $o(s_0), o(s_1), o(s_2), \ldots$ has period
$2^p - 1$ (i.e., the maximal possible value), we say that the
F2-linear generator has the maximal period.
If $p$ is a Mersenne exponent (i.e., $2^p - 1$ is a Mersenne prime), it is
easy to check whether the period is maximal.
We assume throughout that this condition holds.
Definition (Number $N_1$ of nonzero coefficients of $P(z)$)
Let $P(z)$ be the characteristic polynomial of $f$ ($p = \deg P(z)$).
Let $N_1$ be the number of nonzero coefficients of $P(z)$.
As a criterion for F2-linear PRNGs, $N_1$ should be large enough.
This criterion ensures that the generator avoids a long-lasting impact of
poor initialization, such as 0-excess states (Panneton et al., 2006).
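One common way to obtain a characteristic polynomial in practice is to run the Berlekamp-Massey algorithm on a single output bit stream. The sketch below does this for a hypothetical 5-bit toy shift register (an assumption for illustration, not one of the generators in this talk) and counts the nonzero coefficients. The connection polynomial returned by Berlekamp-Massey is the reciprocal of the characteristic polynomial, so both have the same number of nonzero coefficients.

```python
P = 5  # toy state dimension (illustrative assumption)

def f(s: int) -> int:
    """Toy 5-bit LFSR with recurrence a[n+5] = a[n+2] ^ a[n]."""
    new_bit = (s ^ (s >> 2)) & 1
    return (s >> 1) | (new_bit << (P - 1))

def berlekamp_massey(bits):
    """Return (connection polynomial coefficients, linear complexity) over F2."""
    n = len(bits)
    c = [0] * n
    b = [0] * n
    c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # Discrepancy between the predicted and the actual bit.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                m = i
                b = t
    return c[:length + 1], length

# Collect the least significant state bit over 4 * P steps.
s, bits = 1, []
for _ in range(4 * P):
    bits.append(s & 1)
    s = f(s)

poly, degree = berlekamp_massey(bits)
n1 = sum(poly)  # number of nonzero coefficients
print(poly, degree, n1)
```

For a generator with p = 19937 the same procedure applies with a much longer bit stream. The toy polynomial here is a trinomial, the kind of sparse polynomial that the N1 criterion is meant to flag.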
Theoretical criteria for F2-linear PRNGs
Let $S := \mathbb{F}_2^p$ and $O := \mathbb{F}_2^w$. Let $\mathrm{tr}_v : O \to \mathbb{F}_2^v$ be a truncation function
taking the $v$ most significant bits from $w$-bit binary integers. We regard
these bits as the output with $v$-bit accuracy.
Definition (Dimension of equidistribution with $v$-bit accuracy $k(v)$)
Assume that the initial state $s_0$ is uniformly distributed over the state
space $S$. If all the consecutive $k$-tuples given by
$$s_0 \in S \mapsto \bigl(\mathrm{tr}_v(o(s_0)), \mathrm{tr}_v(o(f(s_0))), \ldots, \mathrm{tr}_v(o(f^{k-1}(s_0)))\bigr) \in \mathbb{F}_2^{kv}$$
occur with the same probability, the generator is said to be
$k$-dimensionally equidistributed with $v$-bit accuracy. The largest value of
$k$ with this property is called the dimension of equidistribution with $v$-bit
accuracy, denoted by $k(v)$.
This definition means that each of the $2^{kv}$ possible patterns of bits occurs the
same number of times over the entire period $2^p - 1$ (except for the
all-zero pattern, which occurs once less often).
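For a tiny state space the definition can be checked directly by enumeration. The sketch below computes k(v) by brute force for a hypothetical 5-bit toy generator whose output is the whole state (an assumption for illustration only). Real generators with p = 19937 require lattice-based or linear-algebra methods instead of enumeration.

```python
from collections import Counter

P = 5  # toy state dimension
W = 5  # toy output width (output is the whole state)

def f(s: int) -> int:
    """Toy 5-bit LFSR with recurrence a[n+5] = a[n+2] ^ a[n]."""
    new_bit = (s ^ (s >> 2)) & 1
    return (s >> 1) | (new_bit << (P - 1))

def o(s: int) -> int:
    return s

def tr(x: int, v: int) -> int:
    """Keep the v most significant bits of a W-bit output."""
    return x >> (W - v)

def k_of_v(v: int) -> int:
    """Largest k such that all k-tuples of v-bit outputs are equally likely."""
    best = 0
    for k in range(1, P // v + 1):
        counts = Counter()
        for s0 in range(1 << P):  # uniform initial state, zero state included
            s, tup = s0, []
            for _ in range(k):
                tup.append(tr(o(s), v))
                s = f(s)
            counts[tuple(tup)] += 1
        all_patterns_hit = len(counts) == 1 << (k * v)
        all_equal = len(set(counts.values())) == 1
        if all_patterns_hit and all_equal:
            best = k
        else:
            break  # equidistribution in dimension k implies it in dimension k - 1
    return best

k = {v: k_of_v(v) for v in range(1, W + 1)}
print(k)
```

With this naive output function the toy generator falls short of the bound for 2-bit accuracy, because consecutive outputs share a state bit.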
Example: dimension of equidistribution
2-dimensional equidistribution with 2-bit accuracy
[Figure: plot on the unit square; both axes run from 0 to 1 with ticks every 0.25.]
Example: dimension of equidistribution
2-dimensional equidistribution with 2-bit accuracy
[Figure: plot on the unit square; both axes run from 0 to 1 with ticks every 0.125.]
Theoretical criteria for F2-linear PRNGs
The dimension of equidistribution ensures that the output values with the
$v$ most significant bits are uniformly distributed up to dimension $k(v)$. As
a criterion of uniformity, larger values of $k(v)$ for each $1 \le v \le w$ are
desirable.
Now, we have the upper bound
$$k(v) \le \lfloor p/v \rfloor$$
for each $v = 1, 2, \ldots, w$. Define the sum of the gaps
$$\Delta := \sum_{v=1}^{w} \bigl( \lfloor p/v \rfloor - k(v) \bigr).$$
If $\Delta = 0$, the generator is said to be maximally equidistributed (ME).
(cf. the 32-bit Mersenne Twister MT19937 has $\Delta = 6750$.)
♠ The aim of our study is to design maximally equidistributed F2-linear
PRNGs with speed similar to that of the 64-bit Mersenne Twisters.
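The bound and the total gap are simple to evaluate once the values k(v) are known. The following sketch uses illustrative helper names (upper_bound, total_gap) and a made-up table of k(v) values to show how Δ is accumulated; it does not compute k(v) for any real generator.

```python
def upper_bound(p: int, v: int) -> int:
    """Upper bound floor(p / v) on the dimension of equidistribution k(v)."""
    return p // v

def total_gap(p: int, w: int, k: dict) -> int:
    """Sum of the gaps between the upper bound and k(v) for v = 1, ..., w."""
    return sum(upper_bound(p, v) - k[v] for v in range(1, w + 1))

p, w = 19937, 64

# A maximally equidistributed generator attains the bound for every v.
k_me = {v: upper_bound(p, v) for v in range(1, w + 1)}

# Hypothetical table that misses the bound by 3 at v = 64 only.
k_hypothetical = dict(k_me)
k_hypothetical[64] -= 3

print(upper_bound(p, 64), total_gap(p, w, k_me), total_gap(p, w, k_hypothetical))
```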
Our main result: MELG19937-64
We develop the 64-bit Maximally Equidistributed F2-Linear Generator
MELG19937-64. Its period is $2^{19937} - 1$, the same as that of the
Mersenne Twister MT19937.
Comparison: we compare PRNGs for p = 19937 and w = 64.
We measure the CPU time (in seconds) taken to generate $10^9$ 64-bit
unsigned integers.

Generators      N1     Δ      CPU time 1   CPU time 2
MELG19937-64    9603   0      4.2123       6.2920
MT19937-64      285    7820   5.1002       6.6490

Platforms (64-bit CPUs and OSs):
CPU time 1: Intel Core i7-3770 (3.40 GHz), Linux, gcc with -O3
CPU time 2: AMD Phenom II X6 1045T (2.70 GHz), Linux, gcc with -O3
Mersenne Twister (Matsumoto–Nishimura, 1998)
Use an incomplete array as a state space S (by discarding r bits);
Use an output function T with a single memory reference.
Mersenne Twister (Matsumoto–Nishimura, 1998)
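The state transition
$$(\mathbf{w}_i^{w-r}, \mathbf{w}_{i+1}, \mathbf{w}_{i+2}, \ldots, \mathbf{w}_{i+N-1}) \in S \;\overset{f}{\mapsto}\; (\mathbf{w}_{i+1}^{w-r}, \mathbf{w}_{i+2}, \mathbf{w}_{i+3}, \ldots, \mathbf{w}_{i+N}) \in S$$
is implemented as
$$\mathbf{w}_{i+N} := \mathbf{w}_{i+M} \oplus (\mathbf{w}_i^{w-r} \mid \mathbf{w}_{i+1}^{r}) A,$$
where $\mathbf{w} := (w_0, \ldots, w_{w-1})$, $\mathbf{a} := (a_0, \ldots, a_{w-1}) \in \mathbb{F}_2^w$ and
$$\mathbf{w}A := \begin{cases} (\mathbf{w} \gg 1) & \text{if } w_{w-1} = 0, \\ (\mathbf{w} \gg 1) \oplus \mathbf{a} & \text{if } w_{w-1} = 1. \end{cases}$$
Here $\mathbf{w}_i^{w-r}$ denotes the upper $w - r$ bits of $\mathbf{w}_i$, $\mathbf{w}_{i+1}^{r}$ the lower $r$ bits of $\mathbf{w}_{i+1}$, $\mid$ concatenation, $\gg$ and $\ll$ right and left bit shifts, and $\&$ bitwise AND.
The output function
$$(\mathbf{w}_{i+1}^{w-r}, \mathbf{w}_{i+2}, \ldots, \mathbf{w}_{i+N}) \in S \;\overset{o}{\mapsto}\; \mathbf{w}_{i+N} T \in O$$
is
$$\begin{aligned}
\mathbf{z} &\leftarrow \mathbf{w}_{i+N} \oplus (\mathbf{w}_{i+N} \gg u), \\
\mathbf{z} &\leftarrow \mathbf{z} \oplus ((\mathbf{z} \ll s) \mathbin{\&} \mathbf{b}), \\
\mathbf{z} &\leftarrow \mathbf{z} \oplus ((\mathbf{z} \ll t) \mathbin{\&} \mathbf{c}), \\
\mathbf{w}_{i+N} T &\leftarrow \mathbf{z} \oplus (\mathbf{z} \gg l).
\end{aligned}$$
Nishimura (2000) searched for specific parameters for w = 64.
♠ Note that f : S → S is simple but o : S → O is rather complicated.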
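The recursion and tempering above can be sketched in a few lines for w = 64. The constants below are the MT19937-64 parameters as they appear in Nishimura's widely distributed reference code and in C++ `std::mt19937_64`, quoted here as an assumption to be verified against that code. That implementation also applies an extra mask (called D below) in the first tempering step, which is not shown on the slide. The function names are illustrative.

```python
W, N, M, R = 64, 312, 156, 31
A = 0xB5026F5AA96619E9           # twist vector a
U, D = 29, 0x5555555555555555    # first tempering shift and its mask
S, B = 17, 0x71D67FFFEDA60000    # tempering shift s and mask b
T, C = 37, 0xFFF7EEE000000000    # tempering shift t and mask c
L = 43                           # final tempering shift l
MASK = (1 << W) - 1
LOWER = (1 << R) - 1             # lower r bits
UPPER = MASK ^ LOWER             # upper w - r bits

def seed_state(seed: int) -> list:
    """Fill the N-word array from a 64-bit seed (reference initialization)."""
    w = [seed & MASK]
    for i in range(1, N):
        prev = w[-1]
        w.append((6364136223846793005 * (prev ^ (prev >> 62)) + i) & MASK)
    return w

def temper(z: int) -> int:
    """Output transformation T: four shift-and-mask steps."""
    z ^= (z >> U) & D
    z ^= (z << S) & B
    z ^= (z << T) & C
    z ^= z >> L
    return z

def next_word(w: list, i: int):
    """Advance the recursion by one word; return (output, next index)."""
    x = (w[i] & UPPER) | (w[(i + 1) % N] & LOWER)
    xa = (x >> 1) ^ (A if x & 1 else 0)   # multiplication by the matrix A
    w[i] = w[(i + M) % N] ^ xa
    return temper(w[i]), (i + 1) % N

state = seed_state(5489)
first, idx = next_word(state, 0)
print(first)
```

The state update touches three words and costs a handful of operations, while the tempering needs four shift-and-mask steps on a single word, which is the imbalance noted on the slide.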
Our PRNGs: 64-bit MELGs
Use an incomplete array as a state space (by discarding r bits);
Use an extra state variable v0 and a double feedback (Panneton
et al., 2006);
Use an output function with several memory references (Harase,
2009).
Our PRNGs: 64-bit MELGs
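The state transition
$$(\mathbf{w}_i^{w-r}, \mathbf{w}_{i+1}, \mathbf{w}_{i+2}, \ldots, \mathbf{w}_{i+N-2}, \mathbf{v}_i) \in S \;\overset{f}{\mapsto}\; (\mathbf{w}_{i+1}^{w-r}, \mathbf{w}_{i+2}, \mathbf{w}_{i+3}, \ldots, \mathbf{w}_{i+N-1}, \mathbf{v}_{i+1}) \in S$$
is implemented as
$$\begin{aligned}
\mathbf{v}_{i+1} &:= (\mathbf{w}_i^{w-r} \mid \mathbf{w}_{i+1}^{r}) A \oplus \mathbf{w}_{i+M} \oplus \mathbf{v}_i B, \\
\mathbf{w}_{i+N-1} &:= (\mathbf{w}_i^{w-r} \mid \mathbf{w}_{i+1}^{r}) \oplus \mathbf{v}_{i+1} C.
\end{aligned}$$
Here, $\mathbf{w} := (w_0, \ldots, w_{w-1})$, $\mathbf{a} := (a_0, \ldots, a_{w-1}) \in \mathbb{F}_2^w$ and
$$\mathbf{w}A := \begin{cases} (\mathbf{w} \gg 1) & \text{if } w_{w-1} = 0, \\ (\mathbf{w} \gg 1) \oplus \mathbf{a} & \text{if } w_{w-1} = 1, \end{cases}$$
$$\mathbf{w}B := \mathbf{w} \oplus (\mathbf{w} \ll b), \qquad \mathbf{w}C := \mathbf{w} \oplus (\mathbf{w} \gg c).$$
The output function $o : S \to O$ is defined by
$$(\mathbf{w}_{i+1}^{w-r}, \mathbf{w}_{i+2}, \ldots, \mathbf{w}_{i+N-1}, \mathbf{v}_{i+1}) \in S \;\overset{o}{\mapsto}\; \mathbf{w}_{i+N-1} T_1 \oplus \mathbf{w}_{i+L} T_2 \in O,$$
$$\mathbf{w}T_1 \leftarrow \mathbf{w} \oplus (\mathbf{w} \ll t), \qquad \mathbf{w}T_2 \leftarrow (\mathbf{w} \mathbin{\&} \mathbf{b}).$$
♠ We shift the balance of costs in f and o without loss of speed.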
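The structure of this recursion can be sketched as follows. All numeric parameters below (array length, lags, shift amounts, the twist vector and the output mask) are hypothetical toy values chosen only so that the code runs; they are not the published MELG19937-64 parameters, which should be taken from the authors' C code. The shift directions in B, C and T1 are also an assumption, since the shift symbols are not legible in the slide text. The sketch then checks that the step is F2-linear.

```python
W = 64
MASK = (1 << W) - 1

# Hypothetical toy parameters, for illustration only.
N_WORDS = 4                    # length of the word array (N - 1 in the slide)
M = 2                          # middle lag
LAG = 1                        # output lag L
R = 31                         # number of discarded bits r
A = 0x9E3779B97F4A7C15         # twist vector a (hypothetical)
SHIFT_B, SHIFT_C, SHIFT_T = 23, 33, 16   # shift amounts (hypothetical)
MASK_B = 0x0F0F0F0F0F0F0F0F    # output bit mask (hypothetical)

LOWER = (1 << R) - 1
UPPER = MASK ^ LOWER

def step(words, v, i):
    """One step of f followed by o; returns (words, v, i, output)."""
    words = list(words)
    x = (words[i] & UPPER) | (words[(i + 1) % N_WORDS] & LOWER)
    xa = (x >> 1) ^ (A if x & 1 else 0)                 # x A
    v = xa ^ words[(i + M) % N_WORDS] ^ v ^ ((v << SHIFT_B) & MASK)  # + v B
    words[i] = x ^ v ^ (v >> SHIFT_C)                   # x + v C
    t1 = words[i] ^ ((words[i] << SHIFT_T) & MASK)      # T1 on the new word
    t2 = words[(i + LAG) % N_WORDS] & MASK_B            # T2 on a lagged word
    return words, v, (i + 1) % N_WORDS, t1 ^ t2

# Linearity check: stepping the XOR of two states equals the XOR of the steps.
w1 = [0x0123456789ABCDEF, 0xFEDCBA9876543210, 0xDEADBEEFCAFEBABE, 0x1]
w2 = [0x1111111111111111, 0x8000000000000001, 0x0, 0xFFFFFFFFFFFFFFFF]
v1, v2 = 0x55AA55AA55AA55AA, 0x3

ws, vs, _, outs = step([a ^ b for a, b in zip(w1, w2)], v1 ^ v2, 0)
wa, va, _, outa = step(w1, v1, 0)
wb, vb, _, outb = step(w2, v2, 0)
linear = (ws == [a ^ b for a, b in zip(wa, wb)]
          and vs == va ^ vb and outs == outa ^ outb)
print(linear)
```

Compared with the Mersenne Twister recursion, the state update does more work (two feedback terms), while the output function is reduced to one shift, one mask and one extra memory reference.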
Comparison to SFMT19937-64
SIMD-oriented Fast Mersenne Twister (SFMT) generators (Saito and
Matsumoto, 2008) have a function to generate 64-bit unsigned integers.
Thus, we compare 64-bit PRNGs with 64-bit integer output sequences of
SFMT19937, denoted by SFMT19937-64.
N1, Δ and CPU time (in seconds) taken to generate $10^9$ 64-bit unsigned integers

Generators                      N1     Δ       CPU time 1   CPU time 2
MELG19937-64                    9603   0       4.2123       6.2920
MT19937-64                      285    7820    5.1002       6.6490
MT19937-64 (ID3: 5-term)        5795   7940    4.8993       6.7930
SFMT19937-64 (without SIMD)     6711   14095   4.2654       5.6123
SFMT19937-64 (with SIMD)        6711   14095   1.8457       2.8806

Platforms (64-bit CPUs and OSs):
CPU time 1: Intel Core i7-3770 (3.40 GHz), Linux, gcc with -O3
CPU time 2: AMD Phenom II X6 1045T (2.70 GHz), Linux, gcc with -O3
SFMT19937 is very fast but ∆ for SFMT19937 is large.
Conclusion
We designed 64-bit maximally equidistributed F2-linear PRNGs and
searched for specific parameters with period lengths from $2^{607} - 1$ to
$2^{44497} - 1$.
The high-dimensional uniformity is completely optimized, and the
generation speed is still competitive with that of the 64-bit Mersenne Twisters
(Nishimura, 2000) on some platforms.
We also implemented a jump-ahead algorithm to obtain disjoint streams
in parallel computing. (The default skip size is $2^{256}$.)
The design of PRNGs is a trade-off between speed and quality. Our
generators offer both high performance and computational efficiency.
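The idea behind jump-ahead for F2-linear generators can be illustrated on a toy example. Since the characteristic polynomial P satisfies P(f) = 0, the map f^J equals g(f) with g(x) = x^J mod P(x), and g(f) applied to a state needs only deg P - 1 applications of f. The sketch below uses a hypothetical 5-bit shift register with P(x) = x^5 + x^2 + 1 (an assumption for illustration); it is the textbook polynomial method and not necessarily the exact algorithm in the authors' code.

```python
P = 5                 # toy state dimension
P_POLY = 0b100101     # characteristic polynomial x^5 + x^2 + 1 as a bit mask

def f(s: int) -> int:
    """Toy 5-bit LFSR with recurrence a[n+5] = a[n+2] ^ a[n]."""
    new_bit = (s ^ (s >> 2)) & 1
    return (s >> 1) | (new_bit << (P - 1))

def poly_mul_mod(a: int, b: int, mod: int) -> int:
    """Multiply two F2 polynomials (bit masks) modulo mod."""
    deg = mod.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:   # reduce as soon as the degree reaches deg
            a ^= mod
    return result

def x_pow_mod(j: int, mod: int) -> int:
    """Compute x^j mod mod by square-and-multiply."""
    result, base = 1, 0b10   # the polynomials 1 and x
    while j:
        if j & 1:
            result = poly_mul_mod(result, base, mod)
        base = poly_mul_mod(base, base, mod)
        j >>= 1
    return result

def jump(s: int, j: int) -> int:
    """Return f^j(s) using g(f)(s) with g = x^j mod P, via Horner's rule."""
    g = x_pow_mod(j, P_POLY)
    acc = 0
    for i in reversed(range(P)):
        acc = f(acc)
        if (g >> i) & 1:
            acc ^= s
    return acc

def iterate(s: int, j: int) -> int:
    """Reference: apply f to s exactly j times."""
    for _ in range(j):
        s = f(s)
    return s

print(jump(1, 12345) == iterate(1, 12345))
```

The cost of `jump` depends on the state dimension and on the bit length of J, not on J itself, which is what makes very large skip sizes feasible.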
Reference:
S. Harase and T. Kimoto, “Implementing 64-bit maximally
equidistributed F2-linear generators with Mersenne prime period”,
ACM Trans. Math. Software 44 (2018), no. 3, Art. 30, 11 pp.
The code in C:
https://github.com/sharase/melg-64
