6. Serial RNG: LCG
- Linear congruential generator (LCG)
- $x_i = (a \cdot x_{i-1} + c) \bmod M$
- a, c and M must be chosen carefully!
- Never choose $M = 2^{31}$; M should be a prime
- Park & Miller: $a = 16807$, $M = 2147483647 = 2^{31} - 1$. M is a Mersenne prime!
- Most likely in your C runtime
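A minimal sketch of the Park & Miller generator described on this slide, assuming c = 0 (the multiplicative form) and a seed in 1..M-1. The function names `minstd_next` and `minstd_stream` are illustrative, not taken from any library.

```python
M = 2**31 - 1   # Mersenne prime modulus from the slide
A = 16807       # Park & Miller multiplier (7**5)

def minstd_next(x):
    """Advance one step: x_i = (A * x_{i-1}) mod M, with c = 0."""
    return (A * x) % M

def minstd_stream(seed, count):
    """Return the `count` values that follow `seed` (seed must be in 1..M-1)."""
    if not 0 < seed < M:
        raise ValueError("seed must be in 1..M-1")
    values = []
    x = seed
    for _ in range(count):
        x = minstd_next(x)
        values.append(x)
    return values

if __name__ == "__main__":
    print(minstd_stream(1, 3))
```

Park & Miller give a check value for implementations: starting from seed 1, the value after 10000 steps should be 1043618065.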
7. LCG: the good and bad
- Good:
  - Simple and efficient, even if we use mod
  - Single word of state
- Bad:
  - Short period: at most M
  - Low bits are correlated, especially if $M = 2^k$
  - Purely serial
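The low-bit problem can be seen directly with a small experiment. The sketch below assumes a power-of-2 modulus $2^{31}$ and the widely quoted constants a = 1103515245, c = 12345; these are used purely for illustration and are not taken from the slides.

```python
def lcg_pow2(x, a=1103515245, c=12345, k=31):
    """One step of an LCG with modulus 2**k (illustrative constants)."""
    return (a * x + c) & ((1 << k) - 1)

def low_bits(seed, count, bits=1):
    """Return the lowest `bits` bits of the next `count` outputs."""
    mask = (1 << bits) - 1
    out = []
    x = seed
    for _ in range(count):
        x = lcg_pow2(x)
        out.append(x & mask)
    return out

if __name__ == "__main__":
    print(low_bits(12345, 8))          # lowest bit simply alternates
    print(low_bits(12345, 8, bits=2))  # lowest two bits repeat every 4 steps
```

With a power-of-2 modulus, the lowest j bits of the state form their own LCG modulo $2^j$, so they can never have a period longer than $2^j$.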
9. Mersenne Prime modulo
- IDIV can take 40-80 cycles for a 32-bit/32-bit divide
- $x \bmod M$ where $M = 2^k - 1$:
  - x = (x & M) + (x >> k);
  - return (x >= M) ? x - M : x;
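The division-free reduction can be sketched as follows. It relies on $2^k \equiv 1 \pmod{M}$ for $M = 2^k - 1$, so the high part of x can be folded onto the low part. The name `mod_mersenne` and the input bound are assumptions of this sketch: a single fold plus one conditional subtract is only guaranteed here for $0 \le x < M^2$, which covers the product of two values below M.

```python
def mod_mersenne(x, k):
    """Return x mod (2**k - 1) using a mask, a shift, an add and one compare.

    Assumes 0 <= x < M*M, so that a single fold is enough.
    """
    m = (1 << k) - 1
    if not 0 <= x < m * m:
        raise ValueError("x must satisfy 0 <= x < M*M")
    x = (x & m) + (x >> k)        # low part + high part
    return x - m if x >= m else x  # at most one subtraction is needed

if __name__ == "__main__":
    M = 2**31 - 1
    product = 16807 * 12345678
    print(mod_mersenne(product, 31), product % M)  # the two values agree
```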
10. Lagged-Fibonacci Generator
- $x_i = x_{i-p} \otimes x_{i-q}$; p and q are the lags
- $\otimes$ is +, -, * mod M (or XOR)
- ALFG: $x_i = (x_{i-p} + x_{i-q}) \bmod 2^k$
- * gives the best quality
- Period $= (2^p - 1)\,2^{k-3}$; $M = 2^k$
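A minimal sketch of the additive variant (ALFG), $x_i = (x_{i-p} + x_{i-q}) \bmod 2^k$. The helper name `alfg_values` is illustrative, and the caller is assumed to supply exactly max(p, q) seed values; how those seed values are chosen matters for quality and is outside this sketch.

```python
from collections import deque

def alfg_values(seed, p, q, k, count):
    """Return `count` ALFG outputs given the first max(p, q) values in `seed`."""
    lag = max(p, q)
    if len(seed) != lag:
        raise ValueError("need exactly max(p, q) seed values")
    mask = (1 << k) - 1
    state = deque((s & mask for s in seed), maxlen=lag)  # last `lag` values
    out = []
    for _ in range(count):
        x = (state[-p] + state[-q]) & mask  # x_{i-p} + x_{i-q} mod 2**k
        state.append(x)                     # oldest value drops off automatically
        out.append(x)
    return out

if __name__ == "__main__":
    # Lags (2, 1) with modulus 2**4 give Fibonacci numbers mod 16.
    print(alfg_values([1, 1], 2, 1, 4, 8))
```

Each output costs one add and one mask, and the state is the last max(p, q) values.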
11. LFG
- The good:
  - Very efficient: 2 ops + power-of-2 mod
  - Much longer period than LCG
  - Works directly on floats
  - Higher quality than LCG
  - ALFG can skip ahead
12. LFG - the bad
- Need to store max(p,q) floats
- Purely sequential: the multiplicative LFG can't jump ahead.
13. Mersenne Twister
- Gold standard?
- Large state (624 ints)
- Lots of ops
- Hard to leapfrog
- Limited parallelism
[Figure: power spectrum]
15. Parallel RNG
- Maintain the RNG's quality
- Same result regardless of the # of cores
- Minimal state, especially for GPU
- Minimal correlation among the streams
16. Random Tree
• 2 LCGs with different $a$
• L is used to generate a seed for R
• No need to know how many generators there are or the # of values per thread
• GG
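The random-tree idea can be sketched with two multiplicative LCGs that share a modulus but use different multipliers: L hands out seeds, and each seed starts its own R stream. The multiplier 48271 for R is an assumption of this sketch (another multiplier often paired with $2^{31} - 1$); the slides do not say which values were used. The function names are illustrative.

```python
M = 2**31 - 1
A_L = 16807   # multiplier of the "left" generator that produces seeds
A_R = 48271   # multiplier of the "right" generator (assumed for this sketch)

def spawn_seeds(root_seed, count):
    """Use L to produce one seed per new generator."""
    seeds = []
    x = root_seed
    for _ in range(count):
        x = (A_L * x) % M
        seeds.append(x)
    return seeds

def r_stream(seed, count):
    """Use R to produce the random values of one generator."""
    out = []
    x = seed
    for _ in range(count):
        x = (A_R * x) % M
        out.append(x)
    return out

if __name__ == "__main__":
    for seed in spawn_seeds(1, 3):
        print(seed, r_stream(seed, 4))
```

New generators can be spawned at any time by stepping L once more, so neither the number of generators nor the number of values per thread has to be fixed in advance.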
17. Leapfrog with 3 cores
• Each thread leaps ahead by $n$ using L
• Each thread uses its own R to generate its own sequence
• $n = \text{cores} \times \text{seq per core}$
18. Leapfrog
- Basic LCG without c:
  - $L_{i+1} = a L_i \bmod M$
  - $R_{i+1} = a^n R_i \bmod M$
- LCG: $A = a^n$ and $C = c(a^n - 1)/(a - 1)$; each core jumps ahead by n (# of cores)
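A sketch of leapfrogging for a general LCG $x \mapsto (a x + c) \bmod m$. Instead of dividing by (a - 1), which needs care under a modulus, this version builds A and C by composing the serial step n times, which yields the same $A = a^n$ and $C = c(a^n - 1)/(a - 1)$. The function names and the demo constants are assumptions made for illustration.

```python
def lcg_serial(seed, a, c, m, count):
    """Reference serial sequence: the `count` values after `seed`."""
    out = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out

def leapfrog_params(a, c, m, n):
    """Return (A, C) so that x -> A*x + C equals n serial steps."""
    A, C = 1, 0
    for _ in range(n):
        A = (a * A) % m        # accumulates a**n
        C = (a * C + c) % m    # accumulates c*(a**(n-1) + ... + a + 1)
    return A, C

def leapfrog_stream(seed, a, c, m, n, rank, count):
    """Serial values number rank, rank+n, rank+2n, ... (0-based)."""
    A, C = leapfrog_params(a, c, m, n)
    x = lcg_serial(seed, a, c, m, rank + 1)[-1]  # first value owned by this rank
    out = []
    for _ in range(count):
        out.append(x)
        x = (A * x + C) % m
    return out

if __name__ == "__main__":
    a, c, m = 1103515245, 12345, 2**31   # illustrative constants
    for rank in range(3):
        print(rank, leapfrog_stream(42, a, c, m, 3, rank, 4))
```

Interleaving the three streams value by value gives back the serial sequence.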
19. Leapfrog with 3 cores
• The sequences do not overlap
• The final sequence is the same as that of the serial code
20. Leapfrog - the good
- Same sequence as serial code
- Limited choice of RNG (e.g. no MLFG)
- No need to fix the # of random values used per core (but need to fix "n")
21. Leapfrog - the bad
- $a^n$ no longer has the good qualities of $a$
- A power-of-2 n produces correlated subsequences
- Need to fix "n", the # of generators/sequences
- The period of the original RNG is shortened by a factor of "n". A 32-bit LCG has a short period to start with.
22. Sequence Splitting
• If we know the # of values per thread, $n$
• $L_{i+1} = a^n L_i \bmod M$
• $R_{i+1} = a R_i \bmod M$
• Each thread's sequence is a subset of the serial code's sequence
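Sequence splitting can be sketched for a multiplicative LCG (c = 0): the start state of each thread is obtained by jumping n steps at a time with $a^n$, and each thread then runs the ordinary generator for its n values. The function names are illustrative, and the demo parameters (Park & Miller's a and M, n = 5, 3 threads) are chosen only as an example.

```python
def lcg_block_starts(seed, a, m, n, threads):
    """Start states L_0, L_1, ... with L_{i+1} = a**n * L_i mod m."""
    jump = pow(a, n, m)   # a**n mod m, computed once
    starts = []
    x = seed
    for _ in range(threads):
        starts.append(x)
        x = (jump * x) % m
    return starts

def lcg_block(start, a, m, n):
    """The n values that follow `start` under R_{i+1} = a * R_i mod m."""
    out = []
    x = start
    for _ in range(n):
        x = (a * x) % m
        out.append(x)
    return out

if __name__ == "__main__":
    M, A = 2**31 - 1, 16807
    for start in lcg_block_starts(1, A, M, 5, 3):
        print(lcg_block(start, A, M, 5))
```

Concatenating the blocks in thread order reproduces the first n × threads values of the serial sequence.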
23. Leapfrog and Splitting
- Only guarantee that the sequences do not overlap; nothing about their quality
- Not invariant to the degree of parallelism
  - Results change when the # of cores changes
  - Serial and parallel code do not match
24. Lagged-Fibonacci Leapfrog
- LFG has a very long period
  - Period $= (2^p - 1)\,2^{k-3}$; $M = 2^k$
  - M can be a power of two!
- Much better quality than LCG
- No leapfrog for the best variant, "*"
- Luckily the ALFG supports leapfrogging
25. Issues with Leapfrog & Splitting
- LCG's period gets even shorter
- Questionable quality
- ALFG is much better but has to store more state for the "lag".
27. Core Idea
1. Input is trivially prepared in parallel, e.g. a linear ramp
2. Feed each input value into the hash, independently and in parallel
3. Output is white noise
[Figure: input → hash → output]
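The three steps can be sketched with any well-mixing hash. The version below uses SHA-256 from Python's standard `hashlib` purely as a stand-in; the choice of hash, the `key` bytes and the function names are assumptions of this sketch, not the method from the slides.

```python
import hashlib
import struct

def hash_noise(index, key=b"demo-key"):
    """Map an integer index to a float in [0, 1) by hashing it."""
    digest = hashlib.sha256(key + struct.pack("<Q", index)).digest()
    (word,) = struct.unpack("<I", digest[:4])  # take 32 bits of the digest
    return word / 2**32

def noise_ramp(count, start=0):
    """Hash a linear ramp start, start+1, ...; each value is independent."""
    return [hash_noise(i) for i in range(start, start + count)]

if __name__ == "__main__":
    print(noise_ramp(4))
```

Because each output depends only on its own index, the values can be computed in any order and on any number of cores with identical results, and no state is carried between calls.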
28. TEA
- A Feistel cipher
- Input is split into L and R
- 128-bit key
- F: shifts, XORs and adds
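A sketch of TEA encryption used as a hash of an integer index. The round function follows the published TEA description (64-bit block as two 32-bit halves, 128-bit key as four 32-bit words, constant 0x9E3779B9); the key values, the 32-round default and the helper `tea_noise` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9  # TEA round constant

def tea_encrypt(v0, v1, key, rounds=32):
    """Encrypt the 64-bit block (v0, v1) with a key of four 32-bit words."""
    k0, k1, k2, k3 = key
    v0 &= MASK32
    v1 &= MASK32
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK32
        # Each half is updated with shifts, adds and XORs of the other half.
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1

def tea_noise(index, key, rounds=32):
    """Hash a 64-bit index to a float in [0, 1) using one half of the output."""
    v0, _ = tea_encrypt(index & MASK32, (index >> 32) & MASK32, key, rounds)
    return v0 / 2**32

if __name__ == "__main__":
    KEY = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # hypothetical key
    print([tea_noise(i, KEY) for i in range(4)])
```

Fewer rounds cost less per value but mix less thoroughly; the `rounds` parameter is exposed so that trade-off can be explored.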
33. References
- [Mascagni 99] Some Methods for Parallel Pseudorandom Number Generation, 1999.
- [Park & Miller 88] Random Number Generators: Good Ones Are Hard to Find, CACM, 1988.
- [Pryor 94] Implementation of a Portable and Reproducible Parallel Pseudorandom Number Generator, SC, 1994.
- [Tzeng & Li 08] Parallel White Noise Generation on a GPU via Cryptographic Hash, I3D, 2008.
- [Wheeler 95] TEA, a Tiny Encryption Algorithm, 1995.
34. Take Aways
- Look beyond LCG
- ALFG is worth a closer look
- Crypto-based hash is the most promising, especially TEA.