
Parallel Random Generator - GDC 2015


Generating random numbers in a highly parallel program is surprisingly non-trivial. Many good generators carry a lot of state and are purely serial. Simple generators like LCG can leapfrog ahead, but they are of limited quality and their output depends on the number of cores. We want our code to be independent of the degree of parallelism.

Published in: Data & Analytics

  1. Parallel Random Generator. Manny Ko, Principal Engineer, Activision
  2. Outline ● Serial RNG ● Background ● LCG, LFG, crypto-hash ● Parallel RNG ● Leapfrog, splitting, crypto-hash
  3. RNG desiderata ● White-noise-like output ● Repeatable for any number of cores ● Fast ● Small storage
  4. RNG Quality ● DIEHARD ● Spectral test ● SmallCrush ● BigCrush [figure label: GPUBBS]
  5. Power Spectrum [figure panels: power spectrum density, radial mean, radial variance]
  6. Serial RNG: LCG ● Linear congruential generator (LCG) ● $X_i = (a X_{i-1} + c) \bmod M$ ● $a$, $c$ and $M$ must be chosen carefully! ● Never choose $M = 2^{31}$! $M$ should be a prime ● Park & Miller: $a = 16807$, $m = 2147483647 = 2^{31} - 1$; $m$ is a Mersenne prime! ● Most likely in your C runtime
  7. LCG: the good and the bad ● Good: ● Simple and efficient even if we use mod ● Single word of state ● Bad: ● Short period, at most $m$ ● Low bits are correlated, especially if $m = 2^n$ ● Purely serial
  8. LCG: a bad choice ● $X_{k+1} = (3 X_k + 4) \bmod 8$ ● {1, 7, 1, 7, …}: starting from 1, the period is only 2
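The recurrence on slides 6 to 8 can be sketched in a few lines of Python. This is only an illustration: the helper names `lcg` and `cycle_length` are hypothetical, and the constants are the ones quoted on the slides.

```python
from itertools import islice

def lcg(seed, a, c, m):
    """Yield X_1, X_2, ... with X_i = (a * X_{i-1} + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def cycle_length(seed, a, c, m):
    """Length of the cycle the generator falls into (brute force, small m only)."""
    seen = {}
    for i, x in enumerate(lcg(seed, a, c, m)):
        if x in seen:
            return i - seen[x]
        seen[x] = i

# Slide 8's parameters: starting from 1 the outputs alternate 7, 1, 7, 1, ...
bad = list(islice(lcg(1, 3, 4, 8), 4))
bad_period = cycle_length(1, 3, 4, 8)

# Park & Miller's generator from slide 6 (c = 0, prime modulus 2**31 - 1).
minstd = lcg(1, 16807, 0, 2**31 - 1)
```

With a modulus of 8 there are only eight possible states, and this particular choice of `a` and `c` visits just two of them.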
  9. Mersenne prime modulo ● IDIV can take 40 to 80 cycles for a 32-bit/32-bit divide ● $k \bmod p$ where $p = 2^s - 1$: ● i = (k & p) + (k >> s); ● return (i >= p) ? i - p : i;
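A minimal sketch of the division-free reduction from slide 9, written in Python so it can be checked against the `%` operator. The function names are hypothetical. The single conditional subtraction is only enough while `k < 2**(2*s) - 1`, which covers the product of two values below `p`; in C the product `16807 * x` would need a 64-bit intermediate.

```python
def mod_mersenne(k, s):
    """k mod (2**s - 1) using mask, shift and add; valid for 0 <= k < 2**(2*s) - 1."""
    p = (1 << s) - 1
    # 2**s is congruent to 1 mod p, so the high part can be folded onto the low part.
    i = (k & p) + (k >> s)
    return i - p if i >= p else i

def minstd_step(x):
    """One Park & Miller step, x -> 16807 * x mod (2**31 - 1), without a division."""
    return mod_mersenne(16807 * x, 31)
```

Note the parentheses: in C, `+` and `>>` bind more tightly than `&`, so the unparenthesized form would compute something else.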
  10. Lagged-Fibonacci Generator ● $X_i = X_{i-p} \star X_{i-q}$; $p$ and $q$ are the lags ● $\star$ is one of $+$, $-$, $\times$ mod $M$ (or XOR) ● ALFG: $X_n = (X_{n-j} + X_{n-k}) \bmod 2^m$ ● $\times$ gives the best quality ● Period $= (2^p - 1)\,2^{b-3}$; $M = 2^b$
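The additive variant (ALFG) on slide 10 can be sketched as follows. The function name `alfg`, the seed list and the lag pair (5, 17) in the usage line are illustrative assumptions; the slides do not specify lags. The state is a window of the last `k` values, which is the storage cost the talk mentions.

```python
from collections import deque
from itertools import islice

def alfg(seed, j, k, bits=32):
    """Additive lagged-Fibonacci: X_n = (X_{n-j} + X_{n-k}) mod 2**bits, with j < k.

    `seed` supplies the first k values, oldest first.
    """
    if not 0 < j < k or len(seed) != k:
        raise ValueError("need 0 < j < k and exactly k seed values")
    mask = (1 << bits) - 1
    state = deque((x & mask for x in seed), maxlen=k)  # state[-1] is X_{n-1}
    while True:
        x = (state[-j] + state[-k]) & mask
        state.append(x)  # maxlen=k drops the oldest value, X_{n-k}
        yield x

# Hypothetical usage: lags (5, 17), 32-bit words, placeholder seed values.
gen = alfg(list(range(1, 18)), 5, 17)
first = list(islice(gen, 4))
```

With lags (1, 2) and seed [1, 1] the same function reproduces the ordinary Fibonacci numbers modulo 2**bits, which is a convenient sanity check.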
  11. LFG ● The good: ● Very efficient: 2 ops + power-of-2 mod ● Much longer period than LCG ● Works directly in floats ● Higher quality than LCG ● ALFG can skip ahead
  12. LFG: the bad ● Need to store max(p, q) floats ● Purely sequential ● Multiplicative LFG can't jump ahead
  13. Mersenne Twister ● Gold standard? ● Large state (624 ints) ● Lots of flops ● Hard to leapfrog ● Limited parallelism [figure: power spectrum]
  14. End of Basic RNG Overview
  15. Parallel RNG ● Maintain the RNG's quality ● Same result regardless of the number of cores ● Minimal state, especially for GPU ● Minimal correlation among the streams
  16. Random Tree • 2 LCGs with different $a$ • L is used to generate a seed for R • No need to know how many generators there are or how many values each thread uses
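A rough sketch of the random-tree idea from slide 16: one LCG (L) hands out seeds and a second LCG (R) with a different multiplier produces each thread's values. The slide does not give the multipliers; 16807 and 48271 with modulus 2**31 - 1 are assumptions chosen for illustration, and the function names are hypothetical.

```python
M = 2**31 - 1
A_L = 16807   # assumed multiplier for the seed generator L
A_R = 48271   # assumed multiplier for the per-thread generator R

def spawn_seeds(root_seed, n_streams):
    """Step L once per stream; each L value becomes the seed of one R generator."""
    seeds, x = [], root_seed
    for _ in range(n_streams):
        x = (A_L * x) % M
        seeds.append(x)
    return seeds

def stream(seed, count):
    """Values produced by one thread's R generator."""
    out, x = [], seed
    for _ in range(count):
        x = (A_R * x) % M
        out.append(x)
    return out

seeds = spawn_seeds(1, 4)
per_thread = [stream(s, 3) for s in seeds]
```

More streams can be spawned later by stepping L further, and each stream can run for as long as it likes. Nothing in this construction rules out overlap or correlation between streams.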
  17. Leapfrog with 3 cores • Each thread leaps ahead by $N$ using L • Each thread uses its own R to generate its own sequence • $N = \mathit{cores} \times \mathit{seqpercore}$
  18. Leapfrog ● Basic LCG without $c$: ● $L_{k+1} = a L_k \bmod m$ ● $R_{k+1} = a^n R_k \bmod m$ ● LCG with $c$: $A = a^n$ and $C = c\,(a^n - 1)/(a - 1)$; each core jumps ahead by $n$ (the number of cores)
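The leapfrog formulas on slide 18 can be checked with a small Python sketch. The helper names are hypothetical. Instead of evaluating $C = c\,(a^n - 1)/(a - 1)$ with a division, the sketch composes the map $x \mapsto a x + c$ with itself $n$ times, which yields the same $A$ and $C$ modulo $m$.

```python
def lcg_serial(seed, a, c, m, count):
    """First `count` values of the serial LCG x -> (a*x + c) mod m."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out

def leapfrog_params(a, c, m, n):
    """(A, C) such that x -> A*x + C equals n applications of x -> a*x + c (mod m)."""
    A, C = 1, 0
    for _ in range(n):
        A, C = (a * A) % m, (a * C + c) % m
    return A, C

def leapfrog_stream(seed, a, c, m, n, rank, count):
    """Serial values number rank, rank + n, rank + 2n, ... (rank in 0..n-1)."""
    A, C = leapfrog_params(a, c, m, n)
    x = seed
    for _ in range(rank + 1):          # L: walk to this core's first value
        x = (a * x + c) % m
    out = [x]
    for _ in range(count - 1):         # R: jump n serial steps at a time
        x = (A * x + C) % m
        out.append(x)
    return out

# Three cores, four values each, Park & Miller constants (c = 0).
m, a, n = 2**31 - 1, 16807, 3
streams = [leapfrog_stream(1, a, 0, m, n, r, 4) for r in range(n)]
interleaved = [streams[i % n][i // n] for i in range(4 * n)]
```

Interleaving the per-core streams round-robin reproduces the serial sequence, but only for this particular `n`; a different core count hands each core a different subsequence.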
  19. Leapfrog with 3 cores • The sequences will not overlap • The final sequence is the same as the serial code's
  20. Leapfrog: the good ● Same sequence as serial code ● Limited choice of RNG (e.g. no MLFG) ● No need to fix the number of random values used per core (need to fix 'n')
  21. Leapfrog: the bad ● $a^p$ no longer has the good qualities of $a$ ● Power-of-2 $N$ produces correlated sub-sequences ● Need to fix 'n', the number of generators/sequences ● The period of the original RNG is shortened by a factor of 'n'; a 32-bit LCG has a short period to start with
  22. Sequence Splitting • If we know the number of values per thread, $n$: • $L_{k+1} = a^n L_k \bmod m$ • $R_{k+1} = a R_k \bmod m$ • The sequence is a subset of the serial code's
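Sequence splitting for a multiplicative LCG can be sketched the same way. Here L jumps by $a^n$ to the start of each thread's block and R then steps by $a$ inside the block. The function names are hypothetical, and Python's three-argument `pow` does the modular exponentiation.

```python
def serial(seed, a, m, count):
    """First `count` values of the multiplicative LCG x -> a*x mod m."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x) % m
        out.append(x)
    return out

def split_stream(seed, a, m, n, rank, count):
    """Up to n consecutive serial values belonging to thread `rank`."""
    if count > n:
        raise ValueError("a thread may not run past its block of n values")
    x = (pow(a, n * rank, m) * seed) % m   # L: jump over `rank` blocks of n values
    out = []
    for _ in range(count):                 # R: ordinary serial steps inside the block
        x = (a * x) % m
        out.append(x)
    return out

# Three threads with blocks of five values, Park & Miller constants.
m, a, n = 2**31 - 1, 16807, 5
blocks = [split_stream(1, a, m, n, r, n) for r in range(3)]
```

Concatenating the blocks gives back the serial sequence, which is why the block length has to be known up front.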
  23. Leapfrog and Splitting ● Only guarantee that the sequences do not overlap; they say nothing about quality ● Not invariant to the degree of parallelism ● Results change when the number of cores changes ● Serial and parallel code do not match
  24. Lagged-Fibonacci Leapfrog ● LFG has a very long period ● Period $= (2^p - 1)\,2^{b-3}$; $M = 2^b$ ● $M$ can be a power of two! ● Much better quality than LCG ● No leapfrog for the best variant, '*' ● Luckily the ALFG supports leapfrogging
  25. Issues with Leapfrog & Splitting ● LCG's period gets even shorter ● Questionable quality ● ALFG is much better but has to store more state for the 'lag'
  26. Crypto Hash ● MD5 ● TEA: Tiny Encryption Algorithm
  27. Core Idea 1. Input is trivially prepared in parallel, e.g. a linear ramp 2. Feed each input value into the hash, independently and in parallel 3. Output is white noise [figure labels: input, hash, output]
  28. TEA ● A Feistel coder ● Input is split into L and R ● 128-bit key ● F: shifts and XORs or adds
  29. TEA
  30. Magic 'delta' ● $\mathit{delta} = (\sqrt{5} - 1)\,2^{31}$ ● Avalanche in 6 cycles (often in 4) ● * mixes better than ^ but makes TEA twice as slow
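The hash-based approach can be sketched with the standard TEA encryption routine used as a stateless function of the sample index. The wrapper name `hash_rand`, the key words and the default of 8 cycles in the wrapper are assumptions made for illustration (the slides only say avalanche is reached in about 6 cycles). Full TEA uses 32 cycles.

```python
import math

DELTA = 0x9E3779B9   # integer part of (sqrt(5) - 1) * 2**31
MASK = 0xFFFFFFFF    # emulate 32-bit unsigned wraparound

def tea_encrypt(v0, v1, key, cycles=32):
    """TEA block encryption of the 64-bit block (v0, v1) under four 32-bit key words."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(cycles):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1

# Arbitrary placeholder key; any four 32-bit words would do.
KEY = (0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E)

def hash_rand(index, key=KEY, cycles=8):
    """Map an integer index to a float in [0, 1) with no shared state."""
    v0, _ = tea_encrypt(index & MASK, (index >> 32) & MASK, key, cycles)
    return v0 / 2**32

# A linear ramp of inputs, each hashed independently of the others.
samples = [hash_rand(i) for i in range(16)]
```

Because each value depends only on its index and the key, the result is the same whichever thread evaluates it and in whatever order, which is the invariance to core count the talk asks for.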
  31. Applications ● Fractal terrain (vertex shader) ● Texture tiling (fragment shader)
  32. SPRNG ● Good package by Michael Mascagni ● http://www.sprng.org/
  33. References ● [Mascagni 99] Some Methods for Parallel Pseudorandom Number Generation, 1999. ● [Park & Miller 88] Random Number Generators: Good Ones Are Hard to Find, CACM, 1988. ● [Pryor 94] Implementation of a Portable and Reproducible Parallel Pseudorandom Number Generator, SC, 1994. ● [Tzeng & Li 08] Parallel White Noise Generation on a GPU via Cryptographic Hash, I3D, 2008. ● [Wheeler 95] TEA, a Tiny Encryption Algorithm, 1995.
  34. Takeaways ● Look beyond LCG ● ALFG is worth a closer look ● Crypto-based hashing is the most promising, especially TEA
