Computational Randomness:
Controlled Chaos in an Organized Machine
Amanda Sopkin @amandasopkin
Milan | November 29 - 30, 2018
Who am I?
@amandasopkin
Slides are online
Where are we headed?
Why python?
History of Cryptography
1900 BC: Cryptographic
hieroglyphics
100 BC: Julius Caesar
Bread and Butter of cryptography
1500s: Vigenère’s system
To encrypt, a table of alphabets can be used, termed a
tabula recta, Vigenère square or Vigenère table. It has
the alphabet written out 26 times in different rows,
each alphabet shifted cyclically to the left compared
to the previous alphabet, corresponding to the 26
possible Caesar ciphers. At different points in the
encryption process, the cipher uses a different
alphabet from one of the rows. The alphabet used at
each point depends on a repeating keyword
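The table lookup described above amounts to shifting each letter by the alphabet position of the matching keyword letter. A minimal Python sketch, assuming a plain A-Z alphabet and a letters-only keyword; the function name `vigenere` is illustrative and not from the talk:

```python
import string

def vigenere(text, keyword, decrypt=False):
    """Shift each ASCII letter by the matching keyword letter (A=0 ... Z=25)."""
    shifts = [string.ascii_uppercase.index(k) for k in keyword.upper()]
    out = []
    used = 0  # letters processed so far; selects the keyword position
    for ch in text:
        if ch in string.ascii_letters:
            shift = shifts[used % len(shifts)]
            if decrypt:
                shift = -shift
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            used += 1
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                   # LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACKATDAWN
```

Each keyword letter picks one row of the tabula recta, so a one-letter keyword reduces to a Caesar cipher.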
1500s: Vigenère’s system
Kerckhoffs’s principle:
secrecy of key
1900s: Hebern’s system
Letter frequencies
Can be used to break a cipher
Frequency of each letter in cipher, sorted from most common to least
Standard English letter frequencies
Frequencies of letters in cipher
Most frequent letters in cipher are S and O
Substitute E and T for
S and O
Spot instances of “tle”
Most common letter is now G,
which must be a, i, or o
Spot oFe and theF and then
Lheet
Spot sODVe and OK
Sub l for D, v for V, R for K
Spot enoQRh
Spot EesMge and Nount
Few more steps...
Solved!
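The first steps of the walkthrough above (count the letters, sort them, try substitutions) can be sketched in Python. This is an illustrative sketch with a toy ciphertext, not the one from the slides; `letter_frequencies` and `substitute` are hypothetical helper names:

```python
from collections import Counter
import string

def letter_frequencies(ciphertext):
    """Return (letter, relative frequency) pairs, most common first."""
    letters = [ch for ch in ciphertext.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    return [(letter, count / len(letters)) for letter, count in counts.most_common()]

def substitute(ciphertext, mapping):
    """Apply a partial guess: mapped cipher letters become lowercase plaintext."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext)

sample = "SOS SOO SX"  # toy ciphertext, made up for this example
print(letter_frequencies(sample)[:2])            # [('S', 0.5), ('O', 0.375)]
print(substitute(sample, {"S": "e", "O": "t"}))  # ete ett eX
```

Keeping guessed plaintext in lowercase and unsolved cipher letters in uppercase makes partial words easier to spot.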
Back to history...
WW2: Enigma Machine
The rotors rotate at different rates as you type on
the keyboard and output appropriate letters of
cipher text. In this case the key was the initial
setting of the rotors.
1973: IBM’s Lucifer
algorithm (DES)
Note on block cipher
encryption
A block cipher is an encryption method that applies
a deterministic algorithm along with a symmetric key
to encrypt a block of text, rather than encrypting
one bit at a time as in stream ciphers. For example,
a common block cipher, AES, encrypts 128-bit blocks
with a key of predetermined length: 128, 192, or 256
bits.
2000: Advanced
Encryption Standard
(Rijndael)
1970s-1990s:
Crypto Wars
Key size restrictions
1996: The end of crypto
wars
1991: PGP
(Pretty Good Privacy)
Bread and Butter of cryptography
Why is randomness
important?
Random private key
+
Public key
= encrypted message
1. You generate a random key.
2. You use that key to encrypt your data.
3. I send you my public key.
4. You use my public key to encrypt your random key.
5. Send both the encrypted data and the encrypted random key
to me.
6. I use my private key to decrypt your random key.
7. I use your random key to decrypt the data.
The security of the
private key is
important
Let’s talk about
randomness...
Randomness
Makes processes secure
Mathematically/computationally,
naturally, philosophically important
Difficult to actually achieve
Why do we need
randomness today?
4oio342ip4o24p32o
4fdslf95454
Problems with randomness
The seed, or starting point
The algorithm
1. Determined that user ids were seeded with restart time
2. Crashed the Hacker News site
3. Predicted restart time
4. Predicted assigned user ids as users logged in
5. Impersonated discovered users
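Why a time-based seed is dangerous can be shown with a simplified model. The sketch below is not the Hacker News code; it assumes a hypothetical server that seeds Python's `random` with its restart time in seconds, and an attacker who knows that time to within a minute. All function names and the timestamp are made up for illustration:

```python
import random

def issue_ids(seed_time, count):
    """Hypothetical server: seeds its PRNG with the restart time (in seconds)."""
    rng = random.Random(seed_time)
    return [rng.getrandbits(32) for _ in range(count)]

def recover_seed(first_id, approx_time, window=60):
    """Try every second around the estimated restart time until the first id matches."""
    for candidate in range(approx_time - window, approx_time + window + 1):
        if random.Random(candidate).getrandbits(32) == first_id:
            return candidate
    return None

restart_time = 1_543_500_000                     # made-up Unix timestamp
ids = issue_ids(restart_time, 5)                 # ids handed out after the restart
guess = recover_seed(ids[0], restart_time + 17)  # attacker's estimate is 17 s off
print(guess == restart_time)                     # the seed is recovered
print(issue_ids(guess, 5) == ids)                # so every later id is predictable
```

The search space here is only 121 candidate seeds, which is why the seed, not just the algorithm, has to be unpredictable.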
DUAL_EC_DRBG Controversy
● 2004: Dual EC PRNG introduced
● 08/2007: Shumow and Ferguson present
Dual_EC_DRBG flaw at cryptography conference
● 11/2007: Schneier bases article in Wired on
their findings
“...would allow NSA to determine the
state of the random number
generator, and thereby eventually be
able to read all data sent over the
SSL connection.”
● 09/2013: One of the purposes of Bullrun is
described as being "to covertly introduce
weaknesses into the encryption standards
followed by hardware and software developers
around the world."
● NIST recommends removal of the algorithm as a
standard
● 12/2013: Presidential advisory examines encryption
standards
● 2014: Standard is removed
Years until standard removed...
10!
Who did this impact?
Microsoft, Google, Apple, McAfee,
Docker, IBM, Oracle, Cisco, VMWare,
Juniper, HP, Red Hat, Samsung,
Toshiba, DELL, Ruckus, F5 Networks,
Lenovo, Nokia, the RSA BSAFE
libraries for Java and C++ and
more....
Ok, so you want to
create randomness...
An ideal pseudorandom number generator
should...
1. Pass statistical tests of randomness (Monobit, Distance, Poker or Craps, Birthday)
2. Take a long time before repeating (have a long “period”)
3. Execute efficiently (quick and low storage)
4. Be repeatable
5. Be portable (can be run on any machine or system)
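As an illustration of the first requirement, here is a minimal sketch of a monobit (frequency) test, written in the style of the NIST SP 800-22 frequency test. The 0.01 significance level and the function names are assumptions chosen for this example:

```python
import math
import random

def monobit_p_value(bits):
    """Frequency (monobit) test: are ones and zeros roughly balanced?"""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)    # +1 per one, -1 per zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))   # small p-value: unlikely to be random

def passes_monobit(bits, alpha=0.01):
    return monobit_p_value(bits) >= alpha

rng = random.Random(2018)
sample = [rng.getrandbits(1) for _ in range(10_000)]
print(monobit_p_value(sample))
print(passes_monobit([1] * 1000))      # False: all ones is clearly biased
print(passes_monobit([0, 1] * 500))    # True: balanced, yet obviously not random
```

The last line shows why a generator has to pass a whole battery of tests: a strictly alternating sequence is perfectly balanced and still trivially predictable.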
What are the common
ways of generating
“randomness”?
Linear congruential generators
Linear congruential generators take the form
$x_k = (a x_{k-1} + c) \bmod M$
where $x_0$ is the seed, $a$ is the multiplier, $c$ is the
increment, and the modulus $M$ is often chosen close to the
largest representable integer. The period is at most $M$.
Linear congruential generators
a = 3
c = 9
m = 16
x0 = 4394

def lcg():
    xi = x0
    for i in range(10):
        xi = (a*xi + c) % m
        print(xi)
Linear congruential generators
Algorithm: xi = (a*xi + c)%m
7
14
3
2
15
6
11
10
7
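The output above returns to 7 after only eight values. A short sketch that measures the cycle length for any choice of parameters; the helper names are illustrative, and the second parameter set (a = 5) is my own choice, picked so that the Hull-Dobell conditions for a full period hold:

```python
def lcg(a, c, m, seed):
    """Yield successive values of x = (a*x + c) % m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def lcg_period(a, c, m, seed):
    """Length of the cycle the generator eventually falls into."""
    first_seen = {}
    for step, value in enumerate(lcg(a, c, m, seed)):
        if value in first_seen:
            return step - first_seen[value]
        first_seen[value] = step

print(lcg_period(3, 9, 16, 4394))   # 8: the slide's parameters repeat after 8 values
print(lcg_period(5, 9, 16, 4394))   # 16: full period for m = 16
```

With m = 16 even the best parameters give only 16 distinct values, so practical generators use a much larger modulus.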
Towards a better
pseudorandom generator
“Any one who considers arithmetical methods of producing random
digits is, of course, in a state of sin.”
John von Neumann
Mid square method generally
Start with a 4 digit seed
Square this value
If the result has fewer than 8 digits, add
leading 0s
Take the middle 4 digits of the result
Repeat the sequence
Mid square method: worked example
Start with a 4 digit seed: 9834
Square this value: 96707556
If the result has fewer than 8 digits, add leading 0s: 96707556
Take the middle 4 digits of the result: 7075
Repeat the sequence: 7075 squared is 50055625
Mid square method
seed_number = int(input("Please enter a four digit number:\n[####] "))
number = seed_number
already_seen = set()
counter = 0

while number not in already_seen:
    counter += 1
    already_seen.add(number)
    # Square, pad to 8 digits, keep the middle 4
    number = int(str(number * number).zfill(8)[2:6])
    print(f"#{counter}: {number}")

print(f"We began with the seed {seed_number}, and"
      f" we repeated ourselves after {counter} steps"
      f" with {number}.")
Mid square method
Please enter a four digit number: [####]
5859
#1: 3278
#2: 7452
#3: 5323
#4: 3343
#5: 1756
#6: 835
#7: 6972
#8: 6087
#9: 515
#10: 2652
.......
#59: 24
#60: 5
#61: 0
#62: 0
We began with the seed 5859, and we repeated ourselves after 62 steps with 0.
Issues with mid square method
Relatively slow
Statistically unsatisfactory
Sample of random numbers may be too short
Predicting the mid square method
[Figure: two plots, "Advanced LCG" and "Mid square method"]
Let’s talk cryptography
The Mersenne Twister
Most used pseudorandom number generator
Very long period (the Mersenne prime 2^19937 − 1)
Not cryptographically secure
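"Not cryptographically secure" can be made concrete: after observing 624 consecutive 32-bit outputs, the generator's whole internal state can be rebuilt. The sketch below is a well-known state-recovery technique, not something shown in the talk. It assumes CPython's `random` module (MT19937 with the standard tempering constants) and an attacker who sees raw `getrandbits(32)` values; the helper names are illustrative:

```python
import random

def undo_right_shift_xor(y, shift):
    """Invert y ^= y >> shift for a 32-bit value."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_left_shift_xor(y, shift, mask):
    """Invert y ^= (y << shift) & mask for a 32-bit value."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    """Recover the internal state word behind one 32-bit MT19937 output."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y

def clone_generator(outputs):
    """Build a random.Random that continues the sequence after 624 observed outputs."""
    state = tuple(untemper(o) for o in outputs[-624:])
    clone = random.Random()
    clone.setstate((3, state + (624,), None))  # index 624: regenerate on next call
    return clone

victim = random.Random(1234)
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_generator(observed)
print([clone.getrandbits(32) for _ in range(3)] ==
      [victim.getrandbits(32) for _ in range(3)])   # True
```

The long period does not help here: the tempering step is invertible, so the outputs leak the state directly.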
Predicting the random() module
from datetime import datetime
from random import random

import matplotlib.pyplot as plt

def uni(n, m, a, c, seed):
    # Generate n LCG values scaled to the range [0, 1]
    sequence = []
    Xn = seed
    for i in range(n):
        Xn = (a*Xn + c) % m
        sequence.append(Xn/float(m-1))
    return sequence

x = range(1000)
# Advanced LCG, seeded with the current microsecond
y_1 = uni(1000, 2**32, 11695477, 1, datetime.now().microsecond)
# Built-in random module (Mersenne Twister)
y_2 = [random() for i in range(1000)]
plt.plot(x, y_1, "o", color="blue")
plt.show()
plt.plot(x, y_2, "o", color="red")
plt.show()
Predicting the random() module
[Figure: two plots, "Advanced LCG" and "Built in Random PRNG"]
What’s wrong with the
random module?
Problems with the random module...
Introducing...the
secrets module!
The Secrets module
Is cryptographically secure
Includes ready-made “batteries” for
users who don’t want to build their own
Uses 32 bytes of entropy by default
A note on entropy...
Natural sources of entropy
Source code of Secrets module
import base64
import binascii
import os
from random import SystemRandom

_sysrand = SystemRandom()

randbits = _sysrand.getrandbits
choice = _sysrand.choice

def randbelow(exclusive_upper_bound):
    # Random int in the range [0, exclusive_upper_bound)
    return _sysrand._randbelow(exclusive_upper_bound)

DEFAULT_ENTROPY = 32  # number of bytes to return by default

def token_bytes(nbytes=None):
    # Random byte string read from the OS source
    if nbytes is None:
        nbytes = DEFAULT_ENTROPY
    return os.urandom(nbytes)

def token_hex(nbytes=None):
    # Each byte becomes two hex digits
    return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

def token_urlsafe(nbytes=None):
    # Base64 text with the '=' padding stripped
    tok = token_bytes(nbytes)
    return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')
SystemRandom
Uses OS as a source of randomness
Not available on all systems
Does not rely on software states
Sequences are not repeatable
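A small sketch of what "not repeatable" means in practice, contrasting a seeded `random.Random` with `SystemRandom`. The seed value 42 and the variable names are arbitrary choices for this example:

```python
import os
from random import Random, SystemRandom

seeded_a = Random(42)
seeded_b = Random(42)
print(seeded_a.random() == seeded_b.random())  # True: same seed, same sequence

sys_rng = SystemRandom()
sys_rng.seed(42)              # accepted but ignored: the OS supplies the bytes
roll = sys_rng.randint(1, 6)  # same API as random.Random
raw = os.urandom(16)          # 16 bytes straight from the OS source
print(roll, raw.hex())
```

Because there is no software state to save or restore, two runs of the second half will almost certainly print different values.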
/dev/random
Will block without sufficient entropy
Relies on “the kernel entropy pool”
Slower than /dev/urandom
/dev/urandom
Will not block without sufficient entropy
Relies on “the kernel entropy pool”
Faster than /dev/random
Theoretically vulnerable to attack
Using the secrets module to get tokens
import secrets
token1 = secrets.token_hex(16)
token2 = secrets.token_hex(10)
print(token1)
print(token2)
d2bdc979d5ecec0dccf67854459c5284
584d93ac921d3c74be9c
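The argument to `token_hex` is a byte count, so the printed strings above are twice as long as the number passed in. A brief sketch of that relationship, plus two other `secrets` helpers; the reset URL is a hypothetical placeholder:

```python
import secrets

token = secrets.token_hex(16)
print(len(token))                      # 32: two hex characters per byte

url_token = secrets.token_urlsafe(16)  # Base64 text, safe to embed in a URL
reset_link = "https://example.com/reset?token=" + url_token  # hypothetical URL

# Compare secrets in constant time to avoid leaking information through timing.
print(secrets.compare_digest(token, token))   # True
```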
Using the secrets module for password
generation
import secrets
import string
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet)
                   for i in range(10))
print(password)
i3OFMKPr8q
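Real password policies often add constraints. A sketch that follows the recipe in the Python documentation for `secrets`: keep drawing candidates until one has a lowercase letter, an uppercase letter, and enough digits. The function name and default arguments are assumptions for this example:

```python
import secrets
import string

def generate_password(length=10, min_digits=3):
    """Draw random candidates until one satisfies the policy."""
    alphabet = string.ascii_letters + string.digits
    while True:
        password = ''.join(secrets.choice(alphabet) for _ in range(length))
        if (any(ch.islower() for ch in password)
                and any(ch.isupper() for ch in password)
                and sum(ch.isdigit() for ch in password) >= min_digits):
            return password

print(generate_password())
```

Rejecting and redrawing whole candidates keeps every acceptable password equally likely, which patching individual characters would not.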
The secrets module:
not the end all be all.
Popular algorithms
● AES
● RSA
● ECC
Popular algorithms
● AES: a family of three block ciphers: AES-128, AES-192
and AES-256. Each cipher encrypts and decrypts data in
blocks of 128 bits using cryptographic keys of 128,
192 and 256 bits, respectively.
Popular algorithms: AES
Popular algorithms
● RSA: based on the practical difficulty of the
factorization of the product of two large prime
numbers, the "factoring problem."
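A toy sketch of the idea, using textbook-sized primes. It is for illustration only: real RSA uses primes hundreds of digits long plus padding, and should come from a vetted library rather than hand-written code. The variable and function names are my own:

```python
p, q = 61, 53              # toy primes; far too small for real use
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # easy to compute only if you know p and q
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

def encrypt(m):
    """Anyone can encrypt with the public pair (e, n)."""
    return pow(m, e, n)

def decrypt(c):
    """Only the holder of d can decrypt."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)
print(ciphertext, decrypt(ciphertext))
```

An attacker who could factor n back into p and q could recompute phi and d, which is exactly the "factoring problem" the slide refers to.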
Popular algorithms: RSA
Popular algorithms
● ECC: based on the algebraic structure of elliptic
curves over finite fields.
Popular algorithms: ECC
Python’s “nuclear reactor” of
Randomness
"...folks really are better off learning to use things
like cryptography.io for security sensitive software, so
this change is just about harm mitigation given that it's
inevitable that a non-trivial proportion of the millions
of current and future Python developers won't do that."
Cryptography.io
For the paranoid...
How it works: Avalanche breakdown
Let’s wrap up...
Randomness...
Is very important for security
Difficult to truly achieve
Can be simulated
Something to say?
Amanda Sopkin
@amandasopkin
Please submit feedback
on the app!
Thank you!
Sources:
● Icons taken from flaticon.com
● https://crypto.stackexchange.com/questions/51232/using-32-hexadecimal-digits-vs-ascii-equivalent-16-character-password
● https://dev.to/walker/pseudo-random-numbers-in-python-from-arithmetic-to-probability-distributions
● Wired Magazine
● The Washington Post
● NYT
● Dilbert

Amanda Sopkin - Computational Randomness: Creating Chaos in an Ordered Machine - Codemotion Milan 2018