Learning Financial Market Data with Recurrent Autoencoders and TensorFlow
Steven Hutt steven.c.hutt@gmail.com
July 28, 2016
TensorFlow London Meetup
3. Learning Sequences with RNNs
In a Recurrent Neural Network there are nodes which feed back on themselves. Below, x_t ∈ R^{n_x}, h_t ∈ R^{n_h} and z_t ∈ R^{n_z}, for t = 1, 2, ..., T. Each node is a vector (a 1-dimensional array).
[Figure: RNN — a recurrent cell with input x_t, state h_t and output z_t (weights W_{hx}, W_{hh}, W_{zh}), unrolled in time from the initial state h_0 through h_1, h_2, h_3, ..., h_T, with inputs x_1, ..., x_T and outputs z_1, ..., z_T.]
The RNN is defined by the update equations
    h_{t+1} = f(W_{hh} h_t + W_{hx} x_t + b_h),    z_t = g(W_{zh} h_t + b_z),
where W_{uv} ∈ R^{n_u × n_v} and b_u ∈ R^{n_u} are the RNN weights and biases respectively. The functions f and g are generally nonlinear.
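As a concrete illustration, here is a minimal NumPy sketch of these update equations. The sizes, the choice of tanh for f and the identity for g are illustrative assumptions, not taken from the slides.

import numpy as np

# Illustrative sizes and tanh/identity choices for f and g; not from the slides.
n_x, n_h, n_z, T = 4, 8, 3, 10
rng = np.random.default_rng(0)

W_hh = rng.normal(scale=0.1, size=(n_h, n_h))
W_hx = rng.normal(scale=0.1, size=(n_h, n_x))
W_zh = rng.normal(scale=0.1, size=(n_z, n_h))
b_h, b_z = np.zeros(n_h), np.zeros(n_z)

h = np.zeros(n_h)                  # initial state h_0
xs = rng.normal(size=(T, n_x))     # input sequence x_1, ..., x_T
zs = []
for x_t in xs:
    h = np.tanh(W_hh @ h + W_hx @ x_t + b_h)   # h_{t+1} = f(W_hh h_t + W_hx x_t + b_h)
    zs.append(W_zh @ h + b_z)                  # z_t = g(W_zh h_t + b_z), with g = identity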
4. Long Short-Term Memory Units
An LSTM [3] learns long-term dependencies by adding a memory cell and controlling what gets modified and what gets forgotten:
[Figure: LSTM cell — the memory c_t and state h_t flow through the cell; sigmoid (σ) gates control what is forgotten (∗) and what is modified (+), a tanh layer proposes new memory content, and a final σ/tanh/∗ stage produces c_{t+1} and h_{t+1} from the input x_t. Legend: ⃝ = copy, • ⃝ = join then copy.]
An LSTM can remember state dependencies further back than a standard RNN. It is
capable of learning real grammars.
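To make the gating concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell. The stacked-parameter layout, the gate ordering and the sizes are common conventions chosen for illustration, not details taken from the slides.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One step of a standard LSTM cell: sigmoid gates decide what to forget in
    # the memory c and what to modify; the new state h is read out of the memory.
    n_h = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                 # stacked pre-activations, shape (4 * n_h,)
    i = sigmoid(z[0 * n_h:1 * n_h])              # input gate (how much to modify)
    f = sigmoid(z[1 * n_h:2 * n_h])              # forget gate (how much to keep)
    o = sigmoid(z[2 * n_h:3 * n_h])              # output gate
    g = np.tanh(z[3 * n_h:4 * n_h])              # candidate memory content
    c_t = f * c_prev + i * g                     # forget, then modify the memory cell
    h_t = o * np.tanh(c_t)                       # new state
    return h_t, c_t

# illustrative sizes and random parameters
n_x, n_h = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_h, n_x))
U = rng.normal(scale=0.1, size=(4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_x), h, c, W, U, b)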
5. Unitary RNNs
The behaviour of the LSTM can be difficult to analyse.
An alternative approach [1] is to ensure there are no contractive directions, that is, to require W_{hh} ∈ U(n), the space of unitary matrices. This preserves the simple form of an RNN but ensures the state dependencies reach further back in the sequence.
A key problem is that gradient descent does not preserve the unitary property: for x ∈ U(n), a gradient step generally gives x + λv ∉ U(n).
The solution in [1] is to choose W_{hh} from a subset generated by parametrized unitary matrices:
    W_{hh} = D_3 R_2 F^{-1} D_2 Π R_1 F D_1.
Here each D is a diagonal matrix, each R a reflection, F is the Fourier transform and Π is a permutation.
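A minimal NumPy sketch of composing such a parametrized unitary matrix. The dimension, the random parameters and the Householder-style form of the reflections are illustrative assumptions rather than the exact construction in [1]; the final check confirms that the composition is unitary.

import numpy as np

# Compose a unitary W_hh from the parametrized factors above (illustrative only).
n = 8
rng = np.random.default_rng(0)

def diag_phase(thetas):
    # diagonal matrix with unit-modulus entries e^{i * theta}
    return np.diag(np.exp(1j * thetas))

def reflection(v):
    # complex Householder reflection I - 2 v v* / (v* v), which is unitary
    v = v.reshape(-1, 1)
    return np.eye(n) - 2.0 * (v @ v.conj().T) / (v.conj().T @ v)

F = np.fft.fft(np.eye(n)) / np.sqrt(n)          # unitary discrete Fourier transform matrix
F_inv = F.conj().T
Pi = np.eye(n)[rng.permutation(n)]              # permutation matrix

D1, D2, D3 = (diag_phase(rng.uniform(0.0, 2.0 * np.pi, n)) for _ in range(3))
R1, R2 = (reflection(rng.normal(size=n) + 1j * rng.normal(size=n)) for _ in range(2))

W_hh = D3 @ R2 @ F_inv @ D2 @ Pi @ R1 @ F @ D1
print(np.allclose(W_hh.conj().T @ W_hh, np.eye(n)))   # True: W_hh is unitary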
6. RNNs as Programs addressing External Memory
RNNs are Turing complete but the theoretical capabilities are not matched in practice
due to training inefficiencies.
The Neural Turing Machine emulates a differentiable Turing Machine by adding external
addressable memory and using the RNN state as a memory controller [2].
[Figure: Vanilla NTM architecture — an RNN controller receives input x_t and emits output y_t, and interacts with an external memory through read heads and write heads.]
NTM equivalent 'code':
initialise: move head to start location
while input delimiter not seen do
    receive input vector
    write input to head location
    increment head location by 1
end while
return head to start location
while true do
    read output vector from head location
    emit output
    increment head location by 1
end while
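A literal Python rendering of this pseudocode, just to make the copy behaviour concrete; modelling the input delimiter as the end of the input list is a simplification.

def ntm_copy(inputs):
    # Write the input sequence to a tape, then read it back.
    tape, head = {}, 0
    for x in inputs:            # while input delimiter not seen
        tape[head] = x          # write input to head location
        head += 1               # increment head location by 1
    head = 0                    # return head to start location
    outputs = []
    while head in tape:         # read back until the tape runs out
        outputs.append(tape[head])
        head += 1
    return outputs

assert ntm_copy([1, 2, 3]) == [1, 2, 3]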
8. Autoencoders
The more regularities there are in data the more it can be compressed. The more
we are able to compress the data, the more we have learned about the data.
(Grünwald, 1998)
[Figure: Autoencoder — the input x_1, ..., x_6 is encoded through hidden layers h^1 (four units) and h^2 (two units, the code), then decoded through h^3 (four units) to the reconstruction y_1, ..., y_6, with weights W_{h1 x}, W_{h2 h1}, W_{h3 h2}, W_{y h3}.]
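A forward-pass sketch of the 6-4-2-4-6 autoencoder in the figure. The random weights, tanh hidden layers and linear output are illustrative assumptions, and no training loop is shown.

import numpy as np

# Forward pass through the 6-4-2-4-6 autoencoder in the figure (illustrative).
rng = np.random.default_rng(0)
sizes = [6, 4, 2, 4, 6]                          # input, h1, code h2, h3, reconstruction
Ws = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=6)
a = x
for W, b in zip(Ws[:-1], bs[:-1]):
    a = np.tanh(W @ a + b)                       # encode down to the 2-dim code, then decode
y = Ws[-1] @ a + bs[-1]                          # linear reconstruction layer
reconstruction_error = np.mean((y - x) ** 2)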
9. Recurrent Autoencoders
[Figure: Recurrent autoencoder — an encoder RNN consumes x_1, ..., x_T, updating its state h^e_0 → h^e_1 → ... → h^e_T; the final encoder state is copied to the decoder initial state (h^d_0 = h^e_T), and a decoder RNN fed z_0, z_1, ..., z_{T-1} produces the reconstruction z_1, ..., z_T.]
A Recurrent Autoencoder is trained to output a reconstruction z of the input sequence x. All the data must pass through the narrow bottleneck state h^e_T. The compression between input and output forces the network to learn the patterns in the data.
10. Recurrent Autoencoder Updates
[Figure: the recurrent autoencoder diagram above, annotated with the encoder and decoder updates listed below.]
Encoder updates:
    h^e_1 = f(W^e_{hh} h^e_0 + W^e_{hx} x_1 + b^e_h)
    h^e_2 = f(W^e_{hh} h^e_1 + W^e_{hx} x_2 + b^e_h)
    h^e_3 = f(W^e_{hh} h^e_2 + W^e_{hx} x_3 + b^e_h)
    · · ·
Decoder updates:
    h^d_0 = h^e_T   (copy)
    h^d_1 = f(W^d_{hh} h^d_0 + W^d_{hx} z_0 + b^d_h)
    h^d_2 = f(W^d_{hh} h^d_1 + W^d_{hx} z_1 + b^d_h)
    h^d_3 = f(W^d_{hh} h^d_2 + W^d_{hx} z_2 + b^d_h)
    · · ·
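A NumPy sketch of one forward pass through these updates. The sizes, the tanh nonlinearity, the linear output projection and the feeding of the previous reconstruction back into the decoder are illustrative assumptions.

import numpy as np

# Forward pass through the encoder/decoder updates above (illustrative choices).
n_x, n_h, T = 4, 16, 10
rng = np.random.default_rng(0)
shapes = {'We_hh': (n_h, n_h), 'We_hx': (n_h, n_x), 'be_h': (n_h,),
          'Wd_hh': (n_h, n_h), 'Wd_hx': (n_h, n_x), 'bd_h': (n_h,),
          'W_zh': (n_x, n_h), 'b_z': (n_x,)}
p = {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}

xs = rng.normal(size=(T, n_x))

# encoder: fold the whole input sequence into the final state h^e_T
h_e = np.zeros(n_h)
for x_t in xs:
    h_e = np.tanh(p['We_hh'] @ h_e + p['We_hx'] @ x_t + p['be_h'])

# decoder: start from h^d_0 = h^e_T and feed the previous output back in
h_d, z_prev, zs = h_e, np.zeros(n_x), []
for _ in range(T):
    h_d = np.tanh(p['Wd_hh'] @ h_d + p['Wd_hx'] @ z_prev + p['bd_h'])
    z_prev = p['W_zh'] @ h_d + p['b_z']          # reconstruction z_t
    zs.append(z_prev)

loss = np.mean((np.stack(zs) - xs) ** 2)         # reconstruction error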
11. Easy RNNs with TensorFlow VariableScope
import tensorflow as tf

def linear(in_array, out_size):
    in_size = in_array.get_shape().as_list()[1]
    # creates or returns the variable <current_scope>/W
    W = tf.get_variable(shape=[in_size, out_size], name='W')
    # out_array = in_array * W
    out_array = tf.matmul(in_array, W)
    return out_array

in_array_0 = tf.placeholder(tf.float32, shape=[None, 32])
out_size_1 = out_size_2 = 32  # reuse below requires the same shape for encoder_0/W

with tf.variable_scope('encoder_0') as scope:
    # creates variable encoder_0/W
    out_array_1 = linear(in_array_0, out_size_1)
    # error: encoder_0/W already exists
    # out_array_2 = linear(out_array_1, out_size_2)
    # reuse encoder_0/W
    scope.reuse_variables()
    out_array_2 = linear(out_array_1, out_size_2)
12. TensorFlow Encoder
with tf.variable_scope('encoder_0') as scope:
    # assumes hs_0_enc[0] holds the initial state and lstm_cell_l0_enc is an LSTM cell built outside this snippet
    for t in range(1, seq_len):
        # placeholder for input data at time t
        xs_0_enc[t] = tf.placeholder(tf.float32, shape=[None, nx_enc])
        # encoder update at time t
        hs_0_enc[t] = lstm_cell_l0_enc(xs_0_enc[t], hs_0_enc[t-1])
        scope.reuse_variables()
13. Deep Recurrent Autoencoder
[Figure: Deep recurrent autoencoder — three stacked encoder layers (encoder_0, encoder_1, encoder_2) with states h^e_{1t}, h^e_{2t}, h^e_{3t} process x_1, ..., x_T; their final states are copied to the corresponding decoder layers (decoder_0, decoder_1, decoder_2) with states h^d_{1t}, h^d_{2t}, h^d_{3t}, which are fed z_0, ..., z_{T-1} and produce the reconstruction z_1, ..., z_T.]
14. TensorFlow Deep Recurrent Autoencoder
with tf.variable_scope('encoder_0') as scope:
    for t in range(1, seq_len):
        # placeholder for input data
        xs_0_enc[t] = tf.placeholder(tf.float32, shape=[None, nx_enc])
        hs_0_enc[t] = lstm_cell_l0_enc(xs_0_enc[t], hs_0_enc[t-1])
        scope.reuse_variables()

with tf.variable_scope('encoder_1') as scope:
    for t in range(1, seq_len):
        # encoder_1 input is encoder_0 hidden state
        xs_1_enc[t] = hs_0_enc[t]
        hs_1_enc[t] = lstm_cell_l1_enc(xs_1_enc[t], hs_1_enc[t-1])
        scope.reuse_variables()

with tf.variable_scope('encoder_2') as scope:
    for t in range(1, seq_len):
        # encoder_2 input is encoder_1 hidden state
        xs_2_enc[t] = hs_1_enc[t]
        hs_2_enc[t] = lstm_cell_l2_enc(xs_2_enc[t], hs_2_enc[t-1])
        scope.reuse_variables()
15. Variational Autoencoder
We assume the market data {xk} is sampled from a probability distribution with a 'small'
number of latent variables h.
Assume p_θ(x, h) = p_θ(x | h) p_θ(h), where θ are the weights of an NN/RNN. Then
    p_θ(x) = ∑_h p_θ(x | h) p_θ(h).
Train the network to minimize, over θ, the negative log-likelihood
    L_θ({x_k}) = ∑_k −log p_θ(x_k).
Then compute the posterior p_θ(h | x) and cluster.
But p_θ(h | x) is intractable!
[Figure: latent-variable model — latent h with prior p_θ(h) and observed x with marginal p_θ(x); the likelihood p_θ(x | h) maps h to x and the posterior p_θ(h | x) maps x back to h.]
16. Variational Inference
Variational Inference learns an NN/RNN approximation q_ϕ(h | x) to p_θ(h | x) during training.
Think of q_ϕ as the encoder network and p_θ as the decoder network: an autoencoder.
Then log p_θ(x_k) ≥ L_{θ,ϕ}(x_k), where
    L_{θ,ϕ}(x_k) = −D_KL(q_ϕ(h | x_k) || p_θ(h))  [regularization term]  +  E_{q_ϕ}[log p_θ(x_k | h)]  [reconstruction term]
is the variational lower bound.
Reconstruction term: the expected log-likelihood of x_k under q_ϕ.
Regularization term: pulls the encoder distribution q_ϕ towards the target prior p_θ(h).
[Figure: as above, but with the intractable posterior replaced by the learned approximation q_ϕ(h | x); prior p_θ(h), likelihood p_θ(x | h), marginal p_θ(x).]
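For a diagonal Gaussian encoder q_ϕ(h | x) = N(μ, σ²) and a standard normal prior, the regularization term has a closed form. Below is a minimal NumPy sketch of a single-sample estimate of the lower bound; the toy linear Gaussian decoder and all sizes are illustrative assumptions, not the networks used in the talk.

import numpy as np

rng = np.random.default_rng(0)
n_h, n_x = 2, 6

# hypothetical encoder output q_phi(h | x_k) = N(mu, diag(sigma^2)) and one data point x_k
mu, log_var = rng.normal(size=n_h), rng.normal(size=n_h)
x_k = rng.normal(size=n_x)

# regularization term: -D_KL( N(mu, sigma^2) || N(0, I) ), available in closed form
neg_kl = 0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

# reconstruction term: one-sample Monte Carlo estimate of E_q[log p_theta(x_k | h)],
# using the reparametrization h = mu + sigma * eps and a toy linear Gaussian decoder
W_dec = rng.normal(scale=0.1, size=(n_x, n_h))
eps = rng.normal(size=n_h)
h = mu + np.exp(0.5 * log_var) * eps
x_mean = W_dec @ h
log_lik = -0.5 * np.sum((x_k - x_mean) ** 2) - 0.5 * n_x * np.log(2 * np.pi)

elbo = neg_kl + log_lik            # variational lower bound L_{theta,phi}(x_k)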
17. Search
Train a Deep Recurrent Autoencoder on public market data (book state over time).
Use the trained encoder as a hash function.
Search based on Euclidean distance, or on Hamming distance after a binary projection of the code.
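A sketch of the search step, assuming the trained encoder has already mapped each stored sequence and the query to a real-valued code vector; the sign-based binary projection and the brute-force scan are illustrative choices.

import numpy as np

# Brute-force search over stored encoder codes; the codes here are random
# stand-ins for h^e_T, and the sign-based binarization is an illustrative choice.
rng = np.random.default_rng(0)
codes = rng.normal(size=(1000, 32))              # stored codes, one per market-data sequence
query = rng.normal(size=32)                      # code of the query sequence

# Euclidean search on the raw codes
nearest_euclidean = int(np.argmin(np.linalg.norm(codes - query, axis=1)))

# Hamming search after a sign-based binary projection of the codes
bits, query_bits = codes > 0.0, query > 0.0
hamming = np.count_nonzero(bits != query_bits, axis=1)
nearest_hamming = int(np.argmin(hamming))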
18. Classify
EUR vs AUD:
Train a Variational Recurrent Autoencoder with a Gaussian prior on public market data.
Include two contracts, EUR and AUD, in the training data.
Use the trained encoder to infer the Gaussian means from the training data.
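One simple way to use the inferred means, sketched below with hypothetical codes and labels: compute a centroid of the encoder means per contract and assign new sequences to the nearest centroid. This is an illustrative choice, not a method stated on the slides.

import numpy as np

rng = np.random.default_rng(0)
# hypothetical encoder means mu(x) for labelled training sequences: 0 = EUR, 1 = AUD
codes = rng.normal(size=(200, 2))
labels = np.repeat([0, 1], 100)

# per-contract centroids of the inferred Gaussian means
centroids = np.stack([codes[labels == c].mean(axis=0) for c in (0, 1)])

def classify(code):
    # assign a new sequence's inferred mean to the nearest contract centroid
    return int(np.argmin(np.linalg.norm(centroids - code, axis=1)))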
19. Predict
Train an LSTM on sequences of limit order book (LOB) states to predict changes in the best bid.
Target data: the change in the best bid over time t → t + 1.
LSTM output: z_t = a softmax distribution over best-bid moves (−1, 0, +1) over time t → t + 1.
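A minimal NumPy sketch of the output side of this setup: mapping best-bid changes to the three classes and scoring a softmax prediction with cross-entropy. The sample changes and stand-in logits are illustrative; in the actual model the logits would come from the LSTM.

import numpy as np

# map best-bid changes over t -> t+1 to classes 0 (-1), 1 (0), 2 (+1)
bid_changes = np.array([-1, 0, 0, 1, -1])
targets = bid_changes + 1

# stand-in logits; in the real model these are produced by the LSTM output layer
logits = np.random.default_rng(0).normal(size=(len(targets), 3))

# softmax distribution z_t over the moves (-1, 0, +1)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
z = exp / exp.sum(axis=1, keepdims=True)

# average cross-entropy between targets and predicted distributions
loss = -np.mean(np.log(z[np.arange(len(targets)), targets]))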
It's tough to make predictions, especially about the future.
[Figure: predicted vs target best-bid moves over time (series: target, predict).]
21. References
[1] M. Arjovsky, A. Shah, and Y. Bengio. Unitary evolution recurrent neural networks. arXiv:1511.06464v4, 2016.
[2] A. Graves, G. Wayne, and I. Danihelka. Neural Turing machines. arXiv:1410.5401v2, 2014.
[3] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.