Learning Financial Market Data with Recurrent Autoencoders and TensorFlow
Steven Hutt steven.c.hutt@gmail.com
July 28, 2016
TensorFlow London Meetup
3. Learning Sequences with RNNs
In a Recurrent Neural Network there are nodes which feed back on themselves. Below, x_t ∈ R^{n_x}, h_t ∈ R^{n_h} and z_t ∈ R^{n_z}, for t = 1, 2, ..., T. Each node is a vector (a 1-dimensional array).
[Figure: RNN — a recurrent cell with input x_t, state h_t and output z_t (weights W_{hx}, W_{hh}, W_{zh}), unrolled in time from the initial state h_0 through h_1, h_2, h_3, ..., h_T, with inputs x_1, ..., x_T and outputs z_1, ..., z_T.]
The RNN is defined by the update equations
    h_{t+1} = f(W_{hh} h_t + W_{hx} x_t + b_h),    z_t = g(W_{zh} h_t + b_z),
where W_{uv} ∈ R^{n_u × n_v} and b_u ∈ R^{n_u} are the RNN weights and biases respectively. The functions f and g are generally nonlinear.
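As a concrete illustration, here is a minimal NumPy sketch of these update equations. The sizes, the choice of tanh for f and the identity for g are illustrative assumptions, not taken from the slides.

import numpy as np

# Illustrative sizes and tanh/identity choices for f and g; not from the slides.
n_x, n_h, n_z, T = 4, 8, 3, 10
rng = np.random.default_rng(0)

W_hh = rng.normal(scale=0.1, size=(n_h, n_h))
W_hx = rng.normal(scale=0.1, size=(n_h, n_x))
W_zh = rng.normal(scale=0.1, size=(n_z, n_h))
b_h, b_z = np.zeros(n_h), np.zeros(n_z)

h = np.zeros(n_h)                  # initial state h_0
xs = rng.normal(size=(T, n_x))     # input sequence x_1, ..., x_T
zs = []
for x_t in xs:
    h = np.tanh(W_hh @ h + W_hx @ x_t + b_h)   # h_{t+1} = f(W_hh h_t + W_hx x_t + b_h)
    zs.append(W_zh @ h + b_z)                  # z_t = g(W_zh h_t + b_z), with g = identity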
4. Long Short-Term Memory Units
An LSTM [3] learns long-term dependencies by adding a memory cell and controlling what gets modified and what gets forgotten:
[Figure: LSTM cell — the memory c_t and state h_t flow through the cell; sigmoid (σ) gates control what is forgotten (∗) and what is modified (+), a tanh layer proposes new memory content, and a final σ/tanh/∗ stage produces c_{t+1} and h_{t+1} from the input x_t. Legend: ⃝ = copy, • ⃝ = join then copy.]
An LSTM can remember state dependencies further back than a standard RNN. It is
capable of learning real grammars.
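To make the gating concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell. The stacked-parameter layout, the gate ordering and the sizes are common conventions chosen for illustration, not details taken from the slides.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One step of a standard LSTM cell: sigmoid gates decide what to forget in
    # the memory c and what to modify; the new state h is read out of the memory.
    n_h = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                 # stacked pre-activations, shape (4 * n_h,)
    i = sigmoid(z[0 * n_h:1 * n_h])              # input gate (how much to modify)
    f = sigmoid(z[1 * n_h:2 * n_h])              # forget gate (how much to keep)
    o = sigmoid(z[2 * n_h:3 * n_h])              # output gate
    g = np.tanh(z[3 * n_h:4 * n_h])              # candidate memory content
    c_t = f * c_prev + i * g                     # forget, then modify the memory cell
    h_t = o * np.tanh(c_t)                       # new state
    return h_t, c_t

# illustrative sizes and random parameters
n_x, n_h = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_h, n_x))
U = rng.normal(scale=0.1, size=(4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_x), h, c, W, U, b)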
5. Unitary RNNs
The behaviour of the LSTM can be difficult to analyse.
An alternative approach [1] is to ensure there are no contractive directions, that is, to require W_{hh} ∈ U(n), the space of unitary matrices. This preserves the simple form of an RNN but ensures the state dependencies reach further back in the sequence.
A key problem is that gradient descent does not preserve the unitary property: for x ∈ U(n), a gradient step generally gives x + λv ∉ U(n).
The solution in [1] is to choose W_{hh} from a subset generated by parametrized unitary matrices:
    W_{hh} = D_3 R_2 F^{-1} D_2 Π R_1 F D_1.
Here each D is a diagonal matrix, each R a reflection, F is the Fourier transform and Π is a permutation.
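A minimal NumPy sketch of composing such a parametrized unitary matrix. The dimension, the random parameters and the Householder-style form of the reflections are illustrative assumptions rather than the exact construction in [1]; the final check confirms that the composition is unitary.

import numpy as np

# Compose a unitary W_hh from the parametrized factors above (illustrative only).
n = 8
rng = np.random.default_rng(0)

def diag_phase(thetas):
    # diagonal matrix with unit-modulus entries e^{i * theta}
    return np.diag(np.exp(1j * thetas))

def reflection(v):
    # complex Householder reflection I - 2 v v* / (v* v), which is unitary
    v = v.reshape(-1, 1)
    return np.eye(n) - 2.0 * (v @ v.conj().T) / (v.conj().T @ v)

F = np.fft.fft(np.eye(n)) / np.sqrt(n)          # unitary discrete Fourier transform matrix
F_inv = F.conj().T
Pi = np.eye(n)[rng.permutation(n)]              # permutation matrix

D1, D2, D3 = (diag_phase(rng.uniform(0.0, 2.0 * np.pi, n)) for _ in range(3))
R1, R2 = (reflection(rng.normal(size=n) + 1j * rng.normal(size=n)) for _ in range(2))

W_hh = D3 @ R2 @ F_inv @ D2 @ Pi @ R1 @ F @ D1
print(np.allclose(W_hh.conj().T @ W_hh, np.eye(n)))   # True: W_hh is unitary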
6. RNNs as Programs addressing External Memory
RNNs are Turing complete but the theoretical capabilities are not matched in practice
due to training inefficiencies.
The Neural Turing Machine emulates a differentiable Turing Machine by adding external
addressable memory and using the RNN state as a memory controller [2].
[Figure: Vanilla NTM architecture — an RNN controller receives input x_t and emits output y_t, and interacts with an external memory through read heads and write heads.]
NTM equivalent 'code':
initialise: move head to start location
while input delimiter not seen do
    receive input vector
    write input to head location
    increment head location by 1
end while
return head to start location
while true do
    read output vector from head location
    emit output
    increment head location by 1
end while
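A literal Python rendering of this pseudocode, just to make the copy behaviour concrete; modelling the input delimiter as the end of the input list is a simplification.

def ntm_copy(inputs):
    # Write the input sequence to a tape, then read it back.
    tape, head = {}, 0
    for x in inputs:            # while input delimiter not seen
        tape[head] = x          # write input to head location
        head += 1               # increment head location by 1
    head = 0                    # return head to start location
    outputs = []
    while head in tape:         # read back until the tape runs out
        outputs.append(tape[head])
        head += 1
    return outputs

assert ntm_copy([1, 2, 3]) == [1, 2, 3]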
8. Autoencoders
The more regularities there are in data the more it can be compressed. The more
we are able to compress the data, the more we have learned about the data.
(Grünwald, 1998)
[Figure: Autoencoder — the input x_1, ..., x_6 is encoded through hidden layers h^1 (four units) and h^2 (two units, the code), then decoded through h^3 (four units) to the reconstruction y_1, ..., y_6, with weights W_{h1 x}, W_{h2 h1}, W_{h3 h2}, W_{y h3}.]
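A forward-pass sketch of the 6-4-2-4-6 autoencoder in the figure. The random weights, tanh hidden layers and linear output are illustrative assumptions, and no training loop is shown.

import numpy as np

# Forward pass through the 6-4-2-4-6 autoencoder in the figure (illustrative).
rng = np.random.default_rng(0)
sizes = [6, 4, 2, 4, 6]                          # input, h1, code h2, h3, reconstruction
Ws = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=6)
a = x
for W, b in zip(Ws[:-1], bs[:-1]):
    a = np.tanh(W @ a + b)                       # encode down to the 2-dim code, then decode
y = Ws[-1] @ a + bs[-1]                          # linear reconstruction layer
reconstruction_error = np.mean((y - x) ** 2)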
9. Recurrent Autoencoders
[Figure: Recurrent autoencoder — an encoder RNN consumes x_1, ..., x_T, updating its state h^e_0 → h^e_1 → ... → h^e_T; the final encoder state is copied to the decoder initial state (h^d_0 = h^e_T), and a decoder RNN fed z_0, z_1, ..., z_{T-1} produces the reconstruction z_1, ..., z_T.]
A Recurrent Autoencoder is trained to output a reconstruction z of the input sequence x. All the data must pass through the narrow bottleneck state h^e_T. The compression between input and output forces the network to learn the patterns in the data.
10. Recurrent Autoencoder Updates
[Figure: the recurrent autoencoder diagram above, annotated with the encoder and decoder updates listed below.]
Encoder updates:
    h^e_1 = f(W^e_{hh} h^e_0 + W^e_{hx} x_1 + b^e_h)
    h^e_2 = f(W^e_{hh} h^e_1 + W^e_{hx} x_2 + b^e_h)
    h^e_3 = f(W^e_{hh} h^e_2 + W^e_{hx} x_3 + b^e_h)
    · · ·
Decoder updates:
    h^d_0 = h^e_T   (copy)
    h^d_1 = f(W^d_{hh} h^d_0 + W^d_{hx} z_0 + b^d_h)
    h^d_2 = f(W^d_{hh} h^d_1 + W^d_{hx} z_1 + b^d_h)
    h^d_3 = f(W^d_{hh} h^d_2 + W^d_{hx} z_2 + b^d_h)
    · · ·
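A NumPy sketch of one forward pass through these updates. The sizes, the tanh nonlinearity, the linear output projection and the feeding of the previous reconstruction back into the decoder are illustrative assumptions.

import numpy as np

# Forward pass through the encoder/decoder updates above (illustrative choices).
n_x, n_h, T = 4, 16, 10
rng = np.random.default_rng(0)
shapes = {'We_hh': (n_h, n_h), 'We_hx': (n_h, n_x), 'be_h': (n_h,),
          'Wd_hh': (n_h, n_h), 'Wd_hx': (n_h, n_x), 'bd_h': (n_h,),
          'W_zh': (n_x, n_h), 'b_z': (n_x,)}
p = {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}

xs = rng.normal(size=(T, n_x))

# encoder: fold the whole input sequence into the final state h^e_T
h_e = np.zeros(n_h)
for x_t in xs:
    h_e = np.tanh(p['We_hh'] @ h_e + p['We_hx'] @ x_t + p['be_h'])

# decoder: start from h^d_0 = h^e_T and feed the previous output back in
h_d, z_prev, zs = h_e, np.zeros(n_x), []
for _ in range(T):
    h_d = np.tanh(p['Wd_hh'] @ h_d + p['Wd_hx'] @ z_prev + p['bd_h'])
    z_prev = p['W_zh'] @ h_d + p['b_z']          # reconstruction z_t
    zs.append(z_prev)

loss = np.mean((np.stack(zs) - xs) ** 2)         # reconstruction error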
11. Easy RNNs with TensorFlow VariableScope
import tensorflow as tf

def linear(in_array, out_size):
    in_size = in_array.get_shape().as_list()[1]
    # creates or returns the variable <current_scope>/W
    W = tf.get_variable(shape=[in_size, out_size], name='W')
    # out_array = in_array * W
    out_array = tf.matmul(in_array, W)
    return out_array

in_array_0 = tf.placeholder(tf.float32, shape=[None, 32])
out_size_1 = out_size_2 = 32  # reuse below requires the same shape for encoder_0/W

with tf.variable_scope('encoder_0') as scope:
    # creates variable encoder_0/W
    out_array_1 = linear(in_array_0, out_size_1)
    # error: encoder_0/W already exists
    # out_array_2 = linear(out_array_1, out_size_2)
    # reuse encoder_0/W
    scope.reuse_variables()
    out_array_2 = linear(out_array_1, out_size_2)
12. TensorFlow Encoder
with tf.variable_scope('encoder_0') as scope:
    # assumes hs_0_enc[0] holds the initial state and lstm_cell_l0_enc is an LSTM cell built outside this snippet
    for t in range(1, seq_len):
        # placeholder for input data at time t
        xs_0_enc[t] = tf.placeholder(tf.float32, shape=[None, nx_enc])
        # encoder update at time t
        hs_0_enc[t] = lstm_cell_l0_enc(xs_0_enc[t], hs_0_enc[t-1])
        scope.reuse_variables()
13. Deep Recurrent Autoencoder
[Figure: Deep recurrent autoencoder — three stacked encoder layers (encoder_0, encoder_1, encoder_2) with states h^e_{1t}, h^e_{2t}, h^e_{3t} process x_1, ..., x_T; their final states are copied to the corresponding decoder layers (decoder_0, decoder_1, decoder_2) with states h^d_{1t}, h^d_{2t}, h^d_{3t}, which are fed z_0, ..., z_{T-1} and produce the reconstruction z_1, ..., z_T.]
14. TensorFlow Deep Recurrent Autoencoder
with tf.variable_scope('encoder_0') as scope:
    for t in range(1, seq_len):
        # placeholder for input data
        xs_0_enc[t] = tf.placeholder(tf.float32, shape=[None, nx_enc])
        hs_0_enc[t] = lstm_cell_l0_enc(xs_0_enc[t], hs_0_enc[t-1])
        scope.reuse_variables()

with tf.variable_scope('encoder_1') as scope:
    for t in range(1, seq_len):
        # encoder_1 input is encoder_0 hidden state
        xs_1_enc[t] = hs_0_enc[t]
        hs_1_enc[t] = lstm_cell_l1_enc(xs_1_enc[t], hs_1_enc[t-1])
        scope.reuse_variables()

with tf.variable_scope('encoder_2') as scope:
    for t in range(1, seq_len):
        # encoder_2 input is encoder_1 hidden state
        xs_2_enc[t] = hs_1_enc[t]
        hs_2_enc[t] = lstm_cell_l2_enc(xs_2_enc[t], hs_2_enc[t-1])
        scope.reuse_variables()
15. Variational Autoencoder
We assume the market data {xk} is sampled from a probability distribution with a 'small'
number of latent variables h.
Assume p_θ(x, h) = p_θ(x | h) p_θ(h), where θ are the weights of an NN/RNN. Then
    p_θ(x) = ∑_h p_θ(x | h) p_θ(h).
Train the network to minimize, over θ, the negative log-likelihood
    L_θ({x_k}) = ∑_k −log p_θ(x_k).
Then compute the posterior p_θ(h | x) and cluster.
But p_θ(h | x) is intractable!
[Figure: latent-variable model — latent h with prior p_θ(h) and observed x with marginal p_θ(x); the likelihood p_θ(x | h) maps h to x and the posterior p_θ(h | x) maps x back to h.]
16. Variational Inference
Variational Inference learns an NN/RNN approximation q_ϕ(h | x) to p_θ(h | x) during training.
Think of q_ϕ as the encoder network and p_θ as the decoder network: an autoencoder.
Then log p_θ(x_k) ≥ L_{θ,ϕ}(x_k), where
    L_{θ,ϕ}(x_k) = −D_KL(q_ϕ(h | x_k) || p_θ(h))  [regularization term]  +  E_{q_ϕ}[log p_θ(x_k | h)]  [reconstruction term]
is the variational lower bound.
Reconstruction term: the expected log-likelihood of x_k under q_ϕ.
Regularization term: pulls the encoder distribution q_ϕ towards the target prior p_θ(h).
[Figure: as above, but with the intractable posterior replaced by the learned approximation q_ϕ(h | x); prior p_θ(h), likelihood p_θ(x | h), marginal p_θ(x).]
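For a diagonal Gaussian encoder q_ϕ(h | x) = N(μ, σ²) and a standard normal prior, the regularization term has a closed form. Below is a minimal NumPy sketch of a single-sample estimate of the lower bound; the toy linear Gaussian decoder and all sizes are illustrative assumptions, not the networks used in the talk.

import numpy as np

rng = np.random.default_rng(0)
n_h, n_x = 2, 6

# hypothetical encoder output q_phi(h | x_k) = N(mu, diag(sigma^2)) and one data point x_k
mu, log_var = rng.normal(size=n_h), rng.normal(size=n_h)
x_k = rng.normal(size=n_x)

# regularization term: -D_KL( N(mu, sigma^2) || N(0, I) ), available in closed form
neg_kl = 0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

# reconstruction term: one-sample Monte Carlo estimate of E_q[log p_theta(x_k | h)],
# using the reparametrization h = mu + sigma * eps and a toy linear Gaussian decoder
W_dec = rng.normal(scale=0.1, size=(n_x, n_h))
eps = rng.normal(size=n_h)
h = mu + np.exp(0.5 * log_var) * eps
x_mean = W_dec @ h
log_lik = -0.5 * np.sum((x_k - x_mean) ** 2) - 0.5 * n_x * np.log(2 * np.pi)

elbo = neg_kl + log_lik            # variational lower bound L_{theta,phi}(x_k)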
17. Search
Train a Deep Recurrent Autoencoder on public market data (book state over time).
Use the trained encoder as a hash function.
Search based on Euclidean distance, or on Hamming distance after a binary projection of the code.
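A sketch of the search step, assuming the trained encoder has already mapped each stored sequence and the query to a real-valued code vector; the sign-based binary projection and the brute-force scan are illustrative choices.

import numpy as np

# Brute-force search over stored encoder codes; the codes here are random
# stand-ins for h^e_T, and the sign-based binarization is an illustrative choice.
rng = np.random.default_rng(0)
codes = rng.normal(size=(1000, 32))              # stored codes, one per market-data sequence
query = rng.normal(size=32)                      # code of the query sequence

# Euclidean search on the raw codes
nearest_euclidean = int(np.argmin(np.linalg.norm(codes - query, axis=1)))

# Hamming search after a sign-based binary projection of the codes
bits, query_bits = codes > 0.0, query > 0.0
hamming = np.count_nonzero(bits != query_bits, axis=1)
nearest_hamming = int(np.argmin(hamming))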
18. Classify
EUR vs AUD:
Train a Variational Recurrent Autoencoder with a Gaussian prior on public market data.
Include two contracts, EUR and AUD, in the training data.
Use the trained encoder to infer the Gaussian means from the training data.
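One simple way to use the inferred means, sketched below with hypothetical codes and labels: compute a centroid of the encoder means per contract and assign new sequences to the nearest centroid. This is an illustrative choice, not a method stated on the slides.

import numpy as np

rng = np.random.default_rng(0)
# hypothetical encoder means mu(x) for labelled training sequences: 0 = EUR, 1 = AUD
codes = rng.normal(size=(200, 2))
labels = np.repeat([0, 1], 100)

# per-contract centroids of the inferred Gaussian means
centroids = np.stack([codes[labels == c].mean(axis=0) for c in (0, 1)])

def classify(code):
    # assign a new sequence's inferred mean to the nearest contract centroid
    return int(np.argmin(np.linalg.norm(centroids - code, axis=1)))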
19. Predict
Train an LSTM on sequences of limit order book (LOB) states to predict changes in the best bid.
Target data: the change in the best bid over time t → t + 1.
LSTM output: z_t = a softmax distribution over best-bid moves (−1, 0, +1) over time t → t + 1.
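A minimal NumPy sketch of the output side of this setup: mapping best-bid changes to the three classes and scoring a softmax prediction with cross-entropy. The sample changes and stand-in logits are illustrative; in the actual model the logits would come from the LSTM.

import numpy as np

# map best-bid changes over t -> t+1 to classes 0 (-1), 1 (0), 2 (+1)
bid_changes = np.array([-1, 0, 0, 1, -1])
targets = bid_changes + 1

# stand-in logits; in the real model these are produced by the LSTM output layer
logits = np.random.default_rng(0).normal(size=(len(targets), 3))

# softmax distribution z_t over the moves (-1, 0, +1)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
z = exp / exp.sum(axis=1, keepdims=True)

# average cross-entropy between targets and predicted distributions
loss = -np.mean(np.log(z[np.arange(len(targets)), targets]))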
It's tough to make predictions, especially about the future.
[Figure: predicted vs target best-bid moves over time (series: target, predict).]
21. References
[1] M. Arjovsky, A. Shah, and Y. Bengio. Unitary evolution recurrent neural networks. arXiv:1511.06464v4, 2016.
[2] A. Graves, G. Wayne, and I. Danihelka. Neural Turing machines. arXiv:1410.5401v2, 2014.
[3] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.