Machine Learning in Finance via Randomization
Josef Teichmann
(based on joint work with Lukas Gonon, Christa Cuchiero, Lyudmila Grigoryeva, and Juan-Pablo Ortega)
ETHZ
Universität Klagenfurt
Introduction: Motivation

Data-driven models

Key question: how do we gain a quantitative understanding of dynamic decision making in finance or economics?

Classical approach: specify a model pool with well-understood characteristics depending on as few parameters as possible (Occam’s razor), calibrate these models to data, and solve optimization problems thereon.

Machine Learning approach: highly overparametrized function families are used instead of few-parameter families for constructing model pools, strategies for optimization problems, etc. Learning procedures are applied to data (real or artificially generated) to obtain parameter configurations which perform well (e.g. Deep Hedging).

Machine Learning relies
- on different kinds of universal approximation theorems, e.g. for feedforward networks, recurrent neural networks, LSTMs, neural (stochastic) differential equations, signature-based models, etc.,
- and on training models on data.
Training ...

... is an enigmatic procedure:

⇒ It often means applying an algorithm and not necessarily solving a well-defined problem.

Architectures, training hyperparameters, training complexity, and training data have to be carefully adjusted with a lot of domain knowledge to finally make things work. Good habits of classical modeling are left behind (no convergence rates, overparametrization, no sophisticated training algorithms, large amounts of sometimes quite different data, etc.).

Results are in several cases fascinating and far-reaching.

Robustness is almost too good to be true (Deep Hedging, Deep Optimal Stopping).
Randomization and Provability

Key question: how do we gain a quantitative understanding of why artificial traders work so well?

First step: trained randomized networks as a rough role model for what might be the result of early-stopped training, but with a perspective of provability. See, e.g., the works of Jakob Heiss and Hanna Wutte in my working group.

Second step: randomized networks can appear as an almost optimal choice (reservoir computing).

Third step: reduction of training complexity is a feature (sustainable machine learning).
Randomized networks: an older perspective? Ideas from reservoir computing

... to ease training procedures

... going back to Herbert Jäger, with many contributions from Claudio Gallicchio, Lyudmila Grigoryeva, Juan-Pablo Ortega, et al.

Typically an input signal is fed into a fixed (random) dynamical system, called the reservoir, which usually maps the input to a higher-dimensional state.

Then a simple (often linear) readout mechanism is trained to read the state of the reservoir and map it to the desired output.

The main benefit is that training is performed only at the readout stage, while the reservoir is fixed and untrained.

Reservoirs can in some cases be realized physically, and learning the readout layer is often a simple (regularized) regression.
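To make the paradigm concrete, the following is a minimal numpy sketch of an echo-state-style reservoir with a trained linear readout. The toy target, the reservoir dimension, and the scaling of the random matrices are illustrative assumptions and not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: predict a nonlinear functional of the running input.
T, d_in, d_res = 500, 1, 200
u = rng.standard_normal((T, d_in))                      # input signal
y = np.sin(np.cumsum(u, axis=0) / np.sqrt(np.arange(1, T + 1))[:, None])

# Fixed random reservoir (never trained): x_{t+1} = tanh(A x_t + C u_t + b)
A = rng.standard_normal((d_res, d_res)) / np.sqrt(d_res)
C = rng.standard_normal((d_res, d_in))
b = rng.standard_normal(d_res)

X = np.zeros((T, d_res))
x = np.zeros(d_res)
for t in range(T):
    x = np.tanh(A @ x + C @ u[t] + b)
    X[t] = x

# Only the linear readout W is trained, by ridge regression y ≈ X W.
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(d_res), X.T @ y)
print("in-sample MSE:", np.mean((X @ W - y) ** 2))
```

Only W is obtained by regression; A, C and b stay fixed after their random initialization, which is exactly the reduction of training emphasized above.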
Main paradigm and a motivating example

Main paradigm of reservoir computing

Split the input-output map, e.g. from initial values and the driving signal to the observed dynamics, into
- a generic part, the reservoir, which is not trained (or only in small parts), and
- a readout part, which is accurately trained and often linear.

To illustrate this methodology, let us consider the following motivating task from finance.

Goal: learn, from observations of time-series data of a stock price, its dependence on the driving noise, e.g. Brownian motion.

Since it is easy to simulate Brownian motion, the learned relationship from the driving Brownian motion to the price data allows one to easily simulate stock prices, e.g. for risk management.

⇒ Market generators
Reservoir computing for market generation

Assume that N market factors (e.g. prices, volatilities, etc.) are described by

dY_t = \sum_{i=0}^{d} V_i(Y_t)\, dB^i(t), \qquad Y_0 = y \in \mathbb{R}^N,

with V_i : R^N → R^N and d independent Brownian motions B^i, i = 1, ..., d. From now on the 0-th component shall always be time, i.e. dB^0(t) = dt.

We want to learn the map

F : (input noise B) → (solution trajectory Y),

without knowing V. This is a generically complicated map.

Idea: split the map into two parts:
- a universal (fixed) reservoir X, with no dependence on the specific dynamics of Y;
- a linear readout W that needs to be trained such that Y ≈ WX.

(Diagram: the map F : B → Y factors through the reservoir, B → X via the reservoir map R and X → Y via the linear readout W.)
Learning from market observations

(Figure: observed market data is mapped through the fixed reservoir and then through the trained linear readout.)
Randomized recurrent neural networks

Can we understand why randomized recurrent neural networks are excellent feature extractors (or reservoirs)?

There is a natural candidate, namely the infinite-dimensional signature of the driving signal, which serves as a (universal) linear regression basis for continuous path functionals.

Signature goes back to K. Chen (’57) and plays a prominent role in rough path theory (T. Lyons (’98), P. Friz & N. Victoir (’10), P. Friz & M. Hairer (’14)).

In the last few years there have been many papers (e.g. Levin, Lyons, and Ni (2016)) showing how to apply rough path theory and signature methods to machine learning and time-series analysis.
Learning the dynamics via signature

Signature is point-separating:
- The signature of a d-dimensional (geometric rough) path, including in particular Brownian motion and smooth curves, uniquely determines the path up to tree-like equivalences. These tree-like equivalences can be avoided by adding time.

Linear functions on the signature form an algebra that contains 1:
- Every polynomial on signature may be realized as a linear function via the so-called shuffle product (a small numerical check is given below).

⇒ By the Stone-Weierstrass theorem, path functionals that are continuous (with respect to a variation distance of the path) can be uniformly approximated on compact sets by linear functions of the time-extended signature.

⇒ Universal approximation theorem (UAT).

⇒ This yields a natural split in the spirit of reservoir computing into
- the signature of the input signal, being the generic reservoir;
- a linear (readout) map.
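As a sanity check of the shuffle property at the lowest nontrivial order, the snippet below verifies numerically that the product of two level-one signature entries equals the sum of the two corresponding level-two entries, S^i S^j = S^{ij} + S^{ji}. The path is an arbitrary smooth example, and the iterated integrals are simple left-point Riemann sums.

```python
import numpy as np

# Numerical check of the shuffle identity at low order:
# <e_i, Sig> * <e_j, Sig> = S^{ij} + S^{ji}.
ts = np.linspace(0.0, 1.0, 2001)
x = np.stack([ts, np.sin(2 * np.pi * ts) + ts**2], axis=1)   # a smooth 2-d path
dx = np.diff(x, axis=0)

S1 = dx.sum(axis=0)                 # level-1 entries S^i (total increments)
centered = x[:-1] - x[0]
S2 = centered.T @ dx                # S2[i, j] ~ S^{ij} = integral of (x^i_s - x^i_0) dx^j_s

i, j = 0, 1
print(S1[i] * S1[j], S2[i, j] + S2[j, i])   # the two numbers agree up to discretization
```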
Goal of the talk
Prove, in the case of randomized recurrent neural networks, how dynamical systems can be approximated with precise convergence rates.
Take randomized recurrent networks as a role model for randomized
deep networks.
Randomized signature and reservoir computing: Setting

Mathematical Setting

We consider here the simplest setting, with smooth input signals (controls) and output signals taking values in R^N, but everything works with more general drivers (e.g. semimartingales) and with R^N replaced by so-called convenient vector spaces.

Consider a controlled ordinary differential equation (CODE)

dY_t = \sum_{i=0}^{d} V_i(Y_t)\, du^i(t), \qquad Y_0 = y \in \mathbb{R}^N, \qquad (CODE)

for some smooth vector fields V_i : R^N → R^N, i = 0, ..., d, and d smooth control curves u^i. Notice again that du^0(t) = dt.

We observe the controls u (input) and Y (output), but do not have access to the vector fields V_i.

The goal is to learn the dynamics and to simulate from it conditional on (new) controls u, i.e. we aim to learn the map

input control u ↦ solution trajectory Y.
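For orientation, a minimal Euler-type solver for such a CODE is sketched below. The vector fields, the control (a sampled Brownian path used pathwise as u^1) and all dimensions are illustrative choices, not part of the talk.

```python
import numpy as np

def solve_code(vector_fields, u, y0):
    """Euler scheme for dY_t = sum_i V_i(Y_t) du^i(t), where u[:, 0] is the time component.

    vector_fields: list of callables V_i : R^N -> R^N
    u:             array of shape (T+1, d+1) with the control values on a time grid
    y0:            initial value in R^N
    """
    Y = [np.asarray(y0, dtype=float)]
    du = np.diff(u, axis=0)
    for k in range(len(du)):
        Y.append(Y[-1] + sum(V(Y[-1]) * du[k, i] for i, V in enumerate(vector_fields)))
    return np.array(Y)

# Illustrative example: a one-dimensional CODE with u^0 = t and u^1 a sampled
# Brownian path, used pathwise as a control (Euler interpretation only).
rng = np.random.default_rng(1)
T, dt = 200, 1.0 / 200
t_grid = np.arange(T + 1) * dt
w = np.concatenate([[0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(T))])
u = np.stack([t_grid, w], axis=1)
V = [lambda y: 0.05 * y, lambda y: 0.2 * y]      # V_0 (drift-like), V_1 (noise-like)
Y = solve_code(V, u, y0=np.array([1.0]))
print("terminal value:", Y[-1])
```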
Randomized signature and reservoir computing: Preliminaries on signature

Signature in a nutshell – notations

The signature takes values in the free algebra of formal series generated by d indeterminates e_1, ..., e_d, given by

T((\mathbb{R}^d)) := \Big\{ a = \sum_{k=0}^{\infty} \sum_{i_1,\dots,i_k=1}^{d} a_{i_1\dots i_k}\, e_{i_1}\cdots e_{i_k} \Big\}.

Sums and products are defined in the natural way.

We consider the complete locally convex topology making all projections a ↦ a_{i_1...i_k} continuous on T((R^d)), hence a convenient vector space.
Signature in a nutshell – definitions

The signature of u is the unique solution of the following CODE in T((R^d)):

d\,\mathrm{Sig}_{s,t} = \sum_{i=1}^{d} \mathrm{Sig}_{s,t}\, e_i\, du^i(t), \qquad \mathrm{Sig}_{s,s} = 1,

and is apparently given by

\mathrm{Sig}_{s,t}(a) = a \sum_{k=0}^{\infty} \sum_{i_1,\dots,i_k=0}^{d} \int_{s\le t_1\le\cdots\le t_k\le t} du^{i_1}(t_1)\cdots du^{i_k}(t_k)\; e_{i_1}\cdots e_{i_k}.
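For piecewise-linear paths the iterated integrals can be computed exactly: the signature of a single segment is the tensor exponential of its increment, and concatenation corresponds to the truncated tensor product (Chen's relation). The following self-contained numpy sketch implements this; the example path and the truncation level are arbitrary choices.

```python
import numpy as np

def seg_sig(delta, M):
    """Signature (levels 0..M) of a straight segment with increment `delta`:
    level k equals the k-fold tensor power of delta divided by k! (tensor exponential)."""
    levels = [np.array(1.0)]
    for k in range(1, M + 1):
        levels.append(np.multiply.outer(levels[-1], delta) / k)
    return levels

def chen(S1, S2, M):
    """Chen's relation: the signature of a concatenation is the truncated tensor product."""
    return [sum(np.multiply.outer(S1[j], S2[k - j]) for j in range(k + 1))
            for k in range(M + 1)]

def signature(path, M):
    """Truncated signature of a piecewise-linear path given by its sample points (T+1, d)."""
    path = np.asarray(path, dtype=float)
    sig = seg_sig(np.zeros(path.shape[1]), M)        # identity element (1, 0, 0, ...)
    for t in range(len(path) - 1):
        sig = chen(sig, seg_sig(path[t + 1] - path[t], M), M)
    return sig

# Example: a time-extended 2-d path; the level-1 terms recover the total increments.
ts = np.linspace(0.0, 1.0, 50)
path = np.stack([ts, np.sin(2 * np.pi * ts)], axis=1)
S = signature(path, M=3)
print(S[1])   # equals path[-1] - path[0]
```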
Randomized signature and reservoir computing: A “splitting result” in spirit of reservoir computing

Signature and its connection to reservoir computing

The following “splitting theorem” is the precise link to reservoir computing. We suppose here that (CODE) admits a unique global solution given by a smooth evolution operator Evol such that Y_t = Evol_t(y).

Theorem. Let Evol be a smooth evolution operator on R^N such that (Evol_t(y))_t satisfies (CODE). Then for any smooth function g : R^N → R and for every M ≥ 0 there is a time-homogeneous linear map W, depending on (V_1, ..., V_d, g, M, y), from T^M(R^d) to R such that

g(\mathrm{Evol}_t(y)) = W\big(\pi_M(\mathrm{Sig}_t)\big) + O(t^{M+1}),

where π_M : T((R^d)) → T^M(R^d) is the canonical projection.

Remark. For the proof see e.g. Lyons (1998). It can, however, be proved in much greater generality, e.g. on convenient vector spaces.
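A small numerical illustration of this statement, under illustrative choices (vector fields, control, g = id, truncation level M = 2): on a short horizon a single linear readout of the truncated signature approximates the whole trajectory.

```python
import numpy as np

# Splitting theorem, numerically: fit g(Y_t) = Y_t by a *linear* readout of the
# level-<=2 signature of the time-extended control. All choices are illustrative.
T, dt = 400, 0.5 / 400
t_grid = np.arange(T + 1) * dt
u = np.stack([t_grid, np.sin(3.0 * t_grid)], axis=1)      # time-extended control
du = np.diff(u, axis=0)

# Solve the CODE dY = V_0(Y) du^0 + V_1(Y) du^1 with an Euler scheme.
V0, V1 = (lambda y: -y), (lambda y: np.cos(y))
Y = np.empty(T + 1)
Y[0] = 1.0
for k in range(T):
    Y[k + 1] = Y[k] + V0(Y[k]) * du[k, 0] + V1(Y[k]) * du[k, 1]

# Truncated signature features (levels 0..2) of u on [0, t], for every grid time t.
S1 = np.vstack([np.zeros(2), np.cumsum(du, axis=0)])                     # S^i_t
centered = u[:-1] - u[0]
S2 = np.vstack([np.zeros(4),
                np.cumsum(centered[:, :, None] * du[:, None, :], axis=0).reshape(T, 4)])
Phi = np.hstack([np.ones((T + 1, 1)), S1, S2])                           # 1 + 2 + 4 features

# Train the linear readout W by least squares and inspect the fit.
W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print("max |Y_t - W(Sig_t)| on [0, 0.5]:", np.max(np.abs(Phi @ W - Y)))
```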
Is signature a good reservoir?

This split is not yet fully in the spirit of reservoir computing: unlike in a physical system, where evaluations are ultrafast, computing the signature up to a high order can take a while, in particular if d is large.

Moreover, regression on signature is the path-space analog of polynomial approximation, which can have several disadvantages.

Remedy: information compression by Johnson-Lindenstrauss projection.
Randomly projected universal signature dynamics: The Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss (JL) lemma

We state here the classical version of the Johnson-Lindenstrauss lemma.

Lemma. For every 0 < ε < 1 and every set Q consisting of N points in some R^n, there is a linear map f : R^n → R^k with k ≥ 24 log N / (3ε² − 2ε³) such that

(1-\varepsilon)\,\|v_1 - v_2\|^2 \le \|f(v_1) - f(v_2)\|^2 \le (1+\varepsilon)\,\|v_1 - v_2\|^2

for all v_1, v_2 ∈ Q, i.e. the geometry of Q is almost preserved after the projection.

The map f is called a (JL) map, and it can be drawn randomly from a set of linear projection maps.

Indeed, take a k × n matrix A with iid standard normal entries. Then (1/√k) A satisfies the desired requirements with high probability.

We apply this remarkable result to obtain “versions of signature” in lower-dimensional spaces.
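A quick numerical check of the lemma with a Gaussian random projection (all dimensions below are illustrative): pairwise squared distances of a random point cloud are approximately preserved.

```python
import numpy as np

rng = np.random.default_rng(3)

# JL sketch: project N points from R^n to R^k with A / sqrt(k), A having iid
# standard normal entries, and compare pairwise squared distances.
n, k, N = 10_000, 1_000, 50
Q = rng.standard_normal((N, n))                  # the point cloud
A = rng.standard_normal((k, n))
fQ = Q @ A.T / np.sqrt(k)                        # the random JL projection

def pdist2(X):
    """Squared pairwise distances between the rows of X."""
    sq = (X * X).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T

orig, proj = pdist2(Q), pdist2(fQ)
mask = ~np.eye(N, dtype=bool)
ratios = proj[mask] / orig[mask]
print("distance distortion range:", ratios.min(), ratios.max())   # close to 1
```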
Randomly projected universal signature dynamics: Randomized signature

Towards randomized signature

We look for (JL) maps on T^M(R^d) which preserve its geometry, encoded in some set of (relevant) directions Q. In order to make this program work, we need the following definition.

Definition. Let Q be any (finite or infinite) set of elements of norm one in T^M(R^d) with Q = −Q. For v ∈ T^M(R^d) we define the function

\|v\|_Q := \inf\Big\{ \sum_j |\lambda_j| \;:\; \sum_j \lambda_j v_j = v \ \text{and}\ v_j \in Q \Big\}.

We use the convention inf ∅ = +∞, since the function is finite only on span(Q).

The function ‖·‖_Q behaves precisely like a norm on the span of Q.

Additionally, ‖v‖_{Q_1} ≥ ‖v‖_{Q_2} for Q_1 ⊂ Q_2.
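Since Q = −Q, the infimum can be taken over nonnegative coefficients only, so for a finite Q the quantity ‖v‖_Q is a small linear program. Below is a sketch using scipy (assumed available); the set Q and the vector v are random examples.

```python
import numpy as np
from scipy.optimize import linprog

# ||v||_Q for a finite symmetric set Q = -Q of norm-one vectors:
# because -q is in Q whenever q is, one may restrict to lambda_j >= 0, so
#   ||v||_Q = min { sum_j lambda_j : sum_j lambda_j q_j = v, lambda_j >= 0 },
# which is a linear program.
rng = np.random.default_rng(5)
n, m = 6, 8
Qhalf = rng.standard_normal((n, m))
Qhalf /= np.linalg.norm(Qhalf, axis=0)            # norm-one elements
Qmat = np.hstack([Qhalf, -Qhalf])                 # enforce Q = -Q

v = Qmat @ np.abs(rng.standard_normal(Qmat.shape[1])) * 0.1   # some v in span(Q)
res = linprog(c=np.ones(Qmat.shape[1]), A_eq=Qmat, b_eq=v,
              bounds=(0, None), method="highs")
print("||v||_Q ≈", res.fun if res.success else np.inf)
```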
Towards randomized signature – a first estimate

Proposition. Fix M ≥ 1 and ε > 0. Moreover, let Q be any N-point set of vectors of norm one in T^M(R^d). Then there is a linear map f : T^M(R^d) → R^k (with k being the above JL constant for N and ε), such that

\big|\langle v_1,\, v_2 - (f^*\circ f)(v_2)\rangle\big| \le \varepsilon\, \|v_1\|_Q\, \|v_2\|_Q

for all v_1, v_2 ∈ span(Q), where f* : R^k → T^M(R^d) denotes the adjoint map of f with respect to the standard inner product on R^k.

By means of this special JL map associated to a point set Q we can now “project signature” without losing too much information.

We can then solve the projected equation and obtain – up to some time – a solution which is ε-close to signature.

By a slight abuse of notation we write Sig_t for the truncated version π_M(Sig_t) in T^M(R^d).
Randomized signature is as expressive as signature

Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann). Let u be a smooth control and f a JL map from T^M(R^d) to R^k, where k is determined via some fixed ε and a fixed set Q. We denote by r-Sig the smooth evolution of the following controlled differential equation on R^k:

dX_t = \sum_{i=1}^{d} \Big( \tfrac{1}{\sqrt{n}}\, f\big(f^*(X_t)\, e_i\big) + \big(1 - \tfrac{1}{\sqrt{n}}\big)\, f\big(\mathrm{Sig}_t\, e_i\big) \Big)\, du^i(t), \qquad X_0 \in \mathbb{R}^k,

where n = dim(T^M(R^d)). Then for each w ∈ T^M(R^d)

\big|\langle w,\, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0))\rangle\big| \le \big|\langle w,\, \mathrm{Evol}_t(1 - f^*(X_0))\rangle\big| + C \sum_{i=1}^{d} \int_0^t \|\mathrm{Evol}_r^*\, w\|_Q\, \|\mathrm{Sig}_r\, e_i\|_Q\, dr,

where Evol denotes here the evolution operator corresponding to dZ_t = \sum_{i=1}^{d} \tfrac{1}{\sqrt{n}}\, (f^*\circ f)(Z_t\, e_i)\, du^i(t) and C = \sup_{s \le r \le t,\, i} \dots
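In practice, a randomized-signature reservoir of the control is usually realized directly in finite dimensions, by evolving dX_t = Σ_i σ(A_i X_t + b_i) du^i(t) with fixed random matrices A_i and biases b_i and training only a linear readout. The sketch below follows this commonly used construction rather than the specific JL-based evolution of the theorem above; the dimensions, the activation and the toy target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def randomized_signature(u, k, activation=np.tanh, scale=1.0):
    """Randomized-signature-style reservoir (Euler discretization):
        X_{t+1} = X_t + sum_i activation(A_i X_t + b_i) * du^i_t,
    with fixed random A_i, b_i. `u` has shape (T+1, d+1) with u[:, 0] = time,
    and `k` is the reservoir dimension."""
    T1, d1 = u.shape
    A = rng.standard_normal((d1, k, k)) * scale / np.sqrt(k)
    b = rng.standard_normal((d1, k)) * scale
    X = np.empty((T1, k))
    X[0] = rng.standard_normal(k)
    du = np.diff(u, axis=0)
    for t in range(T1 - 1):
        X[t + 1] = X[t] + sum(activation(A[i] @ X[t] + b[i]) * du[t, i]
                              for i in range(d1))
    return X

# Example: reservoir of a time-extended control plus a ridge-regression readout
# trained to reproduce a target path Y (here an arbitrary functional of the control).
T = 300
t_grid = np.linspace(0.0, 1.0, T + 1)
u = np.stack([t_grid, np.sin(4.0 * t_grid)], axis=1)
Y = np.exp(-t_grid) * np.cos(3.0 * u[:, 1])              # illustrative target

X = randomized_signature(u, k=80)
lam = 1e-6
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
print("max readout error:", np.max(np.abs(X @ W - Y)))
```

Only the readout W is trained; the random matrices A_i, the biases b_i, and the initial state X_0 are drawn once and then kept fixed, in line with the reservoir-computing paradigm of the talk.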