Machine Learning in Finance via Randomization
Josef Teichmann
(based on joint work with Lukas Gonon, Christa Cuchiero, Lyudmila Grigoryeva, and Juan-Pablo Ortega)
ETHZ
Universität Klagenfurt
Introduction: Motivation

Data-driven models

Key question: how do we gain a quantitative understanding of dynamic decision making in finance or economics?

Classical approach: specify a model pool with well-understood characteristics depending on as few parameters as possible (Occam’s razor), calibrate these models to data, and solve optimization problems thereon.

Machine Learning approach: highly overparametrized function families are used instead of few-parameter families for constructing model pools, strategies for optimization problems, etc. Learning procedures are applied to data (real or artificially generated) to obtain parameter configurations which perform well (e.g. Deep Hedging).

Machine Learning relies
- on different kinds of universal approximation theorems, e.g. for feedforward networks, recurrent neural networks, LSTMs, neural (stochastic) differential equations, signature-based models, etc.,
- and on training models on data.
Training ...

... is an enigmatic procedure:

⇒ It often means applying an algorithm and not necessarily solving a well-defined problem.

Architectures, training hyperparameters, training complexity, and training data have to be carefully adjusted with a lot of domain knowledge to finally make things work. Good habits of classical modeling are left behind (no convergence rates, overparametrization, no sophisticated training algorithms, large amounts of sometimes quite different data, etc.).

Results are in several cases fascinating and far-reaching.

Robustness is almost too good to be true (Deep Hedging, Deep Optimal Stopping).
Randomization and Provability

Key question: how do we gain a quantitative understanding of why artificial traders work so well?

First step: trained randomized networks as a rough role model for what might be the result of early-stopped training, but with a perspective of provability. See, e.g., the works of Jakob Heiss and Hanna Wutte in my working group.

Second step: randomized networks can appear as an almost optimal choice (reservoir computing).

Third step: reduction of training complexity is a feature (sustainable machine learning).
Randomized networks: an older perspective? Ideas from reservoir computing

... to ease training procedures

... going back to Herbert Jäger, with many contributions from Claudio Gallicchio, Lyudmila Grigoryeva, Juan-Pablo Ortega, et al.

Typically an input signal is fed into a fixed (random) dynamical system, called the reservoir, which usually maps the input to a higher-dimensional state.

Then a simple (often linear) readout mechanism is trained to read the state of the reservoir and map it to the desired output.

The main benefit is that training is performed only at the readout stage, while the reservoir is fixed and untrained.

Reservoirs can in some cases be realized physically, and learning the readout layer is often a simple (regularized) regression.
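To make the paradigm concrete, the following is a minimal numpy sketch of an echo-state-style reservoir with a trained linear readout. The toy target, the reservoir dimension, and the scaling of the random matrices are illustrative assumptions and not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: predict a nonlinear functional of the running input.
T, d_in, d_res = 500, 1, 200
u = rng.standard_normal((T, d_in))                      # input signal
y = np.sin(np.cumsum(u, axis=0) / np.sqrt(np.arange(1, T + 1))[:, None])

# Fixed random reservoir (never trained): x_{t+1} = tanh(A x_t + C u_t + b)
A = rng.standard_normal((d_res, d_res)) / np.sqrt(d_res)
C = rng.standard_normal((d_res, d_in))
b = rng.standard_normal(d_res)

X = np.zeros((T, d_res))
x = np.zeros(d_res)
for t in range(T):
    x = np.tanh(A @ x + C @ u[t] + b)
    X[t] = x

# Only the linear readout W is trained, by ridge regression y ≈ X W.
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(d_res), X.T @ y)
print("in-sample MSE:", np.mean((X @ W - y) ** 2))
```

Only W is obtained by regression; A, C and b stay fixed after their random initialization, which is exactly the reduction of training emphasized above.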
Main paradigm and a motivating example

Main paradigm of reservoir computing

Split the input-output map, e.g. from initial values and the driving signal to the observed dynamics, into
- a generic part, the reservoir, which is not trained (or only in small parts), and
- a readout part, which is accurately trained and often linear.

To illustrate this methodology, let us consider the following motivating task from finance.

Goal: learn, from observations of time-series data of a stock price, its dependence on the driving noise, e.g. Brownian motion.

Since it is easy to simulate Brownian motion, the learned relationship from the driving Brownian motion to the price data allows one to easily simulate stock prices, e.g. for risk management.

⇒ Market generators
Reservoir computing for market generation

Assume that N market factors (e.g. prices, volatilities, etc.) are described by

dY_t = \sum_{i=0}^{d} V_i(Y_t)\, dB^i(t), \qquad Y_0 = y \in \mathbb{R}^N,

with V_i : R^N → R^N and d independent Brownian motions B^i, i = 1, ..., d. From now on the 0-th component shall always be time, i.e. dB^0(t) = dt.

We want to learn the map

F : (input noise B) → (solution trajectory Y),

without knowing V. This is a generically complicated map.

Idea: split the map into two parts:
- a universal (fixed) reservoir X, with no dependence on the specific dynamics of Y;
- a linear readout W that needs to be trained such that Y ≈ WX.

(Diagram: the map F : B → Y factors through the reservoir, B → X via the reservoir map R and X → Y via the linear readout W.)
Learning from market observations

(Figure: observed market data is mapped through the fixed reservoir and then through the trained linear readout.)
Randomized recurrent neural networks

Can we understand why randomized recurrent neural networks are excellent feature extractors (or reservoirs)?

There is a natural candidate, namely the infinite-dimensional signature of the driving signal, which serves as a (universal) linear regression basis for continuous path functionals.

Signature goes back to K. Chen (’57) and plays a prominent role in rough path theory (T. Lyons (’98), P. Friz & N. Victoir (’10), P. Friz & M. Hairer (’14)).

In the last few years there have been many papers (e.g. Levin, Lyons, and Ni (2016)) showing how to apply rough path theory and signature methods to machine learning and time-series analysis.
Learning the dynamics via signature

Signature is point-separating:
- The signature of a d-dimensional (geometric rough) path, including in particular Brownian motion and smooth curves, uniquely determines the path up to tree-like equivalences. These tree-like equivalences can be avoided by adding time.

Linear functions on the signature form an algebra that contains 1:
- Every polynomial on signature may be realized as a linear function via the so-called shuffle product (a small numerical check is given below).

⇒ By the Stone-Weierstrass theorem, path functionals that are continuous (with respect to a variation distance of the path) can be uniformly approximated on compact sets by linear functions of the time-extended signature.

⇒ Universal approximation theorem (UAT).

⇒ This yields a natural split in the spirit of reservoir computing into
- the signature of the input signal, being the generic reservoir;
- a linear (readout) map.
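As a sanity check of the shuffle property at the lowest nontrivial order, the snippet below verifies numerically that the product of two level-one signature entries equals the sum of the two corresponding level-two entries, S^i S^j = S^{ij} + S^{ji}. The path is an arbitrary smooth example, and the iterated integrals are simple left-point Riemann sums.

```python
import numpy as np

# Numerical check of the shuffle identity at low order:
# <e_i, Sig> * <e_j, Sig> = S^{ij} + S^{ji}.
ts = np.linspace(0.0, 1.0, 2001)
x = np.stack([ts, np.sin(2 * np.pi * ts) + ts**2], axis=1)   # a smooth 2-d path
dx = np.diff(x, axis=0)

S1 = dx.sum(axis=0)                 # level-1 entries S^i (total increments)
centered = x[:-1] - x[0]
S2 = centered.T @ dx                # S2[i, j] ~ S^{ij} = integral of (x^i_s - x^i_0) dx^j_s

i, j = 0, 1
print(S1[i] * S1[j], S2[i, j] + S2[j, i])   # the two numbers agree up to discretization
```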
Goal of the talk
Prove, in the case of randomized recurrent neural networks, how dynamical systems can be approximated with precise convergence rates.
Take randomized recurrent networks as a role model for randomized
deep networks.
Randomized signature and reservoir computing: Setting

Mathematical Setting

We consider here the simplest setting, with smooth input signals (controls) and output signals taking values in R^N, but everything works with more general drivers (e.g. semimartingales) and with R^N replaced by so-called convenient vector spaces.

Consider a controlled ordinary differential equation (CODE)

dY_t = \sum_{i=0}^{d} V_i(Y_t)\, du^i(t), \qquad Y_0 = y \in \mathbb{R}^N, \qquad (CODE)

for some smooth vector fields V_i : R^N → R^N, i = 0, ..., d, and d smooth control curves u^i. Notice again that du^0(t) = dt.

We observe the controls u (input) and Y (output), but do not have access to the vector fields V_i.

The goal is to learn the dynamics and to simulate from it conditional on (new) controls u, i.e. we aim to learn the map

input control u ↦ solution trajectory Y.
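For orientation, a minimal Euler-type solver for such a CODE is sketched below. The vector fields, the control (a sampled Brownian path used pathwise as u^1) and all dimensions are illustrative choices, not part of the talk.

```python
import numpy as np

def solve_code(vector_fields, u, y0):
    """Euler scheme for dY_t = sum_i V_i(Y_t) du^i(t), where u[:, 0] is the time component.

    vector_fields: list of callables V_i : R^N -> R^N
    u:             array of shape (T+1, d+1) with the control values on a time grid
    y0:            initial value in R^N
    """
    Y = [np.asarray(y0, dtype=float)]
    du = np.diff(u, axis=0)
    for k in range(len(du)):
        Y.append(Y[-1] + sum(V(Y[-1]) * du[k, i] for i, V in enumerate(vector_fields)))
    return np.array(Y)

# Illustrative example: a one-dimensional CODE with u^0 = t and u^1 a sampled
# Brownian path, used pathwise as a control (Euler interpretation only).
rng = np.random.default_rng(1)
T, dt = 200, 1.0 / 200
t_grid = np.arange(T + 1) * dt
w = np.concatenate([[0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(T))])
u = np.stack([t_grid, w], axis=1)
V = [lambda y: 0.05 * y, lambda y: 0.2 * y]      # V_0 (drift-like), V_1 (noise-like)
Y = solve_code(V, u, y0=np.array([1.0]))
print("terminal value:", Y[-1])
```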
Randomized signature and reservoir computing: Preliminaries on signature

Signature in a nutshell – notations

The signature takes values in the free algebra of formal series generated by d indeterminates e_1, ..., e_d, given by

T((\mathbb{R}^d)) := \Big\{ a = \sum_{k=0}^{\infty} \sum_{i_1,\dots,i_k=1}^{d} a_{i_1\dots i_k}\, e_{i_1}\cdots e_{i_k} \Big\}.

Sums and products are defined in the natural way.

We consider the complete locally convex topology making all projections a ↦ a_{i_1...i_k} continuous on T((R^d)), hence a convenient vector space.
Signature in a nutshell – definitions

The signature of u is the unique solution of the following CODE in T((R^d)):

d\,\mathrm{Sig}_{s,t} = \sum_{i=1}^{d} \mathrm{Sig}_{s,t}\, e_i\, du^i(t), \qquad \mathrm{Sig}_{s,s} = 1,

and is apparently given by

\mathrm{Sig}_{s,t}(a) = a \sum_{k=0}^{\infty} \sum_{i_1,\dots,i_k=0}^{d} \int_{s\le t_1\le\cdots\le t_k\le t} du^{i_1}(t_1)\cdots du^{i_k}(t_k)\; e_{i_1}\cdots e_{i_k}.
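For piecewise-linear paths the iterated integrals can be computed exactly: the signature of a single segment is the tensor exponential of its increment, and concatenation corresponds to the truncated tensor product (Chen's relation). The following self-contained numpy sketch implements this; the example path and the truncation level are arbitrary choices.

```python
import numpy as np

def seg_sig(delta, M):
    """Signature (levels 0..M) of a straight segment with increment `delta`:
    level k equals the k-fold tensor power of delta divided by k! (tensor exponential)."""
    levels = [np.array(1.0)]
    for k in range(1, M + 1):
        levels.append(np.multiply.outer(levels[-1], delta) / k)
    return levels

def chen(S1, S2, M):
    """Chen's relation: the signature of a concatenation is the truncated tensor product."""
    return [sum(np.multiply.outer(S1[j], S2[k - j]) for j in range(k + 1))
            for k in range(M + 1)]

def signature(path, M):
    """Truncated signature of a piecewise-linear path given by its sample points (T+1, d)."""
    path = np.asarray(path, dtype=float)
    sig = seg_sig(np.zeros(path.shape[1]), M)        # identity element (1, 0, 0, ...)
    for t in range(len(path) - 1):
        sig = chen(sig, seg_sig(path[t + 1] - path[t], M), M)
    return sig

# Example: a time-extended 2-d path; the level-1 terms recover the total increments.
ts = np.linspace(0.0, 1.0, 50)
path = np.stack([ts, np.sin(2 * np.pi * ts)], axis=1)
S = signature(path, M=3)
print(S[1])   # equals path[-1] - path[0]
```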
Randomized signature and reservoir computing: A “splitting result” in spirit of reservoir computing

Signature and its connection to reservoir computing

The following “splitting theorem” is the precise link to reservoir computing. We suppose here that (CODE) admits a unique global solution given by a smooth evolution operator Evol such that Y_t = Evol_t(y).

Theorem. Let Evol be a smooth evolution operator on R^N such that (Evol_t(y))_t satisfies (CODE). Then for any smooth function g : R^N → R and for every M ≥ 0 there is a time-homogeneous linear map W, depending on (V_1, ..., V_d, g, M, y), from T^M(R^d) to R such that

g(\mathrm{Evol}_t(y)) = W\big(\pi_M(\mathrm{Sig}_t)\big) + O(t^{M+1}),

where π_M : T((R^d)) → T^M(R^d) is the canonical projection.

Remark. For the proof see e.g. Lyons (1998). It can, however, be proved in much greater generality, e.g. on convenient vector spaces.
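A small numerical illustration of this statement, under illustrative choices (vector fields, control, g = id, truncation level M = 2): on a short horizon a single linear readout of the truncated signature approximates the whole trajectory.

```python
import numpy as np

# Splitting theorem, numerically: fit g(Y_t) = Y_t by a *linear* readout of the
# level-<=2 signature of the time-extended control. All choices are illustrative.
T, dt = 400, 0.5 / 400
t_grid = np.arange(T + 1) * dt
u = np.stack([t_grid, np.sin(3.0 * t_grid)], axis=1)      # time-extended control
du = np.diff(u, axis=0)

# Solve the CODE dY = V_0(Y) du^0 + V_1(Y) du^1 with an Euler scheme.
V0, V1 = (lambda y: -y), (lambda y: np.cos(y))
Y = np.empty(T + 1)
Y[0] = 1.0
for k in range(T):
    Y[k + 1] = Y[k] + V0(Y[k]) * du[k, 0] + V1(Y[k]) * du[k, 1]

# Truncated signature features (levels 0..2) of u on [0, t], for every grid time t.
S1 = np.vstack([np.zeros(2), np.cumsum(du, axis=0)])                     # S^i_t
centered = u[:-1] - u[0]
S2 = np.vstack([np.zeros(4),
                np.cumsum(centered[:, :, None] * du[:, None, :], axis=0).reshape(T, 4)])
Phi = np.hstack([np.ones((T + 1, 1)), S1, S2])                           # 1 + 2 + 4 features

# Train the linear readout W by least squares and inspect the fit.
W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print("max |Y_t - W(Sig_t)| on [0, 0.5]:", np.max(np.abs(Phi @ W - Y)))
```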
Is signature a good reservoir?

This split is not yet fully in the spirit of reservoir computing: unlike in a physical system, where evaluations are ultrafast, computing the signature up to a high order can take a while, in particular if d is large.

Moreover, regression on signature is the path-space analog of polynomial approximation, which can have several disadvantages.

Remedy: information compression by Johnson-Lindenstrauss projection.
Randomly projected universal signature dynamics: The Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss (JL) lemma

We state here the classical version of the Johnson-Lindenstrauss lemma.

Lemma. For every 0 < ε < 1 and every set Q consisting of N points in some R^n, there is a linear map f : R^n → R^k with k ≥ 24 log N / (3ε² − 2ε³) such that

(1-\varepsilon)\,\|v_1 - v_2\|^2 \le \|f(v_1) - f(v_2)\|^2 \le (1+\varepsilon)\,\|v_1 - v_2\|^2

for all v_1, v_2 ∈ Q, i.e. the geometry of Q is almost preserved after the projection.

The map f is called a (JL) map, and it can be drawn randomly from a set of linear projection maps.

Indeed, take a k × n matrix A with iid standard normal entries. Then (1/√k) A satisfies the desired requirements with high probability.

We apply this remarkable result to obtain “versions of signature” in lower-dimensional spaces.
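A quick numerical check of the lemma with a Gaussian random projection (all dimensions below are illustrative): pairwise squared distances of a random point cloud are approximately preserved.

```python
import numpy as np

rng = np.random.default_rng(3)

# JL sketch: project N points from R^n to R^k with A / sqrt(k), A having iid
# standard normal entries, and compare pairwise squared distances.
n, k, N = 10_000, 1_000, 50
Q = rng.standard_normal((N, n))                  # the point cloud
A = rng.standard_normal((k, n))
fQ = Q @ A.T / np.sqrt(k)                        # the random JL projection

def pdist2(X):
    """Squared pairwise distances between the rows of X."""
    sq = (X * X).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T

orig, proj = pdist2(Q), pdist2(fQ)
mask = ~np.eye(N, dtype=bool)
ratios = proj[mask] / orig[mask]
print("distance distortion range:", ratios.min(), ratios.max())   # close to 1
```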
Randomly projected universal signature dynamics: Randomized signature

Towards randomized signature

We look for (JL) maps on T^M(R^d) which preserve its geometry, encoded in some set of (relevant) directions Q. In order to make this program work, we need the following definition.

Definition. Let Q be any (finite or infinite) set of elements of norm one in T^M(R^d) with Q = −Q. For v ∈ T^M(R^d) we define the function

\|v\|_Q := \inf\Big\{ \sum_j |\lambda_j| \;:\; \sum_j \lambda_j v_j = v \ \text{and}\ v_j \in Q \Big\}.

We use the convention inf ∅ = +∞, since the function is finite only on span(Q).

The function ‖·‖_Q behaves precisely like a norm on the span of Q.

Additionally, ‖v‖_{Q_1} ≥ ‖v‖_{Q_2} for Q_1 ⊂ Q_2.
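Since Q = −Q, the infimum can be taken over nonnegative coefficients only, so for a finite Q the quantity ‖v‖_Q is a small linear program. Below is a sketch using scipy (assumed available); the set Q and the vector v are random examples.

```python
import numpy as np
from scipy.optimize import linprog

# ||v||_Q for a finite symmetric set Q = -Q of norm-one vectors:
# because -q is in Q whenever q is, one may restrict to lambda_j >= 0, so
#   ||v||_Q = min { sum_j lambda_j : sum_j lambda_j q_j = v, lambda_j >= 0 },
# which is a linear program.
rng = np.random.default_rng(5)
n, m = 6, 8
Qhalf = rng.standard_normal((n, m))
Qhalf /= np.linalg.norm(Qhalf, axis=0)            # norm-one elements
Qmat = np.hstack([Qhalf, -Qhalf])                 # enforce Q = -Q

v = Qmat @ np.abs(rng.standard_normal(Qmat.shape[1])) * 0.1   # some v in span(Q)
res = linprog(c=np.ones(Qmat.shape[1]), A_eq=Qmat, b_eq=v,
              bounds=(0, None), method="highs")
print("||v||_Q ≈", res.fun if res.success else np.inf)
```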
Towards randomized signature – a first estimate

Proposition. Fix M ≥ 1 and ε > 0. Moreover, let Q be any N-point set of vectors of norm one in T^M(R^d). Then there is a linear map f : T^M(R^d) → R^k (with k being the above JL constant for N and ε), such that

\big|\langle v_1,\, v_2 - (f^*\circ f)(v_2)\rangle\big| \le \varepsilon\, \|v_1\|_Q\, \|v_2\|_Q

for all v_1, v_2 ∈ span(Q), where f* : R^k → T^M(R^d) denotes the adjoint map of f with respect to the standard inner product on R^k.

By means of this special JL map associated to a point set Q we can now “project signature” without losing too much information.

We can then solve the projected equation and obtain – up to some time – a solution which is ε-close to signature.

By a slight abuse of notation we write Sig_t for the truncated version π_M(Sig_t) in T^M(R^d).
Randomized signature is as expressive as signature

Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann). Let u be a smooth control and f a JL map from T^M(R^d) to R^k, where k is determined via some fixed ε and a fixed set Q. We denote by r-Sig the smooth evolution of the following controlled differential equation on R^k:

dX_t = \sum_{i=1}^{d} \Big( \tfrac{1}{\sqrt{n}}\, f\big(f^*(X_t)\, e_i\big) + \big(1 - \tfrac{1}{\sqrt{n}}\big)\, f\big(\mathrm{Sig}_t\, e_i\big) \Big)\, du^i(t), \qquad X_0 \in \mathbb{R}^k,

where n = dim(T^M(R^d)). Then for each w ∈ T^M(R^d)

\big|\langle w,\, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0))\rangle\big| \le \big|\langle w,\, \mathrm{Evol}_t(1 - f^*(X_0))\rangle\big| + C \sum_{i=1}^{d} \int_0^t \|\mathrm{Evol}_r^*\, w\|_Q\, \|\mathrm{Sig}_r\, e_i\|_Q\, dr,

where Evol denotes here the evolution operator corresponding to dZ_t = \sum_{i=1}^{d} \tfrac{1}{\sqrt{n}}\, (f^*\circ f)(Z_t\, e_i)\, du^i(t) and C = \sup_{s \le r \le t,\, i} \dots
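In practice, a randomized-signature reservoir of the control is usually realized directly in finite dimensions, by evolving dX_t = Σ_i σ(A_i X_t + b_i) du^i(t) with fixed random matrices A_i and biases b_i and training only a linear readout. The sketch below follows this commonly used construction rather than the specific JL-based evolution of the theorem above; the dimensions, the activation and the toy target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def randomized_signature(u, k, activation=np.tanh, scale=1.0):
    """Randomized-signature-style reservoir (Euler discretization):
        X_{t+1} = X_t + sum_i activation(A_i X_t + b_i) * du^i_t,
    with fixed random A_i, b_i. `u` has shape (T+1, d+1) with u[:, 0] = time,
    and `k` is the reservoir dimension."""
    T1, d1 = u.shape
    A = rng.standard_normal((d1, k, k)) * scale / np.sqrt(k)
    b = rng.standard_normal((d1, k)) * scale
    X = np.empty((T1, k))
    X[0] = rng.standard_normal(k)
    du = np.diff(u, axis=0)
    for t in range(T1 - 1):
        X[t + 1] = X[t] + sum(activation(A[i] @ X[t] + b[i]) * du[t, i]
                              for i in range(d1))
    return X

# Example: reservoir of a time-extended control plus a ridge-regression readout
# trained to reproduce a target path Y (here an arbitrary functional of the control).
T = 300
t_grid = np.linspace(0.0, 1.0, T + 1)
u = np.stack([t_grid, np.sin(4.0 * t_grid)], axis=1)
Y = np.exp(-t_grid) * np.cos(3.0 * u[:, 1])              # illustrative target

X = randomized_signature(u, k=80)
lam = 1e-6
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
print("max readout error:", np.max(np.abs(X @ W - Y)))
```

Only the readout W is trained; the random matrices A_i, the biases b_i, and the initial state X_0 are drawn once and then kept fixed, in line with the reservoir-computing paradigm of the talk.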