Machine Learning in Finance via Randomization
Josef Teichmann
(based on joint work with Lukas Gonon, Christa Cuchiero, Lyudmila
Grigoryeva, and Juan-Pablo Ortega)
ETHZ
Universität Klagenfurt
Josef Teichmann (ETHZ) Randomized signature May 2022 1 / 30
Introduction Motivation
Data driven models
Key question: how to gain a quantitative understanding of dynamic decision
making in Finance or Economics?
Classical approach: specify a model pool with well-understood
characteristics depending on as few parameters as possible (Occam's razor),
calibrate the models to data, and solve optimization problems thereon.
Machine Learning approach: highly overparametrized function families are
used instead of few-parameter families for constructing model pools,
strategies for optimization problems, etc. Learning procedures are applied
to data (real or artificially generated) to obtain parameter configurations
which perform well (e.g. Deep Hedging).
Machine Learning relies on different kinds of universal approximation
theorems for, e.g.
- feed-forward networks, recurrent neural networks, LSTMs, neural
(stochastic) differential equations, signature-based models, etc.,
- and on training models on data.
Training ...
... is an enigmatic procedure:
⇒ it often means applying an algorithm, not necessarily solving a
well-defined problem.
Architectures, training hyperparameters, training complexity, and training
data have to be carefully adjusted, with a lot of domain knowledge, to
make things finally work. Good habits of classical modeling are left
behind (no convergence rates, overparametrization, no sophisticated
training algorithms, large amounts of sometimes quite different data,
etc.).
Results are in several cases fascinating and far reaching.
Robustness is almost too good to be true (Deep Hedging, Deep
Optimal Stopping).
Randomization and Provability
Key question: how to gain a quantitative understanding of why artificial
traders work so well?
First step: trained randomized networks as a rough role model for what
might be the result of early-stopped training, but with a perspective of
provability. See, e.g., the works of Jakob Heiss and Hanna Wutte in my
research group.
Second step: randomized networks can appear as an almost optimal choice
(Reservoir Computing).
Third step: Reduction of training complexity is a feature (sustainable
machine learning).
Randomized networks: an older perspective? Ideas from
Reservoir computing
... to ease training procedures
... going back to Herbert Jäger with many contributions from Claudio
Gallicchio, Lyudmila Grigoryeva and Juan-Pablo Ortega, et al.
Typically an input signal is fed into a fixed (random) dynamical
system, called the reservoir, which usually maps the input to a
higher-dimensional state.
Then a simple (often linear) readout mechanism is trained to read the
state of the reservoir and map it to the desired output.
The main benefit is that training is performed only at the readout
stage while the reservoir is fixed and untrained.
Reservoirs can in some cases be realized physically and learning the
readout layer is often a simple (regularized) regression.
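The scheme above can be sketched as a minimal echo state network. This is a toy of our own construction (not from the talk); all sizes, constants, and the input signal are arbitrary. Only the ridge-regression readout is trained; the recurrent reservoir stays fixed and random.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy scalar input signal; task: one-step-ahead prediction
T = 500
u = np.sin(0.1 * np.arange(T + 1)) + 0.1 * np.sin(0.73 * np.arange(T + 1))

# fixed random reservoir (never trained), scaled for stable dynamics
k = 200
A = rng.standard_normal((k, k))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius < 1
b = rng.standard_normal(k)

# collect reservoir states driven by the input
X = np.zeros((T, k))
x = np.zeros(k)
for t in range(T):
    x = np.tanh(A @ x + b * u[t])                 # reservoir state update
    X[t] = x

# linear readout trained by ridge regression: predict u[t+1] from the state
y = u[1:T + 1]
lam = 1e-6
W = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
pred = X @ W
```

The only learned object is the vector W, obtained from a single linear solve; this is the computational appeal of the reservoir paradigm.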
Main paradigm and a motivating example
Main paradigm of reservoir computing
Split the input-output map, e.g. from initial values and the driving signal to the
observed dynamics, into
- a generic part, the reservoir, which is not trained (or only in small parts);
- a readout part, which is accurately trained and often linear.
To illustrate this methodology, let us consider as a motivating example the following
task from finance:
Goal: learn from observations of time series data of a stock price its
dependence on the driving noise, e.g. Brownian motion
As it is easy to simulate Brownian motion, the learned relationship from the
driving Brownian motion to the price data allows one to easily simulate stock
prices, e.g. for risk management.
⇒ Market generators
Reservoir computing for market generation
Assume that N market factors (e.g. prices, volatilities, etc.) are described by

  dY_t = \sum_{i=0}^{d} V^i(Y_t)\, dB^i(t), \quad Y_0 = y \in \mathbb{R}^N,

with V^i : \mathbb{R}^N \to \mathbb{R}^N and d independent Brownian motions B^i, i = 1, \dots, d.
From now on the 0-th component shall always be time, i.e. dB^0(t) = dt.
We want to learn the map

  F : (input noise B) \mapsto (solution trajectory Y),

without knowing V. This is a generically complicated map.
Idea: Split the map in two parts:
- a universal (fixed) reservoir X, with no dependence on the specific dynamics of Y;
- a linear readout W that needs to be trained such that Y ≈ WX.
[Diagram: the map F : B → Y factors through the reservoir, B ↦ X, followed by the linear readout, X ↦ Y ≈ WX.]
Learning from market observations
[Diagram: a fixed reservoir followed by a linear readout trained on market observations.]
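A schematic toy version of this market generator may look as follows. All dynamics, dimensions, and constants here are our own illustrative choices: the "market" is a geometric-Brownian-type path the learner never sees; a fixed random reservoir is driven by the same increments (dt, dB); only the linear readout W with Y ≈ WX is fitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# hidden "market": geometric-Brownian-type factor, unknown to the learner
T, dt = 300, 1.0 / 300
dB = rng.standard_normal(T) * np.sqrt(dt)
Y = np.empty(T + 1)
Y[0] = 1.0
for t in range(T):
    Y[t + 1] = Y[t] + 0.05 * Y[t] * dt + 0.2 * Y[t] * dB[t]

# fixed random reservoir driven by the same increments (du^0 = dt, du^1 = dB)
k = 100
A0 = rng.standard_normal((k, k)) / np.sqrt(k)
A1 = rng.standard_normal((k, k)) / np.sqrt(k)
b0, b1 = rng.standard_normal(k), rng.standard_normal(k)
X = np.empty((T + 1, k))
x = np.zeros(k)
X[0] = x
for t in range(T):
    x = x + np.tanh(A0 @ x + b0) * dt + np.tanh(A1 @ x + b1) * dB[t]
    X[t + 1] = x

# only the linear readout is trained: ridge regression of Y on reservoir states
Xa = np.hstack([X, np.ones((T + 1, 1))])      # add an intercept column
lam = 1e-8
W = np.linalg.solve(Xa.T @ Xa + lam * np.eye(k + 1), Xa.T @ Y)
fit = Xa @ W
```

Once W is fitted, fresh Brownian paths can be pushed through the same fixed reservoir and readout to generate new synthetic price paths.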
Randomized recurrent neural networks
Can we understand why randomized recurrent neural networks are
excellent feature extractors (or reservoirs)?
There is a natural candidate, namely the infinite-dimensional
signature of the driving signal, which serves as a (universal) linear
regression basis for continuous path functionals.
Signature goes back to K. Chen ('57) and plays a prominent role in rough
path theory (T. Lyons ('98), P. Friz & N. Victoir ('10), P. Friz &
M. Hairer ('14)).
In the last few years there have been many papers (e.g. Levin, Lyons,
and Ni (2016)) showing how to apply rough path theory and
signature methods to machine learning and time series analysis.
Learning the dynamics via signature
Signature is point-separating:
- The signature of a d-dimensional (geometric rough) path, including in particular
Brownian motion and smooth curves, uniquely determines the path up to tree-like
equivalences. These tree-like equivalences can be avoided by adding time.
Linear functions on the signature form an algebra that contains 1:
- Every polynomial on signature may be realized as a linear function via the so-called
shuffle product.
⇒ By the Stone-Weierstrass theorem, continuous path functionals (with respect to a
variation distance on paths) can be uniformly approximated on compact sets by
linear functions of the time-extended signature.
⇒ Universal approximation theorem (UAT).
⇒ This yields a natural split in the spirit of reservoir computing into
- the signature of the input signal, being the generic reservoir;
- a linear (readout) map.
Goal of the talk
Prove, in the case of randomized recurrent neural networks, how
dynamical systems can be approximated, with precise convergence
rates.
Take randomized recurrent networks as a role model for randomized
deep networks.
Randomized signature and reservoir computing Setting
Mathematical Setting
We consider here the simplest setting, with smooth input signals (controls)
and output signals taking values in R^N, but it works with more general
drivers (e.g. semimartingales) and with R^N replaced by so-called convenient
vector spaces.
Consider a controlled ordinary differential equation (CODE)

  dY_t = \sum_{i=0}^{d} V_i(Y_t)\, du^i(t), \quad Y_0 = y \in \mathbb{R}^N, \qquad (CODE)

for some smooth vector fields V_i : \mathbb{R}^N \to \mathbb{R}^N, i = 0, \dots, d, and d smooth
control curves u^i. Notice again that du^0(t) = dt.
We observe the controls u (input) and Y (output), but do not have access
to the vector fields V_i.
The goal is to learn the dynamics and to simulate from it conditional on
(new) controls u, i.e. we aim to learn the map

  (input control u) \mapsto (solution trajectory Y).
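A CODE of this form can be integrated with a plain Euler scheme. The following generic sketch (function and argument names are ours, not from the talk) makes the input-output map u ↦ Y concrete:

```python
import numpy as np

def solve_code(vector_fields, controls, y0, n_steps, t_end=1.0):
    """Euler scheme for dY = sum_i V_i(Y) du^i(t).

    vector_fields: list of callables V_i(y) -> array
    controls:      list of callables u^i(t) -> float (use u^0(t) = t for time)
    """
    ts = np.linspace(0.0, t_end, n_steps + 1)
    y = np.array(y0, dtype=float)
    path = [y.copy()]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        for V, u in zip(vector_fields, controls):
            y = y + V(y) * (u(t1) - u(t0))   # increment along each control
        path.append(y.copy())
    return ts, np.array(path)

# sanity check on a known case: dY = Y dt, Y_0 = 1, so Y(1) ≈ e
ts, path = solve_code([lambda y: y], [lambda t: t], [1.0], n_steps=1000)
```

With observed pairs (u, Y), the learning problem of the slide is to approximate the map realized by `solve_code` without access to `vector_fields`.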
Randomized signature and reservoir computing Preliminaries on signature
Signature in a nutshell – notations
The signature takes values in the free algebra of formal series generated by
d indeterminates e_1, \dots, e_d, given by
  T((\mathbb{R}^d)) := \Big\{ a = \sum_{k=0}^{\infty} \sum_{i_1,\dots,i_k=1}^{d} a_{i_1 \dots i_k}\, e_{i_1} \cdots e_{i_k} \Big\}.
Sums and products are defined in the natural way.
We consider the complete locally convex topology making all
projections a \mapsto a_{i_1 \dots i_k} continuous; this turns T((R^d)) into a convenient
vector space.
Signature in a nutshell – definitions
The signature of u is the unique solution of the following CODE in T((R^d)):

  d\,\mathrm{Sig}_{s,t} = \sum_{i=0}^{d} \mathrm{Sig}_{s,t}\, e_i\, du^i(t), \quad \mathrm{Sig}_{s,s} = 1,

and is explicitly given by

  \mathrm{Sig}_{s,t} = \sum_{k=0}^{\infty} \sum_{i_1,\dots,i_k=0}^{d} \int_{s \le t_1 \le \cdots \le t_k \le t} du^{i_1}(t_1) \cdots du^{i_k}(t_k)\; e_{i_1} \cdots e_{i_k}.
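For a piecewise-linear path this series can be evaluated exactly: a straight segment with increment Δ has level-k term Δ^{⊗k}/k!, and segment signatures are concatenated with Chen's identity. A small illustrative sketch (function names are ours):

```python
import numpy as np

def segment_signature(delta, M):
    """Truncated signature of a straight segment with increment `delta`:
    level k equals delta^{⊗k} / k!."""
    sig = [np.array(1.0)]
    for k in range(1, M + 1):
        sig.append(np.multiply.outer(sig[-1], delta) / k)
    return sig

def chen_product(a, b, M):
    """Truncated tensor product, i.e. Chen's identity for concatenation."""
    out = []
    for k in range(M + 1):
        out.append(sum(np.multiply.outer(a[j], b[k - j]) for j in range(k + 1)))
    return out

def path_signature(path, M):
    """Signature up to level M of a piecewise-linear path, given as an
    (n_points, d) array, by multiplying segment signatures via Chen."""
    sig = segment_signature(np.zeros(path.shape[1]), M)   # identity element
    for p, q in zip(path[:-1], path[1:]):
        sig = chen_product(sig, segment_signature(q - p, M), M)
    return sig
```

For the 1-d path 0 → 1 → 2, level one is the total increment 2 and level two is 2²/2! = 2, as the explicit iterated-integral formula predicts.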
Randomized signature and reservoir computing A “splitting result” in spirit of reservoir computing
Signature and its connection to reservoir computing
The following “splitting theorem” is the precise link to reservoir computing. We
suppose here that (CODE) admits a unique global solution, given by a smooth
evolution operator Evol such that Y_t = Evol_t(y).
Theorem
Let Evol be a smooth evolution operator on \mathbb{R}^N such that (Evol_t(y))_t satisfies
(CODE). Then for any smooth function g : \mathbb{R}^N \to \mathbb{R} and for every M \ge 0 there is
a time-homogeneous linear map W, depending on (V_1, \dots, V_d, g, M, y), from
T^M(\mathbb{R}^d) to \mathbb{R} such that

  g\big(\mathrm{Evol}_t(y)\big) = W\big(\pi_M(\mathrm{Sig}_t)\big) + O\big(t^{M+1}\big),

where \pi_M : T((\mathbb{R}^d)) \to T^M(\mathbb{R}^d) is the canonical projection.
Remark
For the proof see e.g. Lyons (1998). It can however be proved in much more
generality, e.g. on convenient vector spaces.
Is signature a good reservoir?
This split is not yet fully in the spirit of reservoir computing, since, unlike
in a physical system where the evaluations are ultrafast, computing the
signature up to a high order can take a while, in particular if d is large.
Moreover, regression on signature is the analog on path space of a
polynomial approximation, which can have several disadvantages.
Remedy: information compression by Johnson-Lindenstrauss
projection.
Randomly projected universal signature dynamics The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss (JL) lemma
We here state the classical version of the Johnson-Lindenstrauss Lemma.
Lemma
For every 0 < \varepsilon < 1 and every set Q consisting of N points in some \mathbb{R}^n, there
is a linear map f : \mathbb{R}^n \to \mathbb{R}^k with k \ge \frac{24 \log N}{3\varepsilon^2 - 2\varepsilon^3} such that

  (1 - \varepsilon)\|v_1 - v_2\|^2 \le \|f(v_1) - f(v_2)\|^2 \le (1 + \varepsilon)\|v_1 - v_2\|^2

for all v_1, v_2 \in Q, i.e. the geometry of Q is almost preserved after the projection.
The map f is called a (JL) map, and it can be drawn randomly from a set of
linear projection maps.
Indeed, take a k \times n matrix A with iid standard normal entries. Then
\frac{1}{\sqrt{k}} A satisfies the desired requirements with high probability.
We apply this remarkable result to obtain “versions of signature” in lower
dimensional spaces.
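A quick numerical illustration of the lemma (the dimensions and ε below are arbitrary choices of ours): draw random points, project with A/√k, and inspect how much pairwise squared distances are distorted.

```python
import numpy as np

rng = np.random.default_rng(0)

n, N, eps = 10_000, 20, 0.5
# the dimension bound k >= 24 log N / (3 eps^2 - 2 eps^3) from the lemma
k = int(np.ceil(24 * np.log(N) / (3 * eps**2 - 2 * eps**3)))

Q = rng.standard_normal((N, n))        # N points in R^n
A = rng.standard_normal((k, n))        # iid standard normal entries
f = lambda v: (A @ v) / np.sqrt(k)     # the random JL map (1/sqrt(k)) A

# ratios of squared distances after / before projection
ratios = [
    np.sum((f(Q[i]) - f(Q[j])) ** 2) / np.sum((Q[i] - Q[j]) ** 2)
    for i in range(N) for j in range(i + 1, N)
]
```

Most ratios land in the interval (1 − ε, 1 + ε), despite the target dimension k being tiny compared to n.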
Randomly projected universal signature dynamics Randomized signature
Towards randomized signature
We look for (JL) maps on TM
(Rd
) which preserve its geometry encoded in some
set of (relevant) directions Q. In order to make this program work, we need the
following definition:
Definition
Let Q be any (finite or infinite) set of elements of norm one in T^M(\mathbb{R}^d) with
Q = -Q. For v \in T^M(\mathbb{R}^d) we define the function

  \|v\|_Q := \inf \Big\{ \sum_j |\lambda_j| \;:\; \sum_j \lambda_j v_j = v \text{ and } v_j \in Q \Big\}.

We use the convention \inf \emptyset = +\infty, since the function is only finite on span(Q).
The function \|\cdot\|_Q behaves precisely like a norm on the span of Q.
Additionally, \|v\|_{Q_1} \ge \|v\|_{Q_2} for Q_1 \subset Q_2.
Towards randomized signature – a first estimate
Proposition
Fix M \ge 1 and \varepsilon > 0. Moreover, let Q be any N-point set of vectors of norm
one in T^M(\mathbb{R}^d). Then there is a linear map f : T^M(\mathbb{R}^d) \to \mathbb{R}^k (with k being the
above JL constant for N), such that

  \big| \langle v_1, v_2 - (f^* \circ f)(v_2) \rangle \big| \le \varepsilon\, \|v_1\|_Q \|v_2\|_Q

for all v_1, v_2 \in span(Q), where f^* : \mathbb{R}^k \to T^M(\mathbb{R}^d) denotes the adjoint map of f
with respect to the standard inner product on \mathbb{R}^k.
By means of this special JL map associated to a point set Q we can now
“project the signature” without losing too much information.
We can then solve the projected equation and obtain – up to some time – a
solution which is \varepsilon-close to the signature.
By a slight abuse of notation we write Sig_t for the truncated version
\pi_M(Sig_t) in T^M(R^d).
Randomized signature is as expressive as signature
Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann)
Let u be a smooth control and f a JL map from T^M(\mathbb{R}^d) to \mathbb{R}^k, where k is
determined via some fixed \varepsilon and a fixed set Q. We denote by r-Sig the smooth
evolution of the following controlled differential equation on \mathbb{R}^k:

  dX_t = \sum_{i=1}^{d} \Big( \tfrac{1}{\sqrt{n}} f\big(f^*(X_t)\, e_i\big) + \big(1 - \tfrac{1}{\sqrt{n}}\big) f\big(\mathrm{Sig}_t\, e_i\big) \Big)\, du^i(t), \quad X_0 \in \mathbb{R}^k,

where n = \dim(T^M(\mathbb{R}^d)). Then for each w \in T^M(\mathbb{R}^d)

  \big| \langle w, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0)) \rangle \big| \le \big| \langle w, \mathrm{Evol}_t(1 - f^*(X_0)) \rangle \big| + C \varepsilon \sum_{i=1}^{d} \int_0^t \|\mathrm{Evol}_r^* w\|_Q\, \|\mathrm{Sig}_r\, e_i\|_Q \, dr,

where Evol denotes here the evolution operator corresponding to
dZ_t = \sum_{i=1}^{d} \tfrac{1}{\sqrt{n}} (f^* \circ f)(Z_t e_i)\, du^i(t) and C = \sup_{s \le r \le t,\, i} …
EU START PROJECT. START-Newsletter_Issue_4.pdf
 
In-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxIn-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptx
 
A Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert EinsteinA Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert Einstein
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary Gland
 
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday LifeGBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
 
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
 

Machine Learning in Finance via Randomization

  • 7. Introduction Motivation Randomization and Provability Key question: how to gain a quantitative understanding why artificial traders work so well? First step: Trained randomized networks as a rough role model for what might be a result of early stopped training but with a perspective of provability. See, e.g., works of Jakob Heiss and Hanna Wutte in my working group. Second step: Randomized networks can appear as almost optimal choice (Reservoir Computing). Third step: Reduction of training complexity is a feature (sustainable machine learning). Josef Teichmann (ETHZ) Randomized signature May 2022 4 / 30
  • 8. Introduction Motivation Randomized networks: an older perspective? Ideas from Reservoir computing ... to ease training procedures ... going back to Herbert Jäger with many contributions from Claudio Gallicchio, Lyudmila Grigoryeva and Juan-Pablo Ortega, et al. Typically an input signal is fed into a fixed (random) dynamical system, called reservoir, which maps the input usually to higher dimensions. Then a simple (often linear) readout mechanism is trained to read the state of the reservoir and map it to the desired output. The main benefit is that training is performed only at the readout stage while the reservoir is fixed and untrained. Reservoirs can in some cases be realized physically and learning the readout layer is often a simple (regularized) regression. Josef Teichmann (ETHZ) Randomized signature May 2022 5 / 30
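The reservoir-computing recipe described above (a fixed random recurrent system whose state is read out by a trained linear map) can be sketched in a few lines. This is a generic echo-state-style illustration, not code from the talk; the input signal, the toy target, and all scalings are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: learn a static nonlinearity of the input signal.
T, k = 500, 100                     # time steps, reservoir dimension
u = rng.standard_normal(T)          # input signal
y = np.sin(u)                       # target readout

# Fixed random reservoir: recurrent and input weights are drawn once, never trained.
W = 0.5 * rng.standard_normal((k, k)) / np.sqrt(k)
w_in = rng.standard_normal(k)

X = np.zeros((T, k))
x = np.zeros(k)
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])  # reservoir state update (untrained)
    X[t] = x

# The only trained part: a linear readout, fitted by ridge regression.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
y_hat = X @ W_out
print("training MSE:", np.mean((y - y_hat) ** 2))
```

The reservoir plays the role of a rich, generic feature map; only the cheap regression at the end depends on the task.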
  • 11. Introduction Motivation Main paradigm and a motivating example Main paradigm of reservoir computing Split the input-output map, e.g. from initial values and the driving signal to the observed dynamics, into a generic part, the reservoir, which is not or only in small parts trained, and a readout part, which is accurately trained and often linear. To illustrate this methodology let us consider as motivating example the following task from finance. Goal: learn, from observations of time series data of a stock price, its dependence on the driving noise, e.g. Brownian motion. As it is easy to simulate Brownian motion, the learned relationship from the driving Brownian motion to the price data makes it easy to simulate stock prices, e.g. for risk management. ⇒ Market generators Josef Teichmann (ETHZ) Randomized signature May 2022 6 / 30
  • 13. Introduction Motivation Reservoir computing for market generation Assume that N market factors (e.g. prices, volatilities, etc.) are described by
$$dY_t = \sum_{i=0}^{d} V_i(Y_t)\, dB_i(t), \quad Y_0 = y \in \mathbb{R}^N$$
with $V_i : \mathbb{R}^N \to \mathbb{R}^N$ and d independent Brownian motions $B_i$, $i = 1, \dots, d$. From now on the 0-th component shall always be time, i.e. $dB_0(t) = dt$. We want to learn the map F : (input noise B) → (solution trajectory Y) without knowing V. This is a generically complicated map. Idea: split the map in two parts:
I a universal (fixed) reservoir X, with no dependence on the specific dynamics of Y;
I a linear readout W that needs to be trained such that Y ≈ WX.
(Diagram: F maps the noise B to Y; B feeds the reservoir X, and the readout W recovers Y from X.)
Josef Teichmann (ETHZ) Randomized signature May 2022 7 / 30
  • 14. Introduction Motivation Learning from market observations (figure: input noise fed into a reservoir, followed by a trained linear readout W) Josef Teichmann (ETHZ) Randomized signature May 2022 8 / 30
  • 17. Introduction Motivation Randomized recurrent neural networks Can we understand why randomized recurrent neural networks are excellent feature extractors (or reservoirs)? There is a natural candidate, namely the infinite dimensional signature of the driving signal, which serves as (universal) linear regression basis for continuous path functionals. Signature goes back to K. Chen (’57) and plays a prominent role in rough path theory (T. Lyons (’98), P. Friz and N. Victoir (’10), P. Friz and M. Hairer (’14)). In the last few years there have been many papers (e.g. Levin, Lyons, and Ni (2016)) showing how to apply rough path theory and signature methods to machine learning and time series analysis. Josef Teichmann (ETHZ) Randomized signature May 2022 9 / 30
  • 21. Introduction Motivation Learning the dynamics via signature Signature is point-separating: I The signature of a d-dimensional (geometric rough) path, including in particular Brownian motion and smooth curves, uniquely determines the path up to tree-like equivalences. These tree-like equivalences can be avoided by adding time. Linear functions on the signature form an algebra that contains 1: I Every polynomial on signature may be realized as a linear function via the so-called shuffle product. ⇒ By the Stone-Weierstrass theorem, continuous (with respect to a variation distance of the path) path functionals on compact sets can be uniformly approximated by a linear function of the time extended signature. ⇒ Universal approximation theorem (UAT). ⇒ This yields a natural split in the spirit of reservoir computing into I the signature of the input signal, being the generic reservoir; I a linear (readout) map. Josef Teichmann (ETHZ) Randomized signature May 2022 10 / 30
  • 22. Introduction Motivation Goal of the talk Prove in the case of randomized recurrent neural networks how dynamical systems can be approximated with precise convergence rates. Take randomized recurrent networks as a role model for randomized deep networks. Josef Teichmann (ETHZ) Randomized signature May 2022 11 / 30
  • 24. Randomized signature and reservoir computing Setting Mathematical Setting We consider here the simplest setting, with smooth input signals (controls) and output signals taking values in $\mathbb{R}^N$, but the approach works with more general drivers (e.g. semimartingales) and with $\mathbb{R}^N$ replaced by so-called convenient vector spaces. Consider a controlled ordinary differential equation (CODE)
$$dY_t = \sum_{i=0}^{d} V_i(Y_t)\, du_i(t), \quad Y_0 = y \in \mathbb{R}^N \qquad \text{(CODE)}$$
for some smooth vector fields $V_i : \mathbb{R}^N \to \mathbb{R}^N$, $i = 0, \dots, d$, and d smooth control curves $u_i$. Notice again that $du_0(t) = dt$. We observe the controls u (input) and Y (output), but do not have access to the vector fields $V_i$. The goal is to learn the dynamics and to simulate from it conditional on (new) controls u, i.e. we aim to learn the map (input control u) $\mapsto$ (solution trajectory Y).
Josef Teichmann (ETHZ) Randomized signature May 2022 12 / 30
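A CODE of this form is easy to simulate with an Euler scheme once the vector fields are known, which is how one would generate input/output pairs (u, Y) for the learning task. The vector fields, controls, and parameters below are hypothetical placeholders, not taken from the talk.

```python
import numpy as np

def solve_code(V, u, y0, dt):
    """Euler scheme for dY_t = sum_{i=0}^d V[i](Y_t) du_i(t), Y_0 = y0,
    where the 0-th control is time (du_0 = dt) and u has shape (T+1, d)."""
    T = u.shape[0] - 1
    Y = np.zeros((T + 1, len(y0)))
    Y[0] = y0
    for t in range(T):
        dY = V[0](Y[t]) * dt                               # time component du_0 = dt
        for i in range(u.shape[1]):
            dY = dY + V[i + 1](Y[t]) * (u[t + 1, i] - u[t, i])
        Y[t + 1] = Y[t] + dY
    return Y

# Illustrative smooth-ish controls and vector fields (assumptions for the demo).
V = [lambda y: -y,                                   # V_0: stable drift in time
     lambda y: np.array([np.sin(y[1]), 0.1]),        # V_1
     lambda y: np.array([0.0, np.cos(y[0])])]        # V_2

rng = np.random.default_rng(1)
T, dt = 200, 0.01
u = np.cumsum(rng.standard_normal((T + 1, 2)) * np.sqrt(dt), axis=0)
Y = solve_code(V, u, np.array([1.0, 0.0]), dt)
print(Y[-1])
```

The learner sees only the pairs (u, Y); the list `V` stands for the unknown dynamics.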
  • 26. Randomized signature and reservoir computing Preliminaries on signature Signature in a nutshell – notations The signature takes values in the free algebra of formal series generated by d indeterminates $e_1, \dots, e_d$,
$$T((\mathbb{R}^d)) := \Big\{ a = \sum_{k=0}^{\infty} \sum_{i_1, \dots, i_k = 1}^{d} a_{i_1 \dots i_k}\, e_{i_1} \cdots e_{i_k} \Big\}.$$
Sums and products are defined in the natural way. We consider the complete locally convex topology making all projections $a \mapsto a_{i_1 \dots i_k}$ continuous on $T((\mathbb{R}^d))$, hence a convenient vector space.
Josef Teichmann (ETHZ) Randomized signature May 2022 13 / 30
  • 27. Randomized signature and reservoir computing Preliminaries on signature Signature in a nutshell – definitions The signature of u is the unique solution of the following CODE in $T((\mathbb{R}^d))$,
$$d\,\mathrm{Sig}_{s,t} = \sum_{i=1}^{d} \mathrm{Sig}_{s,t}\, e_i\, du_i(t), \quad \mathrm{Sig}_{s,s} = 1,$$
and is explicitly given by
$$\mathrm{Sig}_{s,t} = \sum_{k=0}^{\infty} \sum_{i_1, \dots, i_k = 0}^{d} \int_{s \le t_1 \le \dots \le t_k \le t} du_{i_1}(t_1) \cdots du_{i_k}(t_k)\; e_{i_1} \cdots e_{i_k}.$$
Josef Teichmann (ETHZ) Randomized signature May 2022 14 / 30
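For a piecewise-linear path the iterated integrals above can be computed exactly by multiplying the step signatures via Chen's identity (each linear segment with increment du has signature 1 + du + du⊗du/2! + …). The sketch below is my own illustration, with an arbitrary truncation level and path; it checks the closed form that level k of a one-dimensional path's signature equals (u_T − u_0)^k / k!.

```python
import numpy as np

def truncated_signature(path, M):
    """Signature of a piecewise-linear path up to level M.
    path: array of shape (T+1, d).  Returns [level 0, level 1, ..., level M],
    where level k is an array of shape (d,)*k."""
    d = path.shape[1]
    sig = [np.ones(())] + [np.zeros((d,) * k) for k in range(1, M + 1)]
    for du in np.diff(path, axis=0):
        # Signature of one linear segment: inc[k] = du^{(x)k} / k!
        inc = [np.ones(())]
        for k in range(1, M + 1):
            inc.append(np.multiply.outer(inc[-1], du) / k)
        # Chen's identity: concatenated signature = tensor product of the two.
        new = [np.ones(())]
        for k in range(1, M + 1):
            s = np.zeros((d,) * k)
            for j in range(k + 1):
                s = s + np.multiply.outer(sig[j], inc[k - j])
            new.append(s)
        sig = new
    return sig

# Sanity check: for a 1-d path with total increment 2, levels are 2^k / k!.
path = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
sig = truncated_signature(path, 3)
print(sig[1], sig[2], sig[3])  # ≈ 2, 2, 4/3
```

The number of coefficients grows like d^M, which is exactly the cost issue the random projection addresses later.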
  • 28. Randomized signature and reservoir computing A “splitting result” in spirit of reservoir computing Signature and its connection to reservoir computing The following “splitting theorem” is the precise link to reservoir computing. We suppose here that (CODE) admits a unique global solution given by a smooth evolution operator Evol such that $Y_t = \mathrm{Evol}_t(y)$.
Theorem Let Evol be a smooth evolution operator on $\mathbb{R}^N$ such that $(\mathrm{Evol}_t(y))_t$ satisfies (CODE). Then for any smooth function $g : \mathbb{R}^N \to \mathbb{R}$ and for every $M \ge 0$ there is a time-homogeneous linear map W, depending on $(V_1, \dots, V_d, g, M, y)$, from $T^M(\mathbb{R}^d) \to \mathbb{R}$ such that
$$g\big(\mathrm{Evol}_t(y)\big) = W\big(\pi_M(\mathrm{Sig}_t)\big) + O\big(t^{M+1}\big),$$
where $\pi_M : T((\mathbb{R}^d)) \to T^M(\mathbb{R}^d)$ is the canonical projection.
Remark For the proof see e.g. Lyons (1998). It can however be proved in much more generality, e.g. on convenient vector spaces.
Josef Teichmann (ETHZ) Randomized signature May 2022 15 / 30
  • 29. Randomized signature and reservoir computing A “splitting result” in spirit of reservoir computing Is signature a good reservoir? This split is not yet fully in the spirit of reservoir computing, since, unlike in a physical system where evaluations are ultrafast, computing the signature up to a high order can take a while, in particular if d is large. Moreover, regression on signature is the analog on path space of a polynomial approximation, which can have several disadvantages. Remedy: information compression by Johnson–Lindenstrauss projection. Josef Teichmann (ETHZ) Randomized signature May 2022 16 / 30
  • 30. Randomly projected universal signature dynamics The Johnson-Lindenstrauss Lemma The Johnson-Lindenstrauss (JL) lemma We state here the classical version of the Johnson-Lindenstrauss lemma.
Lemma For every $0 < \varepsilon < 1$ and every set Q consisting of N points in some $\mathbb{R}^n$, there is a linear map $f : \mathbb{R}^n \to \mathbb{R}^k$ with $k \ge \frac{24 \log N}{3\varepsilon^2 - 2\varepsilon^3}$ such that
$$(1 - \varepsilon)\, \|v_1 - v_2\|^2 \le \|f(v_1) - f(v_2)\|^2 \le (1 + \varepsilon)\, \|v_1 - v_2\|^2$$
for all $v_1, v_2 \in Q$, i.e. the geometry of Q is almost preserved under the projection.
The map f is called a (JL) map, and it can be drawn randomly from a set of linear projection maps. Indeed, take a $k \times n$ matrix A with iid standard normal entries; then $\frac{1}{\sqrt{k}} A$ satisfies the desired requirements with high probability. We apply this remarkable result to obtain “versions of signature” in lower dimensional spaces.
Josef Teichmann (ETHZ) Randomized signature May 2022 17 / 30
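The lemma is easy to probe numerically: draw a Gaussian matrix scaled by 1/√k, project a point cloud, and measure the worst pairwise distortion. The dimensions, ε, and point set below are arbitrary illustrative choices, and the distortion bound holds only with high probability, not surely.

```python
import numpy as np

rng = np.random.default_rng(42)

n, N, eps = 2000, 50, 0.5
# JL dimension from the lemma's bound k >= 24 log N / (3 eps^2 - 2 eps^3).
k = int(np.ceil(24 * np.log(N) / (3 * eps**2 - 2 * eps**3)))

Q = rng.standard_normal((N, n))               # N points in R^n
A = rng.standard_normal((k, n)) / np.sqrt(k)  # random JL map f = A / sqrt(k)
fQ = Q @ A.T

# Worst relative distortion of squared pairwise distances after projection.
worst = 0.0
for i in range(N):
    for j in range(i + 1, N):
        d_orig = np.sum((Q[i] - Q[j]) ** 2)
        d_proj = np.sum((fQ[i] - fQ[j]) ** 2)
        worst = max(worst, abs(d_proj / d_orig - 1.0))
print(f"k = {k}, worst relative distortion = {worst:.3f}")  # with high probability < eps
```

Note that k depends only logarithmically on the number of points N, not on the ambient dimension n: this is what makes the projection of a huge truncated tensor algebra feasible.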
  • 31. Randomly projected universal signature dynamics Randomized signature Towards randomized signature We look for (JL) maps on $T^M(\mathbb{R}^d)$ which preserve its geometry encoded in some set of (relevant) directions Q. In order to make this program work, we need the following definition:
Definition Let Q be any (finite or infinite) set of elements of norm one in $T^M(\mathbb{R}^d)$ with Q = −Q. For $v \in T^M(\mathbb{R}^d)$ we define
$$\|v\|_Q := \inf \Big\{ \sum_j |\lambda_j| \;:\; \sum_j \lambda_j v_j = v \text{ and } v_j \in Q \Big\}.$$
We use the convention $\inf \emptyset = +\infty$, since the function is only finite on $\operatorname{span}(Q)$. The function $\|\cdot\|_Q$ behaves precisely like a norm on $\operatorname{span}(Q)$. Additionally $\|v\|_{Q_1} \ge \|v\|_{Q_2}$ for $Q_1 \subset Q_2$.
Josef Teichmann (ETHZ) Randomized signature May 2022 18 / 30
  • 39. Randomly projected universal signature dynamics Randomized signature Towards randomized signature – a first estimate
Proposition Fix $M \ge 1$ and $\varepsilon > 0$. Moreover, let Q be any N-point set of vectors of norm one in $T^M(\mathbb{R}^d)$. Then there is a linear map $f : T^M(\mathbb{R}^d) \to \mathbb{R}^k$ (with k the above JL constant for N) such that
$$\big| \langle v_1, v_2 - (f^* \circ f)(v_2) \rangle \big| \le \varepsilon\, \|v_1\|_Q \|v_2\|_Q$$
for all $v_1, v_2 \in \operatorname{span}(Q)$, where $f^* : \mathbb{R}^k \to T^M(\mathbb{R}^d)$ denotes the adjoint map of f with respect to the standard inner product on $\mathbb{R}^k$.
By means of this special JL map associated to a point set Q we can now “project signature” without losing too much information. We can then solve the projected equation and obtain – up to some time – a solution which is $\varepsilon$-close to signature. By a slight abuse of notation we write $\mathrm{Sig}_t$ for the truncated version $\pi_M \mathrm{Sig}_t$ in $T^M(\mathbb{R}^d)$.
Josef Teichmann (ETHZ) Randomized signature May 2022 19 / 30
  • 44. Randomly projected universal signature dynamics Randomized signature Randomized signature is as expressive as signature
Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann) Let u be a smooth control and f a JL map from $T^M(\mathbb{R}^d) \to \mathbb{R}^k$, where k is determined via some fixed $\varepsilon$ and a fixed set Q. We denote by r-Sig the smooth evolution of the following controlled differential equation on $\mathbb{R}^k$,
$$dX_t = \sum_{i=1}^{d} \Big( \tfrac{1}{\sqrt{n}}\, f\big(f^*(X_t)\, e_i\big) + \big(1 - \tfrac{1}{\sqrt{n}}\big)\, f\big(\mathrm{Sig}_t\, e_i\big) \Big)\, du_i(t), \quad X_0 \in \mathbb{R}^k,$$
where $n = \dim(T^M(\mathbb{R}^d))$. Then for each $w \in T^M(\mathbb{R}^d)$
$$\big| \langle w, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0)) \rangle \big| \le \big| \langle w, \mathrm{Evol}_t(1 - f^*(X_0)) \rangle \big| + \varepsilon\, C \sum_{i=1}^{d} \int_0^t \|\mathrm{Evol}_r^*\, w\|_Q\, \|\mathrm{Sig}_r\, e_i\|_Q \, dr,$$
where Evol denotes here the evolution operator corresponding to $dZ_t = \sum_{i=1}^{d} \tfrac{1}{\sqrt{n}} (f^* \circ f)(Z_t e_i)\, du_i(t)$ and $C = \sup_{s \le r \le t,\, i} |\dot{u}_i(r)|$.
Josef Teichmann (ETHZ) Randomized signature May 2022 20 / 30
  • 55. Randomly projected universal signature dynamics Randomized signature Proof As signature satisfies $d\,\mathrm{Sig}_t = \sum_{i=1}^{d} \mathrm{Sig}_t e_i\, du_i(t)$, $\mathrm{Sig}_0 = 1$, we have for the difference
$$\mathrm{Sig}_t - f^*(X_t) = 1 - f^*(X_0) + \sum_{i=1}^{d} \int_0^t \Big( \mathrm{Sig}_r e_i - \tfrac{1}{\sqrt{n}}\, f^*\big(f(f^*(X_r) e_i)\big) \Big)\, du_i(r) - \int_0^t \big(1 - \tfrac{1}{\sqrt{n}}\big) \sum_{i=1}^{d} f^*\big(f(\mathrm{Sig}_r e_i)\big)\, du_i(r)$$
$$= 1 - f^*(X_0) + \sum_{i=1}^{d} \int_0^t \tfrac{1}{\sqrt{n}}\, (f^* \circ f)\big( \mathrm{Sig}_r e_i - f^*(X_r) e_i \big)\, du_i(r) + \sum_{i=1}^{d} \int_0^t \Big( \mathrm{Sig}_r e_i - (f^* \circ f)(\mathrm{Sig}_r e_i) \Big)\, du_i(r).$$
Josef Teichmann (ETHZ) Randomized signature May 2022 21 / 30
  • 56. Randomly projected universal signature dynamics Randomized signature Proof and some remarks This can be solved by variation of constants. Hence for every $w \in T^M(\mathbb{R}^d)$ we get
$$\big| \langle w, \mathrm{Sig}_t - f^*(X_t) \rangle \big| \le \big| \langle w, \mathrm{Evol}_t(1 - f^*(X_0)) \rangle \big| + \sum_{i=1}^{d} \int_0^t \big| \langle \mathrm{Evol}_r^* w,\; \mathrm{Sig}_r e_i - (f^* \circ f)(\mathrm{Sig}_r e_i) \rangle \big|\; |du_i(r)| \le \big| \langle w, \mathrm{Evol}_t(1 - f^*(X_0)) \rangle \big| + \varepsilon\, C \sum_{i=1}^{d} \int_0^t \|\mathrm{Evol}_r^* w\|_Q\, \|\mathrm{Sig}_r e_i\|_Q\, dr,$$
where the last estimate follows from the above proposition. Remarks:
I An appropriate choice for Q can be the standard basis of $T^M(\mathbb{R}^d)$ together with its negative, so that it contains 2n points. In this case $k \ge \frac{24 \log(2n)}{3\varepsilon^2 - 2\varepsilon^3}$.
I In order to guarantee that $|\langle w, \mathrm{Sig}_t - f^*(X_t) \rangle|$ is indeed small and k significantly smaller than n, we need to control the Q-norms independently of n.
Josef Teichmann (ETHZ) Randomized signature May 2022 22 / 30
  • 77. Randomly projected universal signature dynamics Randomized signature Remarks on the estimate For appropriate choices of Q, e.g. containing the standard basis of $T^M(\mathbb{R}^d)$, $\|\mathrm{Sig}_r e_i\|_Q$ can be bounded independently of n. Hence $\langle w, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0)) \rangle$ becomes small whenever $X_0$ is chosen such that $f^*(X_0) \approx 1$ and $\|\mathrm{Evol}_t^* w\|_Q$ can be bounded independently of n. The latter holds true if w is sparsely populated. Choosing w to be sparse is possible due to the following result: let w lie in the closed convex hull of a set $K \subset \mathbb{R}^n$ bounded by some $R > 0$. Then for every $m \ge 1$ and $c > R^2 - \|w\|^2$ there is some $w_m$, a convex combination of m points of K, such that $\|w - w_m\|^2 \le \frac{c}{m}$. (This goes back to Bernard Maurey.) Josef Teichmann (ETHZ) Randomized signature May 2022 23 / 30
  • 78. Randomly projected universal signature dynamics Randomized signature Remarks on the estimate – estimating $\|\mathrm{Evol}_t^* w\|_Q$ Consider for simplicity d = 1, so that $\dim(T^M(\mathbb{R})) = M \equiv n$. Let $f = \frac{1}{\sqrt{k}} A$, where $A \in \mathbb{R}^{k \times n}$ has normally distributed entries. Then for appropriate Q as above
$$\|\mathrm{Evol}_t^* w\|_Q \approx \big\| \exp\big(t\, \tfrac{1}{\sqrt{n}}\, A^\top A\big) \big\|_1\, \|w\|_1,$$
where for $B \in \mathbb{R}^{n \times n}$ we set $\|B\|_1 = \max_j \sum_{i=1}^{n} |b_{ij}|$. Due to the scaling $\frac{1}{\sqrt{n}}$ and the central limit theorem, $\|\exp(t \frac{1}{\sqrt{n}} A^\top A)\|_1 \le c$ for some constant c (independent of n). For sparse w, $\|w\|_1$ is independent of n; note that for general w we only have $\|w\|_1 \le \sqrt{n}\, \|w\|_2$. Hence, for sparse w, $\|\mathrm{Evol}_t^* w\|_Q \le \tilde{C}$ with some constant $\tilde{C}$ independent of n. ⇒ If n is large, r-Sig compresses the information of Sig very well. Josef Teichmann (ETHZ) Randomized signature May 2022 24 / 30
  • 79. Randomly projected universal signature dynamics Randomized signature r-Sig as random dynamical system We can actually calculate approximately the vector fields which determine the dynamics of r-Sig by generic random elements.
Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann) For $M \to \infty$ (and thus $n \to \infty$) the entries of the matrix representation of the linear maps $y \mapsto \frac{1}{\sqrt{n}} f(f^*(y) e_i)$, $i = 1, \dots, d$, are asymptotically normally distributed with independent entries. The time dependent bias terms $(1 - \frac{1}{\sqrt{n}}) f(\mathrm{Sig}_t e_i)$ are as well asymptotically normally distributed with independent entries.
Josef Teichmann (ETHZ) Randomized signature May 2022 25 / 30
  • 80. Randomly projected universal signature dynamics Randomized signature Randomized signature as reservoir Practical implementation of randomized signature: given a set of hyper-parameters $\theta \in \Theta$ and a dimension k, choose randomly (often just by independently sampling from a normal distribution) matrices $M_1, \dots, M_d \in \mathbb{R}^{k \times k}$ as well as (bias) vectors $b_1, \dots, b_d$. Then one can tune the hyper-parameters and the dimension k such that
$$dX_t = \sum_{i=1}^{d} (M_i X_t + b_i)\, du_i(t), \quad X_0 = x$$
approximates the CODE Y locally in time via a linear readout W up to arbitrary precision. The process X will serve as reservoir. Note that again it does not depend on the specific dynamics of Y which should be learned.
Josef Teichmann (ETHZ) Randomized signature May 2022 26 / 30
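A minimal sketch of this recipe (random matrices M_i and biases b_i, an Euler discretization of the reservoir equation, and a ridge-regressed linear readout) might look as follows. The target dynamics, dimensions, and scalings are illustrative assumptions, not the talk's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(7)

T, dt, d, k = 1000, 1e-3, 2, 80
u = np.cumsum(rng.standard_normal((T + 1, d)) * np.sqrt(dt), axis=0)  # driving paths

# Target dynamics, unknown to the learner (hypothetical example):
# dY = -Y dt + sin(Y) du_1 + cos(Y) du_2
Y = np.zeros(T + 1)
Y[0] = 0.5
for t in range(T):
    du = u[t + 1] - u[t]
    Y[t + 1] = Y[t] + (-Y[t]) * dt + np.sin(Y[t]) * du[0] + np.cos(Y[t]) * du[1]

# Randomized-signature reservoir dX = sum_i (M_i X + b_i) du_i, Euler scheme.
# Index 0 is reserved for the time component du_0 = dt.
M = rng.standard_normal((d + 1, k, k)) / np.sqrt(k)
b = rng.standard_normal((d + 1, k))
X = np.zeros((T + 1, k))
X[0] = rng.standard_normal(k) / np.sqrt(k)
for t in range(T):
    dX = (M[0] @ X[t] + b[0]) * dt
    for i in range(d):
        dX = dX + (M[i + 1] @ X[t] + b[i + 1]) * (u[t + 1, i] - u[t, i])
    X[t + 1] = X[t] + dX

# The only trained object: linear readout W with Y ≈ X W (ridge regression).
lam = 1e-8
W = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ Y)
mse = np.mean((X @ W - Y) ** 2)
print("in-sample MSE:", mse)
```

Only the readout regression depends on Y; the reservoir X is drawn once and can be reused for any target driven by the same controls.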
  • 85. Applications for market generation Deep Simulation – Summary Split the map in two parts:
I a universal reservoir X, no dependence on the specific dynamics;
I a linear readout W that needs to be trained such that Y ≈ WX.
What should we take as reservoir?
I The signature process of Brownian motion works, but is computationally expensive: $\sum \int_{0 \le t_1 \le \dots \le t_k \le t} \circ dB_{i_1}(t_1) \cdots \circ dB_{i_k}(t_k)\; e_{i_1} \cdots e_{i_k}$.
I Random projections (implicit inclusion of high-order signature terms): randomized signature $dX_t = \sum_{i=1}^{d} (M_i X_t + b_i) \circ dB^i_t$, $X_0 \in \mathbb{R}^k$, with $M_i, b_i$ randomly chosen.
Josef Teichmann (ETHZ) Randomized signature May 2022 27 / 30
  • 86. Applications for market generation Example – SABR model Let us consider as example the SABR stochastic volatility model. The process Y consists of two components $(Y^1, Y^2)$: $Y^1$ corresponds to the price process and $Y^2$ to the stochastic volatility process,
$$dY^1_t = Y^1_t Y^2_t \big( \rho\, dB^1_t + \sqrt{1 - \rho^2}\, dB^2_t \big), \qquad dY^2_t = \alpha Y^2_t\, dB^2_t,$$
where $B^1$ and $B^2$ are two independent Brownian motions, $\alpha \ne 0$ and $\rho \in [-1, 1]$. Given a trajectory of 1000 time points we now learn the map $(B^1_{t \in [0,1000]}, B^2_{t \in [0,1000]}) \mapsto (Y^1_{t \in [0,1000]}, Y^2_{t \in [0,1000]})$.
Josef Teichmann (ETHZ) Randomized signature May 2022 28 / 30
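Training trajectories of this model, together with their driving Brownian increments, can be generated with a plain Euler–Maruyama scheme. The parameter values and step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Euler-Maruyama for the SABR-type dynamics on the slide:
# dY1 = Y1 * Y2 * (rho dB1 + sqrt(1 - rho^2) dB2),  dY2 = alpha * Y2 * dB2.
T, dt = 1000, 1e-3
alpha, rho = 0.5, -0.3       # illustrative parameter choices

dB = rng.standard_normal((T, 2)) * np.sqrt(dt)   # independent B1, B2 increments
Y1 = np.zeros(T + 1)
Y2 = np.zeros(T + 1)
Y1[0], Y2[0] = 1.0, 0.2
for t in range(T):
    Y1[t + 1] = Y1[t] + Y1[t] * Y2[t] * (rho * dB[t, 0] + np.sqrt(1 - rho**2) * dB[t, 1])
    Y2[t + 1] = Y2[t] + alpha * Y2[t] * dB[t, 1]
print(Y1[-1], Y2[-1])
```

The pair (dB, (Y1, Y2)) is exactly the kind of input/output data on which the readout of the randomized-signature reservoir is regressed.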
  • 89. Applications for market generation Training and prediction results Is it possible to predict the future evolution of the market environment given new input Brownian motions? Training is done on the first 1000 time points; prediction works for 3000 further time points. The first graph shows $Y^1$ and the second $Y^2$, each time the predicted trajectory (blue) versus the true one (green). In practice, the past market Brownian motions have to be extracted for learning; prediction is done by generating new ones. Josef Teichmann (ETHZ) Randomized signature May 2022 29 / 30
  • 90. Conclusion Conclusion We show that the time evolution of controlled differential equations can be approximated arbitrarily well by regressions on a certain randomly chosen dynamical system of moderately high dimension, called randomized signature (randomized RNNs). This is motivated by paradigms of reservoir computing and widely applied signature methods from rough path theory. We apply the method to market generation/simulation in finance, randomized Longstaff–Schwartz, and provable machine learning. Josef Teichmann (ETHZ) Randomized signature May 2022 30 / 30