Deep Learning Applications in Finance.pdf

Generative Adversarial Networks (GANs) and
their applications in Finance
A brief introduction
Vanessa Bridge1 Prof. Salisbury2
1Department of Mathematics And Statistics
York University
15, April 2023
Vanessa, Bridge (York U) Deep Learning Strategies For Financial Applications ICLR 2023 1 / 49

Table of Contents
1 Introduction
2 Deep Learning
3 Generative Adversarial Networks
4 Training Model
5 Fine-tuning of trading strategies
6 Sampling And Aggregation
7 Experimental Results

Introduction
Machine learning tools offer many advantages in the field of finance,
from predictive modelling to data generation for finding new alpha
opportunities.
To obtain an edge, we will explore the use of Generative Adversarial
Networks (GANs) to create synthetic data for calibrating trading strategies
on weak signals.
We will also explore how generated data can be used for ensemble
modelling.

Why Use These Techniques?
(i) They generate more diverse training and testing sets than traditional
resampling techniques;
(ii) They make it possible to draw samples specifically around stressful events,
which is ideal for model checking and stress testing; and
(iii) They provide a level of anonymization to the dataset, unlike
other techniques that (re)shuffle or resample the data [1].

Time Series Challenges
1. Missing data within intervals
The intervals of the time series are regular, but some values are simply
not present. Data received through data ingestion may not contain the
continuous stream of events that was expected.
2. Units of measurement
A sudden change in the units of measurement will affect the prediction and also
any recommendation generated later. During pre-processing, it is
necessary to validate the units of measurement.
3. Wrong or unexpectedly delayed timestamps
Timestamps that are wrong or unexpectedly delayed may lead to
prediction failures in production. If prediction failures happen, monitor
the data in the native tool and find the cause.

Content Review
Before we go into the talk we will cover some concepts:
Deep Learning
Generative Adversarial Networks
Time Series Analysis
Stochastic Gradient Descent

Deep Learning
Figure: Deep Learning

Deep Learning: Perceptron
Definition
A perceptron is a function that receives a list of input signals and
transforms them into an output signal. Deep networks aim to learn a
data representation by stacking together many layers of such units, where
each layer is responsible for understanding some part of the input.
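As a minimal sketch of the definition above, a single perceptron can be written as a weighted sum of its inputs followed by a step activation. The weights, the bias and the choice of a step activation are illustrative assumptions, picked here so that the unit behaves like a logical AND.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the input signals followed by a step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# Hypothetical weights and bias: the unit fires only when both inputs are 1.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], and_weights, and_bias)
           for a in (0, 1) for b in (0, 1)]
# outputs == [0, 0, 0, 1]
```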

Neural Networks
Neural Networks
Neural networks (NNs) consist of multiple layers of interconnected nodes,
each building upon the previous layer to refine and optimize the prediction
or categorization. Non-linear activation functions add non-linearity to the
network. The computations progress through the network and end in a final
output that is used as the result or prediction.

Deep Learning: Propagation
Forward Propagation
This progression of computations through the network is called forward
propagation. The input and output layers of a deep neural network are
called visible layers. The input layer is where the deep learning model
ingests the data for processing, and the output layer is where the final
prediction or classification is made.
Back-Propagation
A second process, called backpropagation, calculates the error in the
predictions and propagates its gradient backwards through the layers. An
algorithm such as gradient descent then uses these gradients to adjust the
weights and biases of the function in an effort to train the model.

Deep Learning Algorithm
Combination
Together, forward propagation and backpropagation allow a neural
network to make predictions and correct for any errors accordingly. Over
time, the algorithm becomes gradually more accurate.
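The interplay of the two passes can be sketched with a single sigmoid neuron trained by gradient descent. This is only a toy illustration: the data set, the learning rate and the cross-entropy loss are assumptions made for the example and are not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, epochs=200):
    """Fit p = sigmoid(w * x + b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    losses = []
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in data:
            p = sigmoid(w * x + b)  # forward propagation
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            grad_w += (p - y) * x   # backpropagated gradient w.r.t. w
            grad_b += (p - y)       # backpropagated gradient w.r.t. b
        w -= lr * grad_w / n        # gradient descent update
        b -= lr * grad_b / n
        losses.append(loss / n)
    return w, b, losses

# Hypothetical toy data: negative inputs are class 0, positive inputs class 1.
toy_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, losses = train(toy_data)
```

The recorded losses shrink from one epoch to the next, which is the "gradually more accurate" behaviour described above.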

Generative Adversarial Networks
Generative Adversarial Networks (GANs)
A modelling strategy that employs two neural networks:
a Generator (G)
a Discriminator (D)
How do they work?
They are trained jointly: G benefits from D's inability to tell true data
from generated data, whilst D's loss is minimized when it correctly
classifies inputs coming from G as fake and inputs from the dataset as true.
Competition drives both networks to improve their performance until the
genuine data is indistinguishable from the generated data.

GAN
Figure: GAN Architecture

Discriminator Architecture
The Discriminator
The Discriminator acts to separate the input created by the Generator from
that of the real/observed data generation process.

Generator Architecture
Generator
The Generator is responsible for producing a rich, high-dimensional vector
that attempts to replicate a given data generation process.

Conditional GAN
A Conditional GAN (cGAN) attempts to learn an implicit conditional
generative model by using extra input data v:
a class label,
a certain categorical feature,
a current/expected market condition.
It is especially useful when the data follows a sequence, such as time series
or text, or when one wants to build "what if" scenarios.
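Definition
Formally, a cGAN can be defined by including the conditional variable v:
\[ G : z \times v \to x \quad \text{and} \quad D : x^{*} \times v \to [0, 1] \]
D and G play a two-player minimax game with value function V(D, G):
\[ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x \mid v)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid v))\right)\right] \]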
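To make the value function concrete, the sketch below estimates V(D, G) from discriminator outputs by replacing the two expectations with sample averages. The function name `value_function` and the example probabilities are hypothetical; no actual networks are involved.

```python
import math

def value_function(d_real, d_fake):
    """Sample estimate of V(D, G).

    d_real: discriminator outputs D(x|v) on real samples.
    d_fake: discriminator outputs D(G(z|v)) on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
v_confused = value_function([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the two well (illustrative probabilities).
v_confident = value_function([0.9, 0.9], [0.1, 0.1])
```

D tries to push this quantity up, while G tries to pull it back down towards the value obtained when D outputs 0.5 for every input.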

cGAN

Motivation For Conditional GAN
In many scenarios it is useful to be able to generate data to
analyse and forecast. In the world of finance, due to trade and other
limiting factors, data is often not easily available. cGANs offer the
possibility of:
Generating training and testing sets instead of relying on resampling
techniques
Testing and fine-tuning trading strategies
Discovering alpha-generating strategies

Algorithm

Selecting The Right Hyperparameters
Before running cGAN training, one must set the hyperparameters. These
mainly encompass:
G and D architectures,
Number of lags p,
Noise vector size and prior distribution,
Minibatch size L,
Number of epochs,
Snapshot frequency (snap),
Number of samples C,
Parameters associated with the stochastic gradient optimizer.
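One way to keep these settings together is a small configuration object, sketched below. The class name `CGANConfig`, the field names and every default value are placeholders chosen for illustration; none of them are the values used in the experiments.

```python
from dataclasses import dataclass, field

@dataclass
class CGANConfig:
    # All defaults are illustrative placeholders, not values from the slides.
    generator_layers: list = field(default_factory=lambda: [64, 64])
    discriminator_layers: list = field(default_factory=lambda: [64, 64])
    num_lags: int = 10             # p
    noise_size: int = 16           # length of the noise vector z
    noise_prior: str = "normal"    # prior distribution of z
    minibatch_size: int = 32       # L
    num_epochs: int = 1000
    snapshot_frequency: int = 100  # snap
    num_samples: int = 50          # C
    learning_rate: float = 1e-3    # stochastic gradient optimizer setting

    def snapshot_epochs(self):
        """Epochs at which a snapshot of the generator would be stored."""
        return list(range(self.snapshot_frequency,
                          self.num_epochs + 1,
                          self.snapshot_frequency))

config = CGANConfig(num_epochs=500, snapshot_frequency=100)
```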

Model Training: Stochastic Gradient (SG)
Much like regular GANs, cGANs are trained using minibatch stochastic
gradients. Each stochastic gradient (SG) is calculated using the L samples in
the minibatch, and z is the noise vector.
Stochastic Gradient Discriminator
\[ \nabla_{\theta_D} \frac{1}{L} \sum_{l=1}^{L} \left[ \log D\left(y_t^{(l)} \mid y_{t-1}^{(l)}, \ldots, y_{t-p}^{(l)}\right) + \log\left(1 - D\left(G\left(z^{(l)} \mid y_{t-1}^{(l)}, \ldots, y_{t-p}^{(l)}\right)\right)\right) \right] \]
Stochastic Gradient Generator
\[ \nabla_{\theta_G} \frac{1}{L} \sum_{l=1}^{L} \left[ \log D\left(G\left(z^{(l)} \mid y_{t-1}^{(l)}, \ldots, y_{t-p}^{(l)}\right)\right) \right] \]
However, selecting the right cGAN can be a difficult and computationally
expensive task, so snapshots should be considered as a way to evaluate
candidate models at different points during training.
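The conditioning on the p previous values can be sketched as follows: each training example pairs a target y_t with the window (y_{t-1}, ..., y_{t-p}), and a minibatch of L such pairs is drawn at random. The helper names and the toy series are hypothetical, and the networks themselves are left out.

```python
import random

def lagged_pairs(series, p):
    """Return (condition, target) pairs: ([y_{t-1}, ..., y_{t-p}], y_t)."""
    return [(series[t - p:t][::-1], series[t]) for t in range(p, len(series))]

def sample_minibatch(pairs, L, rng):
    """Draw L distinct (condition, target) pairs for one stochastic gradient step."""
    return rng.sample(pairs, L)

toy_series = [1, 2, 3, 4, 5]           # hypothetical series y_1, ..., y_5
pairs = lagged_pairs(toy_series, p=2)  # [([2, 1], 3), ([3, 2], 4), ([4, 3], 5)]
batch = sample_minibatch(pairs, L=2, rng=random.Random(0))
```

In a full training loop, each pair in the batch would be combined with a fresh noise vector z before evaluating the two objectives above.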

Model Training: Loss Function
Root Mean Squared Error
To measure the goodness of fit of the model (akin to the chi-square
distance):
\[ \mathrm{RMSE}_c = \sqrt{\frac{1}{T - p - 1} \sum_{t=p+1}^{T} \left(y_t - y_t^{(*)}\right)^2} \]
Figure: RMSE curves, considering a range of snapshot frequencies and number of
samples
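The RMSE above can be computed directly from an observed series and a generated one. The sketch below follows the formula as written on the slide, including the T - p - 1 divisor; the function name and the toy numbers are assumptions.

```python
import math

def rmse(actual, generated, p):
    """RMSE between actual and generated series over t = p+1, ..., T.

    Both lists have length T, with list index 0 holding the value for t = 1.
    The divisor T - p - 1 follows the formula on the slide.
    """
    T = len(actual)
    squared_error = sum((actual[t] - generated[t]) ** 2 for t in range(p, T))
    return math.sqrt(squared_error / (T - p - 1))

# Hypothetical toy series with T = 5 and p = 2 lags.
score = rmse([0, 0, 1, 2, 3], [0, 0, 1, 2, 5], p=2)
```

Evaluating this score for each stored snapshot gives one way to compare candidate generators.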

Fine-tuning Trading Strategies
Goal Setting: Utility function
To find the proper hyperparameters, a goal needs to be set. This goal
depends on the utility function P that the quantitative analyst is
targeting:
outperformance during active trading,
hedging a specific risk,
reaching a certain level of risk-adjusted returns.
Model Validation
Hence, we train a cGAN and use the generator G to draw B samples from
the time series. For every sample, we perform a one-split to create
X(train) and X(val), so that we are able to identify the parameters θ of Mλ
and assess a set of hyperparameters λ.

Model Validation: Data Selection
Finite set of examples X(train), drawn from a probability distribution
p_x(x)
Set of hyperparameters λ ∈ Λ, such as the number of neurons, the activation
function of layer j, etc.
Utility function P to measure the performance of a trading strategy Mλ
on new samples from p_x(x)
Trading strategy Mλ with parameters θ identifiable by optimizing
a training criterion, but only once a certain λ is fixed
Optimal Configuration
\[ \lambda^{*} = \arg\max_{\lambda \in \Lambda} \mathbb{E}_{x \sim p_x}\left[P\left(x; M_\lambda(X^{(\text{train})})\right)\right] \]

Hyperparameter Optimization And Model Validation
Optimal vs. Approximation
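Challenges arise when trying to use the previous formula due to the
difficulty of generating new samples from p_x(x). Additionally, Λ can be
extremely large.
Approximation
\[
\begin{aligned}
\lambda^{*} &= \arg\max_{\lambda \in \Lambda} \mathbb{E}_{x \sim p_x}\left[P\left(x; M_\lambda(X^{(\text{train})})\right)\right] \\
&\approx \arg\max_{\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_m\}} \mathbb{E}_{x \sim p_x}\left[P\left(x; M_\lambda(X^{(\text{train})})\right)\right] \\
&\approx \arg\max_{\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_m\}} \operatorname{mean}_{x \in X^{(\text{val})}}\left[P\left(x; M_\lambda(X^{(\text{train})})\right)\right]
\end{aligned}
\]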
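The approximation amounts to a grid search: for each candidate λ, fit the strategy on the training part of every sample, score it on the validation part, and keep the λ with the best mean utility. The sketch below assumes a one-split per sample; the toy "strategy" (a shrunk mean forecast) and the utility (negative squared error) are stand-ins invented for the example.

```python
from statistics import mean

def one_split(sample, train_fraction=0.7):
    """Split a sample chronologically into training and validation parts."""
    cut = int(len(sample) * train_fraction)
    return sample[:cut], sample[cut:]

def select_hyperparameter(candidates, samples, fit, utility, train_fraction=0.7):
    """Return the candidate with the highest mean validation utility, plus all scores."""
    scores = {}
    for lam in candidates:
        values = []
        for sample in samples:
            train, val = one_split(sample, train_fraction)
            model = fit(train, lam)             # M_lambda(X_train)
            values.append(utility(val, model))  # P(x; M_lambda(X_train))
        scores[lam] = mean(values)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical strategy: forecast lam * mean(train); utility: negative squared error.
def fit(train, lam):
    return lam * mean(train)

def utility(val, forecast):
    return -mean([(x - forecast) ** 2 for x in val])

samples = [[1.0] * 10, [1.0] * 10]  # stand-in for B generated samples
best, scores = select_hyperparameter([0.0, 0.5, 1.0], samples, fit, utility)
```

With cGAN-based validation, `samples` would hold the B series drawn from the generator rather than the constant toy series used here.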

cGAN for Fine-tuning Trading Strategies

Alternatives
Parameter search can be difficult; other solutions can be used, such as:
Evolution Strategies
Bayesian Optimization
Similarly, the creation of proper validation sets can be challenging,
depending on whether the samples are independent and identically
distributed or not. Possible solutions include:
k-fold cross-validation
bootstrap
block cross-validation
sliding window
Sampling And Aggregation
Ensemble Of Trading Strategies
By combining a set of base learners, usually considered "weak", such as
Classification and Regression Trees, the aggregated strategies can
outcompete "strong" learners such as SVMs. This method can be compared
to bagging.
Variance Reduction
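Let \hat{Y}_1, ..., \hat{Y}_B be the predictions of a set of base learners. If we
average their predictions and analyse the variance of that average, we get:
\[ \mathbb{V}\left[\frac{1}{B} \sum_{b=1}^{B} \hat{Y}_b\right] = \frac{1}{B^2} \left( \sum_{b=1}^{B} \mathbb{V}[\hat{Y}_b] + 2 \sum_{1 \le b < j \le B} \mathbb{C}[\hat{Y}_b, \hat{Y}_j] \right) \]
If we assume \mathbb{V}[\hat{Y}_b] = \sigma^2 and \mathbb{C}[\hat{Y}_b, \hat{Y}_j] = \rho\sigma^2, this simplifies to:
\[ \mathbb{V}\left[\frac{1}{B} \sum_{b=1}^{B} \hat{Y}_b\right] = \sigma^2 \left( \frac{1}{B} + \frac{B - 1}{B} \rho \right) \le \sigma^2 \]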
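The simplification can be checked numerically by comparing the closed form with the general formula applied to an explicit covariance matrix. The values B = 10, σ² = 4 and ρ = 0.2 are arbitrary choices for the check.

```python
def variance_of_average(B, sigma2, rho):
    """Closed form: sigma^2 * (1/B + (B - 1)/B * rho)."""
    return sigma2 * (1.0 / B + (B - 1) / B * rho)

def variance_from_covariances(cov):
    """General formula: (sum of variances + 2 * sum of covariances for b < j) / B^2."""
    B = len(cov)
    variances = sum(cov[b][b] for b in range(B))
    covariances = sum(cov[b][j] for b in range(B) for j in range(b + 1, B))
    return (variances + 2 * covariances) / B ** 2

B, sigma2, rho = 10, 4.0, 0.2  # arbitrary illustrative values
cov = [[sigma2 if b == j else rho * sigma2 for j in range(B)] for b in range(B)]
closed_form = variance_of_average(B, sigma2, rho)
general = variance_from_covariances(cov)
```

Both routes give 1.12 here, well below σ² = 4. The less correlated the base learners are, the larger the reduction.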

Algorithm 3

Experiments
Statistics Collected
The techniques presented below were tested by a group of researchers.
They collected data over a wide range of assets, ran experiments, and
tested the Generator and Discriminator performance. Some of the
statistics used were related to cumulative returns per asset pool.
Figure: Cumulative returns aggregated across asset pool. Before being averaged,
each individual asset was volatility scaled to 10

Data Set
Data Parameters
The data collected corresponds to 579 assets (currencies, equities and
fixed income). The period used runs from March 2000 to February 2018.
The sequence of returns r_1, ..., r_T was split into a single
in-sample/training (IS) set and a single out-of-sample (OS) set. The trading
horizon is h = 1260 days.

Asset Statistics
Figure: Aggregated statistics of the assets used during empirical evaluation.

Alpha Metrics
Calmar Ratio
The Calmar ratio is a gauge of the performance of investment funds. It is
a function of the fund's average compounded annual rate of return versus
its maximum drawdown. The higher the Calmar ratio, the better the fund
performed on a risk-adjusted basis during the given time frame, which is
most commonly set at 36 months.
\[ CR = \frac{\bar{R}^{M}}{-MDD(R^{M})} \]
Sharpe Ratio
The Sharpe ratio compares the return of an investment with its risk. It is a
mathematical expression of the insight that excess returns over a period of
time may signify more volatility and risk, rather than investing skill.
\[ SR = \frac{\bar{R}^{M}}{\sigma_R^{M}} \]
where \bar{R}^{M} is the strategy's average excess return, \sigma_R^{M} is its
volatility and MDD(R^{M}) is the strategy's maximum drawdown.
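A minimal sketch of both metrics is given below. It makes several assumptions not stated on the slide: returns are already excess returns, the cumulative path is built by summing them, the drawdown is reported as a non-positive number (matching the minus sign in the Calmar formula), and no annualization is applied. The return series is invented for the example.

```python
from statistics import mean, stdev

def max_drawdown(returns):
    """Largest peak-to-trough drop of the summed return path, as a value <= 0."""
    cumulative = peak = mdd = 0.0
    for r in returns:
        cumulative += r
        peak = max(peak, cumulative)
        mdd = min(mdd, cumulative - peak)
    return mdd

def sharpe_ratio(returns):
    """Average excess return divided by its sample volatility."""
    return mean(returns) / stdev(returns)

def calmar_ratio(returns):
    """Average excess return divided by minus the maximum drawdown.

    Undefined (division by zero) when the path never falls below a previous peak.
    """
    return mean(returns) / -max_drawdown(returns)

# Hypothetical excess returns of a strategy.
strategy_returns = [0.02, -0.01, 0.03, -0.04, 0.02]
sr = sharpe_ratio(strategy_returns)
cr = calmar_ratio(strategy_returns)
```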

GAN Architecture And Hyperparameters

Algorithm to Combine Strategies
Figure: Ensemble Strategy Results

Trading Strategies Configuration
Figure: Main configuration used for fine-tuning of trading strategies

Case Study: Combination of Trading Strategies
Overview
This case study evaluates the success of different combinations of trading
strategies. Algorithm 4 presents the main loop used for
cGANs and Stationary Bootstrap. The first step is to resample the actual
returns, RS(r_1, ..., r_{T_h}), using Stationary Bootstrap or a cGAN, creating
a new sequence of returns {r*_1, ..., r*_{T_h}} = X(train). We then proceed
as usual: use X(train) to train a base learner M_λ^(b) and add it to the
ensemble set ES. All of these steps are repeated B times. Finally, we
propagate the OS feature set through the ensemble ES, obtain the aggregated
prediction, and compute its performance on this holdout set.
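The loop described above can be sketched with Stationary Bootstrap as the resampler. This is an illustration under stated assumptions rather than the slides' Algorithm 4: the base learner is a deliberately trivial placeholder that predicts the training mean, aggregation is a plain average, and the return series and block length are invented.

```python
import random
from statistics import mean

def stationary_bootstrap(returns, avg_block, rng):
    """Resample in blocks of geometric random length (mean avg_block), wrapping around."""
    n = len(returns)
    restart_prob = 1.0 / avg_block
    idx = rng.randrange(n)
    resampled = []
    for _ in range(n):
        resampled.append(returns[idx])
        if rng.random() < restart_prob:
            idx = rng.randrange(n)  # start a new block at a random position
        else:
            idx = (idx + 1) % n     # continue the current block
    return resampled

def fit_mean_learner(train):
    """Hypothetical base learner: always predicts the mean training return."""
    m = mean(train)
    return lambda features: m

def build_ensemble(returns, B, resample, fit):
    """Repeat B times: resample the returns, train a base learner, add it to ES."""
    return [fit(resample(returns)) for _ in range(B)]

def ensemble_predict(ensemble, features):
    """Aggregate the base learners' predictions by averaging."""
    return mean([model(features) for model in ensemble])

rng = random.Random(0)
is_returns = [0.01, -0.02, 0.015, 0.0, -0.005, 0.02, -0.01, 0.005]
ensemble = build_ensemble(
    is_returns, B=20,
    resample=lambda r: stationary_bootstrap(r, avg_block=3, rng=rng),
    fit=fit_mean_learner,
)
prediction = ensemble_predict(ensemble, features=None)
```

Swapping the `resample` argument for a function that draws a series from a trained cGAN generator would give the cGAN variant of the same loop.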

Trading Results
Figure: Median and Mean Absolute Deviation (MAD) results of Trading and
Ensemble Strategies on the OS set.

Case Study: Fine-tuning of Trading Strategies
Model Comparison
This section evaluates the performance of three different cGAN
architectures. The competing methods to cGAN for fine-tuning
trading strategies are: naive (training and validation sets are equal),
one-split and sliding window; block, hv-block and k-fold cross-validation;
and stationary bootstrap.
The main question is: given a trading strategy Mλ, which model validation
(MV) mechanism is able to uncover the best configuration λ to apply during
the OS period? We look for an answer using linear and nonlinear trading
strategies (Ridge Regression, Gradient Boosting Trees and Multilayer
Perceptron).

Fine-Tuning Result and Comparison
Figure: Quantiles of Sharpe and Calmar ratios in the OS set across the 579 assets
for different trading strategies and model validation schemes.

Fine-Tuning Results
Results
There are not many differences between the model validation
schemes: Naive yields the worst median (50%) value (0.121), and
hv-Block, Block and cGAN-Medium yield the best median (0.138). The same
can be said with respect to Calmar ratios.
Overall, apart from a few analyses and cases (e.g., GBT and the Naive
method), in aggregate the model validation schemes do not appear to be
significantly distinct from each other.
This suggests that cGAN is a viable procedure to include in the
fine-tuning pipeline, since its results are statistically indistinguishable
from those of well-established methodologies [1].

Rank Analysis

Findings

cGAN Large Outperforming
Figure: A sample of Sharpe ratio results in the OS set for cases where
cGAN-Large outcompeted the other methods.

Example of cGAN Strategy Performance
Figure: SPX Index cumulative returns in the OS set for different model validation
schemes using MLP as the trading strategy. cGAN-Large and hv-Block found out
the same hyperparameters, therefore obtaining similar profiles.

Applications of cGANs
Another interesting application is to use cGANs for medical time series
generation and anonymization. A group of researchers used cGANs to
generate realistic synthetic medical data, so that this data could be shared
and published without privacy concerns, or even used to augment or enrich
similar datasets collected in different or smaller cohorts of patients.
Most of the applications of cGANs related to the work presented have
centred on synthesizing data to improve supervised learning models. The
only exception is a study in which a cGAN is used to perform direction
prediction in stock markets [2].
Most work deals with the problem of imbalanced classification, in
particular fraud detection; it has been shown that cGANs compare
favourably to other traditional techniques for oversampling.

Challenges
Is the GAN memorising the training data?
Is the GAN ignoring data samples it cannot reproduce, or
overproducing the ones it can easily reproduce (i.e., mode collapse)? [3]
There is a potential risk that the cGAN is unable to replicate p_data well,
so that although samples might be more diverse, they are also more "biased".

Conclusion
Over this presentation we were able to demonstrate the relevance of having
a set of model assessment schemes, using cGAN to identify alpha
opportunities that other techniques are unable to find. Furthermore, the
research shows that it is possible to generate more diverse training and
testing sets, compared to traditional resampling techniques [1].
The findings encourage further investigation of cGAN techniques for
other applications not covered here, such as stress testing. We also need to
keep in mind the current limitations and to consider further exploration of
the techniques by combining them with other methods [4].

Reference List
1 Adriano Soares Koshiyama, Nick Firoozye, and Philip C. Treleaven.
Generative Adversarial Networks for Financial Trading Strategies
Fine-Tuning and Combination. CoRR, abs/1901.01751, 2019.
2 Hans Buehler, Lukas Gonon, Josef Teichmann, and Ben Wood. Deep
Hedging. Quantitative Finance, 19(8):1271–1291, 2019.
3 Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein
Generative Adversarial Networks. In Doina Precup and Yee Whye Teh
4 Thiago W. Alves, Ionuț Florescu, George Calhoun, and Dragoș Bozdog.
SHIFT: A Highly Realistic Financial Market Simulation
Platform. August 31, 2020.
