PYTHON + TENSORFLOW:
how to earn money
in the Stock Exchange
with Deep Learning
Jose M. Leiva
Summary
1. A primer on Machine Learning
2. Neural Networks and Deep Learning
3. Asset Management and Quantitative Finance
A primer on Machine Learning
Imagine we want to build an automatic digit recognizer.
REMEMBER! A digital image is nothing but a matrix of numbers!
Source: MNIST Dataset
Source: https://medium.com/@ageitgey/machine-learning-is-fun
Example: Automatic Digit Recognition
2 DIFFERENT ALTERNATIVES:
1. Solve the problem in the original domain, with 784 values (28 x 28 images)
2. Extract features (for example, height and width)
Linear classifiers perform the transformation ⟨θ, x⟩ = θᵀx.
[Figure: examples of “1” and examples of “8” plotted in the width-height plane.]
Obtaining Probabilities
● We are usually interested in providing a degree of confidence about the prediction rather than a yes/no answer.
● How can we transform the unbounded value θᵀx into a probability?
● With the sigmoid function σ:
p(y=’8’) = σ(θᵀx)
p(y=’1’) = 1 - p(y=’8’)
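A minimal NumPy sketch of this step; the weights and feature values below are made-up numbers, not taken from the slides:

```python
import numpy as np

# The sigmoid squashes the unbounded score theta^T x into (0, 1),
# so it can be read as a probability.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.8, -0.5])  # hypothetical weights for (width, height)
x = np.array([2.0, 1.0])       # hypothetical feature vector
p_8 = sigmoid(theta @ x)       # p(y='8')
p_1 = 1.0 - p_8                # p(y='1')
```

Any score, however large or negative, maps into (0, 1), and the two class probabilities sum to 1 by construction.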
The cost of the errors
Fundamental problem in machine learning: how do we choose the best θ?
We must evaluate the cost of the errors!
ALTERNATIVES:
● Counting the errors
● Log-loss (cross-entropy loss): the one usually employed in neural networks
● Maximum margin
● Etc.
Example: maximum-margin classification (Support Vector Machines, SVM) in the width-height plane.
When the sigmoid function is combined with the log-loss, we have a logistic regression classifier.
Machine Learning with Python
● scikit-learn (aka sklearn) has become the Python library for Machine Learning.
● It is object-oriented and very comfortable to work with.
● It offers submodules for data encoding, preprocessing, dimensionality reduction, regression, classification, error metrics, model selection, etc.
BUILD A MODEL AND PERFORM PREDICTION:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
yt = model.predict(Xtest)        # predict the labels
pt = model.predict_proba(Xtest)  # obtain the probabilities
EVALUATE ACCURACY:
from sklearn.metrics import accuracy_score
acc = accuracy_score(Ytest, yt)  # accuracy_score(y_true, y_pred)
OR DIRECTLY:
acc = model.score(Xtest, Ytest)
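As a self-contained sketch, the same calls can be run end to end on scikit-learn's built-in digits dataset (8x8 images, so 64 features instead of 784); the split ratio and max_iter value below are arbitrary choices, not from the talk:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small stand-in for MNIST: 8x8 digit images shipped with scikit-learn.
X, y = load_digits(return_X_y=True)
X, Xtest, Y, Ytest = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X, Y)
yt = model.predict(Xtest)        # predicted labels
pt = model.predict_proba(Xtest)  # class probabilities, one column per digit
acc = accuracy_score(Ytest, yt)  # accuracy_score(y_true, y_pred)
```

The probability rows returned by predict_proba each sum to 1, one column per digit class.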
Neural Networks and Deep Learning
We can build non-linear models by concatenating several layers of non-linear units.
We can think of each unit as a logistic regressor like the one we saw before.
Source: http://cs231n.github.io/neural-networks-1/
Fully connected Neural Networks
It is important that the units in the network are followed by a non-linear transformation:
y = σ(θᵀx + b)
This way we can draw non-linear separation functions!
A non-linear activation function can replicate the spiking mechanism of biological neurons.
Example of activation function: the Rectified Linear Unit (ReLU).
Source: Bruce Blause, from en.wikipedia.org/wiki/neuron
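A one-line NumPy sketch of the ReLU activation mentioned above:

```python
import numpy as np

# ReLU passes positive inputs through unchanged and clips
# negative inputs to zero.
def relu(z):
    return np.maximum(0.0, z)

out = relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0]))
# -> array([0. , 0. , 0. , 0.5, 2. ])
```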
Deep Learning: other architectures
CONVOLUTIONAL NEURAL NETWORKS
Sources: http://cs231n.github.io/convolutional-networks/
http://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf
RECURRENT NEURAL NETWORKS
They are used in Natural Language Processing (NLP) and Machine Translation.
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Fully connected Neural Networks
All layers can use units with ReLU activations, but we need something else in the last one: we would like the output to give us a set of probability values.
To this end, we apply a softmax transformation in the last layer:
● It preserves the order of the outputs.
● The new outputs sum up to 1.
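A minimal NumPy sketch of the softmax transformation and its two properties (order preserved, outputs sum to 1):

```python
import numpy as np

# Softmax: exponentiate (order-preserving) and normalize (sums to 1).
def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical last-layer outputs
probs = softmax(scores)
```

The largest score still gets the largest probability, and the vector now reads as a distribution over classes.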
Deep Learning with TensorFlow
● Example: evaluate f = x² + y². First, define the graph:
import tensorflow as tf
x = tf.Variable(3)
y = tf.Variable(4)
f = x*x + y*y
● The definition of the graph is separated from its execution:
>>> sess = tf.Session()
>>> sess.run(x.initializer)
>>> sess.run(y.initializer)
>>> result = sess.run(f)  # result equals 25
● Or more compactly:
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    result = f.eval()
● Example: graph for Logistic Regression on the MNIST database.
# Data input
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # softmax output
# Minimize error using cross-entropy
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), axis=1))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
● Running and optimising:
with tf.Session() as sess:
    sess.run(init)
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        _, c = sess.run([optimizer, cost],
                        feed_dict={x: batch_xs, y: batch_ys})
        avg_cost += c / total_batch  # accumulate the average loss
TensorFlow is an open source software library for numerical computation.
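The cost line in the graph above can be checked by hand. A NumPy sketch of the same mean cross-entropy, on two made-up one-hot labels:

```python
import numpy as np

# Mean cross-entropy between one-hot labels y and softmax outputs pred,
# mirroring: tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), axis=1))
def cross_entropy(y_onehot, pred):
    return np.mean(-np.sum(y_onehot * np.log(pred), axis=1))

y = np.array([[0.0, 1.0], [1.0, 0.0]])     # hypothetical one-hot labels
pred = np.array([[0.1, 0.9], [0.8, 0.2]])  # hypothetical probabilities
cost = cross_entropy(y, pred)  # -(log 0.9 + log 0.8) / 2
```

The one-hot labels pick out the log-probability of the true class for each example; confident, correct predictions drive the cost toward zero.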
Asset Management and Quantitative Finance
● In Portfolio Management, we aim to invest a given wealth in a set of assets.
● The strategy can be either static or dynamic: we focus on the latter.
● The proportion of the wealth invested in the i-th asset is wᵢ, with Σᵢ wᵢ = 1.
● We focus on the case where wᵢ ≥ 0 (long only) and wᵢ ≤ 1 (no leverage).
A neural network with softmax output is perfect for the job!
Neural Networks in Portfolio Management
PROBLEM DEFINITION: we would like to maximize the profit Σₙ wₙ,ₜ · rₙ,ₜ₊₁, where:
● t is the time instant
● n is the index of each asset in the portfolio
● rₙ,ₜ₊₁ is the return of asset n at t+1
● wₙ,ₜ is the weight of asset n in the portfolio at t
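Under these definitions the quantity being maximized is a weighted sum of returns. A NumPy sketch with made-up weights and returns:

```python
import numpy as np

# Portfolio return at t+1: sum over assets n of w[n, t] * r[n, t+1].
w_t = np.array([0.5, 0.3, 0.2])         # hypothetical weights at t (sum to 1)
r_next = np.array([0.01, -0.02, 0.03])  # hypothetical asset returns at t+1

portfolio_return = float(np.sum(w_t * r_next))  # 0.005 - 0.006 + 0.006
```

Because the weights come from a softmax, they automatically satisfy the long-only, fully-invested constraints above.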
We train the network every year. Usually, we would use the following procedure: ...but we cannot afford to skip the validation set! Instead, we use a constant number of iterations (100,000) and incorporate the validation set into the training one.
We use the following set of assets:
We use the following variables as inputs to the network:
● The T last daily price movements
● The H last weekly price movements
● The day of the week (5 binary variables)
RESULTS
● Liquidity constraint: 5%
● Management fee: 1%
● Transaction fee: 0.05%
There are three recommendations a week (on Monday, Wednesday and Friday), each one incorporating the output of the NN in one third of the portfolio.
Is it really possible to consistently beat the market?
“Beating the stock market requires outpredicting teams of investors in fancy suits with MBAs from Ivy League schools who are paid seven-figure salaries and who have state-of-the-art computer systems at their disposal.”
— Nate Silver, The Signal and the Noise
Thanks
ETS Asset Management Factory - @ETSFactory
José M. Leiva Murillo - @JLeivaMurillo
Machine Learning, Artificial Intelligence, Asset Management, Python and more in our blog: www.quantdare.com
#PyConES2017

Python + TensorFlow: how to earn money in the Stock Exchange with Deep Learning. PyConES 2017 talk.


Editor's Notes

  • #5 Solving this problem boils down to finding the optimal separating hyperplane.
  • #6  But we still have to solve the problem of how to draw the optimal separating hyperplane.
  • #8 This is all it takes to start with ML. Of course, ML is far more than that, but some of the main elements are present in the previous slides: 1) the need for feature extraction or feature engineering; 2) the need for an appropriate probabilistic interpretation; 3) the need for a cost function that penalizes the errors in a sensible and (especially) convex way.
  • #9 What’s the big deal about DL? Imagine we take the logistic regressor and use it as the building block of something bigger: this is called a fully connected (FC) network because all the outputs of a layer are the inputs of the next one. You might think that the strange sigmoid function should be present only in the last “unit”.
  • #10 A combination of linear transformations is still a linear transformation: the discrimination function is still a hyperplane. Using non-linear activations (such as the sigmoid) in each unit allows for more complex discrimination functions. Maybe this is easier to understand if you think of how a neuron works.
  • #11 In a convolutional neural network (CNN), we apply several levels of convolution operation. After each convolution, the image is transformed into another one with a lower width and height but a higher depth. This architecture works well in computer vision because it imitates human vision: how neurons are organized in the visual cortex -> from low level patterns to higher ones. In an RNN, we add feedback to the network: information about the present example is propagated through time. The output at a given instant not only depends on the particular input but also on the previous history. This is why this kind of network performs very well in problems involving sequences: speech recognition, machine translation, etc...
  • #12 Very often we use a combination of architectures: for images, we use several convolutional layers followed by one (or several) FC layers. I’d like to point out something important: even if you use ReLU activations after each layer, we still need a softmax in the last layer for probability estimation! The softmax function is the generalization of the sigmoid when we have more than 2 categories. So, the important question: how can we perform deep learning with Python?
  • #13 TensorFlow is a framework developed by Google and then made public. It has the advantages of both Google and the open-source communities. It has a lot of features and can be built on top of different hardware architectures (GPUs, etc.).
  • #14 So, what would we need to apply a NN to the problem of choosing these weights? Well, first of all we need a cost function.
  • #15 This is not a classifier, so we don’t have to evaluate the cost of the errors. But we still need a function to optimize!
  • #17 In this figure we show that the portfolios returned by the NN perform quite well: at least as well as a reference global asset. In the second figure we have the exposure: the distribution of the wealth across the assets over time. The strategy is the mixture of three weekly strategies, each one taking place on a different day of the week. Please note that the turnover of this strategy, which is the speed at which the composition changes, is quite high; it might not be acceptable to a portfolio manager. There is something interesting: the amount of cash in the portfolio is only significant in 2014. Why?