ML for Trading
ML/DL for Trading
Larry Guo (BU Head)
larryguo5078@gmail.com
May 16th, 2018
Background: Marketing, Sales, Technical Marketing, Legal; currently a BU Head.
Disclaimer
• Images are from Google Image search.
• Code examples use Python and pandas.
• Scope: ML/DL basics and their application to trading.
Agenda
• Machine Learning / Deep Learning basics
• ML for Trading
• The path to AI
Rule-based vs. Learning-based AI

Rule-based: hand-written "If…" rules map data to buy/hold/sell decisions.

Learning-based: from data, learning solves

$\arg\min_\theta \mathrm{Loss}(f(X))$

to produce a model $\hat{f}_\theta(X)$.
AI @ Finance

ML Basics

DL ⊂ ML ⊂ AI
What is learning?

• Data (features)
• Labels (vs. predictions): the label is the ground truth
• A way to measure error: the loss / objective function
• Learn from error

Given a model family $f(X, \theta)$ and paired data $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, find $\theta^*$ s.t. $f_{\theta^*}(x_i) \to y_i$.
Types of Learning

• Learn from paired data (with labels): Supervised Learning
• Learn from un-paired data (without labels): Unsupervised Learning
• Learn from actions, states, and rewards: Reinforcement Learning
Supervised Learning

• Classification: discrete classes, e.g. (1, 2, 3) vs. (1, 5, 6)
• Regression: continuous values, e.g. a % return
Unsupervised Learning
Reinforcement Learning

Agent ↔ Environment loop: the agent observes a state, takes an action, receives a reward, and the environment moves to state'.

Agent's objective: find the best policy, $\max E\left[\sum_i \gamma^i R_i\right]$.
Supervised Learning

• Classification (discrete classes, e.g. Buy/Hold/Sell): model $P(y|X)$. From an $(X, y)$ paired data set, learn $f(X) = \hat{y} \to y$, where $X$ = features and $y$ = label. E.g. $p(\hat{y}_1 = \mathrm{cat} \mid x_1) > p(\hat{y}_1 = \mathrm{dog} \mid x_1) \to \mathrm{cat}$.
• Regression (continuous output)
Unsupervised Learning

• Clustering
• PCA
• Generating: model $P(X)$, e.g. GANs generating CelebA faces at 1024 resolution
Learn from Loss

Goal: minimize the loss. Let $y$ be the loss and $\theta$ the model parameter. Take the derivative $\frac{dy}{d\theta}$ and step against it:

$\theta := \theta - \eta \cdot \frac{dy}{d\theta}$

In many dimensions, $\nabla$ is the gradient; hence Gradient Descent:

$\theta := \theta - \eta \cdot \nabla_\theta\, y$
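A minimal sketch of this update rule; the toy loss, starting point, and learning rate are illustrative, not from the slides:

def grad(theta):
    # Toy loss y(theta) = (theta - 3)^2, minimized at theta = 3.
    return 2.0 * (theta - 3.0)       # dy/dtheta, derived by hand

theta = 0.0                          # initial guess
eta = 0.1                            # learning rate (eta)
for _ in range(100):
    theta = theta - eta * grad(theta)   # theta := theta - eta * dy/dtheta
print(theta)                         # converges to ~3.0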
Robustness: Underfitting vs. Overfitting

A model that is too simple underfits; one that is too complex overfits the training data. What matters is generalization: always use a train/test split, and allow no data leakage!
Mean Squared Error

For a linear model $\hat{y} = ax_1 + bx_2 + cx_3$:

$L = \frac{1}{N} \sum \|\hat{y} - y\|^2$
Cross Entropy

Entropy: $H(p) = -\sum_x p(x) \log p(x) = \sum_x p(x) \log \frac{1}{p(x)} \ge 0$

KL divergence: $D_{KL}(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$

How far is $q$ from $p$? If $p = q$, then $D_{KL} = 0$.
Cross Entropy

$D_{KL}(p\|q) = \sum_x p(x) \log \frac{1}{q(x)} - \sum_x p(x) \log \frac{1}{p(x)}$

so

$D_{KL}(p\|q) = H(p, q) - H(p)$

If $p$ is given (fixed), $\min D_{KL}(p\|q) \sim \min H(p, q)$, the cross entropy:

$H(p, q) = -\sum_x p(x) \log q(x)$, or $-\sum_i y_i \log \hat{y}_i$, or $-\log \hat{y}$ for binary.

Over a dataset: $-\sum_n \sum_i y_i^n \log \hat{y}_i^n$, where $n$ = #samples.
min (- log likelihood) ~ max (likelihood)
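The identity $D_{KL}(p\|q) = H(p, q) - H(p)$ can be checked numerically; a quick sketch with made-up distributions:

import numpy as np

p = np.array([0.7, 0.2, 0.1])        # "true" distribution (illustrative)
q = np.array([0.5, 0.3, 0.2])        # model distribution (illustrative)

H_p  = -np.sum(p * np.log(p))        # entropy H(p)
H_pq = -np.sum(p * np.log(q))        # cross entropy H(p, q)
D_kl =  np.sum(p * np.log(p / q))    # KL divergence

assert np.isclose(D_kl, H_pq - H_p)  # D_KL(p||q) = H(p,q) - H(p)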
X: Features

Example feature table: rows Day1, Day2, …, DayN; columns K_D, RSI, ROE. Note the time dependency across rows.

To predict Day(N+1), which features should we use? Domain knowledge!!
y: Labeling

• Encode category data numerically (or even one-hot):

$y_i \in \{A, B, C, D, E, F\}$, mapped to A = 0, B = 1, C = 2, D = 3, E = 4, F = 5.
y: One-Hot

$X \to$ Model $\hat{f}_\theta(X) \to$ softmax $\to$ probabilities $\hat{y}$ (sum = 1), used to estimate the one-hot label $y$:

$y = [0, 0, 0, 1, 0, 0, 0]$
$\hat{y} = [0.1, 0.05, 0.05, 0.748, 0.05, 0.001, 0.001]$

Minimizing the loss maximizes the probability on the correct class and minimizes the others:

$\mathrm{loss} = -\sum_i y_i \log(\hat{y}_i) = -\log(0.748)$

Softmax!
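The slide's numbers in numpy: softmax produces a probability vector, and the one-hot label picks out $-\log(0.748)$:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()               # outputs are positive and sum to 1

# Slide's example: predicted probabilities y_hat vs. one-hot label y
y_hat = np.array([0.1, 0.05, 0.05, 0.748, 0.05, 0.001, 0.001])
y     = np.array([0,   0,    0,    1,     0,    0,     0])

loss = -np.sum(y * np.log(y_hat))    # cross entropy = -log(0.748)
print(loss)                          # ~0.290

# softmax turns raw model scores into such a probability vector:
print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1 (scores illustrative)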
ML Frameworks
• Regression (Logistic Regression)
• SVM (kernel trick), e.g. LibSVM
• Decision Tree, Random Forest, XGBoost
• Deep Learning
Example: Decision Tree

• import numpy as np
• from sklearn.model_selection import train_test_split
• X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
✴ X (features), y (labels); 80% for training, 20% for test.
• from sklearn.tree import DecisionTreeClassifier
• clf = DecisionTreeClassifier()
• clf.fit(X_train, y_train)
✴ Build the decision tree; fit = train; clf = classifier.
• clf.score(X_test, y_test)  # accuracy on the test set
• clf.predict(new_data)      # predict labels for new data
Tree to Forest: ensemble many trees!
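Continuing the sklearn example above, swapping the single tree for a Random Forest is a one-line change; n_estimators is an illustrative choice:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100)  # an ensemble of 100 trees
clf.fit(X_train, y_train)
clf.score(X_test, y_test)                       # usually beats a single tree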
Deep Learning

Neural Network: with inputs $X_1, X_2, X_3$ and weights $W_1, W_2, W_3$,

$\hat{y} = \sigma\left(\sum_i W_i X_i\right)$

Stacking another layer with weights $U_i$:

$\hat{\hat{y}} = \sigma\left(\sum_i U_i \hat{y}_i\right)$

$\sigma$ is the activation, squashing to e.g. (0, 1). Such layers are called FCN / fully connected / dense.
Neural Network

$\begin{bmatrix} h^{(1)}_1 \\ h^{(1)}_2 \\ h^{(1)}_3 \\ h^{(1)}_4 \end{bmatrix} = \sigma\left( \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & \cdot & \cdot \\ w_{31} & \cdot & \cdot \\ w_{41} & w_{42} & w_{43} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$

$h^{(1)}_i = \sigma\left(\sum_j w_{ij} x_j\right)$

A neural network is just "tensor" (matrix) operations!
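The layer above as numpy code; the input values and random weights are illustrative:

import numpy as np

def sigma(z):                      # activation, here the sigmoid
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])     # x1, x2, x3 (illustrative values)
W = np.random.randn(4, 3)          # w_ij: 4 hidden units, 3 inputs

h1 = sigma(W @ x)                  # h^(1)_i = sigma(sum_j w_ij * x_j)
print(h1.shape)                    # (4,)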
Activation Function

• ReLU: $y = \max(x, 0)$
• Sigmoid: squashes to (0, 1)
• tanh: squashes to (-1, 1)
Back Propagation

Forward pass: compute the output and the loss. Backward pass: propagate the loss gradient back through the layers.
Convolution Neural Network

Dot product: $\vec{v}_i \cdot \vec{u} = \|v\| \|u\| \cos\theta$, i.e. cosine similarity: how well does each patch match the target pattern?
1D Convolution: slide the target pattern along the signal and take dot products at each position.

2D Convolution: slide a kernel over the image, e.g.

$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

The kernel is also called a filter.
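A direct loop-based sketch of this sliding-window dot product (what CNN libraries implement far more efficiently); the input image is random for illustration:

import numpy as np

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

def conv2d(image, k):
    """Valid sliding-window dot product (cross-correlation, as in CNNs)."""
    H, W = image.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * k)
    return out

image = np.random.rand(5, 5)       # illustrative input
print(conv2d(image, kernel))       # 3x3 feature map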
Filters

E.g. a horizontal filter and a vertical filter applied to an image.

What does a filter do? It detects a pattern: the response is large where the image matches the filter.
CNN

Stacked convolution layers learn a hidden representation.

ImageNet Competition: about 1000 classes and over a million images. In 2012 AlexNet brought deep learning to the fore, followed by VGG-19, Google Inception (NiN), and ResNet-152.

Bonus: Transfer Learning reuses these learned hidden representations.
Convolution Auto Encoder

Compress the input (e.g. dim 1024 × 768) through a bottleneck (e.g. dim 300), then reconstruct the output from it.
Representation Learning

The learned representation is like the DNA of the input. With it you can:

• Train your own classifier
• Neural Style Transfer
• Face Recognition
Recurrent Neural Network

RNN, LSTM, GRU: models for sequences, where word order matters:

"This movie is good, really not bad"
"This movie is bad, really not good"
Sequence Modeling
When the recurrent network is trained to perform a task that requires predicting
the future from the past, the network typically learns to use h(t) as a kind of lossy
summary of the task-relevant aspects of the past sequence of inputs up to t. This
summary is in general necessarily lossy, since it maps an arbitrary length sequence
(x(t), x(t−1), x(t−2), . . . , x(2), x(1)) to a fixed length vector h(t). Depending on the
training criterion, this summary might selectively keep some aspects of the past
sequence with more precision than other aspects. For example, if the RNN is used
in statistical language modeling, typically to predict the next word given previous
words, it may not be necessary to store all of the information in the input sequence
up to time t, but rather only enough information to predict the rest of the sentence.
The most demanding situation is when we ask h(t) to be rich enough to allow
one to approximately recover the input sequence, as in autoencoder frameworks
(chapter 14). — Deep Learning, Goodfellow et al., ch. 10
[Figure 10.2, Deep Learning: a recurrent network with no outputs, which processes information from the input x by incorporating it into the state h that is passed forward through time. (Left) Circuit diagram; the black square indicates a delay of a single time step. (Right) The same network as an unfolded computational graph, where each node is associated with one particular time instance.]
$h^{(t)} = f(h^{(t-1)}, x^{(t)}, \theta)$
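That recurrence as a minimal numpy sketch, assuming a vanilla tanh cell and illustrative sizes:

import numpy as np

hidden, inputs = 8, 3
W_h = np.random.randn(hidden, hidden)    # theta: recurrent weights
W_x = np.random.randn(hidden, inputs)    # theta: input weights

def f(h_prev, x_t):
    # h(t) = f(h(t-1), x(t), theta): vanilla RNN cell
    return np.tanh(W_h @ h_prev + W_x @ x_t)

h = np.zeros(hidden)                     # h(0)
for x_t in np.random.randn(5, inputs):   # a length-5 sequence (illustrative)
    h = f(h, x_t)                        # h is a lossy summary of the past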
Unrolled over inputs $x_{t-1}, x_t, x_{t+1}$ with hidden states $h_{t-1}, h_t, h_{t+1}$, the network emits predictions $\hat{y}_{t-1}, \hat{y}_t, \hat{y}_{t+1}$:

$P(\hat{y}_t = \mathrm{buy} \mid x_1, x_2, \dots, x_{t-1}, x_t)$

$L = \sum_t L_t = -\sum_t \log P(y_t \mid x_1, x_2, \dots, x_t)$
RNN: Various Types

• Image Classification (e.g. MNIST), Sentiment Analysis, Image Captioning ("A Baby Eating a piece of paper")
• Character RNN: Deep Shakespeare, "Deep Math", Project Magenta
• NLP, Translation: Seq2Seq, Google Translation; the encoder builds a representation

source: The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy
Reinforcement Learning

Agent ↔ Environment loop: the agent observes a state, takes an action, receives a reward, and the environment moves to state'.

Agent's objective: find the best policy, $\max E\left[\sum_i \gamma^i R_i\right]$.
Q-learning

Applies when the action space is a discrete, finite set.

Action selection, epsilon-greedy:

$a = \begin{cases} \text{random}, & \text{prob } \epsilon \\ \arg\max_x Q(s, x), & \text{prob } 1 - \epsilon \end{cases}$

Given a transition (state $s$, action $a$, next state $s'$, reward $r$), with $a'$ the action at $s'$, update the new value from the old value with learning rate $\alpha$:

$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$

Value Iteration.
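A tabular sketch of epsilon-greedy selection and the update rule; the state/action counts and hyperparameters are invented for illustration:

import numpy as np

n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1       # learning rate, discount, exploration

def choose_action(s):
    # epsilon-greedy: random with prob eps, else argmax_a Q(s, a)
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])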
Deep Q-Learning

Instead of a table, use a neural network as the Q function: the state goes in, and the network outputs $Q(s, a_0), Q(s, a_1), Q(s, a_2)$ for actions $a_0$: sell, $a_1$: hold, $a_2$: buy.

Train the weights $w$ by gradient descent, $w := w - \eta \cdot \nabla_w \mathrm{Loss}(w)$, on the squared TD error:

$\mathrm{Loss}(w) = \left( r + \gamma \max_{a'} Q_w(s', a') - Q_w(s, a) \right)^2$
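The loss as a numpy sketch against a generic q_network(state) function (assumed here) that returns the three action values:

import numpy as np

def dqn_loss(q_network, s, a, r, s_next, gamma=0.99):
    # Loss(w) = (r + gamma * max_a' Q_w(s', a') - Q_w(s, a))^2
    td_target = r + gamma * np.max(q_network(s_next))
    return (td_target - q_network(s)[a]) ** 2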
Modeling Trading

Modeling approaches: Price-Based / Technical Analysis, Factor Models, Event/Text.
Use five days of candlestick (K-line) data to predict day 6:

$P(S_{D_6} \mid D_1^K, D_2^K, D_3^K, \dots, D_5^K)$

with outcomes in {1, 0, -1}. An LSTM or GRU (RNN) reads $D_1 \dots D_5$ and predicts $D_6$: a buy/sell signal.

label: [1, 0, 0] (buy/long) if day-6 close − day-5 close > threshold
label: [0, 0, 1] (sell/short) if day-6 close − day-5 close < −threshold
Input: $D_1 \dots D_5$ (each day's K-line features) → LSTM → fully connected → $\hat{y}_0, \hat{y}_1, \hat{y}_2$ with $\sum_i \hat{y}_i = 1$ (softmax), trained with cross entropy against the one-hot label $y$. (Keras: about 10 lines of code; see the sketch below.)
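A sketch of that model in Keras; layer sizes, the n_features count, and training arguments are illustrative choices, and X_train/y_train are assumed to be prepared as described above:

from keras.models import Sequential
from keras.layers import LSTM, Dense

n_features = 4                                    # e.g. open/high/low/close per day
model = Sequential()
model.add(LSTM(32, input_shape=(5, n_features)))  # reads D1..D5 in order
model.add(Dense(3, activation='softmax'))         # y_hat0, y_hat1, y_hat2; sum = 1
model.compile(loss='categorical_crossentropy', optimizer='adam')

# X_train: (n_samples, 5, n_features); y_train: one-hot, (n_samples, 3)
model.fit(X_train, y_train, epochs=20, batch_size=32)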
Technical Indexes as Features

Rows Day1 … Day5; columns: K, D, RSV, MA5, MA20, OBV, MACD.
Factor Model as Features

Rows M1 … M5 (months); columns: Rev G%, EPS, PE, ROE. Preprocess these features first.
Using CNN

Same setup with a CNN: input $D_1 \dots D_5$ (each day's K-line features, or self-defined features) → CNN sliding window → fully connected → $\hat{y}_0, \hat{y}_1, \hat{y}_2$ with $\sum_i \hat{y}_i = 1$, trained with cross entropy against the one-hot label $y$.

label: [1, 0, 0] (buy/long) if day-6 close − day-5 close > threshold
label: [0, 0, 1] (sell/short) if day-6 close − day-5 close < −threshold
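The CNN variant in Keras, with a Conv1D filter sliding over the five-day window; filter count and kernel size are illustrative:

from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense

n_features = 4                                   # e.g. open/high/low/close per day
model = Sequential()
model.add(Conv1D(16, kernel_size=3, activation='relu',
                 input_shape=(5, n_features)))   # sliding window over D1..D5
model.add(Flatten())
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')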
Consider the Candle Chart as an Image (Using CNN)

source: DEEP STOCK REPRESENTATION LEARNING: FROM CANDLESTICK CHARTS TO INVESTMENT DECISIONS

Pipeline: candlestick image → Convolution Auto Encoder (VGG) → 512-D representation → clustering (network modularity) → find the best stock in each cluster.
Using Reinforcement Learning

Neural network as the Q function: the state (built from $D_1 \dots D_5$ K-line data, optionally passed through an LSTM/CNN to get a representation) maps to $Q(s, a_0), Q(s, a_1), Q(s, a_2)$ for $a_0$: sell, $a_1$: hold, $a_2$: buy. This can also cover position sizing and asset allocation.

Reward? Fix a time $T$ as an episode; for all $t \le T$, the reward is the log return of day $t$:

$R_t = \log \frac{P_t}{P_{t-1}}$

where $P_t$ is the day-$t$ closing price (see the sketch below).
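Computing day-by-day log returns from a price series with pandas; the prices are invented for illustration:

import numpy as np
import pandas as pd

prices = pd.Series([100.0, 101.5, 100.8, 102.3])  # illustrative daily closes
log_returns = np.log(prices / prices.shift(1))    # R_t = log(P_t / P_{t-1})
print(log_returns)                                # first entry is NaN (no P_{t-1})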
Example 1: Day Trading

• Go long/short based on learned features
• No perfect answer:
• Which features?
• Which labels? Which loss?
Example 2: Using RL for Portfolio Optimization

Asset: $[S_1, S_2]$, where $S_1, S_2$ are stocks with predicted growth. Use RL to optimize the portfolio

$A = w_1 S_1 + w_2 S_2$, with $w_1 + w_2 = 1$

Consider the action $w_1 \in \{0.10, 0.25, 0.5, 0.75, 0.9\}$ and reward $R_t = \log \frac{A_t}{A_{t-1}}$.

How about n stocks?
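For the two-stock case, the portfolio value and its log-return reward as a sketch; the prices and chosen action are invented:

import numpy as np

w1_actions = [0.10, 0.25, 0.5, 0.75, 0.9]        # discrete choices; w2 = 1 - w1

def portfolio_value(w1, s1_price, s2_price):
    return w1 * s1_price + (1 - w1) * s2_price   # A = w1*S1 + w2*S2

w1 = w1_actions[2]                               # agent picks an action
A_prev = portfolio_value(w1, 100.0, 50.0)        # illustrative prices at t-1
A_t    = portfolio_value(w1, 102.0, 49.5)        # illustrative prices at t
reward = np.log(A_t / A_prev)                    # R_t = log(A_t / A_{t-1})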
Example 3: Inspired by Project Magenta

source: Tuning Recurrent Neural Networks with Reinforcement Learning

Idea: notes are the input of a character RNN; RL then provides a reward (e.g. music-theory rules) to tune it. Could RL similarly tune a trading RNN, with rewards such as minimizing drawdown, or …?
Path
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
~ Robert Frost
A Busy Generation
Resources: Overwhelming

• MOOCs: CS231n, CS224n, …; Arxiv, YouTube, Udacity, Coursera
• Open-source packages on GitHub
Advice for Learning ML

• Start with tools you already know (even Excel)
• Learn to be focused: stay on task for at least 45 minutes (e.g. with the Flora app)
• Work in regular 30-minute blocks
Correct Path?
Thank you!
• ML/DL: Coursera (Andrew Ng), Udacity
• CNN: CS231n
• NLP: CS224n
• RL: CS294, CS234 (no video), David Silver's course
• book: Hands-On Machine Learning with Scikit-Learn and TensorFlow
• book: Deep Learning (Goodfellow, Bengio, Courville)
• Machine Learning for Trading