MACHINE LEARNING, FINANCIAL ENGINEERING, QUANTITATIVE INVESTING
MAR 2018
STEVEN WANG
Quick Take
Machine Learning
Financial Engineering
Quantitative Investing
Takeaway
P-Measure: Machine Learning (ML), Quantitative Investing (QI)
Q-Measure: Financial Engineering (FE)
Machine Learning
What is Machine Learning?
What is the Learning Problem?
Is Learning Feasible?
How to Learn Well?
1.1 What is Machine Learning? - Overview
1.1 What is Machine Learning? - Type
(Diagrams: an agent-environment loop with state, action and reward, illustrating reinforcement learning; a neural network with an input layer, two hidden layers and an output layer, illustrating deep learning.)
Supervised Learning, Unsupervised Learning, Semi-supervised Learning, Reinforcement Learning, Deep Learning
1.2 What is Learning Problem?
Unknown Target Function c: X → Y (the ideal loan approval formula)
Training Examples (x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ⁿ⁾, y⁽ⁿ⁾) (historical records of applicants)
Hypothesis Set H = {h₁, h₂, …, h_M} (a set of candidate formulas)
Learning Algorithm A
Final Hypothesis g ≈ c (the learned loan approval formula)
1.3 Is Learning Feasible? - No Free Lunch Theorem
Feature x | Label y | Model A | Model B | Model C
Training data: [0, 0] | 0 | 0 | 0 | 0
Training data: [1, 1] | 1 | 1 | 1 | 1
Test data: [1, 0] | ? | 1 | 0 | 1
Test data: [0, 1] | ? | 0 | 0 | 1
Model A = random guess
Model B = support vector machine
Model C = deep neural network
Is Model C > Model B > Model A?
c(x) = x1 ⋁ x2 : Model C wins
o The 3rd data x = [1, 0], so y = 1 ⋁ 0 = 1
o The 4th data x = [0, 1], so y = 0 ⋁ 1 = 1
c(x) = x1 ⋀ x2 : Model B wins
o The 3rd data x = [1, 0], so y = 1 ⋀ 0 = 0
o The 4th data x = [0, 1], so y = 0 ⋀ 1 = 0
c(x) = x1: Model A wins
o The 3rd data x = [1, 0], so y = 1
o The 4th data x = [0, 1], so y = 0
Model A is as good as Model C! Is there anything left to learn?
Averaged over all targets consistent with the training data, every model has the same expected performance.
Because c is unknown, performance on the training data is not indicative of performance on the test data, yet performance on test data is all that matters in learning.
Can we really learn anything? Learning seems doomed, but …
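To make the averaging argument concrete, here is a minimal Python sketch; the model predictions come from the table above, and the enumeration of consistent targets is the only added machinery:

import itertools

# Training data fixes c on [0,0] and [1,1], so c is free on the two
# test points, giving 2^2 = 4 targets consistent with the training data.
test_points = [(1, 0), (0, 1)]
models = {  # each model's predictions on the two test points (from the table)
    "A (random guess)": {(1, 0): 1, (0, 1): 0},
    "B (SVM)":          {(1, 0): 0, (0, 1): 0},
    "C (deep net)":     {(1, 0): 1, (0, 1): 1},
}
for name, pred in models.items():
    errs = [sum(pred[x] != c_label for x, c_label in zip(test_points, labels))
            for labels in itertools.product([0, 1], repeat=2)]
    print(name, "average test errors over all targets:", sum(errs) / len(errs))
# Every model averages 1.0 error out of 2: averaged over all consistent
# targets, no model beats any other.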
1.3 Is Learning Feasible? - No Free Lunch Proof
It is meaningless to discuss the superiority of an algorithm without reference to a specific problem.

Define:
A = algorithm
x_in = in-sample data
x_out = out-of-sample data (N points)
c = unknown target function
h = hypothesis function

Averaged over all possible targets c, the expected out-of-sample error under algorithm A is the same:

\[ E[A \mid x_{in}, c] = \sum_c \sum_h \sum_{x_{out}} \underbrace{P(x_{out})}_{\text{PDF of } x_{out}} \cdot \underbrace{I[h(x_{out}) \neq c(x_{out})]}_{\text{error of } h \text{ on } x_{out}} \cdot \underbrace{P(h \mid x_{in}, A)}_{\text{PDF of } h \text{ given } A \text{ and } x_{in}} \]
\[ = \sum_{x_{out}} P(x_{out}) \sum_h P(h \mid x_{in}, A) \sum_c I[h(x_{out}) \neq c(x_{out})] = \sum_{x_{out}} P(x_{out}) \sum_h P(h \mid x_{in}, A) \cdot \tfrac{1}{2}\, 2^N \]
\[ = 2^{N-1} \sum_{x_{out}} P(x_{out}) \sum_h P(h \mid x_{in}, A) = 2^{N-1} \sum_{x_{out}} P(x_{out}) \]

The error is independent of the algorithm A:
E[random guess | x_in, c] = E[state-of-the-art | x_in, c]
1.3 Is Learning Feasible? - Add Probability Distribution
Same flowchart as in 1.2, with one addition: the inputs x⁽¹⁾, x⁽²⁾, …, x⁽ⁿ⁾ of the training examples are now drawn from an Input Distribution P(X).
Unknown Target Function c: X → Y
Training Examples (x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ⁿ⁾, y⁽ⁿ⁾)
Hypothesis Set H = {h₁, h₂, …, h_M}
Learning Algorithm A
Final Hypothesis g ≈ c
1.3 Is Learning Feasible? - Logic Chain of Proof
Goal: prove that learning is feasible, i.e. that the learned g is close to the target function c.

The target function c is UNKNOWN, so the true error Etrue(g) is IMPOSSIBLE to compute directly. Instead:
1. Show Etrain(g) ≈ Etrue(g) — "God's gift" (generalization).
2. Show Etrain(g) is small — your capability (optimization).
3. Given 1 and 2, Etrue(g) is small, hence g ≈ c, hence learning is feasible.
1.3 Is Learning Feasible? - From Unknown to Known
Can we infer u from v?
Not with certainty: everyone in the sample might support Clinton while Trump eventually wins!
That outcome is POSSIBLE but not PROBABLE. When the sample size is big enough, "v ≈ u" is probably approximately correct (PAC).

Hoeffding's inequality: P(|v − u| > ε) ≤ 2e^{−2ε²n}

• u does not appear on the right-hand side of the formula.
• It is a link from the unknown u to the known v.

u is deterministic but unknown; v is stochastic but known. (Diagram: a population with true proportion u, and a 5-ball sample with v = 2/5.)
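A minimal Python sketch, with illustrative values of u, n and ε, checking the bound empirically:

import numpy as np

rng = np.random.default_rng(0)
u, n, eps, trials = 0.6, 1000, 0.05, 100_000

v = rng.binomial(n, u, size=trials) / n        # sample proportions
empirical = np.mean(np.abs(v - u) > eps)       # estimate of P(|v - u| > eps)
bound = 2 * np.exp(-2 * eps**2 * n)
print(f"empirical tail = {empirical:.4f}, Hoeffding bound = {bound:.4f}")
# The empirical tail stays below the bound, and both shrink as n grows.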
1.3 Is Learning Feasible? - From Polling to Learning
Aspect | Polling | Learning
Label | Support Trump / Support Clinton | Correct / incorrect classification
Aim | Get the vote percentage for Trump | Learn the target function c(x) = y
Data | US citizens | Examples
Data distribution | Every citizen is i.i.d. | Every example is i.i.d.
In-sample | Sample | Training set
In-sample statistic | v = vote percentage for Trump in the sample | Training error Etrain(h) = (1/n) Σᵢ I{h(x⁽ⁱ⁾) ≠ c(x⁽ⁱ⁾)}
Out-of-sample statistic | u = vote percentage for Trump in the population | True error Etrue(h) = P(h(x) ≠ c(x))
Bound | P(|v − u| > ε) ≤ 2e^{−2ε²n} | P(|Etrain(h) − Etrue(h)| > ε) ≤ 2e^{−2ε²n}

By analogy with polling, the learning bound simplifies to P(bad h) ≤ 2e^{−2ε²n}.

Are we done? No! This is verification, not learning.
1.3 Is Learning Feasible? - From One to Many
Flowchart (verification, not learning): a FIXED hypothesis function h is chosen first; then training examples (x⁽¹⁾, y⁽¹⁾), …, (x⁽ⁿ⁾, y⁽ⁿ⁾) with inputs drawn from P(X) are used to verify h ≈ c or h ≉ c against the unknown target c: X → Y.

The entire flowchart assumed a FIXED h, and only then came the data. For real learning we must choose g from a hypothesis set {h₁, h₂, …, h_M} instead of fixing a single h. By the union bound:

\[ P(|E_{train}(g) - E_{true}(g)| > \varepsilon) = P(\text{bad } g) \le P(\text{bad } h_1 \text{ or } \cdots \text{ or bad } h_M) \le \sum_{i=1}^{M} P(\text{bad } h_i) \le 2Me^{-2\varepsilon^2 n} \]

From h to g: P(bad h) ≤ 2e^{−2ε²n} becomes P(bad g) ≤ 2Me^{−2ε²n}.

Are we done? No! M can be huge, even infinite.
1.3 Is Learning Feasible? - From Finite to Infinite
When M → ∞:
P(bad g) ≤ 2Me^{−2ε²n} = 2M / e^{2ε²n} = a very large number

Congratulations! Even a primary-school student knows P(bad g) ≤ 1, so the bound is vacuous. What went wrong?
1.3 Is Learning Feasible? - From Infinite to Finite
Three hypotheses h₁, h₂, h₃ that classify every training point identically are effectively equivalent, so the union bound badly over-counts. Counting effective hypotheses instead of all hypotheses leads to the notions of dichotomy, growth function, shattering, break point, and VC dimension d_vc, and replaces M with a polynomial factor:

\[ P(|E_{train}(g) - E_{true}(g)| > \varepsilon) \le 4\left((2n)^{d_{vc}} + 1\right) e^{-\frac{1}{8}\varepsilon^2 n} \]
1.3 Is Learning Feasible? - Learning is Feasible
\[ P(|E_{train}(g) - E_{true}(g)| > \varepsilon) \le 4\left((2n)^{d_{vc}} + 1\right) e^{-\frac{1}{8}\varepsilon^2 n} \]

We reach this conclusion without knowing
• the algorithm A
• the input distribution P(X)
• the target function c

We just need
• training examples D
• a hypothesis set H
to find a final hypothesis g that learns c.

Learning is feasible when the VC dimension is finite.
(Flowchart as in 1.2, augmented with the input distribution P(X): unknown target function c: X → Y; training examples (x⁽¹⁾, y⁽¹⁾), …, (x⁽ⁿ⁾, y⁽ⁿ⁾) with inputs drawn from P(X); learning algorithm A; hypothesis set H = {h₁, …, h_M}; final hypothesis g ≈ c.)
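A minimal Python sketch evaluating the bound for an assumed d_vc = 3 (e.g. linear classifiers in the plane) and ε = 0.1, showing how it decays with n:

import numpy as np

# VC generalization bound: 4*((2n)^dvc + 1)*exp(-eps^2 * n / 8)
def vc_bound(n, dvc, eps):
    return 4 * ((2 * n) ** dvc + 1) * np.exp(-(eps ** 2) * n / 8)

for n in [1_000, 10_000, 100_000, 1_000_000]:
    print(n, vc_bound(n, dvc=3, eps=0.1))
# The bound is vacuous (> 1) for small n but decays to 0 as n grows,
# which is the sense in which learning is feasible when dvc is finite.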
1.4 How to Learn Well? - Over-learn vs Under-learn
Exercise: both are leaves. Exam: which one is a leaf?
When you learn too much: "This is not a leaf, since leaves must be serrated."
When you learn too little: "This is a leaf, since leaves are green."
1.4 How to Learn Well? - Overfit vs Underfit
Financial Engineering
Overview
Data, Parameter
Curve Construction
Model Calibration
Instrument Valuation
Risk Measurement
2.1 Overview
(Flowchart: market data Θmkt(t) feeds Curve Construction via bootstrapping, producing curves P(t, ·); curves feed Model Calibration, producing model parameters Θmdl(t); these, together with estimated parameters Θprm(t) and numerical parameters Θnum(t), feed Instrument Valuation, whose evaluation yields the value V(t). Perturbation and extraction produce the sensitivities ∂V/∂Θmkt and ∂V/∂Θmdl, which feed Risk Measurement: P&L and VaR. Legend: data, parameter, variable, computation.)
2.2 Data, Parameter
Θmkt(t) — daily observable market data:
• Deposit rates, futures rates and swap rates (yield curve construction)
• Cap and swaption implied volatilities (IR volatility calibration)
• FX swap points and volatilities (FX volatility calibration)
• CDS spread curve (hazard rate calibration)

Θprm(t) — indirectly observed, estimated from historical data, or treated as "exotic" constants:
• Libor fixing (historical data point)
• Correlation between CMS and FX (historical time series)
• Short-rate mean-reversion speed (κ = 0.01)

Θnum(t) — parameters that control the numerical schemes:
• Number of Monte Carlo paths (N = 50,000)
• Number of node points in the finite difference grid (N = 1,000)
• Error tolerance in optimization (ε = 10⁻⁵)
2.3 Curve Construction
(Diagram: multi-curve bootstrapping dependency graph. The USD discount curve is built from USD OIS; the USD benchmark and index curves from deposits, ED futures, FRAs, swaps and USD IR basis swaps; the foreign-currency (CUR) FX discount curve from FX swap points and cross-currency (CRX) basis swaps against USD; the CUR discount curve from CUR OIS; and the CUR benchmark and index curves from CUR deposits, ED futures, FRAs, swaps and CUR IR basis swaps. The numbers 1-4 on the arrows give the bootstrapping order.)
2.4 Model Calibration
EQ: N225 option implied volatilities
Expiry|Strike 15705.69 16578.23 17450.77 18323.31 19195.85
1M 28.63 26.00 24.34 23.16 23.16
3M 27.06 25.60 24.70 23.94 23.49
6M 26.28 25.47 24.92 24.34 23.97
12M 26.07 25.66 25.33 24.96 24.74
24M 26.54 26.40 26.16 25.94 25.72
60M 29.00 28.87 28.73 28.66 28.60
IR: USD ATM swaption implied volatilities
Maturity|Expiry 1M 3M 6M 1Y 2Y 3Y 4Y 5Y 7Y 10Y 15Y 20Y 25Y 30Y
1Y 59.80 56.15 56.27 65.12 66.75 55.32 44.80 36.16 28.18 22.39 19.98 18.09 17.52 17.17
2Y 53.00 46.22 50.38 59.33 56.22 46.80 39.23 33.31 27.06 22.04 20.26 18.72 18.19 18.05
3Y 53.00 43.60 47.48 57.00 48.87 41.21 35.61 31.16 26.06 21.80 20.19 18.63 18.15 18.14
4Y 52.70 50.04 48.35 50.06 43.32 37.41 33.03 29.58 25.17 21.60 20.07 18.46 18.00 18.12
5Y 50.80 48.45 48.02 46.04 40.06 34.93 31.15 28.33 24.50 21.47 19.84 18.04 17.65 17.94
7Y 41.50 43.49 41.98 39.47 35.22 31.46 28.38 26.08 23.38 21.18 19.73 18.11 18.25 19.32
10Y 10.00 33.70 32.49 32.59 32.36 30.12 27.94 26.01 24.66 22.56 20.74 19.55 18.23 19.28
15Y 30.60 26.74 27.17 27.46 26.07 24.78 23.60 22.70 21.20 19.45 17.94 17.32 19.25 21.27
20Y 25.50 25.24 25.69 25.90 24.73 23.70 22.70 21.89 20.60 19.09 18.09 17.63 19.57 22.18
25Y 24.80 24.65 24.68 24.77 23.78 22.95 22.11 21.39 20.27 19.02 18.24 17.51 19.88 22.92
30Y 24.60 24.52 24.11 24.04 23.11 22.40 21.65 21.01 20.05 19.04 18.22 17.53 20.27 23.39
CM: coffee option implied volatilities
Expiry|Strike 272.57 287.71 295.28 302.85 310.42 317.99 333.14
1M 32.33 31.18 30.63 30.60 31.40 32.27 33.32
2M 32.13 32.18 32.38 32.71 33.11 33.47 33.92
3M 35.17 35.67 36.10 36.52 36.93 37.35 37.81
6M 34.63 35.10 35.55 36.00 36.48 36.93 37.41
1Y 31.87 32.07 32.24 32.45 32.69 33.00 33.29
18M 29.31 29.68 29.95 30.29 30.66 31.10 31.60
2Y 28.75 29.07 29.31 29.66 30.03 30.49 31.09
FX: EURUSD option volatility quotes (ATM, risk reversals, butterflies)
Expiry|Convention ATM 25RR 10RR 25BF 10BF
O/N 6.44 -0.56 -1.01 0.14 0.48
1W 8.55 -0.65 -1.17 0.15 0.50
2W 8.65 -0.75 -1.35 0.14 0.47
1M 8.78 -1.00 -1.79 0.11 0.40
2M 8.70 -1.10 -1.98 0.17 0.59
3M 8.75 -1.25 -2.25 0.18 0.62
6M 9.00 -1.50 -2.74 0.28 0.98
9M 9.19 -1.60 -2.91 0.30 1.03
1Y 9.30 -1.65 -3.00 0.29 0.99
2Y 9.78 -1.70 -3.18 0.32 1.15
Calibration: each set of market quotes above calibrates a model's parameters —
SABR (α, β, ρ, ν); Schwartz (κ, σ, θ); Hull-White (κ, σ); Heston (κ, v₀, η, ρ, θ).
2.5 Instrument Valuation - Fundamentals
No Arbitrage → Numeraire → Change of Measure → Pricing Formula:

\[ V(0) = N(0)\, E^N\!\left[\frac{V(T)}{N(T)}\right] \]

Numeraire | Probability Measure
Bank Account | Risk-neutral Measure
Zero-Coupon Bond | Forward Measure
Annuity | Swap Measure

No arbitrage: given two assets A and B with payoffs f and g at T, if f = g then A = B.

Change of measure: dP/dQ = (Q(0)/P(0)) · (P(T)/Q(T))
2.5 Instrument Valuation - Fundamentals (No-Arbitrage Principle)
Given two assets A and B with payoffs f and g at T, by the no-arbitrage principle, if f = g then A = B:
• If A > B: at t = 0 buy B and sell A with profit A − B > 0; at T sell B and buy A with profit g − f = 0.
• If A < B: at t = 0 sell B and buy A with profit B − A > 0; at T buy B and sell A with profit f − g = 0.

Use the no-arbitrage principle to price any financial instrument B at time 0 in a one-step binomial model: Construct → Express → Equal → Link → Solve.

Construct a portfolio A of x shares of stock and a short position in y bonds (bond price C, simple rate r): A₀ = xS₀ − yC.

Express the time-T values:
A_T = x·uS₀ − y(1 + rT)C if the stock goes up to uS₀; x·dS₀ − y(1 + rT)C if it goes down to dS₀.
B_T = h(uS₀) if up; h(dS₀) if down.

Equal: set A_T = B_T in both states:
x·uS₀ − y(1 + rT)C = h(uS₀)
x·dS₀ − y(1 + rT)C = h(dS₀)

Solve:
\[ x = \frac{h(uS_0) - h(dS_0)}{(u - d)S_0}, \qquad y = \frac{1}{(1 + rT)C} \cdot \frac{d \cdot h(uS_0) - u \cdot h(dS_0)}{u - d} \]

Link: by no arbitrage,
\[ B_0 = A_0 = \underbrace{\frac{1}{1 + rT}}_{\mathbf{discount}} \cdot \underbrace{\left[p_u \cdot h(uS_0) + p_d \cdot h(dS_0)\right]}_{\mathbf{expected\ payoff}} \]

Present Value = Discount Factor × Expected Payoff
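A minimal Python sketch of this one-step replication with illustrative numbers (the bond price C is taken as 1, so y is a cash amount):

# Illustrative assumptions: up/down factors, simple rate, call payoff
S0, u, d = 100.0, 1.2, 0.8
r, T, K = 0.05, 1.0, 100.0
h = lambda S: max(S - K, 0.0)          # payoff of the call we want to price

# Solve x (shares) and y (cash) so the portfolio matches the payoff at T
x = (h(u * S0) - h(d * S0)) / ((u - d) * S0)
y = (d * h(u * S0) - u * h(d * S0)) / ((u - d) * (1 + r * T))
A0 = x * S0 - y                         # cost of the replicating portfolio

# Equivalent risk-neutral form: discount factor times expected payoff
pu = (1 + r * T - d) / (u - d)
B0 = (pu * h(u * S0) + (1 - pu) * h(d * S0)) / (1 + r * T)
print(A0, B0)                           # the two prices agree by no-arbitrage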
2.5 Instrument Valuation - Fundamentals (Numeraire and Probability Measure)
Numeraire: a numeraire is a unit of account; it can be money, a tradeable asset, or even apples.

Probability measure: a probability measure assigns probabilities to events.
Fair coin: P(H) = P(T) = 0.5. Biased coin: Q(H) = 0.8, Q(T) = 0.2.

Write time-0 prices with state prices φ_k over K states:
\[ A(0) = \sum_{k=1}^{K} \varphi_k A_k(T), \qquad B(0) = \sum_{k=1}^{K} \varphi_k B_k(T) \]

Then
\[ \frac{A(0)}{B(0)} = \frac{\sum_{k=1}^{K} \varphi_k A_k(T)}{\sum_{k=1}^{K} \varphi_k B_k(T)} = \sum_{k=1}^{K} \frac{\varphi_k B_k(T)}{\sum_{j=1}^{K} \varphi_j B_j(T)} \cdot \frac{A_k(T)}{B_k(T)} = \sum_{k=1}^{K} \pi_k \cdot \frac{A_k(T)}{B_k(T)} = E^B\!\left[\frac{A(T)}{B(T)}\right] \]

so
\[ A(0) = B(0)\, E^B\!\left[\frac{A(T)}{B(T)}\right] \]
where B is the numeraire and E^B is the expectation under the probability measure induced by B (the weights π_k are non-negative and sum to 1).

Numeraire | Probability Measure | Instrument
Bank Account | Risk-neutral Measure | FX, Equity, Commodity Option
Zero-Coupon Bond | Forward Measure | Cap, Floor
Annuity | Swap Measure | Swaption
2.5 Instrument Valuation - Fundamentals (Change of Probability Measure)
Question 1: what is the relationship between two probability measures?

Fair coin: p₁ = p₂ = 0.5. Biased coin: q₁ = 0.8, q₂ = 0.2. Then
\[ E^P[X] = p_1 x_1 + p_2 x_2 = q_1 \frac{p_1}{q_1} x_1 + q_2 \frac{p_2}{q_2} x_2 = E^Q[Z \cdot X] \]
Z is the Radon-Nikodym derivative, denoted Z = dP/dQ.

Changing the numeraire corresponds to changing the probability measure, which raises three questions:
1. What is the relationship between two probability measures?
2. What is the relationship between two numeraires?
3. Why change measure?
2.5 Instrument Valuation - Fundamentals (Change of Probability Measure)
Question 2: what is the relationship between two numeraires? Question 3: why change measure?

Start from the two key identities:
E1: A(0)/B(0) = E^B[A(T)/B(T)] for any numeraire B
E2: E^P[X] = E^Q[(dP/dQ) · X]

For two numeraires P and Q (with induced measures E^P and E^Q), apply E1 with numeraire Q, then with numeraire P, then E2:
\[ E^Q\!\left[\frac{A(T)}{Q(T)}\right] \cdot \frac{Q(0)}{P(0)} \overset{\mathbf{E1},\, Q}{=} \frac{A(0)}{Q(0)} \cdot \frac{Q(0)}{P(0)} = \frac{A(0)}{P(0)} \overset{\mathbf{E1},\, P}{=} E^P\!\left[\frac{A(T)}{P(T)}\right] \overset{\mathbf{E2}}{=} E^Q\!\left[\frac{dP}{dQ} \cdot \frac{A(T)}{P(T)}\right] \]

Matching the two Q-expectations for every asset A gives the Radon-Nikodym derivative between the two numeraire-induced measures:
\[ \frac{dP}{dQ} = \frac{Q(0)}{P(0)} \cdot \frac{P(T)}{Q(T)} \]

Why change measure? Because the right numeraire simplifies the pricing formula:

| Risk-Neutral Measure E^Q | Forward Measure E^T
Numeraire | Bank account β(t) | Zero-coupon bond P(t, T)
Property | β(0) = 1 | P(T, T) = 1
Martingale formula | V(0)/β(0) = E^Q[V(T)/β(T)] | V(0)/P(0, T) = E^T[V(T)/P(T, T)]
Simplified formula | V(0) = E^Q[V(T)/β(T)] | V(0) = P(0, T) · E^T[V(T)]
2.5 Instrument Valuation - Pricing Methods
\[ V(t) = N(t)\, E_t^N\!\left[\frac{V(T)}{N(T)}\right] \]

Closed-Form / Numerical Integration:
1. Find the PDF of V(T)/N(T) under measure N.
2. Represent the expectation as an integral.
3. Simplify to closed form if possible; otherwise leave it as a numerical integration.

PDE Finite Difference Method:
1. Change measure N to the risk-neutral measure.
2. Use the Feynman-Kac theorem to derive the PDE of V.
3. Fix the solution domain, construct a grid, set terminal and boundary conditions, discretize derivatives in the spatial and time dimensions, and adopt a finite difference scheme.

Monte Carlo Method:
1. By the Law of Large Numbers, approximate E[V/N] by the average of Vᵢ/Nᵢ over simulated paths.
2. Adopt variance reduction techniques to enhance Monte Carlo efficiency.
2.5 Instrument Valuation - Closed-Form, Numerical Integration
Black-Scholes model:
\[ \frac{dS(t)}{S(t)} = (r - q)\,dt + \sigma\,dB(t) \]

Closed form (ω = +1 for a call, −1 for a put):
\[ V = \omega\left[e^{-qT} S_0 \Phi(\omega d_+) - e^{-rT} K \Phi(\omega d_-)\right], \qquad d_\pm = \frac{1}{\sigma\sqrt{T}} \ln\frac{S_0 e^{(r-q)T}}{K} \pm \frac{\sigma\sqrt{T}}{2} \]

Heston model:
\[ \frac{dS(t)}{S(t)} = (r - q)\,dt + \sqrt{v(t)}\,dB_1(t), \qquad dv(t) = \kappa(\theta - v(t))\,dt + \eta\sqrt{v(t)}\,dB_2(t) \]

Numerical integration:
\[ V = \omega\left[e^{-qT} S_0 P_1(\omega) - e^{-rT} K P_2(\omega)\right], \qquad P_j(\omega) = \frac{1 - \omega}{2} + \omega P_j(S_0, v_0, T, K) \]
\[ P_j(x, v, T, y) = \frac{1}{2} + \frac{1}{\pi} \int_0^\infty \mathrm{Re}\!\left[\frac{e^{C_j(T,\phi) - D_j(T,\phi)v + i\phi \ln(x/y)}}{i\phi}\right] d\phi \]
\[ D_j(T,\phi) = \frac{(b_j - \rho\eta\phi i + d_j) - (b_j - \rho\eta\phi i - d_j)\, g_j e^{d_j T}}{\eta^2 \left(1 - g_j e^{d_j T}\right)} \]
\[ C_j(T,\phi) = (r - q)T\phi i + \frac{\kappa\theta}{\eta^2}\left[(b_j - \rho\eta\phi i + d_j)T - 2\ln\frac{1 - g_j e^{d_j T}}{1 - g_j}\right] \]
\[ d_j = \sqrt{(b_j - \rho\eta\phi i)^2 - \eta^2(2u_j\phi i - \phi^2)}, \qquad g_j = \frac{b_j - \rho\eta\phi i + d_j}{b_j - \rho\eta\phi i - d_j} \]
with b₁ = κ − ρη, b₂ = κ, u₁ = 0.5, u₂ = −0.5.

Techniques used:
• Itô's formula
• Girsanov's theorem
• Moment matching
• Drift interpolation
• Parameter averaging
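A minimal Python sketch of the Black-Scholes closed form above (scipy's normal CDF plays the role of Φ; all parameter values are illustrative):

import numpy as np
from scipy.stats import norm

def black_scholes(S0, K, T, r, q, sigma, omega=1):
    # omega = +1 for a call, -1 for a put
    F = S0 * np.exp((r - q) * T)                       # forward price
    d_plus = np.log(F / K) / (sigma * np.sqrt(T)) + sigma * np.sqrt(T) / 2
    d_minus = d_plus - sigma * np.sqrt(T)
    return omega * (np.exp(-q * T) * S0 * norm.cdf(omega * d_plus)
                    - np.exp(-r * T) * K * norm.cdf(omega * d_minus))

print(black_scholes(S0=100, K=100, T=1.0, r=0.02, q=0.0, sigma=0.2))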
2.5 Instrument Valuation - PDE Finite Difference Method (SDE to PDE)
Given the SDE of x(t) and the payoff function of a derivative V at maturity T:
\[ dx(t) = \mu(t, x)\,dt + \sigma(t, x)\,dB(t), \qquad V(x(T), T) = h(x(T)) \]

By the Feynman-Kac theorem, V(x, t) satisfies the PDE
\[ \frac{\partial V}{\partial t} + \mu(t, x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2(t, x)\frac{\partial^2 V}{\partial x^2} - rV = 0 \]

and by the no-arbitrage principle
\[ V(x, t) = e^{-r(T - t)}\, E_t[h(x(T))] \]
2.5 Instrument Valuation - PDE Finite Difference Method (Grid Construction)
(Diagram: a rectangular grid in (t, x) with the terminal condition at t_n = T, boundary conditions along x₀ and x_{m+1}, and interior points in between.)

\[ x \in \{x_j\}_{j=0}^{m+1}, \quad x_j = x_{min} + j\Delta x, \quad \Delta x = \frac{x_{max} - x_{min}}{m + 1} \]
\[ t \in \{t_i\}_{i=0}^{n}, \quad t_i = i\Delta t, \quad \Delta t = \frac{T}{n} \]
2.5 Instrument Valuation - PDE Finite Difference Method (Discretization and Scheme)
(Diagram: finite difference stencils for the three schemes — fully explicit (θ = 0), fully implicit (θ = 1), Crank-Nicolson (θ = ½).)

Order | Spatial Dimension | Time Dimension
1st | ∂V_j(t)/∂x ≈ [V_{j+1}(t) − V_{j−1}(t)] / (2Δx) | ∂V/∂t ≈ [V(t_{i+1}) − V(t_i)] / Δt
2nd | ∂²V_j(t)/∂x² ≈ [V_{j+1}(t) − 2V_j(t) + V_{j−1}(t)] / Δx² |

Use central differences for ∂V_j/∂x and ∂²V_j/∂x² at x_j, and discretize ∂V_j/∂t at the weighted time point t^θ_{i,i+1} = θt_i + (1 − θ)t_{i+1}.
2.5 Instrument Valuation - PDE Finite Difference Method (Representation)
The difference equation at (t^θ_{i,i+1}, x_j) is
\[ \frac{V_j(t_{i+1}) - V_j(t_i)}{\Delta t} = -\mu(t^{\theta}_{i,i+1}, x_j)\,\frac{V_{j+1}(t^{\theta}_{i,i+1}) - V_{j-1}(t^{\theta}_{i,i+1})}{2\Delta x} - \frac{\sigma^2(t^{\theta}_{i,i+1}, x_j)}{2}\,\frac{V_{j+1}(t^{\theta}_{i,i+1}) - 2V_j(t^{\theta}_{i,i+1}) + V_{j-1}(t^{\theta}_{i,i+1})}{\Delta x^2} + r(t^{\theta}_{i,i+1}, x_j)\,V_j(t^{\theta}_{i,i+1}) \]

Writing the algebraic form in matrix form:
\[ \left[\mathbf{I} - \theta\Delta t\,\mathbf{A}(t^{\theta}_{i,i+1})\right]\mathbf{V}(t_i) = \left[\mathbf{I} + (1 - \theta)\Delta t\,\mathbf{A}(t^{\theta}_{i,i+1})\right]\mathbf{V}(t_{i+1}) + \theta\boldsymbol{\Omega}(t_i) + (1 - \theta)\boldsymbol{\Omega}(t_{i+1}) \]
where I is the identity matrix, A is a tri-diagonal matrix, and Ω is the boundary value vector.
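A minimal Python sketch of this matrix recursion for the Black-Scholes PDE (μ = rx, σ² = vol²x², constant r) with Dirichlet boundaries, pricing a European call; all parameters are illustrative and a dense solve stands in for a tri-diagonal solver. θ = 0.5 gives Crank-Nicolson:

import numpy as np

r, vol, K, T = 0.02, 0.2, 100.0, 1.0
xmax, m, n, theta = 300.0, 299, 100, 0.5
dx, dt = xmax / (m + 1), T / n

x = np.linspace(0.0, xmax, m + 2)                 # nodes x_0 .. x_{m+1}
V = np.maximum(x - K, 0.0)                        # terminal condition at t_n = T

j = np.arange(1, m + 1)                           # interior nodes
a = 0.5 * vol**2 * x[j]**2 / dx**2                # diffusion coefficient
b = r * x[j] / (2 * dx)                           # convection coefficient
lower, diag, upper = a - b, -2 * a - r, a + b     # tri-diagonal rows of A

A = np.zeros((m, m))
A[np.arange(m), np.arange(m)] = diag
A[np.arange(1, m), np.arange(m - 1)] = lower[1:]
A[np.arange(m - 1), np.arange(1, m)] = upper[:-1]
I = np.eye(m)

for i in range(n - 1, -1, -1):                    # march backwards in time
    bnd_i = xmax - K * np.exp(-r * (T - i * dt))          # upper boundary at t_i
    bnd_ip1 = xmax - K * np.exp(-r * (T - (i + 1) * dt))  # and at t_{i+1}
    rhs = (I + (1 - theta) * dt * A) @ V[1:-1]
    rhs[-1] += dt * upper[-1] * (theta * bnd_i + (1 - theta) * bnd_ip1)  # Omega terms
    V[1:-1] = np.linalg.solve(I - theta * dt * A, rhs)
    V[0], V[-1] = 0.0, bnd_i                      # boundary values

print(np.interp(100.0, x, V))                     # ~ Black-Scholes price at S = 100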
2.5 Instrument Valuation - Monte Carlo Method (Fundamentals)
Consider a derivative V with time-T payout V(T) = g(T). By the no-arbitrage principle,
\[ V(t) = N(t)\, E_t^N\!\left[\frac{g(T)}{N(T)}\right] \]

Law of Large Numbers: let Y₁, Y₂, …, Y_n be a sequence of independent identically distributed (i.i.d.) random variables with finite expectation μ. The sample mean converges to the population mean:
\[ \bar{Y}(n) = \frac{1}{n}\sum_{i=1}^{n} Y_i \;\xrightarrow{\;n \to \infty\;}\; \mu \]

so
\[ V(t) \approx \bar{V}(t) = N(t)\,\frac{1}{n}\sum_{i=1}^{n}\frac{g_i}{N_i} \]

Central Limit Theorem: for i.i.d. Y₁, …, Y_n with finite expectation μ and standard deviation σ, as n → ∞,
\[ \frac{\bar{Y}(n) - \mu}{s(n)/\sqrt{n}} \sim N(0, 1), \qquad s^2(n) = \frac{1}{n - 1}\sum_{i=1}^{n}\left[Y_i - \bar{Y}(n)\right]^2 \]

which gives the confidence interval
\[ V(t) \in \left[\bar{V}(t) - z_{\alpha/2}\,\frac{s(n)}{\sqrt{n}},\;\; \bar{V}(t) + z_{\alpha/2}\,\frac{s(n)}{\sqrt{n}}\right] \]

The standard error s(n)/√n falls when the variance falls or the sample size grows.
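A minimal Python sketch pricing a European call under GBM, with the confidence interval above; parameters are illustrative:

import numpy as np

rng = np.random.default_rng(1)
S0, K, T, r, q, sigma, n = 100.0, 100.0, 1.0, 0.02, 0.0, 0.2, 200_000

Z = rng.standard_normal(n)
ST = S0 * np.exp((r - q - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
Y = np.exp(-r * T) * np.maximum(ST - K, 0.0)    # discounted payoff per path

price = Y.mean()                                 # sample mean ~ V(0) by the LLN
se = Y.std(ddof=1) / np.sqrt(n)                  # standard error s(n)/sqrt(n)
z = 1.96                                         # z_{alpha/2} for a 95% interval
print(f"price = {price:.4f}, 95% CI = [{price - z*se:.4f}, {price + z*se:.4f}]")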
2.5 Instrument Valuation - Monte Carlo Method (Variance Reduction)
Every technique keeps the estimator unbiased, E[Y_new] = E[Y], while shrinking Var(Y_new) below Var(Y).

Antithetic variate: pair each draw with its antithesis, Yᵢᵃᵛ = (Yᵢ¹ + Yᵢ²)/2. Then
\[ E[Y^{av}] = E\!\left[\frac{Y^1 + Y^2}{2}\right] = E[Y], \qquad \mathrm{Var}(\bar{Y}^{av}) = \frac{\mathrm{Var}(Y)}{2n}(1 + \rho) \]
which beats 2n independent draws whenever the pair correlation ρ < 0.

Control variate: Y_new = Y + c(Y_cv − μ_cv) with a control Y_cv of known mean μ_cv. Then E[Y_new] = E[Y] + cE[Y_cv − μ_cv] = E[Y], and with the optimal c
\[ \mathrm{Var}(Y_{new}) = \min_c \mathrm{Var}\!\left(Y + c(Y_{cv} - \mu_{cv})\right) = \left(1 - \rho^2_{Y, Y_{cv}}\right)\mathrm{Var}(Y) \le \mathrm{Var}(Y) \]

Conditioning: Y_new = E[Y | Z]. Then E[Y_new] = E[E[Y | Z]] = E[Y] and, by the variance decomposition,
\[ \mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid Z)] + \mathrm{Var}(E[Y \mid Z]) \ge \mathrm{Var}(E[Y \mid Z]) = \mathrm{Var}(Y_{new}) \]

Stratified sampling: split the draws into strata by Z and choose the allocation N_j per stratum. Then E[Y_new] = E[E[Y | Z]] = E[Y] and
\[ \mathrm{Var}(Y_{new}) = \min_{N_j} E[\mathrm{Var}(Y \mid Z)] \le E[\mathrm{Var}(Y \mid Z)] \le \mathrm{Var}(Y) \]

Importance sampling: sample from a density g instead of f:
\[ E_f[V(X)] = \int V(x) f(x)\,dx = \int \frac{V(x) f(x)}{g(x)}\, g(x)\,dx = E_g\!\left[\frac{V(X) f(X)}{g(X)}\right] \]
\[ \mathrm{Var}_f[V(X)] - \mathrm{Var}_g\!\left[\frac{V(X) f(X)}{g(X)}\right] = \int V^2(x) f(x)\left(1 - \frac{f(x)}{g(x)}\right)dx > 0 \]
when g(x) > f(x) where V²(x)f(x) is large, and g(x) < f(x) where V²(x)f(x) is small.
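A minimal Python sketch of the antithetic variate technique on the same European call, comparing standard errors at an equal total path count:

import numpy as np

rng = np.random.default_rng(2)
S0, K, T, r, sigma, n = 100.0, 100.0, 1.0, 0.02, 0.2, 100_000

def payoff(Z):
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

Z = rng.standard_normal(n)
Y_plain = payoff(rng.standard_normal(2 * n))     # 2n independent paths
Y_av = 0.5 * (payoff(Z) + payoff(-Z))            # n antithetic pairs (2n paths)

print("plain     :", Y_plain.mean(), Y_plain.std(ddof=1) / np.sqrt(2 * n))
print("antithetic:", Y_av.mean(), Y_av.std(ddof=1) / np.sqrt(n))
# Same cost, smaller antithetic standard error: the payoffs of Z and -Z
# are negatively correlated (rho < 0).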
2.6 Risk Measurement - Sensitivity
Bump and revaluation:
\[ \frac{\partial V}{\partial \Theta_k} \approx \frac{V(\Theta_1, \ldots, \Theta_k + \Delta, \ldots, \Theta_K) - V(\Theta_1, \ldots, \Theta_k, \ldots, \Theta_K)}{\Delta} \]

Pathwise differentiation (e.g. the delta of a European option in Monte Carlo):
\[ \frac{\partial V}{\partial S_0} = E\!\left[\frac{\partial g(S_T)}{\partial S_T} \cdot \frac{\partial S_T}{\partial S_0}\right] = E\!\left[\frac{\partial (S_T - K)^+}{\partial S_T} \cdot \frac{\partial S_T}{\partial S_0}\right] = E\!\left[\mathbf{1}\{S_T > K\} \cdot \frac{\partial S_T}{\partial S_0}\right] \]

Likelihood ratio (e.g. the delta of a digital option in Monte Carlo, where the payoff is not differentiable):
\[ \frac{\partial V}{\partial S_0} = \frac{\partial}{\partial S_0}\int g(S_T)\, f(S_T; S_0)\,dS_T = \int g(S_T)\,\frac{\partial f(S_T; S_0)}{\partial S_0}\,dS_T = \int g(S_T)\,\frac{f_{S_0}(S_T; S_0)}{f(S_T; S_0)}\, f(S_T; S_0)\,dS_T = E\!\left[g(S_T)\,\frac{f_{S_0}(S_T; S_0)}{f(S_T; S_0)}\right] \]
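A minimal Python sketch comparing bump-and-revaluation against the pathwise delta for a European call under GBM (common random numbers are used for the bump; parameters are illustrative):

import numpy as np

rng = np.random.default_rng(3)
S0, K, T, r, sigma, n, h = 100.0, 100.0, 1.0, 0.02, 0.2, 200_000, 0.01

Z = rng.standard_normal(n)
growth = np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
disc = np.exp(-r * T)

price = lambda s0: disc * np.maximum(s0 * growth - K, 0.0).mean()
delta_bump = (price(S0 + h) - price(S0)) / h     # bump and revaluation

# Pathwise: dV/dS0 = E[1{ST > K} * dST/dS0], with dST/dS0 = ST/S0 under GBM
ST = S0 * growth
delta_pathwise = (disc * (ST > K) * ST / S0).mean()
print(delta_bump, delta_pathwise)                # both close to the BS delta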
2.6 Risk Measurement - Value-at-Risk
Window: 1 year (≈ 250 daily scenarios)
Holding period: 10 days
Confidence level: 99%

(Diagram: historical-simulation VaR pipeline. For each risk factor RF₁ … RF_n, collect the historical time series over the window; compute the 250 perturbations Δ_k^m; apply them to today's risk factors to generate 250 historical scenarios; revalue the portfolio under each scenario to get the simulated PVs PV₁ … PV₂₅₀ and simulated P&Ls P&L₁ … P&L₂₅₀.)

(Diagram: histogram of the simulated P&L distribution around PnL₀. The (100 − α)% VaR is the loss at the α% quantile: the portfolio loss exceeds the VaR with probability α%.)
Quantitative Investing
Overview
Quant Platform
Data Preprocessing
Stock Selection
Portfolio Construction
3.1 Overview
Data Preprocessing:
• Data Collection
• Outlier Handling: MAD, 3σ, Percentile
• Standardization: Raw, Ranked

Stock Selection:
• Traditional Approach: Single-Factor Test (IC, Stratified Backtesting); Multi-Factor Test (Correlation, Factor Synthesis); Multi-Factor Linear Regression
• Machine Learning Approach: Import Package → Parameter Setting → Data Labeling → Data Splitting → Model Setting → Model Training → Model Assessment → Strategy Implementation → Strategy Assessment

Portfolio Construction:
• Optimization: EW, MVO, GMV, MDP, RP, RB, EMV, BL
• Constraints: Industry, Factor Exposure, Stock
3.2 Quant Platform
https://www.joinquant.com/
https://uqer.io/home/
https://www.ricequant.com/
https://www.quantopian.com
3.3 Data Preprocessing - Data Collection
date = '2018-1-4'
# RiceQuant research API: all listed common stocks on the given date
stocks = all_instruments(type="CS", date=date).order_book_id.tolist()
data = get_fundamentals(
    query(
        fundamentals.eod_derivative_indicator.pb_ratio,
        fundamentals.eod_derivative_indicator.market_cap
    ).filter(
        fundamentals.income_statement.stockcode.in_(stocks)
    ), date, '1d').major_xs(date).dropna()
data['BP'] = 1/data['pb_ratio']          # book-to-price = 1 / price-to-book
data.head(3).append(data.tail(3))        # preview the first and last three rows
3.3 Data Preprocessing - Outlier Handling
import numpy as np

# MAD: clip at median ± n * median absolute deviation
def filter_extreme_MAD(series, n):
    median = series.quantile(0.5)
    mad = (series - median).abs().quantile(0.5)
    return np.clip(series, median - n*mad, median + n*mad)

# 3 Sigma: clip at mean ± n * standard deviation
def filter_extreme_3sigma(series, n=3):
    mean = series.mean()
    std = series.std()
    return np.clip(series, mean - n*std, mean + n*std)

# Percentile: clip at the lower and upper percentiles
def filter_extreme_percentile(series, min_pct=0.025, max_pct=0.975):
    q = series.quantile([min_pct, max_pct])
    return np.clip(series, q.iloc[0], q.iloc[1])
3.3 Data Preprocessing - Standardization
def standardize_series(series):
    return (series - series.mean()) / series.std()

new = filter_extreme_3sigma(data['BP'])
ax = standardize_series(new).plot.kde(label='Standardized Raw Factor')
ax.legend();
ax = standardize_series(new.rank()).plot.kde(label='Standardized Ranked Factor')
ax.legend();

Standardized raw factor: zᵢ = (Xᵢ − μ) / σ
Standardized ranked factor: zᵢ = (Yᵢ − μ_Y) / σ_Y, where Y = Rank(X)
3.4 Stock Selection - Traditional Approach
Multi-factor model: the basic premise is that similar assets display similar returns. The excess return of stock i is
\[ r_i = \beta_{i1} f_1 + \cdots + \beta_{iK} f_K + \varepsilon_i = \sum_{k=1}^{K} \beta_{ik} f_k + \varepsilon_i \]
where β_ik is the factor exposure, f_k the factor premium, and ε_i the specific return.

Estimate the factor premium. Consider the following stocks and factors:
Stock 1: Apple; Stock 2: Facebook; Stock 3: Google
Factor 1: PE (price-to-earnings); Factor 2: DY (dividend yield)

For each t:
1. Collect the factor exposures β_i1(t−1) and β_i2(t−1).
2. Collect the stock prices at t−1 and t, and compute the excess returns r_i(t).
3. Perform a cross-sectional regression to get the factor premiums f₁(t) and f₂(t):
r₁(t) = β₁₁(t−1)f₁(t) + β₁₂(t−1)f₂(t)
r₂(t) = β₂₁(t−1)f₁(t) + β₂₂(t−1)f₂(t)
r₃(t) = β₃₁(t−1)f₁(t) + β₃₂(t−1)f₂(t)

Collecting the time series f₁(t), …, f_K(t) for t = 1, …, T, fit a time-series model (AR(p), MA(q), ARMA(p, q), …) to predict the factor premiums f₁(T+1), …, f_K(T+1). A sketch of one cross-sectional step follows.
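A minimal Python sketch of one cross-sectional regression step on synthetic data (the "true" premiums are assumptions used only to generate the data):

import numpy as np

rng = np.random.default_rng(5)
N, K = 100, 2                                   # 100 stocks, 2 factors (PE, DY)

B = rng.standard_normal((N, K))                 # exposures beta_ik(t-1)
f_true = np.array([0.5, -0.2])                  # unobserved premiums f(t)
r = B @ f_true + 0.1 * rng.standard_normal(N)   # excess returns r_i(t)

f_hat, *_ = np.linalg.lstsq(B, r, rcond=None)   # OLS estimate of f(t)
print(f_hat)                                    # close to [0.5, -0.2]
# Repeating this for t = 1..T gives the premium time series to which an
# AR/MA/ARMA model is fitted to predict f(T+1).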
3.4 Stock Selection - Traditional Approach
Given the predicted factor premiums and next-period exposures, compute the expected excess returns and rank.

Factor exposure matrix (N stocks × K factors):
\[ \mathbf{B}(T+1) = \begin{pmatrix} \beta_{1,1}(T+1) & \cdots & \beta_{1,K}(T+1) \\ \vdots & \ddots & \vdots \\ \beta_{N,1}(T+1) & \cdots & \beta_{N,K}(T+1) \end{pmatrix} \]

Factor premium vector: f(T+1) = [f₁(T+1), f₂(T+1), …, f_K(T+1)]ᵀ

Excess returns:
\[ r_i(T+1) = \sum_{k=1}^{K} \beta_{i,k}(T+1)\, f_k(T+1), \qquad \mathbf{r}(T+1) = \mathbf{B}(T+1)\,\mathbf{f}(T+1) \]

Rank the stocks by r_i(T+1) and select the top ones.
3.4 Stock Selection - Machine Learning Approach
Import Package → Parameter Setting → Data Labeling → Data Splitting → Model Setting → Model Training → Model Assessment → Strategy Implementation → Strategy Assessment
3.4 Stock Selection - Machine Learning Approach (Import Package)
import numpy as np # Matrix Computation
import pandas as pd # Handle Dataframe
# Show plot in this notebook
%matplotlib inline
import matplotlib.pyplot as plt # Plotting
from sklearn.model_selection import train_test_split # Split training and test set
from sklearn.model_selection import GridSearchCV # Select hyper-parameter by cross-validation error
from sklearn.model_selection import KFold # CV model for binary or balanced classes
from sklearn.model_selection import StratifiedKFold # CV model for multi-class or imbalanced classes
from sklearn import metrics as me
# Machine Learning Model
from sklearn.svm import SVC # Support Vector Machine
from sklearn.ensemble import RandomForestClassifier as RFC # Random Forest
from sklearn.ensemble import GradientBoostingClassifier as GBC # Gradient Boosted Tree
3.4 Stock Selection - Machine Learning Approach (Parameter Setting)
class PARA:
    method = 'SVM' # Specify the method; can also be 'RF' or 'GBT'
    month_train = range(1, 84+1) # In-sample: 84 monthly files = 84 training months
    month_test = range(85, 120+1) # Out-of-sample: 36 monthly files = 36 test months
    percent_select = [0.5, 0.5] # 50% positive examples, 50% negative examples
    cv = 10 # 10-fold cross-validation
    seed = 1 # Random seed, for reproducible results
para = PARA()
3.4 Stock Selection - Machine Learning Approach (Data Labeling)
def label_data( data ):
    data['Label'] = np.nan # Initialization
    data = data.sort_values( by='Return', ascending=False ) # Sort excess return in descending order
    n_stock = np.multiply( para.percent_select, data.shape[0] ) # Number of stocks in the pos and neg classes
    n_stock = np.around(n_stock).astype(int) # Round the number of stocks to an integer
    data.iloc[0:n_stock[0], -1] = 1 # Assign 1 to the stocks with the best performance
    data.iloc[-n_stock[1]:, -1] = 0 # Assign 0 to the stocks with the worst performance
    data = data.dropna(axis=0) # Delete examples with NaN values
    return data

Data format: each monthly csv file is an m × n matrix; the first row contains the column names and the next m−1 rows contain the data of m−1 stocks.
• Columns 1-3: basic information
• Column 4: excess return of the next month
• Columns 5-n: factor exposures
3.4 Stock Selection - Machine Learning Approach (Data Splitting)
for i in para.month_train: # load csv month by month
    file_name = str(i) + '.csv'
    data_curr_month = pd.read_csv( file_name, header=0 )
    para.n_stock = data_curr_month.shape[0]
    data_curr_month = data_curr_month.dropna(axis=0) # remove NaN
    data_curr_month = label_data( data_curr_month ) # label data
    # merge into a single dataframe
    if i == para.month_train[0]: # first month
        data_train = data_curr_month
    else:
        data_train = data_train.append(data_curr_month)

X = data_train.loc[:, 'EP':'BIAS']
y = data_train.loc[:, 'Label']
X_train, X_cv, y_train, y_cv = train_test_split( X, y, test_size=1.0/para.cv, random_state=para.seed )
3.4 Stock Selection - Machine Learning Approach (Model Setting)
if para.method == 'SVM': # Support Vector Machine
    model = SVC( kernel = 'linear', C = 1 )
elif para.method == 'RF': # Random Forest
    model = RFC( n_estimators=200, max_depth=6, random_state=para.seed )
elif para.method == 'GBT': # Gradient Boosted Tree
    model = GBC( n_estimators=200, max_depth=6, random_state=para.seed )
3.4 Stock Selection - Machine Learning Approach (Model Training)
model.fit( X_train, y_train )
y_pred_train = model.predict( X_train )
# decision_function exists for SVC; for RF/GBT use model.predict_proba(...)[:, 1] as the score
y_score_train = model.decision_function( X_train )
y_pred_cv = model.predict( X_cv )
y_score_cv = model.decision_function( X_cv )
print( 'Training set, accuracy = %.2f' %me.accuracy_score( y_train, y_pred_train ) )
print( 'Training set, AUC = %.2f' %me.roc_auc_score( y_train, y_score_train ) )
print( 'Validation set, accuracy = %.2f' %me.accuracy_score( y_cv, y_pred_cv ) )
print( 'Validation set, AUC = %.2f' %me.roc_auc_score( y_cv, y_score_cv ) )
kernel = ['linear', 'rbf']
C = [0.01, 0.1, 1, 10]
param_grid = dict(kernel=kernel, C=C)
kfold = StratifiedKFold( n_splits=PARA.cv, shuffle=True, random_state=PARA.seed )
grid_search = GridSearchCV( model, param_grid, n_jobs=-1, cv=kfold, verbose=1 )
grid_result = grid_search.fit( X, y )
(Above: training with default parameters, then hyperparameter tuning with cross-validated grid search.)
3.4 Stock Selection - Machine Learning Approach (Model Training)
best_model = grid_result.best_estimator_
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
# plot results
scores = np.array(means).reshape(len(kernel), len(C))
for i, value in enumerate(kernel):
    plt.plot(C, scores[i], label='kernel: ' + str(value))
plt.legend()
plt.xlabel('C')
plt.ylabel('mean CV score')
plt.savefig('kernel_vs_C.png')
3.4 Stock Selection - Machine Learning Approach (Model Assessment)
# y_true_test, y_pred_test and y_score_test hold the monthly test returns,
# predictions and scores collected over the test months (one column per month)
for i in para.month_test: # Print accuracy and AUC for each test month
    y_true_i_month = pd.DataFrame( {'Return':y_true_test.iloc[:,i-1]} )
    y_pred_i_month = y_pred_test.iloc[:,i-1]
    y_score_i_month = y_score_test.iloc[:,i-1]
    y_true_i_month = y_true_i_month.dropna(axis=0) # remove NaN
    y_i_month = label_data( y_true_i_month )['Label']
    y_pred_i_month = y_pred_i_month[ y_i_month.index ].values
    y_score_i_month = y_score_i_month[ y_i_month.index ].values
    print( 'test set, month %d, accuracy = %.2f' %(i, me.accuracy_score( y_i_month, y_pred_i_month ) ) )
    print( 'test set, month %d, AUC = %.2f' %(i, me.roc_auc_score( y_i_month, y_score_i_month ) ) )
…
3.4 Stock Selection - Machine Learning Approach (Strategy Implementation)
n_stock = 15
strategy = pd.DataFrame( {'Return':[0]*para.month_test[-1], 'Value':[1]*para.month_test[-1]} )
for i in para.month_test:
    y_true_i_month = y_true_test.iloc[:,i-1]
    y_score_i_month = y_score_test.iloc[:,i-1]
    y_score_i_month = y_score_i_month.sort_values(ascending=False) # Sort the scores (probabilities) in descending order
    i_index = y_score_i_month[0:n_stock].index # Find the index of the top 15 stocks
    strategy.loc[i-1,'Return'] = np.mean(y_true_i_month[i_index])/100 # Compute the mean return of the 15 stocks
strategy['Value'] = (strategy['Return']+1).cumprod() # Compound the monthly mean returns to get the cumulative value
3.4 Stock Selection - Machine Learning Approach (Strategy Assessment)
strategy_value = strategy.reindex(index=para.month_test, columns=['Value'])
strategy_return = strategy.reindex(index=para.month_test, columns=['Return'])
plt.plot( para.month_test, strategy_value, 'r-' )
plt.show()
excess_return = np.mean(strategy_return) * 12
excess_vol = np.std(strategy_return) * np.sqrt(12)
IR = excess_return / excess_vol
print( 'annual excess return = %.2f' %excess_return )
print( 'annual excess volatility = %.2f' %excess_vol )
print( 'information ratio = %.2f' %IR)
3.5 Portfolio Construction
(Decision diagram: if expected returns and variances are known, use MVO. Otherwise, with a view on return use BL; with a view on risk use RB. The remaining methods arise as special cases, linked in the diagram by the labels "same expected return" (MVO → GMV), "same Sharpe ratio" (MVO → MDP), "same risk budget" (RB → RP), "zero correlation" (RP → EMV), "zero average correlation" (MDP → EMV), and "same volatility" (EMV → EW, GMV → EW). A sketch of three of the weighting schemes follows the legend.)

• MVO: Mean-Variance Optimization
• MDP: Most Diversified Portfolio
• GMV: Global Minimum Variance
• EMV: Equal Marginal Volatility
• EW: Equal Weight
• RB: Risk Budgeting
• RP: Risk Parity
• BL: Black-Litterman
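A minimal Python sketch of three of the weighting schemes (EW, EMV, GMV) from an assumed covariance matrix:

import numpy as np

vols = np.array([0.10, 0.20, 0.30])             # illustrative volatilities
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])              # illustrative correlations
Sigma = np.outer(vols, vols) * corr

w_ew = np.ones(3) / 3                           # Equal Weight
w_emv = (1 / vols) / (1 / vols).sum()           # Equal Marginal Volatility (inverse vol)
inv = np.linalg.solve(Sigma, np.ones(3))
w_gmv = inv / inv.sum()                         # Global Minimum Variance

for name, w in [("EW", w_ew), ("EMV", w_emv), ("GMV", w_gmv)]:
    print(name, np.round(w, 3), "vol =", round(float(np.sqrt(w @ Sigma @ w)), 4))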
Takeaway
P-Measure: Machine Learning, Quantitative Investing | Q-Measure: Financial Engineering
Model the future | Extrapolate the present
Real probability | Risk-neutral probability
Discrete process | Continuous process
Statistics | Itô's calculus
Estimation | Calibration
Buy-side | Sell-side
THANK YOU
STEVEN WANG SHENGYUAN
Wechat Account: MeanMachine1031

More Related Content

What's hot

Probability
ProbabilityProbability
Probability
Anjali Devi J S
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
Anjali Devi J S
 
Mixed-integer and Disjunctive Programming - Ignacio E. Grossmann
Mixed-integer and Disjunctive Programming - Ignacio E. GrossmannMixed-integer and Disjunctive Programming - Ignacio E. Grossmann
Mixed-integer and Disjunctive Programming - Ignacio E. Grossmann
CAChemE
 
Probability Assignment Help
Probability Assignment HelpProbability Assignment Help
Probability Assignment Help
Statistics Assignment Help
 
When Classifier Selection meets Information Theory: A Unifying View
When Classifier Selection meets Information Theory: A Unifying ViewWhen Classifier Selection meets Information Theory: A Unifying View
When Classifier Selection meets Information Theory: A Unifying View
Mohamed Farouk
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Rikiya Takahashi
 
random variable and distribution
random variable and distributionrandom variable and distribution
random variable and distribution
lovemucheca
 
Probability Assignment Help
Probability Assignment HelpProbability Assignment Help
Probability Assignment Help
Statistics Assignment Help
 
02.03 Artificial Intelligence: Search by Optimization
02.03 Artificial Intelligence: Search by Optimization02.03 Artificial Intelligence: Search by Optimization
02.03 Artificial Intelligence: Search by Optimization
Andres Mendez-Vazquez
 
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
Edward D. Weinberger
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlp
ananth
 
Bt0080 fundamentals of algorithms1
Bt0080 fundamentals of algorithms1Bt0080 fundamentals of algorithms1
Bt0080 fundamentals of algorithms1
Techglyphs
 
Chapter4
Chapter4Chapter4
Chapter4
Vu Vo
 
Probability Distributions
Probability Distributions Probability Distributions
Probability Distributions
Anthony J. Evans
 
2. Mathematics
2. Mathematics2. Mathematics
2. Mathematics
Matteo Bedini
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
Arthur Charpentier
 

What's hot (19)

Chapter 13
Chapter 13Chapter 13
Chapter 13
 
Probability
ProbabilityProbability
Probability
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
 
Numerical approximation
Numerical approximationNumerical approximation
Numerical approximation
 
Mixed-integer and Disjunctive Programming - Ignacio E. Grossmann
Mixed-integer and Disjunctive Programming - Ignacio E. GrossmannMixed-integer and Disjunctive Programming - Ignacio E. Grossmann
Mixed-integer and Disjunctive Programming - Ignacio E. Grossmann
 
2주차
2주차2주차
2주차
 
Probability Assignment Help
Probability Assignment HelpProbability Assignment Help
Probability Assignment Help
 
When Classifier Selection meets Information Theory: A Unifying View
When Classifier Selection meets Information Theory: A Unifying ViewWhen Classifier Selection meets Information Theory: A Unifying View
When Classifier Selection meets Information Theory: A Unifying View
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game Theory
 
random variable and distribution
random variable and distributionrandom variable and distribution
random variable and distribution
 
Probability Assignment Help
Probability Assignment HelpProbability Assignment Help
Probability Assignment Help
 
02.03 Artificial Intelligence: Search by Optimization
02.03 Artificial Intelligence: Search by Optimization02.03 Artificial Intelligence: Search by Optimization
02.03 Artificial Intelligence: Search by Optimization
 
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlp
 
Bt0080 fundamentals of algorithms1
Bt0080 fundamentals of algorithms1Bt0080 fundamentals of algorithms1
Bt0080 fundamentals of algorithms1
 
Chapter4
Chapter4Chapter4
Chapter4
 
Probability Distributions
Probability Distributions Probability Distributions
Probability Distributions
 
2. Mathematics
2. Mathematics2. Mathematics
2. Mathematics
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 

Similar to Machine Learning, Financial Engineering and Quantitative Investing

Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Jason Tsai
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
SwarnaKumariChinni
 
"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof...
"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof..."Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof...
"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof...
Paris Women in Machine Learning and Data Science
 
02 math essentials
02 math essentials02 math essentials
02 math essentials
Poongodi Mano
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
Yogendra Singh
 
Introduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningIntroduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
Terry Taewoong Um
 
Can machine think like human being : A Godelian perspective
Can machine think like human being : A Godelian perspective Can machine think like human being : A Godelian perspective
Can machine think like human being : A Godelian perspective
Jaynarayan Tudu
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Feynman Liang
 
Lecture 7
Lecture 7Lecture 7
Lecture 7butest
 
Lecture 7
Lecture 7Lecture 7
Lecture 7butest
 
AML_030607.ppt
AML_030607.pptAML_030607.ppt
AML_030607.pptbutest
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Coursera 1week
Coursera  1weekCoursera  1week
Coursera 1week
csl9496
 
Kk20503 1 introduction
Kk20503 1 introductionKk20503 1 introduction
Kk20503 1 introductionLow Ying Hao
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Agile Testing Alliance
 
Predictive Testing
Predictive TestingPredictive Testing
Predictive Testing
Herminio Vazquez
 
Evolutionary deep learning: computer vision.
Evolutionary deep learning: computer vision.Evolutionary deep learning: computer vision.
Evolutionary deep learning: computer vision.
Olivier Teytaud
 
Chap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsChap 8. Optimization for training deep models
Chap 8. Optimization for training deep models
Young-Geun Choi
 
Regression
RegressionRegression
Regression
Ncib Lotfi
 

Similar to Machine Learning, Financial Engineering and Quantitative Investing (20)

Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof...
"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof..."Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof...
"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Prof...
 
02 math essentials
02 math essentials02 math essentials
02 math essentials
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
Introduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningIntroduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
 
Can machine think like human being : A Godelian perspective
Can machine think like human being : A Godelian perspective Can machine think like human being : A Godelian perspective
Can machine think like human being : A Godelian perspective
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
Lecture 7
Lecture 7Lecture 7
Lecture 7
 
Lecture 7
Lecture 7Lecture 7
Lecture 7
 
AML_030607.ppt
AML_030607.pptAML_030607.ppt
AML_030607.ppt
 
Introduction
IntroductionIntroduction
Introduction
 
Coursera 1week
Coursera  1weekCoursera  1week
Coursera 1week
 
Kk20503 1 introduction
Kk20503 1 introductionKk20503 1 introduction
Kk20503 1 introduction
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
Predictive Testing
Predictive TestingPredictive Testing
Predictive Testing
 
151028_abajpai1
151028_abajpai1151028_abajpai1
151028_abajpai1
 
Evolutionary deep learning: computer vision.
Evolutionary deep learning: computer vision.Evolutionary deep learning: computer vision.
Evolutionary deep learning: computer vision.
 
Chap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsChap 8. Optimization for training deep models
Chap 8. Optimization for training deep models
 
Regression
RegressionRegression
Regression
 

Recently uploaded

Transkredit Finance Company Products Presentation (1).pptx
Transkredit Finance Company Products Presentation (1).pptxTranskredit Finance Company Products Presentation (1).pptx
Transkredit Finance Company Products Presentation (1).pptx
jenomjaneh
 
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
obyzuk
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
egoetzinger
 
Seminar: Gender Board Diversity through Ownership Networks
Seminar: Gender Board Diversity through Ownership NetworksSeminar: Gender Board Diversity through Ownership Networks
Seminar: Gender Board Diversity through Ownership Networks
GRAPE
 
Commercial Bank Economic Capsule - May 2024
Commercial Bank Economic Capsule - May 2024Commercial Bank Economic Capsule - May 2024
Commercial Bank Economic Capsule - May 2024
Commercial Bank of Ceylon PLC
 
US Economic Outlook - Being Decided - M Capital Group August 2021.pdf
US Economic Outlook - Being Decided - M Capital Group August 2021.pdfUS Economic Outlook - Being Decided - M Capital Group August 2021.pdf
US Economic Outlook - Being Decided - M Capital Group August 2021.pdf
pchutichetpong
 
how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.
DOT TECH
 
Instant Issue Debit Cards
Instant Issue Debit CardsInstant Issue Debit Cards
Instant Issue Debit Cards
egoetzinger
 
how to sell pi coins effectively (from 50 - 100k pi)
how to sell pi coins effectively (from 50 - 100k  pi)how to sell pi coins effectively (from 50 - 100k  pi)
how to sell pi coins effectively (from 50 - 100k pi)
DOT TECH
 
when will pi network coin be available on crypto exchange.
when will pi network coin be available on crypto exchange.when will pi network coin be available on crypto exchange.
when will pi network coin be available on crypto exchange.
DOT TECH
 
The Role of Non-Banking Financial Companies (NBFCs)
The Role of Non-Banking Financial Companies (NBFCs)The Role of Non-Banking Financial Companies (NBFCs)
The Role of Non-Banking Financial Companies (NBFCs)
nickysharmasucks
 
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptxSWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
Godwin Emmanuel Oyedokun MBA MSc ACA ACIB FCTI FCFIP CFE
 
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
bbeucd
 
what is a pi whale and how to access one.
what is a pi whale and how to access one.what is a pi whale and how to access one.
what is a pi whale and how to access one.
DOT TECH
 
The European Unemployment Puzzle: implications from population aging
The European Unemployment Puzzle: implications from population agingThe European Unemployment Puzzle: implications from population aging
The European Unemployment Puzzle: implications from population aging
GRAPE
 
Scope Of Macroeconomics introduction and basic theories
Scope Of Macroeconomics introduction and basic theoriesScope Of Macroeconomics introduction and basic theories
Scope Of Macroeconomics introduction and basic theories
nomankalyar153
 
The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...
The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...
The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...
muslimdavidovich670
 
how to sell pi coins on Bitmart crypto exchange
how to sell pi coins on Bitmart crypto exchangehow to sell pi coins on Bitmart crypto exchange
how to sell pi coins on Bitmart crypto exchange
DOT TECH
 
how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.
DOT TECH
 
The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...
The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...
The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...
beulahfernandes8
 

Recently uploaded (20)

Transkredit Finance Company Products Presentation (1).pptx
Transkredit Finance Company Products Presentation (1).pptxTranskredit Finance Company Products Presentation (1).pptx
Transkredit Finance Company Products Presentation (1).pptx
 
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
 
Seminar: Gender Board Diversity through Ownership Networks
Seminar: Gender Board Diversity through Ownership NetworksSeminar: Gender Board Diversity through Ownership Networks
Seminar: Gender Board Diversity through Ownership Networks
 
Commercial Bank Economic Capsule - May 2024
Commercial Bank Economic Capsule - May 2024Commercial Bank Economic Capsule - May 2024
Commercial Bank Economic Capsule - May 2024
 
US Economic Outlook - Being Decided - M Capital Group August 2021.pdf
US Economic Outlook - Being Decided - M Capital Group August 2021.pdfUS Economic Outlook - Being Decided - M Capital Group August 2021.pdf
US Economic Outlook - Being Decided - M Capital Group August 2021.pdf
 
how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.
 
Instant Issue Debit Cards
Instant Issue Debit CardsInstant Issue Debit Cards
Instant Issue Debit Cards
 
how to sell pi coins effectively (from 50 - 100k pi)
how to sell pi coins effectively (from 50 - 100k  pi)how to sell pi coins effectively (from 50 - 100k  pi)
how to sell pi coins effectively (from 50 - 100k pi)
 
when will pi network coin be available on crypto exchange.
when will pi network coin be available on crypto exchange.when will pi network coin be available on crypto exchange.
when will pi network coin be available on crypto exchange.
 
The Role of Non-Banking Financial Companies (NBFCs)
The Role of Non-Banking Financial Companies (NBFCs)The Role of Non-Banking Financial Companies (NBFCs)
The Role of Non-Banking Financial Companies (NBFCs)
 
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptxSWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
 
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
 
what is a pi whale and how to access one.
what is a pi whale and how to access one.what is a pi whale and how to access one.
what is a pi whale and how to access one.
 
The European Unemployment Puzzle: implications from population aging
The European Unemployment Puzzle: implications from population agingThe European Unemployment Puzzle: implications from population aging
The European Unemployment Puzzle: implications from population aging
 
Scope Of Macroeconomics introduction and basic theories
Scope Of Macroeconomics introduction and basic theoriesScope Of Macroeconomics introduction and basic theories
Scope Of Macroeconomics introduction and basic theories
 
The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...
The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...
The WhatsPump Pseudonym Problem and the Hilarious Downfall of Artificial Enga...
 
how to sell pi coins on Bitmart crypto exchange
how to sell pi coins on Bitmart crypto exchangehow to sell pi coins on Bitmart crypto exchange
how to sell pi coins on Bitmart crypto exchange
 
how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.
 
The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...
The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...
The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...
 

Machine Learning, Financial Engineering and Quantitative Investing

  • 1. MACHINE LEARNING FINANCIAL ENGINEERING MAR 2018 STEVEN WANG QUANTITATIVE INVESTING
  • 2. Quick Take Machine Learning Financial Engineering Quantitative Investing Takeaway P-, Q- easure ntitative vesting Q-Measure Financial Engineering QT , Q- asure titative esting Q-Measure Financial Engineering ML arning? g Problem? ible? FE QI
  • 3. Mean Machine Machine Learning What is Machine Learning What is Learning Problem Is Learning Feasible How to Learn Well
  • 4. Mean Machine 1.1 What is Machine Learning? - Overview
  • 5. Mean Machine 1.1 What is Machine Learning? - Type Agent Environment ActionReward State Input Layer Output Layer Hidden Layer 1 Hidden Layer 2 Supervised Learning Unsupervised Learning Semi-supervised Learning Reinforcement Learning Deep Learning
  • 6. Mean Machine 1.2 What is Learning Problem? Unknown Target Function c: X  Y Training Examples (x(1), y(1)), (x(2), y(2)),…, (x(n), y(n)) Learning Algorithm A Hypothesis Set H = {h1, h2,…, hM} Final Hypothesis g  c ideal loan approval formula historical records of applicants a set of candidate formulas learned loan approval formula
  • 7. Mean Machine 1.3 Is Learning Feasible? - No Free Lunch Theorem Feature x Label y Model A Model B Model C Training Data [0, 0] 0 0 0 0 [1, 1] 1 1 1 1 Test Data [1, 0] ? 1 0 1 [0, 1] ? 0 0 1 Model A = random guess Model B = support vector machine Model C = deep neural network Is Model C > Model B > Model A? c(x) = x1 ⋁ x2 : Model C wins o The 3rd data x = [1, 0], so y = 1 ⋁ 0 = 1 o The 4th data x = [0, 1], so y = 0 ⋁ 1 = 1 c(x) = x1 ⋀ x2 : Model B wins o The 3rd data x = [1, 0], so y = 1 ⋀ 0 = 0 o The 4th data x = [0, 1], so y = 0 ⋀ 1 = 0 c(x) = x1: Model A wins o The 3rd data x = [1, 0], so y = 1 o The 4th data x = [0, 1], so y = 0 Model A is as good as Model C! Anything needs to learn? All Models are expected to have equivalent performance! The c is an unknown function, the performance on training data is not indicative of the performance on test data. The performance on test data is all that matters in learning! Can we really learn something? Learning seems to be doomed but …
  • 8. Mean Machine 1.3 Is Learning Feasible? - No Free Lunch Proof It is meaningless to discuss the superiority of algorithm given no specific problems. Define A = algorithm xin = in-sample data xout = out-of-sample data (N) c = unknown target function h = hypothesis function Consider all cases of c, the expected out-of-sample error under algorithm A is the same. E A xin, c = ෍ c ෍ h ෍ xout P xout PDF of xout ∙ I h xout ≠ c xout error of h on xout ∙ P h xin, A PDF of h given A and xin = ෍ xout P xout ∙ ෍ h P h xin, A ∙ ෍ c I h xout ≠ c xout = ෍ xout P xout ∙ ෍ h P h xin, A ∙ 1 2 2N = 2N−1 ෍ xout P xout ∙ ෍ h P h xin, A = 2N−1 ෍ xout P xout The error is independent of algorithm A! E[random guess|xin, c] = E[state-of-the-art|xin, c]
  • 9. Mean Machine 1.3 Is Learning Feasible? - Add Probability Distribution Unknown Target Function c: X  Y Training Examples (x(1), y(1)), (x(2), y(2)),…, (x(n), y(n)) Learning Algorithm A Hypothesis Set H = {h1, h2,…, hM} Final Hypothesis g  c Input Distribution P(X) x x(1), x(2), …, x(n)
  • 10. Mean Machine 1.3 Is Learning Feasible? - Logic Chain of Proof learning feasibleProve target function c Learn g closest to cFind Etrue(g) smallShow Etrain(g) ≈ Etrue(g) Etrain(g) small GivenEtrue(g) small Showg ≈ c Showlearning feasible Show God’s Gift: Etrain(g) ≈ Etrue(g) Your Capability: Etrain(g) is small Target function c is UNKNOWN True error Etrue(g) is IMPOSSIBLE to compute
  • 11. Mean Machine 1.3 Is Learning Feasible? - From Unknown to Known Can we infer u from v? No, people from sample might all support Clinton but Trump eventually win! The above statement is POSSIBLE but not PROBABLE. When sample size is big enough, “v ≈ u” is probably approximately correct (PAC) P( v − u > ε) ≤ 2e−2ε2nHoeffding’s inequality • No u appears to be at RHS of above formula • A link from unknown u to known v u is deterministic unknown, v is stochastic known.Population Sampleu v = 2/5
  • 12. Mean Machine 1.3 Is Learning Feasible? - From Polling to Learning Polling Learning Label Support Trump Support Clinton Correct classification Incorrect classification Aim Get vote percentage for Trump Learn target function c(x) = y Data US citizens Examples Data Distribution Every citizen is i.i.d Every example is i.i.d In-Sample Sample Training set In-Sample Statistics v = vote percentage for Trump in-sample training error Etrain h = 1 n σi=1 n I{h 𝐱 i ≠ c 𝐱 i } Out-of-Sample Statistics u = vote percentage for Trump out-of-sample true error Etrue h = P(h 𝐱 ≠ c(𝐱)) P( v − u > ε) ≤ 2e−2ε2n P( Etrain(h) − Etrue(h) > ε) ≤ 2e−2ε2n Polling Learning simplify P(bad h) ≤ 2e−2ε2n analogy Are we done? No! This is verification, not learning
  • 13. Mean Machine 1.3 Is Learning Feasible? - From One to Many Unknown Target Function c: X  Y Verify Training Examples (x(1), y(1)), (x(2), y(2)),…, (x(n), y(n)) A Fixed Hypothesis function h Verification h  c or h ≠ c Input Distribution P(X) x x(1), x(2), …, x(n) The entire flowchart assumed a FIXED h and then came the data. In order to be real learning, we have to choose g among a hypothesis set {h1, h2, …, hM} instead of fixing a single h P Etrain g − Etrue g > ε = P bad g ≤ P bad h1 or bad h2 or ⋯ or bad hM ≤ P bad h1 + P bad h2 + ⋯ + P bad hM ≤ 2e−2ε2n + 2e−2ε2n + ⋯ + 2e−2ε2n = 2Me−2ε2n P(bad h) ≤ 2e−2ε2n P(bad g) ≤ 2Me−2ε2n From h to g Are we done? No! M can be very huge, infinite-huge
  • 14. Mean Machine 1.3 Is Learning Feasible? - From Finite to Infinite
    When M → ∞:  P(bad g) ≤ 2Me^(−2ε²n) → ∞, a very large number.
    Congratulations! Even a primary-school student knows P(bad g) ≤ 1, so the bound is vacuous. What went wrong?
  • 15. Mean Machine 1.3 Is Learning Feasible? - From Infinite to Finite
    [Diagram: three separating lines h1, h2, h3 that classify the same data points identically.]
    Hypotheses h1, h2 and h3 are effectively equivalent: what matters is not the number of hypotheses but the number of distinct labelings they can produce. Key notions: dichotomy, growth function, shattering, break point, VC dimension. Replacing M by the growth function, which is bounded by (2n)^dvc + 1, yields the VC bound:
    P(|Etrain(g) − Etrue(g)| > ε) ≤ 4·((2n)^dvc + 1)·e^(−ε²n/8)
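    To see why a finite dvc rescues learning, plug numbers into the bound. A minimal sketch (dvc = 3 and ε = 0.1 are illustrative values, not from the deck): the polynomial factor eventually loses to the decaying exponential.

    import numpy as np

    def vc_bound(n, dvc, eps):
        return 4 * ((2 * n) ** dvc + 1) * np.exp(-eps**2 * n / 8)

    for n in [1_000, 10_000, 30_000]:
        print(n, vc_bound(n, dvc=3, eps=0.1))
    # polynomial times decaying exponential goes to 0 as n grows,
    # so the bound becomes meaningful whenever dvc is finite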
  • 16. Mean Machine 1.3 Is Learning Feasible? - Learning is Feasible
    P(|Etrain(g) − Etrue(g)| > ε) ≤ 4·((2n)^dvc + 1)·e^(−ε²n/8)
    We reach this conclusion without knowing
    • the algorithm A
    • the input distribution P(X)
    • the target function c
    We just need
    • training examples D
    • a hypothesis set H
    to find a final hypothesis g that learns c. Learning is feasible when the VC dimension is finite.
    [Diagram: the full learning setup, with unknown target c: X → Y, training examples drawn from P(X), and algorithm A selecting g ≈ c from H = {h1, h2, ..., hM}.]
  • 17. Mean Machine 1.4 How to Learn Well? - Over-learn vs Under-learn
    Exercise: both pictures are leaves. Exam: which one is a leaf?
    When you learn too much: "This is not a leaf, since leaves must be serrated."
    When you learn too little: "This is a leaf, since leaves are green."
  • 18. Mean Machine 1.4 How to Learn Well? - Overfit vs Underfit
  • 20. Mean Machine 2.1 Overview
    [Diagram: the financial-engineering pipeline. Market data Θmkt(t) → bootstrapping → curves P(t, ·) → calibration → model parameters Θmdl(t) → instrument valuation/evaluation → V(t), with model parameters Θprm(t) and numerical parameters Θnum(t) as further inputs. Risk measurement: perturbation and extraction give sensitivities ∂V/∂Θmkt and ∂V/∂Θmdl, which feed P&L and VaR. Legend: Data, Parameter, Variable, Computation.]
  • 21. Mean Machine 2.2 Data, Parameter
    Θmkt(t): daily observable market data
    • Deposit rates, futures rates and swap rates (yield curve construction)
    • Cap and swaption implied volatilities (IR volatility calibration)
    • FX swap points and volatilities (FX volatility calibration)
    • CDS spread curve (hazard rate calibration)
    Θprm(t): indirectly observed, estimated from historical data, or treated as "exotic" constants
    • Libor fixing (historical data point)
    • Correlation between CMS and FX (historical time series)
    • Short rate mean reversion speed (κ = 0.01)
    Θnum(t): parameters that control the numerical schemes
    • Number of Monte Carlo paths (N = 50,000)
    • Number of node points in the finite difference grid (N = 1,000)
    • Error tolerance in optimization (ε = 10⁻⁵)
  • 22. Mean Machine 2.3 Curve Construction
    [Diagram: multi-curve bootstrapping order.
    1. USD discount curve: bootstrapped from OIS quotes.
    2. USD benchmark and index curves: deposits, ED futures, FRAs and swaps, linked by IR basis swaps.
    3. Foreign-currency (CUR) discount curves: FX swap points and cross-currency basis swaps against USD.
    4. CUR benchmark and index curves: deposits, ED futures, FRAs and swaps, linked by IR basis swaps.]
  • 23. Mean Machine 2.4 Model Calibration
    Calibration targets and models by asset class:
    • EQ: N225 options → SABR (α, β, ρ, ν)
    • CM: coffee options → Schwartz (κ, σ, θ)
    • IR: USD ATM swaptions → Hull-White (κ, σ)
    • FX: EURUSD options → Heston (κ, v0, η, ρ, θ)

    N225 option implied volatilities (Expiry \ Strike):
           15705.69  16578.23  17450.77  18323.31  19195.85
    1M     28.63     26.00     24.34     23.16     23.16
    3M     27.06     25.60     24.70     23.94     23.49
    6M     26.28     25.47     24.92     24.34     23.97
    12M    26.07     25.66     25.33     24.96     24.74
    24M    26.54     26.40     26.16     25.94     25.72
    60M    29.00     28.87     28.73     28.66     28.60

    USD ATM swaption volatilities (Maturity \ Expiry):
          1M    3M    6M    1Y    2Y    3Y    4Y    5Y    7Y    10Y   15Y   20Y   25Y   30Y
    1Y    59.80 56.15 56.27 65.12 66.75 55.32 44.80 36.16 28.18 22.39 19.98 18.09 17.52 17.17
    2Y    53.00 46.22 50.38 59.33 56.22 46.80 39.23 33.31 27.06 22.04 20.26 18.72 18.19 18.05
    3Y    53.00 43.60 47.48 57.00 48.87 41.21 35.61 31.16 26.06 21.80 20.19 18.63 18.15 18.14
    4Y    52.70 50.04 48.35 50.06 43.32 37.41 33.03 29.58 25.17 21.60 20.07 18.46 18.00 18.12
    5Y    50.80 48.45 48.02 46.04 40.06 34.93 31.15 28.33 24.50 21.47 19.84 18.04 17.65 17.94
    7Y    41.50 43.49 41.98 39.47 35.22 31.46 28.38 26.08 23.38 21.18 19.73 18.11 18.25 19.32
    10Y   10.00 33.70 32.49 32.59 32.36 30.12 27.94 26.01 24.66 22.56 20.74 19.55 18.23 19.28
    15Y   30.60 26.74 27.17 27.46 26.07 24.78 23.60 22.70 21.20 19.45 17.94 17.32 19.25 21.27
    20Y   25.50 25.24 25.69 25.90 24.73 23.70 22.70 21.89 20.60 19.09 18.09 17.63 19.57 22.18
    25Y   24.80 24.65 24.68 24.77 23.78 22.95 22.11 21.39 20.27 19.02 18.24 17.51 19.88 22.92
    30Y   24.60 24.52 24.11 24.04 23.11 22.40 21.65 21.01 20.05 19.04 18.22 17.53 20.27 23.39

    Coffee option implied volatilities (Expiry \ Strike):
          272.57  287.71  295.28  302.85  310.42  317.99  333.14
    1M    32.33   31.18   30.63   30.60   31.40   32.27   33.32
    2M    32.13   32.18   32.38   32.71   33.11   33.47   33.92
    3M    35.17   35.67   36.10   36.52   36.93   37.35   37.81
    6M    34.63   35.10   35.55   36.00   36.48   36.93   37.41
    1Y    31.87   32.07   32.24   32.45   32.69   33.00   33.29
    18M   29.31   29.68   29.95   30.29   30.66   31.10   31.60
    2Y    28.75   29.07   29.31   29.66   30.03   30.49   31.09

    EURUSD option volatilities (Expiry \ Convention):
          ATM   25RR   10RR   25BF  10BF
    O/N   6.44  -0.56  -1.01  0.14  0.48
    1W    8.55  -0.65  -1.17  0.15  0.50
    2W    8.65  -0.75  -1.35  0.14  0.47
    1M    8.78  -1.00  -1.79  0.11  0.40
    2M    8.70  -1.10  -1.98  0.17  0.59
    3M    8.75  -1.25  -2.25  0.18  0.62
    6M    9.00  -1.50  -2.74  0.28  0.98
    9M    9.19  -1.60  -2.91  0.30  1.03
    1Y    9.30  -1.65  -3.00  0.29  0.99
    2Y    9.78  -1.70  -3.18  0.32  1.15
  • 24. Mean Machine 2.5 Instrument Valuation - Fundamentals
    Building blocks: no arbitrage → numeraire → change of measure → pricing formula
    V(0) = N(0) × E^N[V(T)/N(T)]
    Numeraire           Probability Measure
    Bank Account        Risk-neutral Measure
    Zero-Coupon Bond    Forward Measure
    Annuity             Swap Measure
    No arbitrage: given two assets A and B with payoffs f and g at T, if f = g then A = B.
    Radon-Nikodym derivative: dP/dQ = (Q(0)/P(0)) · (P(T)/Q(T))
  • 25. Mean Machine 2.5 Instrument Valuation - Fundamentals (No-Arbitrage Principle)
    Given two assets A and B with payoffs f and g at T, by the no-arbitrage principle, if f = g then A = B.
    • If A > B: at t = 0 buy B, sell A for profit A − B > 0; at T sell B, buy A with net payoff g − f = 0
    • If A < B: at t = 0 sell B, buy A for profit B − A > 0; at T buy B, sell A with net payoff f − g = 0
    Use the no-arbitrage principle to price any financial instrument B at time 0 (one-step binomial model):
    Construct: A0 = x·S0 − y·C, a portfolio of x shares of stock and a short position in y units of cash C
    Express:   AT = x·u·S0 − y(1 + rT)·C if the stock goes up to u·S0;  AT = x·d·S0 − y(1 + rT)·C if it goes down to d·S0
    Equal:     BT = h(u·S0) in the up state, BT = h(d·S0) in the down state
    Link:      set AT = BT in both states:
               x·u·S0 − y(1 + rT)·C = h(u·S0)
               x·d·S0 − y(1 + rT)·C = h(d·S0)
    Solve:     x = [h(u·S0) − h(d·S0)] / [(u − d)·S0],   y = [d·h(u·S0) − u·h(d·S0)] / [(1 + rT)·C·(u − d)]
    Then B0 = A0 = (1/(1 + rT)) · [pu·h(u·S0) + pd·h(d·S0)], with pu = (1 + rT − d)/(u − d) and pd = 1 − pu.
    Present Value = Discount Factor × Expected Payoff
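    A minimal sketch of the replication above (the parameter values S0 = 100, u = 1.2, d = 0.8, r = 0.05, T = 1, K = 100 are illustrative, not from the slide): build the portfolio (x, y) and check that its cost equals the discounted risk-neutral expected payoff.

    def one_step_binomial(S0, u, d, r, T, h, C=1.0):
        x = (h(u * S0) - h(d * S0)) / ((u - d) * S0)                        # shares of stock
        y = (d * h(u * S0) - u * h(d * S0)) / ((1 + r * T) * C * (u - d))   # cash units
        replication_price = x * S0 - y * C
        pu = (1 + r * T - d) / (u - d)                                      # risk-neutral up probability
        rn_price = (pu * h(u * S0) + (1 - pu) * h(d * S0)) / (1 + r * T)
        return replication_price, rn_price

    payoff = lambda S: max(S - 100.0, 0.0)   # call struck at 100
    print(one_step_binomial(S0=100, u=1.2, d=0.8, r=0.05, T=1.0, h=payoff))
    # both numbers agree: Present Value = Discount Factor x Expected Payoff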
  • 26. Mean Machine 2.5 Instrument Valuation - Fundamentals (Numeraire and Probability Measure)
    Numeraire: a numeraire is a unit; it can be money, a tradeable asset, or even an apple.
    Probability measure: a set of probabilities for events, e.g. fair coin P(H) = P(T) = 0.5 versus biased coin Q(H) = 0.8, Q(T) = 0.2.
    Write the time-0 prices of A and B in terms of the K state payoffs, with φk the price of the state-k claim:
    A(0) = φ1·A1(T) + φ2·A2(T) + ... + φK·AK(T) = Σk φk·Ak(T)
    B(0) = φ1·B1(T) + φ2·B2(T) + ... + φK·BK(T) = Σk φk·Bk(T)
    Divide and regroup, letting b = Σk φk·Bk(T):
    A(0)/B(0) = [Σk φk·Ak(T)] / [Σk φk·Bk(T)]
              = Σk (φk·Bk(T)/b) · (Ak(T)/Bk(T))
              = Σk πk · (Ak(T)/Bk(T)),   where πk = φk·Bk(T) / Σj φj·Bj(T)
              = E^B[A(T)/B(T)]
    Hence A(0) = B(0) × E^B[A(T)/B(T)]: B is a numeraire, and E^B is the expectation under the probability measure induced by B (the πk are nonnegative and sum to 1).
    Numeraire           Probability Measure      Instrument
    Bank Account        Risk-neutral Measure     FX, Equity, Commodity Option
    Zero-Coupon Bond    Forward Measure          Cap, Floor
    Annuity             Swap Measure             Swaption
  • 27. Mean Machine 2.5 Instrument Valuation - Fundamentals (Change of Probability Measure)
    Three questions: (1) what is the relationship between two probability measures, (2) what is the relationship between two numeraires, and (3) why change measure at all? Since a numeraire corresponds to a probability measure, changing numeraire corresponds to changing probability measure.
    Question 1: relationship between two probability measures. With fair coin p1 = p2 = 0.5 and biased coin q1 = 0.8, q2 = 0.2:
    E^P[X] = p1·x1 + p2·x2 = q1·(p1/q1)·x1 + q2·(p2/q2)·x2 = E^Q[Z·X]
    Z is the Radon-Nikodym derivative, denoted Z = dP/dQ.
  • 28. Mean Machine 2.5 Instrument Valuation - Fundamentals (Change of Probability Measure)
    Question 2: relationship between two numeraires. Start from the two identities
    E1:  A(0)/B(0) = E^B[A(T)/B(T)]        E2:  E^P[X] = E^Q[(dP/dQ)·X]
    Take numeraires P and Q. Applying E1 with B = Q, then with B = P, then E2:
    E^Q[(A(T)/Q(T)) · (Q(0)/P(0))] = (A(0)/Q(0)) · (Q(0)/P(0)) = A(0)/P(0) = E^P[A(T)/P(T)] = E^Q[(dP/dQ) · (A(T)/P(T))]
    Matching the two Q-expectations for every asset A gives
    dP/dQ = (Q(0)/P(0)) · (P(T)/Q(T))
    Question 3: why change measure? Because it simplifies the pricing formula:
    Measure               Risk-Neutral E^Q                Forward E^T
    Numeraire             bank account β(t)               zero-coupon bond P(t, T)
    Property              β(0) = 1                        P(T, T) = 1
    Martingale formula    V(0)/β(0) = E^Q[V(T)/β(T)]      V(0)/P(0, T) = E^T[V(T)/P(T, T)]
    Simplified formula    V(0) = E^Q[V(T)/β(T)]           V(0) = P(0, T) · E^T[V(T)]
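    A minimal numeric sketch of E1/E2 (the coin probabilities come from the slide; the payoffs x1 = 10, x2 = 2 are illustrative): verify E^P[X] = E^Q[Z·X] with Z = dP/dQ.

    p = [0.5, 0.5]     # fair coin, measure P
    q = [0.8, 0.2]     # biased coin, measure Q
    x = [10.0, 2.0]    # payoff of X in each state

    E_P = sum(pi * xi for pi, xi in zip(p, x))
    Z = [pi / qi for pi, qi in zip(p, q)]      # Radon-Nikodym derivative dP/dQ, state by state
    E_Q_ZX = sum(qi * zi * xi for qi, zi, xi in zip(q, Z, x))
    print(E_P, E_Q_ZX)   # both 6.0: changing measure reweights the states, not the answer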
  • 29. Mean Machine 2.5 Instrument Valuation - Pricing Methods
    V(t) = N(t) × Et^N[V(T)/N(T)]
    Closed-Form, Numerical Integration:
    1. Find the PDF of V(T)/N(T) under measure N.
    2. Represent the expectation as an integral.
    3. Simplify to closed form if possible; otherwise leave it as a numerical integration.
    PDE Finite Difference Method:
    1. Change measure N to the risk-neutral measure.
    2. Use the Feynman-Kac theorem to derive the PDE of V.
    3. Fix the solution domain, construct the grid, set terminal and boundary conditions, discretize the derivatives in the spatial and time dimensions, and adopt a finite difference scheme.
    Monte Carlo Method:
    1. By the law of large numbers, approximate E[V/N] by the average of Vi/Ni over simulated paths.
    2. Adopt variance reduction techniques to enhance Monte Carlo efficiency.
  • 30. Mean Machine 2.5 Instrument Valuation - Closed-Form, Numerical Integration
    Black-Scholes Model:  dS(t)/S(t) = (r − q)dt + σdB(t)
    Closed-Form (ω = +1 for a call, −1 for a put):
    V = ω·[e^(−qT)·S0·Φ(ωd+) − e^(−rT)·K·Φ(ωd−)]
    • d± = (1/(σ√T))·ln(S0·e^((r−q)T)/K) ± σ√T/2
    Heston Model:  dS(t)/S(t) = (r − q)dt + √v(t)·dB1(t),  dv(t) = κ(θ − v(t))dt + η√v(t)·dB2(t),  dB1·dB2 = ρdt
    Numerical Integration:
    V = ω·[e^(−qT)·S0·P1(ω) − e^(−rT)·K·P2(ω)]
    • Pj(ω) = ½(1 − ω) + ω·Pj(S0, v0, T, K)
    • Pj(x, v, T, y) = ½ + (1/π)·∫0^∞ Re[ e^(Cj(T,ϕ) + Dj(T,ϕ)·v + iϕ·ln(x/y)) / (iϕ) ] dϕ
    • Cj(T, ϕ) = (r − q)·Tϕi + (κθ/η²)·[ (bj − ρηϕi + dj)·T − 2·ln((1 − gj·e^(djT))/(1 − gj)) ]
    • Dj(T, ϕ) = [bj − ρηϕi + dj − (bj − ρηϕi − dj)·gj·e^(djT)] / [η²·(1 − gj·e^(djT))]
    • dj = √[(bj − ρηϕi)² − η²·(2ujϕi − ϕ²)],   gj = (bj − ρηϕi + dj)/(bj − ρηϕi − dj)
    • b1 = κ − ρη,  b2 = κ,  u1 = ½,  u2 = −½
    Techniques used: Itô's formula, Girsanov's theorem, moment matching, drift interpolation, parameter averaging.
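    A minimal sketch of the Black-Scholes closed form above (the sample inputs are illustrative; ω = +1 prices a call, −1 a put):

    from math import log, sqrt, exp
    from statistics import NormalDist

    def black_scholes(S0, K, T, r, q, sigma, omega=1):
        Phi = NormalDist().cdf
        d_plus = log(S0 * exp((r - q) * T) / K) / (sigma * sqrt(T)) + sigma * sqrt(T) / 2
        d_minus = d_plus - sigma * sqrt(T)
        return omega * (exp(-q * T) * S0 * Phi(omega * d_plus)
                        - exp(-r * T) * K * Phi(omega * d_minus))

    print(black_scholes(S0=100, K=100, T=1.0, r=0.05, q=0.0, sigma=0.2))   # about 10.45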
  • 31. Mean Machine 2.5 Instrument Valuation - PDE Finite Difference Method (SDE to PDE)
    Given the SDE of x(t) and the payoff function V of a derivative at maturity T:
    dx(t) = μ(t, x)dt + σ(t, x)dB(t),   V(x(T), T) = h(x(T))
    By the no-arbitrage principle, V(x, t) = e^(−r(T−t)) · Et[h(x(T))], and by the Feynman-Kac theorem V(x, t) satisfies the PDE:
    ∂V/∂t + μ(t, x)·∂V/∂x + ½σ²(t, x)·∂²V/∂x² − rV = 0
  • 32. Mean Machine 2.5 Instrument Valuation - PDE Finite Difference Method (Grid Construction)
    [Diagram: rectangular (t, x) grid with the terminal condition at tn = T, boundary conditions at x0 and xm+1, and interior points (ti, xj).]
    x ∈ {xj, j = 0, ..., m+1},  xj = xmin + jΔx,  Δx = (xmax − xmin)/(m + 1)
    t ∈ {ti, i = 0, ..., n},    ti = iΔt,         Δt = T/n
  • 33. Mean Machine 2.5 Instrument Valuation - PDE Finite Difference Method (Discretization and Scheme)
    Use central differences for ∂Vj/∂x and ∂²Vj/∂x² at xj, and discretize ∂V/∂t at the weighted time tθ(i,i+1) = θ·ti + (1 − θ)·ti+1:
    1st order (space):  ∂Vj(t)/∂x ≈ [Vj+1(t) − Vj−1(t)] / (2Δx)
    2nd order (space):  ∂²Vj(t)/∂x² ≈ [Vj+1(t) − 2Vj(t) + Vj−1(t)] / (Δx)²
    1st order (time):   ∂V/∂t ≈ [V(ti+1) − V(ti)] / Δt
    Scheme: θ = 0 fully explicit, θ = 1 fully implicit, θ = ½ Crank-Nicolson.
  • 34. Mean Machine 2.5 Instrument Valuation - PDE Finite Difference Method (Representation)
    The difference equation at (tθ(i,i+1), xj) is
    [Vj(ti+1) − Vj(ti)]/Δt = −μ(tθ, xj)·[Vj+1(tθ) − Vj−1(tθ)]/(2Δx) − ½σ²(tθ, xj)·[Vj+1(tθ) − 2Vj(tθ) + Vj−1(tθ)]/(Δx)² + r(tθ, xj)·Vj(tθ)
    Writing the algebraic form in matrix form, with I the identity matrix, A a tri-diagonal matrix, and Ω the boundary value vector:
    [I − θΔt·A(tθ)]·V(ti) = [I + (1 − θ)Δt·A(tθ)]·V(ti+1) + θ·Ω(ti) + (1 − θ)·Ω(ti+1)
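    A minimal sketch of one backward time step of the matrix equation above (assumptions: constant coefficients μ, σ, r; boundary values already folded into the Ω vectors; a dense solve instead of a production tridiagonal solver; all names are illustrative):

    import numpy as np

    def theta_step(V_next, mu, sigma, r, dx, dt, theta, omega_i, omega_ip1):
        # Solve [I - theta*dt*A] V(ti) = [I + (1-theta)*dt*A] V(ti+1) + theta*Om(ti) + (1-theta)*Om(ti+1)
        m = len(V_next)
        lower = -mu / (2 * dx) + sigma**2 / (2 * dx**2)   # coefficient of V(j-1)
        diag = -sigma**2 / dx**2 - r                      # coefficient of V(j)
        upper = mu / (2 * dx) + sigma**2 / (2 * dx**2)    # coefficient of V(j+1)
        A = (np.diag([lower] * (m - 1), -1)
             + np.diag([diag] * m)
             + np.diag([upper] * (m - 1), 1))             # tri-diagonal generator A
        I = np.eye(m)
        rhs = (I + (1 - theta) * dt * A) @ V_next + theta * omega_i + (1 - theta) * omega_ip1
        return np.linalg.solve(I - theta * dt * A, rhs)   # dense solve, for clarity only

    Stepping backward from the terminal condition V(tn) = h(x) to t0 repeats this solve n times; θ = ½ gives the Crank-Nicolson scheme of the previous slide.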
  • 35. Mean Machine 2.5 Instrument Valuation - Monte Carlo Method (Fundamentals)
    Consider a derivative V with time-T payout V(T) = g(T). By the no-arbitrage principle:
    V(t) = N(t) × Et^N[g(T)/N(T)]
    Law of Large Numbers: let Y1, Y2, ..., Yn be a sequence of i.i.d. random variables with finite expectation μ. The sample mean Ȳ(n) = (1/n)·Σi Yi converges to the population mean μ as n → ∞. Hence
    V(t) ≈ N(t) × (1/n)·Σi gi/Ni
    Central Limit Theorem: for i.i.d. Yi with finite expectation μ and standard deviation σ, as n → ∞
    [Ȳ(n) − μ] / (s(n)/√n) ~ N(0, 1),   where s²(n) = (1/(n−1))·Σi [Yi − Ȳ(n)]²
    Confidence interval:  V(t) ∈ [V̄(t) − z(α/2)·s(n)/√n,  V̄(t) + z(α/2)·s(n)/√n]
    The standard error s(n)/√n shrinks either by reducing the variance or by increasing the sample size.
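    A minimal sketch (assuming Black-Scholes dynamics and the same illustrative inputs as the earlier closed-form example; this is not the author's production pricer): a risk-neutral Monte Carlo call price with a 95% confidence interval.

    import numpy as np

    def mc_call(S0, K, T, r, q, sigma, n, seed=0):
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(n)
        ST = S0 * np.exp((r - q - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
        Y = np.exp(-r * T) * np.maximum(ST - K, 0.0)   # discounted payoff per path
        se = Y.std(ddof=1) / np.sqrt(n)                # standard error s(n)/sqrt(n)
        return Y.mean(), (Y.mean() - 1.96 * se, Y.mean() + 1.96 * se)

    price, ci = mc_call(S0=100, K=100, T=1.0, r=0.05, q=0.0, sigma=0.2, n=200_000)
    print(price, ci)   # close to the closed-form value of about 10.45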
  • 36. Mean Machine 2.5 Instrument Valuation - Monte Carlo Method (Variance Reduction)
    Every technique keeps the estimator unbiased, E[Ynew] = E[Y], while reducing its variance.
    Antithetic Variate: Yi(av) = (Yi(1) + Yi(2))/2, pairing each path with its mirror path.
      E[Ynew] = E[(Y(1) + Y(2))/2] = E[Y]
      Var[Ynew] = (1 + ρ)·Var[Y]/(2n) < Var[Y]/(2n) when the pair correlation ρ < 0
    Control Variate: Ynew = Y + c·(Ycv − μcv), where Ycv has a known mean μcv.
      E[Ynew] = E[Y] + c·E[Ycv − μcv] = E[Y]
      min over c: Var[Ynew] = (1 − ρ²(Y, Ycv))·Var[Y] ≤ Var[Y]
    Conditioning: Ynew = E[Y|Z].
      E[Ynew] = E[E[Y|Z]] = E[Y]
      Var[Y] = E[Var(Y|Z)] + Var(E[Y|Z]) ≥ Var(E[Y|Z]) = Var[Ynew]
    Stratified Sampling: partition by Z and draw Nj samples in stratum j.
      E[Ynew] = E[E[Y|Z]] = E[Y]
      min over Nj: Var[Ynew] ≤ E[Var(Y|Z)] ≤ Var[Y]
    Importance Sampling: sample from g instead of f.
      Ef[V(X)] = ∫V(x)·f(x)dx = ∫V(x)·(f(x)/g(x))·g(x)dx = Eg[V(X)·f(X)/g(X)]
      Varf[V(X)] − Varg[V(X)·f(X)/g(X)] = ∫V²(x)·f(x)·(1 − f(x)/g(x))dx > 0
      when g(x) > f(x) where V²(x)f(x) is large, and g(x) < f(x) where V²(x)f(x) is small.
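    A minimal sketch of the antithetic-variate technique (reusing the same illustrative Black-Scholes call): pair each normal draw Z with −Z and average the two payoffs.

    import numpy as np

    def mc_call_antithetic(S0, K, T, r, q, sigma, n_pairs, seed=0):
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(n_pairs)
        drift = (r - q - 0.5 * sigma**2) * T
        payoff = lambda z: np.exp(-r * T) * np.maximum(
            S0 * np.exp(drift + sigma * np.sqrt(T) * z) - K, 0.0)
        Y = 0.5 * (payoff(Z) + payoff(-Z))   # antithetic pair average, pair correlation < 0
        se = Y.std(ddof=1) / np.sqrt(n_pairs)
        return Y.mean(), se

    print(mc_call_antithetic(100, 100, 1.0, 0.05, 0.0, 0.2, n_pairs=100_000))
    # same expectation as the plain estimator, with a smaller standard error per draw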
  • 37. Mean Machine 2.6 Risk Measurement - Sensitivity
    Bump and Revaluation:
    ∂V/∂Θk ≈ [V(Θ1, ..., Θk + Δ, ..., ΘK) − V(Θ1, ..., Θk, ..., ΘK)] / Δ
    Pathwise Differentiation (e.g. delta of a European option in Monte Carlo):
    ∂V/∂S0 = E[∂g(ST)/∂ST · ∂ST/∂S0] = E[∂(ST − K)⁺/∂ST · ∂ST/∂S0] = E[1{ST > K} · ∂ST/∂S0]
    Likelihood Ratio (e.g. delta of a digital option in Monte Carlo):
    ∂V/∂S0 = ∂/∂S0 ∫ g(ST)·f(ST; S0)dST = ∫ g(ST)·[∂f(ST; S0)/∂S0]dST = ∫ g(ST)·[fS0(ST; S0)/f(ST; S0)]·f(ST; S0)dST = E[g(ST)·fS0(ST; S0)/f(ST; S0)]
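    A minimal sketch comparing the first two estimators (reusing the illustrative Black-Scholes call and common random numbers so the estimates are comparable; under GBM, ∂ST/∂S0 = ST/S0):

    import numpy as np

    S0, K, T, r, q, sigma, n = 100.0, 100.0, 1.0, 0.05, 0.0, 0.2, 200_000
    rng = np.random.default_rng(0)
    Z = rng.standard_normal(n)

    def terminal(S):   # same draws for base and bumped valuations
        return S * np.exp((r - q - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

    disc = np.exp(-r * T)
    bump = 0.01 * S0
    V_up = disc * np.maximum(terminal(S0 + bump) - K, 0.0).mean()
    V_base = disc * np.maximum(terminal(S0) - K, 0.0).mean()
    delta_bump = (V_up - V_base) / bump

    ST = terminal(S0)
    delta_pathwise = (disc * (ST > K) * ST / S0).mean()   # E[1{ST > K} * dST/dS0]
    print(delta_bump, delta_pathwise)                     # both near the BS delta of about 0.64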
  • 38. Mean Machine 2.6 Risk Measurement - Value-at-Risk
    Window: 1 year. Holding period: 10 days. Confidence level: 99%.
    [Diagram: historical-simulation pipeline. Risk factors RF1, ..., RFn → historical time series (about 260 daily observations) → perturbations Δ → 250 historical scenarios S → simulated portfolio PVs PV1, ..., PV250 → simulated P&Ls P&L1, ..., P&L250 → VaR read off the P&L distribution.]
    Portfolio VaR at confidence level α% is the loss exceeded with probability (100 − α)%, i.e. the (100 − α)% quantile of the simulated P&L distribution.
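    A minimal sketch of the last step (pnl stands in for the 250 simulated scenario P&Ls of the diagram; random numbers replace a real portfolio):

    import numpy as np

    rng = np.random.default_rng(0)
    pnl = rng.normal(loc=0.0, scale=1e6, size=250)   # stand-in scenario P&Ls

    confidence = 0.99
    var_99 = -np.percentile(pnl, 100 * (1 - confidence))   # loss exceeded 1% of the time
    print(f"99% VaR = {var_99:,.0f}")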
  • 40. Mean Machine 3.1 Overview
    Pipeline: Data Preprocessing → Stock Selection → Portfolio Construction
    Data Preprocessing:
    • Data Collection
    • Outlier Handling: MAD, 3σ, Percentile
    • Standardization: Raw, Ranked
    Stock Selection (Traditional Approach):
    • Single-Factor Test: IC, Stratified Backtesting
    • Multi-Factor Test: Correlation, Factor Synthesis
    • Multi-Factor Linear Regression
    Stock Selection (Machine Learning Approach):
    • Import Package → Parameter Setting → Data Labeling → Data Splitting → Model Setting → Model Training → Model Assessment → Strategy Implementation → Strategy Assessment
    Portfolio Construction:
    • Optimization: EW, MVO, GMV, MDP, RP, RB, EMV, BL
    • Constraints: Industry, Factor Exposure, Stock
  • 42. Mean Machine 3.3 Data Preprocessing - Data Collection
    # Research-platform API (RiceQuant): pull price-to-book and market cap
    # for all listed stocks on one date, then build the book-to-price factor
    date = '2018-1-4'
    stocks = all_instruments(type="CS", date=date).order_book_id.tolist()
    data = get_fundamentals(
        query(
            fundamentals.eod_derivative_indicator.pb_ratio,
            fundamentals.eod_derivative_indicator.market_cap
        ).filter(
            fundamentals.income_statement.stockcode.in_(stocks)
        ), date, '1d').major_xs(date).dropna()
    data['BP'] = 1 / data['pb_ratio']        # book-to-price factor
    data.head(3).append(data.tail(3))        # peek at the first and last rows
  • 43. Mean Machine 3.3 Data Preprocessing - Outlier Handling
    import numpy as np   # needed for np.clip below

    # MAD: clip at median ± n × median absolute deviation
    def filter_extreme_MAD(series, n):
        median = series.quantile(0.5)
        mad = (series - median).abs().quantile(0.5)
        return np.clip(series, median - n * mad, median + n * mad)

    # 3 Sigma: clip at mean ± n × standard deviation
    def filter_extreme_3sigma(series, n=3):
        mean, std = series.mean(), series.std()
        return np.clip(series, mean - n * std, mean + n * std)

    # Percentile: clip at the given lower/upper quantiles
    def filter_extreme_percentile(series, low=0.025, high=0.975):
        q = series.quantile([low, high])
        return np.clip(series, q.iloc[0], q.iloc[1])
  • 44. Mean Machine 3.3 Data Preprocessing - Standardization
    Standardized raw factor:    zi = (Xi − μ)/σ
    Standardized ranked factor: zi = (Yi − μ)/σ, where Y = Rank(X)

    def standardize_series(series):
        return (series - series.mean()) / series.std()

    new = filter_extreme_3sigma(data['BP'])
    ax = standardize_series(new).plot.kde(label='Standardized Raw Factor')
    ax.legend()
    ax = standardize_series(new.rank()).plot.kde(label='Standardized Ranked Factor')
    ax.legend()
  • 45. Mean Machine 3.4 Stock Selection - Traditional Approach
    Multi-Factor Model: the basic premise is that similar assets display similar returns.
    ri = βi1·f1 + ... + βiK·fK + εi = Σk βik·fk + εi
    where ri is the excess return, βik the factor exposure, fk the factor premium, and εi the specific return.
    Estimating factor premiums: consider three stocks (Apple, Facebook, Google) and two factors (PE: price-to-earnings; DY: dividend yield). For each t:
    1. Collect the factor exposures βi1(t−1) and βi2(t−1).
    2. Collect the stock prices at t−1 and t, and compute the excess returns ri(t).
    3. Perform a cross-sectional regression to get the factor premiums f1(t) and f2(t):
       r1(t) = β11(t−1)·f1(t) + β12(t−1)·f2(t)
       r2(t) = β21(t−1)·f1(t) + β22(t−1)·f2(t)
       r3(t) = β31(t−1)·f1(t) + β32(t−1)·f2(t)
    Collect the time series of premiums f(1), f(2), ..., f(T) from the regressions, then predict the factor premium f(T+1) with a time-series model: AR(p), MA(q), ARMA(p, q), ... A sketch of the cross-sectional step follows below.
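    A minimal sketch of one cross-sectional regression (the exposures and returns are made up for illustration; in practice they come from the data pipeline above):

    import numpy as np

    B = np.array([[0.8, 1.2],     # Apple:    [PE exposure, DY exposure] at t-1
                  [1.1, 0.5],     # Facebook
                  [0.9, 0.7]])    # Google
    r = np.array([0.02, 0.01, 0.015])   # excess returns at t

    f, *_ = np.linalg.lstsq(B, r, rcond=None)   # factor premiums f1(t), f2(t)
    print(f)   # one data point of the factor-premium time series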
  • 46. Mean Machine 3.4 Stock Selection - Traditional Approach
    Predict next month's excess returns, then rank and select:
    B(T+1) = the N × K matrix of factor exposures βi,k(T+1)
    f(T+1) = [f1(T+1), f2(T+1), ..., fK(T+1)]ᵀ, the predicted factor premiums
    ri(T+1) = Σk βi,k(T+1) · fk(T+1),   i.e.  r(T+1) = B(T+1)·f(T+1)
    Rank the predicted excess returns r(T+1) = [r1(T+1), ..., rN(T+1)]ᵀ and select the top stocks.
  • 47. Mean Machine 3.4 Stock Selection - Machine Learning Approach
    Import Package → Parameter Setting → Data Labeling → Data Splitting → Model Setting → Model Training → Model Assessment → Strategy Implementation → Strategy Assessment
  • 48. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Import Package)
    import numpy as np                  # matrix computation
    import pandas as pd                 # dataframes
    %matplotlib inline                  # show plots in this notebook
    import matplotlib.pyplot as plt     # plotting
    from sklearn.model_selection import train_test_split   # split training and validation sets
    from sklearn.model_selection import GridSearchCV       # select hyper-parameters by cross-validation error
    from sklearn.model_selection import KFold               # CV splitter for binary or balanced classes
    from sklearn.model_selection import StratifiedKFold     # CV splitter for multi-class or imbalanced classes
    from sklearn import metrics as me   # evaluation metrics
    # machine learning models
    from sklearn.svm import SVC                                      # support vector machine
    from sklearn.ensemble import RandomForestClassifier as RFC      # random forest
    from sklearn.ensemble import GradientBoostingClassifier as GBC  # gradient boosted tree
  • 49. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Parameter Setting)
    class PARA:
        method = 'SVM'                    # model choice: 'SVM', 'RF' or 'GBT'
        month_train = range(1, 84 + 1)    # in-sample: 84 monthly data files = 84 training months
        month_test = range(85, 120 + 1)   # out-of-sample: 36 monthly data files = 36 test months
        percent_select = [0.5, 0.5]       # 50% positive examples, 50% negative examples
        cv = 10                           # 10-fold cross-validation
        seed = 1                          # random seed, for reproducible results
    para = PARA()
  • 50. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Data Labeling)
    Data format: an m × n csv file per month; the first row contains the column labels, the next m−1 rows the information of m−1 stocks.
    • Columns 1-3: basic information
    • Column 4: excess return of the next month
    • Columns 5-n: factor exposures

    def label_data(data):
        data['Label'] = np.nan                                      # initialization
        data = data.sort_values(by='Return', ascending=False)       # sort excess return in descending order
        n_stock = np.multiply(para.percent_select, data.shape[0])   # number of stocks for pos and neg class
        n_stock = np.around(n_stock).astype(int)                    # round number of stocks to integer
        data.iloc[0:n_stock[0], -1] = 1                             # assign 1 to the best-performing stocks
        data.iloc[-n_stock[1]:, -1] = 0                             # assign 0 to the worst-performing stocks
        data = data.dropna(axis=0)                                  # drop middle stocks left with NaN labels
        return data
  • 51. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Data Splitting)
    for i in para.month_train:                                # load csv files month by month
        file_name = str(i) + '.csv'
        data_curr_month = pd.read_csv(file_name, header=0)
        para.n_stock = data_curr_month.shape[0]
        data_curr_month = data_curr_month.dropna(axis=0)      # remove NaN rows
        data_curr_month = label_data(data_curr_month)         # label data
        if i == para.month_train[0]:                          # first month
            data_train = data_curr_month
        else:                                                 # merge into a single dataframe
            data_train = data_train.append(data_curr_month)

    X = data_train.loc[:, 'EP':'BIAS']
    y = data_train.loc[:, 'Label']
    X_train, X_cv, y_train, y_cv = train_test_split(
        X, y, test_size=1.0/para.cv, random_state=para.seed)
  • 52. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Model Setting)
    if para.method == 'SVM':      # support vector machine
        model = SVC(kernel='linear', C=1)
    elif para.method == 'RF':     # random forest
        model = RFC(n_estimators=200, max_depth=6, random_state=para.seed)
    elif para.method == 'GBT':    # gradient boosted tree
        model = GBC(n_estimators=200, max_depth=6, random_state=para.seed)
  • 53. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Model Training)
    Default parameters:
    model.fit(X_train, y_train)
    y_pred_train = model.predict(X_train)
    y_score_train = model.decision_function(X_train)
    y_pred_cv = model.predict(X_cv)
    y_score_cv = model.decision_function(X_cv)
    print('Training set, accuracy = %.2f' % me.accuracy_score(y_train, y_pred_train))
    print('Training set, AUC = %.2f' % me.roc_auc_score(y_train, y_score_train))
    print('Validation set, accuracy = %.2f' % me.accuracy_score(y_cv, y_pred_cv))
    print('Validation set, AUC = %.2f' % me.roc_auc_score(y_cv, y_score_cv))

    Hyperparameter tuning:
    kernel = ['linear', 'rbf']
    C = [0.01, 0.1, 1, 10]
    param_grid = dict(kernel=kernel, C=C)
    kfold = StratifiedKFold(n_splits=PARA.cv, shuffle=True, random_state=PARA.seed)
    grid_search = GridSearchCV(model, param_grid, n_jobs=-1, cv=kfold, verbose=1)
    grid_result = grid_search.fit(X, y)
  • 54. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Model Training)
    best_model = grid_result.best_estimator_

    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

    # plot CV score against C, one curve per kernel
    scores = np.array(means).reshape(len(kernel), len(C))
    for i, value in enumerate(kernel):
        plt.plot(C, scores[i], label='kernel: ' + str(value))
    plt.legend()
    plt.xlabel('C')
    plt.ylabel('CV score')
    plt.savefig('kernel_vs_C.png')
  • 55. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Model Assessment)
    for i in para.month_test:   # print accuracy and AUC for each test month
        y_true_i_month = pd.DataFrame({'Return': y_true_test.iloc[:, i-1]})
        y_pred_i_month = y_pred_test.iloc[:, i-1]
        y_score_i_month = y_score_test.iloc[:, i-1]
        y_true_i_month = y_true_i_month.dropna(axis=0)       # remove NaN
        y_i_month = label_data(y_true_i_month)['Label']
        y_pred_i_month = y_pred_i_month[y_i_month.index].values
        y_score_i_month = y_score_i_month[y_i_month.index].values
        print('test set, month %d, accuracy = %.2f' % (i, me.accuracy_score(y_i_month, y_pred_i_month)))
        print('test set, month %d, AUC = %.2f' % (i, me.roc_auc_score(y_i_month, y_score_i_month)))
    ...
  • 56. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Strategy Implementation)
    n_stock = 15
    strategy = pd.DataFrame({'Return': [0]*para.month_test[-1],
                             'Value': [1]*para.month_test[-1]})
    for i in para.month_test:
        y_true_i_month = y_true_test.iloc[:, i-1]
        y_score_i_month = y_score_test.iloc[:, i-1]
        y_score_i_month = y_score_i_month.sort_values(ascending=False)     # sort scores in descending order
        i_index = y_score_i_month[0:n_stock].index                         # indices of the top 15 stocks
        strategy.loc[i-1, 'Return'] = np.mean(y_true_i_month[i_index])/100 # mean return of the 15 stocks
    strategy['Value'] = (strategy['Return'] + 1).cumprod()                 # compound the monthly returns into total value
  • 57. Mean Machine 3.4 Stock Selection - Machine Learning Approach (Strategy Assessment)
    strategy_value = strategy.reindex(index=para.month_test, columns=['Value'])
    strategy_return = strategy.reindex(index=para.month_test, columns=['Return'])
    plt.plot(para.month_test, strategy_value, 'r-')
    plt.show()

    excess_return = np.mean(strategy_return) * 12        # annualize the monthly mean
    excess_vol = np.std(strategy_return) * np.sqrt(12)   # annualize the monthly volatility
    IR = excess_return / excess_vol                      # information ratio
    print('annual excess return = %.2f' % excess_return)
    print('annual excess volatility = %.2f' % excess_vol)
    print('information ratio = %.2f' % IR)
  • 58. Mean Machine 3.5 Portfolio Construction
    [Decision tree: Are expected returns and variances known? Yes → MVO. No → any view on return? Yes → BL. No → any view on risk? Yes → RB. No → RP.
    Special-case reductions: MVO → GMV (same expected return), MVO → MDP (same Sharpe ratio), RB → RP (same risk budget), RP → EMV (zero correlation), EMV → EW (same volatility).]
    • MVO: Mean-Variance Optimization
    • MDP: Most Diversified Portfolio
    • GMV: Global Minimum Variance
    • EMV: Equal Marginal Volatility
    • EW: Equal Weight
    • RB: Risk Budgeting
    • RP: Risk Parity
    • BL: Black-Litterman
  • 60. THANK YOU STEVEN WANG SHENGYUAN Wechat Account: MeanMachine1031