3. MODULE 1:
• Data Science in Finance
Orientation on the credit risk case study
Lab 1: Exploring and making sense of data sets in Python
MODULE 2:
• Machine Learning in 30 minutes!
Lab 2: Credit risk case study – Building your first model
Agenda
4. MODULE 3:
• Evaluating machine learning models: The metrics
Lab 3: Credit risk case study – Understanding and tuning your model
MODULE 4:
• Deployment of machine learning models and prediction through APIs
Lab 4: Credit risk case study – Deploying your model and predicting interest rates
Agenda
5. Recap
Day 1: Data pre-processing & EDA
Day 2: Building a Machine Learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production
8.
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a given data set
▫ If 𝑦 is numeric => Prediction
▫ If 𝑦 is categorical => Classification
Machine Learning
x1, x2, x3, … → Model F(X) → y
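The x → F(X) → y mapping above can be sketched with scikit-learn (assumed here since the deck links to it later; toy data, not the case-study data set). The same features support either prediction or classification depending on the type of 𝑦:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4]])          # features x1, x2, ... (one column here)
y_numeric = np.array([2.0, 4.1, 5.9, 8.2])  # numeric y  => prediction (regression)
y_class = np.array([0, 0, 1, 1])            # categorical y => classification

reg = LinearRegression().fit(X, y_numeric)   # model F(X) for numeric y
clf = LogisticRegression().fit(X, y_class)   # model F(X) for categorical y

print(reg.predict([[5]]))  # a predicted numeric value
print(clf.predict([[5]]))  # a predicted class label
```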
9.
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering
Machine Learning
Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
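A minimal clustering sketch, assuming scikit-learn's KMeans (toy observations; no labels are supplied, the model discovers the buckets):

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one observation (Obs1, Obs2, ...); note there is no y
obs = np.array([[1.0, 1.1], [1.2, 0.9], [8.0, 8.2], [7.9, 8.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(obs)
print(km.labels_)  # bucket assignment for each observation
```

The first two observations land in one bucket and the last two in another, purely from their similarity.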
10.
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples: Linear Regression, Neural Networks
Supervised Learning models - Prediction
𝑌 = 𝛽0 + 𝛽1 𝑋1
[Figures: Linear Regression model and Neural Network model]
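Fitting the coefficients of the assumed form 𝑌 = 𝛽0 + 𝛽1𝑋1 can be sketched as follows (scikit-learn assumed; synthetic data with known true coefficients so the fit can be checked):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, 100)  # true beta0 = 3, beta1 = 2

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])  # fitted beta0 and beta1, close to 3 and 2
```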
11.
• Non-Parametric models
▫ No functional form assumed
• Examples: Random Forest
Supervised Learning models
https://commons.wikimedia.org/wiki/File:Random_forest_diagram_complete.png
15.
• The prediction error for record i is defined as the difference between its actual y value and its predicted y value:
$e_i = y_i - \hat{y}_i$
• $R^2$ indicates how well the data fit the statistical model:
$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
Prediction Accuracy Measures
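The $R^2$ definition above can be verified numerically against scikit-learn's `r2_score` (toy numbers, assumed for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])      # actual values
y_hat = np.array([2.8, 5.1, 7.3, 8.9])  # predicted values

ss_res = np.sum((y - y_hat) ** 2)       # sum of squared errors e_i
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares around y-bar
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y, y_hat))  # the two values agree
```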
16.
• Fit measures in classical regression modeling:
• Adjusted $R^2$ is adjusted for the number of predictors. It increases only when the improvement in the model is more than one would expect to see by chance (p is the total number of explanatory variables):
$\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$
• MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error:
$MAE = \frac{\sum_{i=1}^{n} |e_i|}{n}$
Prediction Accuracy Measures
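A quick numeric check of both measures, assuming NumPy and scikit-learn (toy numbers with n = 6 observations and p = 1 predictor). Note that the adjusted value is slightly below the plain $R^2$, as expected:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_hat = np.array([2.8, 5.1, 7.3, 8.9, 11.2, 12.7])
n, p = len(y), 1  # n observations, p explanatory variables

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))  # penalized for p

mae = np.mean(np.abs(y - y_hat))
print(r2, adj_r2, mae, mean_absolute_error(y, y_hat))
```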
17.
▫ MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average:
$MAPE = \frac{\sum_{i=1}^{n} |e_i / y_i|}{n} \times 100\%$
• RMSE (root-mean-squared error) is computed on the training and validation data:
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$
Prediction Accuracy Measures
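Both formulas can be computed directly with NumPy (toy numbers, assumed for illustration):

```python
import numpy as np

y = np.array([100.0, 200.0, 400.0])      # actual values
y_hat = np.array([110.0, 190.0, 380.0])  # predicted values
e = y - y_hat                            # prediction errors e_i

mape = np.mean(np.abs(e / y)) * 100      # average percentage deviation
rmse = np.sqrt(np.mean(e ** 2))          # root of the mean squared error
print(mape, rmse)
```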
18.
• Consider a two-class case with classes $C_0$ and $C_1$
• Classification matrix (rows: actual class, columns: predicted class):

Actual \ Predicted | $C_0$                                                    | $C_1$
$C_0$              | $n_{0,0}$ = number of $C_0$ cases classified correctly   | $n_{0,1}$ = number of $C_0$ cases classified incorrectly as $C_1$
$C_1$              | $n_{1,0}$ = number of $C_1$ cases classified incorrectly as $C_0$ | $n_{1,1}$ = number of $C_1$ cases classified correctly

Classification matrix
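scikit-learn produces this matrix directly (toy labels, assumed for illustration; rows are actual classes, columns are predicted classes, matching the table above):

```python
from sklearn.metrics import confusion_matrix

y_actual = [0, 0, 0, 1, 1, 1]
y_pred   = [0, 0, 1, 1, 1, 0]

cm = confusion_matrix(y_actual, y_pred)
print(cm)  # [[n00, n01], [n10, n11]]
```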
20.
1. Choose the metrics that make sense for the application
2. Evaluate the metrics for both training and testing datasets
3. Check whether your model is overfitting or underfitting
4. Monitor the model over time
5. The model with the best metrics may not be the best model for the application
Things to remember when choosing metrics
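Steps 2 and 3 above can be sketched by comparing training and testing scores; a large gap signals overfitting. A minimal sketch with scikit-learn (synthetic data; an unconstrained decision tree is used because it overfits visibly):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(0, 1.0, 200)  # linear signal plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A fully grown tree memorizes the training data
deep = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))  # train >> test => overfitting
```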
23.
• What transformations do I need for the x and y variables?
• Which are the best features to use?
▫ Dimension Reduction – PCA
▫ Best subset selection
Forward selection
Backward elimination
Stepwise regression
See: http://scikit-learn.org/stable/modules/feature_selection.html
Feature Engineering for Regression models
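One concrete selection technique from the linked scikit-learn module is recursive feature elimination (RFE), which works like backward elimination: repeatedly drop the weakest feature. A sketch on synthetic data where only two of five features carry signal:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(0, 0.1, 200)  # only features 0 and 2 matter

selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
print(selector.support_)  # boolean mask of the selected features
```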
24.
• Number of features considered at each split: max_features
• Number of trees: n_estimators
• Minimum number of data elements in a leaf: min_samples_leaf
• Number of processors to use: n_jobs
See: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
Fine tuning Random forest models
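The four knobs above map directly onto `RandomForestRegressor` constructor arguments. A sketch with synthetic data (the specific values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 300)

rf = RandomForestRegressor(
    n_estimators=100,     # number of trees
    max_features=2,       # features considered at each split
    min_samples_leaf=3,   # minimum number of data elements in a leaf
    n_jobs=-1,            # use all available processors
    random_state=0,
).fit(X, y)
print(rf.score(X, y))
```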
25.
• Parameters
▫ Number of layers, nodes in each layer, etc.
• Hyperparameters
▫ Learning rate
▫ Optimization algorithms
▫ Regularization
▫ Activation function
Fine-tuning Neural Network Models
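Each item above has a counterpart in scikit-learn's `MLPRegressor` (assumed here as one possible implementation; the values are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = X[:, 0] ** 2 + X[:, 1]  # a simple nonlinear target

mlp = MLPRegressor(
    hidden_layer_sizes=(32, 16),  # number of layers and nodes in each layer
    learning_rate_init=0.01,      # learning rate
    solver="adam",                # optimization algorithm
    alpha=1e-4,                   # L2 regularization strength
    activation="relu",            # activation function
    max_iter=2000,
    random_state=0,
).fit(X, y)
print(mlp.score(X, y))
```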
26. Recap
Day 1: Data pre-processing & EDA
Day 2: Building a Machine Learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production
27. Thank you for attending Day 3!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.