3. MODULE 1:
• Data Science in Finance
Orientation on the credit risk case study
Lab 1: Exploring and making sense of data sets in Python
MODULE 2:
• Machine Learning in 30 minutes!
Lab 2: Credit risk case study – Building your first model
Agenda
4. MODULE 3:
• Evaluating machine learning models: The metrics
Lab 3: Credit risk case study – Understanding and tuning your model
MODULE 4:
• Deployment of machine learning models and prediction through APIs
Lab 4: Credit risk case study – Deploying your model and predicting interest rates
Agenda
5. Recap
Day 1: Data pre-processing & EDA
Day 2: Building a Machine Learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production
8.
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a given data set
▫ If 𝑦 is numeric => Prediction
▫ If 𝑦 is categorical => Classification
Machine Learning
x1, x2, x3, … → Model F(X) → y
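The x → F(X) → y mapping above can be sketched with scikit-learn (assumed here since the deck links to it later; toy data, not the case-study data set). The same features support either prediction or classification depending on the type of 𝑦:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4]])          # features x1, x2, ... (one column here)
y_numeric = np.array([2.0, 4.1, 5.9, 8.2])  # numeric y  => prediction (regression)
y_class = np.array([0, 0, 1, 1])            # categorical y => classification

reg = LinearRegression().fit(X, y_numeric)   # model F(X) for numeric y
clf = LogisticRegression().fit(X, y_class)   # model F(X) for categorical y

print(reg.predict([[5]]))  # a predicted numeric value
print(clf.predict([[5]]))  # a predicted class label
```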
9.
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering
Machine Learning
Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
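A minimal clustering sketch, assuming scikit-learn's KMeans (toy observations; no labels are supplied, the model discovers the buckets):

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one observation (Obs1, Obs2, ...); note there is no y
obs = np.array([[1.0, 1.1], [1.2, 0.9], [8.0, 8.2], [7.9, 8.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(obs)
print(km.labels_)  # bucket assignment for each observation
```

The first two observations land in one bucket and the last two in another, purely from their similarity.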
10.
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples: Linear Regression, Neural Networks
Supervised Learning models - Prediction
𝑌 = 𝛽0 + 𝛽1 𝑋1
[Figures: Linear Regression model and Neural Network model]
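Fitting the coefficients of the assumed form 𝑌 = 𝛽0 + 𝛽1𝑋1 can be sketched as follows (scikit-learn assumed; synthetic data with known true coefficients so the fit can be checked):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, 100)  # true beta0 = 3, beta1 = 2

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])  # fitted beta0 and beta1, close to 3 and 2
```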
11.
• Non-Parametric models
▫ No functional form assumed
• Examples: Random Forest
Supervised Learning models
https://commons.wikimedia.org/wiki/File:Random_forest_diagram_complete.png
15.
• The prediction error for record i is defined as the difference between its actual y value and its predicted y value:
$e_i = y_i - \hat{y}_i$
• $R^2$ indicates how well the data fit the statistical model:
$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
Prediction Accuracy Measures
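The $R^2$ definition above can be verified numerically against scikit-learn's `r2_score` (toy numbers, assumed for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])      # actual values
y_hat = np.array([2.8, 5.1, 7.3, 8.9])  # predicted values

ss_res = np.sum((y - y_hat) ** 2)       # sum of squared errors e_i
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares around y-bar
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y, y_hat))  # the two values agree
```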
16.
• Fit measures in classical regression modeling:
• Adjusted $R^2$ is adjusted for the number of predictors. It increases only when the improvement in the model is more than one would expect to see by chance (p is the total number of explanatory variables):
$\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$
• MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error:
$MAE = \frac{\sum_{i=1}^{n} |e_i|}{n}$
Prediction Accuracy Measures
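A quick numeric check of both measures, assuming NumPy and scikit-learn (toy numbers with n = 6 observations and p = 1 predictor). Note that the adjusted value is slightly below the plain $R^2$, as expected:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_hat = np.array([2.8, 5.1, 7.3, 8.9, 11.2, 12.7])
n, p = len(y), 1  # n observations, p explanatory variables

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))  # penalized for p

mae = np.mean(np.abs(y - y_hat))
print(r2, adj_r2, mae, mean_absolute_error(y, y_hat))
```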
17.
▫ MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average:
$MAPE = \frac{\sum_{i=1}^{n} |e_i / y_i|}{n} \times 100\%$
• RMSE (root-mean-squared error) is computed on the training and validation data:
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$
Prediction Accuracy Measures
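Both formulas can be computed directly with NumPy (toy numbers, assumed for illustration):

```python
import numpy as np

y = np.array([100.0, 200.0, 400.0])      # actual values
y_hat = np.array([110.0, 190.0, 380.0])  # predicted values
e = y - y_hat                            # prediction errors e_i

mape = np.mean(np.abs(e / y)) * 100      # average percentage deviation
rmse = np.sqrt(np.mean(e ** 2))          # root of the mean squared error
print(mape, rmse)
```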
18.
• Consider a two-class case with classes $C_0$ and $C_1$
• Classification matrix (rows: actual class, columns: predicted class):

Actual \ Predicted | $C_0$                                                    | $C_1$
$C_0$              | $n_{0,0}$ = number of $C_0$ cases classified correctly   | $n_{0,1}$ = number of $C_0$ cases classified incorrectly as $C_1$
$C_1$              | $n_{1,0}$ = number of $C_1$ cases classified incorrectly as $C_0$ | $n_{1,1}$ = number of $C_1$ cases classified correctly

Classification matrix
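scikit-learn produces this matrix directly (toy labels, assumed for illustration; rows are actual classes, columns are predicted classes, matching the table above):

```python
from sklearn.metrics import confusion_matrix

y_actual = [0, 0, 0, 1, 1, 1]
y_pred   = [0, 0, 1, 1, 1, 0]

cm = confusion_matrix(y_actual, y_pred)
print(cm)  # [[n00, n01], [n10, n11]]
```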
20.
1. Choose the metrics that make sense for the application
2. Evaluate the metrics for both training and testing datasets
3. Check whether your model is overfitting or underfitting
4. Monitor the model over time
5. The model with the best metrics may not be the best model for the application
Things to remember when choosing metrics
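Steps 2 and 3 above can be sketched by comparing training and testing scores; a large gap signals overfitting. A minimal sketch with scikit-learn (synthetic data; an unconstrained decision tree is used because it overfits visibly):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(0, 1.0, 200)  # linear signal plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A fully grown tree memorizes the training data
deep = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))  # train >> test => overfitting
```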
23.
• What transformations do I need for the x and y variables?
• Which are the best features to use?
▫ Dimension Reduction – PCA
▫ Best subset selection
Forward selection
Backward elimination
Stepwise regression
See: http://scikit-learn.org/stable/modules/feature_selection.html
Feature Engineering for Regression models
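One concrete selection technique from the linked scikit-learn module is recursive feature elimination (RFE), which works like backward elimination: repeatedly drop the weakest feature. A sketch on synthetic data where only two of five features carry signal:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(0, 0.1, 200)  # only features 0 and 2 matter

selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
print(selector.support_)  # boolean mask of the selected features
```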
24.
• Number of features considered at each split: max_features
• Number of trees: n_estimators
• Minimum number of data elements in a leaf: min_samples_leaf
• Number of processors to use: n_jobs
See: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
Fine tuning Random forest models
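The four knobs above map directly onto `RandomForestRegressor` constructor arguments. A sketch with synthetic data (the specific values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 300)

rf = RandomForestRegressor(
    n_estimators=100,     # number of trees
    max_features=2,       # features considered at each split
    min_samples_leaf=3,   # minimum number of data elements in a leaf
    n_jobs=-1,            # use all available processors
    random_state=0,
).fit(X, y)
print(rf.score(X, y))
```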
25.
• Parameters
▫ Number of layers, nodes in each layer, etc.
• Hyperparameters
▫ Learning rate
▫ Optimization algorithms
▫ Regularization
▫ Activation function
Fine-tuning Neural Network Models
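Each item above has a counterpart in scikit-learn's `MLPRegressor` (assumed here as one possible implementation; the values are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = X[:, 0] ** 2 + X[:, 1]  # a simple nonlinear target

mlp = MLPRegressor(
    hidden_layer_sizes=(32, 16),  # number of layers and nodes in each layer
    learning_rate_init=0.01,      # learning rate
    solver="adam",                # optimization algorithm
    alpha=1e-4,                   # L2 regularization strength
    activation="relu",            # activation function
    max_iter=2000,
    random_state=0,
).fit(X, y)
print(mlp.score(X, y))
```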
26. Recap
Day 1: Data pre-processing & EDA
Day 2: Building a Machine Learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production
27. Thank you for attending Day 3!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.