Location:
#BostonFintechWeek
Babson College
Boston Campus
Data Science for Finance Crash Course
Day 3
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
Sri.krishnamurthy@qusandbox.com
www.analyticscertificate.com
Slides & Materials will be available at:
https://researchhub.qusandbox.com
MODULE 1:
• Data Science in Finance
Orientation on the Credit risk case study
Lab 1:
Exploring Data sets to make sense in Python
MODULE 2:
• Machine Learning in 30 minutes!
Lab 2:Credit risk case study
Building your first model
Agenda
MODULE 3:
• Evaluating machine learning models: The metrics
Lab 3:Credit risk case study
Understanding and tuning your model
MODULE 4:
• Deployment of machine learning models and Prediction through AP
Lab 4:Credit risk case study
Deploying your model and predicting interest rates
Agenda
Day 1: Data pre-processing & EDA
Day 2: Building a Machine Learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production
Recap
Types of algorithms
Machine learning
• Supervised Learning: Prediction, Classification
• Unsupervised Learning: Clustering
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖 in a data set, predict the value of another variable 𝑦
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
x1, x2, x3, … → Model F(X) → y
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering
Machine Learning
Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
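The bucketing idea above can be illustrated with a small k-means loop. This is only a sketch of one clustering method, not necessarily the one used in the labs; the function name `kmeans`, the naive initialization from the first k points, and the sample observations are all assumptions made for illustration.

```python
def kmeans(points, k, n_iter=20):
    """Tiny k-means sketch: returns (labels, centroids) for a list of tuples."""
    # Assumption: initialize centroids with the first k observations.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each observation goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical observations with two features each.
observations = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(observations, k=2)
print(labels, centroids)
```

No target variable y is supplied; the grouping comes only from how close the observations are to one another.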
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples : Linear Regression, Neural Networks
Supervised Learning models - Prediction
$Y = \beta_0 + \beta_1 X_1$
Figures: Linear Regression Model; Neural network Model
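To make "assume a functional form, then fit coefficients" concrete, here is a minimal sketch that fits the one-variable linear form by ordinary least squares. The function name and the noise-free sample data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimizing squared error for y ~ beta0 + beta1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Closed-form least-squares estimates for a single predictor.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Hypothetical data generated from y = 2 + 3x with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]
beta0, beta1 = fit_simple_linear_regression(x, y)
print(beta0, beta1)
```

Because the functional form is fixed in advance, the whole model is summarized by the two fitted coefficients.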
• Non-Parametric models
▫ No functional form assumed
• Examples : Random Forest
Supervised Learning models
https://commons.wikimedia.org/wiki/File:Random_forest_diagram_complete.png
Evaluating Machine learning algorithms
• Supervised - Prediction: R-square, RMSE, MAE, MAPE
• Supervised - Classification: Confusion Matrix, ROC Curves
Evaluation framework
• The prediction error for record i is defined as the difference
between its actual y value and its predicted y value
$e_i = y_i - \hat{y}_i$
• $R^2$ indicates how well the statistical model fits the data
$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
where $\bar{y}$ is the mean of the actual y values
Prediction Accuracy Measures
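The $R^2$ definition can be computed directly from the residuals. A minimal sketch; the function name and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, following the definition above."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
r2 = r_squared(y_true, y_pred)
print(r2)
```

A model that always predicts the mean of y scores 0, and a perfect model scores 1.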
• Fit measures in classical regression modeling:
• Adjusted $R^2$ has been adjusted for the number of predictors. It increases only
when the improvement in the model is more than one would expect to see by chance
(p is the total number of explanatory variables)
$\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$
• MAE or MAD (mean absolute error/deviation) gives the magnitude of the
average absolute error
$MAE = \frac{\sum_{i=1}^{n} |e_i|}{n}$
Prediction Accuracy Measures
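Both measures on this slide are short to compute by hand. A minimal sketch, assuming p counts the explanatory variables (excluding the intercept); the function names and sample values are hypothetical.

```python
def adjusted_r_squared(y_true, y_pred, p):
    """Adjusted R^2 with n observations and p explanatory variables."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def mean_absolute_error(y_true, y_pred):
    """MAE: average of the absolute prediction errors |e_i|."""
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values from a one-predictor model.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
adj_r2 = adjusted_r_squared(y_true, y_pred, p=1)
mae = mean_absolute_error(y_true, y_pred)
print(adj_r2, mae)
```

The formula requires n > p + 1, otherwise the residual degrees of freedom in the numerator are zero or negative.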
• MAPE (mean absolute percentage error) gives a percentage score of
how predictions deviate on average
$MAPE = \frac{\sum_{i=1}^{n} |e_i / y_i|}{n} \times 100\%$
• RMSE (root-mean-squared error) is computed on the training and
validation data
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$
Prediction Accuracy Measures
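MAPE and RMSE follow the same pattern as the other error measures. A minimal sketch; the function names and sample values are hypothetical, and MAPE is undefined whenever an actual value is zero.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires nonzero y_true."""
    n = len(y_true)
    return 100.0 * sum(abs((yt - yp) / yt) for yt, yp in zip(y_true, y_pred)) / n


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)


# Hypothetical actual and predicted values.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
mape_value = mape(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(mape_value, rmse_value)
```

Running the same two functions on the training set and on the validation set gives the pair of numbers the slide asks you to compare.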
• Consider a two-class case with classes $C_0$ and $C_1$
• Classification matrix:
Classification matrix
| Actual Class | Predicted Class $C_0$ | Predicted Class $C_1$ |
| $C_0$ | $n_{0,0}$ = number of $C_0$ cases classified correctly | $n_{0,1}$ = number of $C_0$ cases classified incorrectly as $C_1$ |
| $C_1$ | $n_{1,0}$ = number of $C_1$ cases classified incorrectly as $C_0$ | $n_{1,1}$ = number of $C_1$ cases classified correctly |
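The four counts in the classification matrix can be tallied directly from actual and predicted labels. A minimal sketch, assuming the classes are encoded as the integers 0 and 1; the function name and the sample labels are hypothetical.

```python
def classification_matrix(actual, predicted):
    """Return counts[i][j] = number of class-i cases predicted as class j."""
    counts = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


# Hypothetical labels: rows are actual classes, columns are predicted classes.
actual = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 0, 1, 1, 0]
counts = classification_matrix(actual, predicted)
print(counts)
```

The diagonal entries hold the correctly classified cases, and the off-diagonal entries hold the two kinds of misclassification.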
• Estimated misclassification rate (overall error rate) is a main
accuracy measure
$err = \frac{n_{0,1} + n_{1,0}}{n_{0,0} + n_{0,1} + n_{1,0} + n_{1,1}} = \frac{n_{0,1} + n_{1,0}}{n}$
• Overall accuracy:
$Accuracy = 1 - err = \frac{n_{0,0} + n_{1,1}}{n}$
Accuracy Measures
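As a worked example of the two formulas, the sketch below computes the error rate and accuracy from the four classification-matrix counts. The function names and the counts are hypothetical.

```python
def error_rate(n00, n01, n10, n11):
    """Overall misclassification rate: off-diagonal counts over all cases."""
    n = n00 + n01 + n10 + n11
    return (n01 + n10) / n


def accuracy(n00, n01, n10, n11):
    """Overall accuracy: diagonal counts over all cases."""
    n = n00 + n01 + n10 + n11
    return (n00 + n11) / n


# Hypothetical counts for 100 cases.
err = error_rate(n00=50, n01=10, n10=5, n11=35)
acc = accuracy(n00=50, n01=10, n10=5, n11=35)
print(err, acc)
```

With these counts, 15 of the 100 cases are misclassified, so the two values sum to 1 as the formula requires.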
1. Choose the metrics that make sense for the application
2. Evaluate the metrics for both training and testing datasets
3. Check if your model is overfitting or underfitting
4. Monitor the model over time
5. Your best model may not be the best model
Things to remember when choosing metrics
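Points 2 and 3 can be turned into a rough check once an error metric has been computed on both datasets. The sketch below is a heuristic, not a rule from the slides: the function name, the baseline comparison, and the `gap_ratio` threshold of 1.5 are all illustrative assumptions.

```python
def diagnose_fit(train_error, test_error, baseline_error, gap_ratio=1.5):
    """Rough heuristic comparing training and testing error (lower is better).

    baseline_error is the error of a trivial model, e.g. always predicting
    the mean. gap_ratio is an arbitrary illustrative threshold.
    """
    if train_error >= baseline_error:
        # The model does no better than the trivial baseline even on training data.
        return "possible underfitting"
    if test_error > gap_ratio * train_error:
        # Much worse on unseen data than on the data it was fitted to.
        return "possible overfitting"
    return "no obvious problem"


# Hypothetical RMSE values.
print(diagnose_fit(train_error=1.0, test_error=3.0, baseline_error=5.0))
print(diagnose_fit(train_error=5.0, test_error=5.2, baseline_error=5.0))
print(diagnose_fit(train_error=1.0, test_error=1.1, baseline_error=5.0))
```

A sensible threshold depends on the application and the metric, so the default here should be treated as a placeholder.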
The Process
Data cleansing → Feature Engineering → Training and Testing → Model building → Model selection
• What transformations do I need for the x and y variables ?
• Which are the best features to use?
▫ Dimension Reduction – PCA
▫ Best subset selection
▪ Forward selection
▪ Backward elimination
▪ Stepwise regression
See:
http://scikit-learn.org/stable/modules/feature_selection.html
Feature Engineering for Regression models
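Two of the options above, PCA and forward selection, are available in scikit-learn. The sketch below runs both on synthetic data; the dataset, the choice of three components or features, and the use of LinearRegression as the scoring model are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical data: 8 candidate features, only 3 of which carry signal.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

# Dimension reduction: project the 8 features onto 3 principal components.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

# Forward selection: start with no features and add one at a time,
# keeping the feature that most improves the cross-validated score.
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=3,
                                     direction="forward", cv=5)
selector.fit(X, y)
selected = selector.get_support(indices=True)
print(X_reduced.shape, selected)
```

Passing `direction="backward"` to the same class performs backward elimination instead. PCA builds new combined features, whereas the selector keeps a subset of the original columns.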
• Number of features considered at each split: max_features
• Number of trees: n_estimators
• Min number of data elements in a leaf: min_samples_leaf
• Number of processors to use: n_jobs
See: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
Fine tuning Random forest models
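One common way to tune these settings is a cross-validated grid search. The sketch below is an illustration on synthetic data, not the lab's procedure; the candidate values in `param_grid`, the 3-fold split, and the scoring choice are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical regression data standing in for the case-study dataset.
X, y = make_regression(n_samples=150, n_features=6, noise=10.0, random_state=0)

# Candidate values for the tuning knobs listed above (illustrative only).
param_grid = {
    "n_estimators": [20, 50],
    "max_features": ["sqrt", 1.0],
    "min_samples_leaf": [1, 5],
}

# n_jobs=1 keeps each forest on a single processor.
search = GridSearchCV(
    RandomForestRegressor(random_state=0, n_jobs=1),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

The search fits one forest per parameter combination and fold, so the grid should stay small on large datasets.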
• Parameters
▫ Number of layers, nodes in each layer etc.
• Hyper parameters
▫ Learning rate
▫ Optimization algorithms
▫ Regularization
▫ Activation function
Finetuning Neural Network Models
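As one possible mapping, the settings listed above correspond to constructor arguments of scikit-learn's MLPRegressor. The sketch below uses synthetic data, and every value shown is an arbitrary starting point rather than a recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical data; inputs are standardized because neural networks
# are sensitive to feature scale.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

model = MLPRegressor(
    hidden_layer_sizes=(16, 8),   # number of layers and nodes in each layer
    activation="relu",            # activation function
    solver="adam",                # optimization algorithm
    alpha=1e-4,                   # L2 regularization strength
    learning_rate_init=1e-3,      # learning rate
    max_iter=500,
    random_state=0,
)
model.fit(X, y)
print([w.shape for w in model.coefs_])
```

Each of these arguments can also be placed in a parameter grid and searched with cross-validation, in the same way as the random forest settings.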
Day 1: Data pre-processing & EDA
Day 2: Building a Machine Learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production
Recap
Thank you for attending Day 3!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.