1. Machine Learning Application:
Credit Scoring
Programming Techniques
Professor Carlos Costa
Master in Mathematical Finance
Federico Innocenti 53251
Miguel Albergaria 48547
Claudio Napoli 53358
Iacopo Fiorentino 53315
Lisbon, December 11th, 2019
2. Context
► The data is collected from Thomson Reuters and covers firms
included in the main stock indexes.
► The goal is to assign each company a score that decides
whether or not to grant a loan to that firm, based on the
client’s probability of default.
► To do that we compute many financial ratios and, in the end,
aim to “differentiate winners from losers”.
3. Data preparation
► Importing the data, checking the data types and
removing missing values;
► Computing the correlation matrix;
► Inspecting how the data is distributed through graphs;
► Trimming the data by removing very low and
very high values, i.e., outliers.
► After all of that, we recomputed the correlation matrix
and the graphs to compare them with the originals and get
a better view of our results.
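The preparation steps above can be sketched in pandas. This is a minimal illustration, assuming the Thomson Reuters extract is loaded into a DataFrame; the column names, the tiny sample values, and the 5th–95th percentile trimming band are all assumptions, not the project's actual choices.

```python
import pandas as pd

# Illustrative stand-in for the Thomson Reuters extract (column names assumed)
df = pd.DataFrame({
    "current_ratio": [1.2, 0.9, None, 2.5, 40.0, 1.1],
    "debt_ratio":    [0.5, 0.7, 0.6,  0.4, 0.5,  3.0],
})

# 1) Check types and remove rows with missing values
df = df.dropna()

# 2) Correlation matrix of the remaining ratios
corr = df.corr()

# 3) Trim outliers: keep only rows inside an assumed 5th-95th percentile band
low, high = df.quantile(0.05), df.quantile(0.95)
trimmed = df[((df >= low) & (df <= high)).all(axis=1)]

# 4) Recompute the correlation matrix on the cleaned data for comparison
corr_after = trimmed.corr()
```

After step 3 the same plots and correlation matrix can be regenerated on `trimmed` and compared with the originals, as the slide describes.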
4. Modelling data
► Our data doesn’t include a probability of default, so we need to create one.
► To build the machine learning approach we use:
► Supervised learning: logistic regression and random forest
► Unsupervised learning: K-means clustering
► We decided to use a financial scorecard in order to assign a score to
each of the different ratios.
5. Setting the score
► Relevant ratios: current ratio, debt ratio, equity to asset ratio, debt to
equity ratio, return on asset, return on equity, long term coverage ratio and
asset turnover ratio.
6. ► A company’s goal is to obtain the highest score, computed in the
way shown before. An example of the code is shown here:
► The final score is set by adding up all of the “ratios’ scores”.
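A financial scorecard of this kind can be sketched as follows: each ratio is mapped to points through cut-off bands, and the final score is the sum of the per-ratio scores. The bands, point values, and the `score_ratio`/`total_score` helpers are illustrative assumptions, not the project's actual scorecard.

```python
def score_ratio(value, cutoffs, points):
    """Return the points for the first band whose cut-off the value stays below."""
    for cut, pts in zip(cutoffs, points):
        if value < cut:
            return pts
    return points[-1]  # value above the last cut-off gets the top band

def total_score(ratios):
    # Illustrative bands: a higher current ratio scores better,
    # a higher debt ratio scores worse.
    rules = {
        "current_ratio": ([1.0, 2.0], [20, 60, 100]),
        "debt_ratio":    ([0.4, 0.8], [100, 60, 20]),
    }
    # Final score = sum of all the ratios' scores, as the slide states
    return sum(score_ratio(ratios[name], *rule) for name, rule in rules.items())

firm = {"current_ratio": 1.5, "debt_ratio": 0.3}
print(total_score(firm))  # 60 + 100 = 160
```

In a full scorecard the `rules` table would cover all eight ratios listed on slide 5, with bands calibrated on the cleaned data.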
7. Evaluation
► To evaluate our models we compute a confusion matrix, in order to
see the results and have an easy first metric for comparing the three
models.
► After setting the score we binarize it, with 1 marking the lowest probability
of default and 0 the highest. We chose a threshold of 500 points and
then proceed to the evaluation.
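The binarization and confusion-matrix step can be sketched like this, using scikit-learn. The scores and true labels below are illustrative placeholders; only the 500-point threshold and the 1 = lowest-default-probability convention come from the slide.

```python
from sklearn.metrics import confusion_matrix

scores = [620, 480, 530, 300, 710, 450]   # illustrative scorecard outputs
y_true = [1,   0,   1,   0,   1,   1]     # illustrative true labels

# Binarize at the 500-point threshold: 1 = lowest probability of default
y_pred = [1 if s >= 500 else 0 for s in scores]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The off-diagonal cells of `cm` are the type I and type II errors discussed on the following slides.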
8. Logistic Regression
► We leave the logistic regression
settings in default mode, with
a test size of 0.7.
► The final result is good, with an
AUC of 0.75, which means the
model is good at distinguishing
the given classes.
► But there is a problem!
► The model makes type II errors. In
other words, it predicts 1 when the
true class is actually 0.
► So the F1 score (the harmonic mean
of precision and recall) is 0.68.
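The logistic-regression experiment described above can be sketched as follows: default scikit-learn settings and a 0.7 test split, then AUC and F1 on the held-out data. The synthetic dataset is a stand-in; the slide's 0.75 AUC and 0.68 F1 come from the real ratios, not this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

# Synthetic stand-in for the firms' ratios and binarized scores
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.7, random_state=0)

# Default-mode logistic regression, as the slide describes
model = LogisticRegression().fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
f1 = f1_score(y_te, model.predict(X_te))
```

AUC is computed on the predicted probabilities, while F1 is computed on the hard 0/1 predictions, which is why a model can score well on one and worse on the other, as the slide reports.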
9. Random Forest
► In order to optimize the process we set the number of jobs (n_jobs) to 150 and the
number of estimators to 1, since it is a binomial classification.
10. ► This model achieved a really high AUC of 0.87 and a good F1-score.
► High precision and high recall mean low probabilities of type I and type II errors.
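The random-forest setup can be sketched as below, using the parameters stated on the slide (`n_jobs=150`, `n_estimators=1`). The synthetic data and the 0.3 test split are assumptions; the slide does not state the split used here. Note that in scikit-learn `n_estimators` is the number of trees in the forest, so a value of 1 grows a single tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

# Synthetic stand-in for the firms' ratios and binarized scores
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Parameters as stated on the slide; n_jobs controls parallelism,
# n_estimators the number of trees
rf = RandomForestClassifier(n_estimators=1, n_jobs=150,
                            random_state=0).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
f1 = f1_score(y_te, rf.predict(X_te))
```

Larger `n_estimators` values (typically 100 or more) usually improve stability, since the forest averages over more trees.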
11. K-Means
► We increased the number of iterations to 400 in order to optimize this
model and to try to get more stable results.
► The main problem with the K-means clustering model is that it suffers from low
precision when predicting the default cases (type I errors).
► On the other hand, it has an acceptable F1-score and an AUC of 0.80.
12. Conclusions
► The standardization of the ratios and the cleaning of the data give all
three models a high AUC.
► The best model is the Random Forest, which achieves the highest AUC.
► We confirm that machine learning algorithms are really powerful at analysing
data and can be helpful for solving this specific problem.