Home Credit Default Risk
Can you predict how capable
each applicant is of repaying
a loan?
Cha ho seong
CONTENTS
01 About Home Credit
02 Data
03 Data analysis
04 Model evaluation
05 Model improvements
06 Conclusion
About Home Credit
01
01 About Home Credit
Founded in 1997,
Home Credit Group is an international consumer finance provider with operations in 11 countries.
They focus on responsible lending primarily to people with little or no credit history.
Their services are simple, easy and fast.
Data
02
02 Dataset
• The data include categorical, numeric, and character variables.
• There are many variables.
02 Label Data
A class imbalance problem exists!
02 EDA - integer type
Many columns contain extreme values.
02 EDA - integer type
• HOUR_APPR_PROCESS_START and TARGET
The not-repaid rate seems to be high for loans processed in the morning hours.
• CNT_CHILDREN and TARGET
As CNT_CHILDREN increases, the not-repaid rate tends to increase.
02 EDA – character type
• Economic variables
• FLAG_OWN_CAR and TARGET
• FLAG_OWN_REALTY and TARGET
Ownership of a car seems to have more impact on the repayment rate than property ownership.
• NAME_INCOME_TYPE and TARGET
- Maternity leave
- Unemployed
02 EDA – character type
• Demographic variables
• CODE_GENDER and TARGET
• NAME_FAMILY_STATUS and TARGET
• NAME_TYPE_SUITE and TARGET
02 EDA – character type
• Social status variables
• NAME_EDUCATION_TYPE and TARGET
• OCCUPATION_TYPE and TARGET
• etc.
• EMERGENCYSTATE_MODE and TARGET
• NAME_CONTRACT_TYPE and TARGET
For cash loans, the repaid rate is much higher.
02 EDA – numeric type
• Correlation between variables
Some variables are highly correlated.
02 NULL values
> mean(is.na(application))
[1] 0.2454355
About 24.5% of all cells in the application table are null.
Some columns have more than 50% null values.
Too many null values
We assume that variables with a large share of null values represent the data poorly,
so we decided to exclude those variables.
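The exclusion rule above can be sketched in a few lines. This is a minimal Python illustration, not the R code used for the slides; the toy values are made up, the column names are borrowed from later slides only as labels, and the helper names `null_fraction` and `drop_sparse_columns` are hypothetical.

```python
# Toy table (made-up values): column name -> list of entries, None marks a null.
table = {
    "AMT_CREDIT": [1000.0, 2500.0, 1800.0, 900.0],
    "FLOORSMAX_AVG": [None, 0.3, None, None],
    "CNT_CHILDREN": [0, None, 2, 1],
}

def null_fraction(values):
    """Share of entries in one column that are null (None)."""
    return sum(v is None for v in values) / len(values)

def drop_sparse_columns(columns, threshold=0.5):
    """Keep only the columns whose null share is not greater than the threshold."""
    return {name: vals for name, vals in columns.items()
            if null_fraction(vals) <= threshold}

fractions = {name: null_fraction(vals) for name, vals in table.items()}
kept = drop_sparse_columns(table)
print(fractions)
print(sorted(kept))
```

The 0.5 threshold mirrors the "greater than 50%" rule on the slide; any other cutoff would be a modelling choice.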
Data analysis
03
03 Data analysis
Which model would be appropriate for credit default prediction?
Binary logistic regression
03 Data analysis
Why binary logistic regression?
• It places few constraints on the explanatory variables
• Credit default prediction is a classification problem
03 Data analysis
1. Split the data into train (60%), validation (20%), and test (20%) sets
2. Model selection
Model 1: all variables used
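A 60/20/20 split like the one in step 1 could look as follows. This is a minimal Python sketch under the assumption of a simple random shuffle; the slides do not say how the rows were assigned, and `split_rows` and its parameters are hypothetical names.

```python
import random

def split_rows(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle the rows and cut them into train / validation / test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

rows = list(range(100))  # stand-in for 100 loan applications
train_set, valid_set, test_set = split_rows(rows)
print(len(train_set), len(valid_set), len(test_set))
```

Whatever remains after the train and validation cuts goes to the test set, so no row is lost or used twice.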
03 Data analysis
Multicollinearity is suspected.
Model evaluation
04
04 Model evaluation
Prediction
• Confusion matrix
Accuracy : 0.9248
The estimated accuracy is high!
But since there is a class imbalance problem,
it is necessary to check evaluation metrics
other than accuracy.
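A quick calculation shows why accuracy alone is misleading here. The sketch below is Python rather than the R used for the slides, and it uses the training class counts that the deck reports in its SMOTE slide (12521 repaid, 950 not repaid). The "always predict repaid" classifier is a hypothetical baseline, not one of the deck's models.

```python
# Class counts from the table(train$TARGET) output shown in the deck.
n_repaid = 12521      # TARGET = 0
n_not_repaid = 950    # TARGET = 1

# A trivial classifier that predicts TARGET = 0 for every applicant is right
# on every repaid loan and wrong on every not-repaid loan.
baseline_accuracy = n_repaid / (n_repaid + n_not_repaid)
baseline_sensitivity = 0 / n_not_repaid  # it never finds a single defaulter

print(f"baseline accuracy:    {baseline_accuracy:.4f}")
print(f"baseline sensitivity: {baseline_sensitivity:.4f}")
```

On these counts the do-nothing baseline already reaches roughly 93% accuracy, so a score of 0.9248 by itself says little about how well defaults are detected.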
04 Model evaluation
Prediction
• Confusion matrix
Sensitivity : 0.0101801
"Sensitivity" measures the rate of positive samples that are correctly classified:
Sensitivity = True Positive / Positive = 13 / 1277
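The sensitivity figure can be reproduced from confusion-matrix counts. This is a minimal Python sketch with made-up toy labels; only the 13 and 1277 come from the slide, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for binary labels where 1 is the positive class."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return tp, fn, fp, tn

def sensitivity(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn)

# Toy labels (made up for illustration).
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 1, 0]
tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)
print(sensitivity(tp, fn))

# The slide's figures: 13 true positives out of 1277 actual positives.
slide_sensitivity = sensitivity(13, 1277 - 13)
print(round(slide_sensitivity, 7))
```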
04 Model evaluation
Prediction
• ROC curve and AUC
AUC : 0.6956109
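AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below computes it that way on made-up toy scores; it is an illustration of the metric, not the routine used to produce the slide's value, and `auc` is a hypothetical helper name.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0]               # toy labels, 1 = not repaid
scores = [0.9, 0.4, 0.35, 0.3, 0.1]    # toy predicted probabilities
toy_auc = auc(labels, scores)
print(toy_auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts an AUC near 0.70 in context.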
Model improvements
05
05 Model improvements
Model 2
Task 1. Imbalanced class problem
Task 2. Multicollinearity
05 Model improvements
Task 1. Imbalanced class problem
Try "SMOTE" (Synthetic Minority Over-sampling Technique), which is another sampling method.
It considers a sample's k nearest neighbors (in feature
space) and creates a synthetic data point.
> table(train$TARGET)
    0     1
12521   950
> new_train <- SMOTE(TARGET ~ ., train, perc.over = 600, perc.under = 140)
> table(new_train$TARGET)
   0    1
7979 6650
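The interpolation idea behind SMOTE can be sketched as follows. This is a simplified Python illustration on made-up two-dimensional points, not the R `SMOTE()` call shown on the slide; the function names, `k=2`, and the seed are assumptions, and undersampling of the majority class is left out.

```python
import random

def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (squared Euclidean distance)."""
    def dist2(q):
        return sum((a - b) ** 2 for a, b in zip(point, q))
    return sorted(others, key=dist2)[:k]

def smote_like(minority, n_new_per_point, k=2, seed=0):
    """Create synthetic minority points by interpolating toward random neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for i, point in enumerate(minority):
        others = minority[:i] + minority[i + 1:]
        neighbors = nearest_neighbors(point, others, k)
        for _ in range(n_new_per_point):
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(point, neighbor)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]  # toy minority cases
new_points = smote_like(minority, n_new_per_point=6)  # 6 per case mirrors perc.over = 600
print(len(minority) + len(new_points))

# Same arithmetic for the minority class on the slide: 950 originals + 6 * 950 synthetic.
print(950 + 6 * 950)
```

Each synthetic point lies on the segment between a real minority case and one of its neighbors, which is why the minority count grows from 950 to 6650 without duplicating rows.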
05 Model improvements
Task 2. Multicollinearity
Correlation between variable pairs:
AMT_GOODS_PRICE, AMT_CREDIT : 0.9868
AMT_ANNUITY, AMT_GOODS_PRICE : 0.7643
YEARS_BEGINEXPLUATATION_AVG, YEARS_BEGINEXPLUATATION_MODE : 0.9482
YEARS_BEGINEXPLUATATION_AVG, YEARS_BEGINEXPLUATATION_MEDI : 0.9844
FLOORSMAX_MODE, FLOORSMAX_AVG : 0.9858
FLOORSMAX_MEDI, FLOORSMAX_AVG : 0.9971
YEARS_BEGINEXPLUATATION_MEDI, YEARS_BEGINEXPLUATATION_MODE : 0.9242
FLOORSMAX_MODE, FLOORSMAX_MEDI : 0.9883
FLOORSMAX_MODE, TOTALAREA_MODE : 0.6232
OBS_60_CNT_SOCIAL_CIRCLE, OBS_30_CNT_SOCIAL_CIRCLE : 0.9987
DEF_60_CNT_SOCIAL_CIRCLE, DEF_30_CNT_SOCIAL_CIRCLE : 0.8658
Exclude highly correlated independent variables.
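One way to exclude highly correlated variables is a greedy filter on pairwise Pearson correlations. The Python sketch below is an illustration only: the slides do not state which variable of each pair was dropped or what cutoff was used, so the 0.9 threshold, the toy values, and the helper names are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy values (made up); only the column names come from the slide.
columns = {
    "AMT_CREDIT":      [100.0, 200.0, 300.0, 400.0, 500.0],
    "AMT_GOODS_PRICE": [90.0, 210.0, 290.0, 410.0, 480.0],
    "FLOORSMAX_AVG":   [0.3, 0.1, 0.4, 0.1, 0.2],
}
selected = drop_correlated(columns)
print(selected)
```

The filter keeps the first variable of each correlated group in column order, so the result depends on that order.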
05 Model improvements
Binary logistic regression directly seeks to minimize the sum of squared deviance residuals.
The deviance residuals are what the maximum likelihood (ML) fitting of the regression implicitly uses.
As the following slides show, this is one factor behind the lower accuracy.
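For reference, the standard definitions behind this statement can be written out. The symbols are introduced here and do not appear on the slides: $y_i \in \{0, 1\}$ is the observed label, $\hat{p}_i$ the fitted probability for observation $i$, $d_i$ its deviance residual, $D$ the deviance, and $L(\hat{\beta})$ the likelihood at the fitted coefficients.

```latex
d_i = \operatorname{sign}(y_i - \hat{p}_i)\,
      \sqrt{-2\left[\, y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i) \right]},
\qquad
D = \sum_{i=1}^{n} d_i^{2} = -2 \log L(\hat{\beta})
```

For ungrouped binary data, maximizing the likelihood is therefore the same as minimizing $D$. The fit optimizes this quantity, not the classification accuracy at a fixed cutoff.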
05 Model improvements
Accuracy : 0.9248 -> 0.825 (decrease)
Sensitivity : 0.01 -> 0.33 (increase!)
05 Model improvements
Prediction
• ROC curve and AUC
AUC : 0.6680598
Conclusion
06
06 Conclusion
Task 1. Imbalanced class problem
-> SMOTE
Task 2. Multicollinearity
-> Exclude variables with high correlation
Accuracy decreased
Sensitivity increased
AUC decreased
Sensitivity improved despite the decrease in the other criteria, which is why sensitivity matters.
Clearly, there is no absolute criterion.
In particular, sensitivity is often used when the cost of missing a positive case is high.
06 Conclusion
Further study…
- Apply other prediction models (decision tree, ANN, SVM) to improve accuracy
- Handle NA values without excluding the variables
Thank you
