GROUP – 7
Dawei Ye
Hrushikesh Basavanahalli
Jobil Joseph
Ryan Curtis
Shu-Feng Tsao
Yijing Liang
Predicting Credit Card Defaults
OPIM 5640 – Predictive Modeling – Final Assignment
Data source: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA
Agenda
- The Business Problem
- Data: data source used
- Modeling methodology adopted
- Major steps in the methodology
- Models
- Why and how we chose this model
- Conclusion
The Business Problem
Can she JUMP?
We took a dataset with 30,000 records of credit card borrowers, with details about their demographics, behavior, and payment patterns.
Dataset source: Kaggle
The goal is to build a predictive model that predicts, with acceptable accuracy, whether a credit card user will default on the upcoming payment.
A Doctor
a data doctor… with a diagnostic approach
Modeling methodology (flow diagram):
- Visualize data for initial diagnostics
- Build baseline model (logistic regression model)
- Data cleansing
- Data pre-processing
- Exploratory data analytics
- Feature engineering / selection
- Trying different models
The model was revised after each iteration, checking for improvement in accuracy.
Data Dictionary
The data contains a binary variable, default payment (Yes = 1, No = 0), as the response variable. There are 25 variables in the dataset in total, including the response variable.
ID: ID of each client.
LIMIT_BAL: Amount of the given credit (NT dollar); it includes both the individual consumer credit and his/her family (supplementary) credit.
SEX: Gender (1 = male; 2 = female).
EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
AGE: Age (year).
PAY_0 – PAY_6: History of past payment. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
BILL_AMT1-BILL_AMT6: Amount of bill statement (NT dollar). BILL_AMT1 = amount of bill statement in September, 2005; BILL_AMT2 = amount of bill statement in August, 2005; . . .; BILL_AMT6 = amount of bill statement in April, 2005.
PAY_AMT1-PAY_AMT6: Amount of previous payment (NT dollar). PAY_AMT1 = amount paid in September, 2005; PAY_AMT2 = amount paid in August, 2005; . . .; PAY_AMT6 = amount paid in April, 2005.
Data Overview for baseline model
Data summary: 30000 records in total.
Summary details of the training and validation datasets used for the model. We used a 70:30 split with stratified random sampling.
Total number of records in training data: 21000
Total positive cases (default/1) in training data: 4666
Total negative cases (non-default/0) in training data: 16334
Total number of records in validation data: 9000
Total positive cases (default/1) in validation data: 1970
Total negative cases (non-default/0) in validation data: 7030
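As an illustration of the 70:30 stratified split, here is a minimal Python sketch. It is not the tool the team used; the function name `stratified_split`, the seed, and the toy labels are assumptions made for the example. The idea is to shuffle and cut each class separately so both partitions keep roughly the same default rate (in the deck's counts, 4666/21000 ≈ 22.2% and 1970/9000 ≈ 21.9%).

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class at train_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, valid_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random order within the class
        cut = round(len(idx) * train_frac)    # per-class training share
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)

# Toy labels (hypothetical, not the real data): 200 rows, 22% defaults.
labels = [1] * 44 + [0] * 156
train_idx, valid_idx = stratified_split(labels)
train_pos = sum(labels[i] for i in train_idx)
valid_pos = sum(labels[i] for i in valid_idx)
print(len(train_idx), len(valid_idx), train_pos, valid_pos)
```

Splitting per class keeps the minority (default) class from being under-represented in either partition by chance.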
Baseline model: Logistic Regression
Baseline Evaluation
AUC on ROC Curve   Benchmark
Training           72.27%
Validation         72.70%
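For readers unfamiliar with the benchmark metric, the following Python sketch shows one standard way to compute the area under the ROC curve: the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as half. The function name `roc_auc` and the toy scores are assumptions for illustration; the deck's figures came from the team's own modeling tool.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical scores, not model output from the deck).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = roc_auc(y_true, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic, which is fine for a small illustration; a rank-based formula is the usual choice for 30,000 records.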
Data Cleansing & Preprocessing
#    Variable     Initial Type           Type Changed To
1    SEX          numerical continuous   character nominal
2    EDUCATION    numerical continuous   character nominal
3    MARRIAGE     numerical continuous   character nominal
4    PAY_0        numerical continuous   character nominal
5    PAY_2        numerical continuous   character nominal
6    PAY_3        numerical continuous   character nominal
7    PAY_4        numerical continuous   character nominal
8    PAY_5        numerical continuous   character nominal
9    PAY_6        numerical continuous   character nominal
Variable Transformation
#    Variable                  Transformation
10   BILL_AMT1 to BILL_AMT6    Standardized the variable to scale
11   PAY_AMT1 to PAY_AMT6      Standardized the variable to scale
12   LIMIT_BAL                 Standardized the variable to scale
13   AGE                       Standardized the variable to scale
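The two preprocessing steps can be sketched in Python as follows. This is a minimal illustration, assuming that "standardized" means z-scoring with the population standard deviation and that "character nominal" means treating the numeric codes as category labels; the helper names and sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score a numeric column: subtract the mean, divide by the std deviation."""
    mu = mean(values)
    sigma = pstdev(values)   # population std deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def to_nominal(values):
    """Recode numeric category codes as strings so they are not treated as magnitudes."""
    return [str(v) for v in values]

# Hypothetical sample values, not rows from the dataset.
limit_bal = [20000, 120000, 90000, 50000]
pay_0 = [2, -1, 0, 0]

z = standardize(limit_bal)
pay_0_nominal = to_nominal(pay_0)
print([round(v, 3) for v in z], pay_0_nominal)
```

Casting codes such as SEX or PAY_0 to nominal stops a model from reading them as evenly spaced quantities, which is one plausible reason for the AUC gain reported after preprocessing.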
Benchmark Check: Post Preprocessing
AUC on ROC Curve   Benchmark   Pre-Processing   Gain on Benchmark
Training           72.27%      77.06%           4.79%
Validation         72.70%      77.68%           4.98%
Data Visualization / Exploration
[Figure: Default by Education Level. Categories: Imputed, Grad School, Bachelors, High School, Other]
[Figure: Default by Age. Age bins: 21-27, 27-31, 31-37, 37-43, 43-79]
[Figure: Default by Marriage Status. Categories: Married, Single, Other, Imputed]
[Figure: Scatter plot across Bill Amount and Pay Amount]
There is HIGH correlation between bill amounts (value of the monthly bill) across the six months.
However, there is LOW correlation between payment patterns across the six months.
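The correlation finding can be checked numerically with the Pearson coefficient. The sketch below is illustrative only: the `pearson` helper and the toy monthly values are assumptions, chosen so that the bill columns move together while the payment columns do not.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Hypothetical values for five clients in two consecutive months.
bill_amt1 = [1000, 2000, 3000, 4000, 5000]
bill_amt2 = [1100, 1900, 3200, 3900, 5100]
pay_amt1 = [0, 500, 100, 2000, 50]
pay_amt2 = [300, 0, 1500, 100, 400]

r_bill = pearson(bill_amt1, bill_amt2)
r_pay = pearson(pay_amt1, pay_amt2)
print(round(r_bill, 3), round(r_pay, 3))
```

Highly correlated bill columns carry overlapping information, which is one motivation for aggregating them into fewer derived variables.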
Feature Engineering
Three categories of variables [pay_*, bill_amt*, pay_amt*] show behavioral patterns across the six months.
To extract the aggregated pattern across the six months, we derived four new variables from these three categories.
Field Name               Description
AMT_OWED                 Running (cumulative) sum of bill amount minus payment amount for each individual
AVG_6MTH_OWED            Mean value of AMT_OWED over the 6-month period
MISSED_PAYMENTS          Maximum number of missed payments recorded for the individual
BALANCE_TO_LIMIT_RATIO   Average 6-month balance divided by the individual's credit limit; note that anything <= 0.3 is considered good
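One possible reading of the four derived fields is sketched below in Python. The deck does not give exact formulas, so several choices here are assumptions: months are ordered oldest first, MISSED_PAYMENTS is the largest PAY_* delay code floored at zero, and "balance" in BALANCE_TO_LIMIT_RATIO is the mean bill amount. The function name and the sample client are hypothetical.

```python
from itertools import accumulate

def derive_features(limit_bal, pay_status, bill_amt, pay_amt):
    """Derive the four aggregate fields for one client.

    The three lists hold six monthly values, oldest month first (assumed order).
    """
    # Monthly shortfall: bill amount minus payment amount.
    monthly_net = [b - p for b, p in zip(bill_amt, pay_amt)]
    # AMT_OWED: running (cumulative) sum of the monthly shortfall.
    amt_owed = list(accumulate(monthly_net))
    # AVG_6MTH_OWED: mean of the running totals over the six months.
    avg_6mth_owed = sum(amt_owed) / len(amt_owed)
    # MISSED_PAYMENTS: largest delay code; codes <= 0 count as no delay (assumption).
    missed_payments = max(0, max(pay_status))
    # BALANCE_TO_LIMIT_RATIO: mean bill amount over the credit limit (assumption).
    balance_to_limit = (sum(bill_amt) / len(bill_amt)) / limit_bal
    return {
        "AMT_OWED": amt_owed,
        "AVG_6MTH_OWED": avg_6mth_owed,
        "MISSED_PAYMENTS": missed_payments,
        "BALANCE_TO_LIMIT_RATIO": balance_to_limit,
    }

# Hypothetical client, not a row from the dataset.
client = derive_features(
    limit_bal=100000,
    pay_status=[0, 0, 1, 2, 0, -1],
    bill_amt=[10000, 12000, 15000, 20000, 18000, 15000],
    pay_amt=[2000, 2000, 0, 0, 5000, 15000],
)
print(client)
```

With these inputs the ratio is 0.15, which falls under the 0.3 threshold the slide describes as good.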
Data Visualization / Exploration
[Figure: Missed Payments by Gender]
[Figure: Missed Payments by Education Level]
Data Visualization - Clustering
To understand the underlying structure of the data, we ran a cluster analysis using the hierarchical method.
- Six cluster groups were identified.
- The cluster value was added to the data as a new variable.
In total, five new variables were added to the dataset:
- Four derived variables, which were standardized.
- One variable representing the six clusters, cast to a character nominal variable.
Revised Benchmark
AUC on ROC Curve   Benchmark   Pre-Processing   Feature Engineering   Gain on Benchmark
Training           72.27%      77.06%           77.15%                4.85%
Validation         72.70%      77.68%           77.78%                4.99%
Dimensionality Reduction: PCA
The top ten principal components, which together account for 96.32% of the variance in the data, were considered in place of the numerical variables.
Revised Benchmark
AUC on ROC Curve   Benchmark   Pre-Processing   Feature Engineering   PCA      Gain on Benchmark
Training           72.27%      77.06%           77.15%                77.12%   4.85%
Validation         72.70%      77.68%           77.78%                77.69%   4.99%
Dimensionality reduction using PCA decreased the AUC relative to the previous step. Therefore, we decided not to use the principal components in the modeling.
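Choosing how many principal components to keep is usually done from the cumulative share of variance explained. The Python sketch below shows that selection step only; the eigenvalues and the 95% target are hypothetical, and the deck's own result (ten components, 96.32%) came from the team's tool, not from this code.

```python
def components_for_variance(eigenvalues, target):
    """Smallest k whose top-k eigenvalues explain at least `target` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return k, running / total
    return len(ordered), 1.0

# Hypothetical eigenvalues of a covariance matrix, not values from the deck.
eigenvalues = [5.0, 2.0, 1.5, 0.8, 0.4, 0.2, 0.1]
k, explained = components_for_variance(eigenvalues, 0.95)
print(k, round(explained, 4))
```

Retaining most of the variance does not guarantee retaining the signal that separates defaulters from non-defaulters, which is consistent with the small AUC drop the team observed.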
Trying Different Models
The following models were considered for analysis:
- Stepwise Regression
- Bootstrap Forest
- Neural Networks
Evaluation criteria: each model was evaluated on
- AUC under the ROC curve
- Lift ratio
- Misclassification rate
- Accuracy of positive cases
Lift ratio, misclassification rate, and accuracy of positive cases were calculated at a probability cutoff of 0.5.
However, in some business contexts we may have to focus on other evaluation metrics, such as minimum misclassification rate or maximum sensitivity, which may lead to a different model.
To illustrate this, we added one more evaluation criterion: the probability cutoff that gives the minimum misclassification rate on the validation set.
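The cutoff-based criteria can be sketched in Python as follows. The deck does not define its metrics, so this is an interpretation: "accuracy of positive cases" is read as the share of predicted defaults that are actual defaults, and lift ratio as that share divided by the overall default rate. This reading is consistent with the reported Stepwise Regression validation figures (67.39% / (1970/9000) ≈ 3.08), but it remains an assumption. Function names, the candidate grid, and the toy data are hypothetical.

```python
def evaluate_at_cutoff(y_true, probs, cutoff):
    """Cutoff-dependent metrics for predicted default probabilities."""
    pred = [1 if p >= cutoff else 0 for p in probs]
    n = len(y_true)
    tp = sum(1 for y, yhat in zip(y_true, pred) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, pred) if y == 0 and yhat == 1)
    errors = sum(1 for y, yhat in zip(y_true, pred) if y != yhat)
    base_rate = sum(y_true) / n
    # Share of predicted positives that are actual positives (assumed definition).
    positive_accuracy = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "misclassification_rate": errors / n,
        "positive_accuracy": positive_accuracy,
        "lift_ratio": positive_accuracy / base_rate if base_rate else 0.0,
    }

def best_cutoff(y_true, probs, candidates):
    """Candidate cutoff with the lowest misclassification rate (first one on ties)."""
    return min(
        candidates,
        key=lambda c: evaluate_at_cutoff(y_true, probs, c)["misclassification_rate"],
    )

# Toy validation set (hypothetical labels and probabilities).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
probs = [0.91, 0.22, 0.47, 0.62, 0.11, 0.57, 0.42, 0.31, 0.06, 0.16]

at_half = evaluate_at_cutoff(y_true, probs, 0.5)
candidates = [c / 100 for c in range(5, 100, 5)]
cutoff = best_cutoff(y_true, probs, candidates)
print(at_half, cutoff)
```

On this toy data the minimum-error cutoff differs from 0.5, which mirrors the point that the preferred threshold, and possibly the preferred model, depends on the criterion.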
Trying Different Models
[Figure: Stepwise Regression]
[Figure: Bootstrap Forest]
[Figure: Neural Networks]
Model Evaluation
At probability cutoff 0.5
Model                 Dataset      AUC Under ROC   Lift Ratio   Misclassification Rate   Accuracy of Positive Cases   Threshold (cutoff)
Stepwise Regression   Training     77.40%          3.06         18.02%                   67.93%                       0.5
Stepwise Regression   Validation   78.01%          3.08         17.78%                   67.39%                       0.5
Bootstrap Forest      Training     85.20%          3.15         17.44%                   69.99%                       0.5
Bootstrap Forest      Validation   78.80%          3.11         17.53%                   68.08%                       0.5
Neural Networks       Training     79.30%          3.11         17.53%                   69.17%                       0.5
Neural Networks       Validation   78.27%          3.00         18.03%                   65.62%                       0.5
At probability cutoff with minimum misclassification rate
Model                 Dataset      Lift Ratio   Misclassification Rate   Accuracy of Positive Cases   Threshold (cutoff)
Stepwise Regression   Training     3.02         17.95%                   67.18%                       0.4
Stepwise Regression   Validation   2.96         17.71%                   64.83%                       0.4
Bootstrap Forest      Training     3.02         17.01%                   67.15%                       0.42
Bootstrap Forest      Validation   3.16         17.56%                   69.16%                       0.42
Neural Networks       Training     3.10         17.57%                   68.80%                       0.49
Neural Networks       Validation   3.15         17.51%                   68.87%                       0.49
Models Comparison
Bootstrap Forest seems to fare better across most evaluation metrics. The final model gave a gain of 6.1% in AUC under the ROC curve over the initial baseline model benchmark.
Model Evaluation
Model Analysis: Column Contribution
Analyzing the column contributions in the model, PAY_0, the most recent repayment status, is the most influential factor in this model.
Whenever PAY_0 has the value 2 (payment delayed for two months), the chance of correctly identifying default cases is higher.
For any other value of PAY_0, the chance of an incorrect prediction is higher.
Conclusion
The Bootstrap Forest model seems to work better for this problem and context. However, the same problem with a different context or criterion may lead to a different model.
Extending the utility of this model beyond this dataset to the wider credit card industry:
"The model, with sufficient refinement and learning, should be able to predict default trends in the industry and help regulators formulate policies and take preemptive actions in the interest of both USERS and BANK"
Predicting Credit Card Defaults with Bootstrap Forest Model

  • 1. GROUP – 7 Dawei Ye Hrushikesh Basavanahalli Jobil Joseph Ryan Curtis Shu-Feng Tsao Yijing Liang Predicting Credit Card Defaults OPIM 5640 – Predictive Modeling – Final Assignment Data Source from Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA
  • 2.  The Business Problem  Data - Data source used  Modeling Methodology Adapted  Detailing Major steps in the methodology  Models  WHY and HOW we choose this model  Conclusion Agenda 2
  • 4. We took a dataset with 30,000 records of c redit card borrowers with details about their demography and behavior and payment patterns Dataset Source…Kaggle….. The goal is to build a Predictive Model that would predict if a credit card user would default on the Upcoming payment with acceptable accuracy 4
  • 5. A Doctor a data doctor… with a diagnostic approach 5
  • 6. - Data - Data - Data - Data - Data Visualize Data For Initial Diagnostics Build Baseline Model Logistic Regression Model Revised the Model After Each Iteration Checking improvement in accuracy Data Cleansing Data Pre-processing Exploratory Data Analytics Feature / Engineering Selection Trying Different Models 6
  • 7. Data Dictionary Data contains a binary variable, default payment (Yes = 1, No = 0), as the response variable. Total 25 variables in the dataset including response variable. ID: ID of each client LIMIT_BAL: Amount of the given credit (NT dollar): it includes both the individual consumer credit a nd his/her family (supplementary) credit. SEX: Gender (1 = male; 2 = female). EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others). MARRIAGE: Marital status (1 = married; 2 = single; 3 = others). AGE: Age (year). PAY_0 – PAY_6: History of past payment. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for t wo months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above. BILL_AMT1-BILL_AMT6: Amount of bill statement (NT dollar). BILL_AMT1 = amount of bill statemen t in September, 2005; BILL_AMT2 = amount of bill statement in August, 2005; . . .; BILL_AMT6 = am ount of bill statement in April, 2005. PAY_AMT1-PAY_AMT6: Amount of previous payment (NT dollar). PAY_AMT1 = amount paid in Sep tember, 2005; PAY_AMT2 = amount paid in August, 2005; . . .;PAY-AMT6 = amount paid in April, 20 05. 7
  • 8. Data Overview for baseline model Data summary Total 30000 records Summary Details about Training and V alidation dataset used for the model. We used a 70:30 split using stratified random sampling. Total number of records in training data: 21000 Total positive cases(default/1) in traini ng data: 4666 Total negative cases(non-default/0) in training data: 16334 Total number of records in validatio n data: 9000 Total positive cases(default/1) in valida tion data: 1970 Total negative cases(non-default/0) in validation data: 7030 8
  • 9. Baseline model-Logistic Regression Baseline Evaluation AUC on ROC Curve Benchmark Training 72.27% Validation 72.70% 9
  • 10. Data Cleansing & Preprocessing Variable Initial Type Type Changed to 1 SEX: numerical continuous character nominal 2 EDUCATION: numerical continuous character nominal 3 MARRIAGE: numerical continuous character nominal 4 PAY_0: numerical continuous character nominal 5 PAY_2: numerical continuous character nominal 6 PAY_3: numerical continuous character nominal 7 PAY_4: numerical continuous character nominal 8 PAY_5: numerical continuous character nominal 9 PAY_6: numerical continuous character nominal Variable Transformation 10 BILL_AMT1 to BILL_AMT6 Standardized the Variable to scale 11 PAY_AMT1 to PAY_AMT6 Standardized the Variable to scale 12 LIMIT_BAL Standardized the Variable to scale 13 Age Standardized the Variable to scale 10
  • 11. Benchmark Check: Post Preprocessing
    AUC on ROC curve:
    Dataset      Benchmark   Pre-processing   Gain on benchmark
    Training     72.27%      77.06%           4.79%
    Validation   72.70%      77.68%           4.98%
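The gain column is the difference, in percentage points, between the AUC after a step and the baseline benchmark. A small sketch of that arithmetic, using the figures reported on this slide (the helper name `gain_on_benchmark` is an assumption):

```python
def gain_on_benchmark(benchmark_auc, new_auc):
    """Gain in percentage points, rounded to two decimals as on the slide."""
    return round(new_auc - benchmark_auc, 2)

training_gain = gain_on_benchmark(72.27, 77.06)
validation_gain = gain_on_benchmark(72.70, 77.68)
print(training_gain, validation_gain)
```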
  • 12. Data Visualization / Exploration: Default by Education Level [chart; categories: Imputed, Grad School, Bachelors, High School, Other]
  • 13. Data Visualization / Exploration: Default by Age [chart; age bins: 21-27, 27-31, 31-37, 37-43, 43-79]
  • 14. Data Visualization / Exploration: Default by Marriage Status [chart; categories: Married, Single, Other, Imputed]
  • 15. Data Visualization / Exploration: Scatter plot across Bill Amount and Pay Amount. The bill amounts (value of the monthly bill) are highly correlated across the six months. The payment amounts, however, show low correlation across the six months.
  • 16. Feature Engineering
    Three categories of variables [pay_*, bill_amt*, pay_amt*] show behavioral patterns across six months. To extract the aggregated pattern across the six months, we derived four new variables from these three categories.
    Field name               Description
    AMT_OWED                 Running or cumulative sum of bill amount - payment amount for each individual
    AVG_6MTH_OWED            Mean value of AMT_OWED over a 6-month period
    MISSED_PAYMENTS          Maximum number of missed payments recorded for the individual
    BALANCE_TO_LIMIT_RATIO   Average 6-month balance divided by the individual's credit limit; note anything <= 0.3 is considered good
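The four derived fields could be computed per borrower roughly as follows. The slide's descriptions leave room for interpretation, so this sketch makes several assumptions that are not confirmed by the deck: AMT_OWED is taken as the six-month total of (bill - payment), AVG_6MTH_OWED as the mean monthly (bill - payment), MISSED_PAYMENTS as the largest repayment-status code floored at zero, and the "balance" in the ratio as the mean bill amount. The function name `derive_features` is also hypothetical.

```python
from statistics import mean

def derive_features(limit_bal, pay_status, bill_amt, pay_amt):
    """Derive the four aggregate fields for one borrower.

    pay_status, bill_amt and pay_amt are six-element sequences, one
    value per month. All formulas are assumed readings of the slide.
    """
    monthly_owed = [b - p for b, p in zip(bill_amt, pay_amt)]
    return {
        "AMT_OWED": sum(monthly_owed),            # cumulative bill - payment
        "AVG_6MTH_OWED": mean(monthly_owed),      # mean monthly amount owed
        "MISSED_PAYMENTS": max(0, max(pay_status)),  # worst delay, in months
        "BALANCE_TO_LIMIT_RATIO": mean(bill_amt) / limit_bal,
    }

# Hypothetical borrower for illustration.
features = derive_features(
    limit_bal=100000,
    pay_status=[2, 0, -1, 0, 0, 0],
    bill_amt=[6000, 5000, 4000, 3000, 2000, 1000],
    pay_amt=[1000, 1000, 1000, 1000, 1000, 1000],
)
print(features)
```

Under these assumptions the example borrower has a ratio of 0.035, well under the 0.3 threshold the slide describes as good.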
  • 17. Data Visualization / Exploration: Missed Payments by Gender [chart]
  • 18. Data Visualization / Exploration: Missed Payments by Education Level [chart]
  • 19. Data Visualization - Clustering
    To understand the underlying structure of the data, we ran a cluster analysis using the hierarchical method. Six cluster groups were identified, and the cluster value was added to the data as a new variable.
    In total, five new variables were added to the dataset:
    - Four derived variables, which were standardized.
    - One variable representing the six clusters, cast to a character nominal variable.
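Hierarchical (agglomerative) clustering starts with every record in its own cluster and repeatedly merges the two closest clusters until the desired number remains. The deck does not state the linkage or distance used, so the average linkage and Euclidean distance below are assumptions, as are the function names and the toy points. The brute-force search is only suitable for small samples.

```python
from math import dist

def average_linkage(points, a, b):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerative_clusters(points, k):
    """Merge the two closest clusters until only k remain; return a label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(points, clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]
    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels

# Hypothetical standardized feature pairs forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
cluster_labels = agglomerative_clusters(points, k=2)
# Cast to strings so the cluster is used as a nominal variable, as on the slide.
cluster_nominal = [str(c) for c in cluster_labels]
print(cluster_nominal)
```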
  • 20. Revised Benchmark
    AUC on ROC curve:
    Dataset      Benchmark   Pre-processing   Feature Engineering   Gain on benchmark
    Training     72.27%      77.06%           77.15%                4.85%
    Validation   72.70%      77.68%           77.78%                4.99%
  • 21. Dimensionality Reduction: PCA
    The top ten principal components, which together account for 96.32% of the variance in the data, were considered in place of the numerical variables.
    Revised benchmark, AUC on ROC curve:
    Dataset      Benchmark   Pre-processing   Feature Engineering   PCA      Gain on benchmark
    Training     72.27%      77.06%           77.15%                77.12%   4.85%
    Validation   72.70%      77.68%           77.78%                77.69%   4.99%
    Dimensionality reduction using PCA decreases the AUC relative to the previous step. Therefore, we decided not to use the principal components in the modeling.
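Choosing how many principal components to keep usually comes down to accumulating each component's share of the variance until a target is reached. The sketch below shows only that selection step; the variance shares are made-up illustrative numbers, not the deck's PCA output, and the function name `components_needed` is an assumption.

```python
from itertools import accumulate

def components_needed(explained_variance_ratio, target):
    """Smallest number of leading components whose cumulative share of
    variance reaches the target (shares are assumed sorted, largest first)."""
    for k, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= target:
            return k
    return len(explained_variance_ratio)

# Hypothetical variance shares for five components.
ratios = [0.5, 0.3, 0.1, 0.06, 0.04]
k_85 = components_needed(ratios, 0.85)
k_95 = components_needed(ratios, 0.95)
print(k_85, k_95)
```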
  • 22. Trying Different Models
    The following models were considered for analysis:
    - Stepwise Regression
    - Bootstrap Forest
    - Neural Networks
    Evaluation criteria. Each model was evaluated on:
    - AUC under the ROC curve
    - Lift Ratio
    - Misclassification Rate
    - Accuracy of Positive Cases
    Lift Ratio, Misclassification Rate and Accuracy of Positive Cases were calculated at a probability cutoff of 0.5. However, in some business contexts we may have to focus on other evaluation metrics, such as minimum misclassification rate or maximum sensitivity, which may lead to a different model. To illustrate this, we considered an additional evaluation criterion: the cutoff that gives the minimum misclassification rate on the validation set.
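The cutoff-based metrics can be sketched as below. The deck does not define them, so two readings are assumed here: "Accuracy of Positive Cases" is taken as the share of predicted defaulters who actually defaulted, and "Lift Ratio" as that share divided by the overall default rate. This reading is at least consistent with the reported training figures (67.93% divided by a 22.22% default rate is roughly 3.06), but it remains an inference. Function names and the toy data are hypothetical.

```python
def metrics_at_cutoff(labels, probs, cutoff):
    """Misclassification rate, positive accuracy and lift at one cutoff."""
    tp = fp = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    n = len(labels)
    base_rate = sum(labels) / n
    positive_accuracy = tp / (tp + fp) if tp + fp else 0.0
    return {
        "misclassification": (fp + fn) / n,
        "positive_accuracy": positive_accuracy,
        "lift": positive_accuracy / base_rate if base_rate else 0.0,
    }

def cutoff_with_min_misclassification(labels, probs, steps=100):
    """Scan cutoffs 0.01 .. 0.99 and return the first one with the lowest error."""
    cutoffs = [i / steps for i in range(1, steps)]
    return min(cutoffs,
               key=lambda c: metrics_at_cutoff(labels, probs, c)["misclassification"])

# Hypothetical validation labels and predicted default probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4]
at_half = metrics_at_cutoff(y, p, 0.5)
best_cutoff = cutoff_with_min_misclassification(y, p)
at_best = metrics_at_cutoff(y, p, best_cutoff)
print(at_half, best_cutoff, at_best)
```

Scanning for the cutoff on the validation set, as the slide describes, tunes the threshold to that data, so a separate hold-out set would be needed for an unbiased error estimate.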
  • 26. Model Evaluation: Models Comparison
    At probability cutoff 0.5:
    Model                 Dataset      AUC under ROC   Lift Ratio   Misclassification Rate   Accuracy of Positive Cases   Threshold (cutoff)
    Stepwise Regression   Training     77.40%          3.06         18.02%                   67.93%                       0.5
    Stepwise Regression   Validation   78.01%          3.08         17.78%                   67.39%                       0.5
    Bootstrap Forest      Training     85.20%          3.15         17.44%                   69.99%                       0.5
    Bootstrap Forest      Validation   78.80%          3.11         17.53%                   68.08%                       0.5
    Neural Networks       Training     79.30%          3.11         17.53%                   69.17%                       0.5
    Neural Networks       Validation   78.27%          3.00         18.03%                   65.62%                       0.5
    At the probability cutoff with minimum misclassification rate:
    Model                 Dataset      Lift Ratio   Misclassification Rate   Accuracy of Positive Cases   Threshold (cutoff)
    Stepwise Regression   Training     3.02         17.95%                   67.18%                       0.4
    Stepwise Regression   Validation   2.96         17.71%                   64.83%                       0.4
    Bootstrap Forest      Training     3.02         17.01%                   67.15%                       0.42
    Bootstrap Forest      Validation   3.16         17.56%                   69.16%                       0.42
    Neural Networks       Training     3.10         17.57%                   68.80%                       0.49
    Neural Networks       Validation   3.15         17.51%                   68.87%                       0.49
    Bootstrap Forest seems to fare better across most evaluation metrics. The final model's validation AUC under the ROC curve is 6.1 percentage points higher than the initial baseline benchmark.
  • 27. Model Evaluation: Model Analysis - Column Contribution. Analyzing the column contributions in the model, PAY_0, the most recent repayment status, is the most influential factor.
  • 28. Model Evaluation: Model Analysis - Column Contribution. Whenever PAY_0 has the value 2 (payment delayed for two months), the chances of correctly identifying default cases are higher. For any other value of PAY_0, the chances of an incorrect prediction are higher.
  • 29. Conclusion
    The Bootstrap Forest model seems to work best for this problem and context. However, the same problem with a different context or evaluation criterion may lead to a different model.
    Extending the utility of this model beyond this dataset to the wider credit card industry: "The model, with sufficient refinement and learning, should be able to predict default trends in the industry and help regulators formulate policies and take preemptive actions in the interest of both users and banks."

Editor's Notes

  1. To predict whether a borrower will default or not. The team's goal is to predict whether a borrower will default on his/her credit card dues. This would help banks decide on the risk the bank is taking on while issuing a credit card or deciding on the
     - A default is not necessarily bad for the bank if the borrower recovers and pays all the necessary fees.
     - But it is very important for the bank to assess the risk it carries when approving revolving credit for the borrower.
  2. Go to a data doctor to resolve the modeling problem, to check whether the borrower is fit enough to pay or not.
  3. Analogy of a doctor diagnosing a patient and improving his health so that we are confident he will be able to jump. The data doctor accepted the data, visualized it for initial diagnostics, then went through data cleansing, data pre-processing, exploratory data analysis, feature engineering / selection, trying different models, evaluating the model, and publishing the model.