Construction of a robust
prediction model to forecast the
likelihood of a credit card holder
to experience payment defaults in
upcoming months.
Xi global resources
Group of company
March 28, 2024
Contents
01 Introduction
02 Resources
03 Methodology
04 Result and discussion
05 Recommendations
06 Conclusion
Introduction
The client is a prominent financial institution in Taiwan, known for providing credit card services to a substantial clientele. Like other credit card issuers, it faces the task of forecasting the likelihood that a customer will default on payment in the coming month.
Augustin, the company's project manager, stated that accurate forecasting of defaults is essential for managing risk effectively and making informed decisions within the organization.
After several meetings with the company's board of directors and management, it was observed that a professional data scientist was needed to create a robust, reliable, efficient, and unbiased model to solve the identified problem. In this regard, I was engaged as an analyst. The purpose of my service is to construct a robust prediction model that can accurately forecast the likelihood of a credit card holder defaulting on payment during the upcoming month.
The organization has provided a dataset covering demographic information, repayment history, bill statements, and other pertinent attributes of its credit card customers. These characteristics will form the foundation for constructing the predictive model.
Source: https://images.app.goo.gl/5w6oRm8eHyFVsNmo8
Odusanya, Hafeez. (2023). "INFLUENCE OF CREDIT RISK MANAGEMENT ON FINANCIAL PERFORMANCE OF COMMERCIAL BANKS IN NIGERIA."
Research findings indicate that payment defaults can result in substantial financial losses for a company, while also affecting customer relationships and credit risk.
Resources
Dataset Information
The only resource provided is the dataset. It contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
The dataset contains 25 variables, including:
• ID of the client
• Amount of given credit in NT dollars
• Gender
• Education
• Marital status
• Age
• Payment status from April to September
• Amount of bill statement from April to September
• Amount of previous payment from April to September
• Default payment next month
Questions addressed
• Which factors have the greatest impact on the probability of credit card holders defaulting on their payments?
• Do demographic characteristics, including age, gender, education, and marital status, correlate with default payment behavior?
• What is the impact of repayment patterns, namely PAY_0 to PAY_6, on the probability of default payment?
• Is it possible to forecast future default payment behavior from prior bill amounts (BILL_AMT1 to BILL_AMT6) and previous payment amounts (PAY_AMT1 to PAY_AMT6)?
• Does default payment behavior vary among individuals with different levels of education or marital statuses?
• What is the relationship between the credit amount provided (LIMIT_BAL) and the rate of default payments?
• Given historical data and demographic information, can a predictive model estimate default payment with a high degree of accuracy?
• Does the dataset exhibit any temporal patterns in default payment behavior over the six-month period?
• Are there specific cohorts of credit card customers with a greater inclination towards default payment, and if so, what are their defining characteristics?
• What is the relationship between combinations of repayment status and bill/payment amounts and the outcome of default payment?
Methods
To tackle this problem, we used machine learning methods to build a predictive model from the dataset furnished by the organization. The dataset was first transformed to decode the categorical variables, which by default were stored either as dummy variables or as dichotomous codes. A preliminary exploratory analysis was then conducted to understand the structure of the data and to detect missing values and outliers. Because the dataset contains both qualitative and quantitative variables, bar charts were used for the qualitative variables, while boxplots and density plots were used for the quantitative (continuous) variables.
The relationships between the quantitative variables were also investigated using a correlation matrix. The dataset was then split into training and test sets in a 75:25 ratio. Splitting ensures that one portion of the data is used only to train the model, while the other portion is held out during training and used only to assess model performance. Holding out a test set helps to detect overfitting and allows a more accurate evaluation of the model's ability to generalize to previously unseen data.
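The 75:25 split described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis; the function name `train_test_split_rows`, the seed, and the toy records are assumptions made for the example.

```python
import random

def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    test_fraction=0.25 mirrors the 75:25 ratio in the text; the seed is an
    arbitrary choice that only makes the shuffle reproducible.
    """
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy stand-in for the credit card records (hypothetical IDs only).
records = list(range(1, 101))
train, test = train_test_split_rows(records)
print(len(train), len(test))  # 75 25
```

Every record ends up in exactly one of the two sets, so the test set stays unseen during training.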
At first, all the variables in the dataset were used to train the model. A backward selection process was then applied, with a stopping condition, to remove any variable whose estimate was not statistically significant. The significance level for this post-selection step was set at alpha = 0.05. The post-selection method improves the model by iterating to identify and eliminate variables that do not contribute to it.
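A backward selection loop of the kind described above might look like the sketch below. It assumes one variable is dropped per iteration (the slides do not say how many are removed at a time), and `p_values_for` is a hypothetical caller-supplied function that refits the model and returns a p-value per variable. The p-values in the toy table are invented.

```python
def backward_eliminate(features, p_values_for, alpha=0.05):
    """Drop the least significant feature until all p-values are <= alpha.

    p_values_for(features) is a caller-supplied function (hypothetical here)
    that refits the model on `features` and returns {feature: p_value}.
    """
    selected = list(features)
    while selected:
        p_values = p_values_for(selected)
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break                # stop condition: everything is significant
        selected.remove(worst)   # remove one variable, then refit
    return selected

# Made-up p-values purely to exercise the loop; a real run would refit a
# logistic regression at every step instead of reading a fixed table.
toy_p = {"AGE": 0.001, "PAY_0": 0.0001, "SEX": 0.40, "BILL_AMT2": 0.20}
kept = backward_eliminate(toy_p, lambda names: {n: toy_p[n] for n in names})
print(kept)  # ['AGE', 'PAY_0']
```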
The logistic regression model used is given by
$P(Y = 1 \mid X) = \frac{e^{k}}{1 + e^{k}}$, where $k = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \dots$
Structure of the dataset
Figure 1: Variables contained in the dataset, showing the total number of observations by variable type (integer or numeric)
Structure of the dataset
Figure 2: Variables contained in the dataset, showing the percentage of values present (and any missing values) out of the total number of observations
Preliminary Analysis: Exploratory data analysis
Figure 3: Distribution of gender by default payment
next month
Preliminary Analysis: Exploratory data analysis
Figure 4: Distribution of marital status by default payment next month
Preliminary Analysis: Exploratory data analysis
Figure 5: Distribution of education by default payment next month
Preliminary Analysis: Exploratory data analysis
Figure 6: Distribution of repayment status in September, 2005 by
default payment next month
Preliminary Analysis: Exploratory data analysis
Figure 7: Distribution of repayment status in August, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Figure 8: Distribution of repayment status in July, 2005 by default
payment next month
Preliminary Analysis: Exploratory data analysis
Figure 9: Distribution of repayment status in June, 2005 by default
payment next month
Preliminary Analysis: Exploratory data analysis
Figure 10: Distribution of repayment status in May, 2005 by
default payment next month
Preliminary Analysis: Exploratory data analysis
Figure 11: Distribution of repayment status in April, 2005 by
default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 48509.16 51994.23
Std.Dev 73782.07 73577.61
Min -6676 -165580
Q1 2986.5 3676.5
Median 20185 23119.5
Q3 59667 69031
Max 613860 964511
MAD 29304.33 33292.52
IQR 56638.75 65349.75
CV 1.52 1.42
Skewness 2.97 2.58
Kurtosis 11.62 9.31
Figure 12: Distribution of amount of bill statement in
September, 2005 by default payment next month
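Several of the estimators in these tables can be reproduced with a short sketch like the one below. It assumes CV is the standard deviation divided by the mean (which matches the table: 73782.07 / 48509.16 ≈ 1.52) and that MAD is the median absolute deviation scaled by 1.4826; the scaling is an inference from the tabulated values, not something the slides state. The function name `describe` and the toy amounts are invented for illustration.

```python
import statistics

def describe(values):
    """Return a few of the estimators listed in the table for one group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    median = statistics.median(values)
    # Median absolute deviation, scaled by 1.4826 (assumed convention).
    mad = 1.4826 * statistics.median([abs(v - median) for v in values])
    return {
        "mean": mean,
        "sd": sd,
        "median": median,
        "mad": mad,
        "cv": sd / mean,  # coefficient of variation
    }

# Made-up bill amounts, not taken from the dataset.
toy_bills = [0, 1000, 2000, 3000, 4000]
summary = describe(toy_bills)
print(summary["median"], round(summary["mad"], 1))  # 2000 1482.6
```

The function would be applied once to the defaulters and once to the non-defaulters to fill the two columns.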
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 47283.62 49717.44
Std.Dev 71651.03 71029.95
Min -17710 -69777
Q1 2693 3054
Median 20300.5 21660.5
Q3 57920.5 65698
Max 581775 983931
MAD 29519.31 31535.64
IQR 55225.75 62631
CV 1.52 1.43
Skewness 2.97 2.63
Kurtosis 11.54 9.95
Figure 13: Distribution of amount of bill statement in
August, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 45181.6 47533.37
Std.Dev 68516.98 69576.66
Min -61506 -157264
Q1 2500 2768.5
Median 19834.5 20202.5
Q3 54734.5 61896
Max 578971 1664089
MAD 28828.42 29416.27
IQR 52233.75 59124.25
CV 1.52 1.46
Skewness 2.95 3.13
Kurtosis 11.34 22.03
Figure 14: Distribution of amount of bill statement in
July, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 42036.95 43611.17
Std.Dev 64351.08 64324.8
Min -65167 -170000
Q1 2141 2360
Median 19119.5 19000
Q3 50178.5 55993
Max 548020 891586
MAD 27679.4 27591.19
IQR 48034.25 53628
CV 1.53 1.47
Skewness 3 2.77
Kurtosis 11.86 11.16
Figure 15: Distribution of amount of bill statement in
June, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 39540.19 40530.45
Std.Dev 61424.7 60617.27
Min -53007 -81334
Q1 1500.5 1823
Median 18478.5 17998
Q3 47856 51136.5
Max 547880 927171
MAD 26726.83 26092.28
IQR 46350.25 49312.25
CV 1.55 1.5
Skewness 3.03 2.83
Kurtosis 12.22 12.33
Figure 16: Distribution of amount of bill statement in
May, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 38271.44 39042.27
Std.Dev 59579.67 59547.02
Min -339603 -209051
Q1 1150 1265
Median 18028.5 16679
Q3 47430 49844
Max 514975 961664
MAD 26150.84 24310.19
IQR 46274 48577
CV 1.56 1.53
Skewness 2.9 2.83
Kurtosis 11.79 12.41
Figure 17: Distribution of amount of bill statement in
April, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 3397.04 6307.34
Std.Dev 9544.25 18014.51
Min 0 0
Q1 0 1163.5
Median 1636 2459.5
Q3 3478.5 5606.5
Max 300000 873552
MAD 2425.53 3068.24
IQR 3478.25 4442.5
CV 2.81 2.86
Skewness 14.77 13.94
Kurtosis 323.48 371.72
Figure 18: Distribution of amount of previous
payments in September, 2005 by default payment next
month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 3388.65 6640.47
Std.Dev 11737.99 25302.26
Min 0 0
Q1 0 1005
Median 1533.5 2247.5
Q3 3310.5 5311.5
Max 358689 684259
MAD 2273.57 2863.64
IQR 3309.75 4306.25
CV 3.46 3.81
Skewness 18.23 28.94
Kurtosis 439.72 1439.86
Figure 19: Distribution of amount of previous payment
in August, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 3367.35 5753.5
Std.Dev 12959.62 18684.26
Min 0 0
Q1 0 600
Median 1222 2000
Q3 3000 5000
Max 508229 896040
MAD 1811.74 2799.89
IQR 3000 4400
CV 3.85 3.25
Skewness 18.13 16.73
Kurtosis 492.11 537.52
Figure 20: Distribution of amount of previous payment
in July, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 3155.63 5300.53
Std.Dev 11191.97 16689.78
Min 0 0
Q1 0 390
Median 1000 1734
Q3 2940.5 4602
Max 432130 621000
MAD 1482.6 2570.83
IQR 2939.25 4212
CV 3.55 3.15
Skewness 16.97 12.2
Kurtosis 463.45 248.77
Figure 21: Distribution of amount of previous payment
in June, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 3219.14 5248.22
Std.Dev 11944.73 16071.67
Min 0 0
Q1 0 369
Median 1000 1765
Q3 3000 4600
Max 332000 426529
MAD 1482.6 2616.79
IQR 3000 4231
CV 3.71 3.06
Skewness 15.21 10.46
Kurtosis 316.99 160.71
Figure 22: Distribution of amount of previous payment
in May, 2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 3441.48 5719.37
Std.Dev 13464.01 18792.95
Min 0 0
Q1 0 300
Median 1000 1706
Q3 2975 4545
Max 345293 528666
MAD 1482.6 2529.32
IQR 2974.5 4245
CV 3.91 3.29
Skewness 12.66 10.2
Kurtosis 208.12 155.51
Figure 23: Distribution of amount of previous payment
in April,2005 by default payment next month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 35.73 35.42
Std.Dev 9.69 9.08
Min 21 21
Q1 28 28
Median 34 34
Q3 42 41
Max 75 79
MAD 10.38 8.9
IQR 14 13
CV 0.27 0.26
Skewness 0.66 0.75
Kurtosis 0.11
Figure 24: Distribution of age by default payment next
month
Preliminary Analysis: Exploratory data analysis
Estimator Default=Yes Default=No
Mean 130109.7 178099.73
Std.Dev 115378.5 131628.36
Min 10000 10000
Q1 50000 70000
Median 90000 150000
Q3 200000 250000
Max 740000 1000000
MAD 88956 133434
IQR 150000 180000
CV 0.89 0.74
Skewness 1.35 0.91
Kurtosis 1.55 0.38
Figure 25: Distribution of amount of given credit by default payment next month
Correlation analysis
Figure 26: Relationship between quantitative variables
• Figure 26 shows the correlation coefficients between pairs of variables in the dataset.
• There is a very strong positive correlation between the amounts of bill statements in different months (e.g., BILL_AMT1 and BILL_AMT2 have a correlation coefficient of 0.951).
• There are weak positive correlations between the amounts of bill statements and the amounts of previous payments (e.g., PAY_AMT1 and BILL_AMT1 have a correlation coefficient of 0.140).
• There are also weak positive correlations between the amounts of bill statements and the repayment status (e.g., PAY_0 and BILL_AMT1 have a correlation coefficient of 0.285).
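For reference, each entry of a correlation matrix like this is typically a Pearson coefficient; the slides do not name the method, so that is an assumption. A minimal sketch with invented bill amounts for five hypothetical clients:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up consecutive-month bill amounts, not taken from the dataset.
bill_month_1 = [1000, 2500, 4000, 8000, 12000]
bill_month_2 = [1200, 2400, 4500, 7600, 12500]
r = pearson(bill_month_1, bill_month_2)
print(round(r, 3))
```

Bills that carry over from one month to the next move together, which is why coefficients close to 1 are expected between adjacent BILL_AMT columns.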
Model
Figure 27: Model summary showing each estimate with its significance level.
• The coefficient estimates represent the change in the log-odds of the dependent variable (default.payment.next.month) for a one-unit increase in the predictor variable, holding all other variables constant.
• For example, the coefficient for AGE is 0.008812. This means that for every one-year increase in age, the log-odds of default.payment.next.month increase by 0.008812.
• Similarly, the coefficient for PAY_0 is 0.5867. This suggests that a one-unit increase in the PAY_0 variable (repayment status in September 2005) results in a 0.5867 increase in the log-odds of default.payment.next.month, holding all other variables constant.
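Log-odds are easier to read once exponentiated into odds ratios. The short sketch below does this for the two coefficients quoted above; the odds ratios themselves are not reported in the slides and are computed here only as an illustration.

```python
import math

# Coefficients quoted in the text (log-odds scale).
coefficients = {"AGE": 0.008812, "PAY_0": 0.5867}

# exp(coefficient) is the multiplicative change in the odds of default
# for a one-unit increase in the predictor, other predictors held fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

On this reading, each additional unit of PAY_0 multiplies the odds of default by roughly 1.8, while each additional year of age multiplies them by about 1.009.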
Model parameters
Figure 28: Visualization of the model estimates, displaying the level of significance of each estimate.
Model
• The null deviance (23778) is the deviance of the null model (intercept only, no predictors) measured against the saturated model (a model that fits every observation perfectly).
• The residual deviance (20943) is the deviance of the fitted model measured against the same saturated model.
• A lower residual deviance indicates a better fit of the model to the data.
• The Akaike Information Criterion (AIC) balances the fit of the model against the number of parameters used. Lower AIC values indicate a better trade-off.
• The significance codes indicate the statistical significance of each coefficient estimate.
• '***' indicates p < 0.001, '**' indicates p < 0.01, '*' indicates p < 0.05, '.' indicates p < 0.1, and ' ' indicates p > 0.1.
• For example, the coefficient estimates for AGE, PAY_0, BILL_AMT1, and PAY_AMT1 are highly significant (p < 0.001), indicating strong evidence that these variables are associated with default.payment.next.month.
• PAY_AMT5, PAY_AMT4, and PAY_AMT6 have p-values slightly above 0.05, suggesting weaker evidence of an association with default.payment.next.month.
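The two deviances quoted above can be compared directly. The sketch below computes the drop in deviance and McFadden's pseudo R-squared; the latter is not reported in the slides and is shown only as one common way to put the two numbers on a single scale, assuming ungrouped 0/1 responses so that deviance equals -2 times the log-likelihood.

```python
null_deviance = 23778      # intercept-only model, from the slide
residual_deviance = 20943  # fitted model, from the slide

# Drop in deviance attributable to the predictors.
deviance_reduction = null_deviance - residual_deviance

# McFadden's pseudo R-squared, under the assumption that the response is
# ungrouped 0/1 data, so deviance equals -2 * log-likelihood.
mcfadden_r2 = 1 - residual_deviance / null_deviance

print(deviance_reduction)     # 2835
print(round(mcfadden_r2, 3))  # 0.119
```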
Model Equation
The logistic regression equation can be constructed as follows:
logit(p) = -1.412 + 0.008812 * AGE + 0.5867 * PAY_0 + 0.1042 * PAY_2 + 0.1218 * PAY_3 -
0.00000531 * BILL_AMT1 + 0.00000338 * BILL_AMT3 - 0.00001491 * PAY_AMT1 -
0.00001212 * PAY_AMT2 - 0.000003455 * PAY_AMT6 - 0.000003057 * PAY_AMT5 -
0.000003074 * PAY_AMT4
The formula for the logit function is:
$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$
where p represents the probability of the event occurring (here, a default payment next month).
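To turn the model equation into a probability, the logit is inverted: p = 1 / (1 + exp(-logit(p))). The sketch below applies this to the coefficients listed above. The customer record is entirely hypothetical, and the function name `default_probability` is an assumption for the example.

```python
import math

# Coefficients copied from the model equation in the text.
INTERCEPT = -1.412
COEFFICIENTS = {
    "AGE": 0.008812, "PAY_0": 0.5867, "PAY_2": 0.1042, "PAY_3": 0.1218,
    "BILL_AMT1": -0.00000531, "BILL_AMT3": 0.00000338,
    "PAY_AMT1": -0.00001491, "PAY_AMT2": -0.00001212,
    "PAY_AMT6": -0.000003455, "PAY_AMT5": -0.000003057,
    "PAY_AMT4": -0.000003074,
}

def default_probability(customer):
    """Invert the logit: p = 1 / (1 + exp(-logit))."""
    logit = INTERCEPT + sum(
        beta * customer[name] for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical customer; every value below is invented for illustration.
customer = {
    "AGE": 35, "PAY_0": 2, "PAY_2": 2, "PAY_3": 0,
    "BILL_AMT1": 50000, "BILL_AMT3": 45000,
    "PAY_AMT1": 2000, "PAY_AMT2": 2000,
    "PAY_AMT6": 1000, "PAY_AMT5": 1000, "PAY_AMT4": 1000,
}
print(round(default_probability(customer), 3))
```

A record with two months of payment delay (PAY_0 = 2, PAY_2 = 2) lands near the middle of the probability scale, which shows how heavily the repayment status terms weigh compared with the amount terms.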
Receiver Operating Characteristic curve
Figure 29: Area under the curve (AUC)
• The AUC (area under the ROC curve) quantifies the overall performance of the model.
• With an AUC of 0.72, the model demonstrates moderate discrimination ability.
• An AUC of 0.72 is better than random chance (0.5) but leaves room for improvement.
• It means that, given a randomly chosen defaulter and a randomly chosen non-defaulter, the model assigns the higher predicted risk to the defaulter about 72% of the time.
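The pairwise reading of the AUC can be made concrete with a small sketch that counts correctly ordered (defaulter, non-defaulter) pairs. The labels and scores are invented; the slides do not show how the AUC of 0.72 was computed, and the pairwise method here is simply one equivalent definition.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = default) and predicted probabilities.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.7]
print(auc(labels, scores))  # 8 of 9 pairs are ordered correctly
```

The double loop is quadratic in the number of cases, so it suits small illustrations; rank-based formulas give the same value more cheaply on large test sets.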
Model evaluation
Figure 30: Visualization of the quantile-quantile plot
• In Fig. 30, the observed residuals are plotted against the quantiles of a theoretical distribution (usually the standard normal distribution).
• If the residuals are normally distributed, the points on the QQ plot fall approximately along a straight line.
• As seen in Fig. 30, the points deviate from a straight line, which suggests that the residuals are not normally distributed.
• This is not a desirable characteristic: such departures from normality may indicate issues with the model assumptions or the presence of outliers or influential data points.
Structure of the dataset
Figure 31: Distribution of amount of bill statement in
September, 2005 by default payment next month
Accuracy
Trained Model 80.84 %
Retrained Model 80.91 %
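Accuracy figures like these are usually obtained by thresholding the predicted probabilities and counting matches. The sketch below assumes a cut-off of 0.5, which the slides do not state, and adds a majority-class baseline for comparison, since accuracy is easier to judge when one class is much more common than the other. All labels and probabilities are invented.

```python
def accuracy(labels, probabilities, threshold=0.5):
    """Share of cases where the thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return correct / len(labels)

def majority_baseline(labels):
    """Accuracy of always predicting the most common class."""
    share_positive = sum(labels) / len(labels)
    return max(share_positive, 1 - share_positive)

# Made-up test labels (1 = default) and predicted probabilities.
labels = [0, 0, 0, 1, 0, 1, 0, 0]
probabilities = [0.1, 0.2, 0.6, 0.8, 0.3, 0.4, 0.1, 0.2]
print(accuracy(labels, probabilities), majority_baseline(labels))  # 0.75 0.75
```

In this toy case the model only matches the baseline, which is why accuracy is worth reading alongside the AUC.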
Recommendations
Bill Amounts and Previous Payments
The bill amounts (e.g., BILL_AMT1,
BILL_AMT3) and previous payment
amounts (e.g., PAY_AMT1,
PAY_AMT2) also play a significant
role in predicting default. Encouraging
timely bill payments and offering
flexible payment options can help
mitigate default risks.
Customer Segmentation
Utilize the insights from the logistic
regression model to segment customers
based on their risk profiles. This
segmentation can help prioritize
collections efforts, tailor marketing
campaigns, and customize financial
products to better meet the needs of
different customer segments.
Customer Assistance Programs
Implement
customer assistance programs or financial
counseling services to support customers
experiencing financial difficulties. Proactively
reaching out to at-risk customers and offering
them assistance can help prevent defaults and
foster customer loyalty.
Payment Status Importance
It is crucial for the business to closely
monitor customers' payment behavior,
especially when there are signs of
payment delays or defaults.
Age Factor
While the age variable (AGE)
has a relatively small coefficient,
it still contributes to the model's
predictive power. Understanding
the age distribution of customers
and how it correlates with
default rates can help tailor
marketing strategies or financial
products to different age groups.
Conclusion
• In conclusion, the logistic regression analysis and correlation results provide valuable insights into the factors influencing default payment next month in the dataset.
• The analysis highlights the significance of payment status, age, bill amounts, and previous payments in predicting default. By closely monitoring these factors and adapting strategies accordingly, businesses can better manage default risks and improve their financial stability.
THANK YOU!

 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

Construction of a robust prediction model to forecast the likelihood of a credit card holder to experience payment defaults in upcoming months.

  • 1. Construction of a robust prediction model to forecast the likelihood of a credit card holder to experience payment defaults in upcoming months. Xi global resources Group of company March 28, 2024
  • 3. Introduction The client is a prominent financial institution in Taiwan, known for providing credit card services to a substantial clientele. Like other credit card issuers, it faces the task of forecasting the likelihood that a customer will default on payment in the forthcoming month. According to Augustin, the company's project manager, accurate forecasting of defaults is essential for managing risk effectively and making informed decisions within the organization. After several meetings with the company's board of directors and management, it was observed that a professional data scientist was needed to create a robust, reliable, efficient and unbiased model to solve the identified problem. In this regard, I was engaged as an analyst. The purpose of my service is to construct a robust prediction model that can accurately forecast the likelihood of a credit card holder defaulting on payment during the upcoming month. The organization has provided a dataset covering demographic information, repayment history, bill statements, and other pertinent attributes of credit card customers. These characteristics form the foundation for constructing the predictive model. Source: https://images.app.goo.gl/5w6oRm8eHyFVsNmo8 Odusanya, Hafeez. (2023). "Influence of Credit Risk Management on Financial Performance of Commercial Banks in Nigeria." Research findings indicate that failure to make payments can result in substantial financial losses for a company, while also affecting customer relationships and credit risk.
  • 4. Resources
    Dataset Information
    The only resource provided is the dataset. It contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients in Taiwan from April 2005 to September 2005. The dataset contains 25 variables, such as:
      - ID of the client
      - Amount of given credit in NT dollars
      - Gender
      - Education
      - Marital status
      - Age
      - Payment status from April to September
      - Amount of bill statement from April to September
      - Amount of previous payment from April to September
      - Default payment next month
    Questions addressed
      - Which factors have the greatest impact on the probability of credit card holders defaulting on their payments?
      - Do demographic characteristics, including age, gender, education, and marital status, exhibit a correlation with default payment behavior?
      - What is the impact of repayment patterns, namely PAY_0 to PAY_6, on the probability of default payment?
      - Is it possible to forecast future default payment behavior based on prior bill amounts (BILL_AMT1 to BILL_AMT6) and previous payment amounts (PAY_AMT1 to PAY_AMT6)?
      - Do variations in default payment behavior exist among individuals with different levels of education or marital statuses?
      - What is the relationship between the credit amount provided (LIMIT_BAL) and the rates of default payments?
      - Given historical data and demographic information, can a predictive model estimate default payment with a high degree of accuracy?
      - Does the dataset exhibit any temporal patterns in default payment behavior during the six-month period?
      - Are there specific cohorts of credit card customers who demonstrate a greater inclination towards default payment, and if so, what are the defining characteristics of these cohorts?
      - What is the relationship between various combinations of repayment status and bill/payment amounts and the default payment outcome?
  • 5. Methods
    To tackle this problem, machine learning methods were used to build a predictive model from the dataset furnished by the organization.
    The dataset was first transformed to decode the categorical variables, which by default were stored as dummy or dichotomous codes. A preliminary exploratory analysis was then conducted to understand the data structure and to detect missing values and outliers. Because the dataset contains both qualitative and quantitative variables, bar charts were used for the qualitative variables, while boxplots and density plots were used for the quantitative (continuous) variables. The relationships between the quantitative variables were investigated with a correlation matrix.
    The dataset was then split into training and test sets in a 75:25 ratio. Splitting ensures that one portion of the data is used only to train the model, while the other portion is held out during training and used to assess model performance. A held-out test set helps detect overfitting and allows a more accurate evaluation of the model's ability to generalize to previously unseen data.
    At first, all variables in the dataset were used to train the model. A backward selection process with a stop condition was then applied to remove any variable with an insignificant estimate. The significance level for this post-selection step was set at α = 0.05. The post-selection method improves the model by iterating to identify and eliminate variables that do not contribute to it.
    The logistic regression model used is given by
    $$P(Y = 1 \mid X) = \frac{e^{k}}{1 + e^{k}}, \qquad \text{where } k = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \dots$$
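    The 75:25 split and the logistic function described in the Methods slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used for the analysis; the function names `train_test_split` and `logistic`, the fixed seed, and the stand-in records are assumptions made for the example.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and return (train, test) using the given test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def logistic(k):
    """Map a linear predictor k to the probability e^k / (1 + e^k)."""
    if k >= 0:
        # Algebraically identical form that avoids overflow for large k.
        return 1.0 / (1.0 + math.exp(-k))
    e = math.exp(k)
    return e / (1.0 + e)


# Hypothetical stand-in for the customer records.
rows = list(range(1000))
train, test = train_test_split(rows)
print(len(train), len(test))
print(logistic(0.0))
```

    The test rows are never seen during fitting, which is what makes accuracy measured on them an honest estimate of performance on new customers.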
  • 6. Structure of the dataset Figure 1: Variables in the dataset, showing the total number of observations by variable type (integer or numeric)
  • 7. Structure of the dataset Figure 2: Variables in the dataset, showing the percentage of values present (and any missing values) out of the total number of observations
  • 8. Preliminary Analysis: Exploratory data analysis Figure 3: Distribution of gender by default payment next month
  • 9. Preliminary Analysis: Exploratory data analysis Figure 3: Distribution of gender by default payment next month
  • 10. Preliminary Analysis: Exploratory data analysis Figure 4: Distribution of marriage by default payment next month
  • 11. Preliminary Analysis: Exploratory data analysis Figure 5: Distribution of education by default payment next month
  • 12. Preliminary Analysis: Exploratory data analysis Figure 6: Distribution of repayment status in September, 2005 by default payment next month
  • 13. Preliminary Analysis: Exploratory data analysis Figure 7: Distribution of repayment status in August, 2005 by default payment next month
  • 14. Preliminary Analysis: Exploratory data analysis Figure 8: Distribution of repayment status in July, 2005 by default payment next month
  • 15. Preliminary Analysis: Exploratory data analysis Figure 9: Distribution of repayment status in June, 2005 by default payment next month
  • 16. Preliminary Analysis: Exploratory data analysis Figure 10: Distribution of repayment status in May, 2005 by default payment next month
  • 17. Preliminary Analysis: Exploratory data analysis Figure 11: Distribution of repayment status in April, 2005 by default payment next month
  • 18. Preliminary Analysis: Exploratory data analysis
    Figure 12: Distribution of amount of bill statement in September, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         48509.16        51994.23
    Std.Dev      73782.07        73577.61
    Min          -6676           -165580
    Q1           2986.5          3676.5
    Median       20185           23119.5
    Q3           59667           69031
    Max          613860          964511
    MAD          29304.33        33292.52
    IQR          56638.75        65349.75
    CV           1.52            1.42
    Skewness     2.97            2.58
    Kurtosis     11.62           9.31
  • 19. Preliminary Analysis: Exploratory data analysis
    Figure 13: Distribution of amount of bill statement in August, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         47283.62        49717.44
    Std.Dev      71651.03        71029.95
    Min          -17710          -69777
    Q1           2693            3054
    Median       20300.5         21660.5
    Q3           57920.5         65698
    Max          581775          983931
    MAD          29519.31        31535.64
    IQR          55225.75        62631
    CV           1.52            1.43
    Skewness     2.97            2.63
    Kurtosis     11.54           9.95
  • 20. Preliminary Analysis: Exploratory data analysis
    Figure 14: Distribution of amount of bill statement in July, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         45181.6         47533.37
    Std.Dev      68516.98        69576.66
    Min          -61506          -157264
    Q1           2500            2768.5
    Median       19834.5         20202.5
    Q3           54734.5         61896
    Max          578971          1664089
    MAD          28828.42        29416.27
    IQR          52233.75        59124.25
    CV           1.52            1.46
    Skewness     2.95            3.13
    Kurtosis     11.34           22.03
  • 21. Preliminary Analysis: Exploratory data analysis
    Figure 15: Distribution of amount of bill statement in June, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         42036.95        43611.17
    Std.Dev      64351.08        64324.8
    Min          -65167          -170000
    Q1           2141            2360
    Median       19119.5         19000
    Q3           50178.5         55993
    Max          548020          891586
    MAD          27679.4         27591.19
    IQR          48034.25        53628
    CV           1.53            1.47
    Skewness     3               2.77
    Kurtosis     11.86           11.16
  • 22. Preliminary Analysis: Exploratory data analysis
    Figure 16: Distribution of amount of bill statement in May, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         39540.19        40530.45
    Std.Dev      61424.7         60617.27
    Min          -53007          -81334
    Q1           1500.5          1823
    Median       18478.5         17998
    Q3           47856           51136.5
    Max          547880          927171
    MAD          26726.83        26092.28
    IQR          46350.25        49312.25
    CV           1.55            1.5
    Skewness     3.03            2.83
    Kurtosis     12.22           12.33
  • 23. Preliminary Analysis: Exploratory data analysis
    Figure 17: Distribution of amount of bill statement in April, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         38271.44        39042.27
    Std.Dev      59579.67        59547.02
    Min          -339603         -209051
    Q1           1150            1265
    Median       18028.5         16679
    Q3           47430           49844
    Max          514975          961664
    MAD          26150.84        24310.19
    IQR          46274           48577
    CV           1.56            1.53
    Skewness     2.9             2.83
    Kurtosis     11.79           12.41
  • 24. Preliminary Analysis: Exploratory data analysis
    Figure 18: Distribution of amount of previous payment in September, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         3397.04         6307.34
    Std.Dev      9544.25         18014.51
    Min          0               0
    Q1           0               1163.5
    Median       1636            2459.5
    Q3           3478.5          5606.5
    Max          300000          873552
    MAD          2425.53         3068.24
    IQR          3478.25         4442.5
    CV           2.81            2.86
    Skewness     14.77           13.94
    Kurtosis     323.48          371.72
  • 25. Preliminary Analysis: Exploratory data analysis
    Figure 19: Distribution of amount of previous payment in August, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         3388.65         6640.47
    Std.Dev      11737.99        25302.26
    Min          0               0
    Q1           0               1005
    Median       1533.5          2247.5
    Q3           3310.5          5311.5
    Max          358689          684259
    MAD          2273.57         2863.64
    IQR          3309.75         4306.25
    CV           3.46            3.81
    Skewness     18.23           28.94
    Kurtosis     439.72          1439.86
  • 26. Preliminary Analysis: Exploratory data analysis
    Figure 20: Distribution of amount of previous payment in July, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         3367.35         5753.5
    Std.Dev      12959.62        18684.26
    Min          0               0
    Q1           0               600
    Median       1222            2000
    Q3           3000            5000
    Max          508229          896040
    MAD          1811.74         2799.89
    IQR          3000            4400
    CV           3.85            3.25
    Skewness     18.13           16.73
    Kurtosis     492.11          537.52
  • 27. Preliminary Analysis: Exploratory data analysis
    Figure 21: Distribution of amount of previous payment in June, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         3155.63         5300.53
    Std.Dev      11191.97        16689.78
    Min          0               0
    Q1           0               390
    Median       1000            1734
    Q3           2940.5          4602
    Max          432130          621000
    MAD          1482.6          2570.83
    IQR          2939.25         4212
    CV           3.55            3.15
    Skewness     16.97           12.2
    Kurtosis     463.45          248.77
  • 28. Preliminary Analysis: Exploratory data analysis
    Figure 22: Distribution of amount of previous payment in May, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         3219.14         5248.22
    Std.Dev      11944.73        16071.67
    Min          0               0
    Q1           0               369
    Median       1000            1765
    Q3           3000            4600
    Max          332000          426529
    MAD          1482.6          2616.79
    IQR          3000            4231
    CV           3.71            3.06
    Skewness     15.21           10.46
    Kurtosis     316.99          160.71
  • 29. Preliminary Analysis: Exploratory data analysis
    Figure 23: Distribution of amount of previous payment in April, 2005 by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         3441.48         5719.37
    Std.Dev      13464.01        18792.95
    Min          0               0
    Q1           0               300
    Median       1000            1706
    Q3           2975            4545
    Max          345293          528666
    MAD          1482.6          2529.32
    IQR          2974.5          4245
    CV           3.91            3.29
    Skewness     12.66           10.2
    Kurtosis     208.12          155.51
  • 30. Preliminary Analysis: Exploratory data analysis
    Figure 24: Distribution of age by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         35.73           35.42
    Std.Dev      9.69            9.08
    Min          21              21
    Q1           28              28
    Median       34              34
    Q3           42              41
    Max          75              79
    MAD          10.38           8.9
    IQR          14              13
    CV           0.27            0.26
    Skewness     0.66            0.75
    Kurtosis     0.11 (only one value given)
  • 31. Preliminary Analysis: Exploratory data analysis
    Figure 25: Distribution of amount of given credit (LIMIT_BAL) by default payment next month

    Estimator    Default: Yes    Default: No
    Mean         130109.7        178099.73
    Std.Dev      115378.5        131628.36
    Min          10000           10000
    Q1           50000           70000
    Median       90000           150000
    Q3           200000          250000
    Max          740000          1000000
    MAD          88956           133434
    IQR          150000          180000
    CV           0.89            0.74
    Skewness     1.35            0.91
    Kurtosis     1.55            0.38
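    The CV row in the summary tables above is the coefficient of variation, the standard deviation divided by the mean. As a quick consistency check, the sketch below recomputes it from the Mean and Std.Dev reported for the September 2005 bill statement (slide 18). The function name is an assumption made for the example.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation expressed as a multiple of the mean."""
    return std_dev / mean


# Mean and Std.Dev copied from the September 2005 bill statement table (slide 18).
cv_yes = coefficient_of_variation(mean=48509.16, std_dev=73782.07)
cv_no = coefficient_of_variation(mean=51994.23, std_dev=73577.61)

print(round(cv_yes, 2), round(cv_no, 2))
```

    Both values round to the figures reported on the slide (1.52 and 1.42). A CV above 1 means the spread exceeds the mean, which matches the strong right skew reported for these variables.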
  • 32. Correlation analysis
    Figure 26: Relationship between quantitative variables
      - Figure 26 shows the correlation coefficients between pairs of variables in the dataset.
      - There is a strong positive correlation between the amounts of bill statements in different months (e.g., BILL_AMT1 and BILL_AMT2 have a correlation coefficient of 0.951).
      - There are weak positive correlations between the amounts of bill statements and the amounts of previous payments (e.g., PAY_AMT1 and BILL_AMT1 have a correlation coefficient of 0.140).
      - There are also weak positive correlations between the amounts of bill statements and the repayment status (e.g., PAY_0 and BILL_AMT1 have a correlation coefficient of 0.285).
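    Each entry of a correlation matrix like the one in Figure 26 is a pairwise coefficient. The sketch below computes it for one pair, assuming the Pearson coefficient (the slides do not name the method). The bill amounts are made-up illustrative numbers, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and the two sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical bill amounts for five customers in two consecutive months.
bill_month_1 = [3900.0, 2700.0, 29000.0, 47000.0, 8600.0]
bill_month_2 = [3100.0, 1700.0, 14000.0, 48000.0, 5700.0]
print(round(pearson(bill_month_1, bill_month_2), 3))
```

    A coefficient near 1, such as the 0.951 reported for BILL_AMT1 and BILL_AMT2, means the two variables carry largely the same information.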
  • 33. Model
    Figure 27: Model summary showing each estimate with its significance level.
      - The coefficient estimates represent the change in the log-odds of the dependent variable (default.payment.next.month) for a one-unit increase in the predictor variable, holding all other variables constant.
      - For example, the coefficient for AGE is 0.008812. This means that for every one-year increase in age, the log-odds of default.payment.next.month increases by 0.008812 units.
      - Similarly, the coefficient for PAY_0 is 0.5867. This suggests that a one-unit increase in the PAY_0 variable (repayment status in September 2005) results in a 0.5867 increase in the log-odds of default.payment.next.month, holding all other variables constant.
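    Log-odds changes are easier to read once exponentiated into odds ratios. The sketch below does this for the two coefficients quoted on the slide. The conversion is standard for logistic regression, but the slides themselves report only log-odds.

```python
import math

# Coefficients quoted in the model summary (log-odds scale).
coefficients = {"AGE": 0.008812, "PAY_0": 0.5867}

# exp(coefficient) is the multiplicative change in the odds of default
# for a one-unit increase in the predictor, other variables held constant.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.4f}")
```

    Under this reading, each additional step of PAY_0 multiplies the odds of default by about 1.80, while each additional year of age multiplies them by about 1.009.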
  • 34. Model parameters Figure 28: Visualization of the model estimates, displaying the significance level of each estimate.
  • 35. Model
      - The null deviance (23778) represents the difference in deviance between the null model (intercept only, with no predictors) and the saturated model (which fits every observation exactly).
      - The residual deviance (20943) represents the difference in deviance between the fitted model and the saturated model.
      - A lower residual deviance indicates a better fit of the model to the data.
      - The Akaike Information Criterion (AIC) is a measure of the model's goodness of fit that balances the fit of the model against the number of parameters used. Lower AIC values indicate a better model.
      - The significance codes indicate the statistical significance of each coefficient estimate.
      - '***' indicates p < 0.001, '**' indicates p < 0.01, '*' indicates p < 0.05, '.' indicates p < 0.1, and ' ' indicates p > 0.1.
      - For example, the coefficient estimates for AGE, PAY_0, BILL_AMT1, and PAY_AMT1 are highly significant (p < 0.001), indicating that these variables are strongly associated with default.payment.next.month.
      - PAY_AMT5, PAY_AMT4, and PAY_AMT6 have p-values slightly above 0.05, suggesting they may have a weaker association with default.payment.next.month.
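    The two deviances quoted above can be compared directly. The sketch below computes the drop in deviance and the proportional reduction relative to the null model. Neither quantity is reported in the slides; they are derived here only to illustrate how the two numbers relate.

```python
# Deviances quoted in the model summary.
null_deviance = 23778
residual_deviance = 20943

# Deviance explained by adding the predictors to the intercept-only model.
deviance_reduction = null_deviance - residual_deviance

# Share of the null deviance removed by the fitted model (0 = no improvement).
proportional_reduction = 1 - residual_deviance / null_deviance

print(deviance_reduction)
print(round(proportional_reduction, 3))
```

    The predictors remove roughly 12% of the null deviance, which is consistent with a model that helps but leaves much of the variation in default unexplained.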
  • 36. Model Equation
    The logistic regression equation can be constructed as follows:
    logit(p) = -1.412 + 0.008812 * AGE + 0.5867 * PAY_0 + 0.1042 * PAY_2 + 0.1218 * PAY_3
               - 0.00000531 * BILL_AMT1 + 0.00000338 * BILL_AMT3
               - 0.00001491 * PAY_AMT1 - 0.00001212 * PAY_AMT2
               - 0.000003455 * PAY_AMT6 - 0.000003057 * PAY_AMT5 - 0.000003074 * PAY_AMT4
    The formula for the logit function is:
    $$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$$
    where p represents the probability of the event occurring.
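    To turn the equation above into a predicted probability, evaluate the linear predictor and invert the logit. The sketch below uses the coefficients exactly as printed on the slide. The function name `default_probability` and the example customer are hypothetical, and any predictor missing from the input is treated as 0.

```python
import math

INTERCEPT = -1.412
COEFFICIENTS = {
    "AGE": 0.008812,
    "PAY_0": 0.5867,
    "PAY_2": 0.1042,
    "PAY_3": 0.1218,
    "BILL_AMT1": -0.00000531,
    "BILL_AMT3": 0.00000338,
    "PAY_AMT1": -0.00001491,
    "PAY_AMT2": -0.00001212,
    "PAY_AMT6": -0.000003455,
    "PAY_AMT5": -0.000003057,
    "PAY_AMT4": -0.000003074,
}


def default_probability(customer):
    """Invert the logit: p = 1 / (1 + exp(-logit(p)))."""
    log_odds = INTERCEPT + sum(
        beta * customer.get(name, 0.0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical customer: 35 years old, two months late in September,
# with modest bills and payments (illustrative values only).
customer = {
    "AGE": 35, "PAY_0": 2, "PAY_2": 0, "PAY_3": 0,
    "BILL_AMT1": 20000, "BILL_AMT3": 18000,
    "PAY_AMT1": 1500, "PAY_AMT2": 1500,
    "PAY_AMT4": 1000, "PAY_AMT5": 1000, "PAY_AMT6": 1000,
}
print(round(default_probability(customer), 3))
```

    Changing only PAY_0 in the example shows how strongly the most recent repayment status drives the predicted probability compared with the bill and payment amounts.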
  • 37. Receiver Operating Characteristic curve
    Figure 29: Area under the curve (AUC)
      - The AUC (Area Under the Curve) of the ROC curve quantifies the overall performance of the model.
      - With an AUC of 0.72, the model demonstrates moderate discrimination ability.
      - An AUC of 0.72 suggests that the model is better than random chance (AUC = 0.5) but still has room for improvement.
      - It implies that the model can distinguish between the two classes (positive and negative) with a reasonable degree of accuracy.
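    One way to read the AUC is as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. The sketch below computes the AUC directly from that definition, with ties counted as half. The scores and labels are made up for illustration, and the pairwise loop is meant for small examples rather than for a dataset of this size.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # defaulter scored higher: correct ordering
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true outcomes (1 = default).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))
```

    Under this reading, an AUC of 0.72 means the model orders such a pair correctly about 72% of the time.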
  • 38. Model evaluation
    Figure 30: Visualization of the quantile-quantile plot
      - In Fig 30, the observed residuals are plotted against the quantiles of a theoretical distribution (usually the standard normal distribution).
      - If the residuals are normally distributed, the points on the QQ plot fall approximately along a straight line.
      - As seen in Fig 30, the points deviate from a straight line, which suggests that the residuals are not normally distributed.
      - This is not a desirable characteristic.
      - These departures from normality may indicate issues with the model assumptions and the presence of outliers or influential data points.
  • 39. Structure of the dataset Figure 31: Distribution of amount of bill statement in September, 2005 by default payment next month
  • 40. Structure of the dataset
    Figure 31: Distribution of amount of bill statement in September, 2005 by default payment next month

    Model              Accuracy
    Trained Model      80.84 %
    Retrained Model    80.91 %
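    Accuracy figures like those above are the share of test-set customers whose predicted class matches the observed outcome. The sketch below shows the calculation, assuming predictions are made by thresholding the predicted probability at 0.5. The slides do not state the cut-off, and the probabilities and labels here are made up.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Share of cases where the thresholded prediction equals the true label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(pred == y for pred, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical predicted probabilities and true outcomes (1 = default).
probabilities = [0.9, 0.2, 0.6, 0.1]
labels = [1, 0, 0, 0]
print(accuracy(probabilities, labels))
```

    Because defaults are the minority class in credit data, accuracy is best read alongside the AUC rather than on its own.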
  • 41. Recommendations
    Bill Amount and Previous Payment
    The bill amounts (e.g., BILL_AMT1, BILL_AMT3) and previous payment amounts (e.g., PAY_AMT1, PAY_AMT2) also play a significant role in predicting default. Encouraging timely bill payments and offering flexible payment options can help mitigate default risks.
    Customer Segmentation
    Utilize the insights from the logistic regression model to segment customers based on their risk profiles. This segmentation can help prioritize collection efforts, tailor marketing campaigns, and customize financial products to better meet the needs of different customer segments.
    Customer Assistance Program
    Implement customer assistance programs or financial counseling services to support customers experiencing financial difficulties. Proactively reaching out to at-risk customers and offering them assistance can help prevent defaults and foster customer loyalty.
    Payment Status Importance
    It is crucial for the business to closely monitor customers' payment behavior, especially when there are signs of payment delays or defaults.
    Age Factor
    While the age variable (AGE) has a relatively small coefficient, it still contributes to the model's predictive power. Understanding the age distribution of customers and how it correlates with default rates can help tailor marketing strategies or financial products to different age groups.
  • 42. Conclusion
      - In conclusion, the logistic regression analysis and correlation results provide valuable insights into the factors influencing default payment next month in the dataset.
      - The analysis highlights the significance of payment status, age, bill amounts, and previous payments in predicting default. By closely monitoring these factors and adapting strategies accordingly, businesses can better manage default risks and improve their financial stability.