Study on the Application of Ensemble Learning to Credit Scoring
1. Study on the Application of Ensemble Learning to Credit Scoring
Ce Chen
Hokkaido University
Graduate School of Information Science and Technology
Laboratory of Harmonious Systems Engineering
2. Background
• Increase in financing opportunities due to the development of Fintech
  – Many people who could not receive loans until now will become eligible for them.
  – More and more people are involved in credit.
• Growing need for "credit scoring"
  – We need an effective way to judge whether an individual's credit is good or bad.
  – Judging individual credit correctly can effectively reduce losses.
3. Problems of Credit Scoring
• Few understandable models
  – Actual operation is difficult with models whose credit-scoring calculation process cannot be explained.
• Multiple evaluation indices
  – Improving accuracy, minimizing the loss amount, etc.
• Imbalanced data in credit samples
  – The number of good-credit samples is much larger than the number of bad-credit samples.
• Machine learning is needed to tackle these problems.
4. Purpose
• Purpose
  – Build a model that better handles credit scoring problems with machine learning.
• Approach
  – Credit scoring as a classification problem
  – Ensemble learning: XGBoost
    • High accuracy reported in previous research
    • Explainable criteria based on decision trees
    • Evaluation indicators that are cost sensitive
  – EasyEnsemble and focal loss for imbalanced data
5. Credit Scoring as a Classification Problem
• Use machine learning to solve credit scoring.

[Diagram] Past debtors' records (X_i1, X_i2, X_i3, ..., Y_i for i = 1, ..., n) are used to train a scoring model.
Input: a new customer's features X_m1, X_m2, X_m3, ...
Output: the probability of Y (the new customer's credit)
  – If the probability of Y > threshold: bad credit
  – If the probability of Y <= threshold: good credit
X: features of the individual; Y: credit of the individual (Y = 1: bad credit, Y = 0: good credit)
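A minimal sketch of this pipeline, assuming the Python `xgboost` package; the arrays and the variable names below are illustrative placeholders, not the paper's data:

```python
# Credit scoring as binary classification: train on past debtors,
# threshold the predicted probability for a new customer.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_train = rng.random((100, 5))           # placeholder features of past debtors
y_train = rng.integers(0, 2, 100)        # placeholder labels: 1 = bad, 0 = good

model = xgb.XGBClassifier(n_estimators=100)
model.fit(X_train, y_train)

# New customer's features -> probability of Y = 1 (bad credit).
X_new = rng.random((1, 5))
p_bad = model.predict_proba(X_new)[0, 1]

threshold = 0.5
print("bad credit" if p_bad > threshold else "good credit")
```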
6. The Principle of XGBoost
• Objective = cost function + regularization:

$obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$

  – $obj$: objective
  – $l(y_i, \hat{y}_i)$: cost function
  – $n$: number of samples
  – $y_i$: real value of y; $\hat{y}_i$: predicted value of y
  – $K$: number of decision trees
  – $\Omega(f_k)$: regularization, the complexity of decision tree $f_k$
7. The Principle of XGBoost
• After round $t$ of training, the prediction is updated by adding the current tree:

$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$

  – $\hat{y}_i^{(t)}$: the predicted result after round $t$
  – $f_t(x_i)$: the predicted result of the current tree
8. The Principle of XGBoost
• Difference between XGBoost and gradient boosting: XGBoost applies a second-order Taylor expansion to the cost function:

$obj^{(t)} \approx \sum_{i=1}^{n}\left[l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$

  – $g_i$: first derivative of the cost function
  – $h_i$: second derivative of the cost function
• The objective depends only on the first and second derivatives of the cost function, so even complex cost functions can be handled easily.
9. XGBoost in Previous Research
• Datasets: Australian, Japanese
• Models: k-Nearest Neighbor, Logistic Regression, Linear Discriminant Analysis, Support Vector Machine, Decision Tree, Random Forest, Gradient Boosted Decision Tree, AdaBoost, XGBoost
• Result: comparing the models, XGBoost achieves higher accuracy.

He, H., Zhang, W., & Zhang, S. (2018). A novel ensemble method for credit scoring: Adaption of different imbalance ratios. Expert Systems with Applications, 98, 105–117.
10. Assessment Method (AUC)

              predict_true   predict_false
label_true        TP              FN
label_false       FP              TN

false positive rate = FP / (FP + TN)
true positive rate = TP / (TP + FN)

[ROC curve figure] AUC is the size of the area under the ROC curve. The point (0,1) is the best case, where all samples are separated correctly; the closer the curve is to (0,1), the more accurate the model.
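A small sketch of computing these quantities, assuming scikit-learn; the label and score arrays are illustrative:

```python
# ROC curve and AUC for a scoring model's predicted probabilities.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])               # 1 = bad credit
p_bad = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # model output P(Y=1)

fpr, tpr, thresholds = roc_curve(y_true, p_bad)     # FP/(FP+TN), TP/(TP+FN)
auc = roc_auc_score(y_true, p_bad)                  # area under the ROC curve
print(f"AUC = {auc:.3f}")
```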
11. Assessment Method (Cost Sensitive)
• For the credit scoring problem, we need a single total-cost value to judge whether the model is good or bad.
• Cost-sensitive metrics, where $L_1$ is the number of FP, $L_2$ is the number of FN, $C_{FP}$ and $C_{FN}$ are the unit costs of an FP and an FN, and $Total$ is the total number of test samples:

         amount   cost
  FP     $L_1$    $C_{FP}$
  FN     $L_2$    $C_{FN}$

  1. the total cost $= C_{FP} L_1 + C_{FN} L_2$
  2. $\frac{1}{C_{FP} + C_{FN}}\left(C_{FP} L_1 + C_{FN} L_2\right)$
  3. $EMC = \frac{1}{Total}\left(C_{FP} L_1 + C_{FN} L_2\right)$
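To make the three metrics concrete, here is a small helper, a sketch using the deck's convention ($L_1$ = number of FP, $L_2$ = number of FN); the counts in the example are illustrative:

```python
# The three cost-sensitive metrics defined above.
def cost_metrics(L1, L2, C_FP, C_FN, total_samples):
    total_cost = C_FP * L1 + C_FN * L2           # metric 1: the total cost
    normalized = total_cost / (C_FP + C_FN)      # metric 2: normalized cost
    emc = total_cost / total_samples             # metric 3: EMC (per-sample cost)
    return total_cost, normalized, emc

# Illustrative counts with the unit costs used later in the deck (C_FN=1, C_FP=5):
print(cost_metrics(L1=29, L2=15, C_FP=5, C_FN=1, total_samples=200))
# -> (160, 26.666..., 0.8)
```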
12. Assessment Method (Cost Sensitive)

Total cost, $C_{FP} L_1 + C_{FN} L_2$:

           German   Australian
  LR       177.8    51
  DT       191.6    86
  RF       190.8    52.6
  XGBoost  165.2    43

EMC, $\frac{1}{Total}\left(C_{FP} L_1 + C_{FN} L_2\right)$:

           German   Australian
  LR       0.875    0.393
  DT       0.976    0.620
  RF       0.954    0.394
  XGBoost  0.813    0.381

Normalized cost, $\frac{1}{C_{FP} + C_{FN}}\left(C_{FP} L_1 + C_{FN} L_2\right)$:

           German   Australian
  LR       29.6     8.8
  DT       30.5     14.2
  RF       31.8     9.1
  XGBoost  27.5     8.6

LR: Logistic Regression, DT: Decision Tree, RF: Random Forest
EMC gives the average cost of a sample, so changing the number of test samples does not affect the result.

West, D. (2000). Neural network credit scoring models. Computers & Operations Research, 27(11-12), 1131–1152.
13. Assessment Method (Cost Sensitive)
• The values of $C_{FP}$ and $C_{FN}$
  – $C_{FN}$ is usually set to 1; $C_{FP}$ is set to a constant $n$ greater than 1.
• In many papers, $C_{FN} = 1$, $C_{FP} = 5$:
  – Ting, K. M. (1998). Inducing cost-sensitive trees via instance weighting. Lecture Notes in Computer Science.
  – Elkan, C. (2001). The foundations of cost-sensitive learning. International Joint Conference on Artificial Intelligence.
  – The contributor of the German dataset suggests $C_{FN} = 1$, $C_{FP} = 5$.
• In this paper, $C_{FN} = 1$, $C_{FP} = 5$.
  (Here a false positive is a bad-credit customer judged as good, the costlier error for the creditor.)
14. Proposed Methods
1. EasyEnsemble + XGBoost
  – EasyEnsemble as a resampling technique, XGBoost as the base model.
2. Change the structure of XGBoost
  2.1. Customizing the evaluation metric
    • The EMC cost formula as the evaluation metric
    • weight (a parameter of XGBoost) + threshold (a parameter in the cost formula)
  2.2. Customizing the cost function (focal loss)
    • The EMC cost formula as the evaluation metric
    • Focal loss as the cost function
15. Experiment Setting
• 5-fold cross-validation
• XGBoost
  – the XGBoost module in Python
• Tuning method
  – Grid search (to avoid local optima, two rounds of tuning are performed on the important parameters; a sketch follows below)
    • number of boosting rounds → eta (learning rate) → max_depth, min_child_weight → subsample, colsample_bytree → eta (learning rate) → number of boosting rounds
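A minimal sketch of one stage of this staged grid search, using xgboost's cross-validation helper; the parameter names (eta, max_depth, ...) are real xgboost parameters, while `tune_stage`, the grids, and the data are illustrative:

```python
# One stage of a two-round grid search with 5-fold CV.
import xgboost as xgb

def tune_stage(dtrain, base_params, grid, num_round=100):
    """Return the candidate setting with the best 5-fold CV logloss."""
    best, best_score = None, float("inf")
    for candidate in grid:   # e.g. {"max_depth": 4, "min_child_weight": 1}
        cv = xgb.cv({**base_params, **candidate}, dtrain,
                    num_boost_round=num_round, nfold=5,
                    metrics="logloss", seed=0)
        score = cv["test-logloss-mean"].iloc[-1]
        if score < best_score:
            best, best_score = candidate, score
    return best

# Stage order from the slide: boosting rounds -> eta -> max_depth/min_child_weight
# -> subsample/colsample_bytree -> eta -> boosting rounds (second round).
```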
16. Data Set for Credit Scoring
• Data on credit scoring
– the amount of public data is small
• Data used in previous research
Datasets Samples Features Good/Bad
German 1000 24 700/300
Australian 690 14 307/383
Taiwan 30000 23 23364/6636
Qianhai 40000 491 34737/5263
Japanese 690 15 307/383
17. Data Set for Credit Scoring
• Datasets used for each method
  – Method 1 (EasyEnsemble): German, Australian, Taiwan, Qianhai
  – Method 2 (changed structure): German, Taiwan
18. Method 1 (EasyEnsemble)
Purpose: increase the sensitivity to minority samples and reduce cost without reducing AUC, i.e. reduce creditors' losses without reducing customer satisfaction.

[Toy example: a 5-sample dataset with three good-credit samples (Y = 0) and two bad-credit samples (Y = 1).]
Divide the samples into two classes according to the value of Y. The majority class A (good credit) is divided into several small groups, each the same size as the minority class B (bad credit), e.g. A1 = {(1,1,0), (2,2,0)}, A2 = {(2,2,0), (3,3,0)}, A3 = {(3,3,0), (1,1,0)}; B = {(4,4,1), (5,5,1)}.
20. Experiment (EasyEnsemble)

German dataset:
  Resampling          AUC
  Origin (AdaBoost)   0.750
  OverSampling        0.751
  SMOTE               0.733
  UnderSampling       0.742
  EasyEnsemble        0.771

Taiwan dataset:
  Resampling          AUC
  Origin (AdaBoost)   0.763
  OverSampling        0.758
  SMOTE               0.719
  UnderSampling       0.759
  EasyEnsemble        0.776

• Oversampling easily leads to overfitting.
• Undersampling easily leads to underfitting.
• SMOTE: manually generated (synthetic) data.
• EasyEnsemble: all data are original data.
21. Experiment Setting (EasyEnsemble)

[Toy example, continued] The minority class B is resampled by the bootstrap method (each extracted sample is put back, i.e. sampling with replacement) into sets B1, B2, ... of the same size as B, e.g. B1 = {(4,4,1), (5,5,1)}, B2 = {(4,4,1), (4,4,1)}. A sketch of the resampling step follows below.
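A minimal sketch of this EasyEnsemble-style resampling, assuming numpy; `easy_ensemble_subsets` and the label convention are illustrative:

```python
# Split the majority class into minority-sized groups A1..An and bootstrap the
# minority class (sampling with replacement) into B1..Bn.
import numpy as np

def easy_ensemble_subsets(X, y, rng=np.random.default_rng(0)):
    maj = np.where(y == 0)[0]                # majority class: good credit
    mino = np.where(y == 1)[0]               # minority class: bad credit
    n_groups = len(maj) // len(mino)          # groups of minority size
    subsets = []
    for _ in range(n_groups):
        a = rng.choice(maj, size=len(mino), replace=False)   # A_i: undersampled majority
        b = rng.choice(mino, size=len(mino), replace=True)   # B_i: bootstrapped minority
        idx = np.concatenate([a, b])
        subsets.append((X[idx], y[idx]))
    return subsets
```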
22. Experiment Setting (EasyEnsemble)

Ensemble learning: each subset A_i + B_i trains its own XGBoost model, which outputs P(Y=1), P(Y=0), and feature importance(i). The n outputs are combined by the simple average method:

$P(Y=1) = \frac{1}{n}\sum_{i=1}^{n} P_i(Y=1)$
$P(Y=0) = \frac{1}{n}\sum_{i=1}^{n} P_i(Y=0)$
$\text{feature importance} = \frac{1}{n}\sum_{i=1}^{n} \text{feature importance}_i$

• Resampling: EasyEnsemble; base model: XGBoost
• The parameter colsample_bytree (column sampling: the proportion of features selected) is adjusted within a range of 0.1
• n: number of base models
• Output (simple average method): probability of y and feature importance (a sketch follows below)
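A sketch of this ensemble step, building on `easy_ensemble_subsets` from the previous sketch; the function names and the colsample_bytree value are illustrative:

```python
# Train one XGBoost model per subset A_i + B_i, then simple-average the
# predicted probabilities and the feature importances.
import numpy as np
import xgboost as xgb

def train_easy_ensemble(subsets, colsample_bytree=0.8):
    models = []
    for X_sub, y_sub in subsets:
        m = xgb.XGBClassifier(colsample_bytree=colsample_bytree)
        m.fit(X_sub, y_sub)
        models.append(m)
    return models

def predict_average(models, X_new):
    # P(Y=1) = (1/n) * sum_i P_i(Y=1); same simple average for importances.
    probs = np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)
    importances = np.mean([m.feature_importances_ for m in models], axis=0)
    return probs, importances
```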
23. Outcome (EasyEnsemble)

AUC:
                        german   australian   taiwan   qianhai
  EasyEnsemble          0.77     0.88         0.70     0.65
  XGBoost               0.81     0.95         0.78     0.71
  XGBoost_EasyEnsemble  0.82     0.95         0.78     0.72

Cost (EMC):
                        german   australian   taiwan   qianhai
  XGBoost               0.813    0.381        0.741    0.713
  XGBoost_EasyEnsemble  0.578    0.343        0.556    0.541

Cost is reduced without reducing AUC: XGBoost is optimized without losing accuracy.
24. Method 2 (Changing the Structure of XGBoost)
• Purpose:
  – Customize the evaluation metric to obtain the minimum cost
  – Reduce cost without considering AUC; the only objective is to reduce loss
• Evaluation metric
  – Plays no role in directly optimizing or training the model
  – Stops the model from training once the metric stops improving
  – Example: train with the logloss objective and create an AUC metric to evaluate the model
• Cost function
  – The function critical to training
  – It is the function that gets optimized
25. Method 2.1 (Evaluation Metric)
• Weight (a parameter of XGBoost)
  – Adjusts the weight of the minority class by scaling the derivatives:
    $grad_{new} = weight \times grad_{origin}$
    $hess_{new} = weight \times hess_{origin}$
  – With the default metric, the weight value at which the cost is lowest cannot be obtained.
• Evaluation metric (a parameter of XGBoost)
  – By customizing the evaluation metric, we can minimize the cost of the model.
26. Experiment Setting (Evaluation Metric)
2.1 Customizing the evaluation metric
• Weight: adjust the weight of the minority class:
  $grad_{new} = weight \times grad_{origin}$, $hess_{new} = weight \times hess_{origin}$
• Add a threshold to the evaluation metric
  – default threshold = 0.5
  – If the probability of y > threshold, the predicted value of y = 1
  – If the probability of y <= threshold, the predicted value of y = 0
• Metric: $EMC = \frac{1}{Total}\left(C_{FP} L_1 + C_{FN} L_2\right)$ (a sketch follows below)
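A sketch of this customized metric for xgboost's native training API (`custom_metric` in recent versions, `feval` in older ones); the cost constants and threshold follow the slides, variable names are illustrative, and note that depending on the xgboost version and objective, `preds` may arrive as raw margins (assumed here) or as probabilities:

```python
# EMC as a custom evaluation metric with a tunable threshold. Deck convention:
# L1 = bad credit (y = 1) judged good, L2 = good credit judged bad.
import numpy as np
import xgboost as xgb

C_FP, C_FN = 5.0, 1.0    # unit costs from the cost-sensitive slides
THRESHOLD = 0.5          # tunable, as described above

def emc_metric(preds, dtrain):
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))          # raw margins -> probabilities
    pred_bad = (p > THRESHOLD).astype(int)
    L1 = np.sum((y == 1) & (pred_bad == 0))   # bad judged good (costly error)
    L2 = np.sum((y == 0) & (pred_bad == 1))   # good judged bad
    return "emc", (C_FP * L1 + C_FN * L2) / len(y)

# Usage sketch: weight the minority class (e.g. via scale_pos_weight or
# per-sample weights), then early-stop on EMC instead of logloss:
# bst = xgb.train(params, dtrain, num_boost_round=500,
#                 evals=[(dvalid, "valid")], custom_metric=emc_metric,
#                 early_stopping_rounds=30)
```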
27. Outcome (Evaluation Metric)

German dataset (n = 5), EMC:
  XGBoost                                0.813
  XGBoost_Customized_Evaluation_metric   0.565

  Range of weight: 1–10, interval = 1
  Range of threshold: 0.2–0.8, interval = 0.05
  Best parameters: weight = 5, threshold = 0.5

Taiwan dataset (n = 5), EMC:
  XGBoost                                0.742
  XGBoost_Customized_Evaluation_metric   0.553

  Range of weight: 1–10, interval = 1
  Range of threshold: 0.2–0.8, interval = 0.05
  Best parameters: weight = 3, threshold = 0.4
28. Method 2.2 (Focal Loss)
• Focal loss:

$FL(y,x) = -\frac{1}{m}\sum_{i=1}^{m}\left[(1-p(x_i))^{\gamma}\, y_i \log p(x_i) + p(x_i)^{\gamma}\,(1-y_i)\log(1-p(x_i))\right]$

compared with logloss:

$logloss(y,x) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(x_i) + (1-y_i)\log(1-p(x_i))\right]$

• In focal loss, the modulating factors $(1-p(x_i))^{\gamma}$ and $p(x_i)^{\gamma}$:
  – Reduce the weight of easy negatives (easy examples)
  – Increase the weight of hard negatives (hard examples)
  – Increase the sensitivity of the minority class

Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2018). Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
29. Method 2.2 (Focal Loss)
• Credit scoring
  – Features of good credit and bad credit are very different.
  – Most good-credit samples are easy negative samples.
  – Only a few samples have features similar to the bad-credit samples.
• The cost function must supply the first and second derivatives with respect to the raw score $\hat z_i$, where $p_i = p(x_i) = 1/(1+e^{-\hat z_i})$:
  – First derivative:
    $g_i = y_i(1-p_i)^{\gamma}\left[\gamma p_i \log p_i - (1-p_i)\right] + (1-y_i)\,p_i^{\gamma}\left[p_i - \gamma(1-p_i)\log(1-p_i)\right]$
  – Second derivative:
    $h_i = y_i\,p_i(1-p_i)^{\gamma}\left[(2\gamma+1)(1-p_i) + \gamma(1-p_i-\gamma p_i)\log p_i\right] + (1-y_i)(1-p_i)\,p_i^{\gamma}\left[(2\gamma+1)p_i + \gamma(p_i-\gamma(1-p_i))\log(1-p_i)\right]$
  (At $\gamma = 0$ these reduce to the logloss derivatives $p_i - y_i$ and $p_i(1-p_i)$.)
• Weight (plays the same role as $\alpha$, so it is ignored):
  $grad_{new} = weight \times grad_{origin}$, $hess_{new} = weight \times hess_{origin}$
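A sketch of how these derivatives plug into xgboost as a custom objective; the gamma value and the clipping constant are illustrative choices, and this is not the author's exact code. Because `obj=` callbacks in `xgb.train` receive raw margins, the sigmoid is applied first:

```python
# Focal loss as an xgboost custom objective, implementing the grad/hess above.
import numpy as np

GAMMA = 2.0  # illustrative focusing parameter; gamma = 0 recovers plain logloss

def focal_loss_obj(preds, dtrain):
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))       # raw margin -> probability
    p = np.clip(p, 1e-7, 1 - 1e-7)         # keep log() finite
    g, lp, lq = GAMMA, np.log(p), np.log(1 - p)
    # First derivative (reduces to p - y when gamma = 0):
    grad = (y * (1 - p) ** g * (g * p * lp - (1 - p))
            + (1 - y) * p ** g * (p - g * (1 - p) * lq))
    # Second derivative (reduces to p * (1 - p) when gamma = 0):
    hess = (y * p * (1 - p) ** g
              * ((2 * g + 1) * (1 - p) + g * (1 - p - g * p) * lp)
            + (1 - y) * (1 - p) * p ** g
              * ((2 * g + 1) * p + g * (p - g * (1 - p)) * lq))
    return grad, hess

# Usage sketch: bst = xgb.train(params, dtrain, num_boost_round=200,
#                               obj=focal_loss_obj, custom_metric=emc_metric)
```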
30. Experiment Setting (Focal Loss)

$FL(y,x) = -\frac{1}{m}\sum_{i=1}^{m}\left[(1-p(x_i))^{\gamma}\, y_i \log p(x_i) + p(x_i)^{\gamma}\,(1-y_i)\log(1-p(x_i))\right]$

$logloss(y,x) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(x_i) + (1-y_i)\log(1-p(x_i))\right]$

When $\alpha = 1$ and $\gamma = 0$: $FL(\alpha=1, \gamma=0) = logloss$.
Defaults: $\gamma = 0$, $\alpha = 1$, threshold $= 0.5$.
Therefore, with the default parameters the focal-loss model reduces to the standard logloss model.
32. Outcome (Focal Loss)

German dataset (n = 5), EMC:
  XGBoost         0.813
  XGBoost_focal   0.532

Number of FP, FN in XGBoost:
              predict good   predict bad
  true good       124             15
  true bad         29             30

Number of FP, FN in XGBoost_focal:
              predict good   predict bad
  true good       126             53
  true bad         12             27

the total cost $= C_{FP} L_1 + C_{FN} L_2$, with $C_{FN} = 1$, $C_{FP} = 5$
Saved cost: 47
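As a sanity check, plugging the matrices into the cost formula with $C_{FN} = 1$, $C_{FP} = 5$: XGBoost gives $5 \times 29 + 1 \times 15 = 160$, XGBoost_focal gives $5 \times 12 + 1 \times 53 = 113$, and the difference $160 - 113 = 47$ matches the saved cost above.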
33. Outcome (Focal Loss)

Taiwan dataset (n = 5), EMC:
  XGBoost         0.742
  XGBoost_focal   0.547

Number of FP, FN in XGBoost:
              predict good   predict bad
  true good      4445            227
  true bad        844            482

Number of FP, FN in XGBoost_focal:
              predict good   predict bad
  true good      3397           1275
  true bad        402            924

Cost setting: $X_1$ of each sample = the amount of the loan; $C_{FP}$ = the amount of $X_1$, $C_{FN}$ = 20% of the amount of $X_1$.

Total cost of FP ($L_1$), FN ($L_2$) in XGBoost_focal:
              predict good   predict bad
  true good        0             1680
  true bad       74000             0

Total cost of FP ($L_1$), FN ($L_2$) in XGBoost:
              predict good   predict bad
  true good        0             2880
  true bad      158000             0

Saved cost: 85200 (New Taiwan dollars)
34. Discussion
1. EasyEnsemble + XGBoost
  – All the data are original data.
  – Error = bias + variance.
  – For $n$ base models whose outputs $X_i$ each have variance $\sigma^2$ and pairwise correlation coefficient $\rho$, the variance of the averaged output is

    $Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \rho\sigma^2 + \frac{1-\rho}{n}\,\sigma^2$

  – $\sigma^2$ is the variance, $\rho$ is the correlation coefficient, $n$ is the number of base models.
  – Increasing the difference between the base models (lowering $\rho$) reduces the variance and thus reduces the error.
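The variance formula follows from the standard covariance decomposition for equicorrelated variables; a short derivation, assuming each $X_i$ has variance $\sigma^2$ and pairwise correlation $\rho$:

```latex
\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\Big[\sum_{i=1}^{n}\mathrm{Var}(X_i)
      + \sum_{i \neq j}\mathrm{Cov}(X_i, X_j)\Big]
  = \frac{1}{n^2}\Big[n\sigma^2 + n(n-1)\rho\sigma^2\Big]
  = \rho\sigma^2 + \frac{1-\rho}{n}\,\sigma^2 .
```

As $\rho$ decreases (more diverse base models) or $n$ grows, the variance shrinks toward $\rho\sigma^2$.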
35. Discussion
2. Change the structure of XGBoost
  2.1. Customizing the evaluation metric
    • A clear target is set: when the value of the evaluation metric is optimal, the model stops training.
  2.2. Customizing the cost function (focal loss)
    • Increases the sensitivity to minority samples.
    • Distinguishes between hard and easy samples.
36. Conclusion
• Growing need for "credit scoring"
• Problems
  – Understandability
  – Imbalanced data
  – Cost sensitivity
• Solutions
  – Resampling (EasyEnsemble)
  – Changing the structure (customized evaluation metric and cost function)
• Outcome
  – Reduced cost
  – Reduced creditors' losses
37. Research performance
・Information Processing Society of Japan
1) Ce Chen, Soichiro Yokoyama, Tomohisa Yamashita, Hidenori Kawamura: Application of XGBoost to credit scoring, Special Interest Group (SIG) Technical Report, Vol. 194, Hokkaido (2019)