SlideShare a Scribd company logo
1 of 35
Download to read offline
Reject Inference in Credit Scoring
Adrien Ehrhardt
Christophe Biernacki, Vincent Vandewalle,
Philippe Heinrich, Sébastien Beben
Crédit Agricole Consumer Finance
Inria Lille - Nord-Europe
Journées de Statistique 2017 - Avignon
29 mai 2017
Table of contents
1 Introduction
Data generation process
Accept / Reject loan applicants
Credit Scoring in practice
2 Reject Inference methods
Introductory example
Maximum likelihood estimation
What is at stake?
A possible reinterpretation
Experimental results
3 Conclusion
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 2 / 22
Introduction
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 3 / 22
Introduction: Data generation process
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 4 / 22
Introduction: Accept / Reject loan applicants
X : random vector of a client’s
characteristics
Y ∈ {0, 1} : repayment performance
Z ∈ {f, nf} : v. a. de financement
n financed clients (Z = f)
m not financed clients (Z = nf)
x : observed features of clients
y : clients’ repayment
x =
xf
xnf ; y =
yf
ynf
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 5 / 22
Introduction: Credit Scoring in practice
“Classical” logistic regression:
∃ θ ∈ Rd+1
s.t. ∀ x, ln
pθ(1|x)
pθ(0|x)
= θ · x
Parameter estimation:
(θ; x, y)
complete
likelihood
=
n
i=1
ln(pθ(yi |xi )) +
n+m
i=n+1
ln(pθ(yi |xi ))
= (θ; xf
, yf
)
observed
likelihood
+ (θ; xnf
, ynf
)
unknown
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 6 / 22
Introduction: Credit Scoring in practice
“Classical” logistic regression:
∃ θ ∈ Rd+1
s.t. ∀ x, ln
pθ(1|x)
pθ(0|x)
= θ · x
Parameter estimation:
(θ; x, y)
complete
likelihood
=
n
i=1
ln(pθ(yi |xi )) +
n+m
i=n+1
ln(pθ(yi |xi ))
= (θ; xf
, yf
)
observed
likelihood
+ (θ; xnf
, ynf
)
unknown
Sample selection problems
Why and how use xnf?
What are the consequences of using “only” (xf, yf)?
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 6 / 22
Reject Inference methods
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 7 / 22
Reject Inference methods: Fuzzy Augmentation I
Fuzzy Augmentation can be found, among others, in [Nguyen, 2016].
yf
ynf









y1
...
yn
NA
...
NA









xf
xnf









x1
1 · · · xd
1
...
...
...
x1
n · · · xd
n
x1
n+1 · · · xd
n+1
...
...
...
x1
n+m · · · xd
n+m









Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 8 / 22
Reject Inference methods: Fuzzy Augmentation II
Step 1: Discard xnf
and estimate ˆθf
= argmaxθ (θ; xf
, yf
).
yf
ynf









y1
...
yn
NA
...
NA









xf
xnf









x1
1 · · · xd
1
...
...
...
x1
n · · · xd
n
x1
n+1 · · · xd
n+1
...
...
...
x1
n+m · · · xd
n+m









Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 9 / 22
Reject Inference methods: Fuzzy Augmentation III
Step 2: Impute ynf
with their estimation given by ˆθf
.
yf
ynf









y1
...
yn
pˆθf(Yn+1 = 1|xn+1)
...
pˆθf(Yn+m = 1|xn+m)









xf
xnf









x1
1 · · · xd
1
...
...
...
x1
n · · · xd
n
x1
n+1 · · · xd
n+1
...
...
...
x1
n+m · · · xd
n+m









Step 3: estimate ˆθfuzzy
= argmaxθ (θ; x, yf
, ˆynf
).
Problem : ˆθfuzzy
= ˆθf
.
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 10 / 22
Reject Inference methods: Maximum likelihood estimation
“Classical” estimation in the Credit Scoring field
ˆθf
= argmax
θ
(θ; xf
, yf
).
“Oracle” estimation knowing ynf
ˆθ = argmax
θ
(θ; x, y).
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 11 / 22
Reject Inference methods: Maximum likelihood estimation
“Classical” estimation in the Credit Scoring field
ˆθf
= argmax
θ
(θ; xf
, yf
).
“Oracle” estimation knowing ynf
ˆθ = argmax
θ
(θ; x, y).
Model space Θ
θf
opt
θopt
ˆθ
ˆθf
Estimation
bias+variance
Estimation
bias+variance
ptrue(y |x)
Model bias
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 11 / 22
Reject Inference methods: Maximum likelihood estimation
“Classical” estimation in the Credit Scoring field
ˆθf
= argmax
θ
(θ; xf
, yf
).
“Oracle” estimation knowing ynf
ˆθ = argmax
θ
(θ; x, y).
Asymptotics ?
Model space Θ
θf
opt
θopt
ˆθ
ˆθf
Estimation
bias+variance
Estimation
bias+variance
ptrue(y |x)
Model bias
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 11 / 22
Reject Inference methods: What is at stake?
Estimators :
1 “Oracle”:
√
n + m(ˆθ − θopt)
L
−−−−−→
n,m→∞
Nd+1(0, Σθopt )
2 Current methodology:
√
n(ˆθf − θf
opt)
L
−−−→
n→∞
Nd+1(0, Σf
θf
opt
)
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
Reject Inference methods: What is at stake?
Estimators :
1 “Oracle”:
√
n + m(ˆθ − θopt)
L
−−−−−→
n,m→∞
Nd+1(0, Σθopt )
2 Current methodology:
√
n(ˆθf − θf
opt)
L
−−−→
n→∞
Nd+1(0, Σf
θf
opt
)
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
Reject Inference methods: What is at stake?
Estimators :
1 “Oracle”:
√
n + m(ˆθ − θopt)
L
−−−−−→
n,m→∞
Nd+1(0, Σθopt )
2 Current methodology:
√
n(ˆθf − θf
opt)
L
−−−→
n→∞
Nd+1(0, Σf
θf
opt
)
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
Reject Inference methods: What is at stake?
Estimators :
1 “Oracle”:
√
n + m(ˆθ − θopt)
L
−−−−−→
n,m→∞
Nd+1(0, Σθopt )
2 Current methodology:
√
n(ˆθf − θf
opt)
L
−−−→
n→∞
Nd+1(0, Σf
θf
opt
)
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
Reject Inference methods: What is at stake?
Estimators :
1 “Oracle”:
√
n + m(ˆθ − θopt)
L
−−−−−→
n,m→∞
Nd+1(0, Σθopt )
2 Current methodology:
√
n(ˆθf − θf
opt)
L
−−−→
n→∞
Nd+1(0, Σf
θf
opt
)
Question 1 : asymptotics of the estimators
(Q1) θopt
?
= θf
opt
(Q2) Σθopt
?
= Σf
θf
opt
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
Reject Inference methods: Missingness mechanism
MAR : ∀ x, y, z, ptrue(z|x, y) = ptrue(z|x)
→ Acceptance is determined by the score : Z = 1{θ X>cut}.
MNAR : ∃ x, y, z, ptrue(z|x, y) = ptrue(z|x)
→ Operators’ “feeling” Xc influence the acceptance.
Y X
Xc
Z
Figure: Dependencies between random variables Y , Xc
, X and Z
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 13 / 22
Reject Inference methods: Model specification
Well-specified model : ∃ θtrue, ptrue(y|x) = pθtrue (y|x).
→ With real data ⇒ hypothesis unlikely to be true.
Misspecified model : θopt is the “best” in the Θ family.
→ Logistic regression commonly used for its robustness to
misspecification.
pθ(y|x)
ptrue(z|x, y)
MAR MNAR
Well specified
θf
opt = θopt
Σf
θf
opt
= Σθopt θf
opt = θopt
Misspecified
θf
opt = θopt Σf
θf
opt
= Σθopt
Σf
θf
opt
= Σθopt
Table: (Q1) and (Q2) w.r.t. model specification and missingness mechanism
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 14 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Scope for action:
Change model space Θ,
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Scope for action:
Change model space Θ,
Model acceptance/rejection process (i.e. pγ(z|x, y)),
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Scope for action:
Change model space Θ,
Model acceptance/rejection process (i.e. pγ(z|x, y)),
Use xnf.
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Scope for action:
Change model space Θ,
Model acceptance/rejection process (i.e. pγ(z|x, y)),
Use xnf.
Natural way to achieve all three: generative approach
pα(x, y, z) = pβα (x)pθα (y|x)pγα (z|x, y).
( ˆθα , ˆβα, ˆγα) = argmax
θα,βα,γα
(α; x, yf
) = argmax
θα,βα,γα
n
i=1
ln(pθα (yi |xi ))
+
n+m
i=1
ln(pβα (xi )) +
n
i=1
ln(pγα (zi |xi , yi )) .
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Scope for action:
Change model space Θ logistic regression,
Model acceptance/rejection process (i.e. pγ(z|x, y)),
Use xnf.
Natural way to achieve all three: generative approach
pα(x, y, z) = pβα (x)pθα (y|x)pγα (z|x, y).
( ˆθα , ˆβα, ˆγα) = argmax
α
(α; x, yf
) = argmax
θα,βα,γα
n
i=1
ln(pθα (yi |xi ))
+
n+m
i=1
ln(pβα (xi )) +
n
i=1
ln(pγα (zi |xi , yi )) .
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: How to use xnf
?
Question 2: How to construct a better estimator than ˆθf?
Scope for action:
Change model space Θ logistic regression,
Model acceptance/rejection process (i.e. pγ(z|x, y))
γ cannot be estimated,
Use xnf.
Natural way to achieve all three: generative approach
pα(x, y, z) = pβα (x)pθα (y|x)pγα (z|x, y).
( ˆθα , ˆβα, ˆγα) = argmax
α
(α; x, yf
) = argmax
θα,βα,γα
n
i=1
ln(pθα (yi |xi ))
+
n+m
i=1
ln(pβα (xi )) +
n
i=1
ln(pγα (zi |xi , yi )) .
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
Reject Inference methods: A possible reinterpretation
Reclassification1 :
(ˆθCEM
, ˆynf
) = argmax
θ,ynf
(θ; x, yf
, ynf
) where ˆyi = argmax
yi
pˆθf (yi |xi ).
Problem: inconsistent estimator.
1
[Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013]
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 16 / 22
Reject Inference methods: A possible reinterpretation
Augmentation2: MAR / misspecified model.
Aug(θ; xf
, yf
) =
n
i=1
1
ptrue(f|xi )
ln(pθ(yi |xi )).
Problem: estimation of ptrue(f|xi ).
Parcelling 3:
(θ; x, yf
, ˆynf
) where ˆyi =
1 w.p. αi pˆθf (1|xi , f)
0 w.p. 1 − αi pˆθf (1|xi , f)
.
Problem: MNAR assumptions hidden in ˆynf
(αi ) impossible to test.
2
[Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013,
Nguyen, 2016]
3
[Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013]
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 17 / 22
Reject Inference methods: Experimental results
Portfolio from a consumer electronics distributor partner:
Approximately 250,000 applications
Approximately 10 discrete (or discretized) features (with interactions):
Socio-professional category(-ies)
Amount of rent
Number of children
Years in current job position
Approximately 3 % of default
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 18 / 22
Conclusion
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 19 / 22
Conclusion
Fuzzy Augmentation, Reclassification (and Twins) were proved
useless.
Augmentation and Parcelling seem legitimate depending on model
specification and missingness mechanism but difficult/impossible to
set forth in practice.
Recommendation: do not practice Reject Inference since high
risk/low return of all tested Reject Inference techniques.
For the most part, Reject Inference had previously been tackled only
experimentally.
Research paper in progress.
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 20 / 22
Thank you for your attention!
Any questions?
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 21 / 22
Banasik, J. and Crook, J. (2007).
Reject inference, augmentation, and sample selection.
European Journal of Operational Research, 183(3):1582–1594.
Guizani, A., Souissi, B., Ammou, S. B., and Saporta, G. (2013).
Une comparaison de quatre techniques d’inférence des refusés dans le
processus d’octroi de crédit.
In 45 èmes Journées de statistique.
Nguyen, H. T. (2016).
Reject inference in application scorecards: evidence from France.
Technical report, University of Paris West-Nanterre la Défense,
EconomiX.
Soulié, F. F. and Viennet, E. (2007).
Le traitement des refusés dans le risque crédit.
Revue des Nouvelles Technologies de l’Information, Data Mining et
Apprentissage Statistique : application en assurance, banque et
marketing, RNTI-A-1:22–44.
Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 22 / 22

More Related Content

What's hot

How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization
confluent
 

What's hot (20)

How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization
 
Westgard rules
Westgard rulesWestgard rules
Westgard rules
 
Geospatial Data Analysis and Visualization in Python
Geospatial Data Analysis and Visualization in PythonGeospatial Data Analysis and Visualization in Python
Geospatial Data Analysis and Visualization in Python
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Improving Laboratory Performance Through Quality Control - The role of EQA in...
Improving Laboratory Performance Through Quality Control - The role of EQA in...Improving Laboratory Performance Through Quality Control - The role of EQA in...
Improving Laboratory Performance Through Quality Control - The role of EQA in...
 
2.mathematics for machine learning
2.mathematics for machine learning2.mathematics for machine learning
2.mathematics for machine learning
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashKeeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
 
Outlier detection method introduction
Outlier detection method introductionOutlier detection method introduction
Outlier detection method introduction
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
 
Automatic License Plate Recognition using OpenCV
Automatic License Plate Recognition using OpenCVAutomatic License Plate Recognition using OpenCV
Automatic License Plate Recognition using OpenCV
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
 
Building Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyBuilding Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at Spotify
 
Post analytical variables dr kamlesh patel
Post analytical variables dr kamlesh patelPost analytical variables dr kamlesh patel
Post analytical variables dr kamlesh patel
 
Deep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentationDeep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentation
 

Similar to Reject Inference in Credit Scoring

Similar to Reject Inference in Credit Scoring (20)

Classification
ClassificationClassification
Classification
 
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
 
Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2
 
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
 
Side 2019 #6
Side 2019 #6Side 2019 #6
Side 2019 #6
 
columbus15_cattaneo.pdf
columbus15_cattaneo.pdfcolumbus15_cattaneo.pdf
columbus15_cattaneo.pdf
 
Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
 
main
mainmain
main
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Lecture11 xing
Lecture11 xingLecture11 xing
Lecture11 xing
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
First part of my NYU Tandon course "Numerical and Simulation Techniques in Fi...
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Slides econometrics-2018-graduate-2
Slides econometrics-2018-graduate-2Slides econometrics-2018-graduate-2
Slides econometrics-2018-graduate-2
 
SASA 2016
SASA 2016SASA 2016
SASA 2016
 
Slides ACTINFO 2016
Slides ACTINFO 2016Slides ACTINFO 2016
Slides ACTINFO 2016
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
Slides ub-2
Slides ub-2Slides ub-2
Slides ub-2
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
 

Recently uploaded

NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...
NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...
NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...
Amil baba
 
Financial Accounting and Analysis balancesheet.pdf
Financial Accounting and Analysis balancesheet.pdfFinancial Accounting and Analysis balancesheet.pdf
Financial Accounting and Analysis balancesheet.pdf
mukul381940
 
Prezentacja Q1 2024 EN strona www relacji
Prezentacja Q1 2024  EN strona www relacjiPrezentacja Q1 2024  EN strona www relacji
Prezentacja Q1 2024 EN strona www relacji
klaudiafilka
 
一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书
一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书
一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书
atedyxc
 
一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书
一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书
一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书
atedyxc
 
原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作
原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作
原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作
uotyyd
 
一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书
一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书
一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书
atedyxc
 
一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书
一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书
一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书
atedyxc
 

Recently uploaded (20)

Production and Cost of the firm with curves
Production and Cost of the firm with curvesProduction and Cost of the firm with curves
Production and Cost of the firm with curves
 
amil baba kala jadu expert uk amil baba kala jadu removal uk amil baba in mal...
amil baba kala jadu expert uk amil baba kala jadu removal uk amil baba in mal...amil baba kala jadu expert uk amil baba kala jadu removal uk amil baba in mal...
amil baba kala jadu expert uk amil baba kala jadu removal uk amil baba in mal...
 
NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...
NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...
NO1 Best kala jadu karne wale ka contact number kala jadu karne wale baba kal...
 
FEW OF THE DEVELOPMENTS FOUND IN LESOTHO
FEW OF THE DEVELOPMENTS FOUND IN LESOTHOFEW OF THE DEVELOPMENTS FOUND IN LESOTHO
FEW OF THE DEVELOPMENTS FOUND IN LESOTHO
 
DIGITAL COMMERCE SHAPE VIETNAMESE SHOPPING HABIT IN 4.0 INDUSTRY
DIGITAL COMMERCE SHAPE VIETNAMESE SHOPPING HABIT IN 4.0 INDUSTRYDIGITAL COMMERCE SHAPE VIETNAMESE SHOPPING HABIT IN 4.0 INDUSTRY
DIGITAL COMMERCE SHAPE VIETNAMESE SHOPPING HABIT IN 4.0 INDUSTRY
 
Slideshare - ONS Economic Forum Slidepack - 13 May 2024.pptx
Slideshare - ONS Economic Forum Slidepack - 13 May 2024.pptxSlideshare - ONS Economic Forum Slidepack - 13 May 2024.pptx
Slideshare - ONS Economic Forum Slidepack - 13 May 2024.pptx
 
Financial Accounting and Analysis balancesheet.pdf
Financial Accounting and Analysis balancesheet.pdfFinancial Accounting and Analysis balancesheet.pdf
Financial Accounting and Analysis balancesheet.pdf
 
kala jadu for love | world no 1 amil baba | best amil baba in karachi |bangal...
kala jadu for love | world no 1 amil baba | best amil baba in karachi |bangal...kala jadu for love | world no 1 amil baba | best amil baba in karachi |bangal...
kala jadu for love | world no 1 amil baba | best amil baba in karachi |bangal...
 
Prezentacja Q1 2024 EN strona www relacji
Prezentacja Q1 2024  EN strona www relacjiPrezentacja Q1 2024  EN strona www relacji
Prezentacja Q1 2024 EN strona www relacji
 
Tourism attractions in Lesotho katse dam
Tourism attractions in Lesotho katse damTourism attractions in Lesotho katse dam
Tourism attractions in Lesotho katse dam
 
STRATEGIC MANAGEMENT VIETTEL TELECOM GROUP
STRATEGIC MANAGEMENT VIETTEL TELECOM GROUPSTRATEGIC MANAGEMENT VIETTEL TELECOM GROUP
STRATEGIC MANAGEMENT VIETTEL TELECOM GROUP
 
一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书
一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书
一比一原版(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单学位证书
 
Managing personal finances wisely for financial stability and
Managing personal finances wisely for financial stability  andManaging personal finances wisely for financial stability  and
Managing personal finances wisely for financial stability and
 
一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书
一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书
一比一原版(BU毕业证书)波士顿大学毕业证成绩单学位证书
 
原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作
原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作
原版一模一样(bu文凭证书)美国贝翰文大学毕业证文凭证书制作
 
一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书
一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书
一比一原版(Cornell毕业证书)康奈尔大学毕业证成绩单学位证书
 
Module 1 CCP - Introduction and LMS.pptx
Module 1 CCP  - Introduction and LMS.pptxModule 1 CCP  - Introduction and LMS.pptx
Module 1 CCP - Introduction and LMS.pptx
 
Satoshi DEX Leverages Layer 2 To Transform DeFi Ecosystem.pdf
Satoshi DEX Leverages Layer 2 To Transform DeFi Ecosystem.pdfSatoshi DEX Leverages Layer 2 To Transform DeFi Ecosystem.pdf
Satoshi DEX Leverages Layer 2 To Transform DeFi Ecosystem.pdf
 
一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书
一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书
一比一原版(UC Davis毕业证书)加州大学戴维斯分校毕业证成绩单学位证书
 
certified amil baba ,black magic specialist in russia and kala jadu expert in...
certified amil baba ,black magic specialist in russia and kala jadu expert in...certified amil baba ,black magic specialist in russia and kala jadu expert in...
certified amil baba ,black magic specialist in russia and kala jadu expert in...
 

Reject Inference in Credit Scoring

  • 1. Reject Inference in Credit Scoring Adrien Ehrhardt Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich, Sébastien Beben Crédit Agricole Consumer Finance Inria Lille - Nord-Europe Journées de Statistique 2017 - Avignon 29 mai 2017
  • 2. Table of contents 1 Introduction Data generation process Accept / Reject loan applicants Credit Scoring in practice 2 Reject Inference methods Introductory example Maximum likelihood estimation What is at stake? A possible reinterpretation Experimental results 3 Conclusion Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 2 / 22
  • 3. Introduction Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 3 / 22
  • 4. Introduction: Data generation process Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 4 / 22
  • 5. Introduction: Accept / Reject loan applicants X : random vector of a client’s characteristics Y ∈ {0, 1} : repayment performance Z ∈ {f, nf} : v. a. de financement n financed clients (Z = f) m not financed clients (Z = nf) x : observed features of clients y : clients’ repayment x = xf xnf ; y = yf ynf Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 5 / 22
  • 6. Introduction: Credit Scoring in practice “Classical” logistic regression: ∃ θ ∈ Rd+1 s.t. ∀ x, ln pθ(1|x) pθ(0|x) = θ · x Parameter estimation: (θ; x, y) complete likelihood = n i=1 ln(pθ(yi |xi )) + n+m i=n+1 ln(pθ(yi |xi )) = (θ; xf , yf ) observed likelihood + (θ; xnf , ynf ) unknown Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 6 / 22
  • 7. Introduction: Credit Scoring in practice “Classical” logistic regression: ∃ θ ∈ Rd+1 s.t. ∀ x, ln pθ(1|x) pθ(0|x) = θ · x Parameter estimation: (θ; x, y) complete likelihood = n i=1 ln(pθ(yi |xi )) + n+m i=n+1 ln(pθ(yi |xi )) = (θ; xf , yf ) observed likelihood + (θ; xnf , ynf ) unknown Sample selection problems Why and how use xnf? What are the consequences of using “only” (xf, yf)? Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 6 / 22
  • 8. Reject Inference methods Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 7 / 22
  • 9. Reject Inference methods: Fuzzy Augmentation I Fuzzy Augmentation can be found, among others, in [Nguyen, 2016]. yf ynf          y1 ... yn NA ... NA          xf xnf          x1 1 · · · xd 1 ... ... ... x1 n · · · xd n x1 n+1 · · · xd n+1 ... ... ... x1 n+m · · · xd n+m          Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 8 / 22
  • 10. Reject Inference methods: Fuzzy Augmentation II Step 1: Discard xnf and estimate ˆθf = argmaxθ (θ; xf , yf ). yf ynf          y1 ... yn NA ... NA          xf xnf          x1 1 · · · xd 1 ... ... ... x1 n · · · xd n x1 n+1 · · · xd n+1 ... ... ... x1 n+m · · · xd n+m          Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 9 / 22
  • 11. Reject Inference methods: Fuzzy Augmentation III Step 2: Impute ynf with their estimation given by ˆθf . yf ynf          y1 ... yn pˆθf(Yn+1 = 1|xn+1) ... pˆθf(Yn+m = 1|xn+m)          xf xnf          x1 1 · · · xd 1 ... ... ... x1 n · · · xd n x1 n+1 · · · xd n+1 ... ... ... x1 n+m · · · xd n+m          Step 3: estimate ˆθfuzzy = argmaxθ (θ; x, yf , ˆynf ). Problem : ˆθfuzzy = ˆθf . Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 10 / 22
  • 12. Reject Inference methods: Maximum likelihood estimation “Classical” estimation in the Credit Scoring field ˆθf = argmax θ (θ; xf , yf ). “Oracle” estimation knowing ynf ˆθ = argmax θ (θ; x, y). Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 11 / 22
  • 13. Reject Inference methods: Maximum likelihood estimation “Classical” estimation in the Credit Scoring field ˆθf = argmax θ (θ; xf , yf ). “Oracle” estimation knowing ynf ˆθ = argmax θ (θ; x, y). Model space Θ θf opt θopt ˆθ ˆθf Estimation bias+variance Estimation bias+variance ptrue(y |x) Model bias Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 11 / 22
  • 14. Reject Inference methods: Maximum likelihood estimation “Classical” estimation in the Credit Scoring field ˆθf = argmax θ (θ; xf , yf ). “Oracle” estimation knowing ynf ˆθ = argmax θ (θ; x, y). Asymptotics ? Model space Θ θf opt θopt ˆθ ˆθf Estimation bias+variance Estimation bias+variance ptrue(y |x) Model bias Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 11 / 22
  • 15. Reject Inference methods: What is at stake? Estimators : 1 “Oracle”: √ n + m(ˆθ − θopt) L −−−−−→ n,m→∞ Nd+1(0, Σθopt ) 2 Current methodology: √ n(ˆθf − θf opt) L −−−→ n→∞ Nd+1(0, Σf θf opt ) Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
  • 16. Reject Inference methods: What is at stake? Estimators : 1 “Oracle”: √ n + m(ˆθ − θopt) L −−−−−→ n,m→∞ Nd+1(0, Σθopt ) 2 Current methodology: √ n(ˆθf − θf opt) L −−−→ n→∞ Nd+1(0, Σf θf opt ) Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
  • 17. Reject Inference methods: What is at stake? Estimators : 1 “Oracle”: √ n + m(ˆθ − θopt) L −−−−−→ n,m→∞ Nd+1(0, Σθopt ) 2 Current methodology: √ n(ˆθf − θf opt) L −−−→ n→∞ Nd+1(0, Σf θf opt ) Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
  • 18. Reject Inference methods: What is at stake? Estimators : 1 “Oracle”: √ n + m(ˆθ − θopt) L −−−−−→ n,m→∞ Nd+1(0, Σθopt ) 2 Current methodology: √ n(ˆθf − θf opt) L −−−→ n→∞ Nd+1(0, Σf θf opt ) Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
  • 19. Reject Inference methods: What is at stake? Estimators : 1 “Oracle”: √ n + m(ˆθ − θopt) L −−−−−→ n,m→∞ Nd+1(0, Σθopt ) 2 Current methodology: √ n(ˆθf − θf opt) L −−−→ n→∞ Nd+1(0, Σf θf opt ) Question 1 : asymptotics of the estimators (Q1) θopt ? = θf opt (Q2) Σθopt ? = Σf θf opt Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 12 / 22
  • 20. Reject Inference methods: Missingness mechanism MAR : ∀ x, y, z, ptrue(z|x, y) = ptrue(z|x) → Acceptance is determined by the score : Z = 1{θ X>cut}. MNAR : ∃ x, y, z, ptrue(z|x, y) = ptrue(z|x) → Operators’ “feeling” Xc influence the acceptance. Y X Xc Z Figure: Dependencies between random variables Y , Xc , X and Z Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 13 / 22
  • 21. Reject Inference methods: Model specification Well-specified model : ∃ θtrue, ptrue(y|x) = pθtrue (y|x). → With real data ⇒ hypothesis unlikely to be true. Misspecified model : θopt is the “best” in the Θ family. → Logistic regression commonly used for its robustness to misspecification. pθ(y|x) ptrue(z|x, y) MAR MNAR Well specified θf opt = θopt Σf θf opt = Σθopt θf opt = θopt Misspecified θf opt = θopt Σf θf opt = Σθopt Σf θf opt = Σθopt Table: (Q1) and (Q2) w.r.t. model specification and missingness mechanism Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 14 / 22
  • 22. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 23. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Scope for action: Change model space Θ, Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 24. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Scope for action: Change model space Θ, Model acceptance/rejection process (i.e. pγ(z|x, y)), Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 25. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Scope for action: Change model space Θ, Model acceptance/rejection process (i.e. pγ(z|x, y)), Use xnf. Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 26. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Scope for action: Change model space Θ, Model acceptance/rejection process (i.e. pγ(z|x, y)), Use xnf. Natural way to achieve all three: generative approach pα(x, y, z) = pβα (x)pθα (y|x)pγα (z|x, y). ( ˆθα , ˆβα, ˆγα) = argmax θα,βα,γα (α; x, yf ) = argmax θα,βα,γα n i=1 ln(pθα (yi |xi )) + n+m i=1 ln(pβα (xi )) + n i=1 ln(pγα (zi |xi , yi )) . Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 27. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Scope for action: Change model space Θ logistic regression, Model acceptance/rejection process (i.e. pγ(z|x, y)), Use xnf. Natural way to achieve all three: generative approach pα(x, y, z) = pβα (x)pθα (y|x)pγα (z|x, y). ( ˆθα , ˆβα, ˆγα) = argmax α (α; x, yf ) = argmax θα,βα,γα n i=1 ln(pθα (yi |xi )) + n+m i=1 ln(pβα (xi )) + n i=1 ln(pγα (zi |xi , yi )) . Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 28. Reject Inference methods: How to use xnf ? Question 2: How to construct a better estimator than ˆθf? Scope for action: Change model space Θ logistic regression, Model acceptance/rejection process (i.e. pγ(z|x, y)) γ cannot be estimated, Use xnf. Natural way to achieve all three: generative approach pα(x, y, z) = pβα (x)pθα (y|x)pγα (z|x, y). ( ˆθα , ˆβα, ˆγα) = argmax α (α; x, yf ) = argmax θα,βα,γα n i=1 ln(pθα (yi |xi )) + n+m i=1 ln(pβα (xi )) + n i=1 ln(pγα (zi |xi , yi )) . Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 15 / 22
  • 29. Reject Inference methods: A possible reinterpretation Reclassification1 : (ˆθCEM , ˆynf ) = argmax θ,ynf (θ; x, yf , ynf ) where ˆyi = argmax yi pˆθf (yi |xi ). Problem: inconsistent estimator. 1 [Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013] Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 16 / 22
  • 30. Reject Inference methods: A possible reinterpretation Augmentation2: MAR / misspecified model. Aug(θ; xf , yf ) = n i=1 1 ptrue(f|xi ) ln(pθ(yi |xi )). Problem: estimation of ptrue(f|xi ). Parcelling 3: (θ; x, yf , ˆynf ) where ˆyi = 1 w.p. αi pˆθf (1|xi , f) 0 w.p. 1 − αi pˆθf (1|xi , f) . Problem: MNAR assumptions hidden in ˆynf (αi ) impossible to test. 2 [Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013, Nguyen, 2016] 3 [Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013] Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 17 / 22
  • 31. Reject Inference methods: Experimental results Portfolio from a consumer electronics distributor partner: Approximately 250,000 applications Approximately 10 discrete (or discretized) features (with interactions): Socio-professional category(-ies) Amount of rent Number of children Years in current job position Approximately 3 % of default Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 18 / 22
  • 32. Conclusion Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 19 / 22
  • 33. Conclusion Fuzzy Augmentation, Reclassification (and Twins) were proved useless. Augmentation and Parcelling seem legitimate depending on model specification and missingness mechanism but difficult/impossible to set forth in practice. Recommendation: do not practice Reject Inference since high risk/low return of all tested Reject Inference techniques. For the most part, Reject Inference had previously been tackled only experimentally. Research paper in progress. Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 20 / 22
  • 34. Thank you for your attention! Any questions? Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 21 / 22
  • 35. Banasik, J. and Crook, J. (2007). Reject inference, augmentation, and sample selection. European Journal of Operational Research, 183(3):1582–1594. Guizani, A., Souissi, B., Ammou, S. B., and Saporta, G. (2013). Une comparaison de quatre techniques d’inférence des refusés dans le processus d’octroi de crédit. In 45 èmes Journées de statistique. Nguyen, H. T. (2016). Reject inference in application scorecards: evidence from France. Technical report, University of Paris West-Nanterre la Défense, EconomiX. Soulié, F. F. and Viennet, E. (2007). Le traitement des refusés dans le risque crédit. Revue des Nouvelles Technologies de l’Information, Data Mining et Apprentissage Statistique : application en assurance, banque et marketing, RNTI-A-1:22–44. Adrien Ehrhardt (CACF - Inria) Reject Inference 29 mai 2017 22 / 22