Reject Inference in Credit Scoring

Adrien Ehrhardt
Christophe Biernacki, Vincent Vandewalle,
Philippe Heinrich, Sébastien Beben
Crédit Agricole Consumer Finance
Inria Lille - Nord-Europe
Journées de Statistique 2017 - Avignon
May 29, 2017

Table of contents
1 Introduction
    Data generation process
    Accept / Reject loan applicants
    Credit Scoring in practice
2 Reject Inference methods
    Introductory example
    Maximum likelihood estimation
    What is at stake?
    A possible reinterpretation
    Experimental results
3 Conclusion

Introduction

Introduction: Data generation process

Introduction: Accept / Reject loan applicants
$X$: random vector of a client's characteristics
$Y \in \{0, 1\}$: repayment performance
$Z \in \{f, nf\}$: financing random variable
$n$ financed clients ($Z = f$)
$m$ not financed clients ($Z = nf$)
$x$: observed features of clients
$y$: clients' repayment
$$x = \begin{pmatrix} x^{f} \\ x^{nf} \end{pmatrix}, \qquad y = \begin{pmatrix} y^{f} \\ y^{nf} \end{pmatrix}$$

Introduction: Credit Scoring in practice
"Classical" logistic regression:
$$\exists\, \theta \in \mathbb{R}^{d+1} \text{ s.t. } \forall x, \quad \ln \frac{p_\theta(1|x)}{p_\theta(0|x)} = \theta \cdot x$$
Parameter estimation:
$$\underbrace{\ell(\theta; x, y)}_{\text{complete likelihood}} = \sum_{i=1}^{n} \ln(p_\theta(y_i|x_i)) + \sum_{i=n+1}^{n+m} \ln(p_\theta(y_i|x_i)) = \underbrace{\ell(\theta; x^{f}, y^{f})}_{\text{observed likelihood}} + \underbrace{\ell(\theta; x^{nf}, y^{nf})}_{\text{unknown}}$$
Sample selection problems
Why and how should $x^{nf}$ be used?
What are the consequences of using "only" $(x^{f}, y^{f})$?
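
The split of the complete log-likelihood into an observed part and an unknown part can be checked numerically. The sketch below is only an illustration on simulated data, not the authors' code: the parameter values, the acceptance cut-off and the helper name `log_likelihood` are assumptions introduced here, and the "unknown" term is computable only because the simulation keeps the labels of rejected applicants.

```python
import numpy as np

def log_likelihood(theta, X, y):
    """Log-likelihood of a logistic model whose log-odds are theta . x."""
    z = X @ theta
    # ln p_theta(y|x) = y*z - ln(1 + exp(z)); logaddexp keeps this stable.
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

rng = np.random.default_rng(0)
n_total = 1000
# A column of ones is prepended so that theta has d + 1 components.
X = np.column_stack([np.ones(n_total), rng.normal(size=(n_total, 2))])
theta = np.array([-1.0, 0.8, -0.5])  # assumed parameter, for illustration
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta)))

# Hypothetical acceptance rule: finance applicants above a score cut-off.
financed = X @ theta > -1.0

ll_complete = log_likelihood(theta, X, y)
ll_observed = log_likelihood(theta, X[financed], y[financed])
ll_unknown = log_likelihood(theta, X[~financed], y[~financed])
print(ll_complete, ll_observed + ll_unknown)
```

In a real portfolio only `ll_observed` is available, which is what motivates the two questions above.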

Reject Inference methods

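Reject Inference methods: Fuzzy Augmentation I
Fuzzy Augmentation can be found, among others, in [Nguyen, 2016].
$$y = \begin{pmatrix} y^{f} \\ y^{nf} \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \\ \text{NA} \\ \vdots \\ \text{NA} \end{pmatrix}, \qquad x = \begin{pmatrix} x^{f} \\ x^{nf} \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_1^d \\ \vdots & \vdots & \vdots \\ x_n^1 & \cdots & x_n^d \\ x_{n+1}^1 & \cdots & x_{n+1}^d \\ \vdots & \vdots & \vdots \\ x_{n+m}^1 & \cdots & x_{n+m}^d \end{pmatrix}$$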

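Reject Inference methods: Fuzzy Augmentation II
Step 1: Discard $x^{nf}$ and estimate $\hat\theta^{f} = \operatorname{argmax}_\theta \ell(\theta; x^{f}, y^{f})$.
$$y = \begin{pmatrix} y^{f} \\ y^{nf} \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \\ \text{NA} \\ \vdots \\ \text{NA} \end{pmatrix}, \qquad x = \begin{pmatrix} x^{f} \\ x^{nf} \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_1^d \\ \vdots & \vdots & \vdots \\ x_n^1 & \cdots & x_n^d \\ x_{n+1}^1 & \cdots & x_{n+1}^d \\ \vdots & \vdots & \vdots \\ x_{n+m}^1 & \cdots & x_{n+m}^d \end{pmatrix}$$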

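Reject Inference methods: Fuzzy Augmentation III
Step 2: Impute $y^{nf}$ with their estimation given by $\hat\theta^{f}$.
$$y = \begin{pmatrix} y^{f} \\ y^{nf} \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \\ p_{\hat\theta^{f}}(Y_{n+1} = 1|x_{n+1}) \\ \vdots \\ p_{\hat\theta^{f}}(Y_{n+m} = 1|x_{n+m}) \end{pmatrix}, \qquad x = \begin{pmatrix} x^{f} \\ x^{nf} \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_1^d \\ \vdots & \vdots & \vdots \\ x_n^1 & \cdots & x_n^d \\ x_{n+1}^1 & \cdots & x_{n+1}^d \\ \vdots & \vdots & \vdots \\ x_{n+m}^1 & \cdots & x_{n+m}^d \end{pmatrix}$$
Step 3: estimate $\hat\theta^{fuzzy} = \operatorname{argmax}_\theta \ell(\theta; x, y^{f}, \hat{y}^{nf})$.
Problem: $\hat\theta^{fuzzy} = \hat\theta^{f}$.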
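
The three steps can be run on simulated data to see why the final estimate does not move. This is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating parameters, the acceptance rule and the Newton-Raphson helper `fit_logistic` are all introduced here for illustration. Soft labels in $[0, 1]$ are used for the rejected applicants, which gives the same likelihood as duplicating each rejected row with weights $p$ and $1 - p$.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximise the logistic log-likelihood by Newton-Raphson.

    y may contain soft labels in [0, 1].
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)                           # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X    # observed information
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(0)
n_total = 5000
X = np.column_stack([np.ones(n_total), rng.normal(size=(n_total, 3))])
theta_true = np.array([-1.0, 1.0, -0.5, 0.8])          # assumed for the simulation
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
financed = X[:, 1] > -0.5                              # hypothetical acceptance rule

# Step 1: discard the rejected applicants and fit on financed clients only.
theta_f = fit_logistic(X[financed], y[financed])

# Step 2: impute the missing labels with the predicted probabilities.
y_fuzzy = y.copy()
y_fuzzy[~financed] = 1.0 / (1.0 + np.exp(-X[~financed] @ theta_f))

# Step 3: refit on all applicants with observed and imputed labels.
theta_fuzzy = fit_logistic(X, y_fuzzy)

print(np.max(np.abs(theta_fuzzy - theta_f)))
```

The reason is visible in the score: each imputed row contributes $(\hat{y}_i - p_\theta(1|x_i))\, x_i$, which is exactly zero at $\theta = \hat\theta^{f}$, so the financed-only maximiser also maximises the augmented likelihood.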

Reject Inference methods: Maximum likelihood estimation
"Classical" estimation in the Credit Scoring field
$$\hat\theta^{f} = \operatorname*{argmax}_\theta \ell(\theta; x^{f}, y^{f}).$$
"Oracle" estimation knowing $y^{nf}$
$$\hat\theta = \operatorname*{argmax}_\theta \ell(\theta; x, y).$$

[Figure: model space $\Theta$ containing $\theta^{f}_{opt}$, $\theta_{opt}$ and the estimates $\hat\theta^{f}$, $\hat\theta$, with two gaps labelled "Estimation bias+variance" and a gap to $p_{true}(y|x)$ labelled "Model bias".]
Asymptotics?

Reject Inference methods: What is at stake?
Estimators:
1 "Oracle":
$$\sqrt{n + m}\,(\hat\theta - \theta_{opt}) \xrightarrow[n, m \to \infty]{\mathcal{L}} \mathcal{N}_{d+1}(0, \Sigma_{\theta_{opt}})$$
2 Current methodology:
$$\sqrt{n}\,(\hat\theta^{f} - \theta^{f}_{opt}) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}_{d+1}(0, \Sigma^{f}_{\theta^{f}_{opt}})$$
Question 1: asymptotics of the estimators
(Q1) $\theta_{opt} \overset{?}{=} \theta^{f}_{opt}$
(Q2) $\Sigma_{\theta_{opt}} \overset{?}{=} \Sigma^{f}_{\theta^{f}_{opt}}$

Reject Inference methods: Missingness mechanism
MAR (missing at random): $\forall\, x, y, z,\; p_{true}(z|x, y) = p_{true}(z|x)$
→ Acceptance is determined by the score: $Z = \mathbb{1}_{\{\theta \cdot X > \text{cut}\}}$.
MNAR (missing not at random): $\exists\, x, y, z,\; p_{true}(z|x, y) \neq p_{true}(z|x)$
→ Operators' "feeling" $X_c$ influences the acceptance.
Figure: Dependencies between random variables $Y$, $X_c$, $X$ and $Z$

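Reject Inference methods: Model specification
Well-specified model: $\exists\, \theta_{true},\; p_{true}(y|x) = p_{\theta_{true}}(y|x)$.
→ With real data, this hypothesis is unlikely to be true.
Misspecified model: $\theta_{opt}$ is the "best" in the $\Theta$ family.
→ Logistic regression is commonly used for its robustness to misspecification.
$$\begin{array}{l|c|c}
p_\theta(y|x) \;\backslash\; p_{true}(z|x, y) & \text{MAR} & \text{MNAR} \\ \hline
\text{Well specified} & \theta^{f}_{opt} = \theta_{opt},\;\; \Sigma^{f}_{\theta^{f}_{opt}} \neq \Sigma_{\theta_{opt}} & \theta^{f}_{opt} \neq \theta_{opt},\;\; \Sigma^{f}_{\theta^{f}_{opt}} \neq \Sigma_{\theta_{opt}} \\
\text{Misspecified} & \theta^{f}_{opt} \neq \theta_{opt},\;\; \Sigma^{f}_{\theta^{f}_{opt}} \neq \Sigma_{\theta_{opt}} & \theta^{f}_{opt} \neq \theta_{opt},\;\; \Sigma^{f}_{\theta^{f}_{opt}} \neq \Sigma_{\theta_{opt}}
\end{array}$$
Table: (Q1) and (Q2) w.r.t. model specification and missingness mechanism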
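
The MAR column of the table can be illustrated for (Q1) with a small simulation. The sketch below is only an illustration under assumptions chosen here (a single standard normal feature, acceptance when $x > 0$, a quadratic true log-odds for the misspecified case, and the hypothetical helpers `fit_logistic` and `gap`). It compares the financed-only fit with the fit on all applicants, which plays the role of the oracle.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression by Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(1)
n_total = 200_000
x = rng.normal(size=n_total)
X = np.column_stack([np.ones(n_total), x])   # the fitted model is linear in x
financed = x > 0.0                           # MAR: acceptance depends on x only

def gap(true_log_odds):
    """Largest coefficient gap between the oracle fit and the financed-only fit."""
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_log_odds))).astype(float)
    theta_all = fit_logistic(X, y)
    theta_f = fit_logistic(X[financed], y[financed])
    return float(np.max(np.abs(theta_all - theta_f)))

gap_well = gap(-1.0 + 1.0 * x)   # well specified: true log-odds linear in x
gap_mis = gap(-1.0 + x ** 2)     # misspecified: true log-odds quadratic in x
print(gap_well, gap_mis)
```

With a large sample the first gap should be sampling noise only, while the second stays far from zero however many applicants are simulated, which is the contrast between the two MAR cells.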

Reject Inference methods: How to use $x^{nf}$?
Question 2: How to construct a better estimator than $\hat\theta^{f}$?
Scope for action:
Change model space $\Theta$,
Model acceptance/rejection process (i.e. $p_\gamma(z|x, y)$),
Use $x^{nf}$.
Natural way to achieve all three: generative approach
$$p_\alpha(x, y, z) = p_{\beta_\alpha}(x)\, p_{\theta_\alpha}(y|x)\, p_{\gamma_\alpha}(z|x, y).$$
$$(\hat\theta_\alpha, \hat\beta_\alpha, \hat\gamma_\alpha) = \operatorname*{argmax}_{\theta_\alpha, \beta_\alpha, \gamma_\alpha} \ell(\alpha; x, y^{f}) = \operatorname*{argmax}_{\theta_\alpha, \beta_\alpha, \gamma_\alpha} \left[ \sum_{i=1}^{n} \ln(p_{\theta_\alpha}(y_i|x_i)) + \sum_{i=1}^{n+m} \ln(p_{\beta_\alpha}(x_i)) + \sum_{i=1}^{n} \ln(p_{\gamma_\alpha}(z_i|x_i, y_i)) \right].$$
Scope for action, in practice:
Change model space $\Theta$ → logistic regression,
Model acceptance/rejection process (i.e. $p_\gamma(z|x, y)$) → $\gamma$ cannot be estimated,
Use $x^{nf}$.

Reject Inference methods: A possible reinterpretation
Reclassification [Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013]:
$$(\hat\theta^{CEM}, \hat{y}^{nf}) = \operatorname*{argmax}_{\theta,\, y^{nf}} \ell(\theta; x, y^{f}, y^{nf}) \quad \text{where } \hat{y}_i = \operatorname*{argmax}_{y_i} p_{\hat\theta^{f}}(y_i|x_i).$$
Problem: inconsistent estimator.
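
A simulation makes the effect of hard imputation concrete. The sketch below is an assumption-laden illustration rather than the method of the cited papers: it starts from $\hat\theta^{f}$, labels each rejected applicant with its most likely class, refits on everyone, and alternates the two steps until the estimate stops changing (the alternation, the simulated data and the helper `fit_logistic` are choices made here).

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximum likelihood logistic regression by Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(0)
n_total = 20_000
x = rng.normal(size=n_total)
X = np.column_stack([np.ones(n_total), x])
theta_true = np.array([0.5, 1.5])            # assumed; the model is well specified
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
financed = x > -0.5                          # hypothetical MAR acceptance rule

theta_f = fit_logistic(X[financed], y[financed])

theta_cem = theta_f.copy()
y_hat = y.copy()
for _ in range(20):
    # Classification step: most likely class for each rejected applicant.
    y_hat[~financed] = (X[~financed] @ theta_cem > 0.0).astype(float)
    # Maximisation step: refit on observed plus imputed labels.
    theta_new = fit_logistic(X, y_hat)
    if np.allclose(theta_new, theta_cem):
        break
    theta_cem = theta_new

print(theta_true, theta_f, theta_cem)
```

Because the imputed labels are a deterministic function of $x$, they carry no noise, and the refit is pulled away from $\hat\theta^{f}$ even though the model is well specified and the acceptance rule is MAR in this simulation.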

Reject Inference methods: A possible reinterpretation
Augmentation [Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013, Nguyen, 2016]: MAR / misspecified model.
$$\ell_{Aug}(\theta; x^{f}, y^{f}) = \sum_{i=1}^{n} \frac{1}{p_{true}(f|x_i)} \ln(p_\theta(y_i|x_i)).$$
Problem: estimation of $p_{true}(f|x_i)$.
Parcelling [Soulié and Viennet, 2007, Banasik and Crook, 2007, Guizani et al., 2013]:
$$\ell(\theta; x, y^{f}, \hat{y}^{nf}) \quad \text{where } \hat{y}_i = \begin{cases} 1 & \text{w.p. } \alpha_i\, p_{\hat\theta^{f}}(1|x_i, f) \\ 0 & \text{w.p. } 1 - \alpha_i\, p_{\hat\theta^{f}}(1|x_i, f) \end{cases}.$$
Problem: MNAR assumptions hidden in $\hat{y}^{nf}$ ($\alpha_i$) impossible to test.
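
The Augmentation objective is a weighted log-likelihood, so it can be sketched with a weighted Newton-Raphson fit. The example below is a simulation under assumptions made here, not the authors' experiment: the true log-odds are quadratic while the fitted model is linear (misspecified), acceptance is random with a probability that depends on $x$ only (MAR), and the acceptance probability `p_f` is taken as known, which sidesteps the estimation problem named on the slide.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=50):
    """Maximise sum_i w_i * ln p_theta(y_i|x_i) by Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(0)
n_total = 200_000
x = rng.normal(size=n_total)
X = np.column_stack([np.ones(n_total), x])   # fitted model: linear in x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + x ** 2)))).astype(float)

# Hypothetical MAR acceptance: probability in [0.2, 0.9], function of x only.
p_f = 0.2 + 0.7 / (1.0 + np.exp(-2.0 * x))
financed = rng.random(n_total) < p_f

ones = np.ones(n_total)
theta_oracle = fit_weighted_logistic(X, y, ones)                     # all labels
theta_f = fit_weighted_logistic(X[financed], y[financed], ones[financed])
theta_aug = fit_weighted_logistic(X[financed], y[financed], 1.0 / p_f[financed])

gap_f = float(np.max(np.abs(theta_f - theta_oracle)))
gap_aug = float(np.max(np.abs(theta_aug - theta_oracle)))
print(gap_f, gap_aug)
```

Bounding `p_f` away from zero keeps the weights below 5 here; with acceptance probabilities close to zero the weights explode, and with a deterministic cut-off the rejected region receives no weight at all.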

Reject Inference methods: Experimental results
Portfolio from a consumer electronics distributor partner:
Approximately 250,000 applications
Approximately 10 discrete (or discretized) features (with interactions):
Socio-professional category(-ies)
Amount of rent
Number of children
Years in current job position
Approximately 3% default rate

Conclusion

Conclusion
Fuzzy Augmentation, Reclassification (and Twins) were proven useless.
Augmentation and Parcelling seem legitimate, depending on model specification and missingness mechanism, but are difficult or impossible to apply in practice.
Recommendation: do not practice Reject Inference, since all tested Reject Inference techniques are high risk / low return.
For the most part, Reject Inference had previously been tackled only experimentally.
Research paper in progress.

Thank you for your attention!
Any questions?

Banasik, J. and Crook, J. (2007). Reject inference, augmentation, and sample selection. European Journal of Operational Research, 183(3):1582–1594.
Guizani, A., Souissi, B., Ammou, S. B., and Saporta, G. (2013). Une comparaison de quatre techniques d'inférence des refusés dans le processus d'octroi de crédit. In 45èmes Journées de Statistique.
Nguyen, H. T. (2016). Reject inference in application scorecards: evidence from France. Technical report, University of Paris West-Nanterre la Défense, EconomiX.
Soulié, F. F. and Viennet, E. (2007). Le traitement des refusés dans le risque crédit. Revue des Nouvelles Technologies de l'Information, Data Mining et Apprentissage Statistique : application en assurance, banque et marketing, RNTI-A-1:22–44.
