Ensemble Based Credit
Risk Assessment System
BY
• NIKITA KAPIL
DOMAIN:
Machine Learning
Problem Statement
• Credit risk is a crucial factor when
commercial banks and financial
institutions grant loans to
customers.
• Constructing reliable evaluation
models plays a huge role in
loss control
and revenue maximization.
• This project aims to reduce credit
risks by predicting defaulters based
on the behaviour of past defaulters.
Objective
• The banking sector has always required an automated and
reliable way to distinguish between ‘good’ and ‘bad’ customers.
Because the current models are inaccurate, the need for
accuracy far outweighs what is available.
• The objective of this model is to make the system more accurate
and more usable by creating three levels:
• the first, an unsupervised clustering level;
• the second, a supervised classification level, which involves
several algorithms; and
• the third, a semi-supervised level, which takes a consensus
of the classes.
The need for this work is clear
in the banking sector,
where about 5.21 trillion Indian
rupees were lost to Non-Performing
Assets (NPA) due to defaulting.
What is
•Unreliable, biased
decisions
•Inaccurate predictions
•Intuition-based credit
assignment
•Large Non-Performing
Assets.
What can be
• Guided, carefully analysed
decisions
• Highly accurate predictions
• Understanding over
intuition
• More Performing Assets,
fewer liabilities
Existing
System
• The current system is largely based on credit scoring,
which in India is handled by CIBIL.
• The score ranges between 300 and 900.
• The problem with this type of scoring is that it
depends on the number of defaults rather than the
density of the amount defaulted.
• This leads to numerous exploits and loopholes in the
system that potentially affect the economic balance of
the customers.
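The difference between counting defaults and weighing the amount defaulted can be illustrated with a small sketch. The borrowers and figures below are hypothetical, and "default density" is taken here, as an assumption, to mean the defaulted amount divided by the total amount borrowed. The slides do not define it, and this is not a description of how CIBIL computes its score.

```python
def default_count(defaulted_amounts):
    """Number of separate defaults, regardless of their size."""
    return len(defaulted_amounts)


def default_density(defaulted_amounts, total_borrowed):
    """Assumed definition: share of the borrowed amount that was defaulted."""
    if total_borrowed == 0:
        return 0.0
    return sum(defaulted_amounts) / total_borrowed


# Hypothetical borrowers, each with 100,000 borrowed in total.
borrower_a = [500.0, 500.0, 500.0]   # three small missed payments
borrower_b = [60000.0]               # one very large default

count_a = default_count(borrower_a)
count_b = default_count(borrower_b)
density_a = default_density(borrower_a, 100000.0)
density_b = default_density(borrower_b, 100000.0)

# A count-based view ranks A as riskier; an amount-based view ranks B as riskier.
print(count_a, count_b)
print(density_a, density_b)
```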
Literature
survey
S.No. | Authors | Topic
1 | AghaeiRad, A., Chen, N., & Ribeiro, B. (2017) | Improve credit scoring using transfer of learned knowledge from self-organizing map.
2 | Asgharbeygi, N., & Maleki, A. (2008) | Geodesic K-means clustering.
3 | Breiman, L. (1999) | Random forest. Machine Learning.
4 | Cortes, C., & Vapnik, V. (1995) | Support vector machine. Machine Learning.
5 | Cover, T., & Hart, P. (1967) | Nearest neighbor pattern classification.
6 | Henley, W. E., & Hand, D. J. (1996) | A k-nearest-neighbour classifier for assessing consumer credit risk.
Proposed
System
Architecture
Diagram
Modelling system of the Ensemble - Credit
Risk Assessment System
Route 1: No model works best for every
problem.
Route 2: Has the drawback that the data cannot be
made sense of before it is processed, which
can add a lot of complexity and error.
Route 3: Does not provide accuracy good
enough to make the system useful.
Route 4: Reduces the pre-processing problems,
inconsistencies and inaccuracies of the system.
Process Flow of
the Ensemble -
Credit Risk
Assessment
system, based
on Route 4
Clustering
(Unsupervised)
Machine Learning
Methods
• Kohonen’s Self-Organizing Maps (SOM)
• k-Means Clustering (kMC)
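As an illustration of the clustering level, here is a minimal k-means sketch in plain Python. The two-feature customer data is hypothetical. The farthest-first choice of starting centroids is an assumption made so that the example is reproducible, and it is not taken from the slides. A SOM is not shown.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def initial_centroids(points, k):
    """Deterministic farthest-first start: first point, then the farthest remaining ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iterations=100):
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical, already-scaled customer features.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
             (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(customers, k=2)
print(labels)
```

The cluster labels could then be passed to the classification level as an extra feature or used to split the data, depending on the chosen route.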
Classification
(Supervised)
Machine Learning
Methods
• Logistic Regression (LR)
• Support Vector Machines (SVM)
• C4.5 Decision Tree (DT)
• Random forest (RF)
• Gradient Boosting Decision Trees
(GBDT)
• k-Nearest Neighbors (kNN)
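To make the classification level concrete, the simplest of the listed methods, kNN, can be sketched in a few lines of plain Python. The training data and the 'good'/'bad' labels below are hypothetical, and the features are assumed to be scaled already. The other classifiers would normally come from a machine learning library rather than be written by hand.

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: squared_distance(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled customers.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
train_y = ["good", "good", "good", "bad", "bad", "bad"]

prediction_near_good = knn_predict(train_X, train_y, (1.1, 1.0))
prediction_near_bad = knn_predict(train_X, train_y, (7.9, 8.0))
print(prediction_near_good, prediction_near_bad)
```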
Consensus (Semi-
Supervised)
Machine Learning
method
Voting Classifier
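A voting classifier combines the class predictions of the individual classifiers. The sketch below shows hard (majority) voting in plain Python. The three prediction lists are hypothetical, with 1 meaning defaulter and 0 meaning non-defaulter. Whether the project uses hard or weighted voting is not stated in the slides, so hard voting is an assumption here.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for sample_votes in zip(*predictions_per_classifier):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on four customers.
lr_pred = [0, 1, 1, 0]
svm_pred = [0, 1, 0, 0]
rf_pred = [1, 1, 1, 0]

consensus = majority_vote([lr_pred, svm_pred, rf_pred])
print(consensus)
```

Using an odd number of classifiers avoids ties when there are only two classes.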
Software Requirements Specification
Hardware Requirements:
• Minimum 2GB RAM
• Intel Pentium 4 or Higher
• Recommended 1GB Storage
Space
Preferences:
• 6GB RAM or higher
• Intel Core i3 6600K or higher
Software Requirements:
• Python 3.6 installed
Preferred Operating Systems:
• Windows 10
• Ubuntu 18.04 LTS
Experimental Results
• Algorithms have been tested on
the main dataset. The
performance metrics used to
appraise this model are
Accuracy, Recall, Precision, F1
Score and Confusion Matrix.
• The accuracy of the model is
93%.
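The listed metrics all follow from the four counts of a binary confusion matrix. A minimal sketch in plain Python, using hypothetical labels (1 = defaulter) rather than the project's dataset:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth and predictions for ten customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(y_true, y_pred)
print(counts, scores)
```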
● As can be inferred from both tables on the next slide, the prediction of defaulters is
affected mostly by the fact that the number of samples that exist for them is
low.
● The overall performance of the model is significantly better than that of many
currently used models, and with a few more improvements it can be useful for
progressing research in credit risk assessment.
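The effect of having few defaulter samples can be shown with a hypothetical example: on a dataset with 5 defaulters among 100 customers, a model that never predicts a default still reaches high accuracy while finding no defaulters at all. The class ratio below is an assumption for illustration and is not the ratio in the project's dataset.

```python
# Hypothetical imbalanced labels: 1 = defaulter, 0 = non-defaulter.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a trivial model that always predicts "non-defaulter"

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall_defaulters = true_positives / sum(y_true)

print(accuracy, recall_defaulters)
```

This is why recall and F1 score on the defaulter class are worth reading alongside accuracy.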
Future Work
• Using clustering algorithms that do not
require a pre-set number of clusters and that
also identify noisy data and exclude it from
the data points.
• The proposed CRA model can be
further enhanced to the following effects:
• Faster processing
• Reduced data overheads
• Client-side advisory assistant, so that the
debtor is warned about following a bad
spending pattern.
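One well-known algorithm that needs no pre-set number of clusters and flags noisy points is DBSCAN. The slides do not name a specific algorithm, so the plain-Python sketch below is only an illustration of the idea. The data and the `eps` and `min_samples` values are hypothetical.

```python
NOISE = -1


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points)
            if squared_distance(points[i], q) <= eps * eps]


def dbscan(points, eps, min_samples):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


# Hypothetical scaled customer features with one outlier at the end.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
             (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
             (4.5, 15.0)]
labels = dbscan(customers, eps=0.5, min_samples=3)
print(labels)
```

Points labelled -1 are treated as noise and can be left out of the later classification step.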
Conclusion
• This approach to assessing credit risk allows for much more accurate
and far-sighted predictions such that countermeasures or advisory
procedures can be followed beforehand.
• This, in turn, recovers the public monetary assets that are rendered
ineffective due to defaulters of larger proportions, which is a major
problem in the Indian economy.
• Hence, the improved CRA model can benefit society and the
country by reducing the losses caused by defaults.
References
• Lessmann, S., Baesens, B., Seow, H. V., & Thomas, L. C. (2015). Benchmarking state-of-the-art
classification algorithms for credit scoring.
• Breiman, L. (1999). Random forest.
• Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine.
• Cortes, C., & Vapnik, V. (1995). Support vector machine.
• Zhou, L., Lai, K. K., & Yu, L. (2010). Least squares support vector machines ensemble models for
credit scoring.
• Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification.
• Henley, W. E., & Hand, D. J. (1996). A k-nearest-neighbour classifier for assessing consumer
credit risk.
• Islam, M. J., Wu, Q. M. J., Ahmadi, M., & Sid-Ahmed, M. A. (2007). Investigating the
performance of Naïve-Bayes classifier and K-nearest neighbor classifiers.
• Asgharbeygi, N., & Maleki, A. (2008). Geodesic K-means clustering.