Credit Risk with AI tools
The old, the new and the unexpected
ARMANDO VIEIRA
Armandosvieira.wordpress.com
Customer fails to pay
Losing money
Wrong strategy
Change in market prices
Processing failures and frauds
Regulatory compliance
RISK
Importance of Credit Risk
A statistical means of providing a quantifiable risk factor for a given
applicant.
Credit scoring is a process whereby information provided is converted into
numbers to arrive at a score.
The objective is to forecast future performance from past behavior of
clients (SME or individuals).
Credit scoring is used in many industries:
Banking
Decision Models Finance
Insurance
Retail
Telecommunications
What is Credit Scoring?
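As a minimal sketch of how applicant information can be converted into a score, the example below uses a logistic model. The feature names, weights, intercept and the 300 to 850 score range are all hypothetical values chosen for illustration; they do not come from the presentation, and a real scorecard would be fitted to historical repayment data.

```python
import math

# Hypothetical weights for illustration only (not from the presentation).
WEIGHTS = {"current_ratio": -0.8, "debt_to_equity": 0.6, "late_payments": 0.9}
INTERCEPT = -1.0


def default_probability(applicant):
    """Logistic model: map applicant attributes to a probability of default."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def credit_score(applicant, low=300, high=850):
    """Rescale the probability so that a higher score means lower risk."""
    return round(high - (high - low) * default_probability(applicant))


# Two made-up applicants.
safe = {"current_ratio": 2.5, "debt_to_equity": 0.3, "late_payments": 0}
risky = {"current_ratio": 0.7, "debt_to_equity": 2.0, "late_payments": 3}
print(credit_score(safe), credit_score(risky))
```

The score is only a monotone rescaling of the estimated probability, so ranking applicants by score and by probability of default gives the same order.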
• Predict financial distress of private companies one year ahead,
based on balance sheet accounts from previous years.
• Eventually, estimate the probability of a company becoming distressed.
• Obtain reliable data from up to 5 years before failure
• Classify and release warning signs
Bankruptcy prediction problem
The curse of dimensionality
Problems
• Sparseness of the search space
• Presence of Irrelevant Features
• Poor generalization of Learning Machine
• Exceptions difficult to identify
Solutions
• Dimensionality reduction: feature selection
• Constrain the complexity of the Learning Machine
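One simple way to carry out the feature selection step is to rank each input by how well it separates the two classes on its own. The sketch below uses the Fisher score for that purpose. This is a generic filter method chosen for illustration, not necessarily the selection procedure used by the authors, and the toy data and feature names are made up.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared gap between class means divided by the summed class variances."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    spread = pvariance(class0) + pvariance(class1)
    gap = (mean(class0) - mean(class1)) ** 2
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def select_features(rows, labels, names, k):
    """Return the k feature names with the highest Fisher score."""
    scores = {
        name: fisher_score([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


# Toy data (made up): the first column separates the classes, the second does not.
names = ["current_ratio", "noise"]
rows = [[2.1, 0.5], [1.9, 0.1], [2.3, 0.9], [0.6, 0.4], [0.8, 0.8], [0.5, 0.2]]
labels = [0, 0, 0, 1, 1, 1]
print(select_features(rows, labels, names, k=1))
```

A filter like this scores features one at a time, so it cannot detect features that are only useful in combination; that trade-off is one reason wrapper methods such as genetic algorithms are also considered.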
The Diane Database
• Financial statements of French companies: initially 60,000
industrial French companies with at least 10 employees,
for the years 2002 to 2006
• 3,000 were declared bankrupt in 2007 or presented a
restructuring plan
• 30 financial ratios which describe firms in terms of financial strength,
liquidity, solvability, productivity of labor and capital, margins,
net profitability and return on investment
The inputs
• Number of employees
• Financial Debt / Capital Employed (%)
• Capital Employed / Fixed Assets
• Depreciation of Tangible Assets (%)
• Working Capital / Current Assets
• Current ratio
• Liquidity ratio
• Stock Turnover days
• Collection period
• Credit Period
• Turnover per Employee
• Interest / Turnover
• Debt Period (days)
• Financial Debt / Equity (%)
• Financial Debt / Cashflow
• Net Current Assets / Turnover (days)
• Working Capital Needs / Turnover (%)
• Export (%)
• Value added per employee
• Total Assets / Turnover
• Operating Profit Margin (%)
• Net Profit Margin (%)
• Added Value Margin (%)
• Part of Employees (%)
• Return on Capital Employed (%)
• Return on Total Assets (%)
• EBIT Margin (%)
• EBITDA Margin (%)
• Cashflow / Turnover (%)
• Working Capital / Turnover (days)
Hard problem
[Figure: Class 0 and Class 1 plotted on the first two principal components from PCA (axes λ1, λ2)]
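A projection onto the first two principal components can be sketched as follows. This is a dependency-free illustration using power iteration with deflation; it is an assumption about how such a plot could be produced, not the authors' code, and the toy rows are made up. In practice a linear algebra library would normally be used instead.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of the columns, plus the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def dominant_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v.
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


def first_two_components(rows):
    """Return the two largest variances and the rows projected on their directions."""
    cov, centered = covariance(rows)
    d = len(cov)
    lam1, v1 = dominant_eigenpair(cov)
    # Deflation: subtract the first direction, then find the next one.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    lam2, v2 = dominant_eigenpair(deflated)
    projected = [[sum(c[j] * v[j] for j in range(d)) for v in (v1, v2)]
                 for c in centered]
    return (lam1, lam2), projected


# Toy data (made up): columns 1 and 2 move together, column 3 is small noise.
rows = [[1, 2, 0.0], [2, 4, 0.1], [3, 6, -0.1], [4, 8, 0.05], [5, 10, -0.05]]
(lam1, lam2), projected = first_two_components(rows)
print(lam1, lam2)
```

Each projected row is a pair of coordinates that can be plotted directly, colored by class, to see how much the two classes overlap.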
How HLVQ-C works
[Figure: Class 0 and Class 1 points in the X-Y plane, "Before" and "After"; distances d1 and d2 from an unclassified point (?)]
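The figure suggests a distance-based decision, in which an unclassified point is compared with class representatives through distances such as d1 and d2. The sketch below shows plain LVQ1 (learning vector quantization) on raw inputs to illustrate that idea. It is not the authors' HLVQ-C, whose details are given in the cited paper rather than on the slide; the learning rate, number of epochs, starting prototypes and toy points are all assumptions.

```python
import math


def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))


def train_lvq1(samples, labels, prototypes, rate=0.1, epochs=20):
    """Plain LVQ1: move the winning prototype toward samples of its own class
    and away from samples of the other class."""
    protos = [(list(p), c) for p, c in prototypes]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            i = nearest(protos, x)
            p, c = protos[i]
            sign = 1.0 if c == y else -1.0
            protos[i] = ([pj + sign * rate * (xj - pj) for pj, xj in zip(p, x)], c)
    return protos


def classify(protos, x):
    """Assign x the class of its nearest prototype."""
    return protos[nearest(protos, x)][1]


# Toy data (made up): class 0 near (0.3, 0.3), class 1 near (1.2, 1.2).
samples = [[0.2, 0.3], [0.4, 0.2], [0.3, 0.5], [1.1, 1.3], [1.3, 1.0], [1.2, 1.2]]
labels = [0, 0, 0, 1, 1, 1]
start = [([0.5, 0.5], 0), ([1.0, 1.0], 1)]
protos = train_lvq1(samples, labels, start)
print(classify(protos, [0.2, 0.4]), classify(protos, [1.3, 1.1]))
```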
DIANE 1 (error %)
Model    Error I  Error II  Total
MDA      26.4     21.0      23.7
SVM      17.6     12.2      14.8
MLP      25.7     13.1      19.4
HLVQ-C   11.1     10.6      10.8
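The Total figures reported here agree, to rounding, with the mean of Error I and Error II, which is what a sample with equal numbers of bankrupt and healthy firms would give. That reading is an inference from the numbers, not something the slide states. The sketch below computes the three quantities from predictions and then repeats the consistency check. The class coding (1 = bankrupt, 0 = healthy) and the convention that Error I means a bankrupt firm predicted healthy are assumptions.

```python
def error_rates(y_true, y_pred):
    """Return (type_i, type_ii, total) in percent.

    Assumed convention: class 1 = bankrupt, class 0 = healthy.
    Type I  = bankrupt firms predicted healthy.
    Type II = healthy firms predicted bankrupt.
    Total   = mean of the two rates.
    """
    bankrupt = [p for t, p in zip(y_true, y_pred) if t == 1]
    healthy = [p for t, p in zip(y_true, y_pred) if t == 0]
    type_i = 100.0 * sum(p == 0 for p in bankrupt) / len(bankrupt)
    type_ii = 100.0 * sum(p == 1 for p in healthy) / len(healthy)
    return type_i, type_ii, (type_i + type_ii) / 2


# Figures as reported on the slide: (Error I, Error II, Total).
reported = {
    "MDA": (26.4, 21.0, 23.7),
    "SVM": (17.6, 12.2, 14.8),
    "MLP": (25.7, 13.1, 19.4),
    "HLVQ-C": (11.1, 10.6, 10.8),
}
for model, (e1, e2, total) in reported.items():
    print(model, round((e1 + e2) / 2, 2), total)
```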
DIANE 1 - HLVQ-C Results
Method             Classification Weighted Efficiency (%)
Z-score (Altman)   62.7
Best Discriminant  66.1
MLP                71.4
Our method         84.1
Source: Vieira, A.S., Neves, J.C.: Improving Bankruptcy Prediction with Hidden Layer
Learning Vector Quantization. European Accounting Review, 15 (2), 253-271 (2006).
Personal credit
Results I – 30 days into arrears
Classifier   Accuracy (%)  Type I  Type II  G
Logistic     66.3          27.3    40.1     54.8
MLP          67.5          8.1     57.1     61.1
SVM          64.9          35.6    34.6     52.3
AdaboostM1   69.0          12.6    49.4     55.7
HLVQ-C       72.6          5.3     49.5     52.3
Results I – 60 days into arrears
Classifier   Accuracy (%)  Type I  Type II  G
Logistic     81.2          48.2    11.0     21.2
MLP          82.3          57.4    9.1      20.1
SVM          83.3          38.1    12.4     19.3
AdaboostM1   84.1          45.7    8.0      14.7
HLVQ-C       86.5          48.3    6.2      11.9
DIANE II (2002 – 2007)
• More data
• Longer history
• More features
Year 2006
Classifier   Accuracy  Type I  Type II
Logistic     91.25     6.33    11.17
MLP          91.17     6.33    11.33
C-SVM        92.42     5.16    10.00
AdaboostM1   89.75     8.16    12.33
Year 2005
Classifier   Accuracy  Type I  Type II
Logistic     79.92     19.50   20.67
MLP          75.83     24.50   23.83
C-SVM        80.00     21.17   18.83
AdaboostM1   78.17     20.50   23.17
Results
How useful?
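[Equation garbled in extraction: an expression relating V, N and η to the error rates e_I and e_II and the quantities x and m, followed by a condition (>>) involving m, G, x, e_I and e_II]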
The Rating System
French market - 2006
[Figure: Score as a function of EBIT (eb) and Current ratio (cr)]
MOGA
Multiobjective Genetic Algorithms
MOGA – feature selection
S-ISOMAP – manifold learning
The idea behind it
Other approaches
• SVM+ - domain knowledge SVMs
• RVM – probabilistic SVMs
• NMF – Non-negative Matrix
Factorization
• Genetic Programming
• …
The Power of Social Network
Analysis
Bad Rank Algorithm
Where are the bad guys?
Bad Rank for Fraud Detection
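The slides give only the name of the algorithm, so the following is a sketch of the general BadRank idea as known from web spam detection: badness starts at known bad nodes and propagates backwards along links, so that a node pointing to bad nodes becomes suspicious itself. The graph, the seed values, the damping factor and the interpretation of links as relations between accounts are all assumptions, and the deck's fraud detection variant may differ.

```python
def bad_rank(links, seeds, damping=0.85, iterations=50):
    """PageRank-style propagation of badness against the direction of links.

    links: dict mapping each node to the list of nodes it points to.
    seeds: dict mapping known bad nodes to an initial badness value.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    inbound = {n: 0 for n in nodes}
    for targets in links.values():
        for t in targets:
            inbound[t] += 1
    score = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        # A node inherits a share of the badness of every node it points to.
        score = {
            n: seeds.get(n, 0.0) * (1 - damping)
            + damping * sum(score[t] / inbound[t] for t in links.get(n, []))
            for n in nodes
        }
    return score


# Toy network (made up): "a" points to a known fraudster, "b" points to "a",
# while "c" and "d" are unrelated to it.
links = {"a": ["fraud"], "b": ["a"], "c": ["d"], "d": [], "fraud": []}
scores = bad_rank(links, seeds={"fraud": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

Dividing by the number of inbound links spreads a bad node's influence across everyone who points to it, so being one of many contacts of a fraudster counts for less than being its only contact.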
Results with Semi-supervised Learning
Network Analysis
A world of possibilities
• Identify critical nodes / links / clusters
• Detailed information on dynamics
• Stability / robustness of the system
• Information / crisis propagation
• Stress tests
Team
João Carvalho das Neves: Professor of Management, ISEG. Ph.D. in Business Administration, Manchester Business School.
Armando Vieira: Professor of Physics and entrepreneur. Ph.D. in Physics and researcher in Artificial Intelligence.
Bernardete Ribeiro: Associate Professor of Computer Science, University of Coimbra; researcher at CISUC.
Tiago Marques: Marketing and Business Consultant, E-Business Specialist.
Roles: Director of Research; Business Director; IT Researcher; Marketing
10+ years experience in AI
25 years experience in Credit Risk & Financial Analysis
15 years of marketing experience
What do banks need in credit
management?
Efficiency Accuracy
Savings of Capital – Basel requirements
This is a highly regulated industry with detailed and focused regulators
What do they get?
Boosting the accuracy of credit risk methodologies will lead to considerable gains for banks
[Figure: Non-performing loans - Europe, in billions of EUR, for Germany, UK, Spain, Italy, Russia and Greece, 2008 and 2009]
Source: Issue 2 of NPL Europe, a publication covering non-performing loan
(NPL) markets in Europe and the United Kingdom (UK), PriceWaterhouseCoopers
[Figure: % Corporate Debt Default - Portugal, NPL (%), 2005 to 2009]
Source: Bank of Portugal
AIRES Solution
AIRES.dei.uc.pt

Credit risk with neural networks bankruptcy prediction machine learning


Editor's Notes

  • #46 The banking industry is a highly regulated industry with detailed and focused regulators. Fast, fully adaptable, performance and accuracy. Commercial benefits. Cost reduction. Investor. Scale. A business that will remain in high demand. ROI. Of the team: an experienced team, where the whole is far greater than the sum of its parts.
  • #47 Boosting the accuracy of credit risk methodologies used by banks and financial institutions may lead to considerable gains. The default rate in Portugal has more than doubled in the past 5 years. Similarly, in Europe NPLs increased by over 25%, in many cases by as much as 50%. 620 billion euros in 2009. For example, improving the accuracy of credit risk assessment models by only 1% may lead to a gain in the banking sector of about 50 million euros, in Portugal alone.