The document describes a credit risk modeling technique based on a discrete Bayesian network with a latent variable that represents different classes of credit-risk probability distributions. The model supports both credit risk evaluation and clustering of loan subscribers. The document details the Bayesian network model and proposes a customized Expectation-Maximization algorithm to learn the model parameters from data; the model and learning approach are applied to a real loan data set to classify loans and analyze credit risk profiles.
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
Many business operations and strategies rely on bankruptcy prediction. In this paper, we study the impact of public records and firmographics and predict bankruptcy over a 12-month-ahead period, using different classification models and adding value to the traditionally used financial ratios. Univariate analysis shows the statistical association and significance of public records and firmographics indicators with bankruptcy. Further, seven statistical models and machine learning methods were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. Model performance was evaluated and compared based on classification accuracy, Type I error, Type II error, and ROC curves on the hold-out dataset. Moreover, an experiment was set up to show the importance of oversampling for rare-event prediction. The results also show that the Bayesian Network is comparatively more robust than the other models without oversampling.
An Innovative Approach to Predict Bankruptcy
Bankruptcy is a legal status of a person or organization that cannot repay its debts to creditors. Bankruptcy prediction, the task of forecasting this outcome, can help firms avoid financial distress, and it is a large area of accounting and finance research. The field matters to financial specialists and creditors in assessing the probability that a firm may go bankrupt, and estimating the risk of corporate bankruptcies is important because the effects of bankruptcy are felt at a global level. The aim of predicting financial distress is to develop a predictive model that combines various economic factors to foresee the financial status of a firm. In this domain, various methods have been proposed based on neural networks, Support Vector Machines, Decision Trees, Random Forests, Naïve Bayes, Balanced Bagging, and Logistic Regression. In this paper, we document our observations as we explore and build a Restricted Boltzmann Machine for bankruptcy prediction. We start with data pre-processing, imputing missing values using mean imputation. To address the class imbalance issue, we apply the Synthetic Minority Oversampling Technique (SMOTE) to oversample the minority class. Finally, we analyze and evaluate the performance of the model.
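The pre-processing steps above (mean imputation followed by SMOTE) can be sketched in plain NumPy. This is a minimal illustration of the two techniques, not the authors' code; the function names and the choice of k are ours:

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs in each column with that column's mean (mean imputation)."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X

def smote_oversample(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by interpolating each
    sample toward one of its k nearest minority-class neighbours (SMOTE)."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances to all others
        neighbours = np.argsort(d)[1:k + 1]            # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                             # interpolation factor
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

In practice a library implementation such as imbalanced-learn's SMOTE would be used; the sketch only shows the neighbour-interpolation idea.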
INFLUENCE OF THE EVENT RATE ON DISCRIMINATION ABILITIES OF BANKRUPTCY PREDICTION MODELS
In bankruptcy prediction the proportion of events is very low, so the data are often oversampled to counter this imbalance. In this paper, we study the influence of the event rate on the discrimination abilities of bankruptcy prediction models. First, the statistical association and significance of public records and firmographics indicators with bankruptcy were explored. Then the event rate was oversampled from 0.12% to 10%, 20%, 30%, 40%, and 50%, respectively. Seven models were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. Under the different event rates, the models were comprehensively evaluated and compared based on the Kolmogorov-Smirnov statistic, accuracy, F1 score, Type I error, Type II error, and ROC curve on the hold-out dataset at their best probability cut-offs. Results show that the Bayesian Network is the most insensitive to the event rate, while the Support Vector Machine is the most sensitive.
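The Kolmogorov-Smirnov statistic and the cutoff-dependent error rates used in such comparisons can be computed as below. This is a generic sketch, not the paper's code, and papers differ on which error is labelled Type I, so the convention used here is stated in the comments:

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Kolmogorov-Smirnov statistic: the maximum separation between the
    cumulative score distributions of the event and non-event classes."""
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tpr = pred[y_true == 1].mean()   # fraction of events caught
        fpr = pred[y_true == 0].mean()   # fraction of non-events flagged
        best = max(best, abs(tpr - fpr))
    return best

def error_rates(y_true, scores, cutoff):
    """Here Type I error = events missed (false negative rate) and
    Type II error = non-events flagged (false positive rate)."""
    pred = (scores >= cutoff).astype(int)
    type1 = np.mean(pred[y_true == 1] == 0)
    type2 = np.mean(pred[y_true == 0] == 1)
    return type1, type2
```

Sweeping `cutoff` over the score range and picking the value that balances the two errors is one common way to obtain the "best probability cut-off" mentioned above.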
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
This working paper addresses the level of credit risk. Recognizing that the risk of a group stems from the diversity of its members, it proposes a methodology for applying credit risk measurement that makes it possible to rank the population by risk level, distinguishing among the different risk-based rankings of the population and taking into account, within the ranking, the risks implied by borrowers' preferences in the decisions they make.
http://www.udla.edu.ec/
Prediction of Corporate Bankruptcy using Machine Learning Techniques Shantanu Deshpande
The aim is to build a classification model to predict whether a company will go bankrupt using financial ratios of Polish companies. Various machine learning models such as Random Forest, KNN, AdaBoost, and Decision Tree were applied, with pre-processing techniques such as SMOTE-ENN (to deal with class imbalance) and feature selection, and trained on the Polish Bankruptcy dataset with a prediction accuracy of 89%.
Credit Scoring Using CART Algorithm and Binary Particle Swarm Optimization
Credit scoring is a procedure that exists in every financial institution: a way to predict whether a debtor is qualified to be given a loan, and a major concern throughout the loan process. Almost all banks and other financial institutions have their own credit scoring methods, and the data mining approach is now accepted as one of the well-known methods, with accuracy a major issue. This research proposes a hybrid method using the CART algorithm and Binary Particle Swarm Optimization. The performance indicators used in this research are classification accuracy, error rate, sensitivity, specificity, and precision. Experimental results based on public datasets showed that the proposed method's accuracy is 78% and 87.53%. Compared to several popular algorithms, such as neural networks, logistic regression, and support vector machines, the proposed method showed outstanding performance.
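A minimal sketch of the binary PSO half of such a hybrid, with the fitness function left as a callable (in the paper it would score a CART model trained on the selected features; here any mask-scoring function works). The hyperparameter values are illustrative, not the paper's:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def binary_pso(fitness, n_features, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, rng=0):
    """Minimal binary PSO: each particle is a 0/1 feature mask; velocities
    are squashed through a sigmoid to give per-bit activation probabilities."""
    rng = np.random.default_rng(rng)
    X = rng.integers(0, 2, (n_particles, n_features))
    V = rng.normal(0, 1, (n_particles, n_features))
    pbest, pbest_fit = X.copy(), np.array([fitness(x) for x in X])
    g = pbest[np.argmax(pbest_fit)].copy()          # global best mask
    for _ in range(n_iter):
        r1, r2 = rng.random(V.shape), rng.random(V.shape)
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (g - X)
        X = (rng.random(V.shape) < sigmoid(V)).astype(int)
        fit = np.array([fitness(x) for x in X])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = X[improved], fit[improved]
        g = pbest[np.argmax(pbest_fit)].copy()
    return g, pbest_fit.max()
```

Plugging in a fitness that trains and cross-validates a CART classifier on the masked feature set would recover the hybrid the abstract describes.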
Bankruptcy prediction is the art of predicting bankruptcy and various measures of financial distress of public or private firms. In recent years we have seen many cases of distress and bankruptcy, and it is a large area of finance and accounting research. The importance of the field is due partly to its relevance for creditors and investors in evaluating the likelihood that a firm may go bankrupt. The quantity of research is also a function of the supply of data: for public firms that did or did not go bankrupt, numerous accounting ratios that may indicate danger can be calculated, and various other potential explanatory variables are also available. Consequently, the field is well suited for testing increasingly sophisticated, data-intensive forecasting approaches.
Synthetic feature generation to improve accuracy in prediction of credit limits
Financial institutions use various data mining algorithms to determine credit limits for individuals using features like age, education, employment, gender, income, and marital status. However, there is still a question of predictive accuracy: how accurate can an institution be in predicting risk and granting credit levels? If an institution grants too low a credit limit/loan for an individual, it may lose business to competitors, but if it grants too high a credit limit/loan, it may lose money if that individual does not repay. The novelty of this work is that it shows how to improve accuracy in predicting credit limits/loan amounts using synthetic feature generation. By creating secondary groupings and including both the original binning and the synthetic bins, the classification accuracy and other statistical measures like precision and ROC improved substantially. Hence, our research showed that without synthetic feature generation the classification rates were low, and the use of synthetic features greatly improved the classification accuracy and other statistical measures.
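The idea of keeping both the original binning and a coarser synthetic secondary grouping can be illustrated with `numpy.digitize`; the bin edges below are made-up examples, not the paper's:

```python
import numpy as np

def add_synthetic_bins(values, fine_edges, coarse_edges):
    """Return (fine_bin, coarse_bin) indices per value: the original binning
    plus a coarser 'synthetic' secondary grouping, both kept as features."""
    fine = np.digitize(values, fine_edges)
    coarse = np.digitize(values, coarse_edges)
    return fine, coarse

# Illustrative: age binned finely and into a coarse secondary grouping.
ages = np.array([22, 37, 41, 58, 63])
fine, coarse = add_synthetic_bins(ages, fine_edges=[25, 35, 45, 55, 65],
                                  coarse_edges=[40, 60])
# fine   -> [0, 2, 2, 4, 4]
# coarse -> [0, 0, 1, 1, 2]
```

A classifier would then be trained on both bin columns at once, which is the "include original binning and synthetic bins" step the abstract credits with the accuracy gain.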
Review Parameters: Model Building & Interpretation and Model Tuning
1. Model Building
a. Assessments and Rationale of Various Models Employed to Predict Loan Defaults
Altman (1968) employed the Z-score formula model to predict bankruptcy. The model was used to forecast the likelihood that an organization would fall into bankruptcy within a period of two years, and it was also instrumental in predicting corporate defaults. The model uses various income statement and balance sheet data to weigh the financial soundness of a firm: the Z-score is a linear combination of five common financial ratios, weighted by coefficients. The author applied the statistical technique of discriminant analysis to a data set sourced from publicly listed manufacturers. A research study by Alexander (2012) made use of symmetric binary choice models, otherwise referred to as conditional probability models, and sought to establish asymmetric binary choice models subject to extreme value theory to better explain bankruptcy.
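The five-ratio linear combination can be written out with the widely cited coefficients from Altman's 1968 paper (for publicly traded manufacturers); the cutoff values in the docstring are the commonly quoted ones, and the sample figures in the test are invented for illustration:

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_equity, sales, total_assets, total_liabilities):
    """Altman (1968) Z-score: a weighted linear combination of five ratios.
    Z > 2.99 is commonly read as 'safe', Z < 1.81 as 'distress zone'."""
    x1 = working_capital / total_assets      # liquidity
    x2 = retained_earnings / total_assets    # cumulative profitability
    x3 = ebit / total_assets                 # operating efficiency
    x4 = market_equity / total_liabilities   # leverage
    x5 = sales / total_assets                # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5
```

The coefficients are exactly the discriminant weights Altman estimated; later variants (Z' and Z'' for private and non-manufacturing firms) use different weights.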
In their study of probability-of-default models for Russian banks, Anatoly et al. (2014) used binary choice models to predict the likelihood of default, establishing that preliminary expert clustering or mechanical clustering enhances the predictive capacity of the models. Rajan et al. (2010) focused on statistical default models and incentives, postulating that purely statistical models disregard the fact that a change in the incentives of the agents who produce the data may alter the very nature of that data; their study appraised statistical models that naively pool historical figures without modeling the behavior of the forces that generate these data. Goodhart (2011) assessed the likelihood of small businesses defaulting on loans: using data on a business loan portfolio, the study identified the particular lender, loan, and borrower characteristics, as well as changes in the economic environment, that raise the probability of default, and the results form the basis for a scoring model. Focusing on modeling default probability, Singhee & Rutenbar (2010) framed the risk as the uncertainty surrounding an enterprise's capacity to service its obligations and debts.
Using a logistic model to forecast the probability of bank loan default, Adam et al. (2012) employed a data set with demographic information on borrowers and attempted to establish the borrower risk factors attributable to default; the identified risk factors included marital status, gender, occupation, age, and loan duration. Cababrese (2012) employed three accepted data mining algorithms, naïve Bayesian classifiers, artificial neural networks, and decision trees, coupled with a logistic regression model, to formulate a prediction m ...
Carry out a systematic literature review on the application of machine learning in Banking risk management building on the work of Leo et al., 2019 (attached).
· 12-13 pages (excluding title page and reference pages)
· References must not be older than 2 years.
· APA 7.0
Risks | Article
Machine Learning in Banking Risk Management: A Literature Review
Martin Leo * , Suneel Sharma and K. Maddulety
SP Jain School of Global Management, Sydney 2127, Australia; [email protected] (S.S.);
[email protected] (K.M.)
* Correspondence: [email protected]; Tel.: +65-9028-9209
Received: 25 January 2019; Accepted: 27 February 2019; Published: 5 March 2019
Abstract: There is an increasing influence of machine learning in business applications, with many
solutions already implemented and many more being explored. Since the global financial crisis,
risk management in banks has gained more prominence, and there has been a constant focus around
how risks are being detected, measured, reported and managed. Considerable research in academia
and industry has focused on the developments in banking and risk management and the current
and emerging challenges. This paper, through a review of the available literature, seeks to analyse
and evaluate machine-learning techniques that have been researched in the context of banking risk
management, and to identify areas or problems in risk management that have been inadequately
explored and are potential areas for further research. The review has shown that the application of
machine learning in the management of banking risks such as credit risk, market risk, operational
risk and liquidity risk has been explored; however, it does not appear commensurate with the current
industry level of focus on both risk management and machine learning. A large number of areas
remain in bank risk management that could significantly benefit from the study of how machine
learning can be applied to address specific problems.
Keywords: risk management; bank; machine learning; credit scoring; fraud
1. Introduction
Since the global financial crisis, risk management in banks has gained more prominence, and
there has been a constant focus on how risks are being detected, measured, reported and managed.
Considerable research (Van Liebergen 2017; Deloitte University Press 2017; Helbekkmo et al. 2013;
MetricStream 2018; Oliver Wyman 2017), both in academia and indus ...
A potential objective of every financial organization is to retain existing customers and attract new prospective customers for the long term. In manual banking, the economic behaviour of a customer and the nature of the organization are captured by a prescribed form called Know Your Customer (KYC). Depositor customers in some sectors (jewellery/gold, arms, money exchangers, etc.) are high risk; customers in other sectors (transport operators, auto-dealers, religious organizations) are medium risk; and those in the remaining sectors (retail, corporate, service, farming, etc.) are low risk. Presently, counterparty credit risk can be broadly categorized under quantitative and qualitative factors. Although many customer retention and customer attrition systems exist in banks, these methods lack a clear and defined approach to disbursing loans in the business sector. In this paper, we use records of business customers of a retail commercial bank covering the rural and urban areas of Tangail city, Bangladesh, to analyse the major transactional determinants of customers and to build a predictive model for prospective sectors in retail banking. To achieve this, a data mining approach is adopted, in which a pruned decision tree classification technique is used to develop the model, whose performance is then tested against the Weka results. Moreover, this paper attempts to build a model to predict prospective business sectors in retail banking.
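A decision tree learner like the pruned tree mentioned above is built by recursively choosing the split with the highest information gain. A minimal sketch of that core step (our own illustrative code, not the paper's Weka model):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a 0/1 label vector."""
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Choose the (feature, threshold) with the highest information gain,
    the step a decision tree learner repeats recursively on each node."""
    best = (None, None, -1.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = entropy(y) - (len(left) * entropy(left)
                                 + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (j, t, gain)
    return best
```

Pruning, which the paper applies, then removes subtrees whose splits do not improve held-out accuracy, trading training fit for generalization.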
Explainable ensemble technique for enhancing credit risk prediction
Credit risk prediction is a critical task in financial institutions that can impact lending decisions and financial stability. While machine learning (ML) models have shown promise in accurately predicting credit risk, their complexity often makes them difficult to interpret and explain. The paper proposes an explainable ensemble method to improve credit risk prediction while maintaining interpretability. In this study, an ensemble model is built by combining multiple base models that use different ML algorithms. In addition, model interpretation techniques are applied to identify the most important features and visualize the model's decision-making process. Experimental results demonstrate that the proposed explainable ensemble model outperforms the individual base models and achieves high accuracy with low loss. Additionally, the proposed model provides insights into the factors that contribute to credit risk, which can help financial institutions make more informed lending decisions. Overall, the study highlights the potential of explainable ensemble methods in enhancing credit risk prediction and promoting transparency and trust in financial decision-making.
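One common way to realize such a combination of ensembling and interpretability is soft voting plus permutation feature importance; the paper's exact interpretation technique may differ, so treat this as a generic sketch with our own function names:

```python
import numpy as np

def ensemble_proba(models, X):
    """Soft-voting ensemble: average the base models' predicted probabilities.
    Each model is any callable mapping a feature matrix to scores in [0, 1]."""
    return np.mean([m(X) for m in models], axis=0)

def permutation_importance(predict, X, y, rng=0):
    """Importance of feature j = drop in accuracy when column j is shuffled,
    breaking that feature's relationship with the target."""
    rng = np.random.default_rng(rng)
    base = np.mean((predict(X) >= 0.5) == y)
    imps = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imps.append(base - np.mean((predict(Xp) >= 0.5) == y))
    return np.array(imps)
```

Features with large importance values are the "factors that contribute to credit risk" a lender would inspect; features whose shuffling does not hurt accuracy are effectively unused by the model.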
Credit Scoring Using CART Algorithm and Binary Particle Swarm Optimization IJECEIAES
Credit scoring is a procedure that exists in every financial institution. A way to predict whether the debtor was qualified to be given the loan or not and has been a major concern in the overall steps of the loan process. Almost all banks and other financial institutions have their own credit scoring methods. Nowadays, data mining approach has been accepted to be one of the wellknown methods. Certainly, accuracy was also a major issue in this approach. This research proposed a hybrid method using CART algorithm and Binary Particle Swarm Optimization. Performance indicators that are used in this research are classification accuracy, error rate, sensitivity, specificity, and precision. Experimental results based on the public dataset showed that the proposed method accuracy is 78 % and 87.53 %. In compare to several popular algorithms, such as neural network, logistic regression and support vector machine, the proposed method showed an outstanding performance.
Bankruptcy Prediction is an art of predicting bankruptcy and various measures of financial
distress of public or private firms. In recent past days we are seeing many cases with distress
and bankrupted. It is a huge area of finance and accounting research. The importance of the
world is due partially to the relevance for creditors and investors in evaluating the likelihood
that a firm may go bankrupt. The quantity of research is additionally a function of the supply of
data: for public firms which went bankrupt or not, numerous accounting ratios which
may indicate danger can be calculated, and various other potential explanatory variables also
are available. Consequently, the world is well-suited for testing of increasingly sophisticated,
data-intensive forecasting approaches.
https://ijitce.com/index.php
Our journal maintains rigorous peer review standards. Each submitted article undergoes a thorough evaluation by experts in the respective field. This stringent review process helps ensure that only high-quality and scientifically sound research is accepted for publication. Researchers can trust that the articles they find in IJITCE have been critically assessed for validity, significance, and originality.
Synthetic feature generation to improve accuracy in prediction of credit limitsBOHRInternationalJou1
Financial institutions use various data mining algorithms to determine the credit limits for individuals using features
like age, education, employment, gender, income, and marital status. But, there is still a question of accurate
predictability, that is, how accurate can an institution be in predicting risk and granting credit levels. If an institution
grants too low of a credit limit/loan for an individual, then the institution may lose business to competitors, but
if the institution grants too high of a credit limit/loan, then the institution may lose money if that individual does
not repay the credit/loan. The novelty of this work is that it shows how to improve the accuracy in predicting
credit limits/loan amounts using synthetic feature generation. By creating secondary groupings and including
both the original binning and the synthetic bins, the classification accuracy and other statistical measures like
precision and ROC improved substantially. Hence, our research showed that without synthetic feature generation,
the classification rates were low, and the use of synthetic features greatly improved the classification accuracy and
other statistical measures.
Review Parameters Model Building & Interpretation and Model Tunin.docxcarlstromcurtis
Review Parameters: Model Building & Interpretation and Model Tuning
1. Model Building
a. Assessments and Rationale of Various Models Employed to Predict Loan Defaults
The z-score formula model was employed by Altman (1968) while envisaging bankruptcy. The model was utilized to forecast the likelihood that an organization may fall into bankruptcy in a period of two years. In addition, the Z-score model was instrumental in predicting corporate defaults. The model makes use of various organizational income and balance sheet data to weigh the financial soundness of a firm. The Z-score involves a Linear combination of five general financial ratios which are assessed through coefficients. The author employed the statistical technique of discriminant examination of data set sourced from publically listed manufacturers. A research study by Alexander (2012) made use of symmetric binary alternative models, otherwise referred to as conditional probability models. The study sought to establish the asymmetric binary options models subject to the extreme value theory in better explicating bankruptcy.
In their research study on the likelihood of default models examining Russian banks, Anatoly et al. (2014) made use of binary alternative models in predicting the likelihood of default. The study established that preface specialist clustering or mechanical clustering enhances the prediction capacity of the models. Rajan et al. (2010) accentuated the statistical default models as well as inducements. They postulated that purely numerical models disregard the concept that an alteration in the inducements of agents who produce the data may alter the very nature of data. The study attempted to appraise statistical models that unpretentiously pool resources on historical figures devoid of modeling the behavior of driving forces that generates these data. Goodhart (2011) sought to assess the likelihood of small businesses to default on loans. Making use of data on business loan assortment, the study established the particular lender, loan, and borrower characteristics as well as modifications in the economic environments that lead to a rise in the probability of default. The results of the study form the basis for the scoring model. Focusing on modeling default possibility, Singhee & Rutenbar (2010) found the risk as the uncertainty revolving around an enterprise’s capacity to service its obligations and debts.
Using the logistic model to forecast the probability of bank loan defaults, Adam et al. (2012) employed a data set with demographic information on borrowers. The authors attempted to establish the risk factors linked to borrowers are attributable to default. The identified risk factors included marital status, gender, occupation, age, and loan duration. Cababrese (2012) employed three accepted data mining algorithms, naïve Bayesian classifiers, artificial neural network decision trees coupled with a logical regression model to formulate a prediction m ...
Carry out a systematic literature review on the application of macTawnaDelatorrejs
Carry out a systematic literature review on the application of machine learning in Banking risk management building on the work of Leo et al., 2019 (attached).
· 12-13 pages (excluding title page and reference pages)
· References must not be older than 2 years.
· APA 7.0
Carry out a
systematic
literature review on the application of machine learning in
Banking risk
management
building on the work of Leo et al., 2019 (attached).
·
1
2
-
1
3
pages (excluding title page and reference pages)
·
References must not be older t
han 2 years.
·
APA 7.0
Carry out a systematic literature review on the application of machine learning in
Banking risk management building on the work of Leo et al., 2019 (attached).
12-13 pages (excluding title page and reference pages)
References must not be older than 2 years.
APA 7.0
risks
Article
Machine Learning in Banking Risk Management:
A Literature Review
Martin Leo * , Suneel Sharma and K. Maddulety
SP Jain School of Global Management, Sydney 2127, Australia; [email protected] (S.S.);
[email protected] (K.M.)
* Correspondence: [email protected]; Tel.: +65-9028-9209
Received: 25 January 2019; Accepted: 27 February 2019; Published: 5 March 2019
����������
�������
Abstract: There is an increasing influence of machine learning in business applications, with many
solutions already implemented and many more being explored. Since the global financial crisis,
risk management in banks has gained more prominence, and there has been a constant focus around
how risks are being detected, measured, reported and managed. Considerable research in academia
and industry has focused on the developments in banking and risk management and the current
and emerging challenges. This paper, through a review of the available literature seeks to analyse
and evaluate machine-learning techniques that have been researched in the context of banking risk
management, and to identify areas or problems in risk management that have been inadequately
explored and are potential areas for further research. The review has shown that the application of
machine learning in the management of banking risks such as credit risk, market risk, operational
risk and liquidity risk has been explored; however, the work does not appear commensurate with the current
industry level of focus on both risk management and machine learning. A large number of areas
remain in bank risk management that could significantly benefit from the study of how machine
learning can be applied to address specific problems.
Keywords: risk management; bank; machine learning; credit scoring; fraud
1. Introduction
Since the global financial crisis, risk management in banks has gained more prominence, and
there has been a constant focus on how risks are being detected, measured, reported and managed.
Considerable research (Van Liebergen 2017; Deloitte University Press 2017; Helbekkmo et al. 2013;
MetricStream 2018; Oliver Wyman 2017), both in academia and indus ...
A potential objective of every financial organization is to retain existing customers and attract new prospective customers for the long term. The economic behaviour of a customer and the nature of the organization are recorded through a prescribed form called Know Your Customer (KYC) in manual banking. Depositor customers in some sectors (jewellery/gold businesses, arms, money exchangers, etc.) carry high risk; customers in other sectors (transport operators, auto-dealers, religious organizations) carry medium risk; and those in the remaining sectors (retail, corporate, service, farmers, etc.) carry low risk. Presently, counterparty credit risk can be broadly categorized under quantitative and qualitative factors. Although many systems exist in banks for customer retention as well as customer attrition, these methods lack a clear and well-defined approach for disbursing loans in the business sector. In this paper, we have used records of business customers of a retail commercial bank covering the rural and urban areas of Tangail city, Bangladesh, to analyse the major transactional determinants of customers and to build a predictive model for prospective sectors in retail banking. To achieve this, a data mining approach is adopted for analysing the challenging issues, where a pruned decision tree classification technique has been used to develop the model, whose performance was finally tested against the Weka results. Moreover, this paper attempts to build a model to predict prospective business sectors in retail banking.
Explainable ensemble technique for enhancing credit risk predictionIAESIJAI
Credit risk prediction is a critical task in financial institutions that can impact lending decisions and financial stability. While machine learning (ML) models have shown promise in accurately predicting credit risk, the complexity of these models often makes them difficult to interpret and explain. The paper proposes an explainable ensemble method to improve credit risk prediction while maintaining interpretability. In this study, an ensemble model is built by combining multiple base models that use different ML algorithms. In addition, model interpretation techniques are applied to identify the most important features and visualize the model's decision-making process. Experimental results demonstrate that the proposed explainable ensemble model outperforms individual base models and achieves high accuracy with low loss. Additionally, the proposed model provides insights into the factors that contribute to credit risk, which can help financial institutions make more informed lending decisions. Overall, the study highlights the potential of explainable ensemble methods in enhancing credit risk prediction and promoting transparency and trust in financial decision-making.
158 K. Masmoudi, L. Abid and A. Masmoudi / Expert Systems With Applications 127 (2019) 157–166
The remainder of this paper is structured as follows: Section 2 details the previous studies related to the topic. Section 3 describes the discrete BN with a latent variable and the proposed procedure for learning this class of BNs using a customized EM algorithm. The proposed method is applied in the context of loan classification and credit risk evaluation in Section 4. Finally, our main conclusions are drawn in the ultimate section.
2. Related work
Credit risk and bankruptcy prediction have been extensively studied over recent years. Various models and techniques were employed in these studies in the context of risk evaluation and debtors classification. For instance, Danenas and Garsva (2015) introduced a new approach based on linear SVM combined with external evaluation and sliding window testing. Their method addresses the imbalanced classes issue and is suitable for large data sets. They showed that their method provides results equivalent to the logistic regression and RBF network. In the same sphere of predicting financial distress, Geng, Bose, and Chen (2015) used different types of data mining techniques, such as neural networks, decision trees and SVM, to build an early warning system that serves to forecast financial distress of Chinese companies. The obtained results show that the neural network is more accurate than the other considered classifiers. Alternatively, Bemš, Starỳ, Macaš, Žegklitz, and Pošík (2015) used an innovative approach based on a modified magic square in order to evaluate corporations' economic performance and detect potential companies that might be subject to default. One of the strengths of this method is the graphical interpretation of a company score. Departing from other techniques, such as discriminant analysis, logistic regression, neural networks and survival analysis, Du Jardin and Séverin (2011) proposed a new approach to evaluate a company's financial health based on a Kohonen map. This approach improves the financial failure prediction accuracy.
On the other hand, Leong (2016) compared the BN to other classification methods in the context of credit scoring. He pointed out the ability of the BNs to tackle both the imbalanced classes and data censoring issues. He also showed that the BN performs efficiently when implemented on a large data set compared to the logistic regression, SVM, decision trees and the Multi-Layer Perceptron (MLP). Moreover, Sanford and Moosa (2015) used the BN in the context of operational risk evaluation. They described a BN-based tool developed within one of the major Australian banks to predict operational risk events, aggregate operational loss distributions, and Operational Value-at-Risk. Sousa, Gama, and Brandão (2016) used a new dynamic modeling framework for credit risk assessment consisting of sequential learning from the new incoming data. This technique, applied on a Brazilian credit cards data set, outperformed the static modeling methods. Another interesting technique for assessing credit risk, called the hybrid associative memory, was proposed by Cleofas-Sánchez, García, Marqués, and Sánchez (2016). This method was compared to several popular machine learning techniques using nine real-life financial databases. The obtained results suggest that this approach can be appropriate for predicting financial distress, especially in the case of data sets where the classes are strongly imbalanced and/or overlapped. In the same context, Mselmi, Lahiani, and Hamza (2017) adopted five different techniques to predict the financial distress of small and medium French firms. Their study compared the Logit, ANN, SVM, Partial Least Squares models and a hybrid model integrating SVM with Partial Least Squares. They showed that while the SVM is the best classifier for one year prior to financial distress, the hybrid model outperforms all the others for a two-year period prior to financial distress. Using credit default swaps data sets, Luo, Wu, and Wu (2017) used a deep learning technique in order to examine the performances of credit scoring models. When compared to traditional techniques such as logistic regression, MLP and SVM, deep belief networks were found to give the best performance. Tavana, Abtahi, Di Caprio, and Poortarigh (2018) used two of the most recent machine learning techniques, namely ANNs and BNs, in order to evaluate financial problems related to key factors of liquidity risk. Relying on real-case numerical experiments, the authors came to the conclusion that the two techniques are rather complementary in revealing the liquidity risk.
3. Discrete Bayesian network with a latent variable
A BN consists of a directed acyclic graph (DAG) and a set of associated conditional probability distributions. The DAG reflects a set of conditional independence relationships between a set of variables (nodes). A finite discrete BN $B = (G, P)$ is a BN whose nodes are discrete random variables $(X_1, \ldots, X_d)$ taking a finite number of values. In this paper, the following notations were used:
• $d$ denotes the number of variables in the BN.
• Each node $X_i$ takes $r_i$ possible values encoded as $1, 2, \ldots, r_i$.
• The parents of each node $X_i$ in $G$, denoted by $Pa(X_i)$, have $q_i = \prod_{m \in \Lambda_i} r_m$ possible configurations encoded as $1, 2, \ldots, q_i$, where $\Lambda_i = \{m;\ X_m \in Pa(X_i)\}$.
• The conditional distribution of $X_i \mid Pa(X_i)$ is defined by the matrix of probabilities $\theta_i = (\theta_{ijk})_{1 \le j \le q_i,\ 1 \le k \le r_i}$, where $\theta_{ijk} = P(X_i = k \mid Pa(X_i) = j)$.
The set of all the parameters is denoted by $\Theta = (\theta_i)_{1 \le i \le d}$ and the discrete BN model is depicted by $B(G, \Theta)$. For each node $X_i$, $\theta_{ij} = (\theta_{ij1}, \ldots, \theta_{ijr_i})$ is the vector of conditional probabilities defining the distribution of $(X_i \mid Pa(X_i) = j)$ and satisfying $\sum_{k=1}^{r_i} \theta_{ijk} = 1$.
The joint probability distribution of $(X_1, \ldots, X_d)$ is given by
$$P(X_1, X_2, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid Pa(X_i)). \quad (1)$$
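To make the factorization in (1) concrete, the following minimal sketch (not taken from the paper; the two-node structure, names and CPT values are made-up illustrations) evaluates the joint probability of a discrete BN from its conditional probability tables:

```python
# Sketch: evaluating the factorization in Eq. (1) for a toy discrete BN.
# CPTs map a parent configuration (a tuple) to a distribution over values.

# Toy structure: X1 -> X2 (X1 has no parents); values are encoded 1..r_i.
cpts = {
    "X1": {(): {1: 0.6, 2: 0.4}},            # P(X1)
    "X2": {(1,): {1: 0.9, 2: 0.1},           # P(X2 | X1=1)
           (2,): {1: 0.3, 2: 0.7}},          # P(X2 | X1=2)
}
parents = {"X1": (), "X2": ("X1",)}

def joint_probability(assignment, cpts, parents):
    """P(X1=x1,...,Xd=xd) = prod_i P(Xi = xi | Pa(Xi))."""
    p = 1.0
    for node, table in cpts.items():
        pa_config = tuple(assignment[pa] for pa in parents[node])
        p *= table[pa_config][assignment[node]]
    return p

print(round(joint_probability({"X1": 1, "X2": 2}, cpts, parents), 6))  # 0.6 * 0.1 -> 0.06
```

Because each CPT row sums to one, the joint probabilities over all assignments also sum to one, as required of a valid distribution.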
Including latent variables in a BN is generally useful to model hidden effects or causes, which leads to simpler relationships between the nodes (Kwoh & Gillies, 1996). In other situations, the latent variable is a classifying or ranking one (Kim & Jun, 2013). In this case, we have a BN with a fixed DAG combined with a mixture distribution. The classifying variable, therefore, is used to manage the memberships to the different classes. In what follows, a detailed description of the proposed model is provided and an EM-based algorithm is proposed to learn the model parameters.
3.1. Model description
Consider a classifying latent variable $C$ taking values in $\{1, \ldots, L\}$ and $d$ discrete random variables $X_1, X_2, \ldots, X_d$ whose conditional dependencies are encoded by a DAG $G = (X, E)$. The BN with the latent variable $C$ is defined as follows:
$$\text{For each } 1 \le l \le L, \quad X \mid C = l \sim B(G, \Theta_l) \quad (2)$$
Let $\pi_1, \ldots, \pi_L$ be the prior probabilities of each class, i.e., $\pi_l = P(C = l)$, and let $\mathcal{L}(G, \Theta_l)$ be the joint probability distribution induced by the discrete BN $B(G, \Theta_l)$. Then, the model can be rewritten as a mixture model:
$$X \sim \sum_{l=1}^{L} \pi_l \, \mathcal{L}(G, \Theta_l) \quad (3)$$
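As an illustration of the mixture in (3), the sketch below (priors, class CPTs and the single-node restriction are made-up assumptions, not the authors' code) evaluates $P(X = x) = \sum_l \pi_l P(X = x \mid \Theta_l)$ for $L = 2$ classes:

```python
# Sketch: the mixture of Eq. (3) over L = 2 classes. Each class l carries its
# own parameter set Theta_l (its own CPTs for the same DAG); a single binary
# node keeps the example minimal.

priors = [0.7, 0.3]                       # pi_1, pi_2
class_cpts = [
    {1: 0.9, 2: 0.1},                     # P(X = k | Theta_1)
    {1: 0.2, 2: 0.8},                     # P(X = k | Theta_2)
]

def mixture_probability(x):
    """P(X = x) = sum_l pi_l * P(X = x | Theta_l)."""
    return sum(pi * cpt[x] for pi, cpt in zip(priors, class_cpts))

print(round(mixture_probability(1), 2))   # 0.7*0.9 + 0.3*0.2 -> 0.69
```

The mixture is itself a valid distribution: since the priors and each class distribution sum to one, so does the mixture over all values of $x$.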
Fig. 1. An example of a BN with a classifying latent variable.
An example involving five observable variables is reported in Fig. 1. The number of classes L is assumed to be known. This information can be either provided by domain experts or determined through an advanced data analysis involving a model selection procedure (Njah, Jamoussi, Mahdi, & Masmoudi, 2015).
This kind of BN offers several advantages. First, it provides a flexible tool to fit any probability distribution, since it relies on a mixture of different distributions. Second, this model is naturally appropriate to deal with data drawn from a population which consists of L groups. Finally, after calibrating the model from data, the main characteristics of each group can be easily extracted. These advantages are illustrated in Section 4 in the context of credit clients classification.
3.2. Model learning and inference
The main challenge with the BNs lies in the use of real data sets when calibrating the model by identifying the most appropriate DAG and the most likely parameters. Indeed, the model estimation consists of two parts: structure learning and parameters learning. The first task can be achieved using score-based (Korb & Nicholson, 2010) or constraint-based (Spirtes, Glymour, Scheines et al., 2001) algorithms. For parameters learning, we propose a customized version of the EM algorithm assuming a fixed structure.
Consider a random sample $D = (X^{(1)}, X^{(2)}, \ldots, X^{(M)})$ drawn from (3). The data set contains only the observable variables $X_1, \ldots, X_d$. Here, the model is by construction formulated as an incomplete-data problem, since the classifying variable $C$ is unobservable. Hence, it would be adequate to use the EM algorithm in order to estimate the model parameters. Thus, we introduce the component label vectors $(Z^{(1)}, Z^{(2)}, \ldots, Z^{(M)})$ associated to the instances of $X$ in $D$, where $Z^{(m)} = (Z_{m1}, \ldots, Z_{mL})$ is an $L$-dimensional vector such that $Z_{ml} = 1$ if $X^{(m)}$ belongs to the $l$-th class and $Z_{ml} = 0$ otherwise. The sample's complete likelihood is given by:
$$l_c(X^{(1)}, \ldots, X^{(M)}, Z^{(1)}, \ldots, Z^{(M)} \mid \Theta = (\Theta_l)_{1 \le l \le L}, G) = \prod_{m=1}^{M} \prod_{l=1}^{L} \left[ \pi_l \, P(X = X^{(m)} \mid \Theta_l) \right]^{Z_{ml}}$$
where $P(X = X^{(m)} \mid \Theta_l) = \prod_{i=1}^{d} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{lijk}^{\delta^{(m)}_{ijk}}$ and
$$\delta^{(m)}_{ijk} = \begin{cases} 1 & \text{if } X^{(m)}_i = k \text{ and } X^{(m)}_{Pa(i)} = j \\ 0 & \text{otherwise.} \end{cases}$$
The log-likelihood function is given by:
$$L_c = \log(l_c) = \sum_{m=1}^{M} \sum_{l=1}^{L} \left[ Z_{ml} \log(\pi_l) + \sum_{i=1}^{d} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} Z_{ml} \, \delta^{(m)}_{ijk} \log(\theta_{lijk}) \right] \quad (4)$$
The proposed algorithm for parameters estimation is as follows:
• Initialization: Choose the initial parameters $\Theta^{(0)}$.
• E-Step: The latent variable $C$ is handled, in the $t$-th iteration, by computing the conditional expectation of the complete-data log-likelihood given the observed data $D$, using the current fit $\Theta^{(t)}$:
$$Q(\Theta \mid \Theta^{(t)}) = E(L_c \mid \Theta^{(t)}, D) \quad (5)$$
$$= \sum_{m=1}^{M} \sum_{l=1}^{L} \left[ \tau^{(t)}_{ml} \log(\pi_l) + \tau^{(t)}_{ml} \sum_{i=1}^{d} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} \delta^{(m)}_{ijk} \log(\theta_{lijk}) \right]$$
where $\tau^{(t)}_{ml} = E(Z_{ml} \mid D, \Theta^{(t)})$ is the current expectation of $Z_{ml}$ given the observed data $D$. It can be viewed as the posterior probability that $X^{(m)}$ belongs to the $l$-th class:
$$\tau^{(t)}_{ml} = \frac{\pi^{(t)}_l \, P(X = X^{(m)} \mid \Theta^{(t)}_l)}{\sum_{k=1}^{L} \pi^{(t)}_k \, P(X = X^{(m)} \mid \Theta^{(t)}_k)} \quad (6)$$
• M-Step: The M-step, at the $(t+1)$-th iteration, requires the global maximization of $Q(\Theta \mid \Theta^{(t)})$ with respect to $\Theta$ over the parameters space to give the updated estimate:
$$\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)}) \quad (7)$$
The parameters update that maximizes $Q(\Theta \mid \Theta^{(t)})$ is given by:
$$\pi^{(t+1)}_l = \frac{1}{M} \sum_{m=1}^{M} \tau^{(t)}_{ml} \quad (8)$$
and
$$\theta^{(t+1)}_{lijk} = \frac{\sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ijk}}{\sum_{k'=1}^{r_i} \sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ijk'}} \quad (9)$$
As within any EM-based algorithm, these steps are alternated until convergence. The data is then clustered using the posterior probabilities $\tau_{ml}$.
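A compact sketch of these updates follows. It is not the authors' implementation: the data, starting values and the single-node restriction ($d = 1$, no parents, so $q_1 = 1$) are assumptions chosen so that the model reduces to a mixture of $L$ categorical distributions, the simplest instance of updates (6), (8) and (9):

```python
import random

# EM sketch for a mixture of L categorical distributions (d = 1, no parents).
random.seed(0)
L, r = 2, 3                                     # classes; values of X encoded 1..r
data = [random.randint(1, r) for _ in range(500)]
M = len(data)

pi = [0.5, 0.5]                                 # class priors pi_l
theta = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]      # theta[l][k-1] = P(X=k | class l)

for _ in range(50):
    # E-step, Eq. (6): responsibilities tau[m][l] proportional to pi_l * P(X^(m) | class l).
    tau = []
    for x in data:
        w = [pi[l] * theta[l][x - 1] for l in range(L)]
        s = sum(w)
        tau.append([wl / s for wl in w])
    # M-step, Eq. (8): pi_l = (1/M) * sum_m tau_ml.
    pi = [sum(t[l] for t in tau) / M for l in range(L)]
    # M-step, Eq. (9): theta_lk = weighted frequency of value k within class l.
    for l in range(L):
        counts = [sum(t[l] for t, x in zip(tau, data) if x == k) for k in range(1, r + 1)]
        total = sum(counts)
        theta[l] = [c / total for c in counts]

# Cluster each observation by its posterior responsibility, as in the paper.
labels = [max(range(L), key=lambda l: t[l]) for t in tau]
```

In the full model, the E-step likelihood `theta[l][x - 1]` would be replaced by the BN factorization $P(X = X^{(m)} \mid \Theta_l)$ over all nodes, and update (9) would be applied per node, parent configuration and class.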
Proof. In order to maximize $Q$, we simply search the zero of its first derivative with respect to $\theta_{lijk}$. Taking into account the constraint $\sum_{k=1}^{r_i} \theta_{lijk} = 1$, we rewrite $Q(\Theta \mid \Theta^{(t)})$ as follows:
$$\sum_{m=1}^{M} \sum_{l=1}^{L} \tau^{(t)}_{ml} \log(\pi_l) + \tau^{(t)}_{ml} \sum_{i=1}^{d} \sum_{j=1}^{q_i} \left[ \delta^{(m)}_{ij1} \log\!\left(1 - \sum_{k=2}^{r_i} \theta_{lijk}\right) + \sum_{k=2}^{r_i} \delta^{(m)}_{ijk} \log(\theta_{lijk}) \right] \quad (10)$$
The first derivative of $Q$ with respect to $\theta_{lijk}$ is given by:
$$\frac{\partial Q}{\partial \theta_{lijk}} = \sum_{m=1}^{M} \tau^{(t)}_{ml} \left[ \frac{\delta^{(m)}_{ijk}}{\theta_{lijk}} - \frac{\delta^{(m)}_{ij1}}{1 - \sum_{k=2}^{r_i} \theta_{lijk}} \right] \quad (11)$$
Hence,
$$\frac{\partial Q}{\partial \theta_{lijk}} = 0 \iff \theta_{lijk} \sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ij1} = \theta_{lij1} \sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ijk} \quad (12)$$
Table 1
Descriptive statistics of the continuous variables: Amount, Duration, MRB and Age.
          Min.     1st Qu.  Median    Mean      3rd Qu.   Max.
Amount    3001.00  8000.00  13232.00  17200.96  20000.00  149958.00
Duration  1.00     4.84     6.84      7.48      7.07      28.43
MRB       51.00    157.00   219.00    261.24    312.00    15550.00
Age       18.07    34.26    43.33     43.37     51.60     75.98
Finally, summing over $k$ and using the fact that $\sum_{k=1}^{r_i} \theta_{lijk} = 1$, we get:
$$\sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ij1} = \theta_{lij1} \sum_{k=1}^{r_i} \sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ijk}. \quad (13)$$
Thus, the maximum is achieved for
$$\theta_{lij1} = \frac{\sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ij1}}{\sum_{k=1}^{r_i} \sum_{m=1}^{M} \tau^{(t)}_{ml} \, \delta^{(m)}_{ijk}}.$$
Similarly, this formula generalizes for any $k = 2, \ldots, r_i$.
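The closed-form update can also be checked numerically: for one fixed $(l, i, j)$, the $\theta$-part of $Q$ is $\sum_k c_k \log(\theta_{lijk})$ with weighted counts $c_k = \sum_m \tau^{(t)}_{ml} \delta^{(m)}_{ijk}$, and the maximizer on the simplex is $c_k / \sum_{k'} c_{k'}$. A small sanity check with made-up counts:

```python
import math

# Illustrative check that the Eq. (9)-style optimum maximizes the theta-part of Q
# for one fixed (l, i, j) with r_i = 3; c_k are made-up weighted counts.
c = [4.0, 1.0, 5.0]
opt = [ck / sum(c) for ck in c]               # normalized counts: c_k / sum_k c_k

def q_part(theta):
    """sum_k c_k * log(theta_k), the theta-dependent part of Q."""
    return sum(ck * math.log(tk) for ck, tk in zip(c, theta))

# Any other distribution on the simplex scores strictly lower (Gibbs' inequality).
for other in ([0.3, 0.3, 0.4], [0.5, 0.2, 0.3], [1 / 3, 1 / 3, 1 / 3]):
    assert q_part(opt) > q_part(other)
```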
The proposed procedure for learning the discrete BN with a latent variable can be summarized as follows:
(i) Use the incomplete data set to learn the structure of the discrete BN involving the observable variables $(X_1, \ldots, X_d)$.
(ii) Choose the appropriate number of classes L.
(iii) Learn the model's parameters and cluster the data using the proposed EM algorithm.
(iv) Use the conditional probability tables in order to determine the main characteristics of the obtained classes.
4. Application: loans subscriber classification
In this section, the proposed BN, described in Section 3, is used to model the credit risk and cluster the loan subscribers. This model evaluates the credit default probability taking into account several explanatory variables and a classifying latent variable C. The resulting model presents a default probability distribution with several regimes or classes. These regimes correspond, for example, to different market conditions (depending on the economic and political environment). They can also indicate various client subclasses. The built BN model is useful as a decision aid system to deal with new loan requests.
4.1. Data
A data set provided by the Tunisian Central Bank was used to illustrate the proposed model. This data set contains 9 variables describing loan contracts granted by several Tunisian banks during the period 1990–2012. Among these variables, three provide personal information about the loan subscribers (Age, Job and Gender). The remaining variables specify the contract characteristics (Type, Amount, Duration, Monthly Repayment Burden MRB, Default) and whether the credit provider is a public bank (isPublic). The Default variable indicates whether the credit was duly paid (Default=0) or not (Default=1): a customer is considered as a defaulter when a payment delay of more than 90 days is observed.
The data set includes 4 continuous variables whose statistical summary and histograms are reported in Table 1 and Fig. 2, respectively.
Fig. 2. Histograms of the continuous variables: Amount, Duration, MRB and Age.
Table 2
Discretized variables of the credit data set.
Variable  Brief description                   Values (Frequency)
Default   Default payment                     0: Non-default (84%)  1: Default (16%)
isPublic  Is a state-owned bank?              0: No (87%)  1: Yes (13%)
Type      Credit type                         1: Consumer credit (23%)  2: Housing credit (77%)
Amount    Amount of credit (kTND)             1: [13.6–30] (38%)  2: [3–13.6] (18%)  3: [30–500] (44%)
MRB       Monthly repayment burden of credit  1: [0–261] (26%)  2: [261–568] (21%)  3: [568–2000] (53%)
Duration  Credit duration (years)             1: [0–5.1] (52%)  2: [5.1–15.1] (24%)  3: [15.1–30] (24%)
Gender    Male or Female                      1: Women (39%)  2: Men (61%)
Age       Age in years                        1: [0–35] (25%)  2: [35–55] (65%)  3: [55–99] (10%)
Job       Job of households                   1: Jobless (18%)  2: Middle Executive (26%)  3: Retired (14%)  4: Senior Executive (14%)  5: Student (14%)  6: Worker (14%)
In order to use the proposed model, these 4 variables are discretized taking the values indicated in Table 2. This table describes the 9 discrete variables included in the final data set: the discrete values and their empirical frequencies are reported in Table 2. It is worth noting that the bad debtors' proportion is 16%. The considered data set, which contains n = 106298 instances, was split into a training set composed of 79724 observations and a testing set of size 26574.
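This preprocessing can be sketched as follows. The sketch is illustrative only: the data are synthetic, the cut points mimic the Amount bins of Table 2 but the codes here are simply ordered, and the 75/25 split matches the 79724/26574 partition ratio:

```python
import bisect
import random

# Sketch (not the authors' code): discretize a continuous variable into coded
# bins and split the records into training (75%) and testing (25%) sets.
random.seed(1)
amounts = [random.uniform(3, 500) for _ in range(1000)]   # synthetic Amount (kTND)

def discretize(value, cut_points):
    """Return the 1-based bin index of `value` given ordered cut points."""
    return bisect.bisect_right(cut_points, value) + 1

# Bins here: 1 -> [3, 13.6), 2 -> [13.6, 30), 3 -> [30, 500]
coded = [discretize(a, [13.6, 30.0]) for a in amounts]

records = list(zip(amounts, coded))
random.shuffle(records)
split = int(0.75 * len(records))
train, test = records[:split], records[split:]
```

In practice each of the four continuous variables (Amount, Duration, MRB, Age) would be passed through `discretize` with its own cut points before the split.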
4.2. Model calibration
Using the above-described data set, a BN was built with a latent variable consisting of two classes. The learning procedure described in Section 3.2 was applied. First, prior expert knowledge was combined with the Hill-climbing algorithm (Chow & Liu, 1968) to obtain the DAG reported in Fig. 3. Second, relying on the proposed EM algorithm, we estimated the model parameters. In particular, we learned the conditional probability tables within each class.
Fig. 3. The DAG structure from credit data.
Fig. 4. Marginal distributions within class 1.
4.3. Inference and analysis
The DAG reported in Fig. 3 provides the relationships between the studied variables through the analysis of the active trails. Specifically, the Default variable has a direct connection with the Age and Duration variables. The Type and Amount variables are both connected to the Default variable through different trails, e.g., the Amount variable is connected to the Default variable through the Job variable (once Job is instantiated, the connection via this trail is broken), and the Type variable is connected to the Default variable through the Duration variable. Other possible trails exist.
Relying on the learned conditional probability tables, the joint probability distribution over the DAG nodes is a mixture of two BNs corresponding to two classes. In fact, the loan contracts are classified into 2 clusters having different properties and probability distributions. In order to highlight the characteristics of each class, we first compared the marginal distributions of each node; then, we performed several inference tasks using various scenarios. Specifically, we used a step-by-step instantiation technique to track the default probability evolution within each class.
The marginal distributions for the two classes are displayed in Figs. 4 and 5, respectively. These figures show, in particular, that the probability of default in class 1 is 23.15% whereas it is 17.26% in class 2. For both classes, the marginal distributions show that the majority of borrowers are middle executives and got credit from private banks with a duration not exceeding 5 years [0–5 years]. For class 2, the majority of borrowers received housing credits (99.7%) whereas, in class 1, only 2/3 of borrowers obtained housing credits. Considering the Age variable, the majority of the borrowers (83.60%) are within the age interval 35–55 years old for class 2. As for class 1, only 53.76% of the borrowers belong to the above-mentioned age group. It is worth noting that the highest proportion of debtors in class 2 got credits ranging between 30 kTND and 500 kTND, whereas the majority of households in class 1 obtained loans ranging between 13.6 kTND and 30 kTND.
We notice that the default probability is higher for housing credits compared to consumer credits. The riskiest profile corresponds to jobless men below the age of 35 years having subscribed for a loan from a public bank.
A deeper analysis of both classes' probability distributions was performed relying on a step-by-step technique with two scenarios. This method consists in instantiating a variable at each single step and tracking the credit default probability. For scenario 1, the adopted steps are arranged by instantiating the five variables
Fig. 5. Marginal distributions within class 2.
Fig. 6. Step by step inference to evaluate payment default probability in scenario 1.
as follows: Step 1, the type of bank is set to either Private or Public; Step 2, the Gender variable is set to either Man or Woman; Step 3, the Duration variable is fixed to either [0–5 years], [5–15 years] or [15–30 years]; and finally, in steps 4, 5 and 6, the Age is set to [0–35], [35–55] and [55–99], respectively. For the second scenario, the first step is replaced by the credit type instantiation and the remaining steps are the same.
For the two classes, the default probability evolution at each step is shown in Fig. 6. This figure shows that the default probability is higher for state-owned banks than for private banks (35.07% for class 1 and 31.77% for class 2). This result seems logical since the state-owned banks are less restrictive when granting loans. The Gender variable has little significant effect on payment default occurrence. However, the Duration variable is closely related to the default occurrence. The probability of defaulting increases when the Duration variable is instantiated in state [0–5 years] and decreases for the remaining states ([5–15 years] and [15–30 years]). Moreover, the riskiest clients are under 35 years old. For the remaining age categories, the probability of default shows a sharp downward trend.
Fig. 7 confirms that the highest probability of default is obtained for housing credit. This probability of default is higher in class 1 (24.97%). This situation has been accentuated in recent years, as stated in the 2016 annual report of the Tunisian Central Bank. According to this report, the amount of unpaid housing loans continued to grow from 356 MTD in 2014 to 437 MTD in 2016, whereas the amount of unpaid consumer credits increased from 141 MTD to 168 MTD during the same period.
For both scenarios, we notice that the probability of default is higher in class 1 than in class 2. Another difference between the two classes is the effect of the Duration and Age variables on the probability of default. In fact, this probability is significantly lower for class 2 when the Duration variable is fixed to [5–15] years. It remains low when the Age variable is instantiated for class 2 but clearly increases in class 1 for young loan subscribers (less than 35 years old).
The calibrated model allows handling a default probability distribution with several regimes or classes taking into account various interdependent factors. In fact, using the posterior probability defined in (6), we clustered the loan contracts. The obtained classification is summarized in Fig. 8, which displays the frequency of each class by year. This figure shows, in particular, a shift in the regime occurring in the year 2010: before 2010, the majority of contracts belonged to the second class whereas after this year the
Fig. 7. Step by step inference to evaluate payment default probability in scenario 2.
Fig. 8. Contract proportions per year for each class based on the EM algorithm clustering.
Table 3
Performance comparison of the four techniques.
Method Accuracy Precision Recall F1
Discrete BN 93.53% 83.02% 74.87% 78.73%
Decision tree 87.39% 61.51% 65.12% 63.26%
Radial SVM 93.61% 83.25% 75.18% 79.01%
BN with latent variable 94.77% 85.31% 81.31% 83.26%
bigger part of the contracts belonged to class 1. This result is consistent with the economic and political events that happened during the period around 2010. First, the global crisis had an important impact on nonperforming loans. This period also witnessed political instability and various social conflicts in Tunisia.
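The clustering step used here, in which each contract is assigned to the latent class with the highest posterior probability (the Bayes criterion), can be sketched as follows. The priors and per-contract likelihoods below are illustrative placeholders, not the paper's EM-calibrated values, which would come from the fitted Bayesian network:

```python
import numpy as np

# Illustrative class priors P(C = k) and per-contract likelihoods
# P(evidence | C = k); in the paper these come from the EM-calibrated
# Bayesian network, here they are made-up placeholders.
priors = np.array([0.4, 0.6])          # two latent classes
likelihoods = np.array([
    [0.02, 0.30],   # contract 1: P(evidence | class 1), P(evidence | class 2)
    [0.25, 0.05],   # contract 2
    [0.10, 0.12],   # contract 3
])

# Posterior P(C = k | evidence) is proportional to P(C = k) * P(evidence | C = k)
unnormalized = likelihoods * priors
posteriors = unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Bayes criterion: assign each contract to its most probable class
labels = posteriors.argmax(axis=1) + 1  # 1-based class labels
print(labels)  # -> [2 1 2]
```

Aggregating such labels by subscription year is what produces a figure like Fig. 8.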
4.4. Classification and comparison
In this subsection, the calibrated BN with a latent variable is used to cluster the customers through the probability of default. Its performance is compared, in terms of accuracy, precision, recall and F1 score, to three other popular classifiers, namely the decision tree, the discrete BN and the radial SVM. The obtained results are reported in Table 3. The displayed evaluation metrics are defined as follows:
• Accuracy is the proportion of correctly classified items.
• Precision is the fraction of positive predictions that are correct.
• Recall is the fraction of all positive (default) instances that the classifier correctly identifies as positive. It is also known as the true positive rate.
• F1 score is calculated according to the following formula:
F1 = 2 · (Precision · Recall) / (Precision + Recall)
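The four metrics can be computed directly from confusion-matrix counts. The counts below are illustrative only, not those underlying Table 3:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts (not the paper's data): 150 defaults correctly
# flagged, 30 false alarms, 50 missed defaults, 770 correct non-defaults.
acc, prec, rec, f1 = classification_metrics(tp=150, fp=30, fn=50, tn=770)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%} F1={f1:.2%}")
# -> accuracy=92.00% precision=83.33% recall=75.00% F1=78.95%
```

Note how accuracy alone can look strong on an imbalanced loan data set, which is why precision, recall and F1 are reported alongside it in Table 3.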
The reported results show that, for the considered data set, the
discrete BN and SVM techniques have comparable performances
and outperform the decision tree approach. Moreover, the addition
of the latent variable in the BN improves the classification results.
5. Discussion and conclusion
In this paper, we described the BN with a latent variable and proposed a procedure for its calibration. This model was used to evaluate the payment default probability of loan subscribers. The calibrated model takes into account several risk factors, including clients' attributes (age, job, ...) and contract characteristics (amount, duration, ...). It also describes the relationships between these factors, relying on their probabilistic conditional dependencies. Finally, the loan contracts can be classified using the Bayes criterion.
The obtained BN involved two classes that show both similarities and differences in terms of probability distributions. For both classes, the gender variable seems to have little significant effect on the probability of payment default. The second similarity lies in the fact that payment default is more likely to occur when the loan subscriber is under 35. Finally, housing loans have a higher probability of default than other loan types. However, the payment default rate is higher in class 1, and the impact of the age and duration variables differs between the two classes. Focusing on the relationship established between the loan classification and the subscription dates revealed a change in the probability distribution after 2010. This shift is tightly connected to the economic and political environment: growth stagnation, inflation and rising interest rates.
Beyond these conclusions, the BN with a latent variable proved useful as a decision-making tool for loan attribution and client selection. Such computational models may inform a more adequate management of nonperforming loans.
The proposed network can be further developed to include variables other than those discussed in this study. It would also be interesting to apply the proposed model in other contexts, relying on other data sets.
CRediT authorship contribution statement
Khalil Masmoudi: Software, Conceptualization, Methodology, Data curation, Writing - review & editing. Lobna Abid: Conceptualization, Visualization, Investigation, Resources, Formal analysis. Afif Masmoudi: Conceptualization, Methodology, Supervision, Validation.