Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Bayesian statistics by Sagar Kamble 864 views
- Probabilistic modeling in deep lear... by Denis Dus 2227 views
- Cost-Aware Virtual Machine Placemen... by Soodeh Farokhi 640 views
- Module 5 Bayesian belief network mo... by Think2Impact 1369 views
- Bayes Belief Network by Hendri Karisma 2959 views
- Bayesian Network by Tommy96 5125 views

468 views

Published on

Full text link: http://www.idi.ntnu.no/~helgel/papers/BorchaniMartinezMasegosaLangsethNielsenSalmeronFernandezMadsenSaezSCAI15.pdf

Published in:
Science

License: CC Attribution License

No Downloads

Total views

468

On SlideShare

0

From Embeds

0

Number of Embeds

2

Shares

0

Downloads

19

Comments

0

Likes

1

No embeds

No notes for slide

- 1. Dynamic Bayesian modeling for risk prediction in credit operations Hanen Borchani1, Ana M. Martínez1, Andrés R. Masegosa2, Helge Langseth2, Thomas D. Nielsen1, Antonio Salmerón3, Antonio Fernández4, Anders L. Madsen1,5, Ramón Sáez4 1Department of Computer Science, Aalborg University, Denmark 2 Department of Computer and Information Science, The Norwegian University of Science and Technology, Norway 3Department of Mathematics, University of Almería, Spain 4 Banco de Crédito Cooperativo, Spain 5 Hugin Expert A/S, Aalborg, Denmark Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 1
- 2. Outline 1 Introduction 2 The ﬁnancial data set 3 Risk prediction using dynamic Bayesian networks 4 Experimental results 5 Conclusion Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 2
- 3. Outline 1 Introduction 2 The ﬁnancial data set 3 Risk prediction using dynamic Bayesian networks 4 Experimental results 5 Conclusion Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 3
- 4. Introduction Eﬃcient solutions for risk prediction in banks can be crucial for reducing losses due to ineﬃcient business procedures. Such solutions can be used as tools for monitoring the evolution of customers in terms of credit operations risk to increase solvency of the banking institutions. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 4
- 5. Introduction Eﬃcient solutions for risk prediction in banks can be crucial for reducing losses due to ineﬃcient business procedures. Such solutions can be used as tools for monitoring the evolution of customers in terms of credit operations risk to increase solvency of the banking institutions. From a machine learning perspective, credit scoring has traditionally been approached as a supervised classiﬁcation problem. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 5
- 6. Introduction Eﬃcient solutions for risk prediction in banks can be crucial for reducing losses due to ineﬃcient business procedures. Such solutions can be used as tools for monitoring the evolution of customers in terms of credit operations risk to increase solvency of the banking institutions. From a machine learning perspective, credit scoring has traditionally been approached as a supervised classiﬁcation problem. However, recently, this problem presents additional challenging characteristics that separate it from the standard supervised classiﬁcation problems. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 6
- 7. Challenges Classiﬁcation in a streaming context: a stream of multiple sequences received over time, each sequence representing a particular client. That is, at every time step t, we receive the data Dt containing information about all the clients. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 7
- 8. Challenges Classiﬁcation in a streaming context: a stream of multiple sequences received over time, each sequence representing a particular client. That is, at every time step t, we receive the data Dt containing information about all the clients. A delayed class-feedback: the class label for each sample/client corresponds to the client’s defaulting behavior in the following twelve months and this information is therefore only available after a twelve month delay. Thus, the available data is a mixture of labeled and unlabeled samples. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 8
- 9. Challenges Classiﬁcation in a streaming context: a stream of multiple sequences received over time, each sequence representing a particular client. That is, at every time step t, we receive the data Dt containing information about all the clients. A delayed class-feedback: the class label for each sample/client corresponds to the client’s defaulting behavior in the following twelve months and this information is therefore only available after a twelve month delay. Thus, the available data is a mixture of labeled and unlabeled samples. Concept drift: the domain exhibits a form of concept drift where the data distribution as well as the set of feature variables relevant for classiﬁcation may vary over time. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 9
- 10. Challenges Classiﬁcation in a streaming context: a stream of multiple sequences received over time, each sequence representing a particular client. That is, at every time step t, we receive the data Dt containing information about all the clients. A delayed class-feedback: the class label for each sample/client corresponds to the client’s defaulting behavior in the following twelve months and this information is therefore only available after a twelve month delay. Thus, the available data is a mixture of labeled and unlabeled samples. Concept drift: the domain exhibits a form of concept drift where the data distribution as well as the set of feature variables relevant for classiﬁcation may vary over time. Objective: Explore the credit scoring problem based on a real-world data set. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 10
- 11. Outline 1 Introduction 2 The ﬁnancial data set 3 Risk prediction using dynamic Bayesian networks 4 Experimental results 5 Conclusion Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 11
- 12. The ﬁnancial data set Provided by a Spanish bank in the Almería region: Banco de Crédito Cooperativo (BCC). It contains monthly aggregated information for a set of BCC clients for the period from April 2007 to March 2014. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 12
- 13. The ﬁnancial data set Provided by a Spanish bank in the Almería region: Banco de Crédito Cooperativo (BCC). It contains monthly aggregated information for a set of BCC clients for the period from April 2007 to March 2014. Only “active” clients are considered, meaning that we restrict our attention to individuals between 18 and 65 years of age, who have at least one automatic bill payment or direct debit in the bank. BCC employees are excluded since they have special conditions. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 13
- 14. The ﬁnancial data set Provided by a Spanish bank in the Almería region: Banco de Crédito Cooperativo (BCC). It contains monthly aggregated information for a set of BCC clients for the period from April 2007 to March 2014. Only “active” clients are considered, meaning that we restrict our attention to individuals between 18 and 65 years of age, who have at least one automatic bill payment or direct debit in the bank. BCC employees are excluded since they have special conditions. The resulting data set includes 50 000 clients each month. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 14
- 15. The ﬁnancial data set 44 feature variables, denoted Xt, where 11 variables describing the ﬁnancial status of a client (VARXX) and 33 socio-demographic variables (SOCXX). Variable ID Description Variable ID Description VAR01 Total credit amount VAR07 Unpaid amount in mortgages VAR02 Income VAR08 Unpaid amount in personal loans VAR03 Expenses VAR09 Unpaid amount in credit cards VAR04 Account balance VAR10 Unpaid amount in bank account deﬁcit VAR05 Risk balance in mortgages VAR11 Unpaid amount in other products VAR06 Risk balance in consumer loans SOC01-33 Set of 33 socio-demographic variables Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 15
- 16. The ﬁnancial data set 44 feature variables, denoted Xt, where 11 variables describing the ﬁnancial status of a client (VARXX) and 33 socio-demographic variables (SOCXX). Variable ID Description Variable ID Description VAR01 Total credit amount VAR07 Unpaid amount in mortgages VAR02 Income VAR08 Unpaid amount in personal loans VAR03 Expenses VAR09 Unpaid amount in credit cards VAR04 Account balance VAR10 Unpaid amount in bank account deﬁcit VAR05 Risk balance in mortgages VAR11 Unpaid amount in other products VAR06 Risk balance in consumer loans SOC01-33 Set of 33 socio-demographic variables Each client u has an associated class variable C (u) t for each time step t that indicates if that particular client will default during the following 12 months. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 16
- 17. Outline 1 Introduction 2 The ﬁnancial data set 3 Risk prediction using dynamic Bayesian networks 4 Experimental results 5 Conclusion Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 17
- 18. Dynamic Bayesian classiﬁers A dynamic probabilistic model for doing prediction in the BCC domain At time T (the current time), we predict the defaulting status (CT ) of a particular client based on previous socio-economical observations and the client’s known defaulting status λ = 12 months earlier. in fact apply to most credit scoring problems as well as many other domains. We will discuss this issue further in Section 5, which also serves to demonstrate the broader relevance of the above mentioned problems. In this paper we present a ﬁrst approach to address the BCC credit scoring problem3 based on the use of a simple dynamic probabilistic graphical model [5]. A rough visual description of this model is given in Figure 1. Our preliminary approach is implemented based on the AMIDST Toolbox4 . This toolbox provides an e cient implementation of approximate inference and learning methods for streaming data using the Bayesian networks modeling framework [5] as well as variational Bayes inference and learning procedures [6]. CT 12 CT 11 XT 11 CT 10 XT 10 CT 1 XT 1 CT XT Figure 1. A dynamic probabilistic model for doing prediction in the BCC domain. At time T (assumed to be the current time) we wish to predict the defaulting status (CT ) of a particular customer based on previous socio-economical observations as well as the customer’s known defaulting status = 12 months earlier. Note that due to the independence assumptions in the Figure 1: Square/Round boxes indicate data which is available/non-available when predicting the defaulting status of the clients at month T. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 18
- 19. Dynamic Bayesian classiﬁers A 2-time-slices Dynamic Naïve Bayes classiﬁer X1,t−1 Ct−1 Ct ."."."X2,t−1 Xn,t−1 X1,t ."."."X2,t Xn,t Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 19
- 20. Dynamic Bayesian classiﬁers A 2-time-slices Dynamic Naïve Bayes classiﬁer X1,t−1 Ct−1 Ct ."."."X2,t−1 Xn,t−1 X1,t ."."."X2,t Xn,t It assumes that only the class variables are connected across time and that all the predictive variables at time step t are conditionally independent given the class variable at time t. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 20
- 21. Dynamic Bayesian classiﬁers A 2-time-slices Dynamic Naïve Bayes classiﬁer X1,t−1 Ct−1 Ct ."."."X2,t−1 Xn,t−1 X1,t ."."."X2,t Xn,t It assumes that only the class variables are connected across time and that all the predictive variables at time step t are conditionally independent given the class variable at time t. The joint probability factorizes as p(c1:T , x1:T ) = T t=1 p(ct|ct−1) n i=1 p (xi,t|ct) · Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 21
- 22. Dynamic Bayesian classiﬁers Learning the model Bayesian approach for multinomial and normally distributed data. p (xi,t|ct) are learned from the labeled data DT−λ. p(ct|ct−1) are learned using the class transitions from DT−λ−1 to DT−λ. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 22
- 23. Dynamic Bayesian classiﬁers Learning the model Bayesian approach for multinomial and normally distributed data. p (xi,t|ct) are learned from the labeled data DT−λ. p(ct|ct−1) are learned using the class transitions from DT−λ−1 to DT−λ. Prediction It amounts to calculating the conditional probability for the class label for each client u at time T given all the information collected so far, D1:T . p c (u) t |x (u) t−λ+1:t , c (u) t−λ ∝ p x (u) t |c (u) t c (u) t−1 p c (u) t |c (u) t−1 p c (u) t−1|x (u) t−λ+1:t−1, c (u) t−λ · Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 23
- 24. Dynamic Bayesian classiﬁers Feature subset selection The relevance of the variables may vary over time. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 24
- 25. Dynamic Bayesian classiﬁers Feature subset selection The relevance of the variables may vary over time. We consider a wrapper feature selection method with the Naïve Bayes model as the base classiﬁer combined with greedy search. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 25
- 26. Dynamic Bayesian classiﬁers Feature subset selection The relevance of the variables may vary over time. We consider a wrapper feature selection method with the Naïve Bayes model as the base classiﬁer combined with greedy search. The area under the curve (AUC) was used as the objective function, because AUC usually performs well even if the data has class imbalance. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 26
- 27. Dynamic Bayesian classiﬁers Feature subset selection The relevance of the variables may vary over time. We consider a wrapper feature selection method with the Naïve Bayes model as the base classiﬁer combined with greedy search. The area under the curve (AUC) was used as the objective function, because AUC usually performs well even if the data has class imbalance. In our case, the feature selection method is performed at each time step to infer which variables are helpful in separating defaulters from non-defaulters. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 27
- 28. Outline 1 Introduction 2 The ﬁnancial data set 3 Risk prediction using dynamic Bayesian networks 4 Experimental results 5 Conclusion Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 28
- 29. AMIDST toolbox Open source Java toolbox http://amidst.github.io/toolbox/ Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 29
- 30. Predictive performance analysis The feature subset selection helps to improve the value of the AUC. The AUC value increases over time: the problem becomes easier to solve.0.650.700.750.800.850.900.951.00 AUC Dynamic NB with FS Dynamic NB May2008 Jul2008 Sep2008 Nov2008 Jan2009 Mar2009 May2009 Jul2009 Sep2009 Nov2009 Jan2010 Mar2010 May2010 Jul2010 Sep2010 Nov2010 Jan2011 Mar2011 May2011 Jul2011 Sep2011 Nov2011 Jan2012 Mar2012 May2012 Jul2012 Sep2012 Nov2012 Jan2013 Mar2013 May2013 Jul2013 Sep2013 Nov2013 Jan2014 Mar2014 Figure 2: AUC results for the Dynamic Naive Bayes (NB) classiﬁer with and without feature selection (FS). Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 30
- 31. Analysis of relevant features In general, the sociodemographic features play a minor role in terms of predictive performance. VAR01 VAR02 VAR04 VAR05 VAR06 VAR07 VAR08 VAR09 VAR10 VAR11 SOC01 SOC02 SOC03 SOC05 SOC06 SOC07 SOC10 SOC11 SOC12 SOC14 SOC16 SOC17 SOC18 SOC20 SOC22 SOC26 SOC28 SOC31 May2007 Jul2007 Sep2007 Nov2007 Jan2008 Mar2008 May2008 Jul2008 Sep2008 Nov2008 Jan2009 Mar2009 May2009 Jul2009 Sep2009 Nov2009 Jan2010 Mar2010 May2010 Jul2010 Sep2010 Nov2010 Jan2011 Mar2011 May2011 Jul2011 Sep2011 Nov2011 Jan2012 Mar2012 May2012 Jul2012 Sep2012 Nov2012 Jan2013 Mar2013 Figure 3: The set of selected features throughout the months. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 31
- 32. Analysis of relevant features The most frequently selected variables consistently separate the two types of clients, such as VAR04 and VAR08. −0.3−0.2−0.10.00.1 VAR04 Jun2007 Sep2007 Dec2007 Mar2008 Jun2008 Sep2008 Dec2008 Mar2009 Jun2009 Sep2009 Dec2009 Mar2010 Jun2010 Sep2010 Dec2010 Mar2011 Jun2011 Sep2011 Dec2011 Mar2012 Jun2012 Sep2012 Dec2012 Mar2013 Jun2013 Sep2013 Dec2013 Mar2014 Non−defaulting Defaulting 0.00.51.01.52.02.5 VAR08 Jun2007 Sep2007 Dec2007 Mar2008 Jun2008 Sep2008 Dec2008 Mar2009 Jun2009 Sep2009 Dec2009 Mar2010 Jun2010 Sep2010 Dec2010 Mar2011 Jun2011 Sep2011 Dec2011 Mar2012 Jun2012 Sep2012 Dec2012 Mar2013 Jun2013 Sep2013 Dec2013 Mar2014 Non−defaulting Defaulting Figure 4: Time-dependent averages of variables VAR04 (“Account balance”) and VAR08 (“Unpaid amount in personal loans”) for non-defaulting and defaulting clients. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 32
- 33. Outline 1 Introduction 2 The ﬁnancial data set 3 Risk prediction using dynamic Bayesian networks 4 Experimental results 5 Conclusion Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 33
- 34. Conclusion A ﬁrst step towards analyzing risk prediction in credit operations for the bank Banco de Crédito Cooperativo. A dynamic Naïve Bayes classiﬁer with a wrapper feature subset selection. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 34
- 35. Conclusion A ﬁrst step towards analyzing risk prediction in credit operations for the bank Banco de Crédito Cooperativo. A dynamic Naïve Bayes classiﬁer with a wrapper feature subset selection. The feature subset selection helps to improve the results and gives insight into which attributes are most relevant as a function of time. Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 35
- 36. Conclusion A ﬁrst step towards analyzing risk prediction in credit operations for the bank Banco de Crédito Cooperativo. A dynamic Naïve Bayes classiﬁer with a wrapper feature subset selection. The feature subset selection helps to improve the results and gives insight into which attributes are most relevant as a function of time. The AMIDST toolbox performs inference and learning under a Bayesian framework and provides functionality to improve the presented model Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 36
- 37. Conclusion A ﬁrst step towards analyzing risk prediction in credit operations for the bank Banco de Crédito Cooperativo. A dynamic Naïve Bayes classiﬁer with a wrapper feature subset selection. The feature subset selection helps to improve the results and gives insight into which attributes are most relevant as a function of time. The AMIDST toolbox performs inference and learning under a Bayesian framework and provides functionality to improve the presented model Use of more expressive network structures Extend the feature subset selection method to take the set of selected features from the previous time-steps into account Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 37
- 38. Thank you for your attention Questions? Acknowledgments: This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 619209 Scandinavian Conference on Artiﬁcial Intelligence, Halmstad, November 5–6, 2015 38

No public clipboards found for this slide

Be the first to comment