International Journal of Trend in Scientific Research and Development (IJTSRD)
Special Issue: International Conference on Advances in Engineering, Science and Technology – 2021
Organized by: Uttaranchal Institute of Technology, Uttaranchal University, Dehradun
Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD42461 | ICAEST-21 | May 2021 Page 22
A Review on Credit Card Default Modelling using Data Science
Harsh Nautiyal, Ayush Jyala, Dishank Bhandari
UIT, Uttaranchal University, Dehradun, Uttarakhand, India
How to cite this paper: Harsh Nautiyal | Ayush Jyala | Dishank
Bhandari "A Review on Credit Card Default Modelling using Data
Science" Published in International Journal of Trend in
Scientific Research and Development (ijtsrd), ISSN: 2456-6470,
Special Issue | International Conference on Advances in
Engineering, Science and Technology – 2021, May 2021,
pp.22-28, URL: www.ijtsrd.com/papers/ijtsrd42461.pdf
Copyright © 2021 by author(s) and
International Journal of Trend in Scientific
Research and
Development Journal.
This is an Open Access
article distributed under the terms of the
Creative Commons Attribution License
(CC BY 4.0)
(http://creativecommons.org/licenses/by/4.0)
1. INTRODUCTION
In the last few years, credit cards have become one of
the major consumer lending products in the U.S. as well as
several other developed nations of the world, representing
roughly 30% of total consumer lending (USD 3.6tn in 2016).
Credit cards issued by banks hold the majority of the market
share with approximately 70% of the total outstanding
balance. Banks’ credit card charge-offs have stabilized after
the financial crisis at around 3% of the total outstanding
balance. However, there are still differences in credit
card charge-off levels between competitors.
A credit card is a flexible tool that lets you use the bank’s
money for a short period of time. If you accept a credit card,
you agree to pay your bills by the due date listed on your
credit card statement. Otherwise, the credit card will go into
default. When a customer is not able to pay back the loan
by the due date and the bank is certain that it cannot
collect the payment, it will usually try to sell the
loan. If the bank then finds that it cannot sell the loan,
it will write it off. This is called a charge-off. It
results in significant financial losses to the bank on top of the
damaged credit rating of the customer, which makes it an
important problem to tackle in today’s world, where
financial risks arise frequently.
Accurately predicting which customers are most likely to
default represents a significant business opportunity and
strategy for all banks. Bank cards are the most common
credit card type in the U.S., which emphasizes the impact of
risk prediction on both consumers and banks. In a well-
developed financial system, risk prediction is essential for
predicting business performance or individual customers’
credit risk and for reducing damage and uncertainty.
Our client ITBCO Bank has approached us to help them
predict and prevent credit card defaults to improve their
bottom line. The client has a screening process and
has collected a rich data set of its customers, but it
is unable to use the data properly due to a shortage of analytics
capabilities.
The fundamental objective of the project is to implement a
proactive default prevention guideline that helps the bank
identify and take action on customers with a high probability
of defaulting, in order to improve its bottom line. The challenge is
to help the bank improve its credit card services for the
mutual benefit of customers and the business itself. Creating
a human-interpretable solution is emphasized in each stage
of the project.
Even though plenty of solutions to default prediction
using the full data set have been published, interpretability
remains a problem, even in published
papers. The scope of our project extends beyond prediction, as our
ultimate goal is to provide an easy-to-interpret default
mitigation program to the client bank. Prediction itself is done fairly
easily using the gradient boosting algorithm LightGBM.
In addition to default prevention, the case study includes a
set of learning goals. The team must understand key
considerations in selecting analytics and machine learning
methods and how these methodologies can be used
efficiently to create direct business value. McKinsey also sets
the objective of learning how to communicate complex
topics to people with different backgrounds.
The project should include a recommended set of actions to
mitigate the default and a clear explanation of the business
implications. The interpretability and adaptability of our
solution need to be emphasized when constructing the
solution. The bank needs a solution that can be understood
and applied by people with varying expertise, so that no
further outside consultation is required in understanding the
business implications of the decisions.
2. RELATED WORK
Credit card lending is a widely researched subject.
Many statistical methods have been
applied to developing credit risk prediction, such as
discriminant analysis, logistic regression, K-nearest neighbor
classifiers, and probabilistic classifiers such as Bayes
classifiers. Advanced machine learning methods including
decision trees and artificial neural networks have also been
applied. A short introduction to these techniques is provided
here.
K-nearest Neighbor Classifiers K-nearest neighbor (KNN)
classifier is one of the simplest supervised learning
algorithms and is based on learning by analogy. The
training samples are stored as labelled points in a pattern
space; no explicit model is fitted. When given an unknown
data sample, the KNN classifier searches the pattern space
for the k training samples that are closest to it. This
closeness is defined by a distance measure. The unknown data
sample is assigned to the most common class among its k
nearest neighbors.
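The voting rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `knn_predict`, the Euclidean distance and the toy data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every stored training sample.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest samples and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two scaled features per customer, label 1 = default.
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
labels = [1, 1, 0, 0, 0]
prediction = knn_predict(points, labels, query=(0.15, 0.15), k=3)
```

In practice the features would need to be scaled to comparable ranges first, since the distance is otherwise dominated by the variable with the largest numeric range.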
Discriminant Analysis (DA) The objective of discriminant
analysis is to maximize the distance between different
groups and to minimize the distance within each group. DA
assumes that, for each given class, the explanatory variables
are distributed as a multivariate normal distribution with a
common variance–covariance matrix.
Logistic Regression (LR) Logistic regression is often used
in credit risk modeling and prediction in the finance and
economics literature. Logistic regression analysis studies the
association between a categorical dependent variable and a
set of independent variables. A logistic regression model
produces a probabilistic formula of classification. LR has
difficulty dealing with non-linear effects of explanatory
variables.
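As a minimal sketch of the "probabilistic formula" mentioned above, the logistic model passes a linear combination of the inputs through the sigmoid function. The coefficients and feature values below are hypothetical placeholders, not fitted values from this study.

```python
import math


def default_probability(features, weights, bias):
    """Logistic model: sigmoid of a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two scaled explanatory variables.
weights = [1.2, -0.8]
bias = -0.5
p = default_probability([1.0, 0.5], weights, bias)
```

Because the log-odds are linear in the inputs, a non-linear effect can only be captured if it is added by hand as an extra transformed feature, which is the limitation noted above.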
Classification Trees (CTs) The classification tree structure
is composed of nodes and leaves. Each internal node defines a
test on a certain attribute, each branch represents an
outcome of the test, and the leaf nodes represent classes. The
root node is the top-most node in the tree. The segmentation
process is generally carried out using only one explanatory
variable at a time. Classification trees can result in simple
classification rules and can also handle the nonlinear and
interactive effects of explanatory variables. However, they
depend heavily on the observed data, so a small change in
the data can alter the structure of the tree.
Artificial Neural Networks (ANNs) Artificial neural
networks are used to develop relationships between the
input and output variables through a learning process. This
is done by formulating non-linear mathematical equations to
describe these relationships. A network can perform a number of
classification tasks at once, although commonly each
network performs only one. The best solution is usually to
train separate networks for each output, then to combine
them into an ensemble so that they can be run as a unit.
Backpropagation is the best-known example of a neural
network training algorithm. This algorithm is applied to classify
data. In a backpropagation neural network, the gradient
vector of the error surface is computed. This vector points
along the line of steepest descent from the current point, so
we know that if we move along it a "short" distance, we will
decrease the error. A sequence of such moves will eventually
find a minimum of some sort. The difficult part is to decide
how large the steps should be. Large steps may converge
more quickly, but may also overstep the solution or go off in
the wrong direction.
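The effect of the step size can be illustrated on a one-dimensional error surface. The sketch below assumes the toy error function f(w) = w², whose gradient is 2w; the function name and the chosen step sizes are illustrative only.

```python
def gradient_descent(step_size, start=4.0, iterations=20):
    """Minimise f(w) = w**2 by repeatedly stepping against the gradient 2*w."""
    w = start
    for _ in range(iterations):
        gradient = 2.0 * w
        w = w - step_size * gradient
    return w


small = gradient_descent(step_size=0.1)  # shrinks steadily towards the minimum at 0
large = gradient_descent(step_size=1.1)  # oversteps the minimum and diverges
```

With a step size of 0.1 each move multiplies w by 0.8, so the error keeps decreasing; with 1.1 each move multiplies w by -1.2, so every step lands further from the minimum than the last.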
Naïve Bayesian classifier (NB) The Bayesian classifier is a
probabilistic classifier based on Bayes’ theorem. This classifier
relies on the conditional independence assumption, which states
that the effect of an attribute value on a given class is
independent of the values of the other attributes.
Computations are simplified by using this assumption. In
practice, however, dependences can exist between variables.
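Written out, the conditional independence assumption lets the class posterior factorise into per-attribute terms. The symbols C (class) and x_1, ..., x_n (attribute values) are notation introduced here for illustration and do not appear in the original text.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

Each factor P(x_i | C) can be estimated separately from the training data, which is the simplification in computation referred to above.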
Comparing the results of the six data mining techniques,
classification trees and K-nearest neighbor classifiers have
the lowest error rate for the training set. However, for the
validation data, artificial neural networks have the best
performance, with the highest area ratio and a relatively
low error rate. As the validation data is the effective
measure of the classification accuracy of the models, we
can conclude that artificial neural networks are the best model
among the six methods. However, error rates alone are not an
appropriate criterion for measuring the performance of the
models. For example, the KNN classifier has the lowest
error rate, yet it does not perform better than artificial
neural networks and classification trees based on the area
ratio. Considering the area ratio in the validation data, the
results show that the performance of the six techniques is
ranked as: artificial neural networks, classification trees,
Naïve Bayesian classifier, KNN classifier, logistic regression,
and discriminant analysis, respectively.
3. PROBLEM FORMULATION
With the growth of e-commerce websites, people and
financial companies rely on online services to carry out their
transactions, which has led to an exponential
increase in credit card fraud. Fraudulent credit card
transactions cause banks and various other sectors
to lose huge amounts of money.
The design of an effective fraud detection system is
necessary in order to reduce the losses incurred by
customers and financial companies. Research has been done
on many models and methods to prevent and detect credit
card fraud. Some credit card transaction datasets
suffer from class imbalance. A good fraud
detection system should be able to identify fraudulent
transactions accurately and should make detection
possible in real time. Fraud detection can be
divided into two groups: anomaly detection and misuse
detection. Anomaly detection systems are trained on normal
transactions and use techniques to detect
novel frauds. Conversely, a misuse detection system
is trained on historical transactions labeled as normal or
fraudulent. Misuse detection is therefore a form of
supervised learning, and anomaly detection a form of
unsupervised learning. Fraudsters mimic the normal behavior of
customers, and fraud patterns change rapidly, so the
fraud detection system needs to learn and update constantly.
Background Timely information on fraudulent activities is
strategic to the banking industry, as banks have huge
and varied databases. Valuable business information can be
extracted from these data stores. Credit card frauds can be
broadly classified into three categories: traditional
card-related frauds (application, stolen, account takeover,
fake and counterfeit), merchant-related frauds (merchant
collusion and triangulation) and Internet frauds (site
cloning, credit card generators and false merchant sites).
Methodology There are five basic steps in the data mining
process once the problem has been defined: 1) preparing
the data, 2) exploring the data, 3) developing the model, 4)
exploring and validating the models, and 5) deploying and
updating the models. In this project, LightGBM is used as
the data mining technique, and it follows the above-mentioned
steps to obtain an accurate and reliable result. Moreover, a neural
network was used, as it is capable of adaptation and
generalization. Python [3] is also a good option for
the experiments. Jupyter is a notebook-style open-source
interface for Python. It is an interactive web-based
environment that allows users to combine text, plots,
mathematics and executable code in a single document.
4. OBJECTIVES
1. Higher accuracy of fraud detection. Compared to rule-
based solutions, machine learning tools have higher
precision and return more relevant results as they
consider multiple additional factors. This is because ML
technologies can consider many more data points,
including the tiniest details of behavior patterns
associated with a particular account.
2. Less manual work needed for additional
verification. Enhanced accuracy reduces the
burden on analysts. “People are unable to check all
transactions manually, even if we are talking about a
small bank,” Alexander Konduforov, data science
competence leader at AltexSoft, explains. “ML-driven
systems filter out, roughly speaking, 99.9 percent of
normal patterns leaving only 0.1 percent of events to be
verified by experts.”
3. Fewer false declines. False declines or false positives
happen when a system identifies a legitimate
transaction as suspicious and wrongly cancels it.
4. Ability to identify new patterns and adapt to
changes. Unlike rule-based systems, ML algorithms are
aligned with a constantly changing environment and
financial conditions. They enable analysts to identify
new suspicious patterns.
5. METHODOLOGY
DBSCAN
The Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) algorithm is a well-known data clustering algorithm
used for discovering clusters in a spatial data set. The
algorithm requires two parameters. The first
parameter is eps, the maximum distance between two points
for them to be considered neighbors: if the distance
between two points is smaller than or equal to eps, these points
are neighbors. The second is minPoints, the
minimum number of points required to form a dense region. For
instance, if we set the minPoints parameter to 5, then at
least 5 points are required to form a dense region. Based on
the parameters eps and minPoints, and starting from at least
one point of the respective cluster, the algorithm groups
together the points that are close to each other [6].

Gradient boosting is a popular machine learning algorithm that
combines multiple weak learners, such as trees, into one
strong ensemble model. This is done by first fitting a model
to the data. The first model is unlikely to fit the
data points perfectly, so we are left with
residuals. We can then fit another tree to the residuals to
minimize a loss function; this can be the squared (L2) loss, but
gradient boosting allows the use of any differentiable loss
function. This can be iterated for multiple steps, which leads
to a stronger model, and with proper regularization overfitting
can be avoided [7]. Gradient boosting has many parameters
that need to be optimized to find the best-performing model
for a given problem. These include tree-specific
parameters, such as size limits for leaf nodes and
tree depth, and parameters of the
boosting itself, for example how many models are fitted to
obtain the final model and how much each
individual tree impacts the end result. These parameters are
usually optimized with a grid search that iterates through all
the possible parameter combinations. This is usually
computationally expensive, since a large number of models
have to be fitted and the number of combinations to be
tested grows rapidly as more parameters are introduced.
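The residual-fitting loop behind gradient boosting can be sketched as follows for the squared loss, where the residuals coincide with the negative gradient. This is a toy illustration with one-split trees (stumps) on one-dimensional data; the function names, the data and the parameter values are assumptions and do not reproduce the LightGBM implementation used in this study.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a weak learner) to the residuals."""
    best = None
    # Try every threshold that leaves at least one point on each side.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees, learning_rate=0.1):
    """Squared-loss boosting: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    trees = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # learning_rate controls how much each individual tree contributes.
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical one-dimensional data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
base_mse = mean_squared_error(gradient_boost(xs, ys, n_trees=0), xs, ys)
boosted_mse = mean_squared_error(gradient_boost(xs, ys, n_trees=50), xs, ys)
```

The two arguments `n_trees` and `learning_rate` correspond to the boosting parameters mentioned above: how many models are fitted and how much each individual tree affects the end result.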
Self-organizing map (SOM), also known as Kohonen
network, is a type of artificial neural network that is used to
produce low-dimensional discretized mappings of an input
space [9]. Self-organizing maps produce a grid that consists
of nodes, which are arranged in a regular hexagonal or
rectangular pattern. The training of a SOM works by
assigning a model to each of the nodes in the output grid.
The models are calculated by the SOM algorithm, and objects
are mapped to the output nodes based on which node’s
model is most similar to the object, or in other words, which
node has the smallest distance to the object under a chosen
metric. For real-valued objects, the most commonly used
distance metric is the Euclidean distance,
although in this study the sum of squares was used. For
categorical variables, the distance metric used in this study is
the Tanimoto distance.
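The mapping of an object to its most similar node can be sketched as below, using the sum-of-squares distance for real-valued inputs. The 2x2 grid, its codebook values and the function names are hypothetical, and the sketch covers only the lookup step, not the training of the map or the Tanimoto distance.

```python
def sum_of_squares(a, b):
    """Sum of squared differences between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(codebook, sample):
    """Return the grid position of the node whose model is closest to sample."""
    return min(codebook, key=lambda node: sum_of_squares(codebook[node], sample))


# Hypothetical 2x2 map; keys are (row, column) grid positions.
codebook = {
    (0, 0): [0.1, 0.1],
    (0, 1): [0.1, 0.9],
    (1, 0): [0.9, 0.1],
    (1, 1): [0.9, 0.9],
}
node = best_matching_unit(codebook, [0.8, 0.2])
```

Here the sample is mapped to node (1, 0), whose model differs from it by only 0.02 in summed squared distance.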
The grid nodes’ models are more similar to those of nearby nodes
than to those located farther away. Since it is the nodes that are
being calculated to fit the data, the mapping aims to preserve
the topology of the original space. The models are also
known as codebook vectors, which is the term used in the R
package ‘kohonen’ used to implement the algorithm [10].
The Tanimoto distance metric is defined in the details of the
supersom function in the package documentation.
In this project, multiple unsupervised self-organizing maps
were trained using the demographic variables to produce a
two-dimensional mapping serving as a customer
segmentation. Different parameters and map sizes were
tested to find the optimal mapping that would maximize
quality of representation and distance to neighbouring
clusters within the map. The maps were also compared on
their ability to produce clusters with varying financial
impact and default risk, as measured by the financial model and
the default prediction algorithm. The two primary measures
used to compare different mappings in this study were the
quality (mean distance of objects from the center of their node)
and the U-matrix distances (mean distance of nodes to their
neighbouring nodes). The name quality is used because of how it
appears in the kohonen R package.
Preliminary data analysis Describing the data The data
consists of 30,000 customers and 26 columns of variables.
Each sample corresponds to a single customer. The columns
consist of the following variables:
Default (Yes or No) as a binary response variable
Balance limit (Amount of credit in U.S. $)
Sex (Male, Female)
Education (Graduate school, University, High school, Others)
Marital status (Married, Single, Others)
Age (Years)
Employer (Company name)
Location (Latitude, Longitude)
Payment status (last 6 months): indicates payment delay in months or whether payment was made duly
Bill amount (last 6 months): states amount of bill statement in U.S. $
Payment amount (last 6 months): amount paid by customer in U.S. $
The variables Balance limit, Age, Sex, Education, Marital
status, Employer, and Location are defined as demographic
variables, since they describe a demography of customers
and are available for new customers, unlike the historical
payment data which is only available for existing customers.
The total proportion of defaults in the data is 22.12%, which
is 6,636 out of the total data set of 30,000
samples. This high proportion could be due to a large sampling
bias, in which case the data would not be a
realistic representation of the bank’s customer base.
However, the data was collected during a debt crisis, which
provides an argument for the assumption that the data
represents a non-biased sample of the customer base. In any
case, the high number of defaults should be taken into
consideration when making generalizations about the
results or methodology of this case study. The high number
of defaults will especially have an effect on estimates of the
bank’s financials.
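The stated proportion can be checked directly from the two counts given above. The variable names below are illustrative; the imbalance ratio is a derived figure added here for convenience, not one reported in the study.

```python
total_customers = 30_000
defaults = 6_636

default_rate = defaults / total_customers
# Number of non-defaulting customers for every defaulting customer.
imbalance_ratio = (total_customers - defaults) / defaults

print(f"{default_rate:.2%}")  # 22.12%
```

A ratio of roughly 3.5 non-defaults per default means the classes are imbalanced but not extremely so, which matters when choosing evaluation metrics such as AUC over plain accuracy.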
Default
This variable indicates whether or not the customer
defaulted on their credit card debt payment. For the purpose
of this project, predicting default is the main focus of the
data analysis. A value of 1 indicates default, and a value of 0
indicates no default. It is unclear how long after the
collection of the data this variable was measured. This means
that default could have happened the following month or a
longer time thereafter. Since this is unknown, no
assumptions are made about the time of default. It is also not
clear whether a value of 1 means the client
missed only a single payment or multiple payments, and whether
or not the length of the delay in payment was taken into account.
Balance limit
This variable states the amount of given credit in U.S. $. This is the
maximum amount a customer can spend with their credit
card in a single month. The balance limit
depends on the bank’s own screening processes and other
unknown factors.
Sex
This variable takes a value of 1 for male and 2 for
female. In this study, sex and gender are used
interchangeably to mean the same thing. It is unknown
whether the difference between the two definitions was
taken into account when the data was collected.
Education
The education level of a customer is represented as one of
four values: 1 = Graduate school, 2 = University, 3 = High
school, 4 = Other. For the purpose of analysing customer
groups, this is assumed to indicate the highest level of
education completed.
Marital status
Referred to as “married” in the analysis, this variable can
take three values: 1 = Married, 2 = Single, 3 = Other, such as
divorced or widowed.
Age
The age of the customer is stated in years.
Location
This variable is composed of two different values for each
customer. One is for the latitude, and the second one is for
the longitude. In order to gain benefits from this data in
predictions using only the demographic variables, we
applied the DBSCAN algorithm.
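A compact version of the DBSCAN procedure applied to location data might look like the sketch below. It treats latitude and longitude as plain planar coordinates, which is only an approximation; the coordinates, the parameter values and the function names are hypothetical and are not those used in the study.

```python
import math


def region_query(points, index, eps):
    """Indices of all points within eps of points[index], including itself."""
    return [j for j, other in enumerate(points)
            if math.dist(points[index], other) <= eps]


def dbscan(points, eps, min_points):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_points:
            labels[i] = noise  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical (latitude, longitude) pairs: two dense groups and one outlier.
locations = [
    (10.00, 20.00), (10.01, 20.01), (10.02, 20.00),
    (11.00, 21.00), (11.01, 21.01), (11.02, 21.00),
    (15.00, 25.00),
]
cluster_labels = dbscan(locations, eps=0.05, min_points=3)
```

The resulting cluster label can then be used as a single categorical demographic feature in place of the raw coordinate pair.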
Payment status
This variable is represented as 6 different columns, one for each month.
The value of payment status for a month indicates whether
repayment of credit was delayed or paid duly. A value of -1
indicates payment made duly. Values from 1 to 8 indicate payment
delay in months, with a value of 9 defined as a delay of 9
months or more. The data was collected over 6 months, April to
September.
Bill amount
The amount of the bill statement in U.S. $ is recorded in this variable.
It is represented in the data as 6 columns, one for each
month. The data was collected over 6 months, April to September.
Payment amount
The amount of the previous payment in U.S. $, stored in 6 different
columns, one for each month, similarly to payment status and bill
amount. The payment amounts correspond to the same
months as payment status and bill amount. For example, the
payment amount for April indicates the amount paid in April.
Checking data unbalance:
Preliminary data analysis
Features correlation
Using mainstream (LightGBM) algorithm:
Training the dataset:
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.006994 seconds.
You can set `force_col_wise=true` to remove the overhead.
Training until validation scores don't improve for 50 rounds
23. train's auc: 0.778238 valid's auc: 0.771173
[100] train's auc: 0.789346 valid's auc: 0.782605
[150] train's auc: 0.794861 valid's auc: 0.784753
Early stopping, best iteration is:
[135] train's auc: 0.793452 valid's auc: 0.785154
The best validation score was obtained at round 135, with a validation AUC of approximately 0.785.
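The early-stopping rule reported in the training log (stop once the validation score has not improved for a fixed number of rounds, then keep the best round) can be sketched independently of any library. The function name and the short score curve are hypothetical; the real run used a patience of 50 rounds.

```python
def best_round_with_early_stopping(valid_scores, patience):
    """Return (best_round, best_score); rounds are 1-based, higher is better."""
    best_round, best_score = 0, float("-inf")
    for round_number, score in enumerate(valid_scores, start=1):
        if score > best_score:
            best_round, best_score = round_number, score
        elif round_number - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round, best_score


# Hypothetical validation AUC curve: improves until round 4, then degrades.
scores = [0.70, 0.75, 0.77, 0.78, 0.779, 0.778, 0.777, 0.80]
best = best_round_with_early_stopping(scores, patience=3)
```

In this toy curve training stops at round 7, so the later value of 0.80 is never reached; this mirrors the trade-off that a larger patience costs more training time but is less likely to stop before a late improvement.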
Plotting the variable importance
6. CONCLUSION
The results of analysis and predictive modelling show that
neither directly measuring nor using the predicted proportion of
defaults of a customer group to predict default is accurate.
This is most likely due to multiple reasons. One of them
is the limited accuracy of any machine learning
algorithm caused by the small number of variables or by
missing values. Another reason is most likely the lack of
specificity in customer segments, which mixes actual high-risk
customers with low-risk ones. Comparing payment
amounts, gender and marital status in the training set and
test set also showed large variation. This is most likely due to
the high losses that a single customer can produce by
defaulting with high amounts of debt. Much of the variation
in the data could not be represented, since customer
segmentation was only done using the demographic
variables. Further analysis should be done in order to fully
justify and support business decisions based on the
customer segmentation in this study.
When it comes to default prediction, we have a model that is
able to predict the defaults of customers with high enough
certainty that the bank can use it in its operations.
Assuming that the bank continues to receive customers who
are represented in our dataset, we could implement our
model in the bank’s preliminary screening process, and it
would bring financial gain to the bank.
However, our solution is not viable as a
standalone system in its current form, since it only considers
part of the bank’s actions. Many factors that were not covered
in this case study should be taken into consideration when
taking any business action. For example, young people could
be preferable for the bank since they stay longer as
customers, so it could be in the bank’s interest to favor having
them as customers even if our model suggests
otherwise.
Single customers should not be discriminated against
especially based on the customer segmentation which relies
on calculating averages over a group. A single customer
defaulting with high debt can result in much higher losses
than might be anticipated simply based on averages.
Similarly, the analysis does not go in-depth enough to justify
assuming that the variables used in this study could explain
or predict how reliable the customers are in the long run,
especially considering that the data was collected during a
debt crisis.
7. REFERENCES
[1] Wikipedia, https://www.8051projects.net/files/public/1259220442_20766_FT0_7380969-line-follower-using-at89c51.pdf
[2] Default Credit Card Clients Dataset, https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset/
[3] RandomForestClassifier, http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
[4] ROC-AUC characteristic, https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
[5] AdaBoostClassifier, http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html
[6] CatBoostClassifier, https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier-docpage/
[7] XGBoost Python API Reference, http://xgboost.readthedocs.io/en/latest/python/python_api.html
[8] LightGBM Python implementation, https://github.com/Microsoft/LightGBM/tree/master/python-package
[9] LightGBM algorithm, https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf
[10] Chauhan, N., Dhaundiyal, R., & Joshi, K. K-MEANS ON
SEARCH ENGINE DATASET THROUGH WEKA.
International Journal of Research Fellow for
Engineering (IJRFE)–Volume, 4.
[11] Joshi, K., Rawat, S., & Chaudhary, S. ANALYSIS OF
DIFFERENT OPTICAL SWITCHING TECHNIQUES IN
NOC ROUTER ARCHITECTURE. International Journal
of Research Fellow for Engineering (IJRFE)–Volume, 4.
[12] Joshi, K., Chaudhary, S., & Chauhan, N. HYBRID
CLUSTERING ALGORITHM USING K-MEANS
CLUSTERING ALGORITHM. International Journal of
Research Fellow for Engineering (IJRFE)–Volume, 4.
[13] Longkumer, M., Joshi, K. A Comprehensive Study on
Recent Botnet. International Journal of Science and
Research (IJSR)- Volume, 7.
[14] Joshi, K., Gupta, H., & Lamba, S. An Overview on Image
Fusion Concept. Journal of Emerging Technologies
and Innovative Research (JETIR)–Volume, 5, 873-879.

More Related Content

What's hot

Default payment prediction system
Default payment prediction systemDefault payment prediction system
Default payment prediction systemAshish Arora
 
Default Credit Card Prediction
Default Credit Card PredictionDefault Credit Card Prediction
Default Credit Card PredictionAlexandre Pinto
 
ART1197.DOC
ART1197.DOCART1197.DOC
ART1197.DOCbutest
 
Project Report - Acquisition Credit Scoring Model
Project Report - Acquisition Credit Scoring ModelProject Report - Acquisition Credit Scoring Model
Project Report - Acquisition Credit Scoring ModelSubhasis Mishra
 
A high level overview of all that is Analytics
A high level overview of all that is AnalyticsA high level overview of all that is Analytics
A high level overview of all that is AnalyticsRamkumar Ravichandran
 
A Review on Credit Card Default Modelling using Data Science

International Journal of Trend in Scientific Research and Development (IJTSRD)
Special Issue: International Conference on Advances in Engineering, Science and Technology – 2021
Organized by: Uttaranchal Institute of Technology, Uttaranchal University, Dehradun
Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD42461 | ICAEST-21 | May 2021
A Review on Credit Card Default Modelling using Data Science
Harsh Nautiyal, Ayush Jyala, Dishank Bhandari
UIT, Uttaranchal University, Dehradun, Uttarakhand, India
How to cite this paper: Harsh Nautiyal | Ayush Jyala | Dishank Bhandari "A Review on Credit Card Default Modelling using Data Science" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Special Issue | International Conference on Advances in Engineering, Science and Technology – 2021, May 2021, pp.22-28, URL: www.ijtsrd.com/papers/ijtsrd42461.pdf
Copyright © 2021 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)
1. INTRODUCTION
In the last few years, credit cards have become one of the major consumer lending products in the U.S. as well as in several other developed nations, representing roughly 30% of total consumer lending (USD 3.6 tn in 2016). Credit cards issued by banks hold the majority of the market share, with approximately 70% of the total outstanding balance. Banks' credit card charge-offs have stabilized after the financial crisis at around 3% of the total outstanding balance. However, charge-off levels still differ between competitors.
A credit card is a flexible tool that lets you use the bank's money for a short period of time.
If you accept a credit card, you agree to pay your bills by the due date listed on your credit card statement. Otherwise, the credit card goes into default. When a customer is not able to pay back the loan by the due date and the bank is certain that it cannot collect the payment, it will usually try to sell the loan. If the bank then finds that it cannot sell the loan, it writes it off. This is called a charge-off. It results in significant financial losses to the bank on top of the damaged credit rating of the customer, and it is therefore an important problem to tackle in today's world, where financial risk is widespread. Accurately predicting which customers are most likely to default represents a significant business opportunity and strategy for all banks. Bank cards are the most common credit card type in the U.S., which emphasizes the impact of risk prediction on both consumers and banks. In a well-developed financial system, risk prediction is essential for predicting business performance or individual customers' credit risk and for reducing damage and uncertainty. Our client ITBCO Bank has approached us to help them predict and prevent credit card defaults to improve their bottom line. The client has a screening process and has collected a rich data set of its customers, but it is unable to use the data properly due to a shortage of analytics capabilities. The fundamental objective of the project is to implement a proactive default prevention guideline that helps the bank identify and take action on customers with a high probability of defaulting. The challenge is to help the bank improve its credit card services for the mutual benefit of customers and the business itself. Creating a human-interpretable solution is emphasized in each stage of the project.
Although plenty of solutions to default prediction using the full data set have been proposed, even in published papers, they suffer from poor interpretability. The scope of our project extends beyond that, as our ultimate goal is to provide an easy-to-interpret default mitigation program to the client bank. This is done fairly easily by using the gradient boosting algorithm LightGBM for prediction. In addition to default prevention, the case study includes a set of learning goals. The team must understand key considerations in selecting analytics and machine learning methods and how these methodologies can be used efficiently to create direct business value. McKinsey also sets the objective of learning how to communicate complex topics to people with different backgrounds. The project should include a recommended set of actions to mitigate default and a clear explanation of the business implications. The interpretability and adaptability of our solution need to be emphasized when constructing it. The bank needs a solution that can be understood and applied by people with varying expertise, so that no further outside consultation is required to understand the business implications of the decisions.
2. RELATED WORK
Credit card lending is a widely researched subject. Many statistical methods have been applied to credit risk prediction, such as discriminant analysis, logistic regression, K-nearest neighbor classifiers, and probabilistic classifiers such as Bayes classifiers. Advanced machine learning methods, including decision trees and artificial neural networks, have also been applied. A short introduction to these techniques is provided here.
K-nearest Neighbor Classifiers
The K-nearest neighbor (KNN) classifier is one of the simplest supervised learning algorithms and is based on learning by analogy. The main
  • 2. idea is to classify an unknown sample by comparing it with the k labeled training samples that are most similar to it. When given an unknown data sample, the KNN classifier searches the pattern space for the k training samples that are closest to it. This closeness is defined by a distance metric. The unknown data sample is assigned to the most common class among its k nearest neighbors.
Discriminant Analysis (DA)
The objective of discriminant analysis is to maximize the distance between different groups and to minimize the distance within each group. DA assumes that, for each given class, the explanatory variables follow a multivariate normal distribution with a common variance–covariance matrix.
Logistic Regression (LR)
Logistic regression is often used in credit risk modeling and prediction in the finance and economics literature. Logistic regression analysis studies the association between a categorical dependent variable and a set of independent variables. A logistic regression model produces a probabilistic formula of classification. LR has difficulty handling non-linear effects of explanatory variables.
Classification Trees (CTs)
The classification tree structure is composed of nodes and leaves. Each internal node defines a test on a certain attribute, each branch represents an outcome of the test, and the leaf nodes represent classes. The root node is the top-most node in the tree. The segmentation process is generally carried out using only one explanatory variable at a time.
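The one-variable-at-a-time segmentation can be illustrated with a small sketch. The paper does not name a split criterion, so Gini impurity is assumed here; the function names `gini` and `best_split` and the toy payment-delay data are hypothetical and only serve to show how a single node picks its threshold.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best test `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical toy data: months of payment delay and a 0/1 default flag.
delay = [0, 0, 1, 0, 2, 3, 2, 4]
default = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(delay, default)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each resulting subset, trying every explanatory variable at each node and keeping the variable and threshold with the lowest impurity.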
Classification trees can result in simple classification rules and can also handle the nonlinear and interactive effects of explanatory variables. However, they may depend on the observed data, so a small change can affect the structure of the tree.
Artificial Neural Networks (ANNs)
Artificial neural networks are used to develop relationships between the input and output variables through a learning process. This is done by formulating non-linear mathematical equations to describe these relationships. A network can perform a number of classification tasks at once, although commonly each network performs only one. The best solution is usually to train separate networks for each output and then combine them into an ensemble so that they can be run as a unit. The back propagation algorithm is the best-known example of a neural network training algorithm, and it is applied here to classify data. In a back propagation neural network, the gradient vector of the error surface is computed. This vector points along the line of steepest descent from the current point, so we know that if we move along it a "short" distance, we will decrease the error. A sequence of such moves will eventually find a minimum of some sort. The difficult part is deciding how large the steps should be. Large steps may converge more quickly, but may also overstep the solution or go off in the wrong direction.
Naïve Bayesian classifier (NB)
The Bayesian classifier is a probabilistic classifier based on Bayes' theorem. It relies on conditional independence, which assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. Computations are simplified by this assumption. In practice, however, dependences can exist between variables.
Comparing the results of the six data mining techniques, classification trees and K-nearest neighbor classifiers have the lowest error rate for the training set.
However, for the validation data, artificial neural networks have the best performance, with the highest area ratio and a relatively low error rate. As the validation data is the effective measurement of the classification accuracy of models, we can conclude that artificial neural networks are the best model among the six methods. However, error rates alone are not an appropriate criterion for measuring the performance of the models. For example, the KNN classifier has the lowest error rate, yet it does not perform better than artificial neural networks and classification trees based on the area ratio. Considering the area ratio on the validation data, the six techniques rank as follows: artificial neural networks, classification trees, Naïve Bayesian classifier, KNN classifier, logistic regression, and discriminant analysis.
3. PROBLEM FORMULATION
With the growth of e-commerce websites, people and financial companies rely on online services to carry out their transactions, which has led to a rapid increase in credit card fraud. Fraudulent credit card transactions cause large monetary losses to banks as well as to various other sectors. An effective fraud detection system is necessary to reduce the losses incurred by customers and financial companies. Research has been done on many models and methods to prevent and detect credit card fraud. Some credit card fraud transaction datasets suffer from class imbalance. A good fraud detection system should identify fraudulent transactions accurately and should make detection possible in real time. Fraud detection can be divided into two groups: anomaly detection and misuse detection. Anomaly detection systems are trained on normal transactions and use techniques to detect novel frauds.
Conversely, a misuse fraud detection system is trained on historical transactions that are labeled as normal or fraudulent. Misuse detection therefore entails supervised learning, and anomaly detection entails unsupervised learning. Fraudsters imitate the normal behavior of customers, and fraud patterns change rapidly, so the fraud detection system needs to learn and update constantly.
Background
Timely information on fraudulent activities is strategic to the banking industry, as banks hold huge and varied databases. Valuable business information can be extracted from these data stores. Credit card frauds can be broadly classified into three categories: traditional card-related frauds (application, stolen, account takeover, fake and counterfeit), merchant-related frauds (merchant collusion and triangulation) and Internet frauds (site cloning, credit card generators and false merchant sites).
Methodology
Once the problem is defined, the data mining process has five basic steps: 1) preparing the data, 2) exploring the data, 3) developing the model, 4) exploring and validating the models, and 5) deploying and updating the models. In this project, LightGBM is used as the data mining technique, following the steps above to obtain accurate and reliable results. A neural network was also used, as it is capable of adaptation and generalization. Moreover, Python [3] is also a good option for
  • 3. the experiments. Jupyter is a notebook-style open source interface for Python. It is an interactive web-based environment that allows users to combine text, plots, mathematics and executable code in a single document.
4. OBJECTIVES
1. Higher accuracy of fraud detection. Compared to rule-based solutions, machine learning tools have higher precision and return more relevant results because they can consider many more data points, including the tiniest details of behavior patterns associated with a particular account.
2. Less manual work needed for additional verification. Enhanced accuracy reduces the burden on analysts. “People are unable to check all transactions manually, even if we are talking about a small bank,” Alexander Konduforov, data science competence leader at AltexSoft, explains. “ML-driven systems filter out, roughly speaking, 99.9 percent of normal patterns leaving only 0.1 percent of events to be verified by experts.”
3. Fewer false declines. False declines, or false positives, happen when a system identifies a legitimate transaction as suspicious and wrongly cancels it.
4. Ability to identify new patterns and adapt to changes. Unlike rule-based systems, ML algorithms adapt to a constantly changing environment and financial conditions. They enable analysts to identify new suspicious patterns.
5. METHODOLOGY
The DBSCAN (Density Based Spatial Clustering of Applications with Noise) algorithm is a well-known data clustering algorithm used for discovering clusters in a spatial data set. The algorithm requires two parameters. The first parameter is eps, which defines the radius of the neighborhood around a point.
It simply means that if the distance between two points is smaller than or equal to eps, these points are considered to be neighbors. The second is minPoints: the minimum number of points needed to form a dense region. For instance, if we define the minPoints parameter as 5, then at least 5 points are required to form a dense region. Based on the parameters Eps and MinPts of each cluster and at least one point from the respective cluster, the algorithm groups together the points that are close to each other [6].
Gradient boosting is a popular machine learning algorithm that combines multiple weak learners, such as trees, into one strong ensemble model. This is done by first fitting a model to the data. The first model is unlikely to fit the data points perfectly, so we are left with residuals. We can then fit another tree to the residuals to minimize a loss function, which can be the squared error (second norm), although gradient boosting allows the use of any differentiable loss function. This can be iterated for multiple steps, which leads to a stronger model, and with proper regularization overfitting can be avoided [7]. Gradient boosting has many parameters that need to be optimized to find the best performing model for a given problem. These include tree-specific parameters, such as size limits for leaf nodes and tree depth. There are also parameters concerning the boosting itself, for example how many models are fitted to obtain the final model and how much each individual tree affects the end result. These parameters are usually optimized with a grid search that iterates through all the possible parameter combinations.
This is usually computationally expensive, since a large number of models have to be fitted and the number of parameter combinations to test increases rapidly as more parameters are introduced.
A self-organizing map (SOM), also known as a Kohonen network, is a type of artificial neural network that is used to produce low-dimensional discretized mappings of an input space [9]. Self-organizing maps produce a grid of nodes arranged in a regular hexagonal or rectangular pattern. Training a SOM works by assigning a model to each of the nodes in the output grid. The models are calculated by the SOM algorithm, and objects are mapped to the output nodes based on which node's model is most similar to the object, or in other words, which node has the smallest distance to the object under a chosen metric. For real-valued objects, the most commonly used distance metric is the Euclidean distance, although in this study the sum of squares was used. For categorical variables, the distance metric used in this study is the Tanimoto distance. The grid nodes' models are more similar to those of nearby nodes than to those located farther away. Since it is the nodes that are being calculated to fit the data, the mapping aims to preserve the topology of the original space. The models are also known as codebook vectors, which is the term used in the R package ‘kohonen’ used to implement the algorithm [10]. The Tanimoto distance metric is defined in the details of the supersom function in the package documentation. In this project, multiple unsupervised self-organizing maps were trained using the demographic variables to produce a two-dimensional mapping serving as a customer segmentation. Different parameters and map sizes were tested to find the optimal mapping that would maximize quality of representation and distance to neighbouring clusters within the map.
The maps were also compared on their ability to produce clusters with varying financial impact and default risk, as measured by the financial model and the default prediction algorithm. The two primary measures used to compare different mappings in this study were the quality (mean distance of objects from the center of their node) and the U-matrix distances (mean distance of nodes to their neighbouring nodes). The name quality is used because of how it appears in the kohonen R package.
Preliminary data analysis
Describing the data
The data consists of 30,000 customers and 26 columns of variables. Each sample corresponds to a single customer. The columns consist of the following variables:
Default (Yes or no) as a binary response variable
Balance limit (Amount of credit in U.S. $)
Sex (Male, Female)
Education (Graduate school, University, High school, Others)
Marital status (Married, Single, Others)
Age (Years)
Employer (Company name)
Location (Latitude, Longitude)
Payment status (last 6 months): indicates payment delay in months or whether payment was made duly
Bill amount (last 6 months)
  • 4. States the amount of the bill statement in U.S. $
Payment amount (last 6 months): amount paid by the customer in U.S. $
The variables Balance limit, Age, Sex, Education, Marital status, Employer, and Location are defined as demographic variables, since they describe the demography of customers and are available for new customers, unlike the historical payment data, which is only available for existing customers. The total proportion of defaults in the data is 22.12%, which is 6,636 out of the 30,000 samples in the data set. This could be due to a large bias and therefore not a realistic representation of the bank's customer base. However, the data was collected during a debt crisis, which provides an argument for assuming that the data represents a non-biased sample of the customer base. In any case, the high number of defaults should be taken into consideration when making generalizations about the results or methodology of this case study. The high number of defaults will especially affect estimates of the bank's financials.
Default: This variable indicates whether or not the customer defaulted on their credit card debt payment. For the purpose of this project, predicting default is the main focus of the data analysis. A value of 1 indicates default, and a value of 0 indicates no default. It is unclear how long after the collection of the data this variable was measured. This means that default could have happened the following month or a longer time thereafter. Since this is unknown, no assumptions are made about the time of default. It is also not clear whether a value of 1 means the client missed only a single payment or several, and whether or not the length of the payment delay was taken into account.
Balance limit: States the amount of given credit in U.S. $. This is the maximum amount a customer can spend with their credit card in a single month. The balance limit depends on the bank's own screening processes and other unknown factors.
Sex: This variable takes a value of 1 for male and 2 for female. In this study, sex and gender are used interchangeably to mean the same thing. It is unknown whether the difference between the two definitions was taken into account when the data was collected.
Education: The education level of a customer is represented as one of four values: 1 = Graduate school, 2 = University, 3 = High school, 4 = Other. For the purpose of analysing customer groups, this is assumed to indicate the highest level of education completed.
Marital status: Referred to as “married” in the analysis, this variable takes three values: 1 = Married, 2 = Single, 3 = Other, such as divorced or widowed.
Age: The age of the customer is stated in years.
Location: This variable is composed of two values for each customer, one for the latitude and one for the longitude. In order to benefit from this data in predictions using only the demographic variables, we applied the DBSCAN algorithm.
Payment status: Represented as 6 different columns, one for each month. The value of payment status for a month indicates whether repayment of credit was delayed or paid duly. A value of -1 indicates payment made duly. Values from 1 to 8 indicate payment delay in months, with a value of 9 defined as a delay of 9 months or more. The data was collected over 6 months, April to September.
Bill amount: The amount of the bill statement in U.S. $ is recorded in this variable. It is represented in the data as 6 columns, one for each month. The data was collected over 6 months, April to September.
Payment amount: The amount of the previous payment in U.S. $, stored in 6 different columns, one for each month, similarly to payment status and bill amount.
The payment amounts correspond to the same months as payment status and bill amount. For example, the payment amount for April indicates the amount paid in April.
Checking data imbalance:
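The original figure for this check is not available in the transcript. As a minimal sketch, the imbalance can be reproduced from the counts quoted earlier (6,636 defaults among 30,000 customers). The label list below is rebuilt from those two numbers only, not from the actual data set, and the variable names are illustrative.

```python
from collections import Counter

# Assumption: labels are reconstructed from the counts stated in the text,
# 1 = default, 0 = no default. The real per-customer data is not used here.
n_total = 30_000
n_default = 6_636
labels = [1] * n_default + [0] * (n_total - n_default)

counts = Counter(labels)
default_rate = counts[1] / len(labels)      # share of defaulting customers
imbalance_ratio = counts[0] / counts[1]     # non-defaulters per defaulter
majority_accuracy = counts[0] / len(labels) # accuracy of always predicting "no default"

print(f"no default: {counts[0]}, default: {counts[1]}")
print(f"default rate: {default_rate:.2%}")
print(f"non-defaulters per defaulter: {imbalance_ratio:.2f}")
print(f"accuracy of a constant 'no default' prediction: {majority_accuracy:.2%}")
```

Because a constant "no default" prediction already scores well on plain accuracy for data this skewed, a ranking metric such as AUC is a more informative way to compare models.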
  • 5. Preliminary data analysis
Features correlation
Using mainstream (LightGBM) algorithm:
Training the dataset:
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.006994 seconds. You can set `force_col_wise=true` to remove the overhead.
Training until validation scores don't improve for 50 rounds
23. train's auc: 0.778238 valid's auc: 0.771173
[100] train's auc: 0.789346 valid's auc: 0.782605
[150] train's auc: 0.794861 valid's auc: 0.784753
Early stopping, best iteration is:
[135] train's auc: 0.793452 valid's auc: 0.785154
Out[62]: 33
Best validation score was obtained for round 135, for which AUC ~= 0.78.
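The training log shows two ideas from the methodology working together: trees are added one round at a time, and training stops once the validation AUC has not improved for a fixed number of rounds. The sketch below illustrates that loop in plain Python. It is not LightGBM and not the authors' code: it assumes one-feature decision stumps, a squared-error loss (LightGBM would normally use a log loss for a binary target), and synthetic data in which default becomes more likely with longer payment delay. All function names and parameter values are hypothetical.

```python
import random


def auc(scores, labels):
    """Chance that a random defaulter is scored above a random non-defaulter (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_stump(x, residuals):
    """Least-squares stump on one feature: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for v, r in zip(x, residuals) if v <= t]
        right = [r for v, r in zip(x, residuals) if v > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boost(x_tr, y_tr, x_va, y_va, max_rounds=200, learning_rate=0.1, patience=20):
    """Fit stumps to residuals; stop when validation AUC has not improved for `patience` rounds."""
    base = sum(y_tr) / len(y_tr)
    pred_tr, pred_va = [base] * len(y_tr), [base] * len(y_va)
    best_auc, best_round = 0.0, 0
    for rnd in range(1, max_rounds + 1):
        residuals = [y - p for y, p in zip(y_tr, pred_tr)]
        t, lv, rv = fit_stump(x_tr, residuals)
        pred_tr = [p + learning_rate * (lv if v <= t else rv) for p, v in zip(pred_tr, x_tr)]
        pred_va = [p + learning_rate * (lv if v <= t else rv) for p, v in zip(pred_va, x_va)]
        score = auc(pred_va, y_va)
        if score > best_auc:
            best_auc, best_round = score, rnd
        elif rnd - best_round >= patience:
            break  # early stopping
    return best_round, best_auc


def make_data(n, rng):
    """Synthetic customers: payment delay in months (0..6) and a default flag."""
    x = [rng.randint(0, 6) for _ in range(n)]
    y = [1 if rng.random() < 0.1 + 0.1 * d else 0 for d in x]
    return x, y


rng = random.Random(0)
x_tr, y_tr = make_data(400, rng)
x_va, y_va = make_data(200, rng)
best_round, best_auc = boost(x_tr, y_tr, x_va, y_va)
print(f"best iteration: {best_round}, valid AUC: {best_auc:.3f}")
```

The `patience` argument plays the role of the 50-round limit in the log above, and `learning_rate` controls how much each individual tree affects the final prediction.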
  • 6. Plotting the variable importance
6. CONCLUSION
The results of the analysis and predictive modelling show that neither directly measuring nor using the predicted proportion of defaults of a customer group is an accurate way to predict default. This is most likely due to multiple reasons. One is the limited accuracy of any machine learning algorithm when the number of variables is small or values are missing. Another is most likely the lack of specificity in customer segments, which mixes actual high-risk customers with low-risk ones. Comparing payment amounts, gender and marital status in the training set and test set also showed large variation. This is most likely due to the high losses that a single customer can produce by defaulting with a high amount of debt. Much of the variation in the data could not be represented, since customer segmentation was only done using the demographic variables. Further analysis should be done in order to fully justify and support business decisions based on the customer segmentation in this study. When it comes to default prediction, we have a model that is able to predict the defaults of customers with high enough certainty that the bank can use it in its operations. Assuming that the bank continues to receive customers like those represented in our dataset, we could implement our model in the bank's preliminary screening process, and it would bring financial gain to the bank. However, our solution is not viable as a standalone system in its current form, since it only considers part of the bank's actions. Many factors that were not covered in this case study should be taken into consideration when taking any business action.
For example, young people could be preferable for the bank since they stay longer as customers, so it could be in the bank's interest to favor having them as customers even if our model would suggest otherwise. Individual customers should not be discriminated against, especially based on the customer segmentation, which relies on calculating averages over a group. A single customer defaulting with high debt can result in much higher losses than might be anticipated simply based on averages. Similarly, the analysis does not go deep enough to justify assuming that the variables used in this study could explain or predict how reliable the customers are in the long run, especially considering that the data was collected during a debt crisis.
7. REFERENCES
[1] Wikipedia, https://www.8051projects.net/files/public/1259220442_20766_FT0_7380969-line-follower-using-at89c51.pdf
[2] Default Credit Card Clients Dataset, https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset/
[3] RandomForestClassifier, http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
[4] ROC-AUC characteristic, https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
[5] AdaBoostClassifier, http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html
[6] CatBoostClassifier, https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier-docpage/
[7] XGBoost Python API Reference, http://xgboost.readthedocs.io/en/latest/python/python_api.html
  • 7. [8] LightGBM Python implementation, https://github.com/Microsoft/LightGBM/tree/master/python-package
[9] LightGBM algorithm, https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf
[10] Chauhan, N., Dhaundiyal, R., & Joshi, K. K-MEANS ON SEARCH ENGINE DATASET THROUGH WEKA. International Journal of Research Fellow for Engineering (IJRFE)–Volume, 4.
[11] Joshi, K., Rawat, S., & Chaudhary, S. ANALYSIS OF DIFFERENT OPTICAL SWITCHING TECHNIQUES IN NOC ROUTER ARCHITECTURE. International Journal of Research Fellow for Engineering (IJRFE)–Volume, 4.
[12] Joshi, K., Chaudhary, S., & Chauhan, N. HYBRID CLUSTERING ALGORITHM USING K-MEANS CLUSTERING ALGORITHM. International Journal of Research Fellow for Engineering (IJRFE)–Volume, 4.
[13] Longkumer, M., Joshi, K. A Comprehensive Study on Recent Botnet. International Journal of Science and Research (IJSR)- Volume, 7.
[14] Joshi, K., Gupta, H., & Lamba, S. An Overview on Image Fusion Concept. Journal of Emerging Technologies and Innovative Research (JETIR)–Volume, 5, 873-879.