1. SmartCube
Data Analytics Case Study
Team: Mafias In Training
Team Members:
Apoorv Parmar
Kushaang Deswal
Hemal Aurora
Mudith
2. Executive Summary
• Identification of NPAs (non-performing assets) starts from the age of the applicant.
• The dataset is first divided into 4 clusters based on the age of the applicant.
– 20-41 years
• Low monthly income
• Fewer dependents
– 42-62 years
• Increased monthly income
• More dependents
– 63-96 years
• Retirement age
• Giving loans not preferred
• Only credit card lines, no real estate loans
– Greater than 96 years
• Loan attached to spouse/children if the dependent's eligibility age is < 63
• Combined with the two further levels (debt ratio and NumberOfTimes90DaysLate), this yields 16 clusters (4 x 2 x 2) into which the data is divided.
• New features added to the dataset:
– Credit Card Lines = Total Open Credit Lines - Total Real Estate Lines
– Utilization = Credit Card Lines * Revolving Utilization of Unsecured Lines
– Per Capita Income of Family = Monthly Income / (Number of Dependents + 1)
• The other major contributing factor is the debt ratio. The second level of clustering is based on debt ratio.
– Debt Ratio >= 0.5
• More credit lines
• Lower monthly income
– Debt Ratio < 0.5
• Fewer credit lines
• Higher monthly income
• The third level of clustering is based on NumberOfTimes90DaysLate.
– If the applicant has been 90 or more days late at least once, the probability of the loan becoming an NPA is high.
– If the applicant has never been 90 or more days late, the probability of the loan becoming an NPA is lower.
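The three derived features listed in this summary can be sketched in Python as follows. This is a minimal illustration rather than the team's actual code; the `Applicant` class and its field names are hypothetical stand-ins for the dataset's columns.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical field names; the real dataset columns may be named differently.
    open_credit_lines: int
    real_estate_lines: int
    revolving_utilization: float  # e.g. 0.5 means 50% of unsecured credit used
    monthly_income: float
    dependents: int


def derived_features(a: Applicant) -> dict:
    """Compute the three features defined in the executive summary."""
    # Credit Card Lines = Total Open Credit Lines - Total Real Estate Lines
    credit_card_lines = a.open_credit_lines - a.real_estate_lines
    # Utilization = Credit Card Lines * Revolving Utilization of Unsecured Lines
    utilization = credit_card_lines * a.revolving_utilization
    # Per Capita Income = Monthly Income / (Number of Dependents + 1)
    per_capita_income = a.monthly_income / (a.dependents + 1)
    return {
        "credit_card_lines": credit_card_lines,
        "utilization": utilization,
        "per_capita_income": per_capita_income,
    }
```

The "+ 1" in the per capita income counts the applicant as a family member, so an applicant with no dependents keeps their full monthly income.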
3. CLUSTERING
The dataset is split by age, then by debt ratio, then by 90+ days past default:

Age        Debt Ratio   +90 days past default   Cluster
[21,41]    <= 0.50      = 0                     Cluster 1
[21,41]    <= 0.50      != 0                    Cluster 2
[21,41]    > 0.50       = 0                     Cluster 3
[21,41]    > 0.50       != 0                    Cluster 4
[42,62]    <= 0.50      = 0                     Cluster 5
[42,62]    <= 0.50      != 0                    Cluster 6
[42,62]    > 0.50       = 0                     Cluster 7
[42,62]    > 0.50       != 0                    Cluster 8
[63,96]    <= 0.50      = 0                     Cluster 9
[63,96]    <= 0.50      != 0                    Cluster 10
[63,96]    > 0.50       = 0                     Cluster 11
[63,96]    > 0.50       != 0                    Cluster 12
[97,109]   <= 0.50      = 0                     Cluster 13
[97,109]   <= 0.50      != 0                    Cluster 14
[97,109]   > 0.50       = 0                     Cluster 15
[97,109]   > 0.50       != 0                    Cluster 16
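The three-level split can be expressed as a small lookup. The sketch below assumes that clusters are numbered 1 to 16 in the order of the leaves of the tree (age band first, then debt ratio, then 90+ days past default); the function name `assign_cluster` and its parameters are hypothetical.

```python
# Age bands used at the first level of the split (inclusive bounds).
AGE_BANDS = [(21, 41), (42, 62), (63, 96), (97, 109)]


def assign_cluster(age: int, debt_ratio: float, times_90_days_late: int) -> int:
    """Return a cluster number from 1 to 16 for one applicant.

    Assumption: clusters are numbered in leaf order of the tree.
    """
    for band_index, (low, high) in enumerate(AGE_BANDS):
        if low <= age <= high:
            break
    else:
        raise ValueError(f"age {age} is outside all age bands")

    debt_index = 0 if debt_ratio <= 0.50 else 1       # second level
    late_index = 0 if times_90_days_late == 0 else 1  # third level
    return band_index * 4 + debt_index * 2 + late_index + 1
```

For example, a 50-year-old applicant with a debt ratio of 0.3 and one 90+ day delinquency falls in the second age band, the low debt ratio branch and the "late" branch, which is Cluster 6 under this numbering.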
4. INSIGHTS
• Age, Debt Ratio and Payment History have a considerable impact on a customer's tendency to become an NPA.
• If borrowers are older than 65 years:
– Independently, they can only avail credit cards as a loan.
– They can obtain a real estate loan only in partnership with a spouse / child of eligibility age < 65 years.
OPINIONS
• Customers older than 65 years should be given a real estate loan only if they have dependents, because of the mortality risk at that age.
• Revolving Utilization must not be allowed to exceed 100%.
5. Technique Used for filling missing data
MICE (Multiple Imputation by Chained Equations) was the principal method used to deal with missing data.
Removal of Outliers
Outliers were removed conservatively, with the Test data kept in consideration. Applying our outlier identification technique to the Test data flagged a significant number of outliers, and removing them could have trimmed the Test data by 15-20%.
Hence, outliers were removed only for the factors listed below.
Basic outliers were cleared on the basis of a box plot for each of the clusters formed.
              Revolving Utilization   Age   DebtRatio   NumberOfTimes90DaysLate
Upper Bound   200%                    63    7000        17
Lower Bound                           20
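A box plot flags points that lie beyond its whiskers. The sketch below shows one common way to compute those limits, assuming the conventional 1.5 x IQR whisker rule; the slides do not state which multiplier or quantile method was used, and the function names are hypothetical.

```python
import statistics


def box_plot_fences(values, k=1.5):
    """Return (lower, upper) whisker limits for a list of numbers.

    Assumption: whiskers sit k * IQR beyond the first and third quartiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the box plot whiskers."""
    lower, upper = box_plot_fences(values, k)
    return [v for v in values if lower <= v <= upper]
```

To follow the per-cluster approach described above, `drop_outliers` would be applied separately to the values of one factor within each cluster rather than to the whole dataset at once.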
6. Exploratory Analysis
Our analysis began with data visualization considering different dimensions.
[Figures: Count of NPA vs Age; Count of NPA on Credit Lines vs Age]
The pattern in these plots resembles a normal distribution, which led us to use Age as a clustering dimension. Similar visualizations were produced for 13 combinations of factors to get a clear picture of the relationships among them.
7. DECISION MAKING FACTORS
Debt Ratio : Normalized form of the Debt Ratio within each cluster.
Formula : Normalized Debt Ratio = Debt Ratio / Max(Debt Ratio in the cluster)
Past Default : Weighted, normalized average of past defaults.
Formula : (0.17 * No. of defaults in 30-59 days) + (0.33 * No. of defaults in 60-89 days) + (0.50 * No. of defaults in 90+ days)
Utilization of Unsecured Credit Lines : Utilization of the open credit lines other than real estate / long-term lines.
Formula : Normalized value of ((No. of open credit lines - No. of Real Estate Loans) * Revolving Utilization of unsecured lines)
Number of Dependents : Normalized value of the number of dependents for a borrower in a cluster.
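Two of these factors can be sketched directly from the formulas. The slides do not say how "normalized" is computed for factors other than Debt Ratio, so dividing by the cluster maximum everywhere is an assumption here, and the function names are hypothetical.

```python
def past_default_score(late_30_59: int, late_60_89: int, late_90_plus: int) -> float:
    """Weighted sum of past defaults, with heavier weight on longer delays."""
    return 0.17 * late_30_59 + 0.33 * late_60_89 + 0.50 * late_90_plus


def normalize_by_cluster_max(values):
    """Divide every value in one cluster by that cluster's maximum.

    This mirrors Debt Ratio / Max(Debt Ratio in the cluster). Applying it to
    the other factors is an assumption, not something the slides confirm.
    """
    peak = max(values)
    if peak == 0:
        # Avoid dividing by zero when every value in the cluster is zero.
        return [0.0 for _ in values]
    return [v / peak for v in values]
```

The weights 0.17, 0.33 and 0.50 sum to 1, so an applicant with exactly one default in each bucket gets a raw past default score of 1.0.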
8. FACTORS IMPACTING CLUSTERS
             Debt Ratio   Past Default   Utilization of unsecured lines   Number of Dependents
Cluster 1    0.25         0.5            0.15                             0.10
Cluster 2    0.25         0.5            0.15                             0.10
Cluster 3    0.25         0.5            0.15                             0.10
Cluster 4    0.25         0.5            0.15                             0.10
Cluster 5    0.25         0.45           0.15                             0.15
Cluster 6    0.25         0.45           0.15                             0.15
Cluster 7    0.25         0.45           0.15                             0.15
Cluster 8    0.25         0.45           0.15                             0.15
Cluster 9    0.2          0.35           0.15                             0.20
Cluster 10   0.2          0.35           0.15                             0.20
Cluster 11   0.2          0.35           0.15                             0.20
Cluster 12   0.2          0.35           0.15                             0.20
Cluster 13   0.2          0.3            0.25                             0.25
Cluster 14   0.2          0.3            0.25                             0.25
Cluster 15   0.2          0.3            0.25                             0.25
Cluster 16   0.2          0.3            0.25                             0.25

Final Score : Weighted sum of all factors for each cluster.
If Final Score > 0.10, the borrower qualifies for NPA; otherwise the borrower does not qualify for NPA. (94.12% accuracy)
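The scoring rule can be sketched as a weighted sum followed by a threshold check. The weights below are copied from the table and the 0.10 cut-off from the Final Score rule; the function names are hypothetical, and the four factor values are assumed to be already normalized as described under Decision Making Factors.

```python
# Weights per group of four clusters, in the order
# (debt ratio, past default, utilization of unsecured lines, dependents).
WEIGHTS_BY_CLUSTER_GROUP = [
    (0.25, 0.50, 0.15, 0.10),  # Clusters 1-4
    (0.25, 0.45, 0.15, 0.15),  # Clusters 5-8
    (0.20, 0.35, 0.15, 0.20),  # Clusters 9-12
    (0.20, 0.30, 0.25, 0.25),  # Clusters 13-16
]
NPA_THRESHOLD = 0.10


def final_score(cluster: int, factors) -> float:
    """Weighted sum of the four normalized factors for one borrower."""
    if not 1 <= cluster <= 16:
        raise ValueError(f"cluster must be between 1 and 16, got {cluster}")
    weights = WEIGHTS_BY_CLUSTER_GROUP[(cluster - 1) // 4]
    return sum(w * f for w, f in zip(weights, factors))


def qualifies_for_npa(cluster: int, factors) -> bool:
    """Apply the rule: Final Score > 0.10 means the borrower qualifies for NPA."""
    return final_score(cluster, factors) > NPA_THRESHOLD
```

For instance, `qualifies_for_npa(1, (1.0, 0.0, 0.0, 0.0))` evaluates the borrower with the highest debt ratio in Cluster 1 and no other risk factors; the score is 0.25, which is above the threshold.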
9. Team Members and CGPA
Apoorv Parmar 6.63
Kushaang Deswal 7.96
Hemal Aurora 7.43
Mudith 7.92