2. BUSINESS OBJECTIVE
To approve the loan applications of the clients who are capable of
repaying the loans.
In other words, the company wants to understand the driving
factors (or driver variables) behind loan default, i.e. the variables
which are strong indicators of default. The company can utilise this
knowledge for its portfolio and risk assessment.
EDA on the available data to understand how customer
attributes and loan attributes influence the tendency of
default.
Identifying the patterns which indicate if a client has
difficulty paying their installments, to take different actions
like denying the loan, reducing loan amount, lending at
higher interest rates to risky clients.
3. TYPES OF DECISION ON LOAN APPLICATIONS
Approved: The company has approved the loan application.
Cancelled: The client cancelled the application sometime
during approval. Either the client changed their mind
about the loan or, in some cases, the client was assessed as
higher risk and received worse pricing, which they did not want.
Refused: The company rejected the loan (because the
client does not meet its requirements, etc.).
Unused offer: The loan was cancelled by the client, but at
a different stage of the process.
4. Risks Associated with decision
If the applicant is not likely to repay the
loan, i.e. he/she is likely to default,
then approving the loan may lead to a
financial loss for the company.
If the applicant is likely to repay the
loan, then not approving the loan
results in a loss of business to the
company.
5. Business understanding
Loan providing companies find it hard
to give loans to people with
insufficient or non-existent credit history.
Because of that, some consumers use it
to their advantage by becoming
defaulters.
EDA is used to analyse the patterns
present in the data to ensure that the
applicants capable of repaying the loan
are not rejected and those who are
unlikely to pay are not approved.
7. Results of analysis – Significant CATEGORICAL Variables – For rejecting or approving the client
Based on the EDA so far and the results discussed above, these are the categorical variables to be considered when making the decision on a new client.
Application Data Variables
NAME_CONTRACT_TYPE
NAME_EDUCATION_TYPE
NAME_HOUSING_TYPE
NAME_TYPE_SUITE
FAMILY_STATUS
OCCUPATION_TYPE
NAME_INCOME_TYPE
FLAG_OWN_REALTY
FLAG_OWN_CAR
Merged data variables: The previous data also has a lot of influence on the decision. Although this is not a complete analysis, these variables are more significant than others.
NAME_PORTFOLIO
NAME_TYPE_SUITE_PREV
NAME_CLIENT_TYPE
NAME_PRODUCT_TYPE
NAME_SELLER_INDUSTRY
NAME_INCOME_TYPE
The insights and results of the analysis of each variable are mentioned along with the plots in the next pages.
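The merged data variables above carry a _PREV suffix where a column exists in both datasets. A minimal sketch of how such a merge could be produced with pandas is shown here. The key column SK_ID_CURR, the inner join and the toy values are assumptions made for illustration; the slides do not state how the merge was done.

```python
import pandas as pd

# Toy frames, invented for illustration only.
# SK_ID_CURR as the join key is an assumption, not stated in the slides.
app = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "TARGET": [0, 1, 0],
    "AMT_CREDIT": [1000.0, 2000.0, 1500.0],
})
prev = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "AMT_CREDIT": [500.0, 700.0, 900.0],
    "NAME_PORTFOLIO": ["POS", "Cards", "POS"],
})

# Columns present in both frames keep their name on the application side
# and receive the _PREV suffix on the previous-application side.
merged = app.merge(prev, on="SK_ID_CURR", how="inner", suffixes=("", "_PREV"))
print(merged)
```

An applicant with several previous applications appears once per previous application in the merged frame, so counts taken on the merged data are counts of previous applications rather than of applicants.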
8. Univariate Analysis Segmented over TARGET variable: 0 for no payment difficulties and 1 for
defaults in:
-Application Data
-Application and Previous Merged Data
[Plots: Imbalance of Data for TARGET variable 0 and 1 in Application Data; Imbalance of Data for TARGET variable 0 and 1 in Previous Data]
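The imbalance of the TARGET variable can be quantified before any segmented analysis. The sketch below is one possible way to do it; the helper name target_imbalance and the toy data are hypothetical and are not taken from the original analysis.

```python
import pandas as pd

def target_imbalance(df, target_col="TARGET"):
    """Return class counts, percentage shares and the majority/minority ratio.

    Hypothetical helper for illustration; not part of the original analysis.
    """
    counts = df[target_col].value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    ratio = round(float(counts.max() / counts.min()), 2)
    return {"counts": counts.to_dict(), "percent": percent.to_dict(), "ratio": ratio}

# Toy data, invented for illustration only: 9 repayers and 1 defaulter.
toy = pd.DataFrame({"TARGET": [0] * 9 + [1]})
result = target_imbalance(toy)
print(result)
```

A large ratio means that raw counts of defaulters per category are dominated by the size of the category, which is why percentages are more informative than counts in the slides that follow.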
9. Univariate Analysis Segmented over TARGET 0 AND 1
CONTRACT TYPE: Consumer, Cash and Revolving Loans
Application Data: PERCENTAGE OF DEFAULTERS (TARGET=1)
IN CASH LOANS – 8.35%
IN REVOLVING LOANS – 5.48%
Contract Type of previous applications:
PERCENTAGE OF DEFAULTERS (TARGET=1)
IN CASH LOANS – 9.12%
IN REVOLVING LOANS – 10.46%
IN CONSUMER LOANS – 7.70%
On merging the two datasets:
Revolving Loans are slightly better in terms of the number of
defaulters turning up
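Percentages of defaulters per category, like the ones listed for contract type, can be obtained as the mean of TARGET within each group. This is a minimal sketch; the helper name default_rate_by_category is hypothetical, the toy rows are invented, and the category labels "Cash loans" and "Revolving loans" are assumed spellings.

```python
import pandas as pd

def default_rate_by_category(df, category_col, target_col="TARGET"):
    """Percentage of rows with TARGET == 1 within each category.

    Hypothetical helper for illustration; relies on TARGET being coded 0/1,
    so the group mean equals the share of defaulters.
    """
    rate = df.groupby(category_col)[target_col].mean() * 100
    return rate.round(2).sort_values(ascending=False)

# Toy data, invented for illustration only (not the figures from the slides).
toy = pd.DataFrame({
    "NAME_CONTRACT_TYPE": ["Cash loans"] * 4 + ["Revolving loans"] * 5,
    "TARGET": [1, 0, 0, 0, 1, 0, 0, 0, 0],
})
rates = default_rate_by_category(toy, "NAME_CONTRACT_TYPE")
print(rates)
```

The same function can be applied to any of the categorical variables listed earlier by changing category_col.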
10. Univariate Analysis Segmented over TARGET 0 AND 1
[Plots: Applicant Owns Car or not; Applicant Owns Realty or not]
RESULTS OF ANALYSIS:
FLAG_OWN_CAR:
There are more applicants
who do not have a car among
the TARGET 1 applicants
who have payment
difficulties
FLAG_OWN_REALTY:
Applicants who own
realty:
Not a major difference in
ratio
11. Univariate Analysis Segmented over Target 0 and 1
[Plots: Applicant’s Suite Type; Applicant’s Family Status]
RESULTS OF ANALYSIS:
Type_Suite:
Unaccompanied applicants show
more defaulters
Family_Status:
Married applicants, on the other
hand, show fewer
defaults
12. Univariate Analysis Segmented over Target 0 and 1
[Plots: Applicant’s Housing Type; Applicant’s Education Type]
RESULTS OF ANALYSIS:
Applicant’s Housing Type:
Those who own a house
are less likely to turn into defaulters
Those who live with
parents are at more risk
of being defaulters
Applicant’s Education Type:
Applicants with secondary education are
at higher risk of turning
into defaulters as per
current data
Higher education
applicants could be
potentially better
repayers
13. Univariate Analysis Segmented over Target 0 and 1
[Plot: Applicant’s Occupation Type]
RESULTS OF ANALYSIS:
Laborers, Sales Staff, Core
Staff, Managers and Drivers account for the
highest number of applicants.
Laborers turn out to be even
higher among defaulters as well.
Sales Staff are also higher in count of
defaulters.
Managers, on the other hand, are
slightly lower in count among
defaulters than among repayers.
According to this information,
Managers can be preferred
over Laborers and Sales Staff, but
this needs to be further analysed
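One way to take the occupation comparison further is to put applicant counts and default rates side by side, since a category with many applicants will also show many defaulters regardless of its risk. The sketch below assumes a hypothetical helper count_and_rate and invented toy rows; it is not the analysis behind the slides.

```python
import pandas as pd

def count_and_rate(df, category_col, target_col="TARGET"):
    """Applicants, defaulters and default rate (%) per category.

    Hypothetical helper for illustration; assumes TARGET is coded 0/1.
    """
    summary = df.groupby(category_col)[target_col].agg(
        applicants="count", defaulters="sum"
    )
    summary["default_rate_pct"] = (
        summary["defaulters"] / summary["applicants"] * 100
    ).round(2)
    return summary.sort_values("applicants", ascending=False)

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "OCCUPATION_TYPE": ["Laborers"] * 6 + ["Managers"] * 4,
    "TARGET": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})
summary = count_and_rate(toy, "OCCUPATION_TYPE")
print(summary)
```

Sorting by applicants keeps the largest categories on top, while default_rate_pct shows whether a high defaulter count reflects higher risk or only a larger group.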
14. Univariate Analysis Segmented over Target 0 and 1
[Plots: Applicant’s Loan Application Process Start Day; Applicant’s Gender]
RESULTS OF ANALYSIS:
CODE_GENDER:
We cannot say much about gender
because it shows almost no bias
in the segmentation on
TARGET
WEEKDAY_APPR_PROCESS_START:
Not significant, but slightly fewer
defaulters on TUESDAY
16. Results of analysis – previous and current application merged data
NAME_PORTFOLIO:
POS seems to be better in
terms of repayers,
while XNA and CARDS lean
more towards defaults
NAME_TYPE_SUITE_PREV: In
the previous data, the same
pattern is seen:
Unaccompanied has more
defaulters
17. Results of analysis – previous and current application merged data
NAME_CLIENT_TYPE:
Compared to the Repeater and
New client types, the
Refreshed client type is a
better type that repays, but
only a slight difference is
observed
NAME_PRODUCT_TYPE:
The walk-in type is observed to
turn into defaulters more often
compared to XNA and x-sell,
which are more of the repaying
type
18. Results of analysis – previous and current application merged data
NAME_SELLER_INDUSTRY:
Consumer electronics is better
at repaying,
while XNA is a major defaulter
category (XNA is the
unknown industry here)
NAME_INCOME_TYPE:
Working: This is the riskiest
category, as defaulter turn-up is
highest
Commercial Assoc.: slightly
lower default turn-up
Pensioner: lower defaults
observed and hence could be
taken into consideration
State servant: this type has
fewer defaults
Unemployed:
Student: negligible number of
applicants but no defaulters.
Could be a good source of
business for the company, and
also safe clients
Businessman: safe clients as
no defaulters, but negligible
applicants.
19. Insights from the Correlation
Significant NUMERIC Variables – For rejecting or approving the client
Based on the EDA so far and the results discussed above, these are the numeric variables which have high correlation in the Application Data:
• AMT_INCOME_TOTAL
• AMT_CREDIT
• AMT_ANNUITY
• AMT_GOODS_PRICE
In the previous and current application analysis, the following variables are found to be highly correlated:
• AMT_ANNUITY_PREV
• AMT_APPLICATION
• AMT_CREDIT_PREV
• AMT_DOWN_PAYMENT
• AMT_GOODS_PRICE_PREV
• CNT_PAYMENT
The results of the analysis of each of these variables are mentioned along
with the plots in the next pages.
24. Insights from correlation
Top 3 correlations of appdf TARGET 0:
• 1. AMT_GOODS_PRICE and AMT_CREDIT: 0.912
• 2. AMT_ANNUITY and AMT_CREDIT: 0.643
• 3. AMT_ANNUITY and AMT_INCOME_TOTAL: 0.345
Top 3 correlations of appdf TARGET 1:
• 1. AMT_GOODS_PRICE and AMT_CREDIT: 0.890
• 2. AMT_ANNUITY and AMT_CREDIT: 0.621
• 3. AMT_ANNUITY and AMT_INCOME_TOTAL: 0.305
Top 3 correlations of prev_app_df TARGET 0:
• 1. AMT_APPLICATION and AMT_GOODS_PRICE_PREV: 0.999
• 2. AMT_GOODS_PRICE_PREV and AMT_CREDIT_PREV: 0.932
• 3. AMT_CREDIT_PREV and AMT_APPLICATION: 0.878
Top 3 correlations of prev_app_df TARGET 1:
• 1. AMT_APPLICATION and AMT_GOODS_PRICE_PREV: 0.999
• 2. AMT_GOODS_PRICE_PREV and AMT_CREDIT_PREV: 0.932
• 3. AMT_CREDIT_PREV and AMT_APPLICATION: 0.889
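Rankings like the ones above can be produced by computing a correlation matrix per TARGET segment and keeping each pair of variables once. The following is a sketch under stated assumptions: the helper top_correlations is hypothetical, it ranks by absolute Pearson correlation (the slides do not say whether absolute values were used), and the toy numbers are invented.

```python
import numpy as np
import pandas as pd

def top_correlations(df, columns, n=3):
    """Return the n strongest pairwise correlations among the given columns.

    Hypothetical helper for illustration. Ranking by absolute Pearson
    correlation is an assumption, not something the slides confirm.
    """
    corr = df[columns].corr().abs()
    # Keep only the upper triangle so each pair appears once and the
    # diagonal (self-correlation) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "TARGET": [0, 0, 0, 0, 0, 1, 1, 1],
    "AMT_CREDIT": [1, 2, 3, 4, 5, 1, 2, 3],
    "AMT_GOODS_PRICE": [2, 4, 6, 8, 10, 3, 2, 1],
    "AMT_ANNUITY": [1, 3, 2, 5, 4, 1, 2, 4],
})
numeric_cols = ["AMT_CREDIT", "AMT_GOODS_PRICE", "AMT_ANNUITY"]

# One ranking per TARGET segment (0 = no payment difficulties, 1 = default).
top_by_target = {
    target: top_correlations(segment, numeric_cols)
    for target, segment in toy.groupby("TARGET")
}
for target, top in top_by_target.items():
    print(f"TARGET {target}")
    print(top)
```

Comparing the two rankings shows whether the relationships between the amount variables differ between repayers and defaulters; in the figures listed above they are nearly identical across segments.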