This document presents an analysis of credit risk for a bank. It aims to identify patterns that indicate if a client will have difficulty paying installments. The analysis includes:
- Cleaning and merging loan application and previous loan data
- Analyzing relationships between client attributes and payment difficulties through visualization
- Key insights: strong indicators of default include clients with certain housing types, family statuses, occupations, or lower education levels. Clients with higher incomes, more documents provided, or older ages are less likely to default. Based on these insights, a credit scoring system is proposed to help the bank make lending decisions.
Exploratory Data Analysis For Credit Risk Assessment
1. THE CREDIT RISK ANALYTICS
EDA Case Study By,
• Mr. Prathmesh Pise
• Mr. Vishal Patil
2. CONTENTS
Problem statement
Flow Chart
Importing and Cleaning 1
Importing and Cleaning 2
Approach
Data Visualization
Significant Insights
3. PROBLEM STATEMENT:
1. The aim is to identify patterns which indicate if a client has difficulty paying their installments, which will help the bank in taking the following actions:
• Denying the loan
• Reducing the amount of the loan
• Lending (to risky applicants) at a higher interest rate, etc.
2. Identifying the correlation between the predictor variables and the target variable
3. Ensuring that the consumers capable of repaying the loan are not rejected
5. IMPORTING AND CLEANING 1:
1. Imported the pandas, matplotlib and seaborn libraries for loading the data and data visualization
2. The target variable is a flag indicating whether a client pays installments on time or not
3. Two data frames were created from csv files, namely:
• Application data - contains all the information about the client at the time of application
• Previous application data - contains information about the client's previous loans
4. Dropped unnecessary columns, such as those describing the client's house dimensions
5. Achieved a 40% reduction in memory usage by changing the data types of categorical variables from object to category
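The object-to-category conversion in step 5 can be sketched as follows. This is a minimal illustration, not the deck's actual code: the toy frame, the column names borrowed from the dataset, and the cardinality threshold are all assumptions.

```python
import pandas as pd

def shrink_categoricals(df):
    """Convert low-cardinality object columns to the 'category' dtype
    to cut memory usage (the deck reports ~40% savings on its data)."""
    for col in df.select_dtypes(include="object").columns:
        # Only convert columns with few distinct values relative to row count
        if df[col].nunique() < 0.5 * len(df):
            df[col] = df[col].astype("category")
    return df

# Toy frame standing in for application_data (column names from the deck)
df = pd.DataFrame({
    "NAME_CONTRACT_TYPE": ["Cash loans", "Revolving loans"] * 5000,
    "CODE_GENDER": ["M", "F", "F", "M"] * 2500,
})
before = df.memory_usage(deep=True).sum()
df = shrink_categoricals(df)
after = df.memory_usage(deep=True).sum()
print(f"memory reduced from {before} to {after} bytes")
```

The saving comes from storing each distinct string once and replacing repeats with small integer codes, which is why it pays off only on low-cardinality columns.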
6. IMPORTING AND CLEANING 2:
1. Imported the previous application data set as previous_app
2. Cleaned the data by removing columns that were less significant for the analysis and prone to containing erroneous data, namely:
• WEEKDAY_APPR_PROCESS_START
• HOUR_APPR_PROCESS_START, etc.
3. Achieved a 40% reduction in memory usage by changing the data types of categorical variables from object to category and dropping unnecessary columns
7. HANDLING DATA AND MISSING VALUES:
1. Checked for null values in application_data and found that:
• OWN_CAR_AGE had 65.99%, OCCUPATION_TYPE had 31.35% and EXT_SOURCE_1 had 56.38% missing values
• Hence we decided to drop these columns
2. We also checked for null values in previous_app and found that:
• RATE_INTEREST_PRIMARY had 99.64% null values
• RATE_INTEREST_PRIVILEGED had 99.64% null values
• Hence we dropped them
3. The external source data had some missing values. We imputed them with zero, since the external agencies had not provided a score for these customers, meaning the client's account was not flagged as a likely defaulter; hence the score was assumed to be zero.
4. Took the average of the EXT_SOURCE_1, EXT_SOURCE_2 and EXT_SOURCE_3 columns, creating the ext_sources column.
5. In previous_app, NAME_TYPE_SUITE had 49% missing values and does not affect whether the client will default or not. Hence, we dropped this column.
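The drop-by-null-share and external-score steps above can be sketched like this. The tiny frame, the 65% threshold, and keeping all three EXT_SOURCE columns in one example are assumptions made for illustration; they are not the deck's exact code.

```python
import pandas as pd

def null_percentage(df):
    """Percentage of missing values per column."""
    return (df.isnull().mean() * 100).round(2)

# Tiny illustrative frame; the real application_data has far more rows/columns
app = pd.DataFrame({
    "EXT_SOURCE_1": [0.5, None, 0.7, 0.1],
    "EXT_SOURCE_2": [0.6, 0.4, None, 0.2],
    "EXT_SOURCE_3": [None, 0.3, 0.2, 0.9],
    "OWN_CAR_AGE": [None, None, None, 5.0],
})

# Drop columns whose missing share crosses a chosen threshold (65% here)
sparse = null_percentage(app)
app = app.drop(columns=sparse[sparse > 65].index)

# Impute the remaining external scores with 0 (no score from the agency),
# then average them into a single ext_sources column
ext = ["EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"]
app[ext] = app[ext].fillna(0)
app["ext_sources"] = app[ext].mean(axis=1)
```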
8. 6. Defined a function null_percentage to calculate the percentage of null values in the columns of both data sets.
7. Since the data is imbalanced, we took the proportion of all the categories to analyse the data and used stacked bar plots, as this enhances understanding.
8. Defined a function called stacker: it compares a categorical column with our Target variable, accounts for the data imbalance by converting each category into percentages, and plots a stacked chart of the proportions.
9. Merged the previous_app data set with the application data set to compare it with our Target variable.
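A minimal version of the stacker idea described above, returning the per-category percentages; the toy data and the exact signature are assumptions, not the deck's implementation.

```python
import pandas as pd

def stacker(df, col, target="TARGET", plot=False):
    """Share of each target class (in %) within every category of `col`.
    Normalising within each category handles the class imbalance; with
    plot=True the proportions are drawn as a stacked bar chart
    (requires matplotlib)."""
    prop = pd.crosstab(df[col], df[target], normalize="index") * 100
    if plot:
        prop.plot(kind="bar", stacked=True)
    return prop

# Toy merged frame; TARGET: 1 = payment difficulties, 0 = paid on time
df = pd.DataFrame({
    "CODE_GENDER": ["M", "M", "M", "F", "F", "F", "F", "F"],
    "TARGET":      [1,   1,   0,   0,   0,   0,   1,   0],
})
print(stacker(df, "CODE_GENDER"))
```

Normalising per category (rather than plotting raw counts) is what makes a 92/8 class imbalance readable: every bar sums to 100%, so categories of very different sizes remain comparable.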
9. DATA VISUALIZATION
• Univariate analysis on the following variables:
1. Target
2. Income
3. Children count
• Bi-variate analysis of the Target variable against the following:
1. Gender & age
2. Contract type
3. Average external score
4. Income & occupation type
5. Education type, etc.
• Multi-variate analysis of the Target variable against the following:
1. Income and education type
2. Income and previous application status
10. TARGET V/S GENDER
Inference:
• The percentage of males who pay installments late is higher than that of females.
• The percentage of females paying on time is higher than that of males.
11. TARGET V/S CONTRACT TYPE
Inference:
• Clients with cash loans tend to pay late compared to clients with revolving loans.
12. TARGET V/S CAR
Inference:
• The percentage of clients without a car who pay installments late is slightly higher than that of clients with a car.
13. TARGET V/S AVG_EXT_SCORE
Inference:
• About 50% of the clients who delay their installment payments have a low average external score, ranging from roughly 0.2-0.4.
• Clients who pay their installments on time have a moderate average score, ranging from roughly 0.3-0.5.
• Some clients who received a very high score still delay their installments.
14. TARGET V/S AMT INCOME
Inference:
• Among these income classes, clients with income below 2 lakhs p.a. are the most likely to pay installments late.
• Clients with income above 6 lakhs p.a., i.e. the rich class, are more likely to pay on time than the other classes.
15. TARGET V/S INCOME TYPE
Inference:
• Amongst all the income types, the Others category (clients on maternity leave, students, unemployed clients, etc.) tends to pay installments late.
• Clients with the Businessman income type do not pay installments late.
• The working class also has a higher percentage of late payers, at 10%.
16. TARGET V/S FAMILY STATUS
Inference:
• Clients who are single/not married and those in the civil marriage category tend to pay installments late.
17. TARGET V/S HOUSING TYPE
Inference:
• Clients who live in rented apartments or with their parents tend to pay installments late.
• Clients who stay in office apartments pay their installments on time.
18. TARGET V/S DOCUMENT 2
Inference:
• Clients who do not provide Document 2 tend to pay installments late. Hence it is advisable to make this document mandatory.
19. TARGET V/S CLIENTS PROVIDING MOBILE NUMBERS
Inference:
• Clients who provide a mobile number tend to pay installments on time.
• Hence it is advisable to collect the mobile numbers of clients.
20. TARGET V/S AGE
Inference:
• Clients aged below 25 tend to pay installments late.
• Clients aged 65 and above pay their installments on time.
• The possible reason is that clients below age 25 are less financially stable compared to those above 65.
21. TARGET V/S OCCUPATION TYPE
Inference:
• Low-skill laborers, waiters/barmen staff, security staff, cooking staff, cleaning staff, drivers and laborers tend to pay installments late.
• Most accountants, high-skill tech staff and HR staff pay their installments on time.
• The likely reason is that these occupations represent sectors with higher salaries.
22. TARGET V/S CNT_CHILDREN
Inference:
• Clients with more than 5 children tend to pay installments late.
• Most clients with 2 or 3 children pay installments on time.
23. TARGET V/S NAME_EDUCATION_TYPE
Inference:
• The clients with academic degree pay installments on time.
• The clients with lower secondary education pay late installments.
24. MULTIVARIATE ANALYSIS ON NUMERIC VARIABLES
Inference:
• A high positive correlation is seen between goods price and credit amount
• A high positive correlation is seen between annuity amount and credit amount
• A high positive correlation is seen between annuity amount and goods price
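The pairwise correlations above can be reproduced with a plain `.corr()` call; the numbers below are invented for illustration (the real analysis uses the full application_data columns), but they show the same strongly linear relationship the slide reports.

```python
import pandas as pd

# Illustrative values only; column names follow the application_data schema
nums = pd.DataFrame({
    "AMT_CREDIT":      [100_000, 200_000, 300_000, 400_000, 500_000],
    "AMT_ANNUITY":     [9_800, 20_500, 29_700, 41_000, 50_200],
    "AMT_GOODS_PRICE": [95_000, 180_000, 290_000, 385_000, 480_000],
})

# Pearson correlation matrix between the three loan-amount variables
corr = nums.corr()
# seaborn's sns.heatmap(corr, annot=True) would render this as a heatmap
print(corr.round(2))
```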
25. PROPORTIONS OF CLIENTS BASED ON PREVIOUS APPLICATION STATUS
Inference:
• Out of the total previous loan applications, only 63% were approved.
• 17% were refused and 19% were cancelled by the clients.
26. HANDLING OUTLIERS
Inference:
• Outliers were observed in the annual income variable.
• 99% of clients had an income below 4.75 LPA.
• Hence, the analysis of annual income was limited to clients with annual income below 4.75 LPA.
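Restricting the analysis below the 99th percentile can be sketched as follows; the synthetic income values are an assumption chosen only to demonstrate the cap.

```python
import pandas as pd

# Synthetic incomes (in lakhs per annum) with one extreme outlier
income = pd.Series([1.2, 1.8, 2.5, 3.0, 4.2] * 20 + [120.0])

cap = income.quantile(0.99)    # 99th-percentile income
kept = income[income <= cap]   # restrict the analysis to clients below the cap
print(f"cap = {cap:.2f} LPA, dropped {len(income) - len(kept)} outlier(s)")
```

Capping by percentile rather than a fixed value makes the rule robust: it removes the extreme tail without having to hand-pick a threshold per column.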
27. TARGET V/S INCOME V/S EDUCATION TYPE
Inference:
• Clients with an academic degree and income in the range of 3-3.6 lakhs pay installments late more often than those with lower incomes.
28. TARGET V/S NAME_CASH_LOAN_PURPOSE
Inference:
• Clients who previously took a loan to make payments on other loans pay installments late.
• They are followed by clients with Home/Office/Land loans and personal household expenses, who also pay installments late.
29. TARGET V/S INCOME V/S PREVIOUS APPLICATION STATUS
Inference:
• Clients who took a loan for Business Development and have an annual income above 2.6 LPA pay installments late.
30. TARGET V/S PREVIOUS LOAN STATUS
Inference:
• Clients whose previous loan application was refused tend to pay their installments late.
31. KEY INSIGHTS
• The following are strong indicators of default:
1. NAME_HOUSING_TYPE : Clients living in rented apartments
2. NAME_FAMILY_STATUS : Clients in a civil marriage and those who are single/not married
3. NAME_INCOME_TYPE : Clients on maternity leave, students and unemployed clients
4. FLAG_DOCUMENT_2 : Clients who do not provide Document 2
5. FLAG_MOBIL : Clients who do not provide a mobile number
6. OCCUPATION_TYPE : Low-skill laborers, waiters, barmen and security staff
7. CNT_CHILDREN : Positive correlation between the number of children and the chance of a client being a defaulter
8. NAME_EDUCATION_TYPE : Clients with lower secondary, secondary/secondary special and incomplete higher education
9. EDUCATION_TYPE : Clients with an academic degree and annual income between 3-3.6 lakhs
10. CASH_LOAN_PURPOSE : Clients whose previous loan purpose was payment on other loans
• The following clients should be targeted:
1. CODE_GENDER : Females
2. NAME_CONTRACT_TYPE : Clients with revolving loans
3. FLAG_CODE_CAR : Clients with a car
4. AVG_EXT_SCORE : Clients with a moderate external score
5. AMT_INCOME_TOTAL : Clients with annual income greater than 6 lakhs
6. NAME_INCOME_TYPE : Businessmen and pensioners
7. FLAG_MOBIL : Clients who provide a mobile number
8. DAYS_BIRTH : Clients aged 65 and above
9. OCCUPATION_TYPE : Accountants, high-skill tech staff and HR staff, who pay installments on time
10. NAME_EDUCATION_TYPE : Clients with an academic degree
32. CONCLUSION
• Based on the inferences obtained, a credit score can be set
• Variables that contribute towards the chances of a client being a defaulter will be rated with a low score
• Variables that contribute towards the chances of a client paying the installments on time will be rated with a high credit score
• Based on the final credit score, the bank can take the following decisions:
1. Grant the loan to clients with a healthy overall credit score
2. Grant the loan at higher interest rates to clients with comparatively low credit scores
3. Reject the loan for clients with an extremely low credit score
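The scoring scheme proposed in the conclusion can be sketched as a toy rule-based scorer. All weights, the base score, and the decision thresholds below are hypothetical: the deck proposes the idea without fixing any numbers.

```python
# Hypothetical weights per (column, value) pair, loosely following the
# key-insight lists: default indicators score negative, good signs positive.
WEIGHTS = {
    ("NAME_HOUSING_TYPE", "Rented apartment"): -2,
    ("FLAG_DOCUMENT_2", 0): -2,                      # Document 2 not provided
    ("FLAG_MOBIL", 1): +1,                           # mobile number provided
    ("NAME_EDUCATION_TYPE", "Academic degree"): +2,
    ("NAME_CONTRACT_TYPE", "Revolving loans"): +1,
}

def credit_score(client, base=10):
    """Start from a base score and add the weight of every matching attribute."""
    return base + sum(w for (col, val), w in WEIGHTS.items()
                      if client.get(col) == val)

def decision(score, grant_at=10, high_rate_at=8):
    """Map a score onto the three decisions above (thresholds are assumptions)."""
    if score >= grant_at:
        return "grant"
    if score >= high_rate_at:
        return "grant at higher interest rate"
    return "reject"

risky = {"NAME_HOUSING_TYPE": "Rented apartment", "FLAG_DOCUMENT_2": 0}
safe = {"NAME_EDUCATION_TYPE": "Academic degree", "FLAG_MOBIL": 1}
print(decision(credit_score(risky)), "/", decision(credit_score(safe)))
```

A production scorer would learn these weights from the data (e.g. via logistic regression) rather than assigning them by hand; the sketch only shows how per-variable ratings roll up into the three lending decisions.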