Aditi Wadhawan | Chinmayee Mohapatra | Malvika Elango
Manasa Damera | Vatsal Randhar
AN EXPLORATORY ANALYSIS
AGENDA
1. PROJECT GOAL
A brief description of what we aim to achieve in this project
2. OVERVIEW OF DATA
Details regarding the dataset, including source, size, and types of variables
3. DATA QUALITY ISSUES
Explanation of the data quality issues we faced, including missing data, outliers, and the presence of insignificant variables
4. DATA CLEANING
Steps taken to correct each of the above issues and produce a clean dataset for further analysis
5. DATA INSIGHTS
The findings of the analysis, presented through simple and effective plots
DATA SOURCE
● Lending Club is a US peer-to-peer lending company
● It connects borrowers with investors through an online marketplace
Source: https://www.lendingclub.com/info/download-data.action
PROJECT GOAL
1. Explore the different variables explaining the attributes related to loans and customers
2. Identify factors that are important to predict customer default
3. Visualize findings in a simple and effective manner
OVERVIEW OF THE DATA
DATA DESCRIPTION
421,095 observations
143 variables
225 MB
Numeric: 110
Examples:
● Self-reported annual income
● FICO range - high and low
● Loan amount
● Interest rate of loan
Categorical: 33
Examples:
● Loan description provided by the borrower
● Job title
● Home ownership
● Loan review status - approved/not approved
DATA QUALITY ISSUES & DATA CLEANING
DATA QUALITY ISSUES
● Missing Data
● Outliers
● Invalid Data
● Too Many Categories / One Category
MISSING DATA
Variables with more than 50% missing data were removed
● 64 variables
Variable | Percent Missing (%)
Loan description by the borrower | 100
Months since the last public record | 82
Months since most recent 90-day or worse rating | 71
The combined self-reported annual income provided by the co-borrowers | 100
Co-borrowers' joint income was verified by LC, not verified, or if the income source was verified | 100
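The removal rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the team's actual code: the rows are made-up toy data, the column names are illustrative assumptions, and `None` stands in for a missing value.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"loan_amnt": 10000, "desc": None, "mths_since_last_record": None, "dti": 18.2},
    {"loan_amnt": 5000, "desc": None, "mths_since_last_record": 60, "dti": None},
    {"loan_amnt": 12000, "desc": None, "mths_since_last_record": None, "dti": 9.5},
    {"loan_amnt": 8000, "desc": None, "mths_since_last_record": None, "dti": 22.1},
]


def percent_missing(rows):
    """Return {column: percentage of rows where the value is None}."""
    n = len(rows)
    return {c: 100.0 * sum(r[c] is None for r in rows) / n for c in rows[0]}


def drop_sparse_columns(rows, threshold=50.0):
    """Drop every column whose missing percentage is above the threshold."""
    pct = percent_missing(rows)
    keep = [c for c in pct if pct[c] <= threshold]
    return [{c: r[c] for c in keep} for r in rows]


missing = percent_missing(rows)
cleaned = drop_sparse_columns(rows)
```

Here only the two columns with at most 50% missing values survive; the 50% cutoff is the one stated on the slide.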
MISSING DATA
Variables with less than 50% missing data were imputed with the median (low impact) or the min/max (to penalize high/low values)
● 18 total columns
○ 10 imputed
○ 8 - rows with missing values removed
Variable | Percent Missing (%) | Imputed By
DTI | 0.04 | Median
Months since last delinquency | 48 | Maximum
Number of Revolving Accounts | 0.2 | Median
Number of Current Delinquent Accounts | 4 | Minimum
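A minimal sketch of the three imputation strategies named on this slide, using only the Python standard library. The values and variable names are hypothetical, and the helper `impute` is an illustration rather than the code used in the project.

```python
from statistics import median


def impute(values, strategy):
    """Fill None entries with the median, minimum, or maximum of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"median": median, "min": min, "max": max}[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical columns with gaps.
dti = [18.2, None, 9.5, 22.1]
mths_since_last_delinq = [12, None, 40, None, 3]

dti_filled = impute(dti, "median")                    # low-impact fill
delinq_filled = impute(mths_since_last_delinq, "max")  # fill with the largest observed value
```

The choice of strategy per column follows the table above; which direction counts as a penalty depends on the meaning of each variable.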
INVALID DATA
● The variable total_rec_late_fee had invalid values
● 13 negative values were identified
● These values were replaced with 0, the mode of this variable
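The repair described above could look like the following sketch. The fee values are invented for illustration; the only details taken from the slide are that negative late fees are invalid and that they are replaced by the mode, which is 0.

```python
from statistics import mode

# Hypothetical late-fee values; negatives are invalid for a fee.
late_fees = [0.0, 0.0, 15.0, -3.2, 0.0, 30.0, -0.5]

# Compute the mode over valid entries only, then overwrite the invalid ones.
valid = [v for v in late_fees if v >= 0]
fill_value = mode(valid)
repaired = [fill_value if v < 0 else v for v in late_fees]
```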
OUTLIERS
• Outliers were replaced with the median value.
• The graphs on this slide illustrate the outlier treatment for the variable “Total revolving high credit/credit limit”.
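The slides do not say how outliers were detected, so the sketch below assumes the common 1.5 × IQR rule purely for illustration; only the replacement with the median comes from the deck. The data values are hypothetical.

```python
from statistics import median, quantiles


def replace_outliers_with_median(values, k=1.5):
    """Replace points outside [Q1 - k*IQR, Q3 + k*IQR] with the column median.

    The IQR rule is an assumption; the deck only states that outliers
    were replaced with the median.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    mid = median(values)
    return [mid if (v < low or v > high) else v for v in values]


# Hypothetical credit-limit values (in thousands) with one extreme point.
credit_limit = [10, 12, 11, 13, 12, 14, 11, 500]
treated = replace_outliers_with_median(credit_limit)
```

Only the extreme value is replaced; the remaining points are left unchanged.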
OUTLIERS
Variable: The upper boundary range the borrower’s last FICO pulled belongs to.
Plot: after replacing outliers with the median value
TOO MANY CATEGORIES/ONE CATEGORY
• Publicly available policy code had only 1 category
• Zip codes included only the first three digits and did not add any value
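One way to flag such columns is to count distinct values per column. The sketch below is an illustration on made-up rows; the `max_categories` cutoff is a hypothetical parameter, since the deck does not state a threshold.

```python
def uninformative_columns(rows, max_categories):
    """Return columns with a single distinct value or more than max_categories values."""
    flagged = []
    for column in rows[0]:
        distinct = {r[column] for r in rows}
        if len(distinct) == 1 or len(distinct) > max_categories:
            flagged.append(column)
    return flagged


# Hypothetical records: one constant column, one high-cardinality column.
rows = [
    {"policy_code": 1, "zip_code": "940xx", "home_ownership": "RENT"},
    {"policy_code": 1, "zip_code": "100xx", "home_ownership": "OWN"},
    {"policy_code": 1, "zip_code": "606xx", "home_ownership": "MORTGAGE"},
    {"policy_code": 1, "zip_code": "331xx", "home_ownership": "RENT"},
    {"policy_code": 1, "zip_code": "750xx", "home_ownership": "OWN"},
]

to_drop = uninformative_columns(rows, max_categories=3)
```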
DATA INSIGHTS
LOANS ISSUED BY REGION
● July and October had the highest amount of loans issued
● The Southeast and Northeast regions had the highest amount of loans issued
● The Southwest region had the lowest amount of loans issued
GEOGRAPHIC DISTRIBUTION OF LENDING CLUB ISSUED LOANS
States labeled on the map: California, Texas, Florida, New York
CORRELATION OF IMPORTANT VARIABLES
FICO score, total payment, and total recovery on principal amount are good indicators of the probability of default.
A logistic regression classification model could be built using the above customer characteristics.
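As a rough sketch of what such a model involves, the following fits a one-feature logistic regression by batch gradient descent in plain Python. Everything here is an assumption made for illustration: the FICO scores and default labels are invented, only one of the three indicators is used, and the learning rate and epoch count are arbitrary. A real model would use a library and all relevant features.

```python
import math
from statistics import mean, pstdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(default) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical data: FICO score and whether the loan defaulted (1) or not (0).
fico = [640, 660, 680, 700, 720, 740, 760, 780]
defaulted = [1, 1, 0, 1, 0, 0, 0, 0]

# Standardize the feature so gradient descent behaves well.
mu, sigma = mean(fico), pstdev(fico)
xs = [(f - mu) / sigma for f in fico]
w, b = fit_logistic(xs, defaulted)


def predict(score):
    """Estimated probability of default for a given FICO score."""
    return sigmoid(w * (score - mu) / sigma + b)
```

On this toy data the fitted weight is negative, so the estimated default probability falls as the FICO score rises.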
GOOD INDICATORS OF DEFAULT
● In general, the default percentages are directly proportional to the interest rates
● In general, the default percentages are inversely proportional to the last payment amount
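A trend like the first one can be checked by bucketing loans on interest rate and computing the default percentage per bucket. The sketch below uses invented loans and a hypothetical bucket width of 5 percentage points; it shows the calculation, not the deck's actual figures.

```python
from collections import defaultdict

# Hypothetical loans: (interest rate in %, defaulted flag).
loans = [
    (6.5, 0), (7.2, 0),
    (11.0, 0), (12.5, 1), (13.1, 0),
    (18.0, 1), (19.5, 0),
    (21.0, 1), (22.4, 1),
]


def default_rate_by_bucket(loans, width=5.0):
    """Return {bucket start: default percentage} for fixed-width rate buckets."""
    counts = defaultdict(lambda: [0, 0])  # bucket start -> [defaults, total]
    for rate, flag in loans:
        start = int(rate // width) * width
        counts[start][0] += flag
        counts[start][1] += 1
    return {start: 100.0 * d / t for start, (d, t) in sorted(counts.items())}


rates = default_rate_by_bucket(loans)
```

The same function applies to the last payment amount by swapping the first element of each tuple and choosing a suitable bucket width.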
INCOME CHARACTERISTICS
● Borrowers with high income (i.e. >$200,000) took out larger loan amounts than borrowers with low and medium incomes
● Borrowers with low income (i.e. <$100,000) generally had lower median employment years (5 years) than borrowers with medium and high income (7 years)
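The income grouping can be sketched as follows. The low and high thresholds come from the slide; treating everything in between as the medium band is an inference, and the borrower records are invented for illustration.

```python
from collections import defaultdict
from statistics import median


def income_band(annual_income):
    """Band thresholds from the slide; 'medium' is inferred as the range in between."""
    if annual_income < 100_000:
        return "low"
    if annual_income > 200_000:
        return "high"
    return "medium"


# Hypothetical borrowers: (annual income in USD, employment length in years).
borrowers = [
    (45_000, 3), (80_000, 5), (95_000, 7),
    (120_000, 6), (150_000, 8),
    (250_000, 7), (300_000, 9),
]

years_by_band = defaultdict(list)
for income, years in borrowers:
    years_by_band[income_band(income)].append(years)

median_years = {band: median(years) for band, years in years_by_band.items()}
```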
INCOME CHARACTERISTICS
● Borrowers have similar FICO score distributions irrespective of their income, implying fairness of the score
● Borrowers with low income have the highest interest rates, followed by the medium and high income groups
INCOME AND AVERAGE INTEREST ON LOAN PURPOSE
DATA CHALLENGES/LIMITATIONS
1. A lot of data cleaning and processing was required to create an analysis-ready dataset.
2. Limited period of data (12 months) for analysis.
3. Limited knowledge of the lending domain.
4. Few strongly correlated variables.
REFERENCES
Peer to Peer Lending & Alternative Investing. (n.d.). Retrieved from https://www.lendingclub.com/
Bachmann, J. A. (n.d.). Lending Club || Risk Analysis and Metrics. Retrieved from https://www.kaggle.com/janiobachmann/lending-club-risk-analysis-and-metrics
Sheth, A. (n.d.). Analysis and Modelling of Lending Club loan data. Retrieved from https://www.kaggle.com/adityasheth/analysis-and-modelling-of-lending-club-loan-data

Apanps5210 - final presentation


Editor's Notes

  1. California, Texas, New York and Florida are the states with the highest amount of loans issued. California, Texas and New York all have above-average annual income (Florida does not), which could be the probable reason why most loans are issued in these states.