2. Lending Club Data Analysis
Lending Club (LC) is a peer-to-peer online lending platform.
It is the world’s largest marketplace connecting borrowers and investors.
Consumers and small business owners lower the cost of their credit and
enjoy a better experience than traditional bank lending, while investors
earn attractive risk-adjusted returns.
3. Project Objective
• Predict whether a borrower will default on the loan
• Predict the interest rate to be charged on the loan amount
• Predict whether the loan will be approved at an interest rate of 10% or below
End Users: Borrowers and Lenders
4. Data Exploration
For each loan, over 100 characteristics are recorded in the table.
We reviewed the Data Dictionary on the Lending Club website, which
describes the features in the dataset.
We explored the dataset using R and Tableau to understand the features
and find correlations between them.
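As a rough illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between two numeric columns. The exploration in this project was done in R and Tableau; Python is used here only for illustration. The column names and values are hypothetical toy data, not Lending Club records.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy columns (assumed for illustration only).
fico = [660, 680, 700, 720, 740]
int_rate = [18.0, 15.5, 13.0, 10.5, 8.0]

r = pearson(fico, int_rate)
print(f"correlation(fico, int_rate) = {r:.3f}")
```

A coefficient near -1 or +1 indicates a strong linear relationship; pairs of features that are highly correlated with each other are candidates for dropping one of the two.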
5. Data Pre-Processing
We selected 31 of the 115 available columns based on data exploration and
feature correlation methods.
Removing NAs
Removing Wildcards
Removing Outliers
Creating Calculated Fields
• FICO Mean
• Indicator
• Monthly Income
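The steps above might be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the column names `fico_range_low`, `fico_range_high` and `annual_inc` are assumed, the 1.5 × IQR rule is one common outlier criterion chosen here for illustration, and the wildcard removal and the "Indicator" field are omitted because the slides do not define them.

```python
import statistics

def preprocess(rows, outlier_col="annual_inc"):
    """Drop incomplete rows, drop outliers, and add calculated fields."""
    # 1. Remove rows with any missing value (None or the string "NA").
    complete = [
        r for r in rows
        if all(v is not None and v != "NA" for v in r.values())
    ]

    # 2. Remove outliers using the 1.5 * IQR rule (an assumed criterion).
    q1, _, q3 = statistics.quantiles([r[outlier_col] for r in complete], n=4)
    spread = 1.5 * (q3 - q1)
    kept = [
        r for r in complete
        if q1 - spread <= r[outlier_col] <= q3 + spread
    ]

    # 3. Add calculated fields (definitions assumed for illustration).
    result = []
    for r in kept:
        row = dict(r)
        row["fico_mean"] = (r["fico_range_low"] + r["fico_range_high"]) / 2
        row["monthly_income"] = r["annual_inc"] / 12
        result.append(row)
    return result

# Hypothetical toy rows, not Lending Club data.
rows = [
    {"fico_range_low": 680, "fico_range_high": 684, "annual_inc": 48000},
    {"fico_range_low": 700, "fico_range_high": 704, "annual_inc": 60000},
    {"fico_range_low": 720, "fico_range_high": 724, "annual_inc": 72000},
    {"fico_range_low": 660, "fico_range_high": 664, "annual_inc": 54000},
    {"fico_range_low": 690, "fico_range_high": 694, "annual_inc": 66000},
    {"fico_range_low": 710, "fico_range_high": 714, "annual_inc": 1200000},  # extreme income
    {"fico_range_low": 705, "fico_range_high": None, "annual_inc": 58000},   # missing value
]

clean = preprocess(rows)
print(len(clean), "rows kept")
```

The order matters: missing values are dropped before the quartiles are computed, so that incomplete rows do not influence the outlier thresholds.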
6. Models
Loan Status
• Logistic Regression
• Neural Network
• Random Forest
Loan Approval
• Logistic Regression
• Neural Network
• Random Forest
Interest Rate
• Linear Regression
• Neural Network
• Boosted Decision Tree
8. Model Evaluation for Loan Status
• We compared overall accuracy, recall, precision, ROC curves and
confusion matrices.
• If this model is to help lenders avoid bad loans, the true positive rate
(recall) needs to be considerably more robust.

            Neural Network   Logistic Regression   Random Forest
Accuracy    0.914629         0.910                 0.9006
Precision   0.914629         0.935                 0.9006
Recall      0.914629         0.957                 0.9006
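For reference, accuracy, precision and recall can all be derived from the four cells of a confusion matrix. The sketch below is a minimal illustration on hypothetical toy labels. It is not the code used to produce the numbers above, and treating label 1 as the positive class is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        # Precision: of the loans predicted positive, how many really were.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall (true positive rate): of the actual positives, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical toy labels (1 = positive class), not project results.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because the three metrics use different denominators, they usually take different values for the same model, which is why it helps to report them together with the confusion matrix.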
9. Model Evaluation for Interest Rate
Metric                              Neural Network   Linear Regression   Boosted Decision Tree
RMSE                                1.50             1.79                1.20
Coefficient of Determination (R²)   0.83             0.76                0.89
• We compared overall RMSE and coefficient of determination. The Boosted
Decision Tree performed best on both, with the lowest RMSE and the highest
coefficient of determination.
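The two regression metrics can be computed as sketched below. This is a minimal illustration on hypothetical toy values, not the evaluation code behind the table; the function names are chosen here for clarity.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1 - ss_res / ss_tot

# Hypothetical interest rates in percent (toy data, not project results).
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

model_rmse = rmse(actual, predicted)
model_r2 = r_squared(actual, predicted)
print(f"RMSE = {model_rmse:.2f}, R^2 = {model_r2:.2f}")
```

RMSE is expressed in the same units as the target (here, percentage points of interest rate), so lower is better, while a coefficient of determination closer to 1 means the model explains more of the variance in the target.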
10. Model Evaluation for Loan Approval
• We compared overall accuracy, recall, precision, ROC curves and
confusion matrices.

            Neural Network   Logistic Regression   Random Forest
Accuracy    0.8194           0.8410                0.80
Precision   0.8194           0.8410                0.780
Recall      0.8194           0.8410                0.822
12. Sentiment Analysis
• We collected tweets about Lending Club from Twitter.
• We incorporated our research project to detect the sentiment of the tweets.
• We used Tableau to visualize the results.
• We incorporated the visualizations into the front end.