Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Project data analysis
1. Title: Increased earthquake depth is associated with increased magnitude
Beca Marușa
Title: Decreased FICO score is associated with increased interest rate
Introduction:
Lending Club is an online financial community that brings together creditworthy
borrowers and savvy investors so that both can benefit financially [1]. It allows its
members to directly invest in and borrow from each other and so avoid the cost and
complexity of the banking system.
On the Lending Club site there are several files that contain complete loan data, including
the current loan status and latest payment information [2]. The data used in this analysis
represent a sample of 2,500 peer-to-peer loans issued through the Lending Club, described
by 14 variables such as monthly income, amount requested, FICO range (a range
indicating the applicant's FICO score) [3] and inquiries in the last six months. The goal of
this analysis is to establish whether the outcome variable, the interest rate of the loans, is
associated with the other variables, in particular the FICO score,
which is a measure of the creditworthiness of the applicant.
In this project we performed an analysis to determine if there was a significant association
between the interest rate and the FICO score. Using exploratory analysis and standard
multiple regression techniques we show that there is a significant negative relationship
between the interest rate and the FICO score, even after adjusting for important
confounders such as the length of the loan, the amount funded by the investors and the
amount requested by the borrowers.
Our analysis suggests that there is a significant, negative association between interest
rate and FICO score. We estimate the relationship using a linear model that relates the
interest rate (in percent) to the FICO score (in points). There appears to be a strong inverse
relationship between the two variables.
Our results suggest that other variables, such as loan length, amount requested by
the borrower and amount funded by the investors, are associated with both interest
rate and FICO score. Including these variables in the regression model relating interest
rate to FICO score improves the model fit, but does not remove the significant negative
relationship between the two variables.
Methods:
Data Collection
For our analysis we used the loans data from the Lending Club site covering 2007 to 2011.
The data were downloaded from lendingclub.com on November 16, 2013 using the R
programming language [3].
Exploratory Analysis
Exploratory analysis was performed by examining tables and plots of the observed data.
We identified transformations to perform on the raw data on the basis of plots and
knowledge of the scale of measured variables. Exploratory analysis was used to (1)
identify missing values, (2) verify the quality of the data, and (3) determine the terms
used in the regression model relating interest rate to FICO score.
Statistical Modeling
To relate interest rate to FICO score we fit a standard multiple linear
regression model [4]. Model selection was performed on the basis of our exploratory
analysis and prior knowledge of the relationship between interest rate and FICO score,
the amount of the loan requested and the length of the loan. Coefficients were
estimated with ordinary least squares and standard errors were calculated using standard
asymptotic approximations [5].
Reproducibility
All analyses performed in this manuscript can be reproduced with the R markdown file
loansdata.Rmd [6]. To reproduce the exact results presented in this manuscript, the cached
version of the analysis must be run.
Results:
The loans data used in this analysis contain information on the amount requested by the
borrower (Amount.Requested), the amount funded by the investors
(Amount.Funded.By.Investors), the lending interest rate (Interest.rate), the length of the
loan in months (Loan.Length), the purpose of the loan as stated by the applicant
(Loan.Purpose), the percentage of the consumer's gross income that goes toward paying
debts (Debt.To.Income.Ratio), the U.S. state of residence of the loan applicant (State), the
ownership type of the home (Home.Ownership), the monthly income of the applicant in
dollars (Monthly.income), a range indicating the applicant's FICO score (FICO.range),
the number of open lines of credit the applicant had at the time of application
(Open.CREDIT.Lines), the total amount outstanding on all lines of credit
(Revolving.CREDIT.Balance), the number of authorized queries about the
creditworthiness of the applicant in the 6 months before the loan was issued
(Inquiries.in.the.Last.6.Months) and the length of time employed at the current job
(Employment.Length) [5].
We identified 77 missing values in the data set for the variable Employment.Length, one
missing value for the variable Monthly.income, and 2 missing values each for the
variables Open.CREDIT.Lines, Revolving.CREDIT.Balance and
Inquiries.in.the.Last.6.Months.
Three measured variables had values outside the standard ranges. The variable
Home.Ownership takes five values (none, other, owns, rents or has a mortgage), although
only three are expected: owns, rents or has a mortgage. The variable
Amount.Funded.By.Investors has 2 negative values and 4 values of 0. The variable
Debt.To.Income.Ratio has 8 values of 0%, which we consider invalid and removed,
because the ratio represents the percentage of the consumer's gross income that goes
toward paying the loans that were approved.
After removing the missing values and the observations that were outside the standard
ranges, the data now has 2403 observations and 14 variables.
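The cleaning rules described above can be sketched in a few lines of Python. This is only an illustration, not the R code used for the analysis: the row format (one dictionary per loan, with None for a missing value) and the coding of the three accepted home-ownership levels as "OWN", "RENT" and "MORTGAGE" are assumptions made for the example.

```python
# Assumed coding of the three accepted home-ownership levels (hypothetical).
VALID_HOME_OWNERSHIP = {"OWN", "RENT", "MORTGAGE"}


def is_clean(row):
    """Return True if a loan record passes the filters described in the text."""
    # Drop observations with any missing value (None marks "missing" here).
    if any(value is None for value in row.values()):
        return False
    # Keep only the three expected home-ownership levels.
    if row["Home.Ownership"] not in VALID_HOME_OWNERSHIP:
        return False
    # Negative or zero amounts funded by investors are out of range.
    if row["Amount.Funded.By.Investors"] <= 0:
        return False
    # A debt-to-income ratio of 0% is treated as invalid.
    if row["Debt.To.Income.Ratio"] <= 0:
        return False
    return True


def clean(rows):
    """Keep only the records that pass every filter."""
    return [row for row in rows if is_clean(row)]


# Made-up records, reduced to the columns the filters look at.
example = [
    {"Home.Ownership": "RENT", "Amount.Funded.By.Investors": 9000.0,
     "Debt.To.Income.Ratio": 14.5, "Employment.Length": "3 years"},
    {"Home.Ownership": "NONE", "Amount.Funded.By.Investors": 5000.0,
     "Debt.To.Income.Ratio": 10.0, "Employment.Length": "1 year"},
    {"Home.Ownership": "OWN", "Amount.Funded.By.Investors": -0.01,
     "Debt.To.Income.Ratio": 8.0, "Employment.Length": "5 years"},
    {"Home.Ownership": "MORTGAGE", "Amount.Funded.By.Investors": 12000.0,
     "Debt.To.Income.Ratio": 20.0, "Employment.Length": None},
]
print(len(clean(example)))
```

Only the first made-up record survives: the others fail on home ownership, amount funded and a missing employment length respectively.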
From the histogram of the variable FICO range we can see that the distribution is positively
skewed, with a long right tail (figure 1).
Figure 1. Histogram of FICO Range
The histogram of the interest rate shows an approximately normal distribution with a mean
of 13% (figure 2). The majority of the loans granted had an interest rate between 10.2% and
15.8%.
Figure 2. Histogram of Interest rate
We performed some exploratory analysis, and from boxplots of the interest rate against
the other variables we observed that the monthly income of the borrower,
the employment length, the type of home ownership and the borrower's state of
residence show no apparent association with the interest rate of the loan granted. The
variables Loan Purpose, Open Credit Lines, Revolving Credit Balance, Inquiries in the
last 6 months and Debt to income ratio have little correlation with the interest rate
variable. The potential confounders are the length of the loan, the amount funded by the
investors and the amount requested by the borrowers.
We decided to transform the variable FICO range into the variable FICO score, which
represents the average of the lower and upper bounds of the FICO range for each
observed loan. Subsequent analyses focus on this transformed FICO score
variable. From the boxplot of interest rate by FICO range we can observe a strongly
negative association between the two (figure 3). The correlation coefficient between the
interest rate and FICO score is -0.71.
Figure 3. The Boxplot between the Interest Rate and FICO Range
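The transformation from FICO range to FICO score described above amounts to taking the midpoint of each range. A minimal Python sketch follows; it assumes the range is stored as a string of the form "low-high" (for example "735-739"), which is an assumption about the file format rather than something stated in this report, and the function name is hypothetical.

```python
def fico_score(fico_range):
    """Midpoint of a FICO range given as a 'low-high' string (assumed format)."""
    low, high = (int(part) for part in fico_range.split("-"))
    return (low + high) / 2


print(fico_score("735-739"))  # 737.0
```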
We first fit a regression model relating interest rate to FICO score alone (figure 4). The
multiple R-squared is 50.3%, which matches the square of the correlation
coefficient (0.71^2 is approximately 0.50), as expected for a regression with a single
predictor. FICO score alone therefore explains about half of the variation in interest rate;
the remaining 49.7% is unexplained by this model and may partly be accounted for by
confounders.
Figure 4. The relationship between the Interest Rate and FICO score
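For a regression with a single predictor, R-squared is the square of the Pearson correlation coefficient, so a correlation of -0.71 corresponds to an R-squared of about 0.50. The Python sketch below checks this identity numerically. The six data points are invented for illustration and are not taken from the loans data; the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ols_r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Invented FICO scores and interest rates (percent), for illustration only.
fico = [660, 680, 700, 720, 740, 760]
rate = [17.5, 15.0, 14.2, 11.9, 10.8, 8.1]

r = pearson_r(fico, rate)
r_squared = simple_ols_r_squared(fico, rate)
print(r < 0, abs(r ** 2 - r_squared) < 1e-9)
```

The small gap between 0.71^2 = 0.5041 and the reported 50.3% is consistent with the correlation having been rounded to two decimals.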
The correlation coefficient between the amount funded by the investors and the interest
rate is 0.33, and the same value holds for the amount requested by the borrowers. The
correlation between the interest rate and the loan length is 0.42. The residuals of the
single-predictor model have a mean of approximately 0 and a variance of 8.6, and their
distribution is roughly normal with a slight positive skew
(figure 5).
Figure 5. Residuals distribution for the linear model
Residuals show patterns of non-random variation (figure 6). We attempted to explain
those patterns by fitting models including potential confounders.
Figure 6. The variation of residuals
Our final regression model was:
Interest.Rate = b0 + b1*FICO.score + b2*Amount.Funded.By.Investors + b3*Amount.Requested + f(Loan.Length) + e,
where b0 is an intercept term and b1 represents the change in interest rate associated with a
change of one unit in FICO score, holding fixed the amount funded by investors, the amount
requested by borrowers and the loan length. The term f(Loan.Length)
represents a factor model with two different levels. This model explains 75% of the
variation in the interest rate variable. The P-values show that all the
coefficients are statistically significant.
The error term e represents all sources of unmeasured and unmodeled random variation in
interest rate. Our final regression model appeared to remove most of the non-random
patterns of variation in the residuals. We observe that the residuals of the multiple
linear model are approximately normally distributed with mean 0 and variance 4.38 (figure 7).
Figure 7. Residuals distribution for multivariate linear regression
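The structure of the final model, three numeric predictors plus a two-level factor for loan length, can be sketched as an ordinary least squares fit with a dummy variable. The Python code below is an illustration and not the R code used for the analysis. Several things are assumptions: the two loan-length levels are taken to be 36 and 60 months, the amounts are in thousands of dollars, all data values are invented, and apart from -0.08 (which echoes the reported b1) the coefficients used to generate the toy data are arbitrary.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def design_row(fico, funded, requested, loan_length_months):
    """Intercept, three numeric predictors, and a dummy for the assumed 60-month level."""
    is_60 = 1.0 if loan_length_months == 60 else 0.0
    return [1.0, float(fico), funded, requested, is_60]


# Invented loans: FICO score, amount funded and requested (thousands), length (months).
loans = [
    (660, 10.0, 10.0, 36), (680, 11.5, 12.0, 36), (700, 8.0, 8.0, 60),
    (720, 18.0, 20.0, 36), (740, 15.0, 15.0, 60), (760, 5.5, 6.0, 36),
    (690, 22.0, 25.0, 60), (710, 17.5, 18.0, 60),
]
design = [design_row(*loan) for loan in loans]

# Noise-free toy response generated from arbitrary coefficients b0..b3 and the dummy effect.
true_coef = [70.0, -0.08, 0.10, 0.05, 3.0]
rates = [sum(c * v for c, v in zip(true_coef, row)) for row in design]

coef = fit_ols(design, rates)
print([round(c, 3) for c in coef])
```

Because the toy response has no noise, the fit should recover the generating coefficients up to rounding error. On real data the normal equations can be poorly conditioned, so a QR-based routine such as the one behind R's lm is generally preferable.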
From figure 8 we notice that the residuals of the multiple linear model vary less than
those of the single-predictor model and show no obvious pattern, resembling white noise.
Figure 8. Variation of residuals for multivariate linear model
We observed a highly statistically significant (P = 2e-16) association between interest rate
and FICO score. A one-unit increase in FICO score corresponded to a change of b1
= -0.08 percentage points in interest rate (95% Confidence Interval: -0.088, -0.081).
For example, for two loans with the same loan length, amount requested by the borrower
and amount funded by the investors, we would expect the interest rate to be about 0.08
percentage points higher for every one-point decrease in the FICO score.
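As a worked illustration of this coefficient, the short Python sketch below converts a difference in FICO score into an expected difference in interest rate. It reads b1 as the change in interest rate (percentage points) per one-point change in FICO score, as defined for the regression model. The 20-point gap is an arbitrary example and the function name is hypothetical.

```python
B1 = -0.08                # reported estimate, percentage points per FICO point
B1_CI = (-0.088, -0.081)  # reported 95% confidence interval


def expected_rate_difference(fico_difference, slope=B1):
    """Expected interest-rate difference for a FICO difference, other terms held fixed."""
    return slope * fico_difference


# Borrower A scores 20 points higher than borrower B on otherwise identical loans.
gap = 20
estimate = expected_rate_difference(gap)
bounds = tuple(expected_rate_difference(gap, slope) for slope in B1_CI)
print(round(estimate, 2), tuple(round(b, 2) for b in bounds))
```

Under these numbers, a 20-point higher FICO score goes with an interest rate roughly 1.6 percentage points lower, with the interval endpoints scaled in the same way.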
Conclusions:
Our analysis suggests that there is a significant, negative association between interest
rate and FICO score. We estimate the relationship using a linear model that relates the
interest rate (in percent) to the FICO score (in points). There appears to be a strong inverse
relationship between the two variables.
We also observed that other variables, such as loan length, amount requested by the
borrower and amount funded by the investors, are associated with both interest rate and
FICO score. Including these variables in the regression model relating interest rate to
FICO score improves the model fit, but does not remove the significant negative
relationship between the two variables.
Our analysis may be of interest to both investors and borrowers. Investors are interested
in selecting potential borrowers on the financial market at a low cost, establishing a
fair interest rate and, as a consequence, building an efficient portfolio with a high rate of
return. Borrowers are interested in obtaining better interest rates at low cost. It could also
be of interest to the Lending Club in supporting its members in selecting suitable partners.
References
1. LendingClub Corporation. URL: https://www.lendingclub.com/public/about-us.action. Accessed 09/16/2014.
2. LendingClub Corporation. URL: https://www.lendingclub.com/info/download-data.action. Accessed 09/16/2014.
3. Wikipedia. Credit score in the United States. URL: http://en.wikipedia.org/wiki/Credit_score_in_the_United_States
4. LendingClub Corporation. URL: https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv. Accessed 09/16/2014.
5. Loans data codebook. URL: https://spark-public.s3.amazonaws.com/dataanalysis/loansCodebook.pdf
6. R Markdown Page. URL: http://www.rstudio.com/ide/docs/authoring/using_markdown. Accessed 09/16/2014.