MISY 267: BUSINESS ANALYTICS FINAL PROJECT
FICO Credit Risk Data
MAY 9TH, 2016
UNIVERSITY OF DELAWARE
Daniel Amato, Kyle Zaino, Chirag Dhamecha, Erik Caputo, Joey Czechowicz
MISY267: Business Analytics 1
Goals
We are uncertain which variables influence a person’s FICO score, so our goal is to
ascertain which demographic variables determine a given person’s FICO score. We will use a
variable selection method to determine which variables to include in or exclude from the
model, and we will run a linear regression model because we are predicting a continuous
variable as opposed to a binary variable. Our team will test the model assumptions and
check for any violations within the model. If any violations are found, we will make
suggestions on how to address them. This model will then be able to assist a manager or a
financial firm that is attempting to determine if an applicant is too risky to be given a
loan.
Methodologies
Data Available & Variables of Interest
Once we eliminated the variables with the terms “Application” and “Trades” in their
names, we were left with the following variables: risk flag (paid as negotiated flag), interest
revenue accumulated, sampling weight, debt to income ratio, number of borrowers,
geographic region, prior or current bank relationship, collateral value, loan amount
requested, loan to value ratio, FICO score, average months in file, months since most recent
delinquency, maximum delinquency and public records in the last twelve months,
maximum delinquency ever, number of inquiries in last 6 months, number of inquiries in
last 6 months excluding 7 days, net fraction revolving burden, and net fraction installment
burden.
Variable Selection: Backward Stepwise Selection
We used the Backward Stepwise variable selection method: we started by running the model
with all available variables and removed one variable at a time, each time removing the
variable that contributed the least to the model. Since we are working with FICO scores of
individual persons, we chose our model based on the best prediction accuracy rather than
the R-squared value. Therefore, we want the model that yields the smallest out-of-sample
errors. This led us to remove all of the variables from our model except for average
months in file, maximum delinquency within the last 12 months, maximum delinquency ever,
and net fraction revolving burden. Our final model contains these four variables and has an
R-squared value of approximately 71%. Additionally, we looked for interactions between our
remaining variables, but found no interactions that added value to our model. Lastly, we
tested non-linear relationships by trying transformations of the variables within our model.
For example, adding the square root of the ‘Max Delinquency in 12 months’ variable
increased our R-squared by only about 2%. None of the variable transformations we tested
substantially increased the R-squared of our model, so they were not included, in order to
avoid the risk of overfitting. Our rule was that if a term added less than 5% to R-squared,
we did not include it in our model.
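The selection procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual workflow: the data are synthetic, the function names (`fit_ols`, `mae`, `backward_stepwise`) and the tolerance `tol` are hypothetical, and mean absolute error on a held-out set stands in for "smallest errors out-of-sample".

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mae(X, y, coef):
    """Mean absolute error of a fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean(np.abs(y - A @ coef)))

def backward_stepwise(X_train, y_train, X_test, y_test, names, tol=0.0):
    """Drop one predictor at a time while test-set MAE stays within tol of the best seen."""
    keep = list(range(X_train.shape[1]))
    best = mae(X_test[:, keep], y_test, fit_ols(X_train[:, keep], y_train))
    while len(keep) > 1:
        # Score every model that omits exactly one remaining predictor.
        trials = []
        for j in keep:
            cols = [k for k in keep if k != j]
            err = mae(X_test[:, cols], y_test, fit_ols(X_train[:, cols], y_train))
            trials.append((err, j))
        err, j = min(trials)
        if err > best + tol:
            break  # every possible removal hurts out-of-sample accuracy
        keep.remove(j)
        best = min(best, err)
    return [names[k] for k in keep], best

# Synthetic data: only x0 and x1 actually drive the response.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
y = 600 + 30 * X[:, 0] - 20 * X[:, 1] + rng.normal(scale=5, size=n)
names = ["x0", "x1", "x2", "x3", "x4"]

selected, test_mae = backward_stepwise(X[:280], y[:280], X[280:], y[280:], names, tol=0.5)
print(selected, round(test_mae, 2))
```

Scoring each candidate on held-out rows rather than on R-squared mirrors the stated preference for prediction accuracy; a small tolerance lets the procedure discard predictors whose removal makes no practical difference.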
Proposed Model:
➢ Division of Data: 70% of the data is part of our training set and 30% is part of our
test set.
➢ Random Selection Method: The observations in our data come from different
subjects, so we use randomly selected observations as our test set.
➢ Proposed Linear Regression Model: FICO Score = 590.41 + 0.39*Average
Months In File + 12.71*Max Delinquency Last 12 Months + 6.28*Max
Delinquency Ever - 0.95*Net Fraction Revolving Burden
➢ Response Variable Interpretation: A higher FICO score indicates a lower-risk
individual/applicant.
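As a rough illustration of the points above, the sketch below draws a random 70/30 split of row indices and evaluates the proposed equation for one applicant. The coefficients are the ones reported in the model; the dictionary keys, the helper names, and the example applicant's values are hypothetical, and the scale on which the delinquency variables are coded is not stated in this report.

```python
import numpy as np

# Coefficients as reported in the proposed linear regression model.
INTERCEPT = 590.41
COEFS = {
    "avg_months_in_file": 0.39,
    "max_delq_last_12m": 12.71,
    "max_delq_ever": 6.28,
    "net_frac_revolving_burden": -0.95,
}

def predict_fico(applicant):
    """Predicted FICO score for a dict of predictor values."""
    return INTERCEPT + sum(COEFS[name] * applicant[name] for name in COEFS)

def random_split(n_rows, train_frac=0.70, seed=0):
    """Shuffle row indices and cut them into training and test sets."""
    order = np.random.default_rng(seed).permutation(n_rows)
    cut = int(round(train_frac * n_rows))
    return order[:cut], order[cut:]

train_idx, test_idx = random_split(1000)
print(len(train_idx), len(test_idx))

# Hypothetical applicant, used only to show the arithmetic.
example = {
    "avg_months_in_file": 80,
    "max_delq_last_12m": 6,
    "max_delq_ever": 6,
    "net_frac_revolving_burden": 30,
}
print(round(predict_fico(example), 2))
```

A random split is reasonable here because each row is a different person, so there is no time ordering to preserve.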
● Multicollinearity: Multicollinearity occurs when a predictor variable is so strongly
related to another predictor variable that we cannot determine how much of an impact
increasing one variable has on the response. We looked at the correlation matrix to
check for multicollinearity, in order to make sure we can interpret our variables
directly, and found that some of our variables have a correlation above 0.25 or below
-0.25. There was a correlation of 0.61 between max delinquency in the last 12 months
and max delinquency ever. Although this correlation is larger than 0.25, we do not want
to remove either of these variables, as each contributes a lot to the model on its own.
Therefore, multicollinearity is an issue in this model: not all variables can be
interpreted directly, and these two require more careful evaluation.
● Statistical Significance: All of our variables are marked with significance stars in
the regression output, so they are all statistically significant predictors of FICO
scores.
● Intercept Interpretation: When the average months in file is 0, there have been 0
delinquencies in the last 12 months and ever, and the net fraction revolving burden
(outstanding balance/credit) is 0, the FICO Score on average is 590.41. The intercept
makes sense to us intuitively, because the FICO score is positive as it should be. For
a more detailed interpretation of the intercept, we will consult with the data manager.
● Average Months in File: When the average months in file increases by 1 month,
the FICO score increases on average by 0.39 points. This makes sense to us
intuitively, because the longer an applicant’s credit file has existed, the less risky
the applicant probably is. However, this coefficient is not economically significant,
because a 0.39 point increase is minuscule. Nevertheless, we decided not to remove it
from our model because it contributes 5.6% to our R-squared.
● Maximum Delinquency in Last 12 Months: When the maximum delinquency within the
last 12 months increases by 10, the FICO score increases on average by (12.71*10)
127.1 points. This does not make sense to us intuitively, because an individual with a
large maximum delinquency is likely to be risky. Since the FICO score increases by
approximately 127 points, this coefficient is economically significant: a change of
this size has a substantial impact on an individual's FICO score, which will determine
their eligibility to receive a loan.
● Maximum Delinquency Ever: When max delinquency ever increases by 10, the
FICO score increases on average by (6.28*10) 62.8 points. This does not make
sense to us intuitively, because the larger the max delinquency (failure to pay
outstanding debt) ever is, the riskier the applicant probably is. This coefficient is
economically significant, because a roughly 63 point increase can be a deciding factor
when determining an individual’s eligibility to receive a loan.
● Net Fraction Revolving Burden: When the net fraction revolving burden
increases by 1 unit, meaning the balance in relation to the available credit increases,
the FICO score decreases on average by 0.95 points. Intuitively this is sensible because
the larger the net fraction revolving burden (outstanding balance/credit) is, the
riskier the applicant probably is. This coefficient is not economically significant,
because a 0.95 point decrease is too close to 0. However, we do not want to remove it
from our model because it contributes 23.32% to R-squared.
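The per-variable effects quoted in the interpretations above are simply the coefficient multiplied by the size of the change, holding the other predictors fixed. A minimal check of that arithmetic, with hypothetical key names standing in for the four predictors:

```python
# Coefficients as reported in the proposed model; key names are illustrative.
COEFS = {
    "avg_months_in_file": 0.39,
    "max_delq_last_12m": 12.71,
    "max_delq_ever": 6.28,
    "net_frac_revolving_burden": -0.95,
}

def effect(name, delta):
    """Average change in predicted FICO score when one predictor changes by delta."""
    return COEFS[name] * delta

for name, delta in [
    ("avg_months_in_file", 1),
    ("max_delq_last_12m", 10),
    ("max_delq_ever", 10),
    ("net_frac_revolving_burden", 1),
]:
    print(f"{name}: change of {delta} -> {effect(name, delta):+.2f} points")
```

Because the two delinquency variables are correlated with each other, these one-at-a-time effects should be read with the caution noted in the multicollinearity discussion.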
Tests of Model Assumptions
Assumption 1: Exclude Unnecessary Variables & Good Linear Model
Assumption 1 is a formality and so we will assume that our model is a good linear model.
Assumption 1 passes automatically because of this formality.
Assumption 2: No Perfect Multicollinearity
There is no perfect multicollinearity in our model. Based on the correlation matrix none
of the correlation results were equal to 1 or -1.
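A correlation-matrix check of this kind might look like the sketch below. It is an assumption-laden illustration: the data are synthetic, the column names are hypothetical, and the 0.25 threshold is the one used earlier in this report. Note that pairwise correlations only reveal perfect collinearity between two variables, not among combinations of three or more.

```python
import numpy as np

def flag_correlated_pairs(X, names, threshold=0.25):
    """List predictor pairs whose absolute correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = float(corr[i, j])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], round(r, 2)))
    return flagged

def has_perfect_collinearity(X, tol=1e-8):
    """True if any pair of predictors has correlation of (numerically) +1 or -1."""
    corr = np.corrcoef(X, rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return bool(np.any(np.abs(off_diagonal) > 1 - tol))

# Synthetic predictors: the first two are built to be moderately correlated.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = 0.6 * a + 0.8 * rng.normal(size=500)
c = rng.normal(size=500)
X = np.column_stack([a, b, c])
names = ["delq_12m", "delq_ever", "months_in_file"]

flagged = flag_correlated_pairs(X, names)
print(flagged)
print(has_perfect_collinearity(X))
```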
Assumption 3: Independent Errors
Based on the Residual Chart shown below, we pass the assumption of independent errors
because there is no pattern in the graph. Intuitively, one person’s FICO score should not be
related to another person’s FICO score so this makes sense. Additionally, in the ACF plot
shown below, we pass the assumption of independent errors because we see that the first
lag does not cross the dotted line threshold. All other lags are insignificant if we view the
first lag as insignificant.
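The ACF check can be approximated numerically. The sketch below computes the lag-1 autocorrelation of a residual series and compares it with the usual approximate 95% band of ±1.96/sqrt(n), which is what the dotted lines on most ACF plots represent. The residuals here are simulated and the function names are hypothetical; the project's own residuals are not reproduced.

```python
import numpy as np

def lag_autocorr(residuals, lag=1):
    """Sample autocorrelation of the residuals at the given lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[:-lag], r[lag:]) / np.dot(r, r))

def exceeds_band(residuals, lag=1):
    """True if the autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / np.sqrt(len(residuals))
    return bool(abs(lag_autocorr(residuals, lag)) > bound)

# Simulated independent residuals, standing in for the model's residuals.
rng = np.random.default_rng(2)
residuals = rng.normal(scale=30, size=1000)
print(round(lag_autocorr(residuals), 3), exceeds_band(residuals))
```

For cross-sectional data such as individual applicants, the row order carries no natural meaning, so a small lag-1 value mainly confirms that the file was not sorted in a way that groups similar residuals together.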
Assumption 4: No Heteroskedasticity
Based on the Fitted Vs Residual plot shown below, we see unequal variances in the
residuals across fitted FICO scores. Therefore, we fail the assumption of no
heteroskedasticity. We would address this violation in one of the following ways:
● Ignoring it
● Bootstrapping
Controlling the correlation by including lagged values as predictors in the model is a
remedy for correlated errors rather than for unequal variances, so it does not apply here.
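One common form of the bootstrapping remedy is the pairs bootstrap: resample whole rows with replacement, refit the regression each time, and use the spread of the refitted coefficients as standard errors that do not rely on equal error variance. The sketch below assumes that variant; the report does not say which one was intended, and the data and function names are illustrative only.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def bootstrap_se(X, y, n_boot=500, seed=0):
    """Pairs-bootstrap standard errors for the intercept and slopes."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1] + 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        draws[b] = ols(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)

# Synthetic heteroskedastic data: the noise grows with the predictor.
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, size=n)
y = 600 + 5 * x + rng.normal(size=n) * (1 + x)
X = x.reshape(-1, 1)

coef = ols(X, y)
se = bootstrap_se(X, y)
print(np.round(coef, 2), np.round(se, 3))
```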
Assumption 5: Normal Distribution of Errors
According to the histogram of our residuals, the distribution of the errors visually looks
approximately normal. The average error for the training data set is 29.4 and the average
error for the testing data set is 29.5. This illustrates that our model is consistent since both
numbers are similar. Additionally, this is a good model because our average error is
approximately 30 points for someone’s FICO Score, which is not a substantial amount
when dealing with numbers that are in the hundreds.
Removal of Outliers
We determined that outliers should not be an issue for this model. Since a FICO score has
a defined range, we do not believe that any FICO Score would be unusual to see. Also,
we want our model to be able to predict perfect credit scores as well as very low credit
scores, so these values should be included in our model.
Prediction In & Out of Samples
The mean absolute error is 23.1 for the training set and 23.9 for the test set. This
indicates that our model is very consistent: it performs almost exactly as well on data
it has not seen before as on data that it has seen. A roughly 23 point average error is
small given the range of a FICO score, which suggests that our model is a good model.
Because our model is both consistent and accurate, we conclude that we created a useful
model.
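For reference, the error measure used here can be computed as in the sketch below. The three actual and predicted scores are hypothetical and only show the calculation; the 23.1 and 23.9 figures are the ones reported above, and the 300 to 850 span is the standard FICO range, assumed here because the report does not state the range it had in mind.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted scores."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical scores, only to show the calculation: (10 + 20 + 5) / 3.
mae_example = mean_absolute_error([700, 650, 720], [690, 670, 715])
print(round(mae_example, 2))

# Reported errors, compared with each other and with the assumed score range.
FICO_MIN, FICO_MAX = 300, 850
train_mae, test_mae = 23.1, 23.9
gap = test_mae - train_mae
share_of_range = test_mae / (FICO_MAX - FICO_MIN)
print(round(gap, 1), round(share_of_range, 3))
```

A small gap between training and test error is the evidence of consistency; the error as a share of the score range is one way to judge whether it is small in practical terms.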
Conclusion
Based on the analysis we conducted, we would use the model to predict an individual’s
FICO Score, but would exercise caution since some assumptions were violated. A
manager or financial firm can use this model to assess an individual’s FICO score
and use that information to approve or deny a loan. Lastly, we emphasize that users of
this model should apply methods such as bootstrapping to address the heteroskedasticity
violation.

More Related Content

What's hot

Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)
Matt Hansen
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Matt Hansen
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
Ganes Kesari
 
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Matt Hansen
 
Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)
Matt Hansen
 
Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)
Matt Hansen
 
Toys R Us -- Mffiinity model Test and Learn April 08
Toys R Us -- Mffiinity model Test and Learn April 08Toys R Us -- Mffiinity model Test and Learn April 08
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2
shyaminfo15
 
Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)
Matt Hansen
 
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Matt Hansen
 
Hypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsHypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence Intervals
Matt Hansen
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2
vikscarter
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Matt Hansen
 
Best crime predictor: Linear Regression
Best crime predictor: Linear RegressionBest crime predictor: Linear Regression
Best crime predictor: Linear Regression
Jonathan Chauwa
 
ECON104RoughDraft1
ECON104RoughDraft1ECON104RoughDraft1
ECON104RoughDraft1
John Nguyen
 
Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)
Matt Hansen
 
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Matt Hansen
 
Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)
Matt Hansen
 
Alleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAlleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal Models
Amit Sharma
 

What's hot (19)

Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
 
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
 
Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)
 
Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)
 
Toys R Us -- Mffiinity model Test and Learn April 08
Toys R Us -- Mffiinity model Test and Learn April 08Toys R Us -- Mffiinity model Test and Learn April 08
Toys R Us -- Mffiinity model Test and Learn April 08
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2
 
Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)
 
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
 
Hypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsHypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence Intervals
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
 
Best crime predictor: Linear Regression
Best crime predictor: Linear RegressionBest crime predictor: Linear Regression
Best crime predictor: Linear Regression
 
ECON104RoughDraft1
ECON104RoughDraft1ECON104RoughDraft1
ECON104RoughDraft1
 
Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)
 
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
 
Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)
 
Alleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAlleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal Models
 

Similar to FICO Credit Risk Data

Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Edu4Sure
 
Hy2208 Final
Hy2208 FinalHy2208 Final
Hy2208 Final
ssuser433675
 
Hy2208 final
Hy2208 finalHy2208 final
Hy2208 final
ssuser433675
 
Econometrics
EconometricsEconometrics
Econometrics
Stephanie King
 
Logistic regression sage
Logistic regression sageLogistic regression sage
Logistic regression sage
Pakistan Gum Industries Pvt. Ltd
 
report
reportreport
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
Eric Esajian
 
Logistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationLogistic regression and analysis using statistical information
Logistic regression and analysis using statistical information
AsadJaved304231
 
Linear regression
Linear regressionLinear regression
Linear regression
NilanjanaPradhan2
 
Risk Based Loan Approval Framework
Risk Based Loan Approval FrameworkRisk Based Loan Approval Framework
Risk Based Loan Approval Framework
Ramkumar Ravichandran
 
Case2_Best_Model_Final
Case2_Best_Model_FinalCase2_Best_Model_Final
Case2_Best_Model_Final
Eric Esajian
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project present
KexinZhang22
 
Risk Management in Five Easy Pieces
Risk Management in Five Easy PiecesRisk Management in Five Easy Pieces
Risk Management in Five Easy Pieces
Glen Alleman
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
bhuvana ganesan
 
Pm 6
Pm 6Pm 6
QWE Inc Report_Group 2
QWE Inc Report_Group 2QWE Inc Report_Group 2
QWE Inc Report_Group 2
Xinyu Liu
 
Multivariate data analysis regression, cluster and factor analysis on spss
Multivariate data analysis   regression, cluster and factor analysis on spssMultivariate data analysis   regression, cluster and factor analysis on spss
Multivariate data analysis regression, cluster and factor analysis on spss
Aditya Banerjee
 
Reduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage IndustryReduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage Industry
Pranov Mishra
 
statistical measurement project presentation
statistical measurement project presentationstatistical measurement project presentation
statistical measurement project presentation
KexinZhang22
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project present
KexinZhang22
 

Similar to FICO Credit Risk Data (20)

Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
 
Hy2208 Final
Hy2208 FinalHy2208 Final
Hy2208 Final
 
Hy2208 final
Hy2208 finalHy2208 final
Hy2208 final
 
Econometrics
EconometricsEconometrics
Econometrics
 
Logistic regression sage
Logistic regression sageLogistic regression sage
Logistic regression sage
 
report
reportreport
report
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
Logistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationLogistic regression and analysis using statistical information
Logistic regression and analysis using statistical information
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Risk Based Loan Approval Framework
Risk Based Loan Approval FrameworkRisk Based Loan Approval Framework
Risk Based Loan Approval Framework
 
Case2_Best_Model_Final
Case2_Best_Model_FinalCase2_Best_Model_Final
Case2_Best_Model_Final
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project present
 
Risk Management in Five Easy Pieces
Risk Management in Five Easy PiecesRisk Management in Five Easy Pieces
Risk Management in Five Easy Pieces
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
 
Pm 6
Pm 6Pm 6
Pm 6
 
QWE Inc Report_Group 2
QWE Inc Report_Group 2QWE Inc Report_Group 2
QWE Inc Report_Group 2
 
Multivariate data analysis regression, cluster and factor analysis on spss
Multivariate data analysis   regression, cluster and factor analysis on spssMultivariate data analysis   regression, cluster and factor analysis on spss
Multivariate data analysis regression, cluster and factor analysis on spss
 
Reduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage IndustryReduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage Industry
 
statistical measurement project presentation
statistical measurement project presentationstatistical measurement project presentation
statistical measurement project presentation
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project present
 

Recently uploaded

原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 

Recently uploaded (20)

原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 