MISY 267: BUSINESS
ANALYTICS FINAL
PROJECT
FICO Credit Risk Data
MAY 9TH, 2016
UNIVERSITY OF DELAWARE
Daniel Amato, Kyle Zaino, Chirag Dhamecha, Erik Caputo, Joey Czechowicz
MISY267: Business Analytics 1
Goals
We are uncertain which variables influence a person’s FICO score, so our goal is to
ascertain which demographic variables determine a given person’s FICO score. We will use a
variable selection method to decide which variables to include in or exclude from the
model, and we will fit a linear regression model because we are predicting a continuous
variable rather than a binary one. Our team will test the model assumptions and check for
violations. If any violations are found, we will suggest how to address them. The model
should then be able to assist a manager or financial firm that is trying to determine
whether an applicant is too risky to be given a loan.
Methodologies
Data Available & Variables of Interest
Once we eliminated the variables with the terms “Application” and “Trades” in their
names, we were left with the following variables: risk flag (paid as negotiated flag), interest
revenue accumulated, sampling weight, debt to income ratio, number of borrowers,
geographic region, prior or current bank relationship, collateral value, loan amount
requested, loan to value ratio, FICO score, average months in file, months since most recent
delinquency, maximum delinquency and public records in the last twelve months,
maximum delinquency ever, number of inquiries in last 6 months, number of inquiries in
last 6 months excluding 7 days, net fraction revolving burden, and net fraction installment
burden.
Variable Selection: Backward Stepwise Selection
We used the backward stepwise variable selection method: we started with a model
containing all available variables and removed one variable at a time, each time removing
the variable that contributed the least to the model. Since we are working with FICO scores
of individual persons, we chose the model based on prediction accuracy rather than
R-squared value; that is, we want the model that yields the smallest out-of-sample errors.
This led us to remove all of the variables except average months in file, maximum
delinquency within the last 12 months, maximum delinquency ever, and net fraction revolving
burden. Our final model, with these four variables, has an R-squared value of approximately
71%. Additionally, we looked for interactions between the remaining variables but found
none that added value to the model. Lastly, we tested non-linear relationships by trying
transformations of variables within the model. For example, adding the square root of the
‘Max Delinquency in 12 months’ variable increased R-squared by about 2%. However, none of
the transformations we tested increased R-squared substantially, so we left them out of the
model to avoid the risk of overfitting. Our rule was to exclude any term that added less
than 5% to R-squared.
Proposed Model:
● Division of Data: 70% of the data is part of our training set and 30% is part of our
test set.
● Random Selection Method: The observations in our data come from different subjects,
so we use randomly selected observations as our test set.
● Proposed Linear Regression Model: FICO Score = 590.41 + 0.39*Average
Months In File + 12.71*Max Delinquency Last 12 Months + 6.28*Max
Delinquency Ever - 0.95*Net Fraction Revolving Burden
● Response Variable Interpretation: A higher FICO score indicates a lower-risk
individual/applicant.
● Multicollinearity: Multicollinearity occurs when a predictor variable is related to
another predictor variable so strongly that we cannot determine how much of an
impact increasing one variable alone has on the response. We checked the correlation
matrix for multicollinearity to make sure we can interpret our variables directly,
and found that some pairs of variables have a correlation above 0.25 or below -0.25.
In particular, there was a correlation of 0.61 between max delinquency in the last
12 months and max delinquency ever. Although this correlation is larger than 0.25,
we do not want to remove either variable because each contributes a lot to the model
on its own. Multicollinearity is therefore an issue in this model: not all variables
can be interpreted directly, and the correlated ones require more careful
evaluation.
● Statistical Significance: All of our variables have significance stars in the
regression output; therefore, they are all statistically significant predictors of
FICO scores.
● Intercept Interpretation: When the application has been in the file for 0 months,
there have been 0 delinquencies in the last 12 months and ever, and the net fraction
revolving burden (outstanding balance/credit) is 0, the FICO score is 590.41 on
average. The intercept makes sense to us intuitively, because the FICO score is
positive as it should be. For a more detailed interpretation of the intercept, we
will consult with the data manager.
● Average Months in File: When the average months in file increases by 1 month,
the FICO score increases on average by 0.39 points. This makes sense to us
intuitively, because the longer an applicant has been in the file, the less risky
the applicant probably is. However, this coefficient is not economically
significant, because a 0.39 point increase is minuscule. We nevertheless decided not
to remove it from our model because it contributes 5.6% to our R-squared.
● Maximum Delinquency: When the maximum delinquency within the last 12 months
increases by 10, the FICO score increases on average by (12.71 * 10) 127.1 points.
This does not make sense to us intuitively, because an individual with a large
maximum delinquency is likely to be risky. The coefficient is economically
significant, because an increase of roughly 127 points has a substantial impact on
an individual’s FICO score, which will determine their eligibility to receive a
loan.
● Maximum Delinquency Ever: When max delinquency ever increases by 10, the
FICO score increases on average by (6.28 * 10) 62.8 points. This does not make
sense to us intuitively, because the larger the max delinquency (failure to pay
outstanding debt) ever is, the riskier the applicant probably is. This coefficient is
economically significant, because an increase of roughly 63 points can be a deciding
factor when determining an individual’s eligibility to receive a loan.
● Net Fraction Revolving Burden: When the net fraction revolving burden
increases by 1 unit, meaning the balance in relation to the available credit
increases, the FICO score decreases on average by 0.95 points. Intuitively this is
sensible, because the larger the net fraction revolving burden (outstanding
balance/credit) is, the riskier the applicant probably is. This coefficient is not
economically significant, because a 0.95 point decrease is close to 0. However, we
do not want to remove it from our model because it contributes 23.32% to R-squared.
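The coefficient interpretations above can be checked with a few lines of arithmetic. The sketch below copies the intercept and coefficients from the proposed model; the dictionary keys, the function names, and the applicant values are hypothetical and chosen only for illustration.

```python
# Intercept and coefficients copied from the proposed model in this report.
INTERCEPT = 590.41
COEFS = {
    "avg_months_in_file": 0.39,
    "max_delq_last_12m": 12.71,
    "max_delq_ever": 6.28,
    "net_fraction_revolving_burden": -0.95,
}

def predict_fico(applicant):
    """Predicted FICO score for a dict of the four predictor values."""
    return INTERCEPT + sum(COEFS[name] * applicant[name] for name in COEFS)

def effect_of_change(name, delta):
    """Average change in predicted score when one predictor moves by `delta`."""
    return COEFS[name] * delta

# Hypothetical applicant, not taken from the data set.
applicant = {
    "avg_months_in_file": 100,
    "max_delq_last_12m": 7,
    "max_delq_ever": 6,
    "net_fraction_revolving_burden": 30,
}
print(round(predict_fico(applicant), 2))
print(round(effect_of_change("max_delq_last_12m", 10), 1))
print(round(effect_of_change("max_delq_ever", 10), 1))
```

Setting every predictor to 0 returns the intercept, which matches the intercept interpretation given above.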
Tests of Model Assumptions
Assumption 1: Exclude Unnecessary Variables & Good Linear Model
Assumption 1 is a formality and so we will assume that our model is a good linear model.
Assumption 1 passes automatically because of this formality.
Assumption 2: No Perfect Multicollinearity
There is no perfect multicollinearity in our model. Based on the correlation matrix none
of the correlation results were equal to 1 or -1.
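A correlation-matrix check of this kind could look roughly like the sketch below. The 0.25 threshold comes from our multicollinearity discussion; the function names and the small columns of numbers are hypothetical and stand in for the real data.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def flag_pairs(columns, threshold=0.25):
    """List predictor pairs whose |correlation| exceeds the threshold.

    Each entry is (name_a, name_b, r, is_perfect), where is_perfect marks
    a correlation of 1 or -1 up to floating-point rounding.
    """
    flagged = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:
            flagged.append((a, b, r, abs(r) > 1 - 1e-9))
    return flagged

# Hypothetical values, for illustration only.
columns = {
    "max_delq_last_12m": [7, 6, 4, 7, 2, 5],
    "max_delq_ever": [6, 6, 3, 5, 2, 6],
    "avg_months_in_file": [80, 120, 95, 60, 110, 70],
}
for a, b, r, perfect in flag_pairs(columns):
    print(a, b, round(r, 2), "perfect" if perfect else "not perfect")
```

A pair flagged as perfect would violate Assumption 2; a pair that only exceeds the threshold calls for the more careful interpretation discussed earlier.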
Assumption 3: Independent Errors
Based on the Residual Chart shown below, we pass the assumption of independent errors
because there is no pattern in the graph. Intuitively, one person’s FICO score should not be
related to another person’s FICO score so this makes sense. Additionally, in the ACF plot
shown below, we pass the assumption of independent errors because we see that the first
lag does not cross the dotted line threshold. All other lags are insignificant if we view the
first lag as insignificant.
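The ACF check can be illustrated with a short calculation of the lag-1 autocorrelation of the residuals. The sketch assumes the dotted line on the plot is the usual approximate 95% band of ±1.96/sqrt(n); that band, the function names, and the two residual series are assumptions for the example and are not taken from our output.

```python
from math import sqrt

def autocorrelation(residuals, lag=1):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

def passes_lag_check(residuals, lag=1):
    """True if the autocorrelation stays inside the +/-1.96/sqrt(n) band."""
    bound = 1.96 / sqrt(len(residuals))
    return abs(autocorrelation(residuals, lag)) <= bound

# Hypothetical residual series.
no_pattern = [1, 1, -1, -1] * 25   # lag-1 autocorrelation close to 0
alternating = [1, -1] * 50         # strong negative lag-1 autocorrelation
print(passes_lag_check(no_pattern), passes_lag_check(alternating))
```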
Assumption 4: No Heteroskedasticity
Based on the fitted vs. residual plot shown below, we see unequal variances in the
residuals. Therefore, we fail the assumption of no heteroskedasticity. We would address
the violation in one of the following ways:
● Ignoring it
● Bootstrapping
Controlling for correlation by including lagged values as predictors in the model is a
remedy for dependent errors rather than for unequal variance, so it does not apply to this
violation.
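As a rough illustration of the bootstrapping option, the sketch below resamples observations with replacement (a pairs bootstrap) and uses the spread of the refitted slopes as a standard error that does not rely on equal error variance. It uses a single predictor and synthetic data to stay short; the function names, the data, and the number of resamples are all assumptions, and this is not the procedure we ran.

```python
import random
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def bootstrap_slope_se(xs, ys, n_boot=500, seed=0):
    """Pairs bootstrap: resample (x, y) rows with replacement and refit."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return stdev(estimates)

# Synthetic heteroskedastic data: the noise grows with x (hypothetical).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 0.2 + 0.5 * x) for x in xs]
estimate = slope(xs, ys)
se = bootstrap_slope_se(xs, ys)
print(round(estimate, 3), round(se, 3))
```

The same resampling idea extends to the four-predictor model by refitting the full regression on each resample.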
Assumption 5: Normal Distribution of Errors
According to the histogram of our residuals, the distribution of the errors visually looks
approximately normal. The average error for the training data set is 29.4 and the average
error for the testing data set is 29.5. This illustrates that our model is consistent since both
numbers are similar. Additionally, this is a good model because our average error is
approximately 30 points for someone’s FICO Score, which is not a substantial amount
when dealing with numbers that are in the hundreds.
Removal of Outliers
We determined that outliers should not be an issue for this model. Since a FICO score has
a defined range, we do not believe that any FICO score would be unusual to see. Also,
we want our model to be able to predict perfect credit scores as well as very low credit
scores, so these values should be included in our model.
Prediction In & Out of Samples
The mean absolute error is 23.1 for the training set and 23.9 for the test set. This
indicates that our model is very consistent: it performs almost exactly as well on data it
hasn’t seen before as on data it has seen. A roughly 23 point average error is very small
given the range of a FICO score, which means the model is a good one. Because our model
is both consistent and accurate, we can assume that we created a useful model.
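The comparison above rests on two simple pieces: a random 70/30 split and the mean absolute error. A minimal sketch of both follows; the function names and the three example scores are hypothetical, while 23.1 and 23.9 are the training and test errors reported above.

```python
import random

def split_70_30(records, seed=0):
    """Randomly assign 70% of the records to training and 30% to testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_gap(train_mae, test_mae):
    """How much larger the test error is, as a fraction of the training error."""
    return (test_mae - train_mae) / train_mae

train, test = split_70_30(list(range(10)))
print(len(train), len(test))

# Hypothetical actual and predicted scores for three applicants.
print(mean_absolute_error([700, 650, 720], [690, 675, 720]))

# Training and test errors reported in this section.
print(round(relative_gap(23.1, 23.9), 3))
```

A small relative gap between the two errors is what we mean by consistency.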
Conclusion
Based on the analysis we conducted, we would use the model to predict an individual’s
FICO score, but would exercise caution since some assumptions were violated. A
manager or financial firm can use this model to assess an individual’s FICO score
and use that information to approve or deny a loan. Lastly, we emphasize that anyone using
this model should apply methods such as bootstrapping to address the heteroskedasticity
violation.

More Related Content

What's hot

Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Matt Hansen
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)Matt Hansen
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?Ganes Kesari
 
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Matt Hansen
 
Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Matt Hansen
 
Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)Matt Hansen
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2shyaminfo15
 
Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)Matt Hansen
 
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)Matt Hansen
 
Hypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsHypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsMatt Hansen
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2vikscarter
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Matt Hansen
 
Best crime predictor: Linear Regression
Best crime predictor: Linear RegressionBest crime predictor: Linear Regression
Best crime predictor: Linear RegressionJonathan Chauwa
 
ECON104RoughDraft1
ECON104RoughDraft1ECON104RoughDraft1
ECON104RoughDraft1John Nguyen
 
Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)Matt Hansen
 
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Matt Hansen
 
Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)Matt Hansen
 
Alleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAlleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAmit Sharma
 

What's hot (19)

Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard)
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
 
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
 
Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)
 
Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)Hypothesis Testing: Proportions (Compare 1:Standard)
Hypothesis Testing: Proportions (Compare 1:Standard)
 
Toys R Us -- Mffiinity model Test and Learn April 08
Toys R Us -- Mffiinity model Test and Learn April 08Toys R Us -- Mffiinity model Test and Learn April 08
Toys R Us -- Mffiinity model Test and Learn April 08
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2
 
Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)Hypothesis Testing: Spread (Compare 1:Standard)
Hypothesis Testing: Spread (Compare 1:Standard)
 
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Normal (Compare 1:1)
 
Hypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsHypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence Intervals
 
Stayer mat 510 final exam2
Stayer mat 510 final exam2Stayer mat 510 final exam2
Stayer mat 510 final exam2
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
 
Best crime predictor: Linear Regression
Best crime predictor: Linear RegressionBest crime predictor: Linear Regression
Best crime predictor: Linear Regression
 
ECON104RoughDraft1
ECON104RoughDraft1ECON104RoughDraft1
ECON104RoughDraft1
 
Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)Hypothesis Testing: Spread (Compare 2+ Factors)
Hypothesis Testing: Spread (Compare 2+ Factors)
 
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
 
Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)Hypothesis Testing: Proportions (Compare 2+ Factors)
Hypothesis Testing: Proportions (Compare 2+ Factors)
 
Alleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAlleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal Models
 

Similar to FICO Credit Risk Data

Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptEdu4Sure
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
Logistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationLogistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationAsadJaved304231
 
Case2_Best_Model_Final
Case2_Best_Model_FinalCase2_Best_Model_Final
Case2_Best_Model_FinalEric Esajian
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project presentKexinZhang22
 
Risk Management in Five Easy Pieces
Risk Management in Five Easy PiecesRisk Management in Five Easy Pieces
Risk Management in Five Easy PiecesGlen Alleman
 
QWE Inc Report_Group 2
QWE Inc Report_Group 2QWE Inc Report_Group 2
QWE Inc Report_Group 2Xinyu Liu
 
Multivariate data analysis regression, cluster and factor analysis on spss
Multivariate data analysis   regression, cluster and factor analysis on spssMultivariate data analysis   regression, cluster and factor analysis on spss
Multivariate data analysis regression, cluster and factor analysis on spssAditya Banerjee
 
Reduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage IndustryReduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage IndustryPranov Mishra
 
statistical measurement project presentation
statistical measurement project presentationstatistical measurement project presentation
statistical measurement project presentationKexinZhang22
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project presentKexinZhang22
 

Similar to FICO Credit Risk Data (20)

Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
 
Hy2208 final
Hy2208 finalHy2208 final
Hy2208 final
 
Hy2208 Final
Hy2208 FinalHy2208 Final
Hy2208 Final
 
Econometrics
EconometricsEconometrics
Econometrics
 
Logistic regression sage
Logistic regression sageLogistic regression sage
Logistic regression sage
 
report
reportreport
report
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
Logistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationLogistic regression and analysis using statistical information
Logistic regression and analysis using statistical information
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Risk Based Loan Approval Framework
Risk Based Loan Approval FrameworkRisk Based Loan Approval Framework
Risk Based Loan Approval Framework
 
Case2_Best_Model_Final
Case2_Best_Model_FinalCase2_Best_Model_Final
Case2_Best_Model_Final
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project present
 
Risk Management in Five Easy Pieces
Risk Management in Five Easy PiecesRisk Management in Five Easy Pieces
Risk Management in Five Easy Pieces
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
 
Pm 6
Pm 6Pm 6
Pm 6
 
QWE Inc Report_Group 2
QWE Inc Report_Group 2QWE Inc Report_Group 2
QWE Inc Report_Group 2
 
Multivariate data analysis regression, cluster and factor analysis on spss
Multivariate data analysis   regression, cluster and factor analysis on spssMultivariate data analysis   regression, cluster and factor analysis on spss
Multivariate data analysis regression, cluster and factor analysis on spss
 
Reduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage IndustryReduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage Industry
 
statistical measurement project presentation
statistical measurement project presentationstatistical measurement project presentation
statistical measurement project presentation
 
statistical measurement project present
statistical measurement project presentstatistical measurement project present
statistical measurement project present
 

Recently uploaded

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

FICO Credit Risk Data

  • 1. MISY 267: BUSINESS ANALYTICS FINAL PROJECT FICO Credit Risk Data MAY 9TH , 2016 UNIVERSITY OF DELAWARE Daniel Amato, Kyle Zaino, Chirag Dhamecha, Erik Caputo, Joey Czechowicz
  • 2. MISY267: Business Analytics 1 Goals Our problem is that we are uncertain what variables influence a person’s FICO Score and so our goal is to ascertain what demographic variables determine a given person’s FICO score. We will use a variable selection method to determine which variables to include or exclude from the model and we will run a linear regression model because we are predicting a continuous variable as opposed to a binary variable. Our team will test the model assumptions and check for any violations within the model. If any violations are found, we will make suggestions on how to address them. This model will then be able to assist a manager or a financial firm that is attempting to determine if an applicant is too risky to be given a loan. Methodologies Data Available & Variables of Interest Once we eliminated the variables with the terms “Application” and “Trades” in their names, we were left with the following variables: risk flag (paid as negotiated flag), interest revenue accumulated, sampling weight, debt to income ratio, number of borrowers, geographic region, prior or current bank relationship, collateral value, loan amount requested, loan to value ratio, FICO score, average months in file, months since most recent delinquency, maximum delinquency and public records in the last twelve months, maximums delinquency ever, number of inquiries in last 6 months, number of inquiries in last 6 months excluding 7 days, net fraction revolving burden, and net fraction installment burden. Variable Selection: Backward Stepwise Selection We used the Backward Stepwise variable selection method where we started running the model with all variables available, and removed one variable at a time, choosing to remove the variables that contributed the least to the model. Since we are working with FICO scores of individual persons, we will choose a model based on the best prediction accuracy and not R-squared value. 
Therefore, we want a model that yields the smallest errors out-of-sample. This led us to remove all of the variables from our model except for average months in file, maximum delinquency within the last 12 months, maximum delinquency ever, and net fraction revolving burden. Our final model uses these four variables and has an R-squared value of approximately 71%.
Additionally, we looked for interactions between our remaining variables, but found no interactions that added value to our model.
Lastly, we tested non-linear relationships by trying transformations of the variables in our model. For example, adding the square root of the ‘Max Delinquency in 12 months’ variable increased our R-squared by about 2%. However, none of the transformations we tested substantially increased the R-squared of our model, so we left them out to avoid the risk of overfitting. If a variable added less than 5% to the R-squared value, we decided not to include it in our model.
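The selection procedure described above can be sketched in code. This is a minimal illustration, not the authors' actual analysis: the data are synthetic, the helper names (`fit_ols`, `holdout_mae`, `backward_select`) are hypothetical, and the stopping rule (stop when removing any remaining predictor makes the out-of-sample mean absolute error worse) is an assumption about how "smallest errors out-of-sample" was applied.

```python
import random

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def holdout_mae(cols, train, test):
    """Fit on the training set using only `cols`; return test-set MAE."""
    pick = lambda rows: [[r[c] for c in cols] for r in rows]
    beta = fit_ols(pick(train[0]), train[1])
    preds = [predict(beta, r) for r in pick(test[0])]
    return sum(abs(p - t) for p, t in zip(preds, test[1])) / len(preds)

def backward_select(train, test, n_cols, tol=0.0):
    """Drop one predictor at a time while holdout MAE does not get worse."""
    cols = list(range(n_cols))
    best = holdout_mae(cols, train, test)
    while len(cols) > 1:
        trials = [(holdout_mae([c for c in cols if c != d], train, test), d)
                  for d in cols]
        err, drop = min(trials)
        if err > best + tol:
            break
        best, cols = err, [c for c in cols if c != drop]
    return cols, best

# Synthetic data (hypothetical): columns 0 and 1 matter, column 2 is pure noise.
rng = random.Random(0)
rows, y = [], []
for _ in range(200):
    x = [rng.uniform(0, 300), rng.uniform(0, 100), rng.uniform(0, 10)]
    rows.append(x)
    y.append(600 + 0.4 * x[0] - 1.0 * x[1] + rng.gauss(0, 5))

# Rows were generated in random order, so the first 70% is a random split.
train, test = (rows[:140], y[:140]), (rows[140:], y[140:])
cols, best = backward_select(train, test, n_cols=3)
print("kept columns:", cols, "holdout MAE:", round(best, 2))
```

Because the same holdout set both guides the selection and reports the error, the final error figure in a sketch like this is somewhat optimistic; a separate validation split would avoid that.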
  • 3. Proposed Model:
● Division of Data: 70% of the data is part of our training set and 30% is part of our test set.
● Random Selection Method: The observations in our data come from different subjects, so we use randomly selected observations as our test set.
● Proposed Linear Regression Model: FICO Score = 590.41 + 0.39*Average Months In File + 12.71*Max Delinquency Last 12 Months + 6.28*Max Delinquency Ever - 0.95*Net Fraction Revolving Burden
● Response Variable Interpretation: An increase in FICO score decreases the risk of the individual/applicant.
● Multicollinearity: Multicollinearity occurs when a predictor variable is so closely related to another predictor variable that we cannot determine how much of an impact increasing one of them has on the model. We examined the correlation matrix to check whether we can interpret our variables directly, and found that some of our variables have a correlation above 0.25 or below -0.25. In particular, there was a correlation of 0.61 between max delinquency in the last 12 months and max delinquency ever. Although this correlation is larger than 0.25, we do not want to remove either variable because each contributes a lot to the model on its own. Multicollinearity is therefore an issue in this model: not all variables can be interpreted directly, and these two require more careful evaluation.
● Statistical Significance: All of our variables are marked with significance stars in the regression output, so they are all statistically significant predictors of FICO scores.
● Intercept Interpretation: When the average months in file is 0, the maximum delinquency in the last 12 months and the maximum delinquency ever are both 0, and the net fraction revolving burden (outstanding balance/credit) is 0, the FICO score is on average 590.41.
The intercept makes sense to us intuitively, because the FICO score is positive as it should be. For a more detailed interpretation of the intercept, we would consult the data manager.
● Average Months in File: When the average months in file increases by 1 month, the FICO score increases on average by 0.39 points. This makes sense to us intuitively, because the longer an applicant’s file has existed, the less risky the applicant probably is. This coefficient is not economically significant, because a 0.39 point increase is minuscule. However, we decided not to remove the variable from our model because it contributes 5.6% to our R-squared.
● Maximum Delinquency in Last 12 Months: When the maximum delinquency within the last 12 months increases by 10, the FICO score increases on average by approximately 127.1 points (12.71 * 10). This does not make sense to us intuitively, because an individual with a large maximum delinquency is likely to be risky. The coefficient is economically significant: an increase of approximately 127 points has a substantial impact on an individual’s FICO score, which will determine their eligibility to receive a loan.
  • 4. ● Maximum Delinquency Ever: When max delinquency ever increases by 10, the FICO score increases on average by 62.8 points (6.28 * 10). This does not make sense to us intuitively, because the larger the max delinquency (failure to pay outstanding debt) ever is, the riskier the applicant probably is. This coefficient is economically significant, because a roughly 63 point increase can be a deciding factor when determining an individual’s eligibility to receive a loan.
● Net Fraction Revolving Burden: When the net fraction revolving burden increases by 1 unit, meaning the balance in relation to the available credit increases, the FICO score decreases on average by 0.95 points. Intuitively this is sensible, because the larger the net fraction revolving burden (outstanding balance/credit) is, the riskier the applicant probably is. This coefficient is not economically significant, because a 0.95 point decrease is close to 0. However, we do not want to remove the variable from our model because it contributes 23.32% to R-squared.
Tests of Model Assumptions
Assumption 1: Exclude Unnecessary Variables & Good Linear Model
Assumption 1 is a formality, so we assume that our model is a good linear model and treat the assumption as passed.
Assumption 2: No Perfect Multicollinearity
There is no perfect multicollinearity in our model. In the correlation matrix, none of the correlations between predictors were equal to 1 or -1.
Assumption 3: Independent Errors
Based on the Residual Chart shown below, we pass the assumption of independent errors because there is no pattern in the graph. Intuitively, one person’s FICO score should not be related to another person’s FICO score, so this makes sense. Additionally, in the ACF plot shown below, we pass the assumption of independent errors because the first lag does not cross the dotted-line threshold.
Because the first lag is insignificant, we treat all other lags as insignificant as well.
Assumption 4: No Heteroskedasticity
Based on the Fitted vs. Residual plot shown below, the variance of the residuals is not constant across fitted FICO scores. Therefore, we fail the assumption of no heteroskedasticity. We would address the violation in one of the following ways:
● Ignoring it
● Bootstrapping
● Controlling the correlation by including the lagged values as predictors in the model
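Of the remedies listed above, bootstrapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it resamples whole (x, y) observations (a "pairs" bootstrap), uses a single predictor for brevity, and runs on synthetic data whose noise grows with x. The function names and the data-generating numbers are hypothetical.

```python
import random
import statistics

def slope(xs, ys):
    """OLS slope of y on a single predictor x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for the slope from resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(n_boot):
        # Resample observations with replacement, keeping x and y together.
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    draws.sort()
    lo = draws[int((alpha / 2) * n_boot)]
    hi = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic heteroskedastic data (hypothetical): noise spread grows with x.
rng = random.Random(1)
xs = [rng.uniform(0, 100) for _ in range(200)]
ys = [700 - 0.95 * x + rng.gauss(0, 0.2 + 0.3 * x) for x in xs]

est = slope(xs, ys)
lo, hi = bootstrap_slope_ci(xs, ys)
print("slope estimate:", round(est, 3))
print("95% bootstrap interval:", round(lo, 3), "to", round(hi, 3))
```

Resampling whole observations does not rely on the errors having equal variance, which is why this style of interval is a common choice when the fitted-versus-residual plot fans out.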
  • 5. Assumption 5: Normal Distribution of Errors
According to the histogram of our residuals, the distribution of the errors looks approximately normal. The average error for the training data set is 29.4 and the average error for the testing data set is 29.5. The similarity of these two numbers shows that our model is consistent. Additionally, an average error of approximately 30 points is not a substantial amount when dealing with FICO scores, which are in the hundreds.
Removal of Outliers
We determined that outliers should not be an issue for this model. Since a FICO score has a defined range, we do not believe that any FICO score would be unusual to see. Also, we want our model to be able to predict perfect credit scores as well as very low credit scores, so these values should be included in our model.
Prediction In & Out of Samples
The mean absolute error is 23.1 for the training set and 23.9 for the test set. This indicates that our model is very consistent: it performs almost exactly as well on data it has not seen before as on data that it has seen. A roughly 23 point average error is very small given the range of a FICO score. Our model is therefore both consistent and accurate, so we consider it a useful model.
Conclusion
Based on the analysis we conducted, we would use the model to predict an individual’s FICO score, but would exercise caution since some assumptions were violated. A manager or financial firm can use this model to assess an individual’s FICO score and use that information to approve or deny a loan. Lastly, we emphasize that anyone using this model should apply methods such as bootstrapping to address the violation of the no-heteroskedasticity assumption.
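To make the proposed equation and the error measure concrete, the sketch below applies the reported coefficients to one applicant and computes a mean absolute error. The coefficients come from the report; the dictionary keys, function names, and the applicant's values are hypothetical and chosen only for illustration.

```python
# Coefficients as reported in the proposed linear regression model.
INTERCEPT = 590.41
COEFS = {
    "avg_months_in_file": 0.39,
    "max_delq_12m": 12.71,
    "max_delq_ever": 6.28,
    "net_frac_revolving": -0.95,
}

def predict_fico(applicant):
    """Apply the reported linear equation to one applicant (a dict)."""
    return INTERCEPT + sum(COEFS[k] * applicant[k] for k in COEFS)

def mean_abs_error(applicants, actual_scores):
    """Mean absolute difference between predicted and actual scores."""
    errs = [abs(predict_fico(a) - s) for a, s in zip(applicants, actual_scores)]
    return sum(errs) / len(errs)

# Hypothetical applicant; these input values are made up for the example.
applicant = {
    "avg_months_in_file": 100,
    "max_delq_12m": 6,
    "max_delq_ever": 6,
    "net_frac_revolving": 30,
}
# 590.41 + 0.39*100 + 12.71*6 + 6.28*6 - 0.95*30 = 714.85
print(round(predict_fico(applicant), 2))
```

Running `mean_abs_error` once on the training observations and once on the test observations gives the two numbers being compared in the in-sample versus out-of-sample check.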