StephanusDarmapuspita
Eric Esajian
DSO 528
Executive Summary
Objective: Trojan Financial Services has extended thousands of home equity lines of credit to its
clients, and many of those borrowers have defaulted. The objective is to build a model that
predicts whether an applicant will default on their loan. The approver currently receives 10,000
loan applications per month, so the model will aid the loan officer and help Trojan Financial
decide whether to approve or deny each loan based on the decision variables provided. Every loan
that is approved and then defaults costs the company around $20,000 on average. A sound logistic
regression model therefore needs to be constructed to identify which applicants are likely to
default on a home equity line of credit issued by Trojan Financial.
JMP Model: The JMP model considers a number of candidate variables. It focuses on the variables
that predict whether a loan WILL default, rather than whether the loan will be successful.
Key Insights: It is imperative to choose only variables that genuinely improve the model. Those
variables should have p-values below 0.05, and even lower p-values are preferable. That said,
business insight should also be used to consider variables that can improve the prediction even
when their p-values are slightly higher. The general idea is to select variables that predict
whether a loan will default, since each default costs Trojan Financial roughly $20,000. The more
significant variables we add, the better the chance of finding a stronger model.
Your Best Model: Our best JMP model used “Bad” as the response (Y) variable, since we want to
predict which loans will be bad. As predictors (X variables), our best model included
Reason[DebtCon], Derog, Delinq, Debtinc, Log Age, and Loan/Value. These variables provided the
strongest model. The R-square was reasonable, though not ideal, at 0.22, and most of the variables
had p-values below 0.05, four of which were below 0.001. The rule we used to classify defaults was
a simple if statement: if the predicted probability of a bad loan is greater than 0.22, the loan is
coded 1 (predicted bad and denied); otherwise it is coded 0 (approved). With this in place, the
model produced over $24 million in total profit on the training data and well over $25 million on
the testing data.
Key Changes Made: One key change was raising the cutoff probability used to flag a loan as bad.
The cutoff is typically set at 0.15, but we found that a cutoff of 0.22 gave the model a higher
total profit.
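As a rough illustration of that tuning step, the sketch below sweeps candidate cutoffs and keeps
the most profitable one. It assumes arrays of predicted bad-loan probabilities and actual outcomes;
the $20,000 loss per default comes from the case, while the $4,000 gain per good loan is an
assumption inferred from the profit tables later in this report.

```python
import numpy as np

# Assumed loan economics: $20,000 loss per approved loan that defaults (from the case);
# the $4,000 gain per approved good loan is inferred to match this report's profit tables.
LOSS_PER_BAD = 20_000
GAIN_PER_GOOD = 4_000

def profit_at_cutoff(prob_bad: np.ndarray, is_bad: np.ndarray, cutoff: float) -> float:
    """Approve applicants whose predicted bad-loan probability is at or below the cutoff."""
    approved = prob_bad <= cutoff
    good_approved = np.sum(approved & (is_bad == 0))
    bad_approved = np.sum(approved & (is_bad == 1))
    return good_approved * GAIN_PER_GOOD - bad_approved * LOSS_PER_BAD

def best_cutoff(prob_bad: np.ndarray, is_bad: np.ndarray) -> float:
    """Sweep cutoffs from 0.05 to 0.50 and return the most profitable one."""
    grid = np.arange(0.05, 0.51, 0.01)
    return max(grid, key=lambda c: profit_at_cutoff(prob_bad, is_bad, c))
```

On the training data, a sweep of this kind is what surfaced 0.22 as a more profitable cutoff than
the conventional 0.15.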
Key Insights: One key insight concerned the number of variables used in the model. Conventional
wisdom says that using fewer variables gives a better, simpler model. That was not the case here:
the more relevant variables we included, the higher both the profit and the R-square.
Why is your model better? This model is better because it will provide Trojan Financial Services
between $24 million and $25.5 million in profit. The overall model shows very low p-values and an
R-square of roughly 0.22, so we are confident it will catch most of the bad loans while keeping the
profitable lines of credit. One notable characteristic of this model is that the false positive
rate is extremely low, around 4%, in both training and testing, meaning the model rarely rejects a
loan that would have been good. We therefore have a low chance of being wrong with the variables
chosen.
What is the lift (as a ratio) provided by your model compared to the Baseline Model for
both training and testing? What is the increase in net dollar amount compared to the
Baseline Model for both training and testing? In dollar terms, this model’s lift with respect to
the baseline is about 1.44 on the training data and 1.52 on the testing data, which corresponds to
an increase of roughly $7.3 million (training) and $8.8 million (testing) over the implied baseline
profit of about $16.7 million. In the propensity table, the lift with respect to the baseline is
roughly 9.7 for both training and testing.
Conclusion: We believe this is a good model because we used the variables that were given and added
two new variables that proved significant in improving the total profit. JMP’s Go (stepwise) option
did a good job, but adding more business perspective allowed us to come up with a better solution.
We could probably beat JMP further by adding more variables based on domain knowledge and by
relying less on the variables that appear insignificant.
Base Model
i) Statistical KPIs of JMP Model – From JMP Printout
Measure                  Training   Validation   Definition
Entropy RSquare          0.2180     0.2641       1-Loglike(model)/Loglike(0)
Generalized RSquare      0.2751     0.3253       (1-(L(0)/L(model))^(2/n))/(1-L(0)^(2/n))
Mean -Log p              0.2490     0.2226       ∑ -Log(ρ[j])/n
RMSE                     0.2576     0.2420       √(∑(y[j]-ρ[j])²/n)
Mean Abs Dev             0.1351     0.1202       ∑ |y[j]-ρ[j]|/n
Misclassification Rate   0.0800     0.0720       ∑ (ρ[j]≠ρMax)/n
N                        1000       1000         n
ii) Statistical KPIs of JMP Model – From Excel Printout
Other Metrics                       Training   Validation
Accuracy %                          86.80%     86.70%
True Positive Rate                  50.52%     56.67%
False Positive Rate                 9.30%      10.33%
Sensitivity (True Positive Rate)    50.52%     56.67%
Specificity (True Negative Rate)    90.70%     89.67%
iii) a) Business KPIs of JMP Model – Training
Predicted number of Good Loans = 8670
Upper limit for Loans = 10000
Actual number of approved loans = 8670
Propensity of Good Loan = 94.464%
Propensity of Bad Loan = 5.536%
Total Profit = $23,160,000
b) Business KPIs of JMP Model – Testing
Predicted number of Good Loans = 8550
Upper limit for Loans = 10000
Actual number of approved loans = 8550
Propensity of Good Loan = 95.439%
Propensity of Bad Loan = 4.561%
Total Profit = $24,840,000
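These business KPIs follow directly from the confusion matrices below, scaled from the 1,000-loan
sample to the 10,000 monthly applications. A minimal sketch of that calculation for the base-model
training data is shown here; the $4,000 gain per good loan is an assumption chosen because it
reproduces the report's profit totals, while the $20,000 loss per default comes from the case.

```python
# Business KPIs for the base model (training, cutoff 0.15), scaled from the
# 1,000-loan sample to 10,000 monthly applications.
# Assumption: $4,000 gain per approved good loan (inferred so the totals match
# this report); $20,000 loss per approved loan that defaults (from the case).
SCALE = 10_000 / 1_000

good_approved, bad_approved = 819, 48        # "predicted Good Loan" column of the training matrix
approved = good_approved + bad_approved      # 867 in-sample -> 8,670 approved loans

propensity_good = good_approved / approved   # 0.94464
propensity_bad = bad_approved / approved     # 0.05536
total_profit = SCALE * (good_approved * 4_000 - bad_approved * 20_000)

print(f"Approved loans:      {int(approved * SCALE)}")            # 8670
print(f"Propensity good/bad: {propensity_good:.3%} / {propensity_bad:.3%}")
print(f"Total profit:        ${total_profit:,.0f}")               # $23,160,000
```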
iv) Interpret the Model (Base Logistic Regression) – From Business Point of view &
Statistical Point of view
From a business point of view, JMP has done a good job identifying variables that are genuinely
significant in determining whether a loan candidate is likely to become a good or bad loan.
Variables such as DEROG, DELINQ, and DEBTINC are clear indicators of whether a candidate will turn
into a good or bad loan. However, from a business perspective it would be unwise to judge a
candidate from these four variables alone. Many other variables can be considered, such as the
reason for taking the loan and the size of the loan relative to the value of the home. For example,
people who seek a loan to pay off other debt are clearly more likely to default on the new loan
than people who seek a loan for home improvement.
From a statistical point of view, all of the variables chosen are statistically significant, which
is a good sign. However, we note that the CLAGE variable has a skewed distribution; we may need to
transform it in order to get a better regression result.
v) Confusion Matrix for Training (cut off probability of 0.15)
                      Predicted
                      Good Loan   Bad Loan   Total
Actual   Good Loan    819         84         903
         Bad Loan     48          49         97
         Total        867         133        1000
vi) Confusion Matrix for Testing
                      Predicted
                      Good Loan   Bad Loan   Total
Actual   Good Loan    816         94         910
         Bad Loan     39          51         90
         Total        855         145        1000
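The accuracy, sensitivity, and specificity figures in the Excel KPI table follow from these counts.
A quick sketch using the training matrix, treating a bad loan as the positive class (as the KPI
table does):

```python
# Classification metrics for the base model, training data (cutoff 0.15).
# "Positive" means a bad loan, matching the Excel KPI table.
TP = 49    # actual bad,  predicted bad
FN = 48    # actual bad,  predicted good
FP = 84    # actual good, predicted bad
TN = 819   # actual good, predicted good

n = TP + FN + FP + TN                 # 1000
accuracy = (TP + TN) / n              # 0.868  -> 86.80%
sensitivity = TP / (TP + FN)          # 0.5052 -> true positive rate
false_positive_rate = FP / (FP + TN)  # 0.0930
specificity = TN / (TN + FP)          # 0.9070 -> true negative rate

print(f"Accuracy {accuracy:.2%}, TPR {sensitivity:.2%}, "
      f"FPR {false_positive_rate:.2%}, Specificity {specificity:.2%}")
```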
vii) Lift Table
Lift Table in Dollars                                   Training      Testing
Lift with respect to Baseline - JMP Model               1.385167464   1.485645933
Lift with respect to Baseline - My Best Model           1.437799043   1.523923445
Lift with respect to JMP Model - My Contribution        1.037996546   1.100172712
Overall Lift with respect to Baseline - My Best Model   1.437799043   1.523923445

Lift Table in Propensity                                Training      Testing
Lift with respect to Baseline - JMP Model               9.738522456   9.839030566
Lift with respect to Baseline - My Best Model           9.714728021   9.789862355
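The dollar-lift figures are consistent with a baseline that simply approves every applicant. A
short sketch of that calculation on the training data, using the same assumed $4,000-per-good-loan
economics as above:

```python
# Dollar lift = model profit / baseline profit, where the baseline approves all
# 10,000 applicants (903 good and 97 bad per 1,000 loans in the training sample).
baseline_profit = 10 * (903 * 4_000 - 97 * 20_000)   # $16,720,000

jmp_model_profit = 23_160_000    # base JMP model, training
best_model_profit = 24_040_000   # best model, training

print(jmp_model_profit / baseline_profit)    # ~1.385, matching the table
print(best_model_profit / baseline_profit)   # ~1.438, matching the table
```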
viii) Attach JMP Printout
Testing Confusion Matrix
Best Model
i) Statistical KPIs of JMP Model – From JMP Printout
Measure                  Training   Validation   Definition
Entropy RSquare          0.2214     0.1758       1-Loglike(model)/Loglike(0)
Generalized RSquare      0.2792     0.2222       (1-(L(0)/L(model))^(2/n))/(1-L(0)^(2/n))
Mean -Log p              0.2479     0.2494       ∑ -Log(ρ[j])/n
RMSE                     0.2570     0.2600       √(∑(y[j]-ρ[j])²/n)
Mean Abs Dev             0.1342     0.1353       ∑ |y[j]-ρ[j]|/n
Misclassification Rate   0.0790     0.0790       ∑ (ρ[j]≠ρMax)/n
N                        1000       1000         n
ii) Statistical KPIs of JMP Model – From Excel Printout
Other Metrics                       Training   Validation
Accuracy %                          91.00%     91.10%
True Positive Rate                  45.36%     48.89%
False Positive Rate                 4.10%      4.73%
Sensitivity (True Positive Rate)    45.36%     48.89%
Specificity (True Negative Rate)    95.90%     95.27%
iii) a) Business KPIs of JMP Model – Training
Predicted number of Good Loans = 9190
Upper limit for Loans = 10000
Actual number of approved loans = 9190
Propensity of Good Loan = 94.233%
Propensity of Bad Loan = 5.767%
Total Profit = $24,040,000
b) Business KPIs of JMP Model – Testing
Predicted number of Good Loans = 9130
Upper limit for Loans = 10000
Actual number of approved loans = 9130
Propensity of Good Loan = 94.962%
Propensity of Bad Loan = 5.038%
Total Profit = $25,480,000
iv) Interpret the Model (My Best Model) – From Business Point of view & Statistical
Point of view
From a business perspective, we decided that DEROG, DEBTINC, and DELINQ should definitely be
included, as they are clearly significant from a business standpoint. In addition, we included
REASON because we believe it is also significant from a business perspective: as explained earlier,
loan candidates are more likely to default when the motive for the loan is debt consolidation
rather than home improvement. The loan-to-value ratio also plays an important role; a very high
loan-to-value ratio makes it more likely that a borrower will simply default and let foreclosure
happen if the value of the home falls well below the loan amount.
From a statistical point of view, we applied a log transformation to CLAGE to normalize it; this is
the “Log Age” column. The transformed variable turned out to be statistically significant as well.
Combining this statistical adjustment with the business insight above allowed us to produce a model
that beats the JMP Go-option stepwise regression, with a lift of 1.04 on training and 1.10 on
testing.
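As a sketch of how the two engineered predictors and the logistic fit could be reproduced outside
JMP (the column names assume a standard HMEQ-style data set, and statsmodels stands in for JMP's
nominal logistic platform; the file name is hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Create the two engineered predictors used in the best model."""
    out = df.copy()
    # Natural log assumed here; zero CLAGE values would need log1p or similar handling.
    out["LOG_AGE"] = np.log(out["CLAGE"])
    out["LOAN_TO_VALUE"] = out["LOAN"] / out["VALUE"]   # requested loan relative to home value
    return out

# Hypothetical usage with an assumed file name and column layout:
# train = add_derived_features(pd.read_csv("hmeq_train.csv"))
# fit = smf.logit(
#     "BAD ~ C(REASON) + DEROG + DELINQ + DEBTINC + LOG_AGE + LOAN_TO_VALUE",
#     data=train,
# ).fit()
# print(fit.summary())
```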
v) Confusion Matrix for Training (cut off probability of 0.22)
                      Predicted
                      Good Loan   Bad Loan   Total
Actual   Good Loan    866         37         903
         Bad Loan     53          44         97
         Total        919         81         1000
vi) Confusion Matrix for Testing
                      Predicted
                      Good Loan   Bad Loan   Total
Actual   Good Loan    867         43         910
         Bad Loan     46          44         90
         Total        913         87         1000
vii) Lift Table
Lift Table in Dollars                                   Training      Testing
Lift with respect to Baseline - JMP Model               1.385167464   1.485645933
Lift with respect to Baseline - My Best Model           1.437799043   1.523923445
Lift with respect to JMP Model - My Contribution        1.037996546   1.100172712
Overall Lift with respect to Baseline - My Best Model   1.437799043   1.523923445

Lift Table in Propensity                                Training      Testing
Lift with respect to Baseline - JMP Model               9.738522456   9.839030566
Lift with respect to Baseline - My Best Model           9.714728021   9.789862355
viii) Attach JMP Printout (the JMP printout, with some explanation, is below)
Nominal Logistic Fit for BAD
Converged in Gradient, 7 iterations
Whole Model Test
Model -LogLikelihood DF ChiSquare Prob>ChiSq
Difference 70.49985 6 140.9997 <.0001*
Full 247.94100
Reduced 318.44085
• The whole-model test for our best model is highly significant (Prob>ChiSq < 0.0001), so the
fitted model explains default far better than an intercept-only model.
RSquare (U) 0.2214
AICc 509.995
BIC 544.236
Observations (or Sum Wgts) 1000
• The RSquare here is acceptable, although not ideal. A higher RSquare is generally better, but
given that roughly 10,000 applications must be screened every month and fewer than ten candidate
variables are available, we believe a value of 0.22 is acceptable.
Measure Training Definition
Entropy RSquare 0.2214 1-Loglike(model)/Loglike(0)
Generalized RSquare 0.2792 (1-(L(0)/L(model))^(2/n))/(1-L(0)^(2/n))
Mean -Log p 0.2479 ∑ -Log(ρ[j])/n
RMSE 0.2570 √ ∑(y[j]-ρ[j])²/n
Mean Abs Dev 0.1342 ∑ |y[j]-ρ[j]|/n
Misclassification Rate 0.0790 ∑ (ρ[j]≠ρMax)/n
N 1000 n
Lack Of Fit
Source        DF    -LogLikelihood   ChiSquare   Prob>ChiSq
Lack Of Fit   993   247.94100        495.882     1.0000
Saturated     999   0.00000
Fitted        6     247.94100
Parameter Estimates
Term Estimate Std Error ChiSquare Prob>ChiSq
Intercept 0.50243494 1.416252 0.13 0.7228
REASON[DebtCon] -0.0903551 0.1352828 0.45 0.5042
DEROG -0.8309048 0.1616376 26.43 <.0001*
DELINQ -0.6897292 0.1260756 29.93 <.0001*
DEBTINC -0.0859525 0.0165176 27.08 <.0001*
Log Age 2.20439479 0.5301889 17.29 <.0001*
Loan/Value 0.67044572 0.6135404 1.19 0.2745
StephanusDarmapuspita
Eric Esajian
DSO 528
For log odds of 0/1
Effect Likelihood Ratio Tests
Source       Nparm   DF   L-R ChiSquare   Prob>ChiSq
REASON       1       1    0.45359543      0.5006
DEROG        1       1    32.6890439      <.0001*
DELINQ       1       1    29.3558579      <.0001*
DEBTINC      1       1    32.4278218      <.0001*
Log Age      1       1    17.48342        <.0001*
Loan/Value   1       1    2.66649466      0.1025
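As a reference for how these estimates translate into a score, the sketch below computes a default
probability for a single applicant. Per the "For log odds of 0/1" note above, the coefficients
model the log odds of BAD = 0 (a good loan), so the bad-loan probability is one minus the logistic
of the linear predictor. The example input values, the natural-log assumption for the Log Age
column, and the ±1 effect coding for REASON are illustrative assumptions, not taken from the case.

```python
import math

# Parameter estimates from the JMP printout; they model the log odds of BAD = 0 (good loan).
COEF = {
    "Intercept":        0.50243494,
    "REASON[DebtCon]": -0.0903551,   # JMP effect coding assumed: +1 for DebtCon, -1 for HomeImp
    "DEROG":           -0.8309048,
    "DELINQ":          -0.6897292,
    "DEBTINC":         -0.0859525,
    "Log Age":          2.20439479,  # log of CLAGE; natural log assumed here
    "Loan/Value":       0.67044572,
}

def prob_bad(reason_debtcon: float, derog: float, delinq: float,
             debtinc: float, log_age: float, loan_to_value: float) -> float:
    """Return the predicted probability that a loan is bad (BAD = 1)."""
    logit_good = (COEF["Intercept"]
                  + COEF["REASON[DebtCon]"] * reason_debtcon
                  + COEF["DEROG"] * derog
                  + COEF["DELINQ"] * delinq
                  + COEF["DEBTINC"] * debtinc
                  + COEF["Log Age"] * log_age
                  + COEF["Loan/Value"] * loan_to_value)
    p_good = 1.0 / (1.0 + math.exp(-logit_good))
    return 1.0 - p_good

# Hypothetical applicant: debt-consolidation loan, one derogatory report, no delinquencies,
# debt-to-income of 35, Log Age of 5, loan equal to 80% of the home's value.
p = prob_bad(1, 1, 0, 35, 5.0, 0.8)
print(f"Predicted probability of default: {p:.2%}")   # deny if above the 0.22 cutoff
```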
Testing Confusion Matrix
Training Confusion Matrix