© Copyright 2011 EMC Corporation. All rights reserved. EMC Restricted Confidential
Israel Chavez
Ngadhnjim Halilaj
Anusha Kodali
Marcos Quezada
Jyoti Shrestha
Sarat Tadi
April 28, 2016
EMC Education Services
Data Science & Big Data Analytics
Project Goals
• Create a model that will allow FPC to provide a loan predicting service to
its customers.
• Identify the necessary attributes that will enable the model to give a better
prediction.
• Test the Marketing Department threshold suggestions.
• Advise FPC on suggestions it could offer its customers to increase their
chances of getting a loan.
Situation
• FPC wants to expand the set of services it offers its customers by creating
an online site for loan advice.
• Provide a fast and reliable planning platform for customers to manage their
personal finances.
• Attract potential customers who want to know their eligibility for loans, thus
increasing FPC business.
Executive Summary
Logistic Regression and Decision Tree models are moderately effective at
predicting the loan outcome
• Logistic Regression
  – Precision: 0.786
  – Recall: 0.984
• Decision Tree
  – Precision: 0.784
  – Recall: 0.984
Approach - Discovery
• Used the 2010 housing loan database assembled under the Home Mortgage
Disclosure Act (HMDA).
• Filtered data based on:
  – Owner-occupied
  – 1-4 Family
  – Action Type (loan originated, application approved but not accepted,
    application denied, application withdrawn)
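The three Discovery filters can be sketched as a simple predicate over loan records. This is a minimal illustration only: the field names (`occupancy`, `property_type`, `action`) and the string codes are hypothetical stand-ins, not the actual HMDA column names or codes.

```python
# Hypothetical field names and codes, chosen for readability only.
KEPT_ACTIONS = {
    "originated",
    "approved_not_accepted",
    "denied",
    "withdrawn",
}


def keep_application(record):
    """Return True when a record passes all three Discovery filters."""
    return (
        record["occupancy"] == "owner_occupied"
        and record["property_type"] == "1-4_family"
        and record["action"] in KEPT_ACTIONS
    )


# Made-up sample records, not taken from the HMDA data.
applications = [
    {"occupancy": "owner_occupied", "property_type": "1-4_family", "action": "originated"},
    {"occupancy": "not_owner_occupied", "property_type": "1-4_family", "action": "denied"},
    {"occupancy": "owner_occupied", "property_type": "multifamily", "action": "originated"},
    {"occupancy": "owner_occupied", "property_type": "1-4_family", "action": "purchased"},
    {"occupancy": "owner_occupied", "property_type": "1-4_family", "action": "denied"},
]

filtered = [r for r in applications if keep_application(r)]
print(len(filtered), "of", len(applications), "records kept")
```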
Data Preparation
• Data Conditioning:
  – Converted categorical variables to factors, removed incomplete records,
    and created the data set.
  – Releveled factor variables to set reference levels for the logistic
    regression.
  – Tested correlation among numeric variables with a correlation matrix.
  – Reduced the data set to “Originated” and “Denied” loans.
• Data Visualization:
  – Reviewed the data to check distributions and noise.
  – Two sources of noise:
    ▪ Home Improvement Loans
    ▪ Loan amounts > $400K
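A correlation matrix over numeric variables, as mentioned under Data Conditioning, can be sketched with plain Pearson correlation. The variable names and values below are made up for illustration; the deck does not list which numeric variables were tested.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns; loan_amount is an exact multiple of income here,
# so the pair is perfectly correlated and one of them would be redundant.
numeric = {
    "income": [40, 55, 70, 85, 100],
    "loan_amount": [80, 110, 140, 170, 200],
    "tract_minority_pct": [30, 10, 25, 5, 20],
}

matrix = correlation_matrix(numeric)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Pairs with a correlation close to 1 or -1 carry nearly the same information, which is the usual reason to inspect such a matrix before fitting a regression.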
Approach - Model Planning
• Model Selection:
  – Two methods:
    ▪ Logistic Regression
    ▪ Classification Tree
• Regression:
  – The 0.5 and 0.75 thresholds suggested by the Marketing Department were
    used.
Approach - Model Planning
• Variable Selection:
  – Created a Small Set for testing purposes, with three variants:
    ▪ Absence of personal data
    ▪ Absence of county data
    ▪ Absence of personal and county data
• Developed two Full models:
  – Model 1: included everything that the example script suggested.
  – Model 2: included only the variables that we chose to build the model
    with.
• Pseudo-R² was used to gauge how well each model fits the data.
• ROC & AUC were used to check the performance of our model.
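The deck does not say which pseudo-R² was used. As one common choice, and purely as an assumption, McFadden's pseudo-R² compares the log-likelihood of the fitted model with that of a null model that always predicts the base approval rate. The labels and probabilities below are made up.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def mcfadden_r2(y_true, p_pred):
    """McFadden's pseudo-R²: 1 - LL(model) / LL(null model)."""
    base_rate = sum(y_true) / len(y_true)
    ll_model = log_likelihood(y_true, p_pred)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - ll_model / ll_null


# Made-up labels (1 = approved) and predicted approval probabilities.
approved = [1, 1, 1, 0, 1, 0, 1, 1]
predicted = [0.9, 0.8, 0.7, 0.3, 0.85, 0.4, 0.6, 0.75]

r2 = mcfadden_r2(approved, predicted)
print(round(r2, 3))
```

A value of 0 means the model does no better than the base rate; values closer to 1 indicate a better fit.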
Approach - Model Building
• Created a Holdout set with 25% of the data to test the models.
• Logistic Regression:
  – Categorized the holdout predictions into three bins:
    ▪ Low (<50%)
    ▪ Medium (50-74%)
    ▪ High (>=75%)
• To further test the Regression model, we experimented with a binary
  classification: Loan Rejected / Loan Approved
  – First prediction: threshold 0.5.
  – Second prediction: threshold 0.7.
• Decision Tree:
  – Used binary classification: Loan Rejected / Loan Approved
• A confusion matrix was developed to compare both methods.
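The binning and thresholding steps can be sketched as two small functions over predicted approval probabilities. The scores are made up, and treating a score at or above the threshold as "approved" is an assumption; the deck does not state how ties at the boundary were handled.

```python
def bin_probability(p):
    """Map a predicted approval probability to the Low / Medium / High bin."""
    if p >= 0.75:
        return "High"
    if p >= 0.50:
        return "Medium"
    return "Low"


def classify(p, threshold=0.5):
    """Binary decision: True (Loan Approved) when p reaches the threshold."""
    return p >= threshold


# Made-up predicted probabilities for five holdout applications.
scores = [0.12, 0.50, 0.74, 0.75, 0.93]

bins = [bin_probability(p) for p in scores]
approved_at_05 = [classify(p, 0.5) for p in scores]
approved_at_07 = [classify(p, 0.7) for p in scores]

print(bins)
print(approved_at_05)
print(approved_at_07)
```

Raising the threshold trades recall for precision: fewer applications are predicted as approved, but those that are carry a higher estimated probability.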
Approach - Model Results and Accuracy
• The Logistic Regression model with threshold 0.5 has predictive power at
least as good as the Decision Tree model

Logistic Regression (threshold = 0.5)
                Predicted FALSE   Predicted TRUE
Actual FALSE              2,452           23,657
Actual TRUE               1,385           87,383

Decision Tree model
                Predicted FALSE   Predicted TRUE
Actual FALSE              2,082           24,027
Actual TRUE               1,349           87,419

             Logistic Regression model   Decision Tree model
Accuracy                         0.780                 0.779
Precision                        0.786                 0.784
Recall                           0.984                 0.984
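The summary metrics follow from the confusion-matrix counts using the standard definitions: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN). A minimal sketch recomputing them from the counts on this slide follows; the recomputed values agree with the reported ones to within roughly 0.002, with small differences that may come from rounding.

```python
def metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts from the slide: rows are actual outcomes, columns are predictions.
logistic = metrics(tn=2452, fp=23657, fn=1385, tp=87383)
tree = metrics(tn=2082, fp=24027, fn=1349, tp=87419)

for name, m in (("Logistic Regression", logistic), ("Decision Tree", tree)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both models predict TRUE for the large majority of applications, which explains the high recall alongside the many false positives in the Actual FALSE row.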
Logistic Regression Prediction
Decision Tree Visualization
A Decision Tree model is a useful benchmark for the predictive power of a
Logistic Regression model
Model Description
• Overview of Basic Methodology: predict the likelihood of a person getting a
loan from FPC.
• Model: Logistic Regression and Decision Tree.
• Dependent variable: “Approved” (whether or not the loan application was
approved).
• Scope:
  – 662,997 total observations for year 2010 extracted from the housing loan
    database that was assembled by federal agencies pursuant to the Home
    Mortgage Disclosure Act (HMDA).
  – After thorough cleaning of the data, 550,336 observations remained for
    modeling.
• Sampling:
  – Small set: 10% of the data.
  – Holdout set: 25% of the data.
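The sampling scheme can be sketched with a seeded shuffle. This is an illustration under assumptions: the deck does not say how the rows were drawn or whether the small set was taken from the full data or only from the non-holdout part; here it is drawn from the full data, and the row ids are stand-ins.

```python
import random


def split_holdout(rows, holdout_fraction=0.25, seed=42):
    """Shuffle the rows and return (training_rows, holdout_rows)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Stand-in row ids; the cleaned data set in the deck has 550,336 rows.
rows = list(range(1000))

training, holdout = split_holdout(rows)

# Small set: a 10% random sample used for quick experiments (assumed to be
# drawn from all rows).
small_set = random.Random(7).sample(rows, k=round(len(rows) * 0.10))

print(len(training), len(holdout), len(small_set))
```

Fixing the seed keeps the split reproducible, so both models are evaluated on exactly the same holdout rows.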
Data distribution visualization
Checking how closely each variable follows a normal distribution helps to
understand how good a predictor it is
Data distribution visualization
Removing unwanted noise from the data increases the predictive power of the
model
ROC/AUC
The ROC curves of the reduced models lie just inside the full-model curve;
essentially they are the same model
• Full model: AUC 0.70
• Personal data removed: AUC 0.69
• Personal data and county removed: AUC 0.68
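AUC can be read as the probability that a randomly chosen approved application receives a higher score than a randomly chosen denied one, with ties counted as half. A minimal sketch of that pairwise definition follows; the labels and scores are made up, and the quadratic loop is for illustration only, not for data of the size used in the deck.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = approved) and model scores.
approved = [1, 1, 0, 1, 0, 1, 0, 1]
model_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.75, 0.65, 0.3]

print(round(auc(approved, model_scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of 0.68 to 0.70 indicate moderate discrimination.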
Recommendations
• The data available for analysis supports moderately effective predictions.
• Logistic Regression and Classification Tree yield similar results.
• Logistic Regression should be used, considering the web app response time
requirement.
• The model provides an estimate, not an assurance, that a specific customer
will or will not get a loan.
• Removing sensitive personal information barely affects the model (AUC 0.69
vs. 0.70 for the full model).
• Removing county information as well barely affects the model (AUC 0.68).
• High income increases the chances of getting a loan.
• A higher % of minority population in the customer's tract reduces the chances
of getting a loan (we do not recommend showing this finding on the website).
Loan predicting web service
