© Copyright 2011 EMC Corporation. All rights reserved. EMC Restricted Confidential
Israel Chavez
Ngadhnjim Halilaj
Anusha Kodali
Marcos Quezada
Jyoti Shrestha
Sarat Tadi
April 28, 2016
EMC Education Services
Data Science & Big Data Analytics
Project Goals
• Create a model that will allow FPC to provide a loan predicting service to
its customers.
• Identify the necessary attributes that will enable the model to give a better
prediction.
• Test the Marketing Department threshold suggestions.
• Advise FPC on suggestions it could offer its customers to increase their
chances of getting a loan.
Situation
• FPC wants to expand the services it offers its customers by creating
an online site for loan advice.
• Provide a fast and reliable planning platform for customers to manage their
personal finances.
• Attract potential customers who want to know their eligibility for loans, thus
increasing FPC business.
Executive Summary
Logistic regression and decision tree models are both moderately effective
at predicting the loan outcome
• Logistic Regression
– Precision: 0.786
– Recall: 0.984
• Decision Tree
– Precision: 0.784
– Recall: 0.984
Approach - Discovery
• Used the 2010 housing loan database assembled under the Home Mortgage Disclosure Act (HMDA).
• Filtered data based on:
– Owner-occupied
– 1-4 Family
– Action Type (loan originated, application approved but not accepted,
application denied, application withdrawn)
Data Preparation
• Data Conditioning:
– Data was factored and incomplete records were removed to create the data set.
– Releveled variables to set reference levels for a possible logistic regression.
– Tested correlation between numeric variables with a correlation matrix.
– Reduced the data set to “Originated” and “Denied” loans.
• Data Visualization:
– Reviewed the data to check distribution and noise.
– Two sources of noise:
▪ Home Improvement Loans
▪ Loan amounts > $400K
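The conditioning steps above can be sketched as a simple record filter. This is a minimal sketch, not the team's actual script: the field names (`action_taken`, `loan_purpose`, `loan_amount_000s`) and the numeric codes are assumed to follow the public HMDA register coding, and dropping the two noise sources outright is also an assumption.

```python
# Hypothetical field names and codes, assumed to mirror the HMDA register layout.
ORIGINATED, DENIED = 1, 3      # action_taken codes (assumption)
HOME_IMPROVEMENT = 2           # loan_purpose code (assumption)
MAX_LOAN_THOUSANDS = 400       # loan amounts above $400K treated as noise


def condition(records):
    """Keep complete originated/denied records and drop the two noise sources."""
    cleaned = []
    for rec in records:
        # Remove incomplete records (any missing value).
        if any(value is None for value in rec.values()):
            continue
        # Reduce the data set to "Originated" and "Denied" loans.
        if rec["action_taken"] not in (ORIGINATED, DENIED):
            continue
        # Drop home improvement loans and loan amounts > $400K.
        if rec["loan_purpose"] == HOME_IMPROVEMENT:
            continue
        if rec["loan_amount_000s"] > MAX_LOAN_THOUSANDS:
            continue
        # Derive the binary dependent variable.
        cleaned.append({**rec, "approved": rec["action_taken"] == ORIGINATED})
    return cleaned


# Made-up records for illustration only.
sample = [
    {"action_taken": 1, "loan_purpose": 1, "loan_amount_000s": 250, "income": 80},
    {"action_taken": 3, "loan_purpose": 3, "loan_amount_000s": 120, "income": 45},
    {"action_taken": 4, "loan_purpose": 1, "loan_amount_000s": 200, "income": 60},    # withdrawn
    {"action_taken": 1, "loan_purpose": 2, "loan_amount_000s": 30, "income": 50},     # home improvement
    {"action_taken": 1, "loan_purpose": 1, "loan_amount_000s": 650, "income": 200},   # > $400K
    {"action_taken": 3, "loan_purpose": 1, "loan_amount_000s": 180, "income": None},  # incomplete
]
cleaned = condition(sample)
```

Only the first two sample records survive the filter, one approved and one denied.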
Approach - Model Planning
• Model Selection:
– Two methods:
▪ Logistic Regression
▪ Classification Tree
• Regression:
– The 0.5 and 0.75 thresholds suggested by the Marketing Department were
used.
Approach - Model Planning
• Variable Selection:
– Created a Small Set for testing purposes, with three possibilities:
▪ Absence of personal data
▪ Absence of county data
▪ Absence of personal and county data
• Developed two Full models:
– Model 1: included everything that the example script suggested.
– Model 2: included only the variables that we chose to build the model
with.
• Pseudo-R² was used to assess the goodness of fit of the models.
• ROC curves and AUC were used to check the performance of our model.
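Both checks can be computed directly from predicted probabilities. The sketch below is illustrative only: the slides do not say which pseudo-R² variant was used, so McFadden's is assumed, and the labels and scores are made up. AUC is computed by pairwise comparison, which is quadratic and only suitable for small samples.

```python
import math


def log_likelihood(y_true, probs):
    """Bernoulli log-likelihood of the labels under predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, probs)
    )


def mcfadden_pseudo_r2(y_true, probs):
    """McFadden pseudo-R²: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - log_likelihood(y_true, probs) / ll_null


def auc(y_true, probs):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = approved) and scores, made up for illustration only.
y = [1, 1, 1, 0, 1, 0]
p = [0.9, 0.8, 0.6, 0.7, 0.4, 0.2]
print(round(mcfadden_pseudo_r2(y, p), 3), round(auc(y, p), 3))
```

A pseudo-R² near 0 means the model barely improves on predicting the base approval rate for everyone; an AUC of 0.5 means the ranking is no better than chance.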
Approach - Model Building
• Created a holdout set with 25% of the data to test the models.
• Logistic Regression:
– Categorized the holdout data into three bins:
▪ Low threshold (<50%)
▪ Medium threshold (50-74%)
▪ High threshold (>=75%)
• To further test the regression model, we experimented with a binary
classification (Loan Rejected / Loan Approved):
– First prediction: threshold 0.5.
– Second prediction: threshold 0.7.
• Decision Tree:
– Used binary classification: Loan Rejected / Loan Approved.
• A confusion matrix was developed to compare both methods.
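The binning and thresholding described above can be sketched as follows. The function names and sample probabilities are hypothetical, and treating a probability exactly at the threshold as "approved" is an assumption the slides do not settle.

```python
def bin_probability(prob):
    """Assign a predicted approval probability to one of the three bins."""
    if prob >= 0.75:
        return "high"
    if prob >= 0.50:
        return "medium"
    return "low"


def classify(prob, threshold=0.5):
    """Binary call: True means 'Loan Approved', False means 'Loan Rejected'."""
    # Assumption: a probability equal to the threshold counts as approved.
    return prob >= threshold


# Made-up probabilities standing in for logistic regression output on the holdout set.
holdout_probs = [0.32, 0.50, 0.74, 0.75, 0.91]
bins = [bin_probability(p) for p in holdout_probs]
approved_at_050 = [classify(p, 0.5) for p in holdout_probs]
approved_at_070 = [classify(p, 0.7) for p in holdout_probs]
```

Raising the threshold from 0.5 to 0.7 moves borderline applicants from "approved" to "rejected", which trades recall for precision.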
Approach - Model Results and Accuracy
• The model developed with Logistic Regression with threshold 0.5 has
predictive power at least as good as the Decision Tree model

Logistic Regression (threshold = 0.5)
                Predicted FALSE   Predicted TRUE
Actual FALSE              2,452           23,657
Actual TRUE               1,385           87,383

Decision Tree Model
                Predicted FALSE   Predicted TRUE
Actual FALSE              2,082           24,027
Actual TRUE               1,349           87,419

             Logistic Regression model   Decision Tree model
Accuracy                         0.780                 0.779
Precision                        0.786                 0.784
Recall                           0.984                 0.984
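The summary metrics follow from the confusion matrices by the standard definitions, sketched below with TRUE (approved) as the positive class. Values recomputed this way may differ slightly from the reported figures in the third decimal place, possibly because of rounding or truncation in the original.

```python
def metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts copied from the two confusion matrices above.
logistic = metrics(tn=2452, fp=23657, fn=1385, tp=87383)
tree = metrics(tn=2082, fp=24027, fn=1349, tp=87419)

for name, m in (("logistic", logistic), ("tree", tree)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```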
Logistic Regression Prediction
Decision Tree Visualization
A Decision Tree model is a useful benchmark for the predictive power of a
Logistic Regression model
Model Description
• Overview of Basic Methodology: Predict the likelihood of a person getting a loan
from FPC.
• Model: Logistic Regression and Decision Tree.
• Dependent variable: “Approved”, indicating whether the loan application was approved.
• Scope:
– 662,997 total observations for 2010, extracted from the housing loan
database assembled by federal agencies pursuant to the Home
Mortgage Disclosure Act (HMDA).
– After thorough data cleaning, 550,336 observations remained for modeling.
• Sampling:
– Small set: 10% of the data.
– Holdout set: 25% of the data.
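The sampling scheme can be sketched as a single shuffled split. This is a minimal sketch under assumptions: the slides do not say how the small set relates to the holdout set, so here the small set is drawn from the non-holdout rows, and the function name and seed are hypothetical.

```python
import random


def split_holdout(rows, holdout_frac=0.25, small_frac=0.10, seed=42):
    """Shuffle once, set aside a holdout set, and draw a small set from the remainder."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_frac)
    holdout, training = shuffled[:n_holdout], shuffled[n_holdout:]
    # Assumption: the small set is 10% of all rows, taken from the non-holdout rows.
    n_small = int(len(shuffled) * small_frac)
    small = training[:n_small]
    return training, holdout, small


# Row indices stand in for the cleaned observations.
training, holdout, small = split_holdout(range(1000))
```

Keeping the holdout rows disjoint from the rows used for fitting is what makes the holdout metrics an honest estimate of performance on new applicants.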
Data distribution visualization
Visualizing each variable's distribution and comparing it with a normal
distribution helps gauge how good a predictor it is
Data distribution visualization
Removing unwanted noise from the data increases the predictive power
of the model
ROC/AUC
The ROC curves of the reduced models lie just inside the full model curve;
essentially they are the same model
• Full model: AUC 0.70
• Personal data removed: AUC 0.69
• Personal data and county removed: AUC 0.68
Recommendations
• The data available for analysis is moderately useful for predicting loan outcomes.
• Logistic Regression and Classification Tree yield similar results.
• Logistic Regression should be used, considering the web app response time
requirement.
• The model provides an estimate, not an assurance, that a specific customer
will or will not get a loan.
• Removing sensitive personal information has little effect on the model.
• Removing county information has little effect on the model.
• High income increases the chances of getting a loan.
• A higher % of minority population in the customer's tract reduces the chances of getting
a loan (we do not recommend showing this finding on the website).
Loan predicting web service
