GERMAN CREDIT SCORING DATA ANALYSIS
The German Credit dataset is a classic case used for classification problems. It has 1000 observations
and 21 variables, such as Status of existing checking account, Credit history, Age, Job, Nationality, etc.,
with the prediction variable, response, which differentiates good credit versus bad credit.
The data was sampled to split it into 80-20 training-test data. Multiple methods were employed to
solve the prediction problem, such as Logistic regression, Regression Tree, Generalized Additive Model,
Linear Discriminant Analysis and Neural network, to predict the prediction variable, response, in the
training and test data. The best model for each method was evaluated and analyzed for its in-sample and
out-of-sample performance, and the results below were found. Further, the ROC curve and AUC
were determined for the in-sample training data and the out-of-sample testing data.
Model equation:
  GLM (Stepwise Variable Selection): -sex -present_resid -n_credits -job -n_people -telephone -property
  Regression Tree: Purpose, age, present_emp, other_install, duration, amount, property, credit_his
  GAM: Smoothing Term: Amount
  Linear Discriminant Analysis: .
  Neural Network: -

                     GLM (Stepwise        Regression   GAM      Linear Discriminant   Neural
                     Variable Selection)  Tree                  Analysis              Network
Deviance             820.73               -            695.11   -                     -
AIC                  836.73               -            794.77   -                     -
In-sample AUC        0.841                -            0.848    0.76                  0.804
Out-of-sample AUC    0.748                -            0.76     0.76                  0.69
In-sample Cost       0.43                 0.39         0.43     0.46                  0.33
Out-of-sample Cost   0.565                0.64         0.63     0.55                  0.62
GERMAN CREDIT SCORING DATA
BACKGROUND:
The German Credit dataset is a classic case that can be used for classification problems. It was collected
by Prof. Hofmann in 1994. The original file was edited multiple times and several indicator variables
were added to make it suitable for algorithms which cannot cope with categorical variables. The dataset
classifies customers as good or bad credit risks based on a set of attributes.
ABOUT THE DATA:
The dataset contains 1000 observations and 21 variables, such as Status of existing checking account,
Credit history, Age, Job, Nationality, etc. The data was further sampled to split it into 80-20
training-test data using a seed value of 12420360.
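A seeded 80-20 split of this kind can be sketched as follows. This is only an illustration in Python: the function name train_test_split is hypothetical, and Python's random generator is not the one used for the original analysis, so the same seed value will not reproduce the original partition.

```python
import random

def train_test_split(n_rows, train_fraction=0.8, seed=12420360):
    """Return (train_idx, test_idx) as sorted lists of row indices.

    Hypothetical helper: shuffles the row indices with a seeded generator
    and cuts the shuffled list at the requested training fraction.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = int(round(n_rows * train_fraction))
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = train_test_split(1000)
print(len(train_idx), len(test_idx))  # 800 200
```

Fixing the seed makes the partition repeatable, so every model is trained and tested on the same rows.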
MODEL SELECTION:
An asymmetric cost function was defined with a cut-off probability of 1/6. Essentially, the False
Negatives (actual 1 but predict 0) were given a weight of 5, while the False Positives (actual 0 but predict
1) were given a weight of 1. With these weights, the cost-minimizing cut-off is 1/(1+5) = 1/6.
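The cost function described above can be sketched as below. The function names and the toy vectors are illustrative assumptions, as is the choice of a strict ">" comparison at the cut-off; the source does not say how ties at exactly 1/6 were handled. The cut-off follows from comparing expected costs: predicting 0 costs 5p on average and predicting 1 costs 1(1 - p), so predicting 1 is cheaper whenever p > 1/6.

```python
def asymmetric_cost(actual, predicted_prob, cutoff=1 / 6, w_fn=5, w_fp=1):
    """Average weighted misclassification cost per observation."""
    total = 0
    for y, p in zip(actual, predicted_prob):
        pred = 1 if p > cutoff else 0  # assumption: strict inequality
        if y == 1 and pred == 0:
            total += w_fn  # false negative: actual 1, predicted 0
        elif y == 0 and pred == 1:
            total += w_fp  # false positive: actual 0, predicted 1
    return total / len(actual)

def misclassification_rate(actual, predicted_prob, cutoff=1 / 6):
    """Unweighted share of wrong predictions (MCR)."""
    wrong = sum(1 for y, p in zip(actual, predicted_prob)
                if (1 if p > cutoff else 0) != y)
    return wrong / len(actual)

# Toy data, not taken from the German Credit dataset.
actual = [1, 1, 0, 0, 0, 1]
probs = [0.9, 0.1, 0.2, 0.05, 0.5, 0.3]
print(round(asymmetric_cost(actual, probs), 3))  # 1.167 (one FN, two FPs)
print(misclassification_rate(actual, probs))     # 0.5
```

Because a false negative counts five times, the cost is never smaller than the MCR computed at the same cut-off.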
1. GENERALIZED LOGISTIC REGRESSION:
I) Full Model:
For the full model, the response variable, response, was modeled against all the 20 explanatory
variables. The deviance of the full model was found to be 697.47 and the AIC 795.47.
Many, but not all, of the variables were found to be significant, hence requiring variable selection methods.
Deviance AIC BIC
697.47 795.47 1025.02
II) Variable Selection (using AIC and BIC):
To identify the best model to predict response, step-wise variable selection in both directions
was used. The null model was built with a constant only and the full model
was built with all variables. AIC and BIC were both explored as the criterion for the variable selection
method.
Using AIC:
The final model obtained as a result of Step-wise AIC had the below formula.
Final Model: response ~ chk_acct + duration + credit_his + purpose + amount + saving_acct +
present_emp + installment_rate + other_debtor + age + other_install + housing +
foreign
Deviance AIC BIC
708.72 780.72 949.37
Using BIC:
Step-wise selection using BIC produced a much simpler final model with only 4
predictor variables.
Final Model: response ~ chk_acct + duration + age + other_install
Deviance AIC BIC
820.73 836.73 874.20
From the results of the step function, the best model was the one selected using the AIC criterion, with the lowest
AIC value of 780.72 (compared with 836.73 for the BIC-selected model and 795.47 for the full model).
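The relationship between the deviance, AIC and BIC figures reported above can be checked with a short sketch. It assumes the deviance equals minus twice the log-likelihood, which holds for a binary response fitted on ungrouped data. The parameter count of 36 is not stated in the source; it is inferred here from (AIC - deviance) / 2 = (780.72 - 708.72) / 2, and n = 800 is the size of the training set.

```python
import math

def aic(deviance, n_params):
    """AIC = deviance + 2k (deviance taken as -2 * log-likelihood)."""
    return deviance + 2 * n_params

def bic(deviance, n_params, n_obs):
    """BIC = deviance + k * ln(n)."""
    return deviance + n_params * math.log(n_obs)

# Step-wise AIC model: deviance 708.72, k = 36 (inferred), n = 800.
print(round(aic(708.72, 36), 2))       # 780.72
print(round(bic(708.72, 36, 800), 2))  # 949.37
```

BIC charges ln(800), roughly 6.7, per parameter instead of 2, which is why the BIC search settles on a far smaller model than the AIC search.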
Choosing the stepwise AIC model as the final model, prediction of the response variable was done to
calculate the in-sample and out-of-sample error. Further, the ROC plot was drawn, and AUC was
calculated for both in-sample and out-of-sample.
Deviance AIC BIC
708.72 780.72 949.37
Fig 4. ROC plots for the Final Logistic Regression Model
In-sample                 Out-of-sample
MCR     Cost    AUC       MCR     Cost    AUC
0.32    0.43    0.841     0.365   0.565   0.748
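The AUC values reported in these tables can be read as the probability that a randomly chosen class-1 observation receives a higher predicted score than a randomly chosen class-0 observation. A minimal sketch of that pairwise definition follows; the function name and the toy data are illustrative, and this is not necessarily how the original analysis computed its ROC curves.

```python
def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the share of (class 1, class 0) pairs in which the class-1
    observation has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(actual, scores) if y == 1]
    neg = [s for y, s in zip(actual, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data, not taken from the German Credit dataset.
actual = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.1, 0.2, 0.05, 0.5, 0.3]
print(round(auc(actual, scores), 3))  # 0.667 (6 of 9 pairs ranked correctly)
```

Unlike MCR and cost, the AUC does not depend on the 1/6 cut-off, which is why the report lists it alongside the cut-off-based measures.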
2. CLASSIFICATION TREE:
The CART technique separates the dataset into bins by progressively adding variable-value combinations
to the sequence, ensuring that at each step the split increases the homogeneity of the resulting subsets
of observations. All 800 observations in the training dataset were fed into the classification tree and the
below tree was observed.
Fig 4. Classification Tree
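The idea that each split should increase the homogeneity of the resulting subsets can be made concrete with an impurity measure. The sketch below uses the Gini index, which is an assumption; the source does not state which impurity criterion the tree used. The function names and the toy duration values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2 * p1 * (1 - p1)

def split_gain(values, labels, threshold):
    """Impurity reduction from splitting on value < threshold."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Toy data, not taken from the German Credit dataset.
duration = [6, 12, 18, 24, 36, 48]
response = [0, 0, 0, 1, 1, 1]
print(split_gain(duration, response, 21))  # 0.5 (both children are pure)
print(split_gain(duration, response, 9) < split_gain(duration, response, 21))  # True
```

A tree-growing procedure of this kind evaluates such a gain for every candidate variable and threshold and keeps the split with the largest reduction.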
Calculating the asymmetric misclassification cost and the misclassification rate for the classification tree
for the in-sample and out-of-sample data generated the following results.
In-sample         Out-of-sample
MCR     Cost      MCR     Cost
0.32    0.39      0.42    0.64
3. GENERALIZED ADDITIVE MODELS:
A generalized additive model was built with a non-linear (smooth) component for the variables duration, amount
and age, the only numerical fields. From the summary of this GAM model, the edf of duration and age
were found to be 1, indicating a linear rather than non-linear relationship with the response variable, response. The GAM
generated the below plots showing the estimated relationship of each smooth term with the response.
Fig 5. GAM Plots
For the final GAM model, built after retaining the non-linear term only for the amount variable, the
deviance, AIC and BIC were calculated.
Deviance AIC BIC
695.11 794.77 1028.20
The model was also tested for the in-sample misclassification cost and AUC with the 80% training data
and the out-of-sample misclassification cost and AUC with the 20% test data. A cut-off
probability of 1/6 was used for the out-of-sample predictions.
In-sample                 Out-of-sample
MCR     Cost    AUC       MCR     Cost    AUC
0.32    0.43    0.848     0.395   0.63    0.76
Fig 6. ROC plots for the Final GAM Model
4. LINEAR DISCRIMINANT ANALYSIS:
To perform a linear discriminant analysis, the response variable, response, was coded as a factor. The LDA
was performed using the lda() function, and in-sample and out-of-sample misclassification cost and AUC were
calculated.
In-sample                 Out-of-sample
MCR     Cost    AUC       MCR     Cost    AUC
0.32    0.46    0.76      0.37    0.55    0.76
Fig 7. ROC plots for the Final LDA Model
5. NEURAL NETWORK:
To implement the neural network algorithm, a data preprocessing step is required
to ensure that the algorithm converges. The independent variables were normalized
with min-max scaling using x = (X - Xmin)/(Xmax - Xmin).
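The scaling formula above maps each variable onto the interval [0, 1]. A minimal sketch follows; the function name and the example amounts are illustrative, and the handling of a constant column (returning zeros) is an assumption added here to avoid division by zero.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using (X - Xmin) / (Xmax - Xmin)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Toy data, not taken from the German Credit dataset.
amount = [250, 1000, 4000]
print(min_max_scale(amount))  # [0.0, 0.2, 1.0]
```

When scoring the test data it is common to reuse the minimum and maximum computed on the training data, so that both sets share one scale; the source does not say which approach was taken.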
Choosing 8 hidden nodes to run the neural network, the asymmetric misclassification cost was calculated.
In-sample                 Out-of-sample
MCR     Cost    AUC       MCR     Cost    AUC
0.24    0.33    0.804     0.34    0.62    0.69
Based on the in-sample values calculated for the neural network (MCR 0.24, cost 0.33), the model performs the best in comparison
with all the models run. The below ROC curves were generated for the neural network. Out of sample, however, its AUC of 0.69
is the lowest, so neural networks clearly don't perform the best for this data set.
Fig 7. ROC plots for the Neural Network Model
CONCLUSION:
Summarizing the results from all the models run for the prediction problem, the below table was
populated. From the below table, comparisons in the performance between in-sample measures can be
done using AIC and in-sample MSPE, while comparisons between the out-of-sample measures can be done using the
out-of-sample MSE.
Model equation:
  GLM (Stepwise Variable Selection): -sex -present_resid -n_credits -job -n_people -telephone -property
  Regression Tree: Purpose, age, present_emp, other_install, duration, amount, property, credit_his
  GAM: Smoothing Term: Amount
  Linear Discriminant Analysis: .
  Neural Network: .

                     GLM (Stepwise        Regression   GAM      Linear Discriminant   Neural
                     Variable Selection)  Tree                  Analysis              Network
Deviance             820.73               -            695.11   -                     -
AIC                  836.73               -            794.77   -                     -
In-sample AUC        0.841                -            0.848    0.734                 0.804
Out-of-sample AUC    0.748                -            0.76     0.69                  0.69
In-sample Cost       0.43                 0.39         0.43     0.46                  0.33
Out-of-sample Cost   0.565                0.64         0.63     0.55                  0.62
Fig 9. ROC plots comparing the in-sample, out-of-sample measures for all models
From the table above, it's clear that the GAM model performs the best in terms of the in-sample and
out-of-sample AUC measures. This is also better visualized in the ROC plots (Fig 9).
However, in terms of the misclassification cost, the GAM proves very expensive. With respect to the
out-of-sample cost, the LDA performs best (0.55). However, striking a balance between the AUC and cost, the Logistic
Regression model works best for the German Credit Data.
