Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Binary Logistic Regression
Basic Terminologies
• Target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable (Ex: the Default column in the table below).
• Predictor, sometimes called an independent variable, is a variable used to predict the target variable (Ex: the Age, Marital Status and Loan Status columns in the table below).

Age  Marital Status  Loan Status  Default
58   married         no           yes
44   single          no           no
33   married         yes          yes
47   married         no           yes
33   single          no           no
35   married         no           yes
28   single          yes          no
Introduction
• Objective:
• Logistic regression measures the relationship between a categorical target variable and one or more independent variables
• Binary logistic regression deals with situations in which the target variable can take only two possible values
• Thus, logistic regression uses one or more predictor variables, which may be continuous or categorical, to predict the class of the target variable
• Benefit:
• Logistic regression model output helps identify the important factors (Xi) impacting the target variable (Y) and the nature of the relationship between each of these factors and the target variable
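The slides do not write out the model equation. For reference, the standard form of a binary logistic regression model is sketched below. The symbols p (probability of the class coded 1), \beta_0 (intercept), \beta_i (coefficient of predictor X_i) and k (number of predictors) are notation assumed here, not taken from the slides.

```latex
p = P(Y = 1 \mid X_1, \dots, X_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}

\ln\frac{p}{1 - p} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k
```

The second line is the same model rearranged: each coefficient is the change in the log-odds of the class coded 1 for a one-unit increase in its predictor, with the other predictors held fixed.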
Example : Binary Logistic Regression : Input
Let’s conduct the Binary Logistic Regression analysis on the following variables:
Target variable (Y): Default Status
Independent variables (Xi): Age, Marital Status, Existing Loan Status, Income

Default Status  Age  Marital Status  Existing Loan Status  Income
Defaulted       58   married         no                    46,399
Not Defaulted   44   single          no                    47,971
Defaulted       33   married         yes                   52,618
Defaulted       47   married         no                    28,717
Not Defaulted   33   single          no                    41,216
Defaulted       35   married         no                    34,372
Not Defaulted   28   single          yes                   64,811
Not Defaulted   42   divorced        no                    53,000
Defaulted       58   married         no                    41,375
Not Defaulted   43   single          no                    53,778
Not Defaulted   41   divorced        no                    44,440
Not Defaulted   29   single          no                    51,026
Example : Binary Logistic Regression : Output
COEFFICIENTS
Variable                  Coefficients  P value
(Intercept)               -2.34         0.00
Age                       0.01          0.07
Marital Status (Married)  0.5           0.04
Income                    0.1           0.04
Existing loan (Yes)       0.3           0.03
• The p value for marital status, income and existing loan is < 0.05; hence these variables are significant factors for predicting the likely default/non default class
• The p value for Age is > 0.05, which means Age does not have a statistically significant impact on the prediction
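One common way to read logistic regression coefficients is to exponentiate them into odds ratios. The sketch below does this for the coefficients in the sample output. The helper name `odds_ratio` is hypothetical, and the slides do not state the units of Income or Age, so "one unit" means whatever unit the model was fitted on.

```python
import math

# Coefficients copied from the sample output table.
coefficients = {
    "Age": 0.01,
    "Marital Status (Married)": 0.5,
    "Income": 0.1,
    "Existing loan (Yes)": 0.3,
}


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


odds_ratios = {name: odds_ratio(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio above 1 means the predictor raises the odds of the class coded 1 (here, default), and a ratio below 1 means it lowers them. For a categorical predictor such as Marital Status (Married), the ratio compares that category with the reference category.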
Example : Binary Logistic Regression : Output
ACTUAL VERSUS PREDICTED
                       Predicted: Defaulted  Predicted: Not defaulted
Actual: Defaulted      35                    4
Actual: Not defaulted  4                     70
CLASSIFICATION ACCURACY : (35 + 70) / (35 + 70 + 4 + 4) = 105 / 113 ≈ 92.9%
• Prediction accuracy is a useful criterion for assessing model performance
• A model with prediction accuracy >= 70% is considered useful
CLASSIFICATION ERROR = 100% - Accuracy ≈ 7.1%
About 7.1% of the records are misclassified
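The accuracy calculation can be reproduced from the four cells of the actual versus predicted table. This is a minimal sketch; the dictionary layout and the function name `accuracy` are illustrative choices, not part of the slides.

```python
# Confusion matrix from the example: keys are (actual, predicted) class pairs.
confusion = {
    ("Defaulted", "Defaulted"): 35,
    ("Defaulted", "Not defaulted"): 4,
    ("Not defaulted", "Defaulted"): 4,
    ("Not defaulted", "Not defaulted"): 70,
}


def accuracy(matrix: dict) -> float:
    """Share of records whose predicted class equals the actual class."""
    correct = sum(n for (actual, predicted), n in matrix.items() if actual == predicted)
    return correct / sum(matrix.values())


acc = accuracy(confusion)
print(f"accuracy = {acc:.1%}, error = {1 - acc:.1%}")
# prints: accuracy = 92.9%, error = 7.1%
```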
Standard input parameters & Sample UI
SAMPLE OUTPUT 1 : MODEL SUMMARY
COEFFICIENT MATRIX :
Variable                  Coefficients  P value
(Intercept)               -2.34         0.00
Age                       0.01          0.07
Marital Status (Married)  0.5           0.04
Income                    0.1           0.04
Existing loan (Yes)       0.3           0.03
ACTUAL VERSUS PREDICTED
                       Predicted: Defaulted  Predicted: Not defaulted
Actual: Defaulted      35                    4
Actual: Not defaulted  4                     70
SAMPLE OUTPUT 2 : PREDICTED CLASS & PROBABILITY
Age  Marital Status  Existing Loan Status  Income  Default Status  Predicted class  Probability
58   married         no                    46,399  Defaulted       Defaulted        0.7
44   single          no                    47,971  Not Defaulted   Not Defaulted    0.9
33   married         yes                   52,618  Defaulted       Defaulted        0.8
47   married         no                    28,717  Defaulted       Defaulted        0.7
33   single          no                    41,216  Not Defaulted   Not Defaulted    0.6
35   married         no                    34,372  Defaulted       Not Defaulted    0.5
28   single          yes                   64,811  Not Defaulted   Defaulted        0.4
42   divorced        no                    53,000  Not Defaulted   Not Defaulted    0.3
58   married         no                    41,375  Defaulted       Defaulted        0.2
43   single          no                    53,778  Not Defaulted   Defaulted        0.1
Thus, the output will contain a predicted class column, a confusion matrix and a classification plot
SAMPLE OUTPUT 3 : CLASSIFICATION PLOT
• The less the overlap between the two classes in the plot, the better the classification done by the model
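A predicted class is commonly derived from a predicted probability by comparing it with a cutoff. The slides do not say which cutoff is used, so the 0.5 default, the function name `predict_class` and the probabilities below are all assumptions for illustration; they are not the values from the sample table.

```python
def predict_class(probability: float, cutoff: float = 0.5) -> str:
    """Assign the 'Defaulted' class when the probability of default reaches the cutoff."""
    return "Defaulted" if probability >= cutoff else "Not Defaulted"


# Hypothetical predicted probabilities of default, for illustration only.
probabilities = [0.91, 0.48, 0.50, 0.07]
predicted = [predict_class(p) for p in probabilities]
print(predicted)
```

Raising the cutoff makes the model more conservative about labeling a record as a defaulter; lowering it catches more defaulters at the cost of more false alarms.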
INTERPRETATION OF IMPORTANT MODEL SUMMARY STATISTICS
Accuracy:
• If Accuracy >= 70%: the model fits the provided data well and the predicted classes are reasonably accurate
• If Accuracy < 70%: the model does not fit the provided data well and the predicted classes are likely to contain a high rate of error
Coefficients and p value:
• If the coefficient is positive and the p value < 0.05, the variable is positively associated with the target variable: higher values raise the likelihood of the class coded as 1
• If the coefficient is negative and the p value < 0.05, the variable is negatively associated with the target variable: higher values lower the likelihood of the class coded as 1
• If the p value > 0.05, the variable is not a statistically significant predictor of the target variable classes
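The coefficient and p value rules can be written as a small helper and applied to the model summary from Sample Output 1. This is only a sketch: the function name `interpret` is hypothetical, and treating a p value of exactly 0.05 as not significant is an assumption, since the rules leave that case open.

```python
def interpret(coefficient: float, p_value: float, alpha: float = 0.05) -> str:
    """Apply the sign and p-value rules to one predictor."""
    if p_value >= alpha:  # assumption: p == alpha counts as not significant
        return "not significant"
    if coefficient > 0:
        return "positive association"
    if coefficient < 0:
        return "negative association"
    return "no association"


# (coefficient, p value) pairs from Sample Output 1.
model_summary = {
    "Age": (0.01, 0.07),
    "Marital Status (Married)": (0.5, 0.04),
    "Income": (0.1, 0.04),
    "Existing loan (Yes)": (0.3, 0.03),
}

findings = {name: interpret(b, p) for name, (b, p) in model_summary.items()}
for name, finding in findings.items():
    print(f"{name}: {finding}")
```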
Limitations
It is applicable only when the target variable is categorical
Sample size must be at least 1000 in order to get reliable predictions
Binary logistic regression is not suitable when the number of classes > 2
Level 1 of the target variable should represent the desired outcome, i.e. if the desired class is Yes in a response/non response target variable, then Yes has to be recoded into 1 and No into 0
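The recoding step can be sketched as follows. The function name `recode_target` and the sample values are hypothetical, and the sketch assumes a target column with exactly two labels.

```python
def recode_target(values, desired_class="Yes"):
    """Map the desired class to 1 and the other class to 0."""
    return [1 if value == desired_class else 0 for value in values]


responses = ["Yes", "No", "No", "Yes"]  # hypothetical target column
encoded = recode_target(responses)
print(encoded)  # [1, 0, 0, 1]
```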
General applications
Credit/loan approval analysis
• Given a list of a client’s transactional attributes, predict whether the client will default or not on a bank loan
Medical diagnosis
• Given a list of symptoms, predict if a patient has disease X or not
Rain forecasting
• Based on temperature, humidity, pressure etc., predict if it will rain or not
Treatment effectiveness analysis
• Based on a patient’s body attributes such as blood pressure, sugar, hemoglobin, name of the drug taken, type of treatment taken etc., check the likelihood of the disease being cured
Fraud analysis
• Based on various bills submitted by an employee for reimbursement of food, travel, medical expenses etc., predict the likelihood of the employee committing fraud
Use case 1
Business problem:
• A bank loans officer wants to predict whether a loan applicant will be a defaulter or non defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio etc.
• Here the target variable would be ‘past default status’, and the predicted class would contain the values ‘yes’ or ‘no’, representing the ‘likely to default’ and ‘unlikely to default’ classes respectively.
Business benefit:
• Once classes are assigned, the bank will have a loan applicants’ dataset with each applicant labeled as “likely/unlikely to default”.
• Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if yes, what credit limit and interest rate the applicant is eligible for based on the amount of risk involved.
Use case 1 : Input Dataset
Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status
1039153      21000        701.73               105000         9                     5                 4                  No
1069697      15000        483.38               92000          11                    5                 2                  No
1068120      25600        824.96               110000         10                    9                 2                  No
563175       23000        534.94               80000          9                     2                 12                 No
562842       19750        483.65               57228          11                    3                 21                 Yes
562681       25000        571.78               113000         10                    0                 9                  No
562404       21250        471.2                31008          12                    1                 12                 Yes
700159       14400        448.99               82000          20                    6                 6                  No
696484       10000        241.33               45000          18                    8                 2                  Yes
702598       11700        381.61               45192          20                    7                 3                  Yes
702470       10000        243.29               38000          17                    9                 7                  Yes
702373       4800         144.77               54000          19                    8                 2                  Yes
Use case 1 : Output : Predicted Class
Output : Each record will have the predicted class assigned as shown below (Column : Likelihood to default) :
Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status  Likelihood to default
1039153      21000        701.73               105000         9                     5                 4                  No                   No
1069697      15000        483.38               92000          11                    5                 2                  No                   No
1068120      25600        824.96               110000         10                    9                 2                  No                   No
563175       23000        534.94               80000          9                     2                 12                 No                   No
562842       19750        483.65               57228          11                    3                 21                 Yes                  No
562681       25000        571.78               113000         10                    0                 9                  No                   No
562404       21250        471.2                31008          12                    1                 12                 Yes                  Yes
700159       14400        448.99               82000          20                    6                 6                  No                   No
696484       10000        241.33               45000          18                    8                 2                  Yes                  Yes
702598       11700        381.61               45192          20                    7                 3                  Yes                  Yes
702470       10000        243.29               38000          17                    9                 7                  Yes                  Yes
702373       4800         144.77               54000          19                    8                 2                  Yes                  No
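To see how the predicted class lines up with the past default status for these twelve customers, the two columns can be tallied into an actual versus predicted count. This is a sketch for illustration; the variable names are hypothetical and the slides do not report this tally themselves.

```python
from collections import Counter

# (past default status, likelihood to default) for the 12 customers, in table order.
pairs = [
    ("No", "No"), ("No", "No"), ("No", "No"), ("No", "No"),
    ("Yes", "No"), ("No", "No"), ("Yes", "Yes"), ("No", "No"),
    ("Yes", "Yes"), ("Yes", "Yes"), ("Yes", "Yes"), ("Yes", "No"),
]

# Count each (actual, predicted) combination.
confusion = Counter(pairs)

# Share of customers whose predicted class matches their past status.
agreement = sum(n for (actual, predicted), n in confusion.items() if actual == predicted) / len(pairs)

print(dict(confusion))
print(f"agreement = {agreement:.1%}")
```

On these sample rows, the two disagreements are both past defaulters who were assigned to the "No" class.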
Use case 1 : Output : Class profile
• As can be seen in the table below, defaulters (Class: Yes) and non defaulters (Class: No) have distinctive characteristics.
• Defaulters tend to have more delinquencies, a higher debt to income ratio and a shorter employment tenure than non defaulters.
• Hence, delinquency, employment tenure and debt to income ratio are the determining factors when classifying loan applicants into likely defaulters/non defaulters.

Class (Likely to default)  Average loan amount  Average monthly installment  Average annual income  Average debt to income ratio  Average times delinquent  Average employment tenure
No                         10447.30             304.87                       66467.74               9.58                          1.69                      16.82
Yes                        7521.32              227.43                       60935.28               16.55                         6.91                      4.01
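A class profile of this kind is a per-class average of each attribute. The sketch below computes it for three attributes using only the twelve sample customers from the Use case 1 output table. The figures in the class profile table do not match these twelve rows, so they were presumably computed on a larger dataset; the variable and column names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (likelihood to default, debt to income ratio, times delinquent, employment tenure)
# for the 12 sample customers in the Use case 1 output table, in table order.
rows = [
    ("No", 9, 5, 4), ("No", 11, 5, 2), ("No", 10, 9, 2), ("No", 9, 2, 12),
    ("No", 11, 3, 21), ("No", 10, 0, 9), ("Yes", 12, 1, 12), ("No", 20, 6, 6),
    ("Yes", 18, 8, 2), ("Yes", 20, 7, 3), ("Yes", 17, 9, 7), ("No", 19, 8, 2),
]

# Group the attribute values by predicted class.
groups = defaultdict(list)
for predicted_class, *attributes in rows:
    groups[predicted_class].append(attributes)

# Average each attribute within each class.
columns = ("debt_to_income", "times_delinquent", "employment_tenure")
profile = {
    cls: {name: mean(values) for name, values in zip(columns, zip(*members))}
    for cls, members in groups.items()
}

for cls, averages in profile.items():
    print(cls, averages)
```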
Use case 2
Business problem:
• A doctor/pharmacist wants to predict the likelihood of a new patient’s disease being cured/not cured based on various attributes of the patient such as blood pressure, hemoglobin level, sugar level, name of the drug given to the patient, name of the treatment given to the patient etc.
• Here the target variable would be ‘past cure status’, and the predicted class would contain the values ‘yes’ or ‘no’, meaning ‘likely to be cured’ and ‘not likely to be cured’ respectively.
Business benefit:
• Given the body profile of a patient and the recent treatments and drugs taken by him/her, the probability of a cure can be predicted and changes in treatment/drug can be suggested if required.
Use case 3
Business problem:
• An accountant/human resource manager wants to predict the likelihood of an employee committing fraud against the company based on the various bills submitted by him/her so far, such as food bills, travel bills and medical bills.
• The target variable in this case would be ‘past fraud status’, and the predicted class would contain the values ‘yes’ or ‘no’, representing likely fraud and no fraud respectively.
Business benefit:
• Such classification can prevent the company from spending unreasonably on any employee and can in turn save the company budget by detecting such fraud beforehand.
Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

More Related Content

What's hot

7. logistics regression using spss
7. logistics regression using spss7. logistics regression using spss
7. logistics regression using spssDr Nisha Arora
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear RegressionIndus University
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysisSumit Das
 
Ordinal logistic regression
Ordinal logistic regression Ordinal logistic regression
Ordinal logistic regression Dr Athar Khan
 
Multiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionMultiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionKaushik Rajan
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaEdureka!
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examplesGaurav Kamboj
 
Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Haoran Zhang
 
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn LottierRegression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn LottierAl Arizmendez
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributionsjasondroesch
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Dr Athar Khan
 
Ppt for 1.1 introduction to statistical inference
Ppt for 1.1 introduction to statistical inferencePpt for 1.1 introduction to statistical inference
Ppt for 1.1 introduction to statistical inferencevasu Chemistry
 
Multinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisMultinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisHARISH Kumar H R
 
General Introduction to ROC Curves
General Introduction to ROC CurvesGeneral Introduction to ROC Curves
General Introduction to ROC CurvesAustin Powell
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regressionA M
 

What's hot (20)

7. logistics regression using spss
7. logistics regression using spss7. logistics regression using spss
7. logistics regression using spss
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysis
 
Ordinal logistic regression
Ordinal logistic regression Ordinal logistic regression
Ordinal logistic regression
 
Multiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionMultiple Regression and Logistic Regression
Multiple Regression and Logistic Regression
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | Edureka
 
Logistic regression analysis
Logistic regression analysisLogistic regression analysis
Logistic regression analysis
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examples
 
Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Simple Linear Regression (simplified)
Simple Linear Regression (simplified)
 
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn LottierRegression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression
 
Ppt for 1.1 introduction to statistical inference
Ppt for 1.1 introduction to statistical inferencePpt for 1.1 introduction to statistical inference
Ppt for 1.1 introduction to statistical inference
 
Confidence Intervals
Confidence IntervalsConfidence Intervals
Confidence Intervals
 
Multinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisMultinomial Logistic Regression Analysis
Multinomial Logistic Regression Analysis
 
General Introduction to ROC Curves
General Introduction to ROC CurvesGeneral Introduction to ROC Curves
General Introduction to ROC Curves
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Anova; analysis of variance
Anova; analysis of varianceAnova; analysis of variance
Anova; analysis of variance
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regression
 

Similar to What is Binary Logistic Regression Classification and How is it Used in Analysis?

What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?Smarten Augmented Analytics
 
What is the Multinomial-Logistic Regression Classification Algorithm and How ...
What is the Multinomial-Logistic Regression Classification Algorithm and How ...What is the Multinomial-Logistic Regression Classification Algorithm and How ...
What is the Multinomial-Logistic Regression Classification Algorithm and How ...Smarten Augmented Analytics
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?Smarten Augmented Analytics
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?Smarten Augmented Analytics
 
CreditCardDefaultModel
CreditCardDefaultModelCreditCardDefaultModel
CreditCardDefaultModelAndrew Rogala
 
Exploratory Factor Analysis
Exploratory Factor AnalysisExploratory Factor Analysis
Exploratory Factor AnalysisShailendra Tomar
 
Creditscore
CreditscoreCreditscore
Creditscorekevinlan
 
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdf
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdfSupuestos Actuariales en tasas contingentes- versión inglés (3).pdf
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdfEvaristoDiz1
 
Addiction severity index intro training jan 2015
Addiction severity index intro training jan 2015Addiction severity index intro training jan 2015
Addiction severity index intro training jan 2015Sunrays of Hope, Inc
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...Smarten Augmented Analytics
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?Smarten Augmented Analytics
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model finalRitu Sarkar
 
Biostatistics Workshop: Regression
Biostatistics Workshop: RegressionBiostatistics Workshop: Regression
Biostatistics Workshop: RegressionHopkinsCFAR
 
#newapproach Alternative to the WCA V2.0
#newapproach Alternative to the WCA V2.0#newapproach Alternative to the WCA V2.0
#newapproach Alternative to the WCA V2.0Rick Burgess
 
Math 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docx
Math 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docxMath 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docx
Math 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docxandreecapon
 
Download the presentation
Download the presentationDownload the presentation
Download the presentationbutest
 
Project Report for Mostan Superstore.pptx
Project Report for Mostan Superstore.pptxProject Report for Mostan Superstore.pptx
Project Report for Mostan Superstore.pptxChristianahEfunniyi
 
A Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification TargetingA Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification TargetingCognizant
 

Similar to What is Binary Logistic Regression Classification and How is it Used in Analysis? (20)

What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
 
What is the Multinomial-Logistic Regression Classification Algorithm and How ...
What is the Multinomial-Logistic Regression Classification Algorithm and How ...What is the Multinomial-Logistic Regression Classification Algorithm and How ...
What is the Multinomial-Logistic Regression Classification Algorithm and How ...
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
 
CreditCardDefaultModel
CreditCardDefaultModelCreditCardDefaultModel
CreditCardDefaultModel
 
Exploratory Factor Analysis
Exploratory Factor AnalysisExploratory Factor Analysis
Exploratory Factor Analysis
 
Final report mkt
Final report mktFinal report mkt
Final report mkt
 
Creditscore
CreditscoreCreditscore
Creditscore
 
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdf
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdfSupuestos Actuariales en tasas contingentes- versión inglés (3).pdf
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdf
 
Addiction severity index intro training jan 2015
Addiction severity index intro training jan 2015Addiction severity index intro training jan 2015
Addiction severity index intro training jan 2015
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
 
Biostatistics Workshop: Regression
Biostatistics Workshop: RegressionBiostatistics Workshop: Regression
Biostatistics Workshop: Regression
 
#newapproach Alternative to the WCA V2.0
#newapproach Alternative to the WCA V2.0#newapproach Alternative to the WCA V2.0
#newapproach Alternative to the WCA V2.0
 
Présentation jonathan agnew
Présentation jonathan agnewPrésentation jonathan agnew
Présentation jonathan agnew
 
Math 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docx
Math 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docxMath 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docx
Math 104 Fall 14Lab Assignment #4Math 104 Fall 14Lab Assignmen.docx
 
Download the presentation
Download the presentationDownload the presentation
Download the presentation
 
Project Report for Mostan Superstore.pptx
Project Report for Mostan Superstore.pptxProject Report for Mostan Superstore.pptx
Project Report for Mostan Superstore.pptx
 
A Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification TargetingA Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
 

More from Smarten Augmented Analytics

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenSmarten Augmented Analytics
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...Smarten Augmented Analytics
 
What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?Smarten Augmented Analytics
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?Smarten Augmented Analytics
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenSmarten Augmented Analytics
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values Smarten Augmented Analytics
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...Smarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...Smarten Augmented Analytics
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenSmarten Augmented Analytics
 
Quality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenQuality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenSmarten Augmented Analytics
 
Machine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenMachine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenSmarten Augmented Analytics
 
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenPredictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenSmarten Augmented Analytics
 
Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenSmarten Augmented Analytics
 
Human Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenHuman Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenSmarten Augmented Analytics
 
Customer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenCustomer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenSmarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...Smarten Augmented Analytics
 
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...Smarten Augmented Analytics
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...Smarten Augmented Analytics
 

What is Binary Logistic Regression Classification and How is it Used in Analysis?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series For Citizen Data Scientists. Journey Towards Augmented Analytics
  • 3. Basic Terminologies
      • Target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable (e.g. the Default column in the table below, highlighted in a red box on the slide).
      • Predictor, sometimes called an independent variable, is a variable that is used to predict the target variable (e.g. the Age, Marital Status and Loan Status columns in the table below, highlighted in a green box on the slide).

      Age   Marital Status   Loan Status   Default
      58    married          no            yes
      44    single           no            no
      33    married          yes           yes
      47    married          no            yes
      33    single           no            no
      35    married          no            yes
      28    single           yes           no
  • 4. Introduction
      Objective:
      • Logistic regression measures the relationship between a categorical target variable and one or more independent variables.
      • It deals with situations in which the outcome for the target variable can have only two possible types.
      • Thus, logistic regression uses one or more predictor variables, which may be either continuous or categorical, to predict the target variable classes.
      Benefit:
      • The logistic regression model output helps identify the important factors (Xi) impacting the target variable (Y), and also the nature of the relationship between each of these factors and the dependent variable.
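
How a two-class prediction comes out of predictor values can be sketched as follows. This is a minimal illustration of the standard logistic function, not the tool's own implementation; the function names and the 0.5 cut-off are assumptions made for the example.

```python
import math


def logistic(score: float) -> float:
    """Map a linear score (the log-odds) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_class(score: float, threshold: float = 0.5) -> int:
    """Return 1 if the probability reaches the threshold, else 0.

    The 0.5 threshold is an assumed default, not something the slides state.
    """
    return 1 if logistic(score) >= threshold else 0


# The score would normally be intercept + sum(coefficient * predictor value).
for score in (-2.0, 0.0, 2.0):
    print(score, round(logistic(score), 3), predict_class(score))
```

A score of zero corresponds to a probability of exactly 0.5; larger scores push the probability towards 1 and smaller scores towards 0.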
  • 5. Example: Binary Logistic Regression: Input
      Let's conduct the Binary Logistic Regression analysis on the following variables. Default Status is the target variable (Y); the remaining columns are the independent variables (Xi).

      Default Status   Age   Marital Status   Existing Loan Status   Income
      Defaulted        58    married          no                     46,399
      Not Defaulted    44    single           no                     47,971
      Defaulted        33    married          yes                    52,618
      Defaulted        47    married          no                     28,717
      Not Defaulted    33    single           no                     41,216
      Defaulted        35    married          no                     34,372
      Not Defaulted    28    single           yes                    64,811
      Not Defaulted    42    divorced         no                     53,000
      Defaulted        58    married          no                     41,375
      Not Defaulted    43    single           no                     53,778
      Not Defaulted    41    divorced         no                     44,440
      Not Defaulted    29    single           no                     51,026
  • 6. Example: Binary Logistic Regression: Output
      COEFFICIENTS

      Variable                   Coefficient   P value
      (Intercept)                -2.34         0.00
      Age                        0.01          0.07
      Marital Status (Married)   0.5           0.04
      Income                     0.1           0.04
      Existing loan (Yes)        0.3           0.03

      • The p value for marital status, income and existing loan is < 0.05; hence these variables are important factors for predicting the likely default/non-default class.
      • The p value for Age is > 0.05, which means Age does not impact the prediction significantly.
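
The reading of the coefficient table above can be sketched in a few lines. The coefficients and p values are copied from the slide; the 0.05 cut-off is the one the slide uses. Converting a coefficient to an odds ratio with `exp` is a standard way to read logistic regression output, added here as an illustration; the `summarize` helper is a hypothetical name.

```python
import math

# (coefficient, p value) per predictor, as listed on the slide.
model = {
    "Age": (0.01, 0.07),
    "Marital Status (Married)": (0.5, 0.04),
    "Income": (0.1, 0.04),
    "Existing loan (Yes)": (0.3, 0.03),
}
ALPHA = 0.05  # significance cut-off used on the slide


def summarize(model, alpha=ALPHA):
    """Attach an odds ratio and a significance flag to each predictor."""
    rows = []
    for name, (coef, p_value) in model.items():
        rows.append({
            "variable": name,
            # exp(coef): multiplicative change in the odds per one-unit increase
            "odds_ratio": math.exp(coef),
            "significant": p_value < alpha,
        })
    return rows


summary = summarize(model)
significant = [row["variable"] for row in summary if row["significant"]]
print(significant)
```

An odds ratio above 1 means the odds of the class coded 1 rise as the predictor increases; a ratio below 1 means they fall.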
  • 7. Example: Binary Logistic Regression: Output
      ACTUAL VERSUS PREDICTED (axes labelled Predicted and Actual on the slide)

                      Defaulted   Not defaulted
      Defaulted       35          4
      Not defaulted   4           70

      CLASSIFICATION ACCURACY = (35 + 70) / (35 + 70 + 4 + 4) = 105 / 113 ≈ 92.9% (shown as 92% on the slide)
      • Prediction accuracy is a useful criterion for assessing model performance.
      • A model with prediction accuracy >= 70% is useful.
      CLASSIFICATION ERROR = 100 − accuracy ≈ 7.1% (shown as 8% on the slide)
      • That is, roughly 7 in 100 records are expected to be misclassified.
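
The accuracy and error arithmetic can be checked with a short sketch. The counts are taken from the actual-versus-predicted table; the dictionary layout and the `accuracy` helper are assumptions made for the example.

```python
# Keys are (actual class, predicted class); counts come from the slide's table.
confusion = {
    ("Defaulted", "Defaulted"): 35,
    ("Defaulted", "Not defaulted"): 4,
    ("Not defaulted", "Defaulted"): 4,
    ("Not defaulted", "Not defaulted"): 70,
}


def accuracy(confusion):
    """Share of records whose predicted class matches the actual class."""
    correct = sum(n for (actual, predicted), n in confusion.items()
                  if actual == predicted)
    total = sum(confusion.values())
    return correct / total


acc = accuracy(confusion)
error = 1.0 - acc
print(f"accuracy = {acc:.1%}, error = {error:.1%}")
```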
  • 9. SAMPLE OUTPUT 1: MODEL SUMMARY
      COEFFICIENT MATRIX

      Variable                   Coefficient   P value
      (Intercept)                -2.34         0.00
      Age                        0.01          0.07
      Marital Status (Married)   0.5           0.04
      Income                     0.1           0.04
      Existing loan (Yes)        0.3           0.03

      ACTUAL VERSUS PREDICTED (axes labelled Predicted and Actual on the slide)

                      Defaulted   Not defaulted
      Defaulted       35          4
      Not defaulted   4           70
  • 10. SAMPLE OUTPUT 2: PREDICTED CLASS & PROBABILITY

      Age   Marital Status   Existing Loan Status   Income   Default Status   Predicted class   Probability
      58    married          no                     46,399   Defaulted        Defaulted         0.7
      44    single           no                     47,971   Not Defaulted    Not Defaulted     0.9
      33    married          yes                    52,618   Defaulted        Defaulted         0.8
      47    married          no                     28,717   Defaulted        Defaulted         0.7
      33    single           no                     41,216   Not Defaulted    Not Defaulted     0.6
      35    married          no                     34,372   Defaulted        Not Defaulted     0.5
      28    single           yes                    64,811   Not Defaulted    Defaulted         0.4
      42    divorced         no                     53,000   Not Defaulted    Not Defaulted     0.3
      58    married          no                     41,375   Defaulted        Defaulted         0.2
      43    single           no                     53,778   Not Defaulted    Defaulted         0.1

      Thus, the output will contain a predicted class column, a confusion matrix and a classification plot.
  • 11. SAMPLE OUTPUT 3: CLASSIFICATION PLOT
      • The less the two classes overlap in the plot, the better the classification done by the model.
  • 12. INTERPRETATION OF IMPORTANT MODEL SUMMARY STATISTICS
      Accuracy:
      • If accuracy >= 70%: the model fits the provided data well and the predicted classes are reasonably accurate.
      • If accuracy < 70%: the model does not fit the provided data well and the predicted classes are likely to contain a high share of errors.
      Coefficients and p value:
      • If the coefficient is positive and the p value is < 0.05, the variable has a significant positive relationship with the target variable.
      • If the coefficient is negative and the p value is < 0.05, the variable has a significant negative relationship with the target variable.
      • If the p value is > 0.05, the variable is unimportant for predicting the target variable classes.
  • 13. Limitations
      • It is applicable only when the target variable is categorical.
      • Sample size must be at least 1000 in order to get reliable predictions.
      • Binary logistic regression is not suitable when the number of classes is > 2.
      • Level 1 of the target variable should represent the desired outcome, i.e. if the desired class is Yes in a response/non-response target variable, then Yes has to be recoded into 1 and No into 0.
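
The recoding requirement in the last limitation can be sketched as below. The helper name `recode_target` and its choice to reject unexpected labels rather than guess are assumptions for illustration; the sample values are the Default column from the terminology slide.

```python
def recode_target(values, positive="yes", negative="no"):
    """Recode a two-class target so the desired outcome becomes 1 and the other 0.

    Labels are compared case-insensitively; anything else raises an error
    instead of being silently mapped to 0.
    """
    mapping = {positive.lower(): 1, negative.lower(): 0}
    coded = []
    for value in values:
        key = value.strip().lower()
        if key not in mapping:
            raise ValueError(f"unexpected target label: {value!r}")
        coded.append(mapping[key])
    return coded


default_column = ["yes", "no", "yes", "yes", "no", "yes", "no"]
print(recode_target(default_column))
```

Raising on an unknown label also surfaces a third class early, which matters because the binary model cannot handle more than two.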
  • 14. General applications
      • Credit/loan approval analysis: given a list of a client's transactional attributes, predict whether the client will default on a bank loan or not.
      • Medical diagnosis: given a list of symptoms, predict whether a patient has disease X or not.
      • Rain forecasting: based on temperature, humidity, pressure etc., predict whether it will rain or not.
      • Treatment effectiveness analysis: based on a patient's body attributes such as blood pressure, sugar, hemoglobin, name of the drug taken, type of treatment taken etc., check the likelihood of a disease being cured.
      • Fraud analysis: based on various bills submitted by an employee for reimbursement of food, travel, medical expenses etc., predict the likelihood of the employee committing fraud.
  • 15. Use case 1
      Business problem:
      • A bank loans officer wants to predict whether a loan applicant will be a defaulter or non-defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio etc.
      • Here the target variable would be 'past default status', and the predicted class would contain the values 'yes' or 'no', representing the 'likely to default' and 'unlikely to default' classes respectively.
      Business benefit:
      • Once classes are assigned, the bank will have a loan applicants' dataset with each applicant labeled as "likely/unlikely to default".
      • Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for given the amount of risk involved.
  • 16. Use case 1: Input Dataset

      Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status
      1039153       21000         701.73                105000          9                      5                  4                   No
      1069697       15000         483.38                92000           11                     5                  2                   No
      1068120       25600         824.96                110000          10                     9                  2                   No
      563175        23000         534.94                80000           9                      2                  12                  No
      562842        19750         483.65                57228           11                     3                  21                  Yes
      562681        25000         571.78                113000          10                     0                  9                   No
      562404        21250         471.2                 31008           12                     1                  12                  Yes
      700159        14400         448.99                82000           20                     6                  6                   No
      696484        10000         241.33                45000           18                     8                  2                   Yes
      702598        11700         381.61                45192           20                     7                  3                   Yes
      702470        10000         243.29                38000           17                     9                  7                   Yes
      702373        4800          144.77                54000           19                     8                  2                   Yes
  • 17. Use case 1: Output: Predicted Class
      Each record will have the predicted class assigned, as shown below (column: Likelihood to default).

      Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status   Likelihood to default
      1039153       21000         701.73                105000          9                      5                  4                   No                    No
      1069697       15000         483.38                92000           11                     5                  2                   No                    No
      1068120       25600         824.96                110000          10                     9                  2                   No                    No
      563175        23000         534.94                80000           9                      2                  12                  No                    No
      562842        19750         483.65                57228           11                     3                  21                  Yes                   No
      562681        25000         571.78                113000          10                     0                  9                   No                    No
      562404        21250         471.2                 31008           12                     1                  12                  Yes                   Yes
      700159        14400         448.99                82000           20                     6                  6                   No                    No
      696484        10000         241.33                45000           18                     8                  2                   Yes                   Yes
      702598        11700         381.61                45192           20                     7                  3                   Yes                   Yes
      702470        10000         243.29                38000           17                     9                  7                   Yes                   Yes
      702373        4800          144.77                54000           19                     8                  2                   Yes                   No
  • 18. Use case 1: Output: Class profile
      • As the table below shows, defaulters (class: Yes) and non-defaulters (class: No) have distinctive characteristics.
      • Defaulters tend to be delinquent more often and to have a higher debt to income ratio and a shorter employment tenure than non-defaulters.
      • Hence delinquency, employment tenure and debt to income ratio are the determining factors when classifying loan applicants into likely defaulters and non-defaulters.

      Class (likely to default)   Average loan amount   Average monthly installment   Average annual income   Average debt to income ratio   Average times delinquent   Average employment tenure
      No                          10447.30              304.87                        66467.74                9.58                           1.69                       16.82
      Yes                         7521.32               227.43                        60935.28                16.55                          6.91                       4.01
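
A class profile of this kind is a set of per-class averages, which can be sketched as follows. The rows are the 12 sample records from the predicted-class slide (debt to income ratio, times delinquent, employment tenure, predicted class). Their averages will not match the class profile table, which appears to be computed on a larger data set. The `class_profile` helper and its key names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (debt to income ratio, times delinquent, employment tenure, predicted class)
rows = [
    (9, 5, 4, "No"), (11, 5, 2, "No"), (10, 9, 2, "No"), (9, 2, 12, "No"),
    (11, 3, 21, "No"), (10, 0, 9, "No"), (12, 1, 12, "Yes"), (20, 6, 6, "No"),
    (18, 8, 2, "Yes"), (20, 7, 3, "Yes"), (17, 9, 7, "Yes"), (19, 8, 2, "No"),
]


def class_profile(rows):
    """Average each attribute separately for every predicted class."""
    groups = defaultdict(list)
    for dti, delinquent, tenure, predicted in rows:
        groups[predicted].append((dti, delinquent, tenure))
    return {
        label: {
            "avg_debt_to_income": mean(m[0] for m in members),
            "avg_times_delinquent": mean(m[1] for m in members),
            "avg_employment_tenure": mean(m[2] for m in members),
        }
        for label, members in groups.items()
    }


profile = class_profile(rows)
for label, averages in profile.items():
    print(label, averages)
```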
  • 19. Use case 2
      Business problem:
      • A doctor or pharmacist wants to predict the likelihood of a new patient's disease being cured or not cured based on various attributes of the patient, such as blood pressure, hemoglobin level, sugar level, name of the drug given to the patient, name of the treatment given to the patient etc.
      • Here the target variable would be 'past cure status', and the predicted class would contain the values 'yes' or 'no', meaning 'prone to cure' and 'not prone to cure' respectively.
      Business benefit:
      • Given the body profile of a patient and the recent treatments and drugs taken by him/her, the probability of a cure can be predicted and changes in treatment or drug can be suggested if required.
  • 20. Use case 3
      Business problem:
      • An accountant or human resource manager wants to predict the likelihood of an employee committing fraud against the company based on the various bills he/she has submitted so far, such as food, travel and medical bills.
      • The target variable in this case would be 'past fraud status', and the predicted class would contain the values 'yes' or 'no', representing likely fraud and no fraud respectively.
      Business benefit:
      • Such classification can prevent a company from spending unreasonably on any employee and can in turn protect the company budget by detecting such fraud beforehand.
  • 21. Want to Learn More? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. June 2018