Generalized Logit Regression Using SAS: By Example
Anthony Kilili
March 2007
Introduction
• The most common regression technique for a dichotomous (two-outcome) target is Binary Logistic Regression (BLR). It has been the go-to technique for modeling outcomes in many industries, including insurance, financial services and catalogue retail, and is widely used to predict outcomes such as:
– Response to a mailing (yes or no)
– Chargeback (yes or no)
– Chargeoff (yes or no)
– Attrition (yes or no)
• However, in many other situations the dependent variable has more than two possible outcomes. For example, in telecommunications we may need to know which of several handset models a prospect is most likely to choose: a classic 'choice' problem.
• In the case of customer attrition, we may be interested not only in whether the customer attrites, but also in whether they do so after 6 months, 1 year or 18 months of membership: a classic 'conditional/survival probability' problem.
• These types of problems have traditionally been approached by building several
separate binary models, one for each product/category/outcome.
• One problem with this approach is that the probabilities from these separate models are not necessarily comparable. If a prospect ranks highly in several of the models, how do we determine which product to market to them?
• Another approach has been Discriminant Analysis, but it is quite unforgiving when its statistical assumptions (multivariate normality, equal covariance matrices across groups, etc.) are not met.
• In this article, I recommend the use of a specialized form of logistic regression called
Multinomial Logit or Generalized Logistic Regression.
Generalized Logit Regression
• PROC LOGISTIC in SAS was historically used mainly for binary outcome problems. Beginning with Version 8.2, the procedure was extended to polytomous (nominal) dependent variables via generalized logit analysis, requested with the LINK=glogit option.
• For a dependent variable with three possible outcomes A, B or C, the procedure estimates two linear equations for the logits and, if a SELECTION= method is specified, selects the variables that enter them. No logit is produced for the third category because it serves as the 'reference' category.
– For a target (Y) that takes the values A, B or C, we can use Y=C as the reference category and run the following SAS code:
PROC LOGISTIC DATA=yourdata;
   /* ref='C' sets C as the reference category; the quotes must be straight quotes */
   MODEL Y(ref='C') = Var1 Var2 Var3 /
         SELECTION=stepwise LINK=glogit;
RUN;
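The procedure will estimate the following logits from the three independent variables (Var1, Var2, Var3), each logit having its own intercept and slopes:

$$\text{Logit}_A = \beta_{10} + \beta_{11}\,\text{Var1} + \beta_{12}\,\text{Var2} + \beta_{13}\,\text{Var3}$$
$$\text{Logit}_B = \beta_{20} + \beta_{21}\,\text{Var1} + \beta_{22}\,\text{Var2} + \beta_{23}\,\text{Var3}$$

where

$$\text{Logit}_A = \log\frac{P(Y=A)}{P(Y=C)}, \qquad \text{Logit}_B = \log\frac{P(Y=B)}{P(Y=C)}$$

– The predicted probability for each category can be derived from these logits as:

$$P(Y=C) = \frac{1}{1 + e^{\text{Logit}_A} + e^{\text{Logit}_B}}$$
$$P(Y=A) = \frac{e^{\text{Logit}_A}}{1 + e^{\text{Logit}_A} + e^{\text{Logit}_B}}$$
$$P(Y=B) = \frac{e^{\text{Logit}_B}}{1 + e^{\text{Logit}_A} + e^{\text{Logit}_B}} = 1 - P(Y=A) - P(Y=C)$$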
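A minimal Python sketch of the conversion above, assuming the non-reference logits have already been computed. The function name `glogit_probs` is hypothetical and not part of SAS; it treats the reference category's logit as 0, so the same code works for any number of categories.

```python
import math

def glogit_probs(logits):
    """Convert generalized-logit values into category probabilities.

    `logits` holds log(P(k)/P(ref)) for each non-reference category k.
    Returns the probabilities in the same order, with the reference
    category's probability appended last.
    """
    # The reference category implicitly has logit 0, i.e. exp(0) = 1.
    denom = 1.0 + sum(math.exp(v) for v in logits)
    probs = [math.exp(v) / denom for v in logits]
    probs.append(1.0 / denom)  # reference category (C in the text)
    return probs

# Hypothetical logits for one prospect: Logit_A = -1.0, Logit_B = 0.5
p_a, p_b, p_c = glogit_probs([-1.0, 0.5])
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The three probabilities always sum to 1, and the ratio P(A)/P(C) recovers exp(Logit_A), which is exactly how the logits were defined.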
Example
Let's take a hypothetical example of a magazine subscription company whose business model allows the marketing department to send direct mail solicitations with a 'Bill Me Later' option. There are three possible outcomes in response to the mailers:
1) Prospects who enroll in the subscription AND subsequently make the first payment when it becomes due. We will call this OUTCOME = A.
2) Prospects who enroll in the subscription but fail to make the first payment. We will call this OUTCOME = B.
3) Non-responders: prospects who do not respond to the solicitation. We will call this OUTCOME = C.
The objective of the marketing activities is to maximize Outcome A. The data science team has been tasked with building response models that identify the prospects most likely to result in Outcome A, so that campaigns can be laser-focused on these prospects.
Obs var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 Outcome
1 11 1 10 10 11 2 0.481 0.45849 3 0.70922 11 C
2 12 1 12 8 7 4 0.522 0.58634 4 0.62200 12 A
3 8 2 11 6 10 4 0.485 0.30000 3 0.70000 8 C
4 14 2 11 9 10 2 0.571 0.80000 3 0.50322 14 B
5 13 1 4 4 7 1 0.656 0.66096 2 0.63095 13 B
6 9 2 8 6 4 2 0.621 0.56050 4 0.68550 9 B
7 18 2 11 9 10 2 0.978 0.80000 3 0.35254 18 C
8 7 2 11 9 10 4 0.485 0.43801 3 0.63667 7 C
9 12 0 13 9 7 2 0.587 0.47677 4 0.74105 12 B
10 11 2 12 10 11 4 0.499 0.62370 4 0.58971 11 B
11 11 1 5 6 2 2 0.556 0.77539 1 0.56912 11 C
12 6 2 11 9 10 2 0.529 0.47070 3 0.64353 6 B
The table above shows 12 prospect records selected from the hypothetical dataset. We would like to predict the probability of class membership using the 11 independent variables (var1-var11). The target variable (the Outcome column above, named category in the SAS dataset) has three classes (A, B and C).
The following code was first run as a variable selection step. Other selection methods (BACKWARD or FORWARD) can be used by changing the SELECTION= option. The significance level for adding a variable is controlled by the SLENTRY= (SLE=) option, and the level for keeping it in the model by the SLSTAY= (SLS=) option. Here only SLS=0.0001 is set, so SLENTRY keeps its default of 0.05.
PROC LOGISTIC DATA=sample1;
   /* Stepwise selection among var1-var11; C is the reference category */
   MODEL category(ref='C') = var1-var11 /
         SELECTION=stepwise SLS=0.0001 LINK=glogit;
RUN;
SAS OUTPUT
The LOGISTIC Procedure
Model Information
Data Set WORK.SAMPLE1
Response Variable category
Number of Response Levels 3
Model generalized logit
Optimization Technique Fisher's scoring
Number of Observations Read 20973
Number of Observations Used 20379
Response Profile
Ordered Total
Value category Frequency
1 A 2390
2 B 8051
3 C 9938
Logits modeled use category='C' as the reference
category.
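This is the first part of the output, showing the name of our dataset (SAMPLE1), the target variable (CATEGORY) and the type of model fitted (generalized logit). We can also see the distribution of records by category and the group used as the reference (category='C'). Of the 20,973 observations read, 20,379 were used; the remaining 594 were excluded, typically because of missing values in the response or explanatory variables.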
Summary of Stepwise Selection

         Effect                    Number   Score        Wald
Step     Entered    Removed   DF   In       Chi-Square   Chi-Square   Pr > ChiSq
1        var6                 2    1        959.2276                  <.0001
2        var4                 2    2        450.8481                  <.0001
3        var1                 2    3        48.6350                   <.0001
4        var8                 2    4        18.0758                   0.0001
5        var9                 2    5        10.8040                   0.0045
6                   var9      2    4                     10.7888      0.0045
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter category DF Estimate Error Chi-Square Pr > ChiSq
Intercept A 1 -3.0312 0.1428 450.4170 <.0001
Intercept B 1 -2.0812 0.0946 483.7261 <.0001
var1 A 1 -0.0263 0.00635 17.1955 <.0001
var1 B 1 -0.0147 0.00429 11.7130 0.0006
var4 A 1 0.1300 0.0118 122.0528 <.0001
var4 B 1 0.1466 0.00770 361.9967 <.0001
var6 A 1 0.3293 0.0224 215.4427 <.0001
var6 B 1 0.3904 0.0150 677.6613 <.0001
var8 A 1 -0.00841 0.1318 0.0041 0.9491
var8 B 1 -0.3561 0.0880 16.3859 <.0001
Odds Ratio Estimates
Point 95% Wald
Effect category Estimate Confidence Limits
var1 A 0.974 0.962 0.986
var1 B 0.985 0.977 0.994
var4 A 1.139 1.113 1.165
var4 B 1.158 1.141 1.175
var6 A 1.390 1.330 1.452
var6 B 1.478 1.435 1.522
var8 A 0.992 0.766 1.284
var8 B 0.700 0.589 0.832
The stepwise method, with a 0.01% significance level for staying in the model (SLS=0.0001), selected 4 variables (var6, var4, var1 and var8). var9 entered at step 5 under the default entry level of 0.05 but was removed at step 6 because its p-value (0.0045) exceeds 0.0001.

NB: var8 is highly significant in predicting Logit_B (p < .0001) but not Logit_A (p = 0.9491). We could still retain this variable or, preferably, evaluate predictions on a validation sample with and without it to determine whether it materially affects the bottom line and model stability.
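The odds ratios in the table above are exponentiated coefficients, and the 95% Wald limits are exp(estimate ± z × standard error) with z ≈ 1.96. A short Python sketch, using estimates and standard errors copied from the stepwise output, reproduces rows of that table; it also makes the var8 point concrete, since the category A interval spans 1. The helper name `odds_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(estimate, std_err, level=0.95):
    """Return (odds ratio, lower, upper) using Wald confidence limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return (math.exp(estimate),
            math.exp(estimate - z * std_err),
            math.exp(estimate + z * std_err))

# Estimates and standard errors copied from the stepwise output above
rows = {
    ("var6", "A"): (0.3293, 0.0224),
    ("var8", "A"): (-0.00841, 0.1318),
    ("var8", "B"): (-0.3561, 0.0880),
}
for (effect, cat), (est, se) in rows.items():
    point, lower, upper = odds_ratio_ci(est, se)
    print(f"{effect} {cat}: {point:.3f} ({lower:.3f}, {upper:.3f})")
```

An interval that contains 1 (var8 for category A) means the data are consistent with var8 having no effect on the odds of A versus C.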
Based on these results, we can now rerun the procedure using only the selected variables:
PROC LOGISTIC DATA=sample1;
   MODEL category(ref='C') = var1 var4 var6 var8 / LINK=glogit;
RUN;
Model Fit Statistics

                 Intercept    Intercept and
Criterion        Only         Covariates
AIC              40611.437    39048.985
SC               40627.339    39128.495
-2 Log L         40607.437    39028.985
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 1578.4519 8 <.0001
Score 1534.3289 8 <.0001
Wald 1447.7008 8 <.0001
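As a check on how these statistics relate, AIC equals -2 Log L plus twice the number of estimated parameters (2 intercepts for the intercept-only model; 2 intercepts plus 4 × 2 slopes with covariates), and the likelihood-ratio chi-square is the drop in -2 Log L, with 8 degrees of freedom (4 variables × 2 logits). A small Python sketch of that arithmetic, using values copied from the output:

```python
# -2 Log L values copied from the Model Fit Statistics table
neg2ll_intercept_only = 40607.437
neg2ll_full = 39028.985

n_logits = 2        # Logit_A and Logit_B (C is the reference)
n_covariates = 4    # var1, var4, var6, var8

k_intercept_only = n_logits                 # one intercept per logit
k_full = n_logits * (1 + n_covariates)      # intercept + 4 slopes per logit

aic_intercept_only = neg2ll_intercept_only + 2 * k_intercept_only
aic_full = neg2ll_full + 2 * k_full

# Likelihood-ratio test of BETA=0: difference in -2 Log L
lr_chi_square = neg2ll_intercept_only - neg2ll_full
lr_df = k_full - k_intercept_only

print(f"AIC: {aic_intercept_only:.3f} vs {aic_full:.3f}")
print(f"LR chi-square: {lr_chi_square:.3f} on {lr_df} DF")
```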
Type 3 Analysis of Effects
Wald
Effect DF Chi-Square Pr > ChiSq
var1 2 24.2915 <.0001
var4 2 428.3454 <.0001
var6 2 778.7953 <.0001
var8 2 17.1800 0.0002
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter category DF Estimate Error Chi-Square Pr > ChiSq
Intercept A 1 -3.0998 0.1409 483.8847 <.0001
Intercept B 1 -2.1294 0.0931 522.9083 <.0001
var1 A 1 -0.0267 0.00617 18.7590 <.0001
var1 B 1 -0.0152 0.00416 13.4127 0.0002
var4 A 1 0.1328 0.0117 129.8235 <.0001
var4 B 1 0.1487 0.00761 381.7793 <.0001
var6 A 1 0.3374 0.0222 231.3911 <.0001
var6 B 1 0.3975 0.0148 721.3059 <.0001
var8 A 1 0.0339 0.1283 0.0699 0.7915
var8 B 1 -0.3279 0.0854 14.7368 0.0001
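All four effects are significant in the Type 3 analysis, which tests each variable jointly across both logits. Individually, however, var8 contributes to Logit_B (p = 0.0001) but not to Logit_A (p = 0.7915).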
With 3 classes in our target variable, the model produces two logit equations:

$$\text{Logit}_A = -3.0998 - 0.0267\,\text{var1} + 0.1328\,\text{var4} + 0.3374\,\text{var6} + 0.0339\,\text{var8}$$
$$\text{Logit}_B = -2.1294 - 0.0152\,\text{var1} + 0.1487\,\text{var4} + 0.3975\,\text{var6} - 0.3279\,\text{var8}$$
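An additional step is required to convert these logits into probabilities. The following DATA step applies the fitted equations to every record:
DATA probs;
   SET sample1;
   /* Linear predictors from the final model (reference category = C) */
   Logit_A = -3.0998 - 0.0267*var1 + 0.1328*var4 + 0.3374*var6 + 0.0339*var8;
   Logit_B = -2.1294 - 0.0152*var1 + 0.1487*var4 + 0.3975*var6 - 0.3279*var8;
   /* P(C) = 1 / (1 + exp(Logit_A) + exp(Logit_B)) */
   Pr_C = 1 / (1 + exp(Logit_A) + exp(Logit_B));
   /* P(A) and P(B) follow from their odds relative to C */
   Pr_A = Pr_C * exp(Logit_A);
   Pr_B = Pr_C * exp(Logit_B);
RUN;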
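For scoring outside SAS, the same calculation can be sketched in Python. The coefficients are copied from the final model above; the function name `score_prospect` and the record values are hypothetical, chosen only for illustration.

```python
import math

# Coefficients from the final model (reference category = C)
COEF = {
    "A": {"intercept": -3.0998, "var1": -0.0267, "var4": 0.1328,
          "var6": 0.3374, "var8": 0.0339},
    "B": {"intercept": -2.1294, "var1": -0.0152, "var4": 0.1487,
          "var6": 0.3975, "var8": -0.3279},
}

def score_prospect(record):
    """Return (Logit_A, Logit_B, Pr_A, Pr_B, Pr_C) for one record."""
    logits = {}
    for cat, b in COEF.items():
        logits[cat] = b["intercept"] + sum(
            b[v] * record[v] for v in ("var1", "var4", "var6", "var8"))
    pr_c = 1.0 / (1.0 + math.exp(logits["A"]) + math.exp(logits["B"]))
    pr_a = pr_c * math.exp(logits["A"])
    pr_b = pr_c * math.exp(logits["B"])
    return logits["A"], logits["B"], pr_a, pr_b, pr_c

# Hypothetical prospect record (not taken from the dataset)
example = {"var1": 10, "var4": 8, "var6": 3, "var8": 0.5}
logit_a, logit_b, pr_a, pr_b, pr_c = score_prospect(example)
print(f"Pr_A={pr_a:.4f}  Pr_B={pr_b:.4f}  Pr_C={pr_c:.4f}")
```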
Obs category Pr_C Pr_A Pr_B
1 C 0.35865 0.14557 0.49578
2 B 0.31443 0.1523 0.53327
3 C 0.63112 0.09269 0.27619
4 B 0.40749 0.12573 0.46678
5 A 0.30994 0.1454 0.54466
6 A 0.32001 0.13846 0.54153
7 B 0.31631 0.15339 0.5303
8 B 0.56396 0.10476 0.33128
9 A 0.3457 0.13923 0.51507
10 C 0.51582 0.10483 0.37936
11 B 0.40708 0.12802 0.4649
12 C 0.76608 0.05566 0.17826
The sample output above shows the probability of group membership for each record. We can also derive other probabilities of interest; in the following example we decide to prioritize response propensity.
IF:
Pr_A = probability of responding to a direct mail campaign AND making a payment
Pr_B = probability of responding to a direct mail campaign AND NOT making a payment
Pr_C = probability of NOT responding to a direct mail campaign
THEN:
Pr_D = probability of responding to a direct mail campaign = Pr_A + Pr_B (equivalently, 1 - Pr_C)
In SAS, Pr_D can be computed by adding the statement Pr_D = Pr_A + Pr_B; to the DATA step that calculates the probabilities.
Obs Pr_C Pr_A Pr_B Pr_D
1 0.24957 0.1559 0.59453 0.75043
2 0.24957 0.1559 0.59453 0.75043
3 0.25201 0.1577 0.59029 0.74799
4 0.25219 0.16249 0.58532 0.74781
5 0.25465 0.16437 0.58099 0.74535
6 0.25581 0.16052 0.58367 0.74419
…….. …….. …….. …….. ……..
20968 0.83541 0.06025 0.10434 0.16459
20969 0.83541 0.04025 0.12434 0.16459
20970 0.85541 0.04025 0.10434 0.14459
20971 0.85767 0.0393 0.10304 0.14233
20972 0.85767 0.0393 0.10304 0.14233
20973 0.85989 0.03836 0.10175 0.14011
Sorting the data by descending Pr_D and then by descending Pr_A gives the table above. The top records have a high likelihood of responding to the campaign (Pr_D) and, relative to the rest of the file, a higher likelihood of making a payment (Pr_A). We would therefore target these for the next mailing. The bottom records have a high likelihood of NOT responding (Pr_C).
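The two-key sort can be sketched in Python using rows copied from the table above, deliberately shuffled. Pr_D is rounded to the five decimals shown in the table so that tied values, such as records 20968 and 20969, are broken by Pr_A as intended.

```python
# (obs, Pr_C, Pr_A, Pr_B) rows copied from the sorted table, shuffled
rows = [
    (20970, 0.85541, 0.04025, 0.10434),
    (20969, 0.83541, 0.04025, 0.12434),
    (1,     0.24957, 0.15590, 0.59453),
    (20968, 0.83541, 0.06025, 0.10434),
]

def pr_d(row):
    """Probability of responding: Pr_A + Pr_B, rounded as in the table."""
    _, _, pr_a, pr_b = row
    return round(pr_a + pr_b, 5)

# Descending Pr_D first, then descending Pr_A to break ties
ranked = sorted(rows, key=lambda r: (-pr_d(r), -r[2]))
for row in ranked:
    print(row[0], pr_d(row), row[2])
```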
We can also incorporate profitability measures into our evaluation tables as follows:
– Assume the value of each outcome is as follows:
   • Value of a non-paying responder (B) = -$0.10
   • Value of a non-responder (C) = -$0.07
   • Value of a payer (A) = $12.00
– The expected value of each prospect can then be calculated as:
   • Expected Value of Prospect = 12*Pr_A - 0.10*Pr_B - 0.07*Pr_C
All the records are then sorted by expected value, and the mailing is sent to the prospects with the highest expected values. This enhances the traditional response model, which does not incorporate profitability: the method pushes unprofitable prospects who otherwise have high response propensities down the rankings.
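A short Python sketch of the expected-value calculation, applied to three rows of the table sorted by Pr_D (observations 1, 4 and 5). The dollar values are the assumed ones listed above. Ranked by expected value, these three rows come out in the reverse of their Pr_D order, because each small gain in Pr_A is worth $12 while the response-only outcomes carry small costs.

```python
# Assumed value of each outcome, from the text above
VALUE = {"A": 12.00, "B": -0.10, "C": -0.07}

def expected_value(pr_a, pr_b, pr_c):
    """Expected value of mailing one prospect."""
    return VALUE["A"] * pr_a + VALUE["B"] * pr_b + VALUE["C"] * pr_c

# (obs, Pr_C, Pr_A, Pr_B) copied from the table sorted by Pr_D
rows = [
    (1, 0.24957, 0.15590, 0.59453),
    (4, 0.25219, 0.16249, 0.58532),
    (5, 0.25465, 0.16437, 0.58099),
]

# Highest expected value first
ranked = sorted(rows, key=lambda r: expected_value(r[2], r[3], r[1]),
                reverse=True)
for obs, pr_c, pr_a, pr_b in ranked:
    print(obs, round(expected_value(pr_a, pr_b, pr_c), 4))
```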
Conclusion
• There are numerous situations in analytics that require modeling a polytomous dependent variable; for instance, deciding which product to feature most prominently on a website for each visitor, given a set of products on promotion.
• Generalized logit regression provides an easy-to-implement solution for calculating comparable scores in such situations. The approach is versatile but currently underutilized in data science, and it makes a valuable addition to the data scientist's toolkit.

More Related Content

Viewers also liked

1.5.1 measures basic concepts
1.5.1 measures basic concepts1.5.1 measures basic concepts
1.5.1 measures basic concepts
A M
 
(마더세이프 라운드) Logistic regression
(마더세이프 라운드) Logistic regression(마더세이프 라운드) Logistic regression
(마더세이프 라운드) Logistic regression
mothersafe
 

Viewers also liked (20)

Logistic Regression/Markov Chain presentation
Logistic Regression/Markov Chain presentationLogistic Regression/Markov Chain presentation
Logistic Regression/Markov Chain presentation
 
Transparency7
Transparency7Transparency7
Transparency7
 
1.5.1 measures basic concepts
1.5.1 measures basic concepts1.5.1 measures basic concepts
1.5.1 measures basic concepts
 
(마더세이프 라운드) Logistic regression
(마더세이프 라운드) Logistic regression(마더세이프 라운드) Logistic regression
(마더세이프 라운드) Logistic regression
 
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
 
Intro to Logistic Regression
Intro to Logistic RegressionIntro to Logistic Regression
Intro to Logistic Regression
 
Churn modelling
Churn modellingChurn modelling
Churn modelling
 
Logistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesLogistic Regression: Behind the Scenes
Logistic Regression: Behind the Scenes
 
From logistic regression to linear chain CRF
From logistic regression to linear chain CRFFrom logistic regression to linear chain CRF
From logistic regression to linear chain CRF
 
Choice Models
Choice ModelsChoice Models
Choice Models
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regression
 
Binary Logistic Regression Example
Binary Logistic Regression ExampleBinary Logistic Regression Example
Binary Logistic Regression Example
 
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
 
Logistic regression for ordered dependant variable with more than 2 levels
Logistic regression for ordered dependant variable with more than 2 levelsLogistic regression for ordered dependant variable with more than 2 levels
Logistic regression for ordered dependant variable with more than 2 levels
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)
 
Churn Analysis in Telecom Industry
Churn Analysis in Telecom IndustryChurn Analysis in Telecom Industry
Churn Analysis in Telecom Industry
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Probit and logit model
Probit and logit modelProbit and logit model
Probit and logit model
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Multilevel Binary Logistic Regression
Multilevel Binary Logistic RegressionMultilevel Binary Logistic Regression
Multilevel Binary Logistic Regression
 

Similar to Generalized Logistic Regression - by example (Anthony Kilili)

Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
jessiehampson
 
Manifold learning for bankruptcy prediction
Manifold learning for bankruptcy predictionManifold learning for bankruptcy prediction
Manifold learning for bankruptcy prediction
Armando Vieira
 
Statistical quality control
Statistical quality controlStatistical quality control
Statistical quality control
Irfan Hussain
 
Case Quality Management—ToyotaQuality Control Analytics at Toyo.docx
Case Quality Management—ToyotaQuality Control Analytics at Toyo.docxCase Quality Management—ToyotaQuality Control Analytics at Toyo.docx
Case Quality Management—ToyotaQuality Control Analytics at Toyo.docx
cowinhelen
 
ACMS TV Ratings Midterm Angelini
ACMS TV Ratings Midterm AngeliniACMS TV Ratings Midterm Angelini
ACMS TV Ratings Midterm Angelini
Brandon Angelini
 
Hızlı Ozet - Istatistiksel Proses Kontrol
Hızlı Ozet - Istatistiksel Proses KontrolHızlı Ozet - Istatistiksel Proses Kontrol
Hızlı Ozet - Istatistiksel Proses Kontrol
metallicaslayer
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
celenarouzie
 

Similar to Generalized Logistic Regression - by example (Anthony Kilili) (20)

Business Application of Conjoint analysis
Business Application of  Conjoint analysisBusiness Application of  Conjoint analysis
Business Application of Conjoint analysis
 
Introduction to Six Sigma
Introduction to Six SigmaIntroduction to Six Sigma
Introduction to Six Sigma
 
Creating an Explainable Machine Learning Algorithm
Creating an Explainable Machine Learning AlgorithmCreating an Explainable Machine Learning Algorithm
Creating an Explainable Machine Learning Algorithm
 
Explainable Machine Learning
Explainable Machine LearningExplainable Machine Learning
Explainable Machine Learning
 
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
 
P & C Reserving Using GAMLSS
P & C Reserving Using GAMLSSP & C Reserving Using GAMLSS
P & C Reserving Using GAMLSS
 
Manifold learning for bankruptcy prediction
Manifold learning for bankruptcy predictionManifold learning for bankruptcy prediction
Manifold learning for bankruptcy prediction
 
ch06.ppt
ch06.pptch06.ppt
ch06.ppt
 
Statistical quality control
Statistical quality controlStatistical quality control
Statistical quality control
 
Melda Elmas-Project1-ppt.pptx
Melda Elmas-Project1-ppt.pptxMelda Elmas-Project1-ppt.pptx
Melda Elmas-Project1-ppt.pptx
 
Case Quality Management—ToyotaQuality Control Analytics at Toyo.docx
Case Quality Management—ToyotaQuality Control Analytics at Toyo.docxCase Quality Management—ToyotaQuality Control Analytics at Toyo.docx
Case Quality Management—ToyotaQuality Control Analytics at Toyo.docx
 
ACMS TV Ratings Midterm Angelini
ACMS TV Ratings Midterm AngeliniACMS TV Ratings Midterm Angelini
ACMS TV Ratings Midterm Angelini
 
Six sigma pedagogy
Six sigma pedagogySix sigma pedagogy
Six sigma pedagogy
 
Six sigma
Six sigma Six sigma
Six sigma
 
Machine Learning Model for M.S admissions
Machine Learning Model for M.S admissionsMachine Learning Model for M.S admissions
Machine Learning Model for M.S admissions
 
Hızlı Ozet - Istatistiksel Proses Kontrol
Hızlı Ozet - Istatistiksel Proses KontrolHızlı Ozet - Istatistiksel Proses Kontrol
Hızlı Ozet - Istatistiksel Proses Kontrol
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
 
report
reportreport
report
 
Mb0048 operations research
Mb0048  operations researchMb0048  operations research
Mb0048 operations research
 
Mb0048 operations research
Mb0048  operations researchMb0048  operations research
Mb0048 operations research
 

Recently uploaded

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 

Recently uploaded (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 

Generalized Logistic Regression - by example (Anthony Kilili)

  • 1. Generalized Logit Regression Using SAS: By Example Anthony Kilili March 2007
  • 2. Introduction • The most common regression solution for a dichotomous outcome target is the use of Binary Logistic Regression (BLR). This has been the go-to technique for modeling outcomes in many industries including insurance, financial and catalogue. The technique is well-utilized in predicting outcomes such as:  Response to a mailing (yes or no)  Chargeback (yes or no)  Chargeoff (yes or no)  Attrition (yes or no) • However, there are many other situations where the dependent variable has more than two possible outcomes. For example, in telecommunications, we may need to know the type of handset prospects are most likely to choose given a choice of several models. A classical ‘choice’ problem. • In the case of customer attrition, we may be interested not only in whether or not the customer attrites, but also on whether they do so after 6 months, 1 year or 18 months of membership. A classical ‘conditional/survival probability’ problem. • These types of problems have traditionally been approached by building several separate binary models, one for each product/category/outcome. • One of the problems with this approach is that the probabilities from these models are not necessarily comparable. A prospect may rank highly in several of the models, how do we determine which product to market to them? • Another approach has been through Discriminant Analysis techniques but these are quite unforgiving if various statistical assumptions are unmet (Multivariate normality, Equality of Variances etc.). • In this article, I recommend the use of a specialized form of logistic regression called Multinomial Logit or Generalized Logistic Regression.
  • 3. Generalized Logit Regression • PROC LOGISTIC in SAS was historically only used to handle binary outcome problems. Beginning Version 8.2, the procedure was extended to polytomous dependent variables via Generalized Logit analysis using the LINK=glogit option. • For a dependent variable with three possible outcomes A,B or C, the procedure will perform variable selection and create two linear equations to calculate logits. The Logit for the third category is not produced because it is used as the ‘reference’ category.  For a target (Y) that takes the values A, B or C, we can use Y=C as the reference category and run the following SAS code: PROC LOGISTIC DATA=yourdata; MODEL Y(ref=‘C’) = Var1 var2 Var3 / SELECTION=stepwise LINK=glogit; RUN; The procedure will estimate the following logits from the 3 independent variables (var1, var2, var3):  Logit_A = intercept + β11Var1 + β12Var2 + β13Var3  Logit_B = intercept + β21Var1 + β22Var2 + β23Var3 Where: Logit_A = log(P(Y=A)/P(Y=C)) Logit_B = log(P(Y=B)/P(Y=C)) – The predicted probability for each category can be derived from these logits as: • P(Y=C)=1/(1 + expLogit_A + expLogit_B) • P(Y=A)= expLogit_A / (1 + expLogit_A + expLogit_C) • P(Y=B)= 1 - P(Y=A) – P(Y=C)
  • 4. Example Let’s take a hypothetical example of a Magazine Subscription company whose business model allows the marketing department to send direct mail solicitations which have a ‘Bill Me Later’ option. There are three possible outcomes in response to the mailers: 1)Prospects who enroll in the subscription AND subsequently make the first payment when it becomes due. We will call this OUTCOME = A. 2)Prospects who enroll in the subscription but fail to make the first payment. We will call this OUTCOME = B. 3)Non-responders, prospects who do not respond to the solicitation. We will call this OUTCOME = C. The objective of the marketing activities is the maximization of Outcome A. The data science team has been tasked with building response models that could identity the top prospects most likely to result in Outcome A so that campaigns can be laser-focused to these prospects..
Example
Obs  var1  var2  var3  var4  var5  var6   var7     var8  var9    var10  var11  Outcome
  1    11     1    10    10    11     2  0.481  0.45849     3  0.70922     11        C
  2    12     1    12     8     7     4  0.522  0.58634     4  0.62200     12        A
  3     8     2    11     6    10     4  0.485  0.30000     3  0.70000      8        C
  4    14     2    11     9    10     2  0.571  0.80000     3  0.50322     14        B
  5    13     1     4     4     7     1  0.656  0.66096     2  0.63095     13        B
  6     9     2     8     6     4     2  0.621  0.56050     4  0.68550      9        B
  7    18     2    11     9    10     2  0.978  0.80000     3  0.35254     18        C
  8     7     2    11     9    10     4  0.485  0.43801     3  0.63667      7        C
  9    12     0    13     9     7     2  0.587  0.47677     4  0.74105     12        B
 10    11     2    12    10    11     4  0.499  0.62370     4  0.58971     11        B
 11    11     1     5     6     2     2  0.556  0.77539     1  0.56912     11        C
 12     6     2    11     9    10     2  0.529  0.47070     3  0.64353      6        B

The table above is a selection of 12 prospect records from the hypothetical dataset. We would like to predict the probability of class membership using the 11 independent variables (var1 to var11). Our target variable has 3 outcome classes (A, B and C). In the SAS code and output below, it is stored in the variable CATEGORY.
The following code was first run as a variable selection step. Other variable selection methods (backward or forward) can be used by changing the SELECTION= option. The significance level a variable must reach to enter the model is controlled by the SLENTRY= (SLE=) option. The level it must keep to stay in the model is controlled by the SLSTAY= (SLS=) option. Both default to 0.05, so the code below keeps the default entry level and tightens only the stay level.

PROC LOGISTIC DATA=sample1;
   MODEL category(ref='C') = var1-var11 / SELECTION=stepwise SLS=0.0001 LINK=glogit;
RUN;
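The call below is a sketch of the same selection with both thresholds stated explicitly. Because SLENTRY= defaults to 0.05, it should reproduce the run above. The DETAILS option, which prints the entry and removal tests at each step, is an optional addition, not part of the original run. Switching to SELECTION=backward or SELECTION=forward uses the same two options.

```sas
/* Stepwise generalized logit selection with both thresholds stated. */
/* SLENTRY= (SLE=): p-value an effect needs to ENTER the model.       */
/* SLSTAY=  (SLS=): p-value an effect must keep to STAY in the model. */
PROC LOGISTIC DATA=sample1;
   MODEL category(ref='C') = var1-var11
         / LINK=glogit SELECTION=stepwise
           SLENTRY=0.05 SLSTAY=0.0001 DETAILS;
RUN;
```

Stating SLENTRY= explicitly makes it clear that a variable can enter at p < 0.05 and still be dropped later if its p-value exceeds 0.0001.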
SAS OUTPUT
The LOGISTIC Procedure

Model Information
Data Set                     WORK.SAMPLE1
Response Variable            category
Number of Response Levels    3
Model                        generalized logit
Optimization Technique       Fisher's scoring

Number of Observations Read    20973
Number of Observations Used    20379

Response Profile
Ordered Value  category  Total Frequency
            1  A                    2390
            2  B                    8051
            3  C                    9938

Logits modeled use category='C' as the reference category.

This is the first part of the output. It shows the name of our dataset (SAMPLE1), the target variable (CATEGORY) and the type of model fitted (GENERALIZED LOGIT). We can also see the distribution by category and the group used as the reference (category='C'). PROC LOGISTIC excludes observations with a missing value in the response or in any variable on the MODEL statement. Because no FREQ or WEIGHT variable is used here, this is why 594 of the observations read (20973 - 20379) were not used.
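A quick pre-modeling check can confirm the Response Profile and show which inputs cause observations to be dropped. The sketch below runs PROC FREQ and PROC MEANS on SAMPLE1 and assumes the dataset and variable names used in this example.

```sas
/* Frequency of each outcome class, including any missing targets. */
PROC FREQ DATA=sample1;
   TABLES category / MISSING;
RUN;

/* Non-missing (N) and missing (NMISS) counts for every candidate input.  */
/* PROC LOGISTIC drops observations with a missing value in any MODEL      */
/* variable, so these counts explain Observations Read vs. Observations Used. */
PROC MEANS DATA=sample1 N NMISS;
   VAR var1-var11;
RUN;
```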
Summary of Stepwise Selection
Step  Entered  Removed  DF  Number In  Score Chi-Square  Wald Chi-Square  Pr > ChiSq
   1  var6               2          1          959.2276                       <.0001
   2  var4               2          2          450.8481                       <.0001
   3  var1               2          3           48.6350                       <.0001
   4  var8               2          4           18.0758                       0.0001
   5  var9               2          5           10.8040                       0.0045
   6           var9      2          4                            10.7888      0.0045

Analysis of Maximum Likelihood Estimates
Parameter  category  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  A          1   -3.0312          0.1428         450.4170      <.0001
Intercept  B          1   -2.0812          0.0946         483.7261      <.0001
var1       A          1   -0.0263         0.00635          17.1955      <.0001
var1       B          1   -0.0147         0.00429          11.7130      0.0006
var4       A          1    0.1300          0.0118         122.0528      <.0001
var4       B          1    0.1466         0.00770         361.9967      <.0001
var6       A          1    0.3293          0.0224         215.4427      <.0001
var6       B          1    0.3904          0.0150         677.6613      <.0001
var8       A          1  -0.00841          0.1318           0.0041      0.9491
var8       B          1   -0.3561          0.0880          16.3859      <.0001

Odds Ratio Estimates
Effect  category  Point Estimate  95% Wald Confidence Limits
var1    A                  0.974         0.962         0.986
var1    B                  0.985         0.977         0.994
var4    A                  1.139         1.113         1.165
var4    B                  1.158         1.141         1.175
var6    A                  1.390         1.330         1.452
var6    B                  1.478         1.435         1.522
var8    A                  0.992         0.766         1.284
var8    B                  0.700         0.589         0.832

With a stay level of 0.01% (SLS=0.0001), the stepwise method selected 4 variables for the model (var6, var4, var1 and var8). var9 entered at step 5 (p = 0.0045, below the default entry level of 0.05). It was removed at step 6 because that p-value exceeds the 0.0001 stay level.
NB: var8 is highly significant in predicting Logit_B (p < .0001) but not Logit_A (p = 0.9491). We could still retain this variable. Preferably, we could evaluate predictions on a validation sample with and without it to determine whether it materially affects the bottom line and model stability.
Based on these results, we can now run the procedure using the selected variables as follows:

PROC LOGISTIC DATA=sample1;
   MODEL category(ref='C') = var1 var4 var6 var8 / LINK=glogit;
RUN;
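The validation comparison suggested in the NB can be sketched as follows. Several details are illustrative assumptions, not the author's procedure: the 70/30 split, the seed, and the dataset names SPLIT, TRAIN, VALID, SCORED_WITH and SCORED_WITHOUT. The FITSTAT option of the SCORE statement, available in recent SAS/STAT releases, prints fit statistics for the scored hold-out data so the two models can be compared side by side.

```sas
/* Assumed 70/30 random split. OUTALL keeps every row and adds SELECTED=1/0. */
PROC SURVEYSELECT DATA=sample1 OUT=split SAMPRATE=0.3 SEED=20070301 OUTALL;
RUN;

DATA train valid;
   SET split;
   IF selected = 1 THEN OUTPUT valid;  /* 30% hold-out sample    */
   ELSE OUTPUT train;                  /* 70% development sample */
RUN;

/* Model WITH var8: fit on TRAIN, score VALID, print hold-out fit statistics. */
PROC LOGISTIC DATA=train;
   MODEL category(ref='C') = var1 var4 var6 var8 / LINK=glogit;
   SCORE DATA=valid OUT=scored_with FITSTAT;
RUN;

/* Model WITHOUT var8: same split, same scoring. */
PROC LOGISTIC DATA=train;
   MODEL category(ref='C') = var1 var4 var6 / LINK=glogit;
   SCORE DATA=valid OUT=scored_without FITSTAT;
RUN;
```

The scored datasets contain the predicted probabilities P_A, P_B and P_C, so the same downstream ranking can also be compared on the hold-out records.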
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC             40611.437                 39048.985
SC              40627.339                 39128.495
-2 Log L        40607.437                 39028.985

Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio   1578.4519   8      <.0001
Score              1534.3289   8      <.0001
Wald               1447.7008   8      <.0001

Type 3 Analysis of Effects
Effect  DF  Wald Chi-Square  Pr > ChiSq
var1     2          24.2915      <.0001
var4     2         428.3454      <.0001
var6     2         778.7953      <.0001
var8     2          17.1800      0.0002

Analysis of Maximum Likelihood Estimates
Parameter  category  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  A          1   -3.0998          0.1409         483.8847      <.0001
Intercept  B          1   -2.1294          0.0931         522.9083      <.0001
var1       A          1   -0.0267         0.00617          18.7590      <.0001
var1       B          1   -0.0152         0.00416          13.4127      0.0002
var4       A          1    0.1328          0.0117         129.8235      <.0001
var4       B          1    0.1487         0.00761         381.7793      <.0001
var6       A          1    0.3374          0.0222         231.3911      <.0001
var6       B          1    0.3975          0.0148         721.3059      <.0001
var8       A          1    0.0339          0.1283           0.0699      0.7915
var8       B          1   -0.3279          0.0854          14.7368      0.0001

The overall model is highly significant (likelihood ratio chi-square of 1578.45 on 8 DF). All four variables are significant in the Type 3 analysis, which tests each effect jointly across both logits. As in the selection step, however, var8 is significant only in the Logit_B equation (p = 0.0001), not in the Logit_A equation (p = 0.7915). With 3 classes in our target variable, the model produces two logit equations:
1) Logit_A = -3.0998 - 0.0267*var1 + 0.1328*var4 + 0.3374*var6 + 0.0339*var8
2) Logit_B = -2.1294 - 0.0152*var1 + 0.1487*var4 + 0.3975*var6 - 0.3279*var8
An additional step is required to convert these logits into probabilities.
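The fit statistics can be checked by hand, which also shows how they are defined. Let k be the number of estimated parameters: 2 intercepts for the intercept-only model, and 2 intercepts plus 2 × 4 slopes = 10 for the full model. Then AIC = -2 Log L + 2k. The likelihood ratio statistic is the drop in -2 Log L, with degrees of freedom equal to the number of added parameters:

```latex
\begin{aligned}
\text{AIC}_{\text{intercept only}} &= 40607.437 + 2(2) = 40611.437\\
\text{AIC}_{\text{covariates}} &= 39028.985 + 2(10) = 39048.985\\
\chi^2_{\text{LR}} &= 40607.437 - 39028.985 = 1578.452, \qquad \text{DF} = 10 - 2 = 8
\end{aligned}
```

Both AIC values match the output exactly. The likelihood ratio value matches the reported 1578.4519 up to rounding.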
The following SAS code may be used to convert the logits into probabilities:

DATA probs;
   SET sample1;
   /* Logits from the final model (reference category C) */
   Logit_A = -3.0998 - 0.0267*var1 + 0.1328*var4 + 0.3374*var6 + 0.0339*var8;
   Logit_B = -2.1294 - 0.0152*var1 + 0.1487*var4 + 0.3975*var6 - 0.3279*var8;
   /* Convert logits to class probabilities; Pr_A + Pr_B + Pr_C = 1 */
   Pr_C = 1 / (1 + exp(Logit_A) + exp(Logit_B));
   Pr_A = Pr_C * exp(Logit_A);
   Pr_B = Pr_C * exp(Logit_B);
RUN;

SAMPLE OUTPUT:
Obs  category     Pr_C     Pr_A     Pr_B
  1  C         0.35865  0.14557  0.49578
  2  B         0.31443   0.1523  0.53327
  3  C         0.63112  0.09269  0.27619
  4  B         0.40749  0.12573  0.46678
  5  A         0.30994   0.1454  0.54466
  6  A         0.32001  0.13846  0.54153
  7  B         0.31631  0.15339   0.5303
  8  B         0.56396  0.10476  0.33128
  9  A          0.3457  0.13923  0.51507
 10  C         0.51582  0.10483  0.37936
 11  B         0.40708  0.12802   0.4649
 12  C         0.76608  0.05566  0.17826

We can now look at the probability of group membership for each record. We can also derive other probabilities of interest, as in the following example where we decide to prioritize response propensity.
IF:
Pr_A = probability of responding to a direct mail campaign AND making a payment
Pr_B = probability of responding to a direct mail campaign AND NOT making a payment
Pr_C = probability of NOT responding to a direct mail campaign
THEN:
Pr_D = probability of responding to a direct mail campaign = Pr_A + Pr_B (equivalently, 1 - Pr_C)
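Hand-coding the logits relies on coefficients rounded to four decimals. As an alternative sketch, PROC LOGISTIC can write the predicted class probabilities itself. The OUTPUT statement's PREDPROBS=INDIVIDUAL option adds one variable per response level (IP_A, IP_B and IP_C here), along with _FROM_ and _INTO_. The output dataset name PROBS_SAS is hypothetical.

```sas
/* Refit the final model and let SAS compute the class probabilities. */
PROC LOGISTIC DATA=sample1;
   MODEL category(ref='C') = var1 var4 var6 var8 / LINK=glogit;
   OUTPUT OUT=probs_sas PREDPROBS=INDIVIDUAL;  /* adds IP_A, IP_B, IP_C */
RUN;

/* Derive the response probability from the SAS-computed values. */
DATA probs_sas;
   SET probs_sas;
   Pr_D = IP_A + IP_B;   /* P(respond) = P(A) + P(B) = 1 - P(C) */
RUN;
```

This route uses full-precision estimates and stays correct if the model is refit. The hand-coded DATA step is still useful when the scoring must run outside SAS/STAT.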
Sorting the data by descending Pr_D and then by descending Pr_A gives us:

  Obs     Pr_C     Pr_A     Pr_B     Pr_D
    1  0.24957   0.1559  0.59453  0.75043
    2  0.24957   0.1559  0.59453  0.75043
    3  0.25201   0.1577  0.59029  0.74799
    4  0.25219  0.16249  0.58532  0.74781
    5  0.25465  0.16437  0.58099  0.74535
    6  0.25581  0.16052  0.58367  0.74419
  ...      ...      ...      ...      ...
20968  0.83541  0.06025  0.10434  0.16459
20969  0.83541  0.04025  0.12434  0.16459
20970  0.85541  0.04025  0.10434  0.14459
20971  0.85767   0.0393  0.10304  0.14233
20972  0.85767   0.0393  0.10304  0.14233
20973  0.85989  0.03836  0.10175  0.14011

The top records have a high likelihood of responding to the campaign (Pr_D of about 0.75) and a relatively high likelihood of making a payment (Pr_A of about 0.16, versus about 0.04 at the bottom of the list). We would therefore target these for the next mailing. The bottom records have a high likelihood of NOT responding (Pr_C of about 0.84 to 0.86).
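The ranking step can be sketched using the twelve scored records from the sample output shown earlier, read in via DATALINES. Pr_D is computed as Pr_A + Pr_B, and PROC SORT orders the records by descending Pr_D and then descending Pr_A. On the full data the same steps would run against PROBS. The toy dataset names SCORED12 and RANKED are hypothetical.

```sas
/* The 12 scored records from the sample output: id, category, Pr_C, Pr_A, Pr_B. */
DATA scored12;
   INPUT id category $ Pr_C Pr_A Pr_B;
   Pr_D = Pr_A + Pr_B;   /* probability of responding (Outcome A or B) */
   DATALINES;
1 C 0.35865 0.14557 0.49578
2 B 0.31443 0.1523 0.53327
3 C 0.63112 0.09269 0.27619
4 B 0.40749 0.12573 0.46678
5 A 0.30994 0.1454 0.54466
6 A 0.32001 0.13846 0.54153
7 B 0.31631 0.15339 0.5303
8 B 0.56396 0.10476 0.33128
9 A 0.3457 0.13923 0.51507
10 C 0.51582 0.10483 0.37936
11 B 0.40708 0.12802 0.4649
12 C 0.76608 0.05566 0.17826
;
RUN;

/* Highest response probability first; ties broken by higher Pr_A. */
PROC SORT DATA=scored12 OUT=ranked;
   BY DESCENDING Pr_D DESCENDING Pr_A;
RUN;

PROC PRINT DATA=ranked;
RUN;
```

With these twelve rows, record 5 (Pr_D = 0.69006) is ranked first and record 12 (Pr_D = 0.23392) last.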
We can also incorporate profitability measures into our evaluation tables as follows:
– Assume the value of each outcome is as follows:
   – Value of a non-payer responder (B) = -$0.10
   – Value of a non-responder (C) = -$0.07
   – Value of a payer (A) = $12.00
– The expected value of each prospect can then be calculated as:
   Expected Value of Prospect = 12*Pr_A - 0.10*Pr_B - 0.07*Pr_C
All the records are then sorted by expected value, and the mailing is sent to prospects with high expected values. This is an enhancement to the traditional response model, which does not incorporate profitability. This method pushes prospects who have high response propensities but are unprofitable down to the lower ranks.
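Here is a minimal sketch of the expected-value ranking. It uses the six top-ranked records from the sorted output as toy input and applies the dollar values assumed above. The dataset names TOP6 and BY_VALUE are hypothetical.

```sas
/* Six top-ranked records from the sorted output: pos, Pr_C, Pr_A, Pr_B. */
DATA top6;
   INPUT pos Pr_C Pr_A Pr_B;
   /* Assumed values: payer +$12.00, non-payer responder -$0.10, */
   /* non-responder -$0.07.                                       */
   exp_value = 12*Pr_A - 0.10*Pr_B - 0.07*Pr_C;
   DATALINES;
1 0.24957 0.1559 0.59453
2 0.24957 0.1559 0.59453
3 0.25201 0.1577 0.59029
4 0.25219 0.16249 0.58532
5 0.25465 0.16437 0.58099
6 0.25581 0.16052 0.58367
;
RUN;

/* Re-rank by expected value instead of response propensity. */
PROC SORT DATA=top6 OUT=by_value;
   BY DESCENDING exp_value;
RUN;

PROC PRINT DATA=by_value;
RUN;
```

Sorting on expected value moves record 5 (12(0.16437) - 0.10(0.58099) - 0.07(0.25465), about $1.897) ahead of records 1 and 2 (about $1.794 each). Its higher Pr_A outweighs its slightly lower Pr_D. For comparison, the last record in the sorted list (20973) is worth only about $0.390.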
Conclusion
• There are numerous situations in the analytics world in which we may need to address polytomous dependent variables. One example is deciding which product to feature most prominently on a website for each visitor, given a set of possible products on promotion.
• Generalized Logit Regression provides an easy-to-implement solution for calculating comparable scores in such situations. The approach is very versatile but is currently quite underutilized in data science. It makes a great and novel addition to the data scientist's toolkit.