Logistic regression for an ordered dependent variable with more than 2 levels
Presentation Transcript

  • Multinomial Logistic Regression Models
    January 1, 2013 © Arup Guha - Indian Institute of Foreign Trade - New Delhi, India
  • Logistic regression can handle dependent variables with more than two categories
    - It is important to note whether the response variable is ordinal (ordered categories such as young, middle-aged, old) or nominal (unordered categories such as red, blue, black)
    - Some multinomial logistic models are appropriate only for ordered responses
    - It is not mathematically necessary to consider the natural ordering when modeling an ordinal response, but considering it
        - leads to a more parsimonious model
        - increases the power to detect relationships with other variables
  • Applying logistic regression while respecting the natural order is done with a modeling technique called the “Proportional Odds Model”
    - Say the dependent variable Y has 4 states measuring the impact of radiation on the human body: fine, sick, serious, dead
    - Let p1 = prob. of fine, p2 = prob. of sick, p3 = prob. of serious, p4 = prob. of dead
    - Let us define a baseline category: fine, since this is the normal state (we shall see why we need this later)
  • What if we break up the modeling of the 4-level ordered dependent variable into 3 binary logistic situations: 1 - (fine, sick), 2 - (fine, serious), 3 - (fine, dead)?
    - Then we would have 3 logit equations:
        log(p2/p1) = B11 + B12 X1 + B13 X2
        log(p3/p1) = B21 + B22 X1 + B23 X2
        log(p4/p1) = B31 + B32 X1 + B33 X2
    - X1 and X2 are the 2 binary dummies that code the degree of radiation, which has 3 levels
    - So, 9 parameters have to be estimated
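As a rough illustration of the three baseline-category equations above, the sketch below turns the logits log(pj/p1) back into the four probabilities. All coefficient values and the dummy coding are made up for illustration; they are not estimates from the radiation data.

```python
import math

def baseline_category_probs(logits):
    """Convert baseline-category logits log(pj/p1), j = 2..J, into [p1, ..., pJ]."""
    # The baseline category itself has logit log(p1/p1) = 0, i.e. odds 1.
    odds = [1.0] + [math.exp(l) for l in logits]
    total = sum(odds)
    return [o / total for o in odds]

# Hypothetical coefficient rows (Bj1, Bj2, Bj3) for the three logit equations.
coefs = [(-0.5, 0.8, 1.5), (-1.5, 1.0, 2.0), (-3.0, 1.2, 2.8)]
x1, x2 = 1, 0  # assumed dummy coding for one non-reference radiation level

logits = [b1 + b2 * x1 + b3 * x2 for (b1, b2, b3) in coefs]
probs = baseline_category_probs(logits)
n_params = sum(len(row) for row in coefs)  # 3 equations x 3 coefficients

print("p(fine), p(sick), p(serious), p(dead):", [round(p, 4) for p in probs])
print("parameters to estimate:", n_params)
```

Each equation carries its own intercept and its own pair of slopes, which is where the count of 9 comes from.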
  • Now consider an alternative model for the same situation
    - Cumulative logit model:
        L1 = log(p1 / (p2 + p3 + p4))
        L2 = log((p1 + p2) / (p3 + p4))
        L3 = log((p1 + p2 + p3) / p4)
    - The obvious way to introduce covariates is
        L1 = B11 + B12 X1 + B13 X2
        L2 = B21 + B22 X1 + B23 X2
        L3 = B31 + B32 X1 + B33 X2
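The cumulative logits defined above can be computed directly from a set of category probabilities. The sketch below uses made-up probabilities for fine, sick, serious and dead, purely to show how L1, L2 and L3 are formed.

```python
import math

def cumulative_logits(p):
    """Return [L1, ..., L_{J-1}] where Lj = log(P(Y <= j) / P(Y > j))."""
    logits = []
    cum = 0.0
    for pj in p[:-1]:          # the last category has no logit of its own
        cum += pj              # running total p1 + ... + pj
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

p = [0.4, 0.3, 0.2, 0.1]       # hypothetical p1..p4, must sum to 1
L = cumulative_logits(p)
print([round(l, 4) for l in L])
```

Because the cumulative probability can only grow from one category to the next, the logits always come out in increasing order.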
  • Let us simplify the model by specifying that the slope parameters are identical across the logit equations. Then,
        L1 = A1 + B1 X1 + B2 X2
        L2 = A2 + B1 X1 + B2 X2
        L3 = A3 + B1 X1 + B2 X2
    - This is the proportional odds cumulative logit model
    - It has 5 parameters (3 intercepts and 2 shared slopes) instead of 9
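One way to see what the shared slopes imply is to evaluate the three logits at several covariate settings and check that the gaps between them never change. This is a minimal sketch; the intercepts and slopes are hypothetical values, not fitted ones.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def cumulative_probs(intercepts, slopes, x):
    """P(Y <= j | x) for j = 1..J-1 under the proportional odds model."""
    eta = sum(b * xi for b, xi in zip(slopes, x))  # same linear term in every equation
    return [expit(a + eta) for a in intercepts]

A = [-1.0, 0.5, 2.0]   # hypothetical intercepts, A1 < A2 < A3
B = [-0.7, -1.6]       # hypothetical shared slopes B1, B2

gaps_by_x = []
for x in [(0, 0), (1, 0), (0, 1)]:   # the three dummy-coded radiation levels
    L = [logit(p) for p in cumulative_probs(A, B, x)]
    gaps_by_x.append((L[1] - L[0], L[2] - L[1]))

n_params = len(A) + len(B)
print(gaps_by_x, n_params)
```

The gaps equal A2 - A1 and A3 - A2 at every covariate setting, so the three logit curves are parallel.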
  • Suppose that the categorical outcome is actually a categorized version of an unobservable (latent) continuous variable Z which has a logistic distribution
    - The continuous scale is divided into five regions by four cut-points c1, c2, c3, c4, which are determined by nature
    - If Z ≤ c1 we observe Y = 1; if c1 < Z ≤ c2 we observe Y = 2; and so on
    - Suppose that Z is related to the X’s through a linear regression
    - Then the coarsened categorical variable Y will be related to the X’s by a proportional-odds cumulative logit model
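The latent-variable argument can be checked numerically. The sketch below assumes Z = beta * x + e with e following a standard logistic distribution, so that P(Y ≤ j | x) = F(cj - beta * x); the cut-points and beta are invented for illustration. Under this assumption each cumulative logit is cj - beta * x: the intercept is the cut-point, and the common slope is -beta.

```python
import math

def logistic_cdf(t):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))

cuts = [-1.0, 0.0, 1.0, 2.5]   # hypothetical cut-points c1 < c2 < c3 < c4
beta = 0.8                     # hypothetical slope of the latent regression

def cumulative_logit(j, x):
    """Log-odds of Y <= j+1 given x, derived from the latent variable Z."""
    p = logistic_cdf(cuts[j] - beta * x)   # P(Z <= c_j | x)
    return math.log(p / (1.0 - p))

for x in (0.0, 1.0, 2.0):
    print(x, [round(cumulative_logit(j, x), 6) for j in range(len(cuts))])
```

Note the sign flip: with the logits written as "at or below category j", a positive beta in the latent regression shows up as a negative slope.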
  • Let us go back to the model
        L1 = A1 + B1 X1 + B2 X2
        L2 = A2 + B1 X1 + B2 X2
        L3 = A3 + B1 X1 + B2 X2
    - Note that Lj is the log-odds of falling into or below category j versus falling above it
    - Aj is the log-odds of falling into or below category j when X1 = X2 = 0
    - Bk is the increase in the log-odds of falling into or below any category associated with a one-unit increase in Xk, holding all the other X-variables constant
    - Therefore, a positive slope indicates a tendency for the response level to decrease as the variable increases
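A small numerical check of this interpretation, under assumed values: with a single covariate and a hypothetical positive slope B1, the cumulative odds at every cut-point are multiplied by exp(B1) for a one-unit increase in X1, and the average response level goes down.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

A = [-1.0, 0.5, 2.0]   # hypothetical intercepts A1 < A2 < A3
B1 = 0.9               # hypothetical positive slope

def category_probs(x):
    """[P(Y=1), ..., P(Y=4)] from the cumulative probabilities P(Y <= j)."""
    cum = [expit(a + B1 * x) for a in A] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def mean_level(x):
    """Average category number, treating the levels as scores 1..4."""
    return sum((j + 1) * p for j, p in enumerate(category_probs(x)))

def cumulative_odds(j, x):
    """Odds of falling into or below category j+1."""
    return math.exp(A[j] + B1 * x)

ratios = [cumulative_odds(j, 1) / cumulative_odds(j, 0) for j in range(len(A))]
print([round(r, 4) for r in ratios], round(math.exp(B1), 4))
print(round(mean_level(0), 4), round(mean_level(1), 4))
```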
  • Our example of 4 levels of radiation impact corresponding to 3 levels of radiation:
        proc logistic data=radiation_impact;
          freq count;
          class radiation / order=data param=ref ref=first;
          model sickness (order=data descending) = radiation
                / link=logit aggregate=(radiation) scale=none;
        run;
  • freq count
      - This is important for specifying grouped data
      - count is the variable that contains the frequency of occurrence of each observation
      - In its absence, each row would be treated as a single observation
    class radiation
      - Specifies that radiation is a classification variable to be used in the analysis
      - With the param=ref option, SAS automatically generates n-1 binary dummies for the n categories of radiation
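To see why the frequency variable matters for grouped data, the sketch below compares a count-weighted log-likelihood with the log-likelihood of the same data expanded to one row per subject. The grouped rows and the probabilities are invented; this illustrates the idea only and is not how SAS computes anything internally.

```python
import math

# Hypothetical grouped data: (radiation level, sickness level, count).
grouped = [("low", "fine", 3), ("low", "sick", 1),
           ("high", "sick", 2), ("high", "dead", 2)]

# Hypothetical model probabilities P(sickness | radiation).
probs = {
    "low":  {"fine": 0.6, "sick": 0.2, "serious": 0.1, "dead": 0.1},
    "high": {"fine": 0.1, "sick": 0.3, "serious": 0.2, "dead": 0.4},
}

# Count-weighted log-likelihood: each row contributes count * log(p).
weighted = sum(n * math.log(probs[r][s]) for r, s, n in grouped)

# Same data with every subject on its own row.
expanded = [(r, s) for r, s, n in grouped for _ in range(n)]
unweighted = sum(math.log(probs[r][s]) for r, s in expanded)

# What happens if the counts are ignored: each grouped row counts once.
ignoring_counts = sum(math.log(probs[r][s]) for r, s, _ in grouped)

print(round(weighted, 6), round(unweighted, 6), round(ignoring_counts, 6))
```

The weighted and expanded versions agree, while ignoring the counts treats 8 subjects as if they were 4.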
  • order=data
      - Simply tells SAS to arrange the categories in the order in which they occur in the input data (1, 2, 3, 4)
    param=ref
      - This implies that dummy (reference) coding is used for the classification variable ‘radiation’ listed in the class statement
    ref=first
      - Designates the first ordered level of the class variable, here the first level of radiation, as the reference level
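The reference coding described above can be sketched in a few lines. The function name and the radiation labels below are hypothetical, and the code only mimics the coding scheme (levels ordered by first appearance, first level as reference); it makes no claim about SAS internals.

```python
def reference_dummies(values):
    """Reference (dummy) coding with levels ordered by first appearance
    and the first level used as the reference category."""
    levels = []
    for v in values:
        if v not in levels:
            levels.append(v)            # data order, as with order=data
    kept = levels[1:]                   # drop the first level (reference)
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

radiation = ["none", "low", "high", "low", "none"]   # hypothetical labels
kept, rows = reference_dummies(radiation)
for value, row in zip(radiation, rows):
    print(value, row)
```

Three radiation levels give two dummy columns, and the reference level is the row of all zeros.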
  • order=data descending
    - This tells SAS to reverse the order of the logits
    - So, instead of the cumulative logit model being
        L1 = log(p1 / (p2 + p3 + p4))
        L2 = log((p1 + p2) / (p3 + p4))
        L3 = log((p1 + p2 + p3) / p4)
      it becomes
        L1 = log(p4 / (p1 + p2 + p3))
        L2 = log((p4 + p3) / (p1 + p2))
        L3 = log((p4 + p3 + p2) / p1)
    - Now, a positive B1 indicates that a higher value of X1 leads to a greater chance of radiation sickness
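The effect of reversing the order can be verified with a few lines: the descending logits are the ascending ones with the sign flipped and the order reversed. The probabilities below are hypothetical.

```python
import math

def cumulative_logits(p):
    """[L1, ..., L_{J-1}] with Lj = log(P(first j categories) / P(the rest))."""
    logits = []
    cum = 0.0
    for pj in p[:-1]:
        cum += pj
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

p = [0.4, 0.3, 0.2, 0.1]                 # hypothetical p1..p4

ascending = cumulative_logits(p)          # log(p1/(p2+p3+p4)), ...
descending = cumulative_logits(p[::-1])   # log(p4/(p1+p2+p3)), ...

print([round(l, 4) for l in ascending])
print([round(l, 4) for l in descending])
```

This sign flip is why the slope changes direction under the descending option.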
  • link=logit
      - Fits the cumulative logit model when there are more than two response categories
    aggregate=(radiation)
      - Indicates that the goodness-of-fit statistics are to be calculated on the subpopulations defined by the variable radiation
    scale=none
      - No correction is needed for the dispersion parameter
      - To understand this, read up on overdispersion: it arises when the goodness-of-fit statistic exceeds its degrees of freedom and needs to be corrected for
  • When we fit this model, the first output we see is:

        Score Test for the Proportional Odds Assumption
        Chi-Square    DF    Pr > ChiSq
           17.2866    21        0.6936

    - The null hypothesis is that the current proportional-odds cumulative logit model is true
    - With p = 0.6936 we fail to reject the null, so we can proceed to the rest of the output under the current assumption
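The p-value in the score test table can be reproduced from the chi-square statistic and its degrees of freedom. The helper below is a hand-rolled upper-tail function for integer degrees of freedom using standard finite-series formulas; it is a sketch for checking the arithmetic, not the routine SAS uses.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: two-sided normal tail plus a finite series.
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2.0))                    # 2 * P(Z > chi)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)   # normal pdf at chi
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)   # chi^(2r-1) / (1*3*...*(2r-1))
        total += term
    return tail + 2.0 * density * total

p_value = chi2_sf(17.2866, 21)
print(round(p_value, 4))
```

A statistic below its degrees of freedom, as here, gives a p-value well above conventional significance levels.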
  • Ultimately we are interested in the predicted probabilities
        OUTPUT <OUT=SAS-data-set> <options>
    - PREDICTED=
        - For a cumulative model, this is the predicted cumulative probability (that is, the probability that the response variable is less than or equal to the value of _LEVEL_)
    - PREDPROBS=I or C
        - INDIVIDUAL | I requests the predicted probability of each response level
        - CUMULATIVE | C requests the cumulative predicted probability of each response level
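The two kinds of predicted probability are linked by simple differencing: each individual probability is the difference between successive cumulative probabilities. The sketch below uses invented cumulative values and assumes they are listed in the order in which the response levels are accumulated.

```python
def individual_from_cumulative(cum):
    """cum = [P(Y<=1), ..., P(Y<=J-1)]; returns [P(Y=1), ..., P(Y=J)]."""
    full = list(cum) + [1.0]               # the last cumulative probability is 1
    return [full[0]] + [full[j] - full[j - 1] for j in range(1, len(full))]

cumulative = [0.55, 0.80, 0.95]            # hypothetical cumulative predictions
individual = individual_from_cumulative(cumulative)
print([round(p, 4) for p in individual])
```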