Quantitative Methods for Lawyers - Class #22 - Regression Analysis - Part 5


  1. Quantitative Methods for Lawyers, Class #22: Regression Analysis, Part 5. professor daniel martin katz @computational computationallegalstudies.com danielmartinkatz.com lexpredict.com slideshare.net/DanielKatz
  2. Interaction Terms
  3. Interaction Terms: Sometimes X1 impacts Y and X2 impacts Y, but when both X1 and X2 are present there is an additional impact (+ or -) beyond either alone: Y = B0 + (B1 * X1) + (B2 * X2) + (B3 * X1 * X2) + ε. For example: Income = B0 + B1 * Gender + B2 * Education + B3 * Gender * Education + ε. Our Beta Three term gives us the effect of Gender and Education together. Assuming Gender is binary in the model, the interaction explores the differential effect of Education on Income by Gender. A sketch of fitting such a model in R follows below.
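A minimal sketch in R of fitting such an interaction model; the data frame and its columns (income, gender, education) are simulated here purely for illustration and are not from the slides:

    # Simulated data: income, a binary gender indicator, years of education
    set.seed(42)
    n <- 500
    gender    <- rbinom(n, 1, 0.5)
    education <- rnorm(n, mean = 14, sd = 2)
    income    <- 20 + 5 * gender + 3 * education +
                 2 * gender * education + rnorm(n, sd = 5)
    df <- data.frame(income, gender, education)

    # gender * education expands to gender + education + gender:education,
    # so B3 is reported as the gender:education coefficient
    model <- lm(income ~ gender * education, data = df)
    summary(model)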
  4. A Visual Display of Interaction Terms. Image from Thomas Brambor, William Roberts Clark & Matt Golder, Understanding Interaction Models: Improving Empirical Analyses, 14 Political Analysis 63 (2006).
  5. Limited Dependent Variables (LPM and Logit)
  6. Limited Dependent Variables (LPM and Logit): Sometimes the dependent variable of interest is limited. Start with the simplest case: binary (0, 1). Lots of good examples: voting (Republican/Democrat), trial outcomes (Guilty/Not Guilty), votes by judges/justices (i.e., Affirm/Reverse), hiring (Hired/Not Hired), admissions (Admitted/Not Admitted), etc.
  7. Limited Dependent Variables (LPM and Logit): The Linear Probability Model (a form of binomial regression). The observed variable for each observation takes the value 0 or 1. The probability of observing a 0 or 1 in any one case is treated as depending on one or more explanatory variables. For the linear probability model, this relationship is a particularly simple one and allows the model to be fitted by simple linear regression.
  8. Linear Probability Model: The LPM's fitted values represent the probability that Yi = 1 for the given values Xi. If Yi = 0, then 0 = α + β1X1i + β2X2i + εi; if Yi = 1, then 1 = α + β1X1i + β2X2i + εi. The error terms, however, are not normally distributed: because Y can take only the values 0 and 1, the error term can take only two values for any given Xi, so the errors are binomially distributed, with a variance p(1 - p) that changes with the fitted probability. Problems with the LPM: 1) heteroskedasticity; 2) the difficulty of interpreting fitted probabilities > 1 and < 0.
  9. Linear Probability Model, an example: Spector and Mazzeo examined the effect of a teaching method known as PSI on the performance of students in an intermediate macroeconomics course. The question was whether students exposed to the method scored higher on exams in the class. They collected data from students in two classes, one in which PSI was used and another in which a traditional teaching method was employed. For each of 32 students, they gathered the following data: GRADE, coded 1 if the final grade was an A, 0 if the final grade was a B or C; 11 sample members (34.38%) got As and are coded 1. GPA, grade point average before taking the class; observed values range from a low of 2.06 to a high of 4.0 with mean 3.12. TUCE, the score on an exam given at the beginning of the term to test entering knowledge of the material; in the sample, TUCE ranges from a low of 12 to a high of 29 with a mean of 21.94. PSI, a dummy variable indicating the teaching method used (1 = PSI used, 0 = other method); 14 of the 32 sample members (43.75%) are in PSI. Example from http://www.nd.edu/~rwilliam/stats2/
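A minimal sketch of this LPM fit in R, assuming a data frame named spector with lower-case columns grade, gpa, tuce, and psi coded as above (the object and column names are assumptions, not from the slides):

    # Linear probability model: OLS on a 0/1 outcome
    lpm <- lm(grade ~ gpa + tuce + psi, data = spector)
    summary(lpm)

    # Fitted values are read as P(grade = 1 | gpa, tuce, psi)
    head(fitted(lpm))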
  10. Linear Probability Model
  11. Linear Probability Model. [Plot: residuals vs. fitted values.] Heteroskedasticity: the fitted vs. residuals plot should look like a random scatterplot, and here it clearly does not. Errors are not normally distributed: OLS assumes that, for each set of values for the k independent variables, the residuals are normally distributed. This is equivalent to saying that, for any given value of yhat, the residuals should be normally distributed. This assumption is also clearly violated.
  12. Linear Probability Model. Linearity: the predicted values also suggest that there may be problems with the plausibility of the model and/or its coefficient estimates. Probabilities can only range between 0 and 1. However, in OLS, there is no constraint that the yhat estimates fall in the 0-1 range; indeed, yhat is free to vary between negative infinity and positive infinity. [Plot: grade and fitted values against gpa, with some fitted values falling below 0.]
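A short sketch of these diagnostics in R, assuming the lpm object fitted above:

    # Residuals vs. fitted: the two parallel bands betray the binary outcome
    plot(fitted(lpm), resid(lpm),
         xlab = "Fitted values", ylab = "Residuals")

    # Fitted "probabilities" are not constrained to the 0-1 range
    range(fitted(lpm))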
  13. Good News: We Have a Potential Solution ... Logistic Regression
  14. Linear Probability Model vs. Logistic Regression
  15. Logistic Regression: Logistic regression is used for predicting the outcome of a binary dependent variable (a variable which can take only two possible outcomes, e.g. "yes" vs. "no" or "success" vs. "failure") based on one or more predictor variables. Logistic regression attempts to model the probability of a "yes/success" outcome using a linear function of the predictors.
  16. Logistic regression is an approach to prediction, like Ordinary Least Squares (OLS) regression. However, with logistic regression, the researcher is predicting a dichotomous outcome. This situation violates the OLS assumption that the error variances (residuals) are normally distributed. In logistic regression, a formula is required to convert back and forth between the logistic equation and the OLS-type equation. The logistic formulas are stated in terms of the probability that Y = 1, which is referred to as p hat; the probability that Y is 0 is 1 - p hat. http://www.upa.pdx.edu/IOA/newsom/da2/
  17. Logistic Regression: ln(p / (1 - p)) = B0 + B1X. The ln symbol refers to the natural logarithm, and B0 + B1X is our familiar equation for the regression line. p can be computed from the regression equation as well: p = exp(B0 + B1X) / (1 + exp(B0 + B1X)). So, if we know the regression equation, we could, theoretically, calculate the expected probability that Y = 1 for a given value of X. exp is the exponential function, sometimes written as e, so the second form is just the same thing with exp replaced by e. NOTE: e here is not the residual. http://www.upa.pdx.edu/IOA/newsom/da2/
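A quick sketch of this back-and-forth in R; the intercept and slope values are made up for illustration:

    b0 <- -1.5   # hypothetical intercept
    b1 <-  0.8   # hypothetical slope
    x  <-  2

    log_odds <- b0 + b1 * x                   # ln(p / (1 - p))
    p <- exp(log_odds) / (1 + exp(log_odds))  # back to a probability
    p
    plogis(log_odds)                          # the same conversion, built in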
  18. Logistic Regression: Because of these complicated algebraic translations, our regression coefficients are not as easy to interpret. Our old maxim that b represents "the change in Y with a one unit change in X" is no longer applicable. Instead, we have to translate using the exponential function. And, as it turns out, when we do that we get a type of "coefficient" that is pretty useful. This coefficient is called the ODDS RATIO. http://www.upa.pdx.edu/IOA/newsom/da2/
  19. Logistic Regression in R. http://www.ats.ucla.edu/stat/r/dae/logit.htm
  20. Logistic Regression in R: For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002. For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804. The indicator variables for rank have a slightly different interpretation: for example, having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 1, decreases the log odds of admission by 0.675. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
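A minimal sketch of the model behind those numbers, following the UCLA graduate admissions example (whether the data URL still resolves is an assumption; it is the one the tutorial used):

    # Graduate admissions data from the UCLA tutorial
    mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
    mydata$rank <- factor(mydata$rank)   # rank 1-4 as a categorical predictor

    mylogit <- glm(admit ~ gre + gpa + rank,
                   data = mydata, family = binomial)
    summary(mylogit)   # coefficients are on the log-odds scale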
  21. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm What is an Odds Ratio? EXAMPLE: If the probability of success of some event is .8, then the probability of failure is 1 - .8 = .2. The odds of success are defined as the ratio of the probability of success over the probability of failure. Thus, the odds of success are .8/.2 = 4. In other words, the odds of success are 4 to 1.
  22. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm What is an Odds Ratio? EXAMPLE: If the probability of success of some event is .5, then the probability of failure is 1 - .5 = .5. The odds of success are defined as the ratio of the probability of success over the probability of failure. Thus, the odds of success are .5/.5 = 1. In other words, the odds of success are 1 to 1.
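The two worked examples, checked in R (the odds() helper is just for illustration, not from the slides):

    odds <- function(p) p / (1 - p)   # hypothetical helper
    odds(0.8)   # 4: odds of success are 4 to 1
    odds(0.5)   # 1: odds of success are 1 to 1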
  23. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm What is an Odds Ratio? The transformation from probability to odds is a monotonic transformation, meaning the odds increase as the probability increases, and vice versa. Probability ranges from 0 to 1; odds range from 0 to positive infinity. [The slide shows a table of the transformation from probability to odds.]
  24. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm What is an Odds Ratio? [The slide shows a plot of odds against probability for the range of p less than or equal to .9.]
  25. http://www.ats.ucla.edu/stat/stata/dae/logit.htm What is an Odds Ratio? The transformation from odds to log of odds is the log transformation. Again, this is a monotonic transformation: the greater the odds, the greater the log of odds, and vice versa. This table shows the relationship among probability, odds, and log of odds; a sketch that rebuilds it follows below.
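A sketch in R that rebuilds both of these tables (probability to odds, and odds to log odds) for a few values:

    p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
    odds <- p / (1 - p)
    data.frame(p, odds, log_odds = log(odds))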
  26. http://www.ats.ucla.edu/stat/stata/dae/logit.htm What is an Odds Ratio? [The slide shows a plot of the relationship between log odds and odds.]
  27. http://www.ats.ucla.edu/stat/stata/dae/logit.htm What is an Odds Ratio? Why do we take all the trouble of transforming from probability to log odds? One reason is that it is usually difficult to model a variable which has a restricted range, such as probability. The transformation is an attempt to get around the restricted range problem: it maps probability, ranging between 0 and 1, to log odds, ranging from negative infinity to positive infinity. Another reason is that, among all of the infinitely many choices of transformation, the log of odds is one of the easiest to understand and interpret. The definition of an odds ratio tells us that for every unit increase in a given Xi, the odds of Y increase by a factor equal to the exponentiated coefficient on Xi.
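A quick illustration in R of that unbounded mapping, using the built-in logistic functions:

    p <- c(0.001, 0.5, 0.999)
    qlogis(p)            # log odds: large negative through large positive
    plogis(qlogis(p))    # maps back to the original probabilities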
  28. What is an Odds Ratio? Here is a simple example: the data set has 200 observations, and the outcome variable used will be hon, indicating whether a student is in an honors class or not. So our p = prob(hon = 1). Imagine we fit a model with no predictors and obtained a logit coefficient (the intercept) of -1.12546.
  29. Imagine 49 of the 200 folks were in honors, so p = 49/200 = .245. The odds are .245 / (1 - .245) = .3245, and the log of the odds (logit) is log(.3245) = -1.12546. In other words, the intercept from the model with no predictor variables is the estimated log odds of being in the honors class for the whole population of interest. We can also transform the log of the odds back to a probability: p = exp(-1.12546) / (1 + exp(-1.12546)) = .245, if we like.
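The arithmetic on this slide, verified in R:

    p <- 49 / 200                   # 0.245
    odds <- p / (1 - p)             # 0.3245
    logit <- log(odds)              # -1.12546
    exp(logit) / (1 + exp(logit))   # back to 0.245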
  30. Exponentiating the Coefficients / Interpreting Them as Odds Ratios
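A sketch of that exponentiation step in R, reusing the mylogit admissions model fitted above (the confint() call follows the UCLA tutorial and uses profiled log-likelihoods):

    # Odds ratios: exponentiated coefficients
    exp(coef(mylogit))

    # Odds ratios with 95% confidence intervals
    exp(cbind(OR = coef(mylogit), confint(mylogit)))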
  31. For a more technical treatment, see econ.la.psu.edu/~hbierens/ML_LOGIT.PDF
  32. Daniel Martin Katz @computational computationallegalstudies.com lexpredict.com danielmartinkatz.com illinois tech - chicago kent college of law
