Quantitative Methods for Lawyers - Class #20 - Regression Analysis - Part 3

  1. Quantitative Methods for Lawyers, Class #20: Regression Analysis, Part 3. Professor Daniel Martin Katz (computationallegalstudies.com, danielmartinkatz.com, lexpredict.com, slideshare.net/DanielKatz)
  2. Multiple Regression
  3. Just a Reminder...
  4. Keep This Visual Image in Your Mind
  5. Estimate a lawyer's rate: the Real Rate Report™ regression model, from the CT TyMetrix/Corporate Executive Board 2012 Real Rate Report© (n = 15,353 lawyers). Base: $151 • Law firm size: +$15 per 100 lawyers • Tier 1 market: +$161 • Partner status: +$95 • Experience: +$34 per 10 years • Practice area: +$99 (Finance) or -$15 (Litigation). Source: 2012 Real Rate Report™.
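  6. \( Y = \beta_0 \pm \beta_1 X_1 \pm \beta_2 X_2 \pm \beta_3 X_3 \pm \beta_4 X_4 \pm \beta_5 X_5 + \varepsilon \). With the Real Rate Report estimates (in dollars): \( Y = 151 + 15 X_1 + 161 X_2 + 95 X_3 + 34 X_4 \pm \beta_5 X_5 + \varepsilon \), where \(X_1\) = law firm size per 100 lawyers, \(X_2\) = 1 if Tier 1 market is true, \(X_3\) = 1 if partner status is true, \(X_4\) = experience per 10 years, and \(X_5\) = practice area (+$99 Finance, -$15 Litigation).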
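Plugging a lawyer's characteristics into this equation is plain arithmetic. A minimal Python sketch, assuming experience is given in years (scaled per 10 years), firm size in lawyers (scaled per 100), and that practice areas other than the two shown on the slide get no adjustment; `estimate_rate` and its arguments are hypothetical names, not part of the Real Rate Report.

```python
# Coefficients read off the 2012 Real Rate Report slide (dollars)
BASE = 151.0
PER_100_LAWYERS = 15.0
TIER_1_MARKET = 161.0
PARTNER = 95.0
PER_10_YEARS = 34.0
# Assumption: any practice area not listed here gets a 0 adjustment
PRACTICE_AREA = {"finance": 99.0, "litigation": -15.0}


def estimate_rate(firm_size: int, tier1: bool, partner: bool,
                  years_experience: float, practice_area: str = "other") -> float:
    """Hypothetical helper: point estimate of the rate (error term omitted)."""
    return (BASE
            + PER_100_LAWYERS * firm_size / 100
            + TIER_1_MARKET * tier1          # bool acts as a 0/1 dummy
            + PARTNER * partner              # bool acts as a 0/1 dummy
            + PER_10_YEARS * years_experience / 10
            + PRACTICE_AREA.get(practice_area.lower(), 0.0))


# Partner, 20 years' experience, 500-lawyer firm, Tier 1 market, finance work
print(estimate_rate(500, True, True, 20, "finance"))  # 649.0
```

Here 151 + 75 + 161 + 95 + 68 + 99 = 649.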
  7. From The Last Time...
  8. Now Let's Consider the More Complex Case: the Relationship Between SAT Score and Expenditures (and a Variety of Other Variables). Our Y: the dependent variable. Our Xs: the predictors/independent variables. This is multiple regression (one dependent variable, several predictors), often loosely called multivariate regression.
  9. \( Y = \beta_0 + \beta_1 X_1 - \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon \), estimated as \( \text{csat} = 851.56 + 0.003\,\text{expense} - 2.62\,\text{percent} + 0.11\,\text{income} + 1.63\,\text{high} + 2.03\,\text{college} + \varepsilon \)
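Coefficients like these come from ordinary least squares (OLS). As a hedged illustration, not the Stata/R workflow behind the slides, this Python sketch fits a multiple regression with NumPy's least-squares solver on simulated data. The "true" coefficients are made up so we can check that the fit recovers them, and `ols` is a hypothetical helper name.

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return OLS coefficients, prepending an intercept column to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 3))                      # three simulated predictors
true_beta = np.array([850.0, 2.0, -2.5, 1.5])    # intercept, then slopes (made up)
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=1.0, size=n)

print(np.round(ols(X, y), 2))                    # close to [850, 2, -2.5, 1.5]
```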
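  10. Let's Consider Our "Beta Coefficients": Are They Statistically Significant? Look at the p-value on "expense": it is no longer statistically significant.
  11. Two Ways to Think About Significance: a coefficient is not significant at the .05 level if its p-value is > .05, or equivalently if its |t| statistic is < 1.96 (1.96 is the large-sample cutoff; in small samples the exact t cutoff is somewhat larger). Significant at the .05 level? expense: no • percent: yes • income: no • high: no • college: no • intercept: yes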
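The two rules are the same test viewed two ways: t = coefficient / standard error, and the two-sided p-value is the probability of a |t| at least that large if the true coefficient were zero. A minimal Python sketch using the large-sample normal approximation (regression software uses a t distribution with n - k - 1 degrees of freedom instead); the coefficient and standard-error pairs below are hypothetical, not taken from the slides' output.

```python
import math


def t_stat(coef: float, se: float) -> float:
    return coef / se


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value under a standard normal approximation: P(|Z| >= |t|)."""
    return math.erfc(abs(t) / math.sqrt(2))


def significant_at_05(coef: float, se: float) -> bool:
    # Approximately the same as checking |t| > 1.96
    return two_sided_p_normal(t_stat(coef, se)) < 0.05


# Hypothetical coefficient / standard-error pairs
print(significant_at_05(-2.6, 0.25))   # |t| = 10.4 -> True
print(significant_at_05(0.11, 1.0))    # |t| = 0.11 -> False
```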
  12. Using Our Model to Predict
  13. Using Our Model to Predict: Suppose we had a hypothetical state with the following factors • Per-pupil expenditures, primary & secondary (expense): $6,000 • % of HS graduates taking the SAT (percent): 20% • Median household income in $1,000s (income): 33, i.e. $33,000 • % of adults with a HS diploma (high): 70% • % of adults with a college degree (college): 15% • Region: South. Please predict the mean composite SAT score for this hypothetical state. Here is our model: \( \text{csat} = 849.59 - 0.002\,\text{expense} - 3.01\,\text{percent} - 0.17\,\text{income} + 1.81\,\text{high} + 4.67\,\text{college} - 34.57\,\text{West} + 34.87\,\text{Northeast} - 9.18\,\text{South} + \varepsilon \), where West, Northeast, and South are dummy variables equal to 1 if the state is in that region (the fourth region, Midwest, is the omitted baseline).
  14. Using Our Model to Predict: substituting the hypothetical state's values into the model (only the South dummy equals 1): \( \text{csat} = 849.59 - 0.002(6000) - 3.01(20) - 0.17(33) + 1.81(70) + 4.67(15) - 9.18(1) \)
  15. Using Our Model to Predict: \( \text{csat} = 849.59 - 12 - 60.2 - 5.61 + 126.7 + 70.05 - 9.18 = 959.35 \). The error term drops out because its expected value is zero, so the predicted composite SAT score is 959.35.
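The same prediction as a small Python function, useful for trying other hypothetical states. This is a sketch assuming Midwest is the omitted region (so its adjustment is 0); `predict_csat` is a hypothetical name.

```python
# Coefficients from the slide's model
COEFS = {"intercept": 849.59, "expense": -0.002, "percent": -3.01,
         "income": -0.17, "high": 1.81, "college": 4.67}
# Region dummies; Midwest is assumed to be the omitted baseline
REGION = {"West": -34.57, "Northeast": 34.87, "South": -9.18, "Midwest": 0.0}


def predict_csat(expense, percent, income, high, college, region):
    """Point prediction; the error term is set to its expected value of zero."""
    return (COEFS["intercept"]
            + COEFS["expense"] * expense
            + COEFS["percent"] * percent
            + COEFS["income"] * income      # income measured in $1,000s
            + COEFS["high"] * high
            + COEFS["college"] * college
            + REGION[region])


print(round(predict_csat(6000, 20, 33, 70, 15, "South"), 2))  # 959.35
```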
  16. Violation of Regression Assumptions
  17. Heteroskedasticity: Regression analysis assumes that the error terms are independently, identically, and normally distributed, with a mean of zero and a constant variance (i.e., the variance of the errors is the same across all values of the predictors). What does this mean? If there is an error in our estimate, that estimate is still centered on the true value: there is no systematic over- or under-estimation of the regression coefficients.
  18. Heteroskedasticity: Heteroskedasticity does not cause ordinary least squares (OLS) coefficient estimates to be biased, but it can bias the OLS estimates of the variance (and thus the standard errors) of the coefficients, either above or below the true population variance. Regression analysis on heteroskedastic data therefore still gives an unbiased estimate of the relationship between the predictor variable and the outcome, but the standard errors, and any inferences drawn from them, are suspect. Biased standard errors lead to biased inference, so the results of hypothesis tests may be wrong.
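A small Monte Carlo simulation can make the "unbiased coefficients, suspect standard errors" point concrete. This Python sketch (simulated data only; the model y = 2 + 3x + error with error spread growing in x is an assumption chosen for illustration) draws many samples, fits OLS to each, and compares the average slope with the true slope of 3. It also prints the actual spread of the slopes next to the average classical standard error, which assumes constant variance and need not match.

```python
import numpy as np

rng = np.random.default_rng(0)
reps, n, true_slope = 2000, 100, 3.0

x = rng.uniform(1, 10, size=(reps, n))
# Heteroskedastic errors: the standard deviation grows with x
y = 2.0 + true_slope * x + rng.normal(0.0, x)

# OLS slope for each simulated sample, computed row by row
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
slopes = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)

# Classical (constant-variance) standard error of the slope in each sample
resid = yc - slopes[:, None] * xc
sigma2 = (resid ** 2).sum(axis=1) / (n - 2)
classical_se = np.sqrt(sigma2 / (xc ** 2).sum(axis=1))

print("mean slope:", slopes.mean())                   # close to 3.0: unbiased
print("actual sd of slopes:", slopes.std())
print("average classical SE:", classical_se.mean())   # need not equal the line above
```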
  19. Heteroskedasticity [Figure: side-by-side plots labeled "Heteroskedastic" and "Homoskedastic"]
  20. How Do I Detect Heteroskedasticity? The visual (ocular) method is a good starting point, although you should also confirm it with a more formal test. Let's start here: (1) Run the regression. (2) Plot the residuals against the fitted values. (3) Review the resulting plot: when plotting residuals against predicted values (Ŷ, "Y-hat"), we should observe no pattern if the variance of the residuals is homoskedastic.
  21. (0) Load the Data (1) Run the Regression
  22. (1) Run the Regression (2) Plot the Residuals against the Fitted Values (3) Review the Resulting Plot
  23. Take a Look... Here we do observe residuals that spread out slightly as we move along the fitted values, a hint of heteroskedasticity.
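Eyeballing the plot can be backed by a crude number. The Python sketch below is an assumption-laden stand-in for the visual check, not a formal test: it sorts residuals by fitted value and compares the residual standard deviation in the top third with the bottom third. A ratio well above 1 corresponds to the "fanning out" described above. The data are simulated and `residual_spread_ratio` is a hypothetical name.

```python
import numpy as np


def residual_spread_ratio(fitted: np.ndarray, resid: np.ndarray) -> float:
    """Residual SD in the top third of fitted values divided by the bottom third.
    Values well above 1 suggest residuals that fan out as Y-hat grows."""
    order = np.argsort(fitted)
    thirds = np.array_split(resid[order], 3)
    return thirds[-1].std() / thirds[0].std()


rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(0, x)                 # spread grows with x (simulated)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # (1) run the regression
fitted = X @ beta
resid = y - fitted                               # (2) residuals vs. fitted values
print(residual_spread_ratio(fitted, resid))      # noticeably above 1 here
```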
  24. How Do I Detect Heteroskedasticity? There is a more formal approach: the Breusch-Pagan test, which tests the null hypothesis of constant variance. (1) Run the regression. (2) Execute the Breusch-Pagan test.
  25. How Do I Detect Heteroskedasticity? This is a fail-to-reject situation: the test does not reject the null hypothesis of constant variance. However, it is generally considered wise to assume heteroskedasticity anyway and control for it in an appropriate manner.
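Under the hood, the Breusch-Pagan test regresses the squared residuals on the predictors and asks whether they explain any of the variation. A minimal Python sketch of the studentized (Koenker) variant, whose statistic is n·R² from that auxiliary regression, compared with a chi-square distribution whose degrees of freedom equal the number of predictors; to our understanding this is also the default in R's lmtest::bptest, but treat that as an assumption. The data are simulated, and the chi-square tail probability is computed with a simple series expansion so the sketch needs only NumPy.

```python
import math
import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the series for the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def breusch_pagan(X: np.ndarray, resid: np.ndarray):
    """Koenker's studentized Breusch-Pagan test; H0: constant error variance.
    X is the design matrix including an intercept column.
    Returns (LM statistic, p-value)."""
    u2 = resid ** 2
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)    # auxiliary regression
    ss_res = np.sum((u2 - X @ gamma) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    lm = len(u2) * (1.0 - ss_res / ss_tot)            # n * R^2
    return lm, chi2_sf(lm, X.shape[1] - 1)


rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 500)
y = 2 + 3 * x + rng.normal(0, x)                      # simulated, heteroskedastic
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # (1) run the regression
lm, p = breusch_pagan(X, y - X @ beta)                # (2) run the test
print(f"LM = {lm:.1f}, p = {p:.2g}")                  # small p: reject constant variance
```

On the slides' data the test failed to reject; here the simulated errors are strongly heteroskedastic, so it should reject.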
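  26. Robust Standard Errors
  27. Robust Standard Errors: Heteroskedasticity-robust (Huber-White) standard errors control for heteroskedasticity. Note that in R, swapping "lm" for "rlm" (from the MASS package) fits a robust regression (M-estimation that downweights outliers), which also shifts the coefficients, rather than computing heteroskedasticity-robust standard errors. To keep the OLS coefficients and correct only the standard errors, a common approach is lmtest::coeftest(model, vcov = sandwich::vcovHC(model, type = "HC1")).
  28. Robust Standard Errors: Compare the Two Outputs. The coefficients are roughly the same, but the standard errors and t statistics differ.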
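Heteroskedasticity-robust standard errors come from the "sandwich" formula (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹, which uses each observation's own squared residual instead of one pooled variance. A minimal Python sketch on simulated data, using the HC1 small-sample scaling n/(n - k) (the same scaling as type = "HC1" in R's sandwich::vcovHC). The point estimates are plain OLS and do not change; only the standard errors do. `ols_with_se` is a hypothetical name.

```python
import numpy as np


def ols_with_se(X: np.ndarray, y: np.ndarray):
    """OLS coefficients with classical and HC1 (heteroskedasticity-robust) SEs.
    X must already include an intercept column."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical: sigma^2 (X'X)^-1, assuming constant error variance
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n/(n-k) for HC1
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - k)
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_robust


rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 400)
y = 1 + 2 * x + rng.normal(0, 0.5 * x ** 2)   # simulated, strongly heteroskedastic
X = np.column_stack([np.ones_like(x), x])
beta, se_c, se_r = ols_with_se(X, y)
print("coefficients:", beta)                  # plain OLS estimates
print("classical SE:", se_c)
print("robust (HC1) SE:", se_r)
```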
  29. Multicollinearity
  30. Multicollinearity: a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data, and their standard errors are inflated. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data; it only affects calculations regarding individual predictors.
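To see the "individual predictors" effect, this Python sketch (simulated data; `classical_se` is a hypothetical helper) adds a second predictor that is nearly a copy of the first. The standard error on the first predictor's coefficient balloons, even though the model's overall fit barely changes.

```python
import numpy as np


def classical_se(X, y):
    """OLS coefficients and classical standard errors (X includes an intercept)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    return beta, np.sqrt(np.diag(resid @ resid / (n - k) * XtX_inv))


rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1 (simulated)
y = 1 + 2 * x1 + 2 * x2 + rng.normal(size=n)

ones = np.ones(n)
b_alone, se_alone = classical_se(np.column_stack([ones, x1]), y)
b_both, se_both = classical_se(np.column_stack([ones, x1, x2]), y)
print("SE of x1 coefficient, x1 alone:", se_alone[1])
print("SE of x1 coefficient, with x2: ", se_both[1])   # much larger
print("x1 and x2 coefficients:", b_both[1:])           # individually imprecise
```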
  31. Take a Look at the Visual [Figure (from Stata): pairwise scatterplot matrix of mean composite SAT score; per-pupil expenditures, primary & secondary; % of HS graduates taking SAT; median household income ($1,000); % of adults with HS diploma; % of adults with college degree]
  32. Take a Look at the Visual [Figure: the corresponding plot from R]
  33. Take a Look at the Visual [Figure: mean composite SAT score; per-pupil expenditures, primary & secondary; % of HS graduates taking SAT; median household income ($1,000); % of adults with HS diploma; % of adults with college degree]
  34. car package reference manual: http://cran.r-project.org/web/packages/car/car.pdf
  35. How Do I Detect Multicollinearity? (1) Run the regression. (2) Obtain and then examine the variance inflation factor ("VIF"), e.g. with vif() from the car package. As a common rule of thumb, a VIF > 10 (equivalently, a tolerance 1/VIF < 0.10) signals a problem. Here we look to be okay.
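The VIF for predictor j is 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors; it measures how much that coefficient's variance is inflated by collinearity. A minimal Python sketch on simulated data (the hypothetical `vif` helper takes predictors only, no intercept column); car::vif in R reports the same quantity for ordinary numeric predictors.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (predictors only, no intercept column):
    regress column j on the others plus an intercept; VIF_j = 1 / (1 - R_j^2)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out


rng = np.random.default_rng(6)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + rng.normal(scale=0.1, size=500)    # c is nearly collinear with a (simulated)
v = vif(np.column_stack([a, b, c]))
print(np.round(v, 1))                      # a and c exceed 10; b is near 1
print("tolerance (1/VIF):", np.round(1 / v, 3))
```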
  36. Daniel Martin Katz, Illinois Tech - Chicago-Kent College of Law (computationallegalstudies.com, lexpredict.com, danielmartinkatz.com)
