Quantitative Methods for Lawyers - Class #20 - Regression Analysis - Part 3

- 1. Quantitative Methods for Lawyers, Class #20: Regression Analysis, Part 3. Professor Daniel Martin Katz | @computational | computationallegalstudies.com | danielmartinkatz.com | lexpredict.com | slideshare.net/DanielKatz
- 2. Multiple Regression
- 3. Just a Reminder...
- 4. Keep This Visual Image in Your Mind
- 5. Estimate a lawyer's rate: the Real Rate Report™ regression model, from the CT TyMetrix/Corporate Executive Board 2012 Real Rate Report© (n = 15,353 lawyers). [Figure: the rate is built up from a base plus one term per factor. Base: $151; Law Firm Size: +$15 per 100 lawyers; Tier 1 Market: +$161; Partner Status: +$95; Experience: +$34 per 10 years; Practice Area: +$99 (Finance) or -$15 (Litigation).] Source: 2012 Real Rate Report™.
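- 6. \( Y = \beta_0 \pm \beta_1 X_1 \pm \beta_2 X_2 \pm \beta_3 X_3 \pm \beta_4 X_4 \pm \beta_5 X_5 + \varepsilon \)
  \( Y = \$151 + \$15\,X_1 + \$161\,X_2 + \$95\,X_3 + \$34\,X_4 \pm \beta_5 X_5 + \varepsilon \)
  where \(X_1\) = law firm size (per 100 lawyers), \(X_2\) = 1 if Tier 1 market, \(X_3\) = 1 if partner, \(X_4\) = experience (per 10 years), and \(X_5\) = practice area.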
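The following is a minimal Python sketch of the arithmetic. It plugs a lawyer's characteristics into the coefficients above. The function name, the argument names, and the example lawyer are hypothetical. The sketch also assumes that practice areas other than finance and litigation add nothing, because the slide lists only those two adjustments.

```python
# Sketch: plug a lawyer's characteristics into the 2012 Real Rate Report
# coefficients from the slide. Names are hypothetical; practice areas other
# than finance and litigation are ASSUMED to add $0.

PRACTICE_AREA_ADJUSTMENT = {"finance": 99.0, "litigation": -15.0}


def estimated_rate(firm_size_lawyers: float, tier1_market: bool,
                   partner: bool, years_experience: float,
                   practice_area: str) -> float:
    """Return the model's estimated rate in dollars."""
    rate = 151.0                                # base (intercept)
    rate += 15.0 * firm_size_lawyers / 100.0    # +$15 per 100 lawyers
    rate += 161.0 if tier1_market else 0.0      # Tier 1 market indicator
    rate += 95.0 if partner else 0.0            # partner indicator
    rate += 34.0 * years_experience / 10.0      # +$34 per 10 years
    rate += PRACTICE_AREA_ADJUSTMENT.get(practice_area.lower(), 0.0)
    return rate


if __name__ == "__main__":
    # Partner with 20 years' experience in a 500-lawyer firm's finance
    # practice, located in a Tier 1 market.
    print(estimated_rate(500, True, True, 20, "finance"))
```

For the example lawyer the function returns 649.0, which is $151 + $75 + $161 + $95 + $68 + $99.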
- 7. From The Last Time...
- 8. Now Let's Consider the More Complex Case: the relationship between SAT score and expenditures, plus a variety of other variables. Y is our dependent variable; the Xs are our predictors (independent variables). Multiple regression.
- 9. \( Y = \beta_0 + \beta_1 X_1 - \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon \)
  \( \text{csat} = 851.56 + 0.003\,\text{expense} - 2.62\,\text{percent} + 0.11\,\text{income} + 1.63\,\text{high} + 2.03\,\text{college} + \varepsilon \)
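How does the software arrive at coefficients like these? The sketch below assumes plain ordinary least squares. It solves the normal equations (X'X) b = X'y by Gaussian elimination. It is standard-library Python for illustration only; the deck itself uses R and Stata. The data in the usage example is made up, so the true coefficients are known in advance.

```python
# Minimal OLS sketch: solve the normal equations (X'X) b = X'y with
# Gaussian elimination. Standard-library Python, for illustration only.


def ols(X, y):
    """X: list of rows of predictor values (no intercept column).
    y: list of outcomes. Returns [b0, b1, ..., bk], b0 = intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * float(yi) for r, yi in zip(rows, y))]
         for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        p = max(range(col, k), key=lambda row: abs(a[row][col]))
        if abs(a[p][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        a[col], a[p] = a[p], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        rest = sum(a[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (a[i][k] - rest) / a[i][i]
    return b


if __name__ == "__main__":
    # Made-up data built as y = 2 + 1.5*x1 - 0.5*x2 exactly
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    y = [2 + 1.5 * x1 - 0.5 * x2 for x1, x2 in X]
    print([round(coef, 6) for coef in ols(X, y)])
```

The example data was constructed as exactly 2 + 1.5·x1 - 0.5·x2, so the solver should return coefficients very close to [2, 1.5, -0.5]. If one predictor is an exact multiple of another, X'X is singular and the function raises an error. That is the extreme case of multicollinearity.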
- 10. Let's Consider Our “Beta Coefficients”: Are They Statistically Significant? Look at the p-value on “expense”: it is no longer statistically significant.
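- 11. Two Ways to Think About Significance: Is the p-value below .05? Equivalently, in large samples, is the absolute t-statistic above 1.96? If yes, the coefficient is significant at the .05 level.
  Significant at the .05 level? expense: no; percent: yes; income: no; high: no; college: no; intercept: yes.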
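The two checks can be sketched in a few lines. This sketch relies on the large-sample (standard normal) approximation. R and Stata actually use the t distribution, with n - k - 1 degrees of freedom (n observations, k predictors). In small samples such as the 50 states, that cutoff is somewhat above 1.96. The coefficient and standard error in the example are hypothetical and are not taken from the slide's output.

```python
# Sketch of the slide's two equivalent checks, using the large-sample
# (standard normal) approximation. ASSUMPTION: R and Stata actually use the
# t distribution with n - k - 1 degrees of freedom, whose .05 cutoff is
# somewhat above 1.96 in small samples such as 50 states.
import math


def t_stat(coef, std_err):
    """t-statistic for H0: coefficient = 0."""
    return coef / std_err


def two_sided_p(t):
    """P(|Z| >= |t|) for a standard normal Z."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def significant_at_05(t):
    # p < .05 and |t| > 1.96 give the same answer (up to rounding of 1.96)
    return two_sided_p(t) < 0.05


if __name__ == "__main__":
    t = t_stat(0.5, 0.2)   # hypothetical coefficient and standard error
    print(round(t, 2), round(two_sided_p(t), 4), significant_at_05(t))
```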
- 12. Using Our Model to Predict
- 13. Using Our Model to Predict: What if we had a hypothetical state with the following factors?
  • Per-pupil expenditures, primary & secondary (expense): $6,000
  • % of HS graduates taking the SAT (percent): 20
  • Median household income (income, measured in thousands of dollars): 33
  • % of adults with a HS diploma (high): 70
  • % of adults with a college degree (college): 15
  • Region: South
  Please predict the mean composite SAT score for this hypothetical state. Here is our model, where West, NorthEast, and South are 0/1 indicators for region:
  \( \text{csat} = 849.59 - 0.002\,\text{expense} - 3.01\,\text{percent} - 0.17\,\text{income} + 1.81\,\text{high} + 4.67\,\text{college} - 34.57\,\text{West} + 34.87\,\text{NorthEast} - 9.18\,\text{South} + \varepsilon \)
- 14. Using Our Model to Predict: substitute the hypothetical state's values. The state is in the South, so West = 0, NorthEast = 0, and South = 1:
  \( \widehat{\text{csat}} = 849.59 - 0.002(6000) - 3.01(20) - 0.17(33) + 1.81(70) + 4.67(15) - 34.57(0) + 34.87(0) - 9.18(1) \)
- 15. Using Our Model to Predict:
  \( \widehat{\text{csat}} = 849.59 - 12 - 60.2 - 5.61 + 126.7 + 70.05 - 9.18 = 959.35 \)
  Predicted composite SAT score = 959.35
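The sketch below reproduces the hand calculation. The coefficients are copied from the slide; the dictionary and function names are our own. Each region coefficient multiplies a 0/1 indicator, so for a Southern state only the South term (-9.18) enters. A region that has no coefficient on the slide gets no adjustment.

```python
# Sketch reproducing the slide's prediction with the region model.
# Coefficients are copied from the slide; the names are our own.
COEF = {"intercept": 849.59, "expense": -0.002, "percent": -3.01,
        "income": -0.17, "high": 1.81, "college": 4.67}
REGION_EFFECT = {"West": -34.57, "NorthEast": 34.87, "South": -9.18}


def predict_csat(expense, percent, income, high, college, region):
    """Predicted mean composite SAT score; income is in $1,000s.
    A region with no dummy on the slide contributes 0 (all dummies = 0)."""
    yhat = COEF["intercept"]
    yhat += COEF["expense"] * expense
    yhat += COEF["percent"] * percent
    yhat += COEF["income"] * income
    yhat += COEF["high"] * high
    yhat += COEF["college"] * college
    yhat += REGION_EFFECT.get(region, 0.0)   # coefficient times a 0/1 dummy
    return yhat


if __name__ == "__main__":
    print(round(predict_csat(6000, 20, 33, 70, 15, "South"), 2))
```

For the hypothetical Southern state this prints 959.35, matching the hand calculation.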
- 16. Violation of Regression Assumptions
- 17. Heteroskedasticity: Regression analysis assumes that the error terms are independently, identically, and normally distributed, with a mean of zero and a constant variance (i.e., the error variance is the same at every value of the predictors). What does this mean? Errors in our estimates are centered on the true values: there is no systematic tendency to over- or underestimate the regression coefficients.
- 18. Heteroskedasticity: Heteroskedasticity does not bias the ordinary least squares (OLS) coefficient estimates. It can, however, bias the OLS estimates of the coefficients' variances (and therefore their standard errors), either above or below the true population variance. A regression on heteroskedastic data thus still gives an unbiased estimate of the relationship between the predictor and the outcome. Its standard errors, and any inferences drawn from them, are suspect. Biased standard errors lead to biased inference, so the results of hypothesis tests may be wrong.
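A small simulation can check both halves of that claim: unbiased coefficients, but unreliable standard errors. The setup is made up. It uses y = 1 + 2x + e with x = 1, ..., 20, and the error's standard deviation is 0.5x, so the spread grows with x. Across many simulated samples, the average slope should sit close to the true value of 2. In this particular setup, the conventional (constant-variance) standard error also tends to come out smaller than the actual spread of the slope estimates.

```python
# Monte Carlo sketch (all data simulated, parameters chosen arbitrarily):
# y = 1 + 2x + e, where the error SD grows with x (heteroskedastic).
import math
import random
import statistics


def fit_simple_ols(x, y):
    """Return (slope, conventional standard error of the slope)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)   # assumes constant error variance
    return slope, se


def simulate(reps=4000, seed=1):
    """Return (mean slope, actual SD of slopes, mean conventional SE)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]
    slopes, ses = [], []
    for _ in range(reps):
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
        b, se = fit_simple_ols(x, y)
        slopes.append(b)
        ses.append(se)
    return (statistics.fmean(slopes), statistics.stdev(slopes),
            statistics.fmean(ses))


if __name__ == "__main__":
    mean_slope, actual_sd, avg_conventional_se = simulate()
    print(f"average slope estimate:       {mean_slope:.3f} (true value 2)")
    print(f"actual SD of slope estimates: {actual_sd:.4f}")
    print(f"average conventional SE:      {avg_conventional_se:.4f}")
```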
- 19. Heteroskedasticity [Figure: two panels, titled "Heteroskedastic" and "Homoskedastic"]
- 20. How Do I Detect Heteroskedasticity? The visual (ocular) method is a good starting point, although you should probably also check with a more formal test. Let's start here: (1) run the regression; (2) plot the residuals against the fitted values; (3) review the resulting plot. When we plot residuals against predicted values (aka Ŷ, "Y-hat"), we should see no pattern if the residual variance is homoskedastic; a funnel or fan shape suggests heteroskedasticity.
- 21. (0) Load the Data (1) Run the Regression
- 22. (1) Run the regression; (2) plot the residuals against the fitted values; (3) review the resulting plot.
- 23. Take a Look... Here we do see the spread of the residuals widen slightly as the fitted values increase.
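The visual check can be sketched without a plotting library. The deck does this in R with a real scatter plot; this sketch uses one predictor and made-up data for brevity. It computes the fitted values and residuals. It then prints one text row per observation, ordered by fitted value, with '*' marking the residual relative to zero ('|'). A widening fan as you read down the rows is the heteroskedastic pattern.

```python
# Sketch of the visual check with a text "plot" (standard library only).
# One predictor and made-up data, for brevity.
import statistics


def fitted_and_residuals(x, y):
    """(1) Fit y = a + b*x by OLS; return fitted values and residuals."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    intercept = ybar - slope * xbar
    fitted = [intercept + slope * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


def text_residual_plot(fitted, resid, half_width=20):
    """(2) One row per observation, ordered by fitted value.
    '|' marks a residual of zero; '*' marks this observation's residual."""
    scale = max(abs(r) for r in resid) or 1.0
    rows = []
    for f, r in sorted(zip(fitted, resid)):
        offset = round(r / scale * half_width)  # -half_width..+half_width
        cells = [" "] * (2 * half_width + 1)
        cells[half_width] = "|"
        cells[half_width + offset] = "*"
        rows.append(f"{f:8.2f} " + "".join(cells))
    return rows


if __name__ == "__main__":
    # (3) Review: made-up data whose noise grows with x, so the stars
    # should drift further from '|' toward the bottom of the plot.
    x = list(range(1, 13))
    noise = [0.1, -0.2, 0.3, -0.5, 0.6, -0.8, 1.0, -1.2, 1.5, -1.7, 2.0, -2.3]
    y = [3 + 2 * xi + e for xi, e in zip(x, noise)]
    fitted, resid = fitted_and_residuals(x, y)
    print("\n".join(text_residual_plot(fitted, resid)))
```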
- 24. How Do I Detect Heteroskedasticity? A more formal approach is the Breusch-Pagan test, which tests the null hypothesis of constant variance: (1) run the regression; (2) execute the Breusch-Pagan test.
- 25. How Do I Detect Heteroskedasticity? This is a fail-to-reject situation: the test does not reject the null hypothesis of constant variance. However, it is generally considered wise to assume heteroskedasticity anyway and control for it in an appropriate manner.
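Here is a sketch of the test for the one-predictor case, in its studentized (Koenker) form. It regresses the squared OLS residuals on the predictor and computes LM = n·R². Under the null of constant variance, LM is approximately chi-square with 1 degree of freedom. With several predictors, you regress the squared residuals on all of them, and the degrees of freedom equal the number of predictors. This n·R² form is what R's lmtest::bptest() computes by default (studentized = TRUE). The function names and data below are our own.

```python
# Sketch of the studentized (Koenker) Breusch-Pagan test with ONE predictor:
# (1) fit OLS, (2) regress the squared residuals on the predictor,
# (3) LM = n * R^2, approximately chi-square(1) under H0: constant variance.
import math
import statistics


def simple_ols(x, y):
    """Return (intercept, slope) for a one-predictor OLS fit."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def r_squared(x, y):
    a, b = simple_ols(x, y)
    ybar = statistics.fmean(y)
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ssr / sst


def breusch_pagan(x, y):
    """Return (LM statistic, p-value)."""
    a, b = simple_ols(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(0.0, len(x) * r_squared(x, sq_resid))  # guard tiny negatives
    p_value = math.erfc(math.sqrt(lm / 2.0))       # chi-square(1) upper tail
    return lm, p_value


if __name__ == "__main__":
    # Made-up data whose errors grow with x (alternating sign): expect a
    # small p-value, i.e. reject constant variance.
    x = [float(i) for i in range(1, 101)]
    y = [1.0 + 2.0 * xi + (0.3 * xi if i % 2 else -0.3 * xi)
         for i, xi in enumerate(x)]
    print(breusch_pagan(x, y))
```

A p-value above .05 corresponds to the slide's "fail to reject" outcome.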
- 26. Robust Standard Errors
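- 27. Robust Standard Errors: heteroskedasticity-robust (Huber-White, or "sandwich") standard errors control for heteroskedasticity. They leave the OLS coefficients unchanged and correct only the standard errors. In R, keep the lm() fit and compute robust standard errors from it with the sandwich and lmtest packages, e.g., coeftest(fit, vcov. = vcovHC(fit, type = "HC1")), where fit is the lm() model. In Stata, add the robust option to regress. Note that rlm() (from the MASS package) is not a substitute: it fits a robust regression that downweights outliers, which changes the coefficients themselves.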
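- 28. Robust Standard Errors: Compare the two outputs. The coefficients are roughly the same, but the standard errors and t-statistics differ. (Heteroskedasticity-robust standard errors leave the OLS coefficients exactly unchanged, so coefficients that move at all point to a different estimator, such as the robust regression fitted by rlm().)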
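The sketch below shows what a robust standard error computes in the one-predictor case. It uses the Huber-White (HC0) formula Var(b1) = Σ(xi - x̄)² ei² / Sxx², where ei are the OLS residuals and Sxx = Σ(xi - x̄)². It then applies the HC1 small-sample scaling, n/(n - 2). HC1 is what Stata's robust option reports, while sandwich::vcovHC in R defaults to HC3, so reported numbers can differ by variant. The slope is the ordinary OLS estimate in every case. The function name and data are our own.

```python
# Sketch: conventional vs heteroskedasticity-robust (HC1) standard error of
# the slope in a one-predictor OLS regression. The slope itself is the
# same OLS estimate either way.
import math
import statistics


def slope_standard_errors(x, y):
    """Return (slope, conventional SE, HC1 robust SE)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional: assumes one common error variance, s^2 = SSR / (n - 2)
    conventional = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 "sandwich": lets each observation have its own error variance
    hc0 = math.sqrt(sum((xi - xbar) ** 2 * e * e
                        for xi, e in zip(x, resid))) / sxx
    hc1 = hc0 * math.sqrt(n / (n - 2))    # small-sample scaling
    return slope, conventional, hc1


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]                     # made-up data
    y = [2.9, 5.2, 6.8, 9.5, 10.1, 14.0, 13.2, 18.9]
    print(slope_standard_errors(x, y))
```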
- 29. Multicollinearity
- 30. Multicollinearity: a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data; it affects only calculations regarding individual predictors.
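Here is a made-up illustration of estimates that "change erratically in response to small changes in the data." It uses two nearly identical predictors. Nudging a single outcome value by 0.1 moves the individual slopes a lot. Their sum barely changes, and that sum is what drives predictions when the predictors move together. The data and function name are hypothetical.

```python
# Sketch (made-up data): two nearly identical predictors. Nudging ONE
# outcome value by 0.1 swings the individual slopes a lot, while their sum
# barely moves.


def two_predictor_ols(x1, x2, y):
    """OLS with an intercept and two predictors, via centered sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2        # near 0 when x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.99, 3.02, 3.98, 5.00]    # almost a copy of x1
y = [2.0, 4.0, 6.0, 8.0, 10.0]
y_nudged = [2.1] + y[1:]               # change one value by 0.1

if __name__ == "__main__":
    for label, outcome in (("original", y), ("nudged", y_nudged)):
        b0, b1, b2 = two_predictor_ols(x1, x2, outcome)
        print(f"{label}: b1={b1:.3f}  b2={b2:.3f}  b1+b2={b1 + b2:.3f}")
```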
- 31. Take a Look at the Visual [Figure, from Stata; variables shown: mean composite SAT score, per-pupil expenditures (primary & secondary), % HS graduates taking SAT, median household income ($1,000), % adults with HS diploma, % adults with college degree]
- 32. Take a Look at the Visual [Figure, from R]
- 33. Take a Look at the Visual [Figure; variables shown: mean composite SAT score, per-pupil expenditures (primary & secondary), % HS graduates taking SAT, median household income ($1,000), % adults with HS diploma, % adults with college degree]
- 34. Reference manual for the R car package (source of the vif() function): http://cran.r-project.org/web/packages/car/car.pdf
- 35. How Do I Detect Multicollinearity? (1) Run the regression; (2) obtain and then examine the variance inflation factor (VIF) for each predictor. A VIF above 10 (equivalently, a tolerance 1/VIF below 0.10) signals a problem. Here we look to be okay.
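What does a VIF measure? For predictor j, VIF = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. Equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The standard-library sketch below takes that second route. In practice you would call car::vif() in R or estat vif in Stata after regress. The function names are our own.

```python
# Sketch: variance inflation factors from the predictors' correlation matrix.
# VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation
# matrix. Standard library only; names are our own.
import math
import statistics


def corr(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        p = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[p][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        a[col], a[p] = a[p], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vifs(predictors):
    """predictors: dict mapping name -> list of values."""
    names = list(predictors)
    R = [[1.0 if i == j else corr(predictors[i], predictors[j])
          for j in names] for i in names]
    inv = invert(R)
    return {name: inv[k][k] for k, name in enumerate(names)}


def flag(vif_values, cutoff=10.0):
    """Rule of thumb from the slide: VIF > 10 (tolerance 1/VIF < 0.10)."""
    return {name: v > cutoff for name, v in vif_values.items()}


if __name__ == "__main__":
    data = {"x1": [1, 2, 3, 4, 5], "x2": [1, 2, 3, 4, 6]}   # made-up
    v = vifs(data)
    print(v, flag(v))
```

With only two predictors, both VIFs equal 1/(1 - r²), where r is their correlation. Uncorrelated predictors give VIF = 1.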
- 36. Daniel Martin Katz | @computational | computationallegalstudies.com | lexpredict.com | danielmartinkatz.com | Illinois Tech, Chicago-Kent College of Law
