Quantitative Methods for Lawyers - Class #22 - Regression Analysis - Part 1

  1. 1. Quantitative Methods for Lawyers, Class #18: Regression Analysis, Part 1. Professor Daniel Martin Katz. @computational, computationallegalstudies.com, danielmartinkatz.com, lexpredict.com, slideshare.net/DanielKatz
  2. 2. Regression Analysis in Legal Procurement: Here is an app that predicts the price per hour of various lawyers. Inputs: City, Firm Size, Partner, Experience. Button: Calculate. http://tymetrix.com/mobile_apps/
  3. 3. Regression Analysis in Legal Procurement: Here is an app that predicts the price per hour of various lawyers. Inputs (our independent variables, i.e. X1 ... Xn): City, Firm Size, Partner, Experience. Output (our dependent variable, i.e. Y): Expected Hourly Rate. http://tymetrix.com/mobile_apps/
  4. 4. Estimate a lawyer’s rate: Real Rate Report™ regression model, from the CT TyMetrix/Corporate Executive Board 2012 Real Rate Report© (n = 15,353 lawyers). Base: $151. Law Firm Size: + $15 per 100 lawyers. Tier 1 Market: + $161. Partner Status: + $95. Experience: + $34 per 10 years. Practice Area: +$99 (Finance) or -$15 (Litigation). Source: 2012 Real Rate Report™ 32
  5. 5. Y = β0 +/- β1 ( X1 ) +/- β2 ( X2 ) +/- β3 ( X3 ) +/- β4 ( X4 ) +/- β5 ( X5 ) + ε Y = $151 + $15 ( Per 100 Lawyers ) + $161 ( If Tier 1 Market is True ) + $95 ( If Partner Status is True ) + $34 ( Per 10 Years ) +/- β5 ( Practice Area ) + ε
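The rate equation on the slide above can be sketched in code. This is a minimal Python illustration, not the Real Rate Report's own model. The function name `estimate_hourly_rate`, its parameter names, and the treatment of any practice area other than Finance or Litigation as a zero adjustment are assumptions made here. The dollar coefficients are the ones shown on the slides. The course itself works in R; Python is used only for illustration.

```python
# Coefficients as shown on the slides (2012 Real Rate Report regression model).
BASE_RATE = 151         # base, in dollars per hour
PER_100_LAWYERS = 15    # added per 100 lawyers in the firm
TIER1_MARKET = 161      # added if the lawyer works in a Tier 1 market
PARTNER = 95            # added if the lawyer is a partner
PER_10_YEARS = 34       # added per 10 years of experience
# Only Finance and Litigation appear on the slide. Treating every other
# practice area as a zero adjustment is an assumption made for this sketch.
PRACTICE_AREA = {"finance": 99, "litigation": -15}


def estimate_hourly_rate(firm_lawyers, tier1_market, partner,
                         years_experience, practice_area):
    """Return the expected hourly rate (Y) from the five predictors (X1..X5)."""
    rate = BASE_RATE
    rate += PER_100_LAWYERS * (firm_lawyers / 100)
    rate += TIER1_MARKET if tier1_market else 0
    rate += PARTNER if partner else 0
    rate += PER_10_YEARS * (years_experience / 10)
    rate += PRACTICE_AREA.get(practice_area.lower(), 0)
    return rate


# A litigation partner with 20 years of experience at a 500-lawyer firm in a
# Tier 1 market: 151 + 75 + 161 + 95 + 68 - 15
print(estimate_hourly_rate(500, True, True, 20, "litigation"))  # 535.0
```

Each term is simply a coefficient multiplied by the value of its predictor, which is the additive structure the general equation Y = β0 + β1 ( X1 ) + ... + ε describes.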
  6. 6. Multiple Regression Example
  7. 7. Multiple Regression Analysis https://s3.amazonaws.com/KatzCloud/elemapi.dta Load This Data Set from Stata into R
  8. 8. Multiple Regression Analysis https://s3.amazonaws.com/KatzCloud/elemapi.dta Load This Data Set from Stata into R We Need to Understand these Variables:
  9. 9. Multiple Regression Analysis: Okay, Let's Get the Variable Labels from Stata into R
  10. 10. Multiple Regression Analysis. Here are the measures: academic performance of the school (api00); average class size in kindergarten through 3rd grade (acs_k3); percentage of students receiving free meals (meals), which is an indicator of poverty; percentage of teachers who have full teaching credentials (full). Regression analysis using the variables: api00 as the Y (dependent variable); acs_k3, meals, full as the X (independent variables).
  11. 11. Multiple Regression Analysis. Regression analysis using the variables: Y = β0 +/- β1 ( X1 ) +/- β2 ( X2 ) +/- β3 ( X3 ) + ε api00 = β0 - β1 ( acs_k3 ) - β2 ( meals ) + β3 ( full ) + ε
  12. 12. Multiple Regression Analysis. Regression analysis using the variables: Y = β0 +/- β1 ( X1 ) +/- β2 ( X2 ) +/- β3 ( X3 ) + ε api00 = β0 - β1 ( acs_k3 ) - β2 ( meals ) + β3 ( full ) + ε Some Hypotheses -- We might expect that better academic performance would be associated with ( - ) lower class size, ( - ) fewer students receiving free meals, ( + ) a higher percentage of teachers having full teaching credentials. The sign in parentheses is the expected sign of each coefficient.
  13. 13. api00 = β0 - β1 ( acs_k3 ) - β2 ( meals ) + β3 ( full ) + ε
  14. 14. api00 = 906.7 - 2.68 ( acs_k3 ) - 3.70 ( meals ) + .108 ( full ) + ε
  15. 15. The three predictors: are they statistically significant, and what is the direction of the relationship? The average class size (acs_k3, b=-2.68) is not significant at the 0.05 level (p=0.055), but only just so. The coefficient is negative, which would indicate that larger class size is related to lower academic performance -- which is what we would expect.
  16. 16. The effect of meals (b=-3.70, p=.000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense.
  17. 17. Finally, the percentage of teachers with full credentials (full, b=0.11, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance -- this result was somewhat unexpected.
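The fitted equation and the reading of the three predictors above can be sketched as follows. This is an illustrative Python snippet rather than the R output shown in class. The coefficients and p-values are the ones reported on the slides. The 0.05 threshold is the conventional level implied by "not significant (p=0.055)". The function names and the example school are hypothetical.

```python
# Fitted values reported on the slides for the elemapi data.
INTERCEPT = 906.7
COEFFICIENTS = {"acs_k3": -2.68, "meals": -3.70, "full": 0.108}
# p-values reported on the slides; .000 means the value rounds to zero.
P_VALUES = {"acs_k3": 0.055, "meals": 0.000, "full": 0.232}
ALPHA = 0.05  # conventional significance level (assumed, not stated on the slides)


def predict_api00(acs_k3, meals, full):
    """Predicted api00 from the fitted equation (the error term is omitted)."""
    return (INTERCEPT
            + COEFFICIENTS["acs_k3"] * acs_k3
            + COEFFICIENTS["meals"] * meals
            + COEFFICIENTS["full"] * full)


def describe(name):
    """Return (direction of the relationship, significant at ALPHA?)."""
    direction = "negative" if COEFFICIENTS[name] < 0 else "positive"
    return direction, P_VALUES[name] < ALPHA


for name in COEFFICIENTS:
    direction, significant = describe(name)
    label = "significant" if significant else "not significant"
    print(f"{name}: {direction}, {label} (p={P_VALUES[name]:.3f})")

# Hypothetical school: average K-3 class size of 20, 50% of students on free
# meals, 90% of teachers fully credentialed.
print(f"predicted api00: {predict_api00(20, 50, 90):.2f}")
```

Only meals passes the 0.05 threshold, which matches the discussion on the slides: class size narrowly misses it and full credentials is far from it.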
  18. 18. More On Regression Analysis
  19. 19. Regression Analysis: “We use regression to estimate the unknown effect of changing one variable over another. Regression requires making two assumptions: 1) there is a linear relationship between two variables (i.e. X and Y); 2) this relationship is additive (i.e. Y = X1 + X2 + ... + Xn). (Note: Additivity applies across terms, as within terms there can be a square, log, etc.) Technically, linear regression estimates how much Y changes when X changes one unit.” http://dss.princeton.edu/training/
  20. 20. Regression Analysis. Example: After controlling for other factors, are SAT scores higher in states that spend more money on education?* Outcome (Y) variable = SAT scores --> variable csat in dataset. Predictor (X) variables: • Per Pupil Expenditures, Primary & Secondary (expense) • % of HS graduates taking SAT (percent) • Median Household Income (income) • % adults with HS Diploma (high) • % adults with College Degree (college) • Region (region). *Source: search for dataset at http://www.duxbury.com/highered/ Use the file states.dta (educational data for the U.S.).
  21. 21. Getting Started: Let's Begin by Loading It and Using the Head Command https://s3.amazonaws.com/KatzCloud/states.dta
  22. 22. Getting Started Use the Summary Command For Additional Information on Each Variable
  23. 23. Getting Started: Let's Start Simple. Relationship Between SAT Score and Expenditures? We Might Hypothesize a Positive Relationship: As Expenditures Go Up, SAT Performance Also Goes Up
  24. 24. Getting Started It is Certainly NOT Definitive But a Scatterplot is a good place to start ...
  25. 25. Notice the Nature of the Relationship is not what we would naively anticipate Getting Started It is Certainly NOT Definitive But a Scatterplot is a good place to start ...
  26. 26. Getting Started It is Certainly NOT Definitive But a Scatterplot is a good place to start ... It Appears to be a Negative Relationship. Notice the Nature of the Relationship is not what we would naively anticipate
  27. 27. Bivariate Regression: We just fit the regression line to this bivariate relationship. Notice the -.02228 for expense, which is the slope of the regression line shown above.
  28. 28. Bivariate Regression Y = B0 + ( B1 * (X1) ) csat = 1060.7 - (0.022*expense) For each one-unit increase in expense (per-pupil expenditure), the predicted SAT score decreases by about 0.022 points.
  29. 29. Bivariate Regression Y = B0 + ( B1 * (X1) ) csat = 1060.7 - (0.022*expense) Look at the t statistics and p-values: with a t statistic (which is approximately Z when N > 30) greater than 1.96 in absolute value, we can reject the notion that the coefficient is equal to zero at the 5% level.
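Fitting a line to a bivariate relationship, as in the slides above, comes down to two closed-form least-squares formulas: the slope is the sum of cross-deviations divided by the sum of squared deviations in X, and the intercept is the mean of Y minus the slope times the mean of X. The Python sketch below illustrates this under stated assumptions. The function names are invented for the example, and the small data set is hypothetical (it is not states.dta). Only the equation csat = 1060.7 - (0.022*expense) comes from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data (NOT the states.dta values), constructed to lie exactly on
# the line y = 1000 - 0.02 * x so the fitted values are easy to check.
expense = [3000, 4000, 5000, 6000, 7000]
csat = [1000 - 0.02 * x for x in expense]
b0, b1 = fit_line(expense, csat)
print(f"intercept = {b0:.1f}, slope = {b1:.3f}")


def predict_csat(expense_value, intercept=1060.7, slope=-0.022):
    """Predicted csat from the equation reported on the slides."""
    return intercept + slope * expense_value


# Interpreting the slope: 1,000 more units of expense lowers the prediction
# by about 22 points (1,000 * 0.022).
print(f"{predict_csat(5000):.1f} vs {predict_csat(6000):.1f}")
```

The negative slope describes an association in this bivariate fit only; as the example slide notes, the question of interest is what happens after controlling for other factors.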
  30. 30. A Brief Word about Standard Errors: Notice that the 95% confidence interval is approximately the beta coefficient plus or minus two times the standard error. The standard error of the estimate tells us the accuracy to expect from our prediction -- the standard error of a correlation coefficient is used to determine the confidence intervals around a true correlation of zero. Look at the standard error and you can obtain the 95% confidence interval: 1060.732 + 2(32.7) = ~1126.4; 1060.732 - 2(32.7) = ~995.0. (With a multiplier of exactly 2 the bounds are 1126.1 and 995.3; the figures shown correspond to a multiplier slightly above 2.)
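The rule of thumb above, together with the t statistic check from the preceding slide, can be reproduced with a few lines of arithmetic. This Python sketch uses the intercept (1060.732) and standard error (32.7) quoted on the slide. The function names and the default multiplier of 2 are choices made for this illustration. A multiplier of exactly 2 gives bounds of about 995.3 and 1126.1, slightly narrower than the rounded figures on the slide.

```python
def t_statistic(coefficient, standard_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


def approx_ci95(coefficient, standard_error, multiplier=2.0):
    """Approximate 95% confidence interval: coefficient +/- multiplier * SE."""
    margin = multiplier * standard_error
    return coefficient - margin, coefficient + margin


intercept, se = 1060.732, 32.7  # values quoted on the slide

t = t_statistic(intercept, se)
low, high = approx_ci95(intercept, se)

print(f"t = {t:.2f}")  # far above the 1.96 cutoff
print(f"approximate 95% CI: ({low:.1f}, {high:.1f})")
print("reject zero" if abs(t) > 1.96 else "cannot reject zero")
```

The two checks agree, as they should: a coefficient whose t statistic exceeds roughly 2 in absolute value has an approximate 95% interval that excludes zero.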
  31. 31. Daniel Martin Katz, Illinois Tech - Chicago-Kent College of Law. @computational, computationallegalstudies.com, lexpredict.com, danielmartinkatz.com
