This document summarizes a class on regression analysis in R. It discusses using the Stargazer package to generate LaTeX code for regression tables. It covers interpreting regression output, including thinking about relationships between variables in a ceteris paribus manner and discussing how changes in independent variables affect the dependent variable. The document also discusses checking for non-linear relationships and adding transformed or interaction variables to improve model fit.
3. “Stargazer is a new R package that creates LaTeX code for well-formatted regression tables, with multiple models side-by-side, as well as for summary statistics tables. It can also output the content of data frames directly into LaTeX.”
If you want to go further in this area you probably need to learn some LaTeX.
LaTeX is the industry standard for typesetting technical documents.
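As a rough illustration of the workflow described in the quote, here is a minimal sketch. It assumes the stargazer package is installed and uses R's built-in mtcars data as a stand-in (not the class dataset); the model names fit1 and fit2 are made up for the example.

```r
# Two nested models on the built-in mtcars data (a stand-in, not the class data)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("stargazer", quietly = TRUE)) {
  # type = "latex" prints LaTeX source for a side-by-side regression table
  stargazer::stargazer(fit1, fit2, type = "latex",
                       title = "Regression Results")
  # Passing a data frame instead of models gives a summary statistics table
  stargazer::stargazer(mtcars, type = "latex",
                       title = "Summary Statistics")
} else {
  message("Install the package first: install.packages('stargazer')")
}
```

The printed LaTeX can then be pasted into a TeX document or a LyX LaTeX box. Setting type = "text" instead gives a quick plain-text preview in the console.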
4. First you need to install a TeX distribution:
MacTeX http://tug.org/mactex/
MiKTeX http://miktex.org/
Then it is useful to have an IDE:
LyX http://www.lyx.org/
If you do not like LyX:
http://en.wikipedia.org/wiki/Comparison_of_TeX_editors
13. (1) Open LyX http://www.lyx.org/
(2) File > New
(3) Start a LaTeX Box
(4) Copy the LaTeX Code from the R Output, Then Paste It in the Box
(5) Then Hit the Button to See the Output
22. A Quick Primer on Interpreting Regression Output
How Should We Discuss the Relationship Between Independent Variables and Dependent Variables?
We Think in a Ceteris Paribus Manner (i.e., All Other Things Being Equal)
25. This Implies We Are Interested in a Thought Experiment:
If We Were To Change Some Independent Variable by 1 Unit, What Would Be the Corresponding Effect on Y?
This Should Be Considered Both in the Case of a Regular Variable and a Dummy/Indicator Variable
26. Start with the “College” Variable, Thinking in a Ceteris Paribus Manner
3.38 is the Beta Coefficient on College
28. Y = B0 + (B1 * X1) + (B2 * X2) + (B3 * X3) + (B4 * X4) + (B5 * X5) + (B6 * X6) + (B7 * X7) + (B8 * X8) + ε
29. csat = 786.30 – 0.004*expense – 3.02*percent + 0.48*income + 2.30*high + 3.38*college + 76.84*(1 if region = 2) + 27.26*(1 if region = 3) + 34.35*(1 if region = 4) + ε
30. All Else Equal, for Each 1 Unit Change in “College” There Is a Corresponding 3.38 Unit Change in “Csat”
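The thought experiment can be checked numerically. The sketch below writes the fitted equation from the slides as an R function (the name predict_csat and the input values are hypothetical, chosen only for illustration) and changes college by 1 unit while holding everything else fixed.

```r
# Fitted equation from the slides, written as a function.
# (region == k) evaluates to TRUE/FALSE, which R treats as 1/0 in arithmetic.
predict_csat <- function(expense, percent, income, high, college, region = 1) {
  786.30 - 0.004 * expense - 3.02 * percent + 0.48 * income +
    2.30 * high + 3.38 * college +
    76.84 * (region == 2) + 27.26 * (region == 3) + 34.35 * (region == 4)
}

# A hypothetical state (illustrative values only, not real data)
base   <- predict_csat(expense = 5000, percent = 30, income = 35,
                       high = 75, college = 20)
plus_1 <- predict_csat(expense = 5000, percent = 30, income = 35,
                       high = 75, college = 21)

# Only college changed, so the difference equals its coefficient
plus_1 - base
```

Whatever values are chosen for the other variables, the difference stays at the coefficient on college, because those terms cancel when everything else is held equal.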
31. Thinking in a Ceteris Paribus Manner
csat = 786.30 – 0.004*expense – 3.02*percent + 0.48*income + 2.30*high + 3.38*college + 76.84*(1 if region = 2) + 27.26*(1 if region = 3) + 34.35*(1 if region = 4) + ε
Add 76.84 if region = 2 is True
Add 27.26 if region = 3 is True
Add 34.35 if region = 4 is True
Otherwise, if region = 1 is True, We Retain the Default Coefficient Estimates
Notice That There Are Really 4 Separate Models Here, One per Region, Differing Only in the Intercept
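To make the "4 separate models" point concrete, the sketch below shows how R codes a four-level factor as indicator columns, and then computes the intercept each region implies from the coefficients on the slide. The four-row region vector is a toy example, not the class data.

```r
# A toy factor with one observation per region
region <- factor(c(1, 2, 3, 4))

# model.matrix shows the indicator columns lm() would build;
# the first level (region 1) is the omitted baseline
model.matrix(~ region)

# Intercept implied for each region by the fitted equation on the slides
intercepts <- 786.30 + c(region1 = 0, region2 = 76.84,
                         region3 = 27.26, region4 = 34.35)
intercepts
```

The slopes on expense, percent, income, high, and college are shared across regions; only the intercept shifts.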
32. Non-Linearities and Transformations
Okay, This Is the Interpretation in the Linear Case
Sometimes Data Does Not Neatly Conform to Our Linearity Assumption
From a Model / Prediction Standpoint, Failure to Account for Non-Linearity Might Lead to Type II Error
33. Non-Linearities and Transformations
Simple Linear Model
Y = B0 + (B1 * X1) + ε
Polynomial Regression Model
Y = B0 + (B1 * X1^2) + ε
In This Case of X^2, This Is a Negative Quadratic Function
“Lin-Log” Model
Y = B0 + (B1 * ln X1) + ε
Dependent Variable is Linear, 1 or More Independent Variables is Logged
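The three functional forms map directly onto R's formula syntax. The following is a minimal sketch on simulated data (x and y are generated here purely for illustration, with a lin-log relationship built in), not a result from the class dataset.

```r
set.seed(1)
x <- runif(100, 1, 50)
y <- 20 + 8 * log(x) + rnorm(100)   # simulated: the true relationship is lin-log

linear    <- lm(y ~ x)        # Y = B0 + B1 * X1
quadratic <- lm(y ~ I(x^2))   # Y = B0 + B1 * X1^2; I() keeps ^ as arithmetic
lin_log   <- lm(y ~ log(x))   # Y = B0 + B1 * ln X1

# Compare how well each form fits the same data
sapply(list(linear = linear, quadratic = quadratic, lin_log = lin_log),
       function(m) summary(m)$r.squared)
```

Because the data were simulated from a log relationship, the lin-log form should fit best here; with real data the comparison is what tells you which form is appropriate.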
34. How Do We Determine that a
Transformation is Appropriate?
These Are the Variables From Our Model
35. csat: Mean composite SAT score
expense: Per pupil expenditures prim&sec
percent: % HS graduates taking SAT
income: Median household income, $1,000
high: % adults HS diploma
college: % adults college degree
36. Plot the Relationship Between X & Y and Observe the Relationship
Let's Look at “Csat” and “Percent”
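A quick way to do this in base R is a scatterplot with a straight-line fit and a smoother overlaid. The sketch below uses simulated values for percent and csat (51 made-up observations with curvature built in), since the class dataset is not included in these notes.

```r
set.seed(42)
# Simulated stand-ins for the two variables (not the real data)
percent <- runif(51, 4, 81)
csat    <- 1100 - 6 * percent + 0.05 * percent^2 + rnorm(51, sd = 20)

plot(percent, csat,
     xlab = "% HS graduates taking SAT",
     ylab = "Mean composite SAT score")
abline(lm(csat ~ percent), lty = 2)      # dashed: straight-line fit
lines(lowess(percent, csat), lwd = 2)    # solid: lowess smoother
```

If the smoother bends away from the dashed line in a systematic way, a transformation is worth considering.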
37. Relationship Looks Non-Linear: “Curvilinear” (aka Curve + Line)
38. It Appears That a Polynomial (Quadratic) Relationship Probably Exists; Thus, It Makes Sense to Add a Squared Version of “Percent”
[Figure: augmented component-plus-residual plot; y-axis: augmented component plus residual, x-axis: % HS graduates taking SAT]
The command acprplot (augmented component-plus-residual plot) provides a graphical way to examine linearity. Run this command after running a regression:
regress csat percent
This is a Stata command. There is an alternative in R.
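The slides do not say which R alternative they have in mind. One option, sketched below on simulated data, is to build a component-plus-residual plot from the partial residuals that base R's residuals() returns for an lm fit. This is the plain (non-augmented) version of the plot, so it is an analogue of acprplot rather than an exact equivalent.

```r
set.seed(42)
# Simulated stand-ins for the two variables (not the real data)
percent <- runif(51, 4, 81)
csat    <- 1100 - 6 * percent + 0.05 * percent^2 + rnorm(51, sd = 20)

fit <- lm(csat ~ percent)

# Partial residuals: the fitted component for 'percent' plus the residual
pr <- residuals(fit, type = "partial")[, "percent"]

plot(percent, pr,
     xlab = "% HS graduates taking SAT",
     ylab = "Component plus residual")
abline(lm(pr ~ percent), lty = 2)    # dashed: the linear component
lines(lowess(percent, pr), lwd = 2)  # solid: smoother, reveals curvature

# If the car package is installed, car::crPlots(fit) draws a similar plot.
```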
39. How Do I Generate
a New Variable?
We Want to Generate a New Variable Called
“Percent Squared”
Here is How We Do This In R
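The code from the original slide is not reproduced in these notes. A minimal sketch of one common approach follows; the data frame name states and its values are simulated assumptions, and percent2 is a name chosen here for the new variable.

```r
set.seed(42)
# Simulated data frame standing in for the class dataset
states <- data.frame(percent = runif(51, 4, 81))
states$csat <- 1100 - 6 * states$percent + 0.05 * states$percent^2 +
  rnorm(51, sd = 20)

# Generate the new variable: percent squared
states$percent2 <- states$percent^2

# Compare the model with and without the squared term
fit_linear <- lm(csat ~ percent, data = states)
fit_quad   <- lm(csat ~ percent + percent2, data = states)

summary(fit_linear)$adj.r.squared
summary(fit_quad)$adj.r.squared
```

An equivalent shortcut is to skip the new column and write the model as lm(csat ~ percent + I(percent^2), data = states).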
42. Other Transformations
We Might Have a Variable Whose Relationship Is Non-Linear and Follows a Natural Log
Include It in the Model and Look at the Corresponding Model Fit
NOTE: YOU CAN ALSO TRANSFORM THE DEPENDENT VARIABLE
ln Y = B0 + (B1 * X1) + ε
43. How To Understand Log Transformed Regression Output
Dependent Variable is Not in Log Form, Independent Variable is in Log Form (aka Linear-Log)
“A 1 Percent Change in the Independent Variable is associated with a (0.01 * Beta) Unit Change in the Dependent Variable”
Dependent Variable is in Log Form, Independent Variable is Not in Log Form (aka Log-Linear)
“A Change in the Independent Variable by 1 Unit is associated with a (100 * Beta) Percent Change in the Dependent Variable”
Dependent Variable is in Log Form, Independent Variable is in Log Form (aka Log-Log)
“A 1 Percent Change in the Independent Variable is associated with a (Beta) Percent Change in the Dependent Variable”
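These rules of thumb are approximations that follow from the properties of the natural log. The short derivation below uses the same B0, B1 notation as the slides and holds the error term fixed; the approximations are good for small changes and, in the log-linear case, for small B1.

```latex
% Linear-log: Y = B_0 + B_1 \ln X. Raise X by 1 percent (X \to 1.01X):
\Delta Y = B_1 \ln(1.01X) - B_1 \ln X = B_1 \ln(1.01) \approx 0.01\, B_1

% Log-linear: \ln Y = B_0 + B_1 X. Raise X by 1 unit (X \to X + 1):
\frac{Y_{\text{new}}}{Y_{\text{old}}} = e^{B_1}
\quad\Rightarrow\quad
\%\Delta Y = 100\,(e^{B_1} - 1) \approx 100\, B_1 \quad \text{(small } B_1\text{)}

% Log-log: \ln Y = B_0 + B_1 \ln X. Raise X by 1 percent (X \to 1.01X):
\frac{Y_{\text{new}}}{Y_{\text{old}}} = 1.01^{B_1}
\quad\Rightarrow\quad
\%\Delta Y = 100\,(1.01^{B_1} - 1) \approx B_1
```

In the log-log case B1 is an elasticity: the percent change in Y per 1 percent change in X.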
45. Interaction Terms
Sometimes X1 Impacts Y and X2 Impacts Y, but When Both X1 and X2 Are Present There Is an Additional Impact (+ or -) Beyond Their Separate Effects
Y = B0 + (B1 * X1) + (B2 * X2) + (B3 * X1 * X2) + ε
Income = B0 + B1 * Gender + B2 * Education + B3 * Gender * Education + ε
Our Beta Three Term Gives Us the Additional Effect of Gender and Education Together
Assuming Gender is Binary in the Model, the Interaction Will Explore the Differential Effect of Education on Income by Gender
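In R, an interaction model like the income equation above can be fitted with the * operator in a formula. The sketch below uses simulated data; the variables and the effect sizes used to generate them are invented for illustration and say nothing about real incomes.

```r
set.seed(7)
n <- 200
gender    <- rbinom(n, 1, 0.5)        # hypothetical 0/1 indicator
education <- round(runif(n, 8, 20))   # hypothetical years of schooling
income    <- 10 + 2 * gender + 3 * education +
  1.5 * gender * education + rnorm(n, sd = 5)

# gender * education expands to gender + education + gender:education
fit <- lm(income ~ gender * education)
b <- coef(fit)
b

# Slope of education within each group:
slope_gender0 <- b["education"]                          # B2
slope_gender1 <- b["education"] + b["gender:education"]  # B2 + B3
```

The coefficient on gender:education is the B3 term: it measures how much the education slope differs between the two groups.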
46. A Visual Display of Interaction Terms
Image from: Thomas Brambor, William Roberts Clark & Matt Golder, Understanding Interaction Models: Improving Empirical Analyses, 14 Political Analysis 63 (2005)
47. For More on
Interaction Terms ...
Thomas Brambor, William Roberts Clark & Matt Golder, Understanding Interaction
Models: Improving Empirical Analyses, 14 Political Analysis 63 (2005)
48. Daniel Martin Katz
@computational
computationallegalstudies.com
lexpredict.com
danielmartinkatz.com
Illinois Tech - Chicago-Kent College of Law