8. Now Let's Consider the More Complex Case:
Relationship Between SAT Score and Expenditures, Plus a Variety of Other Variables
Our Y: the Dependent Variable
Our X's: the Predictors / Independent Variables
Multivariate Regression
10. Let's Consider Our “Beta Coefficients”
Are They Statistically Significant?
Look at the P Value on “Expense” - It is no longer Statistically Significant
11. Two Ways to Think About Significance:
Is the P Value > .05?
Is the absolute value of the Tstat < 1.96?
If yes, the variable is not significant at the .05 level
Variable     Significant @ .05 Level
expense      no
percent      yes
income       no
high         no
college      no
intercept    yes
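The two rules can be sketched in a few lines of Python. This is only an illustration: the t statistics are hypothetical (they are not the values from the regression output), the function names are made up for this sketch, and the p-value uses the large-sample normal approximation, which is where the 1.96 cutoff comes from.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a statistic treated as standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def significant_by_p(p_value, alpha=0.05):
    """Rule 1: significant when the p-value is below alpha."""
    return p_value < alpha

def significant_by_t(t_stat, critical=1.96):
    """Rule 2: significant when |t| exceeds the critical value."""
    return abs(t_stat) > critical

# Hypothetical t statistics for illustration only (not from the slides).
for t in (0.5, -1.5, 2.5, -3.0):
    p = two_sided_p_from_z(t)
    print(f"t = {t:5.2f}  p = {p:.4f}  "
          f"by p: {significant_by_p(p)}  by t: {significant_by_t(t)}")
```

Note the absolute value: a large negative t statistic is just as significant as a large positive one.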
13. Using Our Model to Predict
What if we had a Hypothetical State with the following factors -
• Per Pupil Expenditures Primary & Secondary (expense) - $6000
• % of HS graduates taking SAT (percent) - 20%
• Median Household Income (income) - 33.000 (in $1,000s)
• % adults with HS Diploma (high) - 70%
• % adults with College Degree (college) - 15%
• Southern State (Region=South)
Please Predict the Mean Score for this Hypothetical State.
Here is our Model:
csat = 849.59 – 0.002*expense – 3.01*percent – 0.17*income + 1.81*high + 4.67*college
– 34.57*(1 if regionWest=true) + 34.87*(1 if regionNorthEast=true) – 9.18*(1 if regionSouth=true) + ε
14. Using Our Model to Predict
Plugging in the values (only the regionSouth dummy equals 1):
csat = 849.59 – 0.002*(6000) – 3.01*(20) – 0.17*(33.000) + 1.81*(70) + 4.67*(15)
– 34.57*(0) + 34.87*(0) – 9.18*(1) + ε
15. Using Our Model to Predict
csat = 849.59 – 12 – 60.2 – 5.61 + 126.7 + 70.05 – 9.18 (setting ε to its expected value of zero)
predicted composite SAT Score = 959.35
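The same arithmetic can be checked with a short Python sketch. The coefficients are copied from the model above; the function name and dictionary layout are just illustrative choices, and treating any region other than West, NorthEast or South as the omitted baseline (contributing 0) is an assumption.

```python
# Coefficients copied from the model on the slide.
COEF = {
    "intercept": 849.59,
    "expense": -0.002,
    "percent": -3.01,
    "income": -0.17,   # income is measured in $1,000s
    "high": 1.81,
    "college": 4.67,
}

# Region dummies from the slide; any other region is treated as the
# omitted baseline category (an assumption) and contributes 0.
REGION_EFFECT = {"West": -34.57, "NorthEast": 34.87, "South": -9.18}

def predict_csat(expense, percent, income, high, college, region):
    """Point prediction: the error term is set to its expected value of zero."""
    return (COEF["intercept"]
            + COEF["expense"] * expense
            + COEF["percent"] * percent
            + COEF["income"] * income
            + COEF["high"] * high
            + COEF["college"] * college
            + REGION_EFFECT.get(region, 0.0))

prediction = predict_csat(expense=6000, percent=20, income=33.0,
                          high=70, college=15, region="South")
print(round(prediction, 2))
```

Rounded to two decimals, this reproduces the 959.35 worked out by hand.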
17. Heteroskedasticity
Regression Analysis assumes that error terms are independently,
identically and normally distributed
Assumes that error terms have a mean of zero and a constant variance
(i.e. the variance is the same across all subsets of the observations)
What does this Mean?
If there is an error in our estimate - that estimate is still centered
around the true parameter value
No Systematic Error in over/under estimating the regression
coefficients
18. Heteroskedasticity
Heteroskedasticity does not cause ordinary least squares coefficient
estimates to be biased, although it can cause ordinary least squares
estimates of the variance (and, thus, standard errors) of the coefficients to
be biased, possibly above or below the true or population variance.
Thus, regression analysis using heteroskedastic data will still provide an
unbiased estimate for the relationship between the predictor variable and
the outcome, but standard errors and therefore inferences obtained from
data analysis are suspect.
Biased standard errors lead to biased inference, so the results of hypothesis
tests may be wrong.
20. How Do I Detect Heteroskedasticity?
Visual (Ocular) Method is a good starting point (although you
should probably also check with a more formal approach)
However, let's just start here:
(1) Run the Regression
(2) Plot the Residuals against the fitted values
(3) Review the Resulting Plot -
When plotting residuals vs. predicted values (aka Yhat) we
should not observe any pattern if the residuals are
homoskedastic (i.e. have constant variance)
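The three steps above can be sketched in plain Python. Everything here is an assumption made for illustration: the data are synthetic (not the SAT data), there is a single predictor, and a text summary comparing the average residual size in the lower and upper halves of the fitted values stands in for the actual plot.

```python
def fit_simple_ols(x, y):
    """Least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Synthetic data (an assumption): the noise grows in size as x grows.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + ((-1) ** i) * 0.1 * xi for i, xi in enumerate(x, start=1)]

# (1) Run the regression.
intercept, slope = fit_simple_ols(x, y)

# (2) Compute fitted values and residuals (the two axes of the plot).
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# (3) Review: compare residual spread for low vs. high fitted values.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
spread_low = sum(abs(r) for _, r in pairs[:half]) / half
spread_high = sum(abs(r) for _, r in pairs[half:]) / (len(pairs) - half)
print(f"mean |residual|, lower half of fitted values: {spread_low:.3f}")
print(f"mean |residual|, upper half of fitted values: {spread_high:.3f}")
```

A clearly larger spread in one half is the numeric counterpart of a fan shape in the residual plot. With real data, plotting `fitted` against `residuals` is still the better first look.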
23. Take a Look ...
Here we do observe residuals that slightly expand as we move along the fitted values
24. How Do I Detect Heteroskedasticity?
There is a More Formal Approach ... the Breusch-Pagan test
Test the Null Hypothesis of Constant Variance
(1) Run the Regression
(2) Execute the Breusch-Pagan test
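To show what the test does under the hood, here is a minimal sketch for the one-predictor case. It uses the studentized (Koenker) form of the statistic, LM = n * R-squared from regressing the squared residuals on the predictor, compared against a chi-square distribution with 1 degree of freedom. The data are synthetic and the function names are made up for this sketch; statistical packages may report a slightly different variant of the statistic.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, (sxy * sxy) / (sxx * syy)

def breusch_pagan_one_predictor(x, y):
    """Studentized Breusch-Pagan test; null hypothesis = constant variance."""
    # (1) Run the regression and square the residuals.
    intercept, slope, _ = simple_ols(x, y)
    sq_resid = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    # (2) Auxiliary regression of squared residuals on the predictor.
    _, _, r2_aux = simple_ols(x, sq_resid)
    lm = len(x) * r2_aux
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Synthetic data (an assumption, not the SAT data).
x = [float(i) for i in range(1, 21)]
signs = [(-1) ** i for i in range(1, 21)]
y_growing = [2.0 + 3.0 * xi + s * 0.1 * xi for xi, s in zip(x, signs)]  # noise grows with x
y_constant = [2.0 + 3.0 * xi + s * 1.0 for xi, s in zip(x, signs)]      # noise of fixed size

lm_growing, p_growing = breusch_pagan_one_predictor(x, y_growing)
lm_constant, p_constant = breusch_pagan_one_predictor(x, y_constant)
print(f"growing noise:  LM = {lm_growing:.2f}, p = {p_growing:.4f}")
print(f"constant noise: LM = {lm_constant:.2f}, p = {p_constant:.4f}")
```

A small p-value rejects the null hypothesis of constant variance; a large one is a fail to reject.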
25. How Do I Detect Heteroskedasticity?
This is a Fail to Reject Situation: the test does not reject the Null Hypothesis of Constant Variance
However, it is generally considered wise to assume Heteroskedasticity and control for it in an appropriate manner
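One common way to control for it is to report heteroskedasticity-robust standard errors. The slides do not say which method is intended, so treat this as one possible reading. The sketch below compares the classical standard error of the slope with the White (HC0) robust version for a one-predictor regression on synthetic data; the function name is made up for this sketch.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, classical_se, robust_hc0_se) for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical formula: assumes one common error variance for all observations.
    sigma2 = sum(r * r for r in resid) / (n - 2)
    classical_se = math.sqrt(sigma2 / sxx)

    # White (HC0) formula: each observation keeps its own squared residual.
    meat = sum(((xi - x_bar) ** 2) * r * r for xi, r in zip(x, resid))
    robust_se = math.sqrt(meat / (sxx ** 2))
    return slope, classical_se, robust_se

# Synthetic data (an assumption): the noise grows in size as x grows.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + ((-1) ** i) * 0.1 * xi for i, xi in enumerate(x, start=1)]

slope, classical_se, robust_se = slope_standard_errors(x, y)
print(f"slope = {slope:.4f}")
print(f"classical SE = {classical_se:.4f}, robust (HC0) SE = {robust_se:.4f}")
```

The coefficient itself is unchanged; only the standard error, and therefore the t statistic and p-value, differs between the two formulas.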
30. Multicollinearity
A statistical phenomenon in which two or more predictor variables in
a multiple regression model are highly correlated.
In this situation the coefficient estimates may change erratically in
response to small changes in the model or the data.
Multicollinearity does not reduce the predictive power or reliability
of the model as a whole, at least within the sample data
themselves; it only affects calculations regarding individual
predictors.
31. Take a Look at the Visual
[Figure from Stata; panel labels: Mean composite SAT score, Per pupil expenditures prim&sec, % HS graduates taking SAT, Median household income ($1,000), % adults HS diploma, % adults college degree]
33. Take a Look at the Visual
[Figure; panel labels: Mean composite SAT score, Per pupil expenditures prim&sec, % HS graduates taking SAT, Median household income ($1,000), % adults HS diploma, % adults college degree]
35. How Do I Detect Multicollinearity?
(1) Run the Regression
(2) Obtain and then Examine the Variance Inflation Factor (“VIF”)
A VIF > 10 (equivalently, a 1/VIF < 0.10) is an issue
Here we look to be okay
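For intuition, the VIF of a predictor is 1 / (1 - R-squared), where the R-squared comes from regressing that predictor on the other predictors. The sketch below covers only the simplest case of two predictors, where that R-squared is just their squared correlation; with more predictors each one would be regressed on all of the others. The data and function names are hypothetical.

```python
def r_squared(x, y):
    """R-squared from regressing y on a single predictor x (squared correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors it is the same for both."""
    return 1.0 / (1.0 - r_squared(x1, x2))

# Hypothetical predictors for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]   # nearly a copy of x1
x3 = [3.0, 1.0, 2.0, 2.0, 1.0, 3.0]   # uncorrelated with x1

for name, other in (("x2", x2), ("x3", x3)):
    vif = vif_two_predictors(x1, other)
    flag = "issue" if vif > 10 else "okay"
    print(f"x1 with {name}: VIF = {vif:.2f}, 1/VIF = {1 / vif:.3f} -> {flag}")
```

A VIF near 1 means the predictor shares almost no variance with the other; the nearly duplicated predictor lands far above the cutoff of 10.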
36. Daniel Martin Katz
@computational
computationallegalstudies.com
lexpredict.com
danielmartinkatz.com
Illinois Tech - Chicago-Kent College of Law