In this presentation, we analyze multicollinearity and inference with interaction terms in regression analysis. We also cover partial correlation and procedures for interpreting statistical data.
1. Regression Analysis
Dr. S. Shivendu
2. Objectives
01 Analyze multicollinearity and inference with interaction terms in regression analysis.
02 Analyze partial correlation and procedures for interpreting statistical data.
03 Conduct appropriate model selection based on statistical data.
3. Agenda
Regression Analysis
Regression Diagnostics and Advanced Regression topics: Multicollinearity, Interaction, Partial Regression (concepts)
Model Selection: concepts and decision-making
Working with data: SAS procedures
4. Multiple Regression Analysis
Method for studying the relationship between a dependent
variable and two or more independent variables.
Purposes: prediction, explanation, theory building.
5. Assumptions
Independence
The scores of any subject are
independent of the scores of
all other subjects
Homoscedasticity
In the population, the variance of the dependent variable Y is equal at each level of the X variables.
Normality
In the population, the scores
on the dependent variable are
normally distributed
Linearity
The relation between the
dependent and independent
variables is linear when all the
others are held constant.
6. Regression: Simple vs. Multiple
Simple Regression
One dependent variable Y predicted from one independent variable X
One regression coefficient
r2: proportion of variation in the dependent variable Y predictable from X
Multiple Regression
One dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk)
One regression coefficient for each independent variable
R2: proportion of variation in the dependent variable Y predictable from the set of independent variables (X's)
7. Differences: R vs. R2
Multiple Correlation Coefficient (R)
R = the magnitude of the relationship between the dependent variable and the best linear combination of the predictor variables
Coefficient of Multiple Determination (R2)
R2 = the proportion of variation in Y accounted for by the set of independent variables (X's)
8. Self Concept and Academic Achievement (N = 103)
9. The Model
Y' = a + b1X1 + b2X2 + … + bkXk
The b's are called partial regression coefficients
Our example, predicting AA:
Y' = 36.83 + (3.52)XASC + (-.44)XGSC
Predicted AA for a person with GSC of 4 and ASC of 6:
Y' = 36.83 + (3.52)(6) + (-.44)(4) = 56.23
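A minimal SAS sketch of estimating this model, assuming a dataset named study with variables AA, ASC, and GSC (hypothetical names; the slides do not show the code):

proc reg data=study;
   model AA = ASC GSC;         * partial regression coefficients b for ASC and GSC;
   output out=preds p=AA_hat;  * save the predicted value Y' for each person;
run;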
10. Variation: How Much?
Total variation in Y = predictable variation (by the combination of independent variables) + unpredictable variation
11. Proportion of Predictable and Unpredictable Variation
Where: Y = AA, X1 = ASC, X2 = GSC
R2 = predictable (explained) variation in Y
(1 - R2) = unpredictable (unexplained) variation in Y
[Venn diagram of shared variance among Y, X1, and X2]
12. Various Significance Tests
Testing R2
Test R2 through an F test
Test of competing models (difference between R2)
through an F test of difference of R2s
Testing b
Test each partial regression coefficient (b) by t-tests
Comparison of partial regression coefficients with each other: t-test of the difference between standardized partial regression coefficients (β's)
13. Testing R2
Example
What proportion of variation in AA can be
predicted from GSC and ASC?
Compute R2: R2 = .16 (R = .41) : 16%
of the variance in AA can be
accounted for by the composite of
GSC and ASC
Is R2 significantly different from 0?
F test: Fobserved = 9.52, Fcrit(.05; 2, 100) = 3.09
Reject H0: in the population there is
a significant relationship between AA
and the linear composite of GSC and
ASC
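The observed F can be reproduced from R2 itself. With n = 103 cases and k = 2 predictors:

F = (R2 / k) / ((1 - R2) / (n - k - 1)) = (.16 / 2) / (.84 / 100) = 9.52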
14. Comparing Models - Testing R2
Example
Comparing models
Model 1: Y' = 35.37 + (3.38)XASC
Model 2: Y' = 36.83 + (3.52)XASC + (-.44)XGSC
Compute R2 for each model: Model 1: R2 = r2 = .160; Model 2: R2 = .161
Test the difference between R2s: Fobs = .119, Fcrit(.05; 1, 100) = 3.94
Conclude that GSC does not add significantly to ASC in predicting AA
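The same result follows from the incremental-F formula, with q = 1 variable added and n - k - 1 = 100:

F = ((.161 - .160) / 1) / ((1 - .161) / 100) = .119

In SAS, a TEST statement on the full model gives an equivalent test (the dataset name study is an assumption):

proc reg data=study;
   model AA = ASC GSC;
   test GSC = 0;   * F test that GSC adds nothing beyond ASC;
run;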
15. Residual Analysis
The residual for observation i, ei, is the difference between its observed and
predicted value
Check the assumptions of regression by examining the residuals
Examine for linearity assumption
Evaluate independence assumption
Evaluate normal distribution assumption
Examine for constant variance for all levels of X (homoscedasticity)
ei = Yi - Ŷi
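A sketch of obtaining and plotting the residuals in SAS, again assuming the hypothetical study dataset:

proc reg data=study;
   model AA = ASC GSC;
   output out=resids p=pred r=resid;  * predicted values and residuals e_i;
run;

proc sgplot data=resids;
   scatter x=pred y=resid;   * look for curvature, trends, or funnel shapes;
   refline 0 / axis=y;
run;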
16. Residual Analysis for Linearity
[Paired plots of Y vs. x and residuals vs. x: a curved residual pattern indicates a non-linear relation; a patternless band around zero indicates linearity]
17. Residual Analysis for Independence
[Plots of residuals vs. X: a systematic pattern indicates non-independence; a random scatter indicates independence]
18. Check for Normality
Examine the Stem-and-Leaf Display of the Residuals
Examine the Boxplot of the Residuals
Examine the Histogram of the Residuals
Construct a Normal Probability Plot of the Residuals
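A minimal sketch of these checks in SAS, assuming the resids output dataset from the residual-analysis sketch above:

proc univariate data=resids normal plot;   * NORMAL adds normality tests; PLOT adds stem-and-leaf and boxplot;
   var resid;
   histogram resid / normal;                 * histogram with a normal curve overlay;
   qqplot resid / normal(mu=est sigma=est);  * normal probability plot;
run;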
19. Residual Analysis for Normality
When using a normal probability plot (percent vs. residual), normal errors will display approximately in a straight line.
[Normal probability plot: percent (0-100) against residual (-3 to 3)]
20. Residual Analysis for Equal Variance
[Plots of residuals vs. x: a funnel shape (spread changing with x) indicates non-constant variance; an even band indicates constant variance]
21. Multicollinearity
Collinear = highly correlated
Multicollinearity = inclusion of highly correlated
independent variables in a single regression model
High correlation of X variables causes problems for
estimation of slopes (b’s)
The denominators in the slope-estimate formulas approach zero, so coefficients may be wrong or implausibly large.
22. Multicollinearity Symptoms
Unusually large standard errors and betas
Betas often exceed 1.0
Two variables have the same large effect when included separately (i.e., when both collinear variables aren't in the model together)
When both are put in together, the effects of the two variables shrink, or one remains positive and the other flips sign
23. Multicollinearity
• What does multicollinearity do to models?
• Note: It does not violate regression assumptions
• But it can mess things up anyway
• Multicollinearity can inflate standard error estimates
• Large standard errors = small t-values = no rejected null hypotheses
• Note: Only collinear variables are affected. The rest of the model results are
OK.
• It leads to instability of coefficient estimates
• Variable coefficients may fluctuate wildly when a collinear variable is added
• These fluctuations may not be “real”, but may just reflect amplification of
“noise” and “error”
• One variable may only be slightly better at predicting Y… but SPSS will
give it a MUCH higher coefficient.
24. Multicollinearity
Look at correlations of all
independent vars
Correlation >.8 is a
concern
Watch out for the “symptoms”
Problems aren't always bivariate… and don't show up in bivariate correlations
Compute diagnostic statistics
VIF (Variance Inflation
Factor).
25. Multicollinearity
If you have 3 independent variables X1, X2, X3, and a variable (X1) is highly correlated with all the others (X2, X3), then they will do a good job of predicting it in a regression.
Tolerance is based on doing that regression: X1 is dependent; X2 and X3 are independent. Tolerance for X1 is simply 1 minus the regression R-square.
The regression R-square will be high… so 1 minus R-square will be low… indicating a problem.
26. Multicollinearity
Variance Inflation Factor (VIF) is the reciprocal of
tolerance: 1/tolerance
High VIF indicates multicollinearity
Gives an indication of how much the Standard Error of a
variable grows due to the presence of other variables.
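In SAS, tolerance and VIF are available as MODEL-statement options in PROC REG (a sketch with the hypothetical study dataset; a common rule of thumb flags VIF above 10, i.e., tolerance below .10):

proc reg data=study;
   model AA = ASC GSC / tol vif;   * prints tolerance and VIF for each predictor;
run;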
27. Multicollinearity
Solutions to multicollinearity can be difficult if a fully specified model requires several collinear variables. Options:
Drop unnecessary variables
If two collinear variables are really measuring the same thing, drop one or make an index
Use advanced techniques such as ridge regression, which uses a more efficient estimator (but not BLUE; it may introduce bias)
28. Dummy Variables
How can we incorporate nominal variables (e.g., race,
gender) into regression?
Option 1: Analyze each sub-group separately. Generates different
slopes, constant for each group
Option 2: Dummy variables, a dichotomous variable coded to indicate
the presence or absence of something. Absence coded as zero, presence
coded as 1.
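A minimal SAS sketch of Option 2, assuming a hypothetical dataset gss with a character variable sex coded 'F'/'M':

data gss2;
   set gss;
   dfemale = (sex = 'F');   * logical comparison returns 1 (present) or 0 (absent);
run;

proc reg data=gss2;
   model happy = income dfemale;   * the dummy shifts the constant, not the slope;
run;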
29. Dummy Variables: Interpretation
[Scatter plot of HAPPY (0-10) vs. INCOME (0-100000); women = blue, men = red; one line per group plus the overall slope for all data points]
Note: The lines for men and women have the same slope, but one is higher and the other is lower. The constant differs!
If women = 1, men = 0: the constant (a) reflects men only. The dummy coefficient (b) reflects an increase for women (relative to men).
30. Dummy Variables
Dummy coefficients shouldn’t be called slopes
Referring to the “slope” of gender doesn’t make sense
Rather, it is the difference in the constant (or “level”)
The contrast is always with the nominal category that was left out of the equation
If DFEMALE is included, the contrast is with males
If DBLACK, DOTHER are included, coefficients reflect difference in constant compared to whites.
31. Interaction Terms
Example: income and happiness. What if you suspect that a variable has a totally different slope for two different sub-groups in your data?
Perhaps men are more materialistic: an extra dollar increases their happiness a lot
If women are less materialistic, each dollar has a smaller effect on happiness (compared to men)
The issue isn't that men are "more" or "less" happy than women; rather, the slope of a variable (income) differs across groups
32. Interaction Terms
[Scatter plot of HAPPY (0-10) vs. INCOME (0-100000); women = blue, men = red; group lines plus the overall slope for all data points]
Note: Here, the slope for men and women differs.
The effect of income on happiness (X1 on Y) varies with gender (X2). This is called an "interaction effect".
33. Interaction Terms
Examples of interaction:
Effect of education on income may interact with type of school attended (public vs.
private)
Private schooling has bigger effect on income
Effect of aspirations on educational attainment interacts with poverty
Aspirations matter less if you don’t have money to pay for college
Question: Can you think of examples of two variables that might interact?
From your final project? Or anything else?
34. Interaction Terms
Interaction effects: Differences in the relationship (slope) between two
variables for each category of a third variable
Option #1: Analyze each group separately
Look for different sized slope in each group
Option #2: Multiply the two variables of interest: (DFEMALE, INCOME)
to create a new variable
Called: DFEMALE*INCOME
Add that variable to the multiple regression model.
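PROC REG does not accept product terms in the MODEL statement, so the interaction is built in a DATA step first (a sketch using the hypothetical gss2 dataset from the dummy-variable example above):

data gss3;
   set gss2;
   dfem_inc = dfemale * income;   * the DFEMALE*INCOME interaction term;
run;

proc reg data=gss3;
   model happy = income dfemale dfem_inc;   * components plus interaction;
run;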
35. Interaction Terms
Consider the following regression equation:
Yi = a + b1(INCOME)i + b2(DFEM*INC)i + ei
Question: What if the case is male?
Answer: DFEMALE is 0, so b2(DFEM*INC) drops out of the
equation
Result: Males are modeled using the ordinary regression
equation: a + b1X + e.
36. Interaction Terms
Consider the same regression equation:
Yi = a + b1(INCOME)i + b2(DFEM*INC)i + ei
Question: What if the case is female?
Answer: DFEMALE is 1, so b2(DFEM*INC) becomes b2*INCOME, which is added to b1
Result: Females are modeled using a different regression line: a + (b1+b2) X + e
Thus, the coefficient b2 reflects the difference in the slope of INCOME for women.
37. Interpreting Interaction Terms
• Interpreting interaction terms:
• A positive b for DFEMALE*INCOME indicates the slope for income is
higher for women vs. men
• A negative effect indicates the slope is lower
• Size of coefficient indicates actual difference in slope
• Example: DFEMALE*INCOME. Observed b’s:
• Income: b = .5
• DFEMALE * INCOME: b = -.2
• Interpretation: Slope is .5 for men, .3 for women.
38. Interaction Terms
Two continuous variables can also interact
Example: Effect of education and income on happiness
Perhaps highly educated people are less materialistic
As education increases, the slope between income and happiness would decrease
Simply multiply Education and Income to create the interaction term “EDUCATION*INCOME”
And add it to the model.
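The continuous-by-continuous case is built the same way (a sketch; educ and income are hypothetical variable names):

data gss4;
   set gss;
   educ_inc = educ * income;   * the EDUCATION*INCOME interaction term;
run;

proc reg data=gss4;
   model happy = educ income educ_inc;
run;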
39. Interpreting Interaction Terms
How do you interpret continuous variable interactions?
Example: EDUCATION*INCOME: Coefficient = 2.0
Answer: For each unit change in education, the slope of income vs. happiness increases by 2
Note: coefficient is symmetrical: For each unit change in income, education slope
increases by 2
Dummy interactions effectively estimate 2 slopes: one for each group
Continuous interactions result in many slopes: each value of education*income yields a different slope.
40. Dummy Interactions
It is also possible to construct interaction terms based on two dummy variables
Instead of a “slope” interaction, dummy interactions show difference in constants
Constant (not slope) differs across values of a third variable
Example: Effect of race on school success varies by gender
African Americans do less well in school; but the difference is much larger for black
males.
41. Interaction Terms
If you make an interaction, you should also include the component variables in the model:
A model with “DFEMALE * INCOME” should also include DFEMALE and INCOME
There are rare exceptions. But when in doubt, include them
Sometimes interaction terms are highly correlated with their components
That can cause problems (multicollinearity – which we’ll discuss next week).
Make sure you have enough cases in each group for your interaction terms
Interaction terms involve estimating slopes based on sub-groups in your data (e.g., black
females).
If there are hardly any black females in the dataset, you can have problems.
42. Partial Correlation
A partial correlation measures the relationship between two variables (X and Y) while
eliminating the influence of a third variable (Z).
Partial correlations are used to reveal the real, underlying relationship between two
variables when researchers suspect that the apparent relation may be distorted by a
third variable.
43. Partial Correlation
For example, there probably is no underlying relationship between weight and mathematics skill for
elementary school children.
However, both variables are positively related to age: Older children weigh more and, because they have
spent more years in school, have higher mathematics skills.
44. Partial Correlation
As a result, weight and mathematics skill will show a positive correlation for a sample of
children that includes several different ages.
A partial correlation between weight and mathematics skill, holding age constant, would eliminate the influence of age and show the true correlation, which is near zero.
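In SAS, PROC CORR computes this directly with a PARTIAL statement (a sketch, assuming a hypothetical dataset kids with variables weight, mathskill, and age):

proc corr data=kids;
   var weight mathskill;   * the correlation of interest;
   partial age;            * holds age constant;
run;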
45. Properties of Partial Correlation
The partial correlation r(yx1.x2):
Falls between -1 and +1
The larger the absolute value, the stronger the association, controlling for the other variables
Does not depend on units of measurement
Has the same sign as the corresponding partial slope in the prediction equation
Can be regarded as approximating the ordinary correlation between Y and X1 at a fixed value of X2
Equals the ordinary correlation found for data points in the corresponding partial regression plot
The squared partial correlation has a proportional reduction in error (PRE) interpretation for predicting Y using that predictor, controlling for the other explanatory variables in the model
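For reference, in the three-variable case the partial correlation can be computed from the ordinary correlations by the standard identity:

r(yx1.x2) = (r(yx1) - r(yx2) * r(x1x2)) / sqrt[(1 - r(yx2)^2) * (1 - r(x1x2)^2)]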
46. Model Selection: Variable Selection Procedures
Stepwise Regression, Forward Selection, Backward Elimination: iterative; one independent variable at a time is added or deleted based on the F statistic
Best-Subsets Regression: different subsets of the independent variables are evaluated
The first 3 procedures are heuristics; there is no guarantee that the best model will be found.
47. Variable Selection: Stepwise Regression
At each iteration, the first consideration is whether the least significant variable currently in the model can be removed, because its F value is less than the user-specified or default alpha-to-remove.
If no variable can be removed, the procedure checks whether the most significant variable not in the model can be added, because its F value is greater than the user-specified or default alpha-to-enter.
If no variable can be removed and no variable can be added, the procedure stops.
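A sketch of stepwise selection in PROC REG (the dataset and extra predictors x3, x4 are hypothetical; SLENTRY and SLSTAY are the alpha-to-enter and alpha-to-remove):

proc reg data=study;
   model AA = ASC GSC x3 x4 / selection=stepwise slentry=0.05 slstay=0.05;
run;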
48. Variable Selection: Forward Selection
This procedure is like stepwise regression but does not permit a
variable to be deleted.
This forward-selection procedure starts with no independent variables.
It adds variables one at a time as long as a significant reduction in the
error sum of squares (SSE) can be achieved.
49. Variable Selection: Backward Elimination
This procedure begins with a model that includes all the
independent variables the modeler wants to be considered.
It then attempts to delete one variable at a time by determining
whether the least significant variable currently in the model can be
removed because its p-value is less than the user-specified or
default value.
Once a variable has been removed from the model it cannot
reenter at a subsequent step.
50. Variable Selection: Best-Subsets Regression
Some software packages include best-subsets regression that enables
the user to find, given a specified number of independent variables, the
best regression model.
The three preceding procedures are one-variable-at-a-time methods
offering no guarantee that the best model for a given number of
variables will be found.
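Forward, backward, and best-subsets runs differ only in the SELECTION option (same hypothetical variables as in the stepwise sketch):

proc reg data=study;
   model AA = ASC GSC x3 x4 / selection=forward slentry=0.05;    * forward selection;
   model AA = ASC GSC x3 x4 / selection=backward slstay=0.05;    * backward elimination;
   model AA = ASC GSC x3 x4 / selection=adjrsq best=3;           * best subsets by adjusted R2;
run;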
51. Regression with Time Series Data: Autocorrelation and the Durbin-Watson Test
With positive autocorrelation, we expect a positive residual in one period to be followed by a positive residual in the next period, and a negative residual to be followed by a negative residual.
With negative autocorrelation, we expect a positive residual in one period to be followed by a negative residual in the next period, then a positive residual, and so on.
52. Autocorrelation and the Durbin-Watson Test
When autocorrelation is present, one of the regression assumptions is violated: the error terms are not independent.
When autocorrelation is present, serious errors can be made in performing tests of significance based upon the assumed regression model.
The Durbin-Watson statistic can be used to detect first-order autocorrelation.
53. Autocorrelation and the Durbin-Watson Test
The statistic ranges in value from zero to four.
A value of two indicates no autocorrelation.
If successive values of the residuals are close together (positive autocorrelation is present), the statistic will be small.
If successive values are far apart (negative autocorrelation is present), the statistic will be large.
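A minimal sketch of requesting the statistic in SAS, assuming a time-series dataset ts with response y and predictor x (hypothetical names):

proc reg data=ts;
   model y = x / dw;   * prints the Durbin-Watson statistic (0 to 4; 2 = no autocorrelation);
run;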
54. Key Takeaway
Checking for assumptions of the regression model is key to interpreting the
results.
Even if the regression model assumptions are met, the presence of multicollinearity can lead to bad inferences.
Dummy variables and interaction terms are powerful tools for building
insightful models.
Model selection is key for inferential purposes.
55. You have reached the end of the presentation.