This document provides a tutorial on conducting and interpreting a multiple linear regression analysis in SPSS. It contains two sections - the first outlines the steps to specify a regression analysis in SPSS using sample data. The second section interprets example SPSS output, including descriptive statistics, bivariate correlations, model summary, ANOVA table, and coefficients output. It also provides a guide for writing up the results in APA style.
Multiple Linear Regression Tutorial
RSCH-8250 Advanced Quantitative Reasoning
Assignment and Tutorial Introduction
This tutorial is intended to assist RSCH-8250 students in completing the Week 6 application assignment. I recommend that you use this tutorial as your first line of instruction; then, if you have time, study the textbook chapter and other resources noted in the classroom.
3rd edition of Field textbook: Chapter 7 in the Field textbook, Smart Alex's Task #1 on p. 262, using the Supermodel.sav SPSS dataset.
4th edition of Field textbook: Chapter 8 in the Field textbook, Smart Alex's Task #4 on p. 355, using the Supermodel.sav SPSS dataset.
The objective of the exercise is to conduct and interpret a standard multiple regression, including assessment of multicollinearity.
The tutorial contains two sections. Section 1 provides step-by-step graphic user interface (GUI) screenshots for specifying the assignment in SPSS. If you follow the steps you will produce correct SPSS output. Section 2 presents and interprets output for a different set of variables and includes a results write-up guide and sample APA style tables (the variables and data in Section 2 are “made up” and do not reflect real research).
Section 1: SPSS Specification of the Assignment
The assignment asks you to regress the per day salary of models
(SALARY) on model’s age (AGE), number of
years having worked as a model (YEARS), and a rating of the
model’s attractiveness (BEAUTY). The
capitalized words are the respective variable names in the
Supermodel.sav SPSS dataset.
Open the dataset. The Variable View screenshot is shown below. There are four variables in the dataset, corresponding to the four variables described in the previous paragraph.

Click the boxes so that checkmarks appear for each of the elements as shown at left.
For the purposes of this assignment, there is no need to
examine the dialogue boxes for Plots, Save, Options, or
Bootstrap. Even though Field discusses some regression
diagnostics, these are (except for multicollinearity) beyond
the level of this course.
So, once you have specified the statistics at left, click the Continue button, which will return you to the Linear Regression dialogue; clicking the OK button will run the analysis and produce adequate output for the assignment. Example output is shown and interpreted in the next section.
Section 2: Annotated Example SPSS Output, Write Up Guide,
and Sample APA Tables
The example output shown below uses variables different from the Week 6 assignment. The purpose is to explain key elements of the output, point out what to focus on, and demonstrate how to interpret and report the results in APA statistical style.
The criterion (aka dependent variable, what we are trying to predict) is overall grade point average (GPA) of 9th grade students. The predictors (aka independent variables) are intelligence quotient (IQ), grade earned in an English course (ENGG), and a measure of attention deficit (ADDSC).
Descriptive Statistics
As shown in the descriptive statistics output (from the
DESCRIPTIVES procedure in SPSS), data had been
collected on 216 individuals. The minimum, maximum, mean,
and standard deviation of each variable are
provided. Reporting on the operationalization of each variable
and the observed values in the sample give the
reader insight into the variable being analyzed. For example: “Attention deficit was measured on a scale of 0 to 100, with higher scores indicating more pronounced attention deficit symptomatology. In the sample of 9th grade students …”

Bivariate Correlations
The correlations output shows the bivariate correlations and one-tailed p values of each pair of variables. There was a statistically significant inverse relationship between GPA and attention deficit score, r(214) = -.542, p < .001 one-tailed, indicating that
as attention deficit increased, GPA tended to
decrease. You can similarly report the other two bivariate
correlations with GPA, but keep in mind that these
are just descriptive because the focus is on the multiple
regression results. As FYI, the 214 in the parenthesis in
the example above is the df value. For correlations, the df value
is N – 2. The table below indicates that N = 216,
so df = 216 – 2 = 214.
Correlations

                            GPA     ENGG    ADDSC     IQ
Pearson Correlation  GPA    1.000    .746   -.542    .446
                     ENGG    .746   1.000   -.445    .283
                     ADDSC  -.542   -.445   1.000   -.629
                     IQ      .446    .283   -.629   1.000
Sig. (1-tailed)      GPA     .       .000    .000    .000
                     ENGG    .000    .       .000    .000
                     ADDSC   .000    .000    .       .000
                     IQ      .000    .000    .000    .
N                    GPA     216     216     216     216
                     ENGG    216     216     216     216
                     ADDSC   216     216     216     216
                     IQ      216     216     216     216
Regression Method
The output below simply informs us that all three variables were entered simultaneously, which is what had to happen because we had specified the “Enter” method. In the results write-up you just need to identify the method used, such as: “The purpose of the standard regression analysis was to examine the combined and relative effects of 9th grade students’ IQ, English grade, and attention deficit score in predicting overall GPA.”
The term standard regression means that all predictors were
entered simultaneously. Two other common
methods are statistical regression (aka stepwise regression) in
which variables enter according to level of
significance, and sequential regression (aka hierarchical
regression) in which the analyst decides and specifies
the order of entry of each variable.
Variables Entered/Removed (a)

Model   Variables Entered     Variables Removed   Method
1       IQ, ENGG, ADDSC (b)   .                   Enter

a. Dependent Variable: GPA
b. All requested variables entered.
Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .793 (a)  .629       .624                .51832

Change Statistics: R Square Change = .629, F Change = 119.836, df1 = 3, df2 = 212, Sig. F Change = .000

a. Predictors: (Constant), IQ, ENGG, ADDSC
ANOVA F Test of the Omnibus Regression
The ANOVA output provides the test of statistical significance of the regression. In this example, the combined effect of student’s IQ, English grade, and attention deficit score statistically significantly predicted overall GPA, F(3, 212) = 119.84, p < .001, R² = .63. The output shows .000 in the Sig column, but probability cannot be zero, so in such cases report p < .001 (which is APA style); do not report p = .000. To be clear, ignore Dr. Morrow’s reporting of p = .000 in her videos and, instead, follow APA style.
FYI for the inquisitive:
The regression sum of squares of 96.585 is the explained variance in GPA. The residual sum of squares of 56.955 is the variance in GPA that was not explained by the three predictors. The sum of these is the total sum of squares. If you divide the regression sum of squares by the total sum of squares you get the proportion of variance explained, which is R² = 96.585 ÷ 153.540 = .629. R, which in this example is .793, is the correlation between predicted GPA and actual GPA. That is, if you saved the predicted GPA scores from the regression analysis and then correlated those predicted scores with the original GPA scores we were predicting, the correlation would be .793. For multiple regression, an R value of .14 is considered a small effect, .36 a medium effect, and .51 a large effect; these correspond to R² values of .02, .13, and .26, respectively.
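The R² arithmetic above can be verified in a few lines of Python, using the sums of squares reported in the ANOVA output:

```python
import math

# Sums of squares reported in the ANOVA output
ss_regression = 96.585
ss_residual = 56.955
ss_total = ss_regression + ss_residual  # 153.540

# Proportion of variance explained: regression SS divided by total SS
r_squared = ss_regression / ss_total    # about .629

# R, the multiple correlation between predicted and actual GPA
r = math.sqrt(r_squared)                # about .793
```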
ANOVA (a)

Model          Sum of Squares    df    Mean Square      F       Sig.
1  Regression       96.585        3      32.195      119.836    .000
   Residual         56.955      212        .269
   Total           153.540      215

a. Dependent Variable: GPA
The zero-order correlation is the simple correlation between each predictor and the criterion. For example, the simple correlation between English grade and overall GPA is shown as .746, which is the same as was shown in the previous correlations output. The part correlation (aka semipartial correlation, which is the more common term in the literature, and is the term I will use hereafter) indexes the unique relationship between the predictor and criterion that none of the other predictors explains. That is, when the predictors are correlated, some of the variance in the criterion is explained by more than one predictor; the semipartial correlation filters out any shared explanation by predictors, leaving only each predictor’s unique contribution.
If predictors are correlated, the semipartial will always be smaller than the simple zero-order correlation (if predictors are uncorrelated, a rare event, the zero-order and semipartial will be equal). In this example, the semipartial correlation between English grade and overall GPA is .563, much less than the simple correlation of .746. The semipartial squared (sr²), a commonly reported effect size, is the proportion of variance in the criterion uniquely accounted for by the predictor; so, English grade uniquely accounted for 31.7% of the variance in overall GPA.
The interpretation of the partial correlation is not as
straightforward as the semipartial correlation. In addition to the
unique variance accounted
for, the partial correlation attributes to each predictor its
relative proportion of explained variance in the criterion that is
shared with other
predictors. When predictors are correlated, the partial
correlation will always be higher than the semipartial
correlation.
Relative Importance of Predictors. If interested in rank ordering the relative importance of each predictor, this is best done by using sr² or the absolute value of the semipartial correlations (Tabachnick & Fidell, 2007). In this example, the order of variable importance is English grade (sr² = .317), IQ (sr² = .017), then attention deficit score (sr² = .013).
Coefficients (a)

Model          B      Std. Error   Beta       t      Sig.   95% CI for B (Lower, Upper)   Zero-order   Partial    Part   Tolerance     VIF
1  (Constant)  .477     .580                  .823   .412        (-.666, 1.620)
   ENGG        .584     .043        .628    13.451   .000        ( .498,  .669)              .746        .679     .563      .802      1.247
   ADDSC      -.013     .005       -.156    -2.704   .007        (-.022, -.003)             -.542       -.183    -.113      .527      1.897

a. Dependent Variable: GPA
The unstandardized regression equation takes the form Y’ = B0 + B1X1 + B2X2 + B3X3, where Y’ is the predicted value of the criterion (aka dependent variable), B0 is the constant, and the numbered Bs and Xs represent each predictor. Contextualized to this example, the unstandardized equation is:
Overall GPA’ = .477 + .584(English grade) - .013(attention deficit score) + .011(IQ)
The equation can be used to predict overall GPA for specific
values of each predictor. For example, if a student had an
English grade of 2.7, an
attention deficit score of 65, and an IQ of 105, predicted overall
GPA would be: .477 + .584(2.7) - .013(65) + .011(105) = 2.36.
The standardized coefficient, β (pronounced beta), indexes the change in the criterion, in standard deviation units, for a 1-standard-deviation change in the predictor. For a 1-standard-deviation increase in attention deficit score, overall GPA is predicted to decrease by .156 standard deviations.
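The relationship between B and β can be written as a one-line function. The formula itself is standard (β = B × SD of predictor ÷ SD of criterion), but the SD values in the usage comment are illustrative only; the tutorial's output does not report them:

```python
def standardized_beta(b, sd_predictor, sd_criterion):
    # beta = B scaled by the ratio of predictor SD to criterion SD
    return b * sd_predictor / sd_criterion

# Illustrative numbers only (not from the SPSS output):
# an unstandardized B of 2.0 with SDs of 1.0 and 4.0 gives beta = 0.5
```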
Predictor Significance Tests. A t test is used to determine the statistical significance of each predictor. Technically, the t test determines if the B coefficient is different from 0. The t value is equal to the B coefficient divided by its standard error (SE). For English grade, t = .584 ÷ .043 = 13.58, which is within rounding error of the t value shown in the output (for the computation to be exact, the B and SE values would need to be known to several more decimal places than shown in the output). The t value is evaluated at the error degrees of freedom (df) value, which is N − k − 1, where k is the number of predictor variables. For IQ one might report: “While holding the effects of the other predictors constant, the effect of IQ on predicting overall GPA was statistically significant, t(212) = 3.16, p = .002, sr² = .017, uniquely accounting for 1.7% of the variance in overall GPA. For each 1-point increase in IQ, overall GPA was expected to increase .011 points (95% CI from .004 to .019).”
Similar statements should be made for the other predictors.
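The t and df computations can be checked in a few lines (values for the English grade predictor, from the Coefficients output):

```python
# t = B / SE for the English grade predictor
b, se = .584, .043
t = b / se          # about 13.58; the output shows 13.451 because the
                    # displayed B and SE values are rounded

# Error df for the t test: N - k - 1
N, k = 216, 3       # 216 cases, 3 predictors
df_error = N - k - 1  # 212, as in t(212)
```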
Collinearity. Multicollinearity exists if one predictor is highly predicted by the set of other predictors, which can be the case when it is highly correlated with just one of the other predictors in the set. In the last two columns of the Coefficients output (see previous page) the tolerance and variance inflation factor (VIF) values can be examined to assess multicollinearity. Tolerance values less than .1 and VIF values greater than 10 indicate a potential multicollinearity problem.
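Tolerance and VIF are reciprocals of one another, so either can be computed from the other. A quick check against the Coefficients output (small differences reflect rounding in the displayed tolerances):

```python
# Tolerance values from the Coefficients output
tolerances = {"ENGG": .802, "ADDSC": .527}

# VIF = 1 / tolerance
vif = {name: round(1 / tol, 3) for name, tol in tolerances.items()}

# Common rule of thumb: tolerance < .10 (equivalently VIF > 10)
# flags a potential multicollinearity problem
flagged = [name for name, tol in tolerances.items() if tol < .10]
```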
scale from 0 (F) to 4 (A)”, or “Satisfaction was measured on a 5-point Likert-type scale from 1 (not at all satisfied) to 5 (extremely satisfied).” Please pay attention to APA style for reporting scale anchors (see p. 91 and p. 105 in the 6th edition of the APA Manual).
Report descriptive statistics such as minimum, maximum, mean,
and standard deviation for each metric variable. For nominal
variables, report
percentage for each level of the variable, for example: “Of the
total sample (N = 150) there were 40 (26.7%) males and 110
(73.3%) females.”
Keep in mind that a sentence that includes information in
parentheticals must still be a sentence (and make sense) if the
parentheticals are
removed. For example, the one above without parentheticals is
still a sentence and makes sense: “Of the total sample there
were 40 males and
110 females.”
State the purpose of the analysis or provide the guiding research question(s). If you use research questions, do not craft them such that they can be answered with a yes or no. Instead, craft them so that they will have a quantitative answer. For example: “What is the strength and direction of relationship between X and Y?” or “What is the difference in group means on X between males and females?”
Present null and alternative hypothesis sets applicable to the
analysis. For regression there would be a hypothesis set for the
overall result (i.e.,
the combined effect of the predictors) and a hypothesis set for
each predictor while “controlling for” or “holding constant” the
effects of the
other predictors.
State assumptions or other considerations for the analysis, and
report the actual statistical result for relevant tests. For this
course, the only
regression consideration that needs to be presented and
discussed is for multicollinearity. Even if violated, you must
still report and interpret
the remaining results.
Report and interpret the overall regression results. Report and
interpret the results of each predictor. Be sure to include the
actual statistical
results in text—examples were provided within the annotated
output section of this tutorial. Don’t forget to interpret the