2. Introduction
(p. 15.1)
• X = independent (explanatory) variable
• Y = dependent (response) variable
• Use instead of correlation
when distribution of X is fixed by researcher (i.e.,
set number at each level of X)
studying functional dependency between X and Y
3. Illustrative data (bicycle.sav)
(p. 15.1)
• Same as prior chapter
• X = percent receiving
reduce or free meal
(RFM)
• Y = percent using
helmets (HELM)
• n = 12 (outlier
removed to study
linear relation)
100
80
60
40
20
0
60
50
40
30
20
10
0
X - % Receiving Reduced Fee School Lunch
4. Regression Model (Equation)
(p. 15.2)
slope
s
line'
the
represents
intercept
s
line'
the
represents
given
a
at
of
average
predicted
represents
ˆ
where
ˆ
b
a
X
Y
y
bX
a
y
“y hat”
5. How formulas determine best line
(p. 15.2)
• Distance of points
from line =
residuals (dotted)
• Minimizes sum of
square residuals
• Least squares
regression line
100
80
60
40
20
0
60
50
40
30
20
10
0
X - % Receiving Reduced Fee School Lunch
6. Formulas for Least Squares Coefficients
with Illustrative Data (p. 15.2 – 15.3)
539
.
0
67
.
7855
1333
.
4231
XX
XY
SS
SS
b
49
.
47
)
8333
.
30
)(
539
.
0
(
8833
.
30
x
b
y
a
SPSS output:
8. Interpretation of Slope (b)
(p. 15.3)
• b = expected change in Y
per unit X
• Keep track of units!
– Y = helmet users per 100
– X = % receiving free lunch
• e.g., b of –0.54 predicts
decrease of 0.54 units of Y
for each unit X
100
80
60
40
20
0
60
50
40
30
20
10
0
X - % Receiving Reduced Fee School Lunch
Intercept
1 Unit X
Slope
9. Predicting Average Y
• ŷ = a + bx
Predicted Y = intercept + (slope)(x)
HELM = 47.49 + (–0.54)(RFM)
• What is predicted HELM when RFM = 50?
ŷ = 47.49 + (–0.54)(50) = 20.5
Average HELM predicted to be 20.5 in neighborhood
where 50% of children receive reduced or free meal
• What is average Y when x = 20?
ŷ = 47.49 +(–0.54)(20) = 36.7
10. Confidence Interval for Slope Parameter
(p. 15.4)
95% confidence Interval for ß =
where
b = point estimate for slope
tn-2,.975 = 97.5th percentile (from t table or StaTable)
seb = standard error of slope estimate (formula 5)
)
)(
( 975
,.
2 b
n se
t
b
|
XX
x
Y
b
SS
se
se
2
)
(
|
n
SS
b
SS
se XY
YY
x
Y
standard error of regression
12. Interpret 95% confidence interval
Model:
Point estimate for slope (b) = –0.54
Standard error of slope (seb) = 0.24
95% confidence interval for = (–0.78, –0.30)
Interpretation:
slope estimate = –0.54 ± 0.24
We are 95% confident the slope parameter falls
between –0.78 and –0.30
13. Significance Test
(p. 15.5)
• H0: ß = 0
• tstat (formula 7) with df = n – 2
• Convert tstat to p value
09
.
5
1058
.
0
539
.
0
b
stat
se
b
t
df =12 – 2 = 10
p = 2×area beyond tstat on t10
Use t table and StaTable
14. Regression ANOVA
(not in Reader & NR)
• SPSS also does an analysis of variance on regression model
• Sum of squares of fitted values around grand mean = (ŷi – ÿ)²
• Sum of squares of residuals around line = (yi– ŷi )²
• Fstat provides same p value as tstat
• Want to learn more about relation between ANOVA and regression?
(Take regression course)
16. Validity Assumptions
(p. 15.6)
• Data = farr1852.sav
X = mean elevation above sea level
Y = cholera mortality per 10,000
• Scatterplot (right) shows
negative correlation
• Correlation and regression
computations reveal:
r = -0.88
ŷ = 129.9 + (-1.33)x
p = .009
• Farr used these results to support
miasma theory and refute
contagion theory
– But data not valid (confounded by
“polluted water source”)
Mean Elevation of the Ground above the Higwater Mark
400
300
200
100
0
Mean
mortality
from
Cholera
per
10,000
200
100
80
60
40
20
10
8
6