regression.ppt

15: Linear Regression
Expected change in Y per unit X

Introduction
(p. 15.1)
• X = independent (explanatory) variable
• Y = dependent (response) variable
• Use instead of correlation when
– distribution of X is fixed by researcher (i.e., set number at each level of X)
– studying functional dependency between X and Y

Illustrative data (bicycle.sav)
(p. 15.1)
• Same as prior chapter
• X = percent receiving reduced or free meal (RFM)
• Y = percent using helmets (HELM)
• n = 12 (outlier removed to study linear relation)
[Figure: scatterplot of illustrative data; axis label "X - % Receiving Reduced Fee School Lunch"]

Regression Model (Equation)
(p. 15.2)
$$\hat{y} = a + bX$$
where
• ŷ ("y hat") represents predicted average of Y at a given X
• a represents the line's intercept
• b represents the line's slope

How formulas determine best line
(p. 15.2)
• Distance of points from line = residuals (dotted)
• Line minimizes sum of squared residuals
• Least squares regression line
[Figure: scatterplot with fitted line and dotted residuals]
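The least squares idea can be checked numerically. The sketch below uses a small hypothetical data set (not the bicycle.sav values) and the SS formulas from the "Formulas for Least Squares Coefficients" slide, then confirms that nudging the intercept or slope away from the fitted values increases the sum of squared residuals.

```python
# Hypothetical toy data for illustration only (NOT the bicycle.sav values).
xs = [10, 20, 30, 40, 50]
ys = [42, 38, 30, 27, 20]


def sum_sq_residuals(a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = ss_xy / ss_xx        # least squares slope
a = y_bar - b * x_bar    # least squares intercept
best = sum_sq_residuals(a, b)

# Every nearby line (intercept or slope nudged) should fit worse.
nudged = [
    sum_sq_residuals(a + da, b + db)
    for da in (-1, 0, 1)
    for db in (-0.1, 0, 0.1)
    if (da, db) != (0, 0)
]
print(f"fitted line: y-hat = {a:.2f} + ({b:.2f})x, SS residuals = {best:.2f}")
print("all nudged lines fit worse:", all(s > best for s in nudged))
```

The grid of nudges is only a spot check; the general result follows from the calculus behind the formulas.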

Formulas for Least Squares Coefficients
with Illustrative Data (p. 15.2 – 15.3)
$$b = \frac{SS_{XY}}{SS_{XX}} = \frac{-4231.1333}{7855.67} = -0.539$$
$$a = \bar{y} - b\bar{x} = 30.8833 - (-0.539)(30.8333) = 47.49$$
SPSS output: [table not reproduced]
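The arithmetic on this slide can be reproduced from the summary values it reports. This is a minimal sketch; the variable names are illustrative, and the negative sign on SS_XY is taken from the negative slope (–0.54) reported elsewhere in the deck.

```python
# Summary values as reported on the slide (bicycle.sav, n = 12).
ss_xy = -4231.1333   # sum of cross-products
ss_xx = 7855.67      # sum of squares of X
x_bar = 30.8333      # mean of X (RFM)
y_bar = 30.8833      # mean of Y (HELM)

b = ss_xy / ss_xx          # slope
a = y_bar - b * x_bar      # intercept

print(f"b = {b:.3f}")   # b = -0.539
print(f"a = {a:.2f}")   # a = 47.49
```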

Alternative formula for slope
$$b = r\,\frac{s_Y}{s_X}$$
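The two slope formulas should agree on any data set. The sketch below checks this on hypothetical toy data (not bicycle.sav), assuming r is the usual Pearson correlation, SS_XY / sqrt(SS_XX × SS_YY), and s_X, s_Y are sample standard deviations with n – 1 in the denominator.

```python
import math

# Hypothetical toy data for illustration only.
xs = [10, 20, 30, 40, 50]
ys = [42, 38, 30, 27, 20]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # Pearson correlation
s_x = math.sqrt(ss_xx / (n - 1))       # sample SD of X
s_y = math.sqrt(ss_yy / (n - 1))       # sample SD of Y

b_direct = ss_xy / ss_xx               # slope from sums of squares
b_alt = r * s_y / s_x                  # slope from r and the SDs

print(f"SS formula: {b_direct:.4f}, r * sY / sX: {b_alt:.4f}")
```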

Interpretation of Slope (b)
(p. 15.3)
• b = expected change in Y per unit X
• Keep track of units!
– Y = helmet users per 100
– X = % receiving free lunch
• e.g., b of –0.54 predicts decrease of 0.54 units of Y for each unit X
[Figure: scatterplot with regression line; annotations: Intercept, Slope, 1 Unit X]

Predicting Average Y
• ŷ = a + bx
Predicted Y = intercept + (slope)(x)
HELM = 47.49 + (–0.54)(RFM)
• What is predicted HELM when RFM = 50?
ŷ = 47.49 + (–0.54)(50) = 20.5
Average HELM predicted to be 20.5 in neighborhood
where 50% of children receive reduced or free meal
• What is predicted average Y when x = 20?
ŷ = 47.49 + (–0.54)(20) = 36.7
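The two predictions above can be reproduced with a small helper. This is a sketch only; the function name `predict_helm` is hypothetical, and the coefficients are the rounded values from the slide.

```python
INTERCEPT = 47.49   # a, from the slide
SLOPE = -0.54       # b, from the slide


def predict_helm(rfm):
    """Predicted average HELM (%) for a given RFM (%), using y-hat = a + b*x."""
    return INTERCEPT + SLOPE * rfm


for rfm in (50, 20):
    print(f"RFM = {rfm}: predicted HELM = {predict_helm(rfm):.1f}")
# RFM = 50: predicted HELM = 20.5
# RFM = 20: predicted HELM = 36.7
```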

Confidence Interval for Slope Parameter
(p. 15.4)
95% confidence interval for β:
$$b \pm (t_{n-2,\,.975})(se_b)$$
where
b = point estimate for slope
t_{n-2,.975} = 97.5th percentile of t with n – 2 df (from t table or StaTable)
se_b = standard error of slope estimate (formula 5):
$$se_b = \frac{se_{Y|x}}{\sqrt{SS_{XX}}}$$
$$se_{Y|x} = \sqrt{\frac{SS_{YY} - b\,SS_{XY}}{n - 2}} \quad \text{(standard error of regression)}$$

Illustrative Example (bicycle.sav)
• 95% confidence interval for β
= –0.54 ± (t10,.975)(0.1058)
= –0.54 ± (2.23)(0.1058)
= –0.54 ± 0.24
= (–0.78, –0.30)
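The interval arithmetic above can be checked in a few lines. This sketch plugs in the slide's own values (b, the t percentile for 10 df, and the standard error of the slope); the variable names are illustrative.

```python
b = -0.54        # point estimate for slope
t_crit = 2.23    # t(10, .975), as read from the t table on the slide
se_b = 0.1058    # standard error of the slope

margin = t_crit * se_b
lower, upper = b - margin, b + margin

print(f"{b} ± {margin:.2f} = ({lower:.2f}, {upper:.2f})")
# -0.54 ± 0.24 = (-0.78, -0.30)
```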

Interpret 95% confidence interval
Model:
Point estimate for slope (b) = –0.54
Standard error of slope (seb) = 0.1058
Margin of error = (2.23)(0.1058) = 0.24
95% confidence interval for β = (–0.78, –0.30)
Interpretation:
slope estimate = –0.54 ± 0.24
We are 95% confident the slope parameter falls
between –0.78 and –0.30

Significance Test
(p. 15.5)
• H0: β = 0
• tstat (formula 7) with df = n – 2
• Convert tstat to p value
$$t_{stat} = \frac{b}{se_b} = \frac{-0.539}{0.1058} = -5.09$$
df =12 – 2 = 10
p = 2×area beyond tstat on t10
Use t table and StaTable
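For readers without a t table at hand, the two-sided p value can be approximated numerically. The sketch below is one possible approach, not the method used by StaTable or SPSS: it integrates the Student t density with Simpson's rule using only the standard library. The helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """p = 2 x area beyond |t_stat|, via Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t


b = -0.539      # slope, from the slide
se_b = 0.1058   # standard error of slope, from the slide
df = 12 - 2

t_stat = b / se_b
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, two-sided p = {p_value:.5f}")
```

The resulting p value falls below .001, so H0: β = 0 would be rejected at conventional significance levels.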

Regression ANOVA
(not in Reader & NR)
• SPSS also does an analysis of variance on regression model
• Sum of squares of fitted values around grand mean = Σ(ŷi – ȳ)²
• Sum of squares of residuals around line = Σ(yi – ŷi)²
• Fstat provides same p value as tstat
• Want to learn more about relation between ANOVA and regression?
(Take regression course)
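One way to see why Fstat and tstat give the same p value: with a single predictor, Fstat equals tstat squared. The sketch below checks this on hypothetical toy data (not bicycle.sav), assuming the usual ANOVA setup with 1 df for regression and n – 2 df for residuals.

```python
import math

# Hypothetical toy data for illustration only.
xs = [10, 20, 30, 40, 50]
ys = [42, 38, 30, 27, 20]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = ss_xy / ss_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]

ss_reg = sum((f - y_bar) ** 2 for f in fitted)          # fitted values around grand mean
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residuals around line

ms_reg = ss_reg / 1          # regression mean square (1 df)
ms_res = ss_res / (n - 2)    # residual mean square (n - 2 df)
f_stat = ms_reg / ms_res

se_b = math.sqrt(ms_res) / math.sqrt(ss_xx)   # standard error of slope
t_stat = b / se_b

print(f"F = {f_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```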

Distributional Assumptions
(p. 15.5)
• Linearity
• Independence
• Normality
• Equal variance
[Figure labels: X, Y, Regression Line, Distribution of Y]

Validity Assumptions
(p. 15.6)
• Data = farr1852.sav
X = mean elevation above sea level
Y = cholera mortality per 10,000
• Scatterplot shows negative correlation
• Correlation and regression computations reveal:
r = -0.88
ŷ = 129.9 + (-1.33)x
p = .009
• Farr used these results to support miasma theory and refute contagion theory
– But data not valid (confounded by "polluted water source")
[Figure: scatterplot; axes "Mean Elevation of the Ground above the Highwater Mark" and "Mean mortality from Cholera per 10,000"]
