View stunning SlideShares in full-screen with the new iOS app!Introducing SlideShare for AndroidExplore all your favorite topics in the SlideShare appGet the SlideShare app to Save for Later — even offline
View stunning SlideShares in full-screen with the new Android app!View stunning SlideShares in full-screen with the new iOS app!
1.
Chapter 13: Simple Regression and Correlation Analysis 1
Chapter 13
Simple Regression and Correlation Analysis
LEARNING OBJECTIVES
The overall objective of this chapter is to give you an understanding of bivariate
regression and correlation analysis, thereby enabling you to:
1. Compute the equation of a simple regression line from a sample of data and
interpret the slope and intercept of the equation.
2. Understand the usefulness of residual analysis in testing the assumptions
underlying regression analysis and in examining the fit of the regression line to
the data.
3. Compute a standard error of the estimate and interpret its meaning.
4. Compute a coefficient of determination and interpret it.
5. Test hypotheses about the slope of the regression model and interpret the results.
6. Estimate values of y using the regression model.
CHAPTER TEACHING STRATEGY
This chapter is about all aspects of simple (bivariate, linear) regression. Early in
the chapter through scatter plots, the student begins to understand that the object of
simple regression is to fit a line through the points. Fairly soon in the process, the student
learns how to solve for slope and y intercept and develop the equation of the regression
line. Most of the remaining material on simple regression is to determine how good the
fit of the line is and if assumptions underlying the process are met.
The student begins to understand that by entering values of the independent
variable into the regression model, predicted values can be determined. The question
2.
Chapter 13: Simple Regression and Correlation Analysis 2
then becomes: Are the predicted values good estimates of the actual dependent values?
One rule to emphasize is that the regression model should not be used to predict for
independent variable values that are outside the range of values used to construct the
model. MINITAB issues a warning for such activity when attempted. There are many
instances where the relationship between x and y are linear over a given interval but
outside the interval the relationship becomes curvilinear or unpredictable. Of course,
with this caution having been given, many forecasters use such regression models to
extrapolate to values of x outside the domain of those used to construct the model.
Whether the forecasts obtained under such conditions are any better than "seat of the
pants" or "crystal ball" estimates remains to be seen.
The concept of residual analysis is a good one to show graphically and
numerically how the model relates to the data and the fact that it more closely fits some
points than others, etc. A graphical or numerical analysis of residuals demonstrates that
the regression line fits the data in a manner analogous to the way a mean fits a set of
numbers. The regression model passes through the points such that the geometric
distances will sum to zero. The fact that the residuals sum to zero points out the need to
square the errors (residuals) in order to get a handle on total error. This leads to the sum
of squares error and then on to the standard error of the estimate. In addition, students
can learn why the process is called least squares analysis (the slope and intercept
formulas are derived by calculus such that the sum of squares of error is minimized -
hence "least squares"). Students can learn that by examining the values of se, the
residuals, r2
, and the t ratio to test the slope they can begin to make a judgment about the
fit of the model to the data. Many of the chapter problems ask the student to comment on
these items (se, r2
, etc.).
It is my view that for many of these students, an important facet of this chapter
lies in understanding the "buzz" words of regression such as standard error of the
estimate, coefficient of determination, etc. They may well only interface regression again
as some type of computer printout to be deciphered. The concepts then become as
important or perhaps more important than the calculations.
CHAPTER OUTLINE
13.1 Introduction to Simple Regression Analysis
13.2 Determining the Equation of the Regression Line
13.3 Residual Analysis
Using Residuals to Test the Assumptions of the Regression Model
Using the Computer for Residual Analysis
13.4 Standard Error of the Estimate
3.
Chapter 13: Simple Regression and Correlation Analysis 3
13.5 Coefficient of Determination
Relationship Between r and r2
13.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall
Model
Testing the Slope
Testing the Overall Model
13.7 Estimation
Confidence Intervals to Estimate the Conditional Mean of y: µy/x
Prediction Intervals to Estimate a Single Value of y
13.8 Interpreting Computer Output
KEY TERMS
Coefficient of Determination (r2
) Prediction Interval
Confidence Interval Probabilistic Model
Dependent Variable Regression Analysis
Deterministic Model Residual
Heteroscedasticity Residual Plot
Homoscedasticity Scatter Plot
Independent Variable Simple Regression
Least Squares Analysis Standard Error of the Estimate (se)
Outliers Sum of Squares of Error (SSE)
4.
Chapter 13: Simple Regression and Correlation Analysis 4
SOLUTIONS TO CHAPTER 13
13.1 x x
12 17
21 15
28 22
8 19
20 24
Σx = 89 Σy = 97 Σxy = 1,767
Σx2
= 1,833 Σy2
= 1,935 n = 5
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
5
)89(
833,1
5
)97)(89(
767,1
2
−
−
= 0.162
b0 =
5
89
162.0
5
97
1 −=−
∑∑
n
x
b
n
y
= 16.5
yˆ = 16.5 + 0.162 x
5.
Chapter 13: Simple Regression and Correlation Analysis 5
13.2 x y
140 25
119 29
103 46
91 70
65 88
29 112
24 128
Σx = 571 Σy = 498 Σxy = 30,099
Σx2
= 58,293 Σy2
= 45,154 n = 7
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
7
)571(
293,58
7
)498)(571(
099,30
2
−
−
= -0.898
b0 =
7
571
)898.0(
7
498
1 −−=−
∑∑
n
x
b
n
y
= 144.414
yˆ = 144.414 – 0.898 x
6.
Chapter 13: Simple Regression and Correlation Analysis 6
13.3 (Advertising) x (Sales) y
12.5 148
3.7 55
21.6 338
60.0 994
37.6 541
6.1 89
16.8 126
41.2 379
Σx = 199.5 Σy = 2,670 Σxy = 107,610.4
Σx2
= 7,667.15 Σy2
= 1,587,328 n = 8
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
8
)5.199(
15.667,7
8
)670,2)(5.199(
4.610,107
2
−
−
= 15.24
b0 =
8
5.199
24.15
8
670,2
1 −=−
∑∑
n
x
b
n
y
= -46.29
yˆ = -46.29 + 15.24 x
13.4 (Prime) x (Bond) y
16 5
6 12
8 9
4 15
7 7
Σx = 41 Σy = 48 Σxy = 333
Σx2
= 421 Σy2
= 524 n = 5
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
5
)41(
421
5
)48)(41(
333
2
−
−
= -0.715
7.
Chapter 13: Simple Regression and Correlation Analysis 7
b0 =
5
41
)715.0(
5
48
1 −−=−
∑∑
n
x
b
n
y
= 15.46
yˆ = 15.46 – 0.715 x
13.5 Starts Failures
233,710 57,097
199,091 50,361
181,645 60,747
158,930 88,140
155,672 97,069
164,086 86,133
166,154 71,558
188,387 71,128
168,158 71,931
170,475 83,384
166,740 71,857
Σx = 1,953,048 Σy = 809,405 Σx2
= 351,907,107,960
Σy2
= 61,566,568,203 Σxy = 141,238,520,688 n = 11
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
11
)048,953,1(
960,107,907,351
11
)405,809)(048,953,1(
688,520,238,141
2
−
−
=
b1 = -0.48042194
b0 =
11
048,953,1
)48042194.0(
11
405,809
1 −−=−
∑∑
n
x
b
n
y
= 158,881.1
yˆ = 158,881.1 – 0.48042194 x
8.
Chapter 13: Simple Regression and Correlation Analysis 8
13.6 No. of Farms (x) Avg. Size (y)
5.65 213
4.65 258
3.96 297
3.36 340
2.95 374
2.52 420
2.44 426
2.29 441
2.15 460
2.07 469
2.17 434
Σx = 34.21 Σy = 4,132 Σx2
= 120.3831
Σy2
= 1,627,892 Σxy = 11,834.31 n = 11
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
11
)21.34(
3831.120
11
)132,4)(21.34(
31.834,11
2
−
−
= -72.6383
b0 =
11
21.34
)6383.72(
11
132,4
1 −−=−
∑∑
n
x
b
n
y
= 601.542
yˆ = 601.542 – 72.6383 x
9.
Chapter 13: Simple Regression and Correlation Analysis 9
13.7 Steel New Orders
99.9 2.74
97.9 2.87
98.9 2.93
87.9 2.87
92.9 2.98
97.9 3.09
100.6 3.36
104.9 3.61
105.3 3.75
108.6 3.95
Σx = 994.8 Σy = 32.15 Σx2
= 99,293.28
Σy2
= 104.9815 Σxy = 3,216.652 n = 10
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
10
)8.994(
28.293,99
10
)15.32)(8.994(
652.216,3
2
−
−
= 0.05557
b0 =
10
8.994
)05557.0(
10
15.32
1 −=−
∑∑
n
x
b
n
y
= -2.31307
yˆ = -2.31307 + 0.05557 x
12.
Chapter 13: Simple Regression and Correlation Analysis 12
13.14 Miles (x) Cost y ( yˆ ) (y- yˆ )
1,245 2.64 2.5376 .1024
425 2.31 2.3322 -.0222
1,346 2.45 2.5628 -.1128
973 2.52 2.4694 .0506
255 2.19 2.2896 -.0996
865 2.55 2.4424 .1076
1,080 2.40 2.4962 -.0962
296 2.37 2.2998 .0702
No apparent violation of assumptions
13.15
Error terms appear to be non independent
13.
Chapter 13: Simple Regression and Correlation Analysis 13
13.16
There appears to be a non constant error variance.
13.17
There appears to be nonlinear regression
13.18 The MINITAB Residuals vs. Fits graphic is strongly indicative of a violation of
the homoscedasticity assumption of regression. Because the residuals are very
close together for small values of x, there is little variability in the residuals at the
left end of the graph. On the other hand, for larger values of x, the graph flares
out indicating a much greater variability at the upper end. Thus, there is a lack of
homogeneity of error across the values of the independent variable.
14.
Chapter 13: Simple Regression and Correlation Analysis 14
13.19 SSE = Σy2
– b0Σy - b1ΣXY = 1,935 - (16.51)(97) - 0.1624(1767) = 46.5692
3
5692.46
2
=
−
=
n
SSE
se = 3.94
Approximately 68% of the residuals should fall within ±1se.
3 out of 5 or 60% of the actually residuals in 11.13 fell within ± 1se.
13.20 SSE = Σy2
– b0Σy - b1ΣXY = 45,154 - 144.414(498) - (-.89824)(30,099) =
SSE = 272.0
5
0.272
2
=
−
=
n
SSE
se = 7.376
6 out of 7 = 85.7% fall within + 1se
7 out of 7 = 100% fall within + 2se
13.21 SSE = Σy2
– b0Σy - b1ΣXY = 1,587,328 - (-46.29)(2,670) - 15.24(107,610.4) =
SSE = 70,940
6
940,70
2
=
−
=
n
SSE
se = 108.7
Six out of eight (75%) of the sales estimates are within $108.7 million.
13.22 SSE = Σy2
– b0Σy - b1ΣXY = 524 - 15.46(48) - (-0.71462)(333) = 19.8885
3
8885.19
2
=
−
=
n
SSE
se = 2.575
Four out of five (80%) of the estimates are within 2.5759 of the actual rate for
bonds. This amount of error is probably not acceptable to financial analysts.
15.
Chapter 13: Simple Regression and Correlation Analysis 15
13.23 (y- yˆ ) (y- yˆ )2
4.7244 22.3200
-0.9836 .9675
-0.3996 .1597
-6.7537 45.6125
2.7683 7.6635
0.6442 .4150
Σ(y- yˆ )2
= 77.1382
SSE = 2
)ˆ(∑ − yy = 77.1382
4
1382.77
2
=
−
=
n
SSE
se = 4.391
13.24 (y- yˆ ) (y- yˆ )2
.1023 .0105
-.0222 .0005
-.1128 .0127
.0506 .0026
-.0996 .0099
.1076 .0116
-.0962 .0093
.0702 .0049
Σ(y- yˆ )2
= .0620
SSE = 2
)ˆ(∑ − yy = .0620
6
0620.
2
=
−
=
n
SSE
se = .1017
The model produces estimates that are ±.1017 or within about 10 cents 68% of the
time. However, the range of milk costs is only 45 cents for this data.
16.
Chapter 13: Simple Regression and Correlation Analysis 16
13.25 Volume (x) Sales (y)
728.6 10.5
497.9 48.1
439.1 64.8
377.9 20.1
375.5 11.4
363.8 123.8
276.3 89.0
n = 7 Σx = 3059.1 Σy = 367.7
Σx2
= 1,464,071.97 Σy2
= 30,404.31 Σxy = 141,558.6
b1 = -.1504 b0 = 118.257
yˆ = 118.257 - .1504x
SSE = Σy2
– b0Σy - b1ΣXY
= 30,404.31 - (118.257)(367.7) - (-0.1504)(141,558.6) = 8211.6245
5
6245.8211
2
=
−
=
n
SSE
se = 40.5256
This is a relatively large standard error of the estimate given the sales values
(ranging from 10.5 to 123.8).
13.26 r2
=
5
)97(
935,1
6399.46
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .123
This is a low value of r2
13.27 r2
=
7
)498(
154,45
12.272
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .972
This is a high value of r2
17.
Chapter 13: Simple Regression and Correlation Analysis 17
13.28 r2
=
8
)670,2(
328,587,1
940,70
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .898
This value of r2
is relatively high
13.29 r2
=
5
)48(
524
8885.19
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .685
This value of r2
is a modest value.
68.5% of the variation of y is accounted for by x but 31.5% is unaccounted for.
13.30 r2
=
6
)173(
837,5
1384.77
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .909
This value is a relatively high value of r2
.
Almost 91% of the variability of y is accounted for by the x values.
13.31 CCI Median Income
116.8 37.415
91.5 36.770
68.5 35.501
61.6 35.047
65.9 34.700
90.6 34.942
100.0 35.887
104.6 36.306
125.4 37.005
Σx = 323.573 Σy = 824.9 Σx2
= 11,640.93413
Σy2
= 79,718.79 Σxy = 29,804.4505 n = 9
18.
Chapter 13: Simple Regression and Correlation Analysis 18
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
9
)573.323(
93413.640,11
9
)9.824)(573.323(
4505.804,29
2
−
−
=
b1 = 19.2204
b0 =
9
573.323
)2204.19(
9
9.824
1 −=−
∑∑
n
x
b
n
y
= -599.3674
yˆ = -599.3674 + 19.2204 x
SSE = Σy2
– b0Σy - b1ΣXY =
79,718.79 – (-599.3674)(824.9) – 19.2204(29,804.4505) = 1283.13435
7
13435.1283
2
=
−
=
n
SSE
se = 13.539
r2
=
9
)9.824(
79.718,79
13435.1283
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .688
13.32 sb =
5
)89(
833.1
94.3
)( 22
2
−
=
−∑
∑
n
x
x
se
= .2498
b1 = 0.162
Ho: β = 0 α = .05
Ha: β ≠ 0
This is a two-tail test, α/2 = .025 df = n - 2 = 5 - 2 = 3
t.025,3 = ±3.182
t =
2498.
0162.011 −
=
−
bs
b β
= 0.65
Since the observed t = 0.65 < t.025,3 = 3.182, the decision is to fail to reject the
null hypothesis.
19.
Chapter 13: Simple Regression and Correlation Analysis 19
13.33 sb =
7
)571(
293,58
376.7
)( 22
2
−
=
−∑
∑
n
x
x
se
= .068145
b1 = -0.898
Ho: β = 0 α = .01
Ha: β ≠ 0
Two-tail test, α/2 = .005 df = n - 2 = 7 - 2 = 5
t.005,5 = ±4.032
t =
068145.
0898.011 −−
=
−
bs
b β
= -13.18
Since the observed t = -13.18 < t.005,5 = -4.032, the decision is to reject the null
hypothesis.
13.34 sb =
8
)5.199(
15.667,7
7.108
)( 22
2
−
=
−∑
∑
n
x
x
se
= 2.095
b1 = 15.240
Ho: β = 0 α = .10
Ha: β ≠ 0
For a two-tail test, α/2 = .05 df = n - 2 = 8 - 2 = 6
t.05,6 = 1.943
t =
095.2
0240,1511 −
=
−
bs
b β
= 7.27
Since the observed t = 7.27 > t.05,6 = 1.943, the decision is to reject the null
hypothesis.
20.
Chapter 13: Simple Regression and Correlation Analysis 20
13.35 sb =
5
)41(
421
575.2
)( 22
2
−
=
−∑
∑
n
x
x
se
= .27963
b1 = -0.715
Ho: β = 0 α = .05
Ha: β ≠ 0
For a two-tail test, α/2 = .025 df = n - 2 = 5 - 2 = 3
t.025,3 = ±3.182
t =
27963.
0715.011 −−
=
−
bs
b β
= -2.56
Since the observed t = -2.56 > t.025,3 = -3.182, the decision is to fail to reject the
null hypothesis.
13.36 Analysis of Variance
SOURCE df SS MS F
Regression 1 5,165 5,165.00 1.95
Error 7 18,554 2,650.57
Total 8 23,718
Let α = .05 F.05,1,7 = 5.59
Since observed F = 1.95 < F.05,1,7 = 5.59, the decision is to fail to reject the null
hypothesis. There is no overall predictability in this model.
t = 95.1=F = 1.40
t.025,7 = 2.365
Since t = 1.40 < t.025,7 = 2.365, the decision is to fail to reject the null
hypothesis.
The slope is not significantly different from zero.
21.
Chapter 13: Simple Regression and Correlation Analysis 21
13.37 F = 8.26 with a p-value of .021. The overall model is significant at α = .05 but not
at α = .01. For simple regression,
t = F = 2.8674
t.05,5 = 2.015 but t.01,5 = 3.365. The slope is significant at α = .05 but not at
α = .01.
13.38 x0 = 25
95% confidence α/2 = .025
df = n - 2 = 5 - 2 = 3 t.025,3 = ±3.182
5
89
==
∑
n
x
x = 17.8
Σx = 89 Σx2
= 1,833
se = 3.94
yˆ = 16.5 + 0.162(25) = 20.55
yˆ ± t /2,n-2 se
∑
∑−
−
+
n
x
x
xx
n 2
2
2
0
)(
)(1
20.55 ± 3.182(3.94)
5
)89(
833,1
)8.1725(
5
1
2
2
−
−
+ = 20.55 ± 3.182(3.94)(.63903) =
20.55 ± 8.01
12.54 < E(y25) < 28.56
22.
Chapter 13: Simple Regression and Correlation Analysis 22
13.39 x0 = 100 For 90% confidence, α/2 = .05
df = n - 2 = 7 - 2 = 5 t.05,5 = ±2.015
7
571
==
∑
n
x
x = 81.57143
Σx= 571 Σx2
= 58,293 Se = 7.377
yˆ = 144.414 - .0898(100) = 54.614
yˆ ± t /2,n-2 se
∑
∑−
−
++
n
x
x
xx
n 2
2
2
0
)(
)(1
1 =
54.614 ± 2.015(7.377)
7
)571(
293,58
)57143.81100(
7
1
1 2
2
−
−
++ =
54.614 ± 2.015(7.377)(1.08252) = 54.614 ± 16.091
38.523 < y < 70.705
For x0 = 130, yˆ = 144.414 - .0898(130) = 27.674
y ± t /2,n-2 se
∑
∑−
−
++
n
x
x
xx
n 2
2
2
0
)(
)(1
1 =
27.674 ± 2.015(7.377)
7
)571(
293,58
)57143.81130(
7
1
1 2
2
−
−
++ =
27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227
10.447 < y < 44.901
The width of this confidence interval of y for x0 = 130 is wider that the
confidence interval of y for x0 = 100 because x0 = 100 is nearer to the value of
x = 81.57 than is x0 = 130.
23.
Chapter 13: Simple Regression and Correlation Analysis 23
13.40 x0 = 20 For 98% confidence, α/2 = .01
df = n - 2 = 8 - 2 = 6 t.01,6 = 3.143
8
5.199
==
∑
n
x
x = 24.9375
Σx = 199.5 Σx2
= 7,667.15 Se = 108.8
yˆ = -46.29 + 15.24(20) = 258.51
yˆ ± t /2,n-2 se
∑
∑−
−
+
n
x
x
xx
n 2
2
2
0
)(
)(1
258.51 ± (3.143)(108.8)
8
)5.199(
15.667,7
)9375.2420(
8
1
2
2
−
−
+
258.51 ± (3.143)(108.8)(0.36614) = 258.51 ± 125.20
133.31 < E(y20) < 383.71
For single y value:
yˆ ± t /2,n-2 se
∑
∑−
−
++
n
x
x
xx
n 2
2
2
0
)(
)(1
1
258.51 ± (3.143)(108.8)
8
)5.199(
15.667,7
)9375.2420(
8
1
1 2
2
−
−
++
258.51 ± (3.143)(108.8)(1.06492) = 258.51 ± 364.16
-105.65 < y < 622.67
The confidence interval for the single value of y is wider than the confidence
interval for the average value of y because the average is more towards the
middle and individual values of y can vary more than values of the average.
24.
Chapter 13: Simple Regression and Correlation Analysis 24
13.41 x0 = 10 For 99% confidence α/2 = .005
df = n - 2 = 5 - 2 = 3 t.005,3 = 5.841
5
41
==
∑
n
x
x = 8.20
Σx = 41 Σx2
= 421 Se = 2.575
yˆ = 15.46 - 0.715(10) = 8.31
yˆ ± t /2,n-2 se
∑
∑−
−
+
n
x
x
xx
n 2
2
2
0
)(
)(1
8.31 ± 5.841(2.575)
5
)41(
421
)2.810(
5
1
2
2
−
−
+ =
8.31 ± 5.841(2.575)(.488065) = 8.31 ± 7.34
0.97 < E(y10) < 15.65
If the prime interest rate is 10%, we are 99% confident that the average bond rate
is between 0.97% and 15.65%.
25.
Chapter 13: Simple Regression and Correlation Analysis 25
13.42 x y
5 8
7 9
3 11
16 27
12 15
9 13
Σx = 52 Σx2
= 564
Σy = 83 Σy2
= 1,389 b1 = 1.2853
Σxy = 865 n = 6 b0 = 2.6941
a) yˆ = 2.6941 + 1.2853 x
b) yˆ (Predicted Values) (y- yˆ ) residuals
9.1206 -1.1206
11.6912 -2.6912
6.5500 4.4500
23.2588 3.7412
18.1176 -3.1176
14.2618 -1.2618
c) (y- yˆ )2
1.2557
7.2426
19.8025
13.9966
9.7194
1.5921
SSE = 53.6089
4
6089.53
2
=
−
=
n
SSE
se = 3.661
d) r2
=
6
)83(
389,1
6089.53
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .777
26.
Chapter 13: Simple Regression and Correlation Analysis 26
e) Ho: β = 0 α = .01
Ha: β ≠ 0
Two-tailed test, α/2 = .005 df = n - 2 = 6 - 2 = 4
t.005,4 = ±4.604
sb =
6
)52(
564
661.3
)( 22
2
−
=
−∑
∑
n
x
x
se
= .34389
t =
34389.
02853.111 −
=
−
bs
b β
= 3.74
Since the observed t = 3.74 < t.005,4 = 4.604, the decision is to fail to reject the
null hypothesis.
f) The r2
= 77.74% is modest. There appears to be some prediction with this model.
The slope of the regression line is not significantly different from zero using α =
.01. However, for α = .05, the null hypothesis of a zero slope is rejected. The
standard error of the estimate, se = 3.661 is not particularly small given the range
of values for y (11 - 3 = 8).
13.43 x y
53 5
47 5
41 7
50 4
58 10
62 12
45 3
60 11
Σx = 416 Σx2
= 22,032
Σy = 57 Σy2
= 489 b1 = 0.355
Σxy = 3,106 n = 8 b0 = -11.335
a) yˆ = -11.335 + 0.355 x
27.
Chapter 13: Simple Regression and Correlation Analysis 27
b) yˆ (Predicted Values) (y- yˆ ) residuals
7.48 -2.48
5.35 -0.35
3.22 3.78
6.415 -2.415
9.255 0.745
10.675 1.325
4.64 -1.64
9.965 1.035
c) (y- yˆ )2
6.1504
.1225
14.2884
5.8322
.5550
1.7556
2.6896
1.0712
SSE = 32.4649
d) se =
6
4649.32
2
=
−
=
n
SSE
se = 2.3261
e) r2
=
8
)57(
489
4649.32
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= .608
f) Ho: β = 0 α = .05
Ha: β ≠ 0
Two-tailed test, α/2 = .025 df = n - 2 = 8 - 2 = 6
t.025,6 = ±2.447
sb =
8
)416(
032,22
3261.2
)( 22
2
−
=
−∑
∑
n
x
x
se
= 0.116305
28.
Chapter 13: Simple Regression and Correlation Analysis 28
t =
116305.
03555.011 −
=
−
bs
b β
= 3.05
Since the observed t = 3.05 > t.025,6 = 2.447, the decision is to reject the null
hypothesis.
The population slope is different from zero.
g) This model produces only a modest r2
= .608. Almost 40% of the variance of Y is
unaccounted for by X. The range of Y values is 12 - 3 = 9 and the standard error
of the estimate is 2.33. Given this small range, the se is not small.
13.44 Σx = 1,263 Σx2
= 268,295
Σy = 417 Σy2
= 29,135
Σxy = 88,288 n = 6
b0 = 25.42778 b1 = 0.209369
SSE = Σy2
- b0Σy - b1Σxy =
29,135 - (25.42778)(417) - (0.209369)(88,288) = 46.845468
r2
=
5.153
845468.46
1
)(
1 2
2
−=
−
−
∑
∑
n
y
y
SSE
= .695
Coefficient of determination = r2
= .695
13.45 a) x0 = 60
Σx = 524 Σx2
= 36,224
Σy = 215 Σy2
= 6,411 b1 = .5481
Σxy = 15,125 n = 8 b0 = -9.026
se = 3.201 95% Confidence Interval α/2 = .025
df = n - 2 = 8 - 2 = 6
t.025,6 = ±2.447
yˆ = -9.026 + 0.5481(60) = 23.86
29.
Chapter 13: Simple Regression and Correlation Analysis 29
8
524
==
∑
n
x
x = 65.5
yˆ ± tα /2,n-2 se
∑
∑−
−
+
n
x
x
xx
n 2
2
2
0
)(
)(1
23.86 + 2.447((3.201)
8
)524(
224,36
)5.6560(
8
1
2
2
−
−
+
23.86 + 2.447(3.201)(.375372) = 23.86 + 2.94
20.92 < E(y60) < 26.8
b) x0 = 70
yˆ 70 = -9.026 + 0.5481(70) = 29.341
yˆ + tα/2,n-2 se
∑
∑−
−
++
n
x
x
xx
n 2
2
2
0
)(
)(1
1
29.341 + 2.447(3.201)
8
)524(
224,36
)5.6570(
8
1
1 2
2
−
−
++
29.341 + 2.447(3.201)(1.06567) = 29.341 + 8.347
20.994 < y < 37.688
c) The confidence interval for (b) is much wider because part (b) is for a single value
of y which produces a much greater possible variation. In actuality, x0 = 70 in
part (b) is slightly closer to the mean (x) than x0 = 60. However, the width of the
single interval is much greater than that of the average or expected y value in
part (a).
30.
Chapter 13: Simple Regression and Correlation Analysis 30
13.46 Σy = 267 Σy2
= 15,971
Σx = 21 Σx2
= 101
Σxy = 1,256 n = 5
b0 = 9.234375 b1 = 10.515625
SSE = Σy2
- b0Σy - b1Σxy =
15,971 - (9.234375)(267) - (10.515625)(1,256) = 297.7969
r2
=
2.713,1
7969.297
1
)(
1 2
2
−=
−
−
∑
∑
n
y
y
SSE
= .826
If a regression model would have been developed to predict number of cars sold
by the number of sales people, the model would have had an r2
of 82.6%. The
same would hold true for a model to predict number of sales people by the
number of cars sold.
13.47 n = 12 Σx = 548 Σx2
= 26,592
Σy = 5940 Σy2
= 3,211,546 Σxy = 287,908
b1 = 10.626383 b0 = 9.728511
yˆ = 9.728511 + 10.626383 x
SSE = Σy2
- b0Σy - b1Σxy =
3,211,546 - (9.728511)(5940) - (10.626383)(287,908) = 94337.9762
10
9762.337,94
2
=
−
=
n
SSE
se = 97.1277
r2
=
246,271
9762.337,94
1
)(
1 2
2
−=
−
−
∑
∑
n
y
y
SSE
= .652
31.
Chapter 13: Simple Regression and Correlation Analysis 31
t =
12
)548(
592,26
1277.97
0626383.10
2
−
−
= 4.33
If α = .01, then t.005,10 = 3.169. Since the observed t = 4.33 > t.005,10 = 3.169, the
decision is to reject the null hypothesis.
13.48 Sales(y) Number of Units(x)
17.1 12.4
7.9 7.5
4.8 6.8
4.7 8.7
4.6 4.6
4.0 5.1
2.9 11.2
2.7 5.1
2.7 2.9
Σy = 51.4 Σy2
= 460.1 Σx = 64.3
Σx2
= 538.97 Σxy = 440.46 n = 9
b1 = 0.92025 b0 = -0.863565
SSE = Σy2
- b0Σy - b1Σxy =
460.1 - (-0.863565)(51.4) - (0.92025)(440.46) =
r2
=
55.166
153926.99
1
)(
1 2
2
−=
−
−
∑
∑
n
y
y
SSE
= .405
32.
Chapter 13: Simple Regression and Correlation Analysis 32
13.49 1977 2000
581 571
213 220
668 492
345 221
1476 1760
1776 5750
Σx= 5059 Σy = 9014 Σx2
= 6,280,931
Σy2
= 36,825,446 Σxy = 13,593,272 n = 6
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
6
)5059(
931,280,6
6
)9014)(5059(
272,593,13
2
−
−
= 2.97366
b0 =
6
5059
)97366.2(
6
9014
1 −=−
∑∑
n
x
b
n
y
= -1004.9575
yˆ = -1004.9575 + 2.97366 x
for x = 700:
yˆ = 1076.6044
yˆ + tα/2,n-2se
∑
∑−
−
+
n
x
x
xx
n 2
2
2
0
)(
)(1
α = .05, t.025,4 = 2.776
x0 = 700, n = 6
x = 843.167
SSE = Σy2
– b0Σy –b1Σxy =
36,825,446 – (-1004.9575)(9014) – (2.97366)(13,593,272) = 5,462,363.69
33.
Chapter 13: Simple Regression and Correlation Analysis 33
4
69.363,462,5
2
=
−
=
n
SSE
se = 1168.585
Confidence Interval =
1076.6044 + (2.776)(1168.585)
6
)5059(
931,280,6
)167.843700(
6
1
2
2
−
−
+ =
1076.6044 + 1364.1632
-287.5588 to 2440.7676
H0: β1 = 0
Ha: β1 ≠ 0
α = .05 df = 4
Table t.025,4 = 2.132
t =
8231614.
9736.2
833.350,015,2
585.1168
09736.201
=
−
=
−
bs
b
= 3.6124
Since the observed t = 3.6124 > t.025,4 = 2.132, the decision is to reject the null
hypothesis.
13.50 Σx = 11.902 Σx2
= 25.1215
Σy = 516.8 Σy2
= 61,899.06 b1 = 66.36277
Σxy = 1,202.867 n = 7 b0 = -39.0071
yˆ = -39.0071 + 66.36277 x
SSE = Σy2
- b0 Σy - b1 Σxy
SSE = 61,899.06 - (-39.0071)(516.8) - (66.36277)(1,202.867)
SSE = 2,232.343
5
343.232,2
2
=
−
=
n
SSE
se = 21.13
34.
Chapter 13: Simple Regression and Correlation Analysis 34
r2
=
7
)8.516(
06.899,61
343.232,2
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= 1 - .094 = .906
13.51 Σx = 44,754 Σy = 17,314 Σx2
= 167,540,610
Σy2
= 24,646,062 n = 13 Σxy = 59,852,571
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
13
)754,44(
610,540,167
13
)314,17)(754,44(
571,852,59
2
−
−
= .01835
b0 =
13
754,44
)01835(.
13
314,17
1 −=−
∑∑
n
x
b
n
y
= 1268.685
yˆ = 1268.685 + .01835 x
r2
for this model is .002858. There is no predictability in this model.
Test for slope: t = 0.18 with a p-value of 0.8623. Not significant
13.52 Σx = 323.3 Σy = 6765.8
Σx2
= 29,629.13 Σy2
= 7,583,144.64
Σxy = 339,342.76 n = 7
b1 =
∑
∑
∑
∑ ∑
−
−
=
n
x
x
n
yx
xy
SS
SS
x
xy
2
2
)(
=
7
)3.323(
13.629,29
7
)8.6765)(3.323(
76.342,339
2
−
−
= 1.82751
b0 =
7
3.323
)82751.1(
7
8.6765
1 −=−
∑∑
n
x
b
n
y
= 882.138
yˆ = 882.138 + 1.82751 x
SSE = Σy2
–b0Σy –b1Σxy
35.
Chapter 13: Simple Regression and Correlation Analysis 35
= 7,583,144.64 –(882.138)(6765.8) –(1.82751)(339,342.76) = 994,623.07
5
07.623,994
2
=
−
=
n
SSE
se = 446.01
r2
=
7
)8.6765(
64.144,583,7
07.623,994
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= 1 - .953 = .047
H0: β = 0
Ha: β ≠ 0 α = .05 t.025,5 = 2.571
SSxx =
( )
7
)3.323(
13.629,29
2
2
2
−=−∑
∑
n
x
x = 14,697.29
t =
29.697,14
01.446
082751.101 −
=
−
xx
e
SS
s
b
= 0.50
Since the observed t = 0.50 < t.025,5 = 2.571, the decision is to fail to reject the
null hypothesis.
13.53 Let Water use = y and Temperature = x
Σx = 608 Σx2
= 49,584
Σy = 1,025 Σy2
= 152,711 b1 = 2.40107
Σxy = 86,006 n = 8 b0 = -54.35604
yˆ = -54.35604 + 2.40107 x
yˆ 100 = -54.35604 + 2.40107(100) = 185.751
SSE = Σy2
- b0 Σy - b1 Σxy
SSE = 152,711 - (-54.35604)(1,025) - (2.40107)(86,006) = 1919.5146
6
5146.919,1
2
=
−
=
n
SSE
se = 17.886
36.
Chapter 13: Simple Regression and Correlation Analysis 36
r2
=
8
)1025(
711,152
5145.919,1
1
)(
1 22
2
−
−=
−
−
∑
∑
n
y
y
SSE
= 1 - .09 = .91
Testing the slope:
Ho: β = 0
Ha: β ≠ 0 α = .01
Since this is a two-tailed test, α/2 = .005
df = n - 2 = 8 - 2 = 6
t.005,6 = ±3.707
sb =
8
)608(
584,49
886.17
)( 22
2
−
=
−∑
∑
n
x
x
se
= .30783
t =
30783.
040107.211 −
=
−
bs
b β
= 7.80
Since the observed t = 7.80 < t.005,6 = 3.707, the decision is to reject the null
hypothesis.
13.54 a) The regression equation is: yˆ = 67.2 – 0.0565 x
b) For every unit of increase in the value of x, the predicted value of y will
decrease by -.0565.
c) The t ratio for the slope is –5.50 with an associated p-value of .000. This is
significant at α = .10. The t ratio negative because the slope is negative and
the numerator of the t ratio formula equals the slope minus zero.
d) r2
is .627 or 62.7% of the variability of y is accounted for by x. This is only
a modest proportion of predictability. The standard error of the estimate is
10.32. This is best interpreted in light of the data and the magnitude of the
data.
e) The F value which tests the overall predictability of the model is 30.25. For
simple regression analysis, this equals the value of t2
which is (-5.50)2
.
f) The negative is not a surprise because the slope of the regression line is also
37.
Chapter 13: Simple Regression and Correlation Analysis 37
negative indicating an inverse relationship between x and y. In addition,
taking the square root of r2
which is .627 yields .7906 which is the magnitude
of the value of r considering rounding error.
13.55 The F value for overall predictability is 7.12 with an associated p-value of .0205
which is significant at α = .05. It is not significant at alpha of .01. The
coefficient of determination is .372 with an adjusted r2
of .32. Thisrepresents
very modest predictability. The standard error of the estimate is 982.219 which in
units of 1,000 laborers means that about 68% of the predictions are within
982,219 of the actual figures. The regression model is:
Number of Union Members = 22,348.97 - 0.0524 Labor Force. For a labor force
of 100,000 (thousand, actually 100 million), substitute x = 100,000 and get a
predicted value of 17,108.97 (thousand) which is actually 17,108,970 union
members.
13.56 The Residual Model Diagnostics from MINITAB indicate a relatively healthy set
of residuals. The Histogram indicates that the error terms are generally normally
distributed. This is confirmed by the nearly straight line Normal Plot of
Residuals. The I Chart indicates a relatively homogeneous set of error terms
throughout the domain of x values. This is confirmed by the Residuals vs. Fits
graph. This residual diagnosis indicates no assumption violations.
Views
Actions
Embeds 0
Report content