Relentless Regression

By Nicholas Brooks
3/15/17
Using R's built-in mtcars dataset, I will apply descriptive and inferential statistical
methods to find out whether any significant relationships exist between miles per gallon
(mpg) and the other variables in the dataset.
> datasets::mtcars
> mc<-mtcars
> head(mc,5)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
> str(mc)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> summary(mc$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.42 19.20 20.09 22.80 33.90
plot(mc$wt,mc$mpg,xlab="weight",ylab="mpg",main="weight and mpg comparison",col="blue")

The scatter plot suggests a strong negative relationship between weight and
mpg.
cor.test(mc$wt, mc$mpg)
Pearson's product-moment correlation
data: mc$wt and mc$mpg
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9338264 -0.7440872
sample estimates:
cor
-0.8676594
The correlation test above supports a strong negative relationship between
weight and mpg (r = -0.87, p < 0.001): as the weight of the car increases,
mpg tends to decrease.
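As a check on the reported test statistic, the t value can be recomputed from the sample correlation with t = r * sqrt(n - 2) / sqrt(1 - r^2). The following is a minimal sketch on the built-in mtcars data; the names r, n, t_stat, and p_value are illustrative and do not appear in the original analysis.

```r
# Recompute the Pearson correlation test for wt vs. mpg by hand
r <- cor(mtcars$wt, mtcars$mpg)               # sample correlation
n <- nrow(mtcars)                             # number of cars
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t statistic on n - 2 df
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value
c(r = r, t = t_stat, df = n - 2, p = p_value)
```

The values should agree with the cor.test output above, which is a quick way to confirm what the test is actually measuring.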
[Figure: scatter plot "weight and mpg comparison"; x-axis: weight, y-axis: mpg]

plot(mc$hp, mc$mpg,xlab="horse power",ylab="mpg",main="horse power and mpg comparison",col="green")

cor.test(mc$hp, mc$mpg)
data: mc$hp and mc$mpg
t = -6.7424, df = 30, p-value = 1.788e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8852686 -0.5860994
sample estimates:
cor
-0.7761684
The plot suggests a negative relationship between horsepower and mpg. The
correlation test between these two variables provides sufficient evidence of
a strong negative relationship (r = -0.78): as horsepower increases, mpg
tends to decrease.
[Figure: scatter plot "horse power and mpg comparison"; x-axis: horse power, y-axis: mpg]

plot(mc$disp, mc$mpg,xlab="displacement",ylab="mpg",main="displacement and mpg comparison",col="red")

cor.test(mc$disp, mc$mpg)
data: mc$disp and mc$mpg
t = -8.7472, df = 30, p-value = 9.38e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9233594 -0.7081376
sample estimates:
cor
-0.8475514
The plot and the correlation test between displacement (disp) and mpg both
indicate a strong negative relationship (r = -0.85): as displacement
increases, mpg tends to decrease.
[Figure: scatter plot "displacement and mpg comparison"; x-axis: displacement, y-axis: mpg]

plot(mc$drat, mc$mpg,xlab="drat",ylab="mpg",main="drat and mpg comparison",col="black")
cor.test(mc$drat, mc$mpg)
data: mc$drat and mc$mpg
t = 5.096, df = 30, p-value = 1.776e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4360484 0.8322010
sample estimates:
cor
0.6811719
The plot and correlation test above indicate a moderately strong positive
relationship between the rear axle ratio (drat) and mpg (r = 0.68): as drat
increases, mpg tends to increase.
[Figure: scatter plot "drat and mpg comparison"; x-axis: drat, y-axis: mpg]

plot(mc$qsec, mc$mpg,xlab="qsec",ylab="mpg",main="qsec and mpg comparison",col="black")
cor.test(mc$qsec, mc$mpg)
data: mc$qsec and mc$mpg
t = 2.5252, df = 30, p-value = 0.01708
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.08195487 0.66961864
sample estimates:
cor
0.418684
The plot and correlation test indicate a weak-to-moderate positive
relationship between quarter-mile time (qsec) and mpg (r = 0.42, p = 0.017).
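The five pairwise tests above can be summarized in one pass. The sketch below computes the correlation of mpg with each continuous variable and sorts by absolute strength; the names vars, r, and r_sorted are illustrative and not part of the original analysis.

```r
# Correlation of mpg with each continuous predictor, strongest first
vars <- c("wt", "disp", "hp", "drat", "qsec")
r <- sapply(vars, function(v) cor(mtcars[[v]], mtcars$mpg))
r_sorted <- r[order(abs(r), decreasing = TRUE)]
round(r_sorted, 3)
```

This gives only the point estimates; the individual cor.test calls are still needed for the confidence intervals and p-values.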
[Figure: scatter plot "qsec and mpg comparison"; x-axis: qsec, y-axis: mpg]

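factor.cyl <- factor(mc$cyl)  # cyl as a categorical variable; this definition is assumed, as the transcript does not show it
boxplot(mc$mpg~factor.cyl,xlab="cylinder",ylab="mpg",main="mpg and cylinder comparison",col=c(3,5,7))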
This box plot suggests that mpg decreases as the number of cylinders
increases.
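The pattern in the box plot can also be checked numerically. The sketch below computes the mean and median mpg for each cylinder count; cyl_f, mean_mpg, and median_mpg are illustrative names, not objects from the original analysis.

```r
# Mean and median mpg per cylinder group (4, 6, 8)
cyl_f <- factor(mtcars$cyl)
mean_mpg <- tapply(mtcars$mpg, cyl_f, mean)
median_mpg <- tapply(mtcars$mpg, cyl_f, median)
rbind(mean = mean_mpg, median = median_mpg)
```

If both rows decrease from left to right, the group summaries agree with the visual impression from the box plot.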
[Figure: box plot "mpg and cylinder comparison"; x-axis: cylinder, y-axis: mpg]

The data visualization has shown indications that relationships exist between
mpg and the other variables, as well as substantial variation in mpg across
them. I will now construct a regression model to identify which independent
variables are statistically significant predictors of the dependent variable
mpg. The model should also help explain how much of the variation in mpg is
predictable from the independent variables. I will use forward selection,
backward elimination, and stepwise regression to construct a regression
model with each method and then compare their results to determine which
model best fits the data.
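The factor.cyl, factor.vs, factor.am, factor.gear, and factor.carb objects used in the commands that follow are not defined anywhere in the transcript. A reasonable assumption, consistent with the degrees of freedom in the output, is that each was created with factor(). The sketch below makes that assumption explicit by converting the columns inside a data frame (here called cars, a hypothetical name) and running the first forward-selection step with add1.

```r
# Assumption: the categorical variables were converted with factor()
cars <- within(mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am)
  gear <- factor(gear)
  carb <- factor(carb)
})

# First forward-selection step: test each candidate against the null model
null_fit <- lm(mpg ~ 1, data = cars)
step1 <- add1(null_fit,
              scope = ~ disp + hp + drat + wt + qsec + cyl + vs + am + gear + carb,
              test = "F")

# Candidate with the smallest p-value (which.min skips the NA in the <none> row)
best <- rownames(step1)[which.min(step1[["Pr(>F)"]])]
best
```

Passing data = cars keeps the term names short (wt rather than mc$wt) and makes later calls to update() and predict() easier, but the fitted values are the same either way.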
add1(lm(mc$mpg~1),scope=(~.+mc$disp+mc$hp+mc$drat+mc$wt+mc$qsec+factor.cyl+
factor.vs+factor.am+factor.gear+factor.carb),test="F")
Single term additions
Model:
mc$mpg ~ 1
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 1126.05 115.943
mc$disp 1 808.89 317.16 77.397 76.5127 9.380e-10 ***
mc$hp 1 678.37 447.67 88.427 45.4598 1.788e-07 ***
mc$drat 1 522.48 603.57 97.988 25.9696 1.776e-05 ***
mc$wt 1 847.73 278.32 73.217 91.3753 1.294e-10 ***
mc$qsec 1 197.39 928.66 111.776 6.3767 0.0170820 *
factor.cyl 2 824.78 301.26 77.752 39.6975 4.979e-09 ***
factor.vs 1 496.53 629.52 99.335 23.6622 3.416e-05 ***
factor.am 1 405.15 720.90 103.672 16.8603 0.0002850 ***
factor.gear 2 483.24 642.80 102.003 10.9007 0.0002948 ***
factor.carb 5 500.56 625.49 107.129 4.1614 0.0065462 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(mc$mpg~1+mc$wt),scope=(~.+mc$disp+mc$hp+mc$drat+mc$qsec+factor.cyl+
factor.vs+factor.am+factor.gear+factor.carb),test="F")

Model:
mc$mpg ~ 1 + mc$wt
<none> 278.32 73.217
mc$disp 1 31.639 246.68 71.356 3.7195 0.063620 .
mc$hp 1 83.274 195.05 63.840 12.3813 0.001451 **
mc$drat 1 9.081 269.24 74.156 0.9781 0.330854
mc$qsec 1 82.858 195.46 63.908 12.2933 0.001500 **
factor.cyl 2 95.263 183.06 63.810 7.2856 0.002835 **
factor.vs 1 54.228 224.09 68.283 7.0177 0.012926 *
factor.am 1 0.002 278.32 75.217 0.0002 0.987915
factor.gear 2 40.372 237.95 72.202 2.3753 0.111467
factor.carb 5 47.458 230.86 77.235 1.0278 0.422802
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(mc$mpg~1+mc$wt+mc$hp),scope=(~.+mc$disp+mc$drat+mc$qsec+factor.cyl+
factor.vs+factor.am+factor.gear+factor.carb),test="F")
Model:
mc$mpg ~ 1 + mc$wt + mc$hp
<none> 195.05 63.840
mc$disp 1 0.057 194.99 65.831 0.0082 0.92851
mc$drat 1 11.366 183.68 63.919 1.7326 0.19876
mc$qsec 1 8.988 186.06 64.331 1.3527 0.25463
factor.cyl 2 34.270 160.78 61.657 2.8776 0.07364 .
factor.vs 1 6.868 188.18 64.693 1.0219 0.32072

factor.am 1 14.757 180.29 63.323 2.2918 0.14127
factor.gear 2 9.903 185.15 66.173 0.7221 0.49489
factor.carb 5 11.448 183.60 71.905 0.2993 0.90842
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model=lm(mc$mpg~mc$wt+mc$hp)
summary(model)
Call:
lm(formula = mc$mpg ~ mc$wt + mc$hp)
Residuals:
Min 1Q Median 3Q Max
-3.941 -1.600 -0.182 1.050 5.854
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
mc$wt -3.87783 0.63273 -6.129 1.12e-06 ***
mc$hp -0.03177 0.00903 -3.519 0.00145 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
AIC(model)

[1] 156.6523
The model above was constructed using the forward selection method.
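To make the fitted equation concrete, the forward-selection model can be used to predict mpg for a single car. The sketch below uses a hypothetical car weighing 3,000 lb (wt = 3, since wt is recorded in thousands of pounds) with 150 horsepower, and compares predict() with the equation written out from the rounded coefficients in the summary above. The names fit, new_car, pred, and by_hand are illustrative.

```r
# Refit the forward-selection model with a data argument so predict() can be used
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical car: 3,000 lb, 150 hp (not a row of the dataset)
new_car <- data.frame(wt = 3, hp = 150)
pred <- predict(fit, newdata = new_car)

# Same prediction from the rounded coefficients reported in the summary
by_hand <- 37.22727 - 3.87783 * 3 - 0.03177 * 150

c(predict = unname(pred), by_hand = by_hand)
```

Both values come out at roughly 20.8 mpg. Each additional 1,000 lb lowers the predicted mpg by about 3.9, and each additional horsepower lowers it by about 0.03, holding the other variable fixed.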
drop1(lm(mc$mpg~mc$disp+mc$hp+mc$drat+mc$wt+mc$qsec+factor.cyl+factor.vs+
factor.am+factor.gear+factor.carb),test="F")
Single term deletions
Model:
mc$mpg ~ mc$disp + mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl +
factor.vs + factor.am + factor.gear + factor.carb
<none> 120.40 76.403
mc$disp 1 9.9672 130.37 76.948 1.2417 0.28267
mc$hp 1 25.6715 146.07 80.588 3.1982 0.09393 .
mc$drat 1 1.8208 122.22 74.884 0.2268 0.64074
mc$wt 1 25.5541 145.96 80.562 3.1836 0.09462 .
mc$qsec 1 1.2413 121.64 74.732 0.1546 0.69967
factor.cyl 2 10.9314 131.33 75.184 0.6809 0.52112
factor.vs 1 3.6299 124.03 75.354 0.4522 0.51151
factor.am 1 1.1420 121.55 74.705 0.1423 0.71132
factor.gear 2 3.9729 124.38 73.442 0.2475 0.78390
factor.carb 5 13.5989 134.00 69.828 0.3388 0.88144
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm(mc$mpg~mc$disp+mc$hp+mc$drat+mc$wt+mc$qsec+factor.cyl+factor.vs+
factor.am+factor.gear),test="F")
Model:
mc$mpg ~ mc$disp + mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl +
factor.vs + factor.am + factor.gear
<none> 134.00 69.828
mc$disp 1 0.9934 135.00 68.064 0.1483 0.70427
mc$hp 1 22.7935 156.79 72.855 3.4020 0.07998 .
mc$drat 1 1.1854 135.19 68.110 0.1769 0.67852
mc$wt 1 19.7963 153.80 72.237 2.9546 0.10107
mc$qsec 1 5.2634 139.26 69.061 0.7856 0.38598
factor.cyl 2 12.5642 146.57 68.696 0.9376 0.40811
factor.vs 1 3.6763 137.68 68.694 0.5487 0.46746
factor.am 1 11.9255 145.93 70.556 1.7799 0.19715
factor.gear 2 5.0215 139.02 67.005 0.3747 0.69220
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm(mc$mpg~mc$hp+mc$drat+mc$wt+mc$qsec+factor.cyl+factor.vs+factor.am+
factor.gear),test="F")
Model:
mc$mpg ~ mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl + factor.vs +
factor.am + factor.gear
<none> 135.00 68.064
mc$hp 1 23.8685 158.86 71.274 3.7130 0.06763 .
mc$drat 1 1.5589 136.55 66.431 0.2425 0.62751
mc$wt 1 27.6318 162.63 72.023 4.2984 0.05064 .
mc$qsec 1 4.6789 139.67 67.154 0.7279 0.40320

factor.cyl 2 18.6303 153.62 68.201 1.4491 0.25732
factor.vs 1 4.6788 139.67 67.154 0.7278 0.40321
factor.am 1 13.5206 148.52 69.119 2.1033 0.16176
factor.gear 2 5.5765 140.57 65.359 0.4337 0.65375
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm(mc$mpg~mc$hp+mc$drat+mc$wt+mc$qsec+factor.cyl+factor.vs+factor.am),
test="F")
Model:
mc$mpg ~ mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl + factor.vs +
factor.am
<none> 140.57 65.359
mc$hp 1 18.566 159.14 67.329 3.0378 0.09470 .
mc$drat 1 0.666 141.24 63.511 0.1090 0.74426
mc$wt 1 38.996 179.57 71.194 6.3804 0.01888 *
mc$qsec 1 2.778 143.35 63.986 0.4545 0.50692
factor.cyl 2 17.987 158.56 65.212 1.4715 0.25040
factor.vs 1 2.644 143.22 63.956 0.4326 0.51726
factor.am 1 16.244 156.81 66.859 2.6578 0.11666
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> drop1(lm(mc$mpg~mc$hp+mc$wt+mc$qsec+factor.cyl+factor.vs+factor.am),test="F")
Model:

mc$mpg ~ mc$hp + mc$wt + mc$qsec + factor.cyl + factor.vs + factor.am
<none> 141.24 63.511
mc$hp 1 18.184 159.42 65.386 3.0899 0.09153 .
mc$wt 1 39.645 180.88 69.428 6.7367 0.01586 *
mc$qsec 1 2.442 143.68 62.059 0.4150 0.52557
factor.cyl 2 18.580 159.82 63.466 1.5786 0.22693
factor.vs 1 2.744 143.98 62.126 0.4663 0.50124
factor.am 1 18.885 160.12 65.527 3.2090 0.08585 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> drop1(lm(mc$mpg~mc$hp+mc$wt+factor.cyl+factor.vs+factor.am),test="F")
Model:
mc$mpg ~ mc$hp + mc$wt + factor.cyl + factor.vs + factor.am
<none> 143.68 62.059
mc$hp 1 36.344 180.02 67.275 6.3238 0.01871 *
mc$wt 1 41.088 184.77 68.108 7.1493 0.01302 *
factor.cyl 2 25.284 168.96 63.246 2.1997 0.13183
factor.vs 1 7.346 151.03 61.655 1.2782 0.26897
factor.am 1 16.443 160.12 63.527 2.8611 0.10317
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> drop1(lm(mc$mpg~mc$hp+mc$wt+factor.cyl+factor.am),test="F")

Model:
mc$mpg ~ mc$hp + mc$wt + factor.cyl + factor.am
<none> 151.03 61.655
mc$hp 1 31.943 182.97 65.794 5.4991 0.026935 *
mc$wt 1 46.173 197.20 68.191 7.9490 0.009081 **
factor.cyl 2 29.265 180.29 63.323 2.5191 0.099998 .
factor.am 1 9.752 160.78 61.657 1.6789 0.206460
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> drop1(lm(mc$mpg~mc$hp+mc$wt+factor.cyl),test="F")
Model:
mc$mpg ~ mc$hp + mc$wt + factor.cyl
<none> 160.78 61.657
mc$hp 1 22.281 183.06 63.810 3.7417 0.0636127 .
mc$wt 1 116.390 277.17 77.084 19.5458 0.0001442 ***
factor.cyl 2 34.270 195.05 63.840 2.8776 0.0736450 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> modelII<-lm(mc$mpg~mc$hp+mc$wt+factor.cyl)
> summary(modelII)
Call:
lm(formula = mc$mpg ~ mc$hp + mc$wt + factor.cyl)

Residuals:
-4.2612 -1.0320 -0.3210 0.9281 5.3947
Coefficients:
(Intercept) 35.84600 2.04102 17.563 2.67e-16 ***
mc$hp -0.02312 0.01195 -1.934 0.063613 .
mc$wt -3.18140 0.71960 -4.421 0.000144 ***
factor.cyl6 -3.35902 1.40167 -2.396 0.023747 *
factor.cyl8 -3.18588 2.17048 -1.468 0.153705
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> AIC(modelII)
[1] 154.4692
The model above was constructed using the backward elimination method.
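The manual drop1 sequence above removes, at each step, the term with the largest p-value and stops once every remaining term has p < 0.10. That stopping threshold is inferred from where the sequence ends and is not stated in the original. Under that assumption, the procedure can be sketched as a loop. The data frame cars and the names alpha, tab, p, term_names, and worst are hypothetical.

```r
# Assumption: categorical variables were converted with factor()
cars <- within(mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am)
  gear <- factor(gear)
  carb <- factor(carb)
})

fit <- lm(mpg ~ disp + hp + drat + wt + qsec + cyl + vs + am + gear + carb,
          data = cars)
alpha <- 0.10   # assumed stopping threshold, inferred from the transcript

repeat {
  tab <- drop1(fit, test = "F")
  p <- tab[["Pr(>F)"]][-1]           # drop the <none> row
  term_names <- rownames(tab)[-1]
  if (length(p) == 0 || max(p) < alpha) break
  worst <- term_names[which.max(p)]  # least significant remaining term
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

formula(fit)
```

On mtcars this loop should retrace the removals shown above and end with hp, wt, and cyl. A different threshold, such as 0.05, would remove further terms, so the choice of alpha materially affects the final model.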
> library(MASS)  # stepAIC() is provided by the MASS package
> modelIII<-
(lm(mc$mpg~mc$disp+mc$hp+mc$drat+mc$wt+mc$qsec+factor.cyl+factor.vs+factor.am+
factor.gear+factor.carb))
> step<-stepAIC(modelIII,direction="both")
Start: AIC=76.4
mc$mpg ~ mc$disp + mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl +
factor.vs + factor.am + factor.gear + factor.carb
Df Sum of Sq RSS AIC

- factor.carb 5 13.5989 134.00 69.828
- factor.gear 2 3.9729 124.38 73.442
- factor.am 1 1.1420 121.55 74.705
- mc$qsec 1 1.2413 121.64 74.732
- mc$drat 1 1.8208 122.22 74.884
- factor.cyl 2 10.9314 131.33 75.184
- factor.vs 1 3.6299 124.03 75.354
<none> 120.40 76.403
- mc$disp 1 9.9672 130.37 76.948
- mc$wt 1 25.5541 145.96 80.562
- mc$hp 1 25.6715 146.07 80.588
Step: AIC=69.83
mc$mpg ~ mc$disp + mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl +
factor.vs + factor.am + factor.gear
- factor.gear 2 5.0215 139.02 67.005
- mc$disp 1 0.9934 135.00 68.064
- mc$drat 1 1.1854 135.19 68.110
- factor.vs 1 3.6763 137.68 68.694
- factor.cyl 2 12.5642 146.57 68.696
- mc$qsec 1 5.2634 139.26 69.061
<none> 134.00 69.828
- factor.am 1 11.9255 145.93 70.556
- mc$wt 1 19.7963 153.80 72.237
- mc$hp 1 22.7935 156.79 72.855
+ factor.carb 5 13.5989 120.40 76.403

Step: AIC=67
mc$mpg ~ mc$disp + mc$hp + mc$drat + mc$wt + mc$qsec + factor.cyl +
factor.vs + factor.am
- mc$drat 1 0.9672 139.99 65.227
- factor.cyl 2 10.4247 149.45 65.319
- mc$disp 1 1.5483 140.57 65.359
- factor.vs 1 2.1829 141.21 65.503
- mc$qsec 1 3.6324 142.66 65.830
<none> 139.02 67.005
- factor.am 1 16.5665 155.59 68.608
- mc$hp 1 18.1768 157.20 68.937
+ factor.gear 2 5.0215 134.00 69.828
- mc$wt 1 31.1896 170.21 71.482
+ factor.carb 5 14.6475 124.38 73.442
Step: AIC=65.23
mc$mpg ~ mc$disp + mc$hp + mc$wt + mc$qsec + factor.cyl + factor.vs +
factor.am
- mc$disp 1 1.2474 141.24 63.511
- factor.vs 1 2.3403 142.33 63.757
- factor.cyl 2 12.3267 152.32 63.927
- mc$qsec 1 3.1000 143.09 63.928
<none> 139.99 65.227

+ mc$drat 1 0.9672 139.02 67.005
- mc$hp 1 17.7382 157.73 67.044
- factor.am 1 19.4660 159.46 67.393
+ factor.gear 2 4.8033 135.19 68.110
- mc$wt 1 30.7151 170.71 69.574
+ factor.carb 5 13.0509 126.94 72.095
Step: AIC=63.51
mc$mpg ~ mc$hp + mc$wt + mc$qsec + factor.cyl + factor.vs + factor.am
- mc$qsec 1 2.442 143.68 62.059
- factor.vs 1 2.744 143.98 62.126
- factor.cyl 2 18.580 159.82 63.466
<none> 141.24 63.511
+ mc$disp 1 1.247 139.99 65.227
+ mc$drat 1 0.666 140.57 65.359
- mc$hp 1 18.184 159.42 65.386
- factor.am 1 18.885 160.12 65.527
+ factor.gear 2 4.684 136.55 66.431
- mc$wt 1 39.645 180.88 69.428
+ factor.carb 5 2.331 138.91 72.978
Step: AIC=62.06
mc$mpg ~ mc$hp + mc$wt + factor.cyl + factor.vs + factor.am
- factor.vs 1 7.346 151.03 61.655

<none> 143.68 62.059
- factor.cyl 2 25.284 168.96 63.246
+ mc$qsec 1 2.442 141.24 63.511
- factor.am 1 16.443 160.12 63.527
+ mc$disp 1 0.589 143.09 63.928
+ mc$drat 1 0.330 143.35 63.986
+ factor.gear 2 3.437 140.24 65.284
- mc$hp 1 36.344 180.02 67.275
- mc$wt 1 41.088 184.77 68.108
+ factor.carb 5 3.480 140.20 71.275
Step: AIC=61.65
mc$mpg ~ mc$hp + mc$wt + factor.cyl + factor.am
<none> 151.03 61.655
- factor.am 1 9.752 160.78 61.657
+ factor.vs 1 7.346 143.68 62.059
+ mc$qsec 1 7.044 143.98 62.126
- factor.cyl 2 29.265 180.29 63.323
+ mc$disp 1 0.617 150.41 63.524
+ mc$drat 1 0.220 150.81 63.608
+ factor.gear 2 1.361 149.66 65.365
- mc$hp 1 31.943 182.97 65.794
- mc$wt 1 46.173 197.20 68.191
+ factor.carb 5 5.633 145.39 70.438

> model3<-lm(mc$mpg~mc$hp+mc$wt+factor.cyl+factor.am)
> summary(model3)
Call:
lm(formula = mc$mpg ~ mc$hp + mc$wt + factor.cyl + factor.am)
Residuals:
-3.9387 -1.2560 -0.4013 1.1253 5.0513
Coefficients:
(Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
mc$hp -0.03211 0.01369 -2.345 0.02693 *
mc$wt -2.49683 0.88559 -2.819 0.00908 **
factor.cyl6 -3.03134 1.40728 -2.154 0.04068 *
factor.cyl8 -2.16368 2.28425 -0.947 0.35225
factor.am1 1.80921 1.39630 1.296 0.20646
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> AIC(model3)
[1] 154.4669
The model above was constructed using the stepwise regression method.
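The same search can be reproduced without printing every intermediate table. The sketch below uses base R's step(), which performs an AIC-based search comparable to MASS::stepAIC for lm fits, on a data frame with the categorical columns converted to factors. That conversion is an assumption, and cars, full_fit, and both_fit are hypothetical names.

```r
# Assumption: categorical variables were converted with factor()
cars <- within(mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am)
  gear <- factor(gear)
  carb <- factor(carb)
})

full_fit <- lm(mpg ~ disp + hp + drat + wt + qsec + cyl + vs + am + gear + carb,
               data = cars)

# Stepwise search in both directions; trace = 0 suppresses the step-by-step log
both_fit <- step(full_fit, direction = "both", trace = 0)

formula(both_fit)
AIC(both_fit)
```

Note that the AIC values printed during the search (for example 61.65) come from extractAIC, which omits a constant term, so they differ from the AIC() value of the final model even though both rank models the same way.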

Now that I have three regression models built with three different methods,
I can choose the best-fitting one. Below I reprint each model's summary so
the results are easy to compare.
> summary(model)
Call:
lm(formula = mc$mpg ~ mc$wt + mc$hp)
Residuals:
-3.941 -1.600 -0.182 1.050 5.854
Coefficients:
(Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
mc$wt -3.87783 0.63273 -6.129 1.12e-06 ***
mc$hp -0.03177 0.00903 -3.519 0.00145 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(modelII)
Call:
lm(formula = mc$mpg ~ mc$hp + mc$wt + factor.cyl)
Residuals:
-4.2612 -1.0320 -0.3210 0.9281 5.3947
Coefficients:
(Intercept) 35.84600 2.04102 17.563 2.67e-16 ***
mc$hp -0.02312 0.01195 -1.934 0.063613 .
mc$wt -3.18140 0.71960 -4.421 0.000144 ***
factor.cyl6 -3.35902 1.40167 -2.396 0.023747 *
factor.cyl8 -3.18588 2.17048 -1.468 0.153705
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(model3)
Call:
lm(formula = mc$mpg ~ mc$hp + mc$wt + factor.cyl + factor.am)
Residuals:
-3.9387 -1.2560 -0.4013 1.1253 5.0513
Coefficients:
(Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
mc$hp -0.03211 0.01369 -2.345 0.02693 *
mc$wt -2.49683 0.88559 -2.819 0.00908 **
factor.cyl6 -3.03134 1.40728 -2.154 0.04068 *
factor.cyl8 -2.16368 2.28425 -0.947 0.35225
factor.am1 1.80921 1.39630 1.296 0.20646
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> AIC(model)
[1] 156.6523
> AIC(modelII)
[1] 154.4692
> AIC(model3)
[1] 154.4669
After examining and comparing the three regression models, I conclude that
weight explains the most variability in mpg in all three, with horsepower
second. model3 has the lowest AIC (154.4669), although only marginally lower
than modelII (154.4692). AIC measures relative model quality by balancing
goodness of fit against the number of parameters; it does not address
multicollinearity. model3 also has the highest adjusted R-squared, explaining
approximately 84% of the variation in mpg. Note, however, that the factor.am
term added in model3 is not statistically significant (p = 0.206).
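The comparison can be collected into a single table. The sketch below refits the three final models on a data frame with cyl and am converted to factors (an assumption, as noted earlier) and reports AIC and adjusted R-squared side by side. The names cars, fits, and comparison are illustrative.

```r
# Assumption: cyl and am were treated as factors in the original models
cars <- within(mtcars, {
  cyl <- factor(cyl)
  am  <- factor(am)
})

fits <- list(
  forward  = lm(mpg ~ wt + hp, data = cars),
  backward = lm(mpg ~ hp + wt + cyl, data = cars),
  stepwise = lm(mpg ~ hp + wt + cyl + am, data = cars)
)

comparison <- data.frame(
  AIC    = sapply(fits, AIC),
  adj_R2 = sapply(fits, function(f) summary(f)$adj.r.squared)
)
round(comparison, 4)
```

Because the backward and stepwise models differ in AIC by only a few thousandths, the simpler backward model may be a defensible choice as well.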
