Moving Beyond Linearity [ISLR.2013.Ch7-7]
Theodore Grammatikopoulos∗
Tue 6th Jan, 2015
Abstract
Linear models are relatively simple to describe and implement, and have advantages
over other approaches in terms of interpretation and inference. However, standard
linear regression can have significant limitations in terms of predictive power. This is
because the linearity assumption is almost always an approximation, and sometimes a
poor one. In article 6 we saw that we can improve Least Squares Estimation methods by
using Ridge Regression, the Lasso, Principal Components Regression (PCR), Partial Least
Squares (PLS), and other techniques. In that setting, the improvement is obtained by
reducing the complexity of the linear model, and hence the variance of the estimates.
But we are still using a linear model, which can only be improved so far. Here,
we relax the linearity assumption while still attempting to maintain as much
interpretability as possible. We do this by examining very simple extensions of linear
models like Polynomial Regression and Piece-wise Step Functions, as well as more
sophisticated approaches such as Splines, Local Regression, and Generalized Additive
Models (GAMs).
## OTN License Agreement: Oracle Technology Network - Developer
## Oracle Distribution of R version 3.0.1 (--) Good Sport
## Copyright (C) The R Foundation for Statistical Computing
## Platform: x86_64-unknown-linux-gnu (64-bit)
∗e-mail: tgrammat@gmail.com
1 Non-Linear Modeling
In this lab we re-analyze the Wage data set considered in the examples throughout Chapter
7 of “ISLR.2013” [James et al., 2013]. We will see that many of the complex non-linear
fitting procedures discussed in this Chapter can be easily implemented in R.
library(ISLR)
attach(Wage)
1.1 Polynomial Regression
As a first attempt to describe employees’ wage in terms of their age, we will try a
polynomial regression fit. We will determine the best polynomial degree to do so and will
afterwards examine to what extent this could be successful.
The reason to search for a non-linear fit to describe the wage ∼ age dependence is
almost apparent from the corresponding plot of the two variables, shown in Figure 1 below.
First, the employees can easily be distinguished into two groups, a “High Earners”
group and a “Low Earners” one. Secondly, the wage ∼ age dependence of the “Low
Earners” group is certainly non-linear and most probably of degree higher than two.
Next, we should decide on the exact degree of the polynomial to use. In the article “Linear
Model Selection and Regularization (ISLR.2013.Ch6-6)”, we studied two different ways to
do so: either by applying some subset variable selection method or by using cross-validation.
Here, we will discuss an alternative approach, the so-called Hypothesis Testing.
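(For reference, the cross-validation route is easy to sketch with cv.glm() from the boot
package. The snippet below is a minimal illustration with default settings and our own
variable names, not part of the analysis that follows.)
library(boot)
set.seed(1)
# 10-fold CV estimate of the test MSE for polynomial degrees 1 to 5
cv.errors <- sapply(1:5, function(d) {
    cv.glm(Wage, glm(wage ~ poly(age, d), data = Wage), K = 10)$delta[1]
})
which.min(cv.errors)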
More specifically, we can fit models ranging from linear to degree-5 polynomial and seek to
determine the simplest model which is sufficient to explain the wage ∼ age relationship.
To do so we perform analysis of variance (ANOVA, F-test), by using the anova() function,
in order to test the null hypothesis that a model M1 is sufficient to explain the data against
the alternative hypothesis that a more complex model M2 is required. In order to use the
anova() function, M1 and M2 must be nested models: the predictors in M1 must be a
subset of the predictors in M2. In this case, we fit five different models and sequentially
compare the simpler model to the more complex model.
lm.Poly1.fit <- lm(wage ~ age, data = Wage)
lm.Poly2.fit <- lm(wage ~ poly(age, 2), data = Wage)
lm.Poly3.fit <- lm(wage ~ poly(age, 3), data = Wage)
lm.Poly4.fit <- lm(wage ~ poly(age, 4), data = Wage)
lm.Poly5.fit <- lm(wage ~ poly(age, 5), data = Wage)
Figure 1: Degree-4 polynomial fit for employees’ wages vs their age. The probability that
an employee is a high earner, as a function of age, is also depicted.
anova(lm.Poly1.fit, lm.Poly2.fit, lm.Poly3.fit, lm.Poly4.fit,
lm.Poly5.fit)
## Analysis of Variance Table
##
## Model 1: wage ~ age
## Model 2: wage ~ poly(age, 2)
## Model 3: wage ~ poly(age, 3)
## Model 4: wage ~ poly(age, 4)
## Model 5: wage ~ poly(age, 5)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2998 5022216
## 2 2997 4793430 1 228786 143.5931 < 2.2e-16 ***
## 3 2996 4777674 1 15756 9.8888 0.001679 **
## 4 2995 4771604 1 6070 3.8098 0.051046 .
## 5 2994 4770322 1 1283 0.8050 0.369682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value comparing the linear Model 1 to the quadratic Model 2 is essentially zero
(< 10⁻¹⁵), indicating that a linear fit is not sufficient. Similarly, the p-value comparing the
quadratic Model 2 to the cubic Model 3 is very low (∼ 0.0017), so the quadratic fit is
also insufficient. The p-value comparing the cubic and degree-4 polynomials, Model
3 and Model 4, is approximately 5%, while the degree-5 polynomial Model 5 seems
unnecessary because its p-value is ∼ 0.37 and its F-statistic is small. Hence, either a
cubic or a quartic polynomial appears to provide a reasonable fit to the data, but lower- or
higher-order models are not justified.
Here, we choose to describe the wage ∼ age dependence by a quartic polynomial, i.e.:

wage ∼ Intercept + β₀ ∗ age + β₁ ∗ age² + β₂ ∗ age³ + β₃ ∗ age⁴ .

The estimated coefficients can be retrieved by the following call
lm.Poly4.fit.Wage <- lm.Poly4.fit
coef(summary(lm.Poly4.fit.Wage))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 111.70361 0.7287409 153.283015 0.000000e+00
## poly(age, 4)1 447.06785 39.9147851 11.200558 1.484604e-28
## poly(age, 4)2 -478.31581 39.9147851 -11.983424 2.355831e-32
## poly(age, 4)3 125.52169 39.9147851 3.144742 1.678622e-03
## poly(age, 4)4 -77.91118 39.9147851 -1.951938 5.103865e-02
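Since poly() produces orthogonal polynomial columns, the squared t-statistics of a single
higher-degree fit reproduce the F-statistics of the sequential ANOVA above; e.g.
(−1.951938)² ≈ 3.8098, the F value of the Model 3 vs Model 4 comparison. A quick check:
coef(summary(lm.Poly5.fit))[, "t value"]^2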
Next, we create a grid of values for age at which we want predictions, call the generic
predict() function and calculate the standard errors
ageMinMax <- range(Wage$age)
age.grid <- seq(from = ageMinMax[1], to = ageMinMax[2])
preds.poly <- predict(lm.Poly4.fit.Wage, newdata = list(age = age.grid),
    se.fit = TRUE)
se.bands <- cbind(preds.poly$fit + 2 * preds.poly$se.fit,
    preds.poly$fit - 2 * preds.poly$se.fit)
Finally, we plot the data and add the degree-4 Polynomial fit
par(mfrow = c(1, 2), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(Wage$age, Wage$wage, xlim = ageMinMax, cex = 0.5, col = "darkgrey",
    xlab = "Age", ylab = "Wage")
title("Wage vs Age Fit \n Degree-4 Polynomial [Wage]", outer = TRUE)
lines(age.grid, preds.poly$fit, lwd = 2, col = "blue")
matlines(age.grid, se.bands, lwd = 2, col = "blue", lty = 3)
As shown in Figure 1, the employees can easily be distinguished into two groups, a
“High Earners” group and a “Low Earners” one. To calculate the probability that an
employee annually earns more than 250k USD, we create the appropriate response vector
for the indicator variable 1(wage > 250)
glm.Poly4.binomial.fit.Wage <- glm(I(wage > 250) ~ poly(age, 4),
    data = Wage, family = binomial)
and make the predictions as before.
glm.Poly4.binomial.preds.Wage <- predict(glm.Poly4.binomial.fit.Wage,
    newdata = list(age = age.grid), se.fit = TRUE)
However, calculating the probability P(Wage > 250 | Age) and its corresponding
confidence intervals is slightly more involved than in the linear regression case. The default
prediction type for a glm() model is type = "link", which is what we use here. This means
we get predictions for the logit, i.e. we have fit a model of the form

log [ P(Y = 1 | X) / (1 − P(Y = 1 | X)) ] = Xβ , (1)

which means that the predictions, as well as their standard errors, are of the form Xβ.
Therefore, if we want to plot P(Wage > 250 | Age) as a function of the employee’s age,
we have to transform the resulting fit accordingly, that is

P(Y = 1 | X) = exp(Xβ) / (1 + exp(Xβ)) , (2)
or in R code
preds <- glm.Poly4.binomial.preds.Wage
pfit <- exp(preds$fit)/(1 + exp(preds$fit))
se.bands.logit <- cbind(preds$fit + 2 * preds$se.fit,
    preds$fit - 2 * preds$se.fit)
se.bands <- exp(se.bands.logit)/(1 + exp(se.bands.logit))
and plot the result, which is shown in the right panel of Figure 1.
plot(age, I(wage > 250), xlim = ageMinMax, type = "n", ylim = c(0, 0.2),
    xlab = "Age", ylab = "P(Wage>250|Age)")
points(jitter(age), (I(wage > 250)/5), cex = 0.5, pch = "|",
    col = "darkgrey")
lines(age.grid, pfit, lwd = 2, col = "blue")
matlines(age.grid, se.bands, lwd = 2, lty = 3, col = "blue")
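Equivalently, we could have requested probabilities directly with type = "response"; the
point estimates agree with the transformed logit fit, although standard-error bands
computed on the response scale could fall outside [0, 1], which is why the logit-scale
transformation above is preferred. A quick check (pfit2 is our own name):
pfit2 <- predict(glm.Poly4.binomial.fit.Wage,
    newdata = list(age = age.grid), type = "response")
max(abs(pfit - pfit2))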
It is interesting to note here that the call
coef(summary(lm.Poly4.fit.Wage))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 111.70361 0.7287409 153.283015 0.000000e+00
## poly(age, 4)1 447.06785 39.9147851 11.200558 1.484604e-28
## poly(age, 4)2 -478.31581 39.9147851 -11.983424 2.355831e-32
## poly(age, 4)3 125.52169 39.9147851 3.144742 1.678622e-03
## poly(age, 4)4 -77.91118 39.9147851 -1.951938 5.103865e-02
returns a matrix whose columns are a basis of orthogonal polynomials, which essentially
means that each column is a linear combination of the variables age, age^2, age^3
and age^4. However, we can also obtain a direct fit in the {age, age^2, age^3, age^4}
variable basis by demanding raw = TRUE in the previous code, as shown below. This
does not affect the model in a meaningful way: the choice of basis clearly affects the
coefficient estimates, but it does not affect the fitted values obtained.
# Direct Fit in {age,age^2,age^3,age^4} Basis
lm.Poly4.fit.Wage2 <- lm(wage ~ poly(age, 4, raw = TRUE), data = Wage)
coef(summary(lm.Poly4.fit.Wage2))
## Estimate Std. Error
## (Intercept) -1.841542e+02 6.004038e+01
## poly(age, 4, raw = TRUE)1 2.124552e+01 5.886748e+00
## poly(age, 4, raw = TRUE)2 -5.638593e-01 2.061083e-01
## poly(age, 4, raw = TRUE)3 6.810688e-03 3.065931e-03
## poly(age, 4, raw = TRUE)4 -3.203830e-05 1.641359e-05
## t value Pr(>|t|)
## (Intercept) -3.067172 0.0021802539
## poly(age, 4, raw = TRUE)1 3.609042 0.0003123618
## poly(age, 4, raw = TRUE)2 -2.735743 0.0062606446
## poly(age, 4, raw = TRUE)3 2.221409 0.0263977518
## poly(age, 4, raw = TRUE)4 -1.951938 0.0510386498
Two other equivalent ways of calculating the same fit, while protecting the power terms
of age from R’s formula syntax, are the following:
# Direct Fit in {age,age^2,age^3,age^4} Basis
lm.Poly4.fit.Wage3 <- lm(wage ~ age + I(age^2) + I(age^3) + I(age^4),
    data = Wage)
coef(summary(lm.Poly4.fit.Wage3))
# Direct Fit in {age,age^2,age^3,age^4} Basis
lm.Poly4.fit.Wage4 <- lm(wage ~ cbind(age, age^2, age^3, age^4),
    data = Wage)
coef(summary(lm.Poly4.fit.Wage4))
Comparing the fitted values obtained in either case, we find that they are identical, as
expected.
preds.raw <- predict(lm.Poly4.fit.Wage2, newdata = list(age = age.grid),
    se.fit = TRUE)
max(abs(preds.poly$fit - preds.raw$fit))
## [1] 8.739676e-12
Note:
The ANOVA method also works in more general cases, that is, when terms other than the
orthogonal polynomials are also included. For example, we can use anova() to also
compare these four models
fit.1.Wage <- lm(wage ~ education + age, data = Wage)
fit.2.Wage <- lm(wage ~ education + poly(age, 2), data = Wage)
fit.3.Wage <- lm(wage ~ education + poly(age, 3), data = Wage)
fit.4.Wage <- lm(wage ~ education + poly(age, 4), data = Wage)
anova(fit.1.Wage, fit.2.Wage, fit.3.Wage, fit.4.Wage)
## Analysis of Variance Table
##
## Model 1: wage ~ education + age
## Model 2: wage ~ education + poly(age, 2)
## Model 3: wage ~ education + poly(age, 3)
## Model 4: wage ~ education + poly(age, 4)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2994 3867992
## 2 2993 3725395 1 142597 114.6595 < 2e-16 ***
## 3 2992 3719809 1 5587 4.4921 0.03413 *
## 4 2991 3719777 1 32 0.0255 0.87308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
giving an outcome which actually supports the third model, i.e.:

Model 3 : wage ∼ education + poly(age, 3) .

Now comparing this new model with the one we have examined before, i.e.

wage ∼ Intercept + β₀ ∗ age + β₁ ∗ age² + β₂ ∗ age³ + β₃ ∗ age⁴ ,
we obtain the following results
# Split the data set in a Train and a Test Data Part
set.seed(356)
train <- sample(c(TRUE, FALSE), nrow(Wage), rep = TRUE)
test <- (!train)
# wage ~ poly(age,4) Model
lm.Poly4.fit <- lm(wage ~ poly(age, 4), data = Wage[train, ])
preds.polyNew <- predict(lm.Poly4.fit, newdata = Wage[test, ],
    se.fit = TRUE)
# wage ~ education + poly(age,3)
fit.3.Wage <- lm(wage ~ education + poly(age, 3), data = Wage[train, ])
preds.fit.3 <- predict(fit.3.Wage, newdata = Wage[test, ],
    se.fit = TRUE)
# MSEs calculation
mse.polyNew <- mean((Wage$wage[test] - preds.polyNew$fit)^2)
mse.polyNew
## [1] 1622
mse.fit.3 <- mean((Wage$wage[test] - preds.fit.3$fit)^2)
mse.fit.3
## [1] 1278.3
which suggests that the new model,
Model 3 : wage ∼ education + poly(age, 3) ,
is in fact a better fit for predicting the employee’s wage variable.
1.2 Piece-wise constant functions
Here, we try to fit a piece-wise constant function to describe the employees’ wage in terms
of their age. To do so, we use the cut() function, as shown below
stepfunction.lm.fit.Wage <- lm(wage ~ cut(age, 4), data = Wage)
coef(summary(stepfunction.lm.fit.Wage))
## Estimate Std. Error t value
## (Intercept) 94.158 1.476 63.790
## cut(age, 4)(33.5,49] 24.053 1.829 13.148
## cut(age, 4)(49,64.5] 23.665 2.068 11.443
## cut(age, 4)(64.5,80.1] 7.641 4.987 1.532
## Pr(>|t|)
## (Intercept) 0.000e+00
## cut(age, 4)(33.5,49] 1.982e-38
## cut(age, 4)(49,64.5] 1.041e-29
## cut(age, 4)(64.5,80.1] 1.256e-01
The function cut() returns an ordered categorical variable; the lm() function then
creates a set of dummy variables for use in the regression. The age < 33.5 category is left
out, so the intercept coefficient of $94,158 can be interpreted as the average salary for
those under 33.5 years of age, and the other coefficients can be interpreted as the average
additional salary for those in the other age groups. Of course, we can produce predictions
and plots just as we did in the case of the polynomial fit.
and plots just as we did in the case of the polynomial fit.
Finally, note that the cut() function automatically picked the cut-points of the age
variable. However, one can also impose cut-points of one’s own choice by using the
breaks option of the function, as sketched below.
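A minimal sketch with hand-picked (hypothetical) cut-points; note that breaks must cover
the observed age range (18 to 80 here):
# Hand-picked (hypothetical) cut-points covering the age range
stepfunction.lm.fit2 <- lm(wage ~ cut(age, breaks = c(17, 30, 45, 60, 81)),
    data = Wage)
coef(summary(stepfunction.lm.fit2))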
2 Splines
2.1 Regression Splines
Regression spline fits can be produced in R by loading the splines library. First, we
construct an appropriate matrix of basis functions for a specified set of knots by calling
the bs() function.
library(splines)
basis.fxknots <- bs(Wage$age, knots = c(25, 40, 60))
dim(basis.fxknots)
## [1] 3000 6
Alternatively, we can let the library determine the correct number of knots by specifying
only the required degrees of freedom, df. For a spline of degree d and a fitted model
with K knots, one needs (d + 1)(K + 1) − dK = d + K + 1 degrees of freedom (“dofs”), or
d + K “dofs” if there is no intercept in the model. In particular, for a cubic spline basis
without an intercept (the default) and with 6 “dofs”, the model is constrained to use only
3 knots, which are distributed along uniform quantiles of the age variable.
basis.fxdf1 <- bs(Wage$age, df = 6, intercept = FALSE)
dim(basis.fxdf1)
## [1] 3000 6
attr(basis.fxdf1, "knots")
## 25% 50% 75%
## 33.75 42.00 51.00
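Indeed, these knots sit at the 25%, 50% and 75% quantiles of age, which we can verify
directly:
quantile(Wage$age, probs = c(0.25, 0.5, 0.75))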
Should we demand that this model also have an intercept (d + K + 1 = 6 now leaves room for only 2 knots):
basis.fxdf2 <- bs(Wage$age, df = 6, intercept = TRUE)
dim(basis.fxdf2)
## [1] 3000 6
attr(basis.fxdf2, "knots")
## 33.33% 66.67%
## 37 48
whereas for one polynomial degree higher, i.e. a quartic spline (d + K + 1 = 6 allows a single knot, placed at the median):
basis.fxdf3 <- bs(Wage$age, df = 6, degree = 4, intercept = TRUE)
dim(basis.fxdf3)
## [1] 3000 6
attr(basis.fxdf3, "knots")
## 50%
## 42
The first case of the cubic splines referenced above seems more promising. To produce a
prediction fit:
# Produce an age Grid
ageMinMax <- range(Wage$age)
age.grid <- seq(from = ageMinMax[1], to = ageMinMax[2])
# Produce a prediction fit
splines.bs1.fit <- lm(wage ~ bs(age, df = 6, intercept = FALSE),
    data = Wage)
splines.bs1.pred <- predict(splines.bs1.fit, newdata = list(age = age.grid),
    se.fit = TRUE)
se.bands <- cbind(splines.bs1.pred$fit + 2 * splines.bs1.pred$se.fit,
    splines.bs1.pred$fit - 2 * splines.bs1.pred$se.fit)
and a corresponding plot of the Wage ∼ Age dependence
par(mfrow = c(1, 1), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(Wage$age, Wage$wage, col = "grey", xlab = "Age", ylab = "Wage",
    cex = 0.5, pch = 8)
lines(age.grid, splines.bs1.pred$fit, col = "blue", lwd = 2)
matlines(age.grid, se.bands, lty = "dashed", col = "blue")
title(main = "Wage vs employee Age \nRegression and Natural Splines Fit [Wage{ISLR}]",
    outer = TRUE)
In order to fit a natural spline instead, that is, a regression spline with linear boundary
conditions, we make use of the ns() function. All these results, the Wage ∼ Age data points
as well as the two prediction fits, that of the cubic spline (blue line) and that of the
natural cubic spline (red line), are shown in Figure 2.
basis.ns <- ns(Wage$age, df = 6, intercept = TRUE)
splines.ns.fit <- lm(wage ~ ns(age, df = 6), data = Wage)
splines.ns.pred <- predict(splines.ns.fit, newdata = list(age = age.grid),
    se.fit = TRUE)
se.ns.bands <- cbind(splines.ns.pred$fit - 2 * splines.ns.pred$se.fit,
    splines.ns.pred$fit + 2 * splines.ns.pred$se.fit)
Figure 2: Cubic spline fit for employees’ Wage vs their Age with 3 knots and 6 dofs (blue
lines). A natural spline fit with 6 dofs and an intercept is also depicted (red lines).
lines(age.grid, splines.ns.pred$fit, col = "red", lwd = 2)
matlines(age.grid, se.ns.bands, lty = "dashed", col = "red")
legend("topright", inset = 0.05, legend = c("Cubic Spline", "
Natural Cubic Spline"),
col = c("blue", "red"), lty = 1, lwd = c("2", "2"))
2.2 Smoothing Splines
Here, we make a smoothing spline fit for the wage ∼ age dependence of the employees’
Wage data set. To do so, we utilize the smooth.spline() function from the stats package,
as shown below
# Smooth Spline with 16 effective dofs
sspline.fit <- smooth.spline(Wage$age, Wage$wage, df = 16)
# Cross-validated smoothing spline: with cv = FALSE the software
# selects the optimal number of effective dofs by generalized
# cross-validation (GCV)
sspline.fit2 <- smooth.spline(Wage$age, Wage$wage, cv = FALSE,
    df.offset = 1)
# effective dofs
sspline.fit2$df
## [1] 6.467555
which can be plotted by running the code below (Figure 3).
par(mfrow = c(1, 1), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(age, wage, xlim = ageMinMax, cex = 0.5, col = "darkgrey",
    xlab = "Age", ylab = "Wage", pch = 8)
lines(sspline.fit, col = "red", lwd = 2)
lines(sspline.fit2, col = "blue", lwd = 1)
title(main = "Wage vs employee Age \n Smoothing Spline Fit [Wage{ISLR}]",
    outer = TRUE)
legend("topright", inset = 0.05, legend = c("16 dofs", "6.47 dofs"),
    col = c("red", "blue"), lty = 1, lwd = c(2, 1), cex = 0.8)
Figure 3: Smoothing spline fits for employees’ Wage vs their Age. One with pre-configured
16 effective dofs (red line) and the other with 6.47 effective dofs, as determined by
generalized cross-validation (GCV).
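(Ordinary leave-one-out CV is also available via cv = TRUE, as sketched below; since age
contains many tied values, smooth.spline() will warn that LOOCV with non-unique x
values is doubtful, which is why GCV was used above.)
# Hedged alternative: leave-one-out CV instead of GCV
sspline.loocv <- smooth.spline(Wage$age, Wage$wage, cv = TRUE)
sspline.loocv$df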
3 Local Regression
Here, as an alternative way to produce a non-linear fit, we perform local regression by
making use of the loess() function from the stats package.
# Local Regression with each neighborhood spanning 20% of the
# observations
loess.fit <- loess(wage ~ age, span = 0.2, data = Wage)
# Local Regression with each neighborhood spanning 50% of
# the observations
loess.fit2 <- loess(wage ~ age, span = 0.5, data = Wage)
and produce the corresponding plot by executing the code below (Figure 4).
par(mfrow = c(1, 1), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(age, wage, xlim = ageMinMax, cex = 0.5, col = "darkgrey",
    pch = 8, xlab = "Age", ylab = "Wage")
lines(age.grid, predict(loess.fit, data.frame(age = age.grid)),
    col = "blue", lwd = 1)
lines(age.grid, predict(loess.fit2, data.frame(age = age.grid)),
    col = "red", lwd = 1)
title(main = "Wage vs employee Age \n Local Regression Fit [Wage{ISLR}]",
    outer = TRUE)
legend("topright", inset = 0.05, legend = c("Span 20%", "Span 50%"),
    col = c("blue", "red"), lty = 1, lwd = c(1, 1), cex = 0.8)
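predict.loess() can also return standard errors, so approximate confidence bands
analogous to those of the spline fits can be added; a minimal sketch for the span = 0.5 fit
(loess.pred2 is our own name):
loess.pred2 <- predict(loess.fit2, data.frame(age = age.grid), se = TRUE)
loess.se.bands <- cbind(loess.pred2$fit - 2 * loess.pred2$se.fit,
    loess.pred2$fit + 2 * loess.pred2$se.fit)
matlines(age.grid, loess.se.bands, lty = "dashed", col = "red")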
4 Generalized Additive Models (GAMs)
As a generalization of the previously studied models, we now discuss additive linear models,
but with a more flexible choice of fitting method for each of the different
variables we are going to use as predictors. This class of models is the so-called
Generalized Additive Models (GAMs), with the general form

GAMs : y_i = β₀ + Σ_{j=1}^{p} f_j(x_{ij}) + ε_i . (3)
As a first example we examine the fit
Figure 4: Local regression fits for employees’ Wage vs their Age. One with each neighborhood
pre-configured to span 20% of the observations (blue line) and the other with a 50% span (red line).
library(splines)
gam1 <- lm(wage ~ ns(year, 4) + ns(age, 5) + education,
    data = Wage, subset = train)
However, in case we want to use smoothing splines or other components that cannot be
expressed in terms of basis functions, we have to use more general sorts of GAM software
to make the fit, even if the model is additive. To do so, we use the mgcv library, which was
introduced in [Wood, 2006] and is provided here by the Oracle R distribution∗.
The s() function, which is part of the mgcv library, is used to call smoothing spline fits.
∗ Alternatively, one can use Trevor Hastie’s original library for that purpose, gam [Hastie and Tibshirani,
1990]. However, we find mgcv much more complete for building GAM models and we have chosen to
use this package for our calculations.
To repeat the previous fit, but now with smoothing spline terms, we execute the following R
code.
library(mgcv)
gam.m3 <- gam(wage ~ s(year, k = 5) + s(age, k = 6) + education,
family = gaussian(), data = Wage, subset = train)
Here, the basis dimensions k = 5 and k = 6 allow the smooth of year up to 4 effective
degrees of freedom and the smooth of age up to 5, one degree being absorbed by mgcv’s
identifiability constraint. Since education is a categorical variable, we leave it as is, and
it is converted by the function into four dummy variables.
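(The five levels of education, the first of which is absorbed into the intercept, can be
listed directly:)
levels(Wage$education)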
The fitted model can be plotted as shown below.
par(mfrow = c(1, 3), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(gam.m3, se = TRUE, col = "blue")
plot(education[train], gam.m3$y)
title("Smoothing Splines Fit [mgcv]\nWage ~ s(year, k = 5) + s(age, k = 6) + education",
    outer = TRUE)
Note that in the first panel of Figure 5 the fitted function of year looks rather linear. We
can perform a series of ANOVA tests in order to determine which of these three models is
best: a GAM that excludes year (M1), a GAM that uses a linear function of year (M2), or
a GAM that uses a spline function of year (M3), as the one built above. Note that in all
these models we do include the education variable, which seems to be a good choice
according to the short discussion at the end of Section 1.1.
gam.m1 <- gam(wage ~ s(age, k = 6) + education, family = gaussian(),
    data = Wage, subset = train)
gam.m2 <- gam(wage ~ year + s(age, k = 6) + education, family = gaussian(),
    data = Wage, subset = train)
Figure 5: GAM fitted model using smoothing splines through the mgcv library.
anova(gam.m1, gam.m2, gam.m3, test = "F")
## Analysis of Deviance Table
##
## Model 1: wage ~ s(age, k = 6) + education
## Model 2: wage ~ year + s(age, k = 6) + education
## Model 3: wage ~ s(year, k = 5) + s(age, k = 6) + education
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 1489.0 1812973
## 2 1488.0 1804686 1.00039 8287.4 6.8395 0.008999 **
## 3 1487.2 1801351 0.77868 3334.8 3.5358 0.069726 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We find that there is compelling evidence that a GAM with a linear function of year is
better than a GAM that does not include year at all (F = 6.8395, p-value = 0.008999).
However, there is no strong evidence that a non-linear function of year is actually required
(F = 3.5358, p-value = 0.069726). So, based on the results of this ANOVA test, the M2
model is preferred. Indeed, a closer look at the summary of the last fitted model,
gam.m3,
summary(gam.m3)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## wage ~ s(year, k = 5) + s(age, k = 6) + education
##
## Parametric coefficients:
## Estimate Std. Error t value
## (Intercept) 86.395 2.845 30.373
## education2. HS Grad 9.289 3.266 2.844
## education3. Some College 23.054 3.450 6.682
## education4. College Grad 37.938 3.404 11.144
## education5. Advanced Degree 61.579 3.738 16.473
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## education2. HS Grad 0.00452 **
## education3. Some College 3.31e-11 ***
## education4. College Grad < 2e-16 ***
## education5. Advanced Degree < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(year) 1.760 2.171 3.908 0.0179 *
## s(age) 3.018 3.658 29.208 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.288 Deviance explained = 29.2%
## GCV score = 1219.2 Scale est. = 1211.2 n = 1497
reveals that the fitted function of year is nearly linear (edf ≈ 1.76; F = 3.908, p-value =
0.0179), whereas for the age variable a clearly non-linear function is required (F = 29.208,
p-value < 2e-16). Note that in the mgcv summary the p-values of the smooth terms
correspond to a null hypothesis that the term is zero, so the near-linearity of year is
reflected in its small effective degrees of freedom rather than in its p-value.
Of course, we can make predictions as before using a test subset of Wage and reach a
safer conclusion by comparing the mean squared errors of the two models, gam.m2
and gam.m3.
gam.m2.pred <- predict(gam.m2, newdata = Wage[test, ])
gam.m3.pred <- predict(gam.m3, newdata = Wage[test, ])
mean((Wage[test, ]$wage - gam.m2.pred)^2)
## [1] 1275.063
mean((Wage[test, ]$wage - gam.m3.pred)^2)
## [1] 1276.13
Again, the gam.m2 model is found to be the (slightly) better fit.
References
[Hastie and Tibshirani, 1990] Hastie, T. and Tibshirani, R. (1990). Generalized Additive
Models. Chapman and Hall/CRC.
[James et al., 2013] James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An
Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics).
Springer, 1st ed. 2013, corr. 4th printing 2014 edition.
[Wood, 2006] Wood, S. (2006). Generalized Additive Models: An Introduction with R.
Chapman and Hall/CRC.
22

More Related Content

What's hot

Linear regression
Linear regressionLinear regression
Linear regression
suncil0071
 
50120140503015
5012014050301550120140503015
50120140503015
IAEME Publication
 
Econometrics project
Econometrics projectEconometrics project
Econometrics project
Shubham Joon
 
Chapter 3 Dosages and Calculations
Chapter 3 Dosages and CalculationsChapter 3 Dosages and Calculations
Chapter 3 Dosages and Calculations
kevinyocum4
 
Demystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance Tradeoff
Ashwin Rao
 
Linear regression
Linear regressionLinear regression
Linear regression
suncil0071
 
Comm5005 lecture 4
Comm5005 lecture 4Comm5005 lecture 4
Comm5005 lecture 4
blinking1
 
The Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options Trader
Ashwin Rao
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
Julio Martinez Andrade
 
20120140503019
2012014050301920120140503019
20120140503019
IAEME Publication
 
IRJET- Optimization of 1-Bit ALU using Ternary Logic
IRJET- Optimization of 1-Bit ALU using Ternary LogicIRJET- Optimization of 1-Bit ALU using Ternary Logic
IRJET- Optimization of 1-Bit ALU using Ternary Logic
IRJET Journal
 
5 the relational algebra and calculus
5 the relational algebra and calculus5 the relational algebra and calculus
5 the relational algebra and calculus
Kumar
 

What's hot (12)

Linear regression
Linear regressionLinear regression
Linear regression
 
50120140503015
5012014050301550120140503015
50120140503015
 
Econometrics project
Econometrics projectEconometrics project
Econometrics project
 
Chapter 3 Dosages and Calculations
Chapter 3 Dosages and CalculationsChapter 3 Dosages and Calculations
Chapter 3 Dosages and Calculations
 
Demystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance Tradeoff
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Comm5005 lecture 4
Comm5005 lecture 4Comm5005 lecture 4
Comm5005 lecture 4
 
The Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options Trader
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
 
20120140503019
2012014050301920120140503019
20120140503019
 
IRJET- Optimization of 1-Bit ALU using Ternary Logic
IRJET- Optimization of 1-Bit ALU using Ternary LogicIRJET- Optimization of 1-Bit ALU using Ternary Logic
IRJET- Optimization of 1-Bit ALU using Ternary Logic
 
5 the relational algebra and calculus
5 the relational algebra and calculus5 the relational algebra and calculus
5 the relational algebra and calculus
 

Viewers also liked

Water Brigade 2015
Water Brigade 2015Water Brigade 2015
Water Brigade 2015
Madison Printen
 
Central tedancy & correlation project - 1
Central tedancy & correlation project - 1Central tedancy & correlation project - 1
Central tedancy & correlation project - 1
The Superior University, Lahore
 
Brasileiros no Mundo - Estimativas 2012
Brasileiros no Mundo - Estimativas 2012Brasileiros no Mundo - Estimativas 2012
Brasileiros no Mundo - Estimativas 2012
Instituto Diáspora Brasil (IDB)
 
Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?
Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?
Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?
Instituto Diáspora Brasil (IDB)
 
Android analysis of distance-based location management in wireless communica...
Android  analysis of distance-based location management in wireless communica...Android  analysis of distance-based location management in wireless communica...
Android analysis of distance-based location management in wireless communica...
Ecwayt
 
WordPressもくもく勉強会に出てからの変化について
WordPressもくもく勉強会に出てからの変化についてWordPressもくもく勉強会に出てからの変化について
WordPressもくもく勉強会に出てからの変化について
Toshiki Tanji
 
Λουτρά Πόζαρ
Λουτρά  ΠόζαρΛουτρά  Πόζαρ
Λουτρά Πόζαρ
xrysa123
 
Immigrant Integration
Immigrant IntegrationImmigrant Integration
Immigrant Integration
Instituto Diáspora Brasil (IDB)
 
Building Composable Serverless Apps with IOpipe
Building Composable Serverless Apps with IOpipe Building Composable Serverless Apps with IOpipe
Building Composable Serverless Apps with IOpipe
Erica Windisch
 
Iταλία
IταλίαIταλία
Iταλία
xrysa123
 
Sino si Duterte
Sino si DuterteSino si Duterte
Sino si Duterte
Restie Esguerra
 
Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」
Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」
Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」
Nobutaka OSHIRO
 
The current state of access management
The current state of access managementThe current state of access management
The current state of access management
Elimity
 
Kelloggs Integrated Marketing Communication
Kelloggs Integrated Marketing CommunicationKelloggs Integrated Marketing Communication
Kelloggs Integrated Marketing Communication
Udit Goel
 
Live drawing for communication and co creation 2016
Live drawing for communication and co creation 2016Live drawing for communication and co creation 2016
Live drawing for communication and co creation 2016
Patricia Kambitsch
 
HTML5ハイブリッドアプリ開発のベストプラクティス
HTML5ハイブリッドアプリ開発のベストプラクティスHTML5ハイブリッドアプリ開発のベストプラクティス
HTML5ハイブリッドアプリ開発のベストプラクティス
アシアル株式会社
 
PROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTAL
PROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTALPROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTAL
PROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTAL
miriam gutierrez
 

Viewers also liked (17)

Water Brigade 2015
Water Brigade 2015Water Brigade 2015
Water Brigade 2015
 
Central tedancy & correlation project - 1
Central tedancy & correlation project - 1Central tedancy & correlation project - 1
Central tedancy & correlation project - 1
 
Brasileiros no Mundo - Estimativas 2012
Brasileiros no Mundo - Estimativas 2012Brasileiros no Mundo - Estimativas 2012
Brasileiros no Mundo - Estimativas 2012
 
Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?
Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?
Wealth Inequalities in Greater Boston: Do Race and Ethnicity Matter?
 
Android analysis of distance-based location management in wireless communica...
Android  analysis of distance-based location management in wireless communica...Android  analysis of distance-based location management in wireless communica...
Android analysis of distance-based location management in wireless communica...
 
WordPressもくもく勉強会に出てからの変化について
WordPressもくもく勉強会に出てからの変化についてWordPressもくもく勉強会に出てからの変化について
WordPressもくもく勉強会に出てからの変化について
 
Λουτρά Πόζαρ
Λουτρά  ΠόζαρΛουτρά  Πόζαρ
Λουτρά Πόζαρ
 
Immigrant Integration
Immigrant IntegrationImmigrant Integration
Immigrant Integration
 
Building Composable Serverless Apps with IOpipe
Building Composable Serverless Apps with IOpipe Building Composable Serverless Apps with IOpipe
Building Composable Serverless Apps with IOpipe
 
Iταλία
IταλίαIταλία
Iταλία
 
Sino si Duterte
Sino si DuterteSino si Duterte
Sino si Duterte
 
Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」
Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」
Meteor勉強会発表資料「MeteorでiOSアプリを作ろう!」
 
The current state of access management
The current state of access managementThe current state of access management
The current state of access management
 
Kelloggs Integrated Marketing Communication
Kelloggs Integrated Marketing CommunicationKelloggs Integrated Marketing Communication
Kelloggs Integrated Marketing Communication
 
Live drawing for communication and co creation 2016
Live drawing for communication and co creation 2016Live drawing for communication and co creation 2016
Live drawing for communication and co creation 2016
 
HTML5ハイブリッドアプリ開発のベストプラクティス
HTML5ハイブリッドアプリ開発のベストプラクティスHTML5ハイブリッドアプリ開発のベストプラクティス
HTML5ハイブリッドアプリ開発のベストプラクティス
 
PROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTAL
PROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTALPROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTAL
PROCESO DE ELABORACION DE UNA INVESTIGACION DOCUMENTAL
 

Similar to Moving Beyond Linearity (Article 7 - Practical Exercises)

[M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization [M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization
Andrea Rubio
 
Case Study of Petroleum Consumption With R Code
Case Study of Petroleum Consumption With R CodeCase Study of Petroleum Consumption With R Code
Case Study of Petroleum Consumption With R Code
Raymond Christopher Peralta
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptx
Mohamed Essam
 
Csis 5420 week 8 homework answers (13 jul 05)
Csis 5420 week 8 homework   answers (13 jul 05)Csis 5420 week 8 homework   answers (13 jul 05)
Csis 5420 week 8 homework answers (13 jul 05)
Thắng Tạ Bảo
 
Multivariate Techniques
Multivariate TechniquesMultivariate Techniques
Multivariate Techniques
Terry Chaney
 
1624.pptx
1624.pptx1624.pptx
1624.pptx
Jyoti863900
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
Arunangsu Sahu
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
Mohamad Sahil
 
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docx
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docxDataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docx
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docx
theodorelove43763
 
ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]
Theodore Grammatikopoulos
 
MSc Finance_EF_0853352_Kartik Malla
MSc Finance_EF_0853352_Kartik MallaMSc Finance_EF_0853352_Kartik Malla
MSc Finance_EF_0853352_Kartik Malla
Kartik Malla
 
Statistics Project
Statistics ProjectStatistics Project
Statistics Project
NicholasDavis85
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
working with python
working with pythonworking with python
working with python
bhavesh lande
 
Week 4 Lecture 12 Significance Earlier we discussed co.docx
Week 4 Lecture 12 Significance Earlier we discussed co.docxWeek 4 Lecture 12 Significance Earlier we discussed co.docx
Week 4 Lecture 12 Significance Earlier we discussed co.docx
cockekeshia
 
Capstone Project - Nicholas Imholte - Final Draft
Capstone Project - Nicholas Imholte - Final DraftCapstone Project - Nicholas Imholte - Final Draft
Capstone Project - Nicholas Imholte - Final Draft
Nick Imholte
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
oreo10
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoff
Raman Kannan
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
preetikumara
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
Yanchang Zhao
 

Similar to Moving Beyond Linearity (Article 7 - Practical Exercises) (20)

[M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization [M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization
 
Case Study of Petroleum Consumption With R Code
Case Study of Petroleum Consumption With R CodeCase Study of Petroleum Consumption With R Code
Case Study of Petroleum Consumption With R Code
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptx
 
Csis 5420 week 8 homework answers (13 jul 05)
Csis 5420 week 8 homework   answers (13 jul 05)Csis 5420 week 8 homework   answers (13 jul 05)
Csis 5420 week 8 homework answers (13 jul 05)
 
Multivariate Techniques
Multivariate TechniquesMultivariate Techniques
Multivariate Techniques
 
1624.pptx
1624.pptx1624.pptx
1624.pptx
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docx
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docxDataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docx
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseD.docx
 
ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]
 
MSc Finance_EF_0853352_Kartik Malla
MSc Finance_EF_0853352_Kartik MallaMSc Finance_EF_0853352_Kartik Malla
MSc Finance_EF_0853352_Kartik Malla
 
Statistics Project
Statistics ProjectStatistics Project
Statistics Project
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
working with python
working with pythonworking with python
working with python
 
Week 4 Lecture 12 Significance Earlier we discussed co.docx
Week 4 Lecture 12 Significance Earlier we discussed co.docxWeek 4 Lecture 12 Significance Earlier we discussed co.docx
Week 4 Lecture 12 Significance Earlier we discussed co.docx
 
Capstone Project - Nicholas Imholte - Final Draft
Capstone Project - Nicholas Imholte - Final DraftCapstone Project - Nicholas Imholte - Final Draft
Capstone Project - Nicholas Imholte - Final Draft
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoff
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 

Recently uploaded

Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 

Recently uploaded (20)

Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 

Moving Beyond Linearity (Article 7 - Practical Exercises)

  • 1. Moving Beyond Linearity [ISLR.2013.Ch7-7] Theodore Grammatikopoulos∗ Tue 6th Jan, 2015 Abstract Linear models are relatively simple to describe and implement, and have advantages over other approaches in terms of interpretation and inference. However, standard linear regression can have significant limitations in terms of predictive power. This is because the linearity assumption is almost always an approximation, and sometimes a poor one. In article 6 we see that we can improve Least Squares Estimation methods by using Ridge Regression, Lasso, Principal Components Regression (PCR), Principal Least Squares (PLS), and other techniques. In that setting, the improvement is obtained by reducing the complexity of the linear model, and hence the variance of the estimates. But we are still using a linear model, which can only be improved so far. Here, we also relax the linearity assumption but we still attempting to maintain as much interpret-ability as possible. We do this by examining very simple extensions of linear models like Polynomial Regression and Piece-wise Step Functions, as well as more sophisticated approaches such as Splines, Local Regression, and Generalized Additive Models (GAMs). ## OTN License Agreement: Oracle Technology Network - Developer ## Oracle Distribution of R version 3.0.1 (--) Good Sport ## Copyright (C) The R Foundation for Statistical Computing ## Platform: x86_64-unknown-linux-gnu (64-bit) D:20150106213538+02’00’ ∗ e-mail:tgrammat@gmail.com 1
  • 2. 1 Non-Linear Modeling In this lab we re-analyze the Wage data set considered in the examples throughout Chapter 7 of “ISLR.2013” [James et al., 2013]. We will see that many of the complex non-linear fitting procedures discussed in this Chapter can be easily implemented in R. library(ISLR) attach(Wage) 1.1 Polynomial Regression As a first attempt to describe employee’s wage in terms of their age, we will try a poly- nomial regression fit. We will determine the best polynomial degree to do so and we will afterwards examine to what extent this could be successful. The reason to search for a non-linear fit to describe the wage ∼ age dependence, it is almost apparent by the corresponding plot of the two variables, shown in Figure 1 below. First, the employee data set can be easily distinguished in two groups, a “High Earners Group” and a “Low Earners” one. Secondly, the wage ∼ age dependence of this “Low Earners” employees group is certainly non-linear and most probably of a higher than the second polynomial degree. Next, we should decide on the exact degree of the polynomial to use. In the article “Linear Model Selection and Regularization (ISLR.2013.Ch6-6)”, we studied two different ways to do so. Either by applying some subset variable selection method or by using cross-validation. Here, we will discuss an alternative approach, the so-called Hypothesis Testing. More specifically, we can fit models ranging from linear to degree-5 polynomial and seek to determine the simplest model which is sufficient to explain the wage ∼ age relationship. To do so we perform analysis of variance (ANOVA, F-test), by using the anova() function, in order to test the null hypothesis that a model M1 is sufficient to explain the data against the alternative hypothesis that a more complex model M2 is required. In order to use the anova() function, M1 and M2 must be nested models: the predictors in M1 must be a subset of the predictors in M2. In this case, we fit five different models and sequentially compare the simpler model to the more complex model. lm.Poly1.fit <- lm(wage ~ age, data = Wage) lm.Poly2.fit <- lm(wage ~ poly(age, 2), data = Wage) lm.Poly3.fit <- lm(wage ~ poly(age, 3), data = Wage) 2
  • 3. lm.Poly4.fit <- lm(wage ~ poly(age, 4), data = Wage) lm.Poly5.fit <- lm(wage ~ poly(age, 5), data = Wage) Figure 1: 4-degree Polynomial fit for Employees’ Wages vs their Age. The probability of an employee to be a high earner person as a function of her age is also depicted. anova(lm.Poly1.fit, lm.Poly2.fit, lm.Poly3.fit, lm.Poly4.fit, lm.Poly5.fit) ## Analysis of Variance Table ## ## Model 1: wage ~ age ## Model 2: wage ~ poly(age, 2) ## Model 3: wage ~ poly(age, 3) ## Model 4: wage ~ poly(age, 4) ## Model 5: wage ~ poly(age, 5) 3
  • 4. ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 2998 5022216 ## 2 2997 4793430 1 228786 143.5931 < 2.2e-16 *** ## 3 2996 4777674 1 15756 9.8888 0.001679 ** ## 4 2995 4771604 1 6070 3.8098 0.051046 . ## 5 2994 4770322 1 1283 0.8050 0.369682 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 The p-value comparing the linear Model 1 to the quadratic Model 2 is essentially zero (< 1015 ), indicating that a linear fit is not sufficient. Similarly the p-value comparing the quadratic Model 2 to the cubic Model 3 is very low (∼ 0.0017), so the quadratic fit is also insufficient. The p-value comparing the cubic to the degree-4 polynomials, Model 3 and Model 4, is approximately 5% while the degree-5 polynomial Model 5 seems unnecessary because its p-value is ∼ 0.37 with a not large F-statistic. Hence, either a cubic or a quartic polynomial appear to provide a reasonable fit to the data, but lower or higher-order models are not justified. Here, we choose to describe the wage ∼ age dependence by a quartic polynomial, i.e.: wage ∼ Intercept + 0 ∗ age + 1 ∗ age2 + 2 ∗ age3 + 3 ∗ age4 . The estimated coefficient calculated by the method can be retrieved by the following call lm.Poly4.fit.Wage <- lm.Poly4.fit coef(summary(lm.Poly4.fit.Wage)) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 111.70361 0.7287409 153.283015 0.000000e+00 ## poly(age, 4)1 447.06785 39.9147851 11.200558 1.484604e-28 ## poly(age, 4)2 -478.31581 39.9147851 -11.983424 2.355831e-32 ## poly(age, 4)3 125.52169 39.9147851 3.144742 1.678622e-03 ## poly(age, 4)4 -77.91118 39.9147851 -1.951938 5.103865e-02 Next, we create a grid of values for age at which we want predictions, call the generic predict() functions and calculate the standard errors ageMinMax <- range(Wage$age) age.grid <- seq(from = ageMinMax[1], to = ageMinMax[2]) 4
  • 5. preds.poly <- predict(lm.Poly4.fit.Wage, newdata = list(age = age .grid), se.fit = TRUE) se.bands <- cbind(preds.poly$fit + 2 * preds.poly$se.fit, preds. poly$fit - 2 * preds.poly$se.fit) Finally, we plot the data and add the degree-4 Polynomial fit par(mfrow = c(1, 2), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0)) plot(Wage$age, Wage$wage, xlim = ageMinMax, cex = 0.5, col = " darkgrey", xlab = "Age", ylab = "Wage") title("Wage vs Age Fit n Degree-4 Polynomial [Wage]", outer = TRUE) lines(age.grid, preds.poly$fit, lwd = 2, col = "blue") matlines(age.grid, se.bands, lwd = 2, col = "blue", lty = 3) As shown in Figure 1, the employee data set can be easily distinguished in two groups, a “High Earners Group” and a “Low Earners” one. To calculate the probability an employee to annually earn more than 250k USD, we create the appropriate response vector for the categorical variable of 1l (wage > 250) glm.Poly4.binomial.fit.Wage <- glm(I(wage > 250) ~ poly(age, 4), data = Wage, family = binomial) and make the predictions as before. glm.Poly4.binomial.preds.Wage <- predict(glm.Poly4.binomial.fit. Wage, newdata = list(age = age.grid), se.fit = TRUE) However, calculating the probability P (Wage > 250 | Age) and its corresponding confi- dence intervals is slightly more involved than in the linear regression case. The default 5
  • 6. prediction type for a glm() model is type="link", which what we use here. This means we get predictions for the logit, i.e we have fit a model of the form log P(Y = 1 | X) 1 − P(Y = 1 | X) = X , (1) which means that the predictions, as well as its standard errors are of X form. Therefore, if we have to plot the P (Wage > 250 | Age) as a function of employee’s age, we have to transform the resulting fit accordingly, that is P(Y = 1|X) = exp( X) 1 + exp( X) , (2) or in R code preds <- glm.Poly4.binomial.preds.Wage pfit <- exp(preds$fit)/(1 + exp(preds$fit)) se.bands.logit <- cbind(preds$fit + 2 * preds$se.fit, preds$fit - 2 * preds$se.fit) se.bands <- exp(se.bands.logit)/(1 + exp(se.bands.logit)) and plot the result which is shown in the left panel of Figure 1. plot(age, I(wage > 250), xlim = ageMinMax, type = "n", ylim = c (0, 0.2), xlab = "Age", ylab = "P(Wage>250|Age)") points(jitter(age), (I(wage > 250)/5), cex = 0.5, pch = "|", col = "darkgrey") lines(age.grid, pfit, lwd = 2, col = "blue") matlines(age.grid, se.bands, lwd = 2, lty = 3, col = "blue") It is interesting to note here that the function 6
It is interesting to note here that the poly() function used in the fits above returns a matrix whose columns form a basis of orthogonal polynomials: each column is a linear combination of the variables age, age^2, age^3 and age^4, constructed so that the columns are mutually orthogonal. The coefficients reported by coef(summary(lm.Poly4.fit.Wage)) are therefore expressed in this orthogonal basis rather than in terms of the raw powers of age.
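This orthogonality is easy to verify: the cross-product matrix of the basis columns is (numerically) the identity. A quick sketch:

# The columns of poly(age, 4) are orthonormal: their cross-product
# matrix equals the identity up to floating-point noise
round(crossprod(poly(Wage$age, 4)), digits = 10)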
However, we can also obtain a direct fit onto the raw {age, age^2, age^3, age^4} basis by passing raw = TRUE to poly(), as shown below. This does not affect the model in a meaningful way: the choice of basis clearly affects the coefficient estimates, but it does not affect the fitted values obtained.

# Direct Fit in {age,age^2,age^3,age^4} Basis
lm.Poly4.fit.Wage2 <- lm(wage ~ poly(age, 4, raw = TRUE), data = Wage)
coef(summary(lm.Poly4.fit.Wage2))

##                                Estimate   Std. Error
## (Intercept)               -1.841542e+02 6.004038e+01
## poly(age, 4, raw = TRUE)1  2.124552e+01 5.886748e+00
## poly(age, 4, raw = TRUE)2 -5.638593e-01 2.061083e-01
## poly(age, 4, raw = TRUE)3  6.810688e-03 3.065931e-03
## poly(age, 4, raw = TRUE)4 -3.203830e-05 1.641359e-05
##                             t value     Pr(>|t|)
## (Intercept)               -3.067172 0.0021802539
## poly(age, 4, raw = TRUE)1  3.609042 0.0003123618
## poly(age, 4, raw = TRUE)2 -2.735743 0.0062606446
## poly(age, 4, raw = TRUE)3  2.221409 0.0263977518
## poly(age, 4, raw = TRUE)4 -1.951938 0.0510386498

Two other equivalent ways of calculating the same fit, while protecting the power terms of age from the formula parser, are the following:

# Direct Fit in {age,age^2,age^3,age^4} Basis
lm.Poly4.fit.Wage3 <- lm(wage ~ age + I(age^2) + I(age^3) + I(age^4),
    data = Wage)
coef(summary(lm.Poly4.fit.Wage3))

# Direct Fit in {age,age^2,age^3,age^4} Basis
lm.Poly4.fit.Wage4 <- lm(wage ~ cbind(age, age^2, age^3, age^4),
    data = Wage)
coef(summary(lm.Poly4.fit.Wage4))

Comparing now the fitted values obtained in either basis, we find them identical, as expected.

preds.raw <- predict(lm.Poly4.fit.Wage2, newdata = list(age = age.grid),
    se.fit = TRUE)
max(abs(preds.poly$fit - preds.raw$fit))

## [1] 8.739676e-12

Note: The ANOVA method also works in more general cases, that is, when terms other than orthogonal polynomials are also included. For example, we can use anova() to compare these four models:

fit.1.Wage <- lm(wage ~ education + age, data = Wage)
fit.2.Wage <- lm(wage ~ education + poly(age, 2), data = Wage)
fit.3.Wage <- lm(wage ~ education + poly(age, 3), data = Wage)
fit.4.Wage <- lm(wage ~ education + poly(age, 4), data = Wage)

anova(fit.1.Wage, fit.2.Wage, fit.3.Wage, fit.4.Wage)
## Analysis of Variance Table
##
## Model 1: wage ~ education + age
## Model 2: wage ~ education + poly(age, 2)
## Model 3: wage ~ education + poly(age, 3)
## Model 4: wage ~ education + poly(age, 4)
##   Res.Df     RSS Df Sum of Sq        F  Pr(>F)
## 1   2994 3867992
## 2   2993 3725395  1    142597 114.6595 < 2e-16 ***
## 3   2992 3719809  1      5587   4.4921 0.03413 *
## 4   2991 3719777  1        32   0.0255 0.87308
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The outcome actually supports the third model, i.e.:

Model 3 : wage ~ education + poly(age, 3) .

Now, comparing this new model with the quartic polynomial we examined before, i.e. wage = β0 + β1 · age + β2 · age^2 + β3 · age^3 + β4 · age^4 + ε, we obtain the following results.

# Split the data set in a Train and a Test Data Part
set.seed(356)
train <- sample(c(TRUE, FALSE), nrow(Wage), replace = TRUE)
test <- (!train)

# wage ~ poly(age,4) Model
lm.Poly4.fit <- lm(wage ~ poly(age, 4), data = Wage[train, ])
preds.polyNew <- predict(lm.Poly4.fit, newdata = Wage[test, ],
    se.fit = TRUE)

# wage ~ education + poly(age,3) Model
fit.3.Wage <- lm(wage ~ education + poly(age, 3), data = Wage[train, ])
preds.fit.3 <- predict(fit.3.Wage, newdata = Wage[test, ], se.fit = TRUE)
# MSEs calculation
mse.polyNew <- mean((Wage$wage[test] - preds.polyNew$fit)^2)
mse.polyNew

## [1] 1622

mse.fit.3 <- mean((Wage$wage[test] - preds.fit.3$fit)^2)
mse.fit.3

## [1] 1278.3

This suggests that the new model,

Model 3 : wage ~ education + poly(age, 3) ,

is in fact a better fit to predict the employee's wage variable.

1.2 Piece-wise constant functions

Here we try to fit a piece-wise constant function to describe the employee's wage in terms of their age. To do so we use the cut() function, as shown below.

stepfunction.lm.fit.Wage <- lm(wage ~ cut(age, 4), data = Wage)
coef(summary(stepfunction.lm.fit.Wage))

##                        Estimate Std. Error t value
## (Intercept)              94.158      1.476  63.790
## cut(age, 4)(33.5,49]     24.053      1.829  13.148
## cut(age, 4)(49,64.5]     23.665      2.068  11.443
## cut(age, 4)(64.5,80.1]    7.641      4.987   1.532
##                         Pr(>|t|)
## (Intercept)            0.000e+00
## cut(age, 4)(33.5,49]   1.982e-38
## cut(age, 4)(49,64.5]   1.041e-29
## cut(age, 4)(64.5,80.1] 1.256e-01

The function cut() returns an ordered categorical variable; the lm() function then creates a set of dummy variables for use in the regression.
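To see the four intervals that cut() has chosen and how the observations are spread across them, one can simply tabulate the result (a quick sketch; output omitted):

# Inspect the automatically chosen age bins and their counts
table(cut(Wage$age, 4))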
The age < 33.5 category is left out, so the intercept coefficient of $94,158 can be interpreted as the average salary for those under 33.5 years of age, and the other coefficients can be interpreted as the average additional salary for those in the other age groups. Of course, we can produce predictions and plots just as we did in the case of the polynomial fit. Finally, note that the cut() function automatically picked the cut-points of the age variable; however, one can also impose cut-points of one's own choice by using the breaks option of the function, as sketched below.
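As a sketch, assuming one preferred break points at ages 30, 45 and 60 (an illustrative choice, not taken from the analysis above), the step-function fit would read:

# Step-function fit with user-specified cut-points; the outer breaks
# must cover the observed age range, and include.lowest = TRUE keeps
# the youngest employees from being dropped as NA
stepfunction.custom.fit <- lm(wage ~ cut(age,
    breaks = c(min(age), 30, 45, 60, max(age)),
    include.lowest = TRUE), data = Wage)
coef(summary(stepfunction.custom.fit))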
2 Splines

2.1 Regression Splines

Regression spline fits can be produced in R by loading the splines library. First we construct an appropriate matrix of basis functions for a specified set of knots, by calling the bs{splines}() function.

library(splines)

basis.fxknots <- bs(Wage$age, knots = c(25, 40, 60))
dim(basis.fxknots)

## [1] 3000    6

Alternatively, we can let the library determine the correct number of knots by specifying only the required degrees of freedom, df. For a degree-d spline and a fitted model with K knots, one needs (d + 1)(K + 1) − dK = d + K + 1 "dofs": each of the K + 1 intervals contributes d + 1 coefficients, and each knot imposes d continuity constraints. If there is no intercept in the model, this reduces to d + K "dofs". In particular, for a cubic spline basis without an intercept (the default) and with 6 "dofs", the model is constrained to use only 3 knots, which are placed at uniform quantiles of the age variable.

basis.fxdf1 <- bs(Wage$age, df = 6, intercept = FALSE)
dim(basis.fxdf1)

## [1] 3000    6

attr(basis.fxdf1, "knots")

##   25%   50%   75%
## 33.75 42.00 51.00

Should we demand this model to also have an intercept:

basis.fxdf2 <- bs(Wage$age, df = 6, intercept = TRUE)
dim(basis.fxdf2)

## [1] 3000    6

attr(basis.fxdf2, "knots")

## 33.33% 66.67%
##     37     48

whereas for one polynomial degree higher, i.e. a quartic spline:

basis.fxdf3 <- bs(Wage$age, df = 6, degree = 4, intercept = TRUE)
dim(basis.fxdf3)

## [1] 3000    6

attr(basis.fxdf3, "knots")

## 50%
##  42

The first case of the cubic splines referenced above seems more promising. To produce a prediction fit:

# Produce an age Grid
ageMinMax <- range(Wage$age)
age.grid <- seq(from = ageMinMax[1], to = ageMinMax[2])

# Produce a prediction fit
splines.bs1.fit <- lm(wage ~ bs(age, df = 6, intercept = FALSE),
    data = Wage)
splines.bs1.pred <- predict(splines.bs1.fit,
    newdata = list(age = age.grid), se.fit = TRUE)
se.bands <- cbind(splines.bs1.pred$fit + 2 * splines.bs1.pred$se.fit,
    splines.bs1.pred$fit - 2 * splines.bs1.pred$se.fit)

and a corresponding plot of the wage ~ age dependence:

par(mfrow = c(1, 1), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(Wage$age, Wage$wage, col = "grey", xlab = "Age", ylab = "Wage",
    cex = 0.5, pch = 8)
lines(age.grid, splines.bs1.pred$fit, col = "blue", lwd = 2)
matlines(age.grid, se.bands, lty = "dashed", col = "blue")
title(main = "Wage vs employee Age \n Regression and Natural Splines Fit [Wage{ISLR}]",
    outer = TRUE)

In order to fit a natural spline instead, that is, a regression spline with linear boundary constraints, we make use of the ns() function. All these results, the wage ~ age data points as well as the two prediction fits, that of the cubic spline (blue line) and that of the natural cubic spline (red line), are shown in Figure 2.

basis.ns <- ns(Wage$age, df = 6, intercept = TRUE)

splines.ns.fit <- lm(wage ~ ns(age, df = 6), data = Wage)
splines.ns.pred <- predict(splines.ns.fit, newdata = list(age = age.grid),
    se.fit = TRUE)
se.ns.bands <- cbind(splines.ns.pred$fit - 2 * splines.ns.pred$se.fit,
    splines.ns.pred$fit + 2 * splines.ns.pred$se.fit)
Figure 2: Cubic spline fit of employees' wage vs their age, with 3 knots and 6 dofs (blue lines). A natural spline fit with 6 dofs is also depicted (red lines).

lines(age.grid, splines.ns.pred$fit, col = "red", lwd = 2)
matlines(age.grid, se.ns.bands, lty = "dashed", col = "red")
legend("topright", inset = 0.05,
    legend = c("Cubic Spline", "Natural Cubic Spline"),
    col = c("blue", "red"), lty = 1, lwd = c(2, 2))
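One practical payoff of the natural spline's linear boundary constraints can be read directly off the two prediction objects computed above: its standard errors near the edges of the age range are typically smaller than those of the unconstrained cubic spline. A quick sketch:

# Compare standard errors of the two spline fits at the two ends of
# the age grid; the natural spline is usually more stable there
boundary <- c(1, length(age.grid))
splines.bs1.pred$se.fit[boundary]  # cubic spline
splines.ns.pred$se.fit[boundary]   # natural cubic spline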
2.2 Smoothing Splines

Here, we make a smoothing spline fit for the wage ~ age dependence of the employees' Wage data set. To do so we utilize the smooth.spline{stats}() function, as shown below.

# Smooth Spline with 16 effective dofs
sspline.fit <- smooth.spline(Wage$age, Wage$wage, df = 16)

# Cross-validated Smooth Spline to let the software determine the
# optimal number of effective dofs (cv = FALSE selects generalized
# cross-validation, GCV)
sspline.fit2 <- smooth.spline(Wage$age, Wage$wage, cv = FALSE,
    df.offset = 1)

# effective dofs
sspline.fit2$df

## [1] 6.467555

The fits can be plotted by running the code below (Figure 3).

par(mfrow = c(1, 1), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(age, wage, xlim = ageMinMax, cex = 0.5, col = "darkgrey",
    xlab = "Age", ylab = "Wage", pch = 8)
lines(sspline.fit, col = "red", lwd = 2)
lines(sspline.fit2, col = "blue", lwd = 1)
title(main = "Wage vs employee Age \n Smoothing Spline Fit [Wage{ISLR}]",
    outer = TRUE)
legend("topright", inset = 0.05, legend = c("16 dofs", "6.47 dofs"),
    col = c("red", "blue"), lty = 1, lwd = c(2, 1), cex = 0.8)
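Had we wanted ordinary leave-one-out cross-validation instead of GCV, smooth.spline() accepts cv = TRUE; note, as a caveat, that R warns against LOOCV when the x values contain ties, as age does here. A sketch:

# Sketch: choose the effective dofs by leave-one-out CV instead of
# GCV; expect a warning about non-unique x values in this data set
sspline.fit3 <- smooth.spline(Wage$age, Wage$wage, cv = TRUE)
sspline.fit3$df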
Figure 3: Smoothing spline fits of employees' wage vs their age: one with 16 effective dofs fixed in advance (red line) and the other with 6.47 effective dofs as determined by generalized cross-validation (blue line).

3 Local Regression

Here, as an alternative way to produce a non-linear fit, we perform local regression by making use of the loess{stats}() function.

# Local Regression with each neighborhood spanning 20% of the
# observations
loess.fit <- loess(wage ~ age, span = 0.2, data = Wage)

# Local Regression with each neighborhood spanning 50% of the
# observations
loess.fit2 <- loess(wage ~ age, span = 0.5, data = Wage)
and produce the corresponding plot by executing the code below (Figure 4).

par(mfrow = c(1, 1), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(age, wage, xlim = ageMinMax, cex = 0.5, col = "darkgrey", pch = 8,
    xlab = "Age", ylab = "Wage")
lines(age.grid, predict(loess.fit, data.frame(age = age.grid)),
    col = "blue", lwd = 1)
lines(age.grid, predict(loess.fit2, data.frame(age = age.grid)),
    col = "red", lwd = 1)
title(main = "Wage vs employee Age \n Local Regression Fit [Wage{ISLR}]",
    outer = TRUE)
legend("topright", inset = 0.05, legend = c("Span 20%", "Span 50%"),
    col = c("blue", "red"), lty = 1, lwd = c(1, 1), cex = 0.8)

Figure 4: Local regression fits of employees' wage vs their age: one with each neighborhood spanning 20% of the observations (blue line) and the other with a 50% span (red line).
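Beyond span, two loess() arguments worth knowing are degree, which sets the order of the local polynomials (2 by default, 1 for local linear fits), and family = "symmetric", which requests a robust re-weighted fit that down-weights outliers. A sketch with illustrative settings:

# Robust, locally linear variant of the 50%-span fit; the settings
# are illustrative and not part of the analysis above
loess.fit3 <- loess(wage ~ age, span = 0.5, degree = 1,
    family = "symmetric", data = Wage)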
4 Generalized Additive Models (GAMs)

As a generalization of the previously studied models, we now discuss additive linear models, but make use of a more flexible choice of fitting methods for the different variables we are going to use as predictors. This class of models is the so-called Generalized Additive Models (GAMs), which have the general form

GAMs : y_i = β0 + Σ_{j=1}^{p} f_j(x_ij) + ε_i .   (3)

As a first example we examine the fit

library(splines)

gam1 <- lm(wage ~ ns(year, 4) + ns(age, 5) + education, data = Wage,
    subset = train)

However, in case we want to use smoothing splines or other components that cannot be expressed in terms of basis functions, we have to use a more general sort of GAM machinery to make the fit, even if the model is additive. To do so we use the mgcv library, which was introduced in [Wood, 2006] and is provided here by the Oracle R distribution.* The s() function, which is part of the mgcv library, is used to call smoothing spline fits.

* Alternatively, one can use Trevor Hastie's original library for this purpose, gam [Hastie and Tibshirani, 1990]. However, we find mgcv much more complete for building GAM models and have chosen to use this package for our calculations.
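It is worth noting that mgcv's s() does not reproduce the textbook smoothing spline exactly: by default it builds a thin-plate regression spline basis (bs = "tp") of dimension k = 10 and lets a penalty choose the effective wiggliness. To request a cubic regression spline basis explicitly, one could write, for example:

# Illustrative variant of the fit below with an explicit cubic
# regression spline basis (bs = "cr") instead of the default "tp"
library(mgcv)
gam.cr <- gam(wage ~ s(year, k = 5) + s(age, k = 6, bs = "cr") + education,
    family = gaussian(), data = Wage, subset = train)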
To repeat the previous fit, but with smoothing spline terms, we execute the following R code.

library(mgcv)

gam.m3 <- gam(wage ~ s(year, k = 5) + s(age, k = 6) + education,
    family = gaussian(), data = Wage, subset = train)

Here, the basis dimensions k = 5 and k = 6 allow the smooth of year up to 4 effective degrees of freedom and the smooth of age up to 5 (one degree is absorbed by the identifiability constraint). Since education is a categorical variable, we leave it as is, and it is converted by the function into four dummy variables. The fitted model can be plotted as below.

par(mfrow = c(1, 3), mar = c(4, 4, 1, 1), oma = c(0, 0, 3, 0))
plot(gam.m3, se = TRUE, col = "blue")
plot(education[train], gam.m3$y)
title("Smoothing Splines Fit [mgcv] \n Wage ~ s(year, k = 5) + s(age, k = 6) + education",
    outer = TRUE)

Note that in the first plot of Figure 5 the function of year looks rather linear. We can perform a series of ANOVA tests in order to determine which of these three models is best: a GAM that excludes year (M1), a GAM that uses a linear function of year (M2), or a GAM that uses a spline function of year (M3), as the one built above. Note that in all these models we include the education variable, which seems to be a good choice according to the short discussion at the end of Section 1.1.

gam.m1 <- gam(wage ~ s(age, k = 6) + education, family = gaussian(),
    data = Wage, subset = train)
gam.m2 <- gam(wage ~ year + s(age, k = 6) + education,
    family = gaussian(), data = Wage, subset = train)
Figure 5: GAM fitted model using smoothing splines through the mgcv library.

anova(gam.m1, gam.m2, gam.m3, test = "F")

## Analysis of Deviance Table
##
## Model 1: wage ~ s(age, k = 6) + education
## Model 2: wage ~ year + s(age, k = 6) + education
## Model 3: wage ~ s(year, k = 5) + s(age, k = 6) + education
##   Resid. Df Resid. Dev      Df Deviance      F   Pr(>F)
## 1    1489.0    1812973
## 2    1488.0    1804686 1.00039   8287.4 6.8395 0.008999 **
## 3    1487.2    1801351 0.77868   3334.8 3.5358 0.069726 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We find that there is compelling evidence that a GAM with a linear function of year is better than a GAM that does not include year at all (F = 6.8395, p-value = 0.008999). However, there is no strong evidence that a non-linear function of year is actually required (F = 3.5358, p-value = 0.069726). So, based on the results of this ANOVA test, the M2 model is preferred. Indeed, a closer look at the summary of the last fitted model, gam.m3,

summary(gam.m3)

##
## Family: gaussian
## Link function: identity
##
## Formula:
## wage ~ s(year, k = 5) + s(age, k = 6) + education
##
## Parametric coefficients:
##                             Estimate Std. Error t value
## (Intercept)                   86.395      2.845  30.373
## education2. HS Grad            9.289      3.266   2.844
## education3. Some College      23.054      3.450   6.682
## education4. College Grad      37.938      3.404  11.144
## education5. Advanced Degree   61.579      3.738  16.473
##                              Pr(>|t|)
## (Intercept)                   < 2e-16 ***
## education2. HS Grad           0.00452 **
## education3. Some College     3.31e-11 ***
## education4. College Grad      < 2e-16 ***
## education5. Advanced Degree   < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##           edf Ref.df      F p-value
## s(year) 1.760  2.171  3.908  0.0179 *
## s(age)  3.018  3.658 29.208  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.288   Deviance explained = 29.2%
## GCV score = 1219.2  Scale est. = 1211.2    n = 1497
reveals that the smooth of year is estimated to be nearly linear (edf = 1.76, F = 3.908, p-value = 0.0179), whereas for the age variable a clearly non-linear function is required (edf = 3.02, F = 29.208, p-value < 2e-16). Note that in mgcv these approximate p-values test the null hypothesis that the corresponding smooth term is zero overall, not that it is linear; the degree of non-linearity is better judged from the estimated effective degrees of freedom (edf).

Of course, we can make predictions as before, using a test part of the Wage data set, and draw a safer conclusion by comparing the mean squared errors of the two models gam.m2 and gam.m3.

gam.m2.pred <- predict(gam.m2, newdata = Wage[test, ])
gam.m3.pred <- predict(gam.m3, newdata = Wage[test, ])

mean((Wage[test, ]$wage - gam.m2.pred)^2)

## [1] 1275.063

mean((Wage[test, ]$wage - gam.m3.pred)^2)

## [1] 1276.13

Again, the gam.m2 model is found to be the (slightly) better fit.

References

[Hastie and Tibshirani, 1990] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall/CRC.

[James et al., 2013] James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics). Springer, 1st ed. 2013, corr. 4th printing 2014 edition.

[Wood, 2006] Wood, S. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.