GLM
January 25, 2017
1 Three common mistakes concerning the GLM
Recently, I had a little discussion with a colleague of mine concerning linear models. I realized how confusing the terminology is and how strongly it is still influenced by old-school statistics.
Three facts about linear models that are not always understood:
1.1 1. GLM is not the abbreviation for general linear model
GLM stands for “generalized linear model”. The general linear model expresses all sorts of ANOVAs (analysis of variance) in the linear regression framework. This is achieved by introducing dummy variables or other contrasts that encode nominal variables. The “generalized linear model” (GLM), in contrast, extends linear regression beyond models with normal error distributions. This remark in the corresponding Wikipedia article is enlightening: https://en.wikipedia.org/wiki/Generalized_linear_model#Confusion_with_general_linear_models
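To see the general (not generalized) linear model at work, here is a minimal R sketch; the group labels and effect sizes are made up for illustration. A one-way ANOVA and the equivalent dummy-coded regression describe exactly the same model:

# a nominal variable with three levels and a numeric outcome
set.seed(1)
group <- factor(rep(c("a", "b", "c"), each = 20))
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 0.5), rnorm(20, mean = 1))

# the classical one-way ANOVA ...
summary(aov(y ~ group))
# ... and the same model as a dummy-coded linear regression
# (R expands 'group' into two 0/1 dummy variables behind the scenes):
summary(lm(y ~ group))
# the dummy coding is visible in the design matrix:
head(model.matrix(y ~ group))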
1.2 2. Logistic regression is not an extension of the normal linear model
From the perspective of modern statistics the GLM comprises many different linear models, among others the classical linear model. Every distribution in the exponential family can be written in the following form:

$$f(y \mid \theta) = \exp\left(\frac{y\theta - b(\theta)}{\Phi} + c(y, \Phi)\right),$$
where θ is called the canonical parameter, which in turn is a function of µ, the mean. This function θ(µ) is called the canonical link function; it links µ to a linear function of the regression parameters. In short: it is this function that linearizes the relation between the dependent and the independent variables. For the sake of completeness: b(θ) is a function of the canonical parameter and hence also depends on µ; Φ is called the dispersion parameter; and c(y, Φ) is a function of the observation and the dispersion parameter.
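As an aside, R’s family objects expose exactly these link functions and their inverses; a quick sketch (the numeric values are only illustrative):

# the canonical links of the three families discussed below:
gaussian()$link   # "identity"
poisson()$link    # "log"
binomial()$link   # "logit"

# linkfun maps the mean to the linear predictor, linkinv maps it back:
binomial()$linkfun(0.75)        # log(0.75 / 0.25), roughly 1.0986
binomial()$linkinv(1.0986123)   # roughly 0.75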
Normal distribution

$$f(y \mid \mu, \sigma) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\,\frac{y^2 - 2y\mu + \mu^2}{\sigma^2}\right) = \exp\left(\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{1}{2}\left(\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right),$$

with θ(µ) = µ, i.e. µ is the canonical parameter and the link function is given by the identity function. Hence, the mean can be modeled directly without any transformation. This case is the classical linear regression.
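In R this correspondence is direct: a Gaussian GLM with the identity link reproduces the ordinary least-squares fit. A minimal sketch with simulated (made-up) data:

set.seed(2)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)

# classical linear regression ...
coef(lm(y ~ x))
# ... gives the same estimates as a GLM with normal errors and identity link:
coef(glm(y ~ x, family = gaussian(link = "identity")))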
Now, for the Poisson distribution we have

$$f(y \mid \mu) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left(y \log(\mu) - \mu - \log(y!)\right),$$

where the link function is given by log(µ). Note that the Poisson distribution has no free dispersion parameter; Φ is fixed at 1.
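A minimal Poisson-regression sketch in R (the count data are simulated, and the chosen coefficients 0.5 and 1.5 are made up for illustration); note that the fitted coefficients live on the log scale:

set.seed(3)
x <- runif(100)
# counts whose log-mean is linear in x:
counts <- rpois(100, lambda = exp(0.5 + 1.5 * x))

pois1 <- glm(counts ~ x, family = poisson(link = "log"))
coef(pois1)        # estimates near 0.5 and 1.5, on the log scale
exp(coef(pois1))   # multiplicative effects on the expected count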
And finally the Bernoulli distribution, from which we derive logistic regression:

$$f(y \mid \pi) = \pi^y (1 - \pi)^{1-y} = \exp\left(y \log(\pi) + (1 - y) \log(1 - \pi)\right) = \exp\left(y \log(\pi) + \log(1 - \pi) - y \log(1 - \pi)\right) = \exp\left(y \log\left(\frac{\pi}{1 - \pi}\right) + \log(1 - \pi)\right),$$

where the link function evaluates to log(π/(1 − π)). This function is also called the logit function, whose inverse is the logistic function. Hence, it is the logit that is modeled by a linear function of the regressors:

$$\log\left(\frac{\pi}{1 - \pi}\right) = a + b_1 x_1 + \ldots + b_j x_j.$$

If we plug the right-hand side into the logistic function we get the estimated probabilities:

$$P(y = 1 \mid x) = \frac{\exp(a + b_1 x_1 + \ldots + b_j x_j)}{1 + \exp(a + b_1 x_1 + \ldots + b_j x_j)}.$$
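This is exactly what predict(..., type = "response") returns after a logistic fit in R. A minimal sketch with simulated (made-up) data:

set.seed(4)
x <- runif(200, -3, 3)
# 0/1 outcomes whose logit is linear in x:
p <- plogis(-0.5 + 1.2 * x)   # plogis is the logistic (inverse logit) function
y <- rbinom(200, size = 1, prob = p)

logit1 <- glm(y ~ x, family = binomial(link = "logit"))
coef(logit1)   # estimates near -0.5 and 1.2, on the logit scale

# predicted probabilities via the logistic function, as in the formula above:
head(predict(logit1, type = "response"))
# the same numbers, computed by hand from the linear predictor:
head(plogis(predict(logit1, type = "link")))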
Here, we demonstrated that the classical linear regression with normal error terms can be seen as a special case of a much wider family of models comprising all distributions from the exponential family. (For a more complete treatment of other distributions see again https://en.wikipedia.org/wiki/Generalized_linear_model.)
1.3 3. The term ‘linear’ does not stem from a linear regression line
Unfortunately, again, the naming is confusing. Most often, linear regression is explained for the univariate case, which is easy to depict. In most cases, we see a straight regression line that seems to lend the procedure its name. But this is wrong! We will show an example where the regression line is not a straight line, although the model is still a classical linear regression model:
In [29]: options(repr.plot.width=5, repr.plot.height=5)
# first we generate some data for the abscissa:
x <- runif(100, -5, 5)
# next, we apply the transformation that is our generating model:
# 2.1 is the intercept
# the second term is the slope: 1.3
# then come quadratic and cubic terms
# and last some random noise
y <- 2.1 + 1.3 * x + 0.06 * x^2 + 0.05 * x^3 + rnorm(100) * 2
plot(x, y, cex = 0.4)
In [30]: # now we're going to use the lm-function:
lm1 <- lm(y ~ x + I(x^2) + I(x^3))
# we predict the y-values for evenly-spaced x-values:
regLine <- predict(lm1, newdata = data.frame(x = seq(-5, 5, length.out = 20)))
# and plot everything:
plot(x, y, cex = 0.4)
lines(seq(-5, 5, length.out = 20), regLine)
# the fitted coefficients:
round(lm1$coeff, 3)

(Intercept)           x      I(x^2)      I(x^3)
      2.168       1.167       0.081       0.057
Now, it is evident that we fitted a non-linear regression line with the classical linear model :-) This is possible because the linear model is linear in its parameters: none of the model parameters, i.e. the intercept and the slope parameters, is raised to a power higher than one. If we perceive $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$ as different variables, a change in one variable will only give a linear change in the outcome. That's why this model is named ‘linear’. In this made-up example, the model equation is not linear in its variables, because we added quadratic and cubic terms.
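To make “linear in the parameters” concrete, here is a small sketch reusing x and y from above: the cubic model can equivalently be written with poly(), whereas a model that is non-linear in a parameter is beyond lm entirely.

# linear in the parameters (though not in x): lm can fit it
coef(lm(y ~ x + I(x^2) + I(x^3)))
# the same fit, written with raw polynomial terms:
coef(lm(y ~ poly(x, 3, raw = TRUE)))

# by contrast, y = a * exp(b * x) is non-linear in the parameter b
# (b sits inside the exponential), so lm cannot fit it; one would need
# a non-linear routine instead, e.g. nls(y ~ a * exp(b * x), ...)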