Statistical Modelling, http://smj.sagepub.com/
Statistical Modelling 2013 13: 335
DOI: 10.1177/1471082X13494316
The online version of this article can be found at: http://smj.sagepub.com/content/13/4/335
Published by: http://www.sagepublications.com
On behalf of: Statistical Modeling Society
Version of Record: Aug 8, 2013
Statistical Modelling 2013; 13(4): 335–348
Discussion: A comparison of GAMLSS with quantile regression
RA Rigby¹, DM Stasinopoulos¹ and V Voudouris²
¹ Statistics, Operational Research and Mathematics (STORM) research centre, London Metropolitan University, UK
² ESCP Europe Business School, London, UK
Abstract: A discussion of the relative merits of quantile, expectile and GAMLSS regression models
is given. We contrast the ‘complete distribution models’ provided by GAMLSS with the ‘distribution free
models’ provided by quantile (and expectile) regression. We argue that, in general, a flexible
parametric distribution assumption has several advantages, allowing a focus on specific aspects of
the data, model comparison and model diagnostics. A new method for concentrating only on the tail
of the distribution, combining quantile regression and GAMLSS, is suggested.
Key words: GAMLSS; quantile and expectile regression; regression on the tail of the distribution
1 Introduction
We would like to thank Thomas Kneib for his excellent paper bringing into focus the
idea of regression analysis where the whole shape of the distribution for the response
variable is allowed to vary according to explanatory variables (rather than just the mean or
the variance). Figure 1 illustrates this point by showing the Munich rental guide data
example where we have fitted a Box-Cox Cole and Green (BCCG) distribution for
the response variable, rent, and where the shape of the distribution varies according
to the explanatory variable area. In fact, we fully support his statement that ‘once
starting to think about regression models beyond the mean, they seem to appear
basically everywhere’. We would like to add that, for data with more than, say,
1000 observations, regression models beyond the mean should be the norm, not
the exception. This brings into the frame three different approaches: the GAMLSS
approach where a full parametric distribution is assumed for the response variable Y
and the quantile and expectile regression approaches where no specific assumption
is made about the distribution of Y, which therefore can be considered in the realm
of non-parametric methods.
Address for correspondence: RA Rigby, Statistics, Operational Research and Mathematics (STORM)
research centre, London Metropolitan University, Holloway Road, London, N7 8DB, UK. E-mail:
r.rigby@londonmet.ac.uk
© 2013 SAGE Publications 10.1177/1471082X13494316
[Figure 1: rent (vertical axis, 500 to 1500) plotted against area (horizontal axis, 0 to 150)]
Figure 1 The fitted BCCG distribution on the Munich rent data against area of the property
We are great believers in the doctrine, attributed to George Box, that every model is wrong
but some are useful. Therefore, in searching for a suitable model
(for a particular data set), we would like to be able to try different models and
choose the ones that we think are more suitable for the data and, of course, capable
of answering the question at hand. That brings us to one of the most important
questions in statistical modelling: how can we decide between models? In order to
do that we need a mechanism to judge models relative to each other. We should be
able to say that model I is better than model II, or that model I is similar to model III, in a
relatively objective way. We should also be able to check whether any model we use
is an adequate fit to the data. Every model is based on certain assumptions and those
assumptions should always be checked. Awareness of these assumptions is crucial
for model checking and for deciding whether we should accept the model or not. We will argue in
this article that while the GAMLSS regression models seem to depend more heavily
on parametric assumptions than quantile and expectile regression, they are in many
ways more flexible, with the essential advantages that models can be compared and
every assumption within a particular model can be checked and tested.
Our contribution is divided into six sections. Section 2 discusses the historical
development of GAMLSS and the advantages and disadvantages of the approach.
In Section 3, we revisit the Munich rent analysis. Section 4 looks at quantile and
expectile regression. Section 5 introduces a new idea of modelling the tail end of the
distribution of a response variable within a regression framework, while Section 6
makes relevant conclusions.
2 The parametric GAMLSS models
This section starts with a small history of GAMLSS and then describes the advantages,
pitfalls, and some new developments.
2.1 Historical development
The parametric mean regression model has dominated the statistical scene for the last
two centuries. Modelling the distribution variance (as well as the mean) as a function
of explanatory variables started more recently with Harvey (1976) and Aitkin
(1987) for normal models and with Nelder and Pregibon (1986) and Nelder (1992)
for exponential family models (through an extended quasi-likelihood, EQL,
approach). Modelling the variance (volatility in finance) within a time series framework
became important after the seminal paper of Engle (1982) introducing
the ARCH (autoregressive conditional heteroskedastic) models. The LMS method for
centile estimation of Cole (1988) and Cole and Green (1992) was the first attempt to
model skewness parametrically, as a function of (one) explanatory variable (age).
Rigby and Stasinopoulos (1996a, 1996b) introduced the Mean and Dispersion Additive
models (MADAM), which use additive terms for the mean and dispersion but
assume that the response variable belongs to the exponential family and therefore
use EQL as a method of estimation. Following the implementation of the MADAM
model, a key problem with EQL became clear. Despite the fact that EQL can produce
‘good’ statistical properties for simulated models where the parameters of the model
are assumed to be known, there is no proper way of comparing an EQL model
with a different candidate model estimated, say, using maximum likelihood. It was
this fact that led us to abandon EQL and to concentrate on models with
a properly defined distributional assumption. For a more detailed criticism of the EQL
approach see the discussion by Rigby and Stasinopoulos of Lee and Nelder (2006).
In real data situations, the distribution, the parameters and the mathematical
structure of the model itself are unknown. We only make progress when there is
a reasonable method to compare between different models and a way to check
their assumptions. The GAMLSS models of Rigby and Stasinopoulos (2005) are
based on this ideology. They assume that the response has a parametric distribution
Y ∼ D(µ, σ, ν, τ), where µ and σ are usually location and scale parameters and
ν and τ are usually shape parameters. Explanatory variables are introduced into
the model through the ‘predictors’, η1 = g1(µ), η2 = g2(σ), η3 = g3(ν) and η4 =
g4(τ), where g1, ..., g4 are link functions. The predictors can be linear functions of the explanatory variables or can
take the form of the ‘structured additive predictors’ of equation (3) in Thomas’s
paper. That is, $\eta_i = \beta_0 + \sum_{j=1}^{p} f_j(z_i)$. In fact, any Gaussian Markov random field
formulation, see Rue and Held (2005), can be used here by setting the problem
as a random effect model, as Thomas describes in section 2. The two algorithms
described in Rigby and Stasinopoulos (2005, Appendix B) as RS and CG still apply.
Both algorithms lead to the maximisation of the penalised likelihood function (or
MAP estimation in a Bayesian framework) given that the hyperparameters (the λ’s or
their effective degrees of freedom in smoothing) are known. Rigby and Stasinopoulos
(2013) have shown that both algorithms work well if the hyperparameters are
estimated by an internal (i.e., local) ML estimation procedure. This local maximum
likelihood estimation is effectively a penalised quasi-likelihood applied locally for
each of µ, σ, ν and τ. The method is also a generalisation to multiple smoothing
terms of the procedure given by Lee et al. (2006) but is applied internally on the
predictor scale for each µ, σ, ν, and τ. Alternatively, the hyperparameters can be
estimated by minimising a generalised cross validation criterion or a generalised
Akaike information criterion (GAIC) either locally on the predictor scale for each µ,
σ, ν, and τ or globally for µ, σ, ν, and τ jointly.
2.2 Advantages of GAMLSS
Here we highlight some of the advantages of the GAMLSS models and their imple-
mentation in R.
Distributions
1. The gamlss.dist package provides more than 80 distributions for a continuous,
discrete or mixed response variable. (By ‘mixed’ distributions we mean
a continuous distribution which can also take some extra discrete values. For
example, the inflated gamma distribution is a mixed distribution in which
the response can also take the discrete value zero with
a non-zero probability.)
2. Any of the distributions can be right or left truncated or both.
3. Interval response variables (i.e. censored data) can be modelled with any of the
distributions.
4. Several continuous distributions within GAMLSS provide a decomposition of
the signal for Y into location, scale, skewness and kurtosis components that
help the interpretation.
5. Any continuous distribution defined on (−∞, ∞) can have its log or logit
transformed variable defined on (0, ∞) and (0, 1), respectively, providing a
wider range of distributions on those ranges within GAMLSS.
6. For a continuous response variable Y on the positive real line (0 < Y < ∞),
the 3-parameter BCCG distribution has been widely used for centile (or quan-
tile) estimation and is called the LMS method. Rigby and Stasinopoulos (2004,
2006) extended the LMS method (which allows for location, scale and skew-
ness but not for kurtosis in the data), to allow for kurtosis by introducing the
4-parameter Box-Cox power exponential (BCPE) and the Box-Cox t (BCT)
distributions, respectively, and called the resulting centile (or quantile) esti-
mation methods LMSP and LMST, respectively. The BCCG, BCPE and BCT
distributions are all available in GAMLSS.
Additive terms
Because of the modularity of the fitting algorithm, other statistical techniques apart
from Gaussian Markov random fields can be used as additive terms e.g., neural
networks, loess. Also it is easy to change and test different link functions for the
parameters.
Diagnostics
The normalised (randomised) quantile residuals (Dunn and Smyth, 1996), or z-scores,
are well defined, provide information about the adequacy of the model and can be
used in connection with diagnostic plots like worm plots (van Buuren and Fredriks,
2001) or other test statistics, e.g., Z-statistics (Royston and Wright, 2000).
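To make the definition of the normalised quantile residuals concrete, the following minimal Python sketch computes z = Φ⁻¹[F(y)] for a continuous response. It is only an illustration: the exponential 'fitted model', its rate and the observed values are hypothetical and are not taken from the rent analysis, and the randomisation step needed for discrete responses is not shown.

```python
import math
from statistics import NormalDist


def quantile_residual(y, cdf):
    """Normalised quantile residual z = Phi^{-1}(F(y)) for a continuous response."""
    u = cdf(y)  # probability integral transform under the fitted model
    return NormalDist().inv_cdf(u)


def exponential_cdf(rate):
    """Return the cdf of an exponential distribution (hypothetical fitted model)."""
    return lambda y: 1.0 - math.exp(-rate * y)


fitted_cdf = exponential_cdf(rate=0.5)      # hypothetical fitted distribution
median = math.log(2.0) / 0.5                # median of that distribution
observations = [0.3, median, 5.0]           # hypothetical observed responses
residuals = [quantile_residual(y, fitted_cdf) for y in observations]

for y, z in zip(observations, residuals):
    print(f"y = {y:6.3f}  z = {z:7.3f}")
```

If the fitted distribution is adequate, residuals computed in this way should behave like a standard normal sample, which is what a worm plot inspects.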
Mode of inference
Full specification of the likelihood function for the model given Y allows differ-
ent modes of statistical inference i.e., classical (including bootstrapping), Bayesian,
boosting, etc., as Thomas already indicated. Maximum likelihood provides a way of
discriminating between GAMLSS models. This can be done by the global deviance
(GD = −2 log(Likelihood)) defined for all current data, the validation global deviance
(VGD=the global deviance defined for a validation data set) or the GAIC with special
cases, the AIC and the SBC.
2.3 Disadvantages of GAMLSS
The disadvantages of GAMLSS arise from the flexibility (and therefore complexity) of
the model and the choices practitioners have to make. Let M = (D, G, T, L) represent
a GAMLSS model. The components of M are defined as follows:
1. D specifies the distribution of the response variable
2. G specifies the set of link functions
3. T specifies the terms appearing in all the predictors for µ, σ, ν and τ
4. L specifies the smoothing hyper parameters which determine the amount of
smoothing.
The empirical researcher is presented with four choices in designing an appropriate
GAMLSS model. Therefore, developing and comparing GAMLSS models is not a
trivial task, particularly the selection of the theoretical parametric distribution with
the correct tails and the selection of the terms for the distribution parameters, namely
µ, σ, ν and τ. The GAMLSS framework requires the empirical researchers to have
a good understanding of the properties of the distributions from the list of available
distributions in the GAMLSS framework.
3 The Munich rent data revisited
Here we reanalyse the Munich data using the following GAMLSS model:
y ∼ BCCG(µ, σ, ν)
log(µ) = b10 + s11(area) + s12(year)
log(σ) = b20 + s21(area) + s22(year)
log(ν) = b30 + s31(area) + s32(year), (3.1)
where the distribution BCCG(µ, σ, ν) was chosen after some initial investigation
in which several other distributions defined on the positive real line were tried. We call
(3.1) the basic model (bm). It has a multiplicative model for µ (resulting from the
log link for µ). The resulting fitted global deviance, effective degrees of freedom (edf)
used in the model, Akaike information criterion (AIC) and Bayesian information
criterion (BIC) are given in the first line of Table 1.
An alternative additive model for µ in (3.1) (resulting from the identity link for µ)
is shown in row 2 of Table 1 and has a substantially worse fit according to AIC (or
BIC). We believe that the additive model ητ = s1(area) + s2(year) + district, as used
by Thomas Kneib in sections 5 and 6 and Figures 4 and 5, is probably inappropriate.
It uses an identity link relating ητ, the population τ quantile of rent, to district and to
smoothed functions of area and year. This implies that changing from an unpopular
to a popular district results in a fixed change in rent, irrespective of how large an
area the property has and irrespective of its year. It is more likely that the change
in rent is not a fixed amount but a fixed percentage, implying that a multiplicative
model is more appropriate. We looked at R packages, expectreg (Sobotka et al.,
2012), quantreg (Koenker, 2012) and cobs (Ng and Maechler, 2011), but they do
not appear to allow at the moment for a multiplicative model (i.e., log link).
In row 3 of Table 1, we report the results of a model which, while similar to
the basic model (3.1), fits smooth surfaces over area and year rather than just additive
smoothing terms for each of µ, σ and ν. This model provides a small improvement
in AIC, but BIC was much worse.
Table 1 The deviance analysis of the Munich data BCCG model
model                              deviance   edf fitted   AIC        BIC
1. basic model (bm)                38474.50    24.71       38523.92   38673.03
2. bm identity link for µ          38548.13    20.51       38589.16   38712.94
3. bm surface for µ, σ and ν       38434.80    41.19       38517.20   38765.68
4. bm spatial for µ                38164.70   128.45       38421.61   39196.63
5. bm spatial for µ, σ and ν       38164.42   130.78       38425.99   39215.04
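The AIC and BIC columns of Table 1 are special cases of the generalised Akaike information criterion, GAIC(k) = deviance + k × edf, with k = 2 and k = log(n), respectively. The short Python sketch below recomputes both from the deviance and edf columns and ranks the five models. It assumes n = 3082, the number of observations quoted in Section 5; the function and variable names are ours and purely illustrative.

```python
import math


def gaic(deviance, edf, k=2.0):
    """Generalised AIC: global deviance plus a penalty k per effective degree of freedom."""
    return deviance + k * edf


# Assumption: n = 3082 observations, the figure quoted in Section 5 of the article.
N_OBS = 3082

# (global deviance, effective degrees of freedom) copied from Table 1.
TABLE_1 = {
    "basic model (bm)": (38474.50, 24.71),
    "bm identity link for mu": (38548.13, 20.51),
    "bm surface for mu, sigma and nu": (38434.80, 41.19),
    "bm spatial for mu": (38164.70, 128.45),
    "bm spatial for mu, sigma and nu": (38164.42, 130.78),
}

aic = {name: gaic(dev, edf, k=2.0) for name, (dev, edf) in TABLE_1.items()}
bic = {name: gaic(dev, edf, k=math.log(N_OBS)) for name, (dev, edf) in TABLE_1.items()}

best_by_aic = min(aic, key=aic.get)
best_by_bic = min(bic, key=bic.get)

for name in TABLE_1:
    print(f"{name:34s} AIC = {aic[name]:9.2f}  BIC = {bic[name]:9.2f}")
print("lowest AIC:", best_by_aic)
print("lowest BIC:", best_by_bic)
```

Under that assumption about n, the recomputed values should agree with Table 1 up to the rounding of the reported edf. The two criteria pick different models because the log(n) penalty weighs the roughly 100 extra degrees of freedom of the spatial term much more heavily.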
[Figure 2: six panels in three rows (µ, σ and ν) and two columns, showing the partial effects for pb(area) (area from 20 to 160) and pb(yearc) (year of construction from 1920 to 2000)]
Figure 2 The fitted additive terms for area and year of construction for µ, σ and ν for the rent data basic model
with spatial effect for µ
The model in row 4 is the basic model with the addition of district in the µ model
only. District is modelled here as a spatial effect using an intrinsic autoregressive
model. The model in row 5 in Table 1 fits district in the models of each of µ, σ and
ν. While district for µ provides an improvement (i.e., reduction) in AIC, it appears
that it is not needed for σ and ν according to AIC.
Figure 2 shows the fitted smooth functions skj for j = 1, 2 and k = 1, 2, 3 for
the final chosen model (the model of row 4), where k and j correspond to the row
and column of the plot in Figure 2, respectively. Clearly from Figure 2, the fitted
median rent, µ in the BCCG distribution, increases with area and year, while the
approximate coefficient of variation σ increases with area but decreases with year,
and the parameter ν increases for small and large properties and increases with year
(indicating a corresponding decrease in skewness).
[Figure 3: map of the fitted district effects; legend values −0.1177, 0 and 0.1194]
Figure 3 The fitted spatial effect for µ for the basic model with spatial effect for µ
Figure 3 shows the district effect on log(µ) (relative to the baseline district). Hence,
for example, if a district effect is 0.1, then the median rent µ is changed by a factor
$e^{0.1} = 1.105$, i.e., a 10.5% increase (relative to the baseline district).
For the final model, the τ quantile of rent is given by yτ = µ × qBCCG(τ, 1, σ, ν),
where qBCCG(τ, 1, σ, ν) is the τ quantile of the BCCG(1, σ, ν) distribution. This
has the simple interpretation that the district effect on yτ is a multiplicative effect, since
it only affects µ. This disagrees with Thomas Kneib’s Figure 4, although this might
be due to his additive model for the τ quantile, which is inappropriate in our view. In our
fitted model, yτ can be represented by a contour plot against area and year (for the
reference district).
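The multiplicative interpretation can be checked numerically. The Python sketch below uses the usual LMS centile formula, yτ = µ(1 + νσ zτ)^(1/ν), as an approximation to the BCCG quantile; it ignores the truncation of the underlying normal variable that the exact BCCG distribution applies, and the parameter values (µ = 500, σ = 0.3, ν = 0.5) are hypothetical rather than fitted values from the rent model.

```python
import math
from statistics import NormalDist


def lms_quantile(tau, mu, sigma, nu):
    """Approximate tau quantile of BCCG(mu, sigma, nu) via the LMS formula.

    Assumption: the truncation of the underlying standard normal variable is
    ignored, so this is an approximation and requires 1 + nu * sigma * z > 0.
    """
    z = NormalDist().inv_cdf(tau)
    if abs(nu) < 1e-12:
        return mu * math.exp(sigma * z)  # limiting (log-normal) case as nu -> 0
    return mu * (1.0 + nu * sigma * z) ** (1.0 / nu)


# Hypothetical parameter values for a baseline district.
mu_base, sigma, nu = 500.0, 0.3, 0.5
district_effect = 0.1                      # effect on log(mu), as in the text's example
mu_district = mu_base * math.exp(district_effect)

taus = [0.05, 0.5, 0.95]
ratios = [
    lms_quantile(t, mu_district, sigma, nu) / lms_quantile(t, mu_base, sigma, nu)
    for t in taus
]
print(ratios)  # the same factor exp(0.1) at every quantile level
```

Because µ enters only as a multiplier, every quantile is scaled by the same factor exp(0.1), about 10.5%, which is the interpretation given in the text.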
Figure 4 displays a worm plot (van Buuren and Fredriks, 2001) for the chosen
fitted model (row 4 of Table 1). It shows nine detrended QQ plots for the (normalised
quantile) residuals in nine corresponding intervals of the explanatory variable area
(displayed above the worm plot). The individual worm plots are generally within
the 95% pointwise confidence intervals, indicating a reasonable, though not fully
adequate, fit to the data. The worm plot against year is similar.
[Figure 4: nine worm plot panels of deviation (vertical axis, −0.6 to 0.6) against unit normal quantile (horizontal axis, −3 to 3), one panel per interval of area (20 to 160)]
Figure 4 The worm plot of the residuals split by area for the basic model with spatial effect for µ
4 Quantile and expectile regression
Standard quantile regression methods estimate each quantile (i.e. centile) separately.
Koenker (2012) has developed the quantreg package in R for quantile regression and
smoothing, while Ng and Maechler (2007, 2011) have developed the COBS package
in R for smooth quantile curves using B-splines with a smoothness penalty. The fact
that the quantile regression model does not assume a distribution for the response
variable makes it flexible and reduces bias caused by assuming a distribution, but
increases the variability of the quantile curves or surfaces, especially for extreme
quantiles with τ close to 0 or 1.
A possible problem is that different quantile curves or surfaces yτ(x) of y given
explanatory variable(s) x may cross for different values of τ (implying negative
probability). The quantile regression model does not allow for interpolation between
quantile curves (for different τ’s) or extrapolations beyond the centile curves which
is desirable for estimating extreme quantiles, which are difficult to estimate directly.
[See Schnabel and Eilers (2013) for a possible solution called quantile sheets.]
The quantile regression model also lacks an explicit formula that allows the calculation
of the quantile yτ(x) given τ and x, or the z-score $z = \Phi^{-1}[F_Y(y|x)]$ given
y and x, where $\Phi^{-1}$ is the inverse cumulative distribution function of the standard
normal distribution. This was one of the requirements set by a World Health Organisation
expert committee (Borghi et al., 2006) for the adoption of a method for the
construction of the world standard growth curves. The quantile regression model
lacks a measure of goodness of fit and residual diagnostic plots and statistics for
model comparison and model adequacy checking.
When there is more than one explanatory variable, the quantile regression model
usually assumes a linear or additive predictor for all τ, e.g., yτ = β0τ + β1τx1 + β2τx2
or yτ = β0τ + s1τ(x1) + s2τ(x2), where s1τ and s2τ are univariate smoothing functions.
[A smooth surface sτ(x1, x2) could be fitted for each τ, but this may be unreliable
especially for a low or high τ, unless the sample size is very large.] However, the linear
or additive predictor may be inappropriate. For example, if the simple location-scale
model Y|x1, x2 ∼ N(µ, σ), where µ = β01 + β11x1 + β21x2 and log(σ) = β02 + β12x1 + β22x2,
is a good model for the data, then
$y_\tau = \mu + \sigma z_\tau = \beta_{01} + \beta_{11}x_1 + \beta_{21}x_2 + e^{\beta_{02}} e^{\beta_{12}x_1} e^{\beta_{22}x_2} z_\tau$,
where zτ is the τ quantile of the standard normal distribution.
Hence the quantile yτ has a non-linear interaction term and cannot be modelled by the
usual linear or additive quantile regression model. Similarly, in the rent data analysis
in Section 3, a multiplicative model for yτ seems more appropriate than an additive
model. Our impression is that current R implementations of quantile regression do
not allow link functions other than the identity and hence cannot currently
fit a multiplicative model.
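The interaction in the location-scale example can be seen in a few lines of Python. The sketch below evaluates yτ = µ + σ zτ for hypothetical coefficients (chosen only for illustration) and compares the change in yτ when x1 moves from 0 to 1 at two different values of x2. Under a linear or additive quantile model that change would not depend on x2.

```python
import math
from statistics import NormalDist

# Hypothetical coefficients: (intercept, coefficient of x1, coefficient of x2).
B_MU = (10.0, 2.0, 1.0)      # linear predictor for mu (identity link)
B_SIGMA = (0.0, 0.5, 0.5)    # linear predictor for log(sigma)


def normal_quantile(tau, x1, x2):
    """tau quantile of Y | x1, x2 ~ N(mu, sigma) with a log link for sigma."""
    mu = B_MU[0] + B_MU[1] * x1 + B_MU[2] * x2
    sigma = math.exp(B_SIGMA[0] + B_SIGMA[1] * x1 + B_SIGMA[2] * x2)
    return mu + sigma * NormalDist().inv_cdf(tau)


def x1_effect(tau, x2):
    """Change in the tau quantile when x1 moves from 0 to 1, holding x2 fixed."""
    return normal_quantile(tau, 1.0, x2) - normal_quantile(tau, 0.0, x2)


for tau in (0.5, 0.9):
    print(tau, x1_effect(tau, x2=0.0), x1_effect(tau, x2=1.0))
```

At the median (zτ = 0) the effect of x1 is the same for both values of x2, but at τ = 0.9 it is not, so no single additive predictor in x1 and x2 can reproduce the upper quantile surface.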
Expectile regression has been promoted recently by the excellent work of Schnabel
and Eilers (2013). Our main objection to expectile regression is its interpretability.
An expectile value eτ is the point where, in order to balance the distribution of y,
you have to weight all values above eτ by τ and below by (1 − τ). What does this
mean in practice? Also, the fact that for a given distribution there is a one-to-one
mapping of expectiles to quantiles does not help within a regression situation.
Let eτ(x) of y given explanatory variable(s) x be the τ expectile at value x and let
x0 and x1 be two distinct values for x. Then eτ(x0) and eτ(x1) will correspond to two
different quantiles $y_{\tau_0}(x_0)$ and $y_{\tau_1}(x_1)$. In general, the percentage of the population
above the expectile eτ(x) changes with x, so an expectile curve or surface eτ(x) does
not in general correspond to a centile curve or surface $y_{\tau_1}(x)$ for any 0 < τ1 < 1.
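The point can be illustrated with a small Python sketch. It computes a sample expectile by iteratively reweighted averaging (weights τ above the current value and 1 − τ below) and then reports the proportion of the sample lying at or below that expectile. The two samples are hypothetical stand-ins for the conditional distributions of y at x0 and x1: evenly spaced quantiles of a uniform and of an exponential distribution.

```python
import math


def expectile(values, tau, tol=1e-12, max_iter=200):
    """Sample tau expectile by iteratively reweighted averaging."""
    e = sum(values) / len(values)  # start from the mean (the 0.5 expectile)
    for _ in range(max_iter):
        weights = [tau if v > e else 1.0 - tau for v in values]
        new_e = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


def quantile_level(values, point):
    """Proportion of the sample at or below the given point."""
    return sum(v <= point for v in values) / len(values)


grid = [i / 100 for i in range(1, 100)]
sample_x0 = grid                                  # hypothetical symmetric shape at x0
sample_x1 = [-math.log(1.0 - p) for p in grid]    # hypothetical right-skewed shape at x1

tau = 0.9
level_x0 = quantile_level(sample_x0, expectile(sample_x0, tau))
level_x1 = quantile_level(sample_x1, expectile(sample_x1, tau))
print(level_x0, level_x1)
```

The same τ = 0.9 expectile sits at noticeably different quantile levels in the two samples, which is why an expectile curve cannot in general be read as a centile curve when the shape of the distribution changes with x.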
5 Modelling the tail of the distribution
Fitting the right shape of the tail of a distribution has become very important recently,
especially in financial statistics, where both value at risk (VaR) and expected shortfall
(ES) are concepts defined in the tail of the distribution. There is a vast number of
papers within the economic and econometric literature on methods designed to fit
the tail of the distribution, especially if it is believed that the tail obeys the Pareto
power law. Rigby et al. (2013), in order to study properties of distributions within
the GAMLSS family, defined three major types of parametric tails for the log of
the probability density function as y → ∞ or y → −∞: (i) $-k_2 (\log|y|)^{k_1}$, (ii)
$-k_4 |y|^{k_3}$ and (iii) $-k_6 e^{k_5|y|}$, in decreasing order of heaviness of the tail, and called
them type I, II and III, respectively. The k’s are constants. Distribution tails can be
split into four categories: ‘non-heavy’ tails (k3 ≥ 1 or 0 < k5 < ∞), ‘heavy’ tails (i.e.
heavier than any exponential distribution) but lighter than any ‘Paretian type’ tail
(k1 > 1 and 0 < k3 < 1), ‘Paretian type’ tails (k1 = 1 and k2 > 1), and tails heavier than
any ‘Paretian type’ tail (k1 = 1 and k2 = 1). These four categories correspond closely
to mild, slow, wild (pre or proper) and extreme randomness (Mandelbrot, 1997).
One of the main concerns when a parametric distribution is fitted within a regres-
sion setup is whether the fitted distribution is fitting well both in the centre and also
in the tail of the distribution. This is very important in financial statistics.
The important point here in this discussion is that quantile (and expectile) regres-
sions are less reliable in the extreme tails of the distribution because of sparsity of
data points.
We propose a method for fitting the upper tail of the response variable distribution
as follows:
1. Find the α quantile by fitting an appropriate quantile or GAMLSS model, from
which we obtain the observations in the upper tail of the distribution.
2. Fit a suitable truncated distribution to the tail data, where the truncation
parameter is the fitted α quantile above.
3. Obtain the β quantile of the truncated distribution, which corresponds to the
τ = α + β(1 − α) quantile of the original data.
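The three steps above can be sketched on simulated data without covariates. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold is an empirical quantile rather than a fitted quantile or GAMLSS curve, a left-truncated exponential stands in for the truncated distribution because its maximum likelihood fit has a closed form, and the function names `empirical_quantile` and `tail_quantile` are hypothetical.

```python
import math
import random


def empirical_quantile(values, prob):
    """Return the order statistic at position ceil(prob * n) (1-based)."""
    ordered = sorted(values)
    index = max(0, math.ceil(prob * len(ordered)) - 1)
    return ordered[index]


def tail_quantile(values, alpha, beta):
    """Three-step tail estimate with a left-truncated exponential (assumed)."""
    # Step 1: the threshold is the alpha quantile of the full sample.
    threshold = empirical_quantile(values, alpha)
    tail = [v for v in values if v > threshold]
    # Step 2: maximum likelihood fit of an exponential left-truncated at the
    # threshold; by memorylessness the scale estimate is the mean excess.
    scale = sum(v - threshold for v in tail) / len(tail)
    # Step 3: the beta quantile of the truncated fit is the tau quantile of
    # the original data, with tau = alpha + beta * (1 - alpha).
    tau = alpha + beta * (1.0 - alpha)
    return tau, threshold - scale * math.log(1.0 - beta)


random.seed(2013)
# Simulated response: exponential with mean 500 (hypothetical data).
sample = [random.expovariate(1.0 / 500.0) for _ in range(20000)]
tau, estimate = tail_quantile(sample, alpha=0.9, beta=0.9)
true_value = -500.0 * math.log(1.0 - tau)  # exact exponential quantile
print(f"tau = {tau:.3f}, estimate = {estimate:.1f}, true = {true_value:.1f}")
```

Because only the observations above the threshold enter step 2, the tail fit is not influenced by the centre of the distribution. In a regression setting the threshold and the parameters of the truncated distribution would depend on the explanatory variables.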
As an illustration we investigate the upper tail of rent against area alone. We fitted
a 0.9 smooth quantile curve for rent against area using the R package cobs with
automatic smoothing parameter selection. We then focussed only on the 307 observations
(out of 3082) with rents above the 0.9 quantile curve and fitted a truncated
Gumbel model, from which we obtained the β = 0.5, 0.9 and 0.95 quantile curves.
These correspond to the τ = 0.95, 0.99 and 0.995 quantile curves of the original
rent data. We compare those truncated Gumbel curves with the corresponding
τ = 0.95, 0.99 and 0.995 curves obtained from a quantile sheet of Schnabel and
Eilers (2011a). The results are shown in Figure 5. The difference in the fitted curves
may be due to the fact that the quantile sheet program does not yet have automatic
selection of the smoothing parameters. We also tried the cobs τ = 0.95, 0.99 and
0.995 quantile curves but they were very erratic.

Figure 5 Comparison of quantiles from a truncated Gumbel (solid) and quantile sheet (dashed); horizontal axis: area, vertical axis: rent
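The correspondence between the β and τ values quoted above can be checked numerically. The sketch below assumes the maximum-type Gumbel distribution and uses hypothetical parameter values (`mu`, `sigma`), not the fitted ones. For simplicity it also assumes that the whole response follows this Gumbel, so that the truncation point is exactly its α quantile.

```python
import math


def gumbel_cdf(y, mu, sigma):
    """CDF of the maximum-type Gumbel distribution (assumed form)."""
    return math.exp(-math.exp(-(y - mu) / sigma))


def gumbel_quantile(p, mu, sigma):
    """Inverse of gumbel_cdf."""
    return mu - sigma * math.log(-math.log(p))


def truncated_gumbel_quantile(beta, cut, mu, sigma):
    """beta quantile of a Gumbel distribution left-truncated at `cut`."""
    p_cut = gumbel_cdf(cut, mu, sigma)
    return gumbel_quantile(p_cut + beta * (1.0 - p_cut), mu, sigma)


alpha = 0.9
mu, sigma = 600.0, 200.0  # hypothetical parameters, not fitted values
cut = gumbel_quantile(alpha, mu, sigma)

results = []
for beta in (0.5, 0.9, 0.95):
    tau = alpha + beta * (1.0 - alpha)
    via_tail = truncated_gumbel_quantile(beta, cut, mu, sigma)
    direct = gumbel_quantile(tau, mu, sigma)
    results.append((beta, tau, via_tail, direct))
    print(f"beta={beta:.2f} tau={tau:.3f} tail={via_tail:.2f} direct={direct:.2f}")

# Share of observations above the 0.9 quantile curve in the rent data.
tail_share = 307 / 3082
print(f"tail share = {tail_share:.4f}")
```

The `tail` and `direct` columns agree because conditioning on exceeding the α quantile rescales the remaining probability mass by 1 − α. The tail share of 307 out of 3082 is close to the nominal 1 − α = 0.1.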
6 Conclusions
In our contribution to the discussion of Thomas’ paper, we have shown that the
GAMLSS framework provides a platform to fit, compare and check models. We
have pointed out some pitfalls of using quantile and expectile regression methods
without a proper way of checking the adequacy of the model. In particular, the
choice between additive and multiplicative models is important and it is available in
GAMLSS. Finally, we have introduced a novel method of checking the tail of the
distribution of the response variable in which the starting point is a quantile regression.
There are several points we would like to make here. The first has to do with
the mode of inference used for parametric distribution models. We genuinely believe
that the ‘right model’ for the data is more important than the ‘right inferential mode’.
Bayesian, classical frequentist or any other mode of inference is irrelevant if the
wrong model is used in the first place. In the search for the right model, the more
tools available the better, especially if there are ways of comparing the fitted
models.
We welcome the contributions that Thomas has made in the field and we think that
a Bayesian version of GAMLSS where models can be fitted fast will be a wonderful
tool for statisticians and practitioners.
Second, we would like to say something about parametric and non-parametric
approaches in statistics. Non-parametric methods for fitting terms within a
regression-type setting have been one of the great contributions to statistics over the
last 30 years. Non-parametric methods for fitting the shape of the distribution of y in the
presence of explanatory variables are useful, but the practitioner has to be aware of
the implicit or explicit assumptions made.
Finally, we would like to finish by emphasising that looking at a single statistical
model in isolation is not good practice. Any chosen model should be able to stand
up to scrutiny and that involves checking its assumptions and being able to compare
it with alternative models. This is available in GAMLSS but is difficult in quantile or
expectile regression.
References
Aitkin M (1987) Modelling variance
heterogeneity in normal regression
using GLIM. Applied Statistics, 36,
332–39.
Borghi E, de Onis M et al. (2006) Construction of
the World Health Organization child growth
standards: selection of methods for attained
growth curves. Statistics in Medicine, 25,
247–65.
Cole TJ (1988) Fitting smoothed centile curves to
reference data (with discussion). Journal of
the Royal Statistical Society, Series A, 151,
385–418.
Cole TJ and Green PJ (1992) Smoothing
reference centile curves: the LMS method
and penalized likelihood. Statistics in
Medicine, 11, 1305–319.
Dunn PK and Smyth GK (1996) Randomised
quantile residuals. Journal of
Computational and Graphical Statistics, 5,
236–44.
Engle RF (1982) Autoregressive conditional
heteroscedasticity with estimates of the
variance of United Kingdom inflation.
Econometrica: Journal of the Econometric
Society, 50, 987–1007.
Harvey AC (1976) Estimating regression models
with multiplicative heteroscedasticity.
Econometrica, 41, 461–65.
Koenker R (2012) quantreg: Quantile
Regression. R package version 4.91.
Lee Y, Nelder J and Pawitan Y (2006)
Generalized linear models with random
effects: unified analysis via H-likelihood.
London: CRC Press.
Lee Y and Nelder JA (2006) Double hierarchical
generalized linear models (with discussion).
Journal of the Royal Statistical Society:
Series C, 55, 139–85.
Mandelbrot B (1997) Fractals and scaling in
finance: discontinuity, concentration, risk:
selecta volume E. New York: Springer
Verlag.
Nelder JA (1992) Joint modelling of the mean
and dispersion. In PGM van der Heijden,
W Jansen, BJ Francis and GUH Seeber
(eds) Statistical modelling, pp. 263–72.
Amsterdam: North Holland.
Nelder JA and Pregibon D (1987) An extended
quasi-likelihood function. Biometrika, 74,
221–32.
Ng P and Maechler M (2007) A fast and efficient
implementation of qualitatively constrained
quantile smoothing splines. Statistical
Modelling, 7(4), 315–28.
Ng PT and Maechler M (2011) cobs:
COBS—Constrained B-splines (Sparse
matrix based). R package version 1.2-2.
Rigby RA and Stasinopoulos DM (1996a) A
semi-parametric additive model for
variance heterogeneity. Statistics and
Computing, 6, 57–65.
Rigby RA and Stasinopoulos DM (1996b) Mean
and dispersion additive models. In
W Härdle and MG Schimek (eds) Statistical
theory and computational aspects of
smoothing, pp. 215–30. Heidelberg:
Physica.
Rigby RA and Stasinopoulos DM (2004) Smooth
centile curves for skew and kurtotic data
modelled using the Box-Cox power
exponential distribution. Statistics in
Medicine, 23, 3053–76.
Rigby RA and Stasinopoulos DM (2005)
Generalized additive models for location,
scale and shape (with discussion). Journal
of the Royal Statistical Society: Series C,
54, 507–54.
Rigby RA and Stasinopoulos DM (2006) Using
the Box-Cox t distribution in GAMLSS to
model skewness and kurtosis. Statistical
Modelling, 6, 209–29.
Rigby RA and Stasinopoulos DM (2013)
Automatic smoothing parameter selection
in GAMLSS with an application to centile
estimation. Statistical Methods in Medical
Research. Published online before print
01/02/2013.
http://smm.sagepub.com/content/early/2013/01/16/0962280212473302.abstract
Rigby RA, Stasinopoulos DM and Voudouris V
(2013) Methods for the ordering and
comparison of theoretical distributions for
parametric models in the presence of heavy
tails. Internal Report, STORM, London
Metropolitan University.
Royston P and Wright EM (2000) Goodness-of-
fit statistics for age-specific reference
intervals. Statistics in Medicine, 19,
2943–62.
Rue H and Held L (2005) Gaussian Markov
random fields: theory and applications, vol.
104. London: Chapman & Hall/CRC.
Schnabel SK and Eilers PHC (2013)
Simultaneous estimation of quantile curves
using quantile sheets. Advances in
Statistical Analysis, 97, 77–87.
Sobotka F, Schnabel S and Schulze Waltrup L
(2012) expectreg: Expectile and quantile
regression. R package version 0.35.
van Buuren S and Fredriks M (2001) Worm plot:
a simple diagnostic device for modelling
growth reference curves. Statistics in
Medicine, 20, 1259–77.