Statistical Modelling (http://smj.sagepub.com/)
DOI: 10.1177/1471082X13494316
The online version of this article can be found at: http://smj.sagepub.com/content/13/4/335
Published by: http://www.sagepublications.com
On behalf of: Statistical Modeling Society
Version of Record: Aug 8, 2013
Statistical Modelling 2013; 13(4): 335–348
Discussion: A comparison of GAMLSS with quantile regression
RA Rigby (1), DM Stasinopoulos (1) and V Voudouris (2)
(1) Statistics, Operational Research and Mathematics (STORM) research centre, London Metropolitan University, UK
(2) ESCP Europe Business School, London, UK
Abstract: A discussion on the relative merits of quantile, expectile and GAMLSS regression models
is given. We contrast the ‘complete distribution models’ provided by GAMLSS to the ‘distribution free
models’ provided by quantile (and expectile) regression. We argue that in general, a flexible
parametric distribution assumption has several advantages, allowing possible focusing on specific aspects of
the data, model comparison and model diagnostics. A new method for concentrating only on the tail
of the distribution, combining quantile regression and GAMLSS, is suggested.
Key words: GAMLSS; quantile and expectile regression; regression on the tail of the distribution
1 Introduction
We would like to thank Thomas Kneib for his excellent paper bringing into focus the
idea of regression analysis where the whole shape of the distribution for the response
variable is allowed to vary according to explanatory variables (rather than just the mean or
the variance). Figure 1 illustrates this point by showing the Munich rental guide data
example where we have fitted a Box-Cox Cole and Green (BCCG) distribution for
the response variable, rent, and where the shape of the distribution varies according
to the explanatory variable area. In fact, we fully support his statement that ‘once
starting to think about regression models beyond the mean, they seem to appear
basically everywhere’. We would like to add that for data with more than, say,
1000 observations, regression models beyond the mean should be the norm, not
the exception. This brings into the frame three different approaches: the GAMLSS
approach where a full parametric distribution is assumed for the response variable Y
and the quantile and expectile regression approaches where no specific assumption
is made about the distribution of Y, which therefore can be considered in the realm
of non-parametric methods.
Address for correspondence: RA Rigby, Statistics, Operational Research and Mathematics (STORM)
research centre, London Metropolitan University, Holloway Road, London, N7 8DB, UK. E-mail:
r.rigby@londonmet.ac.uk
© 2013 SAGE Publications 10.1177/1471082X13494316
Figure 1 The fitted BCCG distribution on the Munich rent data against area of the property (horizontal axis: area; vertical axis: rent)
We are great believers in the doctrine, attributed to George Box, that every model is wrong but some
are useful. Therefore, in searching for a suitable model
(for a particular data set), we would like to be able to try different models and
choose the ones that we think are more suitable for the data and of course capable
of answering the question at hand. That brings us to one of the most important
questions in statistical modelling: how can we decide between models? In order to
do that we need a mechanism to judge models relative to each other. We should be
able to say model I is better than model II or model I is similar to model III in a
relatively objective way. We should also be able to check whether any model we use
is an adequate fit to the data. Every model is based on certain assumptions and those
assumptions should always be checked. Awareness of these assumptions is crucial
for model checking and for deciding whether we should accept the model or not. We will argue in
this article that while the GAMLSS regression models seem to depend more heavily
on parametric assumptions than quantile and expectile regression, they are in many
ways more flexible, with the essential advantages that models can be compared and
every assumption within a particular model can be checked and tested.
Our contribution is divided into six sections. Section 2 discusses the historical
development of GAMLSS and the advantages and disadvantages of the approach.
In Section 3, we revisit the Munich rent analysis. Section 4 looks at quantile and
expectile regression. Section 5 introduces a new idea of modelling the tail end of the
distribution of a response variable within a regression framework, while Section 6
makes relevant conclusions.
2 The parametric GAMLSS models
This section starts with a small history of GAMLSS and then describes the advantages,
pitfalls, and some new developments.
2.1 Historical development
The parametric mean regression model dominated the statistical scene for the last
two centuries. Modelling the distribution variance (as well as the mean) as a function
of explanatory variables was started more recently by Harvey (1976) and Aitkin
(1987) for normal models and by Nelder and Pregibon (1986) and Nelder (1992)
for exponential family models (through an extended quasi-likelihood, EQL,
approach). Modelling the variance (volatility in finance) within a time series framework
became important after the seminal paper of Engle (1982) introducing
the ARCH (autoregressive conditional heteroskedastic) models. The LMS method for
centile estimation of Cole (1988) and Cole and Green (1992) was the first attempt to
model skewness parametrically, as a function of (one) explanatory variable (the age).
Rigby and Stasinopoulos (1996a, 1996b) introduced the Mean and Dispersion Additive
models (MADAM), which use additive terms for the mean and dispersion but
assume that the response variable belongs to the exponential family and therefore
use EQL as a method of estimation. Following the implementation of the MADAM
model, a key problem with EQL became clear. Despite the fact that EQL can produce
‘good’ statistical properties for simulated models where the parameters of the model
are assumed to be known, there is no proper way of comparing an EQL model
with a different candidate model estimated, say, using maximum likelihood. It was
this fact that led us to abandon EQL and to concentrate on models with
a properly defined distributional assumption. For more detailed criticism of the EQL
approach see the discussion by Rigby and Stasinopoulos of Lee and Nelder (2006).
In real data situations, the distribution, the parameters and the mathematical
structure of the model itself are unknown. We only make progress when there is
a reasonable method to compare between different models and a way to check
their assumptions. The GAMLSS models of Rigby and Stasinopoulos (2005) are
based on this ideology. They assume that the response has a parametric distribution
Y ∼ D(µ, σ, ν, τ), where µ and σ are usually location and scale parameters and
ν and τ are usually shape parameters. Explanatory variables are introduced into
the model through the ‘predictors’, η1 = g1(µ), η2 = g2(σ), η3 = g3(ν) and η4 =
g4(τ). The predictors can be linear functions of the explanatory variables or can
take the form of the ‘structured additive predictors’ of equation (3) in Thomas’s
paper. That is, $\eta_i = \beta_0 + \sum_{j=1}^{p} f_j(z_i)$. In fact, any Gaussian Markov random field
formulation, see Rue and Held (2005), can be used here by setting the problem
as a random effect model, as Thomas describes in section 2. The two algorithms
described in Rigby and Stasinopoulos (2005, Appendix B) as RS and CG still apply.
Both algorithms lead to the maximisation of the penalised likelihood function (or
MAP estimation in a Bayesian framework) given that the hyperparameters (the λ’s or
their effective degrees of freedom in smoothing) are known. Rigby and Stasinopoulos
(2013) have shown that both algorithms work well if the hyperparameters are
estimated by an internal (i.e., local) ML estimation procedure. This local maximum
likelihood estimation is effectively a penalised quasi-likelihood applied locally for
each of µ, σ, ν and τ. The method is also a generalisation to multiple smoothing
terms of the procedure given by Lee et al. (2006) but is applied internally on the
predictor scale for each µ, σ, ν, and τ. Alternatively, the hyperparameters can be
estimated by minimising a generalised cross validation criterion or a generalised
Akaike information criterion (GAIC) either locally on the predictor scale for each µ,
σ, ν, and τ or globally for µ, σ, ν, and τ jointly.
2.2 Advantages of GAMLSS
Here we highlight some of the advantages of the GAMLSS models and their imple-
mentation in R.
Distributions
1. The gamlss.dist package provides more than 80 distributions for a continu-
ous, discrete or mixed response variable. (By ‘mixed’ distributions we mean
a continuous distribution which can also take some extra discrete values. For
example, the inflated gamma distribution is an example of a mixed distribution
where the response is allowed to take in addition the discrete value of zero with
a non-zero probability.)
2. Any of the distributions can be right or left truncated or both.
3. Interval response variables (i.e. censored data) can be modelled with any of the
distributions.
4. Several continuous distributions within GAMLSS provide a decomposition of
the signal for Y into location, scale, skewness and kurtosis components that
help the interpretation.
5. Any continuous distribution defined on (−∞, ∞) can have its log or logit
transformed variable defined on (0, ∞) and (0, 1), respectively, providing a
wider range of distributions on those ranges within GAMLSS.
6. For a continuous response variable Y on the positive real line (0 < Y < ∞),
the 3-parameter BCCG distribution has been widely used for centile (or quan-
tile) estimation and is called the LMS method. Rigby and Stasinopoulos (2004,
2006) extended the LMS method (which allows for location, scale and skew-
ness but not for kurtosis in the data), to allow for kurtosis by introducing the
4-parameter Box-Cox power exponential (BCPE) and the Box-Cox t (BCT)
distributions, respectively, and called the resulting centile (or quantile) esti-
mation methods LMSP and LMST, respectively. The BCCG, BCPE and BCT
distributions are all available in GAMLSS.
Additive terms
Because of the modularity of the fitting algorithm, other statistical techniques apart
from Gaussian Markov random fields can be used as additive terms e.g., neural
networks, loess. Also it is easy to change and test different link functions for the
parameters.
Diagnostics
The normalised (randomised) quantile residuals (Dunn and Smyth, 1996) (or z-scores)
are well defined, provide information about the adequacy of the model and can be
used in connection with diagnostic plots like worm plots (van Buuren and Fredriks,
2001) or other test statistics e.g., Z-statistics (Royston and Wright, 2000).
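As a minimal sketch of how such residuals can be computed for a continuous response, the Python fragment below applies the transformation r = Φ^{-1}[F(y)] to each observation. The exponential response distribution, the deterministic 'data' and the function names `exp_cdf` and `normalised_quantile_residual` are assumptions made only for illustration; they are not part of the gamlss software.

```python
from math import exp, log
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()

def exp_cdf(y, mu):
    """CDF of an exponential distribution with mean mu (illustrative model only)."""
    return 1.0 - exp(-y / mu)

def normalised_quantile_residual(y, cdf, **params):
    """Map an observation to a z-score: r = Phi^{-1}(F(y; params))."""
    return STD_NORMAL.inv_cdf(cdf(y, **params))

# Deterministic 'data': the (i + 0.5)/n quantiles of an exponential with mean 2.
n = 1000
true_mu = 2.0
ys = [-true_mu * log(1.0 - (i + 0.5) / n) for i in range(n)]

# With the correct mean the residuals behave like a standard normal sample.
res_ok = [normalised_quantile_residual(y, exp_cdf, mu=true_mu) for y in ys]
# With a wrong mean (4 instead of 2) the residuals are shifted away from zero.
res_bad = [normalised_quantile_residual(y, exp_cdf, mu=4.0) for y in ys]

print("correct model: mean %.3f, sd %.3f" % (mean(res_ok), pstdev(res_ok)))
print("wrong model:   mean %.3f, sd %.3f" % (mean(res_bad), pstdev(res_bad)))
```

A worm plot is, in essence, a detrended normal QQ plot of residuals of this kind, so a systematic shift such as the one produced by the wrong mean would show up as a departure from the horizontal line at zero.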
Mode of inference
Full specification of the likelihood function for the model given Y allows differ-
ent modes of statistical inference i.e., classical (including bootstrapping), Bayesian,
boosting, etc., as Thomas already indicated. Maximum likelihood provides a way of
discriminating between GAMLSS models. This can be done by the global deviance
(GD = −2 log(Likelihood)) defined for all current data, the validation global deviance
(VGD=the global deviance defined for a validation data set) or the GAIC with special
cases, the AIC and the SBC.
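The criteria above can be sketched in a few lines of Python. The sketch assumes the usual form GAIC(k) = GD + k × df, with k = 2 giving the AIC and k = log(n) giving the SBC; the function names and the two sets of deviances and degrees of freedom are hypothetical and serve only to show how the two penalties can rank the same pair of models differently.

```python
from math import log

def gaic(global_deviance, edf, k=2.0):
    """Generalised AIC: global deviance plus a penalty k per effective degree of freedom."""
    return global_deviance + k * edf

def aic(global_deviance, edf):
    """AIC as the special case k = 2."""
    return gaic(global_deviance, edf, k=2.0)

def sbc(global_deviance, edf, n_obs):
    """SBC (also called BIC) as the special case k = log(n)."""
    return gaic(global_deviance, edf, k=log(n_obs))

# Two hypothetical fits to the same n = 500 observations (made-up numbers).
n_obs = 500
simple = {"gd": 1000.0, "edf": 5.0}
flexible = {"gd": 990.0, "edf": 9.0}

for name, fit in (("simple", simple), ("flexible", flexible)):
    print(name,
          "AIC = %.2f" % aic(fit["gd"], fit["edf"]),
          "SBC = %.2f" % sbc(fit["gd"], fit["edf"], n_obs))
```

With these made-up numbers the AIC prefers the more flexible fit while the heavier SBC penalty prefers the simpler one, which is why the choice of k matters when models are compared.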
2.3 Disadvantages of GAMLSS
The disadvantages of GAMLSS arise from the flexibility (and therefore complexity) of
the model and the choices practitioners have to make. Let M = (D, G, T, L) represent
a GAMLSS model. The components of M are defined as follows:
1. D specifies the distribution of the response variable
2. G specifies the set of link functions
3. T specifies the terms appearing in all the predictors for µ, σ, ν and τ
4. L specifies the smoothing hyper parameters which determine the amount of
smoothing.
The empirical researcher is presented with four choices in designing an appropriate
GAMLSS model. Therefore, developing and comparing GAMLSS models is not a
trivial task, particularly the selection of the theoretical parametric distribution with
the correct tails and the selection of the terms for the distribution parameters, namely
µ, σ, ν and τ. The GAMLSS framework requires the empirical researchers to have
a good understanding of the properties of the distributions from the list of available
distributions in the GAMLSS framework.
3 The Munich rent data revisited
Here we reanalyse the Munich data using the following GAMLSS model:
y ∼ BCCG(µ, σ, ν)
log(µ) = b10 + s11(area) + s12(year)
log(σ) = b20 + s21(area) + s22(year)
log(ν) = b30 + s31(area) + s32(year), (3.1)
where the distribution BCCG(µ, σ, ν) was chosen after some initial investigation
in which several other distributions defined on the positive real line were tried. We call
(3.1) the basic model (bm). It has a multiplicative model for µ, (resulting from the
log link for µ). The resulting fitted global deviance, effective degrees of freedom (edf)
used in the model, Akaike information criterion (AIC) and Bayesian information
criterion (BIC) are given in the first line of Table 1.
An alternative additive model for µ in (3.1) (resulting from the identity link for µ)
is shown in row 2 of Table 1 and has a substantially worse fit according to AIC (or
BIC). We believe that the additive model ητ = s1(area) + s2(year) + district, as used
by Thomas Kneib in sections 5 and 6 and Figures 4 and 5, is probably inappropriate.
It uses an identity link relating ητ, the population τ quantile of rent, to district and to
smoothed functions of area and year. This implies that changing from an unpopular
to a popular district results in a fixed change in rent, irrespective of how large an
area the property has and irrespective of its year. It is more likely that the change
in rent is not a fixed amount but a fixed percentage, implying that a multiplicative
model is more appropriate. We looked at R packages, expectreg (Sobotka et al.,
2012), quantreg (Koenker, 2012) and cobs (Ng and Maechler, 2011), but they do
not appear to allow at the moment for a multiplicative model (i.e., log link).
In row 3 of Table 1, we report the results of a model, which while similar to
the basic model (3.1), fits smooth surfaces over area and year rather than just additive
smoothing terms for each of µ, σ and ν. This model provides a small improvement
in AIC, but BIC was much worse.
Table 1 The deviance analysis of the Munich data BCCG model
model                            fitted deviance   edf      AIC        BIC
1. basic model (bm)              38474.50          24.71    38523.92   38673.03
2. bm identity link for µ        38548.13          20.51    38589.16   38712.94
3. bm surface for µ, σ and ν     38434.80          41.19    38517.20   38765.68
4. bm spatial for µ              38164.70          128.45   38421.61   39196.63
5. bm spatial for µ, σ and ν     38164.42          130.78   38425.99   39215.04
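The AIC and BIC columns of Table 1 can be reproduced from the deviance and edf columns. The sketch below assumes AIC = deviance + 2 × edf and BIC = deviance + log(n) × edf, and it assumes that n = 3082, the number of observations quoted in Section 5, is the sample size behind the table; the variable and function names are illustrative.

```python
from math import log

N_OBS = 3082  # assumption: sample size taken from the count quoted in Section 5

# (label, deviance, edf, reported AIC, reported BIC), copied from Table 1
TABLE_1 = [
    ("basic model (bm)",             38474.50,  24.71, 38523.92, 38673.03),
    ("bm identity link for mu",      38548.13,  20.51, 38589.16, 38712.94),
    ("bm surface for mu, sigma, nu", 38434.80,  41.19, 38517.20, 38765.68),
    ("bm spatial for mu",            38164.70, 128.45, 38421.61, 39196.63),
    ("bm spatial for mu, sigma, nu", 38164.42, 130.78, 38425.99, 39215.04),
]

def recompute(deviance, edf, n_obs=N_OBS):
    """Return (AIC, BIC) computed from a deviance and effective degrees of freedom."""
    return deviance + 2.0 * edf, deviance + log(n_obs) * edf

rows = []
for label, dev, edf, aic_rep, bic_rep in TABLE_1:
    aic_calc, bic_calc = recompute(dev, edf)
    rows.append((label, aic_calc, bic_calc, aic_rep, bic_rep))
    print("%-30s AIC %.2f (reported %.2f)  BIC %.2f (reported %.2f)"
          % (label, aic_calc, aic_rep, bic_calc, bic_rep))

# Which row each criterion would select, using the reported values.
best_aic = min(TABLE_1, key=lambda r: r[3])[0]
best_bic = min(TABLE_1, key=lambda r: r[4])[0]
print("smallest AIC:", best_aic)
print("smallest BIC:", best_bic)
```

The recomputed values agree with the reported ones to within a few hundredths, presumably because the edf and deviance are shown rounded to two decimals. Under these assumptions the smallest AIC belongs to row 4 and the smallest BIC to row 1.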
Figure 2 The fitted additive terms for area and year of construction for µ, σ and ν for the rent data basic model
with spatial effect for µ (panels: partial effects for pb(area) and pb(yearc), one row per parameter)
The model in row 4 is the basic model with the addition of district in the µ model
only. District is modelled here as a spatial effect using an intrinsic autoregressive
model. The model in row 5 in Table 1 fits district in the models of each of µ, σ and
ν. While district for µ provides an improvement (i.e., reduction) in AIC, it appears
that it is not needed for σ and ν according to AIC.
Figure 2 shows the fitted smooth functions skj for j = 1, 2 and k = 1, 2, 3 for
the final chosen model (the model of row 4), where k and j correspond to the row
and column of the plot in Figure 2, respectively. Clearly from Figure 2, the fitted
median rent, µ in the BCCG distribution, increases with area and year, while the
approximate coefficient of variation σ increases with area but decreases with year,
and the parameter ν increases for small and large properties and increases with year
(indicating a corresponding decrease in skewness).
Figure 3 The fitted spatial effect for µ for the basic model with spatial effect for µ (legend range approximately −0.1177 to 0.1194)
Figure 3 shows the district effect on log(µ) (relative to the baseline district). Hence,
for example, if a district effect is 0.1, then the median rent µ is changed by a factor
$e^{0.1} = 1.105$, i.e., a 10.5% increase (relative to the baseline district).
For the final model, the τ quantile of rent is given by yτ = µ ∗ qBCCG(τ, 1, σ, ν)
where qBCCG(τ, 1, σ, ν) is the τ quantile of the BCCG(1, σ, ν) distribution. This
has a simple interpretation that the district effect on yτ is a multiplicative effect, since
it only affects µ. This disagrees with Thomas Kneib’s Figure 4, although this might
be due to his additive model for the τ quantile, which is inappropriate in our view. In our
fitted model, yτ can be represented by a contour plot against area and year (for the
reference district).
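The multiplicative interpretation can be checked numerically. The sketch below uses the LMS form of the BCCG quantile, µ(1 + σνz)^{1/ν} (or µ exp(σz) when ν = 0), where z is the τ quantile of the standard normal; it is an approximation that assumes the slight truncation of the underlying normal variable can be ignored. The parameter values and the name `lms_quantile` are hypothetical, not fitted values from the rent data.

```python
from math import exp
from statistics import NormalDist

def lms_quantile(tau, mu, sigma, nu):
    """Approximate tau quantile of BCCG(mu, sigma, nu) via the LMS formula.

    Assumption: the slight truncation of the underlying normal variable is ignored.
    """
    z = NormalDist().inv_cdf(tau)
    if abs(nu) < 1e-12:
        return mu * exp(sigma * z)
    return mu * (1.0 + sigma * nu * z) ** (1.0 / nu)

# Hypothetical parameter values, chosen only for illustration.
mu, sigma, nu = 600.0, 0.3, 0.5
district_effect = 0.1  # effect on log(mu) relative to the baseline district

for tau in (0.05, 0.5, 0.95):
    base = lms_quantile(tau, mu, sigma, nu)
    shifted = lms_quantile(tau, mu * exp(district_effect), sigma, nu)
    print("tau=%.2f  baseline=%.1f  district=%.1f  ratio=%.3f"
          % (tau, base, shifted, shifted / base))
```

Because µ enters the quantile only as a multiplier, the ratio between the district and baseline quantiles is the same factor exp(0.1) at every τ in this sketch.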
Figure 4 displays a worm plot (van Buuren and Fredriks, 2001) for the chosen
fitted model (row 4 of Table 1). It shows nine detrended QQ plots for the (normalised
quantile) residuals in nine corresponding intervals of the explanatory variable area
(displayed above the worm plot). The individual worm plots are generally within
the 95% pointwise confidence intervals indicating a reasonable, though not fully
adequate, fit to the data. The worm plot against year is similar.
Figure 4 The worm plot of the residuals split by area for the basic model with spatial effect for µ (horizontal axis: unit normal quantile; vertical axis: deviation; panels given by intervals of area)
4 Quantile and expectile regression
Standard quantile regression methods estimate each quantile (i.e. centile) separately.
Koenker (2012) has developed the quantreg package in R for quantile regression and
smoothing, while Ng and Maechler (2007, 2011) have developed the COBS package
in R for smooth quantile curves using B-splines with a smoothness penalty. The fact
that the quantile regression model does not assume a distribution for the response
variable makes it flexible and reduces bias caused by assuming a distribution, but
increases the variability of the quantile curves or surfaces, especially for extreme
quantiles with τ close to 0 or 1.
A possible problem is that different quantile curves or surfaces yτ(x) of y given
explanatory variable(s) x may cross for different values of τ (implying negative
probability). The quantile regression model does not allow for interpolation between
quantile curves (for different τ’s) or extrapolations beyond the centile curves which
is desirable for estimating extreme quantiles, which are difficult to estimate directly.
[See Schnabel and Eilers (2013) for a possible solution called quantile sheets.]
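The crossing problem is easy to see for two separately fitted linear quantile lines with different slopes. The sketch below uses made-up intercepts and slopes (they are not estimates from the rent data) and hypothetical function names; it finds where the two lines intersect and whether that point lies inside the range of the explanatory variable.

```python
def crossing_point(a_lo, b_lo, a_hi, b_hi):
    """x at which the lines a_lo + b_lo*x and a_hi + b_hi*x intersect (None if parallel)."""
    if b_lo == b_hi:
        return None
    return (a_hi - a_lo) / (b_lo - b_hi)

def crosses_within(a_lo, b_lo, a_hi, b_hi, x_min, x_max):
    """True if the two lines intersect inside the interval [x_min, x_max]."""
    x_star = crossing_point(a_lo, b_lo, a_hi, b_hi)
    return x_star is not None and x_min <= x_star <= x_max

def line(coef, x):
    """Evaluate intercept + slope * x."""
    return coef[0] + coef[1] * x

# Hypothetical fitted lines (intercept, slope) for tau = 0.90 and tau = 0.95.
q90 = (100.0, 6.0)
q95 = (180.0, 5.0)

x_star = crossing_point(*q90, *q95)
print("lines cross at x =", x_star)
print("crossing inside [20, 160]:", crosses_within(*q90, *q95, 20.0, 160.0))
print("at x = 100: q90 = %.0f, q95 = %.0f" % (line(q90, 100.0), line(q95, 100.0)))
```

Beyond the crossing point the fitted 0.90 line lies above the fitted 0.95 line, which is not a valid ordering of quantiles.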
The quantile regression model also lacks an explicit formula that allows the calculation
of the quantile yτ(x) given τ and x, or the z-score $z = \Phi^{-1}[F_Y(y|x)]$ given
y and x, where $\Phi^{-1}$
is the inverse cumulative distribution function of the standard
normal distribution. This was one of the requirements set by a World Health Organ-
isation expert committee (Borghi et al., 2006) for the adoption of a method for the
construction of the world standard growth curves. The quantile regression model
lacks a measure of goodness of fit and residual diagnostic plots and statistics for
model comparison and model adequacy checking.
When there is more than one explanatory variable, the quantile regression model
usually assumes a linear or additive predictor for all τ, e.g., yτ = β0τ + β1τx1 + β2τx2
or yτ = β0τ + s1τ(x1) + s2τ(x2), where s1τ and s2τ are univariate smoothing functions.
[A smooth surface sτ(x1, x2) could be fitted for each τ, but this may be unreliable
especially for a low or high τ, unless the sample size is very large.] However, the linear
or additive predictor may be inappropriate. For example, if the simple location-scale
model $Y | x_1, x_2 \sim N(\mu, \sigma)$, where $\mu = \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2$ and $\log(\sigma) = \beta_{02} + \beta_{12} x_1 + \beta_{22} x_2$,
is a good model for the data, then $y_\tau = \mu + \sigma z_\tau = \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2 + e^{\beta_{02}} e^{\beta_{12} x_1} e^{\beta_{22} x_2} z_\tau$, where $z_\tau$ is the τ quantile of the standard normal distribution.
Hence the quantile yτ has a non-linear interaction term and cannot be modelled by the
usual linear or additive quantile regression model. Similarly in the rent data analysis
in Section 3, a multiplicative model for yτ seems more appropriate than an additive
model. Our impression is that current R implementations of quantile regression do
not allow different link functions other than the identity and hence cannot currently
fit a multiplicative model.
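The interaction term in the location-scale example can be made visible with a second difference over x1 and x2, which is zero for any function that is additive in the two variables. The coefficients below are hypothetical and the function names are illustrative; the sketch only assumes the normal location-scale model with a log link for σ described above.

```python
from math import exp
from statistics import NormalDist

# Hypothetical coefficients for mu and log(sigma), chosen only for illustration.
B01, B11, B21 = 1.0, 2.0, -1.0
B02, B12, B22 = 0.0, 0.5, 0.3

def quantile(tau, x1, x2):
    """tau quantile of N(mu, sigma) with linear mu and log-linear sigma."""
    mu = B01 + B11 * x1 + B21 * x2
    sigma = exp(B02 + B12 * x1 + B22 * x2)
    return mu + sigma * NormalDist().inv_cdf(tau)

def interaction_contrast(tau):
    """Second difference over the unit square; zero for any additive function of x1 and x2."""
    return (quantile(tau, 1, 1) - quantile(tau, 1, 0)
            - quantile(tau, 0, 1) + quantile(tau, 0, 0))

for tau in (0.05, 0.5, 0.95):
    print("tau=%.2f  interaction contrast=%.4f" % (tau, interaction_contrast(tau)))
```

In this sketch the contrast vanishes at the median, where the quantile equals µ, and is non-zero in the tails, so an additive predictor could describe the median but not the outer quantiles of such data.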
Expectile regression has been promoted recently by the excellent work of Schnabel
and Eilers (2013). Our main objection to expectile regression is its interpretability.
An expectile value eτ is the point where in order to balance the distribution of y
you have to weight all values above eτ by τ and below by (1 − τ). What does this
mean in practice? Also, the fact that for a given distribution there is a one-to-one
mapping of expectiles to quantiles does not help within a regression situation.
Let eτ(x) of y given explanatory variable(s) x be the τ expectile at value x and let
x0 and x1 be two distinct values for x. Then eτ(x0) and eτ(x1) will correspond to two
different quantiles $y_{\tau_0}(x_0)$ and $y_{\tau_1}(x_1)$. In general, the percentage of the population
above the expectile eτ(x) changes with x, so an expectile curve or surface eτ(x) does
not in general correspond to a centile curve or surface $y_{\tau_1}(x)$ for any 0 < τ1 < 1.
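A small numerical sketch illustrates the point. It computes the τ = 0.9 expectile of two deterministic 'samples', one symmetric (standard normal quantiles) and one skewed (standard exponential quantiles), and then the proportion of each sample lying below its own expectile. The two distributions stand in, as an assumption, for the conditional distribution of y at two different values of x; the function names and the iterative weighted-mean scheme are illustrative choices.

```python
from math import log
from statistics import NormalDist

def expectile(values, tau, tol=1e-10, max_iter=500):
    """tau expectile computed by iterating an asymmetrically weighted mean."""
    e = sum(values) / len(values)
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for y in values:
            w = tau if y > e else 1.0 - tau  # weight tau above e, (1 - tau) below
            num += w * y
            den += w
        e_new = num / den
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

def fraction_below(values, point):
    """Proportion of the values that are at or below the given point."""
    return sum(1 for y in values if y <= point) / len(values)

n = 5000
probs = [(i + 0.5) / n for i in range(n)]
std_normal = NormalDist()
symmetric = [std_normal.inv_cdf(p) for p in probs]  # stands in for y given x0
skewed = [-log(1.0 - p) for p in probs]             # stands in for y given x1

tau = 0.9
level_symmetric = fraction_below(symmetric, expectile(symmetric, tau))
level_skewed = fraction_below(skewed, expectile(skewed, tau))
print("quantile level of the 0.9 expectile, symmetric case: %.3f" % level_symmetric)
print("quantile level of the 0.9 expectile, skewed case:    %.3f" % level_skewed)
```

The same τ therefore corresponds to different quantile levels once the shape of the distribution changes, which is the situation described above when the shape varies with x.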
5 Modelling the tail of the distribution
Fitting the right shape of the tail of a distribution has become very important recently,
especially in financial statistics, where both value at risk (VaR) and expected shortfall
(ES) are concepts defined in the tail of the distribution. There is a vast number of
papers within the economic and econometric literature on methods designed to fit
the tail of the distribution, especially if it is believed that the tail obeys the Pareto
power law. Rigby et al. (2013), in order to study properties of distributions within
the GAMLSS family, defined three major types of parametric tails for the log of
the probability density function as y → ∞ or y → −∞: (i) $-k_2 (\log|y|)^{k_1}$, (ii)
$-k_4 |y|^{k_3}$ and (iii) $-k_6 e^{k_5 |y|}$, in decreasing order of heaviness of the tail, and called
them type I, II and III, respectively. The k’s are constants. Distribution tails can be
split into four categories: ‘non-heavy’ tails (k3 ≥ 1 or 0 < k5 < ∞), ‘heavy’ tail (i.e.
heavier than any exponential distribution) but lighter than any ‘Paretian type’ tail
(k1 > 1 and 0 < k3 < 1), ‘Paretian type’ tail (k1 = 1 and k2 > 1), and heavier than
any ‘Paretian type’ tail (k1 = 1 and k2 = 1). These four categories correspond closely
to mild, slow, wild (pre or proper) and extreme randomness (Mandelbrot, 1997).
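The ordering of the three tail types can be illustrated by evaluating the three asymptotic forms of the log density at increasingly large |y|. The constants below are arbitrary illustrative values, not taken from any fitted distribution, and the function names are hypothetical.

```python
from math import exp, log

def log_tail_type_1(y, k1, k2):
    """Type I form: -k2 * (log|y|)^k1."""
    return -k2 * (log(abs(y)) ** k1)

def log_tail_type_2(y, k3, k4):
    """Type II form: -k4 * |y|^k3."""
    return -k4 * (abs(y) ** k3)

def log_tail_type_3(y, k5, k6):
    """Type III form: -k6 * exp(k5 * |y|)."""
    return -k6 * exp(k5 * abs(y))

# Arbitrary illustrative constants (assumption): k1 = 1, k2 = 2 and all other k's equal to 1.
for y in (10.0, 50.0, 100.0):
    t1 = log_tail_type_1(y, k1=1.0, k2=2.0)
    t2 = log_tail_type_2(y, k3=1.0, k4=1.0)
    t3 = log_tail_type_3(y, k5=1.0, k6=1.0)
    print("y=%5.0f  type I: %.2f  type II: %.2f  type III: %.3e" % (y, t1, t2, t3))
```

The type I log density decreases most slowly and the type III log density fastest, so for these constants type I has the heaviest tail and type III the lightest.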
One of the main concerns when a parametric distribution is fitted within a regression
setup is whether the fitted distribution fits well both in the centre and
in the tail of the distribution. This is very important in financial statistics.
The important point here in this discussion is that quantile (and expectile) regres-
sions are less reliable in the extreme tails of the distribution because of sparsity of
data points.
We propose a method for fitting the upper tail of the response variable distribution
as follows:
1. Find the α quantile by fitting an appropriate quantile or GAMLSS model from
which we obtain the observations in the upper tail of the distribution.
2. Fit a suitable truncated distribution to the tail data, where the truncation parameter
is the fitted α quantile above.
3. Obtain the β quantile of the truncated distribution which corresponds to the
τ = α + β(1 − α) quantile of the original data.
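The relation in step 3 follows from the form of a left-truncated distribution: if F is the distribution function of the response and the truncation point is its α quantile, the truncated distribution function is (F(y) − α)/(1 − α), and setting this equal to β gives F(y) = α + β(1 − α). A small Python sketch of this mapping is given below; the function names are hypothetical and the unit exponential is used only because its quantile function is simple to check by hand.

```python
import math


def tail_level(alpha, beta):
    """Quantile level tau of the original data that matches the beta
    quantile of the distribution truncated at the alpha quantile."""
    return alpha + beta * (1.0 - alpha)


def truncated_quantile(ppf, alpha, beta):
    """beta quantile of a distribution left-truncated at its own alpha
    quantile; ppf is the quantile function of the untruncated distribution."""
    return ppf(tail_level(alpha, beta))


def exp_ppf(p):
    """Quantile function of the unit exponential (illustrative choice)."""
    return -math.log(1.0 - p)


# beta = 0.5, 0.9 and 0.95 above the alpha = 0.9 quantile
levels = [tail_level(0.9, b) for b in (0.5, 0.9, 0.95)]

# For the exponential, the median of the tail above the 0.9 quantile lies
# one median above that quantile, which checks the mapping.
median_of_tail = truncated_quantile(exp_ppf, 0.9, 0.5)
```

With α = 0.9, the values β = 0.5, 0.9 and 0.95 map to τ = 0.95, 0.99 and 0.995, up to floating-point rounding.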
As an illustration we investigate the upper tail of rent against area alone. We fitted
a 0.9 smooth quantile curve for rent against area using the R package cobs with
automatic smoothing parameter selection. We then focussed only on the 307 observations
(out of 3082) with rents above the 0.9 quantile curve and fitted a truncated
Gumbel model, from which we obtained the β = 0.5, 0.9 and 0.95 quantile curves,
which correspond to the τ = 0.95, 0.99 and 0.995 quantile curves of the original
rent data. We compare those truncated Gumbel curves with the corresponding
τ = 0.95, 0.99 and 0.995 curves obtained from a quantile sheet of Schnabel and
Eilers (2011a). The results are shown in Figure 5. The difference in the fitted curves
may be due to the fact that the quantile sheet program does not yet have an automatic
selection of the smoothing parameters. We also tried the cobs τ = 0.95, 0.99 and
0.995 quantile curves but they were very erratic.

Figure 5 Comparison of quantiles from a truncated Gumbel (solid) and quantile sheet (dashed); rent plotted against area
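A minimal end-to-end sketch of the procedure on simulated data, without covariates, might look as follows. It is not the analysis reported above: the empirical 0.9 quantile stands in for the cobs quantile curve, a left-truncated exponential fitted by maximum likelihood stands in for the truncated Gumbel, and the simulated unit-exponential sample and the function name `fit_tail_quantiles` are purely illustrative.

```python
import math
import random


def fit_tail_quantiles(y, alpha, betas):
    """Three-step tail fit with an exponential tail (illustrative stand-in).

    1. Take the empirical alpha quantile u as the truncation point.
    2. Fit an exponential left-truncated at u to the observations above u;
       its ML rate estimate is 1 / (mean excess over u).
    3. Return (tau, quantile) pairs, where tau = alpha + beta * (1 - alpha).
    """
    ys = sorted(y)
    u = ys[round(alpha * len(ys)) - 1]
    excess = [v - u for v in ys if v > u]
    rate = len(excess) / sum(excess)
    return [(alpha + b * (1.0 - alpha), u - math.log(1.0 - b) / rate)
            for b in betas]


random.seed(1)
sample = [random.expovariate(1.0) for _ in range(20000)]
results = fit_tail_quantiles(sample, alpha=0.9, betas=(0.5, 0.9, 0.95))

for tau, q in results:
    true_q = -math.log(1.0 - tau)  # known quantile of the unit exponential
    print(f"tau={tau:.3f}  fitted={q:.3f}  true={true_q:.3f}")
```

Because only the observations above the threshold enter step 2, the quality of the extreme quantiles depends on the tail model rather than on the few data points near each extreme level.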
6 Conclusions
In our contribution to the discussion of Thomas’ paper, we have shown that the
GAMLSS framework provides a platform to fit, compare and check models. We
point out some pitfalls of using quantile and expectile regression methods without
having a proper way of checking the adequacy of the model. In particular, the
choice between additive and multiplicative models is important and it is available in
GAMLSS. Finally, we introduce a novel method of checking the tail of the distribution
of the response variable in which the starting point is a quantile regression.
There are several points we would like to make here. The first has to do with
the mode of inference used for parametric distribution models. We genuinely believe
that the ‘right model’ for the data is more important than the ‘right inferential mode’.
Bayesian, classical frequentist or any other mode of inference is irrelevant if the
wrong model is used in the first place. In the search for the right model, the more
tools available the better, especially if there are ways of comparing the fitted
models.
We welcome the contributions that Thomas has made in the field and we think that
a Bayesian version of GAMLSS where models can be fitted fast will be a wonderful
tool for statisticians and practitioners.
Second, we would like to say something about parametric and non-parametric
approaches in statistics. Non-parametric methods for fitting terms within a regression-type
situation have been one of the great contributions to statistics of the last 30
years. Non-parametric methods for fitting the shape of the distribution of y in the
presence of explanatory variables are useful, but the practitioner has to be aware of
the implicit or explicit assumptions made.
Finally, we would like to finish by emphasising that looking at a single statistical
model in isolation is not good practice. Any chosen model should be able to stand
up to scrutiny and that involves checking its assumptions and being able to compare
it with alternative models. This is available in GAMLSS but is difficult in quantile or
expectile regression.
References
Aitkin M (1987) Modelling variance
heterogeneity in normal regression
using GLIM. Applied Statistics, 36,
332–39.
Borghi E, de Onis M et al. (2006) Construction of
the World Health Organization child growth
standards: selection of methods for attained
growth curves. Statistics in Medicine, 25,
247–65.
Cole TJ (1988) Fitting smoothed centile curves to
reference data (with discussion). Journal of
the Royal Statistical Society, Series A, 151,
385–418.
Cole TJ and Green PJ (1992) Smoothing
reference centile curves: the lms method
and penalized likelihood. Statistics in
Medicine, 11, 1305–319.
Dunn PK and Smyth GK (1996) Randomised
quantile residuals. Journal of
Computational and Graphical Statistics, 5,
236–44.
Engle RF (1982) Autoregressive conditional
heteroscedasticity with estimates of the
variance of United Kingdom inflation.
Econometrica: Journal of the Econometric
Society, 50, 987–1007.
Harvey AC (1976) Estimating regression models
with multiplicative heteroscedasticity.
Econometrica, 44, 461–65.
Koenker R (2012) quantreg: Quantile
Regression. R package version 4.91.
Lee Y, Nelder J and Pawitan Y (2006)
Generalized linear models with random
effects: unified analysis via H-likelihood.
London: CRC Press.
Lee Y and Nelder JA (2006) Double hierarchical
generalized linear models (with discussion).
Journal of the Royal Statistical Society:
Series C, 55, 139–85.
Mandelbrot B (1997) Fractals and scaling in
finance: discontinuity, concentration, risk:
selecta volume E. New York: Springer
Verlag.
Nelder JA (1992) Joint modelling of the mean
and dispersion. In PGM van der Heijden,
W Jansen, BJ Francis and GUH Seeber
(eds) Statistical modelling, pp. 263–72.
Amsterdam: North Holland.
Nelder JA and Pregibon D (1987) An extended
quasi-likelihood function. Biometrika, 74,
221–32.
Ng P and Maechler M (2007) A fast and efficient
implementation of qualitatively constrained
quantile smoothing splines. Statistical
Modelling, 7(4), 315–28.
Ng PT and Maechler M (2011) cobs:
COBS—Constrained B-splines (Sparse
matrix based). R package version 1.2-2.
Rigby RA and Stasinopoulos DM (1996a) A
semi-parametric additive model for
variance heterogeneity. Statistics and
Computing, 6, 57–65.
Rigby RA and Stasinopoulos DM (1996b) Mean
and dispersion additive models. In
W Härdle and MG Schimek (eds) Statistical
theory and computational aspects of
smoothing, pp. 215–30. Heidelberg:
Physica.
Rigby RA and Stasinopoulos DM (2004) Smooth
centile curves for skew and kurtotic data
modelled using the Box-Cox power
exponential distribution. Statistics in
Medicine, 23, 3053–76.
Rigby RA and Stasinopoulos DM (2005)
Generalized additive models for location,
scale and shape (with discussion). Journal
of the Royal Statistical Society: Series C,
54, 507–54.
Rigby RA and Stasinopoulos DM (2006) Using
the Box-Cox t distribution in gamlss to
model skewness and kurtosis. Statistical
Modelling, 6, 209–29.
Rigby RA and Stasinopoulos DM (2013)
Automatic smoothing parameter selection
in gamlss with an application to centile
estimation. Statistical Methods in Medical
Research. Published online before print
01/02/2013.
http://smm.sagepub.com/content/early/2013/01/16/0962280212473302.abstract
Rigby RA, Stasinopoulos DM and Voudouris V
(2013) Methods for the ordering and
comparison of theoretical distributions for
parametric models in the presence of heavy
tails. Internal Report, STORM, London
Metropolitan University.
Royston P and Wright EM (2000) Goodness-of-
fit statistics for age-specific reference
intervals. Statistics in Medicine, 19,
2943–62.
Rue H and Held L (2005) Gaussian Markov
random fields: theory and applications, vol.
104. London: Chapman & Hall/CRC.
Schnabel SK and Eilers PHC (2013)
Simultaneous estimation of quantile curves
using quantile sheets. Advances in
Statistical Analysis, 97, 77–87.
Sobotka F, Schnabel S and Schulze Waltrup L
(2012) Expectreg: Expectile and quantile
regression. R package version 0.35.
van Buuren S and Fredriks M (2001) Worm plot:
a simple diagnostic device for modelling
growth reference curves. Statistics in
Medicine, 20, 1259–77.
  • 1. http://smj.sagepub.com/ Statistical Modelling http://smj.sagepub.com/content/13/4/335 The online version of this article can be found at: DOI: 10.1177/1471082X13494316 2013 13: 335Statistical Modelling RA Rigby, DM Stasinopoulos and V Voudouris Discussion: A comparison of GAMLSS with quantile regression Published by: http://www.sagepublications.com On behalf of: Statistical Modeling Society can be found at:Statistical ModellingAdditional services and information for http://smj.sagepub.com/cgi/alertsEmail Alerts: http://smj.sagepub.com/subscriptionsSubscriptions: http://www.sagepub.com/journalsReprints.navReprints: http://www.sagepub.com/journalsPermissions.navPermissions: http://smj.sagepub.com/content/13/4/335.refs.htmlCitations: What is This? - Aug 8, 2013Version of Record>> at Stanford University Libraries on August 12, 2013smj.sagepub.comDownloaded from
  • 2. July 30, 2013 19:10 05-SMJ-13-4 Statistical Modelling 2013; 13(4): 335–348 Discussion: A comparison of GAMLSS with quantile regression RA Rigby1 , DM Stasinopoulos1 and V Voudouris2 1 Statistics, Operational Research and Mathematics (STORM) research centre, London Metropolitan University, UK 2 ESCP Europe Business School, London, UK Abstract: A discussion on the relative merits of quantile, expectile and GAMLSS regression models is given. We contrast the ‘complete distribution models’ provided by GAMLSS to the ‘distribution free models’ provided by quantile (and expectile) regression. We argue that in general, a flexibility para- metric distribution assumption has several advantages allowing possible focusing on specific aspects of the data, model comparison and model diagnostics. A new method for concentrating only on the tail of the distributions is suggested combining quantile regression and GAMLSS. Key words: GAMLSS; quantile and expectile regression; regression on the tail of the distribution 1 Introduction We would like to thank Thomas Kneib for his excellent paper bringing into focus the idea of regression analysis where the whole shape of the distribution for the response variable is allowed to vary according to explanatory variables (rather just the mean or the variance). Figure 1 illustrates this point by showing the Munich rental guide data example where we have fitted a Box-Cox Cole and Green (BCCG) distribution for the response variable, rent, and where the shape of the distribution varies according to the explanatory variable area. In fact, we fully support his statement that ‘once starting to think about regression models beyond the mean, they seem to appear basically everywhere’. We would like to add, that for data with more than say 1000 observations, regression models beyond the mean should be the norm, not the exception. 
This brings into the frame three different approaches: the GAMLSS approach where a full parametric distribution is assumed for the response variable Y and the quantile and expectile regression approaches where no specific assumption is made about the distribution of Y, which therefore can be considered in the realm of non-parametric methods. Address for correspondence: RA Rigby, Statistics, Operational Research and Mathematics (STORM) research centre, London Metropolitan University, Holloway Road, London, N7 8DB, UK. E-mail: r.rigby@londonmet.ac.uk c 2013 SAGE Publications 10.1177/1471082X13494316 at Stanford University Libraries on August 12, 2013smj.sagepub.comDownloaded from
  • 3. July 30, 2013 19:10 05-SMJ-13-4 336 RA Rigby et al. 0 50 100 150 50010001500 area rent Figure 1 The fitted BCCG distribution on the Munich rent data against area of the property We are great believers to the doctrine stating that every model is wrong but some are useful attributed to George Box. Therefore, in searching for a suitable model (for a particular data set), we would like to be able to try different models and choose the ones that we think are more suitable for the data and of course capable of answering the question at hand. That brings us to one of the most important questions in statistical modelling. How we can decide between models? In order to do that we need a mechanism to judge models relative to each other. We should be able to say model I is better that model II or model I is similar to model III in a relatively objective way. We should also be able to check whether any model we use is an adequate fit to the data. Every model is based on certain assumptions and those assumptions should be always checked. Awareness of these assumptions is crucial for model checking and whether we should accept the model or not. We will argue in this article that while the GAMLSS regression models seem to depend more heavily on parametric assumptions than quantile and expectile regression, they are in many ways more flexible, with the essential advances that models can be compared and every assumption within a particular model can be checked and tested. Our contribution is divided into six sections. Section 2 discusses the historical development of GAMLSS and the advantages and disadvantages of the approach. In Section 3, we revisit the Munich rent analysis. Section 4 looks at quantile and Statistical Modelling 2013; 13(4): 335–348 at Stanford University Libraries on August 12, 2013smj.sagepub.comDownloaded from
  • 4. July 30, 2013 19:10 05-SMJ-13-4 Discussion: A comparison of GAMLSS with quantile regression 337 expectile regression. Section 5 introduces a new idea of modelling the tail end of the distribution of a response variable within a regression framework, while Section 6 makes relevant conclusions. 2 The parametric GAMLSS models This section starts with a small history of GAMLSS and then describes the advantages, pitfalls, and some new developments. 2.1 Historical development The parametric mean regression model dominated the statistical scene for the last two centuries. Modelling the distribution variance (as well as the mean) as a func- tion of explanatory variables started more recently by Harvey (1976) and Aitkin (1987) for normal models and by Nelder and Predibon (1986) and Nelder (1992), for exponential family models (using through an extended quasi-likelihood, EQL, approach). Modelling the variance (volatility in finance) within a time series frame- work has become important after the seminal paper of Engle (1982) introducing the ARCH (autoregressive conditional heteroskedastic) models. The LMS method in centile estimation of Cole (1988) and Cole and Green (1992) were the first attempt to model skewness parametrically, as a function of (one) explanatory variable (the age). Rigby and Stasinopoulos (1996a, 1996b) introduced the Mean and Dispersion Addi- tive models (MADAM) which use additive terms for the mean and dispersion but assumed that the response variable belongs to the exponential family and therefore used EQL as a method of estimation. Following the implementation of the MADAM model, a key problem with EQL became clear. Despite the fact the EQL can produce ‘good’ statistical properties for simulated models where the parameters of the model are assumed to be known, there is not a proper way of comparing an EQL model with a different candidate model estimated say using maximum likelihood. 
It was this fact that lead us to abandoning the EQL and to concentrate on models with properly defined distributional assumption. For more detailed criticism of the EQL approach see the discussion by Rigby and Stasinopoulos of Lee and Nelder (2006). In real data situations, the distribution, the parameters and the mathematical structure of the model itself are unknown. We only make progress when there is a reasonable method to compare between different models and a way to check their assumptions. The GAMLSS models of Rigby and Stasinopoulos (2005) are based on this ideology. They assume that the response has a parametric distribution Y ∼ D(µ, σ, ν, τ), where µ and σ are usually location and scale parameters and ν and τ are usually shape parameters. Explanatory variables are introduced into the model through the ‘predictors’, η1 = g1(µ), η2 = g1(σ), η3 = g1(ν) and η4 = g1(τ). The predictors can be linear functions of the explanatory variables or can take the form of the ‘structured additive predictors’ of equation (3) in Thomas’s Statistical Modelling 2013; 13(4): 335–348 at Stanford University Libraries on August 12, 2013smj.sagepub.comDownloaded from
  • 5. July 30, 2013 19:10 05-SMJ-13-4 338 RA Rigby et al. paper. That is, ηi = βo + p j=1 fj (zi ). In fact, any Gaussian Markov random field formulation, see Rue and Held (2005), can be used here by setting the problem as a random effect model, as Thomas describes in section 2. The two algorithms described in Rigby and Stasinopoulos (2005, Appendix B) as RS and CG still apply. Both algorithms lead to the maximisation of the penalised likelihood function (or MAP estimation in a Bayesian framework) given that the hyper parameters (the λ’s or their effective degrees of freedom in smoothing) are known. Rigby and Stasinopoulos (2013) have shown that both algorithms are working well if the hyperparameters are estimated by an internal (i.e., local) ML estimation procedure. This local maximum likelihood estimation is effectively a penalised quasi-likelihood applied locally for each of µ, σ, ν and τ. The method is also a generalisation to multiple smoothing terms of the procedure given by Lee et al. (2006) but is applied internally on the predictor scale for each µ, σ, ν, and τ. Alternatively, the hyperparameters can be estimated by minimising a generalised cross validation criterion or a generalised Akaike information criterion (GAIC) either locally on the predictor scale for each µ, σ, ν, and τ or globally for µ, σ, ν, and τ jointly. 2.2 Advantages of GAMLSS Here we highlight some of the advantages of the GAMLSS models and their imple- mentation in R. Distributions 1. The gamlss.dist package provides more than 80 distributions for a continu- ous, discrete or mixed response variable. (By ‘mixed’ distributions we mean a continuous distribution which can also take some extra discrete values. For example, the inflated gamma distribution is an example of a mixed distribution where the response is allowed to take in addition the discrete value of zero with a non-zero probability.) 2. Any of the distributions can be right or left truncated or both. 3. 
Interval response variables (i.e. censored data) can be modelled with any of the distributions.
4. Several continuous distributions within GAMLSS provide a decomposition of the signal for Y into location, scale, skewness and kurtosis components, which helps interpretation.
5. Any continuous distribution defined on (−∞, ∞) can be assumed for log(Y) or logit(Y), giving a distribution for Y on (0, ∞) or (0, 1), respectively, and so providing a wider range of distributions on those ranges within GAMLSS.
6. For a continuous response variable Y on the positive real line (0 < Y < ∞), the 3-parameter BCCG distribution has been widely used for centile (or quantile) estimation and is called the LMS method. Rigby and Stasinopoulos (2004, 2006) extended the LMS method (which allows for location, scale and skewness but not for kurtosis in the data) to allow for kurtosis by introducing the 4-parameter Box-Cox power exponential (BCPE) and Box-Cox t (BCT) distributions, respectively, and called the resulting centile (or quantile) estimation methods LMSP and LMST, respectively. The BCCG, BCPE and BCT distributions are all available in GAMLSS.

Additive terms
Because of the modularity of the fitting algorithm, statistical techniques other than Gaussian Markov random fields can be used as additive terms, e.g., neural networks or loess. It is also easy to change and test different link functions for the parameters.

Diagnostics
The normalised (randomised) quantile residuals (Dunn and Smyth, 1996), or z-scores, are well defined, provide information about the adequacy of the model and can be used in connection with diagnostic plots such as worm plots (van Buuren and Fredriks, 2001) or with test statistics, e.g., Z-statistics (Royston and Wright, 2000).

Mode of inference
Full specification of the likelihood function for the model given Y allows different modes of statistical inference, i.e., classical (including bootstrapping), Bayesian, boosting, etc., as Thomas already indicated. Maximum likelihood provides a way of discriminating between GAMLSS models. This can be done using the global deviance (GD = −2 log(Likelihood)) defined for all current data, the validation global deviance (VGD, the global deviance defined for a validation data set) or the generalised Akaike information criterion (GAIC), with the AIC and the SBC as special cases.

2.3 Disadvantages of GAMLSS
The disadvantages of GAMLSS arise from the flexibility (and therefore complexity) of the model and the choices practitioners have to make. Let M = (D, G, T, L) represent a GAMLSS model. The components of M are defined as follows:
1. D specifies the distribution of the response variable.
2. G specifies the set of link functions.
3. T specifies the terms appearing in all the predictors for µ, σ, ν and τ.
4. L specifies the smoothing hyperparameters, which determine the amount of smoothing.
The empirical researcher is thus presented with four choices in designing an appropriate GAMLSS model. Therefore, developing and comparing GAMLSS models is not a trivial task, particularly the selection of the theoretical parametric distribution with the correct tails and the selection of the terms for the distribution parameters, namely µ, σ, ν and τ. The GAMLSS framework requires empirical researchers to have a good understanding of the properties of the distributions available within it.

3 The Munich rent data revisited
Here we reanalyse the Munich data using the following GAMLSS model:

$$
\begin{aligned}
y &\sim \mathrm{BCCG}(\mu, \sigma, \nu) \\
\log(\mu) &= b_{10} + s_{11}(\text{area}) + s_{12}(\text{year}) \\
\log(\sigma) &= b_{20} + s_{21}(\text{area}) + s_{22}(\text{year}) \\
\log(\nu) &= b_{30} + s_{31}(\text{area}) + s_{32}(\text{year}),
\end{aligned}
\tag{3.1}
$$

where the distribution BCCG(µ, σ, ν) was chosen after some initial investigation in which several other distributions defined on the positive real line were tried. We call (3.1) the basic model (bm). It has a multiplicative model for µ (resulting from the log link for µ). The resulting fitted global deviance, effective degrees of freedom (edf) used in the model, Akaike information criterion (AIC) and Bayesian information criterion (BIC) are given in the first line of Table 1. An alternative additive model for µ in (3.1) (resulting from the identity link for µ) is shown in row 2 of Table 1 and has a substantially worse fit according to AIC (or BIC).

We believe that the additive model $\eta_\tau = s_1(\text{area}) + s_2(\text{year}) + \text{district}$, as used by Thomas Kneib in sections 5 and 6 and Figures 4 and 5, is probably inappropriate. It uses an identity link relating $\eta_\tau$, the population τ quantile of rent, to district and to smoothed functions of area and year. This implies that changing from an unpopular to a popular district results in a fixed change in rent, irrespective of how large an area the property has and irrespective of its year. It is more likely that the change in rent is not a fixed amount but a fixed percentage, implying that a multiplicative model is more appropriate. We looked at the R packages expectreg (Sobotka et al., 2012), quantreg (Koenker, 2012) and cobs (Ng and Maechler, 2011), but at the moment they do not appear to allow for a multiplicative model (i.e., a log link).
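The difference between a fixed change and a fixed percentage change can be made concrete with a small numerical sketch. The baseline rents and district effects below are hypothetical values chosen for illustration only; they are not estimates from the Munich data, and the two functions are not part of any of the packages mentioned above.

```python
import math

def rent_identity_link(base_rent, district_effect):
    """Identity link: the district adds a fixed amount to the rent."""
    return base_rent + district_effect

def rent_log_link(base_rent, district_effect):
    """Log link: log(rent) = log(base) + effect, so the district rescales the rent."""
    return math.exp(math.log(base_rent) + district_effect)

# Hypothetical baseline rents for a small and a large property.
small_flat, large_flat = 400.0, 1200.0

# Additive model: the same absolute change whatever the size of the property.
additive_changes = [rent_identity_link(b, 50.0) - b for b in (small_flat, large_flat)]

# Multiplicative model: the same relative change whatever the size of the property.
multiplicative_ratios = [rent_log_link(b, 0.1) / b for b in (small_flat, large_flat)]

print("additive changes:", additive_changes)
print("multiplicative ratios:", [round(r, 3) for r in multiplicative_ratios])
```

Under the identity link the district effect is the same absolute amount for both properties, whereas under the log link it is the same factor exp(effect), which is the behaviour argued for in the text.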
In row 3 of Table 1, we report the results of a model which, while similar to the basic model (3.1), fits smooth surfaces over area and year rather than just additive smoothing terms for each of µ, σ and ν. This model provides a small improvement in AIC, but its BIC is much worse.

Table 1 The deviance analysis of the Munich data BCCG model

Model                                Global deviance   edf      AIC        BIC
1. basic model (bm)                  38474.50          24.71    38523.92   38673.03
2. bm identity link for µ            38548.13          20.51    38589.16   38712.94
3. bm surface for µ, σ and ν         38434.80          41.19    38517.20   38765.68
4. bm spatial for µ                  38164.70          128.45   38421.61   39196.63
5. bm spatial for µ, σ and ν         38164.42          130.78   38425.99   39215.04
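The AIC and BIC columns of Table 1 can be approximately recovered from the deviance and edf columns using the usual GAIC form, deviance plus a penalty k per effective degree of freedom, with k = 2 for AIC and k = log(n) for BIC (SBC). The sketch below assumes n = 3082, the number of observations quoted for the rent data in Section 5; small discrepancies from the table are to be expected because the reported edf are rounded.

```python
import math

def gaic(deviance, edf, k):
    """Generalised AIC: global deviance plus a penalty k per effective df."""
    return deviance + k * edf

n_obs = 3082  # assumed sample size, taken from the figure quoted in Section 5

# (model, global deviance, effective degrees of freedom) as reported in Table 1
table1 = [
    ("basic model (bm)", 38474.50, 24.71),
    ("bm identity link for mu", 38548.13, 20.51),
    ("bm surface for mu, sigma and nu", 38434.80, 41.19),
    ("bm spatial for mu", 38164.70, 128.45),
    ("bm spatial for mu, sigma and nu", 38164.42, 130.78),
]

results = [
    (name, gaic(dev, edf, 2.0), gaic(dev, edf, math.log(n_obs)))
    for name, dev, edf in table1
]
for name, aic, bic in results:
    print(f"{name:33s} AIC = {aic:9.2f}   BIC = {bic:9.2f}")

best_by_aic = min(results, key=lambda r: r[1])[0]
best_by_bic = min(results, key=lambda r: r[2])[0]
print("smallest AIC:", best_by_aic)
print("smallest BIC:", best_by_bic)
```

The two criteria need not agree: a heavier penalty per degree of freedom favours the simpler models, which is why a model can improve AIC while worsening BIC.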
Figure 2 The fitted additive terms for area and year of construction for µ, σ and ν for the rent data basic model with spatial effect for µ (six panels: partial effects pb(area) against area and pb(yearc) against yearc, one row for each of µ, σ and ν)

The model in row 4 is the basic model with the addition of district in the µ model only. District is modelled here as a spatial effect using an intrinsic autoregressive model. The model in row 5 in Table 1 fits district in the models of each of µ, σ and ν. While district for µ provides an improvement (i.e., reduction) in AIC, it appears that it is not needed for σ and ν according to AIC.

Figure 2 shows the fitted smooth functions $s_{kj}$ for j = 1, 2 and k = 1, 2, 3 for the final chosen model (the model of row 4), where k and j correspond to the row and column of the plot in Figure 2, respectively. Clearly from Figure 2, the fitted median rent, µ in the BCCG distribution, increases with area and year, while the approximate coefficient of variation σ increases with area but decreases with year, and the parameter ν increases for small and large properties and increases with year (indicating a corresponding decrease in skewness).
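The roles of µ, σ and ν can be seen from the centile formula of the LMS method, $y_\tau \approx \mu (1 + \sigma \nu z_\tau)^{1/\nu}$ for ν ≠ 0 and $y_\tau \approx \mu \exp(\sigma z_\tau)$ for ν = 0, where $z_\tau$ is the standard normal τ quantile. The sketch below is a simplified version that ignores the small truncation correction needed to keep Y strictly positive, so it should not be read as the exact BCCG quantile function of any package; the parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def bccg_quantile(tau, mu, sigma, nu):
    """Approximate tau quantile of BCCG(mu, sigma, nu) via the LMS centile formula.

    Simplification: the truncation correction of the exact distribution is ignored.
    """
    z = NormalDist().inv_cdf(tau)
    if abs(nu) < 1e-12:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + sigma * nu * z) ** (1.0 / nu)

# Hypothetical parameter values (not fitted to the rent data).
params = dict(mu=600.0, sigma=0.3, nu=0.5)
q05, q50, q95 = (bccg_quantile(t, **params) for t in (0.05, 0.5, 0.95))
print(f"5%: {q05:.1f}  median: {q50:.1f}  95%: {q95:.1f}")

# nu = 1 removes the skewness: the quantiles are symmetric about the median.
s05, s50, s95 = (bccg_quantile(t, mu=600.0, sigma=0.3, nu=1.0) for t in (0.05, 0.5, 0.95))

# Adding 0.1 to log(mu) rescales every quantile by the same factor exp(0.1).
scaled_q95 = bccg_quantile(0.95, mu=600.0 * math.exp(0.1), sigma=0.3, nu=0.5)
print(f"ratio of 95% quantiles: {scaled_q95 / q95:.3f}")
```

In this approximation µ is the median, values of ν below 1 give a longer upper tail, and any change to µ alone acts multiplicatively on all quantiles.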
Figure 3 The fitted spatial effect for µ for the basic model with spatial effect for µ (map of districts; colour scale running from −0.1177 through 0 to 0.1194)

Figure 3 shows the district effect on log(µ) (relative to the baseline district). Hence, for example, if a district effect is 0.1, then the median rent µ is changed by a factor $e^{0.1} = 1.105$, i.e., a 10.5% increase (relative to the baseline district). For the final model, the τ quantile of rent is given by $y_\tau = \mu \, q_{\mathrm{BCCG}}(\tau, 1, \sigma, \nu)$, where $q_{\mathrm{BCCG}}(\tau, 1, \sigma, \nu)$ is the τ quantile of the BCCG(1, σ, ν) distribution. This has the simple interpretation that the district effect on $y_\tau$ is multiplicative, since it only affects µ. This disagrees with Thomas Kneib's Figure 4, although this might be due to his additive model for the τ quantile, which is inappropriate in our view. In our fitted model, $y_\tau$ can be represented by a contour plot against area and year (for the reference district).

Figure 4 The worm plot of the residuals split by area for the basic model with spatial effect for µ (nine panels; horizontal axis: unit normal quantile; vertical axis: deviation; conditioning variable: area, from 20 to 160)

Figure 4 displays a worm plot (van Buuren and Fredriks, 2001) for the chosen fitted model (row 4 of Table 1). It shows nine detrended QQ plots for the (normalised quantile) residuals in nine corresponding intervals of the explanatory variable area (displayed above the worm plot). The individual worm plots are generally within the 95% pointwise confidence intervals, indicating a reasonable, though not fully adequate, fit to the data. The worm plot against year is similar.
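For readers unfamiliar with the diagnostic, the following sketch shows how normalised quantile residuals and the points of a worm plot (a detrended normal QQ plot) can be computed for a continuous response. It is a bare illustration on simulated data under stated assumptions: the exponential data, the two candidate models and the function names are hypothetical, and no confidence bands or splitting by an explanatory variable are included.

```python
import math
import random
from statistics import NormalDist

std_normal = NormalDist()

def quantile_residuals(y, cdf, eps=1e-12):
    """Normalised quantile residuals r_i = Phi^{-1}(F(y_i)) for a continuous response."""
    residuals = []
    for v in y:
        p = min(max(cdf(v), eps), 1.0 - eps)  # guard against p = 0 or p = 1
        residuals.append(std_normal.inv_cdf(p))
    return residuals

def worm_points(residuals):
    """Detrended QQ points: (unit normal quantile, ordered residual minus that quantile)."""
    n = len(residuals)
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        q = std_normal.inv_cdf((i - 0.5) / n)
        points.append((q, r - q))
    return points

random.seed(1)
# Hypothetical data: exponential responses with rate 1.
y = [random.expovariate(1.0) for _ in range(500)]

# Candidate model 1 uses the correct cdf; candidate model 2 uses the wrong scale.
good = worm_points(quantile_residuals(y, lambda v: 1.0 - math.exp(-v)))
bad = worm_points(quantile_residuals(y, lambda v: 1.0 - math.exp(-v / 3.0)))

max_dev_good = max(abs(d) for _, d in good)
max_dev_bad = max(abs(d) for _, d in bad)
print(f"largest deviation, adequate model: {max_dev_good:.2f}")
print(f"largest deviation, misspecified model: {max_dev_bad:.2f}")
```

When the assumed distribution is adequate the deviations stay close to zero, while a misspecified distribution produces a worm that drifts systematically away from the horizontal axis.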
4 Quantile and expectile regression
Standard quantile regression methods estimate each quantile (i.e. centile) separately. Koenker (2012) has developed the quantreg package in R for quantile regression and smoothing, while Ng and Maechler (2007, 2011) have developed the cobs package in R for smooth quantile curves using B-splines with a smoothness penalty. The fact that the quantile regression model does not assume a distribution for the response variable makes it flexible and reduces the bias caused by assuming a distribution, but increases the variability of the quantile curves or surfaces, especially for extreme quantiles with τ close to 0 or 1.

A possible problem is that different quantile curves or surfaces $y_\tau(x)$ of y given explanatory variable(s) x may cross for different values of τ (implying negative probability). The quantile regression model does not allow interpolation between quantile curves (for different values of τ) or extrapolation beyond them; extrapolation is desirable for estimating extreme quantiles, which are difficult to estimate directly. [See Schnabel and Eilers (2013) for a possible solution called quantile sheets.]

The quantile regression model also lacks an explicit formula that allows the calculation of the quantile $y_\tau(x)$ given τ and x, or of the z-score $z = \Phi^{-1}[F_Y(y \mid x)]$ given y and x, where $\Phi^{-1}$ is the inverse cumulative distribution function of the standard normal distribution. This was one of the requirements set by a World Health Organisation expert committee (Borghi et al., 2006) for the adoption of a method for the construction of the world standard growth curves. The quantile regression model lacks a measure of goodness of fit, as well as residual diagnostic plots and statistics for model comparison and model adequacy checking.
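To see why each quantile is estimated separately, recall that quantile regression is usually defined through the check (or pinball) loss, which is minimised once for every value of τ. The discussion does not spell this criterion out, so the sketch below should be read as background under that assumption: it fits a constant-only model to hypothetical data and is not the algorithm used by quantreg or cobs.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_constant_quantile(y, tau):
    """Constant-only quantile fit: the data value that minimises the total check loss."""
    return min(y, key=lambda c: sum(check_loss(v - c, tau) for v in y))

# Hypothetical responses 1, 2, ..., 101.
y = [float(v) for v in range(1, 102)]

# A separate minimisation is carried out for every quantile level tau.
fits = {tau: fit_constant_quantile(y, tau) for tau in (0.1, 0.5, 0.9)}
for tau, value in fits.items():
    print(f"tau = {tau}: fitted quantile = {value}")
```

Because nothing links the separate minimisations, there is no shared distribution from which a quantile at an intermediate or more extreme τ could be read off, and nothing prevents fitted curves from crossing once covariates are included.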
When there is more than one explanatory variable, the quantile regression model usually assumes a linear or additive predictor for all τ, e.g., $y_\tau = \beta_{0\tau} + \beta_{1\tau} x_1 + \beta_{2\tau} x_2$ or $y_\tau = \beta_{0\tau} + s_{1\tau}(x_1) + s_{2\tau}(x_2)$, where $s_{1\tau}$ and $s_{2\tau}$ are univariate smoothing functions. [A smooth surface $s_\tau(x_1, x_2)$ could be fitted for each τ, but this may be unreliable, especially for a low or high τ, unless the sample size is very large.] However, the linear or additive predictor may be inappropriate. For example, suppose the simple location-scale model $Y \mid x_1, x_2 \sim N(\mu, \sigma)$, where $\mu = \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2$ and $\log(\sigma) = \beta_{02} + \beta_{12} x_1 + \beta_{22} x_2$, is a good model for the data. Then, with $z_\tau$ denoting the τ quantile of the standard normal distribution,

$$
y_\tau = \mu + \sigma z_\tau = \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2 + e^{\beta_{02}} e^{\beta_{12} x_1} e^{\beta_{22} x_2} z_\tau .
$$

Hence the quantile $y_\tau$ has a non-linear interaction term and cannot be modelled by the usual linear or additive quantile regression model. Similarly, in the rent data analysis in Section 3, a multiplicative model for $y_\tau$ seems more appropriate than an additive model. Our impression is that current R implementations of quantile regression do not allow link functions other than the identity and hence cannot currently fit a multiplicative model.

Expectile regression has been promoted recently by the excellent work of Schnabel and Eilers (2013). Our main objection to expectile regression is its interpretability. An expectile value $e_\tau$ is the point at which, in order to balance the distribution of y, you have to weight all values above $e_\tau$ by τ and all values below it by (1 − τ). What does this mean in practice? Also, the fact that for a given distribution there is a one-to-one mapping of expectiles to quantiles does not help within a regression situation.
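The balancing definition of an expectile can be turned into a short computation: start from the mean and repeatedly recompute a weighted mean, giving weight τ to values above the current estimate and weight 1 − τ to the rest. The sketch below uses this standard iteration on two hypothetical samples (they are not the rent data) to show that the share of observations lying below the same τ expectile depends on the shape of the distribution.

```python
def expectile(y, tau, tol=1e-10, max_iter=1000):
    """tau expectile by iterating an asymmetrically weighted mean."""
    e = sum(y) / len(y)  # the mean is the 0.5 expectile and a convenient start
    for _ in range(max_iter):
        weights = [tau if v > e else 1.0 - tau for v in y]
        e_new = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

def share_at_or_below(y, value):
    """Proportion of observations that are less than or equal to value."""
    return sum(v <= value for v in y) / len(y)

# Two hypothetical samples with different shapes.
samples = {
    "symmetric": [float(v) for v in range(1, 101)],
    "right-skewed": [float(v) ** 3 / 1e4 for v in range(1, 101)],
}

shares = {}
for name, data in samples.items():
    e90 = expectile(data, 0.9)
    shares[name] = share_at_or_below(data, e90)
    print(f"{name}: 0.9 expectile = {e90:.2f}, share at or below = {shares[name]:.2f}")
```

In both samples the 0.9 expectile is not the 0.9 quantile, and the quantile level it corresponds to differs between the two shapes, which is the source of the interpretability problem when the shape of the response distribution changes with the explanatory variables.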
Let $e_\tau(x)$ be the τ expectile of y given explanatory variable(s) x, and let $x_0$ and $x_1$ be two distinct values of x. Then $e_\tau(x_0)$ and $e_\tau(x_1)$ will correspond to two different quantiles $y_{\tau_0}(x_0)$ and $y_{\tau_1}(x_1)$. In general, the percentage of the population above the expectile $e_\tau(x)$ changes with x, so an expectile curve or surface $e_\tau(x)$ does not in general correspond to a centile curve or surface $y_{\tau_1}(x)$ for any $0 < \tau_1 < 1$.

5 Modelling the tail of the distribution
Fitting the right shape of the tail of a distribution has become very important recently, especially in financial statistics, where both value at risk (VaR) and expected shortfall (ES) are concepts defined in the tail of the distribution. There is a vast number of papers within the economic and econometric literature on methods designed to fit the tail of the distribution, especially when it is believed that the tail obeys the Pareto power law.

In order to study the properties of distributions within the GAMLSS family, Rigby et al. (2013) defined three major types of parametric tails for the log of the probability density function as y → ∞ or y → −∞:
(i) $-k_2 (\log|y|)^{k_1}$,
(ii) $-k_4 |y|^{k_3}$ and
(iii) $-k_6 e^{k_5 |y|}$,
in decreasing order of heaviness of the tail, and called them type I, II and III, respectively. The k's are constants. Distribution tails can be split into four categories: 'non-heavy' tails ($k_3 \ge 1$ or $0 < k_5 < \infty$); 'heavy' tails (i.e. heavier than any exponential distribution) that are lighter than any 'Paretian type' tail ($k_1 > 1$ and $0 < k_3 < 1$); 'Paretian type' tails ($k_1 = 1$ and $k_2 > 1$); and tails heavier than any 'Paretian type' tail ($k_1 = 1$ and $k_2 = 1$). These four categories correspond closely to mild, slow, wild (pre or proper) and extreme randomness (Mandelbrot, 1997).
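The ordering of the three tail types can be checked numerically by evaluating the three limiting forms of the log density at increasingly large y: the heavier the tail, the more slowly the log density falls. The constants used below are hypothetical values chosen only so that each form is well defined; they do not correspond to any particular fitted distribution.

```python
import math

def log_tail_type1(y, k1, k2):
    """Type I tail: -k2 * (log|y|)**k1."""
    return -k2 * math.log(abs(y)) ** k1

def log_tail_type2(y, k3, k4):
    """Type II tail: -k4 * |y|**k3."""
    return -k4 * abs(y) ** k3

def log_tail_type3(y, k5, k6):
    """Type III tail: -k6 * exp(k5 * |y|)."""
    return -k6 * math.exp(k5 * abs(y))

# Hypothetical constants: k1 = 1, k2 = 2 (a Paretian type tail);
# k3 = 1, k4 = 1 (an exponential-like tail); k5 = 1, k6 = 1.
rows = []
for y in (10.0, 50.0, 100.0):
    rows.append((
        y,
        log_tail_type1(y, k1=1.0, k2=2.0),
        log_tail_type2(y, k3=1.0, k4=1.0),
        log_tail_type3(y, k5=1.0, k6=1.0),
    ))

for y, t1, t2, t3 in rows:
    print(f"y = {y:5.0f}: type I = {t1:8.2f}, type II = {t2:8.2f}, type III = {t3:.3e}")
```

At every value of y the type I form is the largest (least negative) and the type III form the smallest, in line with the stated decreasing order of heaviness.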
One of the main concerns when a parametric distribution is fitted within a regression setup is whether the fitted distribution fits well both in the centre and in the tail of the distribution. This is very important in financial statistics. The important point in this discussion is that quantile (and expectile) regressions are less reliable in the extreme tails of the distribution because of the sparsity of data points. We propose a method for fitting the upper tail of the response variable distribution as follows:
1. Find the α quantile by fitting an appropriate quantile or GAMLSS model, from which we obtain the observations in the upper tail of the distribution.
2. Fit a suitable truncated distribution to the tail data, where the truncation parameter is the fitted α quantile above.
3. Obtain the β quantile of the truncated distribution, which corresponds to the τ = α + β(1 − α) quantile of the original data.

As an illustration we investigate the upper tail of rent against area alone. We fitted a 0.9 smooth quantile curve for rent against area using the R package cobs with automatic smoothing parameter selection. We then focussed only on the 307 observations (out of 3082) with rents above the 0.9 quantile curve and fitted a truncated Gumbel model, from which we obtained the β = 0.5, 0.9 and 0.95 quantile curves, which correspond to the τ = 0.95, 0.99 and 0.995 quantile curves of the original rent data. We compare those truncated Gumbel curves with the corresponding τ = 0.95, 0.99 and 0.995 curves obtained from a quantile sheet of Schnabel and Eilers (2011a). The results are shown in Figure 5. The difference in the fitted curves may be due to the fact that the quantile sheet program does not yet have an automatic selection of the smoothing parameters. We also tried the cobs τ = 0.95, 0.99 and 0.995 quantile curves, but they were very erratic.

Figure 5 Comparison of quantiles from a truncated Gumbel (solid) and quantile sheet (dashed) (rent plotted against area)
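The mapping τ = α + β(1 − α) in step 3 follows from the form of a left-truncated distribution, whose cdf above the cut point c is (F(y) − F(c)) / (1 − F(c)). The sketch below checks this with a Gumbel (maximum) distribution whose location and scale are hypothetical values; it performs no fitting, takes no account of covariates, and the parameterisation of the truncated Gumbel actually used in the analysis may differ.

```python
import math

def gumbel_cdf(y, loc, scale):
    """CDF of a Gumbel (maximum) distribution, used here only as an example."""
    return math.exp(-math.exp(-(y - loc) / scale))

def gumbel_quantile(p, loc, scale):
    """Inverse of gumbel_cdf."""
    return loc - scale * math.log(-math.log(p))

def truncated_quantile(beta, cut, loc, scale):
    """beta quantile of the Gumbel distribution left-truncated at cut."""
    p_cut = gumbel_cdf(cut, loc, scale)
    return gumbel_quantile(p_cut + beta * (1.0 - p_cut), loc, scale)

def original_level(alpha, beta):
    """Quantile level in the original data: tau = alpha + beta * (1 - alpha)."""
    return alpha + beta * (1.0 - alpha)

alpha = 0.9
betas = (0.5, 0.9, 0.95)
taus = [original_level(alpha, b) for b in betas]
print("tau levels:", [round(t, 3) for t in taus])

# Consistency check with hypothetical parameters: truncating at the alpha
# quantile and taking the beta quantile gives the tau quantile of the original.
loc, scale = 500.0, 150.0
cut = gumbel_quantile(alpha, loc, scale)
for beta, tau in zip(betas, taus):
    from_tail = truncated_quantile(beta, cut, loc, scale)
    direct = gumbel_quantile(tau, loc, scale)
    print(f"beta = {beta}: tail-based {from_tail:.2f}, direct {direct:.2f}")
```

With α = 0.9, the levels β = 0.5, 0.9 and 0.95 map to τ = 0.95, 0.99 and 0.995, the values used in the rent illustration.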
6 Conclusions
In our contribution to the discussion of Thomas' paper, we have shown that the GAMLSS framework provides a platform to fit, compare and check models. We point out some pitfalls of using quantile and expectile regression methods without having a proper way of checking the adequacy of the model. In particular, the choice between additive and multiplicative models is important and is available in GAMLSS. Finally, we introduce a novel method of checking the tail of the distribution of the response variable in which the starting point is a quantile regression.

There are several points we would like to make here. The first has to do with the mode of inference used for parametric distribution models. We genuinely believe that the 'right model' for the data is more important than the 'right inferential mode'. Bayesian, classical frequentist or any other mode of inference is irrelevant if the wrong model is used in the first place. In the search for the right model, the more tools available the better, especially if there are ways of comparing the fitted models. We welcome the contributions that Thomas has made in the field and we think that a Bayesian version of GAMLSS in which models can be fitted fast will be a wonderful tool for statisticians and practitioners.

Second, we would like to say something about parametric and non-parametric approaches in statistics. Non-parametric methods for fitting terms within a regression type situation have been one of the great contributions in statistics over the last 30 years. Non-parametric methods for fitting the shape of the distribution of y in the presence of explanatory variables are useful, but the practitioner has to be aware of the implicit or explicit assumptions made.

Finally, we would like to finish by emphasising that looking at a single statistical model in isolation is not good practice.
Any chosen model should be able to stand up to scrutiny, and that involves checking its assumptions and being able to compare it with alternative models. This is available in GAMLSS but is difficult in quantile or expectile regression.

References
Aitkin M (1987) Modelling variance heterogeneity in normal regression using GLIM. Applied Statistics, 36, 332–39.
Borghi E, de Onis M et al. (2006) Construction of the World Health Organization child growth standards: selection of methods for attained growth curves. Statistics in Medicine, 25, 247–65.
Cole TJ (1988) Fitting smoothed centile curves to reference data (with discussion). Journal of the Royal Statistical Society, Series A, 151, 385–418.
Cole TJ and Green PJ (1992) Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in Medicine, 11, 1305–319.
Dunn PK and Smyth GK (1996) Randomised quantile residuals. Journal of Computational and Graphical Statistics, 5, 236–44.
Engle RF (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society, 50, 987–1007.
Harvey AC (1976) Estimating regression models with multiplicative heteroscedasticity. Econometrica, 44, 461–65.
Koenker R (2012) quantreg: Quantile Regression. R package version 4.91.
Lee Y, Nelder J and Pawitan Y (2006) Generalized linear models with random effects: unified analysis via H-likelihood. London: CRC Press.
Lee Y and Nelder JA (2006) Double hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society: Series C, 55, 139–85.
Mandelbrot B (1997) Fractals and scaling in finance: discontinuity, concentration, risk: selecta volume E. New York: Springer Verlag.
Nelder JA (1992) Joint modelling of the mean and dispersion. In PGM Van der Heijden, W Jansen, BJ Francis and GUH Seeber (eds) Statistical modelling, pp. 263–72. Amsterdam: North Holland.
Nelder JA and Pregibon D (1987) An extended quasi-likelihood function. Biometrika, 74, 221–32.
Ng P and Maechler M (2007) A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7(4), 315–28.
Ng PT and Maechler M (2011) cobs: COBS -- Constrained B-splines (Sparse matrix based). R package version 1.2-2.
Rigby RA and Stasinopoulos DM (1996a) A semi-parametric additive model for variance heterogeneity. Statistics and Computing, 6, 57–65.
Rigby RA and Stasinopoulos DM (1996b) Mean and dispersion additive models. In W Hardle and MG Schimek (eds) Statistical theory and computational aspects of smoothing, pp. 215–30. Heidelberg: Physica.
Rigby RA and Stasinopoulos DM (2004) Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution. Statistics in Medicine, 23, 3053–76.
Rigby RA and Stasinopoulos DM (2005) Generalized additive models for location, scale and shape (with discussion). Journal of the Royal Statistical Society: Series C, 54, 507–54.
Rigby RA and Stasinopoulos DM (2006) Using the Box-Cox t distribution in gamlss to model skewness and kurtosis. Statistical Modelling, 6, 209–29.
Rigby RA and Stasinopoulos DM (2013) Automatic smoothing parameter selection in gamlss with an application to centile estimation. Statistical Methods in Medical Research. Published online before print 01/02/2013. http://smm.sagepub.com/content/early/2013/01/16/0962280212473302.abstract
Rigby RA, Stasinopoulos DM and Voudouris V (2013) Methods for the ordering and comparison of theoretical distributions for parametric models in the presence of heavy tails. Internal Report, STORM, London Metropolitan University.
Royston P and Wright EM (2000) Goodness-of-fit statistics for age-specific reference intervals. Statistics in Medicine, 19, 2943–62.
Rue H and Held L (2005) Gaussian Markov random fields: theory and applications, vol. 104. London: Chapman & Hall/CRC.
Schnabel SK and Eilers PHC (2013) Simultaneous estimation of quantile curves using quantile sheets. Advances in Statistical Analysis, 97, 77–87.
Sobotka F, Schnabel S and Schulze Waltrup L (2012) Expectreg: Expectile and quantile regression. R package version 0.35.
van Buuren S and Fredriks M (2001) Worm plot: a simple diagnostic device for modelling growth reference curves. Statistics in Medicine, 20, 1259–77.