4thchannel conference poster_freedom_gumedze

Detection of outliers in Poisson regression models via overdispersion
Freedom Gumedze and Tinashe Chatora
Department of Statistical Sciences, University of Cape Town
http://www.stats.uct.ac.za Email: freedom.gumedze@uct.ac.za
Introduction
Both undispersed and overdispersed count data may contain outliers
We propose a variance shift outlier model (VSOM) for the detection and
accommodation of outliers in count data
Our proposed model is a form of a hierarchical generalized linear model (HGLM)
We consider both independent and longitudinal data settings
Hierarchical generalized linear model (HGLM)
An HGLM has the following properties (Lee and Nelder, 1996):
Let Yij be the jth observation for the ith subject and bi be the unobserved random
effect for the ith subject, for i = 1, . . . , q and j = 1, . . . , ni. Conditional on bi, Yij
follows an exponential family distribution and has the following properties
E(Yij|bi) = µij and var(Yij|bi) = φV (µij),
where V (.) is a monotonic function of µij and φ is the dispersion parameter. The
linear predictor for µij takes the form
g(E(Yij|bi)) = g(µij) = ηij = Xijβ + νi, (1)
where νi is a monotonic function of bi, Xij is the jth row of the design matrix Xi
and Xi is a ni × p design matrix for the fixed effects for the ith subject.
The random component bi follows a distribution conjugate to an exponential family
of distributions with parameter λi.
Negative binomial GLM and Poisson-gamma HGLM
The negative binomial GLM can be fitted as a Poisson-gamma HGLM with a
saturated random effect
log[E(Yi|si)] = Xiβ + νsi, (2)
where si is the random effect for the ith observation. Let νsi = log(si), with si
following a gamma distribution with a mean of one and variance of α.
The model has the negative binomial variance
var(Yi) = µi + αµi², (3)
where the term αµi² measures the amount of overdispersion.
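The variance formula (3) can be checked by simulation. The sketch below is illustrative only and is not the authors' code: the values of µ and α, the helper names and the sample size are assumptions chosen for the example. It draws a gamma multiplier with mean one and variance α, then a Poisson count with the multiplied mean, and compares the sample variance with µ + αµ².

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_poisson_gamma(mu, alpha, n, seed=1):
    """Simulate n counts from a Poisson model whose mean is mu times a gamma multiplier."""
    rng = random.Random(seed)
    # shape = 1/alpha and scale = alpha give a gamma with mean 1 and variance alpha
    shape, scale = 1.0 / alpha, alpha
    return [sample_poisson(mu * rng.gammavariate(shape, scale), rng) for _ in range(n)]


mu, alpha = 4.0, 0.5  # hypothetical values, not taken from the poster
counts = simulate_poisson_gamma(mu, alpha, n=100_000)
mean = sum(counts) / len(counts)
variance = sum((y - mean) ** 2 for y in counts) / (len(counts) - 1)
theoretical_variance = mu + alpha * mu ** 2  # equation (3)

print(f"sample mean     = {mean:.3f} (target {mu})")
print(f"sample variance = {variance:.3f} (target {theoretical_variance})")
```

With α close to zero the gamma multiplier concentrates at one and the simulated counts approach an ordinary Poisson sample, for which the variance equals the mean.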
Variance shift outlier model (VSOM) for Poisson count data
Independent count data: a VSOM for the ith observation
log[E(Yi|δi)] = Xiβ + νδi, (4)
where δi is a random effect for the ith count, νδi = log(δi) and δi has a gamma
distribution with a mean of one and variance of λi.
Longitudinal setting
VSOM for the ijth observation:
ηij = log[E(Yij|bi, δij)] = Xijβ + νbi + νδij, (5)
where bi and δij follow gamma distributions, each with a mean of one, and variances γ and λij,
respectively.
VSOM for the ith subject
ηij = log[E(Yij|bi)] = Xijβ + νbi + νζi, (6)
where bi and ζi follow gamma distributions, each with a mean of one, and variances γ and τi,
respectively.
Large estimates of the variance parameters λi, λij or τi are indicative of potential
outliers.
Likelihood ratio tests (LRTs) are used to test for variance parameters, with LRTs
having 0.68χ₀² + 0.32χ₁² mixture distributions.
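Percentiles and p-values under the 0.68χ₀² + 0.32χ₁² mixture can be computed in closed form, because χ₀² is a point mass at zero and a χ₁² variable is the square of a standard normal. The following is a minimal sketch under those two facts; the function names are hypothetical and the code is not taken from the poster.

```python
import math
from statistics import NormalDist

POINT_MASS_WEIGHT = 0.68  # weight on chi-square with 0 df (point mass at zero)
CHI2_1_WEIGHT = 0.32      # weight on chi-square with 1 df


def mixture_pvalue(lrt):
    """P(T >= lrt) when T follows the 0.68*chi2_0 + 0.32*chi2_1 mixture."""
    if lrt <= 0:
        return 1.0
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2))
    return CHI2_1_WEIGHT * math.erfc(math.sqrt(lrt / 2.0))


def mixture_quantile(r):
    """Smallest c with P(T <= c) >= r, for a probability r in (0, 1)."""
    if r <= POINT_MASS_WEIGHT:
        return 0.0  # the point mass at zero already covers probability r
    p = (r - POINT_MASS_WEIGHT) / CHI2_1_WEIGHT  # required chi-square(1) probability
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)    # chi-square(1) quantile is z squared
    return z * z


for r in (0.95, 0.975, 0.99):
    print(f"r = {100 * r:g}: critical value = {mixture_quantile(r):.3f}")
```

An observed likelihood ratio statistic above the chosen critical value flags the corresponding observation or subject as a potential outlier.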
Application: Epilepsy data
Data description: The dataset is taken from Thall and Vail (1990) and contains
59 patients with epilepsy who were randomized to a new drug or a placebo. For each
patient the number of seizures was recorded at baseline and every fortnight
during an 8-week period.
Initial model: Negative binomial-gamma HGLM (since the data are overdispersed):
log[E(Yij|bi)] = (β0 + bi) + β1lij + β2tij + β3tijlij + β4aij + β5vij + δij,
where lij = log(baseline seizure count), tij is the treatment indicator, aij = log(age),
vij is the linear trend for the visits, coded as (−3, −1, 1, 3)/10, and bi is the
subject random effect.
VSOM for the ijth observation:
log[E(Yij|bi)] = (β0 + bi) + β1lij + β2tij + β3tijlij + β4aij + β5vij + δij,
where δij is the random effect for the ijth observation.
VSOM for the ith subject:
log[E(Yij|bi)] = (β0 + bi) + β1lij + β2tij + β3tijlij + β4aij + β5vij + ζi,
where ζi is the random effect for the ith subject.
Application: continued
Figure: negative binomial-gamma VSOM statistics plotted against observation number,
k = 1, . . . , N = 236: (a) λk, (b) αk and (c) likelihood ratio statistics LRTk, with rth
percentiles from the 0.68χ₀² + 0.32χ₁² mixture distribution: r = 95 (solid line),
r = 97.5 (dashed line) and r = 99 (dotted line).
Potential outliers: observations 40, 62, 63, 78, 99 and 221.
Figure: subject-level VSOM statistics plotted against subject number: (a) ψi, (b) αi and
(c) LRTi. Subjects 56 and 58 are labelled in the plot.
Only subject 58 is a potential outlier.
Parameter estimates of combined VSOMs fitted to the epilepsy data set.
Parameter           M0              M1              M2              M3
                    Estimate (s.e.) Estimate (s.e.) Estimate (s.e.) Estimate (s.e.)
constant            -1.326 (1.210)  -1.015 (1.199)  -1.558 (1.163)  -1.273 (1.149)
lbase               0.881 (0.129)   0.834 (0.128)   0.880 (0.124)   0.834 (0.122)
treatment           -0.887 (0.392)  -0.932 (0.387)  -0.799 (0.378)  -0.846 (0.373)
treatment × lbase   0.337 (0.198)   0.372 (0.196)   0.308 (0.190)   0.343 (0.187)
log(age)            0.496 (0.360)   0.432 (0.357)   0.574 (0.345)   0.508 (0.342)
visit               -0.264 (0.116)  -0.312 (0.136)  -0.264 (0.158)  -0.312 (0.136)
γ                   0.235 (0.051)   0.244 (0.051)   0.208 (0.046)   0.216 (0.047)
α                                   0.051 (0.011)   0.112 (0.018)   0.052 (0.011)
λ40                                 3.353 (4.091)                   3.309 (4.037)
λ62                                 3.665 (4.435)                   3.680 (4.453)
λ63                                 3.565 (4.314)                   3.580 (4.332)
λ78                                 2.891 (3.614)                   2.878 (3.598)
λ99                                 2.040 (2.693)                   2.063 (2.723)
λ221                                2.195 (2.766)                   2.173 (2.738)
ψ58                                                 4.012 (4.774)   4.097 (4.875)
deviance            1265.425        1217.112        1256.919        1208.506
The combined negative binomial-gamma VSOMs (denoted M1, M2 and M3) accommodate
outliers in the analysis and have lower deviances (1217.112, 1256.919 and 1208.506,
respectively) than the null model M0 (1265.425).
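The comparison with the null model can be made concrete by taking differences of the deviance row in the table. The sketch below uses only the four deviances reported there. It assumes, which the poster does not state, that the deviances are on the −2 log-likelihood scale, so that each difference behaves like a likelihood ratio statistic; no reference distribution is applied here, because several variance parameters are added at once in some of the models.

```python
# Deviances copied from the table of combined VSOM fits.
deviance = {"M0": 1265.425, "M1": 1217.112, "M2": 1256.919, "M3": 1208.506}

# Reduction in deviance relative to the null model M0.
reduction = {
    model: round(deviance["M0"] - value, 3)
    for model, value in deviance.items()
    if model != "M0"
}

for model, drop in reduction.items():
    print(f"{model}: deviance reduction relative to M0 = {drop:.3f}")
```

The largest reduction belongs to M3, which is also the model with the smallest deviance in the table.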
Conclusions and future work
The VSOM for count data can be used to identify outliers, and down-weight them
in the analysis if desired.
An advantage of the VSOM over case deletion methods is its ability to both identify
and down-weight outlying observations rather than deleting them.
Future work: extend the parametric bootstrap procedure of Gumedze et al. (2010) to
obtain a sampling distribution for the likelihood ratio test statistics and to deal with the
problem of multiple testing.
Acknowledgements
Funding for this research was provided by the University of Cape Town and the National
Research Foundation.
