This document proposes a variance shift outlier model (VSOM) to detect and accommodate outliers in count data regression models. The VSOM treats outliers as additional random effects in a hierarchical generalized linear model. The model is applied to epilepsy patient seizure count data to identify potential outlier observations and subjects. Combined VSOM models are shown to fit the data better than a standard negative binomial-gamma model by down-weighting outlier observations. Future work includes developing a parametric bootstrap procedure to obtain sampling distributions for likelihood ratio test statistics used to identify outliers while addressing multiple testing issues.
4thchannel conference poster_freedom_gumedze
Detection of outliers in Poisson regression models via overdispersion
Freedom Gumedze and Tinashe Chatora
Department of Statistical Sciences, University of Cape Town
http://www.stats.uct.ac.za Email: freedom.gumedze@uct.ac.za
Introduction
Both undispersed and overdispersed count data may contain outliers
We propose a variance shift outlier model (VSOM) for the detection and
accommodation of outliers in count data
Our proposed model is a form of a hierarchical generalized linear model (HGLM)
We consider both independent and longitudinal data settings
Hierarchical generalized linear model (HGLM)
An HGLM has the following properties (Lee and Nelder, 1996):
Let $Y_{ij}$ be the $j$th observation for the $i$th subject and $b_i$ be the unobserved random effect for the $i$th subject, for $i = 1, \ldots, q$ and $j = 1, \ldots, n_i$. Conditional on $b_i$, $Y_{ij}$ follows an exponential family distribution with
$$E(Y_{ij} \mid b_i) = \mu_{ij} \quad \text{and} \quad \mathrm{var}(Y_{ij} \mid b_i) = \phi V(\mu_{ij}),$$
where $V(\cdot)$ is a monotonic function of $\mu_{ij}$ and $\phi$ is the dispersion parameter. The linear predictor for $\mu_{ij}$ takes the form
$$g(E(Y_{ij} \mid b_i)) = g(\mu_{ij}) = \eta_{ij} = X_{ij}\beta + \nu_i, \qquad (1)$$
where $\nu_i$ is a monotonic function of $b_i$, $X_{ij}$ is the $j$th row of the design matrix $X_i$, and $X_i$ is the $n_i \times p$ design matrix for the fixed effects for the $i$th subject.
The random component $b_i$ follows a distribution conjugate to an exponential family of distributions with parameter $\lambda_i$.
Negative binomial GLM and Poisson-gamma HGLM
The negative binomial GLM can be fitted as a Poisson-gamma HGLM with a saturated random effect
$$\log[E(Y_i \mid s_i)] = X_i\beta + \nu_{s_i}, \qquad (2)$$
where $s_i$ is the random effect for the $i$th observation. Let $\nu_{s_i} = \log(s_i)$, with $s_i$ following a gamma distribution with a mean of one and variance of $\alpha$.
The model has the negative binomial variance
$$\mathrm{var}(Y_i) = \mu_i + \alpha\mu_i^2. \qquad (3)$$
The term $\alpha\mu_i^2$ measures the amount of overdispersion.
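As a rough numerical check of the variance formula (3), the Poisson-gamma construction can be simulated with the Python standard library. This is a minimal sketch under stated assumptions: the values `mu = 5` and `alpha = 0.5`, the helper names `sample_poisson` and `simulate_poisson_gamma`, and the use of Knuth's multiplication method for Poisson draws are illustrative choices, not part of the poster.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_poisson_gamma(mu, alpha, n, seed=1):
    """Simulate Y | s ~ Poisson(mu * s), with s ~ gamma(mean 1, variance alpha)."""
    rng = random.Random(seed)
    # gammavariate(shape, scale): mean = shape * scale, variance = shape * scale**2
    shape, scale = 1.0 / alpha, alpha
    return [sample_poisson(mu * rng.gammavariate(shape, scale), rng) for _ in range(n)]


mu, alpha = 5.0, 0.5  # illustrative values, not estimates from the poster
y = simulate_poisson_gamma(mu, alpha, n=50_000)
empirical_mean = statistics.fmean(y)
empirical_var = statistics.pvariance(y)
theoretical_var = mu + alpha * mu ** 2  # equation (3)
print(empirical_mean, empirical_var, theoretical_var)
```

With these settings the empirical variance should land close to the theoretical value of 17.5, well above the Poisson variance of 5, which is the overdispersion that the term $\alpha\mu_i^2$ describes.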
Variance shift outlier model (VSOM) for Poisson count data
Independent count data: a VSOM for the $i$th observation
$$\log[E(Y_i \mid \delta_i)] = X_i\beta + \nu_{\delta_i}, \qquad (4)$$
where $\delta_i$ is a random effect for the $i$th count, $\nu_{\delta_i} = \log(\delta_i)$ and $\delta_i$ has a gamma distribution with a mean of one and variance of $\lambda_i$.
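One way to see why a large variance parameter points to an outlier is to work out the marginal moments implied by (4). The sketch below assumes that, conditional on $\delta_i$, $Y_i$ is Poisson with mean $\mu_i\delta_i$, where $\mu_i = \exp(X_i\beta)$. The conditional Poisson assumption is suggested by the section title but is not written out on the poster. The law of total variance then gives:

```latex
\begin{aligned}
E(Y_i) &= E\{E(Y_i \mid \delta_i)\} = \mu_i\,E(\delta_i) = \mu_i, \\
\mathrm{var}(Y_i) &= E\{\mathrm{var}(Y_i \mid \delta_i)\} + \mathrm{var}\{E(Y_i \mid \delta_i)\} \\
                  &= \mu_i\,E(\delta_i) + \mu_i^2\,\mathrm{var}(\delta_i)
                   = \mu_i + \lambda_i\mu_i^2 .
\end{aligned}
```

Under that assumption the mean is unchanged while the variance of the $i$th count is inflated by $\lambda_i\mu_i^2$, so an observation far from its fitted mean can be accommodated by a large $\lambda_i$ rather than by distorting $\beta$.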
Longitudinal setting
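VSOM for the $ij$th observation:
$$\eta_{ij} = \log[E(Y_{ij} \mid b_i, \delta_{ij})] = X_{ij}\beta + \nu_{b_i} + \nu_{\delta_{ij}}, \qquad (5)$$
where both $b_i$ and $\delta_{ij}$ follow gamma distributions, each with a mean of one, and with variances $\gamma$ and $\lambda_{ij}$, respectively.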
VSOM for the $i$th subject:
$$\eta_{ij} = \log[E(Y_{ij} \mid b_i, \zeta_i)] = X_{ij}\beta + \nu_{b_i} + \nu_{\zeta_i}, \qquad (6)$$
where both $b_i$ and $\zeta_i$ follow gamma distributions, each with a mean of one, and with variances $\gamma$ and $\tau_i$, respectively.
Large estimates of the variance parameters $\lambda_i$, $\lambda_{ij}$ or $\tau_i$ are indicative of potential outliers
Likelihood ratio tests (LRTs) are used to test the variance parameters, with the LRT statistics having $0.68\chi^2_0 + 0.32\chi^2_1$ mixture distributions.
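The percentiles of this mixture can be computed in closed form, because $\chi^2_0$ is a point mass at zero and a $\chi^2_1$ variable is a squared standard normal. The following is a minimal sketch using only the Python standard library. The weights 0.68 and 0.32 are taken from the text above, while the function names are hypothetical.

```python
import math
from statistics import NormalDist

P_ZERO = 0.68  # weight on the point mass at zero (chi-square with 0 d.f.)
P_ONE = 0.32   # weight on the chi-square with 1 d.f.


def chi2_1_cdf(x):
    """CDF of a chi-square variable with 1 d.f.: P(Z**2 <= x) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0


def mixture_p_value(lrt):
    """P(T >= lrt) under the mixture, for an observed statistic lrt."""
    if lrt <= 0:
        return 1.0
    return P_ONE * (1.0 - chi2_1_cdf(lrt))


def mixture_percentile(r):
    """Quantile of the mixture at probability r, with 0 < r < 1."""
    if r <= P_ZERO:
        return 0.0  # the point mass at zero already covers probability r
    q = (r - P_ZERO) / P_ONE  # probability required from the chi-square(1) part
    z = NormalDist().inv_cdf((1.0 + q) / 2.0)
    return z * z  # a chi-square(1) quantile is a squared normal quantile


thresholds = {r: mixture_percentile(r) for r in (0.95, 0.975, 0.99)}
print(thresholds)
```

An observed LRT statistic above `thresholds[0.95]` would be flagged at the 5% level for a single test. Because one such test is run per observation or per subject, these thresholds do not by themselves control the error rate across all tests.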
Application: Epilepsy data
Data description: The dataset is taken from Thall and Vail (1990) and contains 59 patients with epilepsy who were randomized to a new drug or a placebo. For each patient the number of seizures was recorded at baseline and every fortnight during an 8-week period.
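Initial model: negative binomial-gamma HGLM (since the data are overdispersed):
$$\log[E(Y_{ij} \mid b_i)] = (\beta_0 + b_i) + \beta_1 l_{ij} + \beta_2 t_{ij} + \beta_3 t_{ij} l_{ij} + \beta_4 a_{ij} + \beta_5 v_{ij} + \delta_{ij},$$
where $l_{ij}$ = log(baseline seizure count), $t_{ij}$ is the treatment indicator, $a_{ij}$ = log(age), $v_{ij}$ is the linear trend for the visits, coded as $(-3, -1, 1, 3)/10$, and $b_i$ is the subject random effect.
VSOM for the $ij$th observation:
$$\log[E(Y_{ij} \mid b_i)] = (\beta_0 + b_i) + \beta_1 l_{ij} + \beta_2 t_{ij} + \beta_3 t_{ij} l_{ij} + \beta_4 a_{ij} + \beta_5 v_{ij} + \delta_{ij},$$
where $\delta_{ij}$ is the random effect for the $ij$th observation.
VSOM for the $i$th subject:
$$\log[E(Y_{ij} \mid b_i)] = (\beta_0 + b_i) + \beta_1 l_{ij} + \beta_2 t_{ij} + \beta_3 t_{ij} l_{ij} + \beta_4 a_{ij} + \beta_5 v_{ij} + \zeta_i,$$
where $\zeta_i$ is the random effect for the $i$th subject.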
Application: continued
[Figure, three panels plotted against Observations (0 to 200): (a) $\lambda_k$, (b) $\alpha_k$, (c) $LRT_k$.]
Negative binomial-gamma VSOM statistics plotted against observation number. (c) Likelihood ratio statistics, $LRT_k$, with $r$th percentiles from the $0.68\chi^2_0 + 0.32\chi^2_1$ mixture distribution: $r = 95$ (solid line), $r = 97.5$ (dashed line) and $r = 99$ (dotted line). $k = 1, \ldots, N = 236$.
Potential outliers: observations 40, 62, 63, 78, 99 and 221.
Application: continued
[Figure, three panels plotted against Subject (0 to 60): (a) $\psi_i$, (b) $\alpha_i$, (c) $LRT_i$. Subjects 56 and 58 are labelled.]
Only subject 58 is a potential outlier.
Application: continued
Parameter estimates of combined VSOMs fitted to the epilepsy data set.
Parameter M0 M1 M2 M3
Estimate (s.e.) Estimate (s.e.) Estimate (s.e.) Estimate (s.e.)
constant -1.326 (1.210) -1.015 (1.199) -1.558 (1.163) -1.273 (1.149)
lbase 0.881 (0.129) 0.834 (0.128) 0.880 (0.124) 0.834 (0.122)
treatment -0.887 (0.392) -0.932 (0.387) -0.799 (0.378) -0.846 (0.373)
treatment × lbase 0.337 (0.198) 0.372 (0.196) 0.308 (0.190) 0.343 (0.187)
log(age) 0.496 (0.360) 0.432 (0.357) 0.574 (0.345) 0.508 (0.342)
visit -0.264 (0.116) -0.312 (0.136) -0.264 (0.158) -0.312 (0.136)
γ 0.235 (0.051) 0.244 (0.051) 0.208 (0.046) 0.216 (0.047)
α 0.051 (0.011) 0.112 (0.018) 0.052 (0.011)
λ40 3.353 (4.091) 3.309 (4.037)
λ62 3.665 (4.435) 3.680 (4.453)
λ63 3.565 (4.314) 3.580 (4.332)
λ78 2.891 (3.614) 2.878 (3.598)
λ99 2.040 (2.693) 2.063 (2.723)
λ221 2.195 (2.766) 2.173 (2.738)
ψ58 4.012 (4.774) 4.097 (4.875)
deviance 1265.425 1217.112 1256.919 1208.506
Combined negative binomial-gamma VSOMs (denoted M1, M2 and M3) accommodate outliers in the analysis, and each has a lower deviance than the null model (denoted M0).
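The size of the improvement can be read off the deviance row of the table. The short sketch below only does that arithmetic. The deviance values are copied from the table, and the variable names are illustrative.

```python
# Deviances copied from the table of combined VSOM fits.
deviance = {"M0": 1265.425, "M1": 1217.112, "M2": 1256.919, "M3": 1208.506}

# Reduction in deviance of each combined VSOM relative to the null model M0.
reduction = {m: round(deviance["M0"] - d, 3) for m, d in deviance.items() if m != "M0"}

# Model with the smallest deviance.
best = min(deviance, key=deviance.get)
print(reduction)
print(best)
```

The reductions are about 48.3 for M1, 8.5 for M2 and 56.9 for M3, so M3 has the lowest deviance. These differences are descriptive only. They are not referred to a reference distribution here, since the sampling distribution of such statistics is listed as future work.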
Conclusions and future work
The VSOM for count data can be used to identify outliers, and down-weight them
in the analysis if desired.
An advantage of the VSOM over case deletion methods is its ability to both identify and down-weight outlying observations rather than deleting them.
Future work: extension of the parametric bootstrap procedure of Gumedze et al. (2010) to obtain a sampling distribution for the likelihood ratio test statistics and to deal with the problem of multiple testing.
Acknowledgements
Funding for this research was provided by the University of Cape Town and the National Research Foundation.