1. Adapt-then-Combine (ATC) diffusion strategy
• Error analysis:
The algorithm is stable in mean if
•Steady state mean variance:
• Assuming small probability of missing, we have
• Smoothing filters to estimate the variance
• The relation between the perfect and imperfect estimates is given by
The estimate is biased with respect to
• To compensate for the bias, we associate the following individual cost with each agent:
where is a symmetric matrix to be chosen.
To have an unbiased estimate, i.e.,
The minimum cost:
Assumption 3: The covariance matrix of the regressor is diagonal.
Under Assumption 3:
Mohammad Reza Gholami (1), Erik G. Ström (1), and Ali H. Sayed (2)
(1) Department of Signals and Systems, Chalmers University of Technology, Gothenburg, SE-412 96, Sweden
(2) Electrical Engineering Department, University of California, Los Angeles, CA 90095, USA
Emails: {moreza,erik.strom}@chalmers.se, sayed@ee.ucla.edu
In many fields, and especially in the medical and social sciences and in various
recommender systems, data are often gathered through clinical studies or targeted
surveys. Participants are generally reluctant to respond to all questions in a survey or
they may lack information to respond adequately to the questions. The data collected
from these studies tend to lead to linear regression models where the regression
vectors are only known partially: some of their entries are either missing completely or
replaced randomly by noisy values. There are also situations where it is not known
beforehand which entries are missing or censored. There have been many useful
studies in the literature on techniques to perform estimation and inference with
missing data. In this work, we examine how a connected network of agents, with each
one of them subjected to a stream of data with incomplete regression information, can
cooperate with each other through local interactions to estimate the underlying model
parameters in the presence of missing data. We explain how to modify traditional
distributed strategies through regularization in order to eliminate the bias introduced
by the incomplete model. We also examine the stability and performance of the
resulting diffusion strategy and provide simulations in support of the findings. We
consider two applications: one dealing with a mental health survey and the other
dealing with a household consumption survey.
Diffusion Estimation over Cooperative Networks with Missing Data
Abstract
System Model
• Consider a connected network. Each agent senses wide-sense stationary data
that satisfy the following linear regression model:
Assumption 1: The regression and the noise processes are each spatially independent and
temporally white. In addition,
• The model for the incomplete regressor: (1)
Assumption 2: Random variables are independent of each other.
• Optimal estimator (minimum-mean-square error):
Perfect: Missing data:
The minimum cost for the perfect scenario:
(www.asl.ee.ucla.edu)
Simulation Results
• In data gathering procedures, it is common that some components of the data are
missing or left unobserved, e.g., a participant may be reluctant to answer some
questions in a clinical study.
• Data can be missing in a random or a deterministic fashion.
• Two techniques for dealing with missing data are imputation, which introduces bias
into the estimate, and deletion, which degrades performance.
• This work studies the missing data problem over a network of agents. Each agent is
subjected to a stream of data with incomplete regression information, and the agents
cooperate with each other to estimate the underlying model parameters in the
presence of missing data.
• In this study, we consider a linear regression model.
• We adjust the traditional diffusion strategies through (de)regularization in order to
mitigate the bias introduced by imputation.
• We consider two applications: one dealing with a mental health survey and the other
dealing with a household consumption survey.
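The bias caused by imputation, and its removal by deregularization, can be illustrated numerically. The following is a minimal single-agent sketch, not the poster's algorithm. It assumes a diagonal regressor covariance, entries that are independently replaced with probability p by zero-mean noise of variance sigma_xi2, and that both p and sigma_xi2 are known. All names (simulate, w_true, sigma_xi2, and so on) are hypothetical. Under these assumptions the covariance of the incomplete regressors is inflated by p * sigma_xi2 * I, and subtracting that term before solving the normal equations removes the bias.

```python
import numpy as np

def simulate(n=200_000, p=0.2, sigma_xi2=1.0, seed=0):
    """Compare a naive and a deregularized least-squares fit on incomplete regressors."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -0.5, 2.0])        # assumed true parameter vector
    sigma_u2 = np.array([1.0, 2.0, 0.5])       # assumed diagonal regressor covariance
    m = w_true.size

    u = rng.normal(size=(n, m)) * np.sqrt(sigma_u2)   # complete regressors
    d = u @ w_true + 0.1 * rng.normal(size=n)         # noisy measurements

    # Each entry is replaced by noise with probability p (assumed missing-data model).
    missing = rng.random((n, m)) < p
    xi = rng.normal(size=(n, m)) * np.sqrt(sigma_xi2)
    u_bar = np.where(missing, xi, u)

    # Sample moments computed from the incomplete regressors.
    R_bar = u_bar.T @ u_bar / n
    r_bar = u_bar.T @ d / n

    # Naive estimate: treats the incomplete regressors as if they were complete.
    w_naive = np.linalg.solve(R_bar, r_bar)

    # Deregularized estimate: subtract the covariance inflation p * sigma_xi2 * I.
    w_comp = np.linalg.solve(R_bar - p * sigma_xi2 * np.eye(m), r_bar)
    return w_true, w_naive, w_comp

if __name__ == "__main__":
    w_true, w_naive, w_comp = simulate()
    print("true       :", w_true)
    print("naive      :", w_naive)
    print("compensated:", w_comp)
```

In this sketch the naive estimate shrinks every coefficient toward zero, while the compensated one stays close to the true vector. In practice p and sigma_xi2 would have to be estimated, which this example does not attempt.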
Introduction
Bias Compensation
Distributed Algorithm
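As a point of reference, here is a minimal sketch of the standard Adapt-then-Combine (ATC) diffusion LMS recursion from the adaptive-networks literature, written for complete regressors. The modified version (MATC) adds a bias-compensation term whose exact form is not shown in this text, so it is not reproduced here. The function names, the ring topology, and the uniform combination weights are assumptions made for illustration only.

```python
import numpy as np

def ring_weights(K):
    """Uniform combination weights on a ring (assumed topology); each column sums to one."""
    A = np.zeros((K, K))
    for k in range(K):
        for l in (k - 1, k, k + 1):
            A[l % K, k] = 1.0 / 3.0
    return A

def atc_diffusion_lms(U, D, A, mu):
    """Standard ATC diffusion LMS.

    U:  (K, N, M) regressors, one row vector per agent and time instant
    D:  (K, N)    scalar measurements
    A:  (K, K)    combination matrix with A[l, k] = a_lk (weight agent k gives to agent l)
    mu: step-size, shared by all agents in this sketch
    """
    K, N, M = U.shape
    W = np.zeros((K, M))                       # one estimate per agent
    for i in range(N):
        Ui = U[:, i, :]
        # Adapt: each agent takes a local LMS step using its own data.
        err = D[:, i] - np.sum(Ui * W, axis=1)
        Psi = W + mu * err[:, None] * Ui
        # Combine: w_k = sum over l of a_lk * psi_l.
        W = A.T @ Psi
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K, N, M = 5, 3000, 3
    w_true = np.array([0.5, -1.0, 2.0])
    U = rng.normal(size=(K, N, M))
    D = U @ w_true + 0.05 * rng.normal(size=(K, N))
    W = atc_diffusion_lms(U, D, ring_weights(K), mu=0.02)
    print(W)
```

Running the same recursion on incomplete regressors would, in general, converge to a biased point, which is the motivation for the compensation step.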
• Household Consumption:
• Mental Health Survey:
Adaptive Systems Laboratory
Estimation of Regularization Parameter
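A smoothing filter for variance estimation can be sketched as a first-order recursive average of squared samples. Which quantity the poster smooths, and with which filter, is not recoverable from this text. The example below therefore assumes a zero-mean scalar stream and a hypothetical forgetting factor nu.

```python
import numpy as np

def smoothed_variance(samples, nu=0.005):
    """Running variance estimate of a zero-mean stream via first-order smoothing.

    nu is an assumed forgetting factor in (0, 1); smaller values average over more samples.
    """
    est = 0.0
    for x in samples:
        # Convex combination of the previous estimate and the new squared sample.
        est = (1.0 - nu) * est + nu * x * x
    return est

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    stream = 2.0 * rng.normal(size=20_000)     # true variance is 4 in this example
    print(smoothed_variance(stream))
```

A smaller nu lowers the fluctuation of the estimate at the price of slower tracking.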
Ncoop: Non-cooperative
MATC: Modified ATC