This document proposes and analyzes low-rank linear regression models for classification tasks. It shows that low-rank linear regression is equivalent to performing linear regression in the subspace defined by linear discriminant analysis (LDA). It further develops regularized extensions, including low-rank ridge regression and sparse low-rank regression, and proves their connections to regularized LDA. Experimental results on six datasets demonstrate that the proposed low-rank models achieve better classification accuracy than full-rank baselines, especially when the rank is small.
On the Equivalent of Low-Rank Linear Regressions and Linear Discriminant Analysis Based Regressions
Xiao Cai, Chris Ding, Feiping Nie, Heng Huang CSE Department, The University of Texas at Arlington
xiao.cai@mavs.uta.edu, chqding@uta.edu, feipingnie@gmail.com, heng@uta.edu
Problem
Multivariate linear regression attempts to model the relationship between predictors and responses by fitting a linear equation to observed data. Such linear regression models suffer from two deficiencies. On one hand, they usually perform poorly on high-dimensional data: to perform accurate regression or classification on such data, we would have to collect an enormous number of samples, but due to the difficulty of collecting data and labels we often cannot obtain enough samples and thus suffer from the curse-of-dimensionality problem [1]. On the other hand, linear regression models do not exploit the correlations among different responses; standard least squares regression is equivalent to regressing each response on the predictors separately.
Our Key Contributions
(1) We prove that low-rank linear regression is equivalent to doing linear regression in the LDA subspace.
(2) We derive global and concise algorithms for low-rank regression models.
(3) We show the connection between low-rank regression and regularized LDA. Both theory and experiments indicate that low-rank ridge regression outperforms the low-rank linear regression used in many existing studies.
(4) To solve the related feature selection problem, we propose the sparse low-rank regression method, which exploits both class/task correlations and feature structures.
References
[1] D. Donoho: High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, pages 1-32, (2000)
[2] T. Anderson: Estimating linear restrictions on regression coefficients for multivariate normal distributions. AMS, pages 327-351, (1951)
Linear Low-Rank Regression And LDA + LR
The traditional linear regression model for classification solves the following problem:

$$\min_{W} \|Y - X^{\top} W\|_F^2, \qquad (1)$$

where $X = [x_1, x_2, \ldots, x_n] \in \Re^{d \times n}$ is the centered training data matrix and $Y \in \Re^{n \times k}$ is the normalized class indicator matrix, i.e. $Y_{ij} = 1/\sqrt{n_j}$ if the $i$-th data point belongs to the $j$-th class and $Y_{ij} = 0$ otherwise, where $n_j$ is the sample size of the $j$-th class.
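Eq. (1) has the closed-form solution $W = (XX^{\top})^{+} XY$. As a minimal sketch on assumed toy data (all names and sizes are illustrative, not from the poster), it can be computed with NumPy:

```python
import numpy as np

# Toy data (illustrative): n samples, d features, k classes.
n, d, k = 60, 5, 3
rng = np.random.default_rng(0)
labels = np.arange(n) % k                      # balanced class labels
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)          # centered data matrix, d x n

# Normalized class indicator: Y[i, j] = 1/sqrt(n_j) if sample i is in class j.
Y = np.zeros((n, k))
for j in range(k):
    Y[labels == j, j] = 1.0 / np.sqrt((labels == j).sum())

# Full-rank least squares solution of Eq. (1): W = (X X^T)^+ X Y.
W = np.linalg.pinv(X @ X.T) @ X @ Y            # shape (d, k)
```

A common decision rule for such indicator-regression classifiers (not spelled out on the poster) assigns a test point $x$ to the class $j$ maximizing $(W^{\top} x)_j$.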
When the number of classes or tasks is large, there are often underlying correlation structures between classes or tasks. To incorporate the response correlations into the regression method [2], we propose the following discriminant Low-Rank Linear Regression (LRLR) formulation:

$$\min_{A,B} \|Y - X^{\top} A B\|_F^2, \qquad (2)$$

where $A \in \Re^{d \times s}$, $B \in \Re^{s \times k}$, and $s < \min(n, k)$. Thus $W = AB$ has low rank $s$.
Theorem 1. The low-rank linear regression method of Eq. (2) is identical to doing standard linear regression in the LDA subspace.

Proof: Denoting $J_1(A, B) = \|Y - X^{\top} A B\|_F^2$ and taking its derivative w.r.t. $B$, we have

$$\frac{\partial J_1(A, B)}{\partial B} = -2 A^{\top} X Y + 2 A^{\top} X X^{\top} A B. \qquad (3)$$

Setting Eq. (3) to zero, we obtain

$$B = (A^{\top} X X^{\top} A)^{-1} A^{\top} X Y. \qquad (4)$$

Substituting Eq. (4) back into Eq. (2), we have

$$\min_{A} \|Y - X^{\top} A (A^{\top} X X^{\top} A)^{-1} A^{\top} X Y\|_F^2, \qquad (5)$$

which is equivalent to

$$\max_{A} \operatorname{Tr}\left( (A^{\top} (X X^{\top}) A)^{-1} A^{\top} X Y Y^{\top} X^{\top} A \right). \qquad (6)$$

Note that

$$S_t = X X^{\top}, \qquad S_b = X Y Y^{\top} X^{\top}, \qquad (7)$$

where $S_t$ and $S_b$ are the total-class scatter matrix and the between-class scatter matrix defined in LDA, respectively. Therefore, the solution of Eq. (6) can be written as

$$A^{*} = \arg\max_{A} \operatorname{Tr}\left[ (A^{\top} S_t A)^{-1} A^{\top} S_b A \right], \qquad (8)$$

which is exactly the problem of LDA. ∎
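Theorem 1 can be illustrated numerically. The sketch below (toy data; our own variable names, not the authors' code) computes $A$ from the generalized eigenproblem of Eq. (8) and $B$ from Eq. (4), then checks that ordinary least squares on the LDA-projected data $Z = A^{\top} X$ reproduces the same $W = AB$:

```python
import numpy as np
from scipy.linalg import eigh

# Toy data (illustrative): d features, n samples, k classes, target rank s.
n, d, k, s = 80, 6, 4, 2
rng = np.random.default_rng(1)
labels = np.arange(n) % k
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)
Y = np.zeros((n, k))
for j in range(k):
    Y[labels == j, j] = 1.0 / np.sqrt((labels == j).sum())

St = X @ X.T                  # total scatter, Eq. (7)
Sb = X @ Y @ Y.T @ X.T        # between-class scatter, Eq. (7)

# Eq. (8): top-s generalized eigenvectors of Sb a = lambda * St a.
_, evecs = eigh(Sb, St)       # eigenvalues returned in ascending order
A = evecs[:, -s:]
B = np.linalg.inv(A.T @ St @ A) @ A.T @ X @ Y   # Eq. (4)
W = A @ B                     # low-rank coefficient matrix

# Regression in the LDA subspace: project the data, then least squares.
Z = A.T @ X                   # s x n projected data
B_lda = np.linalg.pinv(Z @ Z.T) @ Z @ Y
assert np.allclose(A @ B_lda, W)  # the two routes agree, as Theorem 1 states
```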
Two Extensions: LRRR, SLRR
Theorem 2. The proposed Low-Rank Ridge Regression (LRRR) method
$$\min_{A,B} \|Y - X^{\top} A B\|_F^2 + \lambda \|A B\|_F^2$$
is equivalent to doing the regularized regression in the regularized LDA subspace.

Theorem 3. The optimal solution of the proposed SLRR method
$$\min_{A,B} \|Y - X^{\top} A B\|_F^2 + \lambda \|A B\|_{2,1}$$
has the same column space as a special regularized LDA.
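As a hedged sketch of Theorem 2 (toy data; our construction, not the poster's code), LRRR is obtained from the LRLR recipe by replacing $S_t$ with the regularized scatter $S_t + \lambda I$:

```python
import numpy as np
from scipy.linalg import eigh

# Toy data (illustrative): same construction as for LRLR, plus a ridge lambda.
n, d, k, s, lam = 80, 6, 4, 2, 0.1
rng = np.random.default_rng(2)
labels = np.arange(n) % k
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)
Y = np.zeros((n, k))
for j in range(k):
    Y[labels == j, j] = 1.0 / np.sqrt((labels == j).sum())

St_reg = X @ X.T + lam * np.eye(d)     # regularized total scatter St + lambda*I
Sb = X @ Y @ Y.T @ X.T                 # between-class scatter

_, evecs = eigh(Sb, St_reg)            # regularized LDA eigenproblem
A = evecs[:, -s:]                      # top-s generalized eigenvectors
B = np.linalg.inv(A.T @ St_reg @ A) @ A.T @ X @ Y
W = A @ B                              # rank-s ridge-regularized coefficients
```

With $\lambda = 0$ this reduces to LRLR; the $\|AB\|_{2,1}$ penalty of SLRR instead requires an iterative reweighting, as in the Algorithms box.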
Algorithms
The algorithm for LRLR, LRRR, or SLRR:

Input:
1. The centered training data $X \in \Re^{d \times n}$.
2. The normalized training indicator matrix $Y \in \Re^{n \times k}$.
3. The low-rank parameter $s$.
4. If LRRR or SLRR, the regularization parameter $\lambda$.

Output:
1. The matrices $A \in \Re^{d \times s}$ and $B \in \Re^{s \times k}$.

Process:
IF LRLR:
  Calculate $A$ by Eq. (8); calculate $B$ by Eq. (4).
ELSE IF LRRR:
  Calculate $A$ by $A^{*} = \arg\max_A \operatorname{Tr}\left((A^{\top}(S_t + \lambda I)A)^{-1} A^{\top} S_b A\right)$.
  Calculate $B$ by $B = (A^{\top}(X X^{\top} + \lambda I) A)^{-1} A^{\top} X Y$.
ELSE IF SLRR:
  Initialization: set $t = 0$ and $D^{(0)} = I \in \Re^{d \times d}$.
  Repeat:
  1. Calculate $A^{(t+1)} = \arg\max_A \operatorname{Tr}\left((A^{\top}(S_t + \lambda D^{(t)})A)^{-1} A^{\top} S_b A\right)$.
  2. Calculate $B^{(t+1)} = \left((A^{(t+1)})^{\top}(X X^{\top} + \lambda D^{(t)}) A^{(t+1)}\right)^{-1} (A^{(t+1)})^{\top} X Y$.
  3. Update the diagonal matrix $D^{(t+1)} \in \Re^{d \times d}$, whose $i$-th diagonal element is $\frac{1}{2\|(A^{(t+1)} B^{(t+1)})^{i}\|_2}$, where $(M)^{i}$ denotes the $i$-th row of $M$.
  4. Update $t = t + 1$.
  Until convergence.
END
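The SLRR branch above can be sketched as the following iteratively reweighted procedure (toy data; `eps` is our guard against division by zero and the fixed iteration budget stands in for a proper convergence check, neither of which is specified on the poster):

```python
import numpy as np
from scipy.linalg import eigh

# Toy data (illustrative): d features, n samples, k classes, rank s, ridge lam.
n, d, k, s, lam, eps = 80, 6, 4, 2, 0.1, 1e-8
rng = np.random.default_rng(3)
labels = np.arange(n) % k
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)
Y = np.zeros((n, k))
for j in range(k):
    Y[labels == j, j] = 1.0 / np.sqrt((labels == j).sum())

St = X @ X.T                               # total scatter
Sb = X @ Y @ Y.T @ X.T                     # between-class scatter

D = np.eye(d)                              # Initialization: D(0) = I
for t in range(30):                        # fixed budget instead of a convergence test
    _, evecs = eigh(Sb, St + lam * D)      # step 1: A(t+1) from the eigenproblem
    A = evecs[:, -s:]
    B = np.linalg.inv(A.T @ (St + lam * D) @ A) @ A.T @ X @ Y   # step 2: B(t+1)
    W = A @ B
    row_norms = np.linalg.norm(W, axis=1)  # ||(A B)^i||_2 for each row i
    D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))       # step 3: D(t+1)
```

The reweighting drives rows of $W = AB$ with small norm toward zero, which is what produces the row-sparse structure shown in the experiments.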
Experiment Data Summary
Dataset   k    d      n
UMIST     20   10304  575
BIN36     36   320    1404
BIN26     26   320    1014
VOWEL     11   10     990
MNIST     10   784    150
JAFFE     10   1024   213
Experiment Results
The average classification accuracy vs. the rank $s$, using 5-fold cross-validation on six datasets; low rank is marked in red and full rank in blue. Left column: linear regression; middle column: ridge regression; right column: sparse regression.
[Figure: average classification accuracy vs. the number of rank $s$, comparing full rank and low rank on each dataset. Panels: (a) UMIST linear regression; (b) UMIST ridge regression; (c) UMIST sparse linear regression; (d) VOWEL linear regression; (e) VOWEL ridge regression; (f) VOWEL sparse linear regression; (g) MNIST linear regression; (h) MNIST ridge regression; (i) MNIST sparse linear regression; (j) JAFFE linear regression; (k) JAFFE ridge regression; (l) JAFFE sparse linear regression; (m) BINALPHA36 linear regression; (n) BINALPHA36 ridge regression; (o) BINALPHA36 sparse linear regression; (p) BINALPHA26 linear regression; (q) BINALPHA26 ridge regression; (r) BINALPHA26 sparse linear regression.]
Demonstration of the low-rank structure and sparse structure found by our proposed SLRR method.

[Figure: for each dataset, the left panel plots the singular values (index of singular value vs. singular value), showing the low-rank structure; the right panel shows the absolute values of the weight coefficients (index of class vs. abs of weight coefficients), showing the sparse structure. Panels: (a) UMIST; (b) VOWEL; (c) MNIST; (d) JAFFE; (e) BINALPHA36; (f) BINALPHA26.]