2.
A new algorithm for solving the L1-regularized least squares
problem, which makes learning sparse coding bases more
efficient
A new approach for the L2-constrained least squares
problem, which results in a significant speedup for sparse
coding
Goal
UTA : CSE6363 : Machine Learning Anshu Dipit / Likitha Seeram
3. What is Sparse Coding?
Sparse Coding applications in Computer Vision
Image Denoising, Image Restoration
Introduction
4. Sparse coding is a method for discovering good basis
vectors automatically using only unlabeled data
It learns the basis functions that capture high-level
features in the data
(Figure: input examples and the features selected)
Sparse Coding Problem
5. Sparse coding is a method for discovering good basis
vectors automatically using only unlabeled data
It is similar to PCA
Given a training set of m vectors X = [x_1, x_2, …, x_m], where each
x_i ∈ R^k, we attempt to find a succinct representation for each x_i
using basis vectors b_1, b_2, …, b_n ∈ R^k and a sparse vector s ∈ R^n
such that
x_i ≈ Σ_j s_j b_j = [b_1, b_2, …, b_n] s
Note that the basis can be overcomplete, i.e., n > k
Sparse Coding Problem
6. The goal of sparse coding is to represent input vectors as
weighted linear combinations of 'basis vectors', which capture
high-level patterns in the input data
The optimization problem in sparse coding:
minimize over B, S:   (1/(2σ²)) ‖X − BS‖²_F + β Σ_{i,j} φ(S_{i,j})
subject to:           Σ_i B²_{i,j} ≤ c,  for all j = 1, …, n
where X = [x_1, …, x_m], B = [b_1, …, b_n], S = [s_1, …, s_m],
and φ is a sparsity penalty function (we consider the L1 penalty,
φ(s) = |s|).
Sparse Coding Problem
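As a rough illustration of this objective (not code from the paper), the following Python sketch builds a random overcomplete basis with n > k and evaluates the reconstruction-plus-sparsity objective above; the values of beta, sigma, and the toy dimensions are arbitrary placeholders.

import numpy as np

def sparse_coding_objective(X, B, S, beta=0.4, sigma=1.0):
    """Reconstruction error plus L1 sparsity penalty, as in the slide above.

    X: (k, m) input vectors, B: (k, n) basis vectors, S: (n, m) coefficients.
    """
    reconstruction = np.sum((X - B @ S) ** 2) / (2.0 * sigma ** 2)
    sparsity = beta * np.sum(np.abs(S))        # L1 penalty: phi(s) = |s|
    return reconstruction + sparsity

# Toy example: k = 64-dimensional inputs, n = 128 bases (overcomplete, n > k), m = 10 inputs.
rng = np.random.default_rng(0)
k, n, m = 64, 128, 10
B = rng.normal(size=(k, n))
B /= np.linalg.norm(B, axis=0)                 # keep each basis vector's norm bounded
S = rng.normal(size=(n, m)) * (rng.random((n, m)) < 0.1)   # mostly-zero coefficients
X = B @ S + 0.01 * rng.normal(size=(k, m))     # inputs approximately equal to B S
print(sparse_coding_objective(X, B, S))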
7. The formulation of LASSO:
minimize over x:   ‖y − Ax‖² + γ ‖x‖_1
where x, y are vectors, A is a matrix, and γ is a constant.
The basic idea of the algorithm (sketched below):
Maintain an "active set" of the potentially nonzero components of x
Guess the sign of each active component of x; once the signs are
fixed, the L1 term becomes linear and the problem reduces to an
unconstrained quadratic program that can be solved analytically
A new algorithm to solve
LASSO
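The sketch below is a simplified Python re-implementation of this feature-sign idea, written from the paper's description rather than taken from the authors' MATLAB code; the tolerances, iteration limit, and the toy usage at the end are assumptions of this sketch.

import numpy as np

def feature_sign_search(A, y, gamma, max_iter=1000, tol=1e-7):
    """Simplified feature-sign search for min_x ||y - A x||^2 + gamma * ||x||_1.

    A: (k, n) matrix, y: (k,) vector, gamma > 0.
    """
    n = A.shape[1]
    x = np.zeros(n)
    theta = np.zeros(n)                       # sign vector in {-1, 0, +1}
    active = np.zeros(n, dtype=bool)
    AtA, Aty = A.T @ A, A.T @ y

    def grad(v):                              # gradient of ||y - A v||^2
        return 2.0 * (AtA @ v - Aty)

    for _ in range(max_iter):
        # Step 2: among zero coefficients, activate the one with the largest
        # gradient magnitude, if activating it locally improves the objective.
        g = grad(x)
        zeros = np.flatnonzero(~active)
        if zeros.size:
            i = zeros[np.argmax(np.abs(g[zeros]))]
            if g[i] > gamma:
                theta[i], active[i] = -1.0, True
            elif g[i] < -gamma:
                theta[i], active[i] = 1.0, True
        if not active.any():
            return x                          # the all-zero solution is optimal
        # Step 3: feature-sign steps until the active coefficients are optimal.
        while True:
            idx = np.flatnonzero(active)
            A_hat = A[:, idx]
            # Analytic minimizer of ||y - A_hat x_hat||^2 + gamma * theta_hat . x_hat
            x_new = np.linalg.solve(A_hat.T @ A_hat,
                                    A_hat.T @ y - 0.5 * gamma * theta[idx])
            # Discrete line search: try x_new and every point on the segment
            # from the current x_hat to x_new where a coefficient crosses zero.
            x_old = x[idx]
            d = x_new - x_old

            def objective(v):
                return np.sum((y - A_hat @ v) ** 2) + gamma * np.sum(np.abs(v))

            candidates = [x_new]
            for j in range(idx.size):
                if x_old[j] != 0.0 and np.sign(x_old[j]) != np.sign(x_new[j]):
                    t = -x_old[j] / d[j]
                    if 0.0 < t < 1.0:
                        candidates.append(x_old + t * d)
            x[idx] = min(candidates, key=objective)
            # Remove coefficients that became zero and refresh the sign vector.
            became_zero = active & (np.abs(x) < 1e-12)
            x[became_zero] = 0.0
            active[became_zero] = False
            theta = np.sign(x)
            if not active.any():
                break
            # Optimality condition (a): nonzero coefficients have grad + gamma*sign = 0.
            g = grad(x)
            if np.all(np.abs(g[active] + gamma * theta[active]) < tol):
                break
        # Optimality condition (b): zero coefficients have |grad| <= gamma.
        g = grad(x)
        if np.all(np.abs(g[~active]) <= gamma + tol):
            return x
    return x

# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 100))
x_true = np.zeros(100)
x_true[:5] = rng.normal(size=5)
y = A @ x_true + 0.01 * rng.normal(size=50)
x_hat = feature_sign_search(A, y, gamma=0.5)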
10. Consider the optimization problem (given in the LASSO slide) augmented
with the additional constraint that x is consistent with a given active
set and sign vector. Then, if the current coefficients x_c are consistent
with the active set and sign vector, but are not optimal for the
augmented problem at the start of Step 3, the feature-sign step is
guaranteed to strictly reduce the objective.
Consider the optimization problem (LASSO equation) augmented with the
additional constraint that x is consistent with a given active set and
sign vector. If the coefficients x_c at the start of Step 2 are optimal for
the augmented problem, but are not optimal for the problem (LASSO
equation), the feature-sign step is guaranteed to strictly reduce the
objective.
The feature-sign search algorithm converges to a global optimum of
the optimization problem in a finite number of steps.
Proofs of the Algorithm
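For reference, these lemmas rest on the standard optimality conditions for the L1-regularized problem, which the algorithm checks explicitly (stated here in the slide's notation, where ∂/∂x_j denotes the partial derivative of ‖y − Ax‖²):

∂‖y − Ax‖² / ∂x_j + γ sign(x_j) = 0   for every nonzero x_j   (optimality of the active coefficients)
|∂‖y − Ax‖² / ∂x_j| ≤ γ               for every zero x_j      (no further activations are needed)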
12.
Solving the optimization problem over the bases B with the
coefficients S held fixed.
This is a least squares problem with quadratic constraints,
which can be efficiently solved using a Lagrange dual.
After the calculations, we find the optimal bases B as follows:
B = X Sᵀ (S Sᵀ + Λ)⁻¹, where Λ = diag(λ) contains the optimal dual variables.
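A minimal Python sketch of this basis update, assuming SciPy's L-BFGS-B optimizer for maximizing the dual (the paper uses Newton's method or conjugate gradient, so the optimizer choice, starting point, and constraint constant c here are illustrative assumptions):

import numpy as np
from scipy.optimize import minimize

def learn_bases_lagrange_dual(X, S, c=1.0):
    """Solve min_B ||X - B S||_F^2 s.t. sum_i B[i, j]^2 <= c for every basis j,
    with the coefficients S held fixed, via the Lagrange dual."""
    n = S.shape[0]
    XSt = X @ S.T                              # (k, n)
    SSt = S @ S.T                              # (n, n)
    tr_XtX = np.sum(X * X)

    def neg_dual(lam):
        # D(lambda) = tr(X'X) - tr(X S' (S S' + diag(lambda))^{-1} (X S')') - c * sum(lambda)
        M = SSt + np.diag(lam)
        Minv_XSt_T = np.linalg.solve(M, XSt.T)     # M^{-1} (X S')'
        return -(tr_XtX - np.sum(XSt.T * Minv_XSt_T) - c * np.sum(lam))

    # Maximize the dual over lambda >= 0 (a strictly positive lower bound keeps M invertible).
    res = minimize(neg_dual, x0=np.ones(n), bounds=[(1e-9, None)] * n, method="L-BFGS-B")
    lam = res.x
    # Optimal bases: B = X S' (S S' + diag(lambda))^{-1}
    return np.linalg.solve(SSt + np.diag(lam), XSt.T).T

The dual has only n variables (one per basis vector), which is why maximizing it is much cheaper than optimizing the k × n entries of B directly, e.g., by gradient descent with projections.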
14. Performance of the algorithms was evaluated on four natural
stimulus datasets:
Natural Images
Speech
Stereo Images
Natural Image Videos
All experiments were conducted on a Linux machine with AMD
Opteron 2GHz CPU and 2GB RAM
All the algorithms were implemented in MATLAB
Experiment
15. Evaluating the feature-sign search algorithm for learning coefficients with the L1
sparsity function
Running time and error are compared with other
coefficient-learning algorithms
For each dataset, a test set of 100 input vectors and a training set of 1000
input vectors was used
Table values: running time (relative error)
Evaluating Feature Sign
Search Algorithm
16. Running time (in seconds) for different combinations of
coefficient-learning and basis-learning algorithms, using
different sparsity functions, is shown below:
Time Taken for learning
Bases
17. Using these efficient algorithms, they were able to learn
overcomplete bases of natural images:
1024 bases (14 × 14 pixels each) and 2000 bases (20 × 20 pixels each)
Learning overcomplete
natural image bases
18. Sparse coding can model the interaction (inhibition)
between the bases (neurons) by sparsifying their
coefficients (activations), and these algorithms enable such
phenomena to be tested with highly overcomplete bases.
They evaluated whether end-stopping behavior could be
observed in the sparse coding framework; the results were
consistent with the end-stopping behavior of real neurons.
Using the learned overcomplete bases, they also tested for
center-surround non-classical receptive field (nCRF)
effects.
Replicating Complex
Neuroscience phenomena
19. They applied sparse coding to self-taught learning,
a new machine learning formalism:
a supervised learning problem is given along with additional unlabeled
instances that need not have the same class labels as the labeled
instances.
Sparse coding algorithms are applied to the unlabeled data to learn
bases, which give a higher-level representation of images, thus
making the supervised learning task easier.
This approach achieved 11-36% reductions in test error.
Related Work: R. Raina, A. Battle, H. Lee, B. Packer, and A. Y.
Ng. Self-taught learning. In NIPS Workshop on Learning when
test and training inputs have different distributions, 2006
Application to self-taught
learning
20. In this paper, sparse coding is formulated as a
combination of two convex optimization problems.
Efficient algorithms for both were presented: the
feature-sign search for solving the L1-regularized least squares
problem to learn the coefficients, and a Lagrange dual
method for the L2-constrained least squares problem to
learn the bases, for any sparsity penalty function (the
overall alternating procedure is sketched after this slide).
These results also allow sparse coding to partially explain the
phenomena of end-stopping and nCRF surround suppression in V1 neurons.
Conclusion
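To tie the two pieces together, here is a hedged sketch of the overall alternating loop, reusing the feature_sign_search and learn_bases_lagrange_dual sketches from earlier slides; the number of bases, beta, c, and the iteration count are illustrative placeholders, not the paper's settings.

import numpy as np

def sparse_coding(X, n_bases=128, beta=0.4, c=1.0, n_iter=20, seed=0):
    """Alternating minimization: with B fixed, solve one L1-regularized least
    squares problem per input column (feature-sign search); with S fixed,
    update B by the L2-constrained least squares step (Lagrange dual)."""
    rng = np.random.default_rng(seed)
    k, m = X.shape
    B = rng.normal(size=(k, n_bases))
    B /= np.linalg.norm(B, axis=0)                 # start from random unit-norm bases
    S = np.zeros((n_bases, m))
    for _ in range(n_iter):
        for i in range(m):                         # coefficient step, one column at a time
            S[:, i] = feature_sign_search(B, X[:, i], gamma=beta)
        B = learn_bases_lagrange_dual(X, S, c=c)   # basis step
    return B, S

Each outer iteration solves m independent L1 problems (one per input vector) and one constrained least squares problem, which is exactly the two-subproblem structure summarized in the conclusion.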
Two optimization problems over two subsets of variables.
Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data
Digit Recognition. Features capture significant properties of the digits
Sparse coding can be applied to learning overcomplete bases, in which the number of bases is greater than the input dimension.
Beta is a constant. Assuming a uniform prior on the bases, the objective is optimized iteratively by alternating between B and S, holding the other fixed.
LARS - Least Angle Regression; Chen et al.'s interior point method. Relative error calculation - f_obj is the final objective value attained by the algorithm and f* is the best objective value attained among all the algorithms.
As a result, we can see that the Lagrange dual method was much faster than gradient descent with projections.
1024 bases were learned in 2 hours and 2000 bases in 10 hours; this is not feasible with the gradient descent method of basis learning.
V1 neurons – primary visual cortex
Paper link - http://ai.stanford.edu/~hllee/nips06-sparsecoding.pdf