Aggelos Katsaggelos, Professor and AT&T Chair, Northwestern University, Department of Electrical Engineering & Computer Science (IEEE/ SPIE Fellow, IEEE SPS DL), Sparse and Redundant Representations: Theory and Applications
1. Sparse and Redundant
Representations in Signal Processing
Aggelos K. Katsaggelos
AT&T Chaired Professor
Northwestern University
Department of EECS
Director Motorola Center for Seamless Communications
Department of Linguistics
NorthShore University HealthSystem
Argonne National Laboratory
Evanston, IL 60208
www.ece.northwestern.edu/~aggk
2nd Greek Signal Processing Jam, Thessaloniki, May 17, 2012
3. Underdetermined Linear Systems
• Problem formulation
Ax = b, where A is n × m with n < m, and A has full rank
• Solution approach: Regularization
• Choices of the regularizer J(x) in min J(x) subject to Ax = b:
– Strictly convex function: unique solution
– Convex function: possibly more than one solution
– Even with infinitely many solutions, there exists at least
one with at most n non-zeros – a sparse solution
5. Sparsifying Norms
Elad, Sparse and Redundant Representations, Springer, 2010
6. Promoting Sparse Solutions
minimize ||x||_0 such that y = Ax
• The ℓ0 norm (p = 0) is the extreme among all sparsifying norms;
solving requires a combinatorial search, and the problem is NP-hard
• Under what conditions can uniqueness of solution
be claimed?
• Can we perform a simple test to verify that an
available candidate solution is a global minimizer?
(answers through coherence, spark, and RIP)
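Since the exhaustive ℓ0 search is NP-hard, greedy pursuits are the practical workhorse. Below is a minimal, illustrative NumPy sketch of Orthogonal Matching Pursuit; the dimensions and data are made up for the example:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily approximate
    min ||x||_0  s.t.  y = A x  by selecting k atoms."""
    n, m = A.shape
    residual = y.copy()
    support = []
    x = np.zeros(m)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # least-squares refit on the enlarged support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# toy underdetermined system: n = 10 equations, m = 20 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
A /= np.linalg.norm(A, axis=0)          # unit-norm atoms
x_true = np.zeros(20)
x_true[[3, 11]] = [1.5, -2.0]           # a 2-sparse solution
y = A @ x_true
x_hat = omp(A, y, k=2)                  # typically recovers x_true
print(np.count_nonzero(x_hat))
```

Conditions under which such recovery is guaranteed to be unique are exactly what coherence, spark, and RIP provide.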
7. Signal Processing Perspective
• Finding sparse solutions to underdetermined linear systems is
a better‐behaved problem
• A much more practical and relevant notion than we might
have thought of a few years ago
• Many media types can be sparsely represented
• Signal representation problem: given a dictionary A find a
single representation among the many possible ones for b
• With the ℓ2 norm, both the forward transform (from b to x)
and the inverse transform (from x to b) are linear
• With the ℓ0 norm, the inverse transform is linear but the
forward transform is highly non-linear
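This asymmetry is easy to demonstrate: the minimum-ℓ2-norm representation x = A⁺b is a fixed linear map of b and obeys superposition, while even a crude 1-sparse forward transform does not. An illustrative sketch (the dictionary and test signals are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 16))
A /= np.linalg.norm(A, axis=0)          # unit-norm atoms

def l2_forward(b):
    """Minimum l2-norm representation x = pinv(A) b: a linear map."""
    return np.linalg.pinv(A) @ b

def sparse_forward(b):
    """Best single-atom (1-sparse) representation: a crude l0-style
    forward transform, highly non-linear in b."""
    j = int(np.argmax(np.abs(A.T @ b)))
    x = np.zeros(A.shape[1])
    x[j] = A[:, j] @ b
    return x

b1, b2 = A[:, 2], A[:, 9]               # two atoms used as signals
# the l2 forward obeys superposition ...
print(np.allclose(l2_forward(b1 + b2), l2_forward(b1) + l2_forward(b2)))  # True
# ... the sparse forward does not
print(np.allclose(sparse_forward(b1 + b2),
                  sparse_forward(b1) + sparse_forward(b2)))               # False
# the inverse transform b = A x is linear in both cases
```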
18. Advantages
Provide target information under all weather conditions.
Visible and IR require clear atmospheric conditions for reliable operation.
Offer better thermal contrast of objects:
Emissivity differences of objects at these wavelengths.
Reflectivity variations of common objects for millimeter waves: (metal ~ 1,
water 0.6 and concrete 0.2).
Minimally affected by sun or artificial illumination:
Day and night application.
                Atmospheric Attenuation   Apparent Temperature
                (drizzle and fog)         (sky at 94 GHz)
PMMW            0.07 – 3 dB/km            70 K
Visible and IR  100 dB/km                 220 K
Gopalsami et al. 2010 “Passive Millimeter Wave Imaging and Spectroscopy System for Terrestrial Remote
Sensing”
19. Passive Millimeter Wave Imagers
Main types of Passive Millimeter Wave Imagers:
Single Detector or Single Pixel Imager.
– Uses only one detector.
– Not practical for real-time imaging due to the required
point-by-point scanning.
Array of Detectors (similar to CCD or CMOS optical
imagers).
– Suitable for real-time imaging.
– Complex and expensive at mm wavelengths.
20. Lens Scanning Imaging System
at ANL
• Dicke‐switched radiometer
• 16 Channel, each 0.5 GHz BW, spanning 146 to 154 GHz
• 6 inch imaging lens
22. Compressive Sensing System
6 inch imaging lens
Dicke‐switched Radiometer
Reconstruction
Mask and Super‐Resolution
15 Channel radiometer, each 0.5 GHz bandwidth, spanning 146 to 154 GHz.
Neither the lens nor the radiometer antenna is scanned (thus avoiding
cable noise); instead, a coded aperture mask is scanned at the focal plane
of the lens to produce a set of coded-aperture images
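Stacking the readings from the scanned mask positions gives the standard compressive model y = Mx + n. An illustrative simulation follows; the scene size, sampling ratio, mask statistics, and noise level are assumptions for the sketch, not the actual system parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64 * 64                  # flattened scene size (assumed)
K = N // 4                   # 25% compressive measurements (assumed)

# a sparse scene: a few warm point sources on a cold background
x = np.zeros(N)
x[rng.choice(N, size=50, replace=False)] = rng.uniform(1.0, 2.0, 50)

# each row of M is one binary coded-aperture mask pattern
M = (rng.random((K, N)) < 0.5).astype(float)
noise = 0.01 * rng.standard_normal(K)    # radiometer noise (assumed level)
y = M @ x + noise                        # one reading per mask position
print(M.shape, y.shape)
```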
23. Compressive Sensing System
Without mask: Imaging and Spectroscopy System
With mask: Compressive Sensing Imaging System
Gopalsami et al. 2009, "Passive Millimeter Wave Imaging and Spectroscopy System for Terrestrial Remote Sensing"
Babacan et al. 2011, "Compressive Passive Millimeter-Wave Imaging"
Gopalsami et al. 2011, "Compressive Sampling in Passive Millimeter-Wave Imaging"
28. A Bayesian Compressive Sensing
Algorithm
For images, high spatial frequencies are represented by edges.
Hence, it can be assumed that the output of a high-pass filter is sparse.
This knowledge is modeled using the following image prior:

p(x|A) ∝ ( ∏_{k=1}^{L} |D_kᵀ A D_k|^{−1/2} ) exp( −(1/2) ∑_{k=1}^{L} xᵀ D_kᵀ A D_k x )

D_k: high-frequency filter matrices (2 horizontal, 2 vertical, 2 diagonal)
A: diagonal covariance matrix with a variance parameter for each pixel:
A = diag(α_i), i = 1, …, N
Following a fully Bayesian approach, we assign gamma priors to the
hyperparameters β and α_i.
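The assumption behind the prior — that high-pass filter outputs are sparse — can be checked on a toy piecewise-constant image, with a horizontal first-difference filter standing in for one of the D_k:

```python
import numpy as np

# piecewise-constant toy image: a bright square on a dark background
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

# horizontal first difference: one instance of a high-frequency filter
dx = np.diff(img, axis=1)

# nonzeros appear only along the two vertical edges of the square
print(np.count_nonzero(dx), "nonzeros out of", dx.size)  # 32 out of 992
```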
30. Reconstruction Results –
Comparison
Comparison with a state-of-the-art algorithm that solves the min-TV problem with
quadratic constraints:
min TV(x) subject to ||y − Ax||_2 ≤ ε
[Figure: original image and reconstructions by the Bayesian method and min-TV
at 10%, 30%, 50%, 70%, and 90% measurement rates]
31. Reconstruction Results –
Comparison
Comparison with a state-of-the-art algorithm that solves the min-TV problem with
quadratic constraints:
min TV(x) subject to ||y − Ax||_2 ≤ ε
32. Results – Algorithm Comparison
[Figure: PSNR vs. #Measurements/#Pixels for Gaussian (left) and binary (right)
measurement matrices, comparing the proposed Bayesian method, TVAL3, l1-MAGIC,
and NESTA; shaded bands show the PSNR range between experiments]
The proposed method outperforms the others.
It is more robust to measurement matrix selection.
33. MSE Comparison
[Figure: PSNR vs. #Measurements/#Pixels comparing Bayesian (equally spaced
masks), Bayesian (random masks, mean of 10 experiments), and TVAL3 (random
masks, mean of 10 experiments)]
35. Motivation – QBE Case
Query by Example (QBE): a 5-sec QCIF query clip, captured from a TV set on a
mobile device, is sent over the network to a content provider's video DB,
which locates the full-size program and returns it or its highlights.
36. Luminance Field Trace (LUFT)
• Scaling to a common spatial scale, for example 11×9, for noise
reduction and for handling frame-size variation
• PCA to identify the subspace of R^(11×9) in which the trace resides
“foreman” seq in 2D (1st and 2nd component) PCA space
L. Gao, Z. Li, and A.K. Katsaggelos, "An Efficient Video Indexing and Retrieval Algorithm using the Luminance Field,"
IEEE Trans. Circuits and Systems for Video Technology, vol. 19, issue 10, 1566‐1570, Oct. 2009.
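A possible sketch of the LUFT idea — block-average downscaling to a common 11×9 grid followed by an SVD-based PCA projection. The resizing method and the random frames below are stand-ins, not the paper's exact implementation:

```python
import numpy as np

def luft_trace(frames, dim=2):
    """Downscale each luminance frame to a common 11x9 grid, then
    project onto the top principal components to get a trace."""
    def downscale(f, rows=9, cols=11):
        # crude block-average resize (stand-in for a proper scaler)
        return np.array([[c.mean() for c in np.array_split(r, cols, axis=1)]
                         for r in np.array_split(f, rows, axis=0)]).ravel()
    X = np.array([downscale(f) for f in frames])   # T x 99 data matrix
    X = X - X.mean(axis=0)                         # center for PCA
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:dim].T                          # T x dim trace

rng = np.random.default_rng(0)
frames = rng.random((300, 144, 176))   # 300 QCIF-sized luminance frames
trace = luft_trace(frames)
print(trace.shape)                     # (300, 2)
```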
37. Video Trace Examples
[Figure: videos as traces in PCA space (1st, 2nd, and 3rd components)]
• "foreman": 400 frames
• "stefan": 300 frames
• "mother-daughter": 300 frames
• "mixed": 40 shots of 60 frames each from randomly selected sequences
38. Indexing Scheme
• For large video collections, exhaustive search is not efficient
• Need an efficient indexing scheme
[Figure: example video traces of 50K frames from TRECVID, with a query clip]
39. Top‐down Iterative Data
Partition Scheme
• Project data to the axis with the largest variance
• Split into Left and Right sets at median value
• Store the cutting-plane index and median value, as well as the Minimum
Bounding Box (MBB), at each node
[Figure: Kd-tree with L = 2 levels partitioning the plane into regions R1–R4;
internal nodes store (axis, value) pairs (x2, v1), (x1, v2), (x1, v3)]
• At retrieval time, the query clip traverses the tree via MBB intersections
and splits
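The partition above can be sketched as a recursive median split; the node layout and field names here are made up for illustration:

```python
import numpy as np

def build_kdtree(points, depth=0, max_depth=2):
    """Top-down partition: split on the largest-variance axis at the
    median; store the cutting axis, median value, and MBB per node."""
    node = {"mbb": (points.min(axis=0), points.max(axis=0))}
    if depth == max_depth or len(points) <= 1:
        node["points"] = points            # leaf region
        return node
    axis = int(np.argmax(points.var(axis=0)))
    median = float(np.median(points[:, axis]))
    node.update(axis=axis, median=median,
                left=build_kdtree(points[points[:, axis] <= median],
                                  depth + 1, max_depth),
                right=build_kdtree(points[points[:, axis] > median],
                                   depth + 1, max_depth))
    return node

rng = np.random.default_rng(0)
pts = rng.standard_normal((1000, 2))
tree = build_kdtree(pts)               # L = 2 levels -> 4 leaf regions
```

At retrieval time a query descends by comparing against the stored (axis, median) pairs and pruning subtrees whose MBB does not intersect the query's neighborhood.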
40. Indexing Scheme
• Example: luma-space trace partition, L = 12, d = 2
– For 5 hours of video from NIST TRECVID
– An index tree of 12 levels and 4096 leaf nodes; the leaf-level MBBs
are plotted. Each leaf node holds about 132 frames
– Indexing space dimension shown: d = 2
– Time to build this index: 530 sec on a 2.4 GHz Celeron/256 MB RAM
laptop in Matlab (not bad at all)
[Figure: leaf-node MBBs of the partition in the (x1, x2) plane]
42. Sparse Representation
• Video Query
• Database Ordering
• Problem Formulation
P. Ruiz et al., "Video Retrieval using Sparse Bayesian Reconstruction," ICME, July 2011.
43. Bayesian Formulation
• Joint distribution
• Noise Model
• Hierarchical Laplace prior on x
S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Bayesian Compressive Sensing using Laplace Priors,"
IEEE Transactions on Image Processing, vol. 19, issue 1, 53‐64, January 2010.
47. Recommender Systems
• E‐commerce leaders have made recommender systems a salient part of
their websites
• RS are based on two strategies: content‐filtering and collaborative
filtering
• Content‐filtering approaches build product and user profiles which are
associated by programs; they rely on external information that may not be
available or easy to collect
• Collaborative filtering relies on past user behavior; it is domain free but
suffers from the cold start problem (inability to address the system’s new
products and users)
• Collaborative filtering is classified into neighborhood methods and latent
factor models
• Some of the most successful realizations of latent factor models are based
on matrix factorization
50. Matrix Factorization and
Completion
• Old problem
• Numerous applications
– Tracking and geolocation
– Inpainting
– System Identification
– Sensor Networks
51. Estimation of Low‐Rank Matrices
• General Problem
minimize rank(X) subject to Y = f(X)
• Solution approaches: convex relaxation via the nuclear norm
minimize ||X||_* subject to Y = f(X)
• or
minimize ||X||_* subject to ||Y − f(X)||_F² < ε
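A core primitive behind nuclear-norm solvers is singular-value soft-thresholding, the proximal operator of τ||X||_*. A minimal sketch on a synthetic low-rank-plus-noise matrix (sizes and noise level made up):

```python
import numpy as np

def svt(Y, tau):
    """Singular-value soft-thresholding: prox of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))  # rank 2
Y = A + 0.01 * rng.standard_normal((50, 50))                     # + noise
X = svt(Y, tau=1.0)
# thresholding wipes out the small noise singular values,
# leaving a rank-2 estimate
print(np.linalg.matrix_rank(X, tol=1e-6))  # 2
```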
53. Bayesian Formulation
• Sum of outer products

X = ABᵀ = ∑_{i=1}^{k} a_{·i} b_{·i}ᵀ

• Achieve column sparsity in A and B through prior modeling

p(A|γ) = ∏_{i=1}^{k} N( a_{·i} | 0, γ_i I )
p(B|γ) = ∏_{i=1}^{k} N( b_{·i} | 0, γ_i I )

D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos, "Sparse Bayesian Methods for Low-Rank Matrix
Estimation," to appear, IEEE Trans. on Signal Processing (also ICASSP 2011).
54. Bayesian Formulation (cont’ed)
• Alternatively, with Γ = diag(γ_1, …, γ_k),

p(A|γ) ∝ exp( −(1/2) Tr(Γ⁻¹ AᵀA) ) = exp( −(1/2) ∑_{i=1}^{k} γ_i⁻¹ ||a_{·i}||² )
p(B|γ) ∝ exp( −(1/2) Tr(Γ⁻¹ BᵀB) ) = exp( −(1/2) ∑_{i=1}^{k} γ_i⁻¹ ||b_{·i}||² )

• Gamma hyperprior on the variances

p(γ_i) ∝ (1/γ_i)^{a+1} exp( −b/γ_i )
55. Matrix Completion Problem
• Observation model

Y_ij = X_ij + N_ij , (i, j) ∈ Ω
Y = P_Ω(X + N)

• Noise model

p(Y|A, B, β) = ∏_{(i,j)∈Ω} N( Y_ij | X_ij , β⁻¹ )

• Joint distribution

p(Y, A, B, γ, β) = p(Y|A, B, β) p(A|γ) p(B|γ) p(γ) p(β)
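The observation model is easy to simulate; the sizes, rank, noise precision, and the 30% sampling fraction below are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 30, 30, 3
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k truth
beta = 100.0                                  # noise precision
noise = rng.standard_normal((m, n)) / np.sqrt(beta)

Omega = rng.random((m, n)) < 0.3              # observe ~30% of the entries
Y = np.where(Omega, X + noise, 0.0)           # Y = P_Omega(X + N)
print(int(Omega.sum()), "of", m * n, "entries observed")
```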
57. Estimation of A
• Posterior of the i-th row of A

q(a_{i·}) = N( a_{i·} | ⟨a_{i·}⟩, Σ^a_i )

⟨a_{i·}⟩ᵀ = ⟨β⟩ Σ^a_i ⟨B_i⟩ᵀ y_{i·}ᵀ ,   Σ^a_i = ( ⟨β⟩ ⟨B_iᵀ B_i⟩ + Γ⁻¹ )⁻¹

⟨B_iᵀ B_i⟩ = ∑_{j:(i,j)∈Ω} ⟨b_{j·}ᵀ b_{j·}⟩ = ∑_{j:(i,j)∈Ω} ( ⟨b_{j·}⟩ᵀ ⟨b_{j·}⟩ + Σ^b_j )
59. Estimation of hyperparameters
q(γ_i) ∝ (1/γ_i)^{a+1+(m+n)/2} exp( −( 2b + ⟨a_{·i}ᵀ a_{·i}⟩ + ⟨b_{·i}ᵀ b_{·i}⟩ ) / (2γ_i) )

⟨γ_i⟩ = ( 2b + ⟨a_{·i}ᵀ a_{·i}⟩ + ⟨b_{·i}ᵀ b_{·i}⟩ ) / ( 2a + m + n )

⟨a_{·i}ᵀ a_{·i}⟩ = ⟨a_{·i}⟩ᵀ ⟨a_{·i}⟩ + ∑_j (Σ^a_j)_{ii}
⟨b_{·i}ᵀ b_{·i}⟩ = ⟨b_{·i}⟩ᵀ ⟨b_{·i}⟩ + ∑_j (Σ^b_j)_{ii}

⟨β⟩ = pmn / ⟨ ||Y − P_Ω(ABᵀ)||_F² ⟩   (pmn = number of observed entries, p the sampling fraction)
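The ⟨γ_i⟩ update above is a one-line computation; columns whose posterior energies stay small get a small γ_i and are effectively pruned, which is how the rank is inferred. A sketch with hypothetical energy values:

```python
import numpy as np

def update_gamma(a_energy, b_energy, m, n, a=1e-6, b=1e-6):
    """Variational update from the derivation:
    <gamma_i> = (2b + <a_i^T a_i> + <b_i^T b_i>) / (2a + m + n)."""
    return (2.0 * b + a_energy + b_energy) / (2.0 * a + m + n)

# hypothetical posterior column energies: two strong columns, one weak
a_energy = np.array([50.0, 40.0, 1e-4])
b_energy = np.array([45.0, 35.0, 1e-4])
gammas = update_gamma(a_energy, b_energy, m=30, n=30)
print(gammas)   # the third variance collapses toward zero
```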
68. Final Comments
• Sparsity is a new and powerful concept for a
number of image processing, computer vision,
pattern recognition, machine learning, and
communication problems
• In most cases, large-scale data problems are
encountered, and improved computational
approaches are needed
• Advances on both theoretical and application
fronts
69. Current Collaborators
• University of Granada
– Prof. Rafael Molina
– Prof. Javier Mateos
– Pablo Ruiz, PhD student
• Northwestern University
– Derin Babacan, ex‐PhD student, UIUC
– Zhu Li, ex‐PhD student, Huawei
– Li Gao, PhD student
– Bruno Amizic, PhD student
– Martin Luessi, ex‐PhD student, Harvard Medical
– Leonidas Spinoulas, PhD student
– Michael Iliadis, PhD student