Aggelos Katsaggelos, Professor and AT&T Chair, Northwestern University, Department of Electrical Engineering & Computer Science (IEEE/ SPIE Fellow, IEEE SPS DL), Sparse and Redundant Representations: Theory and Applications
1. Sparse and Redundant
Representations in Signal Processing
Aggelos K. Katsaggelos
AT&T Chaired Professor
Northwestern University
Department of EECS
Director Motorola Center for Seamless Communications
Department of Linguistics
NorthShore University HealthSystem
Argonne National Laboratory
Evanston, IL 60208
www.ece.northwestern.edu/~aggk
2nd Greek Signal Processing Jam, Thessaloniki, May 17, 2012
3. Underdetermined Linear Systems
• Problem formulation
Ax = b, where A is n × m with n < m, and A has full rank
• Solution approach: Regularization
• Choices of the regularizer J(x) in min J(x) subject to Ax = b:
– Strictly convex function: unique solution
– Convex function: possibly more than one solution
– Even with infinitely many solutions, there exists at least
one with at most n non-zeros – a sparse solution
5. Sparsifying Norms
Elad, Sparse and Redundant Representations, Springer, 2010
6. Promoting Sparse Solutions
minimize ||x||_0 such that y = Ax
• The ℓ0 norm (p = 0) is the extreme among all sparsifying norms;
solving requires a combinatorial search, and the problem is NP-hard
• Under what conditions can uniqueness of solution
be claimed?
• Can we perform a simple test to verify that an
available candidate solution is a global minimizer?
(answers through coherence, spark, and RIP)
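Since the exhaustive ℓ0 search is NP-hard, greedy pursuits are the practical workhorse. Below is a minimal, illustrative NumPy sketch of Orthogonal Matching Pursuit; the dimensions and data are made up for the example:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily approximate
    min ||x||_0  s.t.  y = A x  by selecting k atoms."""
    n, m = A.shape
    residual = y.copy()
    support = []
    x = np.zeros(m)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # least-squares refit on the enlarged support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# toy underdetermined system: n = 10 equations, m = 20 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
A /= np.linalg.norm(A, axis=0)          # unit-norm atoms
x_true = np.zeros(20)
x_true[[3, 11]] = [1.5, -2.0]           # a 2-sparse solution
y = A @ x_true
x_hat = omp(A, y, k=2)                  # typically recovers x_true
print(np.count_nonzero(x_hat))
```

Conditions under which such recovery is guaranteed to be unique are exactly what coherence, spark, and RIP provide.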
7. Signal Processing Perspective
• Finding sparse solutions to underdetermined linear systems is
a better‐behaved problem
• A much more practical and relevant notion than we might
have thought of a few years ago
• Many media types can be sparsely represented
• Signal representation problem: given a dictionary A find a
single representation among the many possible ones for b
• With the ℓ2 norm, both the forward transform (from b to x)
and the inverse transform (from x to b) are linear
• With the ℓ0 norm, the inverse transform is linear but the
forward transform is highly non-linear
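This asymmetry is easy to demonstrate: the minimum-ℓ2-norm representation x = A⁺b is a fixed linear map of b and obeys superposition, while even a crude 1-sparse forward transform does not. An illustrative sketch (the dictionary and test signals are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 16))
A /= np.linalg.norm(A, axis=0)          # unit-norm atoms

def l2_forward(b):
    """Minimum l2-norm representation x = pinv(A) b: a linear map."""
    return np.linalg.pinv(A) @ b

def sparse_forward(b):
    """Best single-atom (1-sparse) representation: a crude l0-style
    forward transform, highly non-linear in b."""
    j = int(np.argmax(np.abs(A.T @ b)))
    x = np.zeros(A.shape[1])
    x[j] = A[:, j] @ b
    return x

b1, b2 = A[:, 2], A[:, 9]               # two atoms used as signals
# the l2 forward obeys superposition ...
print(np.allclose(l2_forward(b1 + b2), l2_forward(b1) + l2_forward(b2)))  # True
# ... the sparse forward does not
print(np.allclose(sparse_forward(b1 + b2),
                  sparse_forward(b1) + sparse_forward(b2)))               # False
# the inverse transform b = A x is linear in both cases
```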
18. Advantages
Provide target information under all weather conditions.
Visible and IR require clear atmospheric conditions for reliable operation.
Offer better thermal contrast of objects:
Emissivity differences of objects at these wavelengths.
Reflectivity variations of common objects for millimeter waves: (metal ~ 1,
water 0.6 and concrete 0.2).
Minimally affected by sun or artificial illumination:
Day and night application.
                Atmospheric Attenuation   Apparent Temperature
                (drizzle and fog)         (sky at 94 GHz)
PMMW            0.07 – 3 dB/km            70 K
Visible and IR  100 dB/km                 220 K
Gopalsami et al. 2010 “Passive Millimeter Wave Imaging and Spectroscopy System for Terrestrial Remote
Sensing”
19. Passive Millimeter Wave Imagers
Main types of Passive Millimeter Wave Imagers:
Single Detector or Single Pixel Imager.
– Uses only one detector.
– Not practical for real-time imaging due to the required
point-by-point scanning.
Array of Detectors (similar to CCD or CMOS optical
imagers).
– Suitable for real-time imaging.
– Complex and expensive at mm wavelengths.
20. Lens Scanning Imaging System
at ANL
• Dicke‐switched radiometer
• 16 Channel, each 0.5 GHz BW, spanning 146 to 154 GHz
• 6 inch imaging lens
22. Compressive Sensing System
6 inch imaging lens
Dicke‐switched Radiometer
Reconstruction
Mask and Super‐Resolution
15 Channel radiometer, each 0.5 GHz bandwidth, spanning 146 to 154 GHz.
Neither the lens nor the radiometer antenna is scanned (thus avoiding
cable noise); instead, a coded aperture mask is scanned at the focal plane
of the lens to produce a set of coded-aperture images
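Stacking the readings from the scanned mask positions gives the standard compressive model y = Mx + n. An illustrative simulation follows; the scene size, sampling ratio, mask statistics, and noise level are assumptions for the sketch, not the actual system parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64 * 64                  # flattened scene size (assumed)
K = N // 4                   # 25% compressive measurements (assumed)

# a sparse scene: a few warm point sources on a cold background
x = np.zeros(N)
x[rng.choice(N, size=50, replace=False)] = rng.uniform(1.0, 2.0, 50)

# each row of M is one binary coded-aperture mask pattern
M = (rng.random((K, N)) < 0.5).astype(float)
noise = 0.01 * rng.standard_normal(K)    # radiometer noise (assumed level)
y = M @ x + noise                        # one reading per mask position
print(M.shape, y.shape)
```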
23. Compressive Sensing System
Without mask: Imaging and Spectroscopy System
With mask: Compressive Sensing Imaging System
Gopalsami et al. 2009, "Passive Millimeter Wave Imaging and Spectroscopy System for Terrestrial Remote Sensing"
Babacan et al. 2011, "Compressive Passive Millimeter-Wave Imaging"
Gopalsami et al. 2011, "Compressive Sampling in Passive Millimeter-Wave Imaging"
28. A Bayesian Compressive Sensing
Algorithm
For images, high spatial frequencies are represented by edges.
Hence, it can be assumed that the output of a high-pass filter is sparse.
This knowledge is modeled using the following image prior:

p(x|A) ∝ ( ∏_{k=1}^{L} |D_kᵀ A D_k|^{−1/2} ) exp( −(1/2) ∑_{k=1}^{L} xᵀ D_kᵀ A D_k x )

D_k: high-frequency filter matrices (2 horizontal, 2 vertical, 2 diagonal)
A: diagonal covariance matrix with a variance parameter for each pixel:
A = diag(α_i), i = 1, …, N
Following a fully Bayesian approach, we assign gamma priors to the
hyperparameters β and α_i.
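The assumption behind the prior — that high-pass filter outputs are sparse — can be checked on a toy piecewise-constant image, with a horizontal first-difference filter standing in for one of the D_k:

```python
import numpy as np

# piecewise-constant toy image: a bright square on a dark background
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

# horizontal first difference: one instance of a high-frequency filter
dx = np.diff(img, axis=1)

# nonzeros appear only along the two vertical edges of the square
print(np.count_nonzero(dx), "nonzeros out of", dx.size)  # 32 out of 992
```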
30. Reconstruction Results –
Comparison
Comparison with a state-of-the-art algorithm that solves the min-TV problem with
quadratic constraints:
min TV(x) subject to ||y − Ax||_2 ≤ ε
[Figure: original image and reconstructions by the Bayesian method and min-TV
at 10%, 30%, 50%, 70%, and 90% measurement rates]
31. Reconstruction Results –
Comparison
Comparison with a state-of-the-art algorithm that solves the min-TV problem with
quadratic constraints:
min TV(x) subject to ||y − Ax||_2 ≤ ε
32. Results – Algorithm Comparison
[Figure: PSNR vs. #Measurements/#Pixels for Gaussian (left) and binary (right)
measurement matrices, comparing the proposed Bayesian method, TVAL3, l1-MAGIC,
and NESTA; shaded bands show the PSNR range between experiments]
The proposed method outperforms the others.
It is more robust to measurement matrix selection.
33. MSE Comparison
[Figure: PSNR vs. #Measurements/#Pixels comparing Bayesian (equally spaced
masks), Bayesian (random masks, mean of 10 experiments), and TVAL3 (random
masks, mean of 10 experiments)]
35. Motivation – QBE Case
Query by Example (QBE): a 5-sec QCIF query clip, captured from a TV set on a
mobile device, is sent over the network to a content provider's video DB,
which locates the full-size program and returns it or its highlights.
36. Luminance Field Trace (LUFT)
• Scaling to a common spatial scale, for example 11×9, for noise
reduction and for handling frame-size variation
• PCA to identify the subspace of R^(11×9) in which the trace resides
“foreman” seq in 2D (1st and 2nd component) PCA space
L. Gao, Z. Li, and A.K. Katsaggelos, "An Efficient Video Indexing and Retrieval Algorithm using the Luminance Field,"
IEEE Trans. Circuits and Systems for Video Technology, vol. 19, issue 10, 1566‐1570, Oct. 2009.
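A possible sketch of the LUFT idea — block-average downscaling to a common 11×9 grid followed by an SVD-based PCA projection. The resizing method and the random frames below are stand-ins, not the paper's exact implementation:

```python
import numpy as np

def luft_trace(frames, dim=2):
    """Downscale each luminance frame to a common 11x9 grid, then
    project onto the top principal components to get a trace."""
    def downscale(f, rows=9, cols=11):
        # crude block-average resize (stand-in for a proper scaler)
        return np.array([[c.mean() for c in np.array_split(r, cols, axis=1)]
                         for r in np.array_split(f, rows, axis=0)]).ravel()
    X = np.array([downscale(f) for f in frames])   # T x 99 data matrix
    X = X - X.mean(axis=0)                         # center for PCA
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:dim].T                          # T x dim trace

rng = np.random.default_rng(0)
frames = rng.random((300, 144, 176))   # 300 QCIF-sized luminance frames
trace = luft_trace(frames)
print(trace.shape)                     # (300, 2)
```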
37. Video Trace Examples
[Figure: videos as traces in PCA space (1st, 2nd, and 3rd components)]
• "foreman": 400 frames
• "stefan": 300 frames
• "mother-daughter": 300 frames
• "mixed": 40 shots of 60 frames each from randomly selected sequences
38. Indexing Scheme
• For large video collections, exhaustive search is not efficient
• Need an efficient indexing scheme
[Figure: example video traces of 50K frames from TRECVID, with a query clip]
39. Top‐down Iterative Data
Partition Scheme
• Project data to the axis with the largest variance
• Split into Left and Right sets at median value
• Store the cutting-plane index and median value, as well as the Minimum
Bounding Box (MBB), at each node
[Figure: Kd-tree with L = 2 levels partitioning the plane into regions R1–R4;
internal nodes store (axis, value) pairs (x2, v1), (x1, v2), (x1, v3)]
• At retrieval time, the query clip traverses the tree via MBB intersections
and splits
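The partition above can be sketched as a recursive median split; the node layout and field names here are made up for illustration:

```python
import numpy as np

def build_kdtree(points, depth=0, max_depth=2):
    """Top-down partition: split on the largest-variance axis at the
    median; store the cutting axis, median value, and MBB per node."""
    node = {"mbb": (points.min(axis=0), points.max(axis=0))}
    if depth == max_depth or len(points) <= 1:
        node["points"] = points            # leaf region
        return node
    axis = int(np.argmax(points.var(axis=0)))
    median = float(np.median(points[:, axis]))
    node.update(axis=axis, median=median,
                left=build_kdtree(points[points[:, axis] <= median],
                                  depth + 1, max_depth),
                right=build_kdtree(points[points[:, axis] > median],
                                   depth + 1, max_depth))
    return node

rng = np.random.default_rng(0)
pts = rng.standard_normal((1000, 2))
tree = build_kdtree(pts)               # L = 2 levels -> 4 leaf regions
```

At retrieval time a query descends by comparing against the stored (axis, median) pairs and pruning subtrees whose MBB does not intersect the query's neighborhood.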
40. Indexing Scheme
• Example: luma-space trace partition, L = 12, d = 2
– For 5 hours of video from NIST TRECVID
– An index tree of 12 levels and 4096 leaf nodes; the leaf-level MBBs
are plotted. Each leaf node holds about 132 frames
– Indexing space dimension shown: d = 2
– Time to build this index: 530 sec on a 2.4 GHz Celeron/256 MB RAM
laptop in Matlab (not bad at all)
[Figure: leaf-node MBBs of the partition in the (x1, x2) plane]
42. Sparse Representation
• Video Query
• Database Ordering
• Problem Formulation
P. Ruiz et al., "Video Retrieval using Sparse Bayesian Reconstruction," ICME, July 2011.
43. Bayesian Formulation
• Joint distribution
• Noise Model
• Hierarchical Laplace prior on x
S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Bayesian Compressive Sensing using Laplace Priors,"
IEEE Transactions on Image Processing, vol. 19, issue 1, 53‐64, January 2010.
47. Recommender Systems
• E‐commerce leaders have made recommender systems a salient part of
their websites
• RS are based on two strategies: content‐filtering and collaborative
filtering
• Content‐filtering approaches build product and user profiles which are
associated by programs; they rely on external information that may not be
available or easy to collect
• Collaborative filtering relies on past user behavior; it is domain free but
suffers from the cold start problem (inability to address the system’s new
products and users)
• Collaborative filtering is classified into neighborhood methods and latent
factor models
• Some of the most successful realizations of latent factor models are based
on matrix factorization
50. Matrix Factorization and
Completion
• Old problem
• Numerous applications
– Tracking and geolocation
– Inpainting
– System Identification
– Sensor Networks
51. Estimation of Low‐Rank Matrices
• General Problem
minimize rank(X) subject to Y = f(X)
• Solution approaches: convex relaxation via the nuclear norm
minimize ||X||_* subject to Y = f(X)
• or
minimize ||X||_* subject to ||Y − f(X)||_F² < ε
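A core primitive behind nuclear-norm solvers is singular-value soft-thresholding, the proximal operator of τ||X||_*. A minimal sketch on a synthetic low-rank-plus-noise matrix (sizes and noise level made up):

```python
import numpy as np

def svt(Y, tau):
    """Singular-value soft-thresholding: prox of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))  # rank 2
Y = A + 0.01 * rng.standard_normal((50, 50))                     # + noise
X = svt(Y, tau=1.0)
# thresholding wipes out the small noise singular values,
# leaving a rank-2 estimate
print(np.linalg.matrix_rank(X, tol=1e-6))  # 2
```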
53. Bayesian Formulation
• Sum of outer products

X = ABᵀ = ∑_{i=1}^{k} a_{·i} b_{·i}ᵀ

• Achieve column sparsity in A and B through prior modeling

p(A|γ) = ∏_{i=1}^{k} N( a_{·i} | 0, γ_i I )
p(B|γ) = ∏_{i=1}^{k} N( b_{·i} | 0, γ_i I )

D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos, "Sparse Bayesian Methods for Low-Rank Matrix
Estimation," to appear, IEEE Trans. on Signal Processing (also ICASSP 2011).
54. Bayesian Formulation (cont’ed)
• Alternatively, with Γ = diag(γ_1, …, γ_k),

p(A|γ) ∝ exp( −(1/2) Tr(Γ⁻¹ AᵀA) ) = exp( −(1/2) ∑_{i=1}^{k} γ_i⁻¹ ||a_{·i}||² )
p(B|γ) ∝ exp( −(1/2) Tr(Γ⁻¹ BᵀB) ) = exp( −(1/2) ∑_{i=1}^{k} γ_i⁻¹ ||b_{·i}||² )

• Gamma hyperprior on the variances

p(γ_i) ∝ (1/γ_i)^{a+1} exp( −b/γ_i )
55. Matrix Completion Problem
• Observation model

Y_ij = X_ij + N_ij , (i, j) ∈ Ω
Y = P_Ω(X + N)

• Noise model

p(Y|A, B, β) = ∏_{(i,j)∈Ω} N( Y_ij | X_ij , β⁻¹ )

• Joint distribution

p(Y, A, B, γ, β) = p(Y|A, B, β) p(A|γ) p(B|γ) p(γ) p(β)
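The observation model is easy to simulate; the sizes, rank, noise precision, and the 30% sampling fraction below are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 30, 30, 3
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k truth
beta = 100.0                                  # noise precision
noise = rng.standard_normal((m, n)) / np.sqrt(beta)

Omega = rng.random((m, n)) < 0.3              # observe ~30% of the entries
Y = np.where(Omega, X + noise, 0.0)           # Y = P_Omega(X + N)
print(int(Omega.sum()), "of", m * n, "entries observed")
```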
57. Estimation of A
• Posterior of the i-th row of A

q(a_{i·}) = N( a_{i·} | ⟨a_{i·}⟩, Σ^a_i )

⟨a_{i·}⟩ᵀ = ⟨β⟩ Σ^a_i ⟨B_i⟩ᵀ y_{i·}ᵀ ,   Σ^a_i = ( ⟨β⟩ ⟨B_iᵀ B_i⟩ + Γ⁻¹ )⁻¹

⟨B_iᵀ B_i⟩ = ∑_{j:(i,j)∈Ω} ⟨b_{j·}ᵀ b_{j·}⟩ = ∑_{j:(i,j)∈Ω} ( ⟨b_{j·}⟩ᵀ ⟨b_{j·}⟩ + Σ^b_j )
59. Estimation of hyperparameters
q(γ_i) ∝ (1/γ_i)^{a+1+(m+n)/2} exp( −( 2b + ⟨a_{·i}ᵀ a_{·i}⟩ + ⟨b_{·i}ᵀ b_{·i}⟩ ) / (2γ_i) )

⟨γ_i⟩ = ( 2b + ⟨a_{·i}ᵀ a_{·i}⟩ + ⟨b_{·i}ᵀ b_{·i}⟩ ) / ( 2a + m + n )

⟨a_{·i}ᵀ a_{·i}⟩ = ⟨a_{·i}⟩ᵀ ⟨a_{·i}⟩ + ∑_j (Σ^a_j)_{ii}
⟨b_{·i}ᵀ b_{·i}⟩ = ⟨b_{·i}⟩ᵀ ⟨b_{·i}⟩ + ∑_j (Σ^b_j)_{ii}

⟨β⟩ = pmn / ⟨ ||Y − P_Ω(ABᵀ)||_F² ⟩   (pmn = number of observed entries, p the sampling fraction)
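The ⟨γ_i⟩ update above is a one-line computation; columns whose posterior energies stay small get a small γ_i and are effectively pruned, which is how the rank is inferred. A sketch with hypothetical energy values:

```python
import numpy as np

def update_gamma(a_energy, b_energy, m, n, a=1e-6, b=1e-6):
    """Variational update from the derivation:
    <gamma_i> = (2b + <a_i^T a_i> + <b_i^T b_i>) / (2a + m + n)."""
    return (2.0 * b + a_energy + b_energy) / (2.0 * a + m + n)

# hypothetical posterior column energies: two strong columns, one weak
a_energy = np.array([50.0, 40.0, 1e-4])
b_energy = np.array([45.0, 35.0, 1e-4])
gammas = update_gamma(a_energy, b_energy, m=30, n=30)
print(gammas)   # the third variance collapses toward zero
```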
68. Final Comments
• Sparsity is a new and powerful concept for a
number of image processing, computer vision,
pattern recognition, machine learning, and
communication problems
• In most cases, large-scale data problems are
encountered, and improved computational
approaches are needed
• Advances on both theoretical and application
fronts
69. Current Collaborators
• University of Granada
– Prof. Rafael Molina
– Prof. Javier Mateos
– Pablo Ruiz, PhD student
• Northwestern University
– Derin Babacan, ex‐PhD student, UIUC
– Zhu Li, ex‐PhD student, Huawei
– Li Gao, PhD student
– Bruno Amizic, PhD student
– Martin Luessi, ex‐PhD student, Harvard Medical
– Leonidas Spinoulas, PhD student
– Michael Iliadis, PhD student