Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

1. Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements
[Figure: binary tree with measurement vectors a(1)–a(7) at its nodes]
EE-8500 Seminar
Akshay Soni
University of Minnesota
sonix022@umn.edu
(joint work with J. Haupt)
6. Key Idea – Sparsity
[Figures: signals viewed in time and frequency; wavelet (DWT) and Fourier (DFT) transforms]
• Many signals exhibit sparsity in the canonical or 'pixel' basis
• Communication signals often have sparse frequency content (see the sketch below)
• Natural images often have sparse wavelet representations
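As a toy illustration of the frequency-sparsity bullet (a minimal sketch; the two-tone signal and the threshold are assumptions, not from the slides):

```python
# A two-tone signal is dense in time but nearly 2-sparse under the DFT.
import numpy as np

t = np.arange(256)
sig = np.cos(2 * np.pi * 10 * t / 256) + 0.5 * np.cos(2 * np.pi * 40 * t / 256)
spectrum = np.fft.rfft(sig)                      # discrete Fourier transform
print(np.flatnonzero(np.abs(spectrum) > 1.0))    # -> [10 40]: two active bins
```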
8. A Model for Sparse Signals
k-Sparse Signal Model (a union-of-subspaces model):
Signals of interest are vectors x ∈ R^n with signal support set S(x) = {i : x_i ≠ 0} and number of nonzero signal components |S(x)| ≤ k.
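A minimal sketch of this model (sizes and names are illustrative):

```python
# Draw a k-sparse vector x in R^n and read off its support set S(x).
import numpy as np

n, k = 16, 3
rng = np.random.default_rng(1)
S = rng.choice(n, size=k, replace=False)   # signal support set, |S| = k
x = np.zeros(n)
x[S] = rng.standard_normal(k)              # the k nonzero signal components

print(sorted(np.flatnonzero(x)))           # recovers S(x) = {i : x_i != 0}
```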
9. Structured Sparsity
Tree sparsity in wavelets · Grid sparsity in networks · Graph sparsity in background subtraction
[Figure: (a) a brain image has tree sparsity after wavelet transformation; (b) a background-subtracted image has graph sparsity]
There is a sizable literature on structured sparsity, with empirical evidence that imposing additional structure improves performance; what has been missing is a general theory quantifying the benefit of structure over standard, unstructured sparsity:
• quantifying structured sparsity
• the minimal number of measurements required in compressive sensing
• locations of nonzeros are inter-dependent
• structure knowledge can be used during sensing, inference, or both
10. Structured Sparsity
Our focus – Tree Structured Sparsity!
12. Tree Structured Sparsity – Why?
Wavelets!
• Tree sparsity naturally arises in the wavelet coefficients of many signals, e.g., natural images
• Several prior efforts examined specialized sensing techniques built on wavelet tree structure, e.g., in dynamic MRI [*] and compressive imaging [**]
• Previous work was either experimental or analyzed only in noise-free settings
[*] L. P. Panych and F. A. Jolesz, "A dynamically adaptive imaging algorithm for wavelet-encoded MRI," Magnetic Resonance in Medicine, vol. 32, no. 6, pp. 738–748, 1994.
[**] M. W. Seeger and H. Nickisch, "Compressed sensing and Bayesian experimental design," in Proc. ICML, 2008, pp. 912–919.
[**] S. Deutsch, A. Averbuch, and S. Dekel, "Adaptive compressed image sensing based on wavelet modeling and direct sampling," in Proc. Intl. Conf. on Sampling Theory and Applications, 2009.
15. Sensing Strategies
Non-Adaptive Sensing vs. Adaptive Sensing
• In adaptive sensing, the j-th measurement vector a_j is a function of the past measurement vectors and observations {a_l, y_l}_{l=1}^{j−1}, for each j = 2, 3, ..., m.
[Figure: measurement vectors applied in sequence, producing observations y_1, y_2, ..., y_j, ..., y_m]
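A schematic sketch contrasting the two strategies (illustrative only; the `sense` helper and the `design` callback are assumptions, not from the slides):

```python
# Non-adaptive: all a_j fixed in advance. Adaptive: a_j may depend on the
# history {(a_l, y_l)} for l < j, as described on the slide above.
import numpy as np

def sense(x, design, m, sigma=1.0, rng=np.random.default_rng(0)):
    history = []                          # [(a_1, y_1), ..., (a_{j-1}, y_{j-1})]
    for j in range(m):
        a_j = design(j, history)          # an adaptive rule inspects `history`
        y_j = a_j @ x + sigma * rng.standard_normal()
        history.append((a_j, y_j))
    return history

n = 8
nonadaptive = lambda j, hist: np.random.default_rng(j).normal(0, 1 / np.sqrt(n), n)
print(len(sense(np.ones(n), nonadaptive, m=4)))   # 4 (vector, observation) pairs
```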
18. -- Adaptive Sensing of Tree-Sparse Signals --
A Simple Algorithm with Guarantees
19. A Few Tree Specifics
• Signal components are coefficients in an orthonormal representation (the canonical basis, without loss of generality)
• We consider binary trees (all results may be extended to trees of any degree); see the sketch below
[Figure: binary tree on indices 1–7; root 1 has children 2 and 5, node 2 has children 3 and 4, node 5 has children 6 and 7]
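A small sketch of the tree bookkeeping implied by the figure (the explicit `CHILDREN` map is an assumption matching the node labels above):

```python
# Rooted-subtree check: a support is tree-sparse iff it contains the parent
# of every non-root node it contains.
CHILDREN = {1: [2, 5], 2: [3, 4], 5: [6, 7]}   # leaves 3, 4, 6, 7: no children
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def is_rooted_subtree(support):
    return all(node == 1 or PARENT[node] in support for node in support)

print(is_rooted_subtree({1, 5}))   # True: a valid 2-node rooted subtree
print(is_rooted_subtree({5, 6}))   # False: the root is missing
```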
23. Tree Structured Adaptive Support Recovery
[Figure: binary tree on indices 1–7; true support S(x) = {1, 5}]
Walkthrough (after the root has been measured and kept, so Ŝ = {1}, and node 2 has been measured and discarded; Q is the queue of nodes still to examine, Q[1] its front):
• Q[1] = 5: measure y_5 = e_5^T x + w. Suppose |y_5| > τ: then Ŝ ← Ŝ ∪ {5} and Q ← {6, 7} ∪ Q \ {5}.
• Q[1] = 6, Ŝ = {1, 5}: measure y_6 = e_6^T x + w. Suppose |y_6| < τ: then Q ← Q \ {6}.
• Q[1] = 7, Ŝ = {1, 5}: measure y_7 = e_7^T x + w. Suppose |y_7| < τ: then Q ← Q \ {7}.
• Q = ∅: terminate with Ŝ = {1, 5}.
24. Tree Structured Adaptive Support Recovery
(walkthrough as on the previous slide)
(can also measure each location r ≥ 1 times and average to reduce effective noise)
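A minimal sketch of the walkthrough above, including the repeated-measurement option (an illustration, not the authors' implementation; the tree, threshold, amplitude, and RNG seed are assumptions):

```python
# Breadth-first adaptive support recovery: measure the front of the queue,
# and keep a node (enqueueing its children) only if its averaged point
# measurement exceeds the threshold tau.
import numpy as np

CHILDREN = {1: [2, 5], 2: [3, 4], 5: [6, 7]}   # tree from the figure

def tree_support_recovery(x, tau, r=1, sigma=1.0, rng=np.random.default_rng(0)):
    S_hat, Q, n_meas = set(), [1], 0           # start at the root
    while Q:
        j = Q.pop(0)                           # Q[1] on the slides
        y = np.mean(x[j - 1] + sigma * rng.standard_normal(r))  # avg of r looks
        n_meas += r
        if abs(y) > tau:                       # |y_j| > tau: node is kept
            S_hat.add(j)
            Q.extend(CHILDREN.get(j, []))      # examine its children next
    return S_hat, n_meas

x = np.zeros(7); x[0] = x[4] = 4.0             # true support S(x) = {1, 5}
print(tree_support_recovery(x, tau=2.5, r=3))  # ({1, 5}, 15) with high prob.
```

Note the measurement count: each kept node adds its two children to the queue, so at most 2k + 1 locations are ever examined, matching the m ≤ r(2k + 1) budget in the theorem that follows.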
25. Theorem (2011 & 2013): AS & J. Haupt
Tree Structured Adaptive Support Recovery
[Walkthrough figure as on the previous slides]
Choose any $\delta \in (0, 1)$ and set $\tau = \sqrt{2\sigma^2 \log(4k/\delta)}$. If the signal $x$ being acquired by our procedure is $k$-tree sparse, and the nonzero components of $x$ satisfy
$$|x_i| \;\geq\; \sqrt{24\left(1 + \log\frac{4}{\delta}\right)} \, \sqrt{\sigma^2\left(\frac{k}{m}\right)\log k}$$
for every $i \in S(x)$, then with probability at least $1 - \delta$, a "repeated measurement" variant of the algorithm above that acquires $r$ measurements at each observed location terminates after collecting $m \leq r(2k+1)$ measurements, and produces a support estimate $\hat{S}$ satisfying $\hat{S} = S(x)$.
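A quick numeric reading of the theorem's parameters (a sketch; the values of σ², k, m, and δ below are hypothetical):

```python
# Evaluate the threshold tau and the required amplitude for sample sizes.
import math

sigma2, k, m, delta = 1.0, 8, 64, 0.1
tau = math.sqrt(2 * sigma2 * math.log(4 * k / delta))
amp = math.sqrt(24 * (1 + math.log(4 / delta))) * math.sqrt(sigma2 * (k / m) * math.log(k))
print(f"tau = {tau:.2f}, required |x_i| >= {amp:.2f}")
```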
26. Question:
Can any other "smart" scheme recover the support of a tree-sparse signal having "significantly" smaller magnitude? i.e., is this the best one can hope for?
28. The Big Picture: Minimum Signal Amplitudes for ESR
Let's identify necessary conditions for exact support recovery (ESR) in each case…
[Grid: {Unstructured, Tree-Sparse} × {Non-Adaptive, Adaptive}]
29. The Big Picture:
[Grid: the Unstructured + Non-Adaptive cell, with necessary conditions from the compressed sensing literature]
[*] S. Aeron, V. Saligrama, and M. Zhao, "Information theoretic bounds for compressed sensing," IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 5111–5130, 2010.
[*] M. J. Wainwright, "Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso)," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2183–2202, 2009.
[*] M. J. Wainwright, "Information-theoretic limitations on sparsity recovery in the high-dimensional and noisy setting," IEEE Transactions on Information Theory, vol. 55, no. 12, 2009.
[*] W. Wang, M. J. Wainwright, and K. Ramchandran, "Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2967–2979, 2010.
30. The Big Picture:
[Grid: as on the previous slide, annotated "uncompressed or compressed"]
(references as on the previous slide)
31. The Big Picture:
[Grid: the Unstructured + Adaptive cell]
Adaptivity may at best improve log(n) to log(k)!
[*] M. Malloy and R. Nowak, "Sequential analysis in high-dimensional multiple testing and sparse recovery," in Proc. IEEE Intl. Symp. on Information Theory, 2011, pp. 2661–2665.
33. Signal Model, Sensing Strategies, Observations
Signal model: x ∈ X_{µ,k}, the class of signals supported on a k-node rooted sub-tree of the underlying tree, with amplitude parameter µ ≥ 0.
[Figure: binary tree on indices 1–7]
Observations: y_j = a_j^T x + w_j, with i.i.d. noise w_j ~ N(0, σ²) (as in the point measurements y_j = e_j^T x + w above).
Sensing strategies:
• Non-adaptive: here Gaussian; each row a_j of A is independent, with a_j ~ N(0, I/n)
• Adaptive: a_j depends on {a_l, y_l}_{l=1}^{j−1}, subject to the constraint ||a_j||²₂ = 1 ∀ j
Notations:
• {A_m, y_m}: shorthand for {a_j, y_j}_{j=1}^{m}
• M_m: class of all adaptive (or non-adaptive) sensing strategies based on m measurements
• Support estimate φ: a mapping from observations → a subset of {1, 2, ..., n}
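A minimal sketch of the non-adaptive Gaussian model just defined (sizes are illustrative):

```python
# Rows a_j ~ N(0, I/n), so E||a_j||_2^2 = 1; observations y = A x + w.
import numpy as np

n, m, sigma = 128, 32, 1.0
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))   # a_j ~ N(0, I/n)
x = np.zeros(n); x[:4] = 5.0                          # a sparse signal
y = A @ x + sigma * rng.standard_normal(m)            # noisy linear measurements
print(np.mean(np.linalg.norm(A, axis=1) ** 2))        # ~ 1.0, as expected
```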
34. Preliminaries
(Maximum) risk of a support estimator φ, for estimators φ and sensing strategies M ∈ M_m:
$$R(\phi; M) \;=\; \sup_{x \in X_{\mu,k}} \Pr_x\big(\phi(A_m, y_m; M) \neq S(x)\big)$$
In words: the worst-case performance of φ when estimating the "most difficult" element, i.e., the element whose support is most difficult to estimate.
Minimax risk:
$$R^*_{X_{\mu,k},M} \;=\; \inf_{\phi}\, R(\phi; M)$$
In words: the error of the best estimator when estimating the support of the "most difficult" element.
Note: if R*_{X_{µ,k},M} > 0, then regardless of φ and M ∈ M, there is at least one signal x ∈ X_{µ,k} whose support cannot be recovered exactly with certainty.
Our aim – quantify errors corresponding to these hard cases!
36. Theorem (2013): AS & J. Haupt
Non-Adaptive Tree-Structured Sensing – fundamental limits
For ESR with non-adaptive sensing, a necessary condition is µ ≥ c √(σ² (n/m) log k) for an absolute constant c; relative to the unstructured non-adaptive condition, the log(n) factor improves only to log(k) (see the next slides).
Implication: no uniform guarantees can be made for any estimation procedure for recovering the support of tree-sparse signals when the signal amplitude is "too small."
37. The Big Picture:
[Grid: the Tree-Sparse + Non-Adaptive cell]
[*] AS and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Transactions on Information Theory, 2013 (accepted for publication).
38. The Big Picture:
[Grid: the Tree-Sparse + Non-Adaptive cell]
Same necessary condition as for adaptive + unstructured!
Structure or adaptivity in isolation may at best improve log(n) to log(k).
[*] AS and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Transactions on Information Theory, 2013 (accepted for publication).
39. Proof Idea – Non-Adaptive + Tree-Sparse
Restrict to a "smaller set": for any X′_{µ,k} ⊆ X_{µ,k},
$$\sup_{x \in X_{\mu,k}} \Pr_x\big(\phi(A_m, y_m; M) \neq S(x)\big) \;\ge\; \sup_{x \in X'_{\mu,k}} \Pr_x\big(\phi(A_m, y_m; M) \neq S(x)\big)$$
⟹ we can get a lower bound on the minimax risk over a smaller subset of signals!
Convert to a multiple-hypothesis testing problem: the restricted risk is bounded below by p_{e,L}, the minimax probability of error of a multiple-hypothesis testing problem over the candidate supports.
• get a lower bound on p_{e,L} using Fano's inequality (or similar ideas)
Reference: Introduction to Nonparametric Estimation – A. B. Tsybakov
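For reference, a standard form of Fano's inequality used in this last step (as in Tsybakov's book; stated here for an L-ary test with a uniform prior over the candidate supports, which is an assumption about the exact variant used):
$$p_{e,L} \;\ge\; 1 \;-\; \frac{I\big(S;\, \{A_m, y_m\}\big) + \log 2}{\log L},$$
so the error stays bounded away from zero whenever the mutual information between the (uniform) support index S and the observations is small relative to log L.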
41. Theorem (2013): AS & J. Haupt
Adaptive Tree-Structured Sensing – fundamental limits
For ESR with adaptive sensing, a necessary condition is µ ≥ c √(σ² (k/m)) (only a log(k) factor below the sufficient condition of our simple algorithm).
Proof idea: this problem is at least as hard as recovering the location of one nonzero given all other k−1 nonzero locations.
42. The Big Picture:
[Grid: the Tree-Sparse + Adaptive cell]
[*] AS and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Transactions on Information Theory, 2013 (accepted for publication).
43. The Big Picture:
[Grid: all four cells]
Recall, for our simple tree-structured adaptive algorithm the sufficient condition for ESR was
$$\mu \;\gtrsim\; \sqrt{\sigma^2 \left(\frac{k}{m}\right) \log k},$$
which is only a log(k) factor away from the lower bound.
We cannot do much better than the simple proposed algorithm!
44. The Big Picture:
[Grid: all four cells; the Unstructured + Adaptive entry as tabulated applies when m > n]
Note: for adaptive + unstructured, our proof ideas can show that in the case m < n, a necessary condition for ESR is
$$\mu \;\gtrsim\; \sqrt{\sigma^2\, \frac{n - k + 1}{m}}.$$
45. The Big Picture:
[Grid: all four cells]
Related work:
[*] A. Krishnamurthy, J. Sharpnack, and A. Singh, "Recovering block-structured activations using compressive measurements," Submitted, 2012.
46. Question:
Can any other "smart" scheme recover the support of a tree-sparse signal having "significantly" smaller magnitude?
47. Answer:
No! We're within log(k) of minimax optimal.
(Question, restated: can any other "smart" scheme recover the support of a tree-sparse signal having "significantly" smaller magnitude?)
51. MSE Estimation Implications
Unstructured + Non-Adaptive: if the measurement matrix A_m satisfies the norm constraint ||A_m||²_F ≤ m, then we have the minimax MSE bound
$$\inf_{\hat{x},\, M \in \mathcal{M}_{na}} \; \sup_{x : |S(x)| = k} \; \mathbb{E}\left[\|\hat{x}(A_m, y_m; M) - x\|_2^2\right] \;\ge\; c\, \sigma^2\, \frac{n}{m}\, k \log n,$$
where c > 0 is a constant. [*]
Unstructured + Adaptive: the corresponding bound over adaptive strategies is c′ σ² (n/m) k, where c′ > 0 is another constant; adaptivity buys at most the log factor. [**]
[*] E. J. Candès and M. A. Davenport, "How well can we estimate a sparse vector?," Applied and Computational Harmonic Analysis, vol. 34, no. 2, pp. 317–323, 2013.
[**] E. Arias-Castro, E. J. Candès, and M. Davenport, "On the fundamental limits of adaptive sensing," Submitted, 2011, online at arxiv.org/abs/1111.4646.
52. MSE Estimation Implications
(unstructured non-adaptive and adaptive bounds as on the previous slide)
Tree Structured + Non-Adaptive:
Tree Structured + Adaptive:
53. MSE Estimation Implications
(unstructured bounds as above)
Tree-sparse + our adaptive procedure: there exists a two-stage (support recovery followed by direct measurements) adaptive compressive sensing procedure for k-tree sparse signals that produces, from O(k) measurements, an estimate x̂ satisfying
$$\|\hat{x} - x\|_2^2 \;=\; O\!\left(\sigma^2 \left(\frac{k}{m}\right) k\right)$$
with high probability, provided the nonzero signal component amplitudes exceed a constant times $\sqrt{\sigma^2 (k/m) \log k}$.
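A minimal sketch of the second (direct-measurement) stage, assuming the support estimate Ŝ comes from the tree-search stage sketched earlier (names and parameters are illustrative, not the authors' code):

```python
# Stage 2: take r direct looks y = e_i^T x + w at each recovered location
# and average; each averaged amplitude estimate has variance sigma^2 / r.
import numpy as np

def direct_measurement_stage(x, S_hat, r, sigma=1.0, rng=np.random.default_rng(0)):
    x_hat = np.zeros_like(x)
    for i in S_hat:
        looks = x[i - 1] + sigma * rng.standard_normal(r)   # r noisy looks at x_i
        x_hat[i - 1] = looks.mean()
    return x_hat

x = np.zeros(7); x[0] = x[4] = 4.0
print(np.round(direct_measurement_stage(x, {1, 5}, r=10), 2))
```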
55. LASeR – Learning Adaptive Sensing Representations
Use dictionary learning and training data to learn tree-sparse representations.
[Diagram: Training Data + Structured Sparsity + Adaptive Sensing → LASeR]
Learned a representation for 163 example images (128 × 128) from the Psychological Image Collection at Stirling (PICS), http://pics.psych.stir.ac.uk/
[Figure, qualitative results: the original image and the tree elements present in its sparse representation; reconstructions by Wavelet Tree Sensing, PCA, CS LASSO, CS Tree LASSO, and LASeR at m = 20, 50, 80, with "sensing energy" R = (128 × 128)/32]
Details & examples of LASeR in action: AS and J. Haupt, "Efficient adaptive compressive sensing using sparse hierarchical learned dictionaries," in Proc. Asilomar Conf. on Signals, Systems and Computers, 2011, pp. 1250–1254.
56. Overall Taxonomy
[Grid: all four cells]
Sufficient condition for ESR for our algorithm:
$$\mu \;\gtrsim\; \sqrt{\sigma^2 \left(\frac{k}{m}\right) \log k}$$
⟹ nearly optimal!!
57. Overall Taxonomy
[Grid: all four cells]
Sufficient condition for ESR for our algorithm:
$$\mu \;\gtrsim\; \sqrt{\sigma^2 \left(\frac{k}{m}\right) \log k}$$
⟹ nearly optimal!!
Thank You!
Akshay Soni
University of Minnesota
sonix022@umn.edu