Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements

1. Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements
[Figure: binary tree with measurement vectors a(1)–a(7) at its nodes]
EE-8500 Seminar
Akshay Soni
University of Minnesota
sonix022@umn.edu
(joint work with J. Haupt)
6. Key Idea – Sparsity
[Figures: signals viewed in time and frequency; wavelet (DWT) and Fourier (DFT) transforms]
• Many signals exhibit sparsity in the canonical or 'pixel' basis
• Communication signals often have sparse frequency content (see the sketch below)
• Natural images often have sparse wavelet representations
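As a toy illustration of the frequency-sparsity bullet (a minimal sketch; the two-tone signal and the threshold are assumptions, not from the slides):

```python
# A two-tone signal is dense in time but nearly 2-sparse under the DFT.
import numpy as np

t = np.arange(256)
sig = np.cos(2 * np.pi * 10 * t / 256) + 0.5 * np.cos(2 * np.pi * 40 * t / 256)
spectrum = np.fft.rfft(sig)                      # discrete Fourier transform
print(np.flatnonzero(np.abs(spectrum) > 1.0))    # -> [10 40]: two active bins
```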
8. A Model for Sparse Signals
k-Sparse Signal Model (a union-of-subspaces model):
Signals of interest are vectors x ∈ R^n with signal support set S(x) = {i : x_i ≠ 0} and number of nonzero signal components |S(x)| ≤ k.
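A minimal sketch of this model (sizes and names are illustrative):

```python
# Draw a k-sparse vector x in R^n and read off its support set S(x).
import numpy as np

n, k = 16, 3
rng = np.random.default_rng(1)
S = rng.choice(n, size=k, replace=False)   # signal support set, |S| = k
x = np.zeros(n)
x[S] = rng.standard_normal(k)              # the k nonzero signal components

print(sorted(np.flatnonzero(x)))           # recovers S(x) = {i : x_i != 0}
```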
9. Structured Sparsity
Tree sparsity in wavelets · Grid sparsity in networks · Graph sparsity in background subtraction
[Figure: (a) a brain image has tree sparsity after wavelet transformation; (b) a background-subtracted image has graph sparsity]
There is a sizable literature on structured sparsity, with empirical evidence that imposing additional structure improves performance; what has been missing is a general theory quantifying the benefit of structure over standard, unstructured sparsity:
• quantifying structured sparsity
• the minimal number of measurements required in compressive sensing
• locations of nonzeros are inter-dependent
• structure knowledge can be used during sensing, inference, or both
10. Structured Sparsity
Our focus – Tree Structured Sparsity!
12. Tree Structured Sparsity – Why?
Wavelets!
• Tree sparsity naturally arises in the wavelet coefficients of many signals, e.g., natural images
• Several prior efforts examined specialized sensing techniques built on wavelet tree structure, e.g., in dynamic MRI [*] and compressive imaging [**]
• Previous work was either experimental or analyzed only in noise-free settings
[*] L. P. Panych and F. A. Jolesz, "A dynamically adaptive imaging algorithm for wavelet-encoded MRI," Magnetic Resonance in Medicine, vol. 32, no. 6, pp. 738–748, 1994.
[**] M. W. Seeger and H. Nickisch, "Compressed sensing and Bayesian experimental design," in Proc. ICML, 2008, pp. 912–919.
[**] S. Deutsch, A. Averbuch, and S. Dekel, "Adaptive compressed image sensing based on wavelet modeling and direct sampling," in Proc. Intl. Conf. on Sampling Theory and Applications, 2009.
15. Sensing Strategies
Non-Adaptive Sensing vs. Adaptive Sensing
• In adaptive sensing, the j-th measurement vector a_j is a function of the past measurement vectors and observations {a_l, y_l}_{l=1}^{j−1}, for each j = 2, 3, ..., m.
[Figure: measurement vectors applied in sequence, producing observations y_1, y_2, ..., y_j, ..., y_m]
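A schematic sketch contrasting the two strategies (illustrative only; the `sense` helper and the `design` callback are assumptions, not from the slides):

```python
# Non-adaptive: all a_j fixed in advance. Adaptive: a_j may depend on the
# history {(a_l, y_l)} for l < j, as described on the slide above.
import numpy as np

def sense(x, design, m, sigma=1.0, rng=np.random.default_rng(0)):
    history = []                          # [(a_1, y_1), ..., (a_{j-1}, y_{j-1})]
    for j in range(m):
        a_j = design(j, history)          # an adaptive rule inspects `history`
        y_j = a_j @ x + sigma * rng.standard_normal()
        history.append((a_j, y_j))
    return history

n = 8
nonadaptive = lambda j, hist: np.random.default_rng(j).normal(0, 1 / np.sqrt(n), n)
print(len(sense(np.ones(n), nonadaptive, m=4)))   # 4 (vector, observation) pairs
```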
18. -- Adaptive Sensing of Tree-Sparse Signals --
A Simple Algorithm with Guarantees
19. A Few Tree Specifics
• Signal components are coefficients in an orthonormal representation (the canonical basis, without loss of generality)
• We consider binary trees (all results may be extended to trees of any degree); see the sketch below
[Figure: binary tree on indices 1–7; root 1 has children 2 and 5, node 2 has children 3 and 4, node 5 has children 6 and 7]
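A small sketch of the tree bookkeeping implied by the figure (the explicit `CHILDREN` map is an assumption matching the node labels above):

```python
# Rooted-subtree check: a support is tree-sparse iff it contains the parent
# of every non-root node it contains.
CHILDREN = {1: [2, 5], 2: [3, 4], 5: [6, 7]}   # leaves 3, 4, 6, 7: no children
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def is_rooted_subtree(support):
    return all(node == 1 or PARENT[node] in support for node in support)

print(is_rooted_subtree({1, 5}))   # True: a valid 2-node rooted subtree
print(is_rooted_subtree({5, 6}))   # False: the root is missing
```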
23. Tree Structured Adaptive Support Recovery
[Figure: binary tree on indices 1–7; true support S(x) = {1, 5}]
Walkthrough (after the root has been measured and kept, so Ŝ = {1}, and node 2 has been measured and discarded; Q is the queue of nodes still to examine, Q[1] its front):
• Q[1] = 5: measure y_5 = e_5^T x + w. Suppose |y_5| > τ: then Ŝ ← Ŝ ∪ {5} and Q ← {6, 7} ∪ Q \ {5}.
• Q[1] = 6, Ŝ = {1, 5}: measure y_6 = e_6^T x + w. Suppose |y_6| < τ: then Q ← Q \ {6}.
• Q[1] = 7, Ŝ = {1, 5}: measure y_7 = e_7^T x + w. Suppose |y_7| < τ: then Q ← Q \ {7}.
• Q = ∅: terminate with Ŝ = {1, 5}.
24. Tree Structured Adaptive Support Recovery
(walkthrough as on the previous slide)
(can also measure each location r ≥ 1 times and average to reduce effective noise)
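A minimal sketch of the walkthrough above, including the repeated-measurement option (an illustration, not the authors' implementation; the tree, threshold, amplitude, and RNG seed are assumptions):

```python
# Breadth-first adaptive support recovery: measure the front of the queue,
# and keep a node (enqueueing its children) only if its averaged point
# measurement exceeds the threshold tau.
import numpy as np

CHILDREN = {1: [2, 5], 2: [3, 4], 5: [6, 7]}   # tree from the figure

def tree_support_recovery(x, tau, r=1, sigma=1.0, rng=np.random.default_rng(0)):
    S_hat, Q, n_meas = set(), [1], 0           # start at the root
    while Q:
        j = Q.pop(0)                           # Q[1] on the slides
        y = np.mean(x[j - 1] + sigma * rng.standard_normal(r))  # avg of r looks
        n_meas += r
        if abs(y) > tau:                       # |y_j| > tau: node is kept
            S_hat.add(j)
            Q.extend(CHILDREN.get(j, []))      # examine its children next
    return S_hat, n_meas

x = np.zeros(7); x[0] = x[4] = 4.0             # true support S(x) = {1, 5}
print(tree_support_recovery(x, tau=2.5, r=3))  # ({1, 5}, 15) with high prob.
```

Note the measurement count: each kept node adds its two children to the queue, so at most 2k + 1 locations are ever examined, matching the m ≤ r(2k + 1) budget in the theorem that follows.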
25. Theorem (2011 & 2013): AS & J. Haupt
Tree Structured Adaptive Support Recovery
[Walkthrough figure as on the previous slides]
Choose any $\delta \in (0, 1)$ and set $\tau = \sqrt{2\sigma^2 \log(4k/\delta)}$. If the signal $x$ being acquired by our procedure is $k$-tree sparse, and the nonzero components of $x$ satisfy
$$|x_i| \;\geq\; \sqrt{24\left(1 + \log\frac{4}{\delta}\right)} \, \sqrt{\sigma^2\left(\frac{k}{m}\right)\log k}$$
for every $i \in S(x)$, then with probability at least $1 - \delta$, a "repeated measurement" variant of the algorithm above that acquires $r$ measurements at each observed location terminates after collecting $m \leq r(2k+1)$ measurements, and produces a support estimate $\hat{S}$ satisfying $\hat{S} = S(x)$.
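A quick numeric reading of the theorem's parameters (a sketch; the values of σ², k, m, and δ below are hypothetical):

```python
# Evaluate the threshold tau and the required amplitude for sample sizes.
import math

sigma2, k, m, delta = 1.0, 8, 64, 0.1
tau = math.sqrt(2 * sigma2 * math.log(4 * k / delta))
amp = math.sqrt(24 * (1 + math.log(4 / delta))) * math.sqrt(sigma2 * (k / m) * math.log(k))
print(f"tau = {tau:.2f}, required |x_i| >= {amp:.2f}")
```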
26. Question:
Can any other "smart" scheme recover the support of a tree-sparse signal having "significantly" smaller magnitude? i.e., is this the best one can hope for?
28. The Big Picture: Minimum Signal Amplitudes for ESR
Let's identify necessary conditions for exact support recovery (ESR) in each case…
[Grid: {Unstructured, Tree-Sparse} × {Non-Adaptive, Adaptive}]
29. The Big Picture:
[Grid: the Unstructured + Non-Adaptive cell, with necessary conditions from the compressed sensing literature]
[*] S. Aeron, V. Saligrama, and M. Zhao, "Information theoretic bounds for compressed sensing," IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 5111–5130, 2010.
[*] M. J. Wainwright, "Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso)," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2183–2202, 2009.
[*] M. J. Wainwright, "Information-theoretic limitations on sparsity recovery in the high-dimensional and noisy setting," IEEE Transactions on Information Theory, vol. 55, no. 12, 2009.
[*] W. Wang, M. J. Wainwright, and K. Ramchandran, "Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2967–2979, 2010.
30. The Big Picture:
[Grid: as on the previous slide, annotated "uncompressed or compressed"]
(references as on the previous slide)
31. The Big Picture:
[Grid: the Unstructured + Adaptive cell]
Adaptivity may at best improve log(n) to log(k)!
[*] M. Malloy and R. Nowak, "Sequential analysis in high-dimensional multiple testing and sparse recovery," in Proc. IEEE Intl. Symp. on Information Theory, 2011, pp. 2661–2665.
33. Signal Model, Sensing Strategies, Observations
Signal model: x ∈ X_{µ,k}, the class of signals supported on a k-node rooted sub-tree of the underlying tree, with amplitude parameter µ ≥ 0.
[Figure: binary tree on indices 1–7]
Observations: y_j = a_j^T x + w_j, with i.i.d. noise w_j ~ N(0, σ²) (as in the point measurements y_j = e_j^T x + w above).
Sensing strategies:
• Non-adaptive: here Gaussian; each row a_j of A is independent, with a_j ~ N(0, I/n)
• Adaptive: a_j depends on {a_l, y_l}_{l=1}^{j−1}, subject to the constraint ||a_j||²₂ = 1 ∀ j
Notations:
• {A_m, y_m}: shorthand for {a_j, y_j}_{j=1}^{m}
• M_m: class of all adaptive (or non-adaptive) sensing strategies based on m measurements
• Support estimate φ: a mapping from observations → a subset of {1, 2, ..., n}
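A minimal sketch of the non-adaptive Gaussian model just defined (sizes are illustrative):

```python
# Rows a_j ~ N(0, I/n), so E||a_j||_2^2 = 1; observations y = A x + w.
import numpy as np

n, m, sigma = 128, 32, 1.0
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))   # a_j ~ N(0, I/n)
x = np.zeros(n); x[:4] = 5.0                          # a sparse signal
y = A @ x + sigma * rng.standard_normal(m)            # noisy linear measurements
print(np.mean(np.linalg.norm(A, axis=1) ** 2))        # ~ 1.0, as expected
```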
34. Preliminaries
(Maximum) risk of a support estimator φ, for estimators φ and sensing strategies M ∈ M_m:
$$R(\phi; M) \;=\; \sup_{x \in X_{\mu,k}} \Pr_x\big(\phi(A_m, y_m; M) \neq S(x)\big)$$
In words: the worst-case performance of φ when estimating the "most difficult" element, i.e., the element whose support is most difficult to estimate.
Minimax risk:
$$R^*_{X_{\mu,k},M} \;=\; \inf_{\phi}\, R(\phi; M)$$
In words: the error of the best estimator when estimating the support of the "most difficult" element.
Note: if R*_{X_{µ,k},M} > 0, then regardless of φ and M ∈ M, there is at least one signal x ∈ X_{µ,k} whose support cannot be recovered exactly with certainty.
Our aim – quantify errors corresponding to these hard cases!
36. Theorem (2013): AS & J. Haupt
Non-Adaptive Tree-Structured Sensing – fundamental limits
For ESR with non-adaptive sensing, a necessary condition is µ ≥ c √(σ² (n/m) log k) for an absolute constant c; relative to the unstructured non-adaptive condition, the log(n) factor improves only to log(k) (see the next slides).
Implication: no uniform guarantees can be made for any estimation procedure for recovering the support of tree-sparse signals when the signal amplitude is "too small."
37. The Big Picture:
[Grid: the Tree-Sparse + Non-Adaptive cell]
[*] AS and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Transactions on Information Theory, 2013 (accepted for publication).
38. The Big Picture:
[Grid: the Tree-Sparse + Non-Adaptive cell]
Same necessary condition as for adaptive + unstructured!
Structure or adaptivity in isolation may at best improve log(n) to log(k).
[*] AS and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Transactions on Information Theory, 2013 (accepted for publication).
39. Proof Idea – Non-Adaptive + Tree-Sparse
Restrict to a "smaller set": for any X′_{µ,k} ⊆ X_{µ,k},
$$\sup_{x \in X_{\mu,k}} \Pr_x\big(\phi(A_m, y_m; M) \neq S(x)\big) \;\ge\; \sup_{x \in X'_{\mu,k}} \Pr_x\big(\phi(A_m, y_m; M) \neq S(x)\big)$$
⟹ we can get a lower bound on the minimax risk over a smaller subset of signals!
Convert to a multiple-hypothesis testing problem: the restricted risk is bounded below by p_{e,L}, the minimax probability of error of a multiple-hypothesis testing problem over the candidate supports.
• get a lower bound on p_{e,L} using Fano's inequality (or similar ideas)
Reference: Introduction to Nonparametric Estimation – A. B. Tsybakov
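For reference, a standard form of Fano's inequality used in this last step (as in Tsybakov's book; stated here for an L-ary test with a uniform prior over the candidate supports, which is an assumption about the exact variant used):
$$p_{e,L} \;\ge\; 1 \;-\; \frac{I\big(S;\, \{A_m, y_m\}\big) + \log 2}{\log L},$$
so the error stays bounded away from zero whenever the mutual information between the (uniform) support index S and the observations is small relative to log L.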
41. Theorem (2013): AS & J. Haupt
Adaptive Tree-Structured Sensing – fundamental limits
For ESR with adaptive sensing, a necessary condition is µ ≥ c √(σ² (k/m)) (only a log(k) factor below the sufficient condition of our simple algorithm).
Proof idea: this problem is at least as hard as recovering the location of one nonzero given all other k−1 nonzero locations.
42. The Big Picture:
[Grid: the Tree-Sparse + Adaptive cell]
[*] AS and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Transactions on Information Theory, 2013 (accepted for publication).
43. The Big Picture:
[Grid: all four cells]
Recall, for our simple tree-structured adaptive algorithm the sufficient condition for ESR was
$$\mu \;\gtrsim\; \sqrt{\sigma^2 \left(\frac{k}{m}\right) \log k},$$
which is only a log(k) factor away from the lower bound.
We cannot do much better than the simple proposed algorithm!
44. The Big Picture:
[Grid: all four cells; the Unstructured + Adaptive entry as tabulated applies when m > n]
Note: for adaptive + unstructured, our proof ideas can show that in the case m < n, a necessary condition for ESR is
$$\mu \;\gtrsim\; \sqrt{\sigma^2\, \frac{n - k + 1}{m}}.$$
45. The Big Picture:
[Grid: all four cells]
Related work:
[*] A. Krishnamurthy, J. Sharpnack, and A. Singh, "Recovering block-structured activations using compressive measurements," Submitted, 2012.
46. Question:
Can any other "smart" scheme recover the support of a tree-sparse signal having "significantly" smaller magnitude?
47. Answer:
No! We're within log(k) of minimax optimal.
(Question, restated: can any other "smart" scheme recover the support of a tree-sparse signal having "significantly" smaller magnitude?)
51. MSE Estimation Implications
Unstructured + Non-Adaptive: if the measurement matrix A_m satisfies the norm constraint ||A_m||²_F ≤ m, then we have the minimax MSE bound
$$\inf_{\hat{x},\, M \in \mathcal{M}_{na}} \; \sup_{x : |S(x)| = k} \; \mathbb{E}\left[\|\hat{x}(A_m, y_m; M) - x\|_2^2\right] \;\ge\; c\, \sigma^2\, \frac{n}{m}\, k \log n,$$
where c > 0 is a constant. [*]
Unstructured + Adaptive: the corresponding bound over adaptive strategies is c′ σ² (n/m) k, where c′ > 0 is another constant; adaptivity buys at most the log factor. [**]
[*] E. J. Candès and M. A. Davenport, "How well can we estimate a sparse vector?," Applied and Computational Harmonic Analysis, vol. 34, no. 2, pp. 317–323, 2013.
[**] E. Arias-Castro, E. J. Candès, and M. Davenport, "On the fundamental limits of adaptive sensing," Submitted, 2011, online at arxiv.org/abs/1111.4646.
52. MSE Estimation Implications
(unstructured non-adaptive and adaptive bounds as on the previous slide)
Tree Structured + Non-Adaptive:
Tree Structured + Adaptive:
53. MSE Estimation Implications
(unstructured bounds as above)
Tree-sparse + our adaptive procedure: there exists a two-stage (support recovery followed by direct measurements) adaptive compressive sensing procedure for k-tree sparse signals that produces, from O(k) measurements, an estimate x̂ satisfying
$$\|\hat{x} - x\|_2^2 \;=\; O\!\left(\sigma^2 \left(\frac{k}{m}\right) k\right)$$
with high probability, provided the nonzero signal component amplitudes exceed a constant times $\sqrt{\sigma^2 (k/m) \log k}$.
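A minimal sketch of the second (direct-measurement) stage, assuming the support estimate Ŝ comes from the tree-search stage sketched earlier (names and parameters are illustrative, not the authors' code):

```python
# Stage 2: take r direct looks y = e_i^T x + w at each recovered location
# and average; each averaged amplitude estimate has variance sigma^2 / r.
import numpy as np

def direct_measurement_stage(x, S_hat, r, sigma=1.0, rng=np.random.default_rng(0)):
    x_hat = np.zeros_like(x)
    for i in S_hat:
        looks = x[i - 1] + sigma * rng.standard_normal(r)   # r noisy looks at x_i
        x_hat[i - 1] = looks.mean()
    return x_hat

x = np.zeros(7); x[0] = x[4] = 4.0
print(np.round(direct_measurement_stage(x, {1, 5}, r=10), 2))
```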
55. LASeR – Learning Adaptive Sensing Representations
Use dictionary learning and training data to learn tree-sparse representations.
[Diagram: Training Data + Structured Sparsity + Adaptive Sensing → LASeR]
Learned a representation for 163 example images (128 × 128) from the Psychological Image Collection at Stirling (PICS), http://pics.psych.stir.ac.uk/
[Figure, qualitative results: the original image and the tree elements present in its sparse representation; reconstructions by Wavelet Tree Sensing, PCA, CS LASSO, CS Tree LASSO, and LASeR at m = 20, 50, 80, with "sensing energy" R = (128 × 128)/32]
Details & examples of LASeR in action: AS and J. Haupt, "Efficient adaptive compressive sensing using sparse hierarchical learned dictionaries," in Proc. Asilomar Conf. on Signals, Systems and Computers, 2011, pp. 1250–1254.
56. Overall Taxonomy
[Grid: all four cells]
Sufficient condition for ESR for our algorithm:
$$\mu \;\gtrsim\; \sqrt{\sigma^2 \left(\frac{k}{m}\right) \log k}$$
⟹ nearly optimal!!
57. Overall Taxonomy
[Grid: all four cells]
Sufficient condition for ESR for our algorithm:
$$\mu \;\gtrsim\; \sqrt{\sigma^2 \left(\frac{k}{m}\right) \log k}$$
⟹ nearly optimal!!
Thank You!
Akshay Soni
University of Minnesota
sonix022@umn.edu