This document summarizes optimization techniques for matrix factorization and completion problems. Section 8.1 introduces the matrix factorization problem and considers minimizing reconstruction error subject to a nuclear norm penalty. Section 8.2 develops properties of the nuclear norm, including its dual, semidefinite, and factorization characterizations and its relationship to the Frobenius norm. Section 8.3 provides performance guarantees for matrix completion when the underlying matrix is exactly low-rank. Section 8.4 describes proximal gradient methods for optimization, including updates that involve singular value thresholding. The document concludes by discussing an extension of these methods to dictionary learning and alignment problems.
9. Section 8.2.1: Variational characterizations of the nuclear norm
• The nuclear norm \|W\|_* (the sum of the singular values of W) admits three equivalent formulations: a dual-norm form, a semidefinite form, and a factorization form.
• Dual-norm form:
  \|W\|_* = \max_{X} \langle X, W \rangle \quad \text{subject to } \|X\|_{\mathrm{op}} \le 1
• Semidefinite form:
  \|W\|_* = \min_{P,Q} \frac{1}{2}\big(\mathrm{tr}(P) + \mathrm{tr}(Q)\big) \quad \text{subject to } \begin{pmatrix} P & W \\ W^T & Q \end{pmatrix} \succeq 0
• Factorization form (see the numerical check after this list):
  \|W\|_* = \min_{U,V} \frac{1}{2}\big(\|U\|_F^2 + \|V\|_F^2\big) \quad \text{subject to } W = UV^T
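The factorization and dual-norm forms are easy to check numerically. The sketch below is my own illustration (not from the notes), assuming numpy: it computes \|W\|_* from the SVD, builds the balanced factorization U' = U\sqrt{\Sigma}, V' = V\sqrt{\Sigma} that attains the factorization minimum, and evaluates the dual form at X = UV^T.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# Nuclear norm = sum of singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
nuc = s.sum()

# Balanced factorization W = U' V'^T with U' = U sqrt(S), V' = V sqrt(S)
# attains the minimum of (1/2)(||U'||_F^2 + ||V'||_F^2).
Ub = U * np.sqrt(s)           # scale columns of U by sqrt of singular values
Vb = Vt.T * np.sqrt(s)        # scale columns of V likewise
assert np.allclose(Ub @ Vb.T, W)
half_sum = 0.5 * (np.linalg.norm(Ub, "fro")**2 + np.linalg.norm(Vb, "fro")**2)
assert np.isclose(half_sum, nuc)

# Dual form: <X, W> <= ||W||_* whenever ||X||_op <= 1, with equality at X = U V^T.
X = U @ Vt
assert np.linalg.norm(X, 2) <= 1 + 1e-9   # operator norm of X
assert np.isclose(np.sum(X * W), nuc)     # <X, W> attains the nuclear norm
```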
10. Section 8.2.2: Analogy with the \ell_1 norm
• For a rank-r matrix W, the nuclear norm is controlled by the Frobenius norm:
  \|W\|_* \le \sqrt{r}\,\|W\|_F
• This parallels the bound for k-sparse vectors from Section 5.2:
  \|w\|_1 \le \sqrt{k}\,\|w\|_2
• The rank r plays the role of the sparsity level k, and the Frobenius norm plays the role of the \ell_2 norm: the nuclear norm is the matrix analogue of the \ell_1 norm (both bounds are verified numerically below).
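Both inequalities are simple consequences of Cauchy-Schwarz applied to the singular values or the nonzero coordinates. A minimal empirical check (my addition, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
r, k = 3, 5

# Random rank-r matrix: product of thin factors.
W = rng.standard_normal((8, r)) @ rng.standard_normal((r, 6))
nuc = np.linalg.svd(W, compute_uv=False).sum()
fro = np.linalg.norm(W, "fro")
assert nuc <= np.sqrt(r) * fro + 1e-9            # ||W||_* <= sqrt(r) ||W||_F

# Random k-sparse vector.
w = np.zeros(20)
w[rng.choice(20, size=k, replace=False)] = rng.standard_normal(k)
assert np.abs(w).sum() <= np.sqrt(k) * np.linalg.norm(w) + 1e-9  # ||w||_1 <= sqrt(k) ||w||_2
```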
11. Section 8.2.3: Quadratic variational formulations
• As with the \ell_1 norm in Section 6.1, the nuclear norm has a quadratic variational form:
  \|W\|_* = \min_{\Lambda} \frac{1}{2}\big(\mathrm{tr}(W^T \Lambda^{\dagger} W) + \mathrm{tr}(\Lambda)\big) \quad \text{subject to } \Lambda \succeq 0
• The corresponding \ell_1 identity is
  \|w\|_1 = \frac{1}{2} \sum_{j=1}^{d} \min_{\eta_j \ge 0} \Big( \frac{w_j^2}{\eta_j} + \eta_j \Big)
• See also Section 6.3. (A numerical check of both identities follows this list.)
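Both identities have closed-form minimizers: \eta_j = |w_j| in the vector case and \Lambda = (WW^T)^{1/2} in the matrix case. The sketch below (my own check, assuming numpy and a full-rank W so the pseudoinverse acts as an ordinary inverse on the range) plugs these in and recovers the two norms:

```python
import numpy as np

rng = np.random.default_rng(2)

# Vector case: eta_j = |w_j| attains the minimum, recovering ||w||_1.
w = rng.standard_normal(5)
eta = np.abs(w)
assert np.isclose(0.5 * np.sum(w**2 / eta + eta), np.abs(w).sum())

# Matrix case: Lambda = (W W^T)^{1/2} attains the minimum.
W = rng.standard_normal((4, 6))           # full rank almost surely
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Lam = (U * s) @ U.T                        # (W W^T)^{1/2} = U Sigma U^T
Lam_pinv = (U / s) @ U.T                   # its pseudoinverse
val = 0.5 * (np.trace(W.T @ Lam_pinv @ W) + np.trace(Lam))
assert np.isclose(val, s.sum())            # equals ||W||_*
```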
12. Section 8.2.4: Proximal operators
• The proximal operator of the (scaled) nuclear norm is singular value soft thresholding: writing the SVD Y = U \Sigma V^T,
  \mathrm{prox}_{\lambda \|\cdot\|_*}(Y) = \operatorname{argmin}_{W \in \mathbb{R}^{d_1 \times d_2}} \Big( \frac{1}{2}\|Y - W\|_F^2 + \lambda \|W\|_* \Big) = U \max(\Sigma - \lambda I, 0)\, V^T
• This is the matrix analogue of the \ell_1 prox (soft thresholding) from Section 6.2:
  \big[\mathrm{prox}_{\lambda \ell_1}(y)\big]_j = \max(|y_j| - \lambda, 0)\, \frac{y_j}{|y_j|}
• These updates are the building blocks of the proximal gradient methods in Section 8.4 (an implementation sketch follows this list).
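Both prox operators take a few lines of numpy. This is a hedged sketch; the function names prox_l1 and prox_nuclear are mine:

```python
import numpy as np

def prox_l1(y, lam):
    """Soft thresholding: shrink each coordinate toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def prox_nuclear(Y, lam):
    """Singular value soft thresholding: apply prox_l1 to the spectrum."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

# Sanity check: the nuclear-norm prox soft-thresholds the singular values.
Y = np.random.default_rng(3).standard_normal((5, 4))
s_before = np.linalg.svd(Y, compute_uv=False)
s_after = np.linalg.svd(prox_nuclear(Y, 0.5), compute_uv=False)
assert np.allclose(s_after, np.maximum(s_before - 0.5, 0.0))
```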
13. Section 8.3: Guarantees for exactly low-rank matrices
• Performance guarantee for estimating a low-rank matrix W from noisy observed entries (denoted Θ* in the excerpt below).
• The error bound scales with the noise level ν; notably, it does not vanish as ν → 0.
• The following excerpt is from Negahban and Wainwright:
"[...] \frac{1}{n} \sum_{i=1}^{n} \xi_i \sqrt{R}\, X^{(i)} \sqrt{C}, and secondly, we need to understand how to choose the parameter r so as to achieve the tightest possible bound. When \Theta^* is exactly low-rank, then it is obvious that we should choose r = \mathrm{rank}(\Theta^*), so that the approximation error vanishes; more specifically, so that \sum_{j=r+1}^{d_r} \sigma_j(\sqrt{R}\, \Theta^* \sqrt{C}) = 0. Doing so yields the following result:

Corollary 1 (Exactly low-rank matrices) Suppose that the noise sequence \{\xi_i\} is i.i.d., zero-mean and sub-exponential, and \Theta^* has rank at most r, Frobenius norm at most 1, and spikiness at most \alpha_{\mathrm{sp}}(\Theta^*) \le \alpha^*. If we solve the SDP (7) with \lambda_n = 4\nu \sqrt{\frac{d \log d}{n}} then there is a numerical constant c'_1 such that

  |||\hat{\Theta} - \Theta^*|||^2_{\omega(F)} \le c'_1 (\nu^2 \vee L^2)(\alpha^*)^2 \frac{r d \log d}{n} + \frac{c_1 (\alpha^* L)^2}{n}    (10)

with probability greater than 1 - c_2 \exp(-c_3 \log d).

Note that this rate has a natural interpretation: since a rank r matrix of dimension d_r \times d_c has roughly r(d_r + d_c) free parameters, we require a sample size of this order (up to logarithmic factors) so as to obtain a controlled error bound. An interesting feature of the bound (10) is the term \nu^2 \vee 1 = \max\{\nu^2, 1\}, which implies that we do not obtain exact recovery as \nu \to 0. As we discuss at more length in Section 3.4, under the mild spikiness condition that we have imposed, this behavior is unavoidable due to lack of identifiability within a certain radius, as specified in the set C. For instance, consider the matrix \Theta^* and the perturbed version \tilde{\Theta} = \Theta^* + \frac{1}{\sqrt{d_r d_c}} e_1 e_1^T. [...]"
• Takeaway: identifying \hat{W} with \hat{\Theta}, the squared estimation error scales as \frac{r d \log d}{n}.
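To connect this guarantee back to the algorithms of Section 8.4: a nuclear-norm regularized least-squares estimate over the observed entries can be computed by proximal gradient descent, alternating a gradient step on the observed residuals with the singular value thresholding prox of Section 8.2.4. The sketch below is my own illustration, not the weighted SDP (7) from the excerpt; the step size, λ, and iteration count are arbitrary choices:

```python
import numpy as np

def complete_matrix(Y_obs, mask, lam=0.1, step=1.0, iters=200):
    """Proximal gradient for min_W 0.5*||P_Omega(W - Y)||_F^2 + lam*||W||_*.

    Y_obs: observed entries (zeros elsewhere); mask: boolean observation pattern.
    """
    W = np.zeros_like(Y_obs)
    for _ in range(iters):
        grad = mask * (W - Y_obs)          # gradient of the smooth part
        Z = W - step * grad                # gradient step
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        W = (U * np.maximum(s - step * lam, 0.0)) @ Vt   # SVT prox step
    return W

# Toy usage: recover a random rank-2 matrix from ~60% of its entries.
rng = np.random.default_rng(4)
Theta = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(Theta.shape) < 0.6
W_hat = complete_matrix(Theta * mask, mask, lam=0.05, iters=500)
print(np.linalg.norm(W_hat - Theta) / np.linalg.norm(Theta))  # relative error
```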