This document summarizes optimization techniques for matrix factorization and completion problems. Section 8.1 introduces the matrix factorization problem and considers minimizing reconstruction error subject to a nuclear norm penalty. Section 8.2 develops properties of the nuclear norm, including its dual, semidefinite, and factorization characterizations and its relationship to the Frobenius norm. Section 8.3 provides performance guarantees for matrix completion when the underlying matrix is exactly low-rank. Section 8.4 describes proximal gradient methods for optimization, including updates that involve singular value thresholding. The document concludes by discussing an extension of these methods to dictionary learning and alignment problems.
9. Section 8.2.1: Variational characterizations of the nuclear norm
• The nuclear norm \|W\|_* (the sum of the singular values of W) admits three equivalent formulations: a dual-norm form, a semidefinite form, and a factorization form.
• Dual-norm form:
  \|W\|_* = \max_{X} \langle X, W \rangle \quad \text{subject to } \|X\|_{\mathrm{op}} \le 1
• Semidefinite form:
  \|W\|_* = \min_{P,Q} \frac{1}{2}\big(\mathrm{tr}(P) + \mathrm{tr}(Q)\big) \quad \text{subject to } \begin{pmatrix} P & W \\ W^T & Q \end{pmatrix} \succeq 0
• Factorization form (see the numerical check after this list):
  \|W\|_* = \min_{U,V} \frac{1}{2}\big(\|U\|_F^2 + \|V\|_F^2\big) \quad \text{subject to } W = UV^T
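The factorization and dual-norm forms are easy to check numerically. The sketch below is my own illustration (not from the notes), assuming numpy: it computes \|W\|_* from the SVD, builds the balanced factorization U' = U\sqrt{\Sigma}, V' = V\sqrt{\Sigma} that attains the factorization minimum, and evaluates the dual form at X = UV^T.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# Nuclear norm = sum of singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
nuc = s.sum()

# Balanced factorization W = U' V'^T with U' = U sqrt(S), V' = V sqrt(S)
# attains the minimum of (1/2)(||U'||_F^2 + ||V'||_F^2).
Ub = U * np.sqrt(s)           # scale columns of U by sqrt of singular values
Vb = Vt.T * np.sqrt(s)        # scale columns of V likewise
assert np.allclose(Ub @ Vb.T, W)
half_sum = 0.5 * (np.linalg.norm(Ub, "fro")**2 + np.linalg.norm(Vb, "fro")**2)
assert np.isclose(half_sum, nuc)

# Dual form: <X, W> <= ||W||_* whenever ||X||_op <= 1, with equality at X = U V^T.
X = U @ Vt
assert np.linalg.norm(X, 2) <= 1 + 1e-9   # operator norm of X
assert np.isclose(np.sum(X * W), nuc)     # <X, W> attains the nuclear norm
```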
10. Section 8.2.2: Analogy with the \ell_1 norm
• For a rank-r matrix W, the nuclear norm is controlled by the Frobenius norm:
  \|W\|_* \le \sqrt{r}\,\|W\|_F
• This parallels the bound for k-sparse vectors from Section 5.2:
  \|w\|_1 \le \sqrt{k}\,\|w\|_2
• The rank r plays the role of the sparsity level k, and the Frobenius norm plays the role of the \ell_2 norm: the nuclear norm is the matrix analogue of the \ell_1 norm (both bounds are verified numerically below).
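Both inequalities are simple consequences of Cauchy-Schwarz applied to the singular values or the nonzero coordinates. A minimal empirical check (my addition, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
r, k = 3, 5

# Random rank-r matrix: product of thin factors.
W = rng.standard_normal((8, r)) @ rng.standard_normal((r, 6))
nuc = np.linalg.svd(W, compute_uv=False).sum()
fro = np.linalg.norm(W, "fro")
assert nuc <= np.sqrt(r) * fro + 1e-9            # ||W||_* <= sqrt(r) ||W||_F

# Random k-sparse vector.
w = np.zeros(20)
w[rng.choice(20, size=k, replace=False)] = rng.standard_normal(k)
assert np.abs(w).sum() <= np.sqrt(k) * np.linalg.norm(w) + 1e-9  # ||w||_1 <= sqrt(k) ||w||_2
```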
11. Section 8.2.3: Quadratic variational formulations
• As with the \ell_1 norm in Section 6.1, the nuclear norm has a quadratic variational form:
  \|W\|_* = \min_{\Lambda} \frac{1}{2}\big(\mathrm{tr}(W^T \Lambda^{\dagger} W) + \mathrm{tr}(\Lambda)\big) \quad \text{subject to } \Lambda \succeq 0
• The corresponding \ell_1 identity is
  \|w\|_1 = \frac{1}{2} \sum_{j=1}^{d} \min_{\eta_j \ge 0} \Big( \frac{w_j^2}{\eta_j} + \eta_j \Big)
• See also Section 6.3. (A numerical check of both identities follows this list.)
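Both identities have closed-form minimizers: \eta_j = |w_j| in the vector case and \Lambda = (WW^T)^{1/2} in the matrix case. The sketch below (my own check, assuming numpy and a full-rank W so the pseudoinverse acts as an ordinary inverse on the range) plugs these in and recovers the two norms:

```python
import numpy as np

rng = np.random.default_rng(2)

# Vector case: eta_j = |w_j| attains the minimum, recovering ||w||_1.
w = rng.standard_normal(5)
eta = np.abs(w)
assert np.isclose(0.5 * np.sum(w**2 / eta + eta), np.abs(w).sum())

# Matrix case: Lambda = (W W^T)^{1/2} attains the minimum.
W = rng.standard_normal((4, 6))           # full rank almost surely
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Lam = (U * s) @ U.T                        # (W W^T)^{1/2} = U Sigma U^T
Lam_pinv = (U / s) @ U.T                   # its pseudoinverse
val = 0.5 * (np.trace(W.T @ Lam_pinv @ W) + np.trace(Lam))
assert np.isclose(val, s.sum())            # equals ||W||_*
```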
12. Section 8.2.4: Proximal operators
• The proximal operator of the (scaled) nuclear norm is singular value soft thresholding: writing the SVD Y = U \Sigma V^T,
  \mathrm{prox}_{\lambda \|\cdot\|_*}(Y) = \operatorname{argmin}_{W \in \mathbb{R}^{d_1 \times d_2}} \Big( \frac{1}{2}\|Y - W\|_F^2 + \lambda \|W\|_* \Big) = U \max(\Sigma - \lambda I, 0)\, V^T
• This is the matrix analogue of the \ell_1 prox (soft thresholding) from Section 6.2:
  \big[\mathrm{prox}_{\lambda \ell_1}(y)\big]_j = \max(|y_j| - \lambda, 0)\, \frac{y_j}{|y_j|}
• These updates are the building blocks of the proximal gradient methods in Section 8.4 (an implementation sketch follows this list).
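Both prox operators take a few lines of numpy. This is a hedged sketch; the function names prox_l1 and prox_nuclear are mine:

```python
import numpy as np

def prox_l1(y, lam):
    """Soft thresholding: shrink each coordinate toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def prox_nuclear(Y, lam):
    """Singular value soft thresholding: apply prox_l1 to the spectrum."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

# Sanity check: the nuclear-norm prox soft-thresholds the singular values.
Y = np.random.default_rng(3).standard_normal((5, 4))
s_before = np.linalg.svd(Y, compute_uv=False)
s_after = np.linalg.svd(prox_nuclear(Y, 0.5), compute_uv=False)
assert np.allclose(s_after, np.maximum(s_before - 0.5, 0.0))
```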
13. Section 8.3: Guarantees for exactly low-rank matrices
• Performance guarantee for estimating a low-rank matrix W from noisy observed entries (denoted Θ* in the excerpt below).
• The error bound scales with the noise level ν; notably, it does not vanish as ν → 0.
• The following excerpt is from Negahban and Wainwright:
"[...] \frac{1}{n} \sum_{i=1}^{n} \xi_i \sqrt{R}\, X^{(i)} \sqrt{C}, and secondly, we need to understand how to choose the parameter r so as to achieve the tightest possible bound. When \Theta^* is exactly low-rank, then it is obvious that we should choose r = \mathrm{rank}(\Theta^*), so that the approximation error vanishes; more specifically, so that \sum_{j=r+1}^{d_r} \sigma_j(\sqrt{R}\, \Theta^* \sqrt{C}) = 0. Doing so yields the following result:

Corollary 1 (Exactly low-rank matrices) Suppose that the noise sequence \{\xi_i\} is i.i.d., zero-mean and sub-exponential, and \Theta^* has rank at most r, Frobenius norm at most 1, and spikiness at most \alpha_{\mathrm{sp}}(\Theta^*) \le \alpha^*. If we solve the SDP (7) with \lambda_n = 4\nu \sqrt{\frac{d \log d}{n}} then there is a numerical constant c'_1 such that

  |||\hat{\Theta} - \Theta^*|||^2_{\omega(F)} \le c'_1 (\nu^2 \vee L^2)(\alpha^*)^2 \frac{r d \log d}{n} + \frac{c_1 (\alpha^* L)^2}{n}    (10)

with probability greater than 1 - c_2 \exp(-c_3 \log d).

Note that this rate has a natural interpretation: since a rank r matrix of dimension d_r \times d_c has roughly r(d_r + d_c) free parameters, we require a sample size of this order (up to logarithmic factors) so as to obtain a controlled error bound. An interesting feature of the bound (10) is the term \nu^2 \vee 1 = \max\{\nu^2, 1\}, which implies that we do not obtain exact recovery as \nu \to 0. As we discuss at more length in Section 3.4, under the mild spikiness condition that we have imposed, this behavior is unavoidable due to lack of identifiability within a certain radius, as specified in the set C. For instance, consider the matrix \Theta^* and the perturbed version \tilde{\Theta} = \Theta^* + \frac{1}{\sqrt{d_r d_c}} e_1 e_1^T. [...]"
• Takeaway: identifying \hat{W} with \hat{\Theta}, the squared estimation error scales as \frac{r d \log d}{n}.
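To connect this guarantee back to the algorithms of Section 8.4: a nuclear-norm regularized least-squares estimate over the observed entries can be computed by proximal gradient descent, alternating a gradient step on the observed residuals with the singular value thresholding prox of Section 8.2.4. The sketch below is my own illustration, not the weighted SDP (7) from the excerpt; the step size, λ, and iteration count are arbitrary choices:

```python
import numpy as np

def complete_matrix(Y_obs, mask, lam=0.1, step=1.0, iters=200):
    """Proximal gradient for min_W 0.5*||P_Omega(W - Y)||_F^2 + lam*||W||_*.

    Y_obs: observed entries (zeros elsewhere); mask: boolean observation pattern.
    """
    W = np.zeros_like(Y_obs)
    for _ in range(iters):
        grad = mask * (W - Y_obs)          # gradient of the smooth part
        Z = W - step * grad                # gradient step
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        W = (U * np.maximum(s - step * lam, 0.0)) @ Vt   # SVT prox step
    return W

# Toy usage: recover a random rank-2 matrix from ~60% of its entries.
rng = np.random.default_rng(4)
Theta = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(Theta.shape) < 0.6
W_hat = complete_matrix(Theta * mask, mask, lam=0.05, iters=500)
print(np.linalg.norm(W_hat - Theta) / np.linalg.norm(Theta))  # relative error
```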