Low-Rank Matrix Optimization Problems
Junxiao Song and Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)
ELEC 5470 - Convex Optimization
Fall 2013-14, HKUST, Hong Kong
Outline of Lecture
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
High Dimensional Data
Data is becoming increasingly massive and high dimensional...
Images: compression, denoising, recognition. . .
Videos: streaming, tracking, stabilization. . .
User data: clustering, classification, recommendation. . .
Web data: indexing, ranking, search. . .
Low Dimensional Structures in High Dimensional Data
Low dimensional structures in visual data
User Data: profiles of different users may share some common factors
How to extract low dimensional structures from such high dimensional data?
Rank Minimization Problem (RMP)
In many scenarios, low dimensional structure is closely related to low rank.
But in real applications the true rank is usually unknown. A natural approach is then to formulate a rank minimization problem (RMP), i.e., find the matrix of lowest rank that satisfies some constraint:
$$
\begin{array}{ll}
\underset{\mathbf{X}}{\text{minimize}} & \operatorname{rank}(\mathbf{X}) \\
\text{subject to} & \mathbf{X} \in \mathcal{C},
\end{array}
$$
where $\mathbf{X} \in \mathbb{R}^{m \times n}$ is the optimization variable and $\mathcal{C}$ is a convex set denoting the constraints.
When $\mathbf{X}$ is restricted to be diagonal, $\operatorname{rank}(\mathbf{X}) = \|\operatorname{diag}(\mathbf{X})\|_0$ and the rank minimization problem reduces to the cardinality minimization problem ($\ell_0$-norm minimization).
Matrix Rank
The rank of a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ is
the number of linearly independent rows of $\mathbf{X}$
the number of linearly independent columns of $\mathbf{X}$
the number of nonzero singular values of $\mathbf{X}$, i.e., $\|\boldsymbol{\sigma}(\mathbf{X})\|_0$
the smallest number $r$ such that there exist an $m \times r$ matrix $\mathbf{F}$ and an $r \times n$ matrix $\mathbf{G}$ with $\mathbf{X} = \mathbf{F}\mathbf{G}$.
It can be shown that any nonsquare matrix $\mathbf{X}$ can be associated with a positive semidefinite matrix whose rank is exactly twice the rank of $\mathbf{X}$.
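The singular-value characterization is easy to check numerically; a minimal numpy sketch (the random rank-$r$ test matrix and the tolerance are our own choices):

```python
import numpy as np

# Rank as the number of nonzero singular values (minimal sketch;
# the random rank-r test matrix and the tolerance are our own).
m, n, r = 6, 5, 2
X = np.random.randn(m, r) @ np.random.randn(r, n)   # X = FG has rank <= r

sigma = np.linalg.svd(X, compute_uv=False)
print(np.sum(sigma > 1e-10))       # number of nonzero singular values
print(np.linalg.matrix_rank(X))    # numpy's rank agrees
```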
Semidefinite Embedding Lemma
Lemma
Let $\mathbf{X} \in \mathbb{R}^{m \times n}$ be a given matrix. Then $\operatorname{rank}(\mathbf{X}) \le r$ if and only if there exist matrices $\mathbf{Y} = \mathbf{Y}^T \in \mathbb{R}^{m \times m}$ and $\mathbf{Z} = \mathbf{Z}^T \in \mathbb{R}^{n \times n}$ such that
$$
\begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0, \quad \operatorname{rank}(\mathbf{Y}) + \operatorname{rank}(\mathbf{Z}) \le 2r.
$$
Based on the semidefinite embedding lemma, minimizing the rank of a general nonsquare matrix $\mathbf{X}$ is equivalent to minimizing the rank of the positive semidefinite, block-diagonal matrix $\operatorname{blkdiag}(\mathbf{Y}, \mathbf{Z})$:
$$
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{Y}, \mathbf{Z}}{\text{minimize}} & \frac{1}{2}\operatorname{rank}(\operatorname{blkdiag}(\mathbf{Y}, \mathbf{Z})) \\
\text{subject to} & \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0 \\
& \mathbf{X} \in \mathcal{C}.
\end{array}
$$
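The "if" direction of the lemma can be verified numerically: from a compact SVD $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, taking $\mathbf{Y} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^T$ and $\mathbf{Z} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{V}^T$ gives a valid embedding. A minimal numpy sketch (this particular construction is one standard choice, used here only for illustration):

```python
import numpy as np

# Numerical check of the semidefinite embedding lemma via the
# compact-SVD construction Y = U S U^T, Z = V S V^T (our own
# illustrative choice of Y and Z; many others exist).
m, n, r = 5, 4, 2
X = np.random.randn(m, r) @ np.random.randn(r, n)        # rank-r matrix

U, s, Vt = np.linalg.svd(X)
U, s, V = U[:, :r], s[:r], Vt[:r, :].T                   # compact SVD
Y = U @ np.diag(s) @ U.T                                 # m x m, rank r
Z = V @ np.diag(s) @ V.T                                 # n x n, rank r

G = np.block([[Y, X], [X.T, Z]])                         # embedding matrix
print(np.linalg.eigvalsh(G).min() >= -1e-9)              # PSD check: True
print(np.linalg.matrix_rank(Y) + np.linalg.matrix_rank(Z))  # = 2r
```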
Solving Rank Minimization Problem
In general, the rank minimization problem is NP-hard, and there is
little hope of finding the global minimum efficiently in all instances.
What we are going to talk about, instead, are efficient heuristics,
categorized into two groups:
1 Approximate the rank function with some surrogate functions
Nuclear norm heuristic
Log-det heuristic
2 Solve a sequence of rank-constrained feasibility problems
Matrix factorization based method
Rank constraint via convex iteration
Nuclear Norm Heuristic
A well-known heuristic for the rank minimization problem is to replace the rank function in the objective with the nuclear norm:
$$
\begin{array}{ll}
\underset{\mathbf{X}}{\text{minimize}} & \|\mathbf{X}\|_* \\
\text{subject to} & \mathbf{X} \in \mathcal{C}.
\end{array}
$$
Proposed by Fazel (2002) [Fazel, 2002].
The nuclear norm $\|\mathbf{X}\|_*$ is defined as the sum of the singular values, i.e., $\|\mathbf{X}\|_* = \sum_{i=1}^{r} \sigma_i$.
If $\mathbf{X} = \mathbf{X}^T \succeq 0$, then $\|\mathbf{X}\|_*$ is just $\operatorname{Tr}(\mathbf{X})$ and the “nuclear norm heuristic” reduces to the “trace heuristic”.
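In a modeling language the heuristic is a few lines; a minimal CVXPY sketch (the toy low-rank target and the entry-fixing constraint set standing in for $\mathcal{C}$ are our own choices):

```python
import cvxpy as cp
import numpy as np

# Nuclear norm heuristic (illustrative sketch; the toy data and the
# entry-fixing constraint set C are our own choices).
np.random.seed(0)
m, n, r = 20, 15, 2
M = np.random.randn(m, r) @ np.random.randn(r, n)    # low-rank target
mask = (np.random.rand(m, n) < 0.5).astype(float)    # observed entries

X = cp.Variable((m, n))
constraints = [cp.multiply(mask, X - M) == 0]        # X in C
cp.Problem(cp.Minimize(cp.norm(X, "nuc")), constraints).solve()
print(np.linalg.matrix_rank(X.value, tol=1e-6))      # often recovers rank r
```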
Why Nuclear Norm?
The nuclear norm can be viewed as the $\ell_1$-norm of the vector of singular values.
Just as the $\ell_1$-norm $\Rightarrow$ sparsity, the nuclear norm $\Rightarrow$ sparse singular value vector, i.e., low rank.
When $\mathbf{X}$ is restricted to be diagonal, $\|\mathbf{X}\|_* = \|\operatorname{diag}(\mathbf{X})\|_1$ and the nuclear norm heuristic for the rank minimization problem reduces to the $\ell_1$-norm heuristic for the cardinality minimization problem.
$\|\mathbf{x}\|_1$ is the convex envelope of $\operatorname{card}(\mathbf{x})$ over $\{\mathbf{x} \mid \|\mathbf{x}\|_\infty \le 1\}$.
Similarly, $\|\mathbf{X}\|_*$ is the convex envelope of $\operatorname{rank}(\mathbf{X})$ on the convex set $\{\mathbf{X} \mid \|\mathbf{X}\|_2 \le 1\}$.
Equivalent SDP Formulation
Lemma
For $\mathbf{X} \in \mathbb{R}^{m \times n}$ and $t \in \mathbb{R}$, we have $\|\mathbf{X}\|_* \le t$ if and only if there exist matrices $\mathbf{Y} \in \mathbb{R}^{m \times m}$ and $\mathbf{Z} \in \mathbb{R}^{n \times n}$ such that
$$
\begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0, \quad \operatorname{Tr}(\mathbf{Y}) + \operatorname{Tr}(\mathbf{Z}) \le 2t.
$$
Based on the above lemma, the nuclear norm minimization problem is equivalent to
$$
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{Y}, \mathbf{Z}}{\text{minimize}} & \frac{1}{2}\operatorname{Tr}(\mathbf{Y} + \mathbf{Z}) \\
\text{subject to} & \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0 \\
& \mathbf{X} \in \mathcal{C}.
\end{array}
$$
This SDP formulation can also be obtained by applying the “trace
heuristic” to the PSD form of the RMP.
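The equivalence is easy to sanity-check numerically; a minimal CVXPY sketch (the single toy constraint standing in for $\mathcal{C}$ is our own choice):

```python
import cvxpy as cp
import numpy as np

# SDP form of the nuclear norm heuristic (illustrative sketch; the
# toy constraint standing in for the set C is our own choice).
m, n = 8, 6
X = cp.Variable((m, n))
Y = cp.Variable((m, m), symmetric=True)
Z = cp.Variable((n, n), symmetric=True)
G = cp.bmat([[Y, X], [X.T, Z]])                      # block matrix LMI

prob = cp.Problem(cp.Minimize(0.5 * (cp.trace(Y) + cp.trace(Z))),
                  [G >> 0, cp.sum(X) == 10])         # toy X in C
prob.solve()
print(prob.value, np.linalg.norm(X.value, "nuc"))    # the two values match
```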
Log-det Heuristic
In the log-det heuristic, the log-det function is used as a smooth surrogate for the rank function.
Symmetric positive semidefinite case:
$$
\begin{array}{ll}
\underset{\mathbf{X}}{\text{minimize}} & \log\det(\mathbf{X} + \delta\mathbf{I}) \\
\text{subject to} & \mathbf{X} \in \mathcal{C},
\end{array}
$$
where $\delta > 0$ is a small regularization constant.
Note that $\log\det(\mathbf{X} + \delta\mathbf{I}) = \sum_i \log(\sigma_i(\mathbf{X} + \delta\mathbf{I}))$, $\operatorname{rank}(\mathbf{X}) = \|\boldsymbol{\sigma}(\mathbf{X})\|_0$, and $\log(s + \delta)$ can be seen as a surrogate function of $\operatorname{card}(s)$.
However, the surrogate function $\log\det(\mathbf{X} + \delta\mathbf{I})$ is not convex (in fact, it is concave).
Log-det Heuristic
An iterative linearization and minimization scheme (called majorization-minimization) is used to find a local minimum.
Let $\mathbf{X}^{(k)}$ denote the $k$th iterate of the optimization variable $\mathbf{X}$. The first-order Taylor series expansion of $\log\det(\mathbf{X} + \delta\mathbf{I})$ about $\mathbf{X}^{(k)}$ is given by
$$
\log\det(\mathbf{X} + \delta\mathbf{I}) \approx \log\det(\mathbf{X}^{(k)} + \delta\mathbf{I}) + \operatorname{Tr}\left((\mathbf{X}^{(k)} + \delta\mathbf{I})^{-1}(\mathbf{X} - \mathbf{X}^{(k)})\right).
$$
Then, one can minimize $\log\det(\mathbf{X} + \delta\mathbf{I})$ by iteratively minimizing the local linearization, which leads to
$$
\mathbf{X}^{(k+1)} = \underset{\mathbf{X} \in \mathcal{C}}{\operatorname{argmin}} \; \operatorname{Tr}\left((\mathbf{X}^{(k)} + \delta\mathbf{I})^{-1}\mathbf{X}\right).
$$
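In code, each iteration is just a weighted trace minimization; a minimal CVXPY sketch of the PSD case (the entry-fixing toy constraint set, $\delta$, and the iteration count are our own choices):

```python
import cvxpy as cp
import numpy as np

# Log-det heuristic as iterative reweighted trace minimization
# (sketch; the entry-fixing toy constraint set C is our own choice).
n, delta = 8, 1e-3
X = cp.Variable((n, n), PSD=True)
cons = [X[0, 0] == 2, X[1, 1] == 2, X[0, 1] == 1]    # toy X in C
W = np.eye(n)        # with X(0) = I the first pass is the trace heuristic
for k in range(10):
    cp.Problem(cp.Minimize(cp.trace(W @ X)), cons).solve()
    W = np.linalg.inv(X.value + delta * np.eye(n))   # reweight
print(np.linalg.matrix_rank(X.value, tol=1e-6))      # low-rank solution
```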
Interpretation of Log-det Heuristic
If we choose $\mathbf{X}^{(0)} = \mathbf{I}$, the first iteration is equivalent to minimizing the trace of $\mathbf{X}$, which is just the trace heuristic. The iterations that follow try to reduce the rank further. In this sense, we can view this heuristic as a refinement of the trace heuristic.
At each iteration we solve a weighted trace minimization problem, with weights $\mathbf{W}^{(k)} = (\mathbf{X}^{(k)} + \delta\mathbf{I})^{-1}$. Thus, the log-det heuristic can be considered an extension of the iterative reweighted $\ell_1$-norm heuristic to the matrix case.
Log-det Heuristic for General Matrix
For a general nonsquare matrix $\mathbf{X}$, we can apply the log-det heuristic to the equivalent PSD form and obtain
$$
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{Y}, \mathbf{Z}}{\text{minimize}} & \log\det(\operatorname{blkdiag}(\mathbf{Y}, \mathbf{Z}) + \delta\mathbf{I}) \\
\text{subject to} & \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0 \\
& \mathbf{X} \in \mathcal{C}.
\end{array}
$$
Linearizing as before, at iteration $k$ we solve the following problem to get $\mathbf{X}^{(k+1)}$, $\mathbf{Y}^{(k+1)}$ and $\mathbf{Z}^{(k+1)}$:
$$
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{Y}, \mathbf{Z}}{\text{minimize}} & \operatorname{Tr}\left(\left(\operatorname{blkdiag}(\mathbf{Y}^{(k)}, \mathbf{Z}^{(k)}) + \delta\mathbf{I}\right)^{-1} \operatorname{blkdiag}(\mathbf{Y}, \mathbf{Z})\right) \\
\text{subject to} & \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0 \\
& \mathbf{X} \in \mathcal{C}.
\end{array}
$$
Matrix Factorization based Method
The idea behind factorization-based methods is that $\operatorname{rank}(\mathbf{X}) \le r$ if and only if $\mathbf{X}$ can be factorized as $\mathbf{X} = \mathbf{F}\mathbf{G}$, where $\mathbf{F} \in \mathbb{R}^{m \times r}$ and $\mathbf{G} \in \mathbb{R}^{r \times n}$.
For each given $r$, we check whether there exists a feasible $\mathbf{X}$ of rank less than or equal to $r$ by checking whether some $\mathbf{X} \in \mathcal{C}$ can be factored as above.
The constraint $\mathbf{X} = \mathbf{F}\mathbf{G}$ is not convex in $\mathbf{X}$, $\mathbf{F}$ and $\mathbf{G}$ simultaneously, but it is convex in $(\mathbf{X}, \mathbf{F})$ when $\mathbf{G}$ is fixed and convex in $(\mathbf{X}, \mathbf{G})$ when $\mathbf{F}$ is fixed.
Various heuristics can be applied to handle this non-convex equality constraint, but they are not guaranteed to find an $\mathbf{X}$ with rank $r$ even if one exists.
Matrix Factorization based Method
Coordinate descent method: fix $\mathbf{F}$ and $\mathbf{G}$ one at a time and solve a convex problem at each iteration.

Choose $\mathbf{F}^{(0)} \in \mathbb{R}^{m \times r}$. Set $k = 1$.
repeat
    $(\tilde{\mathbf{X}}^{(k)}, \mathbf{G}^{(k)}) = \underset{\mathbf{X} \in \mathcal{C},\, \mathbf{G} \in \mathbb{R}^{r \times n}}{\operatorname{argmin}} \|\mathbf{X} - \mathbf{F}^{(k-1)}\mathbf{G}\|_F$
    $(\mathbf{X}^{(k)}, \mathbf{F}^{(k)}) = \underset{\mathbf{X} \in \mathcal{C},\, \mathbf{F} \in \mathbb{R}^{m \times r}}{\operatorname{argmin}} \|\mathbf{X} - \mathbf{F}\mathbf{G}^{(k)}\|_F$
    $e^{(k)} = \|\mathbf{X}^{(k)} - \mathbf{F}^{(k)}\mathbf{G}^{(k)}\|_F$
until $e^{(k)} \le \epsilon$, or $e^{(k-1)}$ and $e^{(k)}$ are approximately equal.
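Specialized to matrix completion, where $\mathcal{C}$ fixes the observed entries (so $\mathbf{X}$ can be eliminated on $\Omega$ and each subproblem splits into small least-squares problems), the scheme becomes alternating least squares. A minimal numpy sketch with our own toy data; this specialization is ours, not taken from the slides:

```python
import numpy as np

# Alternating least-squares sketch of the factorization heuristic,
# specialized to matrix completion (C = {X : X_ij = R_ij on Omega}).
np.random.seed(0)
m, n, r = 30, 20, 3
R = np.random.randn(m, r) @ np.random.randn(r, n)    # low-rank ratings
mask = np.random.rand(m, n) < 0.6                    # observed set Omega

F, G = np.random.randn(m, r), np.random.randn(r, n)
for k in range(100):
    for j in range(n):                               # G-step: fix F
        rows = mask[:, j]
        G[:, j] = np.linalg.lstsq(F[rows], R[rows, j], rcond=None)[0]
    for i in range(m):                               # F-step: fix G
        cols = mask[i, :]
        F[i] = np.linalg.lstsq(G[:, cols].T, R[i, cols], rcond=None)[0]

err = np.linalg.norm(mask * (F @ G - R)) / np.linalg.norm(mask * R)
print(f"relative error on observed entries: {err:.2e}")
```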
Rank Constraint via Convex Iteration
Consider a semidefinite rank-constrained feasibility problem:
$$
\begin{array}{ll}
\underset{\mathbf{X} \in \mathbb{S}^n}{\text{find}} & \mathbf{X} \\
\text{subject to} & \mathbf{X} \in \mathcal{C} \\
& \mathbf{X} \succeq 0 \\
& \operatorname{rank}(\mathbf{X}) \le r.
\end{array}
$$
It is proposed in [Dattorro, 2005] to solve this problem by iteratively solving the following two convex problems:
$$
\begin{array}{ll}
\underset{\mathbf{X}}{\text{minimize}} & \operatorname{Tr}(\mathbf{W}^\star \mathbf{X}) \\
\text{subject to} & \mathbf{X} \in \mathcal{C} \\
& \mathbf{X} \succeq 0
\end{array}
\qquad
\begin{array}{ll}
\underset{\mathbf{W}}{\text{minimize}} & \operatorname{Tr}(\mathbf{W} \mathbf{X}^\star) \\
\text{subject to} & 0 \preceq \mathbf{W} \preceq \mathbf{I} \\
& \operatorname{Tr}(\mathbf{W}) = n - r,
\end{array}
$$
where $\mathbf{W}^\star$ is the optimal solution of the second problem and $\mathbf{X}^\star$ is the optimal solution of the first problem.
Rank Constraint via Convex Iteration
An optimal solution to the second problem is known in closed form. Given the non-increasingly ordered eigendecomposition $\mathbf{X}^\star = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$, the matrix $\mathbf{W}^\star = \mathbf{U}^\star \mathbf{U}^{\star T}$ is optimal, where $\mathbf{U}^\star = \mathbf{Q}(:, r+1:n) \in \mathbb{R}^{n \times (n-r)}$, and
$$
\operatorname{Tr}(\mathbf{W}^\star \mathbf{X}^\star) = \sum_{i=r+1}^{n} \lambda_i(\mathbf{X}^\star).
$$
We start from $\mathbf{W} = \mathbf{I}$ and iteratively solve the two convex problems. Note that in the first iteration the first problem is just the “trace heuristic”.
Suppose that at convergence $\operatorname{Tr}(\mathbf{W}^\star \mathbf{X}^\star) = \tau$. If $\tau = 0$, then $\operatorname{rank}(\mathbf{X}^\star) \le r$ and $\mathbf{X}^\star$ is a feasible point. But this is not guaranteed; only local convergence can be established, i.e., convergence to some $\tau \ge 0$.
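Only the first subproblem needs a solver, since $\mathbf{W}^\star$ has the closed form above. A minimal CVXPY sketch (the entry-fixing toy constraint set and the tolerances are our own choices):

```python
import cvxpy as cp
import numpy as np

# Convex iteration for a rank constraint (illustrative sketch; the
# entry-fixing toy constraint set C and tolerances are our own).
n, r = 8, 2
X = cp.Variable((n, n), PSD=True)
cons = [X[0, 0] == 2, X[1, 1] == 2, X[0, 1] == 1]    # toy X in C
W = np.eye(n)                                        # first pass = trace heuristic
for k in range(20):
    cp.Problem(cp.Minimize(cp.trace(W @ X)), cons).solve()
    lam, Q = np.linalg.eigh(X.value)                 # ascending eigenvalues
    U = Q[:, : n - r]                                # eigvecs of n-r smallest
    W = U @ U.T                                      # closed-form W*
    tau = lam[: n - r].sum()                         # sum of n-r smallest
    if tau < 1e-8:                                   # rank(X) <= r reached
        break
print(tau, np.linalg.matrix_rank(X.value, tol=1e-6))
```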
Rank Constraint via Convex Iteration
For a general nonsquare matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$, we have an equivalent PSD form:
$$
\begin{array}{ll}
\underset{\mathbf{X} \in \mathbb{R}^{m \times n}}{\text{find}} & \mathbf{X} \\
\text{subject to} & \mathbf{X} \in \mathcal{C} \\
& \operatorname{rank}(\mathbf{X}) \le r
\end{array}
\quad \Longleftrightarrow \quad
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{Y}, \mathbf{Z}}{\text{find}} & \mathbf{X} \\
\text{subject to} & \mathbf{X} \in \mathcal{C} \\
& \mathbf{G} = \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0 \\
& \operatorname{rank}(\mathbf{G}) \le r.
\end{array}
$$
The same convex iteration can be applied now. Note that if we start from $\mathbf{W} = \mathbf{I}$, the first problem is just the “nuclear norm heuristic” in the first iteration.
Recommender Systems
How does Amazon recommend commodities?
How does Netflix recommend movies?
Netflix Prize
Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings
to highest accuracy
17,770 total movies, 480,189 total users
How to fill in the blanks?
Can you improve the recommendation accuracy by 10% over what Netflix was using? ⟹ One million dollars!
Abstract Setup: Matrix Completion
Consider a rating matrix $\mathbf{R} \in \mathbb{R}^{m \times n}$ with $R_{ij}$ representing the rating user $i$ gives to movie $j$.
But some $R_{ij}$ are unknown, since no one watches all movies:
$$
\mathbf{R} = \begin{bmatrix}
2 & 3 & ? & ? & 5 & ? \\
1 & ? & ? & 4 & ? & 3 \\
? & ? & 3 & 2 & ? & 5 \\
4 & ? & 3 & ? & 2 & 4
\end{bmatrix}
$$
(rows correspond to users, columns to movies).
We would like to predict how users will like unwatched movies.
Structure of the Rating Matrix
The rating matrix is very big: $480{,}189$ (number of users) times $17{,}770$ (number of movies) in the Netflix case.
But there are far fewer types of people and movies than there are people and movies.
So it is reasonable to assume that for each user $i$ there is a $k$-dimensional vector $\mathbf{p}_i$ explaining the user's movie taste, and for each movie $j$ there is a $k$-dimensional vector $\mathbf{q}_j$ explaining the movie's appeal, with the inner product $\mathbf{p}_i^T \mathbf{q}_j$ giving the rating user $i$ gives to movie $j$, i.e., $R_{ij} = \mathbf{p}_i^T \mathbf{q}_j$. Equivalently, in matrix form, $\mathbf{R}$ is factorized as $\mathbf{R} = \mathbf{P}^T \mathbf{Q}$, where $\mathbf{P} \in \mathbb{R}^{k \times m}$, $\mathbf{Q} \in \mathbb{R}^{k \times n}$, $k \ll \min(m, n)$.
This is the same as assuming the matrix $\mathbf{R}$ has low rank.
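A quick numpy check of the claim (the scaled-down dimensions are our own):

```python
import numpy as np

# R = P^T Q with k << min(m, n) forces rank(R) <= k (minimal check;
# the dimensions are scaled-down stand-ins for the Netflix sizes).
m, n, k = 480, 177, 5
P, Q = np.random.randn(k, m), np.random.randn(k, n)
R = P.T @ Q
print(np.linalg.matrix_rank(R))   # = k
```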
Matrix Completion
The true rank is unknown, so a natural approach is to find the minimum rank solution:
$$
\begin{array}{ll}
\underset{\mathbf{X}}{\text{minimize}} & \operatorname{rank}(\mathbf{X}) \\
\text{subject to} & X_{ij} = R_{ij}, \; \forall (i, j) \in \Omega,
\end{array}
$$
where $\Omega$ is the set of observed entries.
In practice, instead of requiring strict equality for the observed entries, one may allow some error, and the formulation becomes
$$
\begin{array}{ll}
\underset{\mathbf{X}}{\text{minimize}} & \operatorname{rank}(\mathbf{X}) \\
\text{subject to} & \sum_{(i,j) \in \Omega} (X_{ij} - R_{ij})^2 \le \epsilon.
\end{array}
$$
Then all the heuristics can be applied, e.g., the log-det heuristic or matrix factorization.
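With the nuclear norm surrogate, the noisy variant stays convex; a minimal CVXPY sketch (the toy ratings, noise level, mask, and $\epsilon$ are our own choices):

```python
import cvxpy as cp
import numpy as np

# Noisy matrix completion with the nuclear norm surrogate (sketch;
# the toy data, noise level, and epsilon are our own choices).
np.random.seed(0)
m, n, k = 30, 20, 2
R = np.random.randn(m, k) @ np.random.randn(k, n)    # low-rank ratings
mask = (np.random.rand(m, n) < 0.6).astype(float)    # observed set Omega
R_obs = R + 0.01 * np.random.randn(m, n)             # noisy observations
eps = 0.05

X = cp.Variable((m, n))
residual = cp.multiply(mask, X - R_obs)              # error on Omega only
cp.Problem(cp.Minimize(cp.norm(X, "nuc")),
           [cp.sum_squares(residual) <= eps]).solve()
print(np.linalg.matrix_rank(X.value, tol=1e-4))
```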
What did the Winners Use?
What algorithm did the final winner of the Netflix Prize use?
You can find the report from the Netflix Prize website. The winning
solution is really a cocktail of many methods combined and thousands
of model parameters fine-tuned specially to the training set provided
by Netflix.
But one key idea they used is just the factorization of the rating matrix as the product of two low-rank matrices [Koren et al., 2009], [Koren and Bell, 2011].
Background Extraction from Video
Given a video sequence $\mathbf{F}_i$, $i = 1, \ldots, n$.
The objective is to extract the background of the video sequence, i.e., to separate the background from the human activities.
Low-Rank Matrix + Sparse Matrix
Stack the vectorized frames to group the video sequence as
$$
\mathbf{Y} = [\operatorname{vec}(\mathbf{F}_1), \ldots, \operatorname{vec}(\mathbf{F}_n)].
$$
The background component is of low rank, since the background is static within a short period (ideally it is rank one, as the image would be the same).
The foreground component is sparse, as activities only occupy a small fraction of space.
The problem fits into the following signal model:
$$
\mathbf{Y} = \mathbf{X} + \mathbf{E},
$$
where $\mathbf{Y}$ is the observation, $\mathbf{X}$ is a low-rank matrix (the background) and $\mathbf{E}$ is a sparse matrix (the foreground).
Low-Rank and Sparse Matrix Recovery
Low-rank and sparse matrix recovery:
$$
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{E}}{\text{minimize}} & \operatorname{rank}(\mathbf{X}) + \gamma \|\operatorname{vec}(\mathbf{E})\|_0 \\
\text{subject to} & \mathbf{Y} = \mathbf{X} + \mathbf{E}.
\end{array}
$$
Applying the nuclear norm heuristic and the $\ell_1$-norm heuristic simultaneously:
$$
\begin{array}{ll}
\underset{\mathbf{X}, \mathbf{E}}{\text{minimize}} & \|\mathbf{X}\|_* + \gamma \|\operatorname{vec}(\mathbf{E})\|_1 \\
\text{subject to} & \mathbf{Y} = \mathbf{X} + \mathbf{E}.
\end{array}
$$
Theoretical results indicate that when $\mathbf{X}$ is of low rank and $\mathbf{E}$ is sparse enough, exact recovery happens with high probability [Wright et al., 2009].
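A minimal CVXPY sketch of the convex relaxation (the synthetic data and the common choice $\gamma = 1/\sqrt{\max(m, n)}$ are our own; this is not the authors' code):

```python
import cvxpy as cp
import numpy as np

# Robust PCA: nuclear norm + l1 relaxation (illustrative sketch;
# synthetic data and gamma = 1/sqrt(max(m, n)) are our own choices).
np.random.seed(0)
m, n, r = 25, 25, 2
L0 = np.random.randn(m, r) @ np.random.randn(r, n)             # low rank
S0 = 5 * np.random.randn(m, n) * (np.random.rand(m, n) < 0.05) # sparse
Y = L0 + S0

X = cp.Variable((m, n))
E = cp.Variable((m, n))
gamma = 1.0 / np.sqrt(max(m, n))
cp.Problem(cp.Minimize(cp.norm(X, "nuc") + gamma * cp.sum(cp.abs(E))),
           [Y == X + E]).solve()
print(np.linalg.norm(X.value - L0) / np.linalg.norm(L0))       # recovery error
```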
Background Extraction Result
Result (figure not reproduced): row 1 shows the original video sequence; row 2, the extracted low-rank background; row 3, the extracted sparse foreground.
Summary
We have introduced the rank minimization problem.
We have developed different heuristics to solve the rank minimization
problem:
Nuclear norm heuristic
Log-det heuristic
Matrix factorization based method
Rank constraint via convex iteration
Real applications have been presented.
References
Candès, E. J. and Recht, B. (2009).
Exact matrix completion via convex optimization.
Foundations of Computational Mathematics, 9(6):717–772.
Dattorro, J. (2005).
Convex Optimization & Euclidean Distance Geometry.
Meboo Publishing USA.
Fazel, M. (2002).
Matrix rank minimization with applications.
PhD thesis, Stanford University.
Koren, Y. and Bell, R. (2011).
Advances in collaborative filtering.
In Recommender Systems Handbook, pages 145–186. Springer.
Koren, Y., Bell, R., and Volinsky, C. (2009).
Matrix factorization techniques for recommender systems.
Computer, 42(8):30–37.
Wright, J., Ganesh, A., Rao, S., Peng, Y., and Ma, Y. (2009).
Robust principal component analysis: Exact recovery of corrupted low-rank matrices via
convex optimization.
In Advances in Neural Information Processing Systems, pages 2080–2088.
Thanks
For more information visit:
http://www.ece.ust.hk/~palomar