The summary is:
1. The singular value decomposition (SVD) provides a factorization of any matrix A as A = UΣVᵀ, where U and V are orthogonal matrices and Σ is a diagonal matrix of singular values.
2. The SVD exists for any matrix A, as can be shown by considering the eigendecompositions of the positive semidefinite matrices AᵀA and AAᵀ.
3. The SVD is useful for finding least squares solutions to overdetermined or underdetermined systems of linear equations, by using the pseudoinverse of A.
1. Lecture 10: The Singular Value Decomposition
Nicholas Ruozzi
University of Texas at Dallas
2. Motivation
• We just spent a lot of time looking at interesting consequences of the observation that any symmetric matrix A can be factorized as A = QDQᵀ for some orthogonal matrix Q and some diagonal matrix D
• Is there a generalization of this for arbitrary matrices?
• Yes! It is based on the same eigenvector/eigenvalue construction as above, even though an arbitrary matrix isn't symmetric
3. The Singular Value Decomposition
• Every matrix A ∈ ℝ^(m×n) can be factorized as A = UΣVᵀ, where U ∈ ℝ^(m×m) and V ∈ ℝ^(n×n) are orthogonal matrices and Σ ∈ ℝ^(m×n) is a diagonal matrix
• The diagonal entries of Σ are called the singular values of the matrix A
• We'll show that we can use this decomposition similarly to the eigenvalue decomposition, e.g., for finding least squares solutions of linear systems
• But first, let's show that the SVD exists
4. Existence of the SVD
• Let A ∈ ℝ^(m×n) be an arbitrary matrix and consider the positive semidefinite matrix AᵀA
• For now, let's assume that AᵀA is full rank, i.e., it has no zero eigenvalues
• It can be written as AᵀA = ∑_{i=1}^{n} λᵢ v^(i)(v^(i))ᵀ, where the v's are orthonormal eigenvectors in ℝⁿ
• Define σᵢ = √λᵢ
• Let u^(i) = Av^(i)/σᵢ. Note that (u^(i))ᵀu^(i) = 1 and AAᵀu^(i) = λᵢu^(i)
5. Existence of the SVD
• Let A ∈ ℝ^(m×n) be an arbitrary matrix and consider the positive semidefinite matrix AᵀA
• Let u^(i) = Av^(i)/σᵢ. Note that (u^(i))ᵀu^(i) = 1 and AAᵀu^(i) = λᵢu^(i)
• The u's are orthonormal eigenvectors of AAᵀ
• Let Σ ∈ ℝ^(m×n) be a diagonal matrix with Σᵢᵢ = σᵢ and Σ⁻¹ ∈ ℝ^(n×m) be a diagonal matrix with (Σ⁻¹)ᵢᵢ = 1/σᵢ
• Now, by construction,
U = AVΣ⁻¹
UΣ = AV
UΣVᵀ = A
6. Existence of the SVD
• Finally, if the positive semidefinite matrix AᵀA is not full rank, we can apply the same argument by constructing one u for each of the non-zero eigenvalues and then extending them to an orthonormal basis
• We keep the diagonal entries of both Σ and Σ⁻¹ equal to zero for vectors in the extension
• That's it!
*This nice argument was taken from a blog post of Gregory Gundersen
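The construction on the previous slides can be checked numerically. Below is a minimal numpy sketch (matrix size and seed are arbitrary), assuming AᵀA is full rank so every σᵢ > 0:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # tall matrix; A^T A is 3x3 and (almost surely) full rank

# Eigendecomposition of the p.s.d. matrix A^T A: columns of V are orthonormal eigenvectors
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]   # sort eigenvalues in decreasing order
sigma = np.sqrt(lam)             # singular values: sigma_i = sqrt(lambda_i)

# u^(i) = A v^(i) / sigma_i gives orthonormal eigenvectors of A A^T
U = A @ V / sigma                # shape (5, 3): the "thin" U

# Check the construction: U has orthonormal columns and U diag(sigma) V^T = A
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
```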
7. The Singular Value Decomposition
• Every matrix A ∈ ℝ^(m×n) can be factorized as A = UΣVᵀ, where U ∈ ℝ^(m×m) and V ∈ ℝ^(n×n) are orthogonal matrices and Σ ∈ ℝ^(m×n) is a diagonal matrix
• The diagonal entries of Σ are called the singular values of the matrix A, and they are the square roots of the eigenvalues of AᵀA and AAᵀ
• The columns of U are orthonormal eigenvectors of AAᵀ
• The columns of V are orthonormal eigenvectors of AᵀA
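These relationships can be verified with numpy's built-in SVD (the matrix here is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# Full SVD: U is 4x4, Vt is 6x6, s holds the min(m, n) singular values in decreasing order
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((4, 6))
Sigma[:4, :4] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)                        # A = U Sigma V^T
assert np.allclose(s**2, np.linalg.eigvalsh(A @ A.T)[::-1])  # sigma_i^2 = eigenvalues of A A^T
```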
8. Least Squares Solutions of Linear Systems

min_{x∈ℝⁿ} (1/2)‖x‖₂²
subject to
Ax = b

L(x, ν) = (1/2)xᵀx + νᵀ(Ax − b)

∇ₓL = x + Aᵀν = 0

ν = −QD⁺Qᵀb, where AAᵀ = QDQᵀ

x = −Aᵀν = AᵀQD⁺Qᵀb

Note that A = QΣVᵀ where Σᵢᵢ = √Dᵢᵢ
9. Least Squares Solutions of Linear Systems

min_{x∈ℝⁿ} (1/2)‖x‖₂²
subject to
Ax = b

L(x, ν) = (1/2)xᵀx + νᵀ(Ax − b)

∇ₓL = x + Aᵀν = 0

ν = −QD⁺Qᵀb, where AAᵀ = QDQᵀ

x = AᵀQD⁺Qᵀb = VΣᵀQᵀQD⁺Qᵀb = VΣ⁺Qᵀb

Note that A = QΣVᵀ where Σᵢᵢ = √Dᵢᵢ
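A quick numerical check of the minimum-norm formula x = VΣ⁺Uᵀb on an underdetermined system (random example matrix; np.linalg.lstsq serves as a reference, since it also returns the minimum-norm solution):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # underdetermined: Ax = b has many solutions
b = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x = Vt.T @ ((U.T @ b) / s)        # x = V Sigma^+ U^T b

assert np.allclose(A @ x, b)                    # x solves the system
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]  # lstsq also returns the min-norm solution
assert np.allclose(x, x_lstsq)
```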
10. The Moore-Penrose Pseudoinverse
• If the linear system Ax = b has a solution, then the solution of minimum norm is given by x = A⁺b, where A = UΣVᵀ is the singular value decomposition and

A⁺ = VΣ⁺Uᵀ

• If A is invertible, then A⁺ = A⁻¹
• All the interesting algebraic properties of A⁺ follow directly from the definition above, e.g.,

AA⁺A = UΣVᵀVΣ⁺UᵀUΣVᵀ = UΣΣ⁺ΣVᵀ = UΣVᵀ = A
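The algebraic identities can be confirmed with numpy's pinv, which computes A⁺ via the SVD (a rank-deficient example is chosen so that A is not invertible):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2, so not invertible

A_pinv = np.linalg.pinv(A)  # numpy computes A^+ = V Sigma^+ U^T via the SVD

# Moore-Penrose identities follow directly from the definition
assert np.allclose(A @ A_pinv @ A, A)
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
```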
11. Ex 1: Convex Quadratic Minimization

inf_{x∈ℝⁿ} (1/2)xᵀQx − sᵀx

Either Qx = s has a solution, or

inf_x (1/2)xᵀQx − sᵀx = −∞

Q ∈ ℝ^(n×n) is an arbitrary p.s.d. matrix
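A minimal concrete illustration of the second case, using a hypothetical rank-deficient Q and an s with a component outside its range: the objective is then unbounded below along the null space of Q.

```python
import numpy as np

# A rank-deficient p.s.d. Q whose range excludes s, so Qx = s has no solution
Q = np.diag([1.0, 0.0])   # null space spanned by e2
s = np.array([0.0, 1.0])  # s lies entirely in the null space of Q

def f(x):
    return 0.5 * x @ Q @ x - s @ x

# Along the null direction e2 the objective is f([0, t]) = -t: unbounded below
vals = [f(np.array([0.0, t])) for t in (1.0, 10.0, 100.0)]
assert vals == [-1.0, -10.0, -100.0]
```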
16. Pseudoinverse

min_{x∈ℝⁿ} xᵀAᵀAx − 2bᵀAx + bᵀb   (i.e., min ‖Ax − b‖₂²)

AᵀAx = Aᵀb

Let A = UΣVᵀ. Then

x = (AᵀA)⁺Aᵀb = V(Σ⁺)²VᵀVΣᵀUᵀb = VΣ⁺Uᵀb = A⁺b

The pseudoinverse solves the norm minimization problem even if there is no solution to Ax = b!
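A sketch verifying this on an overdetermined system with no exact solution (random example; lstsq as a reference):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2))  # overdetermined: Ax = b generally has no solution
b = rng.standard_normal(6)

x = np.linalg.pinv(A) @ b        # x = A^+ b minimizes ||Ax - b||_2^2

# The normal equations A^T A x = A^T b hold at the minimizer
assert np.allclose(A.T @ A @ x, A.T @ b)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```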
17. Ex 2: Convex Quadratic Minimization

min_{x∈ℝⁿ} (1/2)xᵀQx − sᵀx
subject to
x ≥ 0

L(x, λ) = (1/2)xᵀQx − sᵀx − λᵀx

Q ∈ ℝ^(n×n) is an arbitrary p.s.d. matrix
18. Ex 2: Convex Quadratic Minimization

min_{x∈ℝⁿ} (1/2)xᵀQx − sᵀx
subject to
x ≥ 0

L(x, λ) = (1/2)xᵀQx − sᵀx − λᵀx

Either Qx = s + λ has a solution, or

inf_x L(x, λ) = −∞
29. Eigenvectors of A + λI
• If x is an eigenvector of A with eigenvalue μ, then
• Ax = μx
• λIx = λx
• (A + λI)x = (μ + λ)x
• So, A + λI is positive semidefinite if and only if λ ≥ −μ_min, where μ_min is the smallest eigenvalue of A
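A quick numerical confirmation of the eigenvalue shift (random symmetric example):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # arbitrary symmetric matrix
lam = 0.7

mu = np.linalg.eigvalsh(A)  # eigenvalues of A, in ascending order
mu_shift = np.linalg.eigvalsh(A + lam * np.eye(4))

# Shifting by lambda*I shifts every eigenvalue by lambda (same eigenvectors)
assert np.allclose(mu_shift, mu + lam)

# A + lambda*I is p.s.d. exactly when lambda >= -mu_min
mu_min = mu[0]
assert np.all(np.linalg.eigvalsh(A + (-mu_min) * np.eye(4)) >= -1e-9)
```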
30. Non-Convex Quadratic Minimization

min_{x∈ℝⁿ} xᵀAx + 2bᵀx
subject to
xᵀx ≤ 1

L(x, λ) = xᵀAx + 2bᵀx + λ(xᵀx − 1)
        = xᵀ(A + λI)x + 2bᵀx − λ

max_{λ≥0} −bᵀ(A + λI)⁺b − λ
subject to
λ ≥ −μ_min
∃x s.t. (A + λI)x = −b

A ∈ ℝ^(n×n) is an arbitrary symmetric matrix
31. Range of a Matrix
• For a symmetric matrix A + λI, (A + λI)x = −b has a solution if and only if b is in the span of the eigenvectors of A + λI with non-zero eigenvalues (note the eigenvectors of A + λI are the same as the orthonormal eigenvectors of A)
• In this case, bᵀ(A + λI)⁺b = ∑_{i: λ+μᵢ>0} ((x^(i))ᵀb)² / (λ + μᵢ)
• Otherwise, the constraint that ∃x s.t. (A + λI)x = −b is violated, i.e., there exists an eigenvector x^(j) of A such that (A + λI)x^(j) = 0 and bᵀx^(j) ≠ 0
• Note that, for large enough λ, this system always has a solution; it suffices to take λ > −min_{j: bᵀx^(j)≠0} μⱼ
32. Non-Convex Quadratic Minimization

min_{x∈ℝⁿ} xᵀAx + 2bᵀx
subject to
xᵀx ≤ 1

max_{λ≥0} −f(λ) − λ
subject to
λ ≥ −μ_min

where

f(λ) = ∞, if ∃j s.t. bᵀx^(j) ≠ 0 and λ + μⱼ = 0
f(λ) = ∑_{i: λ+μᵢ>0} ((x^(i))ᵀb)² / (λ + μᵢ), otherwise

A ∈ ℝ^(n×n) is an arbitrary symmetric matrix
33. Non-Convex Quadratic Minimization

min_{x∈ℝⁿ} xᵀAx + 2bᵀx
subject to
xᵀx ≤ 1

max_{λ≥0} −f(λ) − λ
subject to
λ ≥ −μ_min

where

f(λ) = ∞, if ∃j s.t. bᵀx^(j) ≠ 0 and λ + μⱼ = 0
f(λ) = ∑_{i: λ+μᵢ>0} ((x^(i))ᵀb)² / (λ + μᵢ), otherwise

A ∈ ℝ^(n×n) is an arbitrary symmetric matrix

How would you solve this problem?
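One possible answer, sketched below (this function is illustrative, not from the lecture): diagonalize A once, after which the dual reduces to a one-dimensional search over λ. Since ‖x(λ)‖ decreases in λ, bisection on ‖x(λ)‖² = 1 finds the boundary multiplier; the so-called "hard case" of this trust-region-style subproblem is deliberately not handled.

```python
import numpy as np

def trust_region_1(A, b, tol=1e-10):
    """Minimize x^T A x + 2 b^T x subject to x^T x <= 1, for symmetric A.

    Sketch: search for lambda >= max(0, -mu_min) such that
    x(lambda) = -(A + lambda*I)^+ b satisfies the ball constraint.
    The 'hard case' (b orthogonal to the bottom eigenspace) is not handled.
    """
    mu, X = np.linalg.eigh(A)       # A = X diag(mu) X^T, mu ascending
    c = X.T @ b                     # coordinates of b in the eigenbasis

    def norm2(lam):                 # ||x(lambda)||^2 = sum_i c_i^2 / (lam + mu_i)^2
        return np.sum((c / (lam + mu)) ** 2)

    lo = max(0.0, -mu[0]) + 1e-12
    if norm2(lo) <= 1.0:            # constraint inactive at the smallest feasible lambda
        lam = lo
    else:
        hi = lo + 1.0
        while norm2(hi) > 1.0:      # grow the bracket: ||x(lambda)|| decreases in lambda
            hi *= 2.0
        while hi - lo > tol:        # bisect on ||x(lambda)||^2 = 1
            mid = 0.5 * (lo + hi)
            if norm2(mid) > 1.0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
    x = -X @ (c / (lam + mu))       # x = -(A + lambda*I)^+ b
    return x, lam

# Indefinite example: the solution lies on the boundary of the unit ball
x_star, lam_star = trust_region_1(np.diag([-2.0, 1.0]), np.array([0.5, 0.5]))
```

Here λ* exceeds −μ_min = 2, and (A + λ*I)x* = −b holds at the computed multiplier.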
34. SVD Reformulated
• Let A ∈ ℝ^(m×n) with SVD A = UΣVᵀ
• A = ∑_{i=1}^{min(m,n)} σᵢ u^(i)(v^(i))ᵀ
• Without loss of generality, we can assume that the singular values are in decreasing order, i.e., σ₁ = Σ₁₁ ≥ σ₂ = Σ₂₂ ≥ ⋯ ≥ σₙ = Σₙₙ
• ‖A‖_F² = trace(AᵀA) = ∑ᵢ σᵢ²
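Checking ‖A‖_F² = trace(AᵀA) = ∑ᵢ σᵢ² numerically (arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 4))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# ||A||_F^2 = trace(A^T A) = sum of squared singular values
fro2 = np.trace(A.T @ A)
assert np.isclose(fro2, np.sum(s**2))
assert np.isclose(fro2, np.linalg.norm(A, 'fro')**2)
```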
35. Applications of SVD: Low Rank Approximations

B* = argmin_{B s.t. rank(B)=k} ‖A − B‖_F²

A = UΣVᵀ = ∑_{i=1}^{min(m,n)} σᵢ u^(i)(v^(i))ᵀ

B* = ∑_{i=1}^{k} σᵢ u^(i)(v^(i))ᵀ

Note: the σ's must be in decreasing order for this!
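A sketch of the truncated-SVD approximation; by the Eckart–Young theorem, the squared Frobenius error of the best rank-k approximation equals the sum of the discarded σᵢ² (random example matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is already in decreasing order

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # keep only the k largest singular values

# Error of the best rank-k approximation: sum of the discarded sigma_i^2
err = np.linalg.norm(A - B, 'fro')**2
assert np.isclose(err, np.sum(s[k:]**2))
assert np.linalg.matrix_rank(B) == k
```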