The summary is:
1. The singular value decomposition (SVD) provides a factorization of any matrix A as A = UΣVᵀ, where U and V are orthogonal matrices and Σ is a diagonal matrix of singular values.
2. The SVD exists for any matrix A, as can be shown by considering the eigendecompositions of the positive semidefinite matrices AᵀA and AAᵀ.
3. The SVD is useful for finding least squares solutions to overdetermined or underdetermined systems of linear equations, by using the pseudoinverse of A.
1. Lecture 10: The Singular Value Decomposition
Nicholas Ruozzi
University of Texas at Dallas
2. Motivation
• We just spent a lot of time looking at interesting consequences of the observation that any symmetric matrix A can be factorized as A = QDQᵀ for some orthogonal matrix Q and some diagonal matrix D
• Is there a generalization of this for arbitrary matrices?
• Yes! It is based on the same eigenvector/eigenvalue construction as above, even though an arbitrary matrix isn't symmetric
3. The Singular Value Decomposition
• Every matrix A ∈ ℝ^(m×n) can be factorized as A = UΣVᵀ, where U ∈ ℝ^(m×m) and V ∈ ℝ^(n×n) are orthogonal matrices and Σ ∈ ℝ^(m×n) is a diagonal matrix
• The diagonal entries of Σ are called the singular values of the matrix A
• We'll show that we can use this decomposition similarly to the eigenvalue decomposition, e.g., for finding least squares solutions of linear systems
• But first, let's show that the SVD exists
4. Existence of the SVD
• Let A ∈ ℝ^(m×n) be an arbitrary matrix and consider the positive semidefinite matrix AᵀA
• For now, let's assume that AᵀA is full rank, i.e., it has no zero eigenvalues
• It can be written as AᵀA = ∑_{i=1}^{n} λᵢ v^(i)(v^(i))ᵀ, where the v's are orthonormal eigenvectors in ℝⁿ
• Define σᵢ = √λᵢ
• Let u^(i) = Av^(i)/σᵢ. Note that (u^(i))ᵀu^(i) = 1 and AAᵀu^(i) = λᵢu^(i)
5. Existence of the SVD
• Let A ∈ ℝ^(m×n) be an arbitrary matrix and consider the positive semidefinite matrix AᵀA
• Let u^(i) = Av^(i)/σᵢ. Note that (u^(i))ᵀu^(i) = 1 and AAᵀu^(i) = λᵢu^(i)
• The u's are orthonormal eigenvectors of AAᵀ
• Let Σ ∈ ℝ^(m×n) be a diagonal matrix with Σᵢᵢ = σᵢ and Σ⁻¹ ∈ ℝ^(n×m) be a diagonal matrix with (Σ⁻¹)ᵢᵢ = 1/σᵢ
• Now, by construction,
U = AVΣ⁻¹
UΣ = AV
UΣVᵀ = A
6. Existence of the SVD
• Finally, if the positive semidefinite matrix AᵀA is not full rank, we can apply the same argument by constructing one u for each of the non-zero eigenvalues and then extending them to an orthonormal basis
• We keep the diagonal entries of both Σ and Σ⁻¹ equal to zero for vectors in the extension
• That's it!
*This nice argument was taken from a blog post of Gregory Gundersen
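The construction on the previous slides can be checked numerically. Below is a minimal numpy sketch (matrix size and seed are arbitrary), assuming AᵀA is full rank so every σᵢ > 0:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # tall matrix; A^T A is 3x3 and (almost surely) full rank

# Eigendecomposition of the p.s.d. matrix A^T A: columns of V are orthonormal eigenvectors
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]   # sort eigenvalues in decreasing order
sigma = np.sqrt(lam)             # singular values: sigma_i = sqrt(lambda_i)

# u^(i) = A v^(i) / sigma_i gives orthonormal eigenvectors of A A^T
U = A @ V / sigma                # shape (5, 3): the "thin" U

# Check the construction: U has orthonormal columns and U diag(sigma) V^T = A
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
```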
7. The Singular Value Decomposition
• Every matrix A ∈ ℝ^(m×n) can be factorized as A = UΣVᵀ, where U ∈ ℝ^(m×m) and V ∈ ℝ^(n×n) are orthogonal matrices and Σ ∈ ℝ^(m×n) is a diagonal matrix
• The diagonal entries of Σ are called the singular values of the matrix A, and they are the square roots of the eigenvalues of AᵀA and AAᵀ
• The columns of U are orthonormal eigenvectors of AAᵀ
• The columns of V are orthonormal eigenvectors of AᵀA
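These relationships can be verified with numpy's built-in SVD (the matrix here is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# Full SVD: U is 4x4, Vt is 6x6, s holds the min(m, n) singular values in decreasing order
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((4, 6))
Sigma[:4, :4] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)                        # A = U Sigma V^T
assert np.allclose(s**2, np.linalg.eigvalsh(A @ A.T)[::-1])  # sigma_i^2 = eigenvalues of A A^T
```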
8. Least Squares Solutions of Linear Systems

min_{x∈ℝⁿ} (1/2)‖x‖₂²
subject to
Ax = b

L(x, ν) = (1/2)xᵀx + νᵀ(Ax − b)

∇ₓL = x + Aᵀν = 0

ν = −QD⁺Qᵀb, where AAᵀ = QDQᵀ

x = −Aᵀν = AᵀQD⁺Qᵀb

Note that A = QΣVᵀ where Σᵢᵢ = √Dᵢᵢ
9. Least Squares Solutions of Linear Systems

min_{x∈ℝⁿ} (1/2)‖x‖₂²
subject to
Ax = b

L(x, ν) = (1/2)xᵀx + νᵀ(Ax − b)

∇ₓL = x + Aᵀν = 0

ν = −QD⁺Qᵀb, where AAᵀ = QDQᵀ

x = AᵀQD⁺Qᵀb = VΣᵀQᵀQD⁺Qᵀb = VΣ⁺Qᵀb

Note that A = QΣVᵀ where Σᵢᵢ = √Dᵢᵢ
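A quick numerical check of the minimum-norm formula x = VΣ⁺Uᵀb on an underdetermined system (random example matrix; np.linalg.lstsq serves as a reference, since it also returns the minimum-norm solution):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # underdetermined: Ax = b has many solutions
b = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x = Vt.T @ ((U.T @ b) / s)        # x = V Sigma^+ U^T b

assert np.allclose(A @ x, b)                    # x solves the system
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]  # lstsq also returns the min-norm solution
assert np.allclose(x, x_lstsq)
```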
10. The Moore-Penrose Pseudoinverse
• If the linear system Ax = b has a solution, then the solution of minimum norm is given by x = A⁺b, where A = UΣVᵀ is the singular value decomposition and

A⁺ = VΣ⁺Uᵀ

• If A is invertible, then A⁺ = A⁻¹
• All the interesting algebraic properties of A⁺ follow directly from the definition above, e.g.,

AA⁺A = UΣVᵀVΣ⁺UᵀUΣVᵀ = UΣΣ⁺ΣVᵀ = UΣVᵀ = A
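The algebraic identities can be confirmed with numpy's pinv, which computes A⁺ via the SVD (a rank-deficient example is chosen so that A is not invertible):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2, so not invertible

A_pinv = np.linalg.pinv(A)  # numpy computes A^+ = V Sigma^+ U^T via the SVD

# Moore-Penrose identities follow directly from the definition
assert np.allclose(A @ A_pinv @ A, A)
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
```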
11. Ex 1: Convex Quadratic Minimization

inf_{x∈ℝⁿ} (1/2)xᵀQx − sᵀx

Either Qx = s has a solution, or

inf_x (1/2)xᵀQx − sᵀx = −∞

Q ∈ ℝ^(n×n) is an arbitrary p.s.d. matrix
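A minimal concrete illustration of the second case, using a hypothetical rank-deficient Q and an s with a component outside its range: the objective is then unbounded below along the null space of Q.

```python
import numpy as np

# A rank-deficient p.s.d. Q whose range excludes s, so Qx = s has no solution
Q = np.diag([1.0, 0.0])   # null space spanned by e2
s = np.array([0.0, 1.0])  # s lies entirely in the null space of Q

def f(x):
    return 0.5 * x @ Q @ x - s @ x

# Along the null direction e2 the objective is f([0, t]) = -t: unbounded below
vals = [f(np.array([0.0, t])) for t in (1.0, 10.0, 100.0)]
assert vals == [-1.0, -10.0, -100.0]
```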
16. Pseudoinverse

min_{x∈ℝⁿ} xᵀAᵀAx − 2bᵀAx + bᵀb   (i.e., min ‖Ax − b‖₂²)

AᵀAx = Aᵀb

Let A = UΣVᵀ. Then

x = (AᵀA)⁺Aᵀb = V(Σ⁺)²VᵀVΣᵀUᵀb = VΣ⁺Uᵀb = A⁺b

The pseudoinverse solves the norm minimization problem even if there is no solution to Ax = b!
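A sketch verifying this on an overdetermined system with no exact solution (random example; lstsq as a reference):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2))  # overdetermined: Ax = b generally has no solution
b = rng.standard_normal(6)

x = np.linalg.pinv(A) @ b        # x = A^+ b minimizes ||Ax - b||_2^2

# The normal equations A^T A x = A^T b hold at the minimizer
assert np.allclose(A.T @ A @ x, A.T @ b)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```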
17. Ex 2: Convex Quadratic Minimization

min_{x∈ℝⁿ} (1/2)xᵀQx − sᵀx
subject to
x ≥ 0

L(x, λ) = (1/2)xᵀQx − sᵀx − λᵀx

Q ∈ ℝ^(n×n) is an arbitrary p.s.d. matrix
18. Ex 2: Convex Quadratic Minimization

min_{x∈ℝⁿ} (1/2)xᵀQx − sᵀx
subject to
x ≥ 0

L(x, λ) = (1/2)xᵀQx − sᵀx − λᵀx

Either Qx = s + λ has a solution, or

inf_x L(x, λ) = −∞
29. Eigenvectors of A + λI
• If x is an eigenvector of A with eigenvalue μ, then
• Ax = μx
• λIx = λx
• (A + λI)x = (μ + λ)x
• So, A + λI is positive semidefinite if and only if λ ≥ −μ_min, where μ_min is the smallest eigenvalue of A
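A quick numerical confirmation of the eigenvalue shift (random symmetric example):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # arbitrary symmetric matrix
lam = 0.7

mu = np.linalg.eigvalsh(A)  # eigenvalues of A, in ascending order
mu_shift = np.linalg.eigvalsh(A + lam * np.eye(4))

# Shifting by lambda*I shifts every eigenvalue by lambda (same eigenvectors)
assert np.allclose(mu_shift, mu + lam)

# A + lambda*I is p.s.d. exactly when lambda >= -mu_min
mu_min = mu[0]
assert np.all(np.linalg.eigvalsh(A + (-mu_min) * np.eye(4)) >= -1e-9)
```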
30. Non-Convex Quadratic Minimization

min_{x∈ℝⁿ} xᵀAx + 2bᵀx
subject to
xᵀx ≤ 1

L(x, λ) = xᵀAx + 2bᵀx + λ(xᵀx − 1)
        = xᵀ(A + λI)x + 2bᵀx − λ

max_{λ≥0} −bᵀ(A + λI)⁺b − λ
subject to
λ ≥ −μ_min
∃x s.t. (A + λI)x = −b

A ∈ ℝ^(n×n) is an arbitrary symmetric matrix
31. Range of a Matrix
• For a symmetric matrix A + λI, (A + λI)x = −b has a solution if and only if b is in the span of the eigenvectors of A + λI with non-zero eigenvalues (note the eigenvectors of A + λI are the same as the orthonormal eigenvectors of A)
• In this case, bᵀ(A + λI)⁺b = ∑_{i: λ+μᵢ>0} ((x^(i))ᵀb)² / (λ + μᵢ)
• Otherwise, the constraint that ∃x s.t. (A + λI)x = −b is violated, i.e., there exists an eigenvector x^(j) of A such that (A + λI)x^(j) = 0 and bᵀx^(j) ≠ 0
• Note that, for large enough λ, this system always has a solution; it suffices to take λ > −min_{j: bᵀx^(j)≠0} μⱼ
32. Non-Convex Quadratic Minimization

min_{x∈ℝⁿ} xᵀAx + 2bᵀx
subject to
xᵀx ≤ 1

max_{λ≥0} −f(λ) − λ
subject to
λ ≥ −μ_min

where

f(λ) = ∞, if ∃j s.t. bᵀx^(j) ≠ 0 and λ + μⱼ = 0
f(λ) = ∑_{i: λ+μᵢ>0} ((x^(i))ᵀb)² / (λ + μᵢ), otherwise

A ∈ ℝ^(n×n) is an arbitrary symmetric matrix
33. Non-Convex Quadratic Minimization

min_{x∈ℝⁿ} xᵀAx + 2bᵀx
subject to
xᵀx ≤ 1

max_{λ≥0} −f(λ) − λ
subject to
λ ≥ −μ_min

where

f(λ) = ∞, if ∃j s.t. bᵀx^(j) ≠ 0 and λ + μⱼ = 0
f(λ) = ∑_{i: λ+μᵢ>0} ((x^(i))ᵀb)² / (λ + μᵢ), otherwise

A ∈ ℝ^(n×n) is an arbitrary symmetric matrix

How would you solve this problem?
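One possible answer, sketched below (this function is illustrative, not from the lecture): diagonalize A once, after which the dual reduces to a one-dimensional search over λ. Since ‖x(λ)‖ decreases in λ, bisection on ‖x(λ)‖² = 1 finds the boundary multiplier; the so-called "hard case" of this trust-region-style subproblem is deliberately not handled.

```python
import numpy as np

def trust_region_1(A, b, tol=1e-10):
    """Minimize x^T A x + 2 b^T x subject to x^T x <= 1, for symmetric A.

    Sketch: search for lambda >= max(0, -mu_min) such that
    x(lambda) = -(A + lambda*I)^+ b satisfies the ball constraint.
    The 'hard case' (b orthogonal to the bottom eigenspace) is not handled.
    """
    mu, X = np.linalg.eigh(A)       # A = X diag(mu) X^T, mu ascending
    c = X.T @ b                     # coordinates of b in the eigenbasis

    def norm2(lam):                 # ||x(lambda)||^2 = sum_i c_i^2 / (lam + mu_i)^2
        return np.sum((c / (lam + mu)) ** 2)

    lo = max(0.0, -mu[0]) + 1e-12
    if norm2(lo) <= 1.0:            # constraint inactive at the smallest feasible lambda
        lam = lo
    else:
        hi = lo + 1.0
        while norm2(hi) > 1.0:      # grow the bracket: ||x(lambda)|| decreases in lambda
            hi *= 2.0
        while hi - lo > tol:        # bisect on ||x(lambda)||^2 = 1
            mid = 0.5 * (lo + hi)
            if norm2(mid) > 1.0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
    x = -X @ (c / (lam + mu))       # x = -(A + lambda*I)^+ b
    return x, lam

# Indefinite example: the solution lies on the boundary of the unit ball
x_star, lam_star = trust_region_1(np.diag([-2.0, 1.0]), np.array([0.5, 0.5]))
```

Here λ* exceeds −μ_min = 2, and (A + λ*I)x* = −b holds at the computed multiplier.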
34. SVD Reformulated
• Let A ∈ ℝ^(m×n) with SVD A = UΣVᵀ
• A = ∑_{i=1}^{min(m,n)} σᵢ u^(i)(v^(i))ᵀ
• Without loss of generality, we can assume that the singular values are in decreasing order, i.e., σ₁ = Σ₁₁ ≥ σ₂ = Σ₂₂ ≥ ⋯ ≥ σₙ = Σₙₙ
• ‖A‖_F² = trace(AᵀA) = ∑ᵢ σᵢ²
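Checking ‖A‖_F² = trace(AᵀA) = ∑ᵢ σᵢ² numerically (arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 4))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# ||A||_F^2 = trace(A^T A) = sum of squared singular values
fro2 = np.trace(A.T @ A)
assert np.isclose(fro2, np.sum(s**2))
assert np.isclose(fro2, np.linalg.norm(A, 'fro')**2)
```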
35. Applications of SVD: Low Rank Approximations

B* = argmin_{B s.t. rank(B)=k} ‖A − B‖_F²

A = UΣVᵀ = ∑_{i=1}^{min(m,n)} σᵢ u^(i)(v^(i))ᵀ

B* = ∑_{i=1}^{k} σᵢ u^(i)(v^(i))ᵀ

Note: the σ's must be in decreasing order for this!
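A sketch of the truncated-SVD approximation; by the Eckart–Young theorem, the squared Frobenius error of the best rank-k approximation equals the sum of the discarded σᵢ² (random example matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is already in decreasing order

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # keep only the k largest singular values

# Error of the best rank-k approximation: sum of the discarded sigma_i^2
err = np.linalg.norm(A - B, 'fro')**2
assert np.isclose(err, np.sum(s[k:]**2))
assert np.linalg.matrix_rank(B) == k
```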