CHAPTER 1
Matrix Algebra
In this chapter we collect results related to matrix algebra which are
relevant to this book. Some specific topics which are typically not
found in standard books are also covered here.
1.1. Preliminaries
Standard notation for this chapter is given here. Matrices are denoted
by capital letters A, B, etc. They can be rectangular with m rows
and n columns. Their elements or entries are referred to with small
letters a_{ij}, b_{ij}, etc., where i denotes the row index and j denotes
the column index. Thus
A = \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
Mostly we consider complex matrices belonging to C^{m×n}. Sometimes
we will restrict our attention to real matrices belonging to R^{m×n}.
Definition 1.1 [Square matrix] An m × n matrix is called a square
matrix if m = n.
Definition 1.2 [Tall matrix] An m × n matrix is called a tall matrix
if m > n, i.e. the number of rows is greater than the number of columns.
Definition 1.3 [Wide matrix] An m × n matrix is called a wide
matrix if m < n, i.e. the number of columns is greater than the number of rows.
Definition 1.4 [Main diagonal] Let A = [aij] be an m×n matrix.
The main diagonal consists of entries a_{ij} where i = j, i.e. the main
diagonal is {a_{11}, a_{22}, . . . , a_{kk}} where k = min(m, n). The main diagonal
is also known as the leading diagonal, major diagonal, primary
diagonal or principal diagonal. The entries of A which are not
on the main diagonal are known as off-diagonal entries.
Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix
(usually a square matrix) whose entries outside the main diagonal
are zero.
Whenever we refer to a diagonal matrix which is not square, we
will use the term rectangular diagonal matrix.
A square diagonal matrix A is also represented by diag(a11, a22, . . . , ann)
which lists only the diagonal (non-zero) entries in A.
The transpose of a matrix A is denoted by A^T while the Hermitian
transpose is denoted by A^H. For real matrices A^T = A^H.
When matrices are square, we have the number of rows and columns
both equal to n and they belong to C^{n×n}.
If not specified, the square matrices will be of size n × n and rectangular
matrices will be of size m × n. If not specified, the vectors (column
vectors) will be of size n × 1 and belong to either R^n or C^n. Corresponding
row vectors will be of size 1 × n.
For statements which are valid both for real and complex matrices,
sometimes we might say that matrices belong to F^{m×n} while the scalars
belong to F and vectors belong to F^n, where F refers to either the field
of real numbers or the field of complex numbers. Note that this is not
consistently followed at the moment. Most results are written only for
C^{m×n} while still being applicable for R^{m×n}.
The identity matrix for F^{n×n} is denoted as I_n or simply I whenever the size
is clear from context.
Sometimes we will write a matrix in terms of its column vectors. We
will use the notation
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}
indicating n columns.
When we write a matrix in terms of its row vectors, we will use the
notation
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
indicating m rows with a_i being column vectors whose transposes form
the rows of A.
The rank of a matrix A is written as rank(A), while the determinant
as det(A) or |A|.
We say that an m × n matrix A is left-invertible if there exists an
n × m matrix B such that BA = I. We say that an m × n matrix A is
right-invertible if there exists an n × m matrix B such that AB = I.
We say that a square matrix A is invertible when there exists another
square matrix B of the same size such that AB = BA = I. A square
matrix is invertible iff it is both left and right invertible. The inverse of a
square invertible matrix is denoted by A^{-1}.
A special left or right inverse is the pseudo-inverse, which is denoted
by A^†.
Column space of a matrix is denoted by C(A), the null space by N(A),
and the row space by R(A).
We say that a matrix is symmetric when A = A^T, and conjugate
symmetric or Hermitian when A^H = A.
When a square matrix is not invertible, we say that it is singular. A
non-singular matrix is invertible.
The eigen values of a square matrix are written as λ1, λ2, . . . while the
singular values of a rectangular matrix are written as σ1, σ2, . . . .
The inner product or dot product of two column/row vectors u and
v belonging to R^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i v_i. (1.1.1)
The inner product or dot product of two column/row vectors u and
v belonging to C^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i \bar{v}_i. (1.1.2)
1.1.1. Block matrix
Definition 1.6 A block matrix is a matrix whose entries themselves
are matrices, with the following constraints:
(1) Entries in every row are matrices with the same number of
rows.
(2) Entries in every column are matrices with the same number
of columns.
Let A be an m × n block matrix. Then
A = \begin{bmatrix}
A_{11} & A_{12} & \dots & A_{1n} \\
A_{21} & A_{22} & \dots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \dots & A_{mn}
\end{bmatrix} (1.1.3)
where A_{ij} is a matrix with r_i rows and c_j columns.
A block matrix is also known as a partitioned matrix.
Example 1.1: 2 × 2 block matrices. Quite frequently we will be using
2 × 2 block matrices.
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}. (1.1.4)
An example:
P = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
We have
P_{11} = \begin{bmatrix} a & b \\ d & e \end{bmatrix}, \quad
P_{12} = \begin{bmatrix} c \\ f \end{bmatrix}, \quad
P_{21} = \begin{bmatrix} g & h \end{bmatrix}, \quad
P_{22} = \begin{bmatrix} i \end{bmatrix}.
• P11 and P12 have 2 rows.
• P21 and P22 have 1 row.
• P11 and P21 have 2 columns.
• P12 and P22 have 1 column.
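As a small numerical sketch (added here for illustration; NumPy is assumed and the numeric values are arbitrary placeholders), the same partitioning can be assembled and recovered with np.block and slicing.

import numpy as np

# Assemble the 3 x 3 matrix of the example from its four blocks.
P11 = np.array([[1., 2.],
                [4., 5.]])          # plays the role of [a b; d e]
P12 = np.array([[3.],
                [6.]])              # [c; f]
P21 = np.array([[7., 8.]])          # [g h]
P22 = np.array([[9.]])              # [i]

P = np.block([[P11, P12],
              [P21, P22]])
print(P)

# Recover the blocks by slicing: rows 0:2 vs 2:3 and columns 0:2 vs 2:3.
assert np.array_equal(P[:2, :2], P11)
assert np.array_equal(P[2:, 2:], P22)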
Lemma 1.1 Let A = [A_{ij}] be an m × n block matrix with A_{ij} being
an r_i × c_j matrix. Then A is an r × c matrix where
r = \sum_{i=1}^{m} r_i (1.1.5)
and
c = \sum_{j=1}^{n} c_j. (1.1.6)
Remark. Sometimes it is convenient to think of a regular matrix as a
block matrix whose entries are 1 × 1 matrices themselves.
Definition 1.7 [Multiplication of block matrices] Let A = [A_{ij}]
be an m × n block matrix with A_{ij} being a p_i × q_j matrix. Let
B = [B_{jk}] be an n × p block matrix with B_{jk} being a q_j × r_k matrix.
Then the two block matrices are compatible for multiplication
and their multiplication is defined by C = AB = [C_{ik}] where
C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk} (1.1.7)
and C_{ik} is a p_i × r_k matrix.
Definition 1.8 A block diagonal matrix is a block matrix
whose off diagonal entries are zero matrices.
1.2. Linear independence, span, rank
1.2.1. Spaces associated with a matrix
Definition 1.9 The column space of a matrix is defined as the
vector space spanned by columns of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}.
Then the column space is given by
C(A) = { x ∈ F^m : x = \sum_{i=1}^{n} α_i a_i for some α_i ∈ F }. (1.2.1)
Definition 1.10 The row space of a matrix is defined as the
vector space spanned by rows of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}.
Then the row space is given by
R(A) = { x ∈ F^n : x = \sum_{i=1}^{m} α_i a_i for some α_i ∈ F }. (1.2.2)
1.2.2. Rank
Definition 1.11 [Column rank] The column rank of a matrix
is defined as the maximum number of columns which are linearly
independent. In other words column rank is the dimension of the
column space of a matrix.
Definition 1.12 [Row rank] The row rank of a matrix is defined
as the maximum number of rows which are linearly independent.
In other words row rank is the dimension of the row space of a
matrix.
Theorem 1.2 The column rank and row rank of a matrix are
equal.
Definition 1.13 [Rank] The rank of a matrix is defined to be
equal to its column rank which is equal to its row rank.
Lemma 1.3 For an m × n matrix A
0 ≤ rank(A) ≤ min(m, n). (1.2.3)
Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero
matrix.
Definition 1.14 [Full rank matrix] An m × n matrix A is called
full rank if
rank(A) = min(m, n).
In other words it is either a full column rank matrix or a full row
rank matrix or both.
Lemma 1.5 [Rank of product of two matrices] Let A be an m×n
matrix and B be an n × p matrix then
rank(AB) ≤ min(rank(A), rank(B)). (1.2.4)
Lemma 1.6 [Post-multiplication with a full row rank matrix] Let
A be an m × n matrix and B be an n × p matrix. If B is of rank
n then
rank(AB) = rank(A). (1.2.5)
Lemma 1.7 [Pre-multiplication with a full column rank matrix]
Let A be an m × n matrix and B be an n × p matrix. If A is of
rank n then
rank(AB) = rank(B). (1.2.6)
Lemma 1.8 The rank of a diagonal matrix is equal to the number
of non-zero elements on its main diagonal.
Proof. The columns which correspond to diagonal entries which
are zero are zero columns. Other columns are linearly independent.
The number of linearly independent rows is also the same. Hence their
count gives us the rank of the matrix.
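The rank facts above are easy to probe numerically. The sketch below is an added illustration (NumPy assumed, random test matrices) checking lemma 1.5 and lemma 1.8 with numpy.linalg.matrix_rank.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))       # generically full rank: rank 3
B = rng.standard_normal((3, 5))       # generically full rank: rank 3

# Lemma 1.5: rank(AB) <= min(rank(A), rank(B)).
rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)
rank_AB = np.linalg.matrix_rank(A @ B)
assert rank_AB <= min(rank_A, rank_B)

# Lemma 1.8: rank of a diagonal matrix = number of non-zero diagonal entries.
D = np.diag([3.0, 0.0, 2.0, 0.0])
assert np.linalg.matrix_rank(D) == 2

print(rank_A, rank_B, rank_AB)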
1.3. Invertible matrices
Definition 1.15 [Invertible] A square matrix A is called invert-
ible if there exists another square matrix B of same size such that
AB = BA = I.
The matrix B is called the inverse of A and is denoted as A^{-1}.
Lemma 1.9 If A is invertible then its inverse A^{-1} is also invertible
and the inverse of A^{-1} is nothing but A.
Lemma 1.10 Identity matrix I is invertible.
Proof.
I I = I =⇒ I^{-1} = I.
Lemma 1.11 If A is invertible then columns of A are linearly
independent.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Assume that columns of A are linearly dependent. Then there exists
u ≠ 0 such that
Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0,
a contradiction. Hence columns of A are linearly independent.
Lemma 1.12 If an n × n matrix A is invertible then columns of
A span F^n.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Now let x ∈ F^n be any arbitrary vector. We need to show that there
exists α ∈ F^n such that
x = Aα.
But
x = Ix = ABx = A(Bx).
Thus if we choose α = Bx, then
x = Aα.
Thus columns of A span F^n.
Lemma 1.13 If A is invertible, then columns of A form a basis
for F^n.
Proof. In F^n a basis is a set of vectors which is linearly independent
and spans F^n. By lemma 1.11 and lemma 1.12, columns of
an invertible matrix A satisfy both conditions. Hence they form a
basis.
Lemma 1.14 If A is invertible then A^T is invertible.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Applying transpose on both sides we get
B^T A^T = A^T B^T = I.
Thus B^T is the inverse of A^T and A^T is invertible.
Lemma 1.15 If A is invertible then A^H is invertible.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Applying conjugate transpose on both sides we get
B^H A^H = A^H B^H = I.
Thus B^H is the inverse of A^H and A^H is invertible.
Lemma 1.16 If A and B are invertible then AB is invertible.
Proof. We note that
(AB)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I A^{-1} = I.
Similarly
(B^{-1} A^{-1})(AB) = B^{-1} (A^{-1} A) B = B^{-1} I B = I.
Thus B^{-1} A^{-1} is the inverse of AB.
Lemma 1.17 The set of n×n invertible matrices under the matrix
multiplication operation form a group.
Proof. We verify the properties of a group
Closure: If A and B are invertible then AB is invertible. Hence the
set is closed.
Associativity: Matrix multiplication is associative.
Identity element: I is invertible and AI = IA = A for all invertible
matrices.
Inverse element: If A is invertible then A^{-1} is also invertible.
Thus the set of invertible matrices is indeed a group under matrix
multiplication.
Lemma 1.18 An n × n matrix A is invertible if and only if it is
full rank, i.e.
rank(A) = n.
Corollary 1.19. The rank of an invertible matrix and its inverse are the
same.
1.3.1. Similar matrices
Definition 1.16 [Similar matrices] An n × n matrix B is similar
to an n × n matrix A if there exists an n × n non-singular matrix
C such that
B = C^{-1} A C.
Lemma 1.20 If B is similar to A then A is similar to B. Thus
similarity is a symmetric relation.
Proof.
B = C^{-1} A C =⇒ A = C B C^{-1} =⇒ A = (C^{-1})^{-1} B C^{-1}.
Thus there exists a matrix D = C^{-1} such that
A = D^{-1} B D.
Thus A is similar to B.
Lemma 1.21 Similar matrices have the same rank.
Proof. Let B be similar to A. Thus there exists an invertible
matrix C such that
B = C^{-1} A C.
Since C is invertible we have rank(C) = rank(C^{-1}) = n. Now
using lemma 1.6 rank(AC) = rank(A) and using lemma 1.7 we have
rank(C^{-1}(AC)) = rank(AC) = rank(A). Thus
rank(B) = rank(A).
Lemma 1.22 Similarity is an equivalence relation on the set of
n × n matrices.
Proof. Let A, B, C be n × n matrices. A is similar to itself through
the invertible matrix I, so similarity is reflexive. By lemma 1.20, if A
is similar to B then B is similar to A, so similarity is symmetric.
If B is similar to A via P s.t. B = P^{-1} A P and C is similar to B
via Q s.t. C = Q^{-1} B Q then C is similar to A via PQ such that
C = (PQ)^{-1} A (PQ). Thus similarity is an equivalence relation on the
set of square matrices and if A is any n × n matrix then the set of n × n
matrices similar to A forms an equivalence class.
1.3.2. Gram matrices
Definition 1.17 The Gram matrix of columns of A is given by
G = A^H A. (1.3.1)
Definition 1.18 The Gram matrix of rows of A is given by
G = A A^H. (1.3.2)
Remark. Usually when we talk about the Gram matrix of a matrix we
are looking at the Gram matrix of its column vectors.
Remark. For a real matrix A ∈ R^{m×n}, the Gram matrix of its column
vectors is given by A^T A and the Gram matrix for its row vectors is
given by A A^T.
Following results apply equally well for the real case.
Lemma 1.23 The columns of a matrix are linearly dependent if
and only if the Gram matrix of its column vectors A^H A is not
invertible.
Proof. Let A be an m × n matrix and G = A^H A be the Gram
matrix of its columns.
If columns of A are linearly dependent, then there exists a vector u ≠ 0
such that
Au = 0.
Thus
Gu = A^H A u = 0.
Hence the columns of G are also dependent and G is not invertible.
Conversely let us assume that G is not invertible, thus columns of G
are dependent and there exists a vector v ≠ 0 such that
Gv = 0.
Now
v^H G v = v^H A^H A v = (Av)^H (Av) = ||Av||_2^2.
From the previous equation, we have
||Av||_2^2 = 0 =⇒ Av = 0.
Since v ≠ 0, the columns of A are linearly dependent.
Corollary 1.24. The columns of a matrix are linearly independent if
and only if the Gram matrix of its column vectors A^H A is invertible.
Proof. Columns of A can be dependent only if its Gram matrix is
not invertible. Thus if the Gram matrix is invertible, then the columns
of A are linearly independent.
The Gram matrix is not invertible only if columns of A are linearly
dependent. Thus if columns of A are linearly independent then the
Gram matrix is invertible.
Corollary 1.25. Let A be a full column rank matrix. Then A^H A is
invertible.
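A quick numerical sanity check of corollary 1.24 and corollary 1.25 (an added sketch, not part of the original text; NumPy assumed, random test data): a matrix with independent columns has an invertible Gram matrix, while repeating a column makes it singular.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Full column rank: the Gram matrix A^H A is invertible (non-zero determinant).
G = A.conj().T @ A
print(abs(np.linalg.det(G)))          # clearly non-zero

# Make the columns dependent by repeating one column.
B = A.copy()
B[:, 2] = B[:, 1]
G_dep = B.conj().T @ B
print(abs(np.linalg.det(G_dep)))      # numerically ~0: the Gram matrix is singular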
Lemma 1.26 The null space of A and that of its Gram matrix A^H A
coincide, i.e.
N(A) = N(A^H A). (1.3.3)
Proof. Let u ∈ N(A). Then
Au = 0 =⇒ A^H A u = 0.
Thus
u ∈ N(A^H A) =⇒ N(A) ⊆ N(A^H A).
Now let u ∈ N(A^H A). Then
A^H A u = 0 =⇒ u^H A^H A u = 0 =⇒ ||Au||_2^2 = 0 =⇒ Au = 0.
Thus we have
u ∈ N(A) =⇒ N(A^H A) ⊆ N(A).
Lemma 1.27 The rows of a matrix A are linearly dependent if and
only if the Gram matrix of its row vectors A A^H is not invertible.
Proof. Rows of A are linearly dependent if and only if columns
of A^H are linearly dependent. Then there exists a vector v ≠ 0 s.t.
A^H v = 0.
Thus
Gv = A A^H v = 0.
Since v ≠ 0, G is not invertible.
Converse: assuming that G is not invertible, there exists a vector u ≠ 0
s.t.
Gu = 0.
Now
u^H G u = u^H A A^H u = (A^H u)^H (A^H u) = ||A^H u||_2^2 = 0 =⇒ A^H u = 0.
Since u ≠ 0, the columns of A^H and consequently the rows of A are
linearly dependent.
Corollary 1.28. The rows of a matrix A are linearly independent if
and only if the Gram matrix of its row vectors A A^H is invertible.
Corollary 1.29. Let A be a full row rank matrix. Then A A^H is
invertible.
1.3.3. Pseudo inverses
Definition 1.19 [Moore-Penrose pseudo-inverse] Let A be an m ×
n matrix. An n × m matrix A^† is called its Moore-Penrose pseudo-
inverse if it satisfies all of the following criteria:
(1) A A^† A = A.
(2) A^† A A^† = A^†.
(3) (A A^†)^H = A A^†, i.e. A A^† is Hermitian.
(4) (A^† A)^H = A^† A, i.e. A^† A is Hermitian.
Theorem 1.30 [Existence and uniqueness] For any matrix A there
exists precisely one matrix A^† which satisfies all the requirements
in definition 1.19.
We omit the proof for this. The pseudo-inverse can actually be obtained
by the singular value decomposition of A. This is shown in
lemma 1.110.
Lemma 1.31 Let D = diag(d_1, d_2, . . . , d_n) be an n × n diagonal
matrix. Then its Moore-Penrose pseudo-inverse is D^† =
diag(c_1, c_2, . . . , c_n) where
c_i = 1/d_i if d_i ≠ 0; c_i = 0 if d_i = 0.
Proof. We note that D^† D = D D^† = F = diag(f_1, f_2, . . . , f_n)
where
f_i = 1 if d_i ≠ 0; f_i = 0 if d_i = 0.
We now verify the requirements in definition 1.19.
D D^† D = F D = D.
D^† D D^† = F D^† = D^†.
D^† D = D D^† = F is a diagonal hence Hermitian matrix.
Lemma 1.32 Let D = diag(d_1, d_2, . . . , d_p) be an m × n rectangular
diagonal matrix where p = min(m, n). Then its Moore-
Penrose pseudo-inverse is an n × m rectangular diagonal matrix
D^† = diag(c_1, c_2, . . . , c_p) where
c_i = 1/d_i if d_i ≠ 0; c_i = 0 if d_i = 0.
Proof. F = D^† D = diag(f_1, f_2, . . . , f_n) is an n × n matrix where
f_i = 1 if d_i ≠ 0; f_i = 0 if d_i = 0; f_i = 0 if i > p.
G = D D^† = diag(g_1, g_2, . . . , g_m) is an m × m matrix where
g_i = 1 if d_i ≠ 0; g_i = 0 if d_i = 0; g_i = 0 if i > p.
We now verify the requirements in definition 1.19.
D D^† D = D F = D.
D^† D D^† = D^† G = D^†.
F = D^† D and G = D D^† are both diagonal hence Hermitian matrices.
Lemma 1.33 If A is full column rank then its Moore-Penrose
pseudo-inverse is given by
A^† = (A^H A)^{-1} A^H. (1.3.4)
It is a left inverse of A.
Proof. By corollary 1.25, A^H A is invertible.
First of all we verify that it is a left inverse:
A^† A = (A^H A)^{-1} A^H A = I.
We now verify all the properties.
A A^† A = A I = A.
A^† A A^† = I A^† = A^†.
Hermitian properties:
(A A^†)^H = (A (A^H A)^{-1} A^H)^H = A (A^H A)^{-1} A^H = A A^†.
(A^† A)^H = I^H = I = A^† A.
Lemma 1.34 If A is full row rank then its Moore-Penrose pseudo-
inverse is given by
A^† = A^H (A A^H)^{-1}. (1.3.5)
It is a right inverse of A.
Proof. By corollary 1.29, A A^H is invertible.
First of all we verify that it is a right inverse:
A A^† = A A^H (A A^H)^{-1} = I.
We now verify all the properties.
A A^† A = I A = A.
A^† A A^† = A^† I = A^†.
Hermitian properties:
(A A^†)^H = I^H = I = A A^†.
(A^† A)^H = (A^H (A A^H)^{-1} A)^H = A^H (A A^H)^{-1} A = A^† A.
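The two closed forms (1.3.4) and (1.3.5) can be compared against numpy.linalg.pinv, which computes the Moore-Penrose pseudo-inverse via the SVD. This sketch is added here for illustration (NumPy assumed, real random matrices, so A^H = A^T) and is not part of the original text.

import numpy as np

rng = np.random.default_rng(2)

# Tall matrix with full column rank: A^dagger = (A^H A)^{-1} A^H, a left inverse.
A = rng.standard_normal((5, 3))
A_dag = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_dag, np.linalg.pinv(A))
assert np.allclose(A_dag @ A, np.eye(3))

# Wide matrix with full row rank: B^dagger = B^H (B B^H)^{-1}, a right inverse.
B = rng.standard_normal((3, 5))
B_dag = B.T @ np.linalg.inv(B @ B.T)
assert np.allclose(B_dag, np.linalg.pinv(B))
assert np.allclose(B @ B_dag, np.eye(3))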
1.4. Trace and determinant
1.4.1. Trace
Definition 1.20 [Trace] The trace of a square matrix is defined
as the sum of the entries on its main diagonal. Let A be an n × n
matrix, then
tr(A) = \sum_{i=1}^{n} a_{ii} (1.4.1)
where tr(A) denotes the trace of A.
Lemma 1.35 The trace of a square matrix and its transpose are
equal.
tr(A) = tr(A^T). (1.4.2)
Lemma 1.36 Trace of sum of two square matrices is equal to the
sum of their traces.
tr(A + B) = tr(A) + tr(B). (1.4.3)
Lemma 1.37 Let A be an m×n matrix and B be an n×m matrix.
Then
tr(AB) = tr(BA). (1.4.4)
Proof. Let AB = C = [c_{ij}]. Then
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
Thus
c_{ii} = \sum_{k=1}^{n} a_{ik} b_{ki}.
Now
tr(C) = \sum_{i=1}^{m} c_{ii} = \sum_{i=1}^{m} \sum_{k=1}^{n} a_{ik} b_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{m} a_{ik} b_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{m} b_{ki} a_{ik}.
Let BA = D = [d_{ij}]. Then
d_{ij} = \sum_{k=1}^{m} b_{ik} a_{kj}.
Thus
d_{ii} = \sum_{k=1}^{m} b_{ik} a_{ki}.
Hence
tr(D) = \sum_{i=1}^{n} d_{ii} = \sum_{i=1}^{n} \sum_{k=1}^{m} b_{ik} a_{ki} = \sum_{i=1}^{m} \sum_{k=1}^{n} b_{ki} a_{ik},
which is the same expression obtained for tr(C) above.
This completes the proof.
Lemma 1.38 Let A ∈ F^{m×n}, B ∈ F^{n×p}, C ∈ F^{p×m} be three matrices.
Then
tr(ABC) = tr(BCA) = tr(CAB). (1.4.5)
Proof. Let AB = D. Then
tr(ABC) = tr(DC) = tr(CD) = tr(CAB).
Similarly the other result can be proved.
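Lemma 1.37 and lemma 1.38 are easy to verify numerically. The sketch below is an added illustration (NumPy assumed, random matrices of compatible sizes) of the cyclic property of the trace.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2))

# Lemma 1.37: tr(AB) = tr(BA) even though AB is 2x2 and BA is 3x3.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Lemma 1.38: cyclic shifts leave the trace unchanged.
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))

# Note: only cyclic shifts are allowed; arbitrary permutations of the factors
# do not preserve the trace in general.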
Lemma 1.39 The trace of similar matrices is equal.
Proof. Let B be similar to A. Thus
B = C^{-1} A C
for some invertible matrix C. Then
tr(B) = tr(C^{-1} A C) = tr(C C^{-1} A) = tr(A).
We used lemma 1.37.
1.4.2. Determinants
The following are some results on the determinant of an n × n square matrix A.
Lemma 1.40
det(αA) = α^n det(A). (1.4.6)
Lemma 1.41 Determinant of a square matrix and its transpose
are equal.
det(A) = det(A^T). (1.4.7)
Lemma 1.42 Let A be a complex square matrix. Then
det(A^H) = \overline{det(A)}. (1.4.8)
Proof.
det(A^H) = det(\bar{A}^T) = det(\bar{A}) = \overline{det(A)}.
Lemma 1.43 Let A and B be two n × n matrices. Then
det(AB) = det(A) det(B). (1.4.9)
Lemma 1.44 Let A be an invertible matrix. Then
det(A^{-1}) = 1 / det(A). (1.4.10)
Lemma 1.45 Let A be a square matrix and p ∈ N. Then
det(A^p) = (det(A))^p. (1.4.11)
Lemma 1.46 [Determinant of a triangular matrix] The determinant
of a triangular matrix is the product of its diagonal entries, i.e. if
A is an upper or lower triangular matrix then
det(A) = \prod_{i=1}^{n} a_{ii}. (1.4.12)
Lemma 1.47 [Determinant of a diagonal matrix] The determinant of
a diagonal matrix is the product of its diagonal entries, i.e. if A
is a diagonal matrix then
det(A) = \prod_{i=1}^{n} a_{ii}. (1.4.13)
Lemma 1.48 [Determinant of similar matrices] Determinant of
similar matrices is equal.
Proof. Let B be similar to A. Thus
B = C^{-1} A C
for some invertible matrix C. Hence
det(B) = det(C^{-1} A C) = det(C^{-1}) det(A) det(C).
Now
det(C^{-1}) det(A) det(C) = (1 / det(C)) det(A) det(C) = det(A).
We used lemma 1.43 and lemma 1.44.
Lemma 1.49 Let u and v be vectors in F^n. Then
det(I + u v^T) = 1 + u^T v. (1.4.14)
Lemma 1.50 [Determinant of a small perturbation of identity
matrix] Let A be a square matrix and let ε ≈ 0. Then
det(I + εA) ≈ 1 + ε tr(A). (1.4.15)
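A small numerical experiment (added here as an illustration; NumPy assumed, random matrix) shows the first-order behaviour claimed in lemma 1.50: for small ε the determinant of I + εA is close to 1 + ε tr(A), with an error that shrinks roughly like ε².

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

for eps in (1e-2, 1e-3, 1e-4):
    lhs = np.linalg.det(np.eye(4) + eps * A)
    rhs = 1.0 + eps * np.trace(A)
    # The gap is O(eps^2): dividing by eps**2 gives roughly the same constant.
    print(eps, lhs - rhs, (lhs - rhs) / eps**2)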
1.5. Unitary and orthogonal matrices
1.5.1. Orthogonal matrix
Definition 1.21 [Orthogonal matrix] A real square matrix U is
called orthogonal if the columns of U form an orthonormal set.
In other words, let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
with u_i ∈ R^n. Then we have
u_i · u_j = δ_{i,j}.
Lemma 1.51 An orthogonal matrix U is invertible with U^T = U^{-1}.
Proof. Let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
be orthogonal with
U^T = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix}.
Then
U^T U = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} = [u_i · u_j] = I.
Since columns of U are linearly independent and span R^n, U is
invertible. Thus
U^T = U^{-1}.
Lemma 1.52 The determinant of an orthogonal matrix is ±1.
Proof. Let U be an orthogonal matrix. Then
det(U^T U) = det(I) =⇒ (det(U))^2 = 1.
Thus we have
det(U) = ±1.
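A convenient way to produce an orthogonal matrix numerically is the QR factorization of a random square matrix. The sketch below is an added illustration (NumPy assumed) of lemma 1.51 and lemma 1.52.

import numpy as np

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # the Q factor is orthogonal

assert np.allclose(U.T @ U, np.eye(4))             # orthonormal columns
assert np.allclose(U.T, np.linalg.inv(U))          # Lemma 1.51: U^T = U^{-1}
assert np.isclose(abs(np.linalg.det(U)), 1.0)      # Lemma 1.52: det(U) = +/- 1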
1.5.2. Unitary matrix
Definition 1.22 [Unitary matrix] A complex square matrix U is
called unitary if the columns of U form an orthonormal set. In
other words, let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
with u_i ∈ C^n. Then we have
u_i · u_j = ⟨u_i, u_j⟩ = u_j^H u_i = δ_{i,j}.
Lemma 1.53 A unitary matrix U is invertible with U^H = U^{-1}.
Proof. Let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
be unitary with
U^H = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_n^H \end{bmatrix}.
Then
U^H U = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_n^H \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} = [u_i^H u_j] = I.
Since columns of U are linearly independent and span C^n, U is
invertible. Thus
U^H = U^{-1}.
Lemma 1.54 The magnitude of the determinant of a unitary matrix
is 1.
Proof. Let U be a unitary matrix. Then
det(U^H U) = det(I) =⇒ det(U^H) det(U) = 1 =⇒ \overline{det(U)} det(U) = 1.
Thus we have
|det(U)|^2 = 1 =⇒ |det(U)| = 1.
1.5.3. F unitary matrix
We provide a common definition for unitary matrices over any field F.
This definition applies to both real and complex matrices.
Definition 1.23 [F Unitary matrix] A square matrix U ∈ F^{n×n} is
called F unitary if the columns of U form an orthonormal set. In
other words, let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
with u_i ∈ F^n. Then we have
⟨u_i, u_j⟩ = u_j^H u_i = δ_{i,j}.
We note that a suitable definition of inner product transports the definition
appropriately into orthogonal matrices over R and unitary matrices
over C.
When we are talking about F unitary matrices, we will use the
symbol U^H to mean the inverse. In the complex case, it maps to the
conjugate transpose, while in the real case it maps to the simple transpose.
This definition helps us simplify some of the discussions in the sequel
(like singular value decomposition).
The following results apply equally to orthogonal matrices in the real case and
unitary matrices in the complex case.
Lemma 1.55 [Norm preservation] F-unitary matrices preserve norm,
i.e.
||Ux||_2 = ||x||_2.
Proof.
||Ux||_2^2 = (Ux)^H (Ux) = x^H U^H U x = x^H I x = ||x||_2^2.
Remark. For the real case we have
||Ux||_2^2 = (Ux)^T (Ux) = x^T U^T U x = x^T I x = ||x||_2^2.
Lemma 1.56 [Inner product preservation] F-unitary matrices preserve
inner product, i.e.
⟨Ux, Uy⟩ = ⟨x, y⟩.
Proof.
⟨Ux, Uy⟩ = (Uy)^H Ux = y^H U^H U x = y^H x.
Remark. For the real case we have
⟨Ux, Uy⟩ = (Uy)^T Ux = y^T U^T U x = y^T x.
1.6. Eigen values
Much of the discussion in this section will be equally applicable to real
as well as complex matrices. We will use the complex notation mostly
and make specific remarks for real matrices wherever needed.
Definition 1.24 [Eigen value] A scalar λ is an eigen value of an
n × n matrix A = [a_{ij}] if there exists a non-null vector x such that
Ax = λx. (1.6.1)
A non-null vector x which satisfies this equation is called an eigen
vector of A for the eigen value λ.
An eigen value is also known as a characteristic value, proper
value or a latent value.
We note that (1.6.1) can be written as
Ax = λ I_n x =⇒ (A − λ I_n) x = 0. (1.6.2)
Thus λ is an eigen value of A if and only if the matrix A−λI is singular.
Definition 1.25 [Spectrum of a matrix] The set comprising of
eigen values of a matrix A is known as its spectrum.
Remark. For each eigen vector x for a matrix A the corresponding
eigen value λ is unique.
Proof. Assume that for x there are two eigen values λ1 and λ2,
then
Ax = λ1x = λ2x =⇒ (λ1 − λ2)x = 0.
This can happen only when either x = 0 or λ1 = λ2. Since x is an
eigen vector, it cannot be 0. Thus λ1 = λ2.
Remark. If x is an eigen vector for A, then the corresponding eigen
value is given by
λ = (x^H A x) / (x^H x). (1.6.3)
Proof.
Ax = λx =⇒ x^H A x = λ x^H x =⇒ λ = (x^H A x) / (x^H x)
since x is non-zero.
Remark. An eigen vector x of A for eigen value λ belongs to the null
space of A − λI, i.e.
x ∈ N(A − λI).
In other words x is a nontrivial solution to the homogeneous system of
linear equations given by
(A − λI)z = 0.
Definition 1.26 [Eigen space] Let λ be an eigen value for a square
matrix A. Then its eigen space is the null space of A − λI i.e.
N(A − λI).
Remark. The set comprising all the eigen vectors of A for an eigen
value λ is given by
N(A − λI) \ {0} (1.6.4)
since 0 cannot be an eigen vector.
Definition 1.27 [Geometric multiplicity] Let λ be an eigen value
for a square matrix A. The dimension of its eigen space N(A−λI)
is known as the geometric multiplicity of the eigen value λ.
Remark. Clearly
dim(N(A − λI)) = n − rank(A − λI).
Remark. A scalar λ can be an eigen value of a square matrix A if and
only if
det(A − λI) = 0.
det(A − λI) is a polynomial in λ of degree n.
Remark.
det(A − λI) = p(λ) = α_n λ^n + α_{n−1} λ^{n−1} + · · · + α_1 λ + α_0 (1.6.5)
where the α_i depend on entries in A.
In this sense, an eigen value of A is a root of the equation
p(λ) = 0. (1.6.6)
It is easy to show that α_n = (−1)^n.
Definition 1.28 [Characteristic polynomial and equation] For any
square matrix A, the polynomial given by p(λ) = det(A − λI) is
known as its characteristic polynomial. The equation given by
p(λ) = 0 (1.6.7)
is known as its characteristic equation. The eigen values of
A are the roots of its characteristic polynomial or solutions of its
characteristic equation.
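Numerically, eigen values are not obtained by factoring p(λ); libraries use iterative methods instead. Still, the relationship can be illustrated (a sketch added here, not in the original text; NumPy assumed): numpy.poly returns the coefficients of the monic polynomial det(λI − A), which differs from p(λ) above only by the sign (−1)^n, and numpy.linalg.eigvals returns its roots.

import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])

# Coefficients of det(lambda*I - A) = lambda^2 - 5*lambda + 5.
coeffs = np.poly(A)
print(coeffs)                        # [ 1. -5.  5.]

# Its roots are exactly the eigen values of A.
roots = np.roots(coeffs)
eigvals = np.linalg.eigvals(A)
print(np.sort(roots), np.sort(eigvals))
assert np.allclose(np.sort(roots), np.sort(eigvals))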
Lemma 1.57 [Roots of characteristic equation] For real square
matrices, if we restrict eigen values to real values, then the characteristic
polynomial can be factored as
p(λ) = (−1)^n (λ − λ_1)^{r_1} . . . (λ − λ_k)^{r_k} q(λ). (1.6.8)
The polynomial has k distinct real roots. For each root λ_i, r_i is a
positive integer indicating how many times the root appears. q(λ)
is a polynomial that has no real roots. The following is true:
r_1 + · · · + r_k + deg(q(λ)) = n. (1.6.9)
Clearly k ≤ n.
For complex square matrices where eigen values can be complex
(including real square matrices), the characteristic polynomial can
be factored as
p(λ) = (−1)^n (λ − λ_1)^{r_1} . . . (λ − λ_k)^{r_k}. (1.6.10)
The polynomial can be completely factorized into first degree polynomials.
There are k distinct roots or eigen values. The following
is true:
r_1 + · · · + r_k = n. (1.6.11)
Thus including the duplicates there are exactly n eigen values for
a complex square matrix.
Remark. It is quite possible that a real square matrix doesn’t have
any real eigen values.
Definition 1.29 [Algebraic multiplicity] The number of times an
eigen value appears in the factorization of the characteristic poly-
nomial of a square matrix A is known as its algebraic multiplicity.
In other words ri is the algebraic multiplicity for λi in above fac-
torization.
Remark. In above the set {λ1, . . . , λk} forms the spectrum of A.
Let us consider the sum of r_i which gives the count of the total number of
roots of p(λ):
m = \sum_{i=1}^{k} r_i. (1.6.12)
With this there are m not-necessarily distinct roots of p(λ). Let us
write p(λ) as
p(λ) = (−1)^n (λ − c_1)(λ − c_2) . . . (λ − c_m) q(λ) (1.6.13)
where c_1, c_2, . . . , c_m are m scalars (not necessarily distinct) of which r_1
scalars are λ_1, r_2 are λ_2 and so on. Obviously for the complex case
q(λ) = 1.
We will refer to the set (allowing repetitions) {c1, c2, . . . , cm} as the
eigen values of the matrix A where ci are not necessarily distinct. In
contrast the spectrum of A refers to the set of distinct eigen values of
A. The symbol c has been chosen based on the other name for eigen
values (the characteristic values).
We can put together eigen vectors of a matrix into another matrix by
itself. This can be a very useful tool. We start with a simple idea.
Lemma 1.58 Let A be an n × n matrix. Let u_1, u_2, . . . , u_r be r
non-zero vectors from F^n. Let us construct an n × r matrix
U = \begin{bmatrix} u_1 & u_2 & \dots & u_r \end{bmatrix}.
Then all the r vectors are eigen vectors of A if and only if there
exists a diagonal matrix D = diag(d_1, . . . , d_r) such that
AU = UD. (1.6.14)
Proof. Expanding the equation, we can write
\begin{bmatrix} A u_1 & A u_2 & \dots & A u_r \end{bmatrix} = \begin{bmatrix} d_1 u_1 & d_2 u_2 & \dots & d_r u_r \end{bmatrix}.
Clearly we want
A u_i = d_i u_i
where u_i are non-zero. This is possible only when d_i is an eigen value
of A and u_i is an eigen vector for d_i.
Converse: Assume that u_i are eigen vectors. Choose d_i to be the corresponding
eigen values. Then the equation holds.
Lemma 1.59 0 is an eigen value of a square matrix A if and only
if A is singular.
Proof. Let 0 be an eigen value of A. Then there exists u ≠ 0 such
that
Au = 0u = 0.
Thus u is a non-trivial solution of the homogeneous linear system. Thus
A is singular.
Converse: Assuming that A is singular, there exists u ≠ 0 s.t.
Au = 0 = 0u.
Thus 0 is an eigen value of A.
Lemma 1.60 If a square matrix A is singular, then N(A) is the
eigen space for the eigen value λ = 0.
Proof. This is straight forward from the definition of eigen space
(see definition 1.26).
Remark. Clearly the geometric multiplicity of λ = 0 equals nullity(A) =
n − rank(A).
Lemma 1.61 Let A be a square matrix. Then A and A^T have the
same eigen values.
Proof. The eigen values of A^T are given by
det(A^T − λI) = 0.
But
A^T − λI = A^T − (λI)^T = (A − λI)^T.
Hence (using lemma 1.41)
det(A^T − λI) = det((A − λI)^T) = det(A − λI).
Thus the characteristic polynomials of A and A^T are the same. Hence the
eigen values are the same. In other words the spectra of A and A^T are
the same.
Remark (Direction preservation). If x is an eigen vector with a non-
zero eigen value λ for A then Ax and x are collinear.
In other words the angle between Ax and x is either 0° when λ is
positive and is 180° when λ is negative. Let us look at the inner
product:
⟨Ax, x⟩ = x^H A x = x^H λ x = λ ||x||_2^2.
Meanwhile
||Ax||_2 = ||λx||_2 = |λ| ||x||_2.
Thus
|⟨Ax, x⟩| = ||Ax||_2 ||x||_2.
The angle θ between Ax and x is given by
cos θ = ⟨Ax, x⟩ / (||Ax||_2 ||x||_2) = λ ||x||_2^2 / (|λ| ||x||_2^2) = ±1.
Lemma 1.62 Let A be a square matrix and λ be an eigen value
of A. Let p ∈ N. Then λ^p is an eigen value of A^p.
Proof. For p = 1 the statement holds trivially since λ^1 is an eigen
value of A^1. Assume that the statement holds for some value of p.
Thus let λ^p be an eigen value of A^p and let u be a corresponding eigen
vector. Now
A^{p+1} u = A^p (A u) = A^p λ u = λ A^p u = λ λ^p u = λ^{p+1} u.
Thus λ^{p+1} is an eigen value for A^{p+1} with the same eigen vector u. With
the principle of mathematical induction, the proof is complete.
Lemma 1.63 Let a square matrix A be non singular and let λ ≠ 0
be some eigen value of A. Then λ^{-1} is an eigen value of A^{-1}.
Moreover, all eigen values of A^{-1} are obtained by taking inverses
of eigen values of A, i.e. if µ ≠ 0 is an eigen value of A^{-1} then 1/µ
is an eigen value of A also. Also, A and A^{-1} share the same set
of eigen vectors.
Proof. Let u ≠ 0 be an eigen vector of A for the eigen value λ.
Then
Au = λu =⇒ u = A^{-1} λ u =⇒ (1/λ) u = A^{-1} u.
Thus u is also an eigen vector of A^{-1} for the eigen value 1/λ.
Now let B = A^{-1}. Then B^{-1} = A. Thus if µ is an eigen value of B
then 1/µ is an eigen value of B^{-1} = A.
Thus if A is invertible then the eigen values of A and A^{-1} have a one to one
correspondence.
This result is very useful: if it can be shown that a matrix A is
similar to a diagonal or a triangular matrix whose eigen values are easy
to obtain, then determination of the eigen values of A becomes straightforward.
1.6.1. Invariant subspaces
Definition 1.30 [Invariant subspace] Let A be a square n × n
matrix and let W be a subspace of F^n, i.e. W ≤ F^n. Then W is
invariant relative to A if
Aw ∈ W ∀ w ∈ W, (1.6.15)
i.e. A(W) ⊆ W, or for every vector w ∈ W its mapping Aw is also
in W. Thus the action of A on W doesn't take us outside of W.
We also say that W is A-invariant.
Eigen vectors are generators of invariant subspaces.
Lemma 1.64 Let A be an n × n matrix. Let x_1, x_2, . . . , x_r be r
eigen vectors of A. Let us construct an n × r matrix
X = \begin{bmatrix} x_1 & x_2 & \dots & x_r \end{bmatrix}.
Then the column space of X, i.e. C(X), is invariant relative to A.
Proof. Let us assume that c_1, c_2, . . . , c_r are the eigen values corresponding
to x_1, x_2, . . . , x_r (not necessarily distinct).
Let any vector x ∈ C(X) be given by
x = \sum_{i=1}^{r} α_i x_i.
Then
Ax = A \sum_{i=1}^{r} α_i x_i = \sum_{i=1}^{r} α_i A x_i = \sum_{i=1}^{r} α_i c_i x_i.
Clearly Ax is also a linear combination of the x_i hence belongs to C(X).
Thus C(X) is invariant relative to A, or C(X) is A-invariant.
1.6.2. Triangular matrices
Lemma 1.65 Let A be an n × n upper or lower triangular matrix.
Then its eigen values are the entries on its main diagonal.
Proof. If A is triangular then A − λI is also triangular with its
diagonal entries being (a_{ii} − λ). Using lemma 1.46, we have
p(λ) = det(A − λI) = \prod_{i=1}^{n} (a_{ii} − λ).
Clearly the roots of the characteristic polynomial are the a_{ii}.
Several small results follow from this lemma.
Corollary 1.66. Let A = [a_{ij}] be an n × n triangular matrix.
(a) The characteristic polynomial of A is p(λ) = (−1)^n \prod_{i=1}^{n} (λ − a_{ii}).
(b) A scalar λ is an eigen value of A iff it is one of the diagonal entries
of A.
(c) The algebraic multiplicity of an eigen value λ is equal to the number
of times it appears on the main diagonal of A.
(d) The spectrum of A is given by the distinct entries on the main
diagonal of A.
A diagonal matrix is naturally both an upper triangular matrix as well
as a lower triangular matrix. Similar results hold for the eigen values
of a diagonal matrix also.
Lemma 1.67 Let A = [a_{ij}] be an n × n diagonal matrix.
(a) Its eigen values are the entries on its main diagonal.
(b) The characteristic polynomial of A is p(λ) = (−1)^n \prod_{i=1}^{n} (λ − a_{ii}).
(c) A scalar λ is an eigen value of A iff it is one of the diagonal
entries of A.
(d) The algebraic multiplicity of an eigen value λ is equal to the
number of times it appears on the main diagonal of A.
(e) The spectrum of A is given by the distinct entries on the main
diagonal of A.
There is also a result for the geometric multiplicity of eigen values for
a diagonal matrix.
Lemma 1.68 Let A = [a_{ij}] be an n × n diagonal matrix. The
geometric multiplicity of an eigen value λ is equal to the number
of times it appears on the main diagonal of A.
Proof. The unit vectors e_i are eigen vectors for A since
A e_i = a_{ii} e_i.
They are linearly independent. Thus if a particular eigen value appears r
number of times, then there are r linearly independent eigen vectors for the
eigen value. Thus its geometric multiplicity is equal to the algebraic
multiplicity.
1.6.3. Similar matrices
Some very useful results are available for similar matrices.
Lemma 1.69 The characteristic polynomial and spectrum of similar
matrices are the same.
Proof. Let B be similar to A. Thus there exists an invertible
matrix C such that
B = C^{-1} A C.
Now
B − λI = C^{-1} A C − λI = C^{-1} A C − λ C^{-1} C = C^{-1} (A C − λ C) = C^{-1} (A − λI) C.
Thus B − λI is similar to A − λI. Hence due to lemma 1.48, their
determinants are equal, i.e.
det(B − λI) = det(A − λI).
This means that the characteristic polynomials of A and B are the same.
Since eigen values are nothing but roots of the characteristic polynomial,
they are the same too. This means that the spectrum (the set
of distinct eigen values) is the same.
Corollary 1.70. If A and B are similar to each other then
(a) An eigen value has the same algebraic and geometric multiplicity for
both A and B.
(b) The (not necessarily distinct) eigen values of A and B are the same.
Although the eigen values are the same, the eigen vectors are in general
different.
Lemma 1.71 Let A and B be similar with
B = C^{-1} A C
for some invertible matrix C. If u is an eigen vector of A for an
eigen value λ, then C^{-1} u is an eigen vector of B for the same
eigen value.
Proof. u is an eigen vector of A for an eigen value λ. Thus we
have
Au = λu.
Thus
B C^{-1} u = C^{-1} A C C^{-1} u = C^{-1} A u = C^{-1} λ u = λ C^{-1} u.
Now u ≠ 0 and C^{-1} is non singular. Thus C^{-1} u ≠ 0. Thus C^{-1} u is an
eigen vector of B.
Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an
eigen value of a square matrix A. Then the geometric multiplicity
of λ is less than or equal to its algebraic multiplicity.
Corollary 1.73. If an n×n matrix A has n distinct eigen values, then
each of them has a geometric (and algebraic) multiplicity of 1.
Proof. The algebraic multiplicity of an eigen value is greater than
or equal to 1. But the sum cannot exceed n. Since there are n distinct
eigen values, each of them has algebraic multiplicity of 1. Now the
geometric multiplicity of an eigen value is greater than or equal to 1 and
less than or equal to its algebraic multiplicity, hence it also equals 1.
Corollary 1.74. Let an n × n matrix A have k distinct eigen values
λ_1, λ_2, . . . , λ_k with algebraic multiplicities r_1, r_2, . . . , r_k and geometric
multiplicities g_1, g_2, . . . , g_k respectively. Then
\sum_{i=1}^{k} g_i ≤ \sum_{i=1}^{k} r_i ≤ n.
Moreover if
\sum_{i=1}^{k} g_i = \sum_{i=1}^{k} r_i
then
g_i = r_i for every i.
1.6.4. Linear independence of eigen vectors
Theorem 1.75 [Linear independence of eigen vectors for distinct
eigen values] Let A be an n × n square matrix. Let x_1, x_2, . . . , x_k
be any k eigen vectors of A for distinct eigen values λ_1, λ_2, . . . , λ_k
respectively. Then x_1, x_2, . . . , x_k are linearly independent.
Proof. We first prove the simpler case with 2 eigen vectors x_1 and
x_2 and corresponding eigen values λ_1 and λ_2 respectively.
Let there be a linear relationship between x_1 and x_2 given by
α_1 x_1 + α_2 x_2 = 0.
Multiplying both sides with (A − λ_1 I) we get
α_1 (A − λ_1 I) x_1 + α_2 (A − λ_1 I) x_2 = 0
=⇒ α_1 (λ_1 − λ_1) x_1 + α_2 (λ_2 − λ_1) x_2 = 0
=⇒ α_2 (λ_2 − λ_1) x_2 = 0.
Since λ_1 ≠ λ_2 and x_2 ≠ 0, hence α_2 = 0.
Similarly by multiplying with (A − λ_2 I) on both sides, we can show
that α_1 = 0. Thus x_1 and x_2 are linearly independent.
Now for the general case, consider a linear relationship between x_1, x_2, . . . , x_k
given by
α_1 x_1 + α_2 x_2 + · · · + α_k x_k = 0.
Multiplying by \prod_{i=1, i≠j}^{k} (A − λ_i I) and using the fact that λ_i ≠ λ_j if
i ≠ j, we get α_j = 0 for every j. Thus the only linear relationship is the trivial
relationship. This completes the proof.
For eigen values with geometric multiplicity greater than 1 there are
multiple linearly independent eigen vectors corresponding to the eigen value.
In this context, the above theorem can be generalized
further.
Theorem 1.76 Let λ_1, λ_2, . . . , λ_k be k distinct eigen values of
A. Let {x_1^j, x_2^j, . . . , x_{g_j}^j} be any g_j linearly independent eigen vectors
from the eigen space of λ_j where g_j is the geometric multiplicity
of λ_j. Then the combined set of eigen vectors given by
{x_1^1, . . . , x_{g_1}^1, . . . , x_1^k, . . . , x_{g_k}^k}
consisting of \sum_{j=1}^{k} g_j eigen vectors is
linearly independent.
This result puts an upper limit on the number of linearly independent
eigen vectors of a square matrix.
Lemma 1.77 Let {λ_1, . . . , λ_k} represent the spectrum of an n × n
matrix A. Let g_1, . . . , g_k be the geometric multiplicities of λ_1, . . . , λ_k
respectively. Then the number of linearly independent eigen vectors
for A is
\sum_{i=1}^{k} g_i.
Moreover if
\sum_{i=1}^{k} g_i = n
then a set of n linearly independent eigen vectors of A can be found
which forms a basis for F^n.
1.6.5. Diagonalization
Diagonalization is one of the fundamental operations in linear algebra.
This section discusses diagonalization of square matrices in depth.
Definition 1.31 [Diagonalizable matrix] An n × n matrix A is
said to be diagonalizable if it is similar to a diagonal matrix.
In other words there exists an n × n non-singular matrix P such
that D = P^{-1} A P is a diagonal matrix. If this happens then we
say that P diagonalizes A or A is diagonalized by P.
Remark.
D = P^{-1} A P ⇐⇒ P D = A P ⇐⇒ P D P^{-1} = A. (1.6.16)
We note that if we restrict to real matrices, then P and D should
also be real. If A ∈ C^{n×n} (it may still be real) then P and D can be
complex.
The next theorem is the culmination of a variety of results studied so
far.
Theorem 1.78 [Properties of diagonalizable matrices] Let A be a
diagonalizable matrix with D = P^{-1} A P being its diagonalization.
Let D = diag(d_1, d_2, . . . , d_n). Then the following hold:
(a) rank(A) = rank(D), which equals the number of non-zero entries
on the main diagonal of D.
(b) det(A) = d_1 d_2 . . . d_n.
(c) tr(A) = d_1 + d_2 + · · · + d_n.
(d) The characteristic polynomial of A is
p(λ) = (−1)^n (λ − d_1)(λ − d_2) . . . (λ − d_n).
(e) The spectrum of A comprises the distinct scalars on the diagonal
entries in D.
(f) The (not necessarily distinct) eigenvalues of A are the diagonal
elements of D.
(g) The columns of P are (linearly independent) eigenvectors of
A.
(h) The algebraic and geometric multiplicities of an eigenvalue λ
of A equal the number of diagonal elements of D that equal λ.
Proof. From definition 1.31 we note that D and A are similar.
Due to lemma 1.48
det(A) = det(D).
Due to lemma 1.47
det(D) = \prod_{i=1}^{n} d_i.
Now due to lemma 1.39
tr(A) = tr(D) = \sum_{i=1}^{n} d_i.
Further, due to lemma 1.69 the characteristic polynomial and spectrum
of A and D are the same. Due to lemma 1.67 the eigen values of D are
nothing but its diagonal entries. Hence they are also the eigen values
of A.
D = P^{-1} A P =⇒ A P = P D.
Now writing
P = \begin{bmatrix} p_1 & p_2 & \dots & p_n \end{bmatrix}
we have
A P = \begin{bmatrix} A p_1 & A p_2 & \dots & A p_n \end{bmatrix} = P D = \begin{bmatrix} d_1 p_1 & d_2 p_2 & \dots & d_n p_n \end{bmatrix}.
Thus the p_i are eigen vectors of A.
Since the characteristic polynomials of A and D are the same, the
algebraic multiplicities of the eigen values are the same.
From lemma 1.71 we get that there is a one to one correspondence
between the eigen vectors of A and D through the change of basis
given by P. Thus the linear independence relationships between the
eigen vectors remain the same. Hence the geometric multiplicities of
individual eigenvalues are also the same.
This completes the proof.
So far we have verified various results which are available if a matrix A
is diagonalizable. We haven’t yet identified the conditions under which
A is diagonalizable. We note that not every matrix is diagonalizable.
The following theorem gives necessary and sufficient conditions under
which a matrix is diagonalizable.
Theorem 1.79 An n × n matrix A is diagonalizable by an n × n
non-singular matrix P if and only if the columns of P are (linearly
independent) eigenvectors of A.
Proof. We note that since P is non-singular, the columns of P
have to be linearly independent.
The necessary condition part was proven in theorem 1.78. We now
show that if P consists of n linearly independent eigen vectors of A
then A is diagonalizable.
Let the columns of P be p_1, p_2, . . . , p_n and the corresponding (not necessarily
distinct) eigen values be d_1, d_2, . . . , d_n. Then
A p_i = d_i p_i.
Thus by letting D = diag(d_1, d_2, . . . , d_n), we have
A P = P D.
Now since columns of P are linearly independent, P is invertible.
This gives us
D = P^{-1} A P.
Thus A is similar to a diagonal matrix D. This validates the sufficient
condition.
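The sufficiency argument above is exactly how one diagonalizes a matrix in practice. The sketch below is an added illustration (NumPy assumed): numpy.linalg.eig supplies D and P, and we check D = P^{-1} A P for a matrix with distinct eigen values.

import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

# eig returns the eigen values d_i and a matrix P whose columns are eigen vectors.
d, P = np.linalg.eig(A)
D = np.diag(d)

# A P = P D, hence D = P^{-1} A P.  P is invertible here because the eigen
# values 5 and 2 are distinct, so the eigen vectors are linearly independent.
assert np.allclose(A @ P, P @ D)
assert np.allclose(np.linalg.inv(P) @ A @ P, D)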
A corollary follows.
Corollary 1.80. An n×n matrix is diagonalizable if and only if there
exists a linearly independent set of n eigenvectors of A.
Now we know that geometric multiplicities of eigen values of A provide
us information about linearly independent eigenvectors of A.
Corollary 1.81. Let A be an n × n matrix. Let λ_1, λ_2, . . . , λ_k be its k
distinct eigen values (comprising its spectrum). Let g_j be the geometric
multiplicity of λ_j. Then A is diagonalizable if and only if
\sum_{i=1}^{k} g_i = n. (1.6.17)
1.6.6. Symmetric matrices
This subsection is focused on real symmetric matrices.
Following is a fundamental property of real symmetric matrices.
Theorem 1.82 Every real symmetric matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and
λ2 be any two distinct eigen values of A and let x1 and x2 be any
two corresponding eigen vectors. Then x1 and x2 are orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^T A x_1 = λ_1 x_2^T x_1
=⇒ x_1^T A^T x_2 = λ_1 x_1^T x_2
=⇒ x_1^T A x_2 = λ_1 x_1^T x_2
=⇒ x_1^T λ_2 x_2 = λ_1 x_1^T x_2
=⇒ (λ_1 − λ_2) x_1^T x_2 = 0
=⇒ x_1^T x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the transpose on both
sides, and used the facts that A = A^T and λ_1 − λ_2 ≠ 0.
Definition 1.32 [Orthogonally diagonalizable matrix] A real n × n
matrix A is said to be orthogonally diagonalizable if there
exists an orthogonal matrix U which can diagonalize A, i.e.
D = U^T A U
is a real diagonal matrix.
Lemma 1.84 Every orthogonally diagonalizable matrix A is symmetric.
Proof. We have a diagonal matrix D such that
A = U D U^T.
Taking transpose on both sides we get
A^T = U D^T U^T = U D U^T = A.
Thus A is symmetric.
Theorem 1.85 Every real symmetric matrix A is orthogonally diagonalizable.
We skip the proof of this theorem.
1.6.7. Hermitian matrices
Following is a fundamental property of Hermitian matrices.
Theorem 1.86 Every Hermitian matrix has an eigen value.
The proof of this result is beyond the scope of this book.
46 1. MATRIX ALGEBRA
Lemma 1.87 The eigenvalues of a Hermitian matrix are real.
Proof. Let A be a Hermitian matrix and let λ be an eigen value
of A. Let u be a corresponding eigen vector. Then
Au = λu
=⇒ u^H A^H = \bar{λ} u^H
=⇒ u^H A^H u = \bar{λ} u^H u
=⇒ u^H A u = \bar{λ} u^H u
=⇒ λ u^H u = \bar{λ} u^H u
=⇒ ||u||_2^2 (λ − \bar{λ}) = 0
=⇒ λ = \bar{λ},
thus λ is real. We used the facts that A = A^H and u ≠ 0 =⇒ ||u||_2 ≠ 0.
Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let
λ_1 and λ_2 be any two distinct eigen values of A and let x_1 and
x_2 be any two corresponding eigen vectors. Then x_1 and x_2 are
orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^H A x_1 = λ_1 x_2^H x_1
=⇒ x_1^H A^H x_2 = λ_1 x_1^H x_2 (λ_1 is real by lemma 1.87)
=⇒ x_1^H A x_2 = λ_1 x_1^H x_2
=⇒ x_1^H λ_2 x_2 = λ_1 x_1^H x_2
=⇒ (λ_1 − λ_2) x_1^H x_2 = 0
=⇒ x_1^H x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the conjugate transpose
on both sides, and used the facts that A = A^H and λ_1 − λ_2 ≠ 0.
Definition 1.33 [Unitary diagonalizable matrix] A complex n × n
matrix A is said to be unitary diagonalizable if there exists a
unitary matrix U which can diagonalize A, i.e.
D = U^H A U
is a complex diagonal matrix.
Lemma 1.89 Let A be a unitary diagonalizable matrix whose diagonalization
D is real. Then A is Hermitian.
Proof. We have a real diagonal matrix D such that
A = U D U^H.
Taking conjugate transpose on both sides we get
A^H = U D^H U^H = U D U^H = A.
Thus A is Hermitian. We used the fact that D^H = D since D is
real.
Theorem 1.90 Every Hermitian matrix A is unitary diagonalizable.
We skip the proof of this theorem. The theorem means that if A is
Hermitian then it can be written as A = U Λ U^H, which leads to the following
definition.
Definition 1.34 [Eigen value decomposition of a Hermitian matrix]
Let A be an n × n Hermitian matrix. Let λ_1, . . . , λ_n be its
eigen values such that |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n|. Let
Λ = diag(λ_1, . . . , λ_n).
Let U be a unitary matrix consisting of orthonormal eigen vectors
corresponding to λ_1, . . . , λ_n. Then the eigen value decomposition
of A is defined as
A = U Λ U^H. (1.6.18)
If the λ_i are distinct, then the decomposition is unique. If they are
not distinct, then the choice of orthonormal eigen vectors within an
eigen space, and hence of U, is not unique.
Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider
some vector x ∈ C^n. Then
x^H Λ x = \sum_{i=1}^{n} λ_i |x_i|^2. (1.6.19)
Now if λ_i ≥ 0 then
x^H Λ x ≤ λ_1 \sum_{i=1}^{n} |x_i|^2 = λ_1 ||x||_2^2.
Also
x^H Λ x ≥ λ_n \sum_{i=1}^{n} |x_i|^2 = λ_n ||x||_2^2.
Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen
values. Let λ_1 be its largest and λ_n be its smallest eigen value. Then
λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2 ∀ x ∈ C^n. (1.6.20)
Proof. A has an eigen value decomposition given by
A = U Λ U^H.
Let x ∈ C^n and let v = U^H x. Clearly ||x||_2 = ||v||_2. Then
x^H A x = x^H U Λ U^H x = v^H Λ v.
From the previous remark we have
λ_n ||v||_2^2 ≤ v^H Λ v ≤ λ_1 ||v||_2^2.
Thus we get
λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2.
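The bound (1.6.20) is easy to probe numerically. The sketch below is an added illustration (NumPy assumed): it builds a random Hermitian positive semidefinite matrix, takes its extreme eigen values from numpy.linalg.eigvalsh, and checks the inequality on random vectors.

import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B                       # Hermitian with non-negative eigen values

w = np.linalg.eigvalsh(A)                # ascending order: w[0] smallest, w[-1] largest
lam_min, lam_max = w[0], w[-1]

for _ in range(100):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    quad = (x.conj() @ A @ x).real       # x^H A x is real for Hermitian A
    nrm2 = np.linalg.norm(x) ** 2
    assert lam_min * nrm2 - 1e-9 <= quad <= lam_max * nrm2 + 1e-9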
1.6.8. Miscellaneous properties
This subsection lists some miscellaneous properties of eigen values of a
square matrix.
Lemma 1.92 λ is an eigen value of A if and only if λ + k is an
eigen value of A + kI. Moreover A and A + kI share the same
eigen vectors.
Proof.
Ax = λx
⇐⇒ Ax + kx = λx + kx
⇐⇒ (A + kI)x = (λ + k)x. (1.6.21)
Thus λ is an eigen value of A with an eigen vector x if and only if λ + k
is an eigen value of A + kI with the same eigen vector x.
1.6.9. Diagonally dominant matrices
Definition 1.35 [Diagonally dominant matrix] Let A = [a_{ij}] be a
square matrix in C^{n×n}. A is called diagonally dominant if
|a_{ii}| ≥ \sum_{j≠i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of the diagonal
element is greater than or equal to the sum of absolute values of
all the off-diagonal elements on that row.
Definition 1.36 [Strictly diagonally dominant matrix] Let A =
[a_{ij}] be a square matrix in C^{n×n}. A is called strictly diagonally
dominant if
|a_{ii}| > \sum_{j≠i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of the diagonal
element is bigger than the sum of absolute values of all the off-
diagonal elements on that row.
Example 1.2: Strictly diagonally dominant matrix. Let us consider
A = \begin{bmatrix} -4 & -2 & -1 & 0 \\ -4 & 7 & 2 & 0 \\ 3 & -4 & 9 & 1 \\ 2 & -1 & -3 & 15 \end{bmatrix}
We can see that the strict diagonal dominance condition is satisfied for
each row as follows:
row 1 : |−4| > |−2| + |−1| + |0| = 3
row 2 : |7| > |−4| + |2| + |0| = 6
row 3 : |9| > |3| + |−4| + |1| = 8
row 4 : |15| > |2| + |−1| + |−3| = 6
Strictly diagonally dominant matrices have a very special property.
They are always non-singular.
Theorem 1.93 Strictly diagonally dominant matrices are non-
singular.
Proof. Suppose that A is strictly diagonally dominant and singular. Then
there exists a vector u ∈ C^n with u ≠ 0 such that
Au = 0. (1.6.22)
Let
u = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}^T.
We first show that every entry in u cannot be equal in magnitude. Let
us assume that this is so, i.e.
c = |u_1| = |u_2| = · · · = |u_n|.
Since u ≠ 0 hence c ≠ 0. Now for any row i in (1.6.22), we have
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} c = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} = 0
=⇒ a_{ii} = \sum_{j≠i} ± a_{ij}
=⇒ |a_{ii}| = |\sum_{j≠i} ± a_{ij}|
=⇒ |a_{ii}| ≤ \sum_{j≠i} |a_{ij}| (using the triangle inequality)
but this contradicts our assumption that A is strictly diagonally dominant.
Thus all entries in u are not equal in magnitude.
Let us now assume that the largest entry in u lies at index i with
|u_i| = c. Without loss of generality we can scale down u by c to
get another vector in which all entries are less than or equal to 1 in
magnitude while the i-th entry is ±1, i.e. u_i = ±1 and |u_j| ≤ 1 for all
other entries.
Now from (1.6.22) we get for the i-th row
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ ± a_{ii} = − \sum_{j≠i} u_j a_{ij}
=⇒ |a_{ii}| ≤ \sum_{j≠i} |u_j a_{ij}| ≤ \sum_{j≠i} |a_{ij}|
which again contradicts our assumption that A is strictly diagonally
dominant.
Hence strictly diagonally dominant matrices are non-singular.
1.6.10. Gershgorin's theorem
We are now ready to examine Gershgorin's theorem which provides very
useful bounds on the spectrum of a square matrix.
Theorem 1.94 Every eigen value λ of a square matrix A ∈ C^{n×n}
satisfies
|λ − a_{ii}| ≤ \sum_{j≠i} |a_{ij}| for some i ∈ {1, 2, . . . , n}. (1.6.23)
Proof. The proof is a straightforward application of the non-singularity
of strictly diagonally dominant matrices.
We know that for an eigen value λ, det(λI − A) = 0, i.e. the matrix
(λI − A) is singular. Hence it cannot be strictly diagonally dominant
due to theorem 1.93.
Thus looking at each row i of (λI − A) we can say that
|λ − a_{ii}| > \sum_{j≠i} |a_{ij}|
cannot be true for all rows simultaneously, i.e. it must fail at least for
one row. This means that there exists at least one row i for which
|λ − a_{ii}| ≤ \sum_{j≠i} |a_{ij}|
holds true.
What this theorem means is pretty simple. Consider a disc in the
complex plane for the i-th row of A whose center is given by a_{ii} and
whose radius is given by r = \sum_{j≠i} |a_{ij}|, i.e. the sum of magnitudes of
all non-diagonal entries in the i-th row.
There are n such discs corresponding to the n rows in A. (1.6.23) means
that every eigen value must lie within the union of these discs. It
cannot lie outside.
This idea is crystallized in the following definition.
Definition 1.37 [Gershgorin's disc] For the i-th row of matrix A we
define the radius r_i = \sum_{j≠i} |a_{ij}| and the center c_i = a_{ii}. Then the
set given by
D_i = {z ∈ C : |z − a_{ii}| ≤ r_i}
is called the i-th Gershgorin's disc of A.
We note that the definition is equally valid for real as well as complex
matrices. For real matrices, the centers of disks lie on the real line. For
complex matrices, the centers may lie anywhere in the complex plane.
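The disc description translates directly into a few lines of code. The sketch below is an added illustration (NumPy assumed, not part of the original text): it computes the centers and radii for the matrix of example 1.2 and verifies that every eigen value falls in at least one row disc.

import numpy as np

A = np.array([[-4., -2., -1.,  0.],
              [-4.,  7.,  2.,  0.],
              [ 3., -4.,  9.,  1.],
              [ 2., -1., -3., 15.]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i = sum over j != i of |a_ij|

for lam in np.linalg.eigvals(A):
    # Each eigen value lies in the union of the row discs (theorem 1.94).
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
    print(lam)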
Clearly there is nothing magical about the rows of A. We can as well
consider the columns of A.
Theorem 1.95 Every eigen value of a matrix A must lie in a
Gershgorin disc corresponding to the columns of A where the Gershgorin
disc for the j-th column is given by
D_j = {z ∈ C : |z − a_{jj}| ≤ r_j}
with
r_j = \sum_{i≠j} |a_{ij}|.
Proof. We know that the eigen values of A are the same as the eigen values of
A^T and the columns of A are nothing but the rows of A^T. Hence the eigen values of
A must satisfy the conditions in theorem 1.94 w.r.t. the matrix A^T. This
completes the proof.
1.7. Singular values
In the previous section we saw diagonalization of square matrices, which
resulted in an eigen value decomposition of the matrix. This matrix
factorization is very useful, yet it is not applicable in all situations. In
particular, the eigen value decomposition is useless if the square matrix
is not diagonalizable or if the matrix is not square at all. Moreover,
the decomposition is particularly useful only for real symmetric or Hermitian
matrices where the diagonalizing matrix is an F-unitary matrix
(see definition 1.23). Otherwise, one has to consider the inverse of the
diagonalizing matrix also.
Fortunately there happens to be another decomposition which applies
to all matrices and it involves just F-unitary matrices.
Definition 1.38 [Singular value] A non-negative real number σ is
a singular value for a matrix A ∈ F^{m×n} if and only if there exist
unit-length vectors u ∈ F^m and v ∈ F^n such that
Av = σu (1.7.1)
and
A^H u = σv (1.7.2)
hold. The vectors u and v are called left-singular and right-
singular vectors for σ respectively.
We first present the basic result of singular value decomposition. We
will not prove this result completely although we will present proofs of
some aspects.
Theorem 1.96 For every A ∈ F^{m×n} with k = min(m, n), there
exist two F-unitary matrices U ∈ F^{m×m} and V ∈ F^{n×n} and a
sequence of real numbers
σ_1 ≥ σ_2 ≥ · · · ≥ σ_k ≥ 0
such that
U^H A V = Σ (1.7.3)
where
Σ = diag(σ_1, σ_2, . . . , σ_k) ∈ F^{m×n}.
The non-negative real numbers σ_i are the singular values of A as
per definition 1.38.
The sequence of real numbers σ_i doesn't depend on the particular
choice of U and V.
Σ is rectangular with the same size as A. The singular values of A lie
on the principal diagonal of Σ. All other entries in Σ are zero.
It is certainly possible that some of the singular values are 0 themselves.
Remark. Since U^H A V = Σ, we have
A = U Σ V^H. (1.7.4)
Definition 1.39 [Singular value decomposition] The decomposition
of a matrix A ∈ F^{m×n} given by
A = U Σ V^H (1.7.5)
is known as its singular value decomposition.
Remark. When F is R then the decomposition simplifies to
U^T A V = Σ (1.7.6)
and
A = U Σ V^T. (1.7.7)
Remark. Clearly there can be at most k = min(m, n) distinct singular
values of A.
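In NumPy the factorization is available as numpy.linalg.svd. The sketch below is an added illustration (not part of the original text): it reconstructs A = U Σ V^H for a rectangular complex matrix and confirms that U and V are unitary.

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# full_matrices=True returns square U (4x4) and Vh (3x3); s holds the
# k = 3 singular values in descending order.  Note Vh is V^H, not V.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

Sigma = np.zeros((4, 3), dtype=complex)
Sigma[:3, :3] = np.diag(s)               # rectangular "diagonal" matrix

assert np.allclose(U @ Sigma @ Vh, A)                    # A = U Sigma V^H
assert np.allclose(U.conj().T @ U, np.eye(4))            # U is unitary
assert np.allclose(Vh @ Vh.conj().T, np.eye(3))          # V is unitary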
Remark. We can also write
A V = U Σ. (1.7.8)
Remark. Let us expand
A = U Σ V^H = \begin{bmatrix} u_1 & u_2 & \dots & u_m \end{bmatrix} [σ_{ij}] \begin{bmatrix} v_1^H \\ v_2^H \\ \vdots \\ v_n^H \end{bmatrix} = \sum_{i=1}^{m} \sum_{j=1}^{n} σ_{ij} u_i v_j^H.
Remark. Alternatively, let us expand
Σ = U^H A V = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_m^H \end{bmatrix} A \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix} = [u_i^H A v_j].
This gives us
σ_{ij} = u_i^H A v_j. (1.7.9)
The following lemma verifies that Σ indeed consists of singular values of A as per definition 1.38.

Lemma 1.97 Let A = U Σ V^H be a singular value decomposition of A. Then the main diagonal entries of Σ are singular values. The first k = min(m, n) column vectors in U and V are left and right singular vectors of A.
Proof. We have
AV = UΣ.
Let us expand the R.H.S. Entry-wise, (UΣ)_{ik} = \sum_{j} u_{ij} σ_{jk} = u_{ik} σ_k, so
UΣ = [σ_1 u_1  σ_2 u_2  . . .  σ_k u_k  0  . . .  0]
where the zero columns at the end appear n − k times.
Expanding the L.H.S. we get
AV = [Av_1  Av_2  . . .  Av_n].
Thus by comparing both sides we get
Av_i = σ_i u_i for 1 ≤ i ≤ k
and
Av_i = 0 for k < i ≤ n.
Now let us start with
A = U Σ V^H  =⇒  A^H = V Σ^H U^H  =⇒  A^H U = V Σ^H.
Expanding the R.H.S. in the same manner gives
V Σ^H = [σ_1 v_1  σ_2 v_2  . . .  σ_k v_k  0  . . .  0]
where the zero columns appear m − k times.
Expanding the L.H.S. we get
A^H U = [A^H u_1  A^H u_2  . . .  A^H u_m].
Thus by comparing both sides we get
A^H u_i = σ_i v_i for 1 ≤ i ≤ k
and
A^H u_i = 0 for k < i ≤ m.
We now consider the three cases.
For m = n, we have k = m = n and we get
Av_i = σ_i u_i,  A^H u_i = σ_i v_i for 1 ≤ i ≤ m.
Thus σ_i is a singular value of A, u_i is a left singular vector and v_i is a right singular vector.
For m < n, we have k = m. For the first m vectors in V we get
Av_i = σ_i u_i,  A^H u_i = σ_i v_i for 1 ≤ i ≤ m.
For the remaining n − m vectors in V we can write
Av_i = 0,
so they belong to the null space of A.
For m > n, we have k = n. For the first n vectors in U we get
Av_i = σ_i u_i,  A^H u_i = σ_i v_i for 1 ≤ i ≤ n.
For the remaining m − n vectors in U we can write
A^H u_i = 0.
Lemma 1.98 ΣΣ^H is an m × m matrix given by
ΣΣ^H = diag(σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0)
where the number of 0's following σ_k^2 is m − k.

Lemma 1.99 Σ^H Σ is an n × n matrix given by
Σ^H Σ = diag(σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0)
where the number of 0's following σ_k^2 is n − k.
Lemma 1.100 [Rank and singular value decomposition] Let A ∈ F^{m×n} have a singular value decomposition given by
A = U Σ V^H.
Then
rank(A) = rank(Σ).   (1.7.10)
In other words, the rank of A is the number of non-zero singular values of A. Since the singular values are ordered in descending order in Σ, the first r = rank(A) singular values σ_1, . . . , σ_r are exactly the non-zero ones.

Proof. This is a straightforward application of lemma 1.6 and lemma 1.7. Further, since the only non-zero values in Σ appear on its main diagonal, its rank is the number of non-zero singular values σ_i.
Corollary 1.101. Let r = rank(A). Then Σ can be split as a block matrix
Σ = [Σ_r 0; 0 0]   (1.7.11)
where Σ_r is an r × r diagonal matrix of the non-zero singular values, Σ_r = diag(σ_1, σ_2, . . . , σ_r). All other sub-matrices in Σ are 0.
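A small sketch of lemma 1.100 and corollary 1.101 (assuming NumPy is available): the rank of A is recovered by counting the singular values above a numerical tolerance, which is consistent with what numpy.linalg.matrix_rank reports.

```python
import numpy as np

# build a 5x4 matrix of rank 2
A = np.random.randn(5, 2) @ np.random.randn(2, 4)
s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank_from_svd = int(np.sum(s > tol))

print(rank_from_svd, np.linalg.matrix_rank(A))   # both give 2
```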
Lemma 1.102 The eigen values of the Hermitian matrix A^H A ∈ F^{n×n} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2. Moreover the eigen vectors are the columns of V.

Proof.
A^H A = (U Σ V^H)^H (U Σ V^H) = V Σ^H U^H U Σ V^H = V Σ^H Σ V^H.
We note that A^H A is Hermitian. Hence A^H A is diagonalized by V and the diagonalization of A^H A is Σ^H Σ. Thus the eigen values of A^H A are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2.
Clearly
(A^H A) V = V (Σ^H Σ),
thus the columns of V are the eigen vectors of A^H A.
Lemma 1.103 The eigen values of the Hermitian matrix A A^H ∈ F^{m×m} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2. Moreover the eigen vectors are the columns of U.

Proof.
A A^H = (U Σ V^H)(U Σ V^H)^H = U Σ V^H V Σ^H U^H = U Σ Σ^H U^H.
We note that A A^H is Hermitian. Hence A A^H is diagonalized by U and the diagonalization of A A^H is Σ Σ^H. Thus the eigen values of A A^H are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2.
Clearly
(A A^H) U = U (Σ Σ^H),
thus the columns of U are the eigen vectors of A A^H.
Lemma 1.104 The Gram matrices A A^H and A^H A share the same eigen values except for some extra 0's. Their eigen values are the squares of the singular values of A together with some extra 0's. In other words, the singular values of A are the square roots of the non-zero eigen values of the Gram matrices A A^H or A^H A.
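The following sketch (assuming NumPy is available) checks lemma 1.104 numerically on a real matrix, where A^H = A^T: the square roots of the non-zero eigen values of the Gram matrices are the singular values of A.

```python
import numpy as np

A = np.random.randn(5, 3)
s = np.linalg.svd(A, compute_uv=False)                 # singular values of A
ev_AhA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]    # 3 eigen values
ev_AAh = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]    # 5 eigen values (2 extra zeros)

print(np.allclose(np.sqrt(np.clip(ev_AhA, 0, None)), s))
print(np.allclose(ev_AAh[:3], ev_AhA), np.allclose(ev_AAh[3:], 0))
```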
1.7.1. The largest singular value
Lemma 1.105 For all u ∈ F^n the following holds:
‖Σu‖_2 ≤ σ_1 ‖u‖_2.   (1.7.12)
Moreover for all u ∈ F^m the following holds:
‖Σ^H u‖_2 ≤ σ_1 ‖u‖_2.   (1.7.13)
Proof. Let us expand the term Σu:
Σu = diag(σ_1, . . . , σ_k, 0, . . . , 0) (u_1, u_2, . . . , u_n)^T = (σ_1 u_1, σ_2 u_2, . . . , σ_k u_k, 0, . . . , 0)^T.
Now since σ_1 is the largest singular value,
|σ_i u_i| ≤ |σ_1 u_i| ∀ 1 ≤ i ≤ k.
Thus
\sum_{i=1}^{n} |σ_1 u_i|^2 ≥ \sum_{i=1}^{n} |σ_i u_i|^2
or
σ_1^2 ‖u‖_2^2 ≥ ‖Σu‖_2^2.
The result follows.
A simpler representation of Σu can be given using corollary 1.101. Let r = rank(A). Thus
Σ = [Σ_r 0; 0 0].
We split the entries of u as u = [(u_1, . . . , u_r), (u_{r+1}, . . . , u_n)]^T. Then
Σu = [Σ_r (u_1 . . . u_r)^T; 0 (u_{r+1} . . . u_n)^T] = (σ_1 u_1, σ_2 u_2, . . . , σ_r u_r, 0, . . . , 0)^T.
Thus
‖Σu‖_2^2 = \sum_{i=1}^{r} |σ_i u_i|^2 ≤ σ_1^2 \sum_{i=1}^{r} |u_i|^2 ≤ σ_1^2 ‖u‖_2^2.
The second result can be proven similarly.
Lemma 1.106 Let σ_1 be the largest singular value of an m × n matrix A. Then
‖Ax‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^n.   (1.7.14)
Moreover
‖A^H x‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^m.   (1.7.15)

Proof.
‖Ax‖_2 = ‖U Σ V^H x‖_2 = ‖Σ V^H x‖_2
since U is unitary. Now from the previous lemma we have
‖Σ V^H x‖_2 ≤ σ_1 ‖V^H x‖_2 = σ_1 ‖x‖_2
since V^H is also unitary. Thus we get the result
‖Ax‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^n.
Similarly
‖A^H x‖_2 = ‖V Σ^H U^H x‖_2 = ‖Σ^H U^H x‖_2
since V is unitary. Now from the previous lemma we have
‖Σ^H U^H x‖_2 ≤ σ_1 ‖U^H x‖_2 = σ_1 ‖x‖_2
since U^H is also unitary. Thus we get the result
‖A^H x‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^m.
There is a direct connection between the largest singular value and the 2-norm of a matrix (see section 1.8.6).

Corollary 1.107. The largest singular value of A is nothing but its 2-norm, i.e.
σ_1 = max_{‖u‖_2=1} ‖Au‖_2.
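A numerical sanity check of lemma 1.106 and corollary 1.107 (a sketch, assuming NumPy is available): no vector is stretched by a factor larger than σ_1, and numpy.linalg.norm(A, 2) returns exactly σ_1.

```python
import numpy as np

A = np.random.randn(6, 4)
sigma1 = np.linalg.svd(A, compute_uv=False)[0]

X = np.random.randn(4, 1000)                          # 1000 random test vectors
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.max() <= sigma1 + 1e-12)                 # ||Ax||_2 <= sigma_1 ||x||_2
print(np.isclose(np.linalg.norm(A, 2), sigma1))       # 2-norm equals sigma_1
```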
1.7.2. SVD and pseudo inverse
Lemma 1.108 [Pseudo-inverse of Σ] Let A = U Σ V^H and let r = rank(A). Let σ_1, . . . , σ_r be the r non-zero singular values of A. Then the Moore-Penrose pseudo-inverse of Σ is the n × m matrix Σ† given by
Σ† = [Σ_r^{-1} 0; 0 0]   (1.7.16)
where Σ_r = diag(σ_1, . . . , σ_r).
Essentially Σ† is obtained by transposing Σ and inverting all its non-zero (positive real) values.

Proof. Straightforward application of lemma 1.32.
Corollary 1.109. The ranks of Σ and of its pseudo-inverse Σ† are the same, i.e.
rank(Σ) = rank(Σ†).   (1.7.17)

Proof. The number of non-zero diagonal entries in Σ and Σ† is the same.
Lemma 1.110 Let A be an m × n matrix and let A = U Σ V^H be its singular value decomposition. Let Σ† be the pseudo-inverse of Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of A is given by
A† = V Σ† U^H.   (1.7.18)

Proof. As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per definition 1.19. We note that since Σ† is the pseudo-inverse of Σ, it already satisfies the necessary criteria.
First requirement:
A A† A = U Σ V^H V Σ† U^H U Σ V^H = U Σ Σ† Σ V^H = U Σ V^H = A.
Second requirement:
A† A A† = V Σ† U^H U Σ V^H V Σ† U^H = V Σ† Σ Σ† U^H = V Σ† U^H = A†.
We now consider
A A† = U Σ V^H V Σ† U^H = U Σ Σ† U^H.
Thus
(A A†)^H = (U Σ Σ† U^H)^H = U (Σ Σ†)^H U^H = U Σ Σ† U^H = A A†
since Σ Σ† is Hermitian.
Finally we consider
A† A = V Σ† U^H U Σ V^H = V Σ† Σ V^H.
Thus
(A† A)^H = (V Σ† Σ V^H)^H = V (Σ† Σ)^H V^H = V Σ† Σ V^H = A† A
since Σ† Σ is also Hermitian.
This completes the proof.
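The construction of lemma 1.108 and lemma 1.110 can be reproduced directly and compared against numpy.linalg.pinv; the sketch below assumes NumPy and uses a tolerance of our choosing to decide which singular values are non-zero.

```python
import numpy as np

A = np.random.randn(5, 3)
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# build Sigma^dagger: transpose the shape of Sigma and invert non-zero singular values
Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
r = np.sum(s > 1e-12)
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])

A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T        # A^dagger = V Sigma^dagger U^H
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```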
Finally we can connect the singular values of A with the singular values
of its pseudo-inverse.
Corollary 1.111. The ranks of any m × n matrix A and of its pseudo-inverse A† are the same, i.e.
rank(A) = rank(A†).   (1.7.19)

Proof. We have rank(A) = rank(Σ). It is also easy to verify that rank(A†) = rank(Σ†). Using corollary 1.109 completes the proof.
Lemma 1.112 Let A be an m × n matrix and let A† be its n × m pseudo-inverse as per lemma 1.110. Let r = rank(A). Let k = min(m, n) denote the number of singular values, so that r is the number of non-zero singular values of A. Let σ_1, . . . , σ_r be the non-zero singular values of A. Then the number of singular values of A† is the same as that of A, the non-zero singular values of A† are
1/σ_1, . . . , 1/σ_r,
while all other k − r singular values of A† are zero.

Proof. k = min(m, n) is the number of singular values for both A and A†. Since the ranks of A and A† are the same, the number of non-zero singular values is the same. Now look at
A† = V Σ† U^H
where
Σ† = [Σ_r^{-1} 0; 0 0].
Clearly Σ_r^{-1} = diag(1/σ_1, . . . , 1/σ_r).
Thus expanding the R.H.S. we get
A† = \sum_{i=1}^{r} (1/σ_i) v_i u_i^H
where v_i and u_i are the first r columns of V and U respectively. If we reverse the order of the first r columns of U and V and reverse the first r diagonal entries of Σ†, the R.H.S. remains the same while we are able to express A† in the standard singular value decomposition form. Thus 1/σ_1, . . . , 1/σ_r are indeed the non-zero singular values of A†.
1.7.3. Full column rank matrices
In this subsection we consider some specific results related to singular
value decomposition of a full column rank matrix.
We will consider A to be an m × n matrix in F^{m×n} with m ≥ n and rank(A) = n. Let A = U Σ V^H be its singular value decomposition. From lemma 1.100 we observe that there are n non-zero singular values of A. We will call these singular values σ_1, σ_2, . . . , σ_n. We define
Σ_n = diag(σ_1, σ_2, . . . , σ_n).
Clearly Σ is a 2 × 1 block matrix given by
Σ = [Σ_n; 0]
where the lower 0 is an (m − n) × n zero matrix. From here we obtain that Σ^H Σ is an n × n matrix given by
Σ^H Σ = Σ_n^2
where
Σ_n^2 = diag(σ_1^2, σ_2^2, . . . , σ_n^2).
Lemma 1.113 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Then Σ^H Σ = Σ_n^2 = diag(σ_1^2, σ_2^2, . . . , σ_n^2) and Σ^H Σ is invertible.

Proof. Since all singular values are non-zero, Σ_n^2 is invertible. Thus
(Σ^H Σ)^{-1} = (Σ_n^2)^{-1} = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).   (1.7.20)
Lemma 1.114 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let σ_1 be its largest singular value and σ_n be its smallest singular value. Then
σ_n^2 ‖x‖_2 ≤ ‖Σ^H Σ x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.   (1.7.21)

Proof. Let x ∈ F^n. We have
‖Σ^H Σ x‖_2^2 = ‖Σ_n^2 x‖_2^2 = \sum_{i=1}^{n} |σ_i^2 x_i|^2.
Now since
σ_n ≤ σ_i ≤ σ_1,
we have
σ_n^4 \sum_{i=1}^{n} |x_i|^2 ≤ \sum_{i=1}^{n} |σ_i^2 x_i|^2 ≤ σ_1^4 \sum_{i=1}^{n} |x_i|^2,
thus
σ_n^4 ‖x‖_2^2 ≤ ‖Σ^H Σ x‖_2^2 ≤ σ_1^4 ‖x‖_2^2.
Applying square roots, we get
σ_n^2 ‖x‖_2 ≤ ‖Σ^H Σ x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.
We recall from corollary 1.25 that the Gram matrix of its column vectors G = A^H A is full rank and invertible.

Lemma 1.115 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let σ_1 be its largest singular value and σ_n be its smallest singular value. Then
σ_n^2 ‖x‖_2 ≤ ‖A^H A x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.   (1.7.22)

Proof.
A^H A = (U Σ V^H)^H (U Σ V^H) = V Σ^H Σ V^H.
Let x ∈ F^n. Let
u = V^H x  =⇒  ‖u‖_2 = ‖x‖_2.
Let
r = Σ^H Σ u.
Then from the previous lemma we have
σ_n^2 ‖u‖_2 ≤ ‖Σ^H Σ u‖_2 = ‖r‖_2 ≤ σ_1^2 ‖u‖_2.
Finally
A^H A x = V Σ^H Σ V^H x = V r.
Thus
‖A^H A x‖_2 = ‖r‖_2.
Substituting, we get
σ_n^2 ‖x‖_2 ≤ ‖A^H A x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.
There are bounds for the inverse of the Gram matrix also. First let us establish the inverse of the Gram matrix.

Lemma 1.116 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let the singular values of A be σ_1, . . . , σ_n. Let the Gram matrix of the columns of A be G = A^H A. Then
G^{-1} = V Ψ V^H
where
Ψ = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).

Proof. We have
G = V Σ^H Σ V^H.
Thus
G^{-1} = (V Σ^H Σ V^H)^{-1} = (V^H)^{-1} (Σ^H Σ)^{-1} V^{-1} = V (Σ^H Σ)^{-1} V^H.
From lemma 1.113 we have
Ψ = (Σ^H Σ)^{-1} = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).
This completes the proof.
We can now state the bounds:
Lemma 1.117 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let σ_1 be its largest singular value and σ_n be its smallest singular value. Then
(1/σ_1^2) ‖x‖_2 ≤ ‖(A^H A)^{-1} x‖_2 ≤ (1/σ_n^2) ‖x‖_2 ∀ x ∈ F^n.   (1.7.23)

Proof. From lemma 1.116 we have
G^{-1} = (A^H A)^{-1} = V Ψ V^H
where
Ψ = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).
Let x ∈ F^n. Let
u = V^H x  =⇒  ‖u‖_2 = ‖x‖_2.
Let
r = Ψ u.
Then
‖r‖_2^2 = \sum_{i=1}^{n} |u_i / σ_i^2|^2.
Thus
(1/σ_1^2) ‖u‖_2 ≤ ‖Ψ u‖_2 = ‖r‖_2 ≤ (1/σ_n^2) ‖u‖_2.
Finally
(A^H A)^{-1} x = V Ψ V^H x = V r.
Thus
‖(A^H A)^{-1} x‖_2 = ‖r‖_2.
Substituting, we get the result.
1.7.4. Low rank approximation of a matrix

Definition 1.40 An m × n matrix A is called low rank if
rank(A) ≪ min(m, n).   (1.7.24)

Remark. A matrix is low rank if the number of non-zero singular values of the matrix is much smaller than its dimensions.
Following is a simple procedure for making a low rank approximation of a given matrix A; a code sketch follows the list.
(1) Perform the singular value decomposition of A given by A = U Σ V^H.
(2) Identify the singular values of A in Σ.
(3) Keep the first r singular values (where r ≪ min(m, n) is the rank of the approximation) and set all other singular values to 0 to obtain the truncated matrix \hat{Σ}.
(4) Compute \hat{A} = U \hat{Σ} V^H.
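A minimal sketch of this procedure (assuming NumPy is available; the helper name low_rank_approx and the choice r = 2 are ours):

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of A obtained by truncating its SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]

A = np.random.randn(50, 30)
A2 = low_rank_approx(A, 2)
print(np.linalg.matrix_rank(A2))        # 2
print(np.linalg.norm(A - A2, 2))        # 2-norm error; equals sigma_{r+1} of A
```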
1.8. Matrix norms
This section reviews various matrix norms on the vector space of complex matrices over the field of complex numbers, (C^{m×n}, C).
We know that (C^{m×n}, C) is a finite dimensional vector space with dimension mn. We will usually refer to it as C^{m×n}.
Matrix norms will follow the usual definition of norms for a vector space.

Definition 1.41 A function ‖·‖ : C^{m×n} → R is called a matrix norm on C^{m×n} if for all A, B ∈ C^{m×n} and all α ∈ C it satisfies the following.
Positivity:
‖A‖ ≥ 0
with ‖A‖ = 0 ⇐⇒ A = 0.
Homogeneity:
‖αA‖ = |α| ‖A‖.
Triangle inequality:
‖A + B‖ ≤ ‖A‖ + ‖B‖.

We recall some of the standard results on normed vector spaces.
All matrix norms are equivalent. Let ‖·‖ and ‖·‖' be two different matrix norms on C^{m×n}. Then there exist two constants a and b such that the following holds:
a ‖A‖ ≤ ‖A‖' ≤ b ‖A‖ ∀ A ∈ C^{m×n}.
A matrix norm is a continuous function ‖·‖ : C^{m×n} → R.
1.8.1. Norms like l_p on C^n
The following norms are quite like the l_p norms on the finite dimensional complex vector space C^n. They arise from the fact that the matrix vector space C^{m×n} has a one to one correspondence with the complex vector space C^{mn}.

Definition 1.42 Let A ∈ C^{m×n} with A = [a_{ij}]. The matrix sum norm is defined as
‖A‖_S = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.   (1.8.1)

Definition 1.43 Let A ∈ C^{m×n} with A = [a_{ij}]. The matrix Frobenius norm is defined as
‖A‖_F = ( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 )^{1/2}.   (1.8.2)

Definition 1.44 Let A ∈ C^{m×n} with A = [a_{ij}]. The matrix max norm is defined as
‖A‖_M = max_{1≤i≤m, 1≤j≤n} |a_{ij}|.   (1.8.3)
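These three norms are simple reductions over the entries of the matrix. A sketch assuming NumPy; note that numpy.linalg.norm(A, 'fro') computes the Frobenius norm directly.

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.5,  4.0, -1.0]])

sum_norm  = np.sum(np.abs(A))                 # ||A||_S
frob_norm = np.sqrt(np.sum(np.abs(A) ** 2))   # ||A||_F
max_norm  = np.max(np.abs(A))                 # ||A||_M

print(sum_norm, max_norm)
print(np.isclose(frob_norm, np.linalg.norm(A, 'fro')))
```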
1.8.2. Properties of Frobenius norm
We now prove some elementary properties of Frobenius norm.
Lemma 1.118 The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose:
‖A^H‖_F = ‖A‖_F.   (1.8.4)

Proof. Let A = [a_{ij}]. Then A^H = [\bar{a}_{ji}] and
‖A^H‖_F^2 = \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = ‖A‖_F^2.
Now
‖A^H‖_F^2 = ‖A‖_F^2 =⇒ ‖A^H‖_F = ‖A‖_F.
Lemma 1.119 Let A ∈ C^{m×n} be written as a row of column vectors
A = [a_1 . . . a_n].
Then
‖A‖_F^2 = \sum_{j=1}^{n} ‖a_j‖_2^2.   (1.8.5)

Proof. We note that
‖a_j‖_2^2 = \sum_{i=1}^{m} |a_{ij}|^2.
Now
‖A‖_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{j=1}^{n} ( \sum_{i=1}^{m} |a_{ij}|^2 ) = \sum_{j=1}^{n} ‖a_j‖_2^2.
We have thus shown that the square of the Frobenius norm of a matrix is nothing but the sum of squares of the l_2 norms of its columns.

Lemma 1.120 Let A ∈ C^{m×n} be written as a column of row vectors a^1, . . . , a^m, i.e.
A = [a^1; . . . ; a^m].
Then
‖A‖_F^2 = \sum_{i=1}^{m} ‖a^i‖_2^2.   (1.8.6)

Proof. We note that
‖a^i‖_2^2 = \sum_{j=1}^{n} |a_{ij}|^2.
Now
‖A‖_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{m} ‖a^i‖_2^2.
We now consider how the Frobenius norm is affected by the action of unitary matrices.
Let A be an arbitrary matrix in C^{m×n}. Let U be a unitary matrix in C^{m×m} and let V be a unitary matrix in C^{n×n}.
We present our first result: multiplication by a unitary matrix doesn't change the Frobenius norm of a matrix.

Theorem 1.121 The Frobenius norm of a matrix is invariant to pre or post multiplication by a unitary matrix, i.e.
‖UA‖_F = ‖A‖_F   (1.8.7)
and
‖AV‖_F = ‖A‖_F.   (1.8.8)

Proof. We can write A as
A = [a_1 . . . a_n].
So
UA = [Ua_1 . . . Ua_n].
Then applying lemma 1.119, clearly
‖UA‖_F^2 = \sum_{j=1}^{n} ‖Ua_j‖_2^2.
But we know that unitary matrices are norm preserving. Hence
‖Ua_j‖_2^2 = ‖a_j‖_2^2.
Thus
‖UA‖_F^2 = \sum_{j=1}^{n} ‖a_j‖_2^2 = ‖A‖_F^2
which implies
‖UA‖_F = ‖A‖_F.
Similarly, writing A in terms of its rows as
A = [r_1; . . . ; r_m]
we have
AV = [r_1 V; . . . ; r_m V].
Then applying lemma 1.120, clearly
‖AV‖_F^2 = \sum_{i=1}^{m} ‖r_i V‖_2^2.
But we know that unitary matrices are norm preserving. Hence
‖r_i V‖_2^2 = ‖r_i‖_2^2.
Thus
‖AV‖_F^2 = \sum_{i=1}^{m} ‖r_i‖_2^2 = ‖A‖_F^2
which implies
‖AV‖_F = ‖A‖_F.
An alternative proof of the second part, using the first part, takes just one line:
‖AV‖_F = ‖(AV)^H‖_F = ‖V^H A^H‖_F = ‖A^H‖_F = ‖A‖_F.
Here we use lemma 1.118, the fact that V^H is also a unitary matrix whenever V is, and the already established fact that pre-multiplication by a unitary matrix preserves the Frobenius norm.
Theorem 1.122 Let A ∈ C^{m×n} and B ∈ C^{n×P} be two matrices. Then the Frobenius norm of their product is less than or equal to the product of the Frobenius norms of the matrices themselves, i.e.
‖AB‖_F ≤ ‖A‖_F ‖B‖_F.   (1.8.9)

Proof. We can write A as
A = [a_1^T; . . . ; a_m^T]
where the a_i are m column vectors whose transposes are the rows of A. Similarly we can write B as
B = [b_1 . . . b_P]
where the b_i are the columns of B. Then
AB = [a_1^T; . . . ; a_m^T][b_1 . . . b_P] = [a_i^T b_j],
an m × P matrix with entries a_i^T b_j. Now looking carefully,
a_i^T b_j = ⟨a_i, b_j⟩.
Applying the Cauchy-Schwarz inequality we have
|⟨a_i, b_j⟩|^2 ≤ ‖a_i‖_2^2 ‖b_j‖_2^2.
Now
‖AB‖_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{P} |a_i^T b_j|^2 ≤ \sum_{i=1}^{m} \sum_{j=1}^{P} ‖a_i‖_2^2 ‖b_j‖_2^2 = ( \sum_{i=1}^{m} ‖a_i‖_2^2 ) ( \sum_{j=1}^{P} ‖b_j‖_2^2 ) = ‖A‖_F^2 ‖B‖_F^2
which implies
‖AB‖_F ≤ ‖A‖_F ‖B‖_F
by taking square roots on both sides.
Corollary 1.123. Let A ∈ C^{m×n} and let x ∈ C^n. Then
‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2.

Proof. We note that the Frobenius norm of a column matrix is the same as the l_2 norm of the corresponding column vector, i.e.
‖x‖_F = ‖x‖_2 ∀ x ∈ C^n.
Now applying theorem 1.122 we have
‖Ax‖_2 = ‖Ax‖_F ≤ ‖A‖_F ‖x‖_F = ‖A‖_F ‖x‖_2 ∀ x ∈ C^n.
It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 1.124 Let A ∈ C^{m×n} and let its singular value decomposition be
A = U Σ V^H.
Let the singular values of A be σ_1, . . . , σ_k with k = min(m, n). Then
‖A‖_F = ( \sum_{i=1}^{k} σ_i^2 )^{1/2}.   (1.8.10)

Proof.
A = U Σ V^H =⇒ ‖A‖_F = ‖U Σ V^H‖_F.
But
‖U Σ V^H‖_F = ‖Σ V^H‖_F = ‖Σ‖_F
since U and V are unitary matrices (see theorem 1.121).
Now the only non-zero entries in Σ are the singular values. Hence
‖A‖_F = ‖Σ‖_F = ( \sum_{i=1}^{k} σ_i^2 )^{1/2}.
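A one-line numerical check of lemma 1.124 (a sketch, assuming NumPy is available):

```python
import numpy as np

A = np.random.randn(6, 4)
s = np.linalg.svd(A, compute_uv=False)
# Frobenius norm equals the square root of the sum of squared singular values
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2))))
```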
1.8.3. Consistency of a matrix norm

Definition 1.45 A matrix norm ‖·‖ is called consistent on C^{n×n} if
‖AB‖ ≤ ‖A‖ ‖B‖   (1.8.11)
holds true for all A, B ∈ C^{n×n}. A matrix norm ‖·‖ is called consistent if it is defined on C^{m×n} for all m, n ∈ N and eq. (1.8.11) holds for all matrices A, B for which the product AB is defined.
A consistent matrix norm is also known as a sub-multiplicative norm.

With this definition and the result in theorem 1.122 we can see that the Frobenius norm is consistent.
1.8.4. Subordinate matrix norm
A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and the norms of vectors in the domain and co-domain of the matrix.

Definition 1.46 Let m, n ∈ N be given. Let ‖·‖_α be some norm on C^m and ‖·‖_β be some norm on C^n. Let ‖·‖ be some norm on matrices in C^{m×n}. We say that ‖·‖ is subordinate to the vector norms ‖·‖_α and ‖·‖_β if
‖Ax‖_α ≤ ‖A‖ ‖x‖_β   (1.8.12)
for all A ∈ C^{m×n} and for all x ∈ C^n. In other words, the length of the vector doesn't increase under the action of A beyond a factor given by the norm of the matrix itself.
If ‖·‖_α and ‖·‖_β are the same, we say that ‖·‖ is subordinate to the vector norm ‖·‖_α.

We have shown earlier in corollary 1.123 that the Frobenius norm is subordinate to the Euclidean norm.
1.8.5. Operator norm
We now consider the maximum factor by which a matrix A can increase the length of a vector.

Definition 1.47 Let m, n ∈ N be given. Let ‖·‖_α be some norm on C^n and ‖·‖_β be some norm on C^m. For A ∈ C^{m×n} we define
‖A‖ ≜ ‖A‖_{α→β} ≜ max_{x≠0} ‖Ax‖_β / ‖x‖_α.   (1.8.13)
The ratio ‖Ax‖_β / ‖x‖_α represents the factor by which the length of x is increased under the action of A. We simply pick the maximum value of this scaling factor.
The norm defined above is known as the (α → β) operator norm, the (α → β)-norm, or simply the α-norm if α = β.

Of course we need to verify that this definition satisfies all properties of a norm.
Clearly if A = 0 then Ax = 0 always, hence ‖A‖ = 0.
Conversely, if ‖A‖ = 0 then ‖Ax‖_β = 0 ∀ x ∈ C^n. In particular this is true for the unit vectors e_i ∈ C^n. The i-th column of A is given by Ae_i, which is therefore 0. Thus each column of A is 0, hence A = 0.
Now consider c ∈ C:
‖cA‖ = max_{x≠0} ‖cAx‖_β / ‖x‖_α = |c| max_{x≠0} ‖Ax‖_β / ‖x‖_α = |c| ‖A‖.
We now present some useful observations on the operator norm before we prove the triangle inequality for it.
For any x ∈ ker(A), Ax = 0, hence we only need to consider vectors which don't belong to the kernel of A.
Thus we can write
‖A‖_{α→β} = max_{x∉ker(A)} ‖Ax‖_β / ‖x‖_α.   (1.8.14)
We also note that
‖A(cx)‖_β / ‖cx‖_α = |c| ‖Ax‖_β / (|c| ‖x‖_α) = ‖Ax‖_β / ‖x‖_α ∀ c ≠ 0, x ≠ 0.
Thus, it is sufficient to find the maximum over unit norm vectors:
‖A‖_{α→β} = max_{‖x‖_α=1} ‖Ax‖_β.
Note that since ‖x‖_α = 1, the term in the denominator goes away.

Lemma 1.125 The (α → β)-operator norm is subordinate to the vector norms ‖·‖_α and ‖·‖_β, i.e.
‖Ax‖_β ≤ ‖A‖_{α→β} ‖x‖_α.   (1.8.15)

Proof. For x = 0 the inequality is trivially satisfied. For x ≠ 0, by definition we have
‖A‖_{α→β} ≥ ‖Ax‖_β / ‖x‖_α =⇒ ‖A‖_{α→β} ‖x‖_α ≥ ‖Ax‖_β.

Remark. There exists a vector x* ∈ C^n with unit norm (‖x*‖_α = 1) such that
‖A‖_{α→β} = ‖Ax*‖_β.   (1.8.16)

Proof. Let x ≠ 0 be some vector which maximizes the expression
‖Ax‖_β / ‖x‖_α.
Then
‖A‖_{α→β} = ‖Ax‖_β / ‖x‖_α.
Now consider x* = x / ‖x‖_α. Thus ‖x*‖_α = 1. We know that
‖Ax‖_β / ‖x‖_α = ‖Ax*‖_β.
Hence
‖A‖_{α→β} = ‖Ax*‖_β.
We are now ready to prove the triangle inequality for the operator norm.

Lemma 1.126 The operator norm as defined in definition 1.47 satisfies the triangle inequality.

Proof. Let A and B be matrices in C^{m×n}. Consider the operator norm of the matrix A + B. From the previous remark, there exists some vector x* ∈ C^n with ‖x*‖_α = 1 such that
‖A + B‖ = ‖(A + B)x*‖_β.
Now
‖(A + B)x*‖_β = ‖Ax* + Bx*‖_β ≤ ‖Ax*‖_β + ‖Bx*‖_β.
From another remark we have
‖Ax*‖_β ≤ ‖A‖ ‖x*‖_α = ‖A‖
and
‖Bx*‖_β ≤ ‖B‖ ‖x*‖_α = ‖B‖
since ‖x*‖_α = 1.
Hence we have
‖A + B‖ ≤ ‖A‖ + ‖B‖.

It turns out that the operator norm is also consistent under certain conditions.
Lemma 1.127 Let ‖·‖_α be defined over all m ∈ N and let ‖·‖_β = ‖·‖_α. Then the operator norm
‖A‖_α = max_{x≠0} ‖Ax‖_α / ‖x‖_α
is consistent.

Proof. We need to show that
‖AB‖_α ≤ ‖A‖_α ‖B‖_α.
Now
‖AB‖_α = max_{x≠0} ‖ABx‖_α / ‖x‖_α.
We note that if Bx = 0, then ABx = 0. Hence we can rewrite this as
‖AB‖_α = max_{Bx≠0} ‖ABx‖_α / ‖x‖_α.
Now if Bx ≠ 0 then ‖Bx‖_α ≠ 0. Hence
‖ABx‖_α / ‖x‖_α = (‖ABx‖_α / ‖Bx‖_α) (‖Bx‖_α / ‖x‖_α)
and
max_{Bx≠0} ‖ABx‖_α / ‖x‖_α ≤ ( max_{Bx≠0} ‖ABx‖_α / ‖Bx‖_α ) ( max_{Bx≠0} ‖Bx‖_α / ‖x‖_α ).
Clearly
‖B‖_α = max_{Bx≠0} ‖Bx‖_α / ‖x‖_α.
Furthermore
max_{Bx≠0} ‖ABx‖_α / ‖Bx‖_α ≤ max_{y≠0} ‖Ay‖_α / ‖y‖_α = ‖A‖_α.
Thus we have
‖AB‖_α ≤ ‖A‖_α ‖B‖_α.
1.8.6. p-norm for matrices
We recall the definition of the l_p norms for vectors x ∈ C^n:
‖x‖_p = ( \sum_{i=1}^{n} |x_i|^p )^{1/p} for p ∈ [1, ∞), and ‖x‖_∞ = max_{1≤i≤n} |x_i|.
The operator norms ‖·‖_p defined from the l_p vector norms are of specific interest.

Definition 1.48 The p-norm for a matrix A ∈ C^{m×n} is defined as
‖A‖_p ≜ max_{x≠0} ‖Ax‖_p / ‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_p   (1.8.17)
where ‖x‖_p is the standard l_p norm for vectors in C^m and C^n.

Remark. As per lemma 1.127, p-norms for matrices are consistent norms. They are also subordinate to the l_p vector norms.
Special cases are considered for p = 1, 2 and ∞.
Theorem 1.128 Let A ∈ C^{m×n}.
For p = 1 we have
‖A‖_1 ≜ max_{1≤j≤n} \sum_{i=1}^{m} |a_{ij}|.   (1.8.18)
This is also known as the max column sum norm.
For p = ∞ we have
‖A‖_∞ ≜ max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}|.   (1.8.19)
This is also known as the max row sum norm.
Finally for p = 2 we have
‖A‖_2 ≜ σ_1   (1.8.20)
where σ_1 is the largest singular value of A. This is also known as the spectral norm.
Proof. Let
A = [a_1 . . . a_n].
Then
‖Ax‖_1 = ‖\sum_{j=1}^{n} x_j a_j‖_1 ≤ \sum_{j=1}^{n} ‖x_j a_j‖_1 = \sum_{j=1}^{n} |x_j| ‖a_j‖_1 ≤ ( max_{1≤j≤n} ‖a_j‖_1 ) \sum_{j=1}^{n} |x_j| = ( max_{1≤j≤n} ‖a_j‖_1 ) ‖x‖_1.
Thus,
‖A‖_1 = max_{x≠0} ‖Ax‖_1 / ‖x‖_1 ≤ max_{1≤j≤n} ‖a_j‖_1,
which is the maximum column sum. We need to show that this upper bound is indeed attained.
Indeed for x = e_j, where e_j is the unit vector with 1 in the j-th entry and 0 elsewhere,
‖Ae_j‖_1 = ‖a_j‖_1.
Thus
‖A‖_1 ≥ ‖a_j‖_1 ∀ 1 ≤ j ≤ n.
Combining the two, we see that
‖A‖_1 = max_{1≤j≤n} ‖a_j‖_1.
For p = ∞, we proceed as follows:
‖Ax‖_∞ = max_{1≤i≤m} |\sum_{j=1}^{n} a_{ij} x_j| ≤ max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}||x_j| ≤ ( max_{1≤j≤n} |x_j| ) ( max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}| ) = ‖x‖_∞ max_{1≤i≤m} ‖a^i‖_1
where the a^i are the rows of A.
This shows that
‖A‖_∞ ≤ max_{1≤i≤m} ‖a^i‖_1.
We need to show that this is indeed an equality.
Fix an i = k and choose x such that
x_j = sgn(a_{kj}).
Clearly ‖x‖_∞ = 1. Then
‖Ax‖_∞ = max_{1≤i≤m} |\sum_{j=1}^{n} a_{ij} x_j| ≥ |\sum_{j=1}^{n} a_{kj} x_j| = \sum_{j=1}^{n} |a_{kj}| = ‖a^k‖_1.
Thus,
‖A‖_∞ ≥ max_{1≤i≤m} ‖a^i‖_1.
Combining the two inequalities we get
‖A‖_∞ = max_{1≤i≤m} ‖a^i‖_1.
The remaining case is p = 2.
For any vector x with ‖x‖_2 = 1,
‖Ax‖_2 = ‖U Σ V^H x‖_2 = ‖U(Σ V^H x)‖_2 = ‖Σ V^H x‖_2
since the l_2 norm is invariant under unitary transformations.
Let v = V^H x. Then ‖v‖_2 = ‖V^H x‖_2 = ‖x‖_2 = 1.
Now
‖Ax‖_2 = ‖Σv‖_2 = ( \sum_{j} |σ_j v_j|^2 )^{1/2} ≤ σ_1 ( \sum_{j} |v_j|^2 )^{1/2} = σ_1 ‖v‖_2 = σ_1.
This shows that
‖A‖_2 ≤ σ_1.
Now consider the vector x for which v = V^H x = (1, 0, . . . , 0). Then
‖Ax‖_2 = ‖Σv‖_2 = σ_1.
Thus
‖A‖_2 ≥ σ_1.
Combining the two, we get ‖A‖_2 = σ_1.
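The three closed forms of theorem 1.128 can be checked directly; numpy.linalg.norm accepts 1, 2 and np.inf as the ord argument for exactly these induced norms. A sketch assuming NumPy:

```python
import numpy as np

A = np.random.randn(5, 4)

one_norm = np.abs(A).sum(axis=0).max()              # max column sum
inf_norm = np.abs(A).sum(axis=1).max()              # max row sum
two_norm = np.linalg.svd(A, compute_uv=False)[0]    # largest singular value

print(np.isclose(one_norm, np.linalg.norm(A, 1)))
print(np.isclose(inf_norm, np.linalg.norm(A, np.inf)))
print(np.isclose(two_norm, np.linalg.norm(A, 2)))
```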
1.8.7. The 2-norm
Theorem 1.129 Let A ∈ C^{n×n} have singular values σ_1 ≥ σ_2 ≥ · · · ≥ σ_n and eigen values λ_1, λ_2, . . . , λ_n with |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n|. Then the following hold:
‖A‖_2 = σ_1   (1.8.21)
and if A is non-singular
‖A^{-1}‖_2 = 1/σ_n.   (1.8.22)
If A is symmetric and positive definite, then
‖A‖_2 = λ_1   (1.8.23)
and if A is non-singular
‖A^{-1}‖_2 = 1/λ_n.   (1.8.24)
If A is normal then
‖A‖_2 = |λ_1|   (1.8.25)
and if A is non-singular
‖A^{-1}‖_2 = 1/|λ_n|.   (1.8.26)
1.8.8. Unitary invariant norms
Definition 1.49 A matrix norm ‖·‖ on C^{m×n} is called unitary invariant if ‖UAV‖ = ‖A‖ for any A ∈ C^{m×n} and any unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n}.

We have already seen in theorem 1.121 that the Frobenius norm is unitary invariant.
It turns out that the spectral norm is also unitary invariant.
1.8.9. More properties of operator norms
In this section we will focus on operator norms connecting the normed linear spaces (C^n, ‖·‖_p) and (C^m, ‖·‖_q). Typical values of p, q would be in {1, 2, ∞}.
We recall that
‖A‖_{p→q} = max_{x≠0} ‖Ax‖_q / ‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_q = max_{‖x‖_p≤1} ‖Ax‖_q.   (1.8.27)
Table 1 [5] shows how to compute different (p → q) norms. Some can be computed easily while others are NP-hard to compute.

Table 1. Typical (p → q) norms
p    q    ‖A‖_{p→q}       Calculation
1    1    ‖A‖_1           Maximum l_1 norm of a column
1    2    ‖A‖_{1→2}       Maximum l_2 norm of a column
1    ∞    ‖A‖_{1→∞}       Maximum absolute entry of the matrix
2    1    ‖A‖_{2→1}       NP-hard
2    2    ‖A‖_2           Maximum singular value
2    ∞    ‖A‖_{2→∞}       Maximum l_2 norm of a row
∞    1    ‖A‖_{∞→1}       NP-hard
∞    2    ‖A‖_{∞→2}       NP-hard
∞    ∞    ‖A‖_∞           Maximum l_1 norm of a row
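The tractable entries of table 1 reduce to simple column or row reductions. The following sketch (assuming NumPy is available) evaluates the entries that are not NP-hard:

```python
import numpy as np

A = np.random.randn(4, 3)
col_l2 = np.linalg.norm(A, axis=0)   # l2 norm of each column
row_l2 = np.linalg.norm(A, axis=1)   # l2 norm of each row

norm_1_to_2   = col_l2.max()                 # ||A||_{1->2}: max l2 norm of a column
norm_2_to_inf = row_l2.max()                 # ||A||_{2->inf}: max l2 norm of a row
norm_1_to_inf = np.abs(A).max()              # ||A||_{1->inf}: max absolute entry
norm_1        = np.abs(A).sum(axis=0).max()  # ||A||_1: max column sum
norm_inf      = np.abs(A).sum(axis=1).max()  # ||A||_inf: max row sum
norm_2        = np.linalg.norm(A, 2)         # ||A||_2: largest singular value
print(norm_1_to_2, norm_2_to_inf, norm_1_to_inf, norm_1, norm_inf, norm_2)
```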
The topological dual of the finite dimensional normed linear space (C^n, ‖·‖_p) is the normed linear space (C^n, ‖·‖_{p'}) where
1/p + 1/p' = 1.
The l_2-norm is its own dual (it is self dual), while the l_1-norm and the l_∞-norm are duals of each other.
When a matrix A maps from the space (C^n, ‖·‖_p) to the space (C^m, ‖·‖_q), we can view its conjugate transpose A^H as a mapping from the space (C^m, ‖·‖_{q'}) to (C^n, ‖·‖_{p'}).

Theorem 1.130 The operator norm of a matrix always equals the operator norm of its conjugate transpose, i.e.
‖A‖_{p→q} = ‖A^H‖_{q'→p'}   (1.8.28)
where
1/p + 1/p' = 1 and 1/q + 1/q' = 1.
Specific applications of this result are:
‖A‖_2 = ‖A^H‖_2.   (1.8.29)
This is obvious since the maximum singular values of a matrix and of its conjugate transpose are the same.
‖A‖_1 = ‖A^H‖_∞,  ‖A‖_∞ = ‖A^H‖_1.   (1.8.30)
This is also obvious since the max column sum of A is the same as the max row sum of A^H and vice versa.
‖A‖_{1→∞} = ‖A^H‖_{1→∞}.   (1.8.31)
‖A‖_{1→2} = ‖A^H‖_{2→∞}.   (1.8.32)
‖A‖_{∞→2} = ‖A^H‖_{2→1}.   (1.8.33)
We now need to show the result for the general case (arbitrary 1 ≤ p, q ≤ ∞).

Proof. TODO

Theorem 1.131
‖A‖_{1→p} = max_{1≤j≤n} ‖a_j‖_p   (1.8.34)
where
A = [a_1 . . . a_n].
Proof.
‖Ax‖_p = ‖\sum_{j=1}^{n} x_j a_j‖_p ≤ \sum_{j=1}^{n} ‖x_j a_j‖_p = \sum_{j=1}^{n} |x_j| ‖a_j‖_p ≤ ( max_{1≤j≤n} ‖a_j‖_p ) \sum_{j=1}^{n} |x_j| = ( max_{1≤j≤n} ‖a_j‖_p ) ‖x‖_1.
Thus,
‖A‖_{1→p} = max_{x≠0} ‖Ax‖_p / ‖x‖_1 ≤ max_{1≤j≤n} ‖a_j‖_p.
We need to show that this upper bound is indeed attained.
Indeed for x = e_j, where e_j is the unit vector with 1 in the j-th entry and 0 elsewhere,
‖Ae_j‖_p = ‖a_j‖_p.
Thus
‖A‖_{1→p} ≥ ‖a_j‖_p ∀ 1 ≤ j ≤ n.
Combining the two, we see that
‖A‖_{1→p} = max_{1≤j≤n} ‖a_j‖_p.
Theorem 1.132
‖A‖_{p→∞} = max_{1≤i≤m} ‖a^i‖_q   (1.8.35)
where the a^i are the rows of A and
1/p + 1/q = 1.

Proof. Using theorem 1.130, we get
‖A‖_{p→∞} = ‖A^H‖_{1→q}.
Using theorem 1.131 applied to A^H, whose columns are the (conjugated) rows of A, we get
‖A^H‖_{1→q} = max_{1≤i≤m} ‖a^i‖_q.
This completes the proof.
Theorem 1.133 For two matrices A and B and p ≥ 1, we have
‖AB‖_{p→q} ≤ ‖B‖_{p→s} ‖A‖_{s→q}.   (1.8.36)

Proof. We start with
‖AB‖_{p→q} = max_{‖x‖_p=1} ‖A(Bx)‖_q.
From lemma 1.125, we obtain
‖A(Bx)‖_q ≤ ‖A‖_{s→q} ‖Bx‖_s.
Thus,
‖AB‖_{p→q} ≤ ‖A‖_{s→q} max_{‖x‖_p=1} ‖Bx‖_s = ‖A‖_{s→q} ‖B‖_{p→s}.

Theorem 1.134 For two matrices A and B and p ≥ 1, we have
‖AB‖_{p→∞} ≤ ‖A‖_{∞→∞} ‖B‖_{p→∞}.   (1.8.37)

Proof. We start with
‖AB‖_{p→∞} = max_{‖x‖_p=1} ‖A(Bx)‖_∞.
From lemma 1.125, we obtain
‖A(Bx)‖_∞ ≤ ‖A‖_{∞→∞} ‖Bx‖_∞.
Thus,
‖AB‖_{p→∞} ≤ ‖A‖_{∞→∞} max_{‖x‖_p=1} ‖Bx‖_∞ = ‖A‖_{∞→∞} ‖B‖_{p→∞}.
Theorem 1.135
‖A‖_{p→∞} ≤ ‖A‖_{p→p}.   (1.8.38)
In particular
‖A‖_{1→∞} ≤ ‖A‖_1.   (1.8.39)
‖A‖_{2→∞} ≤ ‖A‖_2.   (1.8.40)

Proof. Choosing q = ∞ and s = p and applying theorem 1.133,
‖IA‖_{p→∞} ≤ ‖A‖_{p→p} ‖I‖_{p→∞}.
But ‖I‖_{p→∞} is the maximum l_p norm of any row of I, which is 1. Thus
‖A‖_{p→∞} ≤ ‖A‖_{p→p}.

Consider the expression
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p.   (1.8.41)
Here z ∈ C(A^H), z ≠ 0 means there exists some vector u ∉ ker(A^H) such that z = A^H u. This expression measures the factor by which the non-singular part of A can decrease the length of a vector.

Theorem 1.136 [5] The following bound holds for every matrix A:
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p ≥ ( ‖A†‖_{q→p} )^{-1}.   (1.8.42)
If A is surjective (onto), then equality holds. When A is bijective (one-one, onto, square, invertible), the result implies
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p = ( ‖A^{-1}‖_{q→p} )^{-1}.   (1.8.43)
Proof. The spaces C(A^H) and C(A) have the same dimension, given by rank(A). We recall that A†A is the projector onto the row space C(A^H) of A, so that
w = Az ⇐⇒ z = A†w = A†Az ∀ z ∈ C(A^H).
As a result we can write
‖z‖_p / ‖Az‖_q = ‖A†w‖_p / ‖w‖_q
whenever z ∈ C(A^H). Now
( min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p )^{-1} = max_{z ∈ C(A^H), z≠0} ‖z‖_p / ‖Az‖_q = max_{w ∈ C(A), w≠0} ‖A†w‖_p / ‖w‖_q ≤ max_{w≠0} ‖A†w‖_p / ‖w‖_q.
When A is surjective, C(A) = C^m. Hence
max_{w ∈ C(A), w≠0} ‖A†w‖_p / ‖w‖_q = max_{w≠0} ‖A†w‖_p / ‖w‖_q
and the inequality becomes an equality. Finally
max_{w≠0} ‖A†w‖_p / ‖w‖_q = ‖A†‖_{q→p}
which completes the proof.
1.8.10. Row column norms
Definition 1.50 Let A be an m × n matrix with rows a^i, i.e.
A = [a^1; . . . ; a^m].
Then we define
‖A‖_{p,∞} ≜ max_{1≤i≤m} ‖a^i‖_p = max_{1≤i≤m} ( \sum_{j=1}^{n} |a^i_j|^p )^{1/p}   (1.8.44)
where 1 ≤ p < ∞, i.e. we take the p-norm of every row vector and then pick the maximum.
We define
‖A‖_{∞,∞} = max_{i,j} |a_{ij}|.   (1.8.45)
This is equivalent to taking the l_∞ norm of each row and then taking the maximum of all these norms.
For 1 ≤ p, q < ∞, we define the norm
‖A‖_{p,q} ≜ ( \sum_{i=1}^{m} ‖a^i‖_p^q )^{1/q},   (1.8.46)
i.e. we compute the p-norm of all the row vectors to form another vector and then take the q-norm of that vector.
Note that the norm ‖A‖_{p,∞} is different from the operator norm ‖A‖_{p→∞}. Similarly ‖A‖_{p,q} is different from ‖A‖_{p→q}.
Theorem 1.137
‖A‖_{p,∞} = ‖A‖_{q→∞}   (1.8.47)
where
1/p + 1/q = 1.

Proof. From theorem 1.132 we get
‖A‖_{q→∞} = max_{1≤i≤m} ‖a^i‖_p.
This is exactly the definition of ‖A‖_{p,∞}.

Theorem 1.138
‖A‖_{1→p} = ‖A^H‖_{p,∞}.   (1.8.48)

Proof. By theorem 1.130,
‖A‖_{1→p} = ‖A^H‖_{q→∞}
where 1/p + 1/q = 1. From theorem 1.137,
‖A^H‖_{q→∞} = ‖A^H‖_{p,∞}.
Theorem 1.139 For any two matrices A, B, we have
‖AB‖_{p,∞} / ‖B‖_{p,∞} ≤ ‖A‖_{∞→∞}.   (1.8.49)

Proof. Let q be such that 1/p + 1/q = 1. From theorem 1.134, we have
‖AB‖_{q→∞} ≤ ‖A‖_{∞→∞} ‖B‖_{q→∞}.
From theorem 1.137,
‖AB‖_{q→∞} = ‖AB‖_{p,∞}
and
‖B‖_{q→∞} = ‖B‖_{p,∞}.
Thus
‖AB‖_{p,∞} ≤ ‖A‖_{∞→∞} ‖B‖_{p,∞}.

Theorem 1.140 Relations between (p, q) norms and (p → q) norms:
‖A‖_{1,∞} = ‖A‖_{∞→∞}   (1.8.50)
‖A‖_{2,∞} = ‖A‖_{2→∞}   (1.8.51)
‖A‖_{∞,∞} = ‖A‖_{1→∞}   (1.8.52)
‖A‖_{1→1} = ‖A^H‖_{1,∞}   (1.8.53)
‖A‖_{1→2} = ‖A^H‖_{2,∞}   (1.8.54)

Proof. The first three are straightforward applications of theorem 1.137. The next two are applications of theorem 1.138. See also table 1.
1.8.11. Block diagonally dominant matrices and generalized
Gershgorin disc theorem
In [1] the idea of diagonally dominant matrices (see section 1.6.9) has
been generalized to block matrices using matrix norms. We consider
the specific case with spectral norm.
Definition 1.51 [Block diagonally dominant matrix] Let A be a square matrix in C^{n×n} which is partitioned in the following manner:
A = [A_{11} A_{12} . . . A_{1k}; A_{21} A_{22} . . . A_{2k}; . . . ; A_{k1} A_{k2} . . . A_{kk}]   (1.8.56)
where each of the submatrices A_{ij} is a square matrix of size m × m. Thus n = km.
A is called block diagonally dominant if
‖A_{ii}‖_2 ≥ \sum_{j≠i} ‖A_{ij}‖_2
holds true for all 1 ≤ i ≤ k. If the inequality holds strictly for all i, then A is called a block strictly diagonally dominant matrix.

Theorem 1.141 If the partitioned matrix A of definition 1.51 is block strictly diagonally dominant, then it is non-singular.
For a proof see [1].

This leads to the generalized Gershgorin disc theorem.

Theorem 1.142 Let A be a square matrix in C^{n×n} which is partitioned as in (1.8.56), i.e.
A = [A_{11} A_{12} . . . A_{1k}; A_{21} A_{22} . . . A_{2k}; . . . ; A_{k1} A_{k2} . . . A_{kk}]   (1.8.57)
where each of the submatrices A_{ij} is a square matrix of size m × m. Then each eigen value λ of A satisfies
‖λI − A_{ii}‖_2 ≤ \sum_{j≠i} ‖A_{ij}‖_2 for some i ∈ {1, 2, . . . , k}.   (1.8.58)
For a proof see [1].

Since the 2-norm of a Hermitian positive semidefinite matrix is nothing but its largest eigen value, the theorem applies directly.

Corollary 1.143. Let A be a Hermitian positive semidefinite matrix which is partitioned as in theorem 1.142. Then its 2-norm ‖A‖_2 satisfies
| ‖A‖_2 − ‖A_{ii}‖_2 | ≤ \sum_{j≠i} ‖A_{ij}‖_2 for some i ∈ {1, 2, . . . , k}.   (1.8.59)
1.9. Miscellaneous topics
1.9.1. Hadamard product
Standard linear algebra books usually don't dwell much on element-wise (component-wise) products of vectors or matrices. Yet in certain contexts and algorithms, such products are quite useful. We define the notation in this section. For further details see [3], [2] and [4].
Definition 1.52 The Hadamard product of two matrices A = [a_{ij}] and B = [b_{ij}] with the same dimensions (not necessarily square) and with entries in a given ring R is the entry-wise product A ◦ B ≜ [a_{ij} b_{ij}], which has the same dimensions as A and B.

Example 1.3: Hadamard product. Let
A = [1 2; 3 4] and B = [5 −6; 7 −3].
Then
A ◦ B = [5 −12; 21 −12].
The Hadamard product is associative and distributive over addition. It is also commutative.
Naturally it can also be defined for column vectors and row vectors.
The reason why this product is rarely mentioned in linear algebra texts is that it is inherently basis dependent. Nevertheless it has a number of uses in statistics and analysis.
In analysis, a similar concept is the point-wise product of functions, defined as
(f · g)(x) = f(x) g(x).
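In NumPy the Hadamard product is simply the element-wise * operator. A sketch reproducing example 1.3:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, -6],
              [7, -3]])

print(A * B)   # [[ 5 -12] [21 -12]]  -- the Hadamard product A ∘ B
# the element-wise product is commutative, unlike the matrix product A @ B
print(np.array_equal(A * B, B * A))
```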
1.10. Digest
1.10.1. Norms
All norms are equivalent.
Sum norm
‖A‖_S = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.
Frobenius norm
‖A‖_F = ( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 )^{1/2}.
Max norm
‖A‖_M = max_{1≤i≤m, 1≤j≤n} |a_{ij}|.
Frobenius norm of the Hermitian transpose
‖A^H‖_F = ‖A‖_F.
Frobenius norm as a sum of norms of column vectors
‖A‖_F^2 = \sum_{j=1}^{n} ‖a_j‖_2^2.
Frobenius norm as a sum of norms of row vectors
‖A‖_F^2 = \sum_{i=1}^{m} ‖a^i‖_2^2.
Frobenius norm invariance w.r.t. unitary matrices
‖UA‖_F = ‖A‖_F,  ‖AV‖_F = ‖A‖_F.
Frobenius norm is consistent:
‖AB‖_F ≤ ‖A‖_F ‖B‖_F.
Corollary 1.123
‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2.
Frobenius norm and singular values
‖A‖_F = ( \sum_{i} σ_i^2 )^{1/2}.
Consistent norms
‖AB‖ ≤ ‖A‖ ‖B‖,
also known as sub-multiplicative norms.
Subordinate matrix norm
‖Ax‖_α ≤ ‖A‖ ‖x‖_β.
(α → β) operator norm
‖A‖ ≜ ‖A‖_{α→β} ≜ max_{x≠0} ‖Ax‖_β / ‖x‖_α.
‖A‖_{α→β} = max_{x∉ker(A)} ‖Ax‖_β / ‖x‖_α = max_{‖x‖_α=1} ‖Ax‖_β.
The (α → β) norm is subordinate:
‖Ax‖_β ≤ ‖A‖_{α→β} ‖x‖_α.
There exists a unit norm vector x* such that
‖A‖_{α→β} = ‖Ax*‖_β.
(α → α)-norms are consistent:
‖A‖_α = max_{x≠0} ‖Ax‖_α / ‖x‖_α,  ‖AB‖_α ≤ ‖A‖_α ‖B‖_α.
p-norm
‖A‖_p ≜ max_{x≠0} ‖Ax‖_p / ‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_p.
Closed form p-norms
‖A‖_1 ≜ max_{1≤j≤n} \sum_{i=1}^{m} |a_{ij}|.
‖A‖_∞ ≜ max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}|.
2-norm
‖A‖_2 = σ_1;  if A is non-singular, ‖A^{-1}‖_2 = 1/σ_n.
If A is symmetric and positive definite,
‖A‖_2 = λ_1;  if A is non-singular, ‖A^{-1}‖_2 = 1/λ_n.
If A is normal,
‖A‖_2 = |λ_1|;  if A is non-singular, ‖A^{-1}‖_2 = 1/|λ_n|.
Unitary invariant norm: ‖UAV‖ = ‖A‖ for any A ∈ C^{m×n} and any unitary U and V.
Typical (p → q) norms: see table 1.
Dual norm and conjugate transpose
‖A‖_{p→q} = ‖A^H‖_{q'→p'} with 1/p + 1/p' = 1 and 1/q + 1/q' = 1.
‖A‖_2 = ‖A^H‖_2.
‖A‖_1 = ‖A^H‖_∞,  ‖A‖_∞ = ‖A^H‖_1.
‖A‖_{1→∞} = ‖A^H‖_{1→∞}.
‖A‖_{1→2} = ‖A^H‖_{2→∞}.
‖A‖_{∞→2} = ‖A^H‖_{2→1}.
The 1 → p norm
‖A‖_{1→p} = max_{1≤j≤n} ‖a_j‖_p.
The p → ∞ norm
‖A‖_{p→∞} = max_{1≤i≤m} ‖a^i‖_q with 1/p + 1/q = 1.
Consistency of the p → q norm
‖AB‖_{p→q} ≤ ‖B‖_{p→s} ‖A‖_{s→q}.
Consistency of the p → ∞ norm
‖AB‖_{p→∞} ≤ ‖A‖_{∞→∞} ‖B‖_{p→∞}.
Dominance of the p → ∞ norm by the p → p norm
‖A‖_{p→∞} ≤ ‖A‖_{p→p},  ‖A‖_{1→∞} ≤ ‖A‖_1,  ‖A‖_{2→∞} ≤ ‖A‖_2.
Restricted minimum property
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p ≥ ( ‖A†‖_{q→p} )^{-1}.
If A is surjective (onto), equality holds. When A is bijective,
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p = ( ‖A^{-1}‖_{q→p} )^{-1}.
Row column norms
‖A‖_{p,∞} ≜ max_{1≤i≤m} ‖a^i‖_p = max_{1≤i≤m} ( \sum_{j=1}^{n} |a^i_j|^p )^{1/p}.
‖A‖_{∞,∞} = max_{i,j} |a_{ij}|.
‖A‖_{p,q} ≜ ( \sum_{i=1}^{m} ‖a^i‖_p^q )^{1/q}.
Row column norm and the p → ∞ norm
‖A‖_{p,∞} = ‖A‖_{q→∞} with 1/p + 1/q = 1.
Consistency of the (p, ∞) norm
‖AB‖_{p,∞} / ‖B‖_{p,∞} ≤ ‖A‖_{∞→∞}.
Relations between (p, q) norms and (p → q) norms
‖A‖_{1,∞} = ‖A‖_{∞→∞}
‖A‖_{2,∞} = ‖A‖_{2→∞}
‖A‖_{∞,∞} = ‖A‖_{1→∞}
‖A‖_{1→1} = ‖A^H‖_{1,∞}
‖A‖_{1→2} = ‖A^H‖_{2,∞}
Bibliography
[1] David G. Feingold and Richard S. Varga. Block diagonally dominant matrices and generalizations of the Gerschgorin circle theorem. Pacific J. Math., 12(4):1241–1250, 1962.
[2] Roger A. Horn. The Hadamard product. In Proc. Symp. Appl. Math., volume 40, pages 87–169, 1990.
[3] Elizabeth Million. The Hadamard product, 2007.
[4] George P. H. Styan. Hadamard products and multivariate statistical analysis. Linear Algebra and Its Applications, 6:217–240, 1973.
[5] Joel A. Tropp. Just relax: Convex programming methods for subset selection and sparse approximation. 2004.
Some notes on Matrix Algebra
Some notes on Matrix Algebra

More Related Content

What's hot

Newton's 3rd Law of Motion
Newton's 3rd Law of MotionNewton's 3rd Law of Motion
Newton's 3rd Law of MotionMrsJenner
 
Importance of Normalization
Importance of NormalizationImportance of Normalization
Importance of NormalizationShwe Yee
 
Compoutational Physics
Compoutational PhysicsCompoutational Physics
Compoutational PhysicsSaad Shaukat
 

What's hot (6)

1.1.2 HEXADECIMAL
1.1.2 HEXADECIMAL1.1.2 HEXADECIMAL
1.1.2 HEXADECIMAL
 
Newton's 3rd Law of Motion
Newton's 3rd Law of MotionNewton's 3rd Law of Motion
Newton's 3rd Law of Motion
 
Importance of Normalization
Importance of NormalizationImportance of Normalization
Importance of Normalization
 
MySql slides (ppt)
MySql slides (ppt)MySql slides (ppt)
MySql slides (ppt)
 
Compoutational Physics
Compoutational PhysicsCompoutational Physics
Compoutational Physics
 
SQL JOINS
SQL JOINSSQL JOINS
SQL JOINS
 

Similar to Some notes on Matrix Algebra

Matrices & determinants
Matrices & determinantsMatrices & determinants
Matrices & determinantsindu thakur
 
Invertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptxInvertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptxIkhlaqAhmad18
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
MATRICES.pdf
MATRICES.pdfMATRICES.pdf
MATRICES.pdfMahatoJee
 
intruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptxintruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptxShaukatAliChaudhry1
 
systems of linear equations & matrices
systems of linear equations & matricessystems of linear equations & matrices
systems of linear equations & matricesStudent
 
Engg maths k notes(4)
Engg maths k notes(4)Engg maths k notes(4)
Engg maths k notes(4)Ranjay Kumar
 
Chapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By PearsonChapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By PearsonChaimae Baroudi
 
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeksBeginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeksJinTaek Seo
 
Matrices y determinants
Matrices y determinantsMatrices y determinants
Matrices y determinantsJeannie
 
Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Rai University
 

Similar to Some notes on Matrix Algebra (20)

Matrices & determinants
Matrices & determinantsMatrices & determinants
Matrices & determinants
 
Invertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptxInvertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptx
 
Matrices
MatricesMatrices
Matrices
 
Matrices
MatricesMatrices
Matrices
 
Matrices
MatricesMatrices
Matrices
 
Matrices
MatricesMatrices
Matrices
 
MATRICES.pdf
MATRICES.pdfMATRICES.pdf
MATRICES.pdf
 
Matrix_PPT.pptx
Matrix_PPT.pptxMatrix_PPT.pptx
Matrix_PPT.pptx
 
intruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptxintruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptx
 
Matrix_PPT.pptx
Matrix_PPT.pptxMatrix_PPT.pptx
Matrix_PPT.pptx
 
Unit i
Unit iUnit i
Unit i
 
Maths
MathsMaths
Maths
 
Matrices
MatricesMatrices
Matrices
 
systems of linear equations & matrices
systems of linear equations & matricessystems of linear equations & matrices
systems of linear equations & matrices
 
Engg maths k notes(4)
Engg maths k notes(4)Engg maths k notes(4)
Engg maths k notes(4)
 
Matrix.
Matrix.Matrix.
Matrix.
 
Chapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By PearsonChapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By Pearson
 
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeksBeginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
 
Matrices y determinants
Matrices y determinantsMatrices y determinants
Matrices y determinants
 
Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -
 

Recently uploaded

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 

Recently uploaded (20)

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

Some notes on Matrix Algebra

  • 1. CHAPTER 1 Matrix Algebra In this chapter we collect results related to matrix algebra which are relevant to this book. Some specific topics which are typically not found in standard books are also covered here. 1.1. Preliminaries Standard notation in this chapter is given here. Matrices are denoted by capital letters A, B etc.. They can be rectangular with m rows and n columns. Their elements or entries are referred to with small letters aij, bij etc. where i denotes the i-th row of matrix and j denotes the j-th column of matrix. Thus A =       a11 a12 . . . a1n a21 a22 . . . a1n ... ... ... ... am1 am2 . . . amn       Mostly we consider complex matrices belonging to Cm×n . Sometimes we will restrict our attention to real matrices belonging to Rm×n . Definition 1.1 [Square matrix] An m×n matrix is called square matrix if m = n. Definition 1.2 [Tall matrix] An m × n matrix is called tall ma- trix if m > n i.e. the number of rows is greater than columns. 1
  • 2. 2 1. MATRIX ALGEBRA Definition 1.3 [Wide matrix] An m × n matrix is called wide matrix if m < n i.e. the number of columns is greater than rows. Definition 1.4 [Main diagonal] Let A = [aij] be an m×n matrix. The main diagonal consists of entries aij where i = j. i.e. main diagonal is {a11, a22, . . . , akk} where k = min(m, n). Main diagonal is also known as leading diagonal, major diagonal primary diagonal or principal diagonal. The entries of A which are not on the main diagonal are known as off diagonal entries. Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix (usually a square matrix) whose entries outside the main diagonal are zero. Whenever we refer to a diagonal matrix which is not square, we will use the term rectangular diagonal matrix. A square diagonal matrix A is also represented by diag(a11, a22, . . . , ann) which lists only the diagonal (non-zero) entries in A. The transpose of a matrix A is denoted by AT while the Hermitian transpose is denoted by AH . For real matrices AT = AH . When matrices are square, we have the number of rows and columns both equal to n and they belong to Cn×n . If not specified, the square matrices will be of size n×n and rectangular matrices will be of size m×n. If not specified the vectors (column vec- tors) will be of size n×1 and belong to either Rn or Cn . Corresponding row vectors will be of size 1 × n. For statements which are valid both for real and complex matrices, sometimes we might say that matrices belong to Fm×n while the scalars belong to F and vectors belong to Fn where F refers to either the field of real numbers or the field of complex numbers. Note that this is not
  • 3. 1.1. PRELIMINARIES 3 consistently followed at the moment. Most results are written only for Cm×n while still being applicable for Rm×n . Identity matrix for Fn×n is denoted as In or simply I whenever the size is clear from context. Sometimes we will write a matrix in terms of its column vectors. We will use the notation A = a1 a2 . . . an indicating n columns. When we write a matrix in terms of its row vectors, we will use the notation A =       aT 1 aT 2 ... aT m       indicating m rows with ai being column vectors whose transposes form the rows of A. The rank of a matrix A is written as rank(A), while the determinant as det(A) or |A|. We say that an m × n matrix A is left-invertible if there exists an n × m matrix B such that BA = I. We say that an m × n matrix A is right-invertible if there exists an n × m matrix B such that AB = I. We say that a square matrix A is invertible when there exists another square matrix B of same size such that AB = BA = I. A square matrix is invertible iff its both left and right invertible. Inverse of a square invertible matrix is denoted by A−1 . A special left or right inverse is the pseudo inverse, which is denoted by A† . Column space of a matrix is denoted by C(A), the null space by N(A), and the row space by R(A).
  • 4. 4 1. MATRIX ALGEBRA We say that a matrix is symmetric when A = AT , conjugate sym- metric or Hermitian when AH = A. When a square matrix is not invertible, we say that it is singular. A non-singular matrix is invertible. The eigen values of a square matrix are written as λ1, λ2, . . . while the singular values of a rectangular matrix are written as σ1, σ2, . . . . The inner product or dot product of two column / row vectors u and v belonging to Rn is defined as u · v = u, v = n i=1 uivi. (1.1.1) The inner product or dot product of two column / row vectors u and v belonging to Cn is defined as u · v = u, v = n i=1 uivi. (1.1.2) 1.1.1. Block matrix Definition 1.6 A block matrix is a matrix whose entries them- selves are matrices with following constraints (1) Entries in every row are matrices with same number of rows. (2) Entries in every column are matrices with same number of columns. Let A be an m × n block matrix. Then A =       A11 A12 . . . A1n A21 A22 . . . A2n ... ... ... ... Am1 Am2 . . . Amn       (1.1.3) where Aij is a matrix with ri rows and cj columns.
  • 5. 1.1. PRELIMINARIES 5 A block matrix is also known as a partitioned matrix. Example 1.1: 2x2 block matrices Quite frequently we will be using 2x2 block matrices. P = P11 P12 P21 P22 . (1.1.4) An example P =    a b c d e f g h i    We have P11 = a b d e P12 = c f P21 = g h P22 = i • P11 and P12 have 2 rows. • P21 and P22 have 1 row. • P11 and P21 have 2 columns. • P12 and P22 have 1 column. Lemma 1.1 Let A = [Aij] be an m×n block matrix with Aij being an ri × cj matrix. Then A is an r × c matrix where r = m i=1 ri (1.1.5) and c = n j=1 cj. (1.1.6) Remark. Sometimes it is convenient to think of a regular matrix as a block matrix whose entries are 1 × 1 matrices themselves. Definition 1.7 [Multiplication of block matrices] Let A = [Aij] be an m × n block matrix with Aij being a pi × qj matrix. Let
  • 6. 6 1. MATRIX ALGEBRA B = [Bjk] be an n×p block matrix with Bjk being a qj ×rk matrix. Then the two block matrices are compatible for multiplication and their multiplication is defined by C = AB = [Cik] where Cik = n j=1 AijBjk (1.1.7) and Cik is a pi × rk matrix. Definition 1.8 A block diagonal matrix is a block matrix whose off diagonal entries are zero matrices. 1.2. Linear independence, span, rank 1.2.1. Spaces associated with a matrix Definition 1.9 The column space of a matrix is defined as the vector space spanned by columns of the matrix. Let A be an m × n matrix with A = a1 a2 . . . an Then the column space is given by C(A) = {x ∈ Fm : x = n i=1 αiai for some αi ∈ F}. (1.2.1) Definition 1.10 The row space of a matrix is defined as the vector space spanned by rows of the matrix. Let A be an m × n matrix with A =       aT 1 aT 2 ... aT m      
  • 7. 1.2. LINEAR INDEPENDENCE, SPAN, RANK 7 Then the row space is given by R(A) = {x ∈ Fn : x = m i=1 αiai for some αi ∈ F}. (1.2.2) 1.2.2. Rank Definition 1.11 [Column rank] The column rank of a matrix is defined as the maximum number of columns which are linearly independent. In other words column rank is the dimension of the column space of a matrix. Definition 1.12 [Row rank] The row rank of a matrix is defined as the maximum number of rows which are linearly independent. In other words row rank is the dimension of the row space of a matrix. Theorem 1.2 The column rank and row rank of a matrix are equal. Definition 1.13 [Rank] The rank of a matrix is defined to be equal to its column rank which is equal to its row rank. Lemma 1.3 For an m × n matrix A 0 ≤ rank(A) ≤ min(m, n). (1.2.3) Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero matrix. Definition 1.14 [Full rank matrix] An m × n matrix A is called full rank if rank(A) = min(m, n). In other words it is either a full column rank matrix or a full row rank matrix or both.
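As a quick numerical check of these definitions, the following sketch (Python with NumPy assumed; np.linalg.matrix_rank estimates the rank from the singular values) uses a 3 × 3 matrix whose second row is a multiple of the first, so the matrix is not full rank.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],    # twice the first row, so the rows are dependent
                  [0., 1., 1.]])

    r = np.linalg.matrix_rank(A)
    print(r)                       # 2: the matrix is rank deficient
    print(min(A.shape))            # 3: full rank would require rank == min(m, n)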
  • 8. 8 1. MATRIX ALGEBRA Lemma 1.5 [Rank of product to two matrices] Let A be an m×n matrix and B be an n × p matrix then rank(AB) ≤ min(rank(A), rank(B)). (1.2.4) Lemma 1.6 [Post-multiplication with a full row rank matrix] Let A be an m × n matrix and B be an n × p matrix. If B is of rank n then rank(AB) = rank(A). (1.2.5) Lemma 1.7 [Pre-multiplication with a full column rank matrix] Let A be an m × n matrix and B be an n × p matrix. If A is of rank n then rank(AB) = rank(B). (1.2.6) Lemma 1.8 The rank of a diagonal matrix is equal to the number of non-zero elements on its main diagonal. Proof. The columns which correspond to diagonal entries which are zero are zero columns. Other columns are linearly independent. The number of linearly independent rows is also the same. Hence their count gives us the rank of the matrix. 1.3. Invertible matrices Definition 1.15 [Invertible] A square matrix A is called invert- ible if there exists another square matrix B of same size such that AB = BA = I. The matrix B is called the inverse of A and is denoted as A−1 . Lemma 1.9 If A is invertible then its inverse A−1 is also invertible and the inverse of A−1 is nothing but A.
  • 9. 1.3. INVERTIBLE MATRICES 9 Lemma 1.10 Identity matrix I is invertible. Proof. II = I =⇒ I−1 = I. Lemma 1.11 If A is invertible then columns of A are linearly independent. Proof. Assume A is invertible, then there exists a matrix B such that AB = BA = I. Assume that columns of A are linearly dependent. Then there exists u = 0 such that Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0 a contradiction. Hence columns of A are linearly independent. Lemma 1.12 If an n × n matrix A is invertible then columns of A span Fn . Proof. Assume A is invertible, then there exists a matrix B such that AB = BA = I. Now let x ∈ Fn be any arbitrary vector. We need to show that there exists α ∈ Fn such that x = Aα. But x = Ix = ABx = A(Bx). Thus if we choose α = Bx, then x = Aα.
Thus the columns of A span F^n.

Lemma 1.13 If A is invertible, then the columns of A form a basis for F^n.

Proof. In F^n a basis is a set of vectors which is linearly independent and spans F^n. By lemma 1.11 and lemma 1.12, the columns of an invertible matrix A satisfy both conditions. Hence they form a basis.

Lemma 1.14 If A is invertible then A^T is invertible.

Proof. Assume A is invertible; then there exists a matrix B such that AB = BA = I. Applying the transpose on both sides we get B^T A^T = A^T B^T = I. Thus B^T is the inverse of A^T and A^T is invertible.

Lemma 1.15 If A is invertible then A^H is invertible.

Proof. Assume A is invertible; then there exists a matrix B such that AB = BA = I. Applying the conjugate transpose on both sides we get B^H A^H = A^H B^H = I. Thus B^H is the inverse of A^H and A^H is invertible.
  • 11. 1.3. INVERTIBLE MATRICES 11 Lemma 1.16 If A and B are invertible then AB is invertible. Proof. We note that (AB)(B−1 A−1 ) = A(BB−1 )A−1 = AIA−1 = I. Similarly (B−1 A−1 )(AB) = B−1 (A−1 A)B = B−1 IB = I. Thus B−1 A−1 is the inverse of AB. Lemma 1.17 The set of n×n invertible matrices under the matrix multiplication operation form a group. Proof. We verify the properties of a group Closure: If A and B are invertible then AB is invertible. Hence the set is closed. Associativity: Matrix multiplication is associative. Identity element: I is invertible and AI = IA = A for all invertible matrices. Inverse element: If A is invertible then A−1 is also invertible. Thus the set of invertible matrices is indeed a group under matrix multiplication. Lemma 1.18 An n × n matrix A is invertible if and only if it is full rank i.e. rank(A) = n. Corollary 1.19. The rank of an invertible matrix and its inverse are same.
1.3.1. Similar matrices

Definition 1.16 [Similar matrices] An n × n matrix B is similar to an n × n matrix A if there exists an n × n non-singular matrix C such that B = C^{-1} A C.

Lemma 1.20 If B is similar to A then A is similar to B. Thus similarity is a symmetric relation.

Proof. B = C^{-1} A C =⇒ A = C B C^{-1} =⇒ A = (C^{-1})^{-1} B C^{-1}. Thus there exists a matrix D = C^{-1} such that A = D^{-1} B D. Hence A is similar to B.

Lemma 1.21 Similar matrices have the same rank.

Proof. Let B be similar to A. Thus there exists an invertible matrix C such that B = C^{-1} A C. Since C is invertible we have rank(C) = rank(C^{-1}) = n. Now using lemma 1.6, rank(AC) = rank(A), and using lemma 1.7 we have rank(C^{-1}(AC)) = rank(AC) = rank(A). Thus rank(B) = rank(A).

Lemma 1.22 Similarity is an equivalence relation on the set of n × n matrices.
  • 13. 1.3. INVERTIBLE MATRICES 13 Proof. Let A, B, C be n×n matrices. A is similar to itself through an invertible matrix I. If A is similar to B then B is similar to itself. If B is similar to A via P s.t. B = P−1 AP and C is similar to B via Q s.t. C = Q−1 BQ then C is similar to A via PQ such that C = (PQ)−1 A(PQ). Thus similarity is an equivalence relation on the set of square matrices and if A is any n×n matrix then the set of n×n matrices similar to A forms an equivalence class. 1.3.2. Gram matrices Definition 1.17 Gram matrix of columns of A is given by G = AH A (1.3.1) Definition 1.18 Gram matrix of rows of A is given by G = AAH (1.3.2) Remark. Usually when we talk about Gram matrix of a matrix we are looking at the Gram matrix of its column vectors. Remark. For real matrix A ∈ Rm×n , the Gram matrix of its column vectors is given by AT A and the Gram matrix for its row vectors is given by AAT . Following results apply equally well for the real case. Lemma 1.23 The columns of a matrix are linearly dependent if and only if the Gram matrix of its column vectors AH A is not invertible. Proof. Let A be an m × n matrix and G = AH A be the Gram matrix of its columns. If columns of A are linearly dependent, then there exists a vector u = 0 such that Au = 0.
  • 14. 14 1. MATRIX ALGEBRA Thus Gu = AH Au = 0. Hence the columns of G are also dependent and G is not invertible. Conversely let us assume that G is not invertible, thus columns of G are dependent and there exists a vector v = 0 such that Gv = 0. Now vH Gv = vH AH Av = (Av)H (Av) = Av 2 2. From previous equation, we have Av 2 2 = 0 =⇒ Av = 0. Since v = 0 hence columns of A are also linearly dependent. Corollary 1.24. The columns of a matrix are linearly independent if and only if the Gram matrix of its column vectors AH A is invertible. Proof. Columns of A can be dependent only if its Gram matrix is not invertible. Thus if the Gram matrix is invertible, then the columns of A are linearly independent. The Gram matrix is not invertible only if columns of A are linearly dependent. Thus if columns of A are linearly independent then the Gram matrix is invertible. Corollary 1.25. Let A be a full column rank matrix. Then AH A is invertible. Lemma 1.26 The null space of A and its Gram matrix AH A co- incide. i.e. N(A) = N(AH A). (1.3.3) Proof. Let u ∈ N(A). Then Au = 0 =⇒ AH Au = 0.
  • 15. 1.3. INVERTIBLE MATRICES 15 Thus u ∈ N(AH A) =⇒ N(A) ⊆ N(AH A). Now let u ∈ N(AH A). Then AH Au = 0 =⇒ uH AH Au = 0 =⇒ Au 2 2 = 0 =⇒ Au = 0. Thus we have u ∈ N(A) =⇒ N(AH A) ⊆ N(A). Lemma 1.27 The rows of a matrix A are linearly dependent if and only if the Gram matrix of its row vectors AAH is not invertible. Proof. Rows of A are linearly dependent, if and only if columns of AH are linearly dependent. There exists a vector v = 0 s.t. AH v = 0 Thus Gv = AAH v = 0. Since v = 0 hence G is not invertible. Converse: assuming that G is not invertible, there exists a vector u = 0 s.t. Gu = 0. Now uH Gu = uH AAH u = (AH u)H (AH u) = AH u 2 2 = 0 =⇒ AH u = 0. Since u = 0 hence columns of AH and consequently rows of A are linearly dependent. Corollary 1.28. The rows of a matrix A are linearly independent if and only if the Gram matrix of its row vectors AAH is invertible. Corollary 1.29. Let A be a full row rank matrix. Then AAH is in- vertible.
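The following sketch illustrates corollary 1.25 and corollary 1.29 numerically (Python with NumPy assumed; a random Gaussian matrix is generically full column rank). For a tall full column rank matrix, the Gram matrix of its columns is invertible while the Gram matrix of its rows is rank deficient.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))          # tall, generically full column rank

    G_cols = A.conj().T @ A                   # Gram matrix of the columns (A^H A), 3 x 3
    G_rows = A @ A.conj().T                   # Gram matrix of the rows (A A^H), 5 x 5

    print(np.linalg.matrix_rank(A))           # 3
    print(np.linalg.matrix_rank(G_cols))      # 3 -> invertible (corollary 1.25)
    print(np.linalg.matrix_rank(G_rows))      # 3 -> 5 x 5 matrix of rank 3, hence singular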
  • 16. 16 1. MATRIX ALGEBRA 1.3.3. Pseudo inverses Definition 1.19 [Moore-Penrose pseudo-inverse] Let A be an m× n matrix. An n×m matrix A† is called its Moore-Penrose pseudo- inverse if it satisfies all of the following criteria: (1) AA† A = A. (2) A† AA† = A† . (3) AA† H = AA† i.e. AA† is Hermitian. (4) (A† A)H = A† A i.e. A† A is Hermitian. Theorem 1.30 [Existence and uniqueness] For any matrix A there exists precisely one matrix A† which satisfies all the requirements in definition 1.19. We omit the proof for this. The pseudo-inverse can actually be ob- tained by the singular value decomposition of A. This is shown in lemma 1.110. Lemma 1.31 Let D = diag(d1, d2, . . . , dn) be an n × n diag- onal matrix. Then its Moore-Penrose pseudo-inverse is D† = diag(c1, c2, . . . , cn) where ci = 1 di if di = 0; 0 if di = 0. Proof. We note that D† D = DD† = F = diag(f1, f2, . . . fn) where fi = 1 if di = 0; 0 if di = 0. We now verify the requirements in definition 1.19. DD† D = FD = D. D† DD† = FD† = D† D† D = DD† = F is a diagonal hence Hermitian matrix.
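A minimal sketch of lemma 1.31, assuming Python with NumPy; pinv_diag is a hypothetical helper that inverts only the non-zero diagonal entries, and the assertions check the four Moore-Penrose conditions of definition 1.19 and agreement with np.linalg.pinv.

    import numpy as np

    def pinv_diag(d):
        # Pseudo-inverse of diag(d): invert the non-zero entries, keep the zeros.
        c = np.array([1.0 / x if x != 0 else 0.0 for x in d])
        return np.diag(c)

    D = np.diag([3.0, 0.0, 2.0])
    Dp = pinv_diag([3.0, 0.0, 2.0])

    # Moore-Penrose conditions of definition 1.19.
    assert np.allclose(D @ Dp @ D, D)
    assert np.allclose(Dp @ D @ Dp, Dp)
    assert np.allclose((D @ Dp).conj().T, D @ Dp)
    assert np.allclose((Dp @ D).conj().T, Dp @ D)
    assert np.allclose(Dp, np.linalg.pinv(D))    # matches NumPy's pseudo-inverse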
  • 17. 1.3. INVERTIBLE MATRICES 17 Lemma 1.32 Let D = diag(d1, d2, . . . , dp) be an m × n rectan- gular diagonal matrix where p = min(m, n). Then its Moore- Penrose pseudo-inverse is an n × m rectangular diagonal matrix D† = diag(c1, c2, . . . , cp) where ci = 1 di if di = 0; 0 if di = 0. Proof. F = D† D = diag(f1, f2, . . . fn) is an n × n matrix where fi =    1 if di = 0; 0 if di = 0; 0 if i > p. G = DD† = diag(g1, g2, . . . gn) is an m × m matrix where gi =    1 if di = 0; 0 if di = 0; 0 if i > p. We now verify the requirements in definition 1.19. DD† D = DF = D. D† DD† = D† G = D† F = D† D and G = DD† are both diagonal hence Hermitian matrices. Lemma 1.33 If A is full column rank then its Moore-Penrose pseudo-inverse is given by A† = (AH A)−1 AH . (1.3.4) It is a left inverse of A. Proof. By corollary 1.25 AH A is invertible.
  • 18. 18 1. MATRIX ALGEBRA First of all we verify that its a left inverse. A† A = (AH A)−1 AH A = I. We now verify all the properties. AA† A = AI = A. A† AA† = IA† = A† . Hermitian properties: AA† H = A(AH A)−1 AH H = A(AH A)−1 AH = AA† . (A† A)H = IH = I = A† A. Lemma 1.34 If A is full row rank then its Moore-Penrose pseudo- inverse is given by A† = AH (AAH )−1 . (1.3.5) It is a right inverse of A. Proof. By corollary 1.29 AAH is invertible. First of all we verify that its a right inverse. AA† = AAH (AAH )−1 = I. We now verify all the properties. AA† A = IA = A. A† AA† = A† I = A† . Hermitian properties: AA† H = IH = I = AA† . (A† A)H = AH (AAH )−1 A H = AH (AAH )−1 A = A† A.
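The two closed forms of lemma 1.33 and lemma 1.34 can be checked numerically with a sketch like the following (Python with NumPy assumed; random Gaussian matrices are full rank with probability one).

    import numpy as np

    rng = np.random.default_rng(1)

    # Full column rank (tall) case: A_dag = (A^H A)^{-1} A^H is a left inverse.
    A = rng.standard_normal((5, 3))
    A_dag = np.linalg.inv(A.conj().T @ A) @ A.conj().T
    assert np.allclose(A_dag @ A, np.eye(3))
    assert np.allclose(A_dag, np.linalg.pinv(A))

    # Full row rank (wide) case: B_dag = B^H (B B^H)^{-1} is a right inverse.
    B = rng.standard_normal((3, 5))
    B_dag = B.conj().T @ np.linalg.inv(B @ B.conj().T)
    assert np.allclose(B @ B_dag, np.eye(3))
    assert np.allclose(B_dag, np.linalg.pinv(B))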
  • 19. 1.4. TRACE AND DETERMINANT 19 1.4. Trace and determinant 1.4.1. Trace Definition 1.20 [Trace] The trace of a square matrix is defined as the sum of the entries on its main diagonal. Let A be an n × n matrix, then tr(A) = n i=1 aii (1.4.1) where tr(A) denotes the trace of A. Lemma 1.35 The trace of a square matrix and its transpose are equal. tr(A) = tr(AT ). (1.4.2) Lemma 1.36 Trace of sum of two square matrices is equal to the sum of their traces. tr(A + B) = tr(A) + tr(B). (1.4.3) Lemma 1.37 Let A be an m×n matrix and B be an n×m matrix. Then tr(AB) = tr(BA). (1.4.4) Proof. Let AB = C = [cij]. Then cij = n k=1 aikbkj. Thus cii = n k=1 aikbki. Now tr(C) = m i=1 cii = m i=1 n k=1 aikbki = n k=1 m i=1 aikbki = n k=1 m i=1 bkiaik.
  • 20. 20 1. MATRIX ALGEBRA Let BA = D = [dij]. Then dij = m k=1 bikakj. Thus dii = m k=1 bikaki. Hence tr(D) = n i=1 dii = n i=1 m k=1 bikaki = m i=1 n k=1 bkiaik. This completes the proof. Lemma 1.38 Let A ∈ Fm×n , B ∈ Fn×p , C ∈ Fp×m be three ma- trices. Then tr(ABC) = tr(BCA) = tr(CAB). (1.4.5) Proof. Let AB = D. Then tr(ABC) = tr(DC) = tr(CD) = tr(CAB). Similarly the other result can be proved. Lemma 1.39 Trace of similar matrices is equal. Proof. Let B be similar to A. Thus B = C−1 AC for some invertible matrix C. Then tr(B) = tr(C−1 AC) = tr(CC−1 A) = tr(A). We used lemma 1.37.
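A short numerical check of lemma 1.37, lemma 1.38 and lemma 1.39, assuming Python with NumPy; the shapes are chosen so that all products are defined.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((3, 2))
    C = rng.standard_normal((2, 4))

    # Trace is invariant under cyclic permutations of a product.
    t = np.trace(A @ B @ C)
    assert np.isclose(t, np.trace(B @ C @ A))
    assert np.isclose(t, np.trace(C @ A @ B))

    # Similar matrices have the same trace.
    M = rng.standard_normal((4, 4))
    P = rng.standard_normal((4, 4))          # generically invertible
    assert np.isclose(np.trace(M), np.trace(np.linalg.inv(P) @ M @ P))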
  • 21. 1.4. TRACE AND DETERMINANT 21 1.4.2. Determinants Following are some results on determinant of a square matrix A. Lemma 1.40 det(αA) = αn det(A). (1.4.6) Lemma 1.41 Determinant of a square matrix and its transpose are equal. det(A) = det(AT ). (1.4.7) Lemma 1.42 Let A be a complex square matrix. Then det(AH ) = det(A). (1.4.8) Proof. det(AH ) = det(A T ) = det(A) = det(A). Lemma 1.43 Let A and B be two n × n matrices. Then det(AB) = det(A) det(B). (1.4.9) Lemma 1.44 Let A be an invertible matrix. Then det(A−1 ) = 1 det(A) . (1.4.10)
  • 22. 22 1. MATRIX ALGEBRA Lemma 1.45 Let A be a square matrix and p ∈ N. Then det(Ap ) = (det(A))p . (1.4.11) Lemma 1.46 [Determinant of a triangular matrix] Determinant of a triangular matrix is the product of its diagonal entries. i.e. if A is upper or lower triangular matrix then det(A) = n i=1 aii. (1.4.12) Lemma 1.47 [Determinant of a diagonal matrix] Determinant of a diagonal matrix is the product of its diagonal entries. i.e. if A is a diagonal matrix then det(A) = n i=1 aii. (1.4.13) Lemma 1.48 [Determinant of similar matrices] Determinant of similar matrices is equal. Proof. Let B be similar to A. Thus B = C−1 AC for some invertible matrix C. Hence det(B) = det(C−1 AC) = det(C−1 ) det(A) det(C). Now det(C−1 ) det(A) det(C) = 1 det(C) det(A) det(C) = det(A). We used lemma 1.43 and lemma 1.44.
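The determinant identities above can be verified on random matrices with a sketch such as the following (Python with NumPy assumed; a random square matrix is generically invertible).

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    C = rng.standard_normal((4, 4))          # generically invertible

    assert np.isclose(np.linalg.det(2.0 * A), 2.0**4 * np.linalg.det(A))              # lemma 1.40
    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))      # lemma 1.43
    assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))        # lemma 1.44
    assert np.isclose(np.linalg.det(np.linalg.inv(C) @ A @ C), np.linalg.det(A))      # lemma 1.48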
  • 23. 1.5. UNITARY AND ORTHOGONAL MATRICES 23 Lemma 1.49 Let u and v be vectors in Fn . Then det(I + uvT ) = 1 + uT v. (1.4.14) Lemma 1.50 [Determinant of a small perturbation of identity matrix] Let A be a square matrix and let ≈ 0. Then det(I + A) ≈ 1 + tr(A). (1.4.15) 1.5. Unitary and orthogonal matrices 1.5.1. Orthogonal matrix Definition 1.21 [Orthogonal matrix] A real square matrix U is called orthogonal if the columns of U form an orthonormal set. In other words, let U = u1 u2 . . . un with ui ∈ Rn . Then we have ui · uj = δi,j. Lemma 1.51 An orthogonal matrix U is invertible with UT = U−1 . Proof. Let U = u1 u2 . . . un be orthogonal with UT =       uT 1 uT 2 ... uT n .      
  • 24. 24 1. MATRIX ALGEBRA Then UT U =       uT 1 uT 2 ... uT n .       u1 u2 . . . un = ui · uj = I. Since columns of U are linearly independent and span Rn , hence U is invertible. Thus UT = U−1 . Lemma 1.52 Determinant of an orthogonal matrix is ±1. Proof. Let U be an orthogonal matrix. Then det(UT U) = det(I) =⇒ (det(U))2 = 1 Thus we have det(U) = ±1. 1.5.2. Unitary matrix Definition 1.22 [Unitary matrix] A complex square matrix U is called unitary if the columns of U form an orthonormal set. In other words, let U = u1 u2 . . . un with ui ∈ Cn . Then we have ui · uj = ui, uj = uH j ui = δi,j. Lemma 1.53 A unitary matrix U is invertible with UH = U−1 . Proof. Let U = u1 u2 . . . un
  • 25. 1.5. UNITARY AND ORTHOGONAL MATRICES 25 be orthogonal with UH =       uH 1 uH 2 ... uH n .       Then UH U =       uH 1 uH 2 ... uH n .       u1 u2 . . . un = uH i uj = I. Since columns of U are linearly independent and span Cn , hence U is invertible. Thus UH = U−1 . Lemma 1.54 The magnitude of determinant of a unitary matrix is 1. Proof. Let U be a unitary matrix. Then det(UH U) = det(I) =⇒ det(UH ) det(U) = 1 =⇒ det(U)det(U) = 1. Thus we have | det(U)|2 = 1 =⇒ | det(U)| = 1. 1.5.3. F unitary matrix We provide a common definition for unitary matrices over any field F. This definition applies to both real and complex matrices. Definition 1.23 [F Unitary matrix] A square matrix U ∈ Fn×n is called F unitary if the columns of U form an orthonormal set. In
  • 26. 26 1. MATRIX ALGEBRA other words, let U = u1 u2 . . . un with ui ∈ Fn . Then we have ui, uj = uH j ui = δi,j. We note that a suitable definition of inner product transports the def- inition appropriately into orthogonal matrices over R and unitary ma- trices over C. When we are talking about F unitary matrices, then we will use the symbol UH to mean its inverse. In the complex case, it will map to its conjugate transpose, while in real case it will map to simple transpose. This definition helps us simplify some of the discussions in the sequel (like singular value decomposition). Following results apply equally to orthogonal matrices for real case and unitary matrices for complex case. Lemma 1.55 [Norm preservation] F-unitary matrices preserve norm. i.e. Ux 2 = x 2. Proof. Ux 2 2 = (Ux)H (Ux) = xH UH Ux = xH Ix = x 2 2. Remark. For the real case we have Ux 2 2 = (Ux)T (Ux) = xT UT Ux = xT Ix = x 2 2. Lemma 1.56 [Inner product preservation] F-unitary matrices pre- serve inner product. i.e. Ux, Uy = x, y .
  • 27. 1.6. EIGEN VALUES 27 Proof. Ux, Uy = (Uy)H Ux = yH UH Ux = yH x. Remark. For the real case we have Ux, Uy = (Uy)T Ux = yT UT Ux = yT x. 1.6. Eigen values Much of the discussion in this section will be equally applicable to real as well as complex matrices. We will use the complex notation mostly and make specific remarks for real matrices wherever needed. Definition 1.24 [Eigen value] A scalar λ is an eigen value of an n × n matrix A = [aij] if there exists a non null vector x such that Ax = λx. (1.6.1) A non null vector x which satisfies this equation is called an eigen vector of A for the eigen value λ. An eigen value is also known as a characteristic value, proper value or a latent value. We note that (1.6.1) can be written as Ax = λInx =⇒ (A − λIn)x = 0. (1.6.2) Thus λ is an eigen value of A if and only if the matrix A−λI is singular. Definition 1.25 [Spectrum of a matrix] The set comprising of eigen values of a matrix A is known as its spectrum. Remark. For each eigen vector x for a matrix A the corresponding eigen value λ is unique.
  • 28. 28 1. MATRIX ALGEBRA Proof. Assume that for x there are two eigen values λ1 and λ2, then Ax = λ1x = λ2x =⇒ (λ1 − λ2)x = 0. This can happen only when either x = 0 or λ1 = λ2. Since x is an eigen vector, it cannot be 0. Thus λ1 = λ2. Remark. If x is an eigen vector for A, then the corresponding eigen value is given by λ = xH Ax xHx . (1.6.3) Proof. Ax = λx =⇒ xH Ax = λxH x =⇒ λ = xH Ax xHx . since x is non-zero. Remark. An eigen vector x of A for eigen value λ belongs to the null space of A − λI, i.e. x ∈ N(A − λI). In other words x is a nontrivial solution to the homogeneous system of linear equations given by (A − λI)z = 0. Definition 1.26 [Eigen space] Let λ be an eigen value for a square matrix A. Then its eigen space is the null space of A − λI i.e. N(A − λI). Remark. The set comprising all the eigen vectors of A for an eigen value λ is given by N(A − λI) {0} (1.6.4) since 0 cannot be an eigen vector.
  • 29. 1.6. EIGEN VALUES 29 Definition 1.27 [Geometric multiplicity] Let λ be an eigen value for a square matrix A. The dimension of its eigen space N(A−λI) is known as the geometric multiplicity of the eigen value λ. Remark. Clearly dim(N(A − λI)) = n − rank(A − λI). Remark. A scalar λ can be an eigen value of a square matrix A if and only if det(A − λI) = 0. det(A − λI) is a polynomial in λ of degree n. Remark. det(A − λI) = p(λ) = αn λn + αn−1 λn−1 + · · · + α1 λ + α0 (1.6.5) where αi depend on entries in A. In this sense, an eigen value of A is a root of the equation p(λ) = 0. (1.6.6) Its easy to show that αn = (−1)n . Definition 1.28 [Characteristic polynomial and equation] For any square matrix A, the polynomial given by p(λ) = det(A − λI) is known as its characteristic polynomial. The equation give by p(λ) = 0 (1.6.7) is known as its characteristic equation. The eigen values of A are the roots of its characteristic polynomial or solutions of its characteristic equation.
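The following sketch (Python with NumPy assumed) illustrates this connection: np.poly returns the coefficients of det(λI − A), which has the same roots as det(A − λI), and each computed eigen pair satisfies Ax = λx.

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])

    lam = np.linalg.eigvals(A)                   # eigen values, here 3 and 1 (in some order)

    # Roots of the characteristic polynomial coincide with the eigen values.
    coeffs = np.poly(A)                          # coefficients of det(lambda I - A)
    assert np.allclose(np.sort(np.roots(coeffs)), np.sort(lam))

    # Each eigen pair satisfies A x = lambda x.
    lam2, X = np.linalg.eig(A)
    for i in range(2):
        assert np.allclose(A @ X[:, i], lam2[i] * X[:, i])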
  • 30. 30 1. MATRIX ALGEBRA Lemma 1.57 [Roots of characteristic equation] For real square matrices, if we restrict eigen values to real values, then the char- acteristic polynomial can be factored as p(λ) = (−1)n (λ − λ1)r1 . . . (λ − λk)rk q(λ). (1.6.8) The polynomial has k distinct real roots. For each root λi, ri is a positive integer indicating how many times the root appears. q(λ) is a polynomial that has no real roots. The following is true r1 + · · · + rk + deg(q(λ)) = n. (1.6.9) Clearly k ≤ n. For complex square matrices where eigen values can be complex (including real square matrices), the characteristic polynomial can be factored as p(λ) = (−1)n (λ − λ1)r1 . . . (λ − λk)rk . (1.6.10) The polynomial can be completely factorized into first degree poly- nomials. There are k distinct roots or eigen values. The following is true r1 + · · · + rk = n. (1.6.11) Thus including the duplicates there are exactly n eigen values for a complex square matrix. Remark. It is quite possible that a real square matrix doesn’t have any real eigen values. Definition 1.29 [Algebraic multiplicity] The number of times an eigen value appears in the factorization of the characteristic poly- nomial of a square matrix A is known as its algebraic multiplicity. In other words ri is the algebraic multiplicity for λi in above fac- torization. Remark. In above the set {λ1, . . . , λk} forms the spectrum of A.
  • 31. 1.6. EIGEN VALUES 31 Let us consider the sum of ri which gives the count of total number of roots of p(λ). m = k i=1 ri. (1.6.12) With this there are m not-necessarily distinct roots of p(λ). Let us write p(λ) as p(λ) = (−1)n (λ − c1)(λ − c2) . . . (λ − cm)q(λ). (1.6.13) where c1, c2, . . . , cm are m scalars (not necessarily distinct) of which r1 scalars are λ1, r2 are λ2 and so on. Obviously for the complex case q(λ) = 1. We will refer to the set (allowing repetitions) {c1, c2, . . . , cm} as the eigen values of the matrix A where ci are not necessarily distinct. In contrast the spectrum of A refers to the set of distinct eigen values of A. The symbol c has been chosen based on the other name for eigen values (the characteristic values). We can put together eigen vectors of a matrix into another matrix by itself. This can be very useful tool. We start with a simple idea. Lemma 1.58 Let A be an n × n matrix. Let u1, u2, . . . , ur be r non-zero vectors from Fn . Let us construct an n × r matrix U = u1 u2 . . . ur . Then all the r vectors are eigen vectors of A if and only if there exists a diagonal matrix D = diag(d1, . . . , dr) such that AU = UD. (1.6.14) Proof. Expanding the equation, we can write Au1 Au2 . . . Aur = d1u1 d2u2 . . . drur . Clearly we want Aui = diui
  • 32. 32 1. MATRIX ALGEBRA where ui are non-zero. This is possible only when di is an eigen value of A and ui is an eigen vector for di. Converse: Assume that ui are eigen vectors. Choose di to be corre- sponding eigen values. Then the equation holds. Lemma 1.59 0 is an eigen value of a square matrix A if and only if A is singular. Proof. Let 0 be an eigen value of A. Then there exists u = 0 such that Au = 0u = 0. Thus u is a non-trivial solution of the homogeneous linear system. Thus A is singular. Converse: Assuming that A is singular, there exists u = 0 s.t. Au = 0 = 0u. Thus 0 is an eigen value of A. Lemma 1.60 If a square matrix A is singular, then N(A) is the eigen space for the eigen value λ = 0. Proof. This is straight forward from the definition of eigen space (see definition 1.26). Remark. Clearly the geometric multiplicity of λ = 0 equals nullity(A) = n − rank(A). Lemma 1.61 Let A be a square matrix. Then A and AT have same eigen values. Proof. The eigen values of AT are given by det(AT − λI) = 0.
  • 33. 1.6. EIGEN VALUES 33 But AT − λI = AT − (λI)T = (A − λI)T . Hence (using lemma 1.41) det(AT − λI) = det (A − λI)T = det(A − λI). Thus the characteristic polynomials of A and AT are same. Hence the eigen values are same. In other words the spectrum of A and AT are same. Remark (Direction preservation). If x is an eigen vector with a non- zero eigen value λ for A then Ax and x are collinear. In other words the angle between Ax and x is either 0◦ when λ is positive and is 180◦ when λ is negative. Let us look at the inner product: Ax, x = xH Ax = xH λx = λ x 2 2. Meanwhile Ax 2 = λx 2 = |λ| x 2. Thus | Ax, x | = Ax 2 x 2. The angle θ between Ax and x is given by cos θ = Ax, x Ax 2 x 2 = λ x 2 2 |λ| x 2 2 = ±1. Lemma 1.62 Let A be a square matrix and λ be an eigen value of A. Let p ∈ N. Then λp is an eigen value of Ap . Proof. For p = 1 the statement holds trivially since λ1 is an eigen value of A1 . Assume that the statement holds for some value of p. Thus let λp be an eigen value of Ap and let u be corresponding eigen vector. Now Ap+1 u = Ap (Au) = Ap λu = λAp u = λλp u = λp+1 u.
  • 34. 34 1. MATRIX ALGEBRA Thus λp+1 is an eigen value for Ap+1 with the same eigen vector u. With the principle of mathematical induction, the proof is complete. Lemma 1.63 Let a square matrix A be non singular and let λ = 0 be some eigen value of A. Then λ−1 is an eigen value of A−1 . Moreover, all eigen values of A−1 are obtained by taking inverses of eigen values of A i.e. if µ = 0 is an eigen value of A−1 then 1 µ is an eigen value of A also. Also, A and A−1 share the same set of eigen vectors. Proof. Let u = 0 be an eigen vector of A for the eigen value λ. Then Au = λu =⇒ u = A−1 λu =⇒ 1 λ u = A−1 u. Thus u is also an eigen vector of A−1 for the eigen value 1 λ . Now let B = A−1 . Then B−1 = A. Thus if µ is an eigen value of B then 1 µ is an eigen value of B−1 = A. Thus if A is invertible then eigen values of A and A−1 have one to one correspondence. This result is very useful. Since if it can be shown that a matrix A is similar to a diagonal or a triangular matrix whose eigen values are easy to obtain then determination of the eigen values of A becomes straight forward. 1.6.1. Invariant subspaces Definition 1.30 [Invariance subspace] Let A be a square n × n matrix and let W be a subspace of Fn i.e. W ≤ F. Then W is invariant relative to A if Aw ∈ W ∀ w ∈ W. (1.6.15) i.e. A(W) ⊆ W or for every vector w ∈ W its mapping Aw is also in W. Thus action of A on W doesn’t take us outside of W.
  • 35. 1.6. EIGEN VALUES 35 We also say that W is A-invariant. Eigen vectors are generators of invariant subspaces. Lemma 1.64 Let A be an n × n matrix. Let x1, x2, . . . , xr be r eigen vectors of A. Let us construct an n × r matrix X = x1 x2 . . . rr . Then the column space of X i.e. C(X) is invariant relative to A. Proof. Let us assume that c1, c2, . . . , cr are the eigen values cor- responding to x1, x2, . . . , xr (not necessarily distinct). Let any vector x ∈ C(X) be given by x = r i=1 αixi. Then Ax = A r i=1 αixi = r i=1 αiAxi = r i=1 αicixi. Clearly Ax is also a linear combination of xi hence belongs to C(X). Thus X is invariant relative to A or X is A-invariant. 1.6.2. Triangular matrices Lemma 1.65 Let A be an n×n upper or lower triangular matrix. Then its eigen values are the entries on its main diagonal. Proof. If A is triangular then A − λI is also triangular with its diagonal entries being (aii − λ). Using lemma 1.46, we have p(λ) = det(A − λI) = n i=1 (aii − λ). Clearly the roots of characteristic polynomial are aii. Several small results follow from this lemma.
Corollary 1.66. Let A = [a_{ij}] be an n × n triangular matrix.
(a) The characteristic polynomial of A is p(λ) = (−1)^n ∏_{i=1}^{n} (λ − a_{ii}).
(b) A scalar λ is an eigen value of A if and only if it is one of the diagonal entries of A.
(c) The algebraic multiplicity of an eigen value λ is equal to the number of times it appears on the main diagonal of A.
(d) The spectrum of A is given by the distinct entries on the main diagonal of A.

A diagonal matrix is naturally both an upper triangular matrix as well as a lower triangular matrix. Similar results hold for the eigen values of a diagonal matrix also.

Lemma 1.67 Let A = [a_{ij}] be an n × n diagonal matrix.
(a) Its eigen values are the entries on its main diagonal.
(b) The characteristic polynomial of A is p(λ) = (−1)^n ∏_{i=1}^{n} (λ − a_{ii}).
(c) A scalar λ is an eigen value of A if and only if it is one of the diagonal entries of A.
(d) The algebraic multiplicity of an eigen value λ is equal to the number of times it appears on the main diagonal of A.
(e) The spectrum of A is given by the distinct entries on the main diagonal of A.

There is also a result for the geometric multiplicity of eigen values of a diagonal matrix.

Lemma 1.68 Let A = [a_{ij}] be an n × n diagonal matrix. The geometric multiplicity of an eigen value λ is equal to the number of times it appears on the main diagonal of A.

Proof. The unit vectors e_i are eigen vectors of A since A e_i = a_{ii} e_i. They are linearly independent. Thus if a particular eigen value appears r times on the main diagonal, then there are r linearly independent eigen vectors for that eigen value. Thus its geometric multiplicity is equal to its algebraic multiplicity.

1.6.3. Similar matrices

Some very useful results are available for similar matrices.

Lemma 1.69 The characteristic polynomial and spectrum of similar matrices are the same.

Proof. Let B be similar to A. Thus there exists an invertible matrix C such that B = C^{-1} A C. Now

B − λI = C^{-1} A C − λI = C^{-1} A C − λ C^{-1} C = C^{-1}(A C − λ C) = C^{-1}(A − λI) C.

Thus B − λI is similar to A − λI. Hence, due to lemma 1.48, their determinants are equal, i.e. det(B − λI) = det(A − λI). This means that the characteristic polynomials of A and B are the same. Since eigen values are nothing but roots of the characteristic polynomial, they are the same too. This means that the spectrum (the set of distinct eigen values) is the same.

Corollary 1.70. If A and B are similar to each other then
(a) an eigen value has the same algebraic and geometric multiplicity for both A and B;
(b) the (not necessarily distinct) eigen values of A and B are the same.

Although the eigen values are the same, the eigen vectors are different.
  • 38. 38 1. MATRIX ALGEBRA Lemma 1.71 Let A and B be similar with B = C−1 AC for some invertible matrix C. If u is an eigen vector of A for an eigen value λ, then C−1 u is an eigen vector of B for the same eigen value. Proof. u is an eigen vector of A for an eigen value λ. Thus we have Au = λu. Thus BC−1 u = C−1 ACC−1 u = C−1 Au = C−1 λu = λC−1 u. Now u = 0 and C−1 is non singular. Thus C−1 u = 0. Thus C−1 u is an eigen vector of B. Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an eigen value of a square matrix A. Then the geometric multiplicity of λ is less than or equal to its algebraic multiplicity. Corollary 1.73. If an n×n matrix A has n distinct eigen values, then each of them has a geometric (and algebraic) multiplicity of 1. Proof. The algebraic multiplicity of an eigen value is greater than or equal to 1. But the sum cannot exceed n. Since there are n distinct eigen values, thus each of them has algebraic multiplicity of 1. Now geometric multiplicity of an eigen value is greater than equal to 1 and less than equal to its algebraic multiplicity.
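A small numerical illustration of theorem 1.72, assuming Python with NumPy: for the matrix below the eigen value 2 has algebraic multiplicity 2 but geometric multiplicity 1, the latter computed as dim N(A − λI) = n − rank(A − λI).

    import numpy as np

    # Eigen value 2 has algebraic multiplicity 2 but geometric multiplicity 1.
    A = np.array([[2., 1.],
                  [0., 2.]])
    lam = 2.0

    alg_mult = np.sum(np.isclose(np.linalg.eigvals(A), lam))
    geo_mult = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))   # dim N(A - lam I)

    print(alg_mult, geo_mult)    # 2 1 -> geometric multiplicity <= algebraic multiplicity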
  • 39. 1.6. EIGEN VALUES 39 Corollary 1.74. Let an n × n matrix A has k distinct eigen values λ1, λ2, . . . , λk with algebraic multiplicities r1, r2, . . . , rk and geometric multiplicities g1, g2, . . . gk respectively. Then k i=1 gk ≤ k i=1 rk ≤ n. Moreover if k i=1 gk = k i=1 rk then gk = rk. 1.6.4. Linear independence of eigen vectors Theorem 1.75 [Linear independence of eigen vectors for distinct eigen values] Let A be an n × n square matrix. Let x1, x2, . . . , xk be any k eigen vectors of A for distinct eigen values λ1, λ2, . . . , λk respectively. Then x1, x2, . . . , xk are linearly independent. Proof. We first prove the simpler case with 2 eigen vectors x1 and x2 and corresponding eigen values λ1 and λ2 respectively. Let there be a linear relationship between x1 and x2 given by α1x1 + α2x2 = 0. Multiplying both sides with (A − λ1I) we get α1(A − λ1I)x1 + α2(A − λ1I)x2 = 0 =⇒ α1(λ1 − λ1)x1 + α2(λ1 − λ2)x2 = 0 =⇒ α2(λ1 − λ2)x2 = 0. Since λ1 = λ2 and x2 = 0 , hence α2 = 0. Similarly by multiplying with (A − λ2I) on both sides, we can show that α1 = 0. Thus x1 and x2 are linearly independent.
  • 40. 40 1. MATRIX ALGEBRA Now for the general case, consider a linear relationship between x1, x2, . . . , xk given by α1x1 + α2x2 + . . . αkxk = 0. Multiplying by k i=j,i=1(A − λiI) and using the fact that λi = λj if i = j, we get αj = 0. Thus the only linear relationship is the trivial relationship. This completes the proof. For eigen values with geometric multiplicity greater than 1 there are multiple eigenvectors corresponding to the eigen value which are lin- early independent. In this context, above theorem can be generalized further. Theorem 1.76 Let λ1, λ2, . . . , λk be k distinct eigen values of A. Let {xj 1, xj 2, . . . xj gj } be any gj linearly independent eigen vec- tors from the eigen space of λj where gj is the geometric mul- tiplicity of λj. Then the combined set of eigen vectors given by {x1 1, . . . x1 g1 , . . . xk 1, . . . xk gk } consisting of k j=1 gj eigen vectors is linearly independent. This result puts an upper limit on the number of linearly independent eigen vectors of a square matrix. Lemma 1.77 Let {λ1, . . . , λk} represents the spectrum of an n×n matrix A. Let g1, . . . , gk be the geometric multiplicities of λ1, . . . λk respectively. Then the number of linearly independent eigen vectors for A is k i=1 gi. Moreover if k i=1 gi = n then a set of n linearly independent eigen vectors of A can be found which forms a basis for Fn .
1.6.5. Diagonalization

Diagonalization is one of the fundamental operations in linear algebra. This section discusses diagonalization of square matrices in depth.

Definition 1.31 [Diagonalizable matrix] An n × n matrix A is said to be diagonalizable if it is similar to a diagonal matrix. In other words there exists an n × n non-singular matrix P such that D = P^{-1} A P is a diagonal matrix. If this happens then we say that P diagonalizes A or that A is diagonalized by P.

Remark.

D = P^{-1} A P ⇐⇒ P D = A P ⇐⇒ P D P^{-1} = A. (1.6.16)

We note that if we restrict to real matrices, then P and D should also be real. If A ∈ C^{n×n} (it may still be real) then P and D can be complex.

The next theorem is the culmination of a variety of results studied so far.

Theorem 1.78 [Properties of diagonalizable matrices] Let A be a diagonalizable matrix with D = P^{-1} A P being its diagonalization. Let D = diag(d_1, d_2, . . . , d_n). Then the following hold.
(a) rank(A) = rank(D), which equals the number of non-zero entries on the main diagonal of D.
(b) det(A) = d_1 d_2 . . . d_n.
(c) tr(A) = d_1 + d_2 + · · · + d_n.
(d) The characteristic polynomial of A is p(λ) = (−1)^n (λ − d_1)(λ − d_2) . . . (λ − d_n).
(e) The spectrum of A comprises the distinct scalars on the diagonal of D.
(f) The (not necessarily distinct) eigen values of A are the diagonal elements of D.
(g) The columns of P are (linearly independent) eigen vectors of A.
(h) The algebraic and geometric multiplicities of an eigen value λ of A equal the number of diagonal elements of D that equal λ.

Proof. From definition 1.31 we note that D and A are similar. Due to lemma 1.48, det(A) = det(D). Due to lemma 1.47, det(D) = ∏_{i=1}^{n} d_i. Now due to lemma 1.39, tr(A) = tr(D) = ∑_{i=1}^{n} d_i. Further, due to lemma 1.69 the characteristic polynomial and spectrum of A and D are the same. Due to lemma 1.67 the eigen values of D are nothing but its diagonal entries; hence they are also the eigen values of A.

D = P^{-1} A P =⇒ A P = P D.

Now writing P = [p_1 p_2 . . . p_n] we have

A P = [A p_1  A p_2  . . .  A p_n] = P D = [d_1 p_1  d_2 p_2  . . .  d_n p_n].

Thus the p_i are eigen vectors of A. Since the characteristic polynomials of A and D are the same, the algebraic multiplicities of the eigen values are the same. From lemma 1.71 we get that there is a one to one correspondence between the eigen vectors of A and D through the change of basis given by P. Thus the linear independence relationships between the eigen vectors remain the same. Hence the geometric multiplicities of individual eigen values are also the same. This completes the proof.

So far we have verified various results which are available if a matrix A is diagonalizable. We haven't yet identified the conditions under which A is diagonalizable. We note that not every matrix is diagonalizable. The following theorem gives necessary and sufficient conditions under which a matrix is diagonalizable.

Theorem 1.79 An n × n matrix A is diagonalizable by an n × n non-singular matrix P if and only if the columns of P are (linearly independent) eigen vectors of A.

Proof. We note that since P is non-singular, its columns have to be linearly independent. The necessary condition was proven in theorem 1.78. We now show that if P consists of n linearly independent eigen vectors of A then A is diagonalizable. Let the columns of P be p_1, p_2, . . . , p_n and let the corresponding (not necessarily distinct) eigen values be d_1, d_2, . . . , d_n. Then A p_i = d_i p_i. Thus by letting D = diag(d_1, d_2, . . . , d_n), we have A P = P D. Now since the columns of P are linearly independent, P is invertible. This gives us D = P^{-1} A P. Thus A is similar to a diagonal matrix D. This validates the sufficient condition.
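A minimal sketch of theorem 1.79 in Python with NumPy (an assumption; any linear algebra package would do): np.linalg.eig returns the eigen values and eigen vectors, the eigen vectors form P, and P^{-1} A P is diagonal because the chosen matrix has distinct eigen values.

    import numpy as np

    A = np.array([[4., 1.],
                  [2., 3.]])

    # Columns of P are eigen vectors of A; d holds the corresponding eigen values (5 and 2).
    d, P = np.linalg.eig(A)
    D = np.linalg.inv(P) @ A @ P

    assert np.allclose(D, np.diag(d))                        # P^{-1} A P is diagonal
    assert np.allclose(A, P @ np.diag(d) @ np.linalg.inv(P)) # A = P D P^{-1}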
  • 44. 44 1. MATRIX ALGEBRA A corollary follows. Corollary 1.80. An n×n matrix is diagonalizable if and only if there exists a linearly independent set of n eigenvectors of A. Now we know that geometric multiplicities of eigen values of A provide us information about linearly independent eigenvectors of A. Corollary 1.81. Let A be an n × n matrix. Let λ1, λ2, . . . , λk be its k distinct eigen values (comprising its spectrum). Let gj be the geometric multiplicity of λj.Then A is diagonalizable if and only if n i=1 gi = n. (1.6.17) 1.6.6. Symmetric matrices This subsection is focused on real symmetric matrices. Following is a fundamental property of real symmetric matrices. Theorem 1.82 Every real symmetric matrix has an eigen value. The proof of this result is beyond the scope of this book. Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and λ2 be any two distinct eigen values of A and let x1 and x2 be any two corresponding eigen vectors. Then x1 and x2 are orthogonal. Proof. By definition we have Ax1 = λ1x1 and Ax2 = λ2x2. Thus xT 2 Ax1 = λ1xT 2 x1 =⇒ xT 1 AT x2 = λ1xT 1 x2 =⇒ xT 1 Ax2 = λ1xT 1 x2 =⇒ xT 1 λ2x2 = λ1xT 1 x2 =⇒ (λ1 − λ2)xT 1 x2 = 0 =⇒ xT 1 x2 = 0.
  • 45. 1.6. EIGEN VALUES 45 Thus x1 and x2 are orthogonal. In between we took transpose on both sides, used the fact that A = AT and λ1 − λ2 = 0. Definition 1.32 [Orthogonally diagonalizable matrix] A real n×n matrix A is said to be orthogonally diagonalizable if there exists an orthogonal matrix U which can diagonalize A, i.e. D = UT AU is a real diagonal matrix. Lemma 1.84 Every orthogonally diagonalizable matrix A is sym- metric. Proof. We have a diagonal matrix D such that A = UDUT . Taking transpose on both sides we get AT = UDT UT = UDUT = A. Thus A is symmetric. Theorem 1.85 Every symmetric matrix A is orthogonally diago- nalizable. We skip the proof of this theorem. 1.6.7. Hermitian matrices Following is a fundamental property of Hermitian matrices. Theorem 1.86 Every Hermitian matrix has an eigen value. The proof of this result is beyond the scope of this book.
  • 46. 46 1. MATRIX ALGEBRA Lemma 1.87 The eigenvalues of a Hermitian matrix are real. Proof. Let A be a Hermitian matrix and let λ be an eigen value of A. Let u be a corresponding eigen vector. Then Au = λu =⇒ uH AH = uH λ =⇒ uH AH u = uH λu =⇒ uH Au = λuH u =⇒ uH λu = λuH u =⇒ u 2 2(λ − λ) = 0 =⇒ λ = λ thus λ is real. We used the facts that A = AH and u = 0 =⇒ u 2 = 0. Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let λ1 and λ2 be any two distinct eigen values of A and let x1 and x2 be any two corresponding eigen vectors. Then x1 and x2 are orthogonal. Proof. By definition we have Ax1 = λ1x1 and Ax2 = λ2x2. Thus xH 2 Ax1 = λ1xH 2 x1 =⇒ xH 1 AH x2 = λ1xH 1 x2 =⇒ xH 1 Ax2 = λ1xH 1 x2 =⇒ xH 1 λ2x2 = λ1xH 1 x2 =⇒ (λ1 − λ2)xH 1 x2 = 0 =⇒ xH 1 x2 = 0. Thus x1 and x2 are orthogonal. In between we took conjugate transpose on both sides, used the fact that A = AH and λ1 − λ2 = 0.
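The following sketch (Python with NumPy assumed) illustrates lemma 1.87 and lemma 1.88 on a randomly generated Hermitian matrix; np.linalg.eigh is NumPy's routine specialized for Hermitian matrices and returns real eigen values together with orthonormal eigen vectors.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A = B + B.conj().T                              # Hermitian by construction

    lam, U = np.linalg.eigh(A)

    assert np.allclose(lam.imag, 0)                          # eigen values are real
    assert np.allclose(U.conj().T @ U, np.eye(3))            # eigen vectors are orthonormal
    assert np.allclose(A, U @ np.diag(lam) @ U.conj().T)     # A = U Lambda U^H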
  • 47. 1.6. EIGEN VALUES 47 Definition 1.33 [Unitary diagonalizable matrix] A complex n×n matrix A is said to be unitary diagonalizable if there exists a unitary matrix U which can diagonalize A, i.e. D = UH AU is a complex diagonal matrix. Lemma 1.89 Let A be a unitary diagonalizable matrix whose di- agonalization D is real. Then A is Hermitian. Proof. We have a real diagonal matrix D such that A = UDUH . Taking conjugate transpose on both sides we get AH = UDH UH = UDUH = A. Thus A is Hermitian. We used the fact that DH = D since D is real. Theorem 1.90 Every Hermitian matrix A is unitary diagonaliz- able. We skip the proof of this theorem. The theorem means that if A is Hermitian then A = UΛUH Definition 1.34 [Eigen value decomposition of a Hermitian ma- trix] Let A be an n × n Hermitian matrix. Let λ1, . . . λn be its eigen values such that |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Let Λ = diag(λ1, . . . , λn). Let U be a unit matrix consisting of orthonormal eigen vectors corresponding to λ1, . . . , λn. Then The eigen value decomposition of A is defined as A = UΛUH . (1.6.18)
If the λ_i are distinct, then the decomposition is unique. If they are not distinct, then the decomposition is not unique, since for a repeated eigen value the corresponding columns of U can be chosen as any orthonormal basis of its eigen space.

Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider some vector x ∈ C^n.

x^H Λ x = ∑_{i=1}^{n} λ_i |x_i|^2. (1.6.19)

Now if λ_i ≥ 0 then

x^H Λ x ≤ λ_1 ∑_{i=1}^{n} |x_i|^2 = λ_1 ||x||_2^2.

Also

x^H Λ x ≥ λ_n ∑_{i=1}^{n} |x_i|^2 = λ_n ||x||_2^2.

Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen values. Let λ_1 be its largest and λ_n be its smallest eigen value. Then

λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2 ∀ x ∈ C^n. (1.6.20)

Proof. A has an eigen value decomposition given by A = U Λ U^H. Let x ∈ C^n and let v = U^H x. Clearly ||x||_2 = ||v||_2. Then

x^H A x = x^H U Λ U^H x = v^H Λ v.

From the previous remark we have λ_n ||v||_2^2 ≤ v^H Λ v ≤ λ_1 ||v||_2^2. Thus we get

λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2.
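Lemma 1.91 can be checked numerically with a sketch like the following, assuming Python with NumPy; B^H B is Hermitian with non-negative eigen values, and the bound is tested on random vectors (the small tolerance only guards against rounding).

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B.conj().T @ B                          # Hermitian with non-negative eigen values

    lam = np.linalg.eigvalsh(A)                 # eigen values sorted in ascending order
    lam_min, lam_max = lam[0], lam[-1]

    for _ in range(100):
        x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        q = np.real(x.conj() @ A @ x)           # x^H A x is real for Hermitian A
        n2 = np.linalg.norm(x) ** 2
        assert lam_min * n2 - 1e-9 <= q <= lam_max * n2 + 1e-9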
  • 49. 1.6. EIGEN VALUES 49 1.6.8. Miscellaneous properties This subsection lists some miscellaneous properties of eigen values of a square matrix. Lemma 1.92 λ is an eigen value of A if and only if λ + k is an eigen value of A + kI. Moreover A and A + kI share the same eigen vectors. Proof. Ax = λx ⇐⇒ Ax + kx = λx + kx ⇐⇒ (A + kI)x = (λ + k)x. (1.6.21) Thus λ is an eigen value of A with an eigen vector x if and only if λ+k is an eigen vector of A + kI with an eigen vector x. 1.6.9. Diagonally dominant matrices Definition 1.35 [Diagonally dominant matrix] Let A = [aij] be a square matrix in Cn×n . A is called diagonally dominant if |aii| ≥ j=i |aij| holds true for all 1 ≤ i ≤ n. i.e. the absolute value of the diagonal element is greater than or equal to the sum of absolute values of all the off diagonal elements on that row. Definition 1.36 [Strictly diagonally dominant matrix] Let A = [aij] be a square matrix in Cn×n . A is called strictly diagonally dominant if |aii| > j=i |aij| holds true for all 1 ≤ i ≤ n. i.e. the absolute value of the diagonal element is bigger than the sum of absolute values of all the off diagonal elements on that row.
  • 50. 50 1. MATRIX ALGEBRA Example 1.2: Strictly diagonally dominant matrix Let us con- sider A =       −4 −2 −1 0 −4 7 2 0 3 −4 9 1 2 −1 −3 15       We can see that the strict diagonal dominance condition is satisfied for each row as follows: row 1 : | − 4| > | − 2| + | − 1| + |0| = 3 row 2 : |7| > | − 4| + |2| + |0| = 6 row 3 : |9| > |3| + | − 4| + |1| = 8 row 4 : |15| > |2| + | − 1| + | − 3| = 6 Strictly diagonally dominant matrices have a very special property. They are always non-singular. Theorem 1.93 Strictly diagonally dominant matrices are non- singular. Proof. Suppose that A is diagonally dominant and singular. Then there exists a vector u ∈ Cn with u = 0 such that Au = 0. (1.6.22) Let u = u1 u2 . . . un T . We first show that every entry in u cannot be equal in magnitude. Let us assume that this is so. i.e. c = |u1| = |u2| = · · · = |un|.
  • 51. 1.6. EIGEN VALUES 51 Since u = 0 hence c = 0. Now for any row i in (1.6.22) , we have n j=1 aijuj = 0 =⇒ n j=1 ±aijc = 0 =⇒ n j=1 ±aij = 0 =⇒ aii = j=i ±aij =⇒ |aii| = | j=i ±aij| =⇒ |aii| ≤ j=i |aij| using triangle inequality but this contradicts our assumption that A is strictly diagonally dom- inant. Thus all entries in u are not equal in magnitude. Let us now assume that the largest entry in u lies at index i with |ui| = c. Without loss of generality we can scale down u by c to get another vector in which all entries are less than or equal to 1 in magnitude while i-th entry is ±1. i.e. ui = ±1 and |uj| ≤ 1 for all other entries. Now from (1.6.22) we get for the i-th row n j=1 aijuj = 0 =⇒ ± aii = j=i ujaij =⇒ |aii| ≤ j=i |ujaij| ≤ j=i |aij| which again contradicts our assumption that A is strictly diagonally dominant. Hence strictly diagonally dominant matrices are non-singular.
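A small sketch of the strict diagonal dominance test on the matrix of example 1.2, assuming Python with NumPy; is_strictly_diagonally_dominant is a hypothetical helper, and the non-zero determinant is consistent with theorem 1.93.

    import numpy as np

    def is_strictly_diagonally_dominant(A):
        # Check |a_ii| > sum of |a_ij| over j != i, for every row.
        A = np.asarray(A)
        off = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
        return bool(np.all(np.abs(np.diag(A)) > off))

    # The matrix from example 1.2.
    A = np.array([[-4., -2., -1.,  0.],
                  [-4.,  7.,  2.,  0.],
                  [ 3., -4.,  9.,  1.],
                  [ 2., -1., -3., 15.]])

    print(is_strictly_diagonally_dominant(A))   # True
    print(np.linalg.det(A) != 0)                # non-singular, as theorem 1.93 guarantees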
  • 52. 52 1. MATRIX ALGEBRA 1.6.10. Gershgorin’s theorem We are now ready to examine Gershgorin’ theorem which provides very useful bounds on the spectrum of a square matrix. Theorem 1.94 Every eigen value λ of a square matrix A ∈ Cn×n satisfies |λ − aii| ≤ j=i |aij| for some i ∈ {1, 2, . . . , n}. (1.6.23) Proof. The proof is a straight forward application of non-singularity of diagonally dominant matrices. We know that for an eigen value λ, det(λI − A) = 0 i.e. the matrix (λI − A) is singular. Hence it cannot be strictly diagonally dominant due to theorem 1.93. Thus looking at each row i of (λI − A) we can say that |λ − aii| > j=i |aij| cannot be true for all rows simultaneously. i.e. it must fail at least for one row. This means that there exists at least one row i for which |λ − aii| ≤ j=i |aij| holds true. What this theorem means is pretty simple. Consider a disc in the complex plane for the i-th row of A whose center is given by aii and whose radius is given by r = j=i |aij| i.e. the sum of magnitudes of all non-diagonal entries in i-th row. There are n such discs corresponding to n rows in A. (1.6.23) means that every eigen value must lie within the union of these discs. It cannot lie outside. This idea is crystallized in following definition.
  • 53. 1.7. SINGULAR VALUES 53 Definition 1.37 [Gershgorin’s disc] For i-th row of matrix A we define the radius ri = j=i |aij| and the center ci = aii. Then the set given by Di = {z ∈ C : |z − aii| ≤ ri} is called the i-th Gershgorin’s disc of A. We note that the definition is equally valid for real as well as complex matrices. For real matrices, the centers of disks lie on the real line. For complex matrices, the centers may lie anywhere in the complex plane. Clearly there is nothing magical about the rows of A. We can as well consider the columns of A. Theorem 1.95 Every eigen value of a matrix A must lie in a Gershgorin disc corresponding to the columns of A where the Ger- shgorin disc for j-th column is given by Dj = {z ∈ C : |z − ajj| ≤ rj} with rj = i=j |aij| Proof. We know that eigen values of A are same as eigen values of AT and columns of A are nothing but rows of AT . Hence eigen values of A must satisfy conditions in theorem 1.94 w.r.t. the matrix AT . This completes the proof. 1.7. Singular values In previous section we saw diagonalization of square matrices which resulted in an eigen value decomposition of the matrix. This matrix factorization is very useful yet it is not applicable in all situations. In particular, the eigen value decomposition is useless if the square matrix is not diagonalizable or if the matrix is not square at all. Moreover,
  • 54. 54 1. MATRIX ALGEBRA the decomposition is particularly useful only for real symmetric or Her- mitian matrices where the diagonalizing matrix is an F-unitary matrix (see definition 1.23). Otherwise, one has to consider the inverse of the diagonalizing matrix also. Fortunately there happens to be another decomposition which applies to all matrices and it involves just F-unitary matrices. Definition 1.38 [Singular value] A non-negative real number σ is a singular value for a matrix A ∈ Fm×n if and only if there exist unit-length vectors u ∈ Fm and v ∈ Fn such that Av = σu (1.7.1) and AH u = σv (1.7.2) hold. The vectors u and v are called left-singular and right- singular vectors for σ respectively. We first present the basic result of singular value decomposition. We will not prove this result completely although we will present proofs of some aspects. Theorem 1.96 For every A ∈ Fm×n with k = min(m, n), there exist two F-unitary matrices U ∈ Fm×m and V ∈ Fn×n and a sequence of real numbers σ1 ≥ σ2 ≥ · · · ≥ σk ≥ 0 such that UH AV = Σ (1.7.3) where Σ = diag(σ1, σ2, . . . , σk) ∈ Fm×n . The non-negative real numbers σi are the singular values of A as per definition 1.38.
  • 55. 1.7. SINGULAR VALUES 55 The sequence of real numbers σi doesn’t depend on the particular choice of U and V . Σ is rectangular with the same size as A. The singular values of A lie on the principle diagonal of Σ. All other entries in Σ are zero. It is certainly possible that some of the singular values are 0 themselves. Remark. Since UH AV = Σ hence A = UΣV H . (1.7.4) Definition 1.39 [Singular value decomposition] The decomposi- tion of a matrix A ∈ Fm×n given by A = UΣV H (1.7.5) is known as its singular value decomposition. Remark. When F is R then the decomposition simplifies to UT AV = Σ (1.7.6) and A = UΣV T . (1.7.7) Remark. Clearly there can be at most k = min(m, n) distinct singular values of A. Remark. We can also write AV = UΣ. (1.7.8) Remark. Let us expand A = UΣV H = u1 u2 . . . um σij       vH 1 vH 2 ... vH n       = m i=1 n j=1 σijuivH j .
  • 56. 56 1. MATRIX ALGEBRA Remark. Alternatively, let us expand Σ = UH AV =       uH 1 uH 2 ... uH m       A v1 v2 . . . vm = uH i Avj This gives us σij = uH i Avj. (1.7.9) Following lemma verifies that Σ indeed consists of singular values of A as per definition 1.38. Lemma 1.97 Let A = UΣV H be a singular value decomposition of A. Then the main diagonal entries of Σ are singular values. The first k = min(m, n) column vectors in U and V are left and right singular vectors of A. Proof. We have AV = UΣ. Let us expand R.H.S. UΣ = m j=1 uijσjk = [uikσk] = σ1u1 σ2u2 . . . σkuk 0 . . . 0 where 0 columns in the end appear n − k times. Expanding the L.H.S. we get AV = Av1 Av2 . . . Avn . Thus by comparing both sides we get Avi = σiui for 1 ≤ i ≤ k and Avi = 0 for k < i ≤ n. Now let us start with A = UΣV H =⇒ AH = V ΣH UH =⇒ AH U = V ΣH .
  • 57. 1.7. SINGULAR VALUES 57 Let us expand R.H.S. V ΣH = n j=1 vijσjk = [vikσk] = σ1v1 σ2v2 . . . σkvk 0 . . . 0 where 0 columns appear m − k times. Expanding the L.H.S. we get AH U = AH u1 AH u2 . . . AH um . Thus by comparing both sides we get AH ui = σivi for 1 ≤ i ≤ k and AH ui = 0 for k < i ≤ m. We now consider the three cases. For m = n, we have k = m = n. And we get Avi = σiui, AH ui = σivi for 1 ≤ i ≤ m Thus σi is a singular value of A and ui is a left singular vector while vi is a right singular vector. For m < n, we have k = m. We get for first m vectors in V Avi = σiui, AH ui = σivi for 1 ≤ i ≤ m. Finally for remaining n − m vectors in V , we can write Avi = 0. They belong to the null space of A. For m > n, we have k = n. We get for first n vectors in U Avi = σiui, AH ui = σivi for 1 ≤ i ≤ n. Finally for remaining m − n vectors in U, we can write AH ui = 0.
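Lemma 1.97 and the decomposition itself can be checked numerically with the following sketch (Python with NumPy assumed). Note that np.linalg.svd returns V^H rather than V, and the singular values come back in non-increasing order; the last assertion anticipates corollary 1.107.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    U, s, Vh = np.linalg.svd(A)                 # NumPy returns V^H, not V
    k = min(A.shape)

    Sigma = np.zeros(A.shape, dtype=complex)
    Sigma[:k, :k] = np.diag(s)
    assert np.allclose(A, U @ Sigma @ Vh)       # A = U Sigma V^H

    V = Vh.conj().T
    for i in range(k):                          # A v_i = sigma_i u_i and A^H u_i = sigma_i v_i
        assert np.allclose(A @ V[:, i], s[i] * U[:, i])
        assert np.allclose(A.conj().T @ U[:, i], s[i] * V[:, i])

    # The largest singular value equals the 2-norm of A.
    assert np.isclose(s[0], np.linalg.norm(A, 2))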
  • 58. 58 1. MATRIX ALGEBRA Lemma 1.98 ΣΣH is an m × m matrix given by ΣΣH = diag(σ2 1, σ2 2, . . . σ2 k, 0, 0, . . . 0) where the number of 0’s following σ2 k is m − k. Lemma 1.99 ΣH Σ is an n × n matrix given by ΣH Σ = diag(σ2 1, σ2 2, . . . σ2 k, 0, 0, . . . 0) where the number of 0’s following σ2 k is n − k. Lemma 1.100 [Rank and singular value decomposition] Let A ∈ Fm×n have a singular value decomposition given by A = UΣV H . Then rank(A) = rank(Σ). (1.7.10) In other words, rank of A is number of non-zero singular values of A. Since the singular values are ordered in descending order in A hence, the first r singular values σ1, . . . , σr are non-zero. Proof. This is a straight forward application of lemma 1.6 and lemma 1.7. Further since only non-zero values in Σ appear on its main diagonal hence its rank is number of non-zero singular values σi. Corollary 1.101. Let r = rank(A). Then Σ can be split as a block matrix Σ = Σr 0 0 0 (1.7.11) where Σr is an r × r diagonal matrix of the non-zero singular values diag(σ1, σ2, . . . , σr). All other sub-matrices in Σ are 0.
Lemma 1.102 The eigen values of the Hermitian matrix A^H A ∈ F^{n×n} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2. Moreover the eigen vectors are the columns of V.

Proof. A^H A = (U Σ V^H)^H (U Σ V^H) = V Σ^H U^H U Σ V^H = V Σ^H Σ V^H. We note that A^H A is Hermitian. Hence A^H A is diagonalized by V and the diagonalization of A^H A is Σ^H Σ. Thus the eigen values of A^H A are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2. Clearly (A^H A) V = V (Σ^H Σ), thus the columns of V are the eigen vectors of A^H A.

Lemma 1.103 The eigen values of the Hermitian matrix A A^H ∈ F^{m×m} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2. Moreover the eigen vectors are the columns of U.

Proof. A A^H = (U Σ V^H)(U Σ V^H)^H = U Σ V^H V Σ^H U^H = U Σ Σ^H U^H. We note that A A^H is Hermitian. Hence A A^H is diagonalized by U and the diagonalization of A A^H is Σ Σ^H. Thus the eigen values of A A^H are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2. Clearly (A A^H) U = U (Σ Σ^H), thus the columns of U are the eigen vectors of A A^H.

Lemma 1.104 The Gram matrices A A^H and A^H A share the same eigen values except for some extra 0's. Their eigen values are the squares of the singular values of A together with some extra 0's. In other words
  • 60. 60 1. MATRIX ALGEBRA singular values of A are the square roots of non-zero eigen values of the Gram matrices AAH or AH A. 1.7.1. The largest singular value Lemma 1.105 For all u ∈ Fn the following holds Σu 2 ≤ σ1 u 2 (1.7.12) Moreover for all u ∈ Fm the following holds ΣH u 2 ≤ σ1 u 2 (1.7.13) Proof. Let us expand the term Σu.          σ1 0 . . . . . . 0 0 σ2 . . . . . . 0 ... ... ... . . . 0 0 ... σk . . . 0 0 0 ... . . . 0                     u1 u2 ... uk ... un            =               σ1u1 σ2u2 ... σkuk 0 ... 0               Now since σ1 is the largest singular value, hence |σrui| ≤ |σ1ui| ∀ 1 ≤ i ≤ k. Thus n i=1 |σ1ui|2 ≥ n i=1 |σiui|2 or σ2 1 u 2 2 ≥ Σu 2 2. The result follows. A simpler representation of Σu can be given using corollary 1.101. Let r = rank(A). Thus Σ = Σr 0 0 0
  • 61. 1.7. SINGULAR VALUES 61 We split entries in u as u = [(u1, . . . , ur)(ur+1 . . . un)]T . Then Σu =   Σr u1 . . . ur T 0 ur+1 . . . un T   = σ1u1 σ2u2 . . . σrur 0 . . . 0 T Thus Σu 2 2 = r i=1 |σiui|2 ≤ σ1 r i=1 |ui|2 ≤ σ1 u 2 2. 2nd result can also be proven similarly. Lemma 1.106 Let σ1 be the largest singular value of an m × n matrix A. Then Ax 2 ≤ σ1 x 2 ∀ x ∈ Fn . (1.7.14) Moreover AH x 2 ≤ σ1 x 2 ∀ x ∈ Fm . (1.7.15) Proof. Ax 2 = UΣV H x 2 = ΣV H x 2 since U is unitary. Now from previous lemma we have ΣV H x 2 ≤ σ1 V H x 2 = σ1 x 2 since V H also unitary. Thus we get the result Ax 2 ≤ σ1 x 2 ∀ x ∈ Fn . Similarly AH x 2 = V ΣH UH x 2 = ΣH UH x 2 since V is unitary. Now from previous lemma we have ΣH UH x 2 ≤ σ1 UH x 2 = σ1 x 2 since UH also unitary. Thus we get the result AH x 2 ≤ σ1 x 2 ∀ x ∈ Fm .
  • 62. 62 1. MATRIX ALGEBRA There is a direct connection between the largest singular value and 2-norm of a matrix (see section 1.8.6). Corollary 1.107. The largest singular value of A is nothing but its 2-norm. i.e. σ1 = max u 2=1 Au 2. 1.7.2. SVD and pseudo inverse Lemma 1.108 [Pseudo-inverse of Σ] Let A = UΣV H and let r = rank(A). Let σ1, . . . , σr be the r non-zero singular values of A. Then the Moore-Penrose pseudo-inverse of Σ is an n × m matrix Σ† given by Σ† = Σ−1 r 0 0 0 (1.7.16) where Σr = diag(σ1, . . . , σr). Essentially Σ† is obtained by transposing Σ and inverting all its non-zero (positive real) values. Proof. Straight forward application of lemma 1.32. Corollary 1.109. The rank of Σ and its pseudo-inverse Σ† are same. i.e. rank(Σ) = rank(Σ† ). (1.7.17) Proof. The number of non-zero diagonal entries in Σ and Σ† are same. Lemma 1.110 Let A be an m × n matrix and let A = UΣV H be its singular value decomposition. Let Σ† be the pseudo inverse of Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of A is given by A† = V Σ† UH . (1.7.18)
  • 63. 1.7. SINGULAR VALUES 63 Proof. As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per definition 1.19. We note that since Σ† is the pseudo-inverse of Σ it already satisfies necessary criteria. First requirement: AA† A = UΣV H V Σ† UH UΣV H = UΣΣ† ΣV H = UΣV H = A. Second requirement: A† AA† = V Σ† UH UΣV H V Σ† UH = V Σ† ΣΣ† UH = V Σ† UH = A† . We now consider AA† = UΣV H V Σ† UH = UΣΣ† UH . Thus AA† H = UΣΣ† UH H = U ΣΣ† H UH = UΣΣ† UH = AA† since ΣΣ† is Hermitian. Finally we consider A† A = V Σ† UH UΣV H = V Σ† ΣV H . Thus A† A H = V Σ† ΣV H H = V Σ† Σ H V H = V Σ† ΣV H = A† A since Σ† Σ is also Hermitian. This completes the proof. Finally we can connect the singular values of A with the singular values of its pseudo-inverse. Corollary 1.111. The rank of any m × n matrix A and its pseudo- inverse A† are same. i.e. rank(A) = rank(A† ). (1.7.19) Proof. We have rank(A) = rank(Σ). Also its easy to verify that rank(A† ) = rank(Σ† ). So using corollary 1.109 completes the proof.
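Lemma 1.110 translates directly into code: build $\Sigma^\dagger$ by transposing $\Sigma$ and inverting its non-zero entries, then form $A^\dagger = V \Sigma^\dagger U^H$. Below is a minimal sketch (NumPy assumed; the tolerance and random test matrix are illustrative) that compares the result with the library pseudo-inverse and checks two of the Moore-Penrose conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A)          # A = U @ Sigma @ Vh, s holds sigma_1..sigma_k

# Build Sigma^dagger: an n x m matrix with 1/sigma_i on its diagonal.
Sigma_pinv = np.zeros((n, m))
r = int(np.sum(s > 1e-12))
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])

A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
print(np.allclose(A_pinv, np.linalg.pinv(A)))        # True

# Two of the Moore-Penrose conditions as a sanity check.
print(np.allclose(A @ A_pinv @ A, A))                # A A^dagger A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))      # A^dagger A A^dagger = A^dagger

# Non-zero singular values of A^dagger are reciprocals of those of A (lemma 1.112).
print(np.allclose(np.sort(np.linalg.svd(A_pinv, compute_uv=False)),
                  np.sort(1.0 / s)))
```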
Lemma 1.112 Let $A$ be an $m \times n$ matrix and let $A^\dagger$ be its $n \times m$ pseudo-inverse as per lemma 1.110. Let $r = \operatorname{rank}(A)$. Let $k = \min(m, n)$ denote the number of singular values, while $r$ denotes the number of non-zero singular values of $A$. Let $\sigma_1, \dots, \sigma_r$ be the non-zero singular values of $A$. Then the number of singular values of $A^\dagger$ is the same as that of $A$, the non-zero singular values of $A^\dagger$ are $\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$, and all other $k - r$ singular values of $A^\dagger$ are zero.

Proof. $k = \min(m, n)$ is the number of singular values for both $A$ and $A^\dagger$. Since the ranks of $A$ and $A^\dagger$ are the same, the number of non-zero singular values is also the same. Now look at
$$A^\dagger = V \Sigma^\dagger U^H \quad \text{where} \quad \Sigma^\dagger = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$
Clearly $\Sigma_r^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\right)$. Thus, expanding the R.H.S. we get
$$A^\dagger = \sum_{i=1}^{r} \frac{1}{\sigma_i} v_i u_i^H$$
where $v_i$ and $u_i$ are the first $r$ columns of $V$ and $U$ respectively. If we reverse the order of the first $r$ columns of $U$ and $V$ and reverse the first $r$ diagonal entries of $\Sigma^\dagger$, the R.H.S. remains the same while $A^\dagger$ is now expressed in the standard singular value decomposition form (with the reciprocals in descending order). Thus $\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$ are indeed the non-zero singular values of $A^\dagger$.

1.7.3. Full column rank matrices

In this subsection we consider some specific results related to the singular value decomposition of a full column rank matrix.
  • 65. 1.7. SINGULAR VALUES 65 We will consider A to be an m × n matrix in Fm×n with m ≥ n and rank(A) = n. Let A = UΣV H be its singular value decomposition. From lemma 1.100 we observe that there are n non-zero singular values of A. We will call these singular values as σ1, σ2, . . . , σn. We will define Σn = diag(σ1, σ2, . . . , σn). Clearly Σ is an 2 × 1 block matrix given by Σ = Σn 0 where the lower 0 is an (m − n) × n zero matrix. From here we obtain that ΣH Σ is an n × n matrix given by ΣH Σ = Σ2 n where Σ2 n = diag(σ2 1, σ2 2, . . . , σ2 n). Lemma 1.113 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Then ΣH Σ = Σ2 n = diag(σ2 1, σ2 2, . . . , σ2 n) and ΣH Σ is invertible. Proof. Since all singular values are non-zero hence Σ2 n is invert- ible. Thus ΣH Σ −1 = Σ2 n −1 = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . (1.7.20) Lemma 1.114 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let σ1 be its largest singular value and σn be its smallest singular value. Then σ2 n x 2 ≤ ΣH Σx 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . (1.7.21)
  • 66. 66 1. MATRIX ALGEBRA Proof. Let x ∈ Fn . We have ΣH Σx 2 2 = Σ2 nx 2 2 = n i=1 |σ2 i xi|2 . Now since σn ≤ σi ≤ σ1 hence σ4 n n i=1 |xi|2 ≤ n i=1 |σ2 i xi|2 ≤ σ4 1 n i=1 |xi|2 thus σ4 n x 2 2 ≤ ΣH Σx 2 2 ≤ σ4 1 x 2 2. Applying square roots, we get σ2 n x 2 ≤ ΣH Σx 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . We recall from corollary 1.25 that the Gram matrix of its column vec- tors G = AH A is full rank and invertible. Lemma 1.115 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let σ1 be its largest singular value and σn be its smallest singular value. Then σ2 n x 2 ≤ AH Ax 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . (1.7.22) Proof. AH A = (UΣV H )H (UΣV H ) = V ΣH ΣV H . Let x ∈ Fn . Let u = V H x =⇒ u 2 = x 2. Let r = ΣH Σu. Then from previous lemma we have σ2 n u 2 ≤ ΣH Σu 2 = r 2 ≤ σ2 1 u 2.
  • 67. 1.7. SINGULAR VALUES 67 Finally AH Ax = V ΣH ΣV H x = V r. Thus AH Ax 2 = r 2. Substituting we get σ2 n x 2 ≤ AH Ax 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . There are bounds for the inverse of Gram matrix also. First let us establish the inverse of Gram matrix. Lemma 1.116 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let the singular values of A be σ1, . . . , σn. Let the Gram matrix of columns of A be G = AH A. Then G−1 = V ΨV H where Ψ = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . Proof. We have G = V ΣH ΣV H Thus G−1 = V ΣH ΣV H −1 = V H −1 ΣH Σ −1 V −1 = V ΣH Σ −1 V H . From lemma 1.113 we have Ψ = ΣH Σ −1 = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . This completes the proof. We can now state the bounds:
  • 68. 68 1. MATRIX ALGEBRA Lemma 1.117 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let σ1 be its largest singular value and σn be its smallest singular value. Then 1 σ2 1 x 2 ≤ AH A −1 x 2 ≤ 1 σ2 n x 2 ∀ x ∈ Fn . (1.7.23) Proof. From lemma 1.116 we have G−1 = AH A −1 = V ΨV H where Ψ = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . Let x ∈ Fn . Let u = V H x =⇒ u 2 = x 2. Let r = Ψu. Then r 2 2 = n i=1 1 σ2 i ui 2 . Thus 1 σ2 1 u 2 ≤ Ψu 2 = r 2 ≤ 1 σ2 n u 2. Finally AH A −1 x = V ΨV H x = V r. Thus AH A −1 x 2 = r 2. Substituting we get the result.
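A quick numerical check of lemmas 1.115 and 1.117 for a full column rank matrix follows (NumPy assumed; the test matrix and probe vectors are arbitrary placeholders).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 4
A = rng.standard_normal((m, n))             # generically full column rank

s = np.linalg.svd(A, compute_uv=False)
s1, sn = s[0], s[-1]                        # largest and smallest singular values

G = A.conj().T @ A                          # Gram matrix, invertible here
G_inv = np.linalg.inv(G)

for _ in range(5):
    x = rng.standard_normal(n)
    nx = np.linalg.norm(x)
    # sigma_n^2 ||x|| <= ||A^H A x|| <= sigma_1^2 ||x||
    assert sn**2 * nx <= np.linalg.norm(G @ x) + 1e-10
    assert np.linalg.norm(G @ x) <= s1**2 * nx + 1e-10
    # (1/sigma_1^2) ||x|| <= ||(A^H A)^{-1} x|| <= (1/sigma_n^2) ||x||
    assert nx / s1**2 <= np.linalg.norm(G_inv @ x) + 1e-10
    assert np.linalg.norm(G_inv @ x) <= nx / sn**2 + 1e-10

print("bounds verified")
```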
  • 69. 1.8. MATRIX NORMS 69 1.7.4. Low rank approximation of a matrix Definition 1.40 An m × n matrix A is called low rank if rank(A) min(m, n). (1.7.24) Remark. A matrix is low rank if the number of non-zero singular values for the matrix is much smaller than its dimensions. Following is a simple procedure for making a low rank approximation of a given matrix A. (1) Perform the singular value decomposition of A given by A = UΣV H . (2) Identify the singular values of A in Σ. (3) Keep the first r singular values (where r min(m, n) is the rank of the approximation). and set all other singular values to 0 to obtain Σ. (4) Compute A = UΣV H . 1.8. Matrix norms This section reviews various matrix norms on the vector space of com- plex matrices over the field of complex numbers (Cm×n , C). We know (Cm×n , C) is a finite dimensional vector space with dimension mn. We will usually refer to it as Cm×n . Matrix norms will follow the usual definition of norms for a vector space. Definition 1.41 A function · : Cm×n → R is called a matrix norm on Cm×n if for all A, B ∈ Cm×n and all α ∈ C it satisfies the following Positivity: A ≥ 0
  • 70. 70 1. MATRIX ALGEBRA with A = 0 ⇐⇒ A = 0. Homogeneity: αA = |α| A . Triangle inequality: A + B ≤ A + B . We recall some of the standard results on normed vector spaces. All matrix norms are equivalent. Let · and · be two different matrix norms on Cm×n . Then there exist two constants a and b such that the following holds a A ≤ A ≤ b A ∀ A ∈ Cm×n . A matrix norm is a continuous function · : Cm×n → R. 1.8.1. Norms like lp on Cn Following norms are quite like lp norms on finite dimensional complex vector space Cn . They are developed by the fact that the matrix vector space Cm×n has one to one correspondence with the complex vector space Cmn . Definition 1.42 Let A ∈ Cm×n and A = [aij]. Matrix sum norm is defined as A S = m i=1 n j=1 |aij| (1.8.1) Definition 1.43 Let A ∈ Cm×n and A = [aij]. Matrix Frobenius norm is defined as A F = m i=1 n j=1 |aij|2 1 2 . (1.8.2)
  • 71. 1.8. MATRIX NORMS 71 Definition 1.44 Let A ∈ Cm×n and A = [aij]. Matrix Max norm is defined as A M = max 1≤i≤m 1≤j≤n |aij|. (1.8.3) 1.8.2. Properties of Frobenius norm We now prove some elementary properties of Frobenius norm. Lemma 1.118 The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose. AH F = A F . (1.8.4) Proof. Let A = [aij]. Then AH = [aji] AH 2 F = n j=1 m i=1 |aij|2 = m i=1 n j=1 |aij|2 = A 2 F . Now AH 2 F = A 2 F =⇒ AH F = A F Lemma 1.119 Let A ∈ Cm×n be written as a row of column vec- tors A = a1 . . . an . Then A 2 F = n j=1 aj 2 2. (1.8.5)
Proof. We note that $\|a_j\|_2^2 = \sum_{i=1}^{m} |a_{ij}|^2$. Now
$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} |a_{ij}|^2 \right) = \sum_{j=1}^{n} \|a_j\|_2^2.$$
We thus showed that the square of the Frobenius norm of a matrix is nothing but the sum of squares of the $\ell_2$ norms of its columns.

Lemma 1.120 Let $A \in \mathbb{C}^{m \times n}$ be written as a column of row vectors
$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$
Then
$$\|A\|_F^2 = \sum_{i=1}^{m} \|a_i\|_2^2. \quad (1.8.6)$$

Proof. We note that $\|a_i\|_2^2 = \sum_{j=1}^{n} |a_{ij}|^2$. Now
$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{m} \|a_i\|_2^2.$$

We now consider how the Frobenius norm is affected by the action of unitary matrices. Let $A$ be an arbitrary matrix in $\mathbb{C}^{m \times n}$. Let $U$ be a unitary matrix in $\mathbb{C}^{m \times m}$ and let $V$ be a unitary matrix in $\mathbb{C}^{n \times n}$.
  • 73. 1.8. MATRIX NORMS 73 We present our first result that multiplication with unitary matrices doesn’t change Frobenius norm of a matrix. Theorem 1.121 The Frobenius norm of a matrix is invariant to pre or post multiplication by a unitary matrix. i.e. UA F = A F (1.8.7) and AV F = A F . (1.8.8) Proof. We can write A as A = a1 . . . an . So UA = Ua1 . . . Uan . Then applying lemma 1.119 clearly UA 2 F = n j=1 Uaj 2 2. But we know that unitary matrices are norm preserving. Hence Uaj 2 2 = aj 2 2. Thus UA 2 F = n j=1 aj 2 2 = A 2 F which implies UA F = A F . Similarly writing A as
  • 74. 74 1. MATRIX ALGEBRA A =     r1 ... rm     . we have AV =     r1V ... rmV     . Then applying lemma 1.120 clearly AV 2 F = m i=1 riV 2 2. But we know that unitary matrices are norm preserving. Hence riV 2 2 = ri 2 2. Thus AV 2 F = m i=1 ri 2 2 = A 2 F which implies AV F = A F . An alternative approach for the 2nd part of the proof using the first part is just one line AV F = (AV )H F = V H AH F = AH F = A F . In above we use lemma 1.118 and the fact that V is a unitary matrix implies that V H is also a unitary matrix. We have already shown that pre multiplication by a unitary matrix preserves Frobenius norm. Theorem 1.122 Let A ∈ Cm×n and B ∈ Cn×P be two matrices. Then the Frobenius norm of their product is less than or equal to
the product of the Frobenius norms of the matrices themselves, i.e.
$$\|AB\|_F \le \|A\|_F \|B\|_F. \quad (1.8.9)$$

Proof. We can write $A$ as
$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}$$
where the $a_i$ are $m$ column vectors corresponding to the rows of $A$. Similarly we can write $B$ as $B = \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix}$ where the $b_i$ are column vectors corresponding to the columns of $B$. Then
$$AB = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & \dots & a_1^T b_P \\ \vdots & \ddots & \vdots \\ a_m^T b_1 & \dots & a_m^T b_P \end{bmatrix} = \left[ a_i^T b_j \right].$$
Now looking carefully, $a_i^T b_j$ is the inner product $\langle a_i, b_j \rangle$. Applying the Cauchy-Schwarz inequality we have
$$|\langle a_i, b_j \rangle|^2 \le \|a_i\|_2^2 \|b_j\|_2^2.$$
Now
$$\|AB\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{P} |a_i^T b_j|^2 \le \sum_{i=1}^{m} \sum_{j=1}^{P} \|a_i\|_2^2 \|b_j\|_2^2 = \left( \sum_{i=1}^{m} \|a_i\|_2^2 \right) \left( \sum_{j=1}^{P} \|b_j\|_2^2 \right) = \|A\|_F^2 \|B\|_F^2,$$
which implies $\|AB\|_F \le \|A\|_F \|B\|_F$ by taking square roots on both sides.
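A short numerical check of theorem 1.122 (NumPy assumed; the matrices are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))

lhs = np.linalg.norm(A @ B, 'fro')
rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
print(lhs <= rhs, lhs, rhs)   # True: ||AB||_F <= ||A||_F ||B||_F
```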
Corollary 1.123. Let $A \in \mathbb{C}^{m \times n}$ and let $x \in \mathbb{C}^n$. Then
$$\|Ax\|_2 \le \|A\|_F \|x\|_2.$$

Proof. We note that the Frobenius norm of a column matrix is the same as the $\ell_2$ norm of the corresponding column vector, i.e. $\|x\|_F = \|x\|_2$ for all $x \in \mathbb{C}^n$. Now applying theorem 1.122 we have
$$\|Ax\|_2 = \|Ax\|_F \le \|A\|_F \|x\|_F = \|A\|_F \|x\|_2 \quad \forall x \in \mathbb{C}^n.$$

It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 1.124 Let $A \in \mathbb{C}^{m \times n}$. Let the singular value decomposition of $A$ be given by $A = U \Sigma V^H$. Let the singular values of $A$ be $\sigma_1, \dots, \sigma_n$. Then
$$\|A\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^2}. \quad (1.8.10)$$

Proof. $A = U \Sigma V^H \implies \|A\|_F = \|U \Sigma V^H\|_F$. But $\|U \Sigma V^H\|_F = \|\Sigma V^H\|_F = \|\Sigma\|_F$ since $U$ and $V$ are unitary matrices (see theorem 1.121). Now the only non-zero terms in $\Sigma$ are the singular values. Hence
$$\|A\|_F = \|\Sigma\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^2}.$$
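Lemma 1.124 and the unitary invariance of theorem 1.121 can both be checked in a few lines. This is a sketch only (NumPy assumed); the unitary factors are taken from QR factorizations of random matrices purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# ||A||_F equals the square root of the sum of squared singular values.
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))          # True

# Pre or post multiplication by a unitary matrix leaves ||.||_F unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
W, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
print(np.isclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro')))   # True
print(np.isclose(np.linalg.norm(A @ W, 'fro'), np.linalg.norm(A, 'fro')))   # True
```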
  • 77. 1.8. MATRIX NORMS 77 1.8.3. Consistency of a matrix norm Definition 1.45 A matrix norm · is called consistent on Cn×n if AB ≤ A B (1.8.11) holds true for all A, B ∈ Cn×n . A matrix norm · is called consistent if it is defined on Cm×n for all m, n ∈ N and eq (1.8.11) holds for all matrices A, B for which the product AB is defined. A consistent matrix norm is also known as a sub-multiplicative norm. With this definition and results in theorem 1.122 we can see that Frobe- nius norm is consistent. 1.8.4. Subordinate matrix norm A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and norms of vectors in the domain and co-domain of a matrix. Definition 1.46 Let m, n ∈ N be given. Let · α be some norm on Cm and · β be some norm on Cn . Let · be some norm on matrices in Cm×n . We say that · is subordinate to the vector norms · α and · β if Ax α ≤ A x β (1.8.12) for all A ∈ Cm×n and for all x ∈ Cn . In other words the length of the vector doesn’t increase by the operation of A beyond a factor given by the norm of the matrix itself. If · α and · β are same then we say that · is subordinate to the vector norm · α.
We have shown earlier in corollary 1.123 that the Frobenius norm is subordinate to the Euclidean norm.

1.8.5. Operator norm

We now consider the maximum factor by which a matrix $A$ can increase the length of a vector.

Definition 1.47 Let $m, n \in \mathbb{N}$ be given. Let $\|\cdot\|_\alpha$ be some norm on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ be some norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m \times n}$ we define
$$\|A\| \triangleq \|A\|_{\alpha \to \beta} \triangleq \max_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \quad (1.8.13)$$
The quantity $\frac{\|Ax\|_\beta}{\|x\|_\alpha}$ represents the factor by which the length of $x$ is increased by the operation of $A$; we simply pick the maximum value of this scaling factor. The norm defined above is known as the $(\alpha \to \beta)$ operator norm, the $(\alpha \to \beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.

Of course we need to verify that this definition satisfies all properties of a norm. Clearly if $A = 0$ then $Ax = 0$ always, hence $\|A\| = 0$. Conversely, if $\|A\| = 0$ then $\|Ax\|_\beta = 0$ for all $x \in \mathbb{C}^n$. In particular this is true for the unit vectors $e_i \in \mathbb{C}^n$. The $i$-th column of $A$ is given by $A e_i$, which is therefore 0. Thus each column in $A$ is 0 and hence $A = 0$. Now consider $c \in \mathbb{C}$:
$$\|cA\| = \max_{x \ne 0} \frac{\|cAx\|_\beta}{\|x\|_\alpha} = |c| \max_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha} = |c| \|A\|.$$
We now present some useful observations on the operator norm before we prove the triangle inequality for it. For any $x \in \ker(A)$, $Ax = 0$; hence we only need to consider vectors which don't belong to the kernel of $A$.
  • 79. 1.8. MATRIX NORMS 79 Thus we can write A α→β = max x/∈ker(A) Ax β x α . (1.8.14) We also note that Acx β cx α = |c| Ax β |c| x α = Ax β x α ∀ c = 0, x = 0. Thus, it is sufficient to find the maximum on unit norm vectors: A α→β = max x α=1 Ax β. Note that since x α = 1 hence the term in denominator goes away. Lemma 1.125 The (α → β)-operator norm is subordinate to vec- tor norms · α and · β. i.e. Ax β ≤ A α→β x α. (1.8.15) Proof. For x = 0 the inequality is trivially satisfied. Now for x = 0 by definition, we have A α→β ≥ Ax β x α =⇒ A α→β x α ≥ Ax β. Remark. There exists a vector x∗ ∈ Cn with unit norm ( x∗ α = 1) such that A α→β = Ax∗ β. (1.8.16) Proof. Let x = 0 be some vector which maximizes the expression Ax β x α . Then A α→β = Ax β x α . Now consider x∗ = x x α . Thus x∗ α = 1. We know that Ax β x α = Ax∗ β.
  • 80. 80 1. MATRIX ALGEBRA Hence A α→β = Ax∗ β. We are now ready to prove triangle inequality for operator norm. Lemma 1.126 Operator norm as defined in definition 1.47 satis- fies triangle inequality. Proof. Let A and B be some matrices in Cm×n . Consider the operator norm of matrix A + B. From previous remarks, there exists some vector x∗ ∈ Cn with x∗ α = 1 such that A + B = (A + B)x∗ β. Now (A + B)x∗ β = Ax∗ + Bx∗ β ≤ Ax∗ β + Bx∗ β. From another remark we have Ax∗ β ≤ A x∗ α = A and Bx∗ β ≤ B x∗ α = B since x∗ α = 1. Hence we have A + B ≤ A + B . It turns out that operator norm is also consistent under certain condi- tions.
  • 81. 1.8. MATRIX NORMS 81 Lemma 1.127 Let · α be defined over all m ∈ N. Let · β = · α. Then the operator norm A α = max x=0 Ax α x α is consistent. Proof. We need to show that AB α ≤ A α B α. Now AB α = max x=0 ABx α x α . We note that if Bx = 0, then ABx = 0. Hence we can rewrite as AB α = max Bx=0 ABx α x α . Now if Bx = 0 then Bx α = 0. Hence ABx α x α = ABx α Bx α Bx α x α and max Bx=0 ABx α x α ≤ max Bx=0 ABx α Bx α max Bx=0 Bx α x α . Clearly B α = max Bx=0 Bx α x α . Furthermore max Bx=0 ABx α Bx α ≤ max y=0 Ay α y α = A α. Thus we have AB α ≤ A α B α.
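Lemma 1.127 says that any $\alpha \to \alpha$ operator norm is sub-multiplicative. The closed forms derived in the next subsection ($\|\cdot\|_1$ is the maximum column sum, $\|\cdot\|_2$ the largest singular value, $\|\cdot\|_\infty$ the maximum row sum) make this easy to test numerically; the sketch below assumes NumPy and simply uses those closed forms via `np.linalg.norm`.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

for p in (1, 2, np.inf):
    lhs = np.linalg.norm(A @ B, p)
    rhs = np.linalg.norm(A, p) * np.linalg.norm(B, p)
    print(p, lhs <= rhs + 1e-12)   # True for each p: ||AB||_p <= ||A||_p ||B||_p
```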
  • 82. 82 1. MATRIX ALGEBRA 1.8.6. p-norm for matrices We recall the definition of lp norms for vectors x ∈ Cn from (??) x p =    ( n i=1 |x|p i ) 1 p p ∈ [1, ∞) max 1≤i≤n |xi| p = ∞ . The operator norms · p defined from lp vector norms are of specific interest. Definition 1.48 The p-norm for a matrix A ∈ Cm×n is defined as A p max x=0 Ax p x p = max x p=1 Ax p (1.8.17) where x p is the standard lp norm for vectors in Cm and Cn . Remark. As per lemma 1.127 p-norms for matrices are consistent norms. They are also sub-ordinate to lp vector norms. Special cases are considered for p = 1, 2 and ∞. Theorem 1.128 Let A ∈ Cm×n . For p = 1 we have A 1 max 1≤j≤n m i=1 |aij|. (1.8.18) This is also known as max column sum norm. For p = ∞ we have A ∞ max 1≤i≤m n j=1 |aij|. (1.8.19) This is also known as max row sum norm. Finally for p = 2 we have A 2 σ1 (1.8.20)
where $\sigma_1$ is the largest singular value of $A$. This is also known as the spectral norm.

Proof. Let $A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}$. Then
$$\|Ax\|_1 = \left\| \sum_{j=1}^{n} x_j a_j \right\|_1 \le \sum_{j=1}^{n} \|x_j a_j\|_1 = \sum_{j=1}^{n} |x_j| \|a_j\|_1 \le \left( \max_{1 \le j \le n} \|a_j\|_1 \right) \sum_{j=1}^{n} |x_j| = \left( \max_{1 \le j \le n} \|a_j\|_1 \right) \|x\|_1.$$
Thus,
$$\|A\|_1 = \max_{x \ne 0} \frac{\|Ax\|_1}{\|x\|_1} \le \max_{1 \le j \le n} \|a_j\|_1,$$
which is the maximum column sum. We need to show that this upper bound is indeed an equality. Indeed, for any $x = e_j$, where $e_j$ is the unit vector with 1 in the $j$-th entry and 0 elsewhere, $\|A e_j\|_1 = \|a_j\|_1$. Thus $\|A\|_1 \ge \|a_j\|_1$ for all $1 \le j \le n$. Combining the two, we see that
$$\|A\|_1 = \max_{1 \le j \le n} \|a_j\|_1.$$
  • 84. 84 1. MATRIX ALGEBRA For p = ∞, we proceed as follows: Ax ∞ = max 1≤i≤m n j=1 aijxj ≤ max 1≤i≤m n j=1 |aij||xj| ≤ max 1≤j≤n |xj| max 1≤i≤m n j=1 |aij| = x ∞ max 1≤i≤m ai 1 where ai are the rows of A. This shows that Ax ∞ ≤ max 1≤i≤m ai 1. We need to show that this is indeed an equality. Fix an i = k and choose x such that xj = sgn(akj). Clearly x ∞ = 1. Then Ax ∞ = max 1≤i≤m n j=1 aijxj ≥ n j=1 akjxj = n j=1 |akj| = n j=1 |akj| = ak 1. Thus, A ∞ ≥ max 1≤i≤m ai 1
  • 85. 1.8. MATRIX NORMS 85 Combining the two inequalities we get: A ∞ = max 1≤i≤m ai 1. Remaining case is for p = 2. For any vector x with x 2 = 1, Ax 2 = UΣV H x 2 = U(ΣV H x) 2 = ΣV H x 2 since l2 norm is invariant to unitary transformations. Let v = V H x. Then v 2 = V H x 2 = x 2 = 1. Now Ax 2 = Σv 2 = n j=1 |σjvj|2 1 2 ≤ σ1 n j=1 |vj|2 1 2 = σ1 v 2 = σ1. This shows that A 2 ≤ σ1. Now consider some vector x such that v = (1, 0, . . . , 0). Then Ax 2 = Σv 2 = σ1. Thus A 2 ≥ σ1. Combining the two, we get that A 2 = σ1. 1.8.7. The 2-norm Theorem 1.129 Let A ∈ Cn×n has singular values σ1 ≥ σ2 ≥ · · · ≥ σn. Let the eigen values for A be λ1, λ2, . . . , λn with |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Then the following hold A 2 = σ1 (1.8.21)
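The three closed forms of theorem 1.128 can be verified directly against NumPy's built-in matrix norms (a sketch; the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

max_col_sum = np.abs(A).sum(axis=0).max()          # ||A||_1: max column abs-sum
max_row_sum = np.abs(A).sum(axis=1).max()          # ||A||_inf: max row abs-sum
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]    # ||A||_2: largest singular value

print(np.isclose(max_col_sum, np.linalg.norm(A, 1)))        # True
print(np.isclose(max_row_sum, np.linalg.norm(A, np.inf)))   # True
print(np.isclose(sigma_1, np.linalg.norm(A, 2)))            # True
```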
  • 86. 86 1. MATRIX ALGEBRA and if A is non-singular A−1 2 = 1 σn . (1.8.22) If A is symmetric and positive definite, then A 2 = λ1 (1.8.23) and if A is non-singular A−1 2 = 1 λn . (1.8.24) If A is normal then A 2 = |λ1| (1.8.25) and if A is non-singular A−1 2 = 1 |λn| . (1.8.26) 1.8.8. Unitary invariant norms Definition 1.49 A matrix norm · on Cm×n is called unitary invariant if UAV = A for any A ∈ Cm×n and any unitary matrices U ∈ Cm×m and V ∈ Cn×n . We have already seen in theorem 1.121 that Frobenius norm is unitary invariant. It turns out that spectral norm is also unitary invariant. 1.8.9. More properties of operator norms In this section we will focus on operator norms connecting normed linear spaces (Cn , · p) and (Cm , · q). Typical values of p, q would be in {1, 2, ∞}. We recall that A p→q = max x=0 Ax q x p = max x p=1 Ax q = max x p≤1 Ax q. (1.8.27)
  • 87. 1.8. MATRIX NORMS 87 Table 1[[5]] shows how to compute different (p, q) norms. Some can be computed easily while others are NP-hard to compute. Table 1. Typical (p → q) norms p q A p→q Calculation 1 1 A 1 Maximum l1 norm of a column 1 2 A 1→2 Maximum l2 norm of a column 1 ∞ A 1→∞ Maximum absolute entry of a matrix 2 1 A 2→1 NP hard 2 2 A 2 Maximum singular value 2 ∞ A 2→∞ Maximum l2 norm of a row ∞ 1 A ∞→1 NP hard ∞ 2 A ∞→2 NP hard ∞ ∞ A ∞ Maximum l1-norm of a row The topological dual of the finite dimensional normed linear space (Cn , · p) is the normed linear space (Cn , · p ) where 1 p + 1 p = 1. l2-norm is dual of l2-norm. It is a self dual. l1 norm and l∞-norm are dual of each other. When a matrix A maps from the space (Cn , · p) to the space (Cm , · q), we can view its conjugate transpose AH as a mapping from the space (Cm , · q ) to (Cn , · p ). Theorem 1.130 Operator norm of a matrix always equals the op- erator norm of its conjugate transpose. i.e. A p→q = AH q →p (1.8.28) where 1 p + 1 p = 1, 1 q + 1 q = 1.
  • 88. 88 1. MATRIX ALGEBRA Specific applications of this result are: A 2 = AH 2. (1.8.29) This is obvious since the maximum singular value of a matrix and its conjugate transpose are same. A 1 = AH ∞, A ∞ = AH 1. (1.8.30) This is also obvious since max column sum of A is same as the max row sum norm of AH and vice versa. A 1→∞ = AH 1→∞. (1.8.31) A 1→2 = AH 2→∞. (1.8.32) A ∞→2 = AH 2→1. (1.8.33) We now need to show the result for the general case (arbitrary 1 ≤ p, q ≤ ∞). Proof. TODO Theorem 1.131 A 1→p = max 1≤j≤n aj p. (1.8.34) where A = a1 . . . , an .
  • 89. 1.8. MATRIX NORMS 89 Proof. Ax p = n j=1 xjaj p ≤ n j=1 xjaj p = n j=1 |xj| aj p ≤ max 1≤j≤n aj p n j=1 |xj| = max 1≤j≤n aj p x 1. Thus, A 1→p = max x=0 Ax p x 1 ≤ max 1≤j≤n aj p. We need to show that this upper bound is indeed an equality. Indeed for any x = ej where ej is a unit vector with 1 in j-th entry and 0 elsewhere, Aej p = aj p. Thus A 1→p ≥ aj p ∀ 1 ≤ j ≤ n. Combining the two, we see that A 1→p = max 1≤j≤n aj p. Theorem 1.132 A p→∞ = max 1≤i≤m ai q (1.8.35) where 1 p + 1 q = 1.
  • 90. 90 1. MATRIX ALGEBRA Proof. Using theorem 1.130, we get A p→∞ = AH 1→q. Using theorem 1.131, we get AH 1→q = max 1≤i≤m ai q. This completes the proof. Theorem 1.133 For two matrices A and B and p ≥ 1, we have AB p→q ≤ B p→s A s→q. (1.8.36) Proof. We start with AB p→q = max x p=1 A(Bx) q. From lemma 1.125, we obtain A(Bx) q ≤ A s→q (Bx) s. Thus, AB p→q ≤ A s→q max x p=1 (Bx) s = A s→q B p→s. Theorem 1.134 For two matrices A and B and p ≥ 1, we have AB p→∞ ≤ A ∞→∞ B p→∞. (1.8.37) Proof. We start with AB p→∞ = max x p=1 A(Bx) ∞. From lemma 1.125, we obtain A(Bx) ∞ ≤ A ∞→∞ (Bx) ∞. Thus, AB p→∞ ≤ A ∞→∞ max x p=1 (Bx) ∞ = A ∞→∞ B p→∞.
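Theorems 1.131 and 1.132 say that $\|A\|_{1 \to p}$ is the maximum $\ell_p$ norm of a column and $\|A\|_{p \to \infty}$ is the maximum $\ell_q$ norm of a row. The following sketch (NumPy assumed, real test matrix for simplicity) checks the $1 \to 2$ case by sampling vectors on the $\ell_1$ unit sphere; the maximum is attained at a standard basis vector, as the proof of theorem 1.131 shows.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 6))

max_col_l2 = np.linalg.norm(A, axis=0).max()   # claimed value of ||A||_{1->2}
max_row_l2 = np.linalg.norm(A, axis=1).max()   # claimed value of ||A||_{2->inf}

# Random vectors with ||x||_1 = 1 never beat the maximum column l2 norm ...
for _ in range(1000):
    x = rng.standard_normal(6)
    x /= np.abs(x).sum()
    assert np.linalg.norm(A @ x) <= max_col_l2 + 1e-12

# ... and the bound is attained at the basis vector picking the best column.
j = np.argmax(np.linalg.norm(A, axis=0))
e_j = np.zeros(6); e_j[j] = 1.0
print(np.isclose(np.linalg.norm(A @ e_j), max_col_l2))   # True
print(max_row_l2)   # ||A||_{2->inf}: max l2 norm of a row (q = 2 when p = 2)
```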
  • 91. 1.8. MATRIX NORMS 91 Theorem 1.135 A p→∞ ≤ A p→p. (1.8.38) In particular A 1→∞ ≤ A 1. (1.8.39) A 2→∞ ≤ A 2. (1.8.40) Proof. Choosing q = ∞ and s = p and applying theorem 1.133 IA p→∞ ≤ A p→p I p→∞. But I p→∞ is the maximum lp norm of any row of I which is 1. Thus A p→∞ ≤ A p→p. Consider the expression min z∈C(AH ) z=0 Az q z p . (1.8.41) z ∈ C(AH ), z = 0 means there exists some vector u /∈ ker(AH ) such that z = AH u. This expression measures the factor by which the non-singular part of A can decrease the length of a vector. Theorem 1.136 [5] The following bound holds for every matrix A: min z∈C(AH ) z=0 Az q z p ≥ A† −1 q,p. (1.8.42) If A is surjective (onto), then the equality holds. When A is bijec- tive (one-one onto, square, invertible), then the result implies min z∈C(AH ) z=0 Az q z p = A−1 −1 q,p. (1.8.43)
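For a square invertible $A$ (so that theorem 1.136 holds with equality) and $p = q = 2$, the restricted minimum is simply the smallest singular value, and $\|A^{-1}\|_2^{-1} = \sigma_n$. A small sketch (NumPy assumed; the random matrix and sampled vectors are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
A = rng.standard_normal((n, n))          # generically invertible

s = np.linalg.svd(A, compute_uv=False)
sigma_min = s[-1]

# 1 / ||A^{-1}||_2 equals the smallest singular value.
print(np.isclose(1.0 / np.linalg.norm(np.linalg.inv(A), 2), sigma_min))   # True

# The ratio ||Az||_2 / ||z||_2 never drops below sigma_min.
ratios = []
for _ in range(1000):
    z = rng.standard_normal(n)
    ratios.append(np.linalg.norm(A @ z) / np.linalg.norm(z))
print(min(ratios) >= sigma_min - 1e-12, min(ratios), sigma_min)
```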
  • 92. 92 1. MATRIX ALGEBRA Proof. The spaces C(AH ) and C(A) have same dimensions given by rank(A). We recall that A† A is a projector onto the column space of A. w = Az ⇐⇒ z = A† w = A† Az ∀ z ∈ C(AH ). As a result we can write z p Az q = A† w p w q whenever z ∈ C(AH ). Now   min z∈C(AH ) z=0 Az q z p   −1 = max z∈C(AH ) z=0 z p Az q = max w∈C(A) w=0 A† w p w q ≤ max w=0 A† w p w q . When A is surjective, then C(A) = Cm . Hence max w∈C(A) w=0 A† w p w q = max w=0 A† w p w q . Thus, the inequality changes into equality. Finally max w=0 A† w p w q = A† q→p which completes the proof. 1.8.10. Row column norms Definition 1.50 Let A be an m × n matrix with rows ai as A =     a1 ... am     Then we define A p,∞ max 1≤i≤m ai p = max 1≤i≤m n j=1 |ai j|p 1 p (1.8.44) where 1 ≤ p < ∞. i.e. we take p-norms of all row vectors and then find the maximum.
We define
$$\|A\|_{\infty,\infty} = \max_{i,j} |a_{ij}|. \quad (1.8.45)$$
This is equivalent to taking the $\ell_\infty$ norm on each row and then taking the maximum of all the norms.

For $1 \le p, q < \infty$, we define the norm
$$\|A\|_{p,q} \triangleq \left( \sum_{i=1}^{m} \|a_i\|_p^q \right)^{\frac{1}{q}}, \quad (1.8.46)$$
i.e., we compute the $p$-norm of all the row vectors to form another vector and then take the $q$-norm of that vector.

Note that the norm $\|A\|_{p,\infty}$ is different from the operator norm $\|A\|_{p \to \infty}$. Similarly $\|A\|_{p,q}$ is different from $\|A\|_{p \to q}$.

Theorem 1.137
$$\|A\|_{p,\infty} = \|A\|_{q \to \infty} \quad (1.8.47)$$
where $\frac{1}{p} + \frac{1}{q} = 1$.

Proof. From theorem 1.132 we get $\|A\|_{q \to \infty} = \max_{1 \le i \le m} \|a_i\|_p$. This is exactly the definition of $\|A\|_{p,\infty}$.

Theorem 1.138
$$\|A\|_{1 \to p} = \|A^H\|_{p,\infty}. \quad (1.8.48)$$

Proof. By theorem 1.130, $\|A\|_{1 \to p} = \|A^H\|_{q \to \infty}$ where $q$ is the dual exponent of $p$. From theorem 1.137, $\|A^H\|_{q \to \infty} = \|A^H\|_{p,\infty}$.
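The row–column norms of definition 1.50 and the identity of theorem 1.137 are easy to compute directly. The sketch below (NumPy assumed; the real test matrix is illustrative) evaluates $\|A\|_{2,\infty}$, $\|A\|_{\infty,\infty}$ and $\|A\|_{2,2}$, and checks that $\|A\|_{2,\infty}$ coincides with the operator norm $\|A\|_{2 \to \infty}$, whose maximum is attained at the normalized longest row.

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((4, 6))

row_l2 = np.linalg.norm(A, axis=1)

norm_2_inf   = row_l2.max()            # ||A||_{2,inf}: max l2 norm of a row
norm_inf_inf = np.abs(A).max()         # ||A||_{inf,inf}: largest |a_ij|
norm_2_2     = np.linalg.norm(row_l2)  # ||A||_{2,2}: l2 norm of the row l2 norms

# ||A||_{2,inf} equals the operator norm ||A||_{2->inf}; the maximum over the
# l2 unit sphere is attained at the normalized longest row.
i = np.argmax(row_l2)
x_star = A[i] / row_l2[i]
print(np.isclose(np.linalg.norm(A @ x_star, np.inf), norm_2_inf))   # True
print(norm_2_inf, norm_inf_inf, norm_2_2)
```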
  • 94. 94 1. MATRIX ALGEBRA Theorem 1.139 For any two matrices A, B, we have AB p,∞ B p,∞ ≤ A ∞→∞. (1.8.49) Proof. Let q be such that 1 p + 1 q = 1. From theorem 1.134, we have AB q→∞ ≤ A ∞→∞ B q→∞. From theorem 1.137 AB q→∞ = AB p,∞ and B q→∞ = B p,∞. Thus AB p,∞ ≤ A ∞→∞ B p,∞. Theorem 1.140 Relations between (p, q) norms and (p → q) norms A 1,∞ = A ∞→∞ (1.8.50) A 2,∞ = A 2→∞ (1.8.51) A ∞,∞ = A 1→∞ (1.8.52) A 1→1 = AH 1,∞ (1.8.53) A 1→2 = AH 2,∞ (1.8.54) (1.8.55) Proof. The first three are straight forward applications of theo- rem 1.137. The next two are applications of theorem 1.138. See also table 1.
1.8.11. Block diagonally dominant matrices and generalized Gershgorin disc theorem

In [1] the idea of diagonally dominant matrices (see section 1.6.9) has been generalized to block matrices using matrix norms. We consider the specific case with the spectral norm.

Definition 1.51 [Block diagonally dominant matrix] Let $A$ be a square matrix in $\mathbb{C}^{n \times n}$ which is partitioned in the following manner
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \dots & A_{kk} \end{bmatrix} \quad (1.8.56)$$
where each of the submatrices $A_{ij}$ is a square matrix of size $m \times m$. Thus $n = km$. $A$ is called block diagonally dominant if
$$\|A_{ii}\|_2 \ge \sum_{j \ne i} \|A_{ij}\|_2$$
holds true for all $1 \le i \le k$. If the inequality holds strictly for all $i$, then $A$ is called a block strictly diagonally dominant matrix.

Theorem 1.141 If the partitioned matrix $A$ of definition 1.51 is block strictly diagonally dominant, then it is nonsingular. For proof see [1].

This leads to the generalized Gershgorin disc theorem.
Theorem 1.142 Let $A$ be a square matrix in $\mathbb{C}^{n \times n}$ which is partitioned in the following manner
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \dots & A_{kk} \end{bmatrix} \quad (1.8.57)$$
where each of the submatrices $A_{ij}$ is a square matrix of size $m \times m$. Then each eigenvalue $\lambda$ of $A$ satisfies
$$\|\lambda I - A_{ii}\|_2 \le \sum_{j \ne i} \|A_{ij}\|_2 \quad \text{for some } i \in \{1, 2, \dots, k\}. \quad (1.8.58)$$
For proof see [1].

Since the 2-norm of a Hermitian positive semidefinite matrix is nothing but its largest eigenvalue, the theorem directly applies.

Corollary 1.143. Let $A$ be a Hermitian positive semidefinite matrix. Let $A$ be partitioned as in theorem 1.142. Then its 2-norm $\|A\|_2$ satisfies
$$\big| \|A\|_2 - \|A_{ii}\|_2 \big| \le \sum_{j \ne i} \|A_{ij}\|_2 \quad \text{for some } i \in \{1, 2, \dots, k\}. \quad (1.8.59)$$

1.9. Miscellaneous topics

1.9.1. Hadamard product

Standard linear algebra books usually don't dwell much on element-wise or component-wise products of vectors or matrices. Yet in certain contexts and algorithms, this is quite useful. We define the notation in this section. For further details see [3], [2] and [4].

Definition 1.52 The Hadamard product of two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ with the same dimensions (not necessarily square)
with entries in a given ring $R$ is the entry-wise product $A \circ B \equiv [a_{ij} b_{ij}]$, which has the same dimensions as $A$ and $B$.

Example 1.3: Hadamard product
Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 5 & -6 \\ 7 & -3 \end{bmatrix}.$$
Then
$$A \circ B = \begin{bmatrix} 5 & -12 \\ 21 & -12 \end{bmatrix}.$$

The Hadamard product is associative, distributive and commutative. Naturally it can also be defined for column vectors and row vectors. The reason this product is not usually mentioned in linear algebra texts is that it is inherently basis dependent. Nevertheless, it has a number of uses in statistics and analysis. In analysis, a similar concept is the point-wise product, defined by $(f \cdot g)(x) = f(x) g(x)$.

1.10. Digest

1.10.1. Norms

All norms are equivalent.

Sum norm
$$\|A\|_S = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.$$
  • 98. 98 1. MATRIX ALGEBRA Frobenius norm A F = m i=1 n j=1 |aij|2 1 2 . Max norm A M = max 1≤i≤m 1≤j≤n |aij|. Frobenius norm of Hermitian transpose AH F = A F . Frobenius norm as sum of norms of column vectors A 2 F = n j=1 aj 2 2. Frobenius norm as sum of norms of row vectors A 2 F = m i=1 ai 2 2. Frobenius norm invariance w.r.t. unitary matrices UA F = A F AV F = A F . Frobenius norm is consistent: AB F ≤ A F B F . corollary 1.123 Ax 2 ≤ A F x 2. A F = n i=1 σ2 i . Consistent norms AB ≤ A B also known as sub-multiplicative norm.
  • 99. 1.10. DIGEST 99 Subordinate matrix norm Ax α ≤ A x β (α → β) Operator norm A A α→β max x=0 Ax β x α . A α→β = max x/∈ker(A) Ax β x α = max x α=1 Ax β. (α → β) norm is subordinate Ax β ≤ A α→β x α. There exists a unit norm vector x∗ such that A α→β = Ax∗ β. α → α-norms are consistent A α = max x=0 Ax α x α AB α ≤ A α B α. p-norm A p max x=0 Ax p x p = max x p=1 Ax p Closed form p-norms A 1 max 1≤j≤n m i=1 |aij|. A ∞ max 1≤i≤m n j=1 |aij|. 2-norm A 2 σ1 non-singular A−1 2 = 1 σn .
  • 100. 100 1. MATRIX ALGEBRA symmetric and positive definite A 2 = λ1 non-singular A−1 2 = 1 λn . normal A 2 = |λ1| non-singular A−1 2 = 1 |λn| . Unitary invariant norm UAV = A for any A ∈ Cm×n and any unitary U and V . Typical p → q norms Dual norm and conjugate transpose A p→q = AH q →p 1 p + 1 p = 1. A 2 = AH 2. A 1 = AH ∞, A ∞ = AH 1. A 1→∞ = AH 1→∞. A 1→2 = AH 2→∞. A ∞→2 = AH 2→1. A 1→p A 1→p = max 1≤j≤n aj p. A p→∞ A p→∞ = max 1≤i≤m ai q with 1 p + 1 q = 1. Consistency of p → q norm AB p→q ≤ B p→s A s→q.
  • 101. 1.10. DIGEST 101 Consistency of p → ∞ norm AB p→∞ ≤ A ∞→∞ B p→∞. Dominance of p → ∞ norm by p → p norm A p→∞ ≤ A p→p. A 1→∞ ≤ A 1. A 2→∞ ≤ A 2. Restricted minimum property min z∈C(AH ) z=0 Az q z p ≥ A† −1 q,p. If A is surjective (onto), then the equality holds. When A is bijective min z∈C(AH ) z=0 Az q z p = A−1 −1 q,p. Row column norm A p,∞ max 1≤i≤m ai p. A p,∞ = max 1≤i≤m n j=1 |ai j|p 1 p . A ∞,∞ = max i,j |aij|. A p,q m i=1 ai p q 1 q . Row column norm and p → ∞ norm A p,∞ = A q→∞ with 1 p + 1 q = 1. Consistency of (p, ∞) norm AB p,∞ B p,∞ ≤ A ∞→∞.
  • 102. 102 1. MATRIX ALGEBRA Relations between (p, q) norms and (p → q) norms A 1,∞ = A ∞→∞ A 2,∞ = A 2→∞ A ∞,∞ = A 1→∞ A 1→1 = AH 1,∞ A 1→2 = AH 2,∞
Bibliography

[1] David G. Feingold and Richard S. Varga. Block diagonally dominant matrices and generalizations of the Gerschgorin circle theorem. Pacific Journal of Mathematics, 12(4):1241–1250, 1962.
[2] Roger A. Horn. The Hadamard product. In Proceedings of Symposia in Applied Mathematics, volume 40, pages 87–169, 1990.
[3] Elizabeth Million. The Hadamard product, 2007.
[4] George P. H. Styan. Hadamard products and multivariate statistical analysis. Linear Algebra and Its Applications, 6:217–240, 1973.
[5] Joel A. Tropp. Just relax: Convex programming methods for subset selection and sparse approximation. 2004.