CHAPTER 1
Matrix Algebra
In this chapter we collect results related to matrix algebra which are
relevant to this book. Some specific topics which are typically not
found in standard books are also covered here.
1.1. Preliminaries
Standard notation for this chapter is given here. Matrices are denoted
by capital letters A, B, etc. They can be rectangular with m rows
and n columns. Their elements or entries are referred to with small
letters a_{ij}, b_{ij}, etc., where i denotes the row index and j denotes
the column index. Thus
A = \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
Mostly we consider complex matrices belonging to C^{m×n}. Sometimes
we will restrict our attention to real matrices belonging to R^{m×n}.
Definition 1.1 [Square matrix] An m × n matrix is called a square
matrix if m = n.
Definition 1.2 [Tall matrix] An m × n matrix is called a tall matrix
if m > n, i.e. the number of rows is greater than the number of columns.
Definition 1.3 [Wide matrix] An m × n matrix is called a wide
matrix if m < n, i.e. the number of columns is greater than the number of rows.
Definition 1.4 [Main diagonal] Let A = [aij] be an m×n matrix.
The main diagonal consists of entries a_{ij} where i = j, i.e. the main
diagonal is {a_{11}, a_{22}, . . . , a_{kk}} where k = min(m, n). The main diagonal
is also known as the leading diagonal, major diagonal, primary
diagonal or principal diagonal. The entries of A which are not
on the main diagonal are known as off-diagonal entries.
Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix
(usually a square matrix) whose entries outside the main diagonal
are zero.
Whenever we refer to a diagonal matrix which is not square, we
will use the term rectangular diagonal matrix.
A square diagonal matrix A is also represented by diag(a11, a22, . . . , ann)
which lists only the diagonal (non-zero) entries in A.
The transpose of a matrix A is denoted by A^T while the Hermitian
transpose is denoted by A^H. For real matrices A^T = A^H.
When matrices are square, we have the number of rows and columns
both equal to n and they belong to C^{n×n}.
If not specified, the square matrices will be of size n × n and rectangular
matrices will be of size m × n. If not specified, the vectors (column
vectors) will be of size n × 1 and belong to either R^n or C^n. Corresponding
row vectors will be of size 1 × n.
For statements which are valid both for real and complex matrices,
sometimes we might say that matrices belong to F^{m×n} while the scalars
belong to F and vectors belong to F^n, where F refers to either the field
of real numbers or the field of complex numbers. Note that this is not
consistently followed at the moment. Most results are written only for
C^{m×n} while still being applicable for R^{m×n}.
The identity matrix for F^{n×n} is denoted as I_n or simply I whenever the size
is clear from context.
Sometimes we will write a matrix in terms of its column vectors. We
will use the notation
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}
indicating n columns.
When we write a matrix in terms of its row vectors, we will use the
notation
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
indicating m rows with a_i being column vectors whose transposes form
the rows of A.
The rank of a matrix A is written as rank(A), while the determinant
as det(A) or |A|.
We say that an m × n matrix A is left-invertible if there exists an
n × m matrix B such that BA = I. We say that an m × n matrix A is
right-invertible if there exists an n × m matrix B such that AB = I.
We say that a square matrix A is invertible when there exists another
square matrix B of the same size such that AB = BA = I. A square
matrix is invertible iff it is both left and right invertible. The inverse of a
square invertible matrix is denoted by A^{-1}.
A special left or right inverse is the pseudo-inverse, which is denoted
by A^†.
Column space of a matrix is denoted by C(A), the null space by N(A),
and the row space by R(A).
We say that a matrix is symmetric when A = A^T, and conjugate
symmetric or Hermitian when A^H = A.
When a square matrix is not invertible, we say that it is singular. A
non-singular matrix is invertible.
The eigen values of a square matrix are written as λ1, λ2, . . . while the
singular values of a rectangular matrix are written as σ1, σ2, . . . .
The inner product or dot product of two column/row vectors u and
v belonging to R^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i v_i. (1.1.1)
The inner product or dot product of two column/row vectors u and
v belonging to C^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i \bar{v}_i. (1.1.2)
1.1.1. Block matrix
Definition 1.6 A block matrix is a matrix whose entries themselves
are matrices, with the following constraints:
(1) Entries in every row are matrices with the same number of
rows.
(2) Entries in every column are matrices with the same number
of columns.
Let A be an m × n block matrix. Then
A = \begin{bmatrix}
A_{11} & A_{12} & \dots & A_{1n} \\
A_{21} & A_{22} & \dots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \dots & A_{mn}
\end{bmatrix} (1.1.3)
where A_{ij} is a matrix with r_i rows and c_j columns.
A block matrix is also known as a partitioned matrix.
Example 1.1: 2 × 2 block matrices. Quite frequently we will be using
2 × 2 block matrices.
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}. (1.1.4)
An example:
P = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
We have
P_{11} = \begin{bmatrix} a & b \\ d & e \end{bmatrix}, \quad
P_{12} = \begin{bmatrix} c \\ f \end{bmatrix}, \quad
P_{21} = \begin{bmatrix} g & h \end{bmatrix}, \quad
P_{22} = \begin{bmatrix} i \end{bmatrix}.
• P11 and P12 have 2 rows.
• P21 and P22 have 1 row.
• P11 and P21 have 2 columns.
• P12 and P22 have 1 column.
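As a small numerical sketch (added here for illustration; NumPy is assumed and the numeric values are arbitrary placeholders), the same partitioning can be assembled and recovered with np.block and slicing.

import numpy as np

# Assemble the 3 x 3 matrix of the example from its four blocks.
P11 = np.array([[1., 2.],
                [4., 5.]])          # plays the role of [a b; d e]
P12 = np.array([[3.],
                [6.]])              # [c; f]
P21 = np.array([[7., 8.]])          # [g h]
P22 = np.array([[9.]])              # [i]

P = np.block([[P11, P12],
              [P21, P22]])
print(P)

# Recover the blocks by slicing: rows 0:2 vs 2:3 and columns 0:2 vs 2:3.
assert np.array_equal(P[:2, :2], P11)
assert np.array_equal(P[2:, 2:], P22)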
Lemma 1.1 Let A = [A_{ij}] be an m × n block matrix with A_{ij} being
an r_i × c_j matrix. Then A is an r × c matrix where
r = \sum_{i=1}^{m} r_i (1.1.5)
and
c = \sum_{j=1}^{n} c_j. (1.1.6)
Remark. Sometimes it is convenient to think of a regular matrix as a
block matrix whose entries are 1 × 1 matrices themselves.
Definition 1.7 [Multiplication of block matrices] Let A = [A_{ij}]
be an m × n block matrix with A_{ij} being a p_i × q_j matrix. Let
B = [B_{jk}] be an n × p block matrix with B_{jk} being a q_j × r_k matrix.
Then the two block matrices are compatible for multiplication
and their multiplication is defined by C = AB = [C_{ik}] where
C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk} (1.1.7)
and C_{ik} is a p_i × r_k matrix.
Definition 1.8 A block diagonal matrix is a block matrix
whose off diagonal entries are zero matrices.
1.2. Linear independence, span, rank
1.2.1. Spaces associated with a matrix
Definition 1.9 The column space of a matrix is defined as the
vector space spanned by columns of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}.
Then the column space is given by
C(A) = { x ∈ F^m : x = \sum_{i=1}^{n} α_i a_i for some α_i ∈ F }. (1.2.1)
Definition 1.10 The row space of a matrix is defined as the
vector space spanned by rows of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}.
Then the row space is given by
R(A) = { x ∈ F^n : x = \sum_{i=1}^{m} α_i a_i for some α_i ∈ F }. (1.2.2)
1.2.2. Rank
Definition 1.11 [Column rank] The column rank of a matrix
is defined as the maximum number of columns which are linearly
independent. In other words column rank is the dimension of the
column space of a matrix.
Definition 1.12 [Row rank] The row rank of a matrix is defined
as the maximum number of rows which are linearly independent.
In other words row rank is the dimension of the row space of a
matrix.
Theorem 1.2 The column rank and row rank of a matrix are
equal.
Definition 1.13 [Rank] The rank of a matrix is defined to be
equal to its column rank which is equal to its row rank.
Lemma 1.3 For an m × n matrix A
0 ≤ rank(A) ≤ min(m, n). (1.2.3)
Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero
matrix.
Definition 1.14 [Full rank matrix] An m × n matrix A is called
full rank if
rank(A) = min(m, n).
In other words it is either a full column rank matrix or a full row
rank matrix or both.
Lemma 1.5 [Rank of product of two matrices] Let A be an m×n
matrix and B be an n × p matrix then
rank(AB) ≤ min(rank(A), rank(B)). (1.2.4)
Lemma 1.6 [Post-multiplication with a full row rank matrix] Let
A be an m × n matrix and B be an n × p matrix. If B is of rank
n then
rank(AB) = rank(A). (1.2.5)
Lemma 1.7 [Pre-multiplication with a full column rank matrix]
Let A be an m × n matrix and B be an n × p matrix. If A is of
rank n then
rank(AB) = rank(B). (1.2.6)
Lemma 1.8 The rank of a diagonal matrix is equal to the number
of non-zero elements on its main diagonal.
Proof. The columns which correspond to diagonal entries which
are zero are zero columns. Other columns are linearly independent.
The number of linearly independent rows is also the same. Hence their
count gives us the rank of the matrix.
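The rank facts above are easy to probe numerically. The sketch below is an added illustration (NumPy assumed, random test matrices) checking lemma 1.5 and lemma 1.8 with numpy.linalg.matrix_rank.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))       # generically full rank: rank 3
B = rng.standard_normal((3, 5))       # generically full rank: rank 3

# Lemma 1.5: rank(AB) <= min(rank(A), rank(B)).
rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)
rank_AB = np.linalg.matrix_rank(A @ B)
assert rank_AB <= min(rank_A, rank_B)

# Lemma 1.8: rank of a diagonal matrix = number of non-zero diagonal entries.
D = np.diag([3.0, 0.0, 2.0, 0.0])
assert np.linalg.matrix_rank(D) == 2

print(rank_A, rank_B, rank_AB)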
1.3. Invertible matrices
Definition 1.15 [Invertible] A square matrix A is called invert-
ible if there exists another square matrix B of same size such that
AB = BA = I.
The matrix B is called the inverse of A and is denoted as A^{-1}.
Lemma 1.9 If A is invertible then its inverse A^{-1} is also invertible
and the inverse of A^{-1} is nothing but A.
Lemma 1.10 Identity matrix I is invertible.
Proof.
I I = I =⇒ I^{-1} = I.
Lemma 1.11 If A is invertible then columns of A are linearly
independent.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Assume that columns of A are linearly dependent. Then there exists
u ≠ 0 such that
Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0,
a contradiction. Hence columns of A are linearly independent.
Lemma 1.12 If an n × n matrix A is invertible then columns of
A span F^n.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Now let x ∈ F^n be any arbitrary vector. We need to show that there
exists α ∈ F^n such that
x = Aα.
But
x = Ix = ABx = A(Bx).
Thus if we choose α = Bx, then
x = Aα.
Thus columns of A span F^n.
Lemma 1.13 If A is invertible, then columns of A form a basis
for F^n.
Proof. In F^n a basis is a set of vectors which is linearly independent
and spans F^n. By lemma 1.11 and lemma 1.12, columns of
an invertible matrix A satisfy both conditions. Hence they form a
basis.
Lemma 1.14 If A is invertible then A^T is invertible.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Applying transpose on both sides we get
B^T A^T = A^T B^T = I.
Thus B^T is the inverse of A^T and A^T is invertible.
Lemma 1.15 If A is invertible then A^H is invertible.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Applying conjugate transpose on both sides we get
B^H A^H = A^H B^H = I.
Thus B^H is the inverse of A^H and A^H is invertible.
Lemma 1.16 If A and B are invertible then AB is invertible.
Proof. We note that
(AB)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I A^{-1} = I.
Similarly
(B^{-1} A^{-1})(AB) = B^{-1} (A^{-1} A) B = B^{-1} I B = I.
Thus B^{-1} A^{-1} is the inverse of AB.
Lemma 1.17 The set of n×n invertible matrices under the matrix
multiplication operation form a group.
Proof. We verify the properties of a group
Closure: If A and B are invertible then AB is invertible. Hence the
set is closed.
Associativity: Matrix multiplication is associative.
Identity element: I is invertible and AI = IA = A for all invertible
matrices.
Inverse element: If A is invertible then A^{-1} is also invertible.
Thus the set of invertible matrices is indeed a group under matrix
multiplication.
Lemma 1.18 An n × n matrix A is invertible if and only if it is
full rank, i.e.
rank(A) = n.
Corollary 1.19. The rank of an invertible matrix and its inverse are the
same.
1.3.1. Similar matrices
Definition 1.16 [Similar matrices] An n × n matrix B is similar
to an n × n matrix A if there exists an n × n non-singular matrix
C such that
B = C^{-1} A C.
Lemma 1.20 If B is similar to A then A is similar to B. Thus
similarity is a symmetric relation.
Proof.
B = C^{-1} A C =⇒ A = C B C^{-1} =⇒ A = (C^{-1})^{-1} B C^{-1}.
Thus there exists a matrix D = C^{-1} such that
A = D^{-1} B D.
Thus A is similar to B.
Lemma 1.21 Similar matrices have the same rank.
Proof. Let B be similar to A. Thus there exists an invertible
matrix C such that
B = C^{-1} A C.
Since C is invertible we have rank(C) = rank(C^{-1}) = n. Now
using lemma 1.6 rank(AC) = rank(A) and using lemma 1.7 we have
rank(C^{-1}(AC)) = rank(AC) = rank(A). Thus
rank(B) = rank(A).
Lemma 1.22 Similarity is an equivalence relation on the set of
n × n matrices.
Proof. Let A, B, C be n × n matrices. A is similar to itself through
the invertible matrix I, so similarity is reflexive. By lemma 1.20, if A
is similar to B then B is similar to A, so similarity is symmetric.
If B is similar to A via P s.t. B = P^{-1} A P and C is similar to B
via Q s.t. C = Q^{-1} B Q then C is similar to A via PQ such that
C = (PQ)^{-1} A (PQ). Thus similarity is an equivalence relation on the
set of square matrices and if A is any n × n matrix then the set of n × n
matrices similar to A forms an equivalence class.
1.3.2. Gram matrices
Definition 1.17 The Gram matrix of columns of A is given by
G = A^H A. (1.3.1)
Definition 1.18 The Gram matrix of rows of A is given by
G = A A^H. (1.3.2)
Remark. Usually when we talk about the Gram matrix of a matrix we
are looking at the Gram matrix of its column vectors.
Remark. For a real matrix A ∈ R^{m×n}, the Gram matrix of its column
vectors is given by A^T A and the Gram matrix for its row vectors is
given by A A^T.
Following results apply equally well for the real case.
Lemma 1.23 The columns of a matrix are linearly dependent if
and only if the Gram matrix of its column vectors A^H A is not
invertible.
Proof. Let A be an m × n matrix and G = A^H A be the Gram
matrix of its columns.
If columns of A are linearly dependent, then there exists a vector u ≠ 0
such that
Au = 0.
Thus
Gu = A^H A u = 0.
Hence the columns of G are also dependent and G is not invertible.
Conversely let us assume that G is not invertible, thus columns of G
are dependent and there exists a vector v ≠ 0 such that
Gv = 0.
Now
v^H G v = v^H A^H A v = (Av)^H (Av) = ||Av||_2^2.
From the previous equation, we have
||Av||_2^2 = 0 =⇒ Av = 0.
Since v ≠ 0, the columns of A are linearly dependent.
Corollary 1.24. The columns of a matrix are linearly independent if
and only if the Gram matrix of its column vectors A^H A is invertible.
Proof. Columns of A can be dependent only if its Gram matrix is
not invertible. Thus if the Gram matrix is invertible, then the columns
of A are linearly independent.
The Gram matrix is not invertible only if columns of A are linearly
dependent. Thus if columns of A are linearly independent then the
Gram matrix is invertible.
Corollary 1.25. Let A be a full column rank matrix. Then A^H A is
invertible.
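A quick numerical sanity check of corollary 1.24 and corollary 1.25 (an added sketch, not part of the original text; NumPy assumed, random test data): a matrix with independent columns has an invertible Gram matrix, while repeating a column makes it singular.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Full column rank: the Gram matrix A^H A is invertible (non-zero determinant).
G = A.conj().T @ A
print(abs(np.linalg.det(G)))          # clearly non-zero

# Make the columns dependent by repeating one column.
B = A.copy()
B[:, 2] = B[:, 1]
G_dep = B.conj().T @ B
print(abs(np.linalg.det(G_dep)))      # numerically ~0: the Gram matrix is singular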
Lemma 1.26 The null space of A and that of its Gram matrix A^H A
coincide, i.e.
N(A) = N(A^H A). (1.3.3)
Proof. Let u ∈ N(A). Then
Au = 0 =⇒ A^H A u = 0.
Thus
u ∈ N(A^H A) =⇒ N(A) ⊆ N(A^H A).
Now let u ∈ N(A^H A). Then
A^H A u = 0 =⇒ u^H A^H A u = 0 =⇒ ||Au||_2^2 = 0 =⇒ Au = 0.
Thus we have
u ∈ N(A) =⇒ N(A^H A) ⊆ N(A).
Lemma 1.27 The rows of a matrix A are linearly dependent if and
only if the Gram matrix of its row vectors A A^H is not invertible.
Proof. Rows of A are linearly dependent if and only if columns
of A^H are linearly dependent. Then there exists a vector v ≠ 0 s.t.
A^H v = 0.
Thus
Gv = A A^H v = 0.
Since v ≠ 0, G is not invertible.
Converse: assuming that G is not invertible, there exists a vector u ≠ 0
s.t.
Gu = 0.
Now
u^H G u = u^H A A^H u = (A^H u)^H (A^H u) = ||A^H u||_2^2 = 0 =⇒ A^H u = 0.
Since u ≠ 0, the columns of A^H and consequently the rows of A are
linearly dependent.
Corollary 1.28. The rows of a matrix A are linearly independent if
and only if the Gram matrix of its row vectors A A^H is invertible.
Corollary 1.29. Let A be a full row rank matrix. Then A A^H is
invertible.
1.3.3. Pseudo inverses
Definition 1.19 [Moore-Penrose pseudo-inverse] Let A be an m ×
n matrix. An n × m matrix A^† is called its Moore-Penrose pseudo-
inverse if it satisfies all of the following criteria:
(1) A A^† A = A.
(2) A^† A A^† = A^†.
(3) (A A^†)^H = A A^†, i.e. A A^† is Hermitian.
(4) (A^† A)^H = A^† A, i.e. A^† A is Hermitian.
Theorem 1.30 [Existence and uniqueness] For any matrix A there
exists precisely one matrix A^† which satisfies all the requirements
in definition 1.19.
We omit the proof for this. The pseudo-inverse can actually be obtained
by the singular value decomposition of A. This is shown in
lemma 1.110.
Lemma 1.31 Let D = diag(d_1, d_2, . . . , d_n) be an n × n diagonal
matrix. Then its Moore-Penrose pseudo-inverse is D^† =
diag(c_1, c_2, . . . , c_n) where
c_i = 1/d_i if d_i ≠ 0; c_i = 0 if d_i = 0.
Proof. We note that D^† D = D D^† = F = diag(f_1, f_2, . . . , f_n)
where
f_i = 1 if d_i ≠ 0; f_i = 0 if d_i = 0.
We now verify the requirements in definition 1.19.
D D^† D = F D = D.
D^† D D^† = F D^† = D^†.
D^† D = D D^† = F is a diagonal hence Hermitian matrix.
Lemma 1.32 Let D = diag(d_1, d_2, . . . , d_p) be an m × n rectangular
diagonal matrix where p = min(m, n). Then its Moore-
Penrose pseudo-inverse is an n × m rectangular diagonal matrix
D^† = diag(c_1, c_2, . . . , c_p) where
c_i = 1/d_i if d_i ≠ 0; c_i = 0 if d_i = 0.
Proof. F = D^† D = diag(f_1, f_2, . . . , f_n) is an n × n matrix where
f_i = 1 if d_i ≠ 0; f_i = 0 if d_i = 0; f_i = 0 if i > p.
G = D D^† = diag(g_1, g_2, . . . , g_m) is an m × m matrix where
g_i = 1 if d_i ≠ 0; g_i = 0 if d_i = 0; g_i = 0 if i > p.
We now verify the requirements in definition 1.19.
D D^† D = D F = D.
D^† D D^† = D^† G = D^†.
F = D^† D and G = D D^† are both diagonal hence Hermitian matrices.
Lemma 1.33 If A is full column rank then its Moore-Penrose
pseudo-inverse is given by
A^† = (A^H A)^{-1} A^H. (1.3.4)
It is a left inverse of A.
Proof. By corollary 1.25, A^H A is invertible.
First of all we verify that it is a left inverse:
A^† A = (A^H A)^{-1} A^H A = I.
We now verify all the properties.
A A^† A = A I = A.
A^† A A^† = I A^† = A^†.
Hermitian properties:
(A A^†)^H = (A (A^H A)^{-1} A^H)^H = A (A^H A)^{-1} A^H = A A^†.
(A^† A)^H = I^H = I = A^† A.
Lemma 1.34 If A is full row rank then its Moore-Penrose pseudo-
inverse is given by
A^† = A^H (A A^H)^{-1}. (1.3.5)
It is a right inverse of A.
Proof. By corollary 1.29, A A^H is invertible.
First of all we verify that it is a right inverse:
A A^† = A A^H (A A^H)^{-1} = I.
We now verify all the properties.
A A^† A = I A = A.
A^† A A^† = A^† I = A^†.
Hermitian properties:
(A A^†)^H = I^H = I = A A^†.
(A^† A)^H = (A^H (A A^H)^{-1} A)^H = A^H (A A^H)^{-1} A = A^† A.
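The two closed forms (1.3.4) and (1.3.5) can be compared against numpy.linalg.pinv, which computes the Moore-Penrose pseudo-inverse via the SVD. This sketch is added here for illustration (NumPy assumed, real random matrices, so A^H = A^T) and is not part of the original text.

import numpy as np

rng = np.random.default_rng(2)

# Tall matrix with full column rank: A^dagger = (A^H A)^{-1} A^H, a left inverse.
A = rng.standard_normal((5, 3))
A_dag = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_dag, np.linalg.pinv(A))
assert np.allclose(A_dag @ A, np.eye(3))

# Wide matrix with full row rank: B^dagger = B^H (B B^H)^{-1}, a right inverse.
B = rng.standard_normal((3, 5))
B_dag = B.T @ np.linalg.inv(B @ B.T)
assert np.allclose(B_dag, np.linalg.pinv(B))
assert np.allclose(B @ B_dag, np.eye(3))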
1.4. Trace and determinant
1.4.1. Trace
Definition 1.20 [Trace] The trace of a square matrix is defined
as the sum of the entries on its main diagonal. Let A be an n × n
matrix, then
tr(A) = \sum_{i=1}^{n} a_{ii} (1.4.1)
where tr(A) denotes the trace of A.
Lemma 1.35 The trace of a square matrix and its transpose are
equal.
tr(A) = tr(A^T). (1.4.2)
Lemma 1.36 Trace of sum of two square matrices is equal to the
sum of their traces.
tr(A + B) = tr(A) + tr(B). (1.4.3)
Lemma 1.37 Let A be an m×n matrix and B be an n×m matrix.
Then
tr(AB) = tr(BA). (1.4.4)
Proof. Let AB = C = [c_{ij}]. Then
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
Thus
c_{ii} = \sum_{k=1}^{n} a_{ik} b_{ki}.
Now
tr(C) = \sum_{i=1}^{m} c_{ii} = \sum_{i=1}^{m} \sum_{k=1}^{n} a_{ik} b_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{m} a_{ik} b_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{m} b_{ki} a_{ik}.
Let BA = D = [d_{ij}]. Then
d_{ij} = \sum_{k=1}^{m} b_{ik} a_{kj}.
Thus
d_{ii} = \sum_{k=1}^{m} b_{ik} a_{ki}.
Hence
tr(D) = \sum_{i=1}^{n} d_{ii} = \sum_{i=1}^{n} \sum_{k=1}^{m} b_{ik} a_{ki} = \sum_{i=1}^{m} \sum_{k=1}^{n} b_{ki} a_{ik},
which is the same expression obtained for tr(C) above.
This completes the proof.
Lemma 1.38 Let A ∈ F^{m×n}, B ∈ F^{n×p}, C ∈ F^{p×m} be three matrices.
Then
tr(ABC) = tr(BCA) = tr(CAB). (1.4.5)
Proof. Let AB = D. Then
tr(ABC) = tr(DC) = tr(CD) = tr(CAB).
Similarly the other result can be proved.
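Lemma 1.37 and lemma 1.38 are easy to verify numerically. The sketch below is an added illustration (NumPy assumed, random matrices of compatible sizes) of the cyclic property of the trace.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2))

# Lemma 1.37: tr(AB) = tr(BA) even though AB is 2x2 and BA is 3x3.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Lemma 1.38: cyclic shifts leave the trace unchanged.
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))

# Note: only cyclic shifts are allowed; arbitrary permutations of the factors
# do not preserve the trace in general.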
Lemma 1.39 The trace of similar matrices is equal.
Proof. Let B be similar to A. Thus
B = C^{-1} A C
for some invertible matrix C. Then
tr(B) = tr(C^{-1} A C) = tr(C C^{-1} A) = tr(A).
We used lemma 1.37.
1.4.2. Determinants
The following are some results on the determinant of an n × n square matrix A.
Lemma 1.40
det(αA) = α^n det(A). (1.4.6)
Lemma 1.41 Determinant of a square matrix and its transpose
are equal.
det(A) = det(A^T). (1.4.7)
Lemma 1.42 Let A be a complex square matrix. Then
det(A^H) = \overline{det(A)}. (1.4.8)
Proof.
det(A^H) = det(\bar{A}^T) = det(\bar{A}) = \overline{det(A)}.
Lemma 1.43 Let A and B be two n × n matrices. Then
det(AB) = det(A) det(B). (1.4.9)
Lemma 1.44 Let A be an invertible matrix. Then
det(A^{-1}) = 1 / det(A). (1.4.10)
Lemma 1.45 Let A be a square matrix and p ∈ N. Then
det(A^p) = (det(A))^p. (1.4.11)
Lemma 1.46 [Determinant of a triangular matrix] The determinant
of a triangular matrix is the product of its diagonal entries, i.e. if
A is an upper or lower triangular matrix then
det(A) = \prod_{i=1}^{n} a_{ii}. (1.4.12)
Lemma 1.47 [Determinant of a diagonal matrix] The determinant of
a diagonal matrix is the product of its diagonal entries, i.e. if A
is a diagonal matrix then
det(A) = \prod_{i=1}^{n} a_{ii}. (1.4.13)
Lemma 1.48 [Determinant of similar matrices] Determinant of
similar matrices is equal.
Proof. Let B be similar to A. Thus
B = C^{-1} A C
for some invertible matrix C. Hence
det(B) = det(C^{-1} A C) = det(C^{-1}) det(A) det(C).
Now
det(C^{-1}) det(A) det(C) = (1 / det(C)) det(A) det(C) = det(A).
We used lemma 1.43 and lemma 1.44.
Lemma 1.49 Let u and v be vectors in F^n. Then
det(I + u v^T) = 1 + u^T v. (1.4.14)
Lemma 1.50 [Determinant of a small perturbation of identity
matrix] Let A be a square matrix and let ε ≈ 0. Then
det(I + εA) ≈ 1 + ε tr(A). (1.4.15)
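A small numerical experiment (added here as an illustration; NumPy assumed, random matrix) shows the first-order behaviour claimed in lemma 1.50: for small ε the determinant of I + εA is close to 1 + ε tr(A), with an error that shrinks roughly like ε².

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

for eps in (1e-2, 1e-3, 1e-4):
    lhs = np.linalg.det(np.eye(4) + eps * A)
    rhs = 1.0 + eps * np.trace(A)
    # The gap is O(eps^2): dividing by eps**2 gives roughly the same constant.
    print(eps, lhs - rhs, (lhs - rhs) / eps**2)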
1.5. Unitary and orthogonal matrices
1.5.1. Orthogonal matrix
Definition 1.21 [Orthogonal matrix] A real square matrix U is
called orthogonal if the columns of U form an orthonormal set.
In other words, let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
with u_i ∈ R^n. Then we have
u_i · u_j = δ_{i,j}.
Lemma 1.51 An orthogonal matrix U is invertible with U^T = U^{-1}.
Proof. Let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
be orthogonal with
U^T = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix}.
Then
U^T U = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} = [u_i · u_j] = I.
Since columns of U are linearly independent and span R^n, U is
invertible. Thus
U^T = U^{-1}.
Lemma 1.52 The determinant of an orthogonal matrix is ±1.
Proof. Let U be an orthogonal matrix. Then
det(U^T U) = det(I) =⇒ (det(U))^2 = 1.
Thus we have
det(U) = ±1.
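A convenient way to produce an orthogonal matrix numerically is the QR factorization of a random square matrix. The sketch below is an added illustration (NumPy assumed) of lemma 1.51 and lemma 1.52.

import numpy as np

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # the Q factor is orthogonal

assert np.allclose(U.T @ U, np.eye(4))             # orthonormal columns
assert np.allclose(U.T, np.linalg.inv(U))          # Lemma 1.51: U^T = U^{-1}
assert np.isclose(abs(np.linalg.det(U)), 1.0)      # Lemma 1.52: det(U) = +/- 1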
1.5.2. Unitary matrix
Definition 1.22 [Unitary matrix] A complex square matrix U is
called unitary if the columns of U form an orthonormal set. In
other words, let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
with u_i ∈ C^n. Then we have
u_i · u_j = ⟨u_i, u_j⟩ = u_j^H u_i = δ_{i,j}.
Lemma 1.53 A unitary matrix U is invertible with U^H = U^{-1}.
Proof. Let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
be unitary with
U^H = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_n^H \end{bmatrix}.
Then
U^H U = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_n^H \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} = [u_i^H u_j] = I.
Since columns of U are linearly independent and span C^n, U is
invertible. Thus
U^H = U^{-1}.
Lemma 1.54 The magnitude of the determinant of a unitary matrix
is 1.
Proof. Let U be a unitary matrix. Then
det(U^H U) = det(I) =⇒ det(U^H) det(U) = 1 =⇒ \overline{det(U)} det(U) = 1.
Thus we have
|det(U)|^2 = 1 =⇒ |det(U)| = 1.
1.5.3. F unitary matrix
We provide a common definition for unitary matrices over any field F.
This definition applies to both real and complex matrices.
Definition 1.23 [F Unitary matrix] A square matrix U ∈ F^{n×n} is
called F unitary if the columns of U form an orthonormal set. In
other words, let
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
with u_i ∈ F^n. Then we have
⟨u_i, u_j⟩ = u_j^H u_i = δ_{i,j}.
We note that a suitable definition of inner product transports the definition
appropriately into orthogonal matrices over R and unitary matrices
over C.
When we are talking about F unitary matrices, we will use the
symbol U^H to mean the inverse. In the complex case, it maps to the
conjugate transpose, while in the real case it maps to the simple transpose.
This definition helps us simplify some of the discussions in the sequel
(like singular value decomposition).
The following results apply equally to orthogonal matrices in the real case and
unitary matrices in the complex case.
Lemma 1.55 [Norm preservation] F-unitary matrices preserve norm,
i.e.
||Ux||_2 = ||x||_2.
Proof.
||Ux||_2^2 = (Ux)^H (Ux) = x^H U^H U x = x^H I x = ||x||_2^2.
Remark. For the real case we have
||Ux||_2^2 = (Ux)^T (Ux) = x^T U^T U x = x^T I x = ||x||_2^2.
Lemma 1.56 [Inner product preservation] F-unitary matrices preserve
inner product, i.e.
⟨Ux, Uy⟩ = ⟨x, y⟩.
Proof.
⟨Ux, Uy⟩ = (Uy)^H Ux = y^H U^H U x = y^H x.
Remark. For the real case we have
⟨Ux, Uy⟩ = (Uy)^T Ux = y^T U^T U x = y^T x.
1.6. Eigen values
Much of the discussion in this section will be equally applicable to real
as well as complex matrices. We will use the complex notation mostly
and make specific remarks for real matrices wherever needed.
Definition 1.24 [Eigen value] A scalar λ is an eigen value of an
n × n matrix A = [a_{ij}] if there exists a non-null vector x such that
Ax = λx. (1.6.1)
A non-null vector x which satisfies this equation is called an eigen
vector of A for the eigen value λ.
An eigen value is also known as a characteristic value, proper
value or a latent value.
We note that (1.6.1) can be written as
Ax = λ I_n x =⇒ (A − λ I_n) x = 0. (1.6.2)
Thus λ is an eigen value of A if and only if the matrix A−λI is singular.
Definition 1.25 [Spectrum of a matrix] The set comprising of
eigen values of a matrix A is known as its spectrum.
Remark. For each eigen vector x for a matrix A the corresponding
eigen value λ is unique.
Proof. Assume that for x there are two eigen values λ1 and λ2,
then
Ax = λ1x = λ2x =⇒ (λ1 − λ2)x = 0.
This can happen only when either x = 0 or λ1 = λ2. Since x is an
eigen vector, it cannot be 0. Thus λ1 = λ2.
Remark. If x is an eigen vector for A, then the corresponding eigen
value is given by
λ = (x^H A x) / (x^H x). (1.6.3)
Proof.
Ax = λx =⇒ x^H A x = λ x^H x =⇒ λ = (x^H A x) / (x^H x)
since x is non-zero.
Remark. An eigen vector x of A for eigen value λ belongs to the null
space of A − λI, i.e.
x ∈ N(A − λI).
In other words x is a nontrivial solution to the homogeneous system of
linear equations given by
(A − λI)z = 0.
Definition 1.26 [Eigen space] Let λ be an eigen value for a square
matrix A. Then its eigen space is the null space of A − λI i.e.
N(A − λI).
Remark. The set comprising all the eigen vectors of A for an eigen
value λ is given by
N(A − λI) \ {0} (1.6.4)
since 0 cannot be an eigen vector.
Definition 1.27 [Geometric multiplicity] Let λ be an eigen value
for a square matrix A. The dimension of its eigen space N(A−λI)
is known as the geometric multiplicity of the eigen value λ.
Remark. Clearly
dim(N(A − λI)) = n − rank(A − λI).
Remark. A scalar λ can be an eigen value of a square matrix A if and
only if
det(A − λI) = 0.
det(A − λI) is a polynomial in λ of degree n.
Remark.
det(A − λI) = p(λ) = α_n λ^n + α_{n−1} λ^{n−1} + · · · + α_1 λ + α_0 (1.6.5)
where the α_i depend on entries in A.
In this sense, an eigen value of A is a root of the equation
p(λ) = 0. (1.6.6)
It is easy to show that α_n = (−1)^n.
Definition 1.28 [Characteristic polynomial and equation] For any
square matrix A, the polynomial given by p(λ) = det(A − λI) is
known as its characteristic polynomial. The equation given by
p(λ) = 0 (1.6.7)
is known as its characteristic equation. The eigen values of
A are the roots of its characteristic polynomial or solutions of its
characteristic equation.
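Numerically, eigen values are not obtained by factoring p(λ); libraries use iterative methods instead. Still, the relationship can be illustrated (a sketch added here, not in the original text; NumPy assumed): numpy.poly returns the coefficients of the monic polynomial det(λI − A), which differs from p(λ) above only by the sign (−1)^n, and numpy.linalg.eigvals returns its roots.

import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])

# Coefficients of det(lambda*I - A) = lambda^2 - 5*lambda + 5.
coeffs = np.poly(A)
print(coeffs)                        # [ 1. -5.  5.]

# Its roots are exactly the eigen values of A.
roots = np.roots(coeffs)
eigvals = np.linalg.eigvals(A)
print(np.sort(roots), np.sort(eigvals))
assert np.allclose(np.sort(roots), np.sort(eigvals))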
Lemma 1.57 [Roots of characteristic equation] For real square
matrices, if we restrict eigen values to real values, then the characteristic
polynomial can be factored as
p(λ) = (−1)^n (λ − λ_1)^{r_1} . . . (λ − λ_k)^{r_k} q(λ). (1.6.8)
The polynomial has k distinct real roots. For each root λ_i, r_i is a
positive integer indicating how many times the root appears. q(λ)
is a polynomial that has no real roots. The following is true:
r_1 + · · · + r_k + deg(q(λ)) = n. (1.6.9)
Clearly k ≤ n.
For complex square matrices where eigen values can be complex
(including real square matrices), the characteristic polynomial can
be factored as
p(λ) = (−1)^n (λ − λ_1)^{r_1} . . . (λ − λ_k)^{r_k}. (1.6.10)
The polynomial can be completely factorized into first degree polynomials.
There are k distinct roots or eigen values. The following
is true:
r_1 + · · · + r_k = n. (1.6.11)
Thus including the duplicates there are exactly n eigen values for
a complex square matrix.
Remark. It is quite possible that a real square matrix doesn’t have
any real eigen values.
Definition 1.29 [Algebraic multiplicity] The number of times an
eigen value appears in the factorization of the characteristic poly-
nomial of a square matrix A is known as its algebraic multiplicity.
In other words ri is the algebraic multiplicity for λi in above fac-
torization.
Remark. In above the set {λ1, . . . , λk} forms the spectrum of A.
Let us consider the sum of r_i which gives the count of the total number of
roots of p(λ):
m = \sum_{i=1}^{k} r_i. (1.6.12)
With this there are m not-necessarily distinct roots of p(λ). Let us
write p(λ) as
p(λ) = (−1)^n (λ − c_1)(λ − c_2) . . . (λ − c_m) q(λ) (1.6.13)
where c_1, c_2, . . . , c_m are m scalars (not necessarily distinct) of which r_1
scalars are λ_1, r_2 are λ_2 and so on. Obviously for the complex case
q(λ) = 1.
We will refer to the set (allowing repetitions) {c1, c2, . . . , cm} as the
eigen values of the matrix A where ci are not necessarily distinct. In
contrast the spectrum of A refers to the set of distinct eigen values of
A. The symbol c has been chosen based on the other name for eigen
values (the characteristic values).
We can put together eigen vectors of a matrix into another matrix by
itself. This can be a very useful tool. We start with a simple idea.
Lemma 1.58 Let A be an n × n matrix. Let u_1, u_2, . . . , u_r be r
non-zero vectors from F^n. Let us construct an n × r matrix
U = \begin{bmatrix} u_1 & u_2 & \dots & u_r \end{bmatrix}.
Then all the r vectors are eigen vectors of A if and only if there
exists a diagonal matrix D = diag(d_1, . . . , d_r) such that
AU = UD. (1.6.14)
Proof. Expanding the equation, we can write
\begin{bmatrix} A u_1 & A u_2 & \dots & A u_r \end{bmatrix} = \begin{bmatrix} d_1 u_1 & d_2 u_2 & \dots & d_r u_r \end{bmatrix}.
Clearly we want
A u_i = d_i u_i
where u_i are non-zero. This is possible only when d_i is an eigen value
of A and u_i is an eigen vector for d_i.
Converse: Assume that u_i are eigen vectors. Choose d_i to be the corresponding
eigen values. Then the equation holds.
Lemma 1.59 0 is an eigen value of a square matrix A if and only
if A is singular.
Proof. Let 0 be an eigen value of A. Then there exists u ≠ 0 such
that
Au = 0u = 0.
Thus u is a non-trivial solution of the homogeneous linear system. Thus
A is singular.
Converse: Assuming that A is singular, there exists u ≠ 0 s.t.
Au = 0 = 0u.
Thus 0 is an eigen value of A.
Lemma 1.60 If a square matrix A is singular, then N(A) is the
eigen space for the eigen value λ = 0.
Proof. This is straight forward from the definition of eigen space
(see definition 1.26).
Remark. Clearly the geometric multiplicity of λ = 0 equals nullity(A) =
n − rank(A).
Lemma 1.61 Let A be a square matrix. Then A and A^T have the
same eigen values.
Proof. The eigen values of A^T are given by
det(A^T − λI) = 0.
But
A^T − λI = A^T − (λI)^T = (A − λI)^T.
Hence (using lemma 1.41)
det(A^T − λI) = det((A − λI)^T) = det(A − λI).
Thus the characteristic polynomials of A and A^T are the same. Hence the
eigen values are the same. In other words the spectra of A and A^T are
the same.
Remark (Direction preservation). If x is an eigen vector with a non-
zero eigen value λ for A then Ax and x are collinear.
In other words the angle between Ax and x is either 0° when λ is
positive and is 180° when λ is negative. Let us look at the inner
product:
⟨Ax, x⟩ = x^H A x = x^H λ x = λ ||x||_2^2.
Meanwhile
||Ax||_2 = ||λx||_2 = |λ| ||x||_2.
Thus
|⟨Ax, x⟩| = ||Ax||_2 ||x||_2.
The angle θ between Ax and x is given by
cos θ = ⟨Ax, x⟩ / (||Ax||_2 ||x||_2) = λ ||x||_2^2 / (|λ| ||x||_2^2) = ±1.
Lemma 1.62 Let A be a square matrix and λ be an eigen value
of A. Let p ∈ N. Then λ^p is an eigen value of A^p.
Proof. For p = 1 the statement holds trivially since λ^1 is an eigen
value of A^1. Assume that the statement holds for some value of p.
Thus let λ^p be an eigen value of A^p and let u be a corresponding eigen
vector. Now
A^{p+1} u = A^p (A u) = A^p λ u = λ A^p u = λ λ^p u = λ^{p+1} u.
Thus λ^{p+1} is an eigen value for A^{p+1} with the same eigen vector u. With
the principle of mathematical induction, the proof is complete.
Lemma 1.63 Let a square matrix A be non singular and let λ ≠ 0
be some eigen value of A. Then λ^{-1} is an eigen value of A^{-1}.
Moreover, all eigen values of A^{-1} are obtained by taking inverses
of eigen values of A, i.e. if µ ≠ 0 is an eigen value of A^{-1} then 1/µ
is an eigen value of A also. Also, A and A^{-1} share the same set
of eigen vectors.
Proof. Let u ≠ 0 be an eigen vector of A for the eigen value λ.
Then
Au = λu =⇒ u = A^{-1} λ u =⇒ (1/λ) u = A^{-1} u.
Thus u is also an eigen vector of A^{-1} for the eigen value 1/λ.
Now let B = A^{-1}. Then B^{-1} = A. Thus if µ is an eigen value of B
then 1/µ is an eigen value of B^{-1} = A.
Thus if A is invertible then the eigen values of A and A^{-1} have a one to one
correspondence.
This result is very useful: if it can be shown that a matrix A is
similar to a diagonal or a triangular matrix whose eigen values are easy
to obtain, then determination of the eigen values of A becomes straightforward.
1.6.1. Invariant subspaces
Definition 1.30 [Invariant subspace] Let A be a square n × n
matrix and let W be a subspace of F^n, i.e. W ≤ F^n. Then W is
invariant relative to A if
Aw ∈ W ∀ w ∈ W, (1.6.15)
i.e. A(W) ⊆ W, or for every vector w ∈ W its mapping Aw is also
in W. Thus the action of A on W doesn't take us outside of W.
We also say that W is A-invariant.
Eigen vectors are generators of invariant subspaces.
Lemma 1.64 Let A be an n × n matrix. Let x_1, x_2, . . . , x_r be r
eigen vectors of A. Let us construct an n × r matrix
X = \begin{bmatrix} x_1 & x_2 & \dots & x_r \end{bmatrix}.
Then the column space of X, i.e. C(X), is invariant relative to A.
Proof. Let us assume that c_1, c_2, . . . , c_r are the eigen values corresponding
to x_1, x_2, . . . , x_r (not necessarily distinct).
Let any vector x ∈ C(X) be given by
x = \sum_{i=1}^{r} α_i x_i.
Then
Ax = A \sum_{i=1}^{r} α_i x_i = \sum_{i=1}^{r} α_i A x_i = \sum_{i=1}^{r} α_i c_i x_i.
Clearly Ax is also a linear combination of the x_i hence belongs to C(X).
Thus C(X) is invariant relative to A, or C(X) is A-invariant.
1.6.2. Triangular matrices
Lemma 1.65 Let A be an n × n upper or lower triangular matrix.
Then its eigen values are the entries on its main diagonal.
Proof. If A is triangular then A − λI is also triangular with its
diagonal entries being (a_{ii} − λ). Using lemma 1.46, we have
p(λ) = det(A − λI) = \prod_{i=1}^{n} (a_{ii} − λ).
Clearly the roots of the characteristic polynomial are the a_{ii}.
Several small results follow from this lemma.
Corollary 1.66. Let A = [a_{ij}] be an n × n triangular matrix.
(a) The characteristic polynomial of A is p(λ) = (−1)^n \prod_{i=1}^{n} (λ − a_{ii}).
(b) A scalar λ is an eigen value of A iff it is one of the diagonal entries
of A.
(c) The algebraic multiplicity of an eigen value λ is equal to the number
of times it appears on the main diagonal of A.
(d) The spectrum of A is given by the distinct entries on the main
diagonal of A.
A diagonal matrix is naturally both an upper triangular matrix as well
as a lower triangular matrix. Similar results hold for the eigen values
of a diagonal matrix also.
Lemma 1.67 Let A = [a_{ij}] be an n × n diagonal matrix.
(a) Its eigen values are the entries on its main diagonal.
(b) The characteristic polynomial of A is p(λ) = (−1)^n \prod_{i=1}^{n} (λ − a_{ii}).
(c) A scalar λ is an eigen value of A iff it is one of the diagonal
entries of A.
(d) The algebraic multiplicity of an eigen value λ is equal to the
number of times it appears on the main diagonal of A.
(e) The spectrum of A is given by the distinct entries on the main
diagonal of A.
There is also a result for the geometric multiplicity of eigen values for
a diagonal matrix.
Lemma 1.68 Let A = [a_{ij}] be an n × n diagonal matrix. The
geometric multiplicity of an eigen value λ is equal to the number
of times it appears on the main diagonal of A.
Proof. The unit vectors e_i are eigen vectors for A since
A e_i = a_{ii} e_i.
They are linearly independent. Thus if a particular eigen value appears r
number of times, then there are r linearly independent eigen vectors for the
eigen value. Thus its geometric multiplicity is equal to the algebraic
multiplicity.
1.6.3. Similar matrices
Some very useful results are available for similar matrices.
Lemma 1.69 The characteristic polynomial and spectrum of similar
matrices are the same.
Proof. Let B be similar to A. Thus there exists an invertible
matrix C such that
B = C^{-1} A C.
Now
B − λI = C^{-1} A C − λI = C^{-1} A C − λ C^{-1} C = C^{-1} (A C − λ C) = C^{-1} (A − λI) C.
Thus B − λI is similar to A − λI. Hence due to lemma 1.48, their
determinants are equal, i.e.
det(B − λI) = det(A − λI).
This means that the characteristic polynomials of A and B are the same.
Since eigen values are nothing but roots of the characteristic polynomial,
they are the same too. This means that the spectrum (the set
of distinct eigen values) is the same.
Corollary 1.70. If A and B are similar to each other then
(a) An eigen value has the same algebraic and geometric multiplicity for
both A and B.
(b) The (not necessarily distinct) eigen values of A and B are the same.
Although the eigen values are the same, the eigen vectors are in general
different.
Lemma 1.71 Let A and B be similar with
B = C^{-1} A C
for some invertible matrix C. If u is an eigen vector of A for an
eigen value λ, then C^{-1} u is an eigen vector of B for the same
eigen value.
Proof. u is an eigen vector of A for an eigen value λ. Thus we
have
Au = λu.
Thus
B C^{-1} u = C^{-1} A C C^{-1} u = C^{-1} A u = C^{-1} λ u = λ C^{-1} u.
Now u ≠ 0 and C^{-1} is non singular. Thus C^{-1} u ≠ 0. Thus C^{-1} u is an
eigen vector of B.
Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an
eigen value of a square matrix A. Then the geometric multiplicity
of λ is less than or equal to its algebraic multiplicity.
Corollary 1.73. If an n×n matrix A has n distinct eigen values, then
each of them has a geometric (and algebraic) multiplicity of 1.
Proof. The algebraic multiplicity of an eigen value is greater than
or equal to 1. But the sum cannot exceed n. Since there are n distinct
eigen values, each of them has algebraic multiplicity of 1. Now the
geometric multiplicity of an eigen value is greater than or equal to 1 and
less than or equal to its algebraic multiplicity, hence it also equals 1.
Corollary 1.74. Let an n × n matrix A have k distinct eigen values
λ_1, λ_2, . . . , λ_k with algebraic multiplicities r_1, r_2, . . . , r_k and geometric
multiplicities g_1, g_2, . . . , g_k respectively. Then
\sum_{i=1}^{k} g_i ≤ \sum_{i=1}^{k} r_i ≤ n.
Moreover if
\sum_{i=1}^{k} g_i = \sum_{i=1}^{k} r_i
then
g_i = r_i for every i.
1.6.4. Linear independence of eigen vectors
Theorem 1.75 [Linear independence of eigen vectors for distinct
eigen values] Let A be an n × n square matrix. Let x_1, x_2, . . . , x_k
be any k eigen vectors of A for distinct eigen values λ_1, λ_2, . . . , λ_k
respectively. Then x_1, x_2, . . . , x_k are linearly independent.
Proof. We first prove the simpler case with 2 eigen vectors x_1 and
x_2 and corresponding eigen values λ_1 and λ_2 respectively.
Let there be a linear relationship between x_1 and x_2 given by
α_1 x_1 + α_2 x_2 = 0.
Multiplying both sides with (A − λ_1 I) we get
α_1 (A − λ_1 I) x_1 + α_2 (A − λ_1 I) x_2 = 0
=⇒ α_1 (λ_1 − λ_1) x_1 + α_2 (λ_2 − λ_1) x_2 = 0
=⇒ α_2 (λ_2 − λ_1) x_2 = 0.
Since λ_1 ≠ λ_2 and x_2 ≠ 0, hence α_2 = 0.
Similarly by multiplying with (A − λ_2 I) on both sides, we can show
that α_1 = 0. Thus x_1 and x_2 are linearly independent.
Now for the general case, consider a linear relationship between x_1, x_2, . . . , x_k
given by
α_1 x_1 + α_2 x_2 + · · · + α_k x_k = 0.
Multiplying by \prod_{i=1, i≠j}^{k} (A − λ_i I) and using the fact that λ_i ≠ λ_j if
i ≠ j, we get α_j = 0 for every j. Thus the only linear relationship is the trivial
relationship. This completes the proof.
For eigen values with geometric multiplicity greater than 1 there are
multiple linearly independent eigen vectors corresponding to the eigen value.
In this context, the above theorem can be generalized
further.
Theorem 1.76 Let λ_1, λ_2, . . . , λ_k be k distinct eigen values of
A. Let {x_1^j, x_2^j, . . . , x_{g_j}^j} be any g_j linearly independent eigen vectors
from the eigen space of λ_j where g_j is the geometric multiplicity
of λ_j. Then the combined set of eigen vectors given by
{x_1^1, . . . , x_{g_1}^1, . . . , x_1^k, . . . , x_{g_k}^k}
consisting of \sum_{j=1}^{k} g_j eigen vectors is
linearly independent.
This result puts an upper limit on the number of linearly independent
eigen vectors of a square matrix.
Lemma 1.77 Let {λ_1, . . . , λ_k} represent the spectrum of an n × n
matrix A. Let g_1, . . . , g_k be the geometric multiplicities of λ_1, . . . , λ_k
respectively. Then the number of linearly independent eigen vectors
for A is
\sum_{i=1}^{k} g_i.
Moreover if
\sum_{i=1}^{k} g_i = n
then a set of n linearly independent eigen vectors of A can be found
which forms a basis for F^n.
1.6.5. Diagonalization
Diagonalization is one of the fundamental operations in linear algebra.
This section discusses diagonalization of square matrices in depth.
Definition 1.31 [Diagonalizable matrix] An n × n matrix A is
said to be diagonalizable if it is similar to a diagonal matrix.
In other words there exists an n × n non-singular matrix P such
that D = P^{-1} A P is a diagonal matrix. If this happens then we
say that P diagonalizes A or A is diagonalized by P.
Remark.
D = P^{-1} A P ⇐⇒ P D = A P ⇐⇒ P D P^{-1} = A. (1.6.16)
We note that if we restrict to real matrices, then P and D should
also be real. If A ∈ C^{n×n} (it may still be real) then P and D can be
complex.
The next theorem is the culmination of a variety of results studied so
far.
Theorem 1.78 [Properties of diagonalizable matrices] Let A be a
diagonalizable matrix with D = P^{-1} A P being its diagonalization.
Let D = diag(d_1, d_2, . . . , d_n). Then the following hold:
(a) rank(A) = rank(D), which equals the number of non-zero entries
on the main diagonal of D.
(b) det(A) = d_1 d_2 . . . d_n.
(c) tr(A) = d_1 + d_2 + · · · + d_n.
(d) The characteristic polynomial of A is
p(λ) = (−1)^n (λ − d_1)(λ − d_2) . . . (λ − d_n).
(e) The spectrum of A comprises the distinct scalars on the diagonal
entries in D.
(f) The (not necessarily distinct) eigenvalues of A are the diagonal
elements of D.
(g) The columns of P are (linearly independent) eigenvectors of
A.
(h) The algebraic and geometric multiplicities of an eigenvalue λ
of A equal the number of diagonal elements of D that equal λ.
Proof. From definition 1.31 we note that D and A are similar.
Due to lemma 1.48
det(A) = det(D).
Due to lemma 1.47
det(D) = \prod_{i=1}^{n} d_i.
Now due to lemma 1.39
tr(A) = tr(D) = \sum_{i=1}^{n} d_i.
Further, due to lemma 1.69 the characteristic polynomial and spectrum
of A and D are the same. Due to lemma 1.67 the eigen values of D are
nothing but its diagonal entries. Hence they are also the eigen values
of A.
D = P^{-1} A P =⇒ A P = P D.
Now writing
P = \begin{bmatrix} p_1 & p_2 & \dots & p_n \end{bmatrix}
we have
A P = \begin{bmatrix} A p_1 & A p_2 & \dots & A p_n \end{bmatrix} = P D = \begin{bmatrix} d_1 p_1 & d_2 p_2 & \dots & d_n p_n \end{bmatrix}.
Thus the p_i are eigen vectors of A.
Since the characteristic polynomials of A and D are the same, the
algebraic multiplicities of the eigen values are the same.
From lemma 1.71 we get that there is a one to one correspondence
between the eigen vectors of A and D through the change of basis
given by P. Thus the linear independence relationships between the
eigen vectors remain the same. Hence the geometric multiplicities of
individual eigenvalues are also the same.
This completes the proof.
So far we have verified various results which are available if a matrix A
is diagonalizable. We haven’t yet identified the conditions under which
A is diagonalizable. We note that not every matrix is diagonalizable.
The following theorem gives necessary and sufficient conditions under
which a matrix is diagonalizable.
Theorem 1.79 An n × n matrix A is diagonalizable by an n × n
non-singular matrix P if and only if the columns of P are (linearly
independent) eigenvectors of A.
Proof. We note that since P is non-singular, the columns of P
have to be linearly independent.
The necessary condition part was proven in theorem 1.78. We now
show that if P consists of n linearly independent eigen vectors of A
then A is diagonalizable.
Let the columns of P be p_1, p_2, . . . , p_n and the corresponding (not necessarily
distinct) eigen values be d_1, d_2, . . . , d_n. Then
A p_i = d_i p_i.
Thus by letting D = diag(d_1, d_2, . . . , d_n), we have
A P = P D.
Now since columns of P are linearly independent, P is invertible.
This gives us
D = P^{-1} A P.
Thus A is similar to a diagonal matrix D. This validates the sufficient
condition.
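The sufficiency argument above is exactly how one diagonalizes a matrix in practice. The sketch below is an added illustration (NumPy assumed): numpy.linalg.eig supplies D and P, and we check D = P^{-1} A P for a matrix with distinct eigen values.

import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

# eig returns the eigen values d_i and a matrix P whose columns are eigen vectors.
d, P = np.linalg.eig(A)
D = np.diag(d)

# A P = P D, hence D = P^{-1} A P.  P is invertible here because the eigen
# values 5 and 2 are distinct, so the eigen vectors are linearly independent.
assert np.allclose(A @ P, P @ D)
assert np.allclose(np.linalg.inv(P) @ A @ P, D)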
A corollary follows.
Corollary 1.80. An n×n matrix is diagonalizable if and only if there
exists a linearly independent set of n eigenvectors of A.
Now we know that geometric multiplicities of eigen values of A provide
us information about linearly independent eigenvectors of A.
Corollary 1.81. Let A be an n × n matrix. Let λ_1, λ_2, . . . , λ_k be its k
distinct eigen values (comprising its spectrum). Let g_j be the geometric
multiplicity of λ_j. Then A is diagonalizable if and only if
\sum_{i=1}^{k} g_i = n. (1.6.17)
1.6.6. Symmetric matrices
This subsection is focused on real symmetric matrices.
Following is a fundamental property of real symmetric matrices.
Theorem 1.82 Every real symmetric matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and
λ2 be any two distinct eigen values of A and let x1 and x2 be any
two corresponding eigen vectors. Then x1 and x2 are orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^T A x_1 = λ_1 x_2^T x_1
=⇒ x_1^T A^T x_2 = λ_1 x_1^T x_2
=⇒ x_1^T A x_2 = λ_1 x_1^T x_2
=⇒ x_1^T λ_2 x_2 = λ_1 x_1^T x_2
=⇒ (λ_1 − λ_2) x_1^T x_2 = 0
=⇒ x_1^T x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the transpose on both
sides, and used the facts that A = A^T and λ_1 − λ_2 ≠ 0.
Definition 1.32 [Orthogonally diagonalizable matrix] A real n × n
matrix A is said to be orthogonally diagonalizable if there
exists an orthogonal matrix U which can diagonalize A, i.e.
D = U^T A U
is a real diagonal matrix.
Lemma 1.84 Every orthogonally diagonalizable matrix A is symmetric.
Proof. We have a diagonal matrix D such that
A = U D U^T.
Taking transpose on both sides we get
A^T = U D^T U^T = U D U^T = A.
Thus A is symmetric.
Theorem 1.85 Every real symmetric matrix A is orthogonally diagonalizable.
We skip the proof of this theorem.
1.6.7. Hermitian matrices
Following is a fundamental property of Hermitian matrices.
Theorem 1.86 Every Hermitian matrix has an eigen value.
The proof of this result is beyond the scope of this book.
46 1. MATRIX ALGEBRA
Lemma 1.87 The eigenvalues of a Hermitian matrix are real.
Proof. Let A be a Hermitian matrix and let λ be an eigen value
of A. Let u be a corresponding eigen vector. Then
Au = λu
=⇒ u^H A^H = \bar{λ} u^H
=⇒ u^H A^H u = \bar{λ} u^H u
=⇒ u^H A u = \bar{λ} u^H u
=⇒ λ u^H u = \bar{λ} u^H u
=⇒ ||u||_2^2 (λ − \bar{λ}) = 0
=⇒ λ = \bar{λ},
thus λ is real. We used the facts that A = A^H and u ≠ 0 =⇒ ||u||_2 ≠ 0.
Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let
λ_1 and λ_2 be any two distinct eigen values of A and let x_1 and
x_2 be any two corresponding eigen vectors. Then x_1 and x_2 are
orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^H A x_1 = λ_1 x_2^H x_1
=⇒ x_1^H A^H x_2 = λ_1 x_1^H x_2 (λ_1 is real by lemma 1.87)
=⇒ x_1^H A x_2 = λ_1 x_1^H x_2
=⇒ x_1^H λ_2 x_2 = λ_1 x_1^H x_2
=⇒ (λ_1 − λ_2) x_1^H x_2 = 0
=⇒ x_1^H x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the conjugate transpose
on both sides, and used the facts that A = A^H and λ_1 − λ_2 ≠ 0.
Definition 1.33 [Unitary diagonalizable matrix] A complex n × n
matrix A is said to be unitary diagonalizable if there exists a
unitary matrix U which can diagonalize A, i.e.
D = U^H A U
is a complex diagonal matrix.
Lemma 1.89 Let A be a unitary diagonalizable matrix whose diagonalization
D is real. Then A is Hermitian.
Proof. We have a real diagonal matrix D such that
A = U D U^H.
Taking conjugate transpose on both sides we get
A^H = U D^H U^H = U D U^H = A.
Thus A is Hermitian. We used the fact that D^H = D since D is
real.
Theorem 1.90 Every Hermitian matrix A is unitary diagonalizable.
We skip the proof of this theorem. The theorem means that if A is
Hermitian then it can be written as A = U Λ U^H, which leads to the following
definition.
Definition 1.34 [Eigen value decomposition of a Hermitian matrix]
Let A be an n × n Hermitian matrix. Let λ_1, . . . , λ_n be its
eigen values such that |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n|. Let
Λ = diag(λ_1, . . . , λ_n).
Let U be a unitary matrix consisting of orthonormal eigen vectors
corresponding to λ_1, . . . , λ_n. Then the eigen value decomposition
of A is defined as
A = U Λ U^H. (1.6.18)
If the λ_i are distinct, then the decomposition is unique. If they are
not distinct, then the choice of orthonormal eigen vectors within an
eigen space, and hence of U, is not unique.
Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider
some vector x ∈ C^n. Then
x^H Λ x = \sum_{i=1}^{n} λ_i |x_i|^2. (1.6.19)
Now if λ_i ≥ 0 then
x^H Λ x ≤ λ_1 \sum_{i=1}^{n} |x_i|^2 = λ_1 ||x||_2^2.
Also
x^H Λ x ≥ λ_n \sum_{i=1}^{n} |x_i|^2 = λ_n ||x||_2^2.
Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen
values. Let λ_1 be its largest and λ_n be its smallest eigen value. Then
λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2 ∀ x ∈ C^n. (1.6.20)
Proof. A has an eigen value decomposition given by
A = U Λ U^H.
Let x ∈ C^n and let v = U^H x. Clearly ||x||_2 = ||v||_2. Then
x^H A x = x^H U Λ U^H x = v^H Λ v.
From the previous remark we have
λ_n ||v||_2^2 ≤ v^H Λ v ≤ λ_1 ||v||_2^2.
Thus we get
λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2.
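The bound (1.6.20) is easy to probe numerically. The sketch below is an added illustration (NumPy assumed): it builds a random Hermitian positive semidefinite matrix, takes its extreme eigen values from numpy.linalg.eigvalsh, and checks the inequality on random vectors.

import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B                       # Hermitian with non-negative eigen values

w = np.linalg.eigvalsh(A)                # ascending order: w[0] smallest, w[-1] largest
lam_min, lam_max = w[0], w[-1]

for _ in range(100):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    quad = (x.conj() @ A @ x).real       # x^H A x is real for Hermitian A
    nrm2 = np.linalg.norm(x) ** 2
    assert lam_min * nrm2 - 1e-9 <= quad <= lam_max * nrm2 + 1e-9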
1.6.8. Miscellaneous properties
This subsection lists some miscellaneous properties of eigen values of a
square matrix.
Lemma 1.92 λ is an eigen value of A if and only if λ + k is an
eigen value of A + kI. Moreover A and A + kI share the same
eigen vectors.
Proof.
Ax = λx
⇐⇒ Ax + kx = λx + kx
⇐⇒ (A + kI)x = (λ + k)x. (1.6.21)
Thus λ is an eigen value of A with an eigen vector x if and only if λ + k
is an eigen value of A + kI with the same eigen vector x.
1.6.9. Diagonally dominant matrices
Definition 1.35 [Diagonally dominant matrix] Let A = [a_{ij}] be a
square matrix in C^{n×n}. A is called diagonally dominant if
|a_{ii}| ≥ \sum_{j≠i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of the diagonal
element is greater than or equal to the sum of absolute values of
all the off-diagonal elements on that row.
Definition 1.36 [Strictly diagonally dominant matrix] Let A =
[a_{ij}] be a square matrix in C^{n×n}. A is called strictly diagonally
dominant if
|a_{ii}| > \sum_{j≠i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of the diagonal
element is bigger than the sum of absolute values of all the off-
diagonal elements on that row.
Example 1.2: Strictly diagonally dominant matrix. Let us consider
A = \begin{bmatrix} -4 & -2 & -1 & 0 \\ -4 & 7 & 2 & 0 \\ 3 & -4 & 9 & 1 \\ 2 & -1 & -3 & 15 \end{bmatrix}
We can see that the strict diagonal dominance condition is satisfied for
each row as follows:
row 1 : |−4| > |−2| + |−1| + |0| = 3
row 2 : |7| > |−4| + |2| + |0| = 6
row 3 : |9| > |3| + |−4| + |1| = 8
row 4 : |15| > |2| + |−1| + |−3| = 6
Strictly diagonally dominant matrices have a very special property.
They are always non-singular.
Theorem 1.93 Strictly diagonally dominant matrices are non-
singular.
Proof. Suppose that A is strictly diagonally dominant and singular. Then
there exists a vector u ∈ C^n with u ≠ 0 such that
Au = 0. (1.6.22)
Let
u = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}^T.
We first show that every entry in u cannot be equal in magnitude. Let
us assume that this is so, i.e.
c = |u_1| = |u_2| = · · · = |u_n|.
Since u ≠ 0 hence c ≠ 0. Now for any row i in (1.6.22), we have
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} c = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} = 0
=⇒ a_{ii} = \sum_{j≠i} ± a_{ij}
=⇒ |a_{ii}| = |\sum_{j≠i} ± a_{ij}|
=⇒ |a_{ii}| ≤ \sum_{j≠i} |a_{ij}| (using the triangle inequality)
but this contradicts our assumption that A is strictly diagonally dominant.
Thus all entries in u are not equal in magnitude.
Let us now assume that the largest entry in u lies at index i with
|u_i| = c. Without loss of generality we can scale down u by c to
get another vector in which all entries are less than or equal to 1 in
magnitude while the i-th entry is ±1, i.e. u_i = ±1 and |u_j| ≤ 1 for all
other entries.
Now from (1.6.22) we get for the i-th row
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ ± a_{ii} = − \sum_{j≠i} u_j a_{ij}
=⇒ |a_{ii}| ≤ \sum_{j≠i} |u_j a_{ij}| ≤ \sum_{j≠i} |a_{ij}|
which again contradicts our assumption that A is strictly diagonally
dominant.
Hence strictly diagonally dominant matrices are non-singular.
1.6.10. Gershgorin's theorem
We are now ready to examine Gershgorin's theorem which provides very
useful bounds on the spectrum of a square matrix.
Theorem 1.94 Every eigen value λ of a square matrix A ∈ C^{n×n}
satisfies
|λ − a_{ii}| ≤ \sum_{j≠i} |a_{ij}| for some i ∈ {1, 2, . . . , n}. (1.6.23)
Proof. The proof is a straightforward application of the non-singularity
of strictly diagonally dominant matrices.
We know that for an eigen value λ, det(λI − A) = 0, i.e. the matrix
(λI − A) is singular. Hence it cannot be strictly diagonally dominant
due to theorem 1.93.
Thus looking at each row i of (λI − A) we can say that
|λ − a_{ii}| > \sum_{j≠i} |a_{ij}|
cannot be true for all rows simultaneously, i.e. it must fail at least for
one row. This means that there exists at least one row i for which
|λ − a_{ii}| ≤ \sum_{j≠i} |a_{ij}|
holds true.
What this theorem means is pretty simple. Consider a disc in the
complex plane for the i-th row of A whose center is given by a_{ii} and
whose radius is given by r = \sum_{j≠i} |a_{ij}|, i.e. the sum of magnitudes of
all non-diagonal entries in the i-th row.
There are n such discs corresponding to the n rows in A. (1.6.23) means
that every eigen value must lie within the union of these discs. It
cannot lie outside.
This idea is crystallized in the following definition.
Definition 1.37 [Gershgorin's disc] For the i-th row of matrix A we
define the radius r_i = \sum_{j≠i} |a_{ij}| and the center c_i = a_{ii}. Then the
set given by
D_i = {z ∈ C : |z − a_{ii}| ≤ r_i}
is called the i-th Gershgorin's disc of A.
We note that the definition is equally valid for real as well as complex
matrices. For real matrices, the centers of disks lie on the real line. For
complex matrices, the centers may lie anywhere in the complex plane.
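The disc description translates directly into a few lines of code. The sketch below is an added illustration (NumPy assumed, not part of the original text): it computes the centers and radii for the matrix of example 1.2 and verifies that every eigen value falls in at least one row disc.

import numpy as np

A = np.array([[-4., -2., -1.,  0.],
              [-4.,  7.,  2.,  0.],
              [ 3., -4.,  9.,  1.],
              [ 2., -1., -3., 15.]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i = sum over j != i of |a_ij|

for lam in np.linalg.eigvals(A):
    # Each eigen value lies in the union of the row discs (theorem 1.94).
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
    print(lam)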
Clearly there is nothing magical about the rows of A. We can as well
consider the columns of A.
Theorem 1.95 Every eigen value of a matrix A must lie in a
Gershgorin disc corresponding to the columns of A where the Gershgorin
disc for the j-th column is given by
D_j = {z ∈ C : |z − a_{jj}| ≤ r_j}
with
r_j = \sum_{i≠j} |a_{ij}|.
Proof. We know that the eigen values of A are the same as the eigen values of
A^T and the columns of A are nothing but the rows of A^T. Hence the eigen values of
A must satisfy the conditions in theorem 1.94 w.r.t. the matrix A^T. This
completes the proof.
1.7. Singular values
In the previous section we saw diagonalization of square matrices, which
resulted in an eigen value decomposition of the matrix. This matrix
factorization is very useful, yet it is not applicable in all situations. In
particular, the eigen value decomposition is useless if the square matrix
is not diagonalizable or if the matrix is not square at all. Moreover,
the decomposition is particularly useful only for real symmetric or Hermitian
matrices where the diagonalizing matrix is an F-unitary matrix
(see definition 1.23). Otherwise, one has to consider the inverse of the
diagonalizing matrix also.
Fortunately there happens to be another decomposition which applies
to all matrices and it involves just F-unitary matrices.
Definition 1.38 [Singular value] A non-negative real number σ is
a singular value for a matrix A ∈ F^{m×n} if and only if there exist
unit-length vectors u ∈ F^m and v ∈ F^n such that
Av = σu (1.7.1)
and
A^H u = σv (1.7.2)
hold. The vectors u and v are called left-singular and right-
singular vectors for σ respectively.
We first present the basic result of singular value decomposition. We
will not prove this result completely although we will present proofs of
some aspects.
Theorem 1.96 For every A ∈ F^{m×n} with k = min(m, n), there
exist two F-unitary matrices U ∈ F^{m×m} and V ∈ F^{n×n} and a
sequence of real numbers
σ_1 ≥ σ_2 ≥ · · · ≥ σ_k ≥ 0
such that
U^H A V = Σ (1.7.3)
where
Σ = diag(σ_1, σ_2, . . . , σ_k) ∈ F^{m×n}.
The non-negative real numbers σ_i are the singular values of A as
per definition 1.38.
The sequence of real numbers σ_i doesn't depend on the particular
choice of U and V.
Σ is rectangular with the same size as A. The singular values of A lie
on the principal diagonal of Σ. All other entries in Σ are zero.
It is certainly possible that some of the singular values are 0 themselves.
Remark. Since U^H A V = Σ, we have
A = U Σ V^H. (1.7.4)
Definition 1.39 [Singular value decomposition] The decomposition
of a matrix A ∈ F^{m×n} given by
A = U Σ V^H (1.7.5)
is known as its singular value decomposition.
Remark. When F is R then the decomposition simplifies to
U^T A V = Σ (1.7.6)
and
A = U Σ V^T. (1.7.7)
Remark. Clearly there can be at most k = min(m, n) distinct singular
values of A.
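In NumPy the factorization is available as numpy.linalg.svd. The sketch below is an added illustration (not part of the original text): it reconstructs A = U Σ V^H for a rectangular complex matrix and confirms that U and V are unitary.

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# full_matrices=True returns square U (4x4) and Vh (3x3); s holds the
# k = 3 singular values in descending order.  Note Vh is V^H, not V.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

Sigma = np.zeros((4, 3), dtype=complex)
Sigma[:3, :3] = np.diag(s)               # rectangular "diagonal" matrix

assert np.allclose(U @ Sigma @ Vh, A)                    # A = U Sigma V^H
assert np.allclose(U.conj().T @ U, np.eye(4))            # U is unitary
assert np.allclose(Vh @ Vh.conj().T, np.eye(3))          # V is unitary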
Remark. We can also write
A V = U Σ. (1.7.8)
Remark. Let us expand
A = U Σ V^H = \begin{bmatrix} u_1 & u_2 & \dots & u_m \end{bmatrix} [σ_{ij}] \begin{bmatrix} v_1^H \\ v_2^H \\ \vdots \\ v_n^H \end{bmatrix} = \sum_{i=1}^{m} \sum_{j=1}^{n} σ_{ij} u_i v_j^H.
Remark. Alternatively, let us expand
Σ = U^H A V = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_m^H \end{bmatrix} A \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix} = [u_i^H A v_j].
This gives us
σ_{ij} = u_i^H A v_j. (1.7.9)
The following lemma verifies that Σ indeed consists of singular values of A as per definition 1.38.

Lemma 1.97 Let A = U Σ V^H be a singular value decomposition of A. Then the main diagonal entries of Σ are singular values. The first k = min(m, n) column vectors in U and V are left and right singular vectors of A.
Proof. We have
AV = UΣ.
Let us expand the R.H.S. Entry-wise, (UΣ)_{ik} = \sum_{j} u_{ij} σ_{jk} = u_{ik} σ_k, so
UΣ = [σ_1 u_1  σ_2 u_2  . . .  σ_k u_k  0  . . .  0]
where the zero columns at the end appear n − k times.
Expanding the L.H.S. we get
AV = [Av_1  Av_2  . . .  Av_n].
Thus by comparing both sides we get
Av_i = σ_i u_i for 1 ≤ i ≤ k
and
Av_i = 0 for k < i ≤ n.
Now let us start with
A = U Σ V^H  =⇒  A^H = V Σ^H U^H  =⇒  A^H U = V Σ^H.
Expanding the R.H.S. in the same manner gives
V Σ^H = [σ_1 v_1  σ_2 v_2  . . .  σ_k v_k  0  . . .  0]
where the zero columns appear m − k times.
Expanding the L.H.S. we get
A^H U = [A^H u_1  A^H u_2  . . .  A^H u_m].
Thus by comparing both sides we get
A^H u_i = σ_i v_i for 1 ≤ i ≤ k
and
A^H u_i = 0 for k < i ≤ m.
We now consider the three cases.
For m = n, we have k = m = n and we get
Av_i = σ_i u_i,  A^H u_i = σ_i v_i for 1 ≤ i ≤ m.
Thus σ_i is a singular value of A, u_i is a left singular vector and v_i is a right singular vector.
For m < n, we have k = m. For the first m vectors in V we get
Av_i = σ_i u_i,  A^H u_i = σ_i v_i for 1 ≤ i ≤ m.
For the remaining n − m vectors in V we can write
Av_i = 0,
so they belong to the null space of A.
For m > n, we have k = n. For the first n vectors in U we get
Av_i = σ_i u_i,  A^H u_i = σ_i v_i for 1 ≤ i ≤ n.
For the remaining m − n vectors in U we can write
A^H u_i = 0.
Lemma 1.98 ΣΣ^H is an m × m matrix given by
ΣΣ^H = diag(σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0)
where the number of 0's following σ_k^2 is m − k.

Lemma 1.99 Σ^H Σ is an n × n matrix given by
Σ^H Σ = diag(σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0)
where the number of 0's following σ_k^2 is n − k.
Lemma 1.100 [Rank and singular value decomposition] Let A ∈ F^{m×n} have a singular value decomposition given by
A = U Σ V^H.
Then
rank(A) = rank(Σ).   (1.7.10)
In other words, the rank of A is the number of non-zero singular values of A. Since the singular values are ordered in descending order in Σ, the first r = rank(A) singular values σ_1, . . . , σ_r are exactly the non-zero ones.

Proof. This is a straightforward application of lemma 1.6 and lemma 1.7. Further, since the only non-zero values in Σ appear on its main diagonal, its rank is the number of non-zero singular values σ_i.
Corollary 1.101. Let r = rank(A). Then Σ can be split as a block matrix
Σ = [Σ_r 0; 0 0]   (1.7.11)
where Σ_r is an r × r diagonal matrix of the non-zero singular values, Σ_r = diag(σ_1, σ_2, . . . , σ_r). All other sub-matrices in Σ are 0.
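A small sketch of lemma 1.100 and corollary 1.101 (assuming NumPy is available): the rank of A is recovered by counting the singular values above a numerical tolerance, which is consistent with what numpy.linalg.matrix_rank reports.

```python
import numpy as np

# build a 5x4 matrix of rank 2
A = np.random.randn(5, 2) @ np.random.randn(2, 4)
s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank_from_svd = int(np.sum(s > tol))

print(rank_from_svd, np.linalg.matrix_rank(A))   # both give 2
```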
Lemma 1.102 The eigen values of the Hermitian matrix A^H A ∈ F^{n×n} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2. Moreover the eigen vectors are the columns of V.

Proof.
A^H A = (U Σ V^H)^H (U Σ V^H) = V Σ^H U^H U Σ V^H = V Σ^H Σ V^H.
We note that A^H A is Hermitian. Hence A^H A is diagonalized by V and the diagonalization of A^H A is Σ^H Σ. Thus the eigen values of A^H A are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2.
Clearly
(A^H A) V = V (Σ^H Σ),
thus the columns of V are the eigen vectors of A^H A.
Lemma 1.103 The eigen values of the Hermitian matrix A A^H ∈ F^{m×m} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2. Moreover the eigen vectors are the columns of U.

Proof.
A A^H = (U Σ V^H)(U Σ V^H)^H = U Σ V^H V Σ^H U^H = U Σ Σ^H U^H.
We note that A A^H is Hermitian. Hence A A^H is diagonalized by U and the diagonalization of A A^H is Σ Σ^H. Thus the eigen values of A A^H are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2.
Clearly
(A A^H) U = U (Σ Σ^H),
thus the columns of U are the eigen vectors of A A^H.
Lemma 1.104 The Gram matrices A A^H and A^H A share the same eigen values except for some extra 0's. Their eigen values are the squares of the singular values of A together with some extra 0's. In other words, the singular values of A are the square roots of the non-zero eigen values of the Gram matrices A A^H or A^H A.
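The following sketch (assuming NumPy is available) checks lemma 1.104 numerically on a real matrix, where A^H = A^T: the square roots of the non-zero eigen values of the Gram matrices are the singular values of A.

```python
import numpy as np

A = np.random.randn(5, 3)
s = np.linalg.svd(A, compute_uv=False)                 # singular values of A
ev_AhA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]    # 3 eigen values
ev_AAh = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]    # 5 eigen values (2 extra zeros)

print(np.allclose(np.sqrt(np.clip(ev_AhA, 0, None)), s))
print(np.allclose(ev_AAh[:3], ev_AhA), np.allclose(ev_AAh[3:], 0))
```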
1.7.1. The largest singular value
Lemma 1.105 For all u ∈ F^n the following holds:
‖Σu‖_2 ≤ σ_1 ‖u‖_2.   (1.7.12)
Moreover for all u ∈ F^m the following holds:
‖Σ^H u‖_2 ≤ σ_1 ‖u‖_2.   (1.7.13)
Proof. Let us expand the term Σu:
Σu = diag(σ_1, . . . , σ_k, 0, . . . , 0) (u_1, u_2, . . . , u_n)^T = (σ_1 u_1, σ_2 u_2, . . . , σ_k u_k, 0, . . . , 0)^T.
Now since σ_1 is the largest singular value,
|σ_i u_i| ≤ |σ_1 u_i| ∀ 1 ≤ i ≤ k.
Thus
\sum_{i=1}^{n} |σ_1 u_i|^2 ≥ \sum_{i=1}^{n} |σ_i u_i|^2
or
σ_1^2 ‖u‖_2^2 ≥ ‖Σu‖_2^2.
The result follows.
A simpler representation of Σu can be given using corollary 1.101. Let r = rank(A). Thus
Σ = [Σ_r 0; 0 0].
We split the entries of u as u = [(u_1, . . . , u_r), (u_{r+1}, . . . , u_n)]^T. Then
Σu = [Σ_r (u_1 . . . u_r)^T; 0 (u_{r+1} . . . u_n)^T] = (σ_1 u_1, σ_2 u_2, . . . , σ_r u_r, 0, . . . , 0)^T.
Thus
‖Σu‖_2^2 = \sum_{i=1}^{r} |σ_i u_i|^2 ≤ σ_1^2 \sum_{i=1}^{r} |u_i|^2 ≤ σ_1^2 ‖u‖_2^2.
The second result can be proven similarly.
Lemma 1.106 Let σ_1 be the largest singular value of an m × n matrix A. Then
‖Ax‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^n.   (1.7.14)
Moreover
‖A^H x‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^m.   (1.7.15)

Proof.
‖Ax‖_2 = ‖U Σ V^H x‖_2 = ‖Σ V^H x‖_2
since U is unitary. Now from the previous lemma we have
‖Σ V^H x‖_2 ≤ σ_1 ‖V^H x‖_2 = σ_1 ‖x‖_2
since V^H is also unitary. Thus we get the result
‖Ax‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^n.
Similarly
‖A^H x‖_2 = ‖V Σ^H U^H x‖_2 = ‖Σ^H U^H x‖_2
since V is unitary. Now from the previous lemma we have
‖Σ^H U^H x‖_2 ≤ σ_1 ‖U^H x‖_2 = σ_1 ‖x‖_2
since U^H is also unitary. Thus we get the result
‖A^H x‖_2 ≤ σ_1 ‖x‖_2 ∀ x ∈ F^m.
There is a direct connection between the largest singular value and the 2-norm of a matrix (see section 1.8.6).

Corollary 1.107. The largest singular value of A is nothing but its 2-norm, i.e.
σ_1 = max_{‖u‖_2=1} ‖Au‖_2.
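A numerical sanity check of lemma 1.106 and corollary 1.107 (a sketch, assuming NumPy is available): no vector is stretched by a factor larger than σ_1, and numpy.linalg.norm(A, 2) returns exactly σ_1.

```python
import numpy as np

A = np.random.randn(6, 4)
sigma1 = np.linalg.svd(A, compute_uv=False)[0]

X = np.random.randn(4, 1000)                          # 1000 random test vectors
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.max() <= sigma1 + 1e-12)                 # ||Ax||_2 <= sigma_1 ||x||_2
print(np.isclose(np.linalg.norm(A, 2), sigma1))       # 2-norm equals sigma_1
```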
1.7.2. SVD and pseudo inverse
Lemma 1.108 [Pseudo-inverse of Σ] Let A = U Σ V^H and let r = rank(A). Let σ_1, . . . , σ_r be the r non-zero singular values of A. Then the Moore-Penrose pseudo-inverse of Σ is the n × m matrix Σ† given by
Σ† = [Σ_r^{-1} 0; 0 0]   (1.7.16)
where Σ_r = diag(σ_1, . . . , σ_r).
Essentially Σ† is obtained by transposing Σ and inverting all its non-zero (positive real) values.

Proof. Straightforward application of lemma 1.32.
Corollary 1.109. The ranks of Σ and of its pseudo-inverse Σ† are the same, i.e.
rank(Σ) = rank(Σ†).   (1.7.17)

Proof. The number of non-zero diagonal entries in Σ and Σ† is the same.
Lemma 1.110 Let A be an m × n matrix and let A = U Σ V^H be its singular value decomposition. Let Σ† be the pseudo-inverse of Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of A is given by
A† = V Σ† U^H.   (1.7.18)

Proof. As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per definition 1.19. We note that since Σ† is the pseudo-inverse of Σ, it already satisfies the necessary criteria.
First requirement:
A A† A = U Σ V^H V Σ† U^H U Σ V^H = U Σ Σ† Σ V^H = U Σ V^H = A.
Second requirement:
A† A A† = V Σ† U^H U Σ V^H V Σ† U^H = V Σ† Σ Σ† U^H = V Σ† U^H = A†.
We now consider
A A† = U Σ V^H V Σ† U^H = U Σ Σ† U^H.
Thus
(A A†)^H = (U Σ Σ† U^H)^H = U (Σ Σ†)^H U^H = U Σ Σ† U^H = A A†
since Σ Σ† is Hermitian.
Finally we consider
A† A = V Σ† U^H U Σ V^H = V Σ† Σ V^H.
Thus
(A† A)^H = (V Σ† Σ V^H)^H = V (Σ† Σ)^H V^H = V Σ† Σ V^H = A† A
since Σ† Σ is also Hermitian.
This completes the proof.
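The construction of lemma 1.108 and lemma 1.110 can be reproduced directly and compared against numpy.linalg.pinv; the sketch below assumes NumPy and uses a tolerance of our choosing to decide which singular values are non-zero.

```python
import numpy as np

A = np.random.randn(5, 3)
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# build Sigma^dagger: transpose the shape of Sigma and invert non-zero singular values
Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
r = np.sum(s > 1e-12)
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])

A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T        # A^dagger = V Sigma^dagger U^H
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```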
Finally we can connect the singular values of A with the singular values
of its pseudo-inverse.
Corollary 1.111. The ranks of any m × n matrix A and of its pseudo-inverse A† are the same, i.e.
rank(A) = rank(A†).   (1.7.19)

Proof. We have rank(A) = rank(Σ). It is also easy to verify that rank(A†) = rank(Σ†). Using corollary 1.109 completes the proof.
Lemma 1.112 Let A be an m × n matrix and let A† be its n × m pseudo-inverse as per lemma 1.110. Let r = rank(A). Let k = min(m, n) denote the number of singular values, so that r is the number of non-zero singular values of A. Let σ_1, . . . , σ_r be the non-zero singular values of A. Then the number of singular values of A† is the same as that of A, the non-zero singular values of A† are
1/σ_1, . . . , 1/σ_r,
while all other k − r singular values of A† are zero.

Proof. k = min(m, n) is the number of singular values for both A and A†. Since the ranks of A and A† are the same, the number of non-zero singular values is the same. Now look at
A† = V Σ† U^H
where
Σ† = [Σ_r^{-1} 0; 0 0].
Clearly Σ_r^{-1} = diag(1/σ_1, . . . , 1/σ_r).
Thus expanding the R.H.S. we get
A† = \sum_{i=1}^{r} (1/σ_i) v_i u_i^H
where v_i and u_i are the first r columns of V and U respectively. If we reverse the order of the first r columns of U and V and reverse the first r diagonal entries of Σ†, the R.H.S. remains the same while we are able to express A† in the standard singular value decomposition form. Thus 1/σ_1, . . . , 1/σ_r are indeed the non-zero singular values of A†.
1.7.3. Full column rank matrices
In this subsection we consider some specific results related to singular
value decomposition of a full column rank matrix.
We will consider A to be an m × n matrix in F^{m×n} with m ≥ n and rank(A) = n. Let A = U Σ V^H be its singular value decomposition. From lemma 1.100 we observe that there are n non-zero singular values of A. We will call these singular values σ_1, σ_2, . . . , σ_n. We define
Σ_n = diag(σ_1, σ_2, . . . , σ_n).
Clearly Σ is a 2 × 1 block matrix given by
Σ = [Σ_n; 0]
where the lower 0 is an (m − n) × n zero matrix. From here we obtain that Σ^H Σ is an n × n matrix given by
Σ^H Σ = Σ_n^2
where
Σ_n^2 = diag(σ_1^2, σ_2^2, . . . , σ_n^2).
Lemma 1.113 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Then Σ^H Σ = Σ_n^2 = diag(σ_1^2, σ_2^2, . . . , σ_n^2) and Σ^H Σ is invertible.

Proof. Since all singular values are non-zero, Σ_n^2 is invertible. Thus
(Σ^H Σ)^{-1} = (Σ_n^2)^{-1} = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).   (1.7.20)
Lemma 1.114 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let σ_1 be its largest singular value and σ_n be its smallest singular value. Then
σ_n^2 ‖x‖_2 ≤ ‖Σ^H Σ x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.   (1.7.21)

Proof. Let x ∈ F^n. We have
‖Σ^H Σ x‖_2^2 = ‖Σ_n^2 x‖_2^2 = \sum_{i=1}^{n} |σ_i^2 x_i|^2.
Now since
σ_n ≤ σ_i ≤ σ_1,
we have
σ_n^4 \sum_{i=1}^{n} |x_i|^2 ≤ \sum_{i=1}^{n} |σ_i^2 x_i|^2 ≤ σ_1^4 \sum_{i=1}^{n} |x_i|^2,
thus
σ_n^4 ‖x‖_2^2 ≤ ‖Σ^H Σ x‖_2^2 ≤ σ_1^4 ‖x‖_2^2.
Applying square roots, we get
σ_n^2 ‖x‖_2 ≤ ‖Σ^H Σ x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.
We recall from corollary 1.25 that the Gram matrix of its column vectors G = A^H A is full rank and invertible.

Lemma 1.115 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let σ_1 be its largest singular value and σ_n be its smallest singular value. Then
σ_n^2 ‖x‖_2 ≤ ‖A^H A x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.   (1.7.22)

Proof.
A^H A = (U Σ V^H)^H (U Σ V^H) = V Σ^H Σ V^H.
Let x ∈ F^n. Let
u = V^H x  =⇒  ‖u‖_2 = ‖x‖_2.
Let
r = Σ^H Σ u.
Then from the previous lemma we have
σ_n^2 ‖u‖_2 ≤ ‖Σ^H Σ u‖_2 = ‖r‖_2 ≤ σ_1^2 ‖u‖_2.
Finally
A^H A x = V Σ^H Σ V^H x = V r.
Thus
‖A^H A x‖_2 = ‖r‖_2.
Substituting, we get
σ_n^2 ‖x‖_2 ≤ ‖A^H A x‖_2 ≤ σ_1^2 ‖x‖_2 ∀ x ∈ F^n.
There are bounds for the inverse of the Gram matrix also. First let us establish the inverse of the Gram matrix.

Lemma 1.116 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let the singular values of A be σ_1, . . . , σ_n. Let the Gram matrix of the columns of A be G = A^H A. Then
G^{-1} = V Ψ V^H
where
Ψ = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).

Proof. We have
G = V Σ^H Σ V^H.
Thus
G^{-1} = (V Σ^H Σ V^H)^{-1} = (V^H)^{-1} (Σ^H Σ)^{-1} V^{-1} = V (Σ^H Σ)^{-1} V^H.
From lemma 1.113 we have
Ψ = (Σ^H Σ)^{-1} = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).
This completes the proof.
We can now state the bounds:
Lemma 1.117 Let A be a full column rank matrix with singular value decomposition A = U Σ V^H. Let σ_1 be its largest singular value and σ_n be its smallest singular value. Then
(1/σ_1^2) ‖x‖_2 ≤ ‖(A^H A)^{-1} x‖_2 ≤ (1/σ_n^2) ‖x‖_2 ∀ x ∈ F^n.   (1.7.23)

Proof. From lemma 1.116 we have
G^{-1} = (A^H A)^{-1} = V Ψ V^H
where
Ψ = diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2).
Let x ∈ F^n. Let
u = V^H x  =⇒  ‖u‖_2 = ‖x‖_2.
Let
r = Ψ u.
Then
‖r‖_2^2 = \sum_{i=1}^{n} |u_i / σ_i^2|^2.
Thus
(1/σ_1^2) ‖u‖_2 ≤ ‖Ψ u‖_2 = ‖r‖_2 ≤ (1/σ_n^2) ‖u‖_2.
Finally
(A^H A)^{-1} x = V Ψ V^H x = V r.
Thus
‖(A^H A)^{-1} x‖_2 = ‖r‖_2.
Substituting, we get the result.
1.7.4. Low rank approximation of a matrix

Definition 1.40 An m × n matrix A is called low rank if
rank(A) ≪ min(m, n).   (1.7.24)

Remark. A matrix is low rank if the number of non-zero singular values of the matrix is much smaller than its dimensions.
Following is a simple procedure for making a low rank approximation of a given matrix A; a code sketch follows the list.
(1) Perform the singular value decomposition of A given by A = U Σ V^H.
(2) Identify the singular values of A in Σ.
(3) Keep the first r singular values (where r ≪ min(m, n) is the rank of the approximation) and set all other singular values to 0 to obtain the truncated matrix \hat{Σ}.
(4) Compute \hat{A} = U \hat{Σ} V^H.
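A minimal sketch of this procedure (assuming NumPy is available; the helper name low_rank_approx and the choice r = 2 are ours):

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of A obtained by truncating its SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]

A = np.random.randn(50, 30)
A2 = low_rank_approx(A, 2)
print(np.linalg.matrix_rank(A2))        # 2
print(np.linalg.norm(A - A2, 2))        # 2-norm error; equals sigma_{r+1} of A
```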
1.8. Matrix norms
This section reviews various matrix norms on the vector space of complex matrices over the field of complex numbers, (C^{m×n}, C).
We know that (C^{m×n}, C) is a finite dimensional vector space with dimension mn. We will usually refer to it as C^{m×n}.
Matrix norms will follow the usual definition of norms for a vector space.

Definition 1.41 A function ‖·‖ : C^{m×n} → R is called a matrix norm on C^{m×n} if for all A, B ∈ C^{m×n} and all α ∈ C it satisfies the following.
Positivity:
‖A‖ ≥ 0
with ‖A‖ = 0 ⇐⇒ A = 0.
Homogeneity:
‖αA‖ = |α| ‖A‖.
Triangle inequality:
‖A + B‖ ≤ ‖A‖ + ‖B‖.

We recall some of the standard results on normed vector spaces.
All matrix norms are equivalent. Let ‖·‖ and ‖·‖' be two different matrix norms on C^{m×n}. Then there exist two constants a and b such that the following holds:
a ‖A‖ ≤ ‖A‖' ≤ b ‖A‖ ∀ A ∈ C^{m×n}.
A matrix norm is a continuous function ‖·‖ : C^{m×n} → R.
1.8.1. Norms like l_p on C^n
The following norms are quite like the l_p norms on the finite dimensional complex vector space C^n. They arise from the fact that the matrix vector space C^{m×n} has a one to one correspondence with the complex vector space C^{mn}.

Definition 1.42 Let A ∈ C^{m×n} with A = [a_{ij}]. The matrix sum norm is defined as
‖A‖_S = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.   (1.8.1)

Definition 1.43 Let A ∈ C^{m×n} with A = [a_{ij}]. The matrix Frobenius norm is defined as
‖A‖_F = ( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 )^{1/2}.   (1.8.2)

Definition 1.44 Let A ∈ C^{m×n} with A = [a_{ij}]. The matrix max norm is defined as
‖A‖_M = max_{1≤i≤m, 1≤j≤n} |a_{ij}|.   (1.8.3)
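These three norms are simple reductions over the entries of the matrix. A sketch assuming NumPy; note that numpy.linalg.norm(A, 'fro') computes the Frobenius norm directly.

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.5,  4.0, -1.0]])

sum_norm  = np.sum(np.abs(A))                 # ||A||_S
frob_norm = np.sqrt(np.sum(np.abs(A) ** 2))   # ||A||_F
max_norm  = np.max(np.abs(A))                 # ||A||_M

print(sum_norm, max_norm)
print(np.isclose(frob_norm, np.linalg.norm(A, 'fro')))
```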
1.8.2. Properties of Frobenius norm
We now prove some elementary properties of Frobenius norm.
Lemma 1.118 The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose:
‖A^H‖_F = ‖A‖_F.   (1.8.4)

Proof. Let A = [a_{ij}]. Then A^H = [\bar{a}_{ji}] and
‖A^H‖_F^2 = \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = ‖A‖_F^2.
Now
‖A^H‖_F^2 = ‖A‖_F^2 =⇒ ‖A^H‖_F = ‖A‖_F.
Lemma 1.119 Let A ∈ C^{m×n} be written as a row of column vectors
A = [a_1 . . . a_n].
Then
‖A‖_F^2 = \sum_{j=1}^{n} ‖a_j‖_2^2.   (1.8.5)

Proof. We note that
‖a_j‖_2^2 = \sum_{i=1}^{m} |a_{ij}|^2.
Now
‖A‖_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{j=1}^{n} ( \sum_{i=1}^{m} |a_{ij}|^2 ) = \sum_{j=1}^{n} ‖a_j‖_2^2.
We have thus shown that the square of the Frobenius norm of a matrix is nothing but the sum of squares of the l_2 norms of its columns.

Lemma 1.120 Let A ∈ C^{m×n} be written as a column of row vectors a^1, . . . , a^m, i.e.
A = [a^1; . . . ; a^m].
Then
‖A‖_F^2 = \sum_{i=1}^{m} ‖a^i‖_2^2.   (1.8.6)

Proof. We note that
‖a^i‖_2^2 = \sum_{j=1}^{n} |a_{ij}|^2.
Now
‖A‖_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{m} ‖a^i‖_2^2.
We now consider how the Frobenius norm is affected by the action of unitary matrices.
Let A be an arbitrary matrix in C^{m×n}. Let U be a unitary matrix in C^{m×m} and let V be a unitary matrix in C^{n×n}.
We present our first result: multiplication by a unitary matrix doesn't change the Frobenius norm of a matrix.

Theorem 1.121 The Frobenius norm of a matrix is invariant to pre or post multiplication by a unitary matrix, i.e.
‖UA‖_F = ‖A‖_F   (1.8.7)
and
‖AV‖_F = ‖A‖_F.   (1.8.8)

Proof. We can write A as
A = [a_1 . . . a_n].
So
UA = [Ua_1 . . . Ua_n].
Then applying lemma 1.119, clearly
‖UA‖_F^2 = \sum_{j=1}^{n} ‖Ua_j‖_2^2.
But we know that unitary matrices are norm preserving. Hence
‖Ua_j‖_2^2 = ‖a_j‖_2^2.
Thus
‖UA‖_F^2 = \sum_{j=1}^{n} ‖a_j‖_2^2 = ‖A‖_F^2
which implies
‖UA‖_F = ‖A‖_F.
Similarly, writing A in terms of its rows as
A = [r_1; . . . ; r_m]
we have
AV = [r_1 V; . . . ; r_m V].
Then applying lemma 1.120, clearly
‖AV‖_F^2 = \sum_{i=1}^{m} ‖r_i V‖_2^2.
But we know that unitary matrices are norm preserving. Hence
‖r_i V‖_2^2 = ‖r_i‖_2^2.
Thus
‖AV‖_F^2 = \sum_{i=1}^{m} ‖r_i‖_2^2 = ‖A‖_F^2
which implies
‖AV‖_F = ‖A‖_F.
An alternative proof of the second part, using the first part, takes just one line:
‖AV‖_F = ‖(AV)^H‖_F = ‖V^H A^H‖_F = ‖A^H‖_F = ‖A‖_F.
Here we use lemma 1.118, the fact that V^H is also a unitary matrix whenever V is, and the already established fact that pre-multiplication by a unitary matrix preserves the Frobenius norm.
Theorem 1.122 Let A ∈ C^{m×n} and B ∈ C^{n×P} be two matrices. Then the Frobenius norm of their product is less than or equal to the product of the Frobenius norms of the matrices themselves, i.e.
‖AB‖_F ≤ ‖A‖_F ‖B‖_F.   (1.8.9)

Proof. We can write A as
A = [a_1^T; . . . ; a_m^T]
where the a_i are m column vectors whose transposes are the rows of A. Similarly we can write B as
B = [b_1 . . . b_P]
where the b_i are the columns of B. Then
AB = [a_1^T; . . . ; a_m^T][b_1 . . . b_P] = [a_i^T b_j],
an m × P matrix with entries a_i^T b_j. Now looking carefully,
a_i^T b_j = ⟨a_i, b_j⟩.
Applying the Cauchy-Schwarz inequality we have
|⟨a_i, b_j⟩|^2 ≤ ‖a_i‖_2^2 ‖b_j‖_2^2.
Now
‖AB‖_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{P} |a_i^T b_j|^2 ≤ \sum_{i=1}^{m} \sum_{j=1}^{P} ‖a_i‖_2^2 ‖b_j‖_2^2 = ( \sum_{i=1}^{m} ‖a_i‖_2^2 ) ( \sum_{j=1}^{P} ‖b_j‖_2^2 ) = ‖A‖_F^2 ‖B‖_F^2
which implies
‖AB‖_F ≤ ‖A‖_F ‖B‖_F
by taking square roots on both sides.
Corollary 1.123. Let A ∈ C^{m×n} and let x ∈ C^n. Then
‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2.

Proof. We note that the Frobenius norm of a column matrix is the same as the l_2 norm of the corresponding column vector, i.e.
‖x‖_F = ‖x‖_2 ∀ x ∈ C^n.
Now applying theorem 1.122 we have
‖Ax‖_2 = ‖Ax‖_F ≤ ‖A‖_F ‖x‖_F = ‖A‖_F ‖x‖_2 ∀ x ∈ C^n.
It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 1.124 Let A ∈ C^{m×n} and let its singular value decomposition be
A = U Σ V^H.
Let the singular values of A be σ_1, . . . , σ_k with k = min(m, n). Then
‖A‖_F = ( \sum_{i=1}^{k} σ_i^2 )^{1/2}.   (1.8.10)

Proof.
A = U Σ V^H =⇒ ‖A‖_F = ‖U Σ V^H‖_F.
But
‖U Σ V^H‖_F = ‖Σ V^H‖_F = ‖Σ‖_F
since U and V are unitary matrices (see theorem 1.121).
Now the only non-zero entries in Σ are the singular values. Hence
‖A‖_F = ‖Σ‖_F = ( \sum_{i=1}^{k} σ_i^2 )^{1/2}.
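A one-line numerical check of lemma 1.124 (a sketch, assuming NumPy is available):

```python
import numpy as np

A = np.random.randn(6, 4)
s = np.linalg.svd(A, compute_uv=False)
# Frobenius norm equals the square root of the sum of squared singular values
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2))))
```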
1.8.3. Consistency of a matrix norm

Definition 1.45 A matrix norm ‖·‖ is called consistent on C^{n×n} if
‖AB‖ ≤ ‖A‖ ‖B‖   (1.8.11)
holds true for all A, B ∈ C^{n×n}. A matrix norm ‖·‖ is called consistent if it is defined on C^{m×n} for all m, n ∈ N and eq. (1.8.11) holds for all matrices A, B for which the product AB is defined.
A consistent matrix norm is also known as a sub-multiplicative norm.

With this definition and the result in theorem 1.122 we can see that the Frobenius norm is consistent.
1.8.4. Subordinate matrix norm
A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and the norms of vectors in the domain and co-domain of the matrix.

Definition 1.46 Let m, n ∈ N be given. Let ‖·‖_α be some norm on C^m and ‖·‖_β be some norm on C^n. Let ‖·‖ be some norm on matrices in C^{m×n}. We say that ‖·‖ is subordinate to the vector norms ‖·‖_α and ‖·‖_β if
‖Ax‖_α ≤ ‖A‖ ‖x‖_β   (1.8.12)
for all A ∈ C^{m×n} and for all x ∈ C^n. In other words, the length of the vector doesn't increase under the action of A beyond a factor given by the norm of the matrix itself.
If ‖·‖_α and ‖·‖_β are the same, we say that ‖·‖ is subordinate to the vector norm ‖·‖_α.

We have shown earlier in corollary 1.123 that the Frobenius norm is subordinate to the Euclidean norm.
1.8.5. Operator norm
We now consider the maximum factor by which a matrix A can increase the length of a vector.

Definition 1.47 Let m, n ∈ N be given. Let ‖·‖_α be some norm on C^n and ‖·‖_β be some norm on C^m. For A ∈ C^{m×n} we define
‖A‖ ≜ ‖A‖_{α→β} ≜ max_{x≠0} ‖Ax‖_β / ‖x‖_α.   (1.8.13)
The ratio ‖Ax‖_β / ‖x‖_α represents the factor by which the length of x is increased under the action of A. We simply pick the maximum value of this scaling factor.
The norm defined above is known as the (α → β) operator norm, the (α → β)-norm, or simply the α-norm if α = β.

Of course we need to verify that this definition satisfies all properties of a norm.
Clearly if A = 0 then Ax = 0 always, hence ‖A‖ = 0.
Conversely, if ‖A‖ = 0 then ‖Ax‖_β = 0 ∀ x ∈ C^n. In particular this is true for the unit vectors e_i ∈ C^n. The i-th column of A is given by Ae_i, which is therefore 0. Thus each column of A is 0, hence A = 0.
Now consider c ∈ C:
‖cA‖ = max_{x≠0} ‖cAx‖_β / ‖x‖_α = |c| max_{x≠0} ‖Ax‖_β / ‖x‖_α = |c| ‖A‖.
We now present some useful observations on the operator norm before we prove the triangle inequality for it.
For any x ∈ ker(A), Ax = 0, hence we only need to consider vectors which don't belong to the kernel of A.
Thus we can write
‖A‖_{α→β} = max_{x∉ker(A)} ‖Ax‖_β / ‖x‖_α.   (1.8.14)
We also note that
‖A(cx)‖_β / ‖cx‖_α = |c| ‖Ax‖_β / (|c| ‖x‖_α) = ‖Ax‖_β / ‖x‖_α ∀ c ≠ 0, x ≠ 0.
Thus, it is sufficient to find the maximum over unit norm vectors:
‖A‖_{α→β} = max_{‖x‖_α=1} ‖Ax‖_β.
Note that since ‖x‖_α = 1, the term in the denominator goes away.

Lemma 1.125 The (α → β)-operator norm is subordinate to the vector norms ‖·‖_α and ‖·‖_β, i.e.
‖Ax‖_β ≤ ‖A‖_{α→β} ‖x‖_α.   (1.8.15)

Proof. For x = 0 the inequality is trivially satisfied. For x ≠ 0, by definition we have
‖A‖_{α→β} ≥ ‖Ax‖_β / ‖x‖_α =⇒ ‖A‖_{α→β} ‖x‖_α ≥ ‖Ax‖_β.

Remark. There exists a vector x* ∈ C^n with unit norm (‖x*‖_α = 1) such that
‖A‖_{α→β} = ‖Ax*‖_β.   (1.8.16)

Proof. Let x ≠ 0 be some vector which maximizes the expression
‖Ax‖_β / ‖x‖_α.
Then
‖A‖_{α→β} = ‖Ax‖_β / ‖x‖_α.
Now consider x* = x / ‖x‖_α. Thus ‖x*‖_α = 1. We know that
‖Ax‖_β / ‖x‖_α = ‖Ax*‖_β.
Hence
‖A‖_{α→β} = ‖Ax*‖_β.
We are now ready to prove the triangle inequality for the operator norm.

Lemma 1.126 The operator norm as defined in definition 1.47 satisfies the triangle inequality.

Proof. Let A and B be matrices in C^{m×n}. Consider the operator norm of the matrix A + B. From the previous remark, there exists some vector x* ∈ C^n with ‖x*‖_α = 1 such that
‖A + B‖ = ‖(A + B)x*‖_β.
Now
‖(A + B)x*‖_β = ‖Ax* + Bx*‖_β ≤ ‖Ax*‖_β + ‖Bx*‖_β.
From another remark we have
‖Ax*‖_β ≤ ‖A‖ ‖x*‖_α = ‖A‖
and
‖Bx*‖_β ≤ ‖B‖ ‖x*‖_α = ‖B‖
since ‖x*‖_α = 1.
Hence we have
‖A + B‖ ≤ ‖A‖ + ‖B‖.

It turns out that the operator norm is also consistent under certain conditions.
Lemma 1.127 Let ‖·‖_α be defined over all m ∈ N and let ‖·‖_β = ‖·‖_α. Then the operator norm
‖A‖_α = max_{x≠0} ‖Ax‖_α / ‖x‖_α
is consistent.

Proof. We need to show that
‖AB‖_α ≤ ‖A‖_α ‖B‖_α.
Now
‖AB‖_α = max_{x≠0} ‖ABx‖_α / ‖x‖_α.
We note that if Bx = 0, then ABx = 0. Hence we can rewrite this as
‖AB‖_α = max_{Bx≠0} ‖ABx‖_α / ‖x‖_α.
Now if Bx ≠ 0 then ‖Bx‖_α ≠ 0. Hence
‖ABx‖_α / ‖x‖_α = (‖ABx‖_α / ‖Bx‖_α) (‖Bx‖_α / ‖x‖_α)
and
max_{Bx≠0} ‖ABx‖_α / ‖x‖_α ≤ ( max_{Bx≠0} ‖ABx‖_α / ‖Bx‖_α ) ( max_{Bx≠0} ‖Bx‖_α / ‖x‖_α ).
Clearly
‖B‖_α = max_{Bx≠0} ‖Bx‖_α / ‖x‖_α.
Furthermore
max_{Bx≠0} ‖ABx‖_α / ‖Bx‖_α ≤ max_{y≠0} ‖Ay‖_α / ‖y‖_α = ‖A‖_α.
Thus we have
‖AB‖_α ≤ ‖A‖_α ‖B‖_α.
1.8.6. p-norm for matrices
We recall the definition of the l_p norms for vectors x ∈ C^n:
‖x‖_p = ( \sum_{i=1}^{n} |x_i|^p )^{1/p} for p ∈ [1, ∞), and ‖x‖_∞ = max_{1≤i≤n} |x_i|.
The operator norms ‖·‖_p defined from the l_p vector norms are of specific interest.

Definition 1.48 The p-norm for a matrix A ∈ C^{m×n} is defined as
‖A‖_p ≜ max_{x≠0} ‖Ax‖_p / ‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_p   (1.8.17)
where ‖x‖_p is the standard l_p norm for vectors in C^m and C^n.

Remark. As per lemma 1.127, p-norms for matrices are consistent norms. They are also subordinate to the l_p vector norms.
Special cases are considered for p = 1, 2 and ∞.
Theorem 1.128 Let A ∈ C^{m×n}.
For p = 1 we have
‖A‖_1 ≜ max_{1≤j≤n} \sum_{i=1}^{m} |a_{ij}|.   (1.8.18)
This is also known as the max column sum norm.
For p = ∞ we have
‖A‖_∞ ≜ max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}|.   (1.8.19)
This is also known as the max row sum norm.
Finally for p = 2 we have
‖A‖_2 ≜ σ_1   (1.8.20)
where σ_1 is the largest singular value of A. This is also known as the spectral norm.
Proof. Let
A = [a_1 . . . a_n].
Then
‖Ax‖_1 = ‖\sum_{j=1}^{n} x_j a_j‖_1 ≤ \sum_{j=1}^{n} ‖x_j a_j‖_1 = \sum_{j=1}^{n} |x_j| ‖a_j‖_1 ≤ ( max_{1≤j≤n} ‖a_j‖_1 ) \sum_{j=1}^{n} |x_j| = ( max_{1≤j≤n} ‖a_j‖_1 ) ‖x‖_1.
Thus,
‖A‖_1 = max_{x≠0} ‖Ax‖_1 / ‖x‖_1 ≤ max_{1≤j≤n} ‖a_j‖_1,
which is the maximum column sum. We need to show that this upper bound is indeed attained.
Indeed for x = e_j, where e_j is the unit vector with 1 in the j-th entry and 0 elsewhere,
‖Ae_j‖_1 = ‖a_j‖_1.
Thus
‖A‖_1 ≥ ‖a_j‖_1 ∀ 1 ≤ j ≤ n.
Combining the two, we see that
‖A‖_1 = max_{1≤j≤n} ‖a_j‖_1.
For p = ∞, we proceed as follows:
‖Ax‖_∞ = max_{1≤i≤m} |\sum_{j=1}^{n} a_{ij} x_j| ≤ max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}||x_j| ≤ ( max_{1≤j≤n} |x_j| ) ( max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}| ) = ‖x‖_∞ max_{1≤i≤m} ‖a^i‖_1
where the a^i are the rows of A.
This shows that
‖A‖_∞ ≤ max_{1≤i≤m} ‖a^i‖_1.
We need to show that this is indeed an equality.
Fix an i = k and choose x such that
x_j = sgn(a_{kj}).
Clearly ‖x‖_∞ = 1. Then
‖Ax‖_∞ = max_{1≤i≤m} |\sum_{j=1}^{n} a_{ij} x_j| ≥ |\sum_{j=1}^{n} a_{kj} x_j| = \sum_{j=1}^{n} |a_{kj}| = ‖a^k‖_1.
Thus,
‖A‖_∞ ≥ max_{1≤i≤m} ‖a^i‖_1.
Combining the two inequalities we get
‖A‖_∞ = max_{1≤i≤m} ‖a^i‖_1.
The remaining case is p = 2.
For any vector x with ‖x‖_2 = 1,
‖Ax‖_2 = ‖U Σ V^H x‖_2 = ‖U(Σ V^H x)‖_2 = ‖Σ V^H x‖_2
since the l_2 norm is invariant under unitary transformations.
Let v = V^H x. Then ‖v‖_2 = ‖V^H x‖_2 = ‖x‖_2 = 1.
Now
‖Ax‖_2 = ‖Σv‖_2 = ( \sum_{j} |σ_j v_j|^2 )^{1/2} ≤ σ_1 ( \sum_{j} |v_j|^2 )^{1/2} = σ_1 ‖v‖_2 = σ_1.
This shows that
‖A‖_2 ≤ σ_1.
Now consider the vector x for which v = V^H x = (1, 0, . . . , 0). Then
‖Ax‖_2 = ‖Σv‖_2 = σ_1.
Thus
‖A‖_2 ≥ σ_1.
Combining the two, we get ‖A‖_2 = σ_1.
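The three closed forms of theorem 1.128 can be checked directly; numpy.linalg.norm accepts 1, 2 and np.inf as the ord argument for exactly these induced norms. A sketch assuming NumPy:

```python
import numpy as np

A = np.random.randn(5, 4)

one_norm = np.abs(A).sum(axis=0).max()              # max column sum
inf_norm = np.abs(A).sum(axis=1).max()              # max row sum
two_norm = np.linalg.svd(A, compute_uv=False)[0]    # largest singular value

print(np.isclose(one_norm, np.linalg.norm(A, 1)))
print(np.isclose(inf_norm, np.linalg.norm(A, np.inf)))
print(np.isclose(two_norm, np.linalg.norm(A, 2)))
```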
1.8.7. The 2-norm
Theorem 1.129 Let A ∈ C^{n×n} have singular values σ_1 ≥ σ_2 ≥ · · · ≥ σ_n and eigen values λ_1, λ_2, . . . , λ_n with |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n|. Then the following hold:
‖A‖_2 = σ_1   (1.8.21)
and if A is non-singular
‖A^{-1}‖_2 = 1/σ_n.   (1.8.22)
If A is symmetric and positive definite, then
‖A‖_2 = λ_1   (1.8.23)
and if A is non-singular
‖A^{-1}‖_2 = 1/λ_n.   (1.8.24)
If A is normal then
‖A‖_2 = |λ_1|   (1.8.25)
and if A is non-singular
‖A^{-1}‖_2 = 1/|λ_n|.   (1.8.26)
1.8.8. Unitary invariant norms
Definition 1.49 A matrix norm ‖·‖ on C^{m×n} is called unitary invariant if ‖UAV‖ = ‖A‖ for any A ∈ C^{m×n} and any unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n}.

We have already seen in theorem 1.121 that the Frobenius norm is unitary invariant.
It turns out that the spectral norm is also unitary invariant.
1.8.9. More properties of operator norms
In this section we will focus on operator norms connecting the normed linear spaces (C^n, ‖·‖_p) and (C^m, ‖·‖_q). Typical values of p, q would be in {1, 2, ∞}.
We recall that
‖A‖_{p→q} = max_{x≠0} ‖Ax‖_q / ‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_q = max_{‖x‖_p≤1} ‖Ax‖_q.   (1.8.27)
Table 1 [5] shows how to compute different (p → q) norms. Some can be computed easily while others are NP-hard to compute.

Table 1. Typical (p → q) norms
p    q    ‖A‖_{p→q}       Calculation
1    1    ‖A‖_1           Maximum l_1 norm of a column
1    2    ‖A‖_{1→2}       Maximum l_2 norm of a column
1    ∞    ‖A‖_{1→∞}       Maximum absolute entry of the matrix
2    1    ‖A‖_{2→1}       NP-hard
2    2    ‖A‖_2           Maximum singular value
2    ∞    ‖A‖_{2→∞}       Maximum l_2 norm of a row
∞    1    ‖A‖_{∞→1}       NP-hard
∞    2    ‖A‖_{∞→2}       NP-hard
∞    ∞    ‖A‖_∞           Maximum l_1 norm of a row
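The tractable entries of table 1 reduce to simple column or row reductions. The following sketch (assuming NumPy is available) evaluates the entries that are not NP-hard:

```python
import numpy as np

A = np.random.randn(4, 3)
col_l2 = np.linalg.norm(A, axis=0)   # l2 norm of each column
row_l2 = np.linalg.norm(A, axis=1)   # l2 norm of each row

norm_1_to_2   = col_l2.max()                 # ||A||_{1->2}: max l2 norm of a column
norm_2_to_inf = row_l2.max()                 # ||A||_{2->inf}: max l2 norm of a row
norm_1_to_inf = np.abs(A).max()              # ||A||_{1->inf}: max absolute entry
norm_1        = np.abs(A).sum(axis=0).max()  # ||A||_1: max column sum
norm_inf      = np.abs(A).sum(axis=1).max()  # ||A||_inf: max row sum
norm_2        = np.linalg.norm(A, 2)         # ||A||_2: largest singular value
print(norm_1_to_2, norm_2_to_inf, norm_1_to_inf, norm_1, norm_inf, norm_2)
```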
The topological dual of the finite dimensional normed linear space (C^n, ‖·‖_p) is the normed linear space (C^n, ‖·‖_{p'}) where
1/p + 1/p' = 1.
The l_2-norm is its own dual (it is self dual), while the l_1-norm and the l_∞-norm are duals of each other.
When a matrix A maps from the space (C^n, ‖·‖_p) to the space (C^m, ‖·‖_q), we can view its conjugate transpose A^H as a mapping from the space (C^m, ‖·‖_{q'}) to (C^n, ‖·‖_{p'}).

Theorem 1.130 The operator norm of a matrix always equals the operator norm of its conjugate transpose, i.e.
‖A‖_{p→q} = ‖A^H‖_{q'→p'}   (1.8.28)
where
1/p + 1/p' = 1 and 1/q + 1/q' = 1.
Specific applications of this result are:
‖A‖_2 = ‖A^H‖_2.   (1.8.29)
This is obvious since the maximum singular values of a matrix and of its conjugate transpose are the same.
‖A‖_1 = ‖A^H‖_∞,  ‖A‖_∞ = ‖A^H‖_1.   (1.8.30)
This is also obvious since the max column sum of A is the same as the max row sum of A^H and vice versa.
‖A‖_{1→∞} = ‖A^H‖_{1→∞}.   (1.8.31)
‖A‖_{1→2} = ‖A^H‖_{2→∞}.   (1.8.32)
‖A‖_{∞→2} = ‖A^H‖_{2→1}.   (1.8.33)
We now need to show the result for the general case (arbitrary 1 ≤ p, q ≤ ∞).

Proof. TODO

Theorem 1.131
‖A‖_{1→p} = max_{1≤j≤n} ‖a_j‖_p   (1.8.34)
where
A = [a_1 . . . a_n].
Proof.
‖Ax‖_p = ‖\sum_{j=1}^{n} x_j a_j‖_p ≤ \sum_{j=1}^{n} ‖x_j a_j‖_p = \sum_{j=1}^{n} |x_j| ‖a_j‖_p ≤ ( max_{1≤j≤n} ‖a_j‖_p ) \sum_{j=1}^{n} |x_j| = ( max_{1≤j≤n} ‖a_j‖_p ) ‖x‖_1.
Thus,
‖A‖_{1→p} = max_{x≠0} ‖Ax‖_p / ‖x‖_1 ≤ max_{1≤j≤n} ‖a_j‖_p.
We need to show that this upper bound is indeed attained.
Indeed for x = e_j, where e_j is the unit vector with 1 in the j-th entry and 0 elsewhere,
‖Ae_j‖_p = ‖a_j‖_p.
Thus
‖A‖_{1→p} ≥ ‖a_j‖_p ∀ 1 ≤ j ≤ n.
Combining the two, we see that
‖A‖_{1→p} = max_{1≤j≤n} ‖a_j‖_p.
Theorem 1.132
‖A‖_{p→∞} = max_{1≤i≤m} ‖a^i‖_q   (1.8.35)
where the a^i are the rows of A and
1/p + 1/q = 1.

Proof. Using theorem 1.130, we get
‖A‖_{p→∞} = ‖A^H‖_{1→q}.
Using theorem 1.131 applied to A^H, whose columns are the (conjugated) rows of A, we get
‖A^H‖_{1→q} = max_{1≤i≤m} ‖a^i‖_q.
This completes the proof.
Theorem 1.133 For two matrices A and B and p ≥ 1, we have
‖AB‖_{p→q} ≤ ‖B‖_{p→s} ‖A‖_{s→q}.   (1.8.36)

Proof. We start with
‖AB‖_{p→q} = max_{‖x‖_p=1} ‖A(Bx)‖_q.
From lemma 1.125, we obtain
‖A(Bx)‖_q ≤ ‖A‖_{s→q} ‖Bx‖_s.
Thus,
‖AB‖_{p→q} ≤ ‖A‖_{s→q} max_{‖x‖_p=1} ‖Bx‖_s = ‖A‖_{s→q} ‖B‖_{p→s}.

Theorem 1.134 For two matrices A and B and p ≥ 1, we have
‖AB‖_{p→∞} ≤ ‖A‖_{∞→∞} ‖B‖_{p→∞}.   (1.8.37)

Proof. We start with
‖AB‖_{p→∞} = max_{‖x‖_p=1} ‖A(Bx)‖_∞.
From lemma 1.125, we obtain
‖A(Bx)‖_∞ ≤ ‖A‖_{∞→∞} ‖Bx‖_∞.
Thus,
‖AB‖_{p→∞} ≤ ‖A‖_{∞→∞} max_{‖x‖_p=1} ‖Bx‖_∞ = ‖A‖_{∞→∞} ‖B‖_{p→∞}.
Theorem 1.135
‖A‖_{p→∞} ≤ ‖A‖_{p→p}.   (1.8.38)
In particular
‖A‖_{1→∞} ≤ ‖A‖_1.   (1.8.39)
‖A‖_{2→∞} ≤ ‖A‖_2.   (1.8.40)

Proof. Choosing q = ∞ and s = p and applying theorem 1.133,
‖IA‖_{p→∞} ≤ ‖A‖_{p→p} ‖I‖_{p→∞}.
But ‖I‖_{p→∞} is the maximum l_p norm of any row of I, which is 1. Thus
‖A‖_{p→∞} ≤ ‖A‖_{p→p}.

Consider the expression
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p.   (1.8.41)
Here z ∈ C(A^H), z ≠ 0 means there exists some vector u ∉ ker(A^H) such that z = A^H u. This expression measures the factor by which the non-singular part of A can decrease the length of a vector.

Theorem 1.136 [5] The following bound holds for every matrix A:
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p ≥ ( ‖A†‖_{q→p} )^{-1}.   (1.8.42)
If A is surjective (onto), then equality holds. When A is bijective (one-one, onto, square, invertible), the result implies
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p = ( ‖A^{-1}‖_{q→p} )^{-1}.   (1.8.43)
Proof. The spaces C(A^H) and C(A) have the same dimension, given by rank(A). We recall that A†A is the projector onto the row space C(A^H) of A, so that
w = Az ⇐⇒ z = A†w = A†Az ∀ z ∈ C(A^H).
As a result we can write
‖z‖_p / ‖Az‖_q = ‖A†w‖_p / ‖w‖_q
whenever z ∈ C(A^H). Now
( min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p )^{-1} = max_{z ∈ C(A^H), z≠0} ‖z‖_p / ‖Az‖_q = max_{w ∈ C(A), w≠0} ‖A†w‖_p / ‖w‖_q ≤ max_{w≠0} ‖A†w‖_p / ‖w‖_q.
When A is surjective, C(A) = C^m. Hence
max_{w ∈ C(A), w≠0} ‖A†w‖_p / ‖w‖_q = max_{w≠0} ‖A†w‖_p / ‖w‖_q
and the inequality becomes an equality. Finally
max_{w≠0} ‖A†w‖_p / ‖w‖_q = ‖A†‖_{q→p}
which completes the proof.
1.8.10. Row column norms
Definition 1.50 Let A be an m × n matrix with rows a^i, i.e.
A = [a^1; . . . ; a^m].
Then we define
‖A‖_{p,∞} ≜ max_{1≤i≤m} ‖a^i‖_p = max_{1≤i≤m} ( \sum_{j=1}^{n} |a^i_j|^p )^{1/p}   (1.8.44)
where 1 ≤ p < ∞, i.e. we take the p-norm of every row vector and then pick the maximum.
We define
‖A‖_{∞,∞} = max_{i,j} |a_{ij}|.   (1.8.45)
This is equivalent to taking the l_∞ norm of each row and then taking the maximum of all these norms.
For 1 ≤ p, q < ∞, we define the norm
‖A‖_{p,q} ≜ ( \sum_{i=1}^{m} ‖a^i‖_p^q )^{1/q},   (1.8.46)
i.e. we compute the p-norm of all the row vectors to form another vector and then take the q-norm of that vector.
Note that the norm ‖A‖_{p,∞} is different from the operator norm ‖A‖_{p→∞}. Similarly ‖A‖_{p,q} is different from ‖A‖_{p→q}.
Theorem 1.137
‖A‖_{p,∞} = ‖A‖_{q→∞}   (1.8.47)
where
1/p + 1/q = 1.

Proof. From theorem 1.132 we get
‖A‖_{q→∞} = max_{1≤i≤m} ‖a^i‖_p.
This is exactly the definition of ‖A‖_{p,∞}.

Theorem 1.138
‖A‖_{1→p} = ‖A^H‖_{p,∞}.   (1.8.48)

Proof. By theorem 1.130,
‖A‖_{1→p} = ‖A^H‖_{q→∞}
where 1/p + 1/q = 1. From theorem 1.137,
‖A^H‖_{q→∞} = ‖A^H‖_{p,∞}.
Theorem 1.139 For any two matrices A, B, we have
‖AB‖_{p,∞} / ‖B‖_{p,∞} ≤ ‖A‖_{∞→∞}.   (1.8.49)

Proof. Let q be such that 1/p + 1/q = 1. From theorem 1.134, we have
‖AB‖_{q→∞} ≤ ‖A‖_{∞→∞} ‖B‖_{q→∞}.
From theorem 1.137,
‖AB‖_{q→∞} = ‖AB‖_{p,∞}
and
‖B‖_{q→∞} = ‖B‖_{p,∞}.
Thus
‖AB‖_{p,∞} ≤ ‖A‖_{∞→∞} ‖B‖_{p,∞}.

Theorem 1.140 Relations between (p, q) norms and (p → q) norms:
‖A‖_{1,∞} = ‖A‖_{∞→∞}   (1.8.50)
‖A‖_{2,∞} = ‖A‖_{2→∞}   (1.8.51)
‖A‖_{∞,∞} = ‖A‖_{1→∞}   (1.8.52)
‖A‖_{1→1} = ‖A^H‖_{1,∞}   (1.8.53)
‖A‖_{1→2} = ‖A^H‖_{2,∞}   (1.8.54)

Proof. The first three are straightforward applications of theorem 1.137. The next two are applications of theorem 1.138. See also table 1.
1.8.11. Block diagonally dominant matrices and generalized
Gershgorin disc theorem
In [1] the idea of diagonally dominant matrices (see section 1.6.9) has
been generalized to block matrices using matrix norms. We consider
the specific case with spectral norm.
Definition 1.51 [Block diagonally dominant matrix] Let A be a square matrix in C^{n×n} which is partitioned in the following manner:
A = [A_{11} A_{12} . . . A_{1k}; A_{21} A_{22} . . . A_{2k}; . . . ; A_{k1} A_{k2} . . . A_{kk}]   (1.8.56)
where each of the submatrices A_{ij} is a square matrix of size m × m. Thus n = km.
A is called block diagonally dominant if
‖A_{ii}‖_2 ≥ \sum_{j≠i} ‖A_{ij}‖_2
holds true for all 1 ≤ i ≤ k. If the inequality holds strictly for all i, then A is called a block strictly diagonally dominant matrix.

Theorem 1.141 If the partitioned matrix A of definition 1.51 is block strictly diagonally dominant, then it is non-singular.
For a proof see [1].

This leads to the generalized Gershgorin disc theorem.

Theorem 1.142 Let A be a square matrix in C^{n×n} which is partitioned as in (1.8.56), i.e.
A = [A_{11} A_{12} . . . A_{1k}; A_{21} A_{22} . . . A_{2k}; . . . ; A_{k1} A_{k2} . . . A_{kk}]   (1.8.57)
where each of the submatrices A_{ij} is a square matrix of size m × m. Then each eigen value λ of A satisfies
‖λI − A_{ii}‖_2 ≤ \sum_{j≠i} ‖A_{ij}‖_2 for some i ∈ {1, 2, . . . , k}.   (1.8.58)
For a proof see [1].

Since the 2-norm of a Hermitian positive semidefinite matrix is nothing but its largest eigen value, the theorem applies directly.

Corollary 1.143. Let A be a Hermitian positive semidefinite matrix which is partitioned as in theorem 1.142. Then its 2-norm ‖A‖_2 satisfies
| ‖A‖_2 − ‖A_{ii}‖_2 | ≤ \sum_{j≠i} ‖A_{ij}‖_2 for some i ∈ {1, 2, . . . , k}.   (1.8.59)
1.9. Miscellaneous topics
1.9.1. Hadamard product
Standard linear algebra books usually don't dwell much on element-wise (component-wise) products of vectors or matrices. Yet in certain contexts and algorithms, such products are quite useful. We define the notation in this section. For further details see [3], [2] and [4].
Definition 1.52 The Hadamard product of two matrices A = [a_{ij}] and B = [b_{ij}] with the same dimensions (not necessarily square) and with entries in a given ring R is the entry-wise product A ◦ B ≜ [a_{ij} b_{ij}], which has the same dimensions as A and B.

Example 1.3: Hadamard product. Let
A = [1 2; 3 4] and B = [5 −6; 7 −3].
Then
A ◦ B = [5 −12; 21 −12].
The Hadamard product is associative and distributive over addition. It is also commutative.
Naturally it can also be defined for column vectors and row vectors.
The reason why this product is rarely mentioned in linear algebra texts is that it is inherently basis dependent. Nevertheless it has a number of uses in statistics and analysis.
In analysis, a similar concept is the point-wise product of functions, defined as
(f · g)(x) = f(x) g(x).
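In NumPy the Hadamard product is simply the element-wise * operator. A sketch reproducing example 1.3:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, -6],
              [7, -3]])

print(A * B)   # [[ 5 -12] [21 -12]]  -- the Hadamard product A ∘ B
# the element-wise product is commutative, unlike the matrix product A @ B
print(np.array_equal(A * B, B * A))
```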
1.10. Digest
1.10.1. Norms
All norms are equivalent.
Sum norm
‖A‖_S = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.
Frobenius norm
‖A‖_F = ( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 )^{1/2}.
Max norm
‖A‖_M = max_{1≤i≤m, 1≤j≤n} |a_{ij}|.
Frobenius norm of the Hermitian transpose
‖A^H‖_F = ‖A‖_F.
Frobenius norm as a sum of norms of column vectors
‖A‖_F^2 = \sum_{j=1}^{n} ‖a_j‖_2^2.
Frobenius norm as a sum of norms of row vectors
‖A‖_F^2 = \sum_{i=1}^{m} ‖a^i‖_2^2.
Frobenius norm invariance w.r.t. unitary matrices
‖UA‖_F = ‖A‖_F,  ‖AV‖_F = ‖A‖_F.
Frobenius norm is consistent:
‖AB‖_F ≤ ‖A‖_F ‖B‖_F.
Corollary 1.123
‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2.
Frobenius norm and singular values
‖A‖_F = ( \sum_{i} σ_i^2 )^{1/2}.
Consistent norms
‖AB‖ ≤ ‖A‖ ‖B‖,
also known as sub-multiplicative norms.
Subordinate matrix norm
‖Ax‖_α ≤ ‖A‖ ‖x‖_β.
(α → β) operator norm
‖A‖ ≜ ‖A‖_{α→β} ≜ max_{x≠0} ‖Ax‖_β / ‖x‖_α.
‖A‖_{α→β} = max_{x∉ker(A)} ‖Ax‖_β / ‖x‖_α = max_{‖x‖_α=1} ‖Ax‖_β.
The (α → β) norm is subordinate:
‖Ax‖_β ≤ ‖A‖_{α→β} ‖x‖_α.
There exists a unit norm vector x* such that
‖A‖_{α→β} = ‖Ax*‖_β.
(α → α)-norms are consistent:
‖A‖_α = max_{x≠0} ‖Ax‖_α / ‖x‖_α,  ‖AB‖_α ≤ ‖A‖_α ‖B‖_α.
p-norm
‖A‖_p ≜ max_{x≠0} ‖Ax‖_p / ‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_p.
Closed form p-norms
‖A‖_1 ≜ max_{1≤j≤n} \sum_{i=1}^{m} |a_{ij}|.
‖A‖_∞ ≜ max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}|.
2-norm
‖A‖_2 = σ_1;  if A is non-singular, ‖A^{-1}‖_2 = 1/σ_n.
If A is symmetric and positive definite,
‖A‖_2 = λ_1;  if A is non-singular, ‖A^{-1}‖_2 = 1/λ_n.
If A is normal,
‖A‖_2 = |λ_1|;  if A is non-singular, ‖A^{-1}‖_2 = 1/|λ_n|.
Unitary invariant norm: ‖UAV‖ = ‖A‖ for any A ∈ C^{m×n} and any unitary U and V.
Typical (p → q) norms: see table 1.
Dual norm and conjugate transpose
‖A‖_{p→q} = ‖A^H‖_{q'→p'} with 1/p + 1/p' = 1 and 1/q + 1/q' = 1.
‖A‖_2 = ‖A^H‖_2.
‖A‖_1 = ‖A^H‖_∞,  ‖A‖_∞ = ‖A^H‖_1.
‖A‖_{1→∞} = ‖A^H‖_{1→∞}.
‖A‖_{1→2} = ‖A^H‖_{2→∞}.
‖A‖_{∞→2} = ‖A^H‖_{2→1}.
The 1 → p norm
‖A‖_{1→p} = max_{1≤j≤n} ‖a_j‖_p.
The p → ∞ norm
‖A‖_{p→∞} = max_{1≤i≤m} ‖a^i‖_q with 1/p + 1/q = 1.
Consistency of the p → q norm
‖AB‖_{p→q} ≤ ‖B‖_{p→s} ‖A‖_{s→q}.
Consistency of the p → ∞ norm
‖AB‖_{p→∞} ≤ ‖A‖_{∞→∞} ‖B‖_{p→∞}.
Dominance of the p → ∞ norm by the p → p norm
‖A‖_{p→∞} ≤ ‖A‖_{p→p},  ‖A‖_{1→∞} ≤ ‖A‖_1,  ‖A‖_{2→∞} ≤ ‖A‖_2.
Restricted minimum property
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p ≥ ( ‖A†‖_{q→p} )^{-1}.
If A is surjective (onto), equality holds. When A is bijective,
min_{z ∈ C(A^H), z≠0} ‖Az‖_q / ‖z‖_p = ( ‖A^{-1}‖_{q→p} )^{-1}.
Row column norms
‖A‖_{p,∞} ≜ max_{1≤i≤m} ‖a^i‖_p = max_{1≤i≤m} ( \sum_{j=1}^{n} |a^i_j|^p )^{1/p}.
‖A‖_{∞,∞} = max_{i,j} |a_{ij}|.
‖A‖_{p,q} ≜ ( \sum_{i=1}^{m} ‖a^i‖_p^q )^{1/q}.
Row column norm and the p → ∞ norm
‖A‖_{p,∞} = ‖A‖_{q→∞} with 1/p + 1/q = 1.
Consistency of the (p, ∞) norm
‖AB‖_{p,∞} / ‖B‖_{p,∞} ≤ ‖A‖_{∞→∞}.
Relations between (p, q) norms and (p → q) norms
‖A‖_{1,∞} = ‖A‖_{∞→∞}
‖A‖_{2,∞} = ‖A‖_{2→∞}
‖A‖_{∞,∞} = ‖A‖_{1→∞}
‖A‖_{1→1} = ‖A^H‖_{1,∞}
‖A‖_{1→2} = ‖A^H‖_{2,∞}
Bibliography
[1] David G. Feingold and Richard S. Varga. Block diagonally dominant matrices and generalizations of the Gerschgorin circle theorem. Pacific J. Math., 12(4):1241–1250, 1962.
[2] Roger A. Horn. The Hadamard product. In Proc. Symp. Appl. Math., volume 40, pages 87–169, 1990.
[3] Elizabeth Million. The Hadamard product, 2007.
[4] George P. H. Styan. Hadamard products and multivariate statistical analysis. Linear Algebra and Its Applications, 6:217–240, 1973.
[5] Joel A. Tropp. Just relax: Convex programming methods for subset selection and sparse approximation. 2004.
Some notes on Matrix Algebra
Some notes on Matrix Algebra

More Related Content

What's hot

Newton's 3rd Law of Motion
Newton's 3rd Law of MotionNewton's 3rd Law of Motion
Newton's 3rd Law of MotionMrsJenner
 
Importance of Normalization
Importance of NormalizationImportance of Normalization
Importance of NormalizationShwe Yee
 
Compoutational Physics
Compoutational PhysicsCompoutational Physics
Compoutational PhysicsSaad Shaukat
 

What's hot (6)

1.1.2 HEXADECIMAL
1.1.2 HEXADECIMAL1.1.2 HEXADECIMAL
1.1.2 HEXADECIMAL
 
Newton's 3rd Law of Motion
Newton's 3rd Law of MotionNewton's 3rd Law of Motion
Newton's 3rd Law of Motion
 
Importance of Normalization
Importance of NormalizationImportance of Normalization
Importance of Normalization
 
MySql slides (ppt)
MySql slides (ppt)MySql slides (ppt)
MySql slides (ppt)
 
Compoutational Physics
Compoutational PhysicsCompoutational Physics
Compoutational Physics
 
SQL JOINS
SQL JOINSSQL JOINS
SQL JOINS
 

Similar to Some notes on Matrix Algebra

Matrices & determinants
Matrices & determinantsMatrices & determinants
Matrices & determinantsindu thakur
 
Invertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptxInvertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptxIkhlaqAhmad18
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
Matrices
MatricesMatrices
MatricesNORAIMA
 
MATRICES.pdf
MATRICES.pdfMATRICES.pdf
MATRICES.pdfMahatoJee
 
intruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptxintruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptxShaukatAliChaudhry1
 
systems of linear equations & matrices
systems of linear equations & matricessystems of linear equations & matrices
systems of linear equations & matricesStudent
 
Engg maths k notes(4)
Engg maths k notes(4)Engg maths k notes(4)
Engg maths k notes(4)Ranjay Kumar
 
Chapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By PearsonChapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By PearsonChaimae Baroudi
 
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeksBeginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeksJinTaek Seo
 
Matrices y determinants
Matrices y determinantsMatrices y determinants
Matrices y determinantsJeannie
 
Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Rai University
 

Similar to Some notes on Matrix Algebra (20)

Matrices & determinants
Matrices & determinantsMatrices & determinants
Matrices & determinants
 
Invertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptxInvertible Matrix and Factorization.pptx
Invertible Matrix and Factorization.pptx
 
Matrices
MatricesMatrices
Matrices
 
Matrices
MatricesMatrices
Matrices
 
Matrices
MatricesMatrices
Matrices
 
Matrices
MatricesMatrices
Matrices
 
MATRICES.pdf
MATRICES.pdfMATRICES.pdf
MATRICES.pdf
 
Matrix_PPT.pptx
Matrix_PPT.pptxMatrix_PPT.pptx
Matrix_PPT.pptx
 
intruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptxintruduction to Matrix in discrete structures.pptx
intruduction to Matrix in discrete structures.pptx
 
Matrix_PPT.pptx
Matrix_PPT.pptxMatrix_PPT.pptx
Matrix_PPT.pptx
 
Unit i
Unit iUnit i
Unit i
 
Maths
MathsMaths
Maths
 
Matrices
MatricesMatrices
Matrices
 
systems of linear equations & matrices
systems of linear equations & matricessystems of linear equations & matrices
systems of linear equations & matrices
 
Engg maths k notes(4)
Engg maths k notes(4)Engg maths k notes(4)
Engg maths k notes(4)
 
Matrix.
Matrix.Matrix.
Matrix.
 
Chapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By PearsonChapter 4: Vector Spaces - Part 4/Slides By Pearson
Chapter 4: Vector Spaces - Part 4/Slides By Pearson
 
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeksBeginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
Beginning direct3d gameprogrammingmath05_matrices_20160515_jintaeks
 
Matrices y determinants
Matrices y determinantsMatrices y determinants
Matrices y determinants
 
Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -
 

Recently uploaded

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 

Recently uploaded (20)

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

Some notes on Matrix Algebra

  • 1. CHAPTER 1 Matrix Algebra In this chapter we collect results related to matrix algebra which are relevant to this book. Some specific topics which are typically not found in standard books are also covered here. 1.1. Preliminaries Standard notation in this chapter is given here. Matrices are denoted by capital letters A, B etc.. They can be rectangular with m rows and n columns. Their elements or entries are referred to with small letters aij, bij etc. where i denotes the i-th row of matrix and j denotes the j-th column of matrix. Thus A =       a11 a12 . . . a1n a21 a22 . . . a1n ... ... ... ... am1 am2 . . . amn       Mostly we consider complex matrices belonging to Cm×n . Sometimes we will restrict our attention to real matrices belonging to Rm×n . Definition 1.1 [Square matrix] An m×n matrix is called square matrix if m = n. Definition 1.2 [Tall matrix] An m × n matrix is called tall ma- trix if m > n i.e. the number of rows is greater than columns. 1
  • 2. 2 1. MATRIX ALGEBRA Definition 1.3 [Wide matrix] An m × n matrix is called wide matrix if m < n i.e. the number of columns is greater than rows. Definition 1.4 [Main diagonal] Let A = [aij] be an m×n matrix. The main diagonal consists of entries aij where i = j. i.e. main diagonal is {a11, a22, . . . , akk} where k = min(m, n). Main diagonal is also known as leading diagonal, major diagonal primary diagonal or principal diagonal. The entries of A which are not on the main diagonal are known as off diagonal entries. Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix (usually a square matrix) whose entries outside the main diagonal are zero. Whenever we refer to a diagonal matrix which is not square, we will use the term rectangular diagonal matrix. A square diagonal matrix A is also represented by diag(a11, a22, . . . , ann) which lists only the diagonal (non-zero) entries in A. The transpose of a matrix A is denoted by AT while the Hermitian transpose is denoted by AH . For real matrices AT = AH . When matrices are square, we have the number of rows and columns both equal to n and they belong to Cn×n . If not specified, the square matrices will be of size n×n and rectangular matrices will be of size m×n. If not specified the vectors (column vec- tors) will be of size n×1 and belong to either Rn or Cn . Corresponding row vectors will be of size 1 × n. For statements which are valid both for real and complex matrices, sometimes we might say that matrices belong to Fm×n while the scalars belong to F and vectors belong to Fn where F refers to either the field of real numbers or the field of complex numbers. Note that this is not
  • 3. 1.1. PRELIMINARIES 3 consistently followed at the moment. Most results are written only for Cm×n while still being applicable for Rm×n . Identity matrix for Fn×n is denoted as In or simply I whenever the size is clear from context. Sometimes we will write a matrix in terms of its column vectors. We will use the notation A = a1 a2 . . . an indicating n columns. When we write a matrix in terms of its row vectors, we will use the notation A =       aT 1 aT 2 ... aT m       indicating m rows with ai being column vectors whose transposes form the rows of A. The rank of a matrix A is written as rank(A), while the determinant as det(A) or |A|. We say that an m × n matrix A is left-invertible if there exists an n × m matrix B such that BA = I. We say that an m × n matrix A is right-invertible if there exists an n × m matrix B such that AB = I. We say that a square matrix A is invertible when there exists another square matrix B of same size such that AB = BA = I. A square matrix is invertible iff its both left and right invertible. Inverse of a square invertible matrix is denoted by A−1 . A special left or right inverse is the pseudo inverse, which is denoted by A† . Column space of a matrix is denoted by C(A), the null space by N(A), and the row space by R(A).
  • 4. 4 1. MATRIX ALGEBRA We say that a matrix is symmetric when A = AT , conjugate sym- metric or Hermitian when AH = A. When a square matrix is not invertible, we say that it is singular. A non-singular matrix is invertible. The eigen values of a square matrix are written as λ1, λ2, . . . while the singular values of a rectangular matrix are written as σ1, σ2, . . . . The inner product or dot product of two column / row vectors u and v belonging to Rn is defined as u · v = u, v = n i=1 uivi. (1.1.1) The inner product or dot product of two column / row vectors u and v belonging to Cn is defined as u · v = u, v = n i=1 uivi. (1.1.2) 1.1.1. Block matrix Definition 1.6 A block matrix is a matrix whose entries them- selves are matrices with following constraints (1) Entries in every row are matrices with same number of rows. (2) Entries in every column are matrices with same number of columns. Let A be an m × n block matrix. Then A =       A11 A12 . . . A1n A21 A22 . . . A2n ... ... ... ... Am1 Am2 . . . Amn       (1.1.3) where Aij is a matrix with ri rows and cj columns.
  • 5. 1.1. PRELIMINARIES 5 A block matrix is also known as a partitioned matrix. Example 1.1: 2x2 block matrices Quite frequently we will be using 2x2 block matrices. P = P11 P12 P21 P22 . (1.1.4) An example P =    a b c d e f g h i    We have P11 = a b d e P12 = c f P21 = g h P22 = i • P11 and P12 have 2 rows. • P21 and P22 have 1 row. • P11 and P21 have 2 columns. • P12 and P22 have 1 column. Lemma 1.1 Let A = [Aij] be an m×n block matrix with Aij being an ri × cj matrix. Then A is an r × c matrix where r = m i=1 ri (1.1.5) and c = n j=1 cj. (1.1.6) Remark. Sometimes it is convenient to think of a regular matrix as a block matrix whose entries are 1 × 1 matrices themselves. Definition 1.7 [Multiplication of block matrices] Let A = [Aij] be an m × n block matrix with Aij being a pi × qj matrix. Let
  • 6. 6 1. MATRIX ALGEBRA B = [Bjk] be an n×p block matrix with Bjk being a qj ×rk matrix. Then the two block matrices are compatible for multiplication and their multiplication is defined by C = AB = [Cik] where Cik = n j=1 AijBjk (1.1.7) and Cik is a pi × rk matrix. Definition 1.8 A block diagonal matrix is a block matrix whose off diagonal entries are zero matrices. 1.2. Linear independence, span, rank 1.2.1. Spaces associated with a matrix Definition 1.9 The column space of a matrix is defined as the vector space spanned by columns of the matrix. Let A be an m × n matrix with A = a1 a2 . . . an Then the column space is given by C(A) = {x ∈ Fm : x = n i=1 αiai for some αi ∈ F}. (1.2.1) Definition 1.10 The row space of a matrix is defined as the vector space spanned by rows of the matrix. Let A be an m × n matrix with A =       aT 1 aT 2 ... aT m      
  • 7. 1.2. LINEAR INDEPENDENCE, SPAN, RANK 7 Then the row space is given by R(A) = {x ∈ Fn : x = m i=1 αiai for some αi ∈ F}. (1.2.2) 1.2.2. Rank Definition 1.11 [Column rank] The column rank of a matrix is defined as the maximum number of columns which are linearly independent. In other words column rank is the dimension of the column space of a matrix. Definition 1.12 [Row rank] The row rank of a matrix is defined as the maximum number of rows which are linearly independent. In other words row rank is the dimension of the row space of a matrix. Theorem 1.2 The column rank and row rank of a matrix are equal. Definition 1.13 [Rank] The rank of a matrix is defined to be equal to its column rank which is equal to its row rank. Lemma 1.3 For an m × n matrix A 0 ≤ rank(A) ≤ min(m, n). (1.2.3) Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero matrix. Definition 1.14 [Full rank matrix] An m × n matrix A is called full rank if rank(A) = min(m, n). In other words it is either a full column rank matrix or a full row rank matrix or both.
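As a quick numerical check of these definitions, the following sketch (Python with NumPy assumed; np.linalg.matrix_rank estimates the rank from the singular values) uses a 3 × 3 matrix whose second row is a multiple of the first, so the matrix is not full rank.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],    # twice the first row, so the rows are dependent
                  [0., 1., 1.]])

    r = np.linalg.matrix_rank(A)
    print(r)                       # 2: the matrix is rank deficient
    print(min(A.shape))            # 3: full rank would require rank == min(m, n)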
  • 8. 8 1. MATRIX ALGEBRA Lemma 1.5 [Rank of product to two matrices] Let A be an m×n matrix and B be an n × p matrix then rank(AB) ≤ min(rank(A), rank(B)). (1.2.4) Lemma 1.6 [Post-multiplication with a full row rank matrix] Let A be an m × n matrix and B be an n × p matrix. If B is of rank n then rank(AB) = rank(A). (1.2.5) Lemma 1.7 [Pre-multiplication with a full column rank matrix] Let A be an m × n matrix and B be an n × p matrix. If A is of rank n then rank(AB) = rank(B). (1.2.6) Lemma 1.8 The rank of a diagonal matrix is equal to the number of non-zero elements on its main diagonal. Proof. The columns which correspond to diagonal entries which are zero are zero columns. Other columns are linearly independent. The number of linearly independent rows is also the same. Hence their count gives us the rank of the matrix. 1.3. Invertible matrices Definition 1.15 [Invertible] A square matrix A is called invert- ible if there exists another square matrix B of same size such that AB = BA = I. The matrix B is called the inverse of A and is denoted as A−1 . Lemma 1.9 If A is invertible then its inverse A−1 is also invertible and the inverse of A−1 is nothing but A.
  • 9. 1.3. INVERTIBLE MATRICES 9 Lemma 1.10 Identity matrix I is invertible. Proof. II = I =⇒ I−1 = I. Lemma 1.11 If A is invertible then columns of A are linearly independent. Proof. Assume A is invertible, then there exists a matrix B such that AB = BA = I. Assume that columns of A are linearly dependent. Then there exists u = 0 such that Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0 a contradiction. Hence columns of A are linearly independent. Lemma 1.12 If an n × n matrix A is invertible then columns of A span Fn . Proof. Assume A is invertible, then there exists a matrix B such that AB = BA = I. Now let x ∈ Fn be any arbitrary vector. We need to show that there exists α ∈ Fn such that x = Aα. But x = Ix = ABx = A(Bx). Thus if we choose α = Bx, then x = Aα.
Thus the columns of A span F^n.

Lemma 1.13 If A is invertible, then the columns of A form a basis for F^n.

Proof. In F^n a basis is a set of vectors which is linearly independent and spans F^n. By lemma 1.11 and lemma 1.12, the columns of an invertible matrix A satisfy both conditions. Hence they form a basis.

Lemma 1.14 If A is invertible then A^T is invertible.

Proof. Assume A is invertible; then there exists a matrix B such that AB = BA = I. Applying the transpose on both sides we get B^T A^T = A^T B^T = I. Thus B^T is the inverse of A^T and A^T is invertible.

Lemma 1.15 If A is invertible then A^H is invertible.

Proof. Assume A is invertible; then there exists a matrix B such that AB = BA = I. Applying the conjugate transpose on both sides we get B^H A^H = A^H B^H = I. Thus B^H is the inverse of A^H and A^H is invertible.
  • 11. 1.3. INVERTIBLE MATRICES 11 Lemma 1.16 If A and B are invertible then AB is invertible. Proof. We note that (AB)(B−1 A−1 ) = A(BB−1 )A−1 = AIA−1 = I. Similarly (B−1 A−1 )(AB) = B−1 (A−1 A)B = B−1 IB = I. Thus B−1 A−1 is the inverse of AB. Lemma 1.17 The set of n×n invertible matrices under the matrix multiplication operation form a group. Proof. We verify the properties of a group Closure: If A and B are invertible then AB is invertible. Hence the set is closed. Associativity: Matrix multiplication is associative. Identity element: I is invertible and AI = IA = A for all invertible matrices. Inverse element: If A is invertible then A−1 is also invertible. Thus the set of invertible matrices is indeed a group under matrix multiplication. Lemma 1.18 An n × n matrix A is invertible if and only if it is full rank i.e. rank(A) = n. Corollary 1.19. The rank of an invertible matrix and its inverse are same.
1.3.1. Similar matrices

Definition 1.16 [Similar matrices] An n × n matrix B is similar to an n × n matrix A if there exists an n × n non-singular matrix C such that B = C^{-1} A C.

Lemma 1.20 If B is similar to A then A is similar to B. Thus similarity is a symmetric relation.

Proof. B = C^{-1} A C =⇒ A = C B C^{-1} =⇒ A = (C^{-1})^{-1} B C^{-1}. Thus there exists a matrix D = C^{-1} such that A = D^{-1} B D. Hence A is similar to B.

Lemma 1.21 Similar matrices have the same rank.

Proof. Let B be similar to A. Thus there exists an invertible matrix C such that B = C^{-1} A C. Since C is invertible we have rank(C) = rank(C^{-1}) = n. Now using lemma 1.6, rank(AC) = rank(A), and using lemma 1.7 we have rank(C^{-1}(AC)) = rank(AC) = rank(A). Thus rank(B) = rank(A).

Lemma 1.22 Similarity is an equivalence relation on the set of n × n matrices.
  • 13. 1.3. INVERTIBLE MATRICES 13 Proof. Let A, B, C be n×n matrices. A is similar to itself through an invertible matrix I. If A is similar to B then B is similar to itself. If B is similar to A via P s.t. B = P−1 AP and C is similar to B via Q s.t. C = Q−1 BQ then C is similar to A via PQ such that C = (PQ)−1 A(PQ). Thus similarity is an equivalence relation on the set of square matrices and if A is any n×n matrix then the set of n×n matrices similar to A forms an equivalence class. 1.3.2. Gram matrices Definition 1.17 Gram matrix of columns of A is given by G = AH A (1.3.1) Definition 1.18 Gram matrix of rows of A is given by G = AAH (1.3.2) Remark. Usually when we talk about Gram matrix of a matrix we are looking at the Gram matrix of its column vectors. Remark. For real matrix A ∈ Rm×n , the Gram matrix of its column vectors is given by AT A and the Gram matrix for its row vectors is given by AAT . Following results apply equally well for the real case. Lemma 1.23 The columns of a matrix are linearly dependent if and only if the Gram matrix of its column vectors AH A is not invertible. Proof. Let A be an m × n matrix and G = AH A be the Gram matrix of its columns. If columns of A are linearly dependent, then there exists a vector u = 0 such that Au = 0.
  • 14. 14 1. MATRIX ALGEBRA Thus Gu = AH Au = 0. Hence the columns of G are also dependent and G is not invertible. Conversely let us assume that G is not invertible, thus columns of G are dependent and there exists a vector v = 0 such that Gv = 0. Now vH Gv = vH AH Av = (Av)H (Av) = Av 2 2. From previous equation, we have Av 2 2 = 0 =⇒ Av = 0. Since v = 0 hence columns of A are also linearly dependent. Corollary 1.24. The columns of a matrix are linearly independent if and only if the Gram matrix of its column vectors AH A is invertible. Proof. Columns of A can be dependent only if its Gram matrix is not invertible. Thus if the Gram matrix is invertible, then the columns of A are linearly independent. The Gram matrix is not invertible only if columns of A are linearly dependent. Thus if columns of A are linearly independent then the Gram matrix is invertible. Corollary 1.25. Let A be a full column rank matrix. Then AH A is invertible. Lemma 1.26 The null space of A and its Gram matrix AH A co- incide. i.e. N(A) = N(AH A). (1.3.3) Proof. Let u ∈ N(A). Then Au = 0 =⇒ AH Au = 0.
  • 15. 1.3. INVERTIBLE MATRICES 15 Thus u ∈ N(AH A) =⇒ N(A) ⊆ N(AH A). Now let u ∈ N(AH A). Then AH Au = 0 =⇒ uH AH Au = 0 =⇒ Au 2 2 = 0 =⇒ Au = 0. Thus we have u ∈ N(A) =⇒ N(AH A) ⊆ N(A). Lemma 1.27 The rows of a matrix A are linearly dependent if and only if the Gram matrix of its row vectors AAH is not invertible. Proof. Rows of A are linearly dependent, if and only if columns of AH are linearly dependent. There exists a vector v = 0 s.t. AH v = 0 Thus Gv = AAH v = 0. Since v = 0 hence G is not invertible. Converse: assuming that G is not invertible, there exists a vector u = 0 s.t. Gu = 0. Now uH Gu = uH AAH u = (AH u)H (AH u) = AH u 2 2 = 0 =⇒ AH u = 0. Since u = 0 hence columns of AH and consequently rows of A are linearly dependent. Corollary 1.28. The rows of a matrix A are linearly independent if and only if the Gram matrix of its row vectors AAH is invertible. Corollary 1.29. Let A be a full row rank matrix. Then AAH is in- vertible.
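The following sketch illustrates corollary 1.25 and corollary 1.29 numerically (Python with NumPy assumed; a random Gaussian matrix is generically full column rank). For a tall full column rank matrix, the Gram matrix of its columns is invertible while the Gram matrix of its rows is rank deficient.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))          # tall, generically full column rank

    G_cols = A.conj().T @ A                   # Gram matrix of the columns (A^H A), 3 x 3
    G_rows = A @ A.conj().T                   # Gram matrix of the rows (A A^H), 5 x 5

    print(np.linalg.matrix_rank(A))           # 3
    print(np.linalg.matrix_rank(G_cols))      # 3 -> invertible (corollary 1.25)
    print(np.linalg.matrix_rank(G_rows))      # 3 -> 5 x 5 matrix of rank 3, hence singular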
  • 16. 16 1. MATRIX ALGEBRA 1.3.3. Pseudo inverses Definition 1.19 [Moore-Penrose pseudo-inverse] Let A be an m× n matrix. An n×m matrix A† is called its Moore-Penrose pseudo- inverse if it satisfies all of the following criteria: (1) AA† A = A. (2) A† AA† = A† . (3) AA† H = AA† i.e. AA† is Hermitian. (4) (A† A)H = A† A i.e. A† A is Hermitian. Theorem 1.30 [Existence and uniqueness] For any matrix A there exists precisely one matrix A† which satisfies all the requirements in definition 1.19. We omit the proof for this. The pseudo-inverse can actually be ob- tained by the singular value decomposition of A. This is shown in lemma 1.110. Lemma 1.31 Let D = diag(d1, d2, . . . , dn) be an n × n diag- onal matrix. Then its Moore-Penrose pseudo-inverse is D† = diag(c1, c2, . . . , cn) where ci = 1 di if di = 0; 0 if di = 0. Proof. We note that D† D = DD† = F = diag(f1, f2, . . . fn) where fi = 1 if di = 0; 0 if di = 0. We now verify the requirements in definition 1.19. DD† D = FD = D. D† DD† = FD† = D† D† D = DD† = F is a diagonal hence Hermitian matrix.
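A minimal sketch of lemma 1.31, assuming Python with NumPy; pinv_diag is a hypothetical helper that inverts only the non-zero diagonal entries, and the assertions check the four Moore-Penrose conditions of definition 1.19 and agreement with np.linalg.pinv.

    import numpy as np

    def pinv_diag(d):
        # Pseudo-inverse of diag(d): invert the non-zero entries, keep the zeros.
        c = np.array([1.0 / x if x != 0 else 0.0 for x in d])
        return np.diag(c)

    D = np.diag([3.0, 0.0, 2.0])
    Dp = pinv_diag([3.0, 0.0, 2.0])

    # Moore-Penrose conditions of definition 1.19.
    assert np.allclose(D @ Dp @ D, D)
    assert np.allclose(Dp @ D @ Dp, Dp)
    assert np.allclose((D @ Dp).conj().T, D @ Dp)
    assert np.allclose((Dp @ D).conj().T, Dp @ D)
    assert np.allclose(Dp, np.linalg.pinv(D))    # matches NumPy's pseudo-inverse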
  • 17. 1.3. INVERTIBLE MATRICES 17 Lemma 1.32 Let D = diag(d1, d2, . . . , dp) be an m × n rectan- gular diagonal matrix where p = min(m, n). Then its Moore- Penrose pseudo-inverse is an n × m rectangular diagonal matrix D† = diag(c1, c2, . . . , cp) where ci = 1 di if di = 0; 0 if di = 0. Proof. F = D† D = diag(f1, f2, . . . fn) is an n × n matrix where fi =    1 if di = 0; 0 if di = 0; 0 if i > p. G = DD† = diag(g1, g2, . . . gn) is an m × m matrix where gi =    1 if di = 0; 0 if di = 0; 0 if i > p. We now verify the requirements in definition 1.19. DD† D = DF = D. D† DD† = D† G = D† F = D† D and G = DD† are both diagonal hence Hermitian matrices. Lemma 1.33 If A is full column rank then its Moore-Penrose pseudo-inverse is given by A† = (AH A)−1 AH . (1.3.4) It is a left inverse of A. Proof. By corollary 1.25 AH A is invertible.
  • 18. 18 1. MATRIX ALGEBRA First of all we verify that its a left inverse. A† A = (AH A)−1 AH A = I. We now verify all the properties. AA† A = AI = A. A† AA† = IA† = A† . Hermitian properties: AA† H = A(AH A)−1 AH H = A(AH A)−1 AH = AA† . (A† A)H = IH = I = A† A. Lemma 1.34 If A is full row rank then its Moore-Penrose pseudo- inverse is given by A† = AH (AAH )−1 . (1.3.5) It is a right inverse of A. Proof. By corollary 1.29 AAH is invertible. First of all we verify that its a right inverse. AA† = AAH (AAH )−1 = I. We now verify all the properties. AA† A = IA = A. A† AA† = A† I = A† . Hermitian properties: AA† H = IH = I = AA† . (A† A)H = AH (AAH )−1 A H = AH (AAH )−1 A = A† A.
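The two closed forms of lemma 1.33 and lemma 1.34 can be checked numerically with a sketch like the following (Python with NumPy assumed; random Gaussian matrices are full rank with probability one).

    import numpy as np

    rng = np.random.default_rng(1)

    # Full column rank (tall) case: A_dag = (A^H A)^{-1} A^H is a left inverse.
    A = rng.standard_normal((5, 3))
    A_dag = np.linalg.inv(A.conj().T @ A) @ A.conj().T
    assert np.allclose(A_dag @ A, np.eye(3))
    assert np.allclose(A_dag, np.linalg.pinv(A))

    # Full row rank (wide) case: B_dag = B^H (B B^H)^{-1} is a right inverse.
    B = rng.standard_normal((3, 5))
    B_dag = B.conj().T @ np.linalg.inv(B @ B.conj().T)
    assert np.allclose(B @ B_dag, np.eye(3))
    assert np.allclose(B_dag, np.linalg.pinv(B))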
  • 19. 1.4. TRACE AND DETERMINANT 19 1.4. Trace and determinant 1.4.1. Trace Definition 1.20 [Trace] The trace of a square matrix is defined as the sum of the entries on its main diagonal. Let A be an n × n matrix, then tr(A) = n i=1 aii (1.4.1) where tr(A) denotes the trace of A. Lemma 1.35 The trace of a square matrix and its transpose are equal. tr(A) = tr(AT ). (1.4.2) Lemma 1.36 Trace of sum of two square matrices is equal to the sum of their traces. tr(A + B) = tr(A) + tr(B). (1.4.3) Lemma 1.37 Let A be an m×n matrix and B be an n×m matrix. Then tr(AB) = tr(BA). (1.4.4) Proof. Let AB = C = [cij]. Then cij = n k=1 aikbkj. Thus cii = n k=1 aikbki. Now tr(C) = m i=1 cii = m i=1 n k=1 aikbki = n k=1 m i=1 aikbki = n k=1 m i=1 bkiaik.
  • 20. 20 1. MATRIX ALGEBRA Let BA = D = [dij]. Then dij = m k=1 bikakj. Thus dii = m k=1 bikaki. Hence tr(D) = n i=1 dii = n i=1 m k=1 bikaki = m i=1 n k=1 bkiaik. This completes the proof. Lemma 1.38 Let A ∈ Fm×n , B ∈ Fn×p , C ∈ Fp×m be three ma- trices. Then tr(ABC) = tr(BCA) = tr(CAB). (1.4.5) Proof. Let AB = D. Then tr(ABC) = tr(DC) = tr(CD) = tr(CAB). Similarly the other result can be proved. Lemma 1.39 Trace of similar matrices is equal. Proof. Let B be similar to A. Thus B = C−1 AC for some invertible matrix C. Then tr(B) = tr(C−1 AC) = tr(CC−1 A) = tr(A). We used lemma 1.37.
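A short numerical check of lemma 1.37, lemma 1.38 and lemma 1.39, assuming Python with NumPy; the shapes are chosen so that all products are defined.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((3, 2))
    C = rng.standard_normal((2, 4))

    # Trace is invariant under cyclic permutations of a product.
    t = np.trace(A @ B @ C)
    assert np.isclose(t, np.trace(B @ C @ A))
    assert np.isclose(t, np.trace(C @ A @ B))

    # Similar matrices have the same trace.
    M = rng.standard_normal((4, 4))
    P = rng.standard_normal((4, 4))          # generically invertible
    assert np.isclose(np.trace(M), np.trace(np.linalg.inv(P) @ M @ P))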
  • 21. 1.4. TRACE AND DETERMINANT 21 1.4.2. Determinants Following are some results on determinant of a square matrix A. Lemma 1.40 det(αA) = αn det(A). (1.4.6) Lemma 1.41 Determinant of a square matrix and its transpose are equal. det(A) = det(AT ). (1.4.7) Lemma 1.42 Let A be a complex square matrix. Then det(AH ) = det(A). (1.4.8) Proof. det(AH ) = det(A T ) = det(A) = det(A). Lemma 1.43 Let A and B be two n × n matrices. Then det(AB) = det(A) det(B). (1.4.9) Lemma 1.44 Let A be an invertible matrix. Then det(A−1 ) = 1 det(A) . (1.4.10)
  • 22. 22 1. MATRIX ALGEBRA Lemma 1.45 Let A be a square matrix and p ∈ N. Then det(Ap ) = (det(A))p . (1.4.11) Lemma 1.46 [Determinant of a triangular matrix] Determinant of a triangular matrix is the product of its diagonal entries. i.e. if A is upper or lower triangular matrix then det(A) = n i=1 aii. (1.4.12) Lemma 1.47 [Determinant of a diagonal matrix] Determinant of a diagonal matrix is the product of its diagonal entries. i.e. if A is a diagonal matrix then det(A) = n i=1 aii. (1.4.13) Lemma 1.48 [Determinant of similar matrices] Determinant of similar matrices is equal. Proof. Let B be similar to A. Thus B = C−1 AC for some invertible matrix C. Hence det(B) = det(C−1 AC) = det(C−1 ) det(A) det(C). Now det(C−1 ) det(A) det(C) = 1 det(C) det(A) det(C) = det(A). We used lemma 1.43 and lemma 1.44.
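The determinant identities above can be verified on random matrices with a sketch such as the following (Python with NumPy assumed; a random square matrix is generically invertible).

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    C = rng.standard_normal((4, 4))          # generically invertible

    assert np.isclose(np.linalg.det(2.0 * A), 2.0**4 * np.linalg.det(A))              # lemma 1.40
    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))      # lemma 1.43
    assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))        # lemma 1.44
    assert np.isclose(np.linalg.det(np.linalg.inv(C) @ A @ C), np.linalg.det(A))      # lemma 1.48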
  • 23. 1.5. UNITARY AND ORTHOGONAL MATRICES 23 Lemma 1.49 Let u and v be vectors in Fn . Then det(I + uvT ) = 1 + uT v. (1.4.14) Lemma 1.50 [Determinant of a small perturbation of identity matrix] Let A be a square matrix and let ≈ 0. Then det(I + A) ≈ 1 + tr(A). (1.4.15) 1.5. Unitary and orthogonal matrices 1.5.1. Orthogonal matrix Definition 1.21 [Orthogonal matrix] A real square matrix U is called orthogonal if the columns of U form an orthonormal set. In other words, let U = u1 u2 . . . un with ui ∈ Rn . Then we have ui · uj = δi,j. Lemma 1.51 An orthogonal matrix U is invertible with UT = U−1 . Proof. Let U = u1 u2 . . . un be orthogonal with UT =       uT 1 uT 2 ... uT n .      
  • 24. 24 1. MATRIX ALGEBRA Then UT U =       uT 1 uT 2 ... uT n .       u1 u2 . . . un = ui · uj = I. Since columns of U are linearly independent and span Rn , hence U is invertible. Thus UT = U−1 . Lemma 1.52 Determinant of an orthogonal matrix is ±1. Proof. Let U be an orthogonal matrix. Then det(UT U) = det(I) =⇒ (det(U))2 = 1 Thus we have det(U) = ±1. 1.5.2. Unitary matrix Definition 1.22 [Unitary matrix] A complex square matrix U is called unitary if the columns of U form an orthonormal set. In other words, let U = u1 u2 . . . un with ui ∈ Cn . Then we have ui · uj = ui, uj = uH j ui = δi,j. Lemma 1.53 A unitary matrix U is invertible with UH = U−1 . Proof. Let U = u1 u2 . . . un
  • 25. 1.5. UNITARY AND ORTHOGONAL MATRICES 25 be orthogonal with UH =       uH 1 uH 2 ... uH n .       Then UH U =       uH 1 uH 2 ... uH n .       u1 u2 . . . un = uH i uj = I. Since columns of U are linearly independent and span Cn , hence U is invertible. Thus UH = U−1 . Lemma 1.54 The magnitude of determinant of a unitary matrix is 1. Proof. Let U be a unitary matrix. Then det(UH U) = det(I) =⇒ det(UH ) det(U) = 1 =⇒ det(U)det(U) = 1. Thus we have | det(U)|2 = 1 =⇒ | det(U)| = 1. 1.5.3. F unitary matrix We provide a common definition for unitary matrices over any field F. This definition applies to both real and complex matrices. Definition 1.23 [F Unitary matrix] A square matrix U ∈ Fn×n is called F unitary if the columns of U form an orthonormal set. In
  • 26. 26 1. MATRIX ALGEBRA other words, let U = u1 u2 . . . un with ui ∈ Fn . Then we have ui, uj = uH j ui = δi,j. We note that a suitable definition of inner product transports the def- inition appropriately into orthogonal matrices over R and unitary ma- trices over C. When we are talking about F unitary matrices, then we will use the symbol UH to mean its inverse. In the complex case, it will map to its conjugate transpose, while in real case it will map to simple transpose. This definition helps us simplify some of the discussions in the sequel (like singular value decomposition). Following results apply equally to orthogonal matrices for real case and unitary matrices for complex case. Lemma 1.55 [Norm preservation] F-unitary matrices preserve norm. i.e. Ux 2 = x 2. Proof. Ux 2 2 = (Ux)H (Ux) = xH UH Ux = xH Ix = x 2 2. Remark. For the real case we have Ux 2 2 = (Ux)T (Ux) = xT UT Ux = xT Ix = x 2 2. Lemma 1.56 [Inner product preservation] F-unitary matrices pre- serve inner product. i.e. Ux, Uy = x, y .
  • 27. 1.6. EIGEN VALUES 27 Proof. Ux, Uy = (Uy)H Ux = yH UH Ux = yH x. Remark. For the real case we have Ux, Uy = (Uy)T Ux = yT UT Ux = yT x. 1.6. Eigen values Much of the discussion in this section will be equally applicable to real as well as complex matrices. We will use the complex notation mostly and make specific remarks for real matrices wherever needed. Definition 1.24 [Eigen value] A scalar λ is an eigen value of an n × n matrix A = [aij] if there exists a non null vector x such that Ax = λx. (1.6.1) A non null vector x which satisfies this equation is called an eigen vector of A for the eigen value λ. An eigen value is also known as a characteristic value, proper value or a latent value. We note that (1.6.1) can be written as Ax = λInx =⇒ (A − λIn)x = 0. (1.6.2) Thus λ is an eigen value of A if and only if the matrix A−λI is singular. Definition 1.25 [Spectrum of a matrix] The set comprising of eigen values of a matrix A is known as its spectrum. Remark. For each eigen vector x for a matrix A the corresponding eigen value λ is unique.
  • 28. 28 1. MATRIX ALGEBRA Proof. Assume that for x there are two eigen values λ1 and λ2, then Ax = λ1x = λ2x =⇒ (λ1 − λ2)x = 0. This can happen only when either x = 0 or λ1 = λ2. Since x is an eigen vector, it cannot be 0. Thus λ1 = λ2. Remark. If x is an eigen vector for A, then the corresponding eigen value is given by λ = xH Ax xHx . (1.6.3) Proof. Ax = λx =⇒ xH Ax = λxH x =⇒ λ = xH Ax xHx . since x is non-zero. Remark. An eigen vector x of A for eigen value λ belongs to the null space of A − λI, i.e. x ∈ N(A − λI). In other words x is a nontrivial solution to the homogeneous system of linear equations given by (A − λI)z = 0. Definition 1.26 [Eigen space] Let λ be an eigen value for a square matrix A. Then its eigen space is the null space of A − λI i.e. N(A − λI). Remark. The set comprising all the eigen vectors of A for an eigen value λ is given by N(A − λI) {0} (1.6.4) since 0 cannot be an eigen vector.
  • 29. 1.6. EIGEN VALUES 29 Definition 1.27 [Geometric multiplicity] Let λ be an eigen value for a square matrix A. The dimension of its eigen space N(A−λI) is known as the geometric multiplicity of the eigen value λ. Remark. Clearly dim(N(A − λI)) = n − rank(A − λI). Remark. A scalar λ can be an eigen value of a square matrix A if and only if det(A − λI) = 0. det(A − λI) is a polynomial in λ of degree n. Remark. det(A − λI) = p(λ) = αn λn + αn−1 λn−1 + · · · + α1 λ + α0 (1.6.5) where αi depend on entries in A. In this sense, an eigen value of A is a root of the equation p(λ) = 0. (1.6.6) Its easy to show that αn = (−1)n . Definition 1.28 [Characteristic polynomial and equation] For any square matrix A, the polynomial given by p(λ) = det(A − λI) is known as its characteristic polynomial. The equation give by p(λ) = 0 (1.6.7) is known as its characteristic equation. The eigen values of A are the roots of its characteristic polynomial or solutions of its characteristic equation.
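The following sketch (Python with NumPy assumed) illustrates this connection: np.poly returns the coefficients of det(λI − A), which has the same roots as det(A − λI), and each computed eigen pair satisfies Ax = λx.

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])

    lam = np.linalg.eigvals(A)                   # eigen values, here 3 and 1 (in some order)

    # Roots of the characteristic polynomial coincide with the eigen values.
    coeffs = np.poly(A)                          # coefficients of det(lambda I - A)
    assert np.allclose(np.sort(np.roots(coeffs)), np.sort(lam))

    # Each eigen pair satisfies A x = lambda x.
    lam2, X = np.linalg.eig(A)
    for i in range(2):
        assert np.allclose(A @ X[:, i], lam2[i] * X[:, i])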
  • 30. 30 1. MATRIX ALGEBRA Lemma 1.57 [Roots of characteristic equation] For real square matrices, if we restrict eigen values to real values, then the char- acteristic polynomial can be factored as p(λ) = (−1)n (λ − λ1)r1 . . . (λ − λk)rk q(λ). (1.6.8) The polynomial has k distinct real roots. For each root λi, ri is a positive integer indicating how many times the root appears. q(λ) is a polynomial that has no real roots. The following is true r1 + · · · + rk + deg(q(λ)) = n. (1.6.9) Clearly k ≤ n. For complex square matrices where eigen values can be complex (including real square matrices), the characteristic polynomial can be factored as p(λ) = (−1)n (λ − λ1)r1 . . . (λ − λk)rk . (1.6.10) The polynomial can be completely factorized into first degree poly- nomials. There are k distinct roots or eigen values. The following is true r1 + · · · + rk = n. (1.6.11) Thus including the duplicates there are exactly n eigen values for a complex square matrix. Remark. It is quite possible that a real square matrix doesn’t have any real eigen values. Definition 1.29 [Algebraic multiplicity] The number of times an eigen value appears in the factorization of the characteristic poly- nomial of a square matrix A is known as its algebraic multiplicity. In other words ri is the algebraic multiplicity for λi in above fac- torization. Remark. In above the set {λ1, . . . , λk} forms the spectrum of A.
  • 31. 1.6. EIGEN VALUES 31 Let us consider the sum of ri which gives the count of total number of roots of p(λ). m = k i=1 ri. (1.6.12) With this there are m not-necessarily distinct roots of p(λ). Let us write p(λ) as p(λ) = (−1)n (λ − c1)(λ − c2) . . . (λ − cm)q(λ). (1.6.13) where c1, c2, . . . , cm are m scalars (not necessarily distinct) of which r1 scalars are λ1, r2 are λ2 and so on. Obviously for the complex case q(λ) = 1. We will refer to the set (allowing repetitions) {c1, c2, . . . , cm} as the eigen values of the matrix A where ci are not necessarily distinct. In contrast the spectrum of A refers to the set of distinct eigen values of A. The symbol c has been chosen based on the other name for eigen values (the characteristic values). We can put together eigen vectors of a matrix into another matrix by itself. This can be very useful tool. We start with a simple idea. Lemma 1.58 Let A be an n × n matrix. Let u1, u2, . . . , ur be r non-zero vectors from Fn . Let us construct an n × r matrix U = u1 u2 . . . ur . Then all the r vectors are eigen vectors of A if and only if there exists a diagonal matrix D = diag(d1, . . . , dr) such that AU = UD. (1.6.14) Proof. Expanding the equation, we can write Au1 Au2 . . . Aur = d1u1 d2u2 . . . drur . Clearly we want Aui = diui
  • 32. 32 1. MATRIX ALGEBRA where ui are non-zero. This is possible only when di is an eigen value of A and ui is an eigen vector for di. Converse: Assume that ui are eigen vectors. Choose di to be corre- sponding eigen values. Then the equation holds. Lemma 1.59 0 is an eigen value of a square matrix A if and only if A is singular. Proof. Let 0 be an eigen value of A. Then there exists u = 0 such that Au = 0u = 0. Thus u is a non-trivial solution of the homogeneous linear system. Thus A is singular. Converse: Assuming that A is singular, there exists u = 0 s.t. Au = 0 = 0u. Thus 0 is an eigen value of A. Lemma 1.60 If a square matrix A is singular, then N(A) is the eigen space for the eigen value λ = 0. Proof. This is straight forward from the definition of eigen space (see definition 1.26). Remark. Clearly the geometric multiplicity of λ = 0 equals nullity(A) = n − rank(A). Lemma 1.61 Let A be a square matrix. Then A and AT have same eigen values. Proof. The eigen values of AT are given by det(AT − λI) = 0.
  • 33. 1.6. EIGEN VALUES 33 But AT − λI = AT − (λI)T = (A − λI)T . Hence (using lemma 1.41) det(AT − λI) = det (A − λI)T = det(A − λI). Thus the characteristic polynomials of A and AT are same. Hence the eigen values are same. In other words the spectrum of A and AT are same. Remark (Direction preservation). If x is an eigen vector with a non- zero eigen value λ for A then Ax and x are collinear. In other words the angle between Ax and x is either 0◦ when λ is positive and is 180◦ when λ is negative. Let us look at the inner product: Ax, x = xH Ax = xH λx = λ x 2 2. Meanwhile Ax 2 = λx 2 = |λ| x 2. Thus | Ax, x | = Ax 2 x 2. The angle θ between Ax and x is given by cos θ = Ax, x Ax 2 x 2 = λ x 2 2 |λ| x 2 2 = ±1. Lemma 1.62 Let A be a square matrix and λ be an eigen value of A. Let p ∈ N. Then λp is an eigen value of Ap . Proof. For p = 1 the statement holds trivially since λ1 is an eigen value of A1 . Assume that the statement holds for some value of p. Thus let λp be an eigen value of Ap and let u be corresponding eigen vector. Now Ap+1 u = Ap (Au) = Ap λu = λAp u = λλp u = λp+1 u.
  • 34. 34 1. MATRIX ALGEBRA Thus λp+1 is an eigen value for Ap+1 with the same eigen vector u. With the principle of mathematical induction, the proof is complete. Lemma 1.63 Let a square matrix A be non singular and let λ = 0 be some eigen value of A. Then λ−1 is an eigen value of A−1 . Moreover, all eigen values of A−1 are obtained by taking inverses of eigen values of A i.e. if µ = 0 is an eigen value of A−1 then 1 µ is an eigen value of A also. Also, A and A−1 share the same set of eigen vectors. Proof. Let u = 0 be an eigen vector of A for the eigen value λ. Then Au = λu =⇒ u = A−1 λu =⇒ 1 λ u = A−1 u. Thus u is also an eigen vector of A−1 for the eigen value 1 λ . Now let B = A−1 . Then B−1 = A. Thus if µ is an eigen value of B then 1 µ is an eigen value of B−1 = A. Thus if A is invertible then eigen values of A and A−1 have one to one correspondence. This result is very useful. Since if it can be shown that a matrix A is similar to a diagonal or a triangular matrix whose eigen values are easy to obtain then determination of the eigen values of A becomes straight forward. 1.6.1. Invariant subspaces Definition 1.30 [Invariance subspace] Let A be a square n × n matrix and let W be a subspace of Fn i.e. W ≤ F. Then W is invariant relative to A if Aw ∈ W ∀ w ∈ W. (1.6.15) i.e. A(W) ⊆ W or for every vector w ∈ W its mapping Aw is also in W. Thus action of A on W doesn’t take us outside of W.
  • 35. 1.6. EIGEN VALUES 35 We also say that W is A-invariant. Eigen vectors are generators of invariant subspaces. Lemma 1.64 Let A be an n × n matrix. Let x1, x2, . . . , xr be r eigen vectors of A. Let us construct an n × r matrix X = x1 x2 . . . rr . Then the column space of X i.e. C(X) is invariant relative to A. Proof. Let us assume that c1, c2, . . . , cr are the eigen values cor- responding to x1, x2, . . . , xr (not necessarily distinct). Let any vector x ∈ C(X) be given by x = r i=1 αixi. Then Ax = A r i=1 αixi = r i=1 αiAxi = r i=1 αicixi. Clearly Ax is also a linear combination of xi hence belongs to C(X). Thus X is invariant relative to A or X is A-invariant. 1.6.2. Triangular matrices Lemma 1.65 Let A be an n×n upper or lower triangular matrix. Then its eigen values are the entries on its main diagonal. Proof. If A is triangular then A − λI is also triangular with its diagonal entries being (aii − λ). Using lemma 1.46, we have p(λ) = det(A − λI) = n i=1 (aii − λ). Clearly the roots of characteristic polynomial are aii. Several small results follow from this lemma.
Corollary 1.66. Let A = [a_{ij}] be an n × n triangular matrix.
(a) The characteristic polynomial of A is p(λ) = (−1)^n ∏_{i=1}^{n} (λ − a_{ii}).
(b) A scalar λ is an eigen value of A if and only if it is one of the diagonal entries of A.
(c) The algebraic multiplicity of an eigen value λ is equal to the number of times it appears on the main diagonal of A.
(d) The spectrum of A is given by the distinct entries on the main diagonal of A.

A diagonal matrix is naturally both an upper triangular matrix as well as a lower triangular matrix. Similar results hold for the eigen values of a diagonal matrix also.

Lemma 1.67 Let A = [a_{ij}] be an n × n diagonal matrix.
(a) Its eigen values are the entries on its main diagonal.
(b) The characteristic polynomial of A is p(λ) = (−1)^n ∏_{i=1}^{n} (λ − a_{ii}).
(c) A scalar λ is an eigen value of A if and only if it is one of the diagonal entries of A.
(d) The algebraic multiplicity of an eigen value λ is equal to the number of times it appears on the main diagonal of A.
(e) The spectrum of A is given by the distinct entries on the main diagonal of A.

There is also a result for the geometric multiplicity of eigen values of a diagonal matrix.

Lemma 1.68 Let A = [a_{ij}] be an n × n diagonal matrix. The geometric multiplicity of an eigen value λ is equal to the number of times it appears on the main diagonal of A.

Proof. The unit vectors e_i are eigen vectors of A since A e_i = a_{ii} e_i. They are linearly independent. Thus if a particular eigen value appears r times on the main diagonal, then there are r linearly independent eigen vectors for that eigen value. Thus its geometric multiplicity is equal to its algebraic multiplicity.

1.6.3. Similar matrices

Some very useful results are available for similar matrices.

Lemma 1.69 The characteristic polynomial and spectrum of similar matrices are the same.

Proof. Let B be similar to A. Thus there exists an invertible matrix C such that B = C^{-1} A C. Now

B − λI = C^{-1} A C − λI = C^{-1} A C − λ C^{-1} C = C^{-1}(A C − λ C) = C^{-1}(A − λI) C.

Thus B − λI is similar to A − λI. Hence, due to lemma 1.48, their determinants are equal, i.e. det(B − λI) = det(A − λI). This means that the characteristic polynomials of A and B are the same. Since eigen values are nothing but roots of the characteristic polynomial, they are the same too. This means that the spectrum (the set of distinct eigen values) is the same.

Corollary 1.70. If A and B are similar to each other then
(a) an eigen value has the same algebraic and geometric multiplicity for both A and B;
(b) the (not necessarily distinct) eigen values of A and B are the same.

Although the eigen values are the same, the eigen vectors are different.
  • 38. 38 1. MATRIX ALGEBRA Lemma 1.71 Let A and B be similar with B = C−1 AC for some invertible matrix C. If u is an eigen vector of A for an eigen value λ, then C−1 u is an eigen vector of B for the same eigen value. Proof. u is an eigen vector of A for an eigen value λ. Thus we have Au = λu. Thus BC−1 u = C−1 ACC−1 u = C−1 Au = C−1 λu = λC−1 u. Now u = 0 and C−1 is non singular. Thus C−1 u = 0. Thus C−1 u is an eigen vector of B. Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an eigen value of a square matrix A. Then the geometric multiplicity of λ is less than or equal to its algebraic multiplicity. Corollary 1.73. If an n×n matrix A has n distinct eigen values, then each of them has a geometric (and algebraic) multiplicity of 1. Proof. The algebraic multiplicity of an eigen value is greater than or equal to 1. But the sum cannot exceed n. Since there are n distinct eigen values, thus each of them has algebraic multiplicity of 1. Now geometric multiplicity of an eigen value is greater than equal to 1 and less than equal to its algebraic multiplicity.
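A small numerical illustration of theorem 1.72, assuming Python with NumPy: for the matrix below the eigen value 2 has algebraic multiplicity 2 but geometric multiplicity 1, the latter computed as dim N(A − λI) = n − rank(A − λI).

    import numpy as np

    # Eigen value 2 has algebraic multiplicity 2 but geometric multiplicity 1.
    A = np.array([[2., 1.],
                  [0., 2.]])
    lam = 2.0

    alg_mult = np.sum(np.isclose(np.linalg.eigvals(A), lam))
    geo_mult = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))   # dim N(A - lam I)

    print(alg_mult, geo_mult)    # 2 1 -> geometric multiplicity <= algebraic multiplicity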
  • 39. 1.6. EIGEN VALUES 39 Corollary 1.74. Let an n × n matrix A has k distinct eigen values λ1, λ2, . . . , λk with algebraic multiplicities r1, r2, . . . , rk and geometric multiplicities g1, g2, . . . gk respectively. Then k i=1 gk ≤ k i=1 rk ≤ n. Moreover if k i=1 gk = k i=1 rk then gk = rk. 1.6.4. Linear independence of eigen vectors Theorem 1.75 [Linear independence of eigen vectors for distinct eigen values] Let A be an n × n square matrix. Let x1, x2, . . . , xk be any k eigen vectors of A for distinct eigen values λ1, λ2, . . . , λk respectively. Then x1, x2, . . . , xk are linearly independent. Proof. We first prove the simpler case with 2 eigen vectors x1 and x2 and corresponding eigen values λ1 and λ2 respectively. Let there be a linear relationship between x1 and x2 given by α1x1 + α2x2 = 0. Multiplying both sides with (A − λ1I) we get α1(A − λ1I)x1 + α2(A − λ1I)x2 = 0 =⇒ α1(λ1 − λ1)x1 + α2(λ1 − λ2)x2 = 0 =⇒ α2(λ1 − λ2)x2 = 0. Since λ1 = λ2 and x2 = 0 , hence α2 = 0. Similarly by multiplying with (A − λ2I) on both sides, we can show that α1 = 0. Thus x1 and x2 are linearly independent.
  • 40. 40 1. MATRIX ALGEBRA Now for the general case, consider a linear relationship between x1, x2, . . . , xk given by α1x1 + α2x2 + . . . αkxk = 0. Multiplying by k i=j,i=1(A − λiI) and using the fact that λi = λj if i = j, we get αj = 0. Thus the only linear relationship is the trivial relationship. This completes the proof. For eigen values with geometric multiplicity greater than 1 there are multiple eigenvectors corresponding to the eigen value which are lin- early independent. In this context, above theorem can be generalized further. Theorem 1.76 Let λ1, λ2, . . . , λk be k distinct eigen values of A. Let {xj 1, xj 2, . . . xj gj } be any gj linearly independent eigen vec- tors from the eigen space of λj where gj is the geometric mul- tiplicity of λj. Then the combined set of eigen vectors given by {x1 1, . . . x1 g1 , . . . xk 1, . . . xk gk } consisting of k j=1 gj eigen vectors is linearly independent. This result puts an upper limit on the number of linearly independent eigen vectors of a square matrix. Lemma 1.77 Let {λ1, . . . , λk} represents the spectrum of an n×n matrix A. Let g1, . . . , gk be the geometric multiplicities of λ1, . . . λk respectively. Then the number of linearly independent eigen vectors for A is k i=1 gi. Moreover if k i=1 gi = n then a set of n linearly independent eigen vectors of A can be found which forms a basis for Fn .
1.6.5. Diagonalization

Diagonalization is one of the fundamental operations in linear algebra. This section discusses diagonalization of square matrices in depth.

Definition 1.31 [Diagonalizable matrix] An n × n matrix A is said to be diagonalizable if it is similar to a diagonal matrix. In other words there exists an n × n non-singular matrix P such that D = P^{-1} A P is a diagonal matrix. If this happens then we say that P diagonalizes A or that A is diagonalized by P.

Remark.

D = P^{-1} A P ⇐⇒ P D = A P ⇐⇒ P D P^{-1} = A. (1.6.16)

We note that if we restrict to real matrices, then P and D should also be real. If A ∈ C^{n×n} (it may still be real) then P and D can be complex.

The next theorem is the culmination of a variety of results studied so far.

Theorem 1.78 [Properties of diagonalizable matrices] Let A be a diagonalizable matrix with D = P^{-1} A P being its diagonalization. Let D = diag(d_1, d_2, . . . , d_n). Then the following hold.
(a) rank(A) = rank(D), which equals the number of non-zero entries on the main diagonal of D.
(b) det(A) = d_1 d_2 . . . d_n.
(c) tr(A) = d_1 + d_2 + · · · + d_n.
(d) The characteristic polynomial of A is p(λ) = (−1)^n (λ − d_1)(λ − d_2) . . . (λ − d_n).
(e) The spectrum of A comprises the distinct scalars on the diagonal of D.
(f) The (not necessarily distinct) eigen values of A are the diagonal elements of D.
(g) The columns of P are (linearly independent) eigen vectors of A.
(h) The algebraic and geometric multiplicities of an eigen value λ of A equal the number of diagonal elements of D that equal λ.

Proof. From definition 1.31 we note that D and A are similar. Due to lemma 1.48, det(A) = det(D). Due to lemma 1.47, det(D) = ∏_{i=1}^{n} d_i. Now due to lemma 1.39, tr(A) = tr(D) = ∑_{i=1}^{n} d_i. Further, due to lemma 1.69 the characteristic polynomial and spectrum of A and D are the same. Due to lemma 1.67 the eigen values of D are nothing but its diagonal entries; hence they are also the eigen values of A.

D = P^{-1} A P =⇒ A P = P D.

Now writing P = [p_1 p_2 . . . p_n] we have

A P = [A p_1  A p_2  . . .  A p_n] = P D = [d_1 p_1  d_2 p_2  . . .  d_n p_n].

Thus the p_i are eigen vectors of A. Since the characteristic polynomials of A and D are the same, the algebraic multiplicities of the eigen values are the same. From lemma 1.71 we get that there is a one to one correspondence between the eigen vectors of A and D through the change of basis given by P. Thus the linear independence relationships between the eigen vectors remain the same. Hence the geometric multiplicities of individual eigen values are also the same. This completes the proof.

So far we have verified various results which are available if a matrix A is diagonalizable. We haven't yet identified the conditions under which A is diagonalizable. We note that not every matrix is diagonalizable. The following theorem gives necessary and sufficient conditions under which a matrix is diagonalizable.

Theorem 1.79 An n × n matrix A is diagonalizable by an n × n non-singular matrix P if and only if the columns of P are (linearly independent) eigen vectors of A.

Proof. We note that since P is non-singular, its columns have to be linearly independent. The necessary condition was proven in theorem 1.78. We now show that if P consists of n linearly independent eigen vectors of A then A is diagonalizable. Let the columns of P be p_1, p_2, . . . , p_n and let the corresponding (not necessarily distinct) eigen values be d_1, d_2, . . . , d_n. Then A p_i = d_i p_i. Thus by letting D = diag(d_1, d_2, . . . , d_n), we have A P = P D. Now since the columns of P are linearly independent, P is invertible. This gives us D = P^{-1} A P. Thus A is similar to a diagonal matrix D. This validates the sufficient condition.
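A minimal sketch of theorem 1.79 in Python with NumPy (an assumption; any linear algebra package would do): np.linalg.eig returns the eigen values and eigen vectors, the eigen vectors form P, and P^{-1} A P is diagonal because the chosen matrix has distinct eigen values.

    import numpy as np

    A = np.array([[4., 1.],
                  [2., 3.]])

    # Columns of P are eigen vectors of A; d holds the corresponding eigen values (5 and 2).
    d, P = np.linalg.eig(A)
    D = np.linalg.inv(P) @ A @ P

    assert np.allclose(D, np.diag(d))                        # P^{-1} A P is diagonal
    assert np.allclose(A, P @ np.diag(d) @ np.linalg.inv(P)) # A = P D P^{-1}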
  • 44. 44 1. MATRIX ALGEBRA A corollary follows. Corollary 1.80. An n×n matrix is diagonalizable if and only if there exists a linearly independent set of n eigenvectors of A. Now we know that geometric multiplicities of eigen values of A provide us information about linearly independent eigenvectors of A. Corollary 1.81. Let A be an n × n matrix. Let λ1, λ2, . . . , λk be its k distinct eigen values (comprising its spectrum). Let gj be the geometric multiplicity of λj.Then A is diagonalizable if and only if n i=1 gi = n. (1.6.17) 1.6.6. Symmetric matrices This subsection is focused on real symmetric matrices. Following is a fundamental property of real symmetric matrices. Theorem 1.82 Every real symmetric matrix has an eigen value. The proof of this result is beyond the scope of this book. Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and λ2 be any two distinct eigen values of A and let x1 and x2 be any two corresponding eigen vectors. Then x1 and x2 are orthogonal. Proof. By definition we have Ax1 = λ1x1 and Ax2 = λ2x2. Thus xT 2 Ax1 = λ1xT 2 x1 =⇒ xT 1 AT x2 = λ1xT 1 x2 =⇒ xT 1 Ax2 = λ1xT 1 x2 =⇒ xT 1 λ2x2 = λ1xT 1 x2 =⇒ (λ1 − λ2)xT 1 x2 = 0 =⇒ xT 1 x2 = 0.
  • 45. 1.6. EIGEN VALUES 45 Thus x1 and x2 are orthogonal. In between we took transpose on both sides, used the fact that A = AT and λ1 − λ2 = 0. Definition 1.32 [Orthogonally diagonalizable matrix] A real n×n matrix A is said to be orthogonally diagonalizable if there exists an orthogonal matrix U which can diagonalize A, i.e. D = UT AU is a real diagonal matrix. Lemma 1.84 Every orthogonally diagonalizable matrix A is sym- metric. Proof. We have a diagonal matrix D such that A = UDUT . Taking transpose on both sides we get AT = UDT UT = UDUT = A. Thus A is symmetric. Theorem 1.85 Every symmetric matrix A is orthogonally diago- nalizable. We skip the proof of this theorem. 1.6.7. Hermitian matrices Following is a fundamental property of Hermitian matrices. Theorem 1.86 Every Hermitian matrix has an eigen value. The proof of this result is beyond the scope of this book.
  • 46. 46 1. MATRIX ALGEBRA Lemma 1.87 The eigenvalues of a Hermitian matrix are real. Proof. Let A be a Hermitian matrix and let λ be an eigen value of A. Let u be a corresponding eigen vector. Then Au = λu =⇒ uH AH = uH λ =⇒ uH AH u = uH λu =⇒ uH Au = λuH u =⇒ uH λu = λuH u =⇒ u 2 2(λ − λ) = 0 =⇒ λ = λ thus λ is real. We used the facts that A = AH and u = 0 =⇒ u 2 = 0. Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let λ1 and λ2 be any two distinct eigen values of A and let x1 and x2 be any two corresponding eigen vectors. Then x1 and x2 are orthogonal. Proof. By definition we have Ax1 = λ1x1 and Ax2 = λ2x2. Thus xH 2 Ax1 = λ1xH 2 x1 =⇒ xH 1 AH x2 = λ1xH 1 x2 =⇒ xH 1 Ax2 = λ1xH 1 x2 =⇒ xH 1 λ2x2 = λ1xH 1 x2 =⇒ (λ1 − λ2)xH 1 x2 = 0 =⇒ xH 1 x2 = 0. Thus x1 and x2 are orthogonal. In between we took conjugate transpose on both sides, used the fact that A = AH and λ1 − λ2 = 0.
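The following sketch (Python with NumPy assumed) illustrates lemma 1.87 and lemma 1.88 on a randomly generated Hermitian matrix; np.linalg.eigh is NumPy's routine specialized for Hermitian matrices and returns real eigen values together with orthonormal eigen vectors.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A = B + B.conj().T                              # Hermitian by construction

    lam, U = np.linalg.eigh(A)

    assert np.allclose(lam.imag, 0)                          # eigen values are real
    assert np.allclose(U.conj().T @ U, np.eye(3))            # eigen vectors are orthonormal
    assert np.allclose(A, U @ np.diag(lam) @ U.conj().T)     # A = U Lambda U^H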
  • 47. 1.6. EIGEN VALUES 47 Definition 1.33 [Unitary diagonalizable matrix] A complex n×n matrix A is said to be unitary diagonalizable if there exists a unitary matrix U which can diagonalize A, i.e. D = UH AU is a complex diagonal matrix. Lemma 1.89 Let A be a unitary diagonalizable matrix whose di- agonalization D is real. Then A is Hermitian. Proof. We have a real diagonal matrix D such that A = UDUH . Taking conjugate transpose on both sides we get AH = UDH UH = UDUH = A. Thus A is Hermitian. We used the fact that DH = D since D is real. Theorem 1.90 Every Hermitian matrix A is unitary diagonaliz- able. We skip the proof of this theorem. The theorem means that if A is Hermitian then A = UΛUH Definition 1.34 [Eigen value decomposition of a Hermitian ma- trix] Let A be an n × n Hermitian matrix. Let λ1, . . . λn be its eigen values such that |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Let Λ = diag(λ1, . . . , λn). Let U be a unit matrix consisting of orthonormal eigen vectors corresponding to λ1, . . . , λn. Then The eigen value decomposition of A is defined as A = UΛUH . (1.6.18)
If the λ_i are distinct, then the decomposition is unique. If they are not distinct, then the decomposition is not unique, since for a repeated eigen value the corresponding columns of U can be chosen as any orthonormal basis of its eigen space.

Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider some vector x ∈ C^n.

x^H Λ x = ∑_{i=1}^{n} λ_i |x_i|^2. (1.6.19)

Now if λ_i ≥ 0 then

x^H Λ x ≤ λ_1 ∑_{i=1}^{n} |x_i|^2 = λ_1 ||x||_2^2.

Also

x^H Λ x ≥ λ_n ∑_{i=1}^{n} |x_i|^2 = λ_n ||x||_2^2.

Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen values. Let λ_1 be its largest and λ_n be its smallest eigen value. Then

λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2 ∀ x ∈ C^n. (1.6.20)

Proof. A has an eigen value decomposition given by A = U Λ U^H. Let x ∈ C^n and let v = U^H x. Clearly ||x||_2 = ||v||_2. Then

x^H A x = x^H U Λ U^H x = v^H Λ v.

From the previous remark we have λ_n ||v||_2^2 ≤ v^H Λ v ≤ λ_1 ||v||_2^2. Thus we get

λ_n ||x||_2^2 ≤ x^H A x ≤ λ_1 ||x||_2^2.
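Lemma 1.91 can be checked numerically with a sketch like the following, assuming Python with NumPy; B^H B is Hermitian with non-negative eigen values, and the bound is tested on random vectors (the small tolerance only guards against rounding).

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B.conj().T @ B                          # Hermitian with non-negative eigen values

    lam = np.linalg.eigvalsh(A)                 # eigen values sorted in ascending order
    lam_min, lam_max = lam[0], lam[-1]

    for _ in range(100):
        x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        q = np.real(x.conj() @ A @ x)           # x^H A x is real for Hermitian A
        n2 = np.linalg.norm(x) ** 2
        assert lam_min * n2 - 1e-9 <= q <= lam_max * n2 + 1e-9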
  • 49. 1.6. EIGEN VALUES 49 1.6.8. Miscellaneous properties This subsection lists some miscellaneous properties of eigen values of a square matrix. Lemma 1.92 λ is an eigen value of A if and only if λ + k is an eigen value of A + kI. Moreover A and A + kI share the same eigen vectors. Proof. Ax = λx ⇐⇒ Ax + kx = λx + kx ⇐⇒ (A + kI)x = (λ + k)x. (1.6.21) Thus λ is an eigen value of A with an eigen vector x if and only if λ+k is an eigen vector of A + kI with an eigen vector x. 1.6.9. Diagonally dominant matrices Definition 1.35 [Diagonally dominant matrix] Let A = [aij] be a square matrix in Cn×n . A is called diagonally dominant if |aii| ≥ j=i |aij| holds true for all 1 ≤ i ≤ n. i.e. the absolute value of the diagonal element is greater than or equal to the sum of absolute values of all the off diagonal elements on that row. Definition 1.36 [Strictly diagonally dominant matrix] Let A = [aij] be a square matrix in Cn×n . A is called strictly diagonally dominant if |aii| > j=i |aij| holds true for all 1 ≤ i ≤ n. i.e. the absolute value of the diagonal element is bigger than the sum of absolute values of all the off diagonal elements on that row.
  • 50. 50 1. MATRIX ALGEBRA Example 1.2: Strictly diagonally dominant matrix Let us con- sider A =       −4 −2 −1 0 −4 7 2 0 3 −4 9 1 2 −1 −3 15       We can see that the strict diagonal dominance condition is satisfied for each row as follows: row 1 : | − 4| > | − 2| + | − 1| + |0| = 3 row 2 : |7| > | − 4| + |2| + |0| = 6 row 3 : |9| > |3| + | − 4| + |1| = 8 row 4 : |15| > |2| + | − 1| + | − 3| = 6 Strictly diagonally dominant matrices have a very special property. They are always non-singular. Theorem 1.93 Strictly diagonally dominant matrices are non- singular. Proof. Suppose that A is diagonally dominant and singular. Then there exists a vector u ∈ Cn with u = 0 such that Au = 0. (1.6.22) Let u = u1 u2 . . . un T . We first show that every entry in u cannot be equal in magnitude. Let us assume that this is so. i.e. c = |u1| = |u2| = · · · = |un|.
  • 51. 1.6. EIGEN VALUES 51 Since u = 0 hence c = 0. Now for any row i in (1.6.22) , we have n j=1 aijuj = 0 =⇒ n j=1 ±aijc = 0 =⇒ n j=1 ±aij = 0 =⇒ aii = j=i ±aij =⇒ |aii| = | j=i ±aij| =⇒ |aii| ≤ j=i |aij| using triangle inequality but this contradicts our assumption that A is strictly diagonally dom- inant. Thus all entries in u are not equal in magnitude. Let us now assume that the largest entry in u lies at index i with |ui| = c. Without loss of generality we can scale down u by c to get another vector in which all entries are less than or equal to 1 in magnitude while i-th entry is ±1. i.e. ui = ±1 and |uj| ≤ 1 for all other entries. Now from (1.6.22) we get for the i-th row n j=1 aijuj = 0 =⇒ ± aii = j=i ujaij =⇒ |aii| ≤ j=i |ujaij| ≤ j=i |aij| which again contradicts our assumption that A is strictly diagonally dominant. Hence strictly diagonally dominant matrices are non-singular.
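A small sketch of the strict diagonal dominance test on the matrix of example 1.2, assuming Python with NumPy; is_strictly_diagonally_dominant is a hypothetical helper, and the non-zero determinant is consistent with theorem 1.93.

    import numpy as np

    def is_strictly_diagonally_dominant(A):
        # Check |a_ii| > sum of |a_ij| over j != i, for every row.
        A = np.asarray(A)
        off = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
        return bool(np.all(np.abs(np.diag(A)) > off))

    # The matrix from example 1.2.
    A = np.array([[-4., -2., -1.,  0.],
                  [-4.,  7.,  2.,  0.],
                  [ 3., -4.,  9.,  1.],
                  [ 2., -1., -3., 15.]])

    print(is_strictly_diagonally_dominant(A))   # True
    print(np.linalg.det(A) != 0)                # non-singular, as theorem 1.93 guarantees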
  • 52. 52 1. MATRIX ALGEBRA 1.6.10. Gershgorin’s theorem We are now ready to examine Gershgorin’ theorem which provides very useful bounds on the spectrum of a square matrix. Theorem 1.94 Every eigen value λ of a square matrix A ∈ Cn×n satisfies |λ − aii| ≤ j=i |aij| for some i ∈ {1, 2, . . . , n}. (1.6.23) Proof. The proof is a straight forward application of non-singularity of diagonally dominant matrices. We know that for an eigen value λ, det(λI − A) = 0 i.e. the matrix (λI − A) is singular. Hence it cannot be strictly diagonally dominant due to theorem 1.93. Thus looking at each row i of (λI − A) we can say that |λ − aii| > j=i |aij| cannot be true for all rows simultaneously. i.e. it must fail at least for one row. This means that there exists at least one row i for which |λ − aii| ≤ j=i |aij| holds true. What this theorem means is pretty simple. Consider a disc in the complex plane for the i-th row of A whose center is given by aii and whose radius is given by r = j=i |aij| i.e. the sum of magnitudes of all non-diagonal entries in i-th row. There are n such discs corresponding to n rows in A. (1.6.23) means that every eigen value must lie within the union of these discs. It cannot lie outside. This idea is crystallized in following definition.
  • 53. 1.7. SINGULAR VALUES 53 Definition 1.37 [Gershgorin’s disc] For i-th row of matrix A we define the radius ri = j=i |aij| and the center ci = aii. Then the set given by Di = {z ∈ C : |z − aii| ≤ ri} is called the i-th Gershgorin’s disc of A. We note that the definition is equally valid for real as well as complex matrices. For real matrices, the centers of disks lie on the real line. For complex matrices, the centers may lie anywhere in the complex plane. Clearly there is nothing magical about the rows of A. We can as well consider the columns of A. Theorem 1.95 Every eigen value of a matrix A must lie in a Gershgorin disc corresponding to the columns of A where the Ger- shgorin disc for j-th column is given by Dj = {z ∈ C : |z − ajj| ≤ rj} with rj = i=j |aij| Proof. We know that eigen values of A are same as eigen values of AT and columns of A are nothing but rows of AT . Hence eigen values of A must satisfy conditions in theorem 1.94 w.r.t. the matrix AT . This completes the proof. 1.7. Singular values In previous section we saw diagonalization of square matrices which resulted in an eigen value decomposition of the matrix. This matrix factorization is very useful yet it is not applicable in all situations. In particular, the eigen value decomposition is useless if the square matrix is not diagonalizable or if the matrix is not square at all. Moreover,
  • 54. 54 1. MATRIX ALGEBRA the decomposition is particularly useful only for real symmetric or Her- mitian matrices where the diagonalizing matrix is an F-unitary matrix (see definition 1.23). Otherwise, one has to consider the inverse of the diagonalizing matrix also. Fortunately there happens to be another decomposition which applies to all matrices and it involves just F-unitary matrices. Definition 1.38 [Singular value] A non-negative real number σ is a singular value for a matrix A ∈ Fm×n if and only if there exist unit-length vectors u ∈ Fm and v ∈ Fn such that Av = σu (1.7.1) and AH u = σv (1.7.2) hold. The vectors u and v are called left-singular and right- singular vectors for σ respectively. We first present the basic result of singular value decomposition. We will not prove this result completely although we will present proofs of some aspects. Theorem 1.96 For every A ∈ Fm×n with k = min(m, n), there exist two F-unitary matrices U ∈ Fm×m and V ∈ Fn×n and a sequence of real numbers σ1 ≥ σ2 ≥ · · · ≥ σk ≥ 0 such that UH AV = Σ (1.7.3) where Σ = diag(σ1, σ2, . . . , σk) ∈ Fm×n . The non-negative real numbers σi are the singular values of A as per definition 1.38.
  • 55. 1.7. SINGULAR VALUES 55 The sequence of real numbers σi doesn’t depend on the particular choice of U and V . Σ is rectangular with the same size as A. The singular values of A lie on the principle diagonal of Σ. All other entries in Σ are zero. It is certainly possible that some of the singular values are 0 themselves. Remark. Since UH AV = Σ hence A = UΣV H . (1.7.4) Definition 1.39 [Singular value decomposition] The decomposi- tion of a matrix A ∈ Fm×n given by A = UΣV H (1.7.5) is known as its singular value decomposition. Remark. When F is R then the decomposition simplifies to UT AV = Σ (1.7.6) and A = UΣV T . (1.7.7) Remark. Clearly there can be at most k = min(m, n) distinct singular values of A. Remark. We can also write AV = UΣ. (1.7.8) Remark. Let us expand A = UΣV H = u1 u2 . . . um σij       vH 1 vH 2 ... vH n       = m i=1 n j=1 σijuivH j .
  • 56. 56 1. MATRIX ALGEBRA Remark. Alternatively, let us expand Σ = UH AV =       uH 1 uH 2 ... uH m       A v1 v2 . . . vm = uH i Avj This gives us σij = uH i Avj. (1.7.9) Following lemma verifies that Σ indeed consists of singular values of A as per definition 1.38. Lemma 1.97 Let A = UΣV H be a singular value decomposition of A. Then the main diagonal entries of Σ are singular values. The first k = min(m, n) column vectors in U and V are left and right singular vectors of A. Proof. We have AV = UΣ. Let us expand R.H.S. UΣ = m j=1 uijσjk = [uikσk] = σ1u1 σ2u2 . . . σkuk 0 . . . 0 where 0 columns in the end appear n − k times. Expanding the L.H.S. we get AV = Av1 Av2 . . . Avn . Thus by comparing both sides we get Avi = σiui for 1 ≤ i ≤ k and Avi = 0 for k < i ≤ n. Now let us start with A = UΣV H =⇒ AH = V ΣH UH =⇒ AH U = V ΣH .
  • 57. 1.7. SINGULAR VALUES 57 Let us expand R.H.S. V ΣH = n j=1 vijσjk = [vikσk] = σ1v1 σ2v2 . . . σkvk 0 . . . 0 where 0 columns appear m − k times. Expanding the L.H.S. we get AH U = AH u1 AH u2 . . . AH um . Thus by comparing both sides we get AH ui = σivi for 1 ≤ i ≤ k and AH ui = 0 for k < i ≤ m. We now consider the three cases. For m = n, we have k = m = n. And we get Avi = σiui, AH ui = σivi for 1 ≤ i ≤ m Thus σi is a singular value of A and ui is a left singular vector while vi is a right singular vector. For m < n, we have k = m. We get for first m vectors in V Avi = σiui, AH ui = σivi for 1 ≤ i ≤ m. Finally for remaining n − m vectors in V , we can write Avi = 0. They belong to the null space of A. For m > n, we have k = n. We get for first n vectors in U Avi = σiui, AH ui = σivi for 1 ≤ i ≤ n. Finally for remaining m − n vectors in U, we can write AH ui = 0.
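Lemma 1.97 and the decomposition itself can be checked numerically with the following sketch (Python with NumPy assumed). Note that np.linalg.svd returns V^H rather than V, and the singular values come back in non-increasing order; the last assertion anticipates corollary 1.107.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    U, s, Vh = np.linalg.svd(A)                 # NumPy returns V^H, not V
    k = min(A.shape)

    Sigma = np.zeros(A.shape, dtype=complex)
    Sigma[:k, :k] = np.diag(s)
    assert np.allclose(A, U @ Sigma @ Vh)       # A = U Sigma V^H

    V = Vh.conj().T
    for i in range(k):                          # A v_i = sigma_i u_i and A^H u_i = sigma_i v_i
        assert np.allclose(A @ V[:, i], s[i] * U[:, i])
        assert np.allclose(A.conj().T @ U[:, i], s[i] * V[:, i])

    # The largest singular value equals the 2-norm of A.
    assert np.isclose(s[0], np.linalg.norm(A, 2))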
  • 58. 58 1. MATRIX ALGEBRA Lemma 1.98 ΣΣH is an m × m matrix given by ΣΣH = diag(σ2 1, σ2 2, . . . σ2 k, 0, 0, . . . 0) where the number of 0’s following σ2 k is m − k. Lemma 1.99 ΣH Σ is an n × n matrix given by ΣH Σ = diag(σ2 1, σ2 2, . . . σ2 k, 0, 0, . . . 0) where the number of 0’s following σ2 k is n − k. Lemma 1.100 [Rank and singular value decomposition] Let A ∈ Fm×n have a singular value decomposition given by A = UΣV H . Then rank(A) = rank(Σ). (1.7.10) In other words, rank of A is number of non-zero singular values of A. Since the singular values are ordered in descending order in A hence, the first r singular values σ1, . . . , σr are non-zero. Proof. This is a straight forward application of lemma 1.6 and lemma 1.7. Further since only non-zero values in Σ appear on its main diagonal hence its rank is number of non-zero singular values σi. Corollary 1.101. Let r = rank(A). Then Σ can be split as a block matrix Σ = Σr 0 0 0 (1.7.11) where Σr is an r × r diagonal matrix of the non-zero singular values diag(σ1, σ2, . . . , σr). All other sub-matrices in Σ are 0.
Lemma 1.102 The eigen values of the Hermitian matrix A^H A ∈ F^{n×n} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2. Moreover the eigen vectors are the columns of V.

Proof. A^H A = (U Σ V^H)^H (U Σ V^H) = V Σ^H U^H U Σ V^H = V Σ^H Σ V^H. We note that A^H A is Hermitian. Hence A^H A is diagonalized by V and the diagonalization of A^H A is Σ^H Σ. Thus the eigen values of A^H A are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with n − k 0's after σ_k^2. Clearly (A^H A) V = V (Σ^H Σ), thus the columns of V are the eigen vectors of A^H A.

Lemma 1.103 The eigen values of the Hermitian matrix A A^H ∈ F^{m×m} are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2. Moreover the eigen vectors are the columns of U.

Proof. A A^H = (U Σ V^H)(U Σ V^H)^H = U Σ V^H V Σ^H U^H = U Σ Σ^H U^H. We note that A A^H is Hermitian. Hence A A^H is diagonalized by U and the diagonalization of A A^H is Σ Σ^H. Thus the eigen values of A A^H are σ_1^2, σ_2^2, . . . , σ_k^2, 0, 0, . . . , 0 with m − k 0's after σ_k^2. Clearly (A A^H) U = U (Σ Σ^H), thus the columns of U are the eigen vectors of A A^H.

Lemma 1.104 The Gram matrices A A^H and A^H A share the same eigen values except for some extra 0's. Their eigen values are the squares of the singular values of A together with some extra 0's. In other words
  • 60. 60 1. MATRIX ALGEBRA singular values of A are the square roots of non-zero eigen values of the Gram matrices AAH or AH A. 1.7.1. The largest singular value Lemma 1.105 For all u ∈ Fn the following holds Σu 2 ≤ σ1 u 2 (1.7.12) Moreover for all u ∈ Fm the following holds ΣH u 2 ≤ σ1 u 2 (1.7.13) Proof. Let us expand the term Σu.          σ1 0 . . . . . . 0 0 σ2 . . . . . . 0 ... ... ... . . . 0 0 ... σk . . . 0 0 0 ... . . . 0                     u1 u2 ... uk ... un            =               σ1u1 σ2u2 ... σkuk 0 ... 0               Now since σ1 is the largest singular value, hence |σrui| ≤ |σ1ui| ∀ 1 ≤ i ≤ k. Thus n i=1 |σ1ui|2 ≥ n i=1 |σiui|2 or σ2 1 u 2 2 ≥ Σu 2 2. The result follows. A simpler representation of Σu can be given using corollary 1.101. Let r = rank(A). Thus Σ = Σr 0 0 0
  • 61. 1.7. SINGULAR VALUES 61 We split entries in u as u = [(u1, . . . , ur)(ur+1 . . . un)]T . Then Σu =   Σr u1 . . . ur T 0 ur+1 . . . un T   = σ1u1 σ2u2 . . . σrur 0 . . . 0 T Thus Σu 2 2 = r i=1 |σiui|2 ≤ σ1 r i=1 |ui|2 ≤ σ1 u 2 2. 2nd result can also be proven similarly. Lemma 1.106 Let σ1 be the largest singular value of an m × n matrix A. Then Ax 2 ≤ σ1 x 2 ∀ x ∈ Fn . (1.7.14) Moreover AH x 2 ≤ σ1 x 2 ∀ x ∈ Fm . (1.7.15) Proof. Ax 2 = UΣV H x 2 = ΣV H x 2 since U is unitary. Now from previous lemma we have ΣV H x 2 ≤ σ1 V H x 2 = σ1 x 2 since V H also unitary. Thus we get the result Ax 2 ≤ σ1 x 2 ∀ x ∈ Fn . Similarly AH x 2 = V ΣH UH x 2 = ΣH UH x 2 since V is unitary. Now from previous lemma we have ΣH UH x 2 ≤ σ1 UH x 2 = σ1 x 2 since UH also unitary. Thus we get the result AH x 2 ≤ σ1 x 2 ∀ x ∈ Fm .
  • 62. 62 1. MATRIX ALGEBRA There is a direct connection between the largest singular value and 2-norm of a matrix (see section 1.8.6). Corollary 1.107. The largest singular value of A is nothing but its 2-norm. i.e. σ1 = max u 2=1 Au 2. 1.7.2. SVD and pseudo inverse Lemma 1.108 [Pseudo-inverse of Σ] Let A = UΣV H and let r = rank(A). Let σ1, . . . , σr be the r non-zero singular values of A. Then the Moore-Penrose pseudo-inverse of Σ is an n × m matrix Σ† given by Σ† = Σ−1 r 0 0 0 (1.7.16) where Σr = diag(σ1, . . . , σr). Essentially Σ† is obtained by transposing Σ and inverting all its non-zero (positive real) values. Proof. Straight forward application of lemma 1.32. Corollary 1.109. The rank of Σ and its pseudo-inverse Σ† are same. i.e. rank(Σ) = rank(Σ† ). (1.7.17) Proof. The number of non-zero diagonal entries in Σ and Σ† are same. Lemma 1.110 Let A be an m × n matrix and let A = UΣV H be its singular value decomposition. Let Σ† be the pseudo inverse of Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of A is given by A† = V Σ† UH . (1.7.18)
  • 63. 1.7. SINGULAR VALUES 63 Proof. As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per definition 1.19. We note that since Σ† is the pseudo-inverse of Σ it already satisfies necessary criteria. First requirement: AA† A = UΣV H V Σ† UH UΣV H = UΣΣ† ΣV H = UΣV H = A. Second requirement: A† AA† = V Σ† UH UΣV H V Σ† UH = V Σ† ΣΣ† UH = V Σ† UH = A† . We now consider AA† = UΣV H V Σ† UH = UΣΣ† UH . Thus AA† H = UΣΣ† UH H = U ΣΣ† H UH = UΣΣ† UH = AA† since ΣΣ† is Hermitian. Finally we consider A† A = V Σ† UH UΣV H = V Σ† ΣV H . Thus A† A H = V Σ† ΣV H H = V Σ† Σ H V H = V Σ† ΣV H = A† A since Σ† Σ is also Hermitian. This completes the proof. Finally we can connect the singular values of A with the singular values of its pseudo-inverse. Corollary 1.111. The rank of any m × n matrix A and its pseudo- inverse A† are same. i.e. rank(A) = rank(A† ). (1.7.19) Proof. We have rank(A) = rank(Σ). Also its easy to verify that rank(A† ) = rank(Σ† ). So using corollary 1.109 completes the proof.
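Lemma 1.110 translates directly into code: build $\Sigma^\dagger$ by transposing $\Sigma$ and inverting its non-zero entries, then form $A^\dagger = V \Sigma^\dagger U^H$. Below is a minimal sketch (NumPy assumed; the tolerance and random test matrix are illustrative) that compares the result with the library pseudo-inverse and checks two of the Moore-Penrose conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A)          # A = U @ Sigma @ Vh, s holds sigma_1..sigma_k

# Build Sigma^dagger: an n x m matrix with 1/sigma_i on its diagonal.
Sigma_pinv = np.zeros((n, m))
r = int(np.sum(s > 1e-12))
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])

A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
print(np.allclose(A_pinv, np.linalg.pinv(A)))        # True

# Two of the Moore-Penrose conditions as a sanity check.
print(np.allclose(A @ A_pinv @ A, A))                # A A^dagger A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))      # A^dagger A A^dagger = A^dagger

# Non-zero singular values of A^dagger are reciprocals of those of A (lemma 1.112).
print(np.allclose(np.sort(np.linalg.svd(A_pinv, compute_uv=False)),
                  np.sort(1.0 / s)))
```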
Lemma 1.112 Let $A$ be an $m \times n$ matrix and let $A^\dagger$ be its $n \times m$ pseudo-inverse as per lemma 1.110. Let $r = \operatorname{rank}(A)$. Let $k = \min(m, n)$ denote the number of singular values, while $r$ denotes the number of non-zero singular values of $A$. Let $\sigma_1, \dots, \sigma_r$ be the non-zero singular values of $A$. Then the number of singular values of $A^\dagger$ is the same as that of $A$, the non-zero singular values of $A^\dagger$ are $\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$, and all other $k - r$ singular values of $A^\dagger$ are zero.

Proof. $k = \min(m, n)$ is the number of singular values for both $A$ and $A^\dagger$. Since the ranks of $A$ and $A^\dagger$ are the same, the number of non-zero singular values is also the same. Now look at
$$A^\dagger = V \Sigma^\dagger U^H \quad \text{where} \quad \Sigma^\dagger = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$
Clearly $\Sigma_r^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\right)$. Thus, expanding the R.H.S. we get
$$A^\dagger = \sum_{i=1}^{r} \frac{1}{\sigma_i} v_i u_i^H$$
where $v_i$ and $u_i$ are the first $r$ columns of $V$ and $U$ respectively. If we reverse the order of the first $r$ columns of $U$ and $V$ and reverse the first $r$ diagonal entries of $\Sigma^\dagger$, the R.H.S. remains the same while $A^\dagger$ is now expressed in the standard singular value decomposition form (with the reciprocals in descending order). Thus $\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$ are indeed the non-zero singular values of $A^\dagger$.

1.7.3. Full column rank matrices

In this subsection we consider some specific results related to the singular value decomposition of a full column rank matrix.
  • 65. 1.7. SINGULAR VALUES 65 We will consider A to be an m × n matrix in Fm×n with m ≥ n and rank(A) = n. Let A = UΣV H be its singular value decomposition. From lemma 1.100 we observe that there are n non-zero singular values of A. We will call these singular values as σ1, σ2, . . . , σn. We will define Σn = diag(σ1, σ2, . . . , σn). Clearly Σ is an 2 × 1 block matrix given by Σ = Σn 0 where the lower 0 is an (m − n) × n zero matrix. From here we obtain that ΣH Σ is an n × n matrix given by ΣH Σ = Σ2 n where Σ2 n = diag(σ2 1, σ2 2, . . . , σ2 n). Lemma 1.113 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Then ΣH Σ = Σ2 n = diag(σ2 1, σ2 2, . . . , σ2 n) and ΣH Σ is invertible. Proof. Since all singular values are non-zero hence Σ2 n is invert- ible. Thus ΣH Σ −1 = Σ2 n −1 = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . (1.7.20) Lemma 1.114 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let σ1 be its largest singular value and σn be its smallest singular value. Then σ2 n x 2 ≤ ΣH Σx 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . (1.7.21)
  • 66. 66 1. MATRIX ALGEBRA Proof. Let x ∈ Fn . We have ΣH Σx 2 2 = Σ2 nx 2 2 = n i=1 |σ2 i xi|2 . Now since σn ≤ σi ≤ σ1 hence σ4 n n i=1 |xi|2 ≤ n i=1 |σ2 i xi|2 ≤ σ4 1 n i=1 |xi|2 thus σ4 n x 2 2 ≤ ΣH Σx 2 2 ≤ σ4 1 x 2 2. Applying square roots, we get σ2 n x 2 ≤ ΣH Σx 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . We recall from corollary 1.25 that the Gram matrix of its column vec- tors G = AH A is full rank and invertible. Lemma 1.115 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let σ1 be its largest singular value and σn be its smallest singular value. Then σ2 n x 2 ≤ AH Ax 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . (1.7.22) Proof. AH A = (UΣV H )H (UΣV H ) = V ΣH ΣV H . Let x ∈ Fn . Let u = V H x =⇒ u 2 = x 2. Let r = ΣH Σu. Then from previous lemma we have σ2 n u 2 ≤ ΣH Σu 2 = r 2 ≤ σ2 1 u 2.
  • 67. 1.7. SINGULAR VALUES 67 Finally AH Ax = V ΣH ΣV H x = V r. Thus AH Ax 2 = r 2. Substituting we get σ2 n x 2 ≤ AH Ax 2 ≤ σ2 1 x 2 ∀ x ∈ Fn . There are bounds for the inverse of Gram matrix also. First let us establish the inverse of Gram matrix. Lemma 1.116 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let the singular values of A be σ1, . . . , σn. Let the Gram matrix of columns of A be G = AH A. Then G−1 = V ΨV H where Ψ = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . Proof. We have G = V ΣH ΣV H Thus G−1 = V ΣH ΣV H −1 = V H −1 ΣH Σ −1 V −1 = V ΣH Σ −1 V H . From lemma 1.113 we have Ψ = ΣH Σ −1 = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . This completes the proof. We can now state the bounds:
  • 68. 68 1. MATRIX ALGEBRA Lemma 1.117 Let A be a full column rank matrix with singular value decomposition A = UΣV H . Let σ1 be its largest singular value and σn be its smallest singular value. Then 1 σ2 1 x 2 ≤ AH A −1 x 2 ≤ 1 σ2 n x 2 ∀ x ∈ Fn . (1.7.23) Proof. From lemma 1.116 we have G−1 = AH A −1 = V ΨV H where Ψ = diag 1 σ2 1 , 1 σ2 2 , . . . , 1 σ2 n . Let x ∈ Fn . Let u = V H x =⇒ u 2 = x 2. Let r = Ψu. Then r 2 2 = n i=1 1 σ2 i ui 2 . Thus 1 σ2 1 u 2 ≤ Ψu 2 = r 2 ≤ 1 σ2 n u 2. Finally AH A −1 x = V ΨV H x = V r. Thus AH A −1 x 2 = r 2. Substituting we get the result.
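A quick numerical check of lemmas 1.115 and 1.117 for a full column rank matrix follows (NumPy assumed; the test matrix and probe vectors are arbitrary placeholders).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 4
A = rng.standard_normal((m, n))             # generically full column rank

s = np.linalg.svd(A, compute_uv=False)
s1, sn = s[0], s[-1]                        # largest and smallest singular values

G = A.conj().T @ A                          # Gram matrix, invertible here
G_inv = np.linalg.inv(G)

for _ in range(5):
    x = rng.standard_normal(n)
    nx = np.linalg.norm(x)
    # sigma_n^2 ||x|| <= ||A^H A x|| <= sigma_1^2 ||x||
    assert sn**2 * nx <= np.linalg.norm(G @ x) + 1e-10
    assert np.linalg.norm(G @ x) <= s1**2 * nx + 1e-10
    # (1/sigma_1^2) ||x|| <= ||(A^H A)^{-1} x|| <= (1/sigma_n^2) ||x||
    assert nx / s1**2 <= np.linalg.norm(G_inv @ x) + 1e-10
    assert np.linalg.norm(G_inv @ x) <= nx / sn**2 + 1e-10

print("bounds verified")
```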
  • 69. 1.8. MATRIX NORMS 69 1.7.4. Low rank approximation of a matrix Definition 1.40 An m × n matrix A is called low rank if rank(A) min(m, n). (1.7.24) Remark. A matrix is low rank if the number of non-zero singular values for the matrix is much smaller than its dimensions. Following is a simple procedure for making a low rank approximation of a given matrix A. (1) Perform the singular value decomposition of A given by A = UΣV H . (2) Identify the singular values of A in Σ. (3) Keep the first r singular values (where r min(m, n) is the rank of the approximation). and set all other singular values to 0 to obtain Σ. (4) Compute A = UΣV H . 1.8. Matrix norms This section reviews various matrix norms on the vector space of com- plex matrices over the field of complex numbers (Cm×n , C). We know (Cm×n , C) is a finite dimensional vector space with dimension mn. We will usually refer to it as Cm×n . Matrix norms will follow the usual definition of norms for a vector space. Definition 1.41 A function · : Cm×n → R is called a matrix norm on Cm×n if for all A, B ∈ Cm×n and all α ∈ C it satisfies the following Positivity: A ≥ 0
  • 70. 70 1. MATRIX ALGEBRA with A = 0 ⇐⇒ A = 0. Homogeneity: αA = |α| A . Triangle inequality: A + B ≤ A + B . We recall some of the standard results on normed vector spaces. All matrix norms are equivalent. Let · and · be two different matrix norms on Cm×n . Then there exist two constants a and b such that the following holds a A ≤ A ≤ b A ∀ A ∈ Cm×n . A matrix norm is a continuous function · : Cm×n → R. 1.8.1. Norms like lp on Cn Following norms are quite like lp norms on finite dimensional complex vector space Cn . They are developed by the fact that the matrix vector space Cm×n has one to one correspondence with the complex vector space Cmn . Definition 1.42 Let A ∈ Cm×n and A = [aij]. Matrix sum norm is defined as A S = m i=1 n j=1 |aij| (1.8.1) Definition 1.43 Let A ∈ Cm×n and A = [aij]. Matrix Frobenius norm is defined as A F = m i=1 n j=1 |aij|2 1 2 . (1.8.2)
  • 71. 1.8. MATRIX NORMS 71 Definition 1.44 Let A ∈ Cm×n and A = [aij]. Matrix Max norm is defined as A M = max 1≤i≤m 1≤j≤n |aij|. (1.8.3) 1.8.2. Properties of Frobenius norm We now prove some elementary properties of Frobenius norm. Lemma 1.118 The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose. AH F = A F . (1.8.4) Proof. Let A = [aij]. Then AH = [aji] AH 2 F = n j=1 m i=1 |aij|2 = m i=1 n j=1 |aij|2 = A 2 F . Now AH 2 F = A 2 F =⇒ AH F = A F Lemma 1.119 Let A ∈ Cm×n be written as a row of column vec- tors A = a1 . . . an . Then A 2 F = n j=1 aj 2 2. (1.8.5)
Proof. We note that $\|a_j\|_2^2 = \sum_{i=1}^{m} |a_{ij}|^2$. Now
$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} |a_{ij}|^2 \right) = \sum_{j=1}^{n} \|a_j\|_2^2.$$
We thus showed that the square of the Frobenius norm of a matrix is nothing but the sum of squares of the $\ell_2$ norms of its columns.

Lemma 1.120 Let $A \in \mathbb{C}^{m \times n}$ be written as a column of row vectors
$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$
Then
$$\|A\|_F^2 = \sum_{i=1}^{m} \|a_i\|_2^2. \quad (1.8.6)$$

Proof. We note that $\|a_i\|_2^2 = \sum_{j=1}^{n} |a_{ij}|^2$. Now
$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{m} \|a_i\|_2^2.$$

We now consider how the Frobenius norm is affected by the action of unitary matrices. Let $A$ be an arbitrary matrix in $\mathbb{C}^{m \times n}$. Let $U$ be a unitary matrix in $\mathbb{C}^{m \times m}$ and let $V$ be a unitary matrix in $\mathbb{C}^{n \times n}$.
  • 73. 1.8. MATRIX NORMS 73 We present our first result that multiplication with unitary matrices doesn’t change Frobenius norm of a matrix. Theorem 1.121 The Frobenius norm of a matrix is invariant to pre or post multiplication by a unitary matrix. i.e. UA F = A F (1.8.7) and AV F = A F . (1.8.8) Proof. We can write A as A = a1 . . . an . So UA = Ua1 . . . Uan . Then applying lemma 1.119 clearly UA 2 F = n j=1 Uaj 2 2. But we know that unitary matrices are norm preserving. Hence Uaj 2 2 = aj 2 2. Thus UA 2 F = n j=1 aj 2 2 = A 2 F which implies UA F = A F . Similarly writing A as
  • 74. 74 1. MATRIX ALGEBRA A =     r1 ... rm     . we have AV =     r1V ... rmV     . Then applying lemma 1.120 clearly AV 2 F = m i=1 riV 2 2. But we know that unitary matrices are norm preserving. Hence riV 2 2 = ri 2 2. Thus AV 2 F = m i=1 ri 2 2 = A 2 F which implies AV F = A F . An alternative approach for the 2nd part of the proof using the first part is just one line AV F = (AV )H F = V H AH F = AH F = A F . In above we use lemma 1.118 and the fact that V is a unitary matrix implies that V H is also a unitary matrix. We have already shown that pre multiplication by a unitary matrix preserves Frobenius norm. Theorem 1.122 Let A ∈ Cm×n and B ∈ Cn×P be two matrices. Then the Frobenius norm of their product is less than or equal to
the product of the Frobenius norms of the matrices themselves, i.e.
$$\|AB\|_F \le \|A\|_F \|B\|_F. \quad (1.8.9)$$

Proof. We can write $A$ as
$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}$$
where the $a_i$ are $m$ column vectors corresponding to the rows of $A$. Similarly we can write $B$ as $B = \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix}$ where the $b_i$ are column vectors corresponding to the columns of $B$. Then
$$AB = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & \dots & a_1^T b_P \\ \vdots & \ddots & \vdots \\ a_m^T b_1 & \dots & a_m^T b_P \end{bmatrix} = \left[ a_i^T b_j \right].$$
Now looking carefully, $a_i^T b_j$ is the inner product $\langle a_i, b_j \rangle$. Applying the Cauchy-Schwarz inequality we have
$$|\langle a_i, b_j \rangle|^2 \le \|a_i\|_2^2 \|b_j\|_2^2.$$
Now
$$\|AB\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{P} |a_i^T b_j|^2 \le \sum_{i=1}^{m} \sum_{j=1}^{P} \|a_i\|_2^2 \|b_j\|_2^2 = \left( \sum_{i=1}^{m} \|a_i\|_2^2 \right) \left( \sum_{j=1}^{P} \|b_j\|_2^2 \right) = \|A\|_F^2 \|B\|_F^2,$$
which implies $\|AB\|_F \le \|A\|_F \|B\|_F$ by taking square roots on both sides.
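A short numerical check of theorem 1.122 (NumPy assumed; the matrices are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))

lhs = np.linalg.norm(A @ B, 'fro')
rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
print(lhs <= rhs, lhs, rhs)   # True: ||AB||_F <= ||A||_F ||B||_F
```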
Corollary 1.123. Let $A \in \mathbb{C}^{m \times n}$ and let $x \in \mathbb{C}^n$. Then
$$\|Ax\|_2 \le \|A\|_F \|x\|_2.$$

Proof. We note that the Frobenius norm of a column matrix is the same as the $\ell_2$ norm of the corresponding column vector, i.e. $\|x\|_F = \|x\|_2$ for all $x \in \mathbb{C}^n$. Now applying theorem 1.122 we have
$$\|Ax\|_2 = \|Ax\|_F \le \|A\|_F \|x\|_F = \|A\|_F \|x\|_2 \quad \forall x \in \mathbb{C}^n.$$

It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 1.124 Let $A \in \mathbb{C}^{m \times n}$. Let the singular value decomposition of $A$ be given by $A = U \Sigma V^H$. Let the singular values of $A$ be $\sigma_1, \dots, \sigma_n$. Then
$$\|A\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^2}. \quad (1.8.10)$$

Proof. $A = U \Sigma V^H \implies \|A\|_F = \|U \Sigma V^H\|_F$. But $\|U \Sigma V^H\|_F = \|\Sigma V^H\|_F = \|\Sigma\|_F$ since $U$ and $V$ are unitary matrices (see theorem 1.121). Now the only non-zero terms in $\Sigma$ are the singular values. Hence
$$\|A\|_F = \|\Sigma\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^2}.$$
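Lemma 1.124 and the unitary invariance of theorem 1.121 can both be checked in a few lines. This is a sketch only (NumPy assumed); the unitary factors are taken from QR factorizations of random matrices purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# ||A||_F equals the square root of the sum of squared singular values.
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))          # True

# Pre or post multiplication by a unitary matrix leaves ||.||_F unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
W, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
print(np.isclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro')))   # True
print(np.isclose(np.linalg.norm(A @ W, 'fro'), np.linalg.norm(A, 'fro')))   # True
```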
  • 77. 1.8. MATRIX NORMS 77 1.8.3. Consistency of a matrix norm Definition 1.45 A matrix norm · is called consistent on Cn×n if AB ≤ A B (1.8.11) holds true for all A, B ∈ Cn×n . A matrix norm · is called consistent if it is defined on Cm×n for all m, n ∈ N and eq (1.8.11) holds for all matrices A, B for which the product AB is defined. A consistent matrix norm is also known as a sub-multiplicative norm. With this definition and results in theorem 1.122 we can see that Frobe- nius norm is consistent. 1.8.4. Subordinate matrix norm A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and norms of vectors in the domain and co-domain of a matrix. Definition 1.46 Let m, n ∈ N be given. Let · α be some norm on Cm and · β be some norm on Cn . Let · be some norm on matrices in Cm×n . We say that · is subordinate to the vector norms · α and · β if Ax α ≤ A x β (1.8.12) for all A ∈ Cm×n and for all x ∈ Cn . In other words the length of the vector doesn’t increase by the operation of A beyond a factor given by the norm of the matrix itself. If · α and · β are same then we say that · is subordinate to the vector norm · α.
We have shown earlier in corollary 1.123 that the Frobenius norm is subordinate to the Euclidean norm.

1.8.5. Operator norm

We now consider the maximum factor by which a matrix $A$ can increase the length of a vector.

Definition 1.47 Let $m, n \in \mathbb{N}$ be given. Let $\|\cdot\|_\alpha$ be some norm on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ be some norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m \times n}$ we define
$$\|A\| \triangleq \|A\|_{\alpha \to \beta} \triangleq \max_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \quad (1.8.13)$$
The quantity $\frac{\|Ax\|_\beta}{\|x\|_\alpha}$ represents the factor by which the length of $x$ is increased by the operation of $A$; we simply pick the maximum value of this scaling factor. The norm defined above is known as the $(\alpha \to \beta)$ operator norm, the $(\alpha \to \beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.

Of course we need to verify that this definition satisfies all properties of a norm. Clearly if $A = 0$ then $Ax = 0$ always, hence $\|A\| = 0$. Conversely, if $\|A\| = 0$ then $\|Ax\|_\beta = 0$ for all $x \in \mathbb{C}^n$. In particular this is true for the unit vectors $e_i \in \mathbb{C}^n$. The $i$-th column of $A$ is given by $A e_i$, which is therefore 0. Thus each column in $A$ is 0 and hence $A = 0$. Now consider $c \in \mathbb{C}$:
$$\|cA\| = \max_{x \ne 0} \frac{\|cAx\|_\beta}{\|x\|_\alpha} = |c| \max_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha} = |c| \|A\|.$$
We now present some useful observations on the operator norm before we prove the triangle inequality for it. For any $x \in \ker(A)$, $Ax = 0$; hence we only need to consider vectors which don't belong to the kernel of $A$.
  • 79. 1.8. MATRIX NORMS 79 Thus we can write A α→β = max x/∈ker(A) Ax β x α . (1.8.14) We also note that Acx β cx α = |c| Ax β |c| x α = Ax β x α ∀ c = 0, x = 0. Thus, it is sufficient to find the maximum on unit norm vectors: A α→β = max x α=1 Ax β. Note that since x α = 1 hence the term in denominator goes away. Lemma 1.125 The (α → β)-operator norm is subordinate to vec- tor norms · α and · β. i.e. Ax β ≤ A α→β x α. (1.8.15) Proof. For x = 0 the inequality is trivially satisfied. Now for x = 0 by definition, we have A α→β ≥ Ax β x α =⇒ A α→β x α ≥ Ax β. Remark. There exists a vector x∗ ∈ Cn with unit norm ( x∗ α = 1) such that A α→β = Ax∗ β. (1.8.16) Proof. Let x = 0 be some vector which maximizes the expression Ax β x α . Then A α→β = Ax β x α . Now consider x∗ = x x α . Thus x∗ α = 1. We know that Ax β x α = Ax∗ β.
  • 80. 80 1. MATRIX ALGEBRA Hence A α→β = Ax∗ β. We are now ready to prove triangle inequality for operator norm. Lemma 1.126 Operator norm as defined in definition 1.47 satis- fies triangle inequality. Proof. Let A and B be some matrices in Cm×n . Consider the operator norm of matrix A + B. From previous remarks, there exists some vector x∗ ∈ Cn with x∗ α = 1 such that A + B = (A + B)x∗ β. Now (A + B)x∗ β = Ax∗ + Bx∗ β ≤ Ax∗ β + Bx∗ β. From another remark we have Ax∗ β ≤ A x∗ α = A and Bx∗ β ≤ B x∗ α = B since x∗ α = 1. Hence we have A + B ≤ A + B . It turns out that operator norm is also consistent under certain condi- tions.
  • 81. 1.8. MATRIX NORMS 81 Lemma 1.127 Let · α be defined over all m ∈ N. Let · β = · α. Then the operator norm A α = max x=0 Ax α x α is consistent. Proof. We need to show that AB α ≤ A α B α. Now AB α = max x=0 ABx α x α . We note that if Bx = 0, then ABx = 0. Hence we can rewrite as AB α = max Bx=0 ABx α x α . Now if Bx = 0 then Bx α = 0. Hence ABx α x α = ABx α Bx α Bx α x α and max Bx=0 ABx α x α ≤ max Bx=0 ABx α Bx α max Bx=0 Bx α x α . Clearly B α = max Bx=0 Bx α x α . Furthermore max Bx=0 ABx α Bx α ≤ max y=0 Ay α y α = A α. Thus we have AB α ≤ A α B α.
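Lemma 1.127 says that any $\alpha \to \alpha$ operator norm is sub-multiplicative. The closed forms derived in the next subsection ($\|\cdot\|_1$ is the maximum column sum, $\|\cdot\|_2$ the largest singular value, $\|\cdot\|_\infty$ the maximum row sum) make this easy to test numerically; the sketch below assumes NumPy and simply uses those closed forms via `np.linalg.norm`.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

for p in (1, 2, np.inf):
    lhs = np.linalg.norm(A @ B, p)
    rhs = np.linalg.norm(A, p) * np.linalg.norm(B, p)
    print(p, lhs <= rhs + 1e-12)   # True for each p: ||AB||_p <= ||A||_p ||B||_p
```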
  • 82. 82 1. MATRIX ALGEBRA 1.8.6. p-norm for matrices We recall the definition of lp norms for vectors x ∈ Cn from (??) x p =    ( n i=1 |x|p i ) 1 p p ∈ [1, ∞) max 1≤i≤n |xi| p = ∞ . The operator norms · p defined from lp vector norms are of specific interest. Definition 1.48 The p-norm for a matrix A ∈ Cm×n is defined as A p max x=0 Ax p x p = max x p=1 Ax p (1.8.17) where x p is the standard lp norm for vectors in Cm and Cn . Remark. As per lemma 1.127 p-norms for matrices are consistent norms. They are also sub-ordinate to lp vector norms. Special cases are considered for p = 1, 2 and ∞. Theorem 1.128 Let A ∈ Cm×n . For p = 1 we have A 1 max 1≤j≤n m i=1 |aij|. (1.8.18) This is also known as max column sum norm. For p = ∞ we have A ∞ max 1≤i≤m n j=1 |aij|. (1.8.19) This is also known as max row sum norm. Finally for p = 2 we have A 2 σ1 (1.8.20)
where $\sigma_1$ is the largest singular value of $A$. This is also known as the spectral norm.

Proof. Let $A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}$. Then
$$\|Ax\|_1 = \left\| \sum_{j=1}^{n} x_j a_j \right\|_1 \le \sum_{j=1}^{n} \|x_j a_j\|_1 = \sum_{j=1}^{n} |x_j| \|a_j\|_1 \le \left( \max_{1 \le j \le n} \|a_j\|_1 \right) \sum_{j=1}^{n} |x_j| = \left( \max_{1 \le j \le n} \|a_j\|_1 \right) \|x\|_1.$$
Thus,
$$\|A\|_1 = \max_{x \ne 0} \frac{\|Ax\|_1}{\|x\|_1} \le \max_{1 \le j \le n} \|a_j\|_1,$$
which is the maximum column sum. We need to show that this upper bound is indeed an equality. Indeed, for any $x = e_j$, where $e_j$ is the unit vector with 1 in the $j$-th entry and 0 elsewhere, $\|A e_j\|_1 = \|a_j\|_1$. Thus $\|A\|_1 \ge \|a_j\|_1$ for all $1 \le j \le n$. Combining the two, we see that
$$\|A\|_1 = \max_{1 \le j \le n} \|a_j\|_1.$$
  • 84. 84 1. MATRIX ALGEBRA For p = ∞, we proceed as follows: Ax ∞ = max 1≤i≤m n j=1 aijxj ≤ max 1≤i≤m n j=1 |aij||xj| ≤ max 1≤j≤n |xj| max 1≤i≤m n j=1 |aij| = x ∞ max 1≤i≤m ai 1 where ai are the rows of A. This shows that Ax ∞ ≤ max 1≤i≤m ai 1. We need to show that this is indeed an equality. Fix an i = k and choose x such that xj = sgn(akj). Clearly x ∞ = 1. Then Ax ∞ = max 1≤i≤m n j=1 aijxj ≥ n j=1 akjxj = n j=1 |akj| = n j=1 |akj| = ak 1. Thus, A ∞ ≥ max 1≤i≤m ai 1
  • 85. 1.8. MATRIX NORMS 85 Combining the two inequalities we get: A ∞ = max 1≤i≤m ai 1. Remaining case is for p = 2. For any vector x with x 2 = 1, Ax 2 = UΣV H x 2 = U(ΣV H x) 2 = ΣV H x 2 since l2 norm is invariant to unitary transformations. Let v = V H x. Then v 2 = V H x 2 = x 2 = 1. Now Ax 2 = Σv 2 = n j=1 |σjvj|2 1 2 ≤ σ1 n j=1 |vj|2 1 2 = σ1 v 2 = σ1. This shows that A 2 ≤ σ1. Now consider some vector x such that v = (1, 0, . . . , 0). Then Ax 2 = Σv 2 = σ1. Thus A 2 ≥ σ1. Combining the two, we get that A 2 = σ1. 1.8.7. The 2-norm Theorem 1.129 Let A ∈ Cn×n has singular values σ1 ≥ σ2 ≥ · · · ≥ σn. Let the eigen values for A be λ1, λ2, . . . , λn with |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Then the following hold A 2 = σ1 (1.8.21)
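The three closed forms of theorem 1.128 can be verified directly against NumPy's built-in matrix norms (a sketch; the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

max_col_sum = np.abs(A).sum(axis=0).max()          # ||A||_1: max column abs-sum
max_row_sum = np.abs(A).sum(axis=1).max()          # ||A||_inf: max row abs-sum
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]    # ||A||_2: largest singular value

print(np.isclose(max_col_sum, np.linalg.norm(A, 1)))        # True
print(np.isclose(max_row_sum, np.linalg.norm(A, np.inf)))   # True
print(np.isclose(sigma_1, np.linalg.norm(A, 2)))            # True
```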
  • 86. 86 1. MATRIX ALGEBRA and if A is non-singular A−1 2 = 1 σn . (1.8.22) If A is symmetric and positive definite, then A 2 = λ1 (1.8.23) and if A is non-singular A−1 2 = 1 λn . (1.8.24) If A is normal then A 2 = |λ1| (1.8.25) and if A is non-singular A−1 2 = 1 |λn| . (1.8.26) 1.8.8. Unitary invariant norms Definition 1.49 A matrix norm · on Cm×n is called unitary invariant if UAV = A for any A ∈ Cm×n and any unitary matrices U ∈ Cm×m and V ∈ Cn×n . We have already seen in theorem 1.121 that Frobenius norm is unitary invariant. It turns out that spectral norm is also unitary invariant. 1.8.9. More properties of operator norms In this section we will focus on operator norms connecting normed linear spaces (Cn , · p) and (Cm , · q). Typical values of p, q would be in {1, 2, ∞}. We recall that A p→q = max x=0 Ax q x p = max x p=1 Ax q = max x p≤1 Ax q. (1.8.27)
  • 87. 1.8. MATRIX NORMS 87 Table 1[[5]] shows how to compute different (p, q) norms. Some can be computed easily while others are NP-hard to compute. Table 1. Typical (p → q) norms p q A p→q Calculation 1 1 A 1 Maximum l1 norm of a column 1 2 A 1→2 Maximum l2 norm of a column 1 ∞ A 1→∞ Maximum absolute entry of a matrix 2 1 A 2→1 NP hard 2 2 A 2 Maximum singular value 2 ∞ A 2→∞ Maximum l2 norm of a row ∞ 1 A ∞→1 NP hard ∞ 2 A ∞→2 NP hard ∞ ∞ A ∞ Maximum l1-norm of a row The topological dual of the finite dimensional normed linear space (Cn , · p) is the normed linear space (Cn , · p ) where 1 p + 1 p = 1. l2-norm is dual of l2-norm. It is a self dual. l1 norm and l∞-norm are dual of each other. When a matrix A maps from the space (Cn , · p) to the space (Cm , · q), we can view its conjugate transpose AH as a mapping from the space (Cm , · q ) to (Cn , · p ). Theorem 1.130 Operator norm of a matrix always equals the op- erator norm of its conjugate transpose. i.e. A p→q = AH q →p (1.8.28) where 1 p + 1 p = 1, 1 q + 1 q = 1.
  • 88. 88 1. MATRIX ALGEBRA Specific applications of this result are: A 2 = AH 2. (1.8.29) This is obvious since the maximum singular value of a matrix and its conjugate transpose are same. A 1 = AH ∞, A ∞ = AH 1. (1.8.30) This is also obvious since max column sum of A is same as the max row sum norm of AH and vice versa. A 1→∞ = AH 1→∞. (1.8.31) A 1→2 = AH 2→∞. (1.8.32) A ∞→2 = AH 2→1. (1.8.33) We now need to show the result for the general case (arbitrary 1 ≤ p, q ≤ ∞). Proof. TODO Theorem 1.131 A 1→p = max 1≤j≤n aj p. (1.8.34) where A = a1 . . . , an .
  • 89. 1.8. MATRIX NORMS 89 Proof. Ax p = n j=1 xjaj p ≤ n j=1 xjaj p = n j=1 |xj| aj p ≤ max 1≤j≤n aj p n j=1 |xj| = max 1≤j≤n aj p x 1. Thus, A 1→p = max x=0 Ax p x 1 ≤ max 1≤j≤n aj p. We need to show that this upper bound is indeed an equality. Indeed for any x = ej where ej is a unit vector with 1 in j-th entry and 0 elsewhere, Aej p = aj p. Thus A 1→p ≥ aj p ∀ 1 ≤ j ≤ n. Combining the two, we see that A 1→p = max 1≤j≤n aj p. Theorem 1.132 A p→∞ = max 1≤i≤m ai q (1.8.35) where 1 p + 1 q = 1.
  • 90. 90 1. MATRIX ALGEBRA Proof. Using theorem 1.130, we get A p→∞ = AH 1→q. Using theorem 1.131, we get AH 1→q = max 1≤i≤m ai q. This completes the proof. Theorem 1.133 For two matrices A and B and p ≥ 1, we have AB p→q ≤ B p→s A s→q. (1.8.36) Proof. We start with AB p→q = max x p=1 A(Bx) q. From lemma 1.125, we obtain A(Bx) q ≤ A s→q (Bx) s. Thus, AB p→q ≤ A s→q max x p=1 (Bx) s = A s→q B p→s. Theorem 1.134 For two matrices A and B and p ≥ 1, we have AB p→∞ ≤ A ∞→∞ B p→∞. (1.8.37) Proof. We start with AB p→∞ = max x p=1 A(Bx) ∞. From lemma 1.125, we obtain A(Bx) ∞ ≤ A ∞→∞ (Bx) ∞. Thus, AB p→∞ ≤ A ∞→∞ max x p=1 (Bx) ∞ = A ∞→∞ B p→∞.
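Theorems 1.131 and 1.132 say that $\|A\|_{1 \to p}$ is the maximum $\ell_p$ norm of a column and $\|A\|_{p \to \infty}$ is the maximum $\ell_q$ norm of a row. The following sketch (NumPy assumed, real test matrix for simplicity) checks the $1 \to 2$ case by sampling vectors on the $\ell_1$ unit sphere; the maximum is attained at a standard basis vector, as the proof of theorem 1.131 shows.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 6))

max_col_l2 = np.linalg.norm(A, axis=0).max()   # claimed value of ||A||_{1->2}
max_row_l2 = np.linalg.norm(A, axis=1).max()   # claimed value of ||A||_{2->inf}

# Random vectors with ||x||_1 = 1 never beat the maximum column l2 norm ...
for _ in range(1000):
    x = rng.standard_normal(6)
    x /= np.abs(x).sum()
    assert np.linalg.norm(A @ x) <= max_col_l2 + 1e-12

# ... and the bound is attained at the basis vector picking the best column.
j = np.argmax(np.linalg.norm(A, axis=0))
e_j = np.zeros(6); e_j[j] = 1.0
print(np.isclose(np.linalg.norm(A @ e_j), max_col_l2))   # True
print(max_row_l2)   # ||A||_{2->inf}: max l2 norm of a row (q = 2 when p = 2)
```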
  • 91. 1.8. MATRIX NORMS 91 Theorem 1.135 A p→∞ ≤ A p→p. (1.8.38) In particular A 1→∞ ≤ A 1. (1.8.39) A 2→∞ ≤ A 2. (1.8.40) Proof. Choosing q = ∞ and s = p and applying theorem 1.133 IA p→∞ ≤ A p→p I p→∞. But I p→∞ is the maximum lp norm of any row of I which is 1. Thus A p→∞ ≤ A p→p. Consider the expression min z∈C(AH ) z=0 Az q z p . (1.8.41) z ∈ C(AH ), z = 0 means there exists some vector u /∈ ker(AH ) such that z = AH u. This expression measures the factor by which the non-singular part of A can decrease the length of a vector. Theorem 1.136 [5] The following bound holds for every matrix A: min z∈C(AH ) z=0 Az q z p ≥ A† −1 q,p. (1.8.42) If A is surjective (onto), then the equality holds. When A is bijec- tive (one-one onto, square, invertible), then the result implies min z∈C(AH ) z=0 Az q z p = A−1 −1 q,p. (1.8.43)
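For a square invertible $A$ (so that theorem 1.136 holds with equality) and $p = q = 2$, the restricted minimum is simply the smallest singular value, and $\|A^{-1}\|_2^{-1} = \sigma_n$. A small sketch (NumPy assumed; the random matrix and sampled vectors are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
A = rng.standard_normal((n, n))          # generically invertible

s = np.linalg.svd(A, compute_uv=False)
sigma_min = s[-1]

# 1 / ||A^{-1}||_2 equals the smallest singular value.
print(np.isclose(1.0 / np.linalg.norm(np.linalg.inv(A), 2), sigma_min))   # True

# The ratio ||Az||_2 / ||z||_2 never drops below sigma_min.
ratios = []
for _ in range(1000):
    z = rng.standard_normal(n)
    ratios.append(np.linalg.norm(A @ z) / np.linalg.norm(z))
print(min(ratios) >= sigma_min - 1e-12, min(ratios), sigma_min)
```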
  • 92. 92 1. MATRIX ALGEBRA Proof. The spaces C(AH ) and C(A) have same dimensions given by rank(A). We recall that A† A is a projector onto the column space of A. w = Az ⇐⇒ z = A† w = A† Az ∀ z ∈ C(AH ). As a result we can write z p Az q = A† w p w q whenever z ∈ C(AH ). Now   min z∈C(AH ) z=0 Az q z p   −1 = max z∈C(AH ) z=0 z p Az q = max w∈C(A) w=0 A† w p w q ≤ max w=0 A† w p w q . When A is surjective, then C(A) = Cm . Hence max w∈C(A) w=0 A† w p w q = max w=0 A† w p w q . Thus, the inequality changes into equality. Finally max w=0 A† w p w q = A† q→p which completes the proof. 1.8.10. Row column norms Definition 1.50 Let A be an m × n matrix with rows ai as A =     a1 ... am     Then we define A p,∞ max 1≤i≤m ai p = max 1≤i≤m n j=1 |ai j|p 1 p (1.8.44) where 1 ≤ p < ∞. i.e. we take p-norms of all row vectors and then find the maximum.
We define
$$\|A\|_{\infty,\infty} = \max_{i,j} |a_{ij}|. \quad (1.8.45)$$
This is equivalent to taking the $\ell_\infty$ norm on each row and then taking the maximum of all the norms.

For $1 \le p, q < \infty$, we define the norm
$$\|A\|_{p,q} \triangleq \left( \sum_{i=1}^{m} \|a_i\|_p^q \right)^{\frac{1}{q}}, \quad (1.8.46)$$
i.e., we compute the $p$-norm of all the row vectors to form another vector and then take the $q$-norm of that vector.

Note that the norm $\|A\|_{p,\infty}$ is different from the operator norm $\|A\|_{p \to \infty}$. Similarly $\|A\|_{p,q}$ is different from $\|A\|_{p \to q}$.

Theorem 1.137
$$\|A\|_{p,\infty} = \|A\|_{q \to \infty} \quad (1.8.47)$$
where $\frac{1}{p} + \frac{1}{q} = 1$.

Proof. From theorem 1.132 we get $\|A\|_{q \to \infty} = \max_{1 \le i \le m} \|a_i\|_p$. This is exactly the definition of $\|A\|_{p,\infty}$.

Theorem 1.138
$$\|A\|_{1 \to p} = \|A^H\|_{p,\infty}. \quad (1.8.48)$$

Proof. By theorem 1.130, $\|A\|_{1 \to p} = \|A^H\|_{q \to \infty}$ where $q$ is the dual exponent of $p$. From theorem 1.137, $\|A^H\|_{q \to \infty} = \|A^H\|_{p,\infty}$.
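The row–column norms of definition 1.50 and the identity of theorem 1.137 are easy to compute directly. The sketch below (NumPy assumed; the real test matrix is illustrative) evaluates $\|A\|_{2,\infty}$, $\|A\|_{\infty,\infty}$ and $\|A\|_{2,2}$, and checks that $\|A\|_{2,\infty}$ coincides with the operator norm $\|A\|_{2 \to \infty}$, whose maximum is attained at the normalized longest row.

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((4, 6))

row_l2 = np.linalg.norm(A, axis=1)

norm_2_inf   = row_l2.max()            # ||A||_{2,inf}: max l2 norm of a row
norm_inf_inf = np.abs(A).max()         # ||A||_{inf,inf}: largest |a_ij|
norm_2_2     = np.linalg.norm(row_l2)  # ||A||_{2,2}: l2 norm of the row l2 norms

# ||A||_{2,inf} equals the operator norm ||A||_{2->inf}; the maximum over the
# l2 unit sphere is attained at the normalized longest row.
i = np.argmax(row_l2)
x_star = A[i] / row_l2[i]
print(np.isclose(np.linalg.norm(A @ x_star, np.inf), norm_2_inf))   # True
print(norm_2_inf, norm_inf_inf, norm_2_2)
```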
  • 94. 94 1. MATRIX ALGEBRA Theorem 1.139 For any two matrices A, B, we have AB p,∞ B p,∞ ≤ A ∞→∞. (1.8.49) Proof. Let q be such that 1 p + 1 q = 1. From theorem 1.134, we have AB q→∞ ≤ A ∞→∞ B q→∞. From theorem 1.137 AB q→∞ = AB p,∞ and B q→∞ = B p,∞. Thus AB p,∞ ≤ A ∞→∞ B p,∞. Theorem 1.140 Relations between (p, q) norms and (p → q) norms A 1,∞ = A ∞→∞ (1.8.50) A 2,∞ = A 2→∞ (1.8.51) A ∞,∞ = A 1→∞ (1.8.52) A 1→1 = AH 1,∞ (1.8.53) A 1→2 = AH 2,∞ (1.8.54) (1.8.55) Proof. The first three are straight forward applications of theo- rem 1.137. The next two are applications of theorem 1.138. See also table 1.
1.8.11. Block diagonally dominant matrices and generalized Gershgorin disc theorem

In [1] the idea of diagonally dominant matrices (see section 1.6.9) has been generalized to block matrices using matrix norms. We consider the specific case with the spectral norm.

Definition 1.51 [Block diagonally dominant matrix] Let $A$ be a square matrix in $\mathbb{C}^{n \times n}$ which is partitioned in the following manner
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \dots & A_{kk} \end{bmatrix} \quad (1.8.56)$$
where each of the submatrices $A_{ij}$ is a square matrix of size $m \times m$. Thus $n = km$. $A$ is called block diagonally dominant if
$$\|A_{ii}\|_2 \ge \sum_{j \ne i} \|A_{ij}\|_2$$
holds true for all $1 \le i \le k$. If the inequality holds strictly for all $i$, then $A$ is called a block strictly diagonally dominant matrix.

Theorem 1.141 If the partitioned matrix $A$ of definition 1.51 is block strictly diagonally dominant, then it is nonsingular. For proof see [1].

This leads to the generalized Gershgorin disc theorem.
Theorem 1.142 Let $A$ be a square matrix in $\mathbb{C}^{n \times n}$ which is partitioned in the following manner
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \dots & A_{kk} \end{bmatrix} \quad (1.8.57)$$
where each of the submatrices $A_{ij}$ is a square matrix of size $m \times m$. Then each eigenvalue $\lambda$ of $A$ satisfies
$$\|\lambda I - A_{ii}\|_2 \le \sum_{j \ne i} \|A_{ij}\|_2 \quad \text{for some } i \in \{1, 2, \dots, k\}. \quad (1.8.58)$$
For proof see [1].

Since the 2-norm of a Hermitian positive semidefinite matrix is nothing but its largest eigenvalue, the theorem directly applies.

Corollary 1.143. Let $A$ be a Hermitian positive semidefinite matrix. Let $A$ be partitioned as in theorem 1.142. Then its 2-norm $\|A\|_2$ satisfies
$$\big| \|A\|_2 - \|A_{ii}\|_2 \big| \le \sum_{j \ne i} \|A_{ij}\|_2 \quad \text{for some } i \in \{1, 2, \dots, k\}. \quad (1.8.59)$$

1.9. Miscellaneous topics

1.9.1. Hadamard product

Standard linear algebra books usually don't dwell much on element-wise or component-wise products of vectors or matrices. Yet in certain contexts and algorithms, this is quite useful. We define the notation in this section. For further details see [3], [2] and [4].

Definition 1.52 The Hadamard product of two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ with the same dimensions (not necessarily square)
with entries in a given ring $R$ is the entry-wise product $A \circ B \equiv [a_{ij} b_{ij}]$, which has the same dimensions as $A$ and $B$.

Example 1.3: Hadamard product
Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 5 & -6 \\ 7 & -3 \end{bmatrix}.$$
Then
$$A \circ B = \begin{bmatrix} 5 & -12 \\ 21 & -12 \end{bmatrix}.$$

The Hadamard product is associative, distributive and commutative. Naturally it can also be defined for column vectors and row vectors. The reason this product is not usually mentioned in linear algebra texts is that it is inherently basis dependent. Nevertheless, it has a number of uses in statistics and analysis. In analysis, a similar concept is the point-wise product, defined by $(f \cdot g)(x) = f(x) g(x)$.

1.10. Digest

1.10.1. Norms

All norms are equivalent.

Sum norm
$$\|A\|_S = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.$$
  • 98. 98 1. MATRIX ALGEBRA Frobenius norm A F = m i=1 n j=1 |aij|2 1 2 . Max norm A M = max 1≤i≤m 1≤j≤n |aij|. Frobenius norm of Hermitian transpose AH F = A F . Frobenius norm as sum of norms of column vectors A 2 F = n j=1 aj 2 2. Frobenius norm as sum of norms of row vectors A 2 F = m i=1 ai 2 2. Frobenius norm invariance w.r.t. unitary matrices UA F = A F AV F = A F . Frobenius norm is consistent: AB F ≤ A F B F . corollary 1.123 Ax 2 ≤ A F x 2. A F = n i=1 σ2 i . Consistent norms AB ≤ A B also known as sub-multiplicative norm.
  • 99. 1.10. DIGEST 99 Subordinate matrix norm Ax α ≤ A x β (α → β) Operator norm A A α→β max x=0 Ax β x α . A α→β = max x/∈ker(A) Ax β x α = max x α=1 Ax β. (α → β) norm is subordinate Ax β ≤ A α→β x α. There exists a unit norm vector x∗ such that A α→β = Ax∗ β. α → α-norms are consistent A α = max x=0 Ax α x α AB α ≤ A α B α. p-norm A p max x=0 Ax p x p = max x p=1 Ax p Closed form p-norms A 1 max 1≤j≤n m i=1 |aij|. A ∞ max 1≤i≤m n j=1 |aij|. 2-norm A 2 σ1 non-singular A−1 2 = 1 σn .
  • 100. 100 1. MATRIX ALGEBRA symmetric and positive definite A 2 = λ1 non-singular A−1 2 = 1 λn . normal A 2 = |λ1| non-singular A−1 2 = 1 |λn| . Unitary invariant norm UAV = A for any A ∈ Cm×n and any unitary U and V . Typical p → q norms Dual norm and conjugate transpose A p→q = AH q →p 1 p + 1 p = 1. A 2 = AH 2. A 1 = AH ∞, A ∞ = AH 1. A 1→∞ = AH 1→∞. A 1→2 = AH 2→∞. A ∞→2 = AH 2→1. A 1→p A 1→p = max 1≤j≤n aj p. A p→∞ A p→∞ = max 1≤i≤m ai q with 1 p + 1 q = 1. Consistency of p → q norm AB p→q ≤ B p→s A s→q.
  • 101. 1.10. DIGEST 101 Consistency of p → ∞ norm AB p→∞ ≤ A ∞→∞ B p→∞. Dominance of p → ∞ norm by p → p norm A p→∞ ≤ A p→p. A 1→∞ ≤ A 1. A 2→∞ ≤ A 2. Restricted minimum property min z∈C(AH ) z=0 Az q z p ≥ A† −1 q,p. If A is surjective (onto), then the equality holds. When A is bijective min z∈C(AH ) z=0 Az q z p = A−1 −1 q,p. Row column norm A p,∞ max 1≤i≤m ai p. A p,∞ = max 1≤i≤m n j=1 |ai j|p 1 p . A ∞,∞ = max i,j |aij|. A p,q m i=1 ai p q 1 q . Row column norm and p → ∞ norm A p,∞ = A q→∞ with 1 p + 1 q = 1. Consistency of (p, ∞) norm AB p,∞ B p,∞ ≤ A ∞→∞.
  • 102. 102 1. MATRIX ALGEBRA Relations between (p, q) norms and (p → q) norms A 1,∞ = A ∞→∞ A 2,∞ = A 2→∞ A ∞,∞ = A 1→∞ A 1→1 = AH 1,∞ A 1→2 = AH 2,∞
Bibliography

[1] David G. Feingold and Richard S. Varga. Block diagonally dominant matrices and generalizations of the Gerschgorin circle theorem. Pacific Journal of Mathematics, 12(4):1241–1250, 1962.
[2] Roger A. Horn. The Hadamard product. In Proceedings of Symposia in Applied Mathematics, volume 40, pages 87–169, 1990.
[3] Elizabeth Million. The Hadamard product, 2007.
[4] George P. H. Styan. Hadamard products and multivariate statistical analysis. Linear Algebra and Its Applications, 6:217–240, 1973.
[5] Joel A. Tropp. Just relax: Convex programming methods for subset selection and sparse approximation. 2004.