1
Matrix Decomposition and its
Application in Statistics
Nishith Kumar
Lecturer
Department of Statistics
Begum Rokeya University, Rangpur.
Email: nk.bru09@gmail.com
2
Overview
• Introduction
• LU decomposition
• QR decomposition
• Cholesky decomposition
• Jordan Decomposition
• Spectral decomposition
• Singular value decomposition
• Applications
3
Introduction
Some of the most frequently used decompositions are the LU, QR,
Cholesky, Jordan, spectral and singular value decompositions.
This lecture covers these matrix decompositions, the basic
numerical methods for computing them, and some of their applications.
Decompositions provide a numerically stable way to solve
a system of linear equations, as shown already in
[Wampler, 1970], and to invert a matrix. Additionally, they
provide an important tool for analyzing the numerical stability of
a system.
4
Easy to solve system (Cont.)
Some linear systems can be solved easily, e.g., a diagonal system Ax = b with A = diag(a_{11}, a_{22}, ..., a_{nn}).
The solution:

x = (b_1/a_{11}, b_2/a_{22}, ..., b_n/a_{nn})^T
5
Easy to solve system (Cont.)
Lower triangular matrix:
Solution: This system is solved using forward substitution
6
Easy to solve system (Cont.)
Upper Triangular Matrix:
Solution: This system is solved using Backward substitution
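A minimal R sketch of the two substitutions on a small hypothetical triangular system (base R already provides forwardsolve() and backsolve() for exactly this job):

L <- matrix(c(2, 4, 6,  0, 1, 2,  0, 0, 3), nrow = 3)   # lower triangular, filled column by column
b <- c(2, 6, 16)
y <- numeric(3)
for (i in 1:3) {                                         # forward substitution: y_i = (b_i - sum_{j<i} l_ij y_j) / l_ii
  y[i] <- (b[i] - sum(L[i, seq_len(i - 1)] * y[seq_len(i - 1)])) / L[i, i]
}
all.equal(y, forwardsolve(L, b))                         # TRUE

U <- t(L)                                                # an upper triangular matrix
x <- numeric(3)
for (i in 3:1) {                                         # backward substitution: x_i = (b_i - sum_{j>i} u_ij x_j) / u_ii
  idx <- if (i < 3) (i + 1):3 else integer(0)
  x[i] <- (b[i] - sum(U[i, idx] * x[idx])) / U[i, i]
}
all.equal(x, backsolve(U, b))                            # TRUE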
7
LU Decomposition
A = LU, where

U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1m} \\ 0 & u_{22} & \cdots & u_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{mm} \end{pmatrix}  (upper triangular)   and

L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{m1} & l_{m2} & \cdots & l_{mm} \end{pmatrix}  (lower triangular)
LU decomposition was originally derived as a decomposition of quadratic
and bilinear forms. Lagrange, in the very first paper in his collected works
(1759), derives the algorithm we call Gaussian elimination. Later, Turing
introduced the LU decomposition of a matrix in 1948, which is used to solve
systems of linear equations.
Let A be an m × m nonsingular square matrix. Then there exist two
matrices L and U such that A = LU, where L is a lower triangular matrix and U is an
upper triangular matrix.
J-L Lagrange
(1736 –1813)
A. M. Turing
(1912-1954)
8
A … U (upper triangular)
 U = Ek  E1 A
 A = (E1)-1  (Ek)-1 U
If each such elementary matrix Ei is a lower triangular matrices,
it can be proved that (E1)-1, , (Ek)-1 are lower triangular, and
(E1)-1  (Ek)-1 is a lower triangular matrix.
Let L=(E1)-1  (Ek)-1 then A=LU.
How to decompose A=LU?
Now,

A = \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 2 \end{pmatrix}

and U = E_2 E_1 A, where E_1 and E_2 are the elementary (elimination) matrices used in the column-by-column reduction shown on the next slide.
9
Calculation of L and U (cont.)
Now reducing the first column we have
E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & -2 & 2 \\ 0 & -4 & 2 \\ 0 & -12 & 1 \end{pmatrix}

and then, reducing the second column,

E_2 (E_1 A) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} \begin{pmatrix} 6 & -2 & 2 \\ 0 & -4 & 2 \\ 0 & -12 & 1 \end{pmatrix} = \begin{pmatrix} 6 & -2 & 2 \\ 0 & -4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = U
10
Calculation of L and U (cont.)
If A is a nonsingular matrix, then for a given L (lower triangular matrix) the
upper triangular matrix U is unique, but an LU decomposition itself is not unique:
there can be more than one LU decomposition of the same matrix. For example,
Now

E_1^{-1} E_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}

Therefore,

A = \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 6 & -2 & 2 \\ 0 & -4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = LU

and also

A = \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 0 & 0 \\ 12 & 1 & 0 \\ 3 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1/3 & 1/3 \\ 0 & -4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = LU
11
Calculation of L and U (cont.)
Thus the LU decomposition is not unique. Since we compute the LU
decomposition by elementary transformations, if we change L then U
changes accordingly so that A = LU.
To obtain a unique LU decomposition, it is necessary to
put some restriction on the L and U matrices. For example, we can
require the lower triangular matrix L to be a unit one (i.e., set
all the entries of its main diagonal to ones).
LU Decomposition in R:
• library(Matrix)
• x<-matrix(c(3,2,1, 9,3,4,4,2,5 ),ncol=3,nrow=3)
• expand(lu(x))
12
• Note: there are also generalizations of LU to non-square and singular
matrices, such as rank revealing LU factorization.
• [Pan, C.T. (2000). On the existence and computation of rank revealing LU
factorizations. Linear Algebra and its Applications, 316: 199-222.
• Miranian, L. and Gu, M. (2003). Strong rank revealing LU factorizations.
Linear Algebra and its Applications, 367: 1-16.]
• Uses: The LU decomposition is most commonly used in the solution of
systems of simultaneous linear equations. We can also find the determinant
easily by using the LU decomposition: it is the product of the diagonal elements
of the upper and lower triangular matrices (a small R sketch follows below).
Calculation of L and U (cont.)
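A small sketch of the determinant remark above, using the L and U found for the worked example A on the earlier slides (base R only, no extra packages assumed):

L <- matrix(c(1, 2, 1/2,  0, 1, 3,  0, 0, 1), nrow = 3)    # columns of L
U <- matrix(c(6, 0, 0,  -2, -4, 0,  2, 2, -5), nrow = 3)   # columns of U
A <- L %*% U
prod(diag(L)) * prod(diag(U))                              # 1 * 120 = 120
det(A)                                                     # also 120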
13
Solving systems of linear equations using LU decomposition
Suppose we would like to solve an m×m system AX = b. If we can find
an LU decomposition of A, then to solve AX = b it is enough to solve the
two triangular systems LY = b and UX = Y.
The system LY = b can be solved by the method of forward
substitution and the system UX = Y by the method of
backward substitution. To illustrate, we give an example.
Consider the given system AX = b, where

A = \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 2 \end{pmatrix}   and   b = \begin{pmatrix} 8 \\ 14 \\ -17 \end{pmatrix}
14
Solving systems of linear equations using LU decomposition
We have seen A = LU, where

L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}   and   U = \begin{pmatrix} 6 & -2 & 2 \\ 0 & -4 & 2 \\ 0 & 0 & -5 \end{pmatrix}

Thus, to solve AX = b, we first solve LY = b by forward substitution:

\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 14 \\ -17 \end{pmatrix}

Then

Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix}
15
Solving systems of linear equations using LU decomposition
Now, we solve UX = Y by backward substitution:

\begin{pmatrix} 6 & -2 & 2 \\ 0 & -4 & 2 \\ 0 & 0 & -5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix}

then

X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
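A minimal R sketch reproducing this worked example with base R's triangular solvers (forwardsolve for LY = b, backsolve for UX = Y):

L <- matrix(c(1, 2, 1/2,  0, 1, 3,  0, 0, 1), nrow = 3)    # columns of L
U <- matrix(c(6, 0, 0,  -2, -4, 0,  2, 2, -5), nrow = 3)   # columns of U
b <- c(8, 14, -17)
Y <- forwardsolve(L, b)    # (8, -2, -15)
X <- backsolve(U, Y)       # (1, 2, 3)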
16
QR Decomposition
If A is an m×n matrix with linearly independent columns, then A can be
decomposed as A = QR, where Q is an m×n matrix whose columns
form an orthonormal basis for the column space of A and R is a
nonsingular upper triangular matrix.
Jørgen Pedersen Gram
(1850 –1916)
Erhard Schmidt
(1876-1959)
The QR decomposition originated with Gram (1883).
Later, Erhard Schmidt (1907) proved the QR decomposition theorem.
17
QR-Decomposition (Cont.)
Theorem: If A is an m×n matrix with linearly independent columns, then
A can be decomposed as A = QR, where Q is an m×n matrix whose
columns form an orthonormal basis for the column space of A and R is a
nonsingular upper triangular matrix.
Proof: Suppose A = [u_1 | u_2 | ... | u_n] and rank(A) = n.
Apply the Gram-Schmidt process to {u_1, u_2, ..., u_n}; the resulting
orthogonal vectors v_1, v_2, ..., v_n are

v_i = u_i - \frac{\langle u_i, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_i, v_2 \rangle}{\|v_2\|^2} v_2 - \cdots - \frac{\langle u_i, v_{i-1} \rangle}{\|v_{i-1}\|^2} v_{i-1}

Let q_i = v_i / \|v_i\| for i = 1, 2, ..., n. Thus q_1, q_2, ..., q_n form an orthonormal
basis for the column space of A.
18
QR-Decomposition (Cont.)
Now,

u_i = v_i + \frac{\langle u_i, v_1 \rangle}{\|v_1\|^2} v_1 + \frac{\langle u_i, v_2 \rangle}{\|v_2\|^2} v_2 + \cdots + \frac{\langle u_i, v_{i-1} \rangle}{\|v_{i-1}\|^2} v_{i-1}

i.e.,

u_i = v_i + \langle u_i, q_1 \rangle q_1 + \langle u_i, q_2 \rangle q_2 + \cdots + \langle u_i, q_{i-1} \rangle q_{i-1}

Thus u_i is orthogonal to q_j for j > i, and u_i ∈ span{v_1, ..., v_i} = span{q_1, ..., q_i}. Hence

u_1 = \|v_1\| q_1
u_2 = \langle u_2, q_1 \rangle q_1 + \|v_2\| q_2
u_3 = \langle u_3, q_1 \rangle q_1 + \langle u_3, q_2 \rangle q_2 + \|v_3\| q_3
⋮
u_n = \langle u_n, q_1 \rangle q_1 + \langle u_n, q_2 \rangle q_2 + \cdots + \langle u_n, q_{n-1} \rangle q_{n-1} + \|v_n\| q_n
19
QR-Decomposition (Cont.)
Let Q = [q_1  q_2  ...  q_n], so Q is an m×n matrix whose columns form an
orthonormal basis for the column space of A.
Now,

A = [u_1 | u_2 | \cdots | u_n] = [q_1 | q_2 | \cdots | q_n] \begin{pmatrix} \|v_1\| & \langle u_2, q_1 \rangle & \langle u_3, q_1 \rangle & \cdots & \langle u_n, q_1 \rangle \\ 0 & \|v_2\| & \langle u_3, q_2 \rangle & \cdots & \langle u_n, q_2 \rangle \\ 0 & 0 & \|v_3\| & \cdots & \langle u_n, q_3 \rangle \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \|v_n\| \end{pmatrix}

i.e., A = QR, where

R = \begin{pmatrix} \|v_1\| & \langle u_2, q_1 \rangle & \cdots & \langle u_n, q_1 \rangle \\ 0 & \|v_2\| & \cdots & \langle u_n, q_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \|v_n\| \end{pmatrix}

Thus A can be decomposed as A = QR, where R is an upper triangular and
nonsingular matrix.
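A minimal R sketch of QR via classical Gram-Schmidt, following the construction in the proof above; this is only an illustration (the numerically stabler route is base R's qr(), shown on a later slide), and the test matrix below is the one used in that later R demo:

gram_schmidt_qr <- function(A) {
  m <- nrow(A); n <- ncol(A)
  Q <- matrix(0, m, n); R <- matrix(0, n, n)
  for (j in 1:n) {
    v <- A[, j]
    if (j > 1) for (i in 1:(j - 1)) {
      R[i, j] <- sum(Q[, i] * A[, j])   # r_ij = <u_j, q_i>
      v <- v - R[i, j] * Q[, i]         # subtract the projection on q_i
    }
    R[j, j] <- sqrt(sum(v^2))           # r_jj = ||v_j||
    Q[, j] <- v / R[j, j]               # q_j = v_j / ||v_j||
  }
  list(Q = Q, R = R)
}
A <- matrix(c(1, 2, 3, 2, 5, 4, 3, 4, 9), ncol = 3, nrow = 3)
out <- gram_schmidt_qr(A)
all.equal(out$Q %*% out$R, A)           # TRUE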
20
QR Decomposition
Example: Find the QR decomposition of
A = \begin{pmatrix} 1 & -1 & -1 \\ 1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
21
Calculation of QR Decomposition
Applying the Gram-Schmidt process to compute the QR decomposition:
1st Step:  r_{11} = \|a_1\| = \sqrt{3},   q_1 = a_1 / r_{11} = (1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3}, 0)^T
2nd Step:  r_{12} = q_1^T a_2 = -2/\sqrt{3}
3rd Step:  \hat{q}_2 = a_2 - r_{12} q_1 = (-1/3, 2/3, -1/3, 0)^T,   r_{22} = \|\hat{q}_2\| = 2/\sqrt{6},   q_2 = \hat{q}_2 / r_{22} = (-1/\sqrt{6}, 2/\sqrt{6}, -1/\sqrt{6}, 0)^T
22
Calculation of QR Decomposition
4th Step:  r_{13} = q_1^T a_3 = -1/\sqrt{3}
5th Step:  r_{23} = q_2^T a_3 = 1/\sqrt{6}
6th Step:  \hat{q}_3 = a_3 - r_{13} q_1 - r_{23} q_2 = (-1/2, 0, 1/2, -1)^T,   r_{33} = \|\hat{q}_3\| = \sqrt{6}/2,   q_3 = \hat{q}_3 / r_{33} = (-1/\sqrt{6}, 0, 1/\sqrt{6}, -2/\sqrt{6})^T
23
Therefore, A=QR
R code for QR Decomposition:
x<-matrix(c(1,2,3, 2,5,4, 3,4,9),ncol=3,nrow=3)
qrstr <- qr(x)
Q<-qr.Q(qrstr)
R<-qr.R(qrstr)
Uses: QR decomposition is widely used in computer codes to find the
eigenvalues of a matrix, to solve linear systems, and to find least squares
approximations.
Calculation of QR Decomposition

A = \begin{pmatrix} 1 & -1 & -1 \\ 1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{3} & -1/\sqrt{6} & -1/\sqrt{6} \\ 1/\sqrt{3} & 2/\sqrt{6} & 0 \\ 1/\sqrt{3} & -1/\sqrt{6} & 1/\sqrt{6} \\ 0 & 0 & -2/\sqrt{6} \end{pmatrix} \begin{pmatrix} \sqrt{3} & -2/\sqrt{3} & -1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{6} \\ 0 & 0 & \sqrt{6}/2 \end{pmatrix} = QR
24
Least squares solution using QR Decomposition
The least squares solution b satisfies the normal equations

X^t X b = X^t Y

Let X = QR. Then

X^t X b = X^t Y  ⟹  (QR)^t (QR) b = (QR)^t Y  ⟹  R^t Q^t Q R b = R^t Q^t Y  ⟹  R^t R b = R^t Q^t Y

Therefore,

(R^t)^{-1} R^t R b = (R^t)^{-1} R^t Q^t Y  ⟹  R b = Q^t Y = Z

so b is obtained by solving the triangular system R b = Q^t Y by back substitution.
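A minimal R sketch of this least-squares route (hypothetical regression data, invented for illustration): solve R b = Q^t Y by back substitution rather than forming (X^t X)^{-1}.

set.seed(1)
X <- cbind(1, rnorm(20))                 # design matrix with an intercept column
Y <- 2 + 3 * X[, 2] + rnorm(20, sd = 0.1)
qrstr <- qr(X)
Q <- qr.Q(qrstr); R <- qr.R(qrstr)
b_qr <- backsolve(R, crossprod(Q, Y))    # b = R^{-1} Q^t Y
b_ls <- qr.solve(X, Y)                   # base-R least squares for comparison
all.equal(as.numeric(b_qr), as.numeric(b_ls))   # TRUE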
25
Cholesky Decomposition
Cholesky died from wounds received on the battlefield on 31 August 1918,
at 5 o'clock in the morning, in the north of France. After his death one of
his fellow officers, Commandant Benoit, published Cholesky's method of
computing solutions to the normal equations for some least squares data
fitting problems in the Bulletin géodésique in 1924, which is now
known as the Cholesky decomposition.
Cholesky Decomposition: If A is a real, symmetric and positive definite
matrix then there exists a unique lower triangular matrix L with positive
diagonal elements such that A = LL^T.
Andre-Louis Cholesky
1875-1918
26
Cholesky Decomposition
Theorem: If A is an n×n real, symmetric and positive definite matrix then
there exists a unique lower triangular matrix G with positive diagonal
elements such that A = GG^T.
Proof: Since A is n×n, real and positive definite, it has an LU
decomposition, A = LU. Also let the lower triangular matrix L be a unit
one (i.e., set all the entries of its main diagonal to ones); in that case the LU
decomposition is unique. Let us suppose D = diag(u_{11}, u_{22}, ..., u_{nn})
and observe that M^T = D^{-1} U is a unit upper triangular matrix.
Thus A = LDM^T. Since A is symmetric, A = A^T, i.e., LDM^T = MDL^T.
From the uniqueness we have L = M. So A = LDL^T. Since A is positive
definite, all diagonal elements of D are positive. Let G = L\, diag(\sqrt{d_{11}}, \sqrt{d_{22}}, ..., \sqrt{d_{nn}});
then we can write A = GG^T.
27
Cholesky Decomposition (Cont.)
Procedure to find the Cholesky decomposition:
Suppose

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}

We need to solve the equation

A = \underbrace{\begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{pmatrix}}_{L} \underbrace{\begin{pmatrix} l_{11} & l_{21} & \cdots & l_{n1} \\ 0 & l_{22} & \cdots & l_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & l_{nn} \end{pmatrix}}_{L^T}
28
Example of Cholesky Decomposition
The algorithm: for k from 1 to n,

l_{kk} = \left( a_{kk} - \sum_{s=1}^{k-1} l_{ks}^2 \right)^{1/2}

and, for j from k+1 to n,

l_{jk} = \left( a_{jk} - \sum_{s=1}^{k-1} l_{js} l_{ks} \right) / l_{kk}

Now, suppose

A = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 10 & 2 \\ -2 & 2 & 5 \end{pmatrix}

Then the Cholesky decomposition is

L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 1 & \sqrt{3} \end{pmatrix}
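A minimal R sketch of the two-loop algorithm above (chol() in base R is the standard routine; this is only an illustration):

my_chol <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (k in 1:n) {
    # diagonal entry: l_kk = sqrt(a_kk - sum_{s<k} l_ks^2)
    L[k, k] <- sqrt(A[k, k] - sum(L[k, seq_len(k - 1)]^2))
    if (k < n) for (j in (k + 1):n) {
      # below-diagonal entries: l_jk = (a_jk - sum_{s<k} l_js * l_ks) / l_kk
      L[j, k] <- (A[j, k] - sum(L[j, seq_len(k - 1)] * L[k, seq_len(k - 1)])) / L[k, k]
    }
  }
  L
}
A <- matrix(c(4, 2, -2,  2, 10, 2,  -2, 2, 5), nrow = 3)   # the example matrix above
L <- my_chol(A)
all.equal(L %*% t(L), A)                                   # TRUE
all.equal(L, t(chol(A)))                                   # matches base R (chol returns the upper factor)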
29
R code for Cholesky Decomposition
• x<-matrix(c(4,2,-2, 2,10,2, -2,2,5),ncol=3,nrow=3)
• cl<-chol(x)
• If we decompose A as LDL^T then

L = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ -1/2 & 1/3 & 1 \end{pmatrix}   and   D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 3 \end{pmatrix}
30
Application of Cholesky
Decomposition
Cholesky Decomposition is used to solve the system
of linear equation Ax=b, where A is real symmetric
and positive definite.
In regression analysis it can be used to estimate the
parameters when X^TX is positive definite.
In Kernel principal component analysis, Cholesky
decomposition is also used (Weiya Shi; Yue-Fei
Guo; 2010)
31
Characteristic Roots and
Characteristics Vectors
Any nonzero vector x is said to be a characteristic vector of a matrix A if
there exists a number λ such that Ax = λx, where A is a square matrix;
λ is then said to be a characteristic root of the matrix A corresponding to
the characteristic vector x.
Characteristic root is unique but characteristic vector is not unique.
We calculate characteristics root λ from the characteristic equation |A- λI|=0
For λ= λi the characteristics vector is the solution of x from the following
homogeneous system of linear equation (A- λiI)x=0
Theorem: If A is a real symmetric matrix and λi and λj are two distinct latent
root of A then the corresponding latent vector xi and xj are orthogonal.
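A quick numerical illustration in R of Ax = λx, using the symmetric matrix that appears in the R demos on later slides:

A <- matrix(c(1, 2, 3, 2, 5, 4, 3, 4, 9), ncol = 3, nrow = 3)
e <- eigen(A)                 # characteristic roots e$values and vectors e$vectors
lambda <- e$values[1]
x <- e$vectors[, 1]
all.equal(as.numeric(A %*% x), lambda * x)   # TRUE
crossprod(e$vectors[, 1], e$vectors[, 2])    # ≈ 0: distinct roots give orthogonal vectors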
32
Multiplicity
Algebraic Multiplicity: The number of repetitions of a certain
eigenvalue. If, for a certain matrix, λ={3,3,4}, then the
algebraic multiplicity of 3 would be 2 (as it appears twice) and
the algebraic multiplicity of 4 would be 1 (as it appears once).
This type of multiplicity is normally represented by the Greek
letter α, where α(λi) represents the algebraic multiplicity of λi.
Geometric Multiplicity: the geometric multiplicity of an
eigenvalue is the number of linearly independent eigenvectors
associated with it.
33
Jordan Decomposition
Camille Jordan (1870)
• Let A be any n×n matrix. Then there exist a nonsingular matrix P and k×k matrices J_k(λ) of the form

J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}

such that

P^{-1} A P = \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_r}(\lambda_r) \end{pmatrix}

where k_1 + k_2 + ⋯ + k_r = n, the λ_i, i = 1, 2, ..., r, are the characteristic roots,
and k_i is the algebraic multiplicity of λ_i.
The Jordan decomposition is used in differential equations and time series analysis.
Camille Jordan
(1838-1921)
34
Spectral Decomposition
Let A be an m × m real symmetric matrix. Then
there exists an orthogonal matrix P such that
P^T A P = Λ, or A = P Λ P^T, where Λ is a diagonal
matrix.
CAUCHY, A.L.(1789-1857)
A. L. Cauchy established the Spectral
Decomposition in 1829.
35
Spectral Decomposition and
Principal component Analysis (Cont.)
By using spectral decomposition we can write A = P Λ P^T.
In multivariate analysis our data form a matrix. Suppose our data matrix is
X, and suppose X is mean centered, i.e., (X − X̄),
and its variance-covariance matrix is Σ. The variance-covariance
matrix Σ is real and symmetric.
Using spectral decomposition we can write Σ = P Λ P^T, where
Λ = diag(λ_1, λ_2, ..., λ_n) is a diagonal matrix with λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n.
Also,
tr(Σ) = total variation of the data = tr(Λ)
36
Spectral Decomposition and Principal component Analysis (Cont.)
The principal component transformation is the transformation
Y = (X − µ)P
where
• E(Y_i) = 0
• V(Y_i) = λ_i
• Cov(Y_i, Y_j) = 0 if i ≠ j
• V(Y_1) ≥ V(Y_2) ≥ ⋯ ≥ V(Y_n)
• \sum_{i=1}^{n} V(Y_i) = tr(Λ) = tr(Σ)
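A minimal R sketch of principal components via the spectral decomposition of the covariance matrix, using the built-in iris measurements purely as illustrative data:

X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centred data (X - mean)
S  <- cov(Xc)                                  # variance-covariance matrix
es <- eigen(S)                                 # S = P Lambda P^T
P  <- es$vectors
Y  <- Xc %*% P                                 # principal component scores
round(apply(Y, 2, var), 4)                     # ≈ the eigenvalues (diagonal of Lambda)
c(sum(diag(S)), sum(es$values))                # tr(S) = tr(Lambda) = total variation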
37
R code for Spectral Decomposition
x<-matrix(c(1,2,3, 2,5,4, 3,4,9),ncol=3,nrow=3)
eigen(x)
Application:
• For Data Reduction.
• Image Processing and Compression.
• K-Selection for K-means clustering
• Multivariate Outliers Detection
• Noise Filtering
• Trend detection in the observations.
38
There are five mathematicians who were responsible for establishing the existence of the
singular value decomposition and developing its theory.
Historical background of SVD
Eugenio Beltrami
(1835-1899)
Camille Jordan
(1838-1921)
James Joseph
Sylvester
(1814-1897)
Erhard Schmidt
(1876-1959)
Hermann Weyl
(1885-1955)
The Singular Value Decomposition was originally developed by two mathematicians in the
mid-to-late 1800s: 1. Eugenio Beltrami, 2. Camille Jordan.
Several other mathematicians took part in the final developments of the SVD, including James
Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s.
C. Eckart and G. Young proved the low-rank approximation property of the SVD (1936).
39
What is SVD?
Any real (m×n) matrix X, where (n≤ m), can be
decomposed,
X = UΛVT
U is a (m×n) column orthonormal matrix (UTU=I),
containing the eigenvectors of the symmetric matrix
XXT.
Λ is a (n×n ) diagonal matrix, containing the singular
values of matrix X. The number of non zero diagonal
elements of Λ corresponds to the rank of X.
VT is a (n×n ) row orthonormal matrix (VTV=I),
containing the eigenvectors of the symmetric matrix
XTX.
40
Singular Value Decomposition (Cont.)
Theorem (Singular Value Decomposition): Let X be m×n of rank
r, r ≤ n ≤ m. Then there exist matrices U, V and a diagonal
matrix Λ with positive diagonal elements such that

X = U Λ V^T

Proof: Since X is m × n of rank r, r ≤ n ≤ m, XX^T and X^TX are both
of rank r (by using the concept of the Grammian matrix) and of
dimension m × m and n × n respectively. Since XX^T is a real
symmetric matrix, we can write by spectral decomposition

XX^T = Q D Q^T

where Q and D are, respectively, the matrices of characteristic
vectors and corresponding characteristic roots of XX^T.
Again, since X^TX is a real symmetric matrix, we can write by
spectral decomposition

X^TX = R M R^T
41
Singular Value Decomposition (Cont.)
where R is the (orthogonal) matrix of characteristic vectors and M
is the diagonal matrix of the corresponding characteristic roots.
Since XX^T and X^TX are both of rank r, only r of their characteristic
roots are positive, the remaining being zero. Hence we can write

D = \begin{pmatrix} D_r & 0 \\ 0 & 0 \end{pmatrix}

Also we can write

M = \begin{pmatrix} M_r & 0 \\ 0 & 0 \end{pmatrix}
42
Singular Value Decomposition (Cont.)
We know that the nonzero characteristic roots of XX^T and X^TX are
equal, so D_r = M_r.
Partition Q, R conformably with D and M, respectively, i.e.,
Q = (Q_r, Q_*), R = (R_r, R_*), such that Q_r is m × r, R_r is n × r, and they
correspond respectively to the nonzero characteristic roots of
XX^T and X^TX. Now take

U = Q_r,   V = R_r,   Λ = D_r^{1/2} = diag(d_1^{1/2}, d_2^{1/2}, ..., d_r^{1/2})

where d_i, i = 1, 2, ..., r, are the positive characteristic roots of
XX^T and hence those of X^TX as well (by using the concept of the
Grammian matrix).
43
Singular Value Decomposition (Cont.)
Now define

S = Q_r D_r^{1/2} R_r^T

We shall show that S = X, thus completing the proof. Now

S^T S = R_r D_r^{1/2} Q_r^T Q_r D_r^{1/2} R_r^T = R_r D_r^{1/2} D_r^{1/2} R_r^T = R_r D_r R_r^T = R M R^T = X^T X

Similarly,

S S^T = X X^T

From the first relation above we conclude that for an arbitrary orthogonal matrix,
say P_1,

S = P_1 X

while from the second we conclude that for an arbitrary orthogonal matrix, say P_2,

S = X P_2

We must have:
44
Singular Value Decomposition (Cont.)
The preceding, however, implies that for arbitrary orthogonal
matrices P_1, P_2 the matrix X satisfies

X X^T = P_1 X X^T P_1^T,   X^T X = P_2^T X^T X P_2

which in turn implies that P_1 = I_m, P_2 = I_n. Thus

X = S = Q_r D_r^{1/2} R_r^T = U Λ V^T
45
R Code for Singular Value Decomposition
x<-matrix(c(1,2,3, 2,5,4, 3,4,9),ncol=3,nrow=3)
sv<-svd(x)
D<-sv$d
U<-sv$u
V<-sv$v
46
Decomposition in Diagram
[Flow chart: Matrix A — rectangular with full column rank → QR decomposition, rectangular in general → SVD; square → LU decomposition (not always unique); square symmetric → spectral decomposition, and if positive definite (PD) → Cholesky decomposition; square asymmetric with AM = GM → similar diagonalization P^{-1}AP = Λ, with AM > GM → Jordan decomposition.]
47
Properties Of SVD
Rewriting the SVD,

A = U Λ V^T = \sum_{i=1}^{r} \lambda_i u_i v_i^T

where
r = rank of A,
λ_i = the i-th diagonal element of Λ,
u_i and v_i are the i-th columns of U and V respectively.
48
Proprieties of SVD
Low rank Approximation
Theorem: If A = UΛV^T is the SVD of A and the
singular values are sorted as λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n,
then for any l < r the best rank-l approximation
to A is

\tilde{A} = \sum_{i=1}^{l} \lambda_i u_i v_i^T,   with   \|A - \tilde{A}\|_F^2 = \sum_{i=l+1}^{r} \lambda_i^2

The low-rank approximation technique is very
important for data compression.
49
Low-rank Approximation
• SVD can be used to compute optimal low-rank approximations.
• The approximation of A is the matrix Ã of rank k such that

\tilde{A} = \arg\min_{X:\, \mathrm{rank}(X)=k} \|A - X\|_F

Ã and X are both m×n matrices. Here \|\cdot\|_F is the Frobenius norm,

\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2}

and if d_1, d_2, ..., d_n are the characteristic roots of A^TA then \|A\|_F^2 = \sum_{i=1}^{n} d_i.
50
Low-rank Approximation
• Solution via SVD
set smallest r-k
singular values to zero
[Diagram: X = U Λ V^T; the smallest r − k singular values in Λ are set to zero (illustrated with k = 2).]

\tilde{A} = U\, \mathrm{diag}(\lambda_1, ..., \lambda_k, 0, ..., 0)\, V^T

In column notation (a sum of rank-1 matrices):

\tilde{A} = \sum_{i=1}^{k} \lambda_i u_i v_i^T
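A minimal R sketch of the rank-k truncation above (hypothetical random data, for illustration only):

low_rank <- function(X, k) {
  s <- svd(X)
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k, drop = FALSE])
}
X  <- matrix(rnorm(20 * 10), 20, 10)
X2 <- low_rank(X, 2)
norm(X - X2, type = "F")^2     # Frobenius error of the rank-2 approximation
sum(svd(X)$d[-(1:2)]^2)        # equals the sum of the discarded squared singular values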
51
Approximation error
• How good (bad) is this approximation?
• It’s the best possible, measured by the Frobenius norm of the
error:
\min_{X:\, \mathrm{rank}(X)=k} \|A - X\|_F^2 = \|A - \tilde{A}\|_F^2 = \sum_{i=k+1}^{r} \lambda_i^2

• where the λ_i are ordered such that λ_i ≥ λ_{i+1}.
52
Row approximation and column
approximation
Suppose R_i and C_j represent the i-th row and j-th column of A. The SVD
of A and of Ã are

A = U Λ V^T = \sum_{k=1}^{r} \lambda_k u_k v_k^T,  \qquad  \tilde{A} = U_l \Lambda_l V_l^T = \sum_{k=1}^{l} \lambda_k u_k v_k^T

The SVD equation for R_i is

R_i = \sum_{k=1}^{r} \lambda_k u_{ik} v_k^T,   where i = 1, ..., m.

We can approximate R_i by

R_i^{l} = \sum_{k=1}^{l} \lambda_k u_{ik} v_k^T;   l < r.

Also, the SVD equation for C_j is

C_j = \sum_{k=1}^{r} \lambda_k v_{jk} u_k,   where j = 1, 2, ..., n.

We can also approximate C_j by

C_j^{l} = \sum_{k=1}^{l} \lambda_k v_{jk} u_k;   l < r.
53
Least squares solution in an inconsistent system
By using the SVD we can solve an inconsistent system; this gives the
least squares solution

\min_x \|Ax - b\|_2

The least squares solution is x = A^g b, where A^g is the MP (Moore-Penrose) inverse of A.
54
The SVD of A^g is A^g = V Λ^{-1} U^T. This can be written as
A^g = \sum_{i=1}^{r} \lambda_i^{-1} v_i u_i^T,
where the λ_i, u_i and v_i are as in the SVD of A.
55
Basic Results of SVD
56
SVD based PCA
If we reduce the variables by using the SVD, then it performs like PCA.
Suppose X is a mean-centered data matrix. Then, writing X by its SVD, X = UΛV^T,
we can write XV = UΛ.
Suppose Y = XV = UΛ.
Then the first column of Y contains the first
principal component scores, and so on.
o SVD-based PCs are more numerically stable.
o If the number of variables is greater than the number of observations, then SVD-based PCA will
give efficient results (Antti Niemistö, Statistical Analysis of Gene Expression
Microarray Data, 2005).
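A minimal R sketch of SVD-based PCA as described above (iris used purely as illustrative data); the scores agree with the eigen-decomposition route up to the sign of each column:

X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centred data matrix
s  <- svd(Xc)                                  # Xc = U Lambda V^T
Y_svd <- s$u %*% diag(s$d)                     # = Xc %*% s$v, the PC scores
Y_eig <- Xc %*% eigen(cov(Xc))$vectors         # spectral-decomposition route
all.equal(abs(Y_svd), abs(Y_eig), check.attributes = FALSE)   # TRUE (columns may differ in sign)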
57
• Data reduction in both variables and observations.
• Solving linear least squares problems
• Image Processing and Compression.
• K-Selection for K-means clustering
• Multivariate Outliers Detection
• Noise Filtering
• Trend detection in the observations and the variables.
Application of SVD
58
Origin of biplot
• Gabriel (1971)
• One of the most important advances in data analysis in recent decades
• Currently…
  – > 50,000 web pages
  – Numerous academic publications
  – Included in most statistical analysis packages
  – Still a very new technique to most scientists
Prof. Ruben Gabriel, “The founder of biplot”
Courtesy of Prof. Purificación Galindo
University of Salamanca, Spain
59
What is a biplot?
• “Biplot” = “bi” + “plot”
– “plot”
• scatter plot of two rows OR of two columns, or
• scatter plot summarizing the rows OR the columns
– “bi”
• BOTH rows AND columns
• 1 biplot >> 2 plots
60
Practical definition of a biplot
“Any two-way table can be analyzed using a 2D-biplot as soon as it can be
sufficiently approximated by a rank-2 matrix.” (Gabriel, 1971)
G-by-E table
Matrix decomposition
P(4, 3) G(3, 2) E(2, 3)
(Now 3D-biplots are also possible…)
[Worked example: a 4 × 3 genotype-by-environment table P is factored as the product of a genotype-score matrix G and an environment-score matrix E in two dimensions (x, y); the genotype scores g1-g4 and environment scores e1-e3 are then displayed together as points G1-G4 and E1-E3 in a single scatter plot.]
61
Singular Value Decomposition (SVD) &
Singular Value Partitioning (SVP)
SVD:   X_{ij} = \sum_{k=1}^{r} u_{ik} \lambda_k v_{kj}
SVP:   X_{ij} = \sum_{k=1}^{r} (u_{ik} \lambda_k^{f}) (\lambda_k^{1-f} v_{kj})

Here r is the 'rank' of X, i.e., the minimum number of PCs required to fully represent X; the u_{ik} form the matrix characterising the rows, the λ_k are the 'singular values', and the v_{kj} form the matrix characterising the columns. The singular value partitioning splits each λ_k between the two factors, giving row scores (u_{ik} λ_k^f) and column scores (λ_k^{1−f} v_{kj}), which are plotted together in the biplot. Commonly used values of f are f = 1, f = 0 and f = 1/2.
62
Biplot
• The simplest biplot is to show the first two PCs together with the projections of the axes of the original variables.
  – The x-axis represents the scores for the first principal component.
  – The y-axis represents the scores for the second principal component.
• The original variables are represented by arrows which graphically indicate the proportion of the original variance explained by the first two principal components.
• The direction of the arrows indicates the relative loadings on the first and second principal components.
• Biplot analysis can help to understand the multivariate data i) graphically, ii) effectively, iii) conveniently.
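A one-line R illustration of such a display, using the built-in iris data (this is the kind of plot shown on the next slide):

pc <- prcomp(iris[, 1:4])   # principal components of the four measurements
biplot(pc)                  # PC1/PC2 scores plus arrows for the original variables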
63
Biplot of Iris Data
[Biplot of the iris data: scores on the first two principal components (Comp. 1 on the x-axis, Comp. 2 on the y-axis), with observations labelled 1 = Setosa, 2 = Versicolor, 3 = Virginica, and arrows for the original variables Sepal L., Sepal W., Petal L. and Petal W.]
64
Image Compression Example
Pansy Flower image, collected from
http://www.ats.ucla.edu/stat/r/code/pansy.jpg
This image is 600×465 pixels
65
Singular values of flowers image
Plot of the singular values
66
Low rank Approximation to flowers image
Rank-1 approximation
Rank- 5 approximation
67
Rank-20 approximation
Low rank Approximation to flowers image
Rank-30 approximation
68
Rank-50 approximation
Low rank Approximation to flowers image
Rank-80 approximation
69
Rank-100 approximation
Low rank Approximation to flowers image
Rank-120 approximation
70
Rank-150 approximation True Image
Low rank Approximation to flowers image
71
Outlier Detection Using SVD
Nishith and Nasser (2007,MSc. Thesis) propose a graphical
method of outliers detection using SVD.
It is suitable for both general multivariate data and regression
data. For this we construct the scatter plots of first two PC’s,
and first PC and third PC. We also make a box in the scatter
plot whose range lies
median(1stPC) ± 3 × mad(1stPC) in the X-axis and
median(2ndPC/3rdPC) ± 3 × mad(2ndPC/3rdPC) in the Y-
axis.
Where mad = median absolute deviation.
The points that are outside the box can be considered
extreme outliers, while points outside only one side of the box are
termed outliers. Along with this box we may construct
another, smaller box bounded by the 2.5/2 MAD lines (a small R sketch of the display follows below).
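A minimal R sketch of the graphical rule above, with hypothetical data and a few planted outliers (illustration only, not the authors' original code):

X <- matrix(rnorm(75 * 4), 75, 4)
X[1:5, ] <- X[1:5, ] + 8                        # planted outliers
s <- svd(scale(X, center = TRUE, scale = FALSE))
scores <- s$u %*% diag(s$d)                     # PC scores from the SVD
plot(scores[, 1], scores[, 2], xlab = "1st PC", ylab = "2nd PC")
abline(v = median(scores[, 1]) + c(-3, 3) * mad(scores[, 1]), lty = 2)
abline(h = median(scores[, 2]) + c(-3, 3) * mad(scores[, 2]), lty = 2)
# points falling outside the dashed box are flagged as extreme outliers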
72
Outlier Detection Using SVD (Cont.)
Scatter plot of Hawkins, Bradu and kass data (a) scatter plot of first two PC’s and
(b) scatter plot of first and third PC.
HAWKINS-BRADU-KASS
(1984) DATA
Data set containing 75 observations
with 14 influential observations.
Among them there are ten high
leverage outliers (cases 1-10)
and four high leverage points
(cases 11-14) - Imon (2005).
73
Outlier Detection Using SVD (Cont.)
Scatter plot of modified Brown data (a) scatter plot of first
two PC’s and (b) scatter plot of first and third PC.
MODIFIED BROWN DATA
Data set given by Brown (1980).
Ryan (1997) pointed out that the
original data on the 53 patients
contain one outlier
(observation number 24).
Imon and Hadi(2005) modified
this data set by putting two more
outliers as cases 54 and 55.
Also they showed that observations
24, 54 and 55 are outliers by using
generalized standardized
Pearson residual (GSPR)
74
Cluster Detection Using SVD
Singular Value Decomposition is also used for cluster
detection (Nishith, Nasser and Suboron, 2011).
The methods for clustering data using first three
PC’s are given below,
median (1st PC) ± k × mad (1st PC) in the X-axis
and median (2nd PC/3rd PC) ± k × mad (2nd
PC/3rd PC) in the Y-axis.
Where mad = median absolute deviation. The value of
k = 1, 2, 3.
75
76
Principal stations in the climate data
77
Climatic Variables
The climatic variables are,
1. Rainfall (RF) mm
2. Daily mean temperature (T-MEAN)0C
3. Maximum temperature (T-MAX)0C
4. Minimum temperature (T-MIN)0C
5. Day-time temperature (T-DAY)0C
6. Night-time temperature (T-NIGHT)0C
7. Daily mean water vapor pressure (VP) MBAR
8. Daily mean wind speed (WS) m/sec
9. Hours of bright sunshine as percentage of maximum possible sunshine
hours (MPS)%
10. Solar radiation (SR) cal/cm2/day
78
Consequences of SVD
Generally, many missing values may be present in the data. It may also contain
unusual observations. Classical singular value decomposition cannot handle
either type of problem.
Robust singular value decomposition can solve both types of problems.
A robust singular value decomposition can be obtained by an alternating L1
regression approach (Douglas M. Hawkins, Li Liu, and S. Stanley Young,
2001).
79
The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition:
1. Initialize the leading left singular vector u_1 (there is no obvious choice of initial values for u_1).
2. Fit the L1 regression coefficients c_j by minimizing \sum_{i=1}^{n} |x_{ij} - c_j u_{i1}|, for j = 1, 2, ..., p.
3. Calculate the right singular vector v_1 = c/\|c\|, where \|\cdot\| refers to the Euclidean norm.
4. Again fit the L1 regression coefficients d_i by minimizing \sum_{j=1}^{p} |x_{ij} - d_i v_{j1}|, for i = 1, 2, ..., n.
5. Calculate the resulting estimate of the left singular vector u_1 = d/\|d\|.
6. Iterate this process (steps 2-5) until it converges.
For the second and subsequent components of the SVD, we replace X by a deflated matrix
obtained by subtracting the most recently found component: X ← X − λ_k u_k v_k^T.
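A minimal R sketch of the alternating L1 regression idea above. The single-regressor, no-intercept L1 fit reduces to a weighted median, which is coded directly here so the sketch needs no extra packages; the fixed iteration count and the simple scale estimate lambda are assumptions of this illustration, not the authors' original code.

wmedian <- function(z, w) {                        # weighted (lower) median of z with weights w
  o <- order(z); z <- z[o]; w <- w[o]
  z[which(cumsum(w) >= sum(w) / 2)[1]]
}
l1_coef <- function(y, u) wmedian(y / u, abs(u))   # argmin_c sum_i |y_i - c * u_i|

robust_rank1 <- function(X, iters = 50) {
  u <- rep(1 / sqrt(nrow(X)), nrow(X))             # initial left singular vector u1
  for (it in 1:iters) {
    cvec <- apply(X, 2, l1_coef, u = u)            # L1 fit of each column on u  -> c
    v <- cvec / sqrt(sum(cvec^2))                  # right singular vector v1 = c / ||c||
    dvec <- apply(X, 1, l1_coef, u = v)            # L1 fit of each row on v     -> d
    u <- dvec / sqrt(sum(dvec^2))                  # left singular vector u1 = d / ||d||
  }
  lambda <- median((X %*% v) / u)                  # crude scale estimate (an assumption of this sketch)
  list(u = u, v = v, lambda = lambda)
}
# Later components would come from the deflated matrix X - lambda * u %*% t(v).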
80
Clustering weather stations on Map
Using RSVD
81
References
• Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
• Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed. Springer Verlag, New York.
• Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple regression data using elemental sets. Technometrics, 20, 197-208.
• Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73-90.
• Kumar, N., Nasser, M., and Sarker, S.C. (2011). "A New Singular Value Decomposition Based Robust Graphical Clustering Technique and Its Application in Climatic Data." Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227-238.
• Ryan, T.P. (1997). Modern Regression Methods, Wiley, New York.
• Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions, Siam, Philadelphia.
• Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html
82

More Related Content

Similar to Matrix-Decomposition-and-Its-application-in-Statistics_NK.ppt

directed-research-report
directed-research-reportdirected-research-report
directed-research-reportRyen Krusinga
 
APLICACIONES DE ESPACIO VECTORIALES
APLICACIONES DE ESPACIO VECTORIALESAPLICACIONES DE ESPACIO VECTORIALES
APLICACIONES DE ESPACIO VECTORIALESJoseLuisCastroGualot
 
linear system of solutions
linear system of solutionslinear system of solutions
linear system of solutionsLama Rulz
 
Iast.lect19.slides
Iast.lect19.slidesIast.lect19.slides
Iast.lect19.slidesha88ni
 
Design of sampled data control systems part 2. 6th lecture
Design of sampled data control systems part 2.  6th lectureDesign of sampled data control systems part 2.  6th lecture
Design of sampled data control systems part 2. 6th lectureKhalaf Gaeid Alshammery
 
lec-7_phase_plane_analysis.pptx
lec-7_phase_plane_analysis.pptxlec-7_phase_plane_analysis.pptx
lec-7_phase_plane_analysis.pptxdatamboli
 
Simulation of Double Pendulum
Simulation of Double PendulumSimulation of Double Pendulum
Simulation of Double PendulumQUESTJOURNAL
 
ECEN615_Fall2020_Lect4.pptx
ECEN615_Fall2020_Lect4.pptxECEN615_Fall2020_Lect4.pptx
ECEN615_Fall2020_Lect4.pptxPrasenjitDey49
 
Lagrangeon Points
Lagrangeon PointsLagrangeon Points
Lagrangeon PointsNikhitha C
 
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONSAPPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONSAYESHA JAVED
 
orthogonal.pptx
orthogonal.pptxorthogonal.pptx
orthogonal.pptxJaseSharma
 
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis Lake Como School of Advanced Studies
 
Second Order Active RC Blocks
Second Order Active RC BlocksSecond Order Active RC Blocks
Second Order Active RC BlocksHoopeer Hoopeer
 

Similar to Matrix-Decomposition-and-Its-application-in-Statistics_NK.ppt (20)

directed-research-report
directed-research-reportdirected-research-report
directed-research-report
 
Wang1998
Wang1998Wang1998
Wang1998
 
APLICACIONES DE ESPACIO VECTORIALES
APLICACIONES DE ESPACIO VECTORIALESAPLICACIONES DE ESPACIO VECTORIALES
APLICACIONES DE ESPACIO VECTORIALES
 
linear system of solutions
linear system of solutionslinear system of solutions
linear system of solutions
 
Iast.lect19.slides
Iast.lect19.slidesIast.lect19.slides
Iast.lect19.slides
 
Design of sampled data control systems part 2. 6th lecture
Design of sampled data control systems part 2.  6th lectureDesign of sampled data control systems part 2.  6th lecture
Design of sampled data control systems part 2. 6th lecture
 
Gaussian
GaussianGaussian
Gaussian
 
Dynamics
DynamicsDynamics
Dynamics
 
lec-7_phase_plane_analysis.pptx
lec-7_phase_plane_analysis.pptxlec-7_phase_plane_analysis.pptx
lec-7_phase_plane_analysis.pptx
 
Simulation of Double Pendulum
Simulation of Double PendulumSimulation of Double Pendulum
Simulation of Double Pendulum
 
Ee321s3.1
Ee321s3.1Ee321s3.1
Ee321s3.1
 
ECEN615_Fall2020_Lect4.pptx
ECEN615_Fall2020_Lect4.pptxECEN615_Fall2020_Lect4.pptx
ECEN615_Fall2020_Lect4.pptx
 
Rankmatrix
RankmatrixRankmatrix
Rankmatrix
 
Lagrangeon Points
Lagrangeon PointsLagrangeon Points
Lagrangeon Points
 
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONSAPPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
 
orthogonal.pptx
orthogonal.pptxorthogonal.pptx
orthogonal.pptx
 
Coueete project
Coueete projectCoueete project
Coueete project
 
Signals and Systems Assignment Help
Signals and Systems Assignment HelpSignals and Systems Assignment Help
Signals and Systems Assignment Help
 
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis
 
Second Order Active RC Blocks
Second Order Active RC BlocksSecond Order Active RC Blocks
Second Order Active RC Blocks
 

Recently uploaded

UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...josephjonse
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelDrAjayKumarYadav4
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...ronahami
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...ssuserdfc773
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdfKamal Acharya
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesRashidFaridChishti
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 

Recently uploaded (20)

Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata Model
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdf
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 

Matrix-Decomposition-and-Its-application-in-Statistics_NK.ppt

  • 1. 1 Matrix Decomposition and its Application in Statistics Nishith Kumar Lecturer Department of Statistics Begum Rokeya University, Rangpur. Email: nk.bru09@gmail.com
  • 2. 2 Overview • Introduction • LU decomposition • QR decomposition • Cholesky decomposition • Jordan Decomposition • Spectral decomposition • Singular value decomposition • Applications
  • 3. 3 Introduction Some of most frequently used decompositions are the LU, QR, Cholesky, Jordan, Spectral decomposition and Singular value decompositions. This Lecture covers relevant matrix decompositions, basic numerical methods, its computation and some of its applications. Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in [Wampler, 1970], and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.
  • 4. 4 Easy to solve system (Cont.) Some linear system that can be easily solved The solution:             nn n a b a b a b / / / 22 2 11 1 
  • 5. 5 Easy to solve system (Cont.) Lower triangular matrix: Solution: This system is solved using forward substitution
  • 6. 6 Easy to solve system (Cont.) Upper Triangular Matrix: Solution: This system is solved using Backward substitution
  • 7. 7 LU Decomposition and Where,              mm m m u u u u u u U        0 0 0 2 22 1 12 11              mm m m l l l l l l L        2 1 22 21 11 0 0 0 LU A  LU decomposition was originally derived as a decomposition of quadratic and bilinear forms. Lagrange, in the very first paper in his collected works( 1759) derives the algorithm we call Gaussian elimination. Later Turing introduced the LU decomposition of a matrix in 1948 that is used to solve the system of linear equation. Let A be a m × m with nonsingular square matrix. Then there exists two matrices L and U such that, where L is a lower triangular matrix and U is an upper triangular matrix. J-L Lagrange (1736 –1813) A. M. Turing (1912-1954)
  • 8. 8 A … U (upper triangular)  U = Ek  E1 A  A = (E1)-1  (Ek)-1 U If each such elementary matrix Ei is a lower triangular matrices, it can be proved that (E1)-1, , (Ek)-1 are lower triangular, and (E1)-1  (Ek)-1 is a lower triangular matrix. Let L=(E1)-1  (Ek)-1 then A=LU. How to decompose A=LU?           - - -           - -           -            - - -            - - -           - -            - - -           - - -  2 13 3 6 8 12 2 2 6 1 0 2 / 1 0 1 2 0 0 1 1 3 0 0 1 0 0 0 1 5 0 0 2 4 0 2 2 6 2 13 3 6 8 12 2 2 6 1 0 2 / 1 0 1 2 0 0 1 1 12 0 2 4 0 2 2 6 Now, 2 13 3 6 8 12 2 2 6 A U E2 E1 A
  • 9. 9 Calculation of L and U (cont.) Now reducing the first column we have           - - -  2 13 3 6 8 12 2 2 6 A           - - -           2 13 3 6 8 12 2 2 6 1 0 0 0 1 0 0 0 1           - - -           - -           -            - - -            - - -           - -            - - - 2 13 3 6 8 12 2 2 6 1 0 2 / 1 0 1 2 0 0 1 1 3 0 0 1 0 0 0 1 5 0 0 2 4 0 2 2 6 2 13 3 6 8 12 2 2 6 1 0 2 / 1 0 1 2 0 0 1 1 12 0 2 4 0 2 2 6 =
  • 10. 10 If A is a Non singular matrix then for each L (lower triangular matrix) the upper triangular matrix is unique but an LU decomposition is not unique. There can be more than one such LU decomposition for a matrix. Such as Calculation of L and U (cont.)                                           -           - - - - 1 3 2 / 1 0 1 2 0 0 1 1 3 0 0 1 0 0 0 1 1 0 2 / 1 0 1 2 0 0 1 1 3 0 0 1 0 0 0 1 1 0 2 / 1 0 1 2 0 0 1 1 1           - - -  2 13 3 6 8 12 2 2 6 A           1 3 2 / 1 0 1 2 0 0 1           - - - 5 0 0 2 4 0 2 2 6           - - -  2 13 3 6 8 12 2 2 6 A           1 3 3 0 1 12 0 0 6           - - - 5 0 0 2 4 0 6 / 2 6 / 2 1 Now Therefore, = =LU = =LU
  • 11. 11 Calculation of L and U (cont.) Thus LU decomposition is not unique. Since we compute LU decomposition by elementary transformation so if we change L then U will be changed such that A=LU To find out the unique LU decomposition, it is necessary to put some restriction on L and U matrices. For example, we can require the lower triangular matrix L to be a unit one (i.e. set all the entries of its main diagonal to ones). LU Decomposition in R: • library(Matrix) • x<-matrix(c(3,2,1, 9,3,4,4,2,5 ),ncol=3,nrow=3) • expand(lu(x)) Calculation of L and U (cont.)
  • 12. 12 • Note: there are also generalizations of LU to non-square and singular matrices, such as rank revealing LU factorization. • [Pan, C.T. (2000). On the existence and computation of rank revealing LU factorizations. Linear Algebra and its Applications, 316: 199-222. • Miranian, L. and Gu, M. (2003). Strong rank revealing LU factorizations. Linear Algebra and its Applications, 367: 1-16.] • Uses: The LU decomposition is most commonly used in the solution of systems of simultaneous linear equations. We can also find determinant easily by using LU decomposition (Product of the diagonal element of upper and lower triangular matrix). Calculation of L and U (cont.)
  • 13. 13 Solving system of linear equation using LU decomposition Suppose we would like to solve a m×m system AX = b. Then we can find a LU-decomposition for A, then to solve AX =b, it is enough to solve the systems Thus the system LY = b can be solved by the method of forward substitution and the system UX = Y can be solved by the method of backward substitution. To illustrate, we give some examples Consider the given system AX = b, where and           - - -  2 13 3 6 8 12 2 2 6 A           -  17 14 8 b
  • 14. 14 We have seen A = LU, where Thus, to solve AX = b, we first solve LY = b by forward substitution Then Solving system of linear equation using LU decomposition            1 3 2 / 1 0 1 2 0 0 1 L           - - -  5 0 0 2 4 0 2 2 6 U           -                      17 14 8 1 3 2 / 1 0 1 2 0 0 1 3 2 1 y y y           - -             15 2 8 3 2 1 y y y Y
  • 15. 15 Now, we solve UX =Y by backward substitution then Solving system of linear equation using LU decomposition           - -                      - - - 15 2 8 5 0 0 2 4 0 2 2 6 3 2 1 x x x                      3 2 1 3 2 1 x x x
  • 16. 16 QR Decomposition If A is a m×n matrix with linearly independent columns, then A can be decomposed as , where Q is a m×n matrix whose columns form an orthonormal basis for the column space of A and R is an nonsingular upper triangular matrix. QR A  Jørgen Pedersen Gram (1850 –1916) Erhard Schmidt (1876-1959) Firstly QR decomposition originated with Gram(1883). Later Erhard Schmidt (1907) proved the QR Decomposition Theorem
  • 17. 17 QR-Decomposition (Cont.) Theorem : If A is a m×n matrix with linearly independent columns, then A can be decomposed as , where Q is a m×n matrix whose columns form an orthonormal basis for the column space of A and R is an nonsingular upper triangular matrix. Proof: Suppose A=[u1 | u2| . . . | un] and rank (A) = n. Apply the Gram-Schmidt process to {u1, u2 , . . . ,un} and the orthogonal vectors v1, v2 , . . . ,vn are Let for i=1,2,. . ., n. Thus q1, q2 , . . . ,qn form a orthonormal basis for the column space of A. QR A  1 2 1 1 2 2 2 2 1 2 1 1 , , , - - - - - - -  i i i i i i i i v v v u v v v u v v v u u v  i i i v v q 
  • 18. 18 QR-Decomposition (Cont.) Now, i.e., Thus ui is orthogonal to qj for j>i; 1 2 1 1 2 2 2 2 1 2 1 1 , , , - - -      i i i i i i i i v v v u v v v u v v v u v u  1 1 2 2 1 1 , , , - -       i i i i i i i i q q u q q u q q u q v u  } , , { } , , , { 2 2 1 i i i i q q q span v v v span u     1 1 2 2 1 1 2 2 3 1 1 3 3 3 3 1 1 2 2 2 2 1 1 1 , , , , , , - -            n n n n n n n n q q u q q u q q u q v u q q u q q u q v u q q u q v u q v u  
  • 19. 19 Let Q= [q1 q2 . . . qn] , so Q is a m×n matrix whose columns form an orthonormal basis for the column space of A . Now, i.e., A=QR. Where, Thus A can be decomposed as A=QR , where R is an upper triangular and nonsingular matrix. QR-Decomposition (Cont.)                       n n n n n n v q u v q u q u v q u q u q u v q q q u u u A 0 0 0 0 , 0 0 , , 0 , , , 3 3 2 2 3 2 1 1 3 1 2 1 2 1 2 1                            n n n n v q u v q u q u v q u q u q u v R 0 0 0 0 , 0 0 , , 0 , , , 3 3 2 2 3 2 1 1 3 1 2 1        
  • 20. 20 QR Decomposition Example: Find the QR decomposition of             - - - -  1 0 0 0 1 1 0 0 1 1 1 1 A
  • 21. 21 Applying Gram-Schmidt process of computing QR decomposition 1st Step: 2nd Step: 3rd Step: Calculation of QR Decomposition                   0 3 1 3 1 3 1 1 3 1 1 1 1 11 a a q a r 3 2 2 1 12 -   a q r T               - -                   - -                - -               - -  -  -  0 6 / 1 3 2 6 / 1 ˆ ˆ 1 3 2 ˆ 0 3 / 1 3 / 2 3 / 1 0 3 1 3 1 3 1 ) 3 / 2 ( 0 1 0 1 ˆ 2 2 2 2 22 12 1 2 2 1 1 2 2 q q q q r r q a a q q a q T
  • 22. 22 4th Step: 5th Step: 6th Step: Calculation of QR Decomposition 3 1 3 1 13 -   a q r T 6 1 3 2 23   a q r T               - -                   - -  - -  - -  6 / 2 6 / 1 0 6 / 1 ˆ ˆ 1 2 / 6 ˆ 1 2 / 1 0 2 / 1 ˆ 3 3 3 3 33 2 23 1 13 3 3 2 2 3 1 1 3 3 q q q q r q r q r a a q q a q q a q T T
  • 23. 23 Therefore, A=QR R code for QR Decomposition: x<-matrix(c(1,2,3, 2,5,4, 3,4,9),ncol=3,nrow=3) qrstr <- qr(x) Q<-qr.Q(qrstr) R<-qr.R(qrstr) Uses: QR decomposition is widely used in computer codes to find the eigenvalues of a matrix, to solve linear systems, and to find least squares approximations. Calculation of QR Decomposition           - -               - - - -              - - - - 2 / 6 0 0 6 / 1 6 / 2 0 3 / 1 3 / 2 3 6 / 2 0 0 6 / 1 6 / 1 3 / 1 0 6 / 2 3 / 1 6 / 1 6 / 1 3 / 1 1 0 0 0 1 1 0 0 1 1 1 1
  • 24. 24 Least square solution using QR Decomposition The least square solution of b is Let X=QR. Then Therefore,   Y X b X X t t      Z Y Q Rb Y Q R R Rb R R Y Q R Rb R t t t t t t t t t       - - 1 1       Y Q R Y X Rb R QRb Q R b QR QR b X X t t t t t t t t     
  • 25. 25 Cholesky Decomposition Cholesky died from wounds received on the battle field on 31 August 1918 at 5 o'clock in the morning in the North of France. After his death one of his fellow officers, Commandant Benoit, published Cholesky's method of computing solutions to the normal equations for some least squares data fitting problems published in the Bulletin géodesique in 1924. Which is known as Cholesky Decomposition Cholesky Decomposition: If A is a real, symmetric and positive definite matrix then there exists a unique lower triangular matrix L with positive diagonal element such that . T LL A  Andre-Louis Cholesky 1875-1918
  • 26. 26 Cholesky Decomposition Theorem: If A is a n×n real, symmetric and positive definite matrix then there exists a unique lower triangular matrix G with positive diagonal element such that . Proof: Since A is a n×n real and positive definite so it has a LU decomposition, A=LU. Also let the lower triangular matrix L to be a unit one (i.e. set all the entries of its main diagonal to ones). So in that case LU decomposition is unique. Let us suppose observe that . is a unit upper triangular matrix. Thus, A=LDMT .Since A is Symmetric so, A=AT . i.e., LDMT =MDLT. From the uniqueness we have L=M. So, A=LDLT . Since A is positive definite so all diagonal elements of D are positive. Let then we can write A=GGT. T GG A  ) , , , ( 22 11 nn u u u diag D    U D M T 1 -  ) , , , ( 22 11 nn d d d diag L G   
  • 27. 27 Cholesky Decomposition (Cont.) Procedure to find the Cholesky decomposition: suppose
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
We need to solve the equation
A = \begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} & \cdots & l_{n1} \\ 0 & l_{22} & \cdots & l_{n2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & l_{nn} \end{bmatrix} = LL^T
  • 28. 28 Example of Cholesky Decomposition: suppose
A = \begin{bmatrix} 4 & 2 & -2 \\ 2 & 10 & 2 \\ -2 & 2 & 5 \end{bmatrix}
Then the Cholesky decomposition gives
L = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 1 & \sqrt{3} \end{bmatrix}
Now, for k from 1 to n:
l_{kk} = \left( a_{kk} - \sum_{s=1}^{k-1} l_{ks}^2 \right)^{1/2}
and for j from k+1 to n:
l_{jk} = \left( a_{jk} - \sum_{s=1}^{k-1} l_{js} l_{ks} \right) / l_{kk}
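A minimal R sketch of the recursion above, shown purely for illustration (in practice one would call chol(); the function name my.chol is an assumption).
my.chol <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (k in 1:n) {
    # diagonal entry: l_kk = sqrt(a_kk - sum of squares of earlier entries in row k)
    L[k, k] <- sqrt(A[k, k] - sum(L[k, seq_len(k - 1)]^2))
    for (j in seq_len(n - k) + k) {   # j runs from k+1 to n
      L[j, k] <- (A[j, k] - sum(L[j, seq_len(k - 1)] * L[k, seq_len(k - 1)])) / L[k, k]
    }
  }
  L
}
A <- matrix(c(4, 2, -2, 2, 10, 2, -2, 2, 5), 3, 3)
my.chol(A)          # reproduces the L shown above; t(chol(A)) gives the same factor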
  • 29. 29 R code for Cholesky Decomposition
x <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol=3, nrow=3)
cl <- chol(x)
If we decompose A as LDL^T then
L = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ -1/2 & 1/3 & 1 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 3 \end{bmatrix}
  • 30. 30 Application of Cholesky Decomposition Cholesky Decomposition is used to solve the system of linear equation Ax=b, where A is real symmetric and positive definite. In regression analysis it could be used to estimate the parameter if XTX is positive definite. In Kernel principal component analysis, Cholesky decomposition is also used (Weiya Shi; Yue-Fei Guo; 2010)
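A minimal R sketch of solving Ax = b through the Cholesky factor, using the matrix from the previous slide; the right-hand side b is an assumption.
A <- matrix(c(4, 2, -2, 2, 10, 2, -2, 2, 5), 3, 3)
b <- c(1, 2, 3)
U <- chol(A)                     # R returns the upper triangular factor, so A = U^T U
y <- forwardsolve(t(U), b)       # forward substitution:  U^T y = b
x <- backsolve(U, y)             # backward substitution: U x = y
cbind(x, solve(A, b))            # agrees with the direct solution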
  • 31. 31 Characteristic Roots and Characteristic Vectors: Any nonzero vector x is said to be a characteristic vector of a matrix A if there exists a number λ such that Ax = λx, where A is a square matrix; λ is then said to be a characteristic root of the matrix A corresponding to the characteristic vector x. The characteristic root belonging to a given characteristic vector is unique, but a characteristic vector is not unique. We calculate the characteristic roots λ from the characteristic equation |A − λI| = 0. For λ = λi the characteristic vector is the solution x of the homogeneous system of linear equations (A − λiI)x = 0. Theorem: If A is a real symmetric matrix and λi and λj are two distinct latent roots of A then the corresponding latent vectors xi and xj are orthogonal.
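A small numerical check of the definition Ax = λx, using the same symmetric matrix that appears in the later R slides.
A <- matrix(c(1, 2, 3, 2, 5, 4, 3, 4, 9), 3, 3)
e <- eigen(A)
cbind(A %*% e$vectors[, 1], e$values[1] * e$vectors[, 1])   # the two columns agree up to rounding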
  • 32. 32 Multiplicity Algebraic Multiplicity: The number of repetitions of a certain eigenvalue. If, for a certain matrix, λ={3,3,4}, then the algebraic multiplicity of 3 would be 2 (as it appears twice) and the algebraic multiplicity of 4 would be 1 (as it appears once). This type of multiplicity is normally represented by the Greek letter α, where α(λi) represents the algebraic multiplicity of λi. Geometric Multiplicity: the geometric multiplicity of an eigenvalue is the number of linearly independent eigenvectors associated with it.
  • 33. 33 Jordan Decomposition: Camille Jordan (1870). Let A be any n×n matrix. Then there exists a nonsingular matrix P and k×k Jordan blocks
J_k(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{bmatrix}
such that
P^{-1} A P = \begin{bmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_r}(\lambda_r) \end{bmatrix}
where k_1 + k_2 + ... + k_r = n. Here λi, i = 1, 2, ..., r, are the characteristic roots and ki are the algebraic multiplicities of the λi. Jordan decomposition is used in differential equations and time series analysis. Camille Jordan (1838-1921)
  • 34. 34 Spectral Decomposition: A. L. Cauchy established the spectral decomposition in 1829. Let A be an m×m real symmetric matrix. Then there exists an orthogonal matrix P such that P^T A P = Λ, or A = P Λ P^T, where Λ is a diagonal matrix. CAUCHY, A.L. (1789-1857)
  • 35. 35 Spectral Decomposition and Principal Component Analysis (Cont.): By using spectral decomposition we can write A = P Λ P^T. In multivariate analysis our data form a matrix. Suppose our data matrix X is mean centred, i.e. the column means of X − X̄ are zero, and the variance-covariance matrix is Σ. The variance-covariance matrix Σ is real and symmetric. Using spectral decomposition we can write Σ = P Λ P^T, where Λ = diag(λ1, λ2, ..., λn) is a diagonal matrix with λ1 ≥ λ2 ≥ ... ≥ λn. Also tr(Σ) = total variation of the data = tr(Λ).
  • 36. 36 Spectral Decomposition and Principal Component Analysis (Cont.): The principal component transformation is the transformation Y = (X − µ)P, where
- E(Yi) = 0
- V(Yi) = λi
- Cov(Yi, Yj) = 0 if i ≠ j
- V(Y1) ≥ V(Y2) ≥ ... ≥ V(Yn)
- tr(Σ) = \sum_{i=1}^{n} V(Y_i) = \sum_{i=1}^{n} \lambda_i
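A minimal R sketch of this transformation via the spectral decomposition of the covariance matrix; the iris measurements are used purely as assumed example data.
X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centred data, X - µ
S  <- cov(Xc)                                  # variance-covariance matrix Σ
ev <- eigen(S)                                 # Σ = P Λ P^T
P  <- ev$vectors
Y  <- Xc %*% P                                 # principal component scores
round(apply(Y, 2, var), 4)                     # V(Y_i) equals the eigenvalues λ_i
round(ev$values, 4)
c(sum(diag(S)), sum(ev$values))                # tr(Σ) = Σ λ_i, the total variation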
  • 37. 37 R code for Spectral Decomposition
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
eigen(x)
Application:
- Data reduction.
- Image processing and compression.
- K-selection for K-means clustering.
- Multivariate outlier detection.
- Noise filtering.
- Trend detection in the observations.
  • 38. 38 Historical background of SVD: There are five mathematicians who were responsible for establishing the existence of the singular value decomposition and developing its theory: Eugenio Beltrami (1835-1899), Camille Jordan (1838-1921), James Joseph Sylvester (1814-1897), Erhard Schmidt (1876-1959) and Hermann Weyl (1885-1955). The singular value decomposition was originally developed by two mathematicians in the mid to late 1800s: 1. Eugenio Beltrami, 2. Camille Jordan. Several other mathematicians took part in the final developments of the SVD, including James Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s. C. Eckart and G. Young proved the low rank approximation property of the SVD (1936).
  • 39. 39 What is SVD? Any real m×n matrix X, where n ≤ m, can be decomposed as X = UΛV^T, where
- U is an m×n column-orthonormal matrix (U^T U = I) containing the eigenvectors of the symmetric matrix XX^T;
- Λ is an n×n diagonal matrix containing the singular values of the matrix X; the number of nonzero diagonal elements of Λ corresponds to the rank of X;
- V^T is an n×n row-orthonormal matrix (V^T V = I), where V contains the eigenvectors of the symmetric matrix X^T X.
  • 40. 40 Singular Value Decomposition (Cont.) Theorem (Singular Value Decomposition): Let X be m×n of rank r, r ≤ n ≤ m. Then there exist matrices U, V and a diagonal matrix Λ with positive diagonal elements such that X = UΛV^T. Proof: Since X is m×n of rank r, r ≤ n ≤ m, XX^T and X^T X are both of rank r (by using the concept of the Grammian matrix) and of dimension m×m and n×n respectively. Since XX^T is a real symmetric matrix we can write, by spectral decomposition, XX^T = QDQ^T, where Q and D are respectively the matrices of characteristic vectors and corresponding characteristic roots of XX^T. Again, since X^T X is a real symmetric matrix, we can write by spectral decomposition X^T X = RMR^T.
  • 41. 41 Singular Value Decomposition (Cont.) Here R is the (orthogonal) matrix of characteristic vectors and M is the diagonal matrix of the corresponding characteristic roots. Since XX^T and X^T X are both of rank r, only r of their characteristic roots are positive, the remaining being zero. Hence we can write
D = \begin{bmatrix} D_r & 0 \\ 0 & 0 \end{bmatrix}, \qquad M = \begin{bmatrix} M_r & 0 \\ 0 & 0 \end{bmatrix}
  • 42. 42 Singular Value Decomposition (Cont.) We know that the nonzero characteristic roots of XX^T and X^T X are equal, so D_r = M_r. Partition Q and R conformably with D and M respectively, i.e., Q = (Q_r, Q_*), R = (R_r, R_*), such that Q_r is m×r, R_r is n×r and they correspond respectively to the nonzero characteristic roots of XX^T and X^T X. Now take
U = Q_r, \qquad V = R_r, \qquad \Lambda = D_r^{1/2} = diag(d_1^{1/2}, d_2^{1/2}, ..., d_r^{1/2}),
where d_i, i = 1, 2, ..., r, are the positive characteristic roots of XX^T and hence those of X^T X as well (by using the concept of the Grammian matrix).
  • 43. 43 Singular Value Decomposition (Cont.) Now define S = Q_r D_r^{1/2} R_r^T. We shall show that S = X, thus completing the proof.
S^T S = R_r D_r^{1/2} Q_r^T Q_r D_r^{1/2} R_r^T = R_r D_r R_r^T = R M R^T = X^T X.
Similarly, SS^T = XX^T.
From the first relation we conclude that, for an arbitrary orthogonal matrix, say P_1, S = P_1 X, while from the second we conclude that, for an arbitrary orthogonal matrix, say P_2, S = X P_2. We must have
  • 44. 44 Singular Value Decomposition (Cont.) The preceding, however, implies that for arbitrary orthogonal matrices P_1, P_2 the matrix X satisfies
XX^T = P_1 XX^T P_1^T, \qquad X^T X = P_2^T X^T X P_2,
which in turn implies that P_1 = I_m and P_2 = I_n. Thus
X = S = Q_r D_r^{1/2} R_r^T = U \Lambda V^T.
  • 45. 45 R Code for Singular Value Decomposition
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
sv <- svd(x)
D <- sv$d
U <- sv$u
V <- sv$v
  • 46. 46 Decomposition in Diagram (flowchart relating the matrix type to the appropriate decomposition): Matrix A → LU decomposition (not always unique); QR decomposition (full column rank); square vs. rectangular; rectangular → SVD; square symmetric and positive definite (PD) → Cholesky decomposition, Spectral decomposition; asymmetric with AM > GM → Jordan decomposition; AM = GM → similar diagonalization, P^{-1}AP = Λ.
  • 47. 47 Properties of SVD: Rewriting the SVD,
A = U \Lambda V^T = \sum_{i=1}^{r} \lambda_i u_i v_i^T
where r = rank of A, λi = the i-th diagonal element of Λ, and ui and vi are the i-th columns of U and V respectively.
  • 48. 48 Properties of SVD: Low rank approximation. Theorem: If A = UΛV^T is the SVD of A and the singular values are sorted as λ1 ≥ λ2 ≥ ... ≥ λn, then for any l < r the best rank-l approximation to A is
\tilde{A} = \sum_{i=1}^{l} \lambda_i u_i v_i^T, \qquad \|A - \tilde{A}\|_F^2 = \sum_{i=l+1}^{r} \lambda_i^2.
The low rank approximation technique is very important for data compression.
  • 49. 49 Low-rank Approximation
- SVD can be used to compute optimal low-rank approximations.
- The approximation of A is the matrix à of rank k such that \tilde{A} = \arg\min_{X:\, rank(X)=k} \|A - X\|_F.
- Frobenius norm: \|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}. If d_1, d_2, ..., d_n are the characteristic roots of A^T A then \|A\|_F^2 = \sum_{i=1}^{n} d_i.
- Ã and X are both m×n matrices.
  • 50. 50 Low-rank Approximation
- Solution via SVD: set the smallest r − k singular values to zero,
\tilde{A}_k = U \, diag(\lambda_1, ..., \lambda_k, 0, ..., 0) \, V^T
(the schematic on the slide illustrates this for k = 2).
- Column notation, as a sum of rank-1 matrices: \tilde{A}_k = \sum_{i=1}^{k} \lambda_i u_i v_i^T.
  • 51. 51 Approximation error
- How good (bad) is this approximation?
- It is the best possible, as measured by the Frobenius norm of the error:
\min_{X:\, rank(X)=k} \|A - X\|_F^2 = \|A - \tilde{A}\|_F^2 = \sum_{i=k+1}^{r} \lambda_i^2
where the λi are ordered such that λi ≥ λi+1.
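A minimal R sketch of the rank-l approximation and its error identity; the random matrix below is an assumed example.
set.seed(1)
A  <- matrix(rnorm(50), 10, 5)
sv <- svd(A)
l  <- 2
Al <- sv$u[, 1:l] %*% diag(sv$d[1:l]) %*% t(sv$v[, 1:l])   # Ã = sum of the first l rank-1 terms
c(sum((A - Al)^2), sum(sv$d[(l + 1):length(sv$d)]^2))      # squared Frobenius error equals Σ λ_i² for i > l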
  • 52. 52 Row approximation and column approximation: Suppose R_i and C_j represent the i-th row and j-th column of A. The SVD of A is A = U \Lambda V^T = \sum_{k=1}^{r} \lambda_k u_k v_k^T and its rank-l approximation is \tilde{A}^l = U \Lambda_l V^T = \sum_{k=1}^{l} \lambda_k u_k v_k^T. The SVD equation for R_i is R_i = \sum_{k=1}^{r} \lambda_k u_{ik} v_k^T, i = 1, ..., m, which we can approximate by R_i^l = \sum_{k=1}^{l} \lambda_k u_{ik} v_k^T, l < r. Also the SVD equation for C_j is C_j = \sum_{k=1}^{r} \lambda_k v_{jk} u_k, j = 1, 2, ..., n, which we can approximate by C_j^l = \sum_{k=1}^{l} \lambda_k v_{jk} u_k, l < r.
  • 53. 53 Least square solution in an inconsistent system: By using SVD we can solve an inconsistent system; this gives the least square solution. The least square solution of \min_x \|Ax - b\|_2 is x = A^g b, where A^g is the MP (Moore-Penrose) inverse of A.
  • 54. 54 The SVD of A^g is A^g = V \Lambda^{-1} U^T. This can be written as A^g = \sum_{i=1}^{r} \lambda_i^{-1} v_i u_i^T, where \Lambda^{-1} = diag(1/\lambda_1, 1/\lambda_2, ..., 1/\lambda_r). The least square solution is therefore x = A^g b = V \Lambda^{-1} U^T b.
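A minimal R sketch of the least square solution through A^g; the simulated A and b are assumptions, and the construction below assumes A has full column rank (otherwise only the nonzero singular values would be inverted).
set.seed(1)
A  <- matrix(rnorm(12), 4, 3)
b  <- rnorm(4)
sv <- svd(A)
Ag <- sv$v %*% diag(1 / sv$d) %*% t(sv$u)   # Moore-Penrose inverse V Λ^{-1} U^T
x  <- Ag %*% b                              # least square solution
cbind(x, qr.solve(A, b))                    # agrees with the QR based least squares solution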
  • 56. 56 SVD based PCA: If we reduce the variables by using SVD then it performs like PCA. Suppose X is a mean centred data matrix. Writing X in terms of its SVD, X = UΛV^T, we get XV = UΛ. Let Y = XV = UΛ. Then the first column of Y contains the first principal component scores, and so on.
- SVD based PC is more numerically stable.
- If the number of variables is greater than the number of observations then SVD based PCA will give an efficient result (Antti Niemistö, Statistical Analysis of Gene Expression Microarray Data, 2005).
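A minimal R sketch of SVD based PCA; the iris measurements are again used as assumed example data.
X  <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)   # mean-centred data
sv <- svd(X)
Y  <- X %*% sv$v                   # equals sv$u %*% diag(sv$d), the PC scores from XV = UΛ
head(Y[, 1])                       # first principal component scores
head(prcomp(iris[, 1:4])$x[, 1])   # prcomp() returns the same scores (possibly with opposite sign)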
  • 57. 57 Application of SVD
- Data reduction, for both variables and observations.
- Solving linear least square problems.
- Image processing and compression.
- K-selection for K-means clustering.
- Multivariate outlier detection.
- Noise filtering.
- Trend detection in the observations and the variables.
  • 58. 58 Origin of biplot
- Gabriel (1971)
- One of the most important advances in data analysis in recent decades
- Currently: > 50,000 web pages; numerous academic publications; included in most statistical analysis packages; still a very new technique to most scientists
Prof. Ruben Gabriel, "the founder of the biplot". Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain.
  • 59. 59 What is a biplot? "Biplot" = "bi" + "plot"
- "plot": a scatter plot of two rows OR of two columns, or a scatter plot summarizing the rows OR the columns
- "bi": BOTH rows AND columns
- 1 biplot >> 2 plots
  • 60. 60 Practical definition of a biplot: "Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix." (Gabriel, 1971). The slide illustrates this with a G-by-E table P(4, 3) and its matrix decomposition into genotype scores G and environment scores E of rank 2 (figure: the numerical factorization and the corresponding biplot with points G1-G4 and E1-E3 on common X-Y axes). (Now 3D-biplots are also possible…)
  • 61. 61 Singular Value Decomposition (SVD) & Singular Value Partitioning (SVP)
SVD: X_{ij} = \sum_{k=1}^{r} \lambda_k u_{ik} v_{kj}
SVP: X_{ij} = \sum_{k=1}^{r} (\lambda_k^{f} u_{ik}) (\lambda_k^{1-f} v_{kj})
Here r is the 'rank' of X, i.e. the minimum number of PCs required to fully represent X; the λk are the singular values; U characterises the rows and V characterises the columns. In the biplot the row scores are \lambda_k^{f} u_{ik} and the column scores are \lambda_k^{1-f} v_{kj}; common choices of f are f = 1, f = 0 and f = 1/2.
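A minimal R sketch of the singular value partitioning with f = 1/2 (the symmetric scaling); the mean-centred iris measurements are used as assumed example data.
X  <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
sv <- svd(X)
f  <- 1/2
row.scores <- sv$u[, 1:2] %*% diag(sv$d[1:2]^f)         # λ^f u : points for the rows
col.scores <- sv$v[, 1:2] %*% diag(sv$d[1:2]^(1 - f))   # λ^(1-f) v : points for the columns
# plotting both sets of scores on the same axes gives a rank-2 biplot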
  • 62. 62 Biplot
- The simplest biplot shows the first two PCs together with the projections of the axes of the original variables.
- The x-axis represents the scores for the first principal component, the y-axis the scores for the second principal component.
- The original variables are represented by arrows which graphically indicate the proportion of the original variance explained by the first two principal components.
- The direction of the arrows indicates the relative loadings on the first and second principal components.
- Biplot analysis can help to understand the multivariate data i) graphically ii) effectively iii) conveniently.
  • 63. 63 Biplot of Iris Data (figure: biplot of the first two principal components, Comp. 1 vs Comp. 2, with arrows for Sepal L., Sepal W., Petal L. and Petal W.; point labels 1 = Setosa, 2 = Versicolor, 3 = Virginica)
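A biplot of this kind can be reproduced with base R; princomp() and biplot() are in the stats package, and labelling points by species number is an assumption made to match the figure.
biplot(princomp(iris[, 1:4]), xlabs = as.numeric(iris$Species))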
  • 64. 64 Image Compression Example Pansy Flower image, collected from http://www.ats.ucla.edu/stat/r/code/pansy.jpg This image is 600×465 pixels
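A hedged R sketch of the rank-k image approximation used in the following slides; the jpeg package and a local copy of the pansy image are assumptions (any grayscale image matrix would do).
library(jpeg)
img  <- readJPEG("pansy.jpg")                          # height x width x 3 array
gray <- (img[, , 1] + img[, , 2] + img[, , 3]) / 3     # simple grayscale conversion
sv   <- svd(gray)
plot(sv$d, type = "b", main = "Singular values of the flower image")
k    <- 20
Ak   <- sv$u[, 1:k] %*% diag(sv$d[1:k]) %*% t(sv$v[, 1:k])
image(t(Ak)[, nrow(Ak):1], col = gray.colors(256))     # rank-20 approximation of the image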
  • 65. 65 Singular values of flowers image Plot of the singular values
  • 66. 66 Low rank Approximation to flowers image Rank-1 approximation Rank- 5 approximation
  • 67. 67 Rank-20 approximation Low rank Approximation to flowers image Rank-30 approximation
  • 68. 68 Rank-50 approximation Low rank Approximation to flowers image Rank-80 approximation
  • 69. 69 Rank-100 approximation Low rank Approximation to flowers image Rank-120 approximation
  • 70. 70 Rank-150 approximation True Image Low rank Approximation to flowers image
  • 71. 71 Outlier Detection Using SVD: Nishith and Nasser (2007, MSc. thesis) propose a graphical method of outlier detection using SVD. It is suitable for both general multivariate data and regression data. For this we construct scatter plots of the first two PCs, and of the first and third PC. We also draw a box in the scatter plot whose range is median(1st PC) ± 3 × mad(1st PC) on the x-axis and median(2nd PC/3rd PC) ± 3 × mad(2nd PC/3rd PC) on the y-axis, where mad = median absolute deviation. The points that fall outside the box can be considered extreme outliers; points outside one side of the box are termed outliers. Along with this box we may construct another, smaller box bounded by the 2.5/2 MAD line.
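A minimal R sketch of the 3 × mad box on the first two PC scores; the stackloss data are used purely as an assumed example, and mad() is given constant = 1 so that it returns the raw median absolute deviation defined above (R's default rescales by 1.4826).
X  <- scale(as.matrix(stackloss), center = TRUE, scale = FALSE)
pc <- X %*% svd(X)$v                                   # PC scores from the SVD
plot(pc[, 1], pc[, 2], xlab = "First PC", ylab = "Second PC")
abline(v = median(pc[, 1]) + c(-3, 3) * mad(pc[, 1], constant = 1), lty = 2)
abline(h = median(pc[, 2]) + c(-3, 3) * mad(pc[, 2], constant = 1), lty = 2)
# points falling outside the dashed box are candidate extreme outliers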
  • 72. 72 Outlier Detection Using SVD (Cont.): Scatter plot of the Hawkins, Bradu and Kass data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PC. HAWKINS-BRADU-KASS (1984) DATA: a data set containing 75 observations with 14 influential observations; among them there are ten high leverage outliers (cases 1-10) and four high leverage points (cases 11-14) (Imon, 2005).
  • 73. 73 Outlier Detection Using SVD (Cont.): Scatter plot of the modified Brown data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PC. MODIFIED BROWN DATA: the data set given by Brown (1980). Ryan (1997) pointed out that the original data on 53 patients contain 1 outlier (observation number 24). Imon and Hadi (2005) modified this data set by adding two more outliers as cases 54 and 55; they also showed that observations 24, 54 and 55 are outliers by using the generalized standardized Pearson residual (GSPR).
  • 74. 74 Cluster Detection Using SVD: Singular value decomposition is also used for cluster detection (Nishith, Nasser and Suboron, 2011). The method for clustering data using the first three PCs is as follows: draw boxes bounded by median(1st PC) ± k × mad(1st PC) on the x-axis and median(2nd PC/3rd PC) ± k × mad(2nd PC/3rd PC) on the y-axis, where mad = median absolute deviation and k = 1, 2, 3.
  • 75. 75
  • 77. 77 Climatic Variables: The climatic variables are
1. Rainfall (RF), mm
2. Daily mean temperature (T-MEAN), °C
3. Maximum temperature (T-MAX), °C
4. Minimum temperature (T-MIN), °C
5. Day-time temperature (T-DAY), °C
6. Night-time temperature (T-NIGHT), °C
7. Daily mean water vapour pressure (VP), mbar
8. Daily mean wind speed (WS), m/sec
9. Hours of bright sunshine as percentage of maximum possible sunshine hours (MPS), %
10. Solar radiation (SR), cal/cm²/day
  • 78. 78 Consequences of SVD: Generally many missing values may be present in the data, and the data may also contain unusual observations. Classical singular value decomposition cannot handle either type of problem. Robust singular value decomposition can solve both, and it can be obtained by an alternating L1 regression approach (Douglas M. Hawkins, Li Liu, and S. Stanley Young, 2001).
  • 79. 79 The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition.
- Initialize the leading left singular vector u_1 (there is no obvious choice of the initial value of u_1).
- Fit the L1 regression coefficients c_j by minimizing \sum_{i=1}^{n} |x_{ij} - c_j u_{i1}|, j = 1, 2, ..., p.
- Calculate the right singular vector v_1 = c/\|c\|, where \|\cdot\| refers to the Euclidean norm.
- Again fit the L1 regression coefficients d_i by minimizing \sum_{j=1}^{p} |x_{ij} - d_i v_{j1}|, i = 1, 2, ..., n.
- Calculate the resulting estimate of the left singular vector u_1 = d/\|d\|.
- Iterate this process until it converges.
For the second and subsequent terms of the SVD, we replace X by a deflated matrix obtained by subtracting the most recently found term: X ← X − λ_k u_k v_k^T. (A sketch of this algorithm is given below.)
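A hedged R sketch of one robust singular triple via alternating L1 regression. Solving each no-intercept, single-predictor L1 fit as a weighted median of the ratios x/u with weights |u|, starting from the classical SVD, and using a fixed number of sweeps instead of a convergence test are all implementation choices of this sketch, not details given in the slides; the names wmedian and l1.triple are assumptions.
wmedian <- function(r, w) {                 # weighted median: minimizes sum(w * |r - c|) over c
  keep <- is.finite(r) & w > 0
  r <- r[keep]; w <- w[keep]
  o <- order(r); r <- r[o]; w <- w[o]
  r[which(cumsum(w) >= sum(w) / 2)[1]]
}
l1.triple <- function(X, sweeps = 30) {
  u <- svd(X)$u[, 1]                        # starting value (the slides leave this choice open)
  for (s in 1:sweeps) {
    cvec <- apply(X, 2, function(x) wmedian(x / u, abs(u)))   # L1 fit of c_j, column by column
    v <- cvec / sqrt(sum(cvec^2))                             # right singular vector v = c/||c||
    dvec <- apply(X, 1, function(x) wmedian(x / v, abs(v)))   # L1 fit of d_i, row by row
    u <- dvec / sqrt(sum(dvec^2))                             # left singular vector u = d/||d||
  }
  uv <- as.vector(u %*% t(v))
  lambda <- wmedian(as.vector(X) / uv, abs(uv))               # robust scale for the triple
  list(u = u, lambda = lambda, v = v)       # deflate with X - lambda * u %*% t(v) for the next triple
}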
  • 80. 80 Clustering weather stations on Map Using RSVD
  • 81. 81 References
• Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
• Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed. Springer Verlag, New York.
• Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple regression data using elemental sets. Technometrics, 20, 197-208.
• Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73-90.
• Kumar, N., Nasser, M., and Sarker, S.C. (2011). "A New Singular Value Decomposition Based Robust Graphical Clustering Technique and Its Application in Climatic Data". Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227-238.
• Ryan, T.P. (1997). Modern Regression Methods, Wiley, New York.
• Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions, SIAM, Philadelphia.
• Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html
  • 82. 82