2. Contents
• Assumptions
• Introduction
• Notation
• Refresher on matrices
• Eigen vectors and eigen values
• Principal component analysis
• Principal coordinate analysis
• Correspondence analysis
• Redundancy analysis and canonical correspondence analysis
• R Software
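4. I am assuming that you are familiar with the following:
A sum of several numbers: $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$, $i = 1$ to $n$
The mean of several numbers: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$, or $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Covariance: $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$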
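As a quick refresher, the sum, mean, variance and covariance can be sketched in a few lines of Python. The function names and the small data vectors are illustrative assumptions, not part of the talk.

```python
def mean(xs):
    """Arithmetic mean: the sum of the values divided by n."""
    return sum(xs) / len(xs)

def variance(xs, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n."""
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1 if sample else len(xs))

def covariance(xs, ys):
    """Sum of cross-products of deviations divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data, chosen only for illustration
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
print(mean(x), variance(x), variance(x, sample=False), covariance(x, y))
```

Note that the covariance of a variable with itself is its variance, which is why variances sit on the diagonal of a variance-covariance matrix.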
6. Before 1980 multivariate research tended to be theoretical. Often this
involved working on distribution theory for hypothetical but unrealistic
situations. Very rarely did anyone carry out a multivariate analysis.
Since 1980 or so, with growth of computing power, people started using
multivariate methods. This has led to further development of the old
methods, plus introduction of a lot of new methods.
Starting in the mid-1980s, a lot of biologically trained people who were
also computer literate started appearing in the workplace. These
people, who rarely had any formal training in mathematics or statistics,
were able to run multivariate software and get results. This led to a lot
of nonsense, including some published nonsense.
7. Typical Example of Ecology Data
[Figure: two data tables sharing the same rows. Left: Sites (rows 1 to 77) by Species (columns 1 to 7), containing counts. Right: Sites (rows 1 to 77) by Environmental variables (columns 1 to 7, e.g. soil, climate, etc.), containing real numbers.]
In pesticide research "site" could refer to different chemicals, or different rates of the same chemical.
9. Example data sets: Although this is a talk about methods used in
ecology, I shall be using data from other sources.
This is because it is easier to understand what is going on if the data
are more familiar.
Also, during my search of the web I found it quite difficult to
find suitable ecological examples.
I have also tended to use data with small numbers of dimensions.
11. $\mathbf{Y}_{n\times p}$ is a matrix:
I will call it an n by p matrix, which means n rows and p columns.
Always in blue, and always a capital if p > 1, using the matrix "style" in MathType, with dimensions in subscripts.
$[y_{ij}]$ also represents the matrix, showing the generic cell member with subscripts but not dimensions (not used very much).
$$\begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2p} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & y_{n3} & \cdots & y_{np} \end{bmatrix}$$
is another way of representing a matrix.
If a matrix has only one row or only one column I will use lower case blue, e.g. $\mathbf{u}_{n\times 1}$.
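13. Generic Matrix
$$\mathbf{Y}_{n\times p} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2p} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & y_{n3} & \cdots & y_{np} \end{bmatrix}$$
Square Matrix
$$\mathbf{Y}_{3\times 3} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \end{bmatrix}$$
Row Matrix (Vector)
$$\mathbf{Y}_{1\times p} = \begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \end{bmatrix}$$
Column Matrix (Vector)
$$\mathbf{Y}_{n\times 1} = \begin{bmatrix} y_{11} \\ y_{21} \\ y_{31} \\ \vdots \\ y_{n1} \end{bmatrix}$$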
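14. Matrix Multiplication
$$\mathbf{Q}_{3\times 3} = \mathbf{B}_{3\times 2}\,\mathbf{C}_{2\times 3}$$
$$\begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}$$
$q_{11} = b_{11}c_{11} + b_{12}c_{21}$
$q_{12} = b_{11}c_{12} + b_{12}c_{22}$
$q_{13} = b_{11}c_{13} + b_{12}c_{23}$
...
$q_{33} = b_{31}c_{13} + b_{32}c_{23}$
Matrix Addition
$$\mathbf{Q}_{3\times 2} = \mathbf{B}_{3\times 2} + \mathbf{C}_{3\times 2}$$
$$\begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \\ q_{31} & q_{32} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} + \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = \begin{bmatrix} b_{11}+c_{11} & b_{12}+c_{12} \\ b_{21}+c_{21} & b_{22}+c_{22} \\ b_{31}+c_{31} & b_{32}+c_{32} \end{bmatrix}$$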
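The two operations can be sketched in plain Python, storing a matrix as a list of rows. The helper names `matmul` and `matadd` and the numbers are assumptions made for illustration only.

```python
def matmul(B, C):
    """Return Q = B C, where q_ij is row i of B times column j of C."""
    k = len(C)
    if any(len(row) != k for row in B):
        raise ValueError("columns of B must equal rows of C")
    return [[sum(B[i][l] * C[l][j] for l in range(k)) for j in range(len(C[0]))]
            for i in range(len(B))]

def matadd(B, C):
    """Return Q = B + C, element by element; B and C must have the same shape."""
    return [[b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)]

B = [[1, 2], [3, 4], [5, 6]]       # 3 x 2
C = [[1, 0, 2], [0, 1, 3]]         # 2 x 3
Q = matmul(B, C)                   # 3 x 3
print(Q)
print(matadd(B, [[10, 20], [30, 40], [50, 60]]))
```

Here `Q[2][2]` is $b_{31}c_{13} + b_{32}c_{23} = 5 \times 2 + 6 \times 3 = 28$, the same pattern as the element formulas on the slide.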
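16. Norm: Normalisation
$$\mathbf{b}_{n\times 1} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}, \qquad \text{e.g. } \mathbf{b} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$
Length or Norm: $\lVert\mathbf{b}\rVert = \sqrt{b_1^2 + b_2^2 + \dots + b_n^2}$, e.g. $\lVert\mathbf{b}\rVert = \sqrt{3^2 + 4^2} = 5$
Normalised: $\mathbf{b} / \lVert\mathbf{b}\rVert$, e.g. $\mathbf{b}/5 = \begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix}$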
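The 3-4-5 example can be checked with a short sketch; the function names are illustrative assumptions.

```python
import math

def norm(b):
    """Length of a vector: square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in b))

def normalise(b):
    """Divide every element by the length, giving a vector of length 1."""
    length = norm(b)
    return [v / length for v in b]

b = [3.0, 4.0]
print(norm(b))             # length of b
print(normalise(b))        # b divided by its length
print(norm(normalise(b)))  # length after normalisation
```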
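17. Transpose of a Matrix
The transpose of
$$\mathbf{Y}_{5\times 4} = \begin{bmatrix} y_{11} & y_{12} & y_{13} & y_{14} \\ y_{21} & y_{22} & y_{23} & y_{24} \\ y_{31} & y_{32} & y_{33} & y_{34} \\ y_{41} & y_{42} & y_{43} & y_{44} \\ y_{51} & y_{52} & y_{53} & y_{54} \end{bmatrix}$$
is
$$\mathbf{Y}^T_{4\times 5} = \begin{bmatrix} y_{11} & y_{21} & y_{31} & y_{41} & y_{51} \\ y_{12} & y_{22} & y_{32} & y_{42} & y_{52} \\ y_{13} & y_{23} & y_{33} & y_{43} & y_{53} \\ y_{14} & y_{24} & y_{34} & y_{44} & y_{54} \end{bmatrix}$$
$(\mathbf{A}_{n\times p}\mathbf{B}_{p\times q})^T = \mathbf{B}^T_{q\times p}\mathbf{A}^T_{p\times n}$
$(\mathbf{A}_{n\times p}\mathbf{B}_{p\times q}\mathbf{C}_{q\times m})^T = \mathbf{C}^T_{m\times q}\mathbf{B}^T_{q\times p}\mathbf{A}^T_{p\times n}$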
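The rule that the transpose of a product reverses the order of the factors can be checked numerically. This is a minimal sketch with made-up matrices; `transpose` and `matmul` are illustrative helper names.

```python
def transpose(Y):
    """Swap rows and columns: element (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*Y)]

def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]     # 3 x 2

lhs = transpose(matmul(A, B))                # transpose of the product A B
rhs = matmul(transpose(B), transpose(A))     # product of transposes, order reversed
print(lhs, rhs)
```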
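18. Scalar Product
$$\mathbf{b}^T_{1\times n}\mathbf{c}_{n\times 1} = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{bmatrix} = b_1c_1 + b_2c_2 + b_3c_3 + \dots + b_nc_n$$
$\mathbf{b}^T\mathbf{c} = (\text{length of } \mathbf{b}) \times (\text{length of } \mathbf{c}) \times \cos\theta$
If $\mathbf{b}$ and $\mathbf{c}$ are orthogonal then $\cos\theta = \cos 90^\circ = 0$, so $\mathbf{b}^T_{1\times n}\mathbf{c}_{n\times 1} = 0$.
$$\mathbf{b}^T_{1\times n}\mathbf{b}_{n\times 1} = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix} = b_1b_1 + b_2b_2 + b_3b_3 + \dots + b_nb_n$$
$\mathbf{b}^T\mathbf{b} = (\text{length of } \mathbf{b}) \times (\text{length of } \mathbf{b}) \times \cos 0^\circ = (\text{length of } \mathbf{b})^2$
$= 1$ if $\mathbf{b}$ is normalised.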
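A small sketch of the scalar product and its link to the angle between two vectors. The vectors are hypothetical and the function names are assumptions.

```python
import math

def dot(b, c):
    """Scalar product: sum of element-by-element products."""
    return sum(x * y for x, y in zip(b, c))

def cos_angle(b, c):
    """Cosine of the angle: scalar product divided by the two lengths."""
    return dot(b, c) / (math.sqrt(dot(b, b)) * math.sqrt(dot(c, c)))

b = [3.0, 4.0]
c = [-4.0, 3.0]          # b rotated by 90 degrees
u = [0.6, 0.8]           # b normalised

print(dot(b, c))         # orthogonal vectors
print(dot(b, b))         # squared length of b
print(cos_angle(b, u))   # same direction
print(dot(u, u))         # normalised vector
```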
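19. Determinant
$$|\mathbf{B}| = \begin{vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{vmatrix} = b_{11}b_{22} - b_{12}b_{21}$$
$$|\mathbf{B}| = \begin{vmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{vmatrix} = (-1)^{1+1} b_{11} \begin{vmatrix} b_{22} & b_{23} \\ b_{32} & b_{33} \end{vmatrix} + (-1)^{1+2} b_{12} \begin{vmatrix} b_{21} & b_{23} \\ b_{31} & b_{33} \end{vmatrix} + (-1)^{1+3} b_{13} \begin{vmatrix} b_{21} & b_{22} \\ b_{31} & b_{32} \end{vmatrix}$$
The determinant is a scalar.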
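The expansion along the first row translates directly into a short recursive function. This is a sketch for small matrices only (it is far too slow for large ones), and the name `det` and the example matrices are assumptions.

```python
def det(B):
    """Determinant by cofactor expansion along the first row."""
    n = len(B)
    if n == 1:
        return B[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 1 and column j + 1
        minor = [row[:j] + row[j + 1:] for row in B[1:]]
        # The sign (-1)^(1 + column number) equals (-1)^j with 0-based j
        total += (-1) ** j * B[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
```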
20. Rank of a Square Matrix
$$\begin{bmatrix} 1 & 1 & 1 \\ 3 & 0 & 2 \\ 4 & 1 & 3 \end{bmatrix}$$
(-2 × col 1) = col 2 − (3 × col 3)
row 1 = row 3 − row 2
Only two linearly independent rows, so rank = 2.
$$\begin{bmatrix} 2 & 1 & 4 \\ 2 & 1 & 4 \\ 2 & 1 & 4 \end{bmatrix}$$
(2 × col 1) = (4 × col 2) = col 3
row 1 = row 2 = row 3
Rank = 1
The order of a square matrix is its number of rows (or columns).
The rank of a square matrix is the number of linearly independent rows (or columns).
A square matrix whose rank is less than its order has a determinant of zero.
If a square matrix has a non-zero determinant it has full rank, equal to its number of rows or columns.
A full rank square matrix is called non-singular.
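The rank of the two example matrices can be checked by row reduction. The sketch below uses exact fractions to avoid rounding problems; the function name `rank` is an assumption.

```python
from fractions import Fraction

def rank(M):
    """Number of linearly independent rows, found by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0                                   # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

print(rank([[1, 1, 1], [3, 0, 2], [4, 1, 3]]))
print(rank([[2, 1, 4], [2, 1, 4], [2, 1, 4]]))
```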
21. Inverse of a Square Matrix
If $\mathbf{B}$ is non-singular then $\mathbf{B}\mathbf{B}^{-1} = \mathbf{B}^{-1}\mathbf{B} = \mathbf{I}$
$\mathbf{B}^{-1}$ is called the inverse of $\mathbf{B}$
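$$\mathbf{B}_{3\times 2} = \begin{bmatrix} 1 & 1 \\ -1 & 0 \\ -3 & 1 \end{bmatrix}$$
$$\mathbf{C}_{2\times 3} = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 5 & -1 \end{bmatrix}: \quad \mathbf{C}_{2\times 3}\mathbf{B}_{3\times 2} = \mathbf{I}_{2\times 2}, \quad \mathbf{B}_{3\times 2}\mathbf{C}_{2\times 3} \neq \mathbf{I}_{3\times 3}$$
$$\mathbf{C}_{2\times 3} = \begin{bmatrix} 4 & 15 & -4 \\ 7 & 25 & -6 \end{bmatrix}: \quad \mathbf{C}_{2\times 3}\mathbf{B}_{3\times 2} = \mathbf{I}_{2\times 2}, \quad \mathbf{B}_{3\times 2}\mathbf{C}_{2\times 3} \neq \mathbf{I}_{3\times 3}$$
$\mathbf{C}$ is not unique.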
http://www.mathwords.com/i/inverse_of_a_matrix.htm
http://mathworld.wolfram.com/MatrixInverse.html
http://www.purplemath.com/modules/mtrxinvr.htm
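23. Association Matrices Q-Mode
$$\mathbf{Y}_{n\times p} = \begin{bmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{n1} & \cdots & y_{np} \end{bmatrix}$$
Row 1: $\begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \end{bmatrix}$
Row 2: $\begin{bmatrix} y_{21} & y_{22} & y_{23} & \cdots & y_{2p} \end{bmatrix}$
Euclidean Distance: $a_{12} = a_{21} = \sqrt{(y_{11} - y_{21})^2 + (y_{12} - y_{22})^2 + (y_{13} - y_{23})^2 + \dots + (y_{1p} - y_{2p})^2}$
Correlation: $a_{12} = a_{21} = \dfrac{\sum_{j=1}^{p} (y_{1j} - \bar{y}_{1.})(y_{2j} - \bar{y}_{2.})}{\sqrt{\sum_{j=1}^{p} (y_{1j} - \bar{y}_{1.})^2 \sum_{j=1}^{p} (y_{2j} - \bar{y}_{2.})^2}}$
$$\mathbf{A}_{n\times n} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$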
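A Q-mode association matrix based on Euclidean distance compares every pair of rows (objects). The sketch below uses three made-up objects; the function name is an assumption.

```python
import math

def euclidean_distance_matrix(Y):
    """n x n matrix of Euclidean distances between the rows of Y."""
    n = len(Y)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(Y[h], Y[i])))
             for i in range(n)] for h in range(n)]

# Three hypothetical objects (rows) described by two descriptors (columns)
Y = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
A = euclidean_distance_matrix(Y)
for row in A:
    print(row)
```

The result is symmetric with zeros on the diagonal, since each object is at distance zero from itself.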
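24. Association Matrices R-Mode
$$\mathbf{Y}_{n\times p} = \begin{bmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{n1} & \cdots & y_{np} \end{bmatrix}$$
Column 1: $\begin{bmatrix} y_{11} & y_{21} & y_{31} & \cdots & y_{n1} \end{bmatrix}^T$
Column 2: $\begin{bmatrix} y_{12} & y_{22} & y_{32} & \cdots & y_{n2} \end{bmatrix}^T$
Compute $a_{12} = a_{21}$ from columns 1 and 2, and likewise for every other pair of columns:
$$\mathbf{A}_{p\times p} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1p} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2p} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & a_{p3} & \cdots & a_{pp} \end{bmatrix}$$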
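An R-mode association matrix compares every pair of columns (descriptors). A common choice is the correlation coefficient, sketched here with made-up data and illustrative function names.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_mode_correlation(Y):
    """p x p matrix of correlations between the columns of Y."""
    cols = [list(c) for c in zip(*Y)]
    return [[correlation(cj, ck) for ck in cols] for cj in cols]

# Four hypothetical objects, three descriptors; column 3 is twice column 1
Y = [[1, 2, 2], [2, 1, 4], [3, 4, 6], [4, 3, 8]]
A = r_mode_correlation(Y)
for row in A:
    print(row)
```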
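25. Summary: Q and R Mode
Original data matrix: $\mathbf{Y}_{n\times p}$, with objects (e.g. sites) as rows and descriptors (e.g. species) as columns.
Q mode association matrix: $\mathbf{A}_{n\times n}$, objects by objects.
R mode association matrix: $\mathbf{A}_{p\times p}$, descriptors by descriptors.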
27. Ecological data sets usually include a large number of variables that are associated with one
another (e.g. linearly correlated). The basic idea underlying several methods of data analysis is
to reduce this large number of inter-correlated variables to a smaller number of composite, but
linearly independent, variables, each explaining a different fraction of the observed variation.
One of the main goals of numerical data analysis is to generate a small number of variables,
each explaining a large proportion of the variation, and to ascertain that these new variables
explain different aspects of the phenomena under study.
Eigen analysis is a key tool in helping us achieve this aim.
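28. Eigen Values and Eigenvectors
For any symmetric square matrix $\mathbf{A}_{n\times n}$ (such as an association matrix) the following relationship always exists:
$$\mathbf{A}_{n\times n} = \mathbf{U}_{n\times n}\,\boldsymbol{\Lambda}_{n\times n}\,\mathbf{U}^{-1}_{n\times n} = \mathbf{U}_{n\times n}\,\boldsymbol{\Lambda}_{n\times n}\,\mathbf{U}^T_{n\times n}$$
where the columns of $\mathbf{U}_{n\times n}$ are orthonormal.
$$\mathbf{A}_{n\times n} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots \\ a_{21} & a_{22} & a_{23} & \cdots \\ a_{31} & a_{32} & a_{33} & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$
Any symmetric square matrix.
$$\boldsymbol{\Lambda}_{n\times n} = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots \\ 0 & \lambda_2 & 0 & \cdots \\ 0 & 0 & \lambda_3 & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$
Eigenvalues. Some may be zero. Some may be equal. They are Lagrange multipliers.
$$\mathbf{U}_{n\times n} = [u_{ij}] = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ u_{21} & u_{22} & u_{23} & \cdots & u_{2n} \\ u_{31} & u_{32} & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & u_{n3} & \cdots & u_{nn} \end{bmatrix}$$
Columns are eigenvectors. Columns are orthonormal.
Matrix $\boldsymbol{\Lambda}_{n\times n}$ is known as the canonical form of $\mathbf{A}_{n\times n}$.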
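29. Eigen Values and Eigenvectors
We compute the eigenvalues of a square matrix $\mathbf{A}_{n\times n}$ by solving n equations:
$\mathbf{A}\mathbf{u}_i = \lambda_i\mathbf{u}_i$, i = 1 to n
$\mathbf{A}\mathbf{u}_i - \lambda_i\mathbf{u}_i = \mathbf{0}$
$(\mathbf{A} - \lambda_i\mathbf{I})\,\mathbf{u}_i = \mathbf{0}$ (an n × n matrix times an n × 1 vector gives an n × 1 vector)
The $\lambda_i$ are found by solving the characteristic equation $|\mathbf{A} - \lambda_i\mathbf{I}| = 0$, a polynomial of degree n in $\lambda_i$.
The $\mathbf{u}_i$ are then easily found.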
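For a symmetric 2 × 2 matrix the characteristic equation is a quadratic, so the whole procedure can be sketched by hand. The code below is an illustration under that assumption (symmetric, 2 × 2, non-zero off-diagonal element); the function name and the example matrix are not from the talk.

```python
import math

def eigen_sym_2x2(A):
    """Eigenvalues and unit eigenvectors of a symmetric matrix [[a, b], [b, d]].

    Assumes b != 0 (a diagonal matrix already displays its eigenvalues).
    """
    (a, b), (_, d) = A
    if b == 0:
        raise ValueError("matrix is already diagonal")
    # |A - lambda I| = (a - lambda)(d - lambda) - b^2 = 0 is a quadratic in lambda
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    pairs = []
    for lam in (centre + radius, centre - radius):
        # The first row of (A - lambda I) u = 0 is satisfied by u = (b, lambda - a)
        length = math.hypot(b, lam - a)
        pairs.append((lam, [b / length, (lam - a) / length]))
    return pairs

A = [[2.0, 1.0], [1.0, 2.0]]
(lam1, u1), (lam2, u2) = eigen_sym_2x2(A)

# Rebuild A as U Lambda U^T, with the eigenvectors as the columns of U
rebuilt = [[lam1 * u1[i] * u1[j] + lam2 * u2[i] * u2[j] for j in range(2)]
           for i in range(2)]
print(lam1, lam2)
print(rebuilt)
```

The two eigenvectors come out orthogonal and of unit length, and multiplying the pieces back together recovers the original matrix.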
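31. Singular Value Decomposition (SVD)
Any matrix $\mathbf{Y}_{n\times p}$ can be factorised as follows:
$\mathbf{Y}_{n\times p} = \mathbf{V}_{n\times p}\,\mathbf{W}_{p\times p}\,\mathbf{U}^T_{p\times p}$
or
$\mathbf{Y}_{n\times p} = \mathbf{V}_{n\times n}\,\mathbf{W}_{n\times p}\,\mathbf{U}^T_{p\times p}$
Columns of $\mathbf{V}_{n\times n}$ are the left singular vectors of $\mathbf{Y}_{n\times p}$.
$\mathbf{W}$ is a diagonal matrix containing the (non-negative) singular values.
Columns of $\mathbf{U}_{p\times p}$ are the right singular vectors of $\mathbf{Y}_{n\times p}$.
There is a lack of consistency in the literature over this notation.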
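32. SVD can be applied to any n × p matrix.
Eigenvalue decomposition can only be applied to certain classes of square matrices.
Nevertheless, the two decompositions are related.
Given an SVD of $\mathbf{Y}$, as described above, the following holds:
$$\mathbf{Y}^T_{p\times n}\mathbf{Y}_{n\times p} = (\mathbf{U}_{p\times p}\mathbf{W}^T_{p\times n}\mathbf{V}^T_{n\times n})(\mathbf{V}_{n\times n}\mathbf{W}_{n\times p}\mathbf{U}^T_{p\times p})$$
(the transpose of the SVD times the SVD)
$$= \mathbf{U}_{p\times p}(\mathbf{W}^T_{p\times n}\mathbf{W}_{n\times p})\mathbf{U}^T_{p\times p} = \mathbf{U}_{p\times p}\,\boldsymbol{\Lambda}_{p\times p}\,\mathbf{U}^T_{p\times p}$$
also
$$\mathbf{Y}_{n\times p}\mathbf{Y}^T_{p\times n} = \mathbf{V}_{n\times n}(\mathbf{W}_{n\times p}\mathbf{W}^T_{p\times n})\mathbf{V}^T_{n\times n} = \mathbf{V}_{n\times n}\,\boldsymbol{\Lambda}_{n\times n}\,\mathbf{V}^T_{n\times n}$$
In both cases the outer matrices hold the eigenvectors and $\boldsymbol{\Lambda}$ holds the eigenvalues, which are the squared singular values.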
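A deliberately trivial worked example may help to see the relationship. The matrix below is an assumption chosen so that the singular vectors are simply identity matrices.

```latex
\[
\mathbf{Y}_{3\times 2} =
\begin{bmatrix} 4 & 0 \\ 0 & 3 \\ 0 & 0 \end{bmatrix}
= \mathbf{V}_{3\times 3}\,\mathbf{W}_{3\times 2}\,\mathbf{U}^T_{2\times 2},
\qquad
\mathbf{V} = \mathbf{I}_{3\times 3},\quad
\mathbf{W} = \begin{bmatrix} 4 & 0 \\ 0 & 3 \\ 0 & 0 \end{bmatrix},\quad
\mathbf{U} = \mathbf{I}_{2\times 2}
\]
\[
\mathbf{Y}^T\mathbf{Y} = \begin{bmatrix} 16 & 0 \\ 0 & 9 \end{bmatrix} = \mathbf{W}^T\mathbf{W},
\qquad
\mathbf{Y}\mathbf{Y}^T = \begin{bmatrix} 16 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \mathbf{W}\mathbf{W}^T
\]
```

The singular values 4 and 3 are the square roots of the non-zero eigenvalues 16 and 9, and the larger of the two products simply picks up an extra zero eigenvalue.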
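34. We have a matrix $\mathbf{Y}_{n\times p}$ with variance-covariance matrix $\mathbf{S}_{p\times p}$.
We transform $\mathbf{Y}_{n\times p}$ as follows: $\mathbf{Z}_{n\times p} = \mathbf{Y}_{n\times p}\,\mathbf{U}_{p\times p}$
Now consider the first column of $\mathbf{Z}_{n\times p}$: $\mathbf{z}_{n\times 1} = \mathbf{Y}_{n\times p}\,\mathbf{u}_{p\times 1}$, or
$$\begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{bmatrix} \begin{bmatrix} u_{11} \\ u_{21} \\ \vdots \\ u_{p1} \end{bmatrix} = \begin{bmatrix} y_{11}u_{11} + y_{12}u_{21} + \dots + y_{1p}u_{p1} \\ y_{21}u_{11} + y_{22}u_{21} + \dots + y_{2p}u_{p1} \\ \vdots \\ y_{n1}u_{11} + y_{n2}u_{21} + \dots + y_{np}u_{p1} \end{bmatrix}$$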
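35.
$$\mathrm{Var}(\mathbf{z}_{n\times 1}) = \frac{1}{n-1}\sum_{i=1}^{n} (z_{i1} - \bar{z}_{.1})^2$$
$$= \frac{1}{n-1}\sum_{i=1}^{n} \left( y_{i1}u_{11} + y_{i2}u_{21} + \dots + y_{ip}u_{p1} - (\bar{y}_{.1}u_{11} + \bar{y}_{.2}u_{21} + \dots + \bar{y}_{.p}u_{p1}) \right)^2$$
$$= \frac{1}{n-1}\sum_{i=1}^{n} \left( (y_{i1} - \bar{y}_{.1})u_{11} + (y_{i2} - \bar{y}_{.2})u_{21} + \dots + (y_{ip} - \bar{y}_{.p})u_{p1} \right)^2$$
$$= \frac{1}{n-1}\sum_{i=1}^{n} \left( (y_{i1} - \bar{y}_{.1})^2 u_{11}^2 + 2(y_{i1} - \bar{y}_{.1})(y_{i2} - \bar{y}_{.2})u_{11}u_{21} + \dots \right)$$
and so on, so that
$$\mathrm{Var}(\mathbf{z}_{n\times 1}) = \mathbf{u}^T_{1\times p}\,\mathbf{S}_{p\times p}\,\mathbf{u}_{p\times 1}$$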
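36. Now we need to find $\mathbf{u}_1$ ($p\times 1$) that maximises $\mathrm{var}[\mathbf{z}_1]$ subject to $\mathbf{u}_1^T\mathbf{u}_1 = 1$.
Let $\lambda_1$ be a Lagrange multiplier.
Then maximise $\phi = \mathbf{u}_1^T\mathbf{S}\,\mathbf{u}_1 - \lambda_1(\mathbf{u}_1^T\mathbf{u}_1 - 1)$.
Setting the derivative with respect to $\mathbf{u}_1$ to zero gives
$\mathbf{S}_{p\times p}\mathbf{u}_1 - \lambda_1\mathbf{u}_1 = \mathbf{0}$   (1)
$(\mathbf{S}_{p\times p} - \lambda_1\mathbf{I}_{p\times p})\,\mathbf{u}_1 = \mathbf{0}$   (2)
We find $\lambda_1$ first by solving the polynomial $|\mathbf{S}_{p\times p} - \lambda_1\mathbf{I}_{p\times p}| = 0$ (degree p).
We then substitute $\lambda_1$ into (2) to find $\mathbf{u}_1$.
From (1), premultiplying by $\mathbf{u}_1^T$, we can show that $\mathrm{var}[\mathbf{z}_1] = \mathbf{u}_1^T\mathbf{S}\,\mathbf{u}_1 = \lambda_1$, subject to $\mathbf{u}_1^T\mathbf{u}_1 = 1$.
$\lambda_1$ is the first eigenvalue of $\mathbf{S}_{p\times p}$ and $\mathbf{u}_1$ is the first eigenvector.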
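37. Computing the 2nd, 3rd and higher eigenvalues and eigenvectors is an identical process.
Additional constraints are needed.
Notably $\mathbf{u}^T_{1\times p}\mathbf{v}_{p\times 1} = 0$, which is equivalent to the corresponding components being uncorrelated, $\mathrm{Cov}(\mathbf{Y}\mathbf{u}, \mathbf{Y}\mathbf{v}) = 0$,
where $\mathbf{u}_{p\times 1}$ and $\mathbf{v}_{p\times 1}$ are different eigenvectors.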
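The first principal component can be sketched without any linear algebra library by using power iteration, which is a swapped-in numerical technique and not the polynomial route described above: repeatedly multiplying a vector by S and re-normalising converges to the first eigenvector. The data and function names are assumptions for illustration.

```python
import math

def covariance_matrix(Y):
    """Centre the columns of Y and return (centred data, S)."""
    n, p = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n for j in range(p)]
    Yc = [[row[j] - means[j] for j in range(p)] for row in Y]
    S = [[sum(r[j] * r[k] for r in Yc) / (n - 1) for k in range(p)]
         for j in range(p)]
    return Yc, S

def first_component(S, iterations=500):
    """Power iteration: returns (lambda_1, u_1) with u_1 of unit length."""
    p = len(S)
    u = [1.0] * p
    for _ in range(iterations):
        v = [sum(S[j][k] * u[k] for k in range(p)) for j in range(p)]
        length = math.sqrt(sum(x * x for x in v))
        u = [x / length for x in v]
    lam = sum(u[j] * S[j][k] * u[k] for j in range(p) for k in range(p))
    return lam, u

# Hypothetical data: four objects, two positively correlated descriptors
Y = [[2.0, 1.0], [4.0, 3.0], [6.0, 4.0], [8.0, 8.0]]
Yc, S = covariance_matrix(Y)
lam1, u1 = first_component(S)

# Scores on the first component and their variance
z1 = [sum(r[j] * u1[j] for j in range(len(u1))) for r in Yc]
var_z1 = sum(z * z for z in z1) / (len(z1) - 1)
print(lam1, var_z1)
```

The variance of the scores equals the first eigenvalue, which is the result derived with the Lagrange multiplier.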
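44. Plots of 1st and 2nd Components
[Figure: scatter plot of Violence against Robbery, with the rotated axes $Z_1$ and $Z_2$.]
$$\sum_{i=1}^{n} \text{Robbery}_i^2 + \sum_{i=1}^{n} \text{Violence}_i^2 = \sum_{i=1}^{n} Z_{1i}^2 + \sum_{i=1}^{n} Z_{2i}^2$$
$$\frac{1}{n-1}\sum_{i=1}^{n} \text{Robbery}_i^2 + \frac{1}{n-1}\sum_{i=1}^{n} \text{Violence}_i^2 = \frac{1}{n-1}\sum_{i=1}^{n} Z_{1i}^2 + \frac{1}{n-1}\sum_{i=1}^{n} Z_{2i}^2$$
variance(Robbery) + variance(Violence) = variance($Z_1$) + variance($Z_2$)
The PCA rotation maximises the variance of $Z_1$ relative to $Z_2$.
It also maximises $\lambda_1$ relative to $\lambda_2$.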
54. Both the direction and length of the vectors can be interpreted. So, for
these data, where the vectors represent judges and the points cars, a
group of vectors pointing in the same direction corresponds to a group
of judges who have the same preferences about the automobiles.
55. In a biplot, the length of the lines approximates the variances of the variables.
The longer the line, the higher is the variance.
The angle between the lines, or, to be more precise, the cosine of the angle
between the lines, approximates the correlation between the variables they
represent. The closer the angle is to 90, or 270 degrees, the smaller the
correlation. An angle of 0 or 180 degrees reflects a correlation of 1 or −1,
respectively.
Taken from: Ulrich Kohler (kohler@wz-berlin.de) and Magdalena Luniak (luniak@wz-berlin.de), Wissenschaftszentrum Berlin. Data inspection using biplots. The Stata Journal (2005) 5, Number 2, pp. 208–223.
http://onlinelibrary.wiley.com/doi/10.1002/9780470238004.app2/pdf
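62. Start with a matrix $\mathbf{Y}_{n\times p}$ with n rows (objects) and p columns (descriptors).
Compute the Euclidean distance (between rows) matrix $\mathbf{D}_{n\times n} = [d_{hi}]$.
Transform $\mathbf{D}_{n\times n}$ into a new matrix $\mathbf{A}_{n\times n} = [a_{hi}]$, such that: $a_{hi} = -\frac{1}{2} d_{hi}^2$
Then compute the centred matrix $\boldsymbol{\Delta}_{n\times n} = [\delta_{hi}]$: $\delta_{hi} = a_{hi} - \bar{a}_{h.} - \bar{a}_{.i} + \bar{a}_{..}$
Compute the eigenvalues $\lambda_k$ and eigenvectors $\mathbf{u}_k$ of $\boldsymbol{\Delta}$.
Finally, scale the eigenvectors so that $\mathbf{u}_k^T\mathbf{u}_k = \lambda_k$.
If the eigenvectors are arranged as columns, the rows of the resulting table are the
coordinates of the objects in the space of principal coordinates.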
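The transformation and centring steps can be sketched as follows. The example uses three hypothetical objects lying on a straight line, so a single principal coordinate is enough; the function name is an assumption, and the eigenvector is supplied by hand and then checked, not computed.

```python
def gower_centred(Y):
    """Return the centred matrix Delta computed from the rows of Y."""
    n = len(Y)
    # a_hi = -1/2 d_hi^2, where d_hi is the Euclidean distance between rows h and i
    A = [[-0.5 * sum((a - b) ** 2 for a, b in zip(Y[h], Y[i])) for i in range(n)]
         for h in range(n)]
    row_mean = [sum(A[h]) / n for h in range(n)]
    col_mean = [sum(A[h][i] for h in range(n)) / n for i in range(n)]
    grand = sum(row_mean) / n
    # delta_hi = a_hi - (row mean) - (column mean) + (grand mean)
    return [[A[h][i] - row_mean[h] - col_mean[i] + grand for i in range(n)]
            for h in range(n)]

Y = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]    # distances 5, 5 and 10 apart
Delta = gower_centred(Y)

# Candidate first axis: an eigenvector already scaled so that u'u = lambda
coords = [-5.0, 0.0, 5.0]
lam = sum(c * c for c in coords)
Delta_coords = [sum(Delta[h][i] * coords[i] for i in range(3)) for h in range(3)]
print(Delta)
print(lam, Delta_coords)
```

Multiplying `Delta` by `coords` returns `lam` times `coords`, so the vector is an eigenvector with squared length equal to its eigenvalue, and the gaps between the coordinates reproduce the original distances of 5, 5 and 10.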
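72. Matrix of observed counts:
$$\begin{bmatrix} O_{11} & O_{12} & O_{13} \\ O_{21} & O_{22} & O_{23} \\ O_{31} & O_{32} & O_{33} \\ O_{41} & O_{42} & O_{43} \end{bmatrix}$$
Matrix of expectations:
$$\begin{bmatrix} E_{11} & E_{12} & E_{13} \\ E_{21} & E_{22} & E_{23} \\ E_{31} & E_{32} & E_{33} \\ E_{41} & E_{42} & E_{43} \end{bmatrix}$$
Under the hypothesis that row and column are independent:
$$E_{ij} = \frac{\sum_j O_{ij}}{\sum_i\sum_j O_{ij}} \times \frac{\sum_i O_{ij}}{\sum_i\sum_j O_{ij}} \times \sum_i\sum_j O_{ij} = \frac{\sum_j O_{ij}\,\sum_i O_{ij}}{\sum_i\sum_j O_{ij}}$$
and
$$\chi^2 = \sum_i\sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
follows a chi-squared distribution, $\chi^2_{(n-1)(p-1)} = \chi^2_{3\times 2}$.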
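The expected counts and the chi-squared statistic can be sketched in a few lines. The 2 × 2 table below is made up for illustration and the function name is an assumption.

```python
def chi_squared(O):
    """Return (expected counts, chi-squared statistic, degrees of freedom)."""
    row_tot = [sum(row) for row in O]
    col_tot = [sum(col) for col in zip(*O)]
    total = sum(row_tot)
    # E_ij = (row total) x (column total) / (grand total)
    E = [[ri * cj / total for cj in col_tot] for ri in row_tot]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(O, E) for o, e in zip(o_row, e_row))
    df = (len(O) - 1) * (len(O[0]) - 1)
    return E, stat, df

O = [[10, 20], [30, 40]]            # hypothetical observed counts
E, stat, df = chi_squared(O)
print(E, stat, df)
```

A table whose rows are exactly proportional, such as `[[1, 2], [2, 4]]`, gives a statistic of zero because the observed and expected counts coincide.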
74. Relative abundance of bird species recorded at three habitats of Paya Indah Wetland Reserve, Peninsular Malaysia.

Species                       Marsh Swamp   Lotus Swamp   Open Water
Purple swamphen                       798            78           25
Yellow-vented bulbul                  690           101          129
Pink-necked green pigeon              614           150           90
Peaceful dove                         462           101           84
Spotted dove                          386            56           67
Pacific swallow                       208            39           85
White-breasted waterhen               200            38           25
Baya weaver                           173             7           52
Common myna                           166            17           51
Purple heron                          164            52           22
Yellow bittern                        162            42           11
Jungle myna                           154            15          117
White-throated kingfisher             128            51           42
Scaly-breasted munia                  125            36           49

Chi-squared = 505.9142, df = 13*2 = 26, p-value < 2.2e-16
Source: Bird Species Abundance and Their Correlationship with Microclimate and Habitat Variables at Natural Wetland Reserve, Peninsular Malaysia. International Journal of Zoology, Volume 2011, Article ID 758573, 17 pages, doi:10.1155/2011/758573
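75. Earlier we saw that
$$\chi^2 = \sum_i\sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(n-1)(p-1)}$$
Now we start with $\mathbf{Q}_{r\times c} = [q_{ij}]$:
$$q_{ij} = \frac{1}{\sqrt{\sum_i\sum_j O_{ij}}} \times \frac{(O_{ij} - E_{ij})}{\sqrt{E_{ij}}}$$
Apply SVD to $\mathbf{Q}_{r\times c}$: $\mathbf{Q}_{r\times c} = \mathbf{V}_{r\times r}\,\mathbf{W}_{r\times c}\,\mathbf{U}^T_{c\times c}$
($\mathbf{V}$ orthonormal, $\mathbf{W}$ diagonal, $\mathbf{U}$ orthonormal)
We know from an earlier discussion that
the columns of $\mathbf{U}_{c\times c}$ are the eigenvectors of $\mathbf{Q}^T_{c\times r}\mathbf{Q}_{r\times c}$
the columns of $\mathbf{V}_{r\times r}$ are the eigenvectors of $\mathbf{Q}_{r\times c}\mathbf{Q}^T_{c\times r}$
Diagonal elements of $\mathbf{W}$ are square roots of eigenvalues, $w_{ii} = \sqrt{\lambda_i}$
We will also need $\mathbf{P}_{r\times c} = [p_{ij}]$: $p_{ij} = \dfrac{O_{ij}}{\sum_i\sum_j O_{ij}}$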
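The starting matrix Q is easy to build by hand. The sketch below uses a made-up 2 × 2 table and an illustrative function name, and checks that the sum of the squared elements of Q equals the chi-squared statistic divided by the grand total, which is also the sum of the eigenvalues.

```python
import math

def standardised_residuals(O):
    """Return (Q, N) with q_ij = (O_ij - E_ij) / sqrt(E_ij) / sqrt(N)."""
    row_tot = [sum(row) for row in O]
    col_tot = [sum(col) for col in zip(*O)]
    N = sum(row_tot)
    Q = []
    for i, row in enumerate(O):
        q_row = []
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / N        # expected count
            q_row.append((o - e) / math.sqrt(e) / math.sqrt(N))
        Q.append(q_row)
    return Q, N

O = [[10, 20], [30, 40]]                # hypothetical observed counts
Q, N = standardised_residuals(O)
inertia = sum(q * q for row in Q for q in row)
print(Q)
print(inertia)
```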
76. Matrices $\mathbf{U}_{c\times c}$ and $\mathbf{V}_{r\times r}$ can be used to plot the positions
of the row and column vectors in two separate scatter diagrams.
Singular values $w_{ii}$ of $\mathbf{Q}$ (the eigenvalues are their squares): (1) 2.506575e-01 (2) 1.436226e-01 (3) 2.032566e-17
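77. For joint plots a number of different scalings have been proposed.
For example:
$\mathbf{X}_{c\times c} = \mathbf{D}_c^{-1/2}\,\mathbf{U}_{c\times c}$
where $\mathbf{D}_c^{-1/2}$ ($c\times c$) is a diagonal matrix in which the diagonals are the reciprocals of the square roots of the column totals of $\mathbf{P}_{r\times c}$.
And
$\mathbf{F}_{r\times c} = \mathbf{D}_r^{-1}\,\mathbf{P}_{r\times c}\,\mathbf{X}_{c\times c}$
where $\mathbf{D}_r^{-1}$ ($r\times r$) is a diagonal matrix in which the diagonals are the reciprocals of the row totals of $\mathbf{P}_{r\times c}$.
And
$\mathbf{G}_{c\times c} = \mathbf{X}_{c\times c}\,\mathbf{W}_{c\times c}$
Finally, plot column 1 of $\mathbf{F}_{r\times c}$ against column 2 of $\mathbf{F}_{r\times c}$;
on the same graph plot column 1 of $\mathbf{G}_{c\times c}$ against column 2 of $\mathbf{G}_{c\times c}$.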
82. Redundancy Analysis and Canonical Correspondence Analysis
In this context canonical means that we have two matrices or, alternatively,
two sets of descriptors for one set of objects.
Rao, 1964; Ter Braak, 1986
83. Typical Example of Ecology Data
[Figure: two data tables sharing the same rows. Left: Sites (rows 1 to 77) by Species (columns 1 to 7), containing counts. Right: Sites (rows 1 to 77) by Environmental variables (columns 1 to 7, e.g. soil, climate, etc.), containing real numbers.]
84. Indirect Comparison: The matrix of explanatory variables, $\mathbf{X}_{n\times m}$, does not
intervene in the calculation that produces the ordination of $\mathbf{Y}_{n\times p}$.
Correlation or regression of the ordination vectors on $\mathbf{X}_{n\times m}$ is carried
out afterwards.
In a direct comparison the matrix $\mathbf{X}_{n\times m}$ intervenes in the calculation: the regression on
$\mathbf{X}_{n\times m}$ is carried out first and the ordination is carried out on a modified $\mathbf{Y}_{n\times p}$, forcing
the ordination vectors of $\mathbf{Y}_{n\times p}$ to be maximally related to combinations of the
columns of $\mathbf{X}_{n\times m}$.
In mathematics more generally, a canonical form is the simplest and most
comprehensive form to which certain functions, relations, or expressions
can be reduced without loss of generality. For example, the canonical form
of a covariance matrix is its matrix of eigenvalues.
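86.
$$\mathbf{S}_{Y+X} = \begin{bmatrix} S_{y_1,y_1} & \cdots & S_{y_1,y_p} & S_{y_1,x_1} & \cdots & S_{y_1,x_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ S_{y_p,y_1} & \cdots & S_{y_p,y_p} & S_{y_p,x_1} & \cdots & S_{y_p,x_m} \\ S_{x_1,y_1} & \cdots & S_{x_1,y_p} & S_{x_1,x_1} & \cdots & S_{x_1,x_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ S_{x_m,y_1} & \cdots & S_{x_m,y_p} & S_{x_m,x_1} & \cdots & S_{x_m,x_m} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{YY} & \mathbf{S}_{YX} \\ \mathbf{S}_{XY} & \mathbf{S}_{XX} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{YY} & \mathbf{S}_{YX} \\ \mathbf{S}^T_{YX} & \mathbf{S}_{XX} \end{bmatrix}$$
$\mathbf{S}_{YY}$: variance-covariance matrix of $\mathbf{Y}_{n\times p}$
$\mathbf{S}_{XX}$: variance-covariance matrix of $\mathbf{X}_{n\times m}$
$\mathbf{S}_{YX}$ and $\mathbf{S}_{XY}$: covariances amongst descriptors of X and Y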
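87. In principal component analysis the eigen analysis equation was:
$(\mathbf{S}_{YY} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}$
In redundancy analysis it is:
$(\mathbf{S}_{YX}\mathbf{S}^{-1}_{XX}\mathbf{S}^T_{YX} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}$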
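88. An Aside on Multiple Linear Regression
$y = b_0 + b_1x_1 + b_2x_2 + \dots + e, \quad e \sim N(0, \sigma^2)$
$y_1 = b_0 + b_1x_{11} + b_2x_{12} + \dots$
$y_2 = b_0 + b_1x_{21} + b_2x_{22} + \dots$
$y_3 = b_0 + b_1x_{31} + b_2x_{32} + \dots$
...
$y_n = b_0 + b_1x_{n1} + b_2x_{n2} + \dots$
In matrix form, with the data in $\mathbf{y}$ and $\mathbf{X}$ and the coefficients to be estimated in $\mathbf{b}$:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots \\ 1 & x_{21} & x_{22} & \cdots \\ 1 & x_{31} & x_{32} & \cdots \\ \vdots & \vdots & \vdots & \\ 1 & x_{n1} & x_{n2} & \cdots \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \end{bmatrix}$$
Setting the error term aside:
$\mathbf{y} = \mathbf{X}\hat{\mathbf{b}}$
$\mathbf{X}^T\mathbf{y} = (\mathbf{X}^T\mathbf{X})\hat{\mathbf{b}}$
$(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = (\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{X})\hat{\mathbf{b}}$
$(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{I}\hat{\mathbf{b}}$
$(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \hat{\mathbf{b}}$   (least squares solution)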
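With an intercept and a single explanatory variable the least squares solution reduces to a 2 × 2 system that can be inverted by hand. The sketch below makes that assumption; the function name and the data are illustrative.

```python
def least_squares_line(x, y):
    """Solve the normal equations for y = b0 + b1 x.

    X has a column of ones and a column of x, so
    X'X = [[n, sum x], [sum x, sum x^2]] and X'y = [sum y, sum xy].
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx                 # determinant of X'X
    b0 = (sxx * sy - sx * sxy) / det        # first row of (X'X)^-1 X'y
    b1 = (n * sxy - sx * sy) / det          # second row of (X'X)^-1 X'y
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]                    # lies exactly on y = 1 + 2x
print(least_squares_line(x, y))
```

If all the x values are equal the determinant is zero, X'X is singular, and no unique solution exists, which ties back to the earlier slides on rank and inverses.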
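89. The Algebra of Redundancy Analysis
Centre both the response matrix $\mathbf{Y}$ and the matrix of independent variables $\mathbf{X}$
by subtracting the column means from the column values / elements.
For each column $\mathbf{y}$ in $\mathbf{Y}$ compute $\hat{\mathbf{b}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, giving $\hat{\mathbf{B}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.
For each column of $\mathbf{Y}$ compute fitted values $\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}$, giving $\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.
With $\mathbf{S}_{XX} = [1/(n-1)]\,\mathbf{X}^T\mathbf{X}$ and $\mathbf{S}_{YX} = [1/(n-1)]\,\mathbf{Y}^T\mathbf{X}$:
$\mathbf{S}_{\hat{Y}^T\hat{Y}} = [1/(n-1)]\,\hat{\mathbf{Y}}^T\hat{\mathbf{Y}}$
$= [1/(n-1)]\,\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\,\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$
$= [1/(n-1)]\,\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$
$= \mathbf{S}_{YX}\mathbf{S}^{-1}_{XX}\mathbf{S}^T_{YX}$
So, perform redundancy analysis by solving: $(\mathbf{S}_{\hat{Y}^T\hat{Y}} - \lambda_k\mathbf{I})\,\mathbf{u}_k = (\mathbf{S}_{YX}\mathbf{S}^{-1}_{XX}\mathbf{S}^T_{YX} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}$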
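The identity can be checked numerically in the simplest case of a single explanatory variable, where $\mathbf{S}_{XX}$ is a scalar. This sketch makes that assumption; the data and function names are illustrative. It computes the covariance matrix of the fitted values by both routes.

```python
def centre(col):
    """Subtract the mean from every value in a column."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def rda_covariance(x, Y_cols):
    """x: one explanatory variable; Y_cols: list of response columns."""
    n = len(x)
    xc = centre(x)
    Yc = [centre(col) for col in Y_cols]
    sxx = sum(v * v for v in xc) / (n - 1)
    syx = [sum(a * b for a, b in zip(col, xc)) / (n - 1) for col in Yc]
    # Route 1: regress each column of Y on x, then take the covariance
    # matrix of the fitted values
    b_hat = [s / sxx for s in syx]
    fitted = [[b * v for v in xc] for b in b_hat]
    S_fit = [[sum(a * c for a, c in zip(fj, fk)) / (n - 1) for fk in fitted]
             for fj in fitted]
    # Route 2: S_YX S_XX^-1 S_YX' directly
    S_alg = [[sj * sk / sxx for sk in syx] for sj in syx]
    return S_fit, S_alg

x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y_cols = [[2.0, 1.0, 4.0, 3.0, 6.0], [5.0, 3.0, 4.0, 1.0, 2.0]]
S_fit, S_alg = rda_covariance(x, Y_cols)
print(S_fit)
print(S_alg)
```

Both routes give the same matrix, and with only one explanatory variable its determinant is zero: the fitted values have rank 1, so only one eigenvalue can be non-zero.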
91. [Figure: three scatter plots of Burglary, Violence and Robbery, each against Population density /ha.]
92. Eigen values (1) 6.612048e+00 (2) 2.664535e-15 (3) -1.110223e-16
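93. Matrix used in PCA, Rank = 3:
$$\mathbf{Y}_{n\times 3} = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \\ \vdots & \vdots & \vdots \\ y_{n1} & y_{n2} & y_{n3} \end{bmatrix}$$
Regression model fitted:
$$\begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \\ \vdots & \vdots & \vdots \\ y_{n1} & y_{n2} & y_{n3} \end{bmatrix} \approx \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \begin{bmatrix} \hat{b}_1 & \hat{b}_2 & \hat{b}_3 \end{bmatrix} = \begin{bmatrix} x_1\hat{b}_1 & x_1\hat{b}_2 & x_1\hat{b}_3 \\ x_2\hat{b}_1 & x_2\hat{b}_2 & x_2\hat{b}_3 \\ x_3\hat{b}_1 & x_3\hat{b}_2 & x_3\hat{b}_3 \\ \vdots & \vdots & \vdots \\ x_n\hat{b}_1 & x_n\hat{b}_2 & x_n\hat{b}_3 \end{bmatrix}$$
Matrix used in Redundancy Analysis, Rank = 1:
$$\hat{\mathbf{Y}}_{n\times 3} = \begin{bmatrix} x_1\hat{b}_1 & x_1\hat{b}_2 & x_1\hat{b}_3 \\ x_2\hat{b}_1 & x_2\hat{b}_2 & x_2\hat{b}_3 \\ x_3\hat{b}_1 & x_3\hat{b}_2 & x_3\hat{b}_3 \\ \vdots & \vdots & \vdots \\ x_n\hat{b}_1 & x_n\hat{b}_2 & x_n\hat{b}_3 \end{bmatrix}$$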
94. Eigen Values: (1) 6.706016e+00 (2) 7.439234e-02 (3) 6.071489e-18
Second attempt: fitting a quadratic instead of a straight line