6. Factor analysis is a technique applicable where there is a systematic interdependence among a set of variables and the researcher is interested in finding out the underlying relationships.
7. Basic terms related to factor analysis:
1. Factor
2. Factor loading
3. Communality
4. Eigenvalue
5. Total sum of squares
6. Rotation
7. Factor scores
9. Centroid method:
This method was developed by L.L. Thurstone (1887-1955).
It was quite frequently used until about 1950, before the advent of large-capacity, high-speed computers.
10. What is the centroid method:
This is the method which extracts the largest sum of absolute loadings for each factor in turn.
It is defined by linear combinations in which all weights are either +1.0 or -1.0.
Purpose of this method:
To maximize the sum of loadings, disregarding signs.
Here "loadings" means the weights attached to the factors. For example, the hypothesis may hold that the average student's aptitude in the field of astronomy is
{10 × the student's verbal intelligence} + {6 × the student's mathematical intelligence}.
The numbers 10 and 6 are the factor loadings associated with astronomy.
11. The various steps will be illustrated by an example.
Example: Given is the following correlation matrix, R, relating to eight variables, with unities in the diagonal spaces (an 8 × 8 matrix of 64 entries).
Requirement:
Using the centroid method of factor analysis, work out the first and second centroid factors from the given information.
The steps involved are shown below.
12. Correlation matrix R (variables 1-8):

          1      2      3      4      5      6      7      8
   1   1.00   .709   .204   .081   .626   .113   .155   .774
   2   .709   1.00   .051   .089   .581   .098   .083   .652
   3   .204   .051   1.00   .671   .123   .689   .582   .072
   4   .081   .089   .671   1.00   .022   .798   .613   .111
   5   .626   .581   .123   .022   1.00   .047   .201   .724
   6   .113   .098   .689   .798   .047   1.00   .801   .120
   7   .155   .083   .582   .613   .201   .801   1.00   .152
   8   .774   .652   .072   .111   .724   .120   .152   1.00
Solution: The given correlation matrix R is a positive manifold, and as such the weights for all variables are taken as +1.0.
Calculating the first centroid factor (A):
13. Step 1:
The method starts with the computation of a matrix of correlations, termed R, in which unities are placed in the diagonal spaces.
15. Step 2:
If the correlation matrix happens to be a positive manifold, the variables are either weighted or simply summed. But in the case of a negative manifold, reflections must be carried out before the first centroid factor is obtained.
16. Step 3:
To get the first centroid factor:
a. The sum of the coefficients in each column of the correlation matrix is worked out.
b. The sum of these column sums is obtained, termed T.
c. Each column sum obtained in (a) is divided by the square root of T, resulting in what are called the centroid loadings.
17. Sum of the column sums: T = 27.884, so √T = 5.281.
First centroid factor A:

   Column   Factor A
   1        0.693
   2        0.618
   3        0.642
   4        0.641
   5        0.629
   6        0.694
   7        0.679
   8        0.683
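The arithmetic of Step 3 can be sketched in a few lines of NumPy. The correlation matrix is the one given above; the variable names are our own:

```python
import numpy as np

# correlation matrix R from the example (unities in the diagonal)
R = np.array([
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
])

col_sums = R.sum(axis=0)      # step (a): column sums
T = col_sums.sum()            # step (b): T = 27.884
A = col_sums / np.sqrt(T)     # step (c): first centroid factor loadings
print(np.round(A, 3))
```

Running this reproduces the loadings tabulated above to three decimal places.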
18. Step 4:
To obtain the second centroid factor B, first develop the matrix of cross products, Q1. For this purpose, the loadings on the first centroid factor for each pair of variables are multiplied together. Then Q1 is subtracted element by element from the original correlation matrix R, giving the first residual matrix, R1.
21. Contd.:
After obtaining R1, one must reflect some of the variables in it, meaning thereby that some of the variables are given negative signs in the sum. For any variable which is so reflected, the signs of all coefficients in that column and row of the residual matrix are changed. The resulting matrix is named the reflected matrix, from which the loadings can be obtained in the usual way (as in Step 3), the reflected variables receiving negative loadings.
25. Step 5:
For subsequent factors (C, D) the same process outlined above is repeated: cross products are computed, forming matrix Q2, and to obtain a third factor C one operates on R2 in the same way as on R1.
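Steps 4 and the reflection procedure can likewise be sketched in NumPy. The reflection rule below (keep flipping whichever variable currently has the most negative signed column sum) is a schematic local-search version of the hand procedure, not the textbook's exact sequence of reflections, but for this matrix it arrives at the same grouping of variables:

```python
import numpy as np

# correlation matrix R from the example
R = np.array([
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
])

# first centroid factor A (Step 3)
A = R.sum(axis=0) / np.sqrt(R.sum())

# Step 4: cross-product matrix Q1 and first residual matrix R1
Q1 = np.outer(A, A)
R1 = R - Q1

# Reflection: flip variables until no signed off-diagonal column sum is negative
signs = np.ones(len(R))
while True:
    S = np.outer(signs, signs) * R1    # residual matrix with current reflections
    off = S.sum(axis=0) - np.diag(S)   # off-diagonal column sums
    i = int(np.argmin(off))
    if off[i] >= -1e-9:
        break
    signs[i] = -signs[i]               # reflect variable i

# Second centroid factor B: column sums of the reflected matrix divided by the
# square root of their total; reflected variables keep negative loadings
S = np.outer(signs, signs) * R1
T1 = S.sum()
B = signs * S.sum(axis=0) / np.sqrt(T1)
```

For this matrix the reflection separates the variables into two clusters, {1, 2, 5, 8} versus {3, 4, 6, 7}, so B carries opposite signs for the two groups; which group ends up negative is only an arbitrary overall sign.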
27. Maximum likelihood method
The maximum likelihood (ML) method consists in obtaining sets of factor loadings successively, in such a way that each, in turn, explains as much as possible of the population correlation matrix as estimated from the sample correlation matrix. If R_s stands for the correlation matrix actually obtained from the data in the sample, and R_p stands for the correlation matrix that would be obtained if the entire population were tested, then the ML method seeks to extrapolate what is known from R_s in the best possible way to estimate R_p. Thus the method is a statistical approach in which one maximizes some relationship between the sample of data and the population from which the sample has been drawn.
The loadings obtained on the first factor are employed in the usual way to obtain a matrix of residual coefficients. A significance test is then applied to indicate whether it would be reasonable to extract a second factor. This goes on repeatedly, in search of one factor after another. One stops factoring when the significance test fails to reject the null hypothesis for the residual matrix. The final product is a matrix of factor loadings.
28. Rotation in factor analysis
Simple structure is obtained by rotating the axes until:
I. Each row of the factor matrix has at least one zero.
II. Each column of the factor matrix has p zeros, where p is the number of factors.
III. For each pair of factors, there are several variables for which the loading on one is virtually zero and the loading on the other is substantial.
IV. If there are many factors, then for each pair of factors there are many variables for which both loadings are zero.
V. For every pair of factors, the number of variables with non-vanishing loadings on both of them is small.
All these criteria imply that factor analysis should reduce the complexity of the variables.
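One widely used way to pursue this simple structure is varimax rotation, which orthogonally rotates the loading matrix so as to maximize the variance of the squared loadings within each column. The sketch below is a standard SVD-based implementation in NumPy; the loading matrix `L` is made up for illustration and is not from the original source:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate loading matrix L toward simple structure."""
    p, k = L.shape
    Rot = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = L @ Rot
        # SVD step of the standard varimax ascent iteration
        u, s, vt = np.linalg.svd(
            L.T @ (Lam**3 - (gamma / p) * Lam @ np.diag((Lam**2).sum(axis=0)))
        )
        Rot = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):   # criterion stopped improving
            break
        d = d_new
    return L @ Rot

# hypothetical two-factor loading matrix for eight variables
L = np.array([[.69, .33], [.62, .28], [.64, -.37], [.64, -.38],
              [.63, .31], [.69, -.36], [.68, -.33], [.68, .34]])
Lr = varimax(L)
```

Because the rotation is orthogonal, the communalities (row sums of squared loadings) and the reproduced correlation matrix L·Lᵀ are unchanged; only how the loadings are distributed across the factors changes.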
29. R-type and Q-type factor analysis
In R-type factor analysis, high correlations occur when respondents who score high on variable 1 also score high on variable 2, and respondents who score low on variable 1 also score low on variable 2.
In Q-type factor analysis, the correlations are computed between pairs of respondents instead of pairs of variables. High correlations occur when respondent 1's pattern of responses on all the variables is much like respondent 2's pattern of responses.
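The distinction is easy to see computationally: R-type correlates the columns (variables) of a respondents × variables data matrix, while Q-type correlates its rows (respondents). A minimal NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))            # 10 respondents x 4 variables (synthetic)

R_type = np.corrcoef(X, rowvar=False)   # 4 x 4: correlations between variables
Q_type = np.corrcoef(X)                 # 10 x 10: correlations between respondents
```

The same data matrix yields both; only the axis along which correlations are taken differs.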
32. Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
Karl Pearson
• PCA was invented by him in 1901.
Harold Hotelling
• It was later independently developed and named by him in the 1930s.
33. Principal component analysis
A backbone of modern data analysis.
A black box that is widely used but often poorly understood.
It is a mathematical tool from applied linear algebra.
It is a simple, non-parametric method of extracting relevant information from confusing data sets.
It provides a roadmap for reducing a complex data set to a lower dimension.
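The orthogonal transformation behind PCA can be sketched directly via the SVD of a centered data matrix. The data here are synthetic and the variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=50)   # make two variables correlated

Xc = X - X.mean(axis=0)                          # center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                         # orthonormal principal directions
scores = Xc @ Vt.T                      # linearly uncorrelated principal components
explained_var = s**2 / (len(X) - 1)     # variance along each component, descending
```

Keeping only the first few columns of `scores` (those with the largest `explained_var`) is exactly the dimension reduction described above.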