How Principal Components Analysis is Different from Factor Analysis
January 28, 2013   ©Arup Guha - Indian Institute of Foreign Trade - New Delhi, India
    Background and Intuition
    Principal Components Analysis
    Factor Analysis
    Comparison between PCA and Factor Analysis
    Cases and choice between PCA and Factor Analysis




Analyst 1: I'm confused, should I run PCA or factor analysis?
Analyst 2: Depends. If you are doing variable reduction or developing a ranking, PCA is better. If
you are proposing a model for the observed variables, then factor analysis.

Analyst 1: So there is a difference between the two?
Analyst 2: Yep.

Analyst 1: But both give very close communalities.
Analyst 2: Yep, but not always.

Analyst 1: Can you tell me the difference between the two?
Analyst 2: Yep.

Analyst 1: In non-mathematical terms?
Analyst 2: Nope. PCA is maths and factor analysis is stats. There is no layman analogue of
eigenvectors and eigenvalues that I know of.

Analyst 1: But if in most cases they are similar, should I bother?
Analyst 2: If you are trained in maths or stats, yes, or you wouldn't be able to sleep at night. If you are
trained in market research, then no. The serious answer is: it depends on the data.
 Let Y be an n×1 vector of random variables
 Say μi = E(Yi), where i = 1, 2, …, n
 Then var(Yi) = E[(Yi − μi)(Yi − μi)] … (1)
 And cov(Yi, Yj) = E[(Yi − μi)(Yj − μj)], where i ≠ j … (2)
 Then E[(Y − μ)(Y − μ)’] gives the variance-covariance matrix,
  whose diagonal elements are (1) and whose off-diagonal
  elements are (2)
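 For instance, this matrix can be assembled directly from its definition; a minimal NumPy sketch (the sample data and dimensions below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 3))        # 500 draws of a 3x1 random vector Y

mu = Y.mean(axis=0)                  # sample estimate of mu_i = E(Y_i)
D = Y - mu                           # deviations (Y_i - mu_i)

# E[(Y - mu)(Y - mu)'] estimated by averaging the outer products over the sample
Sigma = (D.T @ D) / (len(Y) - 1)

# diagonal elements are the variances (1), off-diagonal elements the covariances (2)
assert np.allclose(Sigma, np.cov(Y, rowvar=False))
```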
    Think of the problem you solve every time you take a
     photograph: converting a 3D object into a 2D photograph
     with maximum detail retained
    If we take a high-dimensional data vector in n-space and
     project it into a lower-dimensional subspace of n − k (k > 0)
     dimensions, such that the retained variance is maximised,
     we get the principal components
    Note, there is no model involved here… we just want to
     capture the maximum information in the photograph

    Suppose that x is a vector of p random variables,
     and that the variances of the p random variables
     and the structure of the covariances or
     correlations between the p variables are of
     interest
    Say we are lazy and simply don’t want to look at
     the p variances and all of the ½p(p − 1)
     correlations or covariances
    An alternative approach is to look for a few (<< p)
     derived variables that preserve most of the
     information given by these variances and
     correlations or covariances
    Although PCA does not ignore covariances and
     correlations, it concentrates on variances
    We look for these PCs in such a way that a
     minimum number of PCs explains the
     maximum variance
    The first step is to look for a linear function α’1x of
     the elements of x having maximum variance, where
     α1 is a vector of p constants α11, α12, . . . , α1p
    Next, look for a linear function α’2x, uncorrelated
     with α’1x having maximum variance, and so on
    These are the Principal Components
 Consider, for the moment, the case where the vector of
  random variables x has a known covariance matrix Σ
 This is the famous variance-covariance matrix whose (i,j)th
  element is the (known) covariance between the ith and jth
  elements of x when i ≠ j, and the variance of the jth
  element of x when i = j
 Now two very important results:
1. It turns out that for k = 1, 2, · · · , p, the kth PC is given by
   zk = α’kx where αk is an eigenvector of Σ corresponding to
   its kth largest eigenvalue λk
2. Furthermore, if αk is chosen to have unit length (α’kαk =
   1), then var(zk) = λk, where var(zk) denotes the variance of
   zk
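 A small numerical check of these two results, with a made-up 3×3 covariance matrix Σ (np.linalg.eigh returns eigenvalues in ascending order, so they are reversed to get λ1 ≥ λ2 ≥ λ3):

```python
import numpy as np

# a made-up symmetric, positive-definite covariance matrix Sigma
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])

lam, A = np.linalg.eigh(Sigma)       # eigenvalues (ascending) and unit-length eigenvectors
lam, A = lam[::-1], A[:, ::-1]       # reorder so lam[0] = lambda_1 is the largest

for k in range(3):
    a_k = A[:, k]                    # result 1: the kth PC is z_k = a_k' x
    assert np.isclose(a_k @ a_k, 1.0)              # unit length: a_k' a_k = 1
    assert np.isclose(a_k @ Sigma @ a_k, lam[k])   # result 2: var(z_k) = lambda_k
```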

    To derive the form of the PCs, consider first
     α’1x; the vector α1 maximizes
     var[α’1x] = α’1Σα1
       It is clear the maximum will not be achieved
        for finite α1 so a normalization constraint
        must be imposed
       The constraint used in the derivation is α’1α1
        = 1, that is, the sum of squares of elements
        of α1 equals 1
    To maximize α’1Σα1 subject to α’1α1 = 1, the
     standard approach is to use the technique of
     Lagrange multipliers
    Maximise α’1Σα1 − λ(α’1α1 − 1), where λ is the
     Lagrange multiplier
    Differentiation with respect to α1 gives
        Σα1 − λα1 = 0 ….. (A)
     or (Σ − λIp)α1 = 0, where Ip is the (p × p) identity matrix
    Thus, λ is an eigenvalue of Σ and α1 is the
     corresponding eigenvector (see the spectral
     decomposition of Σ)

    Now α1 is supposed to maximise the variance
  α’1Σα1
 For that to happen, Σα1 − λα1 = 0 must hold
 Or, α’1Σα1 = α’1λα1 = λα’1α1 = λ
 So the variance, when maximised, equals λ
 Which implies that if we select the largest
  eigenvalue λ and the eigenvector α1 associated
  with it, we maximise the retained variance,
  and α’1x is the first PC

    The second PC, α’2x, maximizes α’2Σα2 subject to
     being uncorrelated with α’1x
    Or equivalently subject to cov[α’1x,α’2x] = 0,
     where cov(x, y) denotes the covariance between
     the random variables x and y
    Solving, we once again come down to maximising
     λ, but it can’t be equal to the largest eigenvalue
     since that is already taken by the first PC. So
     λ = λ2, the second largest eigenvalue of Σ
    And so on

    It can be shown that for the first, second,
     third, fourth, . . . , pth PCs, the vectors of
     coefficients α1,α2,α3,α4, . . . ,αp are the
     eigenvectors of Σ corresponding to λ1, λ2,λ3,
     λ4, . . . , λp, the first, second, third and fourth
     largest, . . . , and the smallest eigenvalue,
     respectively
    Also, var[α’kx] = λk for k = 1, 2, . . . , p.
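 Continuing with the same hypothetical Σ, the PCs taken together are mutually uncorrelated and their variances are exactly the eigenvalues, so the total variance tr(Σ) is preserved:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.6],    # same made-up covariance matrix as before
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])

lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]        # lambda_1 >= lambda_2 >= lambda_3

Z_cov = A.T @ Sigma @ A               # covariance matrix of the PCs z = A'x
assert np.allclose(Z_cov, np.diag(lam))         # var(alpha_k' x) = lambda_k, PCs uncorrelated
assert np.isclose(np.trace(Sigma), lam.sum())   # total variance equals the sum of eigenvalues
```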

    Principal component analysis has often been
     dealt with in textbooks as a special case of
     factor analysis, and this practice is continued
     by some widely used computer packages,
     which treat PCA as one option in a program
     for factor analysis
    This view is misguided since PCA and factor
     analysis, as usually defined, are really quite
     distinct techniques
    The basic idea underlying factor analysis is that p
     observed random variables, x, can be expressed,
     except for an error term, as linear functions of m
     (< p) hypothetical (random) variables or common
     factors
    That is if x1, x2, . . . , xp are the variables and f1,
     f2, . . . , fm are the factors, then
         x1 = λ11f1 + λ12f2 + . . . + λ1mfm + e1
         x2 = λ21f1 + λ22f2 + . . . + λ2mfm + e2
         ...
         xp = λp1f1 + λp2f2 + . . . + λpmfm + ep
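 A minimal simulation sketch of this model; the loadings Λ and the specific variances below are made-up values, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 2, 5, 10_000                           # m common factors, p observed variables, n samples

Lambda = rng.uniform(0.4, 0.9, size=(p, m))      # made-up factor loadings lambda_jk
psi = rng.uniform(0.1, 0.3, size=p)              # made-up specific variances

f = rng.normal(size=(n, m))                      # common factors
e = rng.normal(size=(n, p)) * np.sqrt(psi)       # specific factors (error terms)

X = f @ Lambda.T + e                             # x_j = lambda_j1 f_1 + ... + lambda_jm f_m + e_j
print(X.shape)                                   # (10000, 5): the observed variables
```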

    The λ’s are the factor loadings
    The e’s are the error terms, sometimes also called
     specific factors. This is because ej is specific to
     xj, unlike the fk, which are common to several xj’s
    The fk are the factors common to several x’s
    We will skip the additional details of the
     factor analysis model, since the objective is to
     demonstrate the difference between factor
     analysis and PCA, not to explain the former
    x = Λf + e (the factor analysis model in matrix form)
    Now, going back to the two analysts we met at
     the start of this presentation:
    Analyst 2: While factor analysis and PCA are both
     dimension reduction techniques, factor analysis
     attempts to do so by proposing a model relating
     the observed to the latent variables. PCA has no
     such underlying model
    In other words, the cameraman is just trying to
     take the best 2D representation of the 3D world
     (PCA). He is not trying to fit a model to explain
     the world
    Analyst 1: But, since both are trying to do the
     same thing, what if PCA is used to solve the
     factor analysis model? Then there would be
     no difference between PCA and factor analysis,
     right?
    Analyst 2: Very good point. But PCA explains
     all the variance and covariance in the variance-
     covariance matrix of the given data, whereas
     factor analysis explains only the common
     variance. Let’s get back to models.
    In the derivation above, α’kΣαk was maximised,
     giving α’kΣαk = λk
    This maximises var(zk = α’kx), and in doing so it draws
     on the variances along the diagonal of Σ as well as the
     off-diagonal covariances or correlations in it
    So PCs explain the diagonal elements (variances) as well
     as the off-diagonal elements (covariances/correlations)
     of the variance-covariance matrix of the original data x

    If you remember the factor analysis model in
     matrix form: x = Λf + e
    Along with the following assumptions,
       E[ee’] = Ψ (diagonal)
       E[fe’] = 0 (a matrix of zeros)
       E[ff’] = Im (an identity matrix)
    the above model implies that the variance-
     covariance matrix has the form:
    Σ = ΛΛ’ + Ψ
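 A quick numeric check that these assumptions imply Σ = ΛΛ’ + Ψ, using the same kind of simulated factor-model data as before (Λ and Ψ are again made-up values):

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 2, 5, 200_000

Lambda = rng.uniform(0.4, 0.9, size=(p, m))      # made-up loadings
psi = rng.uniform(0.1, 0.3, size=p)              # made-up specific variances (diagonal of Psi)

f = rng.normal(size=(n, m))                      # E[ff'] = I_m
e = rng.normal(size=(n, p)) * np.sqrt(psi)       # E[ee'] = Psi (diagonal), independent of f

X = f @ Lambda.T + e
Sigma_sample = np.cov(X, rowvar=False)           # sample variance-covariance matrix of x
Sigma_model = Lambda @ Lambda.T + np.diag(psi)   # Sigma = Lambda Lambda' + Psi

print(np.abs(Sigma_sample - Sigma_model).max())  # small: only sampling error remains
```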
    Σ = ΛΛ’ + Ψ
    Now, Ψ is a diagonal matrix, which means that its off-
     diagonal terms are zero
    So, the contribution of Ψ towards the off-diagonal terms of
     Σ is nil
    Note that the relative contributions of ΛΛ’ and Ψ to the
     diagonal terms of Σ depend on the nature of the
     variable xj in question
    If xj is highly correlated with all the other variables, then its
     communality is large and its specific variance (the variance
     of ej) is low
    On the other hand, if xj is almost uncorrelated with the
     other variables, then its communality is low and its specific
     variance is large
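 In terms of the model, the communality of xj is the sum of its squared loadings (the diagonal contribution of ΛΛ’) and the specific variance is ψj; a tiny made-up example:

```python
import numpy as np

Lambda = np.array([[0.9, 0.3],     # x1 and x2 load heavily on the common factors
                   [0.8, 0.4],
                   [0.1, 0.2]])    # x3 barely loads on them
psi = np.array([0.10, 0.15, 0.90])

communality = (Lambda ** 2).sum(axis=1)   # diagonal contribution of Lambda Lambda'
print(communality)                        # [0.90, 0.80, 0.05]: large for x1/x2, small for x3
print(communality + psi)                  # diagonal of Sigma: the total variance of each x_j
```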

    We have a data vector x
    This data vector has a variance covariance
     matrix Σ
       The diagonal terms of which are the variances
       The off diagonal terms are the covariances
    If the objective is dimension reduction, a ranking
     variable, or something like image recognition,
     then we would do PCA
    PCA would take care of diagonal as well as off-
     diagonal elements of the matrix Σ
    Now say we come up with a factor model that
     explains the data x
    Using this model we can decompose the
     variance-covariance matrix as Σ = ΛΛ’ + Ψ
    As soon as we move to a factor model, our
     objective changes from retaining the
     maximum variance (the photography example) to
     uncovering the common latent factors driving
     the data (e.g. psychology driving behaviour)
 That is, in factor analysis we are interested in the
  ΛΛ’ part of Σ only; in PCA we are interested in
  the entire Σ
 To understand this properly, let us consider the
  following cases:
1. The variables in vector x are all correlated
2. The variables in vector x are uncorrelated
3. Some of the variables are correlated and
   some are not
    Case 1: the variables in x are all correlated. If we
     specify a factor analysis model, then the elements
     of Ψ would be small and ΛΛ’ would dominate Σ
    In other words, the diagonal and off-diagonal
     elements of Σ are all dominated by the common
     variation
    So, a direct PCA to extract the factors would
     mostly extract common variation
    Since this is the objective of factor analysis as
     well, in this case PCA and factor analysis would
     give very close results
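 A sketch of this using scikit-learn’s PCA and FactorAnalysis on simulated data driven by one strong common factor (all parameters below are made up for illustration; the loadings are only comparable up to sign):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(2)
n, p = 5_000, 6

# case 1: one strong common factor drives all p variables, so they are all correlated
f = rng.normal(size=(n, 1))
X = f @ rng.uniform(0.7, 0.9, size=(1, p)) + 0.3 * rng.normal(size=(n, p))

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1).fit(X)

# PCA "loadings" (eigenvector scaled by the square root of its eigenvalue) vs FA loadings
pca_load = pca.components_[0] * np.sqrt(pca.explained_variance_[0])
fa_load = fa.components_[0]

print(np.round(pca_load, 2))
print(np.round(fa_load, 2))    # very close when common variation dominates
```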

 To see this intuitively, let us consider principal
  factor analysis (PFA), an alternative method to
  extract factors
 Since in factor analysis we are interested in the
  common variation only, that is, in Σ − Ψ = ΛΛ’,
  PFA applies PCA to Σ − Ψ rather than to the entire Σ
 However, since all variables are correlated, Ψ has
  small elements, so the difference between the
  factors extracted by applying PCA to Σ − Ψ versus Σ
  is likely to be minimal
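 A simplified, non-iterative sketch of this idea (the helper principal_factor below is illustrative, not a library routine; it uses squared multiple correlations as initial communality estimates):

```python
import numpy as np

def principal_factor(R, m):
    """Illustrative one-step principal factor extraction: PCA applied to the
    correlation matrix with its diagonal replaced by communality estimates,
    i.e. to (an estimate of) Sigma - Psi."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # initial communalities (squared multiple correlations)
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, h2)              # the "reduced" matrix, Sigma - Psi
    lam, A = np.linalg.eigh(R_reduced)
    lam, A = lam[::-1], A[:, ::-1]
    return A[:, :m] * np.sqrt(np.maximum(lam[:m], 0))   # loadings (up to sign)

# hypothetical correlation matrix where all variables are fairly correlated
R = np.full((4, 4), 0.6)
np.fill_diagonal(R, 1.0)
print(np.round(principal_factor(R, 1), 2))       # close to what PCA on R itself would give
```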

    Case 2: the variables in x are uncorrelated. If we again
     consider the factor analysis model, the diagonal elements
     of Σ would be dominated by specific variance from Ψ and
     the off-diagonal elements would be very small
    Here, an application of PCA to Σ would extract PCs that
     capture only specific variance and no common variation.
     So they would be very different from the factors
    Factors drawn through PFA would be different from the
     components, but the model would not hold since there
     is no correlation
    In this case, factor analysis doesn’t make sense, which
     should be clear from the correlation matrix itself
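 A sketch of case 2: with independent variables, the eigenvectors of the sample Σ are essentially the coordinate axes, so each "component" is just one of the original variables and carries only specific variance (data below are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# case 2: independent variables with different variances and no common factors
X = rng.normal(size=(5_000, 4)) * np.array([2.0, 1.5, 1.0, 0.5])

Sigma_sample = np.cov(X, rowvar=False)
lam, A = np.linalg.eigh(Sigma_sample)
print(np.round(A[:, ::-1], 2))   # columns ~ unit vectors: each PC is just one original variable
```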
    Case 3: some variables are correlated, some are not.
     Consider two variables, xi and xj
    xi is highly correlated with the rest of the variables
    Then in Σ, both the diagonal and off-diagonal elements
     for xi are dominated by common variation
    xj, on the other hand, is not correlated with the rest of
     the variables
    In Σ, the diagonal element for xj is dominated by
     specific variation from Ψ
    Applying PCA to Σ would once again mix both
     common and specific variation, making the
     components different from the factors
    It is better to apply PFA, since it strips Σ of the
     specific variance due to Ψ
    Now that we know where the choice between
     PCA and factor analysis is trivial in practice (they
     always differ in theory) and where it is not, how
     do we choose?
    The answer is the most basic step of statistics:
       Know your data
       Or, more specifically, know the correlation structure of
        your data
    However, since this is difficult and a judgement
     call, it is always advisable to use non-PCA
     extraction techniques for factor analysis, lest you
     end up with factors that also contain specific variation
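 As a rough way of "knowing the correlation structure" before choosing, one can screen the off-diagonal correlations; the helper below is a hypothetical sketch, not a formal test:

```python
import numpy as np

def screen_correlations(X):
    """Rough screen of the correlation structure before choosing PCA vs factor analysis."""
    R = np.corrcoef(X, rowvar=False)
    off = np.abs(R[~np.eye(R.shape[0], dtype=bool)])
    print(f"|r|: mean {off.mean():.2f}, min {off.min():.2f}, max {off.max():.2f}")
    # all |r| high  -> case 1: PCA and factor analysis will largely agree
    # all |r| low   -> case 2: no common variation, factor analysis is not meaningful
    # mixed         -> case 3: prefer a non-PCA extraction such as principal factor analysis

rng = np.random.default_rng(4)
screen_correlations(rng.normal(size=(1_000, 5)))   # uncorrelated example data
```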
