How Principal Components Analysis is Different from Factor Analysis
January 28, 2013   ©Arup Guha - Indian Institute of Foreign Trade - New Delhi, India

• Background and Intuition
• Principal Components Analysis
• Factor Analysis
• Comparison between PCA and Factor Analysis
• Cases and choice between PCA and Factor Analysis
Analyst 1: I'm confused. Should I run PCA or factor analysis?
Analyst 2: Depends. If you are doing variable reduction or developing a ranking, PCA is better. If you are proposing a model for the observed variables, then factor analysis.

Analyst 1: So there is a difference between the two?
Analyst 2: Yep.

Analyst 1: But both give very close communalities.
Analyst 2: Yep, but not always.

Analyst 1: Can you tell me the difference between the two?
Analyst 2: Yep.

Analyst 1: In non-mathematical terms?
Analyst 2: Nope. PCA is maths and factor analysis is stats. There is no layman analogue of eigenvectors and eigenvalues that I know of.

Analyst 1: But if in most cases they are similar, should I bother?
Analyst 2: If you are trained in maths or stats, yes, or you wouldn't be able to sleep at night. If you are trained in market research, then no. The serious answer is: it depends on the data.
• Consider an n×1 vector of random variables Y
• Say μi = E(Yi), where i = 1, 2, …, n
• Then var(Yi) = E[(Yi − μi)(Yi − μi)] … (1)
• And cov(Yi, Yj) = E[(Yi − μi)(Yj − μj)], where i ≠ j … (2)
• Then E[(Y − μ)(Y − μ)'] gives the variance-covariance matrix, whose diagonal elements are (1) and whose off-diagonal elements are (2) (checked numerically in the sketch below)
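Not part of the original slides: a minimal numpy sketch (data and names are illustrative) that builds the variance-covariance matrix from simulated data and checks that its diagonal holds the variances (1) and its off-diagonal the covariances (2).

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 3))      # 500 draws of a 3-dimensional random vector Y
Y[:, 1] += 0.8 * Y[:, 0]           # make Y2 correlated with Y1

Sigma = np.cov(Y, rowvar=False)    # sample variance-covariance matrix (3 x 3)
print(np.allclose(np.diag(Sigma), Y.var(axis=0, ddof=1)))  # diagonal = variances, i.e. (1)
print(Sigma[0, 1], Sigma[1, 0])    # off-diagonal = covariances, i.e. (2); the matrix is symmetric
```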
• Think of the problem you try to solve every time you take a photograph: converting a 3D object into a 2D photograph with maximum detail retained
• If we take a high-dimensional data vector in n-dimensional space and project it onto a lower-dimensional subspace of n − k dimensions (k > 0), such that the retained variance is maximised, we get the principal components (PCs)
• Note that there is no model involved here; we just want to capture the maximum information in the photograph
Principal Components Analysis
• Suppose that x is a vector of p random variables, and that the variances of the p random variables and the structure of the covariances or correlations between them are of interest
• Say we are lazy and simply don't want to look at the p variances and all of the ½p(p − 1) correlations or covariances
• An alternative approach is to look for a few (<< p) derived variables that preserve most of the information given by these variances and correlations or covariances
• Although PCA does not ignore covariances and correlations, it concentrates on variances
• We look for the PCs in such a way that the minimum number of PCs explains the maximum variance
• The first step is to look for a linear function α'1x of the elements of x having maximum variance, where α1 is a vector of p constants α11, α12, …, α1p
• Next, look for a linear function α'2x, uncorrelated with α'1x, having maximum variance, and so on
• These are the Principal Components
• Consider, for the moment, the case where the vector of random variables x has a known covariance matrix Σ
• This is the famous variance-covariance matrix whose (i, j)th element is the (known) covariance between the ith and jth elements of x when i ≠ j, and the variance of the jth element of x when i = j
• Now two very important results (checked numerically in the sketch below):
1. It turns out that for k = 1, 2, …, p, the kth PC is given by zk = α'kx, where αk is an eigenvector of Σ corresponding to its kth largest eigenvalue λk
2. Furthermore, if αk is chosen to have unit length (α'kαk = 1), then var(zk) = λk, where var(zk) denotes the variance of zk
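Not from the original slides: a short numpy sketch (simulated data, illustrative names) that extracts the PCs as eigenvectors of Σ and verifies that var(zk) = λk.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # correlated data, p = 4

Sigma = np.cov(X, rowvar=False)          # covariance matrix of x
lam, A = np.linalg.eigh(Sigma)           # eigendecomposition of the symmetric matrix Sigma
order = np.argsort(lam)[::-1]            # largest eigenvalue first
lam, A = lam[order], A[:, order]

Z = (X - X.mean(axis=0)) @ A             # PC scores: z_k = alpha_k' x
print(np.allclose(Z.var(axis=0, ddof=1), lam))   # result 2: var(z_k) = lambda_k
```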

• To derive the form of the PCs, consider first α'1x; the vector α1 maximizes var[α'1x] = α'1Σα1
    - It is clear that the maximum will not be achieved for finite α1, so a normalization constraint must be imposed
    - The constraint used in the derivation is α'1α1 = 1, that is, the sum of squares of the elements of α1 equals 1
• To maximize α'1Σα1 subject to α'1α1 = 1, the standard approach is to use the technique of Lagrange multipliers
• Maximise α'1Σα1 − λ(α'1α1 − 1), where λ is the Lagrange multiplier
• Differentiating with respect to α1 gives
        Σα1 − λα1 = 0 … (A)
    or (Σ − λIp)α1 = 0, where Ip is the (p × p) identity matrix
• Thus λ is an eigenvalue of Σ and α1 is the corresponding eigenvector (this is the spectral decomposition; look it up)

• Now α1 is supposed to maximise the variance α'1Σα1
• For that to happen, Σα1 − λα1 = 0 must hold
• So α'1Σα1 = α'1λα1 = λα'1α1 = λ
• The variance, when maximised, therefore equals λ
• This implies that if we select the largest eigenvalue λ1 and its associated eigenvector α1, we maximise the retained variance, and α'1x is the first PC (the argument is restated compactly below)
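A compact restatement of the preceding derivation (not in the original deck), written out in LaTeX; the factor of 2 from differentiating the quadratic form cancels and does not affect the result.

```latex
\begin{aligned}
L(\alpha_1,\lambda) &= \alpha_1'\Sigma\alpha_1 - \lambda(\alpha_1'\alpha_1 - 1) \\
\frac{\partial L}{\partial \alpha_1} &= 2\Sigma\alpha_1 - 2\lambda\alpha_1 = 0
\;\;\Rightarrow\;\; \Sigma\alpha_1 = \lambda\alpha_1 \\
\operatorname{var}(\alpha_1' x) &= \alpha_1'\Sigma\alpha_1
  = \alpha_1'(\lambda\alpha_1) = \lambda\,\alpha_1'\alpha_1 = \lambda ,
\end{aligned}
```

so the variance is maximised by taking λ = λ1, the largest eigenvalue of Σ, with α1 the corresponding unit-length eigenvector.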

• The second PC, α'2x, maximizes α'2Σα2 subject to being uncorrelated with α'1x
• Or, equivalently, subject to cov[α'1x, α'2x] = 0, where cov(x, y) denotes the covariance between the random variables x and y
• Solving, we once again come down to maximising λ, but it cannot equal the largest eigenvalue, since that one is already taken by the first PC. So λ = λ2, the second largest eigenvalue of Σ
• And so on

• It can be shown that for the first, second, third, …, pth PCs, the vectors of coefficients α1, α2, α3, …, αp are the eigenvectors of Σ corresponding to λ1, λ2, λ3, …, λp, the first, second, third largest, …, and the smallest eigenvalue, respectively
• Also, var[α'kx] = λk for k = 1, 2, …, p (compared against a library implementation below)
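As a sanity check, and assuming scikit-learn is available (this sketch is not part of the original slides), the eigenvalue route agrees with a standard PCA implementation:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))   # p = 5 correlated variables

# Eigenvalue route, as derived above
lam = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Library route: explained_variance_ holds the same lambda_1 >= ... >= lambda_p
pca = PCA(n_components=5).fit(X)
print(np.allclose(pca.explained_variance_, lam))
```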
• Principal component analysis has often been treated in textbooks as a special case of factor analysis, and this practice is continued by some widely used computer packages, which treat PCA as one option in a factor analysis routine
• This view is misguided, since PCA and factor analysis, as usually defined, are really quite distinct techniques
Factor Analysis
• The basic idea underlying factor analysis is that p observed random variables, x, can be expressed, except for an error term, as linear functions of m (< p) hypothetical (random) variables or common factors
• That is, if x1, x2, …, xp are the variables and f1, f2, …, fm are the factors, then (simulated in the sketch below)
        x1 = λ11f1 + λ12f2 + … + λ1mfm + e1
        x2 = λ21f1 + λ22f2 + … + λ2mfm + e2
        ...
        xp = λp1f1 + λp2f2 + … + λpmfm + ep
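Not part of the original slides: a minimal simulation of this model (all names and parameter values are illustrative), fitted with scikit-learn's FactorAnalysis to recover the loadings and the specific variances.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n, p, m = 2000, 6, 2                        # p observed variables, m common factors
Lambda = rng.normal(size=(p, m))            # true factor loadings
psi = rng.uniform(0.2, 0.5, size=p)         # true specific variances

f = rng.normal(size=(n, m))                 # common factors
e = rng.normal(size=(n, p)) * np.sqrt(psi)  # specific factors (error terms)
x = f @ Lambda.T + e                        # x = Lambda f + e

fa = FactorAnalysis(n_components=m).fit(x)
print(fa.components_.shape)                 # (m, p): estimated loadings, up to rotation and sign
print(fa.noise_variance_.round(2))          # estimated specific variances, close to psi for large n
```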

• The λ's are the factor loadings
• The e's are the error terms, sometimes also called specific factors: ej is specific to xj, unlike the fk, which are common to several of the xj
• The fk are the factors common to several x's
• We will skip the additional details of the factor analysis model, since the objective is to demonstrate the difference between factor analysis and PCA, not to explain the former
• x = Λf + e (the factor analysis model in matrix form)
• Now, going back to the two analysts we met at the start of this presentation:
• Analyst 2: While factor analysis and PCA are both dimension reduction techniques, factor analysis does so by proposing a model relating the observed variables to the latent variables. PCA has no such underlying model.
• In other words, the cameraman is just trying to take the best 2D representation of the 3D world (PCA). He is not trying to fit a model to explain the world.
• Analyst 1: But since both are trying to do the same thing, what if PCA is used to solve the factor analysis model? Then there would be no difference between PCA and factor analysis, right?
• Analyst 2: Very good point. But PCA explains all of the variance and covariance in the variance-covariance matrix of a given data set, whereas factor analysis explains only the common variance. Let's get back to the models.
Comparison between PCA and Factor Analysis
• As derived earlier, the maximised value of α'kΣαk is λk
• This maximises var(zk = α'kx), and taken together the PCs account for the variances along the diagonal of Σ as well as for the off-diagonal covariances or correlations in it
• So PCs explain the diagonal elements (variances) as well as the off-diagonal elements (covariances/correlations) of the variance-covariance matrix of the original data x (verified below)
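A short numpy check (illustrative, not from the slides): the full set of p PCs reconstructs every element of Σ exactly, diagonal and off-diagonal alike, via the spectral decomposition Σ = AΛA'.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

Sigma = np.cov(X, rowvar=False)
lam, A = np.linalg.eigh(Sigma)                    # eigenvalues and eigenvectors of Sigma

# All p PCs together reproduce Sigma exactly: variances and covariances alike
print(np.allclose(A @ np.diag(lam) @ A.T, Sigma))
```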

• Recall the factor analysis model in matrix form: x = Λf + e
• Along with the following assumptions,
    - E[ee'] = Ψ (a diagonal matrix)
    - E[fe'] = 0 (a matrix of zeros)
    - E[ff'] = Im (an identity matrix)
• the model implies that the variance-covariance matrix has the form
• Σ = ΛΛ' + Ψ
• Σ = ΛΛ' + Ψ (verified by simulation in the sketch below)
• Now, Ψ is a diagonal matrix, which means its off-diagonal terms are zero
• So the contribution of Ψ towards the off-diagonal terms of Σ is nil
• Note that the relative contribution of ΛΛ' and Ψ to the diagonal terms of Σ depends on the nature of the variable xj in question
• If xj is highly correlated with all the other variables, then its communality is large and its specific variance ψj is small
• On the other hand, if xj is almost uncorrelated with the other variables, then its communality is low and ψj is large
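Continuing the earlier simulation (again a sketch, not from the slides): for data generated from the factor model, the sample covariance matrix settles down to ΛΛ' + Ψ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, m = 100_000, 5, 2
Lambda = rng.normal(size=(p, m))              # loadings
psi = rng.uniform(0.2, 0.5, size=p)           # specific variances (diagonal of Psi)

f = rng.normal(size=(n, m))
e = rng.normal(size=(n, p)) * np.sqrt(psi)
x = f @ Lambda.T + e                          # x = Lambda f + e

Sigma_hat = np.cov(x, rowvar=False)              # sample covariance of the simulated data
Sigma_model = Lambda @ Lambda.T + np.diag(psi)   # Sigma = Lambda Lambda' + Psi
print(np.abs(Sigma_hat - Sigma_model).max())     # small for large n
```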

• We have a data vector x
• This data vector has a variance-covariance matrix Σ
    - Its diagonal terms are the variances
    - Its off-diagonal terms are the covariances
• If the objective is dimension reduction, constructing a ranking variable, or something like image recognition, then we would do PCA
• PCA takes care of the diagonal as well as the off-diagonal elements of Σ
• Now say we come up with a factor model that explains the data x
• Using this model we can decompose the variance-covariance matrix as Σ = ΛΛ' + Ψ
• As soon as we move to a factor model, our objective changes from retaining the maximum variance (the photography example) to uncovering the common latent factors driving the data (e.g. psychology driving behaviour)
• That is, in factor analysis we are interested only in the ΛΛ' part of Σ; in PCA we are interested in the entire Σ
• To understand this properly, let us consider the following cases:
1. The variables in vector x are all correlated
2. The variables in vector x are uncorrelated
3. Some of the variables are correlated and some are not
Cases and choice between PCA and Factor Analysis
Case 1: the variables are all correlated
• If we specify a factor analysis model, then the elements of Ψ are small and ΛΛ' dominates Σ
• In other words, the diagonal and off-diagonal elements of Σ are all dominated by the common variation
• So a direct PCA to extract the factors would mostly extract common variation
• Since this is the objective of factor analysis as well, in this case PCA and factor analysis give very close results

• To see this intuitively, consider principal factor analysis (PFA), an alternative method of extracting factors (a simplified sketch follows below)
• Since in factor analysis we are interested in the common variation only, i.e. Σ − Ψ = ΛΛ', PFA applies PCA to Σ − Ψ rather than to the entire Σ
• However, since all the variables are correlated, Ψ has small elements, so the difference between factors extracted by applying PCA to Σ − Ψ versus Σ is likely to be minimal
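A simplified one-step sketch of principal factor (principal-axis) extraction, not from the slides: the diagonal of the correlation matrix is replaced by initial communality estimates (squared multiple correlations), which amounts to removing Ψ before the eigendecomposition. Real implementations usually iterate the communality estimates.

```python
import numpy as np

def principal_factor_loadings(X, m):
    """One-step principal factor extraction: PCA applied to R - Psi."""
    R = np.corrcoef(X, rowvar=False)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # squared multiple correlations
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)              # replace 1's with communality estimates,
                                                  # i.e. strip out the specific variance Psi
    lam, A = np.linalg.eigh(R_reduced)
    order = np.argsort(lam)[::-1][:m]             # keep the m largest eigenvalues
    return A[:, order] * np.sqrt(np.clip(lam[order], 0.0, None))   # loadings Lambda

# Example (reusing the simulated factor-model data x from the earlier sketch):
# loadings = principal_factor_loadings(x, m=2)
```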

Case 2: the variables are uncorrelated
• Again considering the factor analysis model, the diagonal elements of Σ would be dominated by the specific variance from Ψ, and the off-diagonal elements would be very small
• Here, applying PCA to Σ would extract PCs that capture only specific variance and no common variation, so they would be very different from the factors
• Factors drawn through PFA would also differ from the components, but the model would not hold, since there is no correlation
• In this case factor analysis does not make sense, which should be clear from the correlation matrix itself
Case 3: some variables are correlated, some are not
• Consider two variables, xi and xj
• xi is highly correlated with the rest of the variables
• Then in Σ, both the diagonal and the off-diagonal elements for xi are dominated by common variation
• xj, on the other hand, is not correlated with the rest of the variables
• In Σ, the diagonal element for xj is dominated by specific variation from Ψ
• Applying PCA to Σ would once again pick up both common and specific variation, making the components different from the factors
• It is better to apply PFA, since it strips Σ of the specific variance due to Ψ
• Now that we know where the choice between PCA and factor analysis is trivial in practice (they always differ in theory) and where it is not, how do we choose?
• The answer is the most basic step in statistics:
    - Know your data
    - Or, more specifically, know the correlation structure of your data
• However, since this is difficult and a judgement call, it is always advisable to use non-PCA techniques for factor analysis, lest you come up with factors that also contain specific variation