2. Factor Analysis
Factor analysis is a useful tool for investigating relationships among variables that
measure complex concepts such as socioeconomic status, dietary patterns, or
psychological scales.
It allows researchers to investigate concepts that are not easily measured directly
by collapsing a large number of variables into a few interpretable underlying
factors.
Unlike regression or multiple regression, it does not split the variables into
dependent and independent ones: all variables are analyzed together as an
interdependent set.
3. What is a factor
The key concept of factor analysis is that multiple observed variables have similar patterns of
responses because they are all associated with a latent (i.e. not directly measured) variable.
For example, people may respond similarly to questions about income, education, and occupation,
which are all associated with the latent variable socioeconomic status.
In simple words, the common underlying dimensions shared by a group of observed variables
are known as factors.
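The idea of a latent variable can be illustrated with a small simulation. This is only a sketch: the sample size, the weights and the noise levels are made-up values, and `ses` stands in for the unobserved socioeconomic status from the example above.

```r
# Simulate one latent variable and three observed indicators driven by it.
set.seed(1)
n <- 500
ses <- rnorm(n)  # latent socioeconomic status (never observed in practice)

# Each observed variable = weight * latent factor + its own independent noise.
# The weights and noise levels are illustrative assumptions.
income     <- 0.8 * ses + rnorm(n, sd = 0.6)
education  <- 0.7 * ses + rnorm(n, sd = 0.7)
occupation <- 0.6 * ses + rnorm(n, sd = 0.8)

observed <- data.frame(income, education, occupation)

# The indicators correlate with each other only because they share `ses`.
round(cor(observed), 2)
```

Factor analysis works in the opposite direction: it starts from a correlation matrix like this one and tries to recover the shared dimension behind it.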
4. Why we use Factor Analysis
To reduce the number of variables: factor analysis is a data summarization technique.
It examines the interrelationships among a large number of variables and, on the
basis of those relationships, groups many variables under a few common (similar)
underlying dimensions.
For example, a large set of variables describing students can be broken down into a
few categories:
1. Academic 2. Sports 3. Cultural
5. Understanding factor analysis
Regardless of its purpose, factor analysis is used to determine a small number of
factors from a set of inter-related quantitative variables.
6. Assumptions in FA
Variables must be interrelated
Sample size should be at least 50, preferably 100 or more
Observations per variable: at least 5, preferably 10
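These rules of thumb can be checked before running the analysis. The sketch below uses a hypothetical helper, `check_fa_assumptions()`, and reads the last rule as a count of observations per variable. The 0.3 correlation cutoff is a common convention added here as an assumption, and the built-in iris measurements serve only as example data.

```r
# Hypothetical helper: checks the rule-of-thumb assumptions listed above.
# min_cor = 0.3 is an assumed cutoff for calling two variables "interrelated".
check_fa_assumptions <- function(data, min_n = 50, min_ratio = 5, min_cor = 0.3) {
  n <- nrow(data)
  p <- ncol(data)
  r <- cor(data)
  off_diag <- abs(r[upper.tri(r)])  # each pair of variables counted once
  list(
    n_obs                  = n,
    enough_obs             = n >= min_n,
    obs_per_variable       = n / p,
    enough_ratio           = (n / p) >= min_ratio,
    share_cor_above_cutoff = mean(off_diag >= min_cor)
  )
}

# Example data: the four numeric measurements of the built-in iris dataset.
checks <- check_fa_assumptions(iris[, 1:4])
str(checks)
```

A low share of correlations above the cutoff suggests the variables are not interrelated enough for factor analysis to find common factors.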
8. Basic Difference between PCA and FA
PCA: the total variance is analyzed, that is, shared (common) variance, unique variance
and error variance together.
FA: only the variance that a variable shares with the other variables (the common
variance) is analyzed; unique and error variance are left out of the factors.
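The difference can be made concrete with base R. This is a sketch on simulated data (the two-factor structure and noise levels are assumptions). It uses `factanal()` from the stats package as a stand-in for a factor analysis routine, which is not the function used later in this deck.

```r
set.seed(7)
n <- 400
f1 <- rnorm(n)
f2 <- rnorm(n)

# Simulated data (illustrative): two latent factors, three indicators each.
x <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.8), a3 = f1 + rnorm(n, sd = 1.0),
  b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.8), b3 = f2 + rnorm(n, sd = 1.0)
)
p <- ncol(x)

# PCA on standardized variables: the components together carry ALL the variance,
# so the component variances sum to the number of variables.
pca <- prcomp(x, scale. = TRUE)
total_variance_pca <- sum(pca$sdev^2)

# Factor analysis: each variable is split into a shared part (communality)
# and a part the factors do not explain (uniqueness).
fa_fit <- factanal(x, factors = 2)
communality <- rowSums(unclass(fa_fit$loadings)^2)
uniqueness <- fa_fit$uniquenesses

total_variance_pca        # equals p
sum(communality)          # less than p: only shared variance is modeled
round(cbind(communality, uniqueness), 2)
```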
9. Performing PCA
We will use the built-in dataset mtcars. The dataset has 32 observations of 11 variables.
It gives 11 features like ‘miles per gallon’, ‘number of cylinders’, ‘horsepower’, etc.
for 32 different models of cars. In the dataset, there are two binary categorical variables.
The first is ‘vs’, which shows whether the car’s engine is V-shaped (0) or straight (1).
The second is ‘am’, which shows whether the car has an automatic (0) or a manual (1)
transmission.
We will leave these two variables out of the analysis, as PCA is meant for numeric data
and cannot deal with categorical variables.
We will compute the principal components using the prcomp() function.
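A minimal sketch of this step follows. Dropping the two columns by name and standardizing the variables (`center = TRUE`, `scale. = TRUE`) are choices made here; the deck does not say which options it used.

```r
# Keep the 9 numeric variables; drop the two binary ones by name.
mtcars_num <- mtcars[, !(names(mtcars) %in% c("vs", "am"))]

# Standardize before PCA because the variables are on very different scales
# (for example, displacement versus number of carburetors).
mtcars_pca <- prcomp(mtcars_num, center = TRUE, scale. = TRUE)

# Standard deviation and proportion of variance for each component.
summary(mtcars_pca)

# Proportion of variance explained, computed by hand from the same object.
var_explained <- mtcars_pca$sdev^2 / sum(mtcars_pca$sdev^2)
round(var_explained, 3)
```

The first few entries of `var_explained` show how many components are needed to cover most of the variance in the data.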
11. Factor Analysis in R
Factor analysis (FA), or exploratory factor analysis, is another technique to reduce
the number of variables to a smaller set of factors. FA identifies the relationships
among a set of variables and summarizes them with a smaller set of factors.
We will be using the bfi dataset, which comes with the psych package (recent versions
provide it through the companion psychTools package). It comprises responses to 25
personality self-report items plus three demographic variables (gender, education, age).
We will require the psych and GPArotation packages, so install them and load them
with library().
12. Code
library(psych)
parallel <- fa.parallel(bfi, fm = "minres", fa = "fa")
Output
Parallel analysis suggests that the number of factors = 7 and the number of
components = NA
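The idea behind parallel analysis can be sketched in base R: factors are retained only while the eigenvalues of the observed correlation matrix exceed those obtained from random data of the same size. This is a simplified, eigenvalue-based version run on simulated data. It is not the exact procedure that fa.parallel() implements, and the 95th percentile threshold and the 200 random draws are assumptions.

```r
set.seed(42)
n <- 300

# Simulated data (an assumption for illustration): two latent factors,
# each driving three observed variables.
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(
  f1 + rnorm(n, sd = 0.6), f1 + rnorm(n, sd = 0.6), f1 + rnorm(n, sd = 0.6),
  f2 + rnorm(n, sd = 0.6), f2 + rnorm(n, sd = 0.6), f2 + rnorm(n, sd = 0.6)
)

# Eigenvalues of the observed correlation matrix.
observed_eig <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random, uncorrelated data of the same shape (one column per draw).
random_eig <- replicate(200, {
  noise <- matrix(rnorm(n * ncol(x)), nrow = n)
  eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
})

# 95th percentile of the random eigenvalues at each position.
threshold <- apply(random_eig, 1, quantile, probs = 0.95)

# Count the leading eigenvalues that beat the random benchmark.
n_retained <- sum(cumprod(observed_eig > threshold))
n_retained
```

With a real dataset, `x` would be replaced by the numeric columns of interest, and the retained count plays the role of the suggested number of factors.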
13. Applying Factor Analysis
Now that we know how many factors we need, we can perform the factor analysis
using the fa() function.
factors <- fa(bfi, nfactors = 7, rotate = "oblimin", fm = "minres")
print(factors)
14. Summary
PCA and factor analysis in R are both multivariate analysis techniques. Both
reduce the number of variables while retaining as much of the relevant
variance as possible. The main difference between the two methods lies in
the new variables they derive.
The principal components are normalized linear combinations of the
original variables. The factors are latent variables in a measurement model
that explains the observed variables.
While both techniques serve the same purpose, they take different
approaches and give different results.
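The statement that principal components are normalized linear combinations of the original variables can be verified directly. The sketch below uses four mtcars columns chosen only for illustration.

```r
# Standardize a few numeric columns of mtcars (the column choice is arbitrary).
x <- scale(mtcars[, c("mpg", "disp", "hp", "wt")])
pca <- prcomp(x)

# Each column of `rotation` holds the weights of one component.
# "Normalized" means every weight vector has length 1.
weight_norms <- colSums(pca$rotation^2)
weight_norms

# The component scores are the data multiplied by those weights.
scores_manual <- x %*% pca$rotation
max_difference <- max(abs(scores_manual - pca$x))
max_difference  # effectively zero
```

Factor scores, by contrast, are estimates of latent variables under a model and cannot be obtained by such an exact re-weighting of the data.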