1. QCI 5-Day Online Workshop on
Enhancing Research Capability
Dr. Jai Singh
FACTOR ANALYSIS
2. Impact of Covid-19 on the Health Sector
• Insurance Industry
• Pharmaceutical Companies
• Physicians
• Patients
• Government
• Nurses/carers, staff, unions
• Voluntary organisations
• Social services
• Local health authority
• Primary care groups
• Local health groups
3. Factor Analysis
• Factor Analysis (FA) is an exploratory technique applied to
a large set of observed variables to find a small number of
underlying factors (subsets of variables) from which the
observed variables were generated.
Ex.- An individual’s response to the questions on a college
entrance test is influenced by underlying variables such as
intelligence, years in school, age, emotional state on the day
of the test, amount of practice taking tests, and so on.
Ex.- The analyst hopes to reduce the interpretation of a 200-
question test to the study of 4 or 5 factors.
4. Continued
• Factor analysis is a procedure used to reduce a
large number of questions (condense the data) into
a few variables (factors) according to their
relevance.
• Factors are assumed to represent dimensions
within data.
• The answers to the questions are the observed
variables. The underlying, influential variables are
the factors.
• The most common technique is known as
Principal Component Analysis (PCA).
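The PCA extraction named above can be sketched numerically. This is an illustrative sketch only: the data set, variable counts, and use of numpy are my assumptions, not material from the workshop.

```python
import numpy as np

# Hypothetical responses: 5 respondents x 3 survey items on a 7-point scale.
X = np.array([
    [7, 6, 7],
    [2, 3, 2],
    [5, 5, 6],
    [1, 2, 1],
    [6, 7, 6],
], dtype=float)

# PCA on the correlation matrix via eigendecomposition.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each eigenvalue is the variance one component (factor) explains;
# components come out in order of variance explained.
proportion = eigvals / eigvals.sum()
```

The eigenvalues sum to the number of variables, so `proportion` gives each factor's share of the total variance.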
6. Continued
• Factor analysis is a useful tool for investigating variable
relationships for complex concepts such as socioeconomic
status, dietary patterns, or psychological scales.
• It allows researchers to investigate concepts that are not easily
measured directly by collapsing a large number of variables
into a few interpretable underlying factors.
• Factor analysis is commonly used in:
-Scale development
-The evaluation of the psychometric quality of a measure, and
-The assessment of the dimensionality of a set of variables.
7. Factor Analysis –Structural Aspect
• Factor analysis provides a tool
for analyzing the structure of
interrelationships
(correlations) among
variables by defining sets of
highly correlated variables,
known as factors.
8. What is a factor?
• Multiple observed variables have similar patterns of
responses because they are all associated with a latent (i.e.
not directly measured) variable.
- Ex.- People may respond similarly to questions about
income, education, and occupation, which are all associated
with the latent variable socioeconomic status.
• Initially, a factor analysis yields as many factors as there
are variables. Each factor captures a certain amount of the
overall variance in the observed variables.
• Factors are always listed in order of how much variation
they explain.
9. Exploratory Factor Analysis (EFA)
Exploratory - When the dimensions/factors are theoretically unknown
- Exploratory factor analysis is a statistical approach that can be used
to analyze interrelationships (correlation) among a large number of
variables in a data set and to explain these variables in terms of a
smaller number of common underlying dimensions.
- This involves finding a way of condensing the information
contained in the original variables into a smaller set of
implicit variables (called factors) with a minimum loss of
information.
- This type of analysis provides a factor structure (a grouping of
variables based on strong correlations).
10. Confirmatory Factor Analysis
• Confirmatory – when the researcher has preconceived ideas
about the actual structure of the data, based on theoretical
support or prior research.
• The researcher may wish to test hypotheses such as which
variables should be grouped together on a factor.
• Example – A retail firm identified 80 characteristics of retail
stores and their services that consumers mentioned as
affecting their patronage choice among stores. The retailer
wants to find the broader dimensions on which to conduct a
survey.
11. Exploratory Factor Analysis
• Assumptions:
Metric Data (Interval)
Some degree of multicollinearity must be present, i.e.,
correlation among variables
Adequate sample size
• Purpose
Obtaining independent factors
Data Reduction
12. Extract Factor (Example-1)
• Several questions closely related to aspects of
customer satisfaction
-How satisfied are you with our product?
-Would you recommend our product to a friend or family
member?
-How likely are you to purchase our product in the future?
13. Continued
• We want one variable to represent a customer satisfaction
score.
- One option would be to average the three question
responses.
- Another option would be to create a factor dependent
variable.
• This can be done by running PCA and keeping the first
principal component (also known as a factor).
- The advantage of PCA over an average is that it automatically
weights each of the variables in the calculation.
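A minimal sketch of this comparison, assuming numpy and made-up answers to the three questions (none of this data is from the slides): keep the first principal component of the standardized answers as the satisfaction score, and compare it with the simple average.

```python
import numpy as np

# Hypothetical answers to the three satisfaction questions (1-7 scale).
X = np.array([
    [7, 6, 7],
    [3, 4, 3],
    [5, 6, 5],
    [2, 2, 3],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each question
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
w = eigvecs[:, np.argmax(eigvals)]          # first-PC weights (data-driven)

pca_score = Z @ w                           # factor-based satisfaction score
simple_avg = X.mean(axis=1)                 # equal-weight alternative
```

Here the weights `w` come from the data itself, whereas the simple average weights every question equally; that is the automatic weighting advantage the slide describes.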
14. Example-2
• Purchase barriers of potential customers
• Possible barriers to purchase
- Factor analysis can uncover how these questions tend to
move together.
- Loadings for 3 factors for each of the variables.
16. Principal component weights for the variables
• The first component heavily weights variables
related to cost,
• The second weights variables related to IT, and
• The third weights variables related to
organizational factors.
- We can give our new super variables appropriate
names.
17. If we were to cluster the customers based on these three
components, we can see some trends. Customers tend to be high
in Cost barriers or Org barriers, but not both.
19. Scope of Factor analysis
Psychographics (Agree/Disagree):
• I value family
• I believe brand represents value
Behavioral (Agree/Disagree):
• I purchase the cheapest option
• I am a bargain shopper
Attitudinal (Agree/Disagree):
• The economy is not improving
• I am pleased with the product
Activity-Based (Agree/Disagree):
• I love sports
• I sometimes shop online during
work hours
Behavioral and psychographic questions are especially suited
for factor analysis.
20. Eigenvalue
• The eigenvalue is a measure of how much of the variance of the
observed variables a factor explains. Eigenvalues represent the total
amount of variance that can be explained by a given principal component.
• Any factor with an eigenvalue ≥ 1 explains at least as much variance as a
single observed variable.
• If the factor for socioeconomic status had an eigenvalue of 2.3, it would
explain as much variance as 2.3 of the three observed variables.
• If eigenvalues are greater than zero, then it’s a good sign.
• Since variance cannot be negative, negative eigenvalues imply the model is
ill-conditioned.
• Eigenvalues close to zero imply there is item multicollinearity, since all the
variance can be taken up by the first component.
• The factors that explain the least amount of variance are generally
discarded.
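These properties of eigenvalues can be checked numerically; the correlation matrix below is made up for illustration (numpy assumed, not part of the slides):

```python
import numpy as np

# Hypothetical correlation matrix for income, education, occupation.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

total = eigvals.sum()            # equals the number of variables (trace of R)
proportion = eigvals / total     # share of total variance per factor
# An eigenvalue > 1 means that factor explains more than one variable's
# worth of variance; a negative eigenvalue would signal an ill-conditioned
# (not positive semi-definite) correlation matrix.
```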
22. What are factor loadings?
Variables                                           Factor 1   Factor 2
Income                                              0.65       0.11
Education                                           0.59       0.25
Occupation                                          0.48       0.19
House value                                         0.38       0.60
Number of public parks in neighborhood              0.13       0.57
Number of violent crimes per year in neighborhood   0.23       0.55
The relationship of each variable to the underlying factor is expressed
by the so-called factor loading.
Example- indicators of wealth with six variables and two resulting
factors.
23. Interpretation
• The variable income has the strongest association with the underlying latent
variable, Factor 1, with a factor loading of 0.65.
• Two other variables, education and occupation, are also associated with
Factor 1. Based on the variables loading highly onto Factor 1, we could
call it “Individual socioeconomic status.”
• House value, number of public parks, and number of violent crimes per
year, however, have high factor loadings on the other factor, Factor 2.
They seem to indicate the overall wealth within the neighborhood, so we
may want to call Factor 2 “Neighborhood socioeconomic status.”
• Since factor loadings can be interpreted like standardized regression
coefficients, one could also say that the variable income has a correlation
of 0.65 with Factor 1.
24. Continued
• Variable house value also is marginally important
in Factor 1 (loading = 0.38).
• This makes sense, since the value of a person’s
house should be associated with his or her
income.
• A loading of .50 denotes that 25% of the variable's variance
is accounted for by the factor.
• The loading must exceed .70 for the factor to account for
50% of the variance.
• Loadings above .5 are considered practically significant,
and loadings above .7 indicate a well-defined structure.
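The squared-loading arithmetic behind these thresholds is easy to verify (pure illustration, not from the slides):

```python
# Squared factor loading = share of a variable's variance explained by the
# factor, since loadings can be read as correlations with the factor.
variance_explained = {l: l ** 2 for l in (0.38, 0.50, 0.65, 0.70)}
# 0.50 -> 0.25 (25% of the variance); 0.70 -> 0.49 (just under half), which
# is why a loading must exceed .70 before the factor accounts for 50% of a
# variable's variance.
```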
25. Communalities
• Communalities indicate the amount of variance in each
variable that is accounted for by the components.
• Initial communalities are estimates of the variance in each
variable accounted for by all components or factors.
• For principal components extraction, this is always equal to
1.0 for correlation analyses.
• Extraction communalities are estimates of the variance in each
variable accounted for by the components.
• High communalities indicate that the extracted
components represent the variables well.
• If any communalities are very low in a principal components
extraction, you may need to extract another component.
26. Variance in factor analysis
• Two types of variance- common and unique
• Common variance is the amount of variance that is shared
among a set of items. Items that are highly correlated will share
a lot of variance.
– Communality (also called h²) is a definition of common variance
that ranges between 0 and 1. Values closer to 1 suggest that
extracted factors explain more of the variance of an individual item.
• Unique variance is any portion of variance that’s not common.
There are two types:
– Specific variance: is variance that is specific to a particular item
– Error variance: comes from errors of measurement and basically
anything unexplained by common or specific variance.
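Communality and unique variance can be computed directly from a loading matrix; here is a sketch using the loading values from the earlier wealth example (numpy assumed):

```python
import numpy as np

# Loadings for the six wealth indicators on the two factors
# (values from the earlier example table).
loadings = np.array([
    [0.65, 0.11],   # income
    [0.59, 0.25],   # education
    [0.48, 0.19],   # occupation
    [0.38, 0.60],   # house value
    [0.13, 0.57],   # public parks in neighborhood
    [0.23, 0.55],   # violent crimes per year in neighborhood
])

# h² per variable: sum of squared loadings across the extracted factors.
communality = (loadings ** 2).sum(axis=1)
# Whatever common variance does not explain is unique
# (specific + error) variance.
unique_variance = 1.0 - communality
```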
27. Kaiser-Meyer-Olkin (KMO) Test
- Measure of how suited data is for Factor
Analysis.
- Sampling adequacy for each variable in the
model and for the complete model.
- Measure of the proportion of variance among
variables that might be common (shared) variance.
- The higher this proportion, the better suited the
data are to Factor Analysis.
28. KMO Values
• KMO values between 0 and 1.
• A rule of thumb for interpreting the statistic
- KMO values between 0.8 and 1 indicate the
sampling is adequate.
- KMO values less than 0.6 indicate the sampling is
not adequate and that remedial action should be
taken.
- Sometimes KMO values between 0.5 and 0.6
are accepted.
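The overall KMO statistic can be sketched directly from a correlation matrix: it compares squared correlations with squared partial correlations, where the partial correlations come from the inverse of the correlation matrix. The matrix below is made up for illustration (numpy assumed).

```python
import numpy as np

# Hypothetical correlation matrix for four survey items.
R = np.array([[1.0, 0.7, 0.6, 0.5],
              [0.7, 1.0, 0.6, 0.5],
              [0.6, 0.6, 1.0, 0.4],
              [0.5, 0.5, 0.4, 1.0]])

# Partial correlations from the inverse of R.
S = np.linalg.inv(R)
P = -S / np.sqrt(np.outer(np.diag(S), np.diag(S)))

mask = ~np.eye(len(R), dtype=bool)          # off-diagonal entries only
r2 = (R[mask] ** 2).sum()
p2 = (P[mask] ** 2).sum()
kmo = r2 / (r2 + p2)                        # closer to 1 = better suited
```

When the partial correlations are small relative to the raw correlations, `kmo` approaches 1 and the data are well suited to factor analysis.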
29. Bartlett test of sphericity
Checks statistical significance that correlation matrix has significant
correlation among at least some variables
• The Bartlett test should be significant, i.e. p less than 0.05; this means
that the variables are correlated highly enough to provide a reasonable
basis for factor analysis.
• Correlation matrix is significantly different from an identity matrix
in which correlations between variables are all zero.
• Test compares an observed correlation matrix to the identity matrix.
Null hypothesis of the test –variables are orthogonal, i.e. not correlated.
Alternative hypothesis – variables are not orthogonal, i.e. they are
correlated enough that the correlation matrix diverges significantly from
the identity matrix.
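Bartlett's test statistic can be computed from the determinant of the correlation matrix. A sketch under illustrative assumptions (the matrix and sample size are made up; the 12.59 critical value is the chi-square cutoff at alpha = 0.05 with 6 degrees of freedom):

```python
import math
import numpy as np

# Hypothetical correlation matrix for four items, n = 200 respondents.
R = np.array([[1.0, 0.7, 0.6, 0.5],
              [0.7, 1.0, 0.6, 0.5],
              [0.6, 0.6, 1.0, 0.4],
              [0.5, 0.5, 0.4, 1.0]])
n, p = 200, R.shape[0]

# Bartlett's sphericity statistic: chi2 = -(n - 1 - (2p + 5)/6) * ln|R|.
# For an identity matrix |R| = 1 and the statistic is 0; correlations
# shrink the determinant and inflate the statistic.
chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(np.linalg.det(R))
df = p * (p - 1) // 2                       # 6 degrees of freedom here

reject_identity = chi2 > 12.59              # chi-square cutoff, df=6, a=0.05
```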
31. How many factors should be extracted?
There are some criteria, but no 100% foolproof statistical test exists.
• Drawing a scree plot: Connect the eigenvalues (representing the variance
explained by each factor) for the possible factors from maximum to
minimum. The adequate number of factors is the number before the sudden
downward inflection of the plot.
• Parallel analysis: Compare the actual scree plot with a scree plot
based on randomly resampled data. The adequate number of
factors is at the crossing point of the two plots.
• Eigenvalues > 1: Eigenvalues sum to the number of items, so a factor with
an eigenvalue greater than 1 is more informative than a single average item.
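Two of these criteria can be sketched on simulated data (all data and names below are illustrative assumptions): the Kaiser rule counts eigenvalues above 1, and parallel analysis keeps only the eigenvalues that exceed those of random, uncorrelated data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 respondents, 6 items built from two latent factors
# (two correlated blocks of three items each).
n, p = 200, 6
base = rng.normal(size=(n, 2))
X = np.hstack([base[:, [0]] + 0.5 * rng.normal(size=(n, 3)),
               base[:, [1]] + 0.5 * rng.normal(size=(n, 3))])

R = np.corrcoef(X, rowvar=False)
obs = np.sort(np.linalg.eigvalsh(R))[::-1]          # observed scree

# Kaiser rule: keep eigenvalues above 1.
n_kaiser = int(np.sum(obs > 1.0))

# Parallel analysis: average scree from random (uncorrelated) data.
sims = []
for _ in range(100):
    Xr = rng.normal(size=(n, p))
    sims.append(np.sort(np.linalg.eigvalsh(np.corrcoef(Xr, rowvar=False)))[::-1])
ref = np.mean(sims, axis=0)

n_parallel = int(np.sum(obs > ref))   # factors above the random baseline
```

Both criteria should recover the two latent factors that generated the data.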
32. Factor rotation
• Unrotated factors often do not provide
the required information.
• Axis of the factors can be rotated within
the multidimensional variable space while
determining the best fit between the
variables and the latent factors.
• Factor rotation improves interpretation of
data by reducing ambiguities.
• Rotation – the reference axes of the factors are
turned about the origin until some best position
has been achieved.
• The unrotated factor solution extracts factors
in order of the variance they extract.
• The effect of rotation is to redistribute the variance
from earlier factors to later ones.
33. Orthogonal and oblique factor rotation
• Orthogonal factor rotation
- Axes are maintained at 90 degrees.
- More suitable when research goal is data
reduction
- Rotations that assume the factors are not
correlated are called orthogonal rotations.
• Oblique factor rotation
- Axes are rotated but do not retain the
90-degree angle between reference axes.
- Oblique is more flexible.
- Best suited to the goal of obtaining
several theoretically meaningful factors.
- Rotations that allow for correlation are
called oblique rotations.
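Varimax is a common orthogonal rotation; a compact sketch of the standard SVD-based algorithm is below. The loading values reuse the earlier wealth example; the implementation and parameter choices are illustrative, not from the slides.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=50, tol=1e-6):
    """Orthogonally rotate loading matrix L toward simple structure."""
    p, k = L.shape
    R = np.eye(k)                    # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lam = L @ R
        # Varimax criterion gradient step, solved via SVD.
        u, s, vt = np.linalg.svd(
            L.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag(np.diag(Lam.T @ Lam)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):    # stop when the criterion plateaus
            break
        d = d_new
    return L @ R

# Unrotated loadings for the six wealth indicators on two factors.
L = np.array([
    [0.65, 0.11],
    [0.59, 0.25],
    [0.48, 0.19],
    [0.38, 0.60],
    [0.13, 0.57],
    [0.23, 0.55],
])
rotated = varimax(L)
```

Because the rotation matrix is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; rotation only redistributes variance between the factors.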
34. Correlation between factors
• A person with high individual socioeconomic
status (Factor 1) likely also lives in an area
that has a high neighborhood socioeconomic
status (Factor 2).
• That means the factors should be correlated.
• The two axes of the two factors are probably
closer together than orthogonal rotation
axes would be.
• The angle between the two factors is now
smaller than 90 degrees,
meaning the factors are now correlated.
• In this example, an oblique rotation
accommodates the data better than an
orthogonal rotation.
35. Sample Example
A researcher is
interested in investigating
the reasons for choosing
a university for
education. Several
variables were identified
that influence
individuals
(guardians/students) to
choose a university.
• Variables-
1. Cost of Education
2. Quality of Education
3. Availability of experts and modern
laboratories
4. Having own campus and security
5. Numbers of years operating
6. Number of graduates in job and abroad
7. International recognition and
8. Accommodations and food
Seven Point Scale; 1=Not Important to 7=Very
Important