This document outlines an agenda for a university course on classification methods taught by Dr. S. Shivendu. The objectives of the course are to understand statistical concepts of principal component analysis and factor analysis, interpret results, and use SAS software. The agenda includes overviews of statistical concepts, principal component analysis, factor analysis, and textbooks. The document provides background information on these topics, including why they are used, basic concepts, properties, assumptions, and how the analyses work.
1. University of South Florida
Classification Methods
Dr. S. Shivendu
2.
Objectives
Overview of Classification Methods
01 Understand statistical concepts of Principal Component Analysis and Factor Analysis.
02 Interpret results of Principal Component Analysis and Factor Analysis.
03 Be able to debug SAS programs and export data from SAS files.
3.
Agenda
Overview of Statistical Concepts
Principal Component Analysis
Factor Analysis
4.
Course Textbooks
5.
• We study phenomena that cannot be directly observed
– ego, personality, intelligence in psychology
– Underlying factors that govern the observed data
• We want to identify and operate with underlying latent factors rather than the
observed data
– E.g. topics in news articles
– Transcription factors in genomics
• We want to discover and exploit hidden relationships
– “beautiful car” and “gorgeous automobile” are closely related
– So are “driver” and “automobile”
– But does your search engine know this?
– Reduces noise and error in results
Why Principal Component or Factor Analysis?
6.
• We have too many observations and dimensions
– To reason about or obtain insights from
– To visualize
– Too much noise in the data
– Need to “reduce” them to a smaller set of factors
– Better representation of data without losing much information
– Can build more effective data analyses on the reduced-dimensional space:
classification, clustering, pattern recognition
• Combinations of observed variables may be more effective bases for insights, even if
physical meaning is obscure
Why Principal Component or Factor Analysis?
7.
• Discover a new set of factors/dimensions/axes against which to represent, describe or
evaluate the data
– For more effective reasoning, insights, or better visualization
– Reduce noise in the data
– Typically a smaller set of factors: dimension reduction
• Factors are combinations of observed variables
– May be more effective bases for insights, even if physical meaning is obscure
– Observed data are described in terms of these factors rather than in terms of
original variables/dimensions
Factor or Component Analysis
8.
Basic Concept
• Areas of variance in data are where items can be best discriminated and key
underlying phenomena observed
• Areas of greatest “signal” in the data
• If two items or dimensions are highly correlated or dependent
• They are likely to represent highly related phenomena
• If they tell us about the same underlying variance in the data, combining them to
form a single measure is reasonable
• Parsimony
• Reduction in Error
• So we want to combine related variables, and focus on uncorrelated or independent
ones, especially those along which the observations have high variance
• We want a smaller set of variables that explain most of the variance in the original
data, in more compact and insightful form
9.
Basic Concept
• What if the dependences and correlations are not so strong or direct?
• And suppose you have 3 variables, or 4, or 5, or 10000?
• Look for the phenomena underlying the observed covariance/co-dependence in a set
of variables
• Once again, phenomena that are uncorrelated or independent, and especially
those along which the data show high variance
• These phenomena are called “factors” or “principal components” or “independent
components,” depending on the methods used
• Factor analysis: based on variance/covariance/correlation
• Independent Component Analysis: based on independence
10.
Principal Component Analysis
• Most common form of factor analysis
• The new variables/dimensions
• Are linear combinations of the original ones
• Are uncorrelated with one another
• Orthogonal in original dimension space
• Capture as much of the original variance in the data as possible
• Are called Principal Components
11.
Principal Component Analysis: An Overview
• It is a mathematical tool from applied linear algebra.
• It is a simple, non-parametric method for extracting relevant information from a data set with a large number of variables.
• It provides a roadmap for how to reduce a complex data set to a lower dimension.
12.
Let’s start with first principles: Variance
13.
So, what is Covariance?
14.
What do we understand by Covariance?
15.
So, for Covariance
16.
Why Covariance?
• Why bother calculating (expensive) covariance when we could just plot the two variables against each other to see their relationship?
• Covariance calculations are used to find relationships between dimensions in high
dimensional data sets (usually greater than 3) where visualization is difficult.
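As a concrete illustration, the sample covariance of two variables can be computed directly from its definition. The data below (study hours and exam scores) are hypothetical, invented only for this sketch:

```python
# Hypothetical data: study hours (x) and exam scores (y) for 5 students.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [50.0, 55.0, 65.0, 70.0, 85.0]

def mean(v):
    return sum(v) / len(v)

def sample_cov(a, b):
    """Sample covariance: sum of centered cross-products over n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

print(sample_cov(x, y))  # 42.5 -> positive: scores rise with hours
print(sample_cov(x, x))  # 10.0 -> covariance of x with itself is its variance
```

Note the second call: the covariance of a variable with itself reduces to its variance, which is why the two concepts sit on the same diagonal of a covariance matrix.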
17.
Another concept: Linear Independence
• A set of n-dimensional vectors x_i ∈ R^n is said to be linearly independent if none of them can be written as a linear combination of the others.
• In other words, c_1x_1 + c_2x_2 + … + c_kx_k = 0 only when c_1 = c_2 = … = c_k = 0.
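A quick numeric check, assuming 2-D vectors: stack the vectors into a square matrix and test whether its determinant is zero (zero determinant means one vector is a multiple or combination of the others). The vectors below are made up for illustration:

```python
def det2(m):
    """Determinant of a 2x2 matrix given as two row vectors."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

v1, v2 = [1.0, 2.0], [2.0, 4.0]   # v2 = 2 * v1 -> linearly dependent
u1, u2 = [1.0, 0.0], [1.0, 1.0]   # neither is a multiple of the other

print(det2([v1, v2]))  # 0.0 -> linearly dependent
print(det2([u1, u2]))  # 1.0 -> linearly independent
```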
18.
Linear Independence continued
19.
Some other terms: Span and Basis
20.
Orthonormal and Orthogonal Basis
21.
Some more basic terms: the Eigenvalue Problem and Eigenvectors
22.
How to calculate Eigenvalues and Eigenvectors?
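For a 2×2 matrix, the eigenvalue problem A v = λv can be solved by hand from the characteristic equation det(A − λI) = 0, which is a quadratic in λ. A minimal sketch with a made-up symmetric matrix:

```python
import math

# Solve A v = lambda v for a symmetric 2x2 matrix via the
# characteristic equation det(A - lambda I) = 0.
A = [[2.0, 1.0],
     [1.0, 2.0]]

tr = A[0][0] + A[1][1]                      # trace
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # determinant
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
print(lam1, lam2)  # 3.0 1.0

# Eigenvector for lam1: (A - lam1 I) v = 0 gives the direction (1, 1)
v = [1.0, 1.0]
Av = [A[0][0] * v[0] + A[0][1] * v[1],
      A[1][0] * v[0] + A[1][1] * v[1]]
print(Av)  # [3.0, 3.0] = lam1 * v, confirming A v = lambda v
```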
23.
Calculating Eigenvalues and Eigenvector (you can skip this)
24.
Properties of Eigenvalues and Eigenvectors
25.
So, What is Principal Component Analysis?
• An example:
26.
Example (continued)
27.
In other words:
28.
Which parameters do we want to keep, and which do we want to ignore?
• Which parameters to keep:
• Parameters that don't depend on the others (e.g., eye color), i.e., that are uncorrelated and hence have low covariance with the rest
• Parameters that change a lot: high variance
• Which parameters to drop:
• Constant parameters (e.g., number of heads)
• Parameters with very low variance (e.g., thickness of hair)
• Parameters that are linearly dependent on other parameters: Z = aX + bY
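The keep/drop heuristic above can be sketched as a simple variance filter. The feature table and the 1e-3 threshold below are hypothetical choices made only for this illustration:

```python
# Hypothetical feature table: each column is a candidate parameter.
rows = [
    # heads  hair_mm  height_cm  weight_kg
    (1,      0.071,   160.0,     55.0),
    (1,      0.070,   175.0,     70.0),
    (1,      0.072,   170.0,     65.0),
    (1,      0.071,   182.0,     80.0),
]

def variance(col):
    """Sample variance of one column."""
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / (len(col) - 1)

names = ["heads", "hair_mm", "height_cm", "weight_kg"]
for i, name in enumerate(names):
    var = variance([r[i] for r in rows])
    verdict = "drop" if var < 1e-3 else "keep"   # arbitrary cutoff for the sketch
    print(name, verdict)
```

On this table, "heads" (constant) and "hair_mm" (near-constant) are dropped, while the high-variance columns survive; checking for exact linear dependence (Z = aX + bY) would need the determinant test from the linear-independence slide.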
29.
Now we are ready!
30.
Core principles of Principal Component Analysis (PCA)
• PCA uses linear algebra to find a representation that "best expresses" the raw data along a minimum number of dimensions.
• PCA allows us to filter out noise and extract the relevant information from the
given data set.
• Hence, the representation we are looking for is such that it decreases both
noise and redundancy in the data set at hand.
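Putting the pieces together, the PCA recipe (center, covariance matrix, eigendecomposition, projection) can be sketched in plain Python for a 2-D case. The ten data points are hypothetical:

```python
import math

# PCA sketch on hypothetical 2-D data: center, build the sample
# covariance matrix, eigendecompose it, project onto PC1.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# Sample covariance matrix [[sxx, sxy], [sxy, syy]]
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Eigenvalues of the symmetric 2x2 matrix via the characteristic equation
tr, det = sxx + syy, sxx * syy - sxy * sxy
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2   # lam1 >= lam2

# Unit eigenvector for lam1: the first principal direction
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# 1-D scores: projection of each centered point onto PC1
scores = [x * vx + y * vy for x, y in centered]

# Share of the total variance retained by keeping only PC1
print(round(lam1 / (lam1 + lam2), 3))  # 0.963
```

Keeping only the first component reduces the data from two dimensions to one while retaining about 96% of the variance, which is exactly the noise/redundancy trade-off described above.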
31.
What is noise?
32.
What is redundancy?
33.
Goal of PCA:
34.
PCA assumption and how it works
35.
How does it work?
36.
In summary
37.
Factor Analysis
• General Concepts
• Factor analysis provides information about reliability, item quality, and
construct validity
• General goal is to understand whether and to what extent items from a
scale may reflect an underlying hypothetical construct or constructs, known
as factors
• An analytic method that is highly sensitive for identifying problematic items and assessing the number of factors
• Useful for analysis of survey data
38.
General concepts (contd.)
• In general, factor analysis methods decompose (or break down) the
covariation among items in a measure into meaningful components
• Higher inter-item correlations should reflect greater overlap in what the
items measure, and, therefore, higher inter-item correlations reflect higher
internal reliability
39.
Objectives
• What is factor analysis?
• What do we need factor analysis for?
• What are the modeling assumptions?
• How to specify, fit, and interpret factor models?
• What is the difference between exploratory and confirmatory factor analysis?
• What is and how to assess model identifiability?
40.
What is factor analysis?
• Factor analysis is a theory-driven statistical data-reduction technique used to explain covariance among observed random variables in terms of fewer unobserved random variables called factors
41.
General concept
• In practice, a factor cannot be estimated with one item
• A factor should be estimated with three or more items; items that correlate more highly with the factor contribute more to the measure
42.
General Concepts
• Items are referred to as indicators
• Regression slopes between the factor and its indicators are referred to as loadings
43.
General Concepts
• Patterns of high inter-item correlations among subsets of items suggest more
than one factor because the items tend to “cluster” together
• Any number of factors might underlie a set of items, up to the total number
of items (which would imply no common factor)
• Example: set of six items might assess extroversion and openness
44.
General Concepts
45.
General Concepts
• We never know the meaning of the factors; we can only use theory to decide what they mean and then test their validity
• The factors may be related or not related—correlated or orthogonal
(uncorrelated)
• If those who are extroverted tend to be a little more open, then the factors
are correlated (contrary to what is suggested by the table)
46.
Types of Factor Analysis
• Two major types of factor analysis
• Exploratory factor analysis (EFA)
• Confirmatory factor analysis (CFA)
• Major difference is that EFA seeks to discover the number of factors and does
not specify which items load on which factors
47.
Exploratory Factor Analysis
• The researcher may discover there is one factor underlying the items or
many factors
• Items may be eliminated by the researcher if they do not load highly
• Researchers choose items that load highly on one factor and low on other
factors to achieve simple structure
• Composite scale scores often created based on the factor analysis to be
used in further research
48.
Exploratory Factor Analysis
• An initial analysis called principal components analysis (PCA) is first
conducted to help determine the number of factors that underlie the set of
items
• PCA is the default EFA method in most software and the first stage in other
exploratory factor analysis methods to select the number of factors
• PCA is not considered a “true factor analysis method,” because measurement
error is not estimated.
49.
EFA
• PCA gives eigenvalues for the number of components (factors) equal to the number of items
• If 12 items, there will be 12 eigenvalues
• Each component is a potential “cluster” of highly inter-correlated items
• Eigenvalues represent the amount of variance accounted for by each component, but they
are not in a standardized metric
• Larger eigenvalues indicate a more important (and more likely real) component or factor; some components merely reflect unimportant factors or random variation
• The values sum to the number of items, so if 12 items, then there will be 12 eigenvalues that
sum to 12
• The proportion or percentage of (co)variance accounted for by each factor can be calculated by dividing its eigenvalue by the number of items
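To make the arithmetic concrete: with 12 standardized items, the 12 eigenvalues sum to 12, and each eigenvalue divided by 12 gives the proportion of variance its component accounts for. The eigenvalues below are made up for illustration:

```python
# Hypothetical eigenvalues from a PCA of 12 standardized items.
eigenvalues = [4.8, 2.4, 1.2, 0.9, 0.7, 0.5, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]

n_items = len(eigenvalues)

# Proportion of variance accounted for by each component
proportions = [ev / n_items for ev in eigenvalues]
print(round(proportions[0], 2))  # first component: 4.8 / 12 = 0.4
```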
50.
EFA
• There are several possible rules which may be used for choosing the number of factors
based on eigenvalues
• The usual eigenvalue-greater-than-1.0 rule (the Kaiser-Guttman rule) does not seem to work best
• Most use the scree plot and a subjective scree test by identifying the biggest drop in
eigenvalues
• The scree test seems to work well for identifying the correct number of factors
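Both selection rules can be coded in a few lines. Note that on the same (hypothetical) eigenvalues they may disagree, which is part of why the choice involves subjective judgment:

```python
# Hypothetical eigenvalues from a PCA of 12 items.
eigenvalues = [4.8, 2.4, 1.2, 0.9, 0.7, 0.5, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]

# Kaiser-Guttman rule: retain components with eigenvalue > 1.0
kaiser = sum(1 for ev in eigenvalues if ev > 1.0)
print(kaiser)  # 3

# Crude scree heuristic: keep the components before the largest drop
drops = [eigenvalues[i] - eigenvalues[i + 1] for i in range(len(eigenvalues) - 1)]
scree = drops.index(max(drops)) + 1   # components kept before the biggest drop
print(scree)  # 1 -> the largest drop is right after the first eigenvalue
```

Here the Kaiser-Guttman rule keeps three components while the biggest-drop heuristic keeps one; a real scree test would eyeball the whole plot rather than a single difference.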
51.
52.
EFA
• The next step in an EFA, after deciding on the number of factors, is to choose a method of extraction
• The extraction method is the statistical algorithm used to estimate loadings
• There are several to choose from, of which principal factors (principal axis
factoring) or maximum likelihood seem to perform the best
53.
EFA
• Factor rotation
• Factor rotation is a mathematical scaling process for the loadings that also specifies
whether the factors are correlated (oblique) or uncorrelated (orthogonal)
• Usually, no harm in allowing factors to correlate
• If the factor correlation is zero, then the same as orthogonal
• Orthogonal rotation makes a strong assumption that the factors are uncorrelated, which
probably is not likely in most applications
54.
Confirmatory Factor Analysis
• Confirmatory factor analysis (CFA) starts with a hypothesis about how many factors there are
and which items load on which factors
• Factor loadings and factor correlations are obtained as in EFA
• EFA, in contrast, does not specify a measurement model initially and usually seeks to discover
the measurement model
• In EFA, all items load on all factors, but in CFA most researchers start with a model in which items load on only one factor (simple structure)
55.
CFA
• A test is computed to investigate how well the hypothesized factor structure fits with the data
• The fit test seeks a non-significant result, which indicates good fit to the data
• The model fit is derived from comparing the correlations (technically, the covariances) among
the items to the correlations expected by the model being tested
• One may hear about many fit indices; here are some common ones:
• Chi-square (χ²): lower values indicate better fit
• RMSEA: lower values indicate better fit (< .06)
• SRMR: lower values indicate better fit (< .08)
• Comparative Fit Index (CFI): higher values indicate better fit (> .95)
• Tucker-Lewis Index (TLI): higher values indicate better fit (> .95)
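A trivial sketch of applying those cutoffs to a set of fit statistics. The numbers are invented; in practice they come from your SEM software:

```python
# Hypothetical fit statistics for a CFA model, checked against the
# conventional cutoffs listed above.
fit = {"RMSEA": 0.045, "SRMR": 0.031, "CFI": 0.97, "TLI": 0.96}

checks = {
    "RMSEA": fit["RMSEA"] < 0.06,
    "SRMR": fit["SRMR"] < 0.08,
    "CFI": fit["CFI"] > 0.95,
    "TLI": fit["TLI"] > 0.95,
}
print(all(checks.values()))  # True -> every index meets its cutoff
```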
56.
CFA (contd.)
57.
EFA vs CFA
58.
EFA vs CFA
59.
CFA to SEM
60.
Another method to classify?
61.
Important characteristics of discriminant analysis
62.
Moreover, discriminant analysis
• Extracts dominant, underlying gradients of variation (canonical functions) among
groups of sample entities (e.g., species, sites, observations, etc.) from a set of
multivariate observations, such that variation among groups is maximized and variation
within groups is minimized along the gradient.
• Reduces the dimensionality of a multivariate data set by condensing a large number of
original variables into a smaller set of new composite dimensions (canonical functions)
with a minimum loss of information.
63.
Hence, discriminant analysis
• Summarizes data redundancy by placing similar entities in proximity in canonical
space and producing a parsimonious understanding of the data in terms of a few
dominant gradients of variation.
• Describes maximum differences among pre-specified groups of sampling entities
based on a suite of discriminating characteristics (i.e., canonical analysis of
discrimination).
• Predicts the group membership of future samples, or samples from unknown
groups, based on a suite of classification characteristics (i.e., classification).
64.
Takeaways
• Principal Component Analysis and Factor Analysis are data-intensive methods
• These methods provide insights beyond inferential statistics
• These data-intensive methods are increasingly important in pattern recognition, computer vision, and natural language processing.
• In this course, we have just touched the tip of the iceberg!
65.
You have reached the end of the
presentation.