An overview of how principal component analysis (PCA) works, when to use it, and its prerequisites.
Thanks for your time. If you enjoyed this short article, there are many more topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Principal Component Analysis
What is Principal Component Analysis?
According to Wikipedia:
“Principal component analysis (PCA) is a statistical procedure that uses an
orthogonal transformation to convert a set of observations of possibly
correlated variables into a set of values of linearly uncorrelated variables
called principal components.”
In layman's terms, it is a technique that lets us work with a reduced set of
variables, which is much easier to analyze and interpret.
It is a dimension reduction tool used to identify a smaller number of
uncorrelated variables, known as principal components, from a larger set of
data. The technique is widely used to understand variation and capture
strong patterns in a data set.
It is an unsupervised learning technique, so it does not distinguish the
target/dependent variable from the other variables. We should therefore
exclude the target before passing the data to PCA. Otherwise PCA will treat
it like any other independent variable when capturing correlation, because
an unsupervised method has no way of knowing which column is the target.
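As a minimal sketch of excluding the target before running PCA, the R snippet below uses the built-in iris data set, which is an assumption for illustration and does not appear in the original slides. It treats Species as the target, drops it, and checks that the resulting components are uncorrelated.

```r
# iris ships with R: four numeric measurements plus the Species label.
# Assumption for this sketch: Species plays the role of the target variable.
predictors <- iris[, names(iris) != "Species"]

# Run PCA on the predictors only; center and scale so that no single
# variable dominates because of its units.
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
summary(pca)

# The component scores are linearly uncorrelated: the off-diagonal
# correlations are zero up to floating-point error.
round(cor(pca$x), 10)
```

The scores in pca$x can then be joined back to iris$Species if a supervised model is fitted afterwards.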
Rupak Roy
As I said, “It's like compression while preserving the information.”
When to use PCA
• Latent features. Latent features are 'hidden' features, as opposed to
observed features. An example is text analysis: the 'words' extracted
from the documents are the observed features. If we factorize the words
we get 'topics', where a 'topic' is a group of words with semantic
relevance. Topics are therefore variables that cannot be measured
directly.
• To reduce noise and dimensionality for other algorithms, enabling them
to run faster.
• Face recognition. Images have many pixels and therefore a
high-dimensional feature space. With PCA we can reduce that space
before passing the data to an SVM or another classification algorithm,
which speeds up the actual classification of the picture.
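The idea of shrinking the feature space before handing it to another algorithm can be sketched as follows. The iris data and the 95% variance threshold are both assumptions chosen for illustration; the slides do not prescribe a cutoff, and a real image data set would have far more columns.

```r
# Assumption: iris stands in for a wider feature matrix.
features <- iris[, 1:4]
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Cumulative share of total variance explained by the first k components
explained  <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(explained)

# Hypothetical cutoff: keep the fewest components reaching 95% of the variance
threshold <- 0.95
k <- which(cumulative >= threshold)[1]

# Reduced feature set that a downstream classifier (for example an SVM)
# could be trained on instead of the original columns
reduced <- pca$x[, 1:k, drop = FALSE]
dim(reduced)
```

A higher threshold keeps more components and more of the original variation, while a lower one gives faster downstream training.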
To perform PCA we will use
princomp
prcomp
These are widely used functions for performing PCA because they ship
with the stats package, which is part of the base R distribution.
Both perform the same task; however, princomp uses the eigen function
(an eigendecomposition of the covariance or correlation matrix), while
prcomp uses singular value decomposition of the data matrix.
princomp returns the results as an object of class “princomp”,
and prcomp returns the results as an object of class “prcomp”.
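The two functions can be compared side by side with a short sketch. The iris data set is an assumption for illustration. Both calls standardize the variables (scale. = TRUE for prcomp, cor = TRUE for princomp), in which case the component standard deviations should agree.

```r
# Built-in iris data; keep only the four numeric measurement columns
x <- iris[, 1:4]

# prcomp: singular value decomposition of the centered, scaled data
pca_svd <- prcomp(x, center = TRUE, scale. = TRUE)

# princomp: eigendecomposition of the correlation matrix (cor = TRUE)
pca_eig <- princomp(x, cor = TRUE)

class(pca_svd)  # "prcomp"
class(pca_eig)  # "princomp"

# On standardized data the component standard deviations match
all.equal(as.numeric(pca_svd$sdev), as.numeric(pca_eig$sdev))

# The loadings live in different slots and may differ in sign
pca_svd$rotation
unclass(pca_eig$loadings)
```

Without standardization the two sets of standard deviations differ slightly, because princomp divides by n while prcomp divides by n - 1 when estimating variance.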
Pre-requisites of PCA:
> PCA functions work on numerical variables, so everything has to be
converted to numerical values.
> Remove columns that have no variance, for example a column that
contains only 1's, only 0's, or only 5's. Such columns do not help PCA
reduce the dimension.
> Also remove columns in which every value is unique (for example, row
identifiers). They add nothing useful and only slow PCA down.
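The three preparation steps above can be sketched in R as follows. The data frame df and its column names are hypothetical and were invented for this example. The integer coding of the categorical column is one simple choice among several.

```r
# Hypothetical toy data (not from the slides)
df <- data.frame(
  id     = 1:6,                              # unique for every row
  const  = rep(5, 6),                        # no variance
  grp    = c("a", "b", "a", "b", "a", "b"),  # categorical
  height = c(150, 160, 165, 170, 175, 180),
  weight = c(50, 58, 63, 70, 74, 82)
)

# 1. Drop the identifier column, whose values are all unique
df$id <- NULL

# 2. Convert the categorical column to numeric codes
df$grp <- as.numeric(factor(df$grp))

# 3. Drop columns with zero variance; prcomp(..., scale. = TRUE) would
#    otherwise stop because a constant column cannot be rescaled
has_variance <- sapply(df, function(col) var(col) > 0)
df_clean <- df[, has_variance, drop = FALSE]

names(df_clean)  # "grp" "height" "weight"

pca <- prcomp(df_clean, center = TRUE, scale. = TRUE)
summary(pca)
```

For categorical columns with more than two levels, dummy (one-hot) encoding is usually a safer choice than integer codes, because integer codes impose an arbitrary order.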