Introduction to PCA
By NAGA V SATYANARAYANA K
Department of Electronics & Communication Engineering, Annamalai University
RAC Meeting – 14/12/2022
WHY PCA IN ML
• While working with high-dimensional data, machine learning models often tend to overfit, which reduces their ability to generalize beyond the training examples. Hence, it is important to apply dimensionality reduction before building a model.
• Principal Component Analysis (PCA) is one of the most
commonly used unsupervised machine learning algorithms across
a variety of applications: exploratory data analysis, dimensionality
reduction, information compression, data de-noising, and plenty
more.
WHAT IS PCA
• Principal Component Analysis (PCA) is a popular unsupervised learning technique for reducing the dimensionality of data. It increases interpretability while, at the same time, minimizing information loss. It helps to find the most significant features in a dataset and makes the data easy to plot in 2D and 3D. PCA does this by finding a sequence of linear combinations of the variables.
• Picture several points plotted on a 2-D plane. There are two principal components: PC1, the primary principal component, explains the maximum variance in the data, while PC2 is orthogonal to PC1 and explains the variance that remains.
What is a Principal Component?
• A principal component is a straight line that captures most of the variance of the data; it has both a direction and a magnitude. Principal components are the orthogonal (perpendicular) directions onto which the data is projected to obtain a lower-dimensional representation.
EXAMPLE
• Before we delve into its inner workings, let’s first build some intuition for PCA.
• Imagine we have a 2-dimensional dataset, where each dimension is a feature column.
• The same dataset can also be represented as a scatterplot, as in the sketch below.
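A minimal sketch of such a dataset, using two made-up feature columns (hypothetical "height" and "weight" values chosen purely for illustration) and plotting them as a scatterplot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=100)                 # feature 1
weight = 0.9 * height + rng.normal(0, 5, size=100)     # feature 2, correlated with feature 1
X = np.column_stack([height, weight])                  # dataset as two feature columns

plt.scatter(X[:, 0], X[:, 1])
plt.xlabel("feature 1 (height)")
plt.ylabel("feature 2 (weight)")
plt.show()
```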
• The main aim of PCA is to find principal components that describe the data points with far fewer axes than the original features.
• The principal components are vectors, but they are not chosen at random. The first principal component is computed so that it explains the greatest amount of variance in the original features. The second component is orthogonal to the first, and it explains the greatest amount of variance left after the first principal component.
• The original data can be represented as feature vectors. PCA allows us to go a step further and represent the data as linear combinations of the principal components. Obtaining the principal components is equivalent to a linear transformation of the data from the feature1 x feature2 axes to the PC1 x PC2 axes.
There are multiple ways to calculate PCA (the first two are sketched in code below):
1. Eigen decomposition of the covariance matrix
2. Singular value decomposition (SVD) of the data matrix
3. Eigenvalue approximation via power iteration
4. Non-linear iterative partial least squares (NIPALS)
5. … and more.
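A short sketch, assuming the data lives in a NumPy array with one row per sample, of how the first two approaches yield the same principal directions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Xc = X - X.mean(axis=0)                       # center the data first

# 1. Eigen decomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # columns of eigvecs are the principal directions

# 2. Singular value decomposition of the (centered) data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# Rows of Vt span the same directions (up to sign); the eigenvalues
# relate to the singular values via eigval = S**2 / (n - 1).
print(np.sort(eigvals)[::-1])
print(S**2 / (len(Xc) - 1))
```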
1. Feature standardization
• We standardize each feature to have a mean of 0 and a variance of 1. Features whose values are on different orders of magnitude would otherwise dominate the variance and prevent PCA from finding the best principal components.
• Standardization is a scaling technique in which the values are centered around the mean with unit standard deviation: the mean of each attribute becomes zero and the resulting distribution has a standard deviation of one.
• Here’s the formula for standardization: z = (x − μ) / σ, where μ is the feature’s mean and σ is its standard deviation.
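A minimal sketch of this step, assuming the data is a NumPy array X with one row per sample and one column per feature; scikit-learn's StandardScaler performs the same operation:

```python
import numpy as np

def standardize(X):
    """Apply z = (x - mu) / sigma to every feature column."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# X_std = standardize(X)   # each column of X_std now has mean 0 and variance 1
```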
2. Covariance matrix computation
• Compute the covariance matrix. The covariance matrix is a square d x d matrix, where d is the number of dimensions (features, or columns if our data is tabular). It captures the pairwise covariance between every pair of features; for standardized features it coincides with the correlation matrix.
• In other words, we construct a square matrix that expresses how two or more features in a multidimensional dataset vary together.
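A one-line sketch of this step with NumPy, assuming X_std is the standardized data from the previous step:

```python
import numpy as np

cov_matrix = np.cov(X_std, rowvar=False)   # d x d matrix of pairwise covariances
# With standardized features this is also the correlation matrix.
```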
3. Calculate the eigen decomposition of the covariance matrix
• We calculate the eigenvectors (unit vectors) of the covariance matrix and their associated eigenvalues: the scalars λ such that the covariance matrix maps each eigenvector v to λv.
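A sketch of this step, assuming cov_matrix is the d x d covariance matrix computed above:

```python
import numpy as np

eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
# eigenvectors[:, i] is the unit eigenvector paired with eigenvalues[i];
# np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order.
```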
4. Sort the eigenvectors from the highest eigenvalue to the lowest
• The eigenvector with the highest eigenvalue is the first principal component. Higher eigenvalues correspond to greater amounts of variance explained.
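A sketch of the sorting step, assuming eigenvalues and eigenvectors come from the decomposition above:

```python
import numpy as np

order = np.argsort(eigenvalues)[::-1]       # indices sorted by eigenvalue, descending
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]       # the first column is now the first principal component
```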
5. Select the number of principal components
• Select the top N eigenvectors (ranked by their eigenvalues) to become the N principal components. The optimal number of principal components is both subjective and problem-dependent. Usually, we look at the cumulative amount of variance explained by the combination of components and pick the smallest number that still explains a sufficiently large share of it, as sketched below.
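A sketch of this selection, assuming the eigenpairs are already sorted in descending order and using an illustrative 95% threshold (the threshold itself is a subjective choice):

```python
import numpy as np

explained = eigenvalues / eigenvalues.sum()              # variance ratio per component
cumulative = np.cumsum(explained)
n_components = int(np.argmax(cumulative >= 0.95)) + 1    # smallest N that reaches the threshold
W = eigenvectors[:, :n_components]                       # d x N projection matrix
X_pca = X_std @ W                                        # data expressed in the new PCA axes
```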
How does Principal Component Analysis Work?
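In practice the five steps above (standardize, compute the covariance matrix, eigen-decompose it, sort the eigenpairs, project) are usually wrapped in a single library call. A minimal end-to-end sketch with scikit-learn on a toy random dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # toy dataset with 5 features

X_std = StandardScaler().fit_transform(X)     # step 1: standardize
pca = PCA(n_components=2)                     # steps 2-5 handled internally
X_pca = pca.fit_transform(X_std)              # data expressed in the PC1 x PC2 axes

print(pca.explained_variance_ratio_)          # share of variance captured by each component
```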
Advantages
1. Easy to compute. PCA is based on linear algebra, which computers can solve efficiently.
2. Speeds up other machine learning algorithms. Machine learning algorithms converge faster when trained on principal components instead of the original dataset.
3. Counteracts the issues of high-dimensional data. High-dimensional data causes regression-based algorithms to overfit easily. By using PCA beforehand to lower the dimensionality of the training dataset, we help prevent the predictive algorithms from overfitting.
Disadvantages
1. Low interpretability of principal components. Principal components are linear combinations of the original features, but they are not as easy to interpret. For example, it is difficult to tell which original features matter most after the principal components have been computed.
2. The trade-off between information loss and dimensionality reduction. Although dimensionality reduction is useful, it comes at a cost: some information loss is unavoidable, and balancing it against the amount of dimensionality reduction is a compromise we have to make when using PCA.
Applications of PCA in Machine Learning
• PCA is used to visualize multidimensional data.
• It is used to reduce the number of dimensions in healthcare data.
• PCA can help resize an image.
• It can be used in finance to analyze stock data and forecast returns.
• PCA helps to find patterns in high-dimensional datasets.
Visualize multidimensional data. Data visualizations are a great tool for
communicating multidimensional data as 2- or 3-dimensional plots.
Compress information. Principal Component Analysis is used to compress information so that data can be stored and transmitted more efficiently. For example, it can be used to compress images without losing too much quality, or in signal processing.
The technique has successfully been applied across a wide range of compression
problems in pattern recognition (specifically face recognition), image recognition,
and more.
• Simplify complex business decisions. PCA has been employed to
simplify traditionally complex business decisions. For example,
traders use over 300 financial instruments to manage portfolios. The
algorithm has proven successful in the risk management of interest
rate derivative portfolios, lowering the number of financial
instruments from more than 300 to just 3-4 principal components.
• Clarify convoluted scientific processes. The algorithm has been applied extensively to understand the convoluted, multidirectional factors that increase the probability that neural ensembles will trigger action potentials.
THANK YOU