2. Curse of Dimensionality
❖ The Curse of Dimensionality refers to a set of problems that arise
when working with high-dimensional data.
❖ In particular, the difficulties of training machine learning models
on high-dimensional data are referred to as the ‘Curse of
Dimensionality’.
3. A Few Ways of Reducing Dimensionality
❖ Principal Component Analysis (PCA)
❖ Factor Analysis (FA)
❖ Linear Discriminant Analysis (LDA)
❖ Truncated Singular Value Decomposition (Truncated SVD)
4. Introduction to PCA
❖ PCA is a process of finding the most important directions in the
data, the principal components (PCs), i.e. the directions along which
the data vary the most.
❖ PCA, or principal component analysis, is a dimensionality
reduction technique that can help us reduce the number of dimensions
of the dataset that we use in machine learning for training. It helps
with the famous curse of dimensionality problem.
5. Definition of PCA
❖ Principal component analysis (PCA) is applied to transform
linearly correlated variables into uncorrelated variables called
principal components.
❖ PCA also sorts the resulting uncorrelated variables by the
amount of variance they explain in the data.
❖ It is a widely used process for dimensionality reduction.
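The two defining properties above — the principal components are uncorrelated, and they are sorted by variance — can be checked directly on a small synthetic dataset. The following is a minimal NumPy sketch (the data and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two linearly correlated variables
x = rng.normal(size=500)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=500)])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # sort descending by explained variance
components = eigvecs[:, order]

scores = centered @ components           # the principal components of each sample
# The covariance of the scores is (near-)diagonal: the PCs are uncorrelated,
# and the variances on the diagonal appear in decreasing order
print(np.round(np.cov(scores, rowvar=False), 6))
```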
9. PCA Algorithm.
There are six steps in the PCA algorithm
1. Get the data and subtract the mean.
2. Calculate the covariance matrix.
3. Calculate the eigenvalues and eigenvectors.
4. Form the principal components.
5. Reduce the dimensionality.
6. Reconstruct the dataset.
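The six steps above can be sketched in NumPy as follows. The function name and the test data are illustrative, not part of the slides:

```python
import numpy as np

def pca_six_steps(data, n_components):
    """Illustrative sketch of the six PCA steps, using NumPy only."""
    # 1. Get the data and subtract the mean
    mean = data.mean(axis=0)
    centered = data - mean
    # 2. Calculate the covariance matrix
    cov = np.cov(centered, rowvar=False)
    # 3. Calculate the eigenvalues and eigenvectors
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Form the principal components: order eigenvectors by eigenvalue, descending
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order][:, :n_components]
    # 5. Reduce the dimensionality: project onto the kept components
    reduced = centered @ components
    # 6. Reconstruct the dataset (approximate if components were discarded)
    reconstructed = reduced @ components.T + mean
    return reduced, reconstructed

rng = np.random.default_rng(1)
x = rng.normal(size=200)
data = np.column_stack([x, 3 * x + rng.normal(scale=0.1, size=200)])
reduced, reconstructed = pca_six_steps(data, n_components=1)
print(reduced.shape)                        # (200, 1)
print(np.abs(data - reconstructed).max())   # small, since the data are nearly 1-D
```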
14. ❖ Reduce dimensionality and form the feature vector: the
eigenvector with the highest eigenvalue is the principal
component of the data set.
❖ In our example, the eigenvector with the largest eigenvalue
was the one that pointed down the middle of the data.
❖ Once the eigenvectors are found from the covariance matrix,
the next step is to order them by eigenvalue, highest to
lowest. This gives you the components in order of
significance.
Dimensionality Reduction
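The ordering step above can be sketched as a few lines of NumPy. The covariance matrix below is illustrative numbers for a small 2-D example, not taken from the slides:

```python
import numpy as np

# Covariance matrix of a small 2-D example (illustrative values)
cov = np.array([[0.616556, 0.615444],
                [0.615444, 0.716556]])

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues ascending
order = np.argsort(eigvals)[::-1]        # highest eigenvalue first
feature_vector = eigvecs[:, order]       # columns in order of significance

print(eigvals[order])                    # largest explained variance first
```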
15. Reconstruction of original Data
If we reduced the dimensionality, then, obviously, when reconstructing the data we
would lose those dimensions we chose to discard. In our example, let us assume that we
considered only the first (x) dimension of the transformed data.
new data samples: yᵢ = [PC1 PC2]ᵀ · xᵢ; keeping only PC1, this becomes yᵢ = PC1ᵀ · xᵢ
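The projection formula above, and the reconstruction that follows from it, can be sketched with NumPy. The dataset is synthetic and the variable names are illustrative; the point is that the variance along the discarded component cannot be recovered:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
data = np.column_stack([x, x + rng.normal(scale=0.2, size=100)])

mean = data.mean(axis=0)
centered = data - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]      # keep only the top principal component

y = centered @ pc1                         # y_i = PC1^T · x_i  (projection)
reconstructed = np.outer(y, pc1) + mean    # approximate reconstruction from PC1 alone
# The residual is exactly the part of the data along the discarded component
print(np.abs(data - reconstructed).max())
```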