Dimensionality Reduction
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some problem sets may have
● A large number of features
● Making the model extremely slow
● Even making it difficult to find a solution
● This is referred to as the ‘Curse of Dimensionality’
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Example
● MNIST Dataset
a. Each pixel is a feature
b. 28 * 28 = 784 features for each image
c. Border pixels carry almost no information and can be ignored
Border data (features) can be
ignored across the whole dataset
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Example
● MNIST Dataset
○ Also, neighbouring pixels are highly correlated
○ Neighbouring pixels can be merged into one without losing
much information
○ Hence, further reducing the dimensions or features
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some benefits of dimension reduction
● Faster and more efficient model
● Better visualization to gain important insights by detecting
patterns
Drawbacks:
● Lossy - we lose some information - we should try with the
original dataset before going for dimension reduction
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some important facts
● Q. Probability that a random point chosen in a unit metre
square is within 0.001 m of the border?
● Ans. ?
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some important facts
● Q. Probability that a random point chosen in a unit metre
square is within 0.001 m of the border?
● Ans. 0.004 = 1 - (0.998)**2
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some important facts
● Q. Probability that a random point chosen in a unit metre
square is within 0.001 m of the border?
● Ans. ≈ 0.004, meaning chances are very low that the point will
be extreme along any dimension
● Q. Probability that a random point chosen in a 10,000-
dimensional unit hypercube is within 1 mm of the border?
● Ans. >99.999999 %
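Both answers come from the same closed-form expression; the short sketch below (an added illustration, not part of the course notebook) computes them in plain Python:
# P(a random point lies within `margin` of the border of an n-dimensional unit hypercube)
# = 1 - (1 - 2 * margin) ** n
def border_probability(n_dims, margin=0.001):
    return 1 - (1 - 2 * margin) ** n_dims

print(border_probability(2))        # ~0.004 for the unit square
print(border_probability(10000))    # >0.99999999 for the 10,000-d hypercube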
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some more important facts
If we pick 2 points randomly on a unit square
● The distance between these 2 points will be, on average, roughly 0.52
If we pick 2 points randomly in a 1,000,000-dimensional unit hypercube
● The distance between these 2 points will be, on average, roughly
sqrt(1,000,000 / 6) ≈ 408.25
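A quick Monte Carlo check of these averages (an added sketch with assumed sample sizes, not from the original notebook; 1,000 dimensions stand in for 1,000,000 to keep memory modest):
import numpy as np

rng = np.random.default_rng(42)

def mean_random_distance(n_dims, n_pairs=10_000):
    # average Euclidean distance between two random points in a unit hypercube
    a = rng.random((n_pairs, n_dims))
    b = rng.random((n_pairs, n_dims))
    return np.linalg.norm(a - b, axis=1).mean()

print(mean_random_distance(2))      # ~0.52
print(mean_random_distance(1000))   # ~sqrt(1000 / 6) ~ 12.9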
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
Some important observations about large dimension datasets
● Higher-dimensional datasets are at risk of being very sparse
● Most training instances are likely to be far away from each other
Instances much more
scattered in higher
dimensions, hence sparse
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
A new (test) instance will also likely be far away from any
training instance
● making predictions much less reliable
Hence,
● the more dimensions the training set has,
● the greater the risk of overfitting.
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
How to reduce the curse of dimensionality?
● Increase the size of the training set (number of instances) to reach a
sufficient density of training instances
○ However, number of instances required to reach a given
density grows exponentially with the number of dimensions
(features)
Adding more instances
will increase the
density
Machine Learning - Dimensionality Reduction
Introduction - Curse of Dimensionality
How to reduce the curse of dimensionality?
● Example:
○ For a dataset with 100 features
○ We would need more training instances than there are atoms in the
observable universe
○ To have the instances on average within 0.1 of each other
(assuming they are spread out uniformly) - see the rough sketch below
● Hence, we reduce the dimensions
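A rough back-of-the-envelope version of this example (added for illustration; 10**80 is the commonly quoted estimate for the number of atoms in the observable universe):
# a regular grid with points ~0.1 apart along each of 100 dimensions of a unit hypercube
points_per_dimension = 10              # 1 / 0.1
n_dimensions = 100
grid_points = points_per_dimension ** n_dimensions
print(grid_points)                     # 10**100, far more than ~10**80 atoms
                                       # in the observable universe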
Machine Learning - Dimensionality Reduction
Main approaches for dimensionality reduction
● Projection
● Manifold Learning
Dimensionality Reduction
Machine Learning - Dimensionality Reduction
Dimensionality Reduction
Dimensionality
Reduction
Methods
Projection Manifold Learning
Machine Learning - Dimensionality Reduction
Most real-world problems do not have training instances spread out across
all dimensions
● Many features are almost constant -
● While others are correlated
Dimensionality Reduction - Projection
Q. How many features are there in the above graph?
Machine Learning - Dimensionality Reduction
Most real-world problems do not have training instances spread out across
all dimensions
● Many features are almost constant -
● While others are correlated
Dimensionality Reduction - Projection
Q. How many features are there in the above graph? 3
Machine Learning - Dimensionality Reduction
Most real-world problems do not have training instances spread out across
all dimensions
● Many features are almost constant -
● While others are correlated
Dimensionality Reduction - Projection
Q. Which of the features is almost constant for almost all
instances? x1, x2 or x3?
Machine Learning - Dimensionality Reduction
Most real-world problems do not have training instances spread out across
all dimensions
● Many features are almost constant -
● While others are correlated
Dimensionality Reduction - Projection
Q. Which of the features is almost constant for almost all
instances? Ans: x3
Machine Learning - Dimensionality Reduction
Most of the training instances actually lie within (or close to) a much
lower-dimensional subspace.
● Refer to the diagram below
Dimensionality Reduction - Projection
A 3-dimensional space (x1, x2 and x3)
A lower 2-dimensional subspace (grey plane)
Machine Learning - Dimensionality Reduction
● Not all instances lie ON the 2-dimensional subspace
● If we project all the instances perpendicularly onto the subspace
○ We get a new 2d dataset with features z1 and z2
Dimensionality Reduction - Projection
A 3-dimensional space (x1, x2 and x3)
A lower 2-dimensional subspace (grey plane)
projections
Machine Learning - Dimensionality Reduction
Remember projection from Linear Algebra?
As we have seen in the linear algebra session,
● A vector v can be projected onto
● another vector u
● By taking the dot product of v and u (with u as a unit vector).
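A tiny NumPy illustration of this (an added sketch; v and u are arbitrary example vectors):
import numpy as np

v = np.array([3.0, 4.0])
u = np.array([2.0, 0.0])
u_hat = u / np.linalg.norm(u)                 # unit vector along u

scalar_projection = v.dot(u_hat)              # length of v along u
vector_projection = scalar_projection * u_hat
print(scalar_projection, vector_projection)   # 3.0 [3. 0.]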
Machine Learning - Dimensionality Reduction
Remember projection from Linear Algebra?
Q. For the graph below, which of these is true?
a. Vector v is orthogonal to u
b. Vector v is projected onto vector u
c. Vector u is projected onto vector v
Machine Learning - Dimensionality Reduction
Remember projection from Linear Algebra?
A. For the graph below, which of these is true?
a. Vector v is orthogonal to u
b. ✅ Vector v is projected onto vector u
c. Vector u is projected onto vector v
Machine Learning - Dimensionality Reduction
● Just as we project a vector onto another vector, we can project a vector onto a
plane with a dot product against the plane's basis vectors (see the sketch below).
● If we project all the instances perpendicularly onto the subspace
○ We get a new 2d dataset with features z1 and z2
Dimensionality Reduction - Projection
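A minimal sketch of this idea (an added illustration; the plane's orthonormal basis vectors w1 and w2 are assumed to be known):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # some 3d instances

w1 = np.array([1.0, 0.0, 0.0])          # orthonormal basis of the plane
w2 = np.array([0.0, 1.0, 0.0])
W = np.column_stack([w1, w2])           # shape (3, 2)

Z = X.dot(W)                            # the new 2d dataset with features z1 and z2
print(Z.shape)                          # (100, 2)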
Machine Learning - Dimensionality Reduction
● The above example is demonstrated in the notebook
○ Download the 3d dataset
○ Reduce it to 2 dimensions using PCA - a dimensionality reduction
technique based on projection
○ Define a utility to plot the projection arrows
○ Plot the 3d dataset, the plane and the projection arrows
○ Draw the 2d equivalent
Dimensionality Reduction - Projection
Switch to Notebook
Machine Learning - Dimensionality Reduction
● Is projection always good?
○ Not really! Example: Swiss roll toy dataset
Dimensionality Reduction - Projection
Machine Learning - Dimensionality Reduction
● Is projection always good?
○ Not really! Example: Swiss roll toy dataset
○ What if we project the training dataset onto x1 and x2?
○ The projection squashes the different layers together and hence
classification is difficult
Dimensionality Reduction - Projection
Machine Learning - Dimensionality Reduction
Dimensionality Reduction - Projection
● What if we instead open (unroll) the swiss roll?
○ Opening the swiss roll does not squash the different layers
○ The layers remain easily separable.
Machine Learning - Dimensionality Reduction
● Projection does not seem to work in the case of swiss roll or similar
datasets
Dimensionality Reduction - Projection
Machine Learning - Dimensionality Reduction
● The above limitation of projection can be demonstrated in the following steps
(a rough standalone sketch also follows below):
○ Visualizing the swiss roll on a 3d plot
○ Projecting the swiss roll onto the x1-x2 plane
■ Visualizing the squashed projection
○ Visualizing the rolled-out (unrolled) plot
Dimensionality Reduction - Projection
Switch to Notebook
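A rough, self-contained sketch of those steps (added here; it mirrors the spirit of the notebook rather than reproducing it, and assumes a recent matplotlib is available):
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

fig = plt.figure(figsize=(12, 4))

ax = fig.add_subplot(131, projection="3d")    # the swiss roll in 3d
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=t, cmap=plt.cm.hot)

plt.subplot(132)                              # squashed projection onto x1 and x2
plt.scatter(X[:, 0], X[:, 1], c=t, cmap=plt.cm.hot)

plt.subplot(133)                              # "rolled out" view: t versus x2
plt.scatter(t, X[:, 1], c=t, cmap=plt.cm.hot)
plt.show()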
Machine Learning - Dimensionality Reduction
Dimensionality Reduction
Dimensionality
Reduction
Methods
Projection
Manifold
Learning
Machine Learning - Dimensionality Reduction
Swiss roll is an example of a 2d manifold
● A 2d manifold is a 2d shape that can be bent and twisted in a higher-
dimensional space
● More generally, a d-dimensional manifold is a d-dimensional shape embedded
in an n-dimensional space (d<n)
Q. For swiss roll, d =? , n =?
Dimensionality Reduction - Manifold Learning
Machine Learning - Dimensionality Reduction
Swiss roll is an example of a 2d manifold
● A 2d manifold is a 2d shape that can be bent and twisted in a higher-
dimensional space
● More generally, a d-dimensional manifold is a d-dimensional shape embedded
in an n-dimensional space (d<n)
Q. For swiss roll, d =2 , n =3
Dimensionality Reduction - Manifold Learning
Machine Learning - Dimensionality Reduction
● Many dimensionality reduction algorithms work by
○ modeling the manifold on which the training instances lie
● This is called manifold learning
So, for the swiss roll
● We can model the 2d plane
● Which is rolled up in a swiss roll fashion
● Hence occupying a 3d space (like rolling up a sheet of paper)
Dimensionality Reduction - Manifold Learning
Machine Learning - Dimensionality Reduction
Manifold Learning
● Relies on the manifold assumption, i.e.,
○ Most real-world high-dimensional datasets lie close to a much
lower-dimensional manifold
● This is often observed empirically
Dimensionality Reduction - Manifold Learning
Machine Learning - Dimensionality Reduction
Manifold assumption is observed empirically in case of
● MNIST dataset where images of the digits have similarities:
○ Made of connected lines
○ Borders are white
○ More or less centered
● A randomly generated image would have many more degrees of
freedom as compared to the images of digits
● Hence, the constraints in the MNIST images tend to squeeze the
dataset into a lower-dimensional manifold.
Dimensionality Reduction - Manifold Learning
Machine Learning - Dimensionality Reduction
Manifold learning is accompanied by another assumption
● Going to a lower-dimensional space will make the task at hand
simpler (holds true in the case below)
Dimensionality Reduction - Manifold Learning
Simple classification
Machine Learning - Dimensionality Reduction
The manifold assumption is often accompanied by another assumption
● Going to a lower-dimensional space will make the task at hand
simpler (not always the case)
Dimensionality Reduction - Manifold Learning
Fairly complex classification
Simple classification (x1=5)
Machine Learning - Dimensionality Reduction
The previous 2 cases can be demonstrated in these steps:
● Using the 3d swiss roll dataset
● Plotting the case where the classification gets easier in the unrolled (manifold) space
● Plotting the case where the classification gets more difficult in the unrolled space
● Plotting the decision boundary in each case
Dimensionality Reduction - Manifold Learning
Switch to Notebook
Machine Learning - Dimensionality Reduction
Summary - Dimensionality Reduction
● 2 approaches: Projection and Manifold Learning
○ Which one to use depends on the dataset
● Leads to better visualization
● Faster training
● May not always lead to a better or simpler solution
○ Valid for both projection and manifold learning
○ Depends on the dataset
● Lossy
○ Should always try with the original dataset before going for
dimensionality reduction
Dimensionality Reduction
Machine Learning - Dimensionality Reduction
Dimensionality Reduction
Dimensionality
Reduction
Methods
Approach:
Projection
Technique: PCA
Manifold Learning
Machine Learning - Dimensionality Reduction
Principal Component Analysis (PCA)
● The most popular dimensionality reduction algorithm
● Identifies the hyperplane that lies closest to the data
● Projects the data onto that hyperplane
Machine Learning - Dimensionality Reduction
PCA- Preserving the variance
How do we select the best hyperplane to project the dataset onto?
● Select the axis that preserves the maximum amount of variance
● It loses less information than the other projections
Machine Learning - Dimensionality Reduction
PCA- Preserving the variance
Q. Which of these is the best axis to select (preserves maximum variance)?
c1, c2 or c3?
(figure: candidate axes c1, c2 and c3)
Machine Learning - Dimensionality Reduction
PCA- Preserving the variance
Q. Which of these is the best axis to select? Ans: c1.
● It preserves maximum variance as compared to the other axes.
(figure: candidate axes c1, c2 and c3)
Machine Learning - Dimensionality Reduction
PCA- Preserving the variance
Another way to say it: select the axis that minimizes the mean squared distance
between the original dataset and its projection onto that axis.
(figure: candidate axes c1, c2 and c3)
Machine Learning - Dimensionality Reduction
The previous case can be demonstrated in these steps (a standalone sketch also
follows below):
● Generate a random 2d dataset
● Stretch it along a particular direction
● Project it onto 3 different axes
● Plot the stretched random numbers and the projections along the axes
Dimensionality Reduction - Projection
Switch to Notebook
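A standalone sketch of the same idea (an added illustration; the angles and scales are assumed): generate a stretched 2d dataset and compare how much variance each candidate axis preserves.
import numpy as np

rng = np.random.default_rng(4)
angle = np.pi / 6
X = rng.normal(size=(200, 2)) * [3.0, 0.5]           # a stretched 2d blob
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
X = X.dot(rot.T)                                     # rotate the blob

for theta in (np.pi / 6, np.pi / 3, np.pi / 2):      # three candidate axes
    axis = np.array([np.cos(theta), np.sin(theta)])
    projected = X.dot(axis)                          # 1d projection onto the axis
    print(f"axis at {np.degrees(theta):4.0f} deg -> variance {projected.var():.3f}")
The axis aligned with the stretched direction preserves the most variance.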
Machine Learning - Dimensionality Reduction
PCA- Principal Components
How do we select the best hyperplane to project the dataset onto?
Ans: PCA
● identifies the axis that accounts for the largest amount of variance in
the training set - the 1st principal component
● then finds a second axis, orthogonal to the first one, that accounts for the
second largest amount of variance
● and so on: a third axis, a fourth axis, ...
Machine Learning - Dimensionality Reduction
PCA- Principal Components
The unit vector that defines the ‘i’th axis is called the ‘i’th principal
component (PC)
● 1st PC = c1
● 2nd PC = c2
● 3rd PC = c3
c1 is orthogonal to c2; c3 would be orthogonal to the plane formed by c1
and c2,
and hence orthogonal to both c1 and c2.
Imagine a 3d space for a minute!
Machine Learning - Dimensionality Reduction
PCA- Principal Components
Next question: How do we find the principal components?
● A standard matrix factorization technique called Singular Value Decomposition
(SVD) - based on eigenvalue calculation!
● It decomposes the training set matrix X into the product of 3 matrices,
X = U · ∑ · transpose(V)
○ U
○ ∑
○ transpose(V)
● Transpose(V) contains the principal components (PC) - unit vectors
Machine Learning - Dimensionality Reduction
PCA- Principal Components
Transpose(V) contains the principal components (PC) - unit vectors
● 1st PC = c1
● 2nd PC = c2
● 3rd PC = c3
● ...
Machine Learning - Dimensionality Reduction
PCA- Principal Components - SVD
SVD can be implemented with NumPy using the code below
● SVD assumes that the data is centered around the origin
# Data needs to be centered before performing SVD
>>> import numpy as np
>>> X_centered = X - X.mean(axis=0)
# Performing SVD
>>> U, s, V = np.linalg.svd(X_centered)
# Printing the principal components
>>> c1, c2 = V.T[:, 0], V.T[:, 1]
>>> print(c1, c2)
Q. How many principal components are we printing in the above code?
Machine Learning - Dimensionality Reduction
PCA- Principal Components - SVD
SVD can be implemented with NumPy using the code below
● SVD assumes that the data is centered around the origin
# Data needs to be centered before performing SVD
>>> import numpy as np
>>> X_centered = X - X.mean(axis=0)
# Performing SVD
>>> U, s, V = np.linalg.svd(X_centered)
# Printing the principal components
>>> c1, c2 = V.T[:, 0], V.T[:, 1]
>>> print(c1, c2)
Q. How many principal components are we printing in the above code?
Ans: 2
Machine Learning - Dimensionality Reduction
PCA- Projecting down to d dimensions
Once the PCs have been found, the original dataset can be projected onto the
PCs
As we have seen in linear algebra session,
● A vector v can be projected onto
● another vector u
● By doing a dot product of v and u.
Machine Learning - Dimensionality Reduction
PCA- Projecting down to d dimensions
Similarly, the
● original training dataset X can be projected onto
● the first ‘d’ principal components Wd
○ Composed of first ‘d’ columns of transpose(V) obtained in SVD
● Reducing the dataset dimensions to ‘d’
Wd = first d columns of transpose(V), containing the first d
principal components
Xd-proj = X · Wd
Machine Learning - Dimensionality Reduction
PCA- Projecting down to d dimensions
Similarly, the
● original training dataset X can be projected onto
● the first ‘d’ principal components Wd
○ Composed of first ‘d’ columns of transpose(V) obtained in SVD
● Reducing the dataset dimensions to ‘d’
Wd = the first ‘d’ columns of transpose(V)
Machine Learning - Dimensionality Reduction
PCA- SVD and PCA
So, PCA involves two steps
● SVD, and
● Projection of the training dataset onto the orthogonal principal
components
We can either
● do the SVD and projection manually (e.g. with NumPy), or
● use Scikit-Learn's combined PCA class
We will be comparing the code for these
Machine Learning - Dimensionality Reduction
PCA- SVD and PCA
PCA using SVD (NumPy):
# Centering the data and doing SVD
X_centered = X - X.mean(axis=0)
U, s, V = np.linalg.svd(X_centered)
# Extracting the components and projecting the
# original dataset
W2 = V.T[:, :2]
X2D = X_centered.dot(W2)

PCA using Scikit-Learn's PCA class:
from sklearn.decomposition import PCA
# Directly doing PCA and transforming the
# original dataset (takes care of centering)
pca = PCA(n_components=2)
X2D = pca.fit_transform(X)
Switch to Notebook
Machine Learning - Dimensionality Reduction
PCA- Explained Variance Ratio
The variance explained by each of the components is important
● We would like to retain as much of the original dataset's variance as possible
● Available via the explained_variance_ratio_ attribute
>>> print(pca.explained_variance_ratio_)
[ 0.95369864 0.04630136]
The 1st component explains 95.3% of the variance;
the 2nd component explains 4.6% of the variance
Machine Learning - Dimensionality Reduction
PCA- Number of PCs
How to select the number of principal components?
● Choose enough principal components to explain, say, 95% of the variance in the
original dataset
● For visualization, reduce to 2 or 3 dimensions
Calculating the variance explained in Scikit-Learn
>>> pca = PCA()
>>> pca.fit(X)
>>> cumsum = np.cumsum(pca.explained_variance_ratio_)
# Calculating the number of dimensions which explain 95% of variance
>>> d = np.argmax(cumsum >= 0.95) + 1
2
Machine Learning - Dimensionality Reduction
PCA- Number of PCs
# Calculating the PCs by directly specifying the variance to be
# explained
>>> pca = PCA(n_components=0.95)
>>> X_reduced = pca.fit_transform(X)
Machine Learning - Dimensionality Reduction
PCA- Number of PCs
Another option is to plot the explained variance
● As a function of the number of dimensions
● Elbow curve: the explained variance stops growing fast after a certain
number of dimensions (see the sketch below)
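A short sketch of such a plot (added here; it assumes a PCA object has already been fitted with all components, as on the previous slide):
import numpy as np
import matplotlib.pyplot as plt

cumsum = np.cumsum(pca.explained_variance_ratio_)
plt.plot(np.arange(1, len(cumsum) + 1), cumsum)
plt.xlabel("Number of dimensions")
plt.ylabel("Cumulative explained variance")
plt.grid(True)
plt.show()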
Machine Learning - Dimensionality Reduction
● For the above 2d dataset, we shall demonstrate
○ Calculating the explained variance ratio
○ Calculating the number of principal components
Dimensionality Reduction - Projection
Switch to Notebook
Machine Learning - Dimensionality Reduction
PCA- Compression of dataset
Another aspect of dimensionality reduction:
● the training set takes up much less space.
● For example, applying PCA to the MNIST dataset
● ORIGINAL: Each image has
○ 28 X 28 pixels
○ 784 features
○ Each pixel is an intensity value (mostly either on or off)
Machine Learning - Dimensionality Reduction
PCA- Compression of dataset
After applying PCA to the MNIST data
● The number of dimensions reduces from 784 features to 154 features
● While keeping 95% of its variance
Hence, the training set is now about 20% of its original size
>>> pca = PCA()
>>> pca.fit(X)
>>> d = np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.95) + 1
154
Number of features required to explain 95% variance
Machine Learning - Dimensionality Reduction
PCA- Compression of dataset - Demo
Loading the MNIST Dataset
#MNIST compression:
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import fetch_openml # fetch_mldata has been removed from Scikit-Learn
>>> mnist = fetch_openml('mnist_784', version=1, as_frame=False)
>>> X, y = mnist["data"], mnist["target"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> X = X_train
Machine Learning - Dimensionality Reduction
PCA- Compression of dataset
Applying PCA to the MNIST dataset
# Applying PCA to the MNIST Dataset
>>> pca = PCA()
>>> pca.fit(X)
>>> d = np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.95) + 1
154
# Projecting onto the principal components
>>> pca = PCA(n_components=0.95)
>>> X_reduced = pca.fit_transform(X)
>>> pca.n_components_
154
# Checking for the variance explained
# did we hit the 95% minimum?
>>> np.sum(pca.explained_variance_ratio_)
0.9503623084769206
Switch to Notebook
Machine Learning - Dimensionality Reduction
PCA- Decompression
The compressed dataset can be decompressed to the original size
● For MNIST dataset, the reduced dataset (154 features)
● Back to 784 features
● Using inverse transformation of the PCA projection
# use inverse_transform to decompress back to 784 dimensions
>>> X_mnist = X_train
>>> pca = PCA(n_components = 154)
>>> X_mnist_reduced = pca.fit_transform(X_mnist)
>>> X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)
Machine Learning - Dimensionality Reduction
PCA- Decompression
Plotting the recovered digits
● The recovered digits have lost some information
● Dimensionality reduction captured only 95% of the variance
● This loss is called the reconstruction error
Switch to Notebook
(figure: original vs recovered digits)
Machine Learning - Dimensionality Reduction
PCA- Incremental PCA
Problem with PCA (Batch PCA)
● Requires the entire training dataset to be in memory to run SVD
Incremental PCA (IPCA)
● Splits the training set into mini-batches
● Feeds one mini-batch at a time to the IPCA algorithm
● Useful for large datasets and online learning
Machine Learning - Dimensionality Reduction
PCA- Incremental PCA
Incremental PCA using Scikit Learn’s IncrementalPCA class
● And associated partial_fit() function instead of fit() and fit_transform()
# split MNIST into 100 mini-batches using Numpy array_split()
# reduce MNIST down to 154 dimensions as before.
# note use of partial_fit() for each batch.
>>> from sklearn.decomposition import IncrementalPCA
>>> n_batches = 100
>>> inc_pca = IncrementalPCA(n_components=154)
>>> for X_batch in np.array_split(X_mnist, n_batches):
print(".", end="")
inc_pca.partial_fit(X_batch)
>>> X_mnist_reduced_inc = inc_pca.transform(X_mnist)
Machine Learning - Dimensionality Reduction
PCA- Incremental PCA
Another way is to use NumPy's memmap class
● Uses a binary array on disk as if it were in memory
# alternative: Numpy memmap class (use binary array on disk as if it was in memory)
>>> filename = "my_mnist.data"
>>> X_mm = np.memmap(
filename, dtype='float32', mode='write', shape=X_mnist.shape)
>>> X_mm[:] = X_mnist
>>> del X_mm
>>> X_mm = np.memmap(filename, dtype='float32', mode='readonly', shape=X_mnist.shape)
>>> batch_size = len(X_mnist) // n_batches
>>> inc_pca = IncrementalPCA(n_components=154, batch_size=batch_size)
>>> inc_pca.fit(X_mm)
Switch to Notebook
Machine Learning - Dimensionality Reduction
PCA- Randomized PCA
Using a stochastic algorithm
● To approximate the first d principal components
● O(m × d^2) + O(d^3), instead of O(m × n^2) + O(n^3)
● Dramatically faster than (Batch) PCA and Incremental PCA
○ When d << n
>>> import time
>>> rnd_pca = PCA(n_components=154, svd_solver="randomized")
>>> t1 = time.time()
>>> X_reduced = rnd_pca.fit_transform(X_mnist)
>>> t2 = time.time()
>>> print(t2-t1, "seconds")
4.414088487625122 seconds
Switch to Notebook
Machine Learning - Dimensionality Reduction
Kernel PCA
Using Kernel PCA
● Kernel trick can also be applied to PCA
● Makes nonlinear projections possible for dimensionality reduction
● This is called Kernel PCA (kPCA)
Important points about Kernel PCA we should remember:
● Good at preserving clusters after projection
● Useful when unrolling datasets that lie close to a twisted manifold
Machine Learning - Dimensionality Reduction
Kernel PCA
Kernel PCA in Scikit-Learn using KernelPCA class
● Linear Kernel
● RBF Kernel
● Sigmoid Kernel
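For reference, the three kernels can be instantiated like this (a sketch with assumed hyperparameter values):
from sklearn.decomposition import KernelPCA

lin_pca = KernelPCA(n_components=2, kernel="linear")
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
sig_pca = KernelPCA(n_components=2, kernel="sigmoid", gamma=0.001, coef0=1)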
Machine Learning - Dimensionality Reduction
Kernel PCA
Kernel PCA in Scikit-Learn using KernelPCA class
>>> from sklearn.decomposition import KernelPCA
>>> rbf_pca = KernelPCA(n_components = 2, kernel="rbf", gamma=0.04)
>>> X_reduced = rbf_pca.fit_transform(X)
Switch to Notebook
Machine Learning - Dimensionality Reduction
Kernel PCA - Selecting hyperparameters
Selecting hyperparameters
● Kernel PCA is an unsupervised learning algorithm
● There is no obvious performance measure to help select the best kernel and
hyperparameters
Instead, we can follow these steps:
● Create a pipeline with KernelPCA and Classification model
● Do a grid search using GridSearchCV to find the best kernel and
gamma value for kPCA
Machine Learning - Dimensionality Reduction
Kernel PCA - Selecting hyperparameters
Selecting hyperparameters
● Create a pipeline with KernelPCA and a classification model
● Do a grid search using GridSearchCV to find the best kernel and
gamma value for kPCA
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import GridSearchCV
>>> clf = Pipeline([
("kpca", KernelPCA(n_components=2)),
("log_reg", LogisticRegression())])
>>> param_grid = [{
"kpca__gamma": np.linspace(0.03, 0.05, 10),
"kpca__kernel": ["rbf", "sigmoid"]}]
>>> grid_search = GridSearchCV(clf, param_grid, cv=3)
>>> grid_search.fit(X, y)
Switch to Notebook
Machine Learning - Dimensionality Reduction
Kernel PCA - Reconstruction
Reconstruction in Kernel PCA
Machine Learning - Dimensionality Reduction
Kernel PCA - Reconstruction
Reconstruction in Kernel PCA
● 2 steps are followed in Kernel PCA
○ Mapping to a higher (possibly infinite-dimensional) feature space
○ Then projecting the transformed training set into 2d using linear
PCA
● The inverse of the linear PCA step would lie in the feature space, not in the
original space
○ Since the feature space is infinite-dimensional, we cannot compute the
reconstructed point
○ Therefore, we cannot compute the true reconstruction error
Machine Learning - Dimensionality Reduction
Kernel PCA - Reconstruction
Reconstruction in Kernel PCA
● For reconstruction, we instead use a pre-image
○ Found by locating a point in the original space that would map close to the
reconstructed point
○ We can then compute the squared distance between the pre-image and the
original instance
○ And select the kernel and hyperparameters that minimize this
reconstruction pre-image error
Machine Learning - Dimensionality Reduction
Kernel PCA - Reconstruction
Machine Learning - Dimensionality Reduction
Kernel PCA - Reconstruction Error
Calculating reconstruction error when using kernel PCA
● Inverse_transform in scikit-learn creates the pre-image
● Which can be used to calculate the mean squared error
## Performing Kernel PCA and enabling inverse transform
## to enable pre-image computation
>>> rbf_pca = KernelPCA(
n_components = 2,
kernel="rbf",
gamma=0.0433,
fit_inverse_transform=True) # perform reconstruction
...contd
Machine Learning - Dimensionality Reduction
Kernel PCA - Reconstruction Error
Calculating reconstruction error when using kernel PCA
● Inverse_transform in scikit-learn creates the pre-image
● Which can be used to calculate the mean squared error
## Calculating the reduced space using kernel PCA and pre-image
>>> X_reduced = rbf_pca.fit_transform(X)
>>> X_preimage = rbf_pca.inverse_transform(X_reduced)
# return reconstruction pre-image error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(X, X_preimage)
Switch to Notebook
Machine Learning - Dimensionality Reduction
Dimensionality Reduction
Dimensionality
Reduction
Methods
Projection
Techniques: PCA, Incremental PCA,
Randomized PCA, Kernel PCA
Manifold Learning
Technique: LLE
Machine Learning - Dimensionality Reduction
LLE
Locally Linear Embedding (LLE)
● Another powerful nonlinear dimensionality reduction (NLDR)
technique
● Manifold technique that does not rely on projections
● Works by
○ Measuring how each training instance linearly relates to its closest
neighbours
○ Then looking for low-dimensional representation where these local
relationships are best preserved
● Good at unrolling twisted manifolds, especially when there is not
much noise
Machine Learning - Dimensionality Reduction
LLE
Locally Linear Embedding (LLE) in scikit-learn
● LocallyLinearEmbedding class in sklearn.manifold
● Run on the swiss roll example
● Step 1: Make the swiss roll
>>> from sklearn.datasets import make_swiss_roll
>>> X, t = make_swiss_roll(
n_samples=1000,
noise=0.2,
random_state=41)
...contd
Machine Learning - Dimensionality Reduction
LLE
Locally Linear Embedding (LLE) in scikit-learn
● LocallyLinearEmbedding class in sklearn.manifold
● Run on the swiss roll example
● Step 2: Instantiate LLE class in sklearn and fit the swiss roll training
features using the LLE model
>>> from sklearn.manifold import LocallyLinearEmbedding
>>> lle = LocallyLinearEmbedding(
n_neighbors=10,
n_components=2,
random_state=42)
>>> X_reduced = lle.fit_transform(X)
...contd
Machine Learning - Dimensionality Reduction
LLE
Locally Linear Embedding (LLE) in scikit-learn
● LocallyLinearEmbedding class in sklearn.manifold
● Run on the swiss roll example
● Step 3: Plot the reduced dimension data
>>> plt.title("Unrolled swiss roll using LLE", fontsize=14)
>>> plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap=plt.cm.hot)
>>> plt.xlabel("$z_1$", fontsize=18)
>>> plt.ylabel("$z_2$", fontsize=18)
>>> plt.axis([-0.065, 0.055, -0.1, 0.12])
>>> plt.grid(True)
>>> plt.show()
...contd
Machine Learning - Dimensionality Reduction
LLE
Locally Linear Embedding (LLE) in scikit-learn
● LocallyLinearEmbedding class in sklearn.manifold
● Run on the swiss roll example
Switch to Notebook
Machine Learning - Dimensionality Reduction
LLE
Observations
● Swiss roll is completely unrolled
● Distances between the instances
are locally preserved
● Not preserved on a larger scale
○ The leftmost part is squeezed
○ The right part is stretched
Distance locally preserved
Machine Learning - Dimensionality Reduction
LLE - How it Works? Maths!
How LLE works?
Step 1: For each training instance, the algorithm identifies the k closest
neighbours
Step 2: reconstructs the instance as a linear function of these closest
neighbours
● More specifically, it finds the weight vector w such that the distance
between the instance and its reconstruction as a weighted sum of its closest
neighbours is as small as possible (see the objective below).
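In symbols (a standard statement of this step, added here for reference rather than quoted from the slides), step 2 solves

$$\hat{W} = \underset{W}{\mathrm{argmin}} \sum_{i=1}^{m} \Big\| \mathbf{x}^{(i)} - \sum_{j=1}^{m} w_{i,j}\,\mathbf{x}^{(j)} \Big\|^2$$

subject to $w_{i,j} = 0$ whenever $\mathbf{x}^{(j)}$ is not one of the k closest neighbours of $\mathbf{x}^{(i)}$, and $\sum_{j} w_{i,j} = 1$ for every instance.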
Machine Learning - Dimensionality Reduction
LLE - How it Works? Maths!
How LLE works?
Step 3: Map the training instances into a d-dimensional space while
preserving the local relationships as much as possible
● Basically, keeping the weights calculated in the previous step fixed,
each mapped instance should stay as close as possible to the weighted
combination of its previously identified closest neighbours
(same weights, same relationships) - see the objective below
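Again in symbols (added for reference), step 3 keeps the learned weights fixed and optimizes the low-dimensional images z(i) of the instances:

$$\hat{Z} = \underset{Z}{\mathrm{argmin}} \sum_{i=1}^{m} \Big\| \mathbf{z}^{(i)} - \sum_{j=1}^{m} \hat{w}_{i,j}\,\mathbf{z}^{(j)} \Big\|^2$$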
Machine Learning - Dimensionality Reduction
LLE - Time Complexity
How LLE works?
Step 1: finding K nearest neighbors: O(m x log(m) x n x log(k))
Step 2: weight optimization: O(m x n x k^3)
Step 3: constructing low-d representations: O(d x m^2)
Where m = number of training instances,
n = number of original dimensions
k = number of nearest neighbours
d = number of reduced dimensions
Step 3 makes the algorithm very slow for a large number of training instances
Machine Learning - Dimensionality Reduction
Other dimensionality techniques
Multidimensional Scaling (MDS)
● Reduces dimensionality
● while trying to preserve the distances between the instances
>>> from sklearn.manifold import MDS
>>> mds = MDS(n_components=2,
random_state=42)
>>> X_reduced_mds = mds.fit_transform(X)
Machine Learning - Dimensionality Reduction
Other dimensionality techniques
Isomap
● Creates a graph connecting each instance to its nearest neighbours
● Then, reduces dimensionality
● Trying to preserve geodesic distances between instances
>>> from sklearn.manifold import Isomap
>>> isomap = Isomap(n_components=2)
>>> X_reduced_isomap = isomap.fit_transform(X)
Machine Learning - Dimensionality Reduction
Other dimensionality techniques
t-distributed Stochastic Neighbour Embedding (t-SNE)
● Reduces dimensionality
● While keeping similar instances close and dissimilar instances apart
● Mostly used to visualize clusters in high-dimensional space
>>> from sklearn.manifold import TSNE
>>> tsne = TSNE(n_components=2)
>>> X_reduced_tsne = tsne.fit_transform(X)
Machine Learning - Dimensionality Reduction
Other dimensionality techniques
Linear Discriminant Analysis (LDA)
● A classification algorithm
● During training it learns the most discriminative axes between the
classes
● These axes can be used to define the hyperplane onto which to project the data
(see the sketch below)
● The projection keeps the classes as far apart as possible
● A good technique to reduce dimensionality before running other
classification algorithms such as an SVM classifier
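A minimal sketch (added, not from the slides) of LDA as a dimensionality reducer in Scikit-Learn, assuming a labelled training set X, y:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# n_components must be smaller than the number of classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced_lda = lda.fit_transform(X, y)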
Machine Learning - Dimensionality Reduction
Other dimensionality techniques
Plotting the results for each of the techniques on the notebook
>>> titles = ["MDS", "Isomap", "t-SNE"]
>>> plt.figure(figsize=(11,4))
for subplot, title, X_reduced in zip((131, 132, 133), titles,
(X_reduced_mds, X_reduced_isomap, X_reduced_tsne)):
plt.subplot(subplot)
plt.title(title, fontsize=14)
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap=plt.cm.hot)
plt.xlabel("$z_1$", fontsize=18)
if subplot == 131:
plt.ylabel("$z_2$", fontsize=18, rotation=0)
plt.grid(True)
>>> plt.show()
Switch to Notebook
Machine Learning - Dimensionality Reduction
Other dimensionality techniques
Plotting the results for each of the techniques on the notebook
More Related Content

What's hot

What's hot (20)

Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
 
Introduction to artificial neural network
Introduction to artificial neural networkIntroduction to artificial neural network
Introduction to artificial neural network
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Deep learning for medical imaging
Deep learning for medical imagingDeep learning for medical imaging
Deep learning for medical imaging
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
Transformer in Computer Vision
Transformer in Computer VisionTransformer in Computer Vision
Transformer in Computer Vision
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning Tutorial
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reduction
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
Brain Tumour Detection.pptx
Brain Tumour Detection.pptxBrain Tumour Detection.pptx
Brain Tumour Detection.pptx
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 

Similar to Dimensionality Reduction | Machine Learning | CloudxLab

Questions On The Equation For Regression
Questions On The Equation For RegressionQuestions On The Equation For Regression
Questions On The Equation For Regression
Tiffany Sandoval
 
3D Graphics
3D Graphics3D Graphics
3D Graphics
ViTAly
 
Dynamic programming class 16
Dynamic programming class 16Dynamic programming class 16
Dynamic programming class 16
Kumar
 

Similar to Dimensionality Reduction | Machine Learning | CloudxLab (20)

Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smaller
 
Machine learning using matlab.pdf
Machine learning using matlab.pdfMachine learning using matlab.pdf
Machine learning using matlab.pdf
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Questions On The Equation For Regression
Questions On The Equation For RegressionQuestions On The Equation For Regression
Questions On The Equation For Regression
 
Session 4 .pdf
Session 4 .pdfSession 4 .pdf
Session 4 .pdf
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
 
Building and deploying analytics
Building and deploying analyticsBuilding and deploying analytics
Building and deploying analytics
 
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision GroupDTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
 
Deep Learning Tutorial
Deep Learning Tutorial Deep Learning Tutorial
Deep Learning Tutorial
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop Performance
 
Cgm Lab Manual
Cgm Lab ManualCgm Lab Manual
Cgm Lab Manual
 
Computer Graphics - Lecture 03 - Virtual Cameras and the Transformation Pipeline
Computer Graphics - Lecture 03 - Virtual Cameras and the Transformation PipelineComputer Graphics - Lecture 03 - Virtual Cameras and the Transformation Pipeline
Computer Graphics - Lecture 03 - Virtual Cameras and the Transformation Pipeline
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
 
Metric Recovery from Unweighted k-NN Graphs
Metric Recovery from Unweighted k-NN GraphsMetric Recovery from Unweighted k-NN Graphs
Metric Recovery from Unweighted k-NN Graphs
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloud
 
DALL-E.pdf
DALL-E.pdfDALL-E.pdf
DALL-E.pdf
 
3D Graphics
3D Graphics3D Graphics
3D Graphics
 
Dynamic programming class 16
Dynamic programming class 16Dynamic programming class 16
Dynamic programming class 16
 

More from CloudxLab

More from CloudxLab (20)

Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Deep Learning Overview
Deep Learning OverviewDeep Learning Overview
Deep Learning Overview
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Training Deep Neural Nets
Training Deep Neural NetsTraining Deep Neural Nets
Training Deep Neural Nets
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
 
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabAdvanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLabIntroduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLab
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 

Dimensionality Reduction | Machine Learning | CloudxLab

  • 2. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some problem sets may have ● Large number of feature set ● Making the model extremely slow ● Even making it difficult to find a solution ● This is referred to as ‘Curse of Dimensionality’
  • 3. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Example ● MNIST Dataset a. Each pixel was a feature b. (28*28) number of features for each image c. Border feature had no importance and could be ignored Border data (features) can be ignored for all datasets
  • 4. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Example ● MNIST Dataset ○ Also, neighbouring pixels are highly correlated ○ Neighbouring pixels can be merged into one without losing much of information ○ Hence, further reducing the dimensions or features
  • 5. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some benefits of dimension reduction ● Faster and more efficient model ● Better visualization to gain important insights by detecting patterns Drawbacks: ● Lossy - we lose some information - we should try with the original dataset before going for dimension reduction
  • 6. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some important facts ● Q. Probability that a random point chosen in a unit metre square is 0.001 m from the border? ● Ans. ?
  • 7. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some important facts ● Q. Probability that a random point chosen in a unit metre square is 0.001 m from the border? ● Ans. 0.004 = 1 - (0.998)**2
  • 8. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some important facts ● Q. Probability that a random point chosen in a unit metre square is 0.001 m from the border? ● Ans. 0.004, meaning chances are very low that the point will extreme along any dimension ● Q. Probability that a random point chosen on a 10,000 dimensional unit metre hypercube is 1 mm from the border? ● Ans. >99.999999 %
  • 9. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some more important facts If we pick 2 points randomly on a unit square ● The distance between these 2 points shall be roughly 0.52 If we pick 2 points randomly in a 1,000,000 dimension hypercube ● The distance between these 2 points shall be roughly sqrt(1000000/6)
  • 10. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality Some important observations about large dimension datasets ● Higher dimensional datasets are at risk of being very sparse ● Most training sets are likely to be far away from each other Instances much more scattered in higher dimensions, hence sparse
  • 11. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality New dataset (test) dataset will also likely be far away from any training instance ● making predictions much less reliable Hence, ● more dimensional the training set is, ● the greater the risk of overfitting.
  • 12. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality How to reduce the curse of dimensionality? ● Increase the size of training set (number of datasets) to reach a sufficient density of training instances ○ However, number of instances required to reach a given density grows exponentially with the number of dimensions (features) Adding more instances will increase the density
  • 13. Machine Learning - Dimensionality Reduction Introduction - Curse of Dimensionality How to reduce the curse of dimensionality? ● Example: ○ For a dataset with 100 features ○ Will need more training datasets than atoms in observable universe ○ To have the instances on an average 0.1 distance from each other (assuming they are spread out equally) ● Hence, we reduce the dimensions
  • 14. Machine Learning - Dimensionality Reduction Main approaches for dimensionality reduction ● Projection ● Manifold Learning Dimensionality Reduction
  • 15. Machine Learning - Dimensionality Reduction Dimensionality Reduction Dimensionality Reduction Methods Projection Manifold Learning
  • 16. Machine Learning - Dimensionality Reduction Most real-world problems do not have training instances spread out across all dimensions ● Many features are almost constant - ● While others are correlated Dimensionality Reduction - Projection Q. How many features are there in the above graph?
  • 17. Machine Learning - Dimensionality Reduction Most real-world problems do not have training instances spread out across all dimensions ● Many features are almost constant - ● While others are correlated Dimensionality Reduction - Projection Q. How many features are there in the above graph? 3
  • 18. Machine Learning - Dimensionality Reduction Most real-world problems do not have training instances spread out across all dimensions ● Many features are almost constant - ● While others are correlated Dimensionality Reduction - Projection Q. Which of the feature is almost constant for almost all instances? x1, x2 or x3?
  • 19. Machine Learning - Dimensionality Reduction Most real-world problems do not have training instances spread out across all dimensions ● Many features are almost constant - ● While others are correlated Dimensionality Reduction - Projection Q. Which of the feature is almost constant for almost all instances? Ans: x3
  • 20. Machine Learning - Dimensionality Reduction Most of the training instances actually lie within (or close to) a much lower-dimensional subspace. ● Refer the diagram below Dimensionality Reduction - Projection A 3- dimensional space (x1, x2 and x3) A lower 2- dimensional subspace (grey plane)
  • 21. Machine Learning - Dimensionality Reduction ● Not all instances are ON the 2-dimensional subspace ● If we project all the instances perpendicularly on the subspace ○ We get the new 2d dataset with features z1 and z2 Dimensionality Reduction - Projection A 3- dimensional space (x1, x2 and x3) A lower 2- dimensional subspace (grey plane) projections
  • 22. Machine Learning - Dimensionality Reduction Remember projection from Linear Algebra? As we have seen in linear algebra session, ● A vector v can be projected onto ● another vector u ● By doing a dot product of v and u.
  • 23. Machine Learning - Dimensionality Reduction Remember projection from Linear Algebra? Q. For the graph below, which of these is true? a. Vector v is orthogonal to u b. Vector v is projected onto vector u c. Vector u is projected onto vector v
  • 24. Machine Learning - Dimensionality Reduction Remember projection from Linear Algebra? A. For the graph below, which of these is true? a. Vector v is orthogonal to u b. ✅ Vector v is projected onto vector u c. Vector u is projected onto vector v
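For reference, a minimal NumPy sketch of this dot-product projection (the vectors are arbitrary examples): dotting v with the unit vector along u gives the coordinate of v along u.

import numpy as np

u = np.array([3.0, 4.0])
v = np.array([2.0, 1.0])

u_hat = u / np.linalg.norm(u)       # unit vector in the direction of u
scalar_proj = v.dot(u_hat)          # length of v measured along u
vector_proj = scalar_proj * u_hat   # the projection of v onto u
print(scalar_proj, vector_proj)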
  • 25. Machine Learning - Dimensionality Reduction ● Like we project a vector onto another, we can project a vector onto a plane by a dot product. ● If we project all the instances perpendicularly on the subspace ○ We get the new 2d dataset with features z1 and z2 Dimensionality Reduction - Projection
  • 26. Machine Learning - Dimensionality Reduction ● The above example is demonstrated on notebook ○ Download the 3d dataset ○ Reduce it to 2 dimensions using PCA - a dimensionality reduction technique based on projection ○ Define a utility to plot the projection arrows ○ Plot the 3d dataset, the plane and the projection arrows ○ Draw the 2d equivalent Dimensionality Reduction - Projection Switch to Notebook
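A rough sketch of what that notebook demo looks like, assuming a small synthetic (m, 3) dataset that lies close to a plane (the data-generation step here is an assumption; the notebook's dataset may differ):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(4)
m = 60
angles = rng.rand(m) * 3 * np.pi / 2
X = np.empty((m, 3))
X[:, 0] = np.cos(angles) + 0.1 * rng.randn(m)
X[:, 1] = np.sin(angles) * 0.7 + 0.1 * rng.randn(m)
X[:, 2] = 0.1 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.randn(m)   # roughly a plane plus noise

pca = PCA(n_components=2)      # projection-based dimensionality reduction
X2D = pca.fit_transform(X)     # coordinates z1, z2 on the best-fitting plane
print(X2D.shape)               # (60, 2)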
  • 27. Machine Learning - Dimensionality Reduction ● Is projection always good? ○ Not really! Example: Swiss roll toy dataset Dimensionality Reduction - Projection
  • 28. Machine Learning - Dimensionality Reduction ● Is projection always good? ○ Not really! Example: Swiss roll toy dataset ○ What if we project the training dataset onto x1 and x2? ○ The projection squashes the different layers and hence classification is difficult Dimensionality Reduction - Projection
  • 29. Machine Learning - Dimensionality Reduction Dimensionality Reduction - Projection ● What if we instead open the swiss roll? ○ Opening the swiss roll does not squash the different layers ○ The layers are classifiable.
  • 30. Machine Learning - Dimensionality Reduction ● Projection does not seem to work in the case of swiss roll or similar datasets Dimensionality Reduction - Projection
  • 31. Machine Learning - Dimensionality Reduction ● The above limitation of Projection can be demoed in the following steps (a rough sketch follows below): ○ Visualizing the swiss roll on a 3d plot ○ Projecting the swiss roll onto x1 and x2 ■ Visualizing the squashed projection ○ Visualizing the rolled-out plot Dimensionality Reduction - Projection Switch to Notebook
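A rough sketch of those notebook steps (the swiss-roll parameters are assumptions chosen for illustration): the left panel squashes the layers together, while plotting against t, the position along the roll, gives the unrolled view.

import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=41)

plt.figure(figsize=(10, 4))
plt.subplot(121)                                      # projection onto x1, x2: layers squashed
plt.scatter(X[:, 0], X[:, 1], c=t, cmap=plt.cm.hot)
plt.title("Projected on x1, x2")
plt.subplot(122)                                      # unrolled view: t vs x2
plt.scatter(t, X[:, 1], c=t, cmap=plt.cm.hot)
plt.title("Unrolled swiss roll")
plt.show()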
  • 32. Machine Learning - Dimensionality Reduction Dimensionality Reduction Dimensionality Reduction Methods Projection Manifold Learning
  • 33. Machine Learning - Dimensionality Reduction The swiss roll is an example of a 2d manifold ● A 2d manifold is a 2d shape that can be bent and twisted in a higher-dimensional space ● More generally, a d-dimensional manifold is a part of an n-dimensional space (d < n) that locally resembles a d-dimensional hyperplane Q. For the swiss roll, d = ?, n = ? Dimensionality Reduction - Manifold Learning
  • 34. Machine Learning - Dimensionality Reduction The swiss roll is an example of a 2d manifold ● A 2d manifold is a 2d shape that can be bent and twisted in a higher-dimensional space ● More generally, a d-dimensional manifold is a part of an n-dimensional space (d < n) that locally resembles a d-dimensional hyperplane Q. For the swiss roll, d = 2, n = 3 Dimensionality Reduction - Manifold Learning
  • 35. Machine Learning - Dimensionality Reduction ● Many dimensionality reduction algorithms work by ○ modeling the manifold on which the training instances lie ● This is called manifold learning So, for the swiss roll ● We can model the 2d plane ● Which is rolled up in a swiss-roll fashion ● Hence occupying a 3d space (like a rolled-up sheet of paper) Dimensionality Reduction - Manifold Learning
  • 36. Machine Learning - Dimensionality Reduction Manifold Learning ● Relies on manifold assumption, i.e., ○ Most real-world high-dimensional datasets lie close to a much lower-dimensional manifold ● This is observed often empirically Dimensionality Reduction - Manifold Learning
  • 37. Machine Learning - Dimensionality Reduction The manifold assumption is observed empirically in the case of the ● MNIST dataset, where images of the digits have similarities: ○ Made of connected lines ○ Borders are white ○ More or less centered ● A randomly generated image would have a much larger degree of freedom than the images of digits ● Hence, the constraints in the MNIST images tend to squeeze the dataset into a lower-dimensional manifold. Dimensionality Reduction - Manifold Learning
  • 38. Machine Learning - Dimensionality Reduction Manifold learning is often accompanied by another assumption ● Going to a lower-dimensional space will make the task at hand simpler (holds true in the case below) Dimensionality Reduction - Manifold Learning Simple classification
  • 39. Machine Learning - Dimensionality Reduction The manifold assumption is often accompanied by another assumption ● Going to a lower-dimensional space will make the task at hand simpler (not always the case) Dimensionality Reduction - Manifold Learning Fairly complex classification Simple classification (x1=5)
  • 40. Machine Learning - Dimensionality Reduction The previous 2 cases can be demonstrated in these steps: ● Using the 3d swiss roll dataset ● Plotting the case where the classification gets easier with manifold ● Plotting the case where the classification gets difficult with manifold ● Plotting the decision boundary in each case Dimensionality Reduction - Manifold Learning Switch to Notebook
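A small sketch of the two labelings behind those plots (the thresholds are assumptions, not taken from the notebook): one split is simple once the roll is unrolled, while the other is a simple plane (x1 = 5) in 3D but becomes a complicated boundary after unrolling.

from sklearn.datasets import make_swiss_roll

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=41)

y_easy_when_unrolled = t > 6.9     # classes split by position along the roll
y_easy_in_3d = X[:, 0] > 5.0       # classes split by the raw x1 coordinate
print(y_easy_when_unrolled.mean(), y_easy_in_3d.mean())   # rough class balance of each labeling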
  • 41. Machine Learning - Dimensionality Reduction Summary - Dimensionality Reduction ● 2 approaches: Projection and Manifold Learning ○ Which one to use depends on the dataset ● Leads to better visualization ● Faster training ● May not always lead to a better or simpler solution ○ Valid both for projection and manifold learning ○ Depends on the dataset ● Lossy ○ Should always try with the original dataset before going for dimensionality reduction Dimensionality Reduction
  • 42. Machine Learning - Dimensionality Reduction Dimensionality Reduction Dimensionality Reduction Methods Approach: Projection Technique: PCA Manifold Learning
  • 43. Machine Learning - Dimensionality Reduction Principal Component Analysis (PCA) ● The most popular dimensionality reduction algorithm ● Identifies the hyperplane that lies closest to the data ● Projects the data onto that hyperplane
  • 44. Machine Learning - Dimensionality Reduction PCA- Preserving the variance How do we select the best hyperplane onto which to project the dataset? ● Select the axis that preserves the maximum amount of variance ● It loses less information than other projections
  • 45. Machine Learning - Dimensionality Reduction PCA- Preserving the variance Q. Which of these is the best axis to select (preserves maximum variance)? c1 or c2 or c3? c1 c3 c2
  • 46. Machine Learning - Dimensionality Reduction PCA- Preserving the variance Q. Which of these is the best axis to select? Ans: c1. ● Preserves maximum variance as compared to the other axes. c1 c3 c2
  • 47. Machine Learning - Dimensionality Reduction PCA- Preserving the variance Another way to put it: the best axis is the one that minimizes the mean squared distance between the original dataset and its projection onto that axis. c1 c3 c2
  • 48. Machine Learning - Dimensionality Reduction The previous case can be demonstrated in these steps (a rough sketch follows below): ● Generate a random 2d dataset ● Stretch it along a particular direction ● Project it onto 3 different axes ● Plot the stretched random numbers and the projections along the axes PCA- Preserving the variance Switch to Notebook
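A minimal sketch of that demo (the data generation and the three candidate axes are assumptions): the variance of the 1D projection is largest for the axis aligned with the stretch, which is why c1 would be chosen.

import numpy as np

rng = np.random.RandomState(4)
X = rng.randn(200, 2) * np.array([3.0, 0.5])         # 2d data stretched along the first direction

for name, angle in (("c1", 0.0), ("c2", np.pi / 4), ("c3", np.pi / 2)):
    axis = np.array([np.cos(angle), np.sin(angle)])   # unit vector for this candidate axis
    projection = X.dot(axis)                          # 1d coordinates along that axis
    print(name, "variance of projection:", round(projection.var(), 2))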
  • 49. Machine Learning - Dimensionality Reduction PCA- Principal Components How do we select the best hyperplane onto which to project the dataset? Ans: PCA ● identifies the axis that accounts for the largest amount of variance in the training set - the 1st principal component ● Finds a second axis, orthogonal to the first one, that accounts for the second-largest amount of remaining variance ● And so on... third axis, fourth axis...
  • 50. Machine Learning - Dimensionality Reduction PCA- Principal Components The unit vector that defines the 'i'th axis is called the 'i'th principal component (PC) ● 1st PC = c1 ● 2nd PC = c2 ● 3rd PC = c3 c1 is orthogonal to c2; c3 would be orthogonal to the plane formed by c1 and c2, and hence orthogonal to both c1 and c2. Imagine it in 3d space for a minute!
  • 51. Machine Learning - Dimensionality Reduction PCA- Principal Components Next question: How do we find the principal components? ● A standard matrix factorization technique called Singular Value Decomposition (SVD) - based on eigenvalue calculation! ● It decomposes the training set matrix into the dot product of 3 matrices ○ U ○ ∑ ○ transpose(V) ● Transpose(V) contains the principal components (PCs) - unit vectors
  • 52. Machine Learning - Dimensionality Reduction PCA- Principal Components Transpose(V) contains the principal components (PC) - unit vectors ● 1st PC = c1 ● 2nd PC = c2 ● 3rd PC = c3 ● ...
  • 53. Machine Learning - Dimensionality Reduction PCA- Principal Components - SVD SVD can be implemented with NumPy using the code below ● SVD assumes that the data is centered around the origin # Data needs to be centered before performing SVD >>> X_centered = X - X.mean(axis=0) # Performing SVD >>> U, s, V = np.linalg.svd(X_centered) # Printing the principal components >>> c1, c2 = V.T[:,0], V.T[:,1] >>> print(c1, c2) Q. How many principal components are we printing in the above code?
  • 54. Machine Learning - Dimensionality Reduction PCA- Principal Components - SVD SVD can be implemented with NumPy using the code below ● SVD assumes that the data is centered around the origin # Data needs to be centered before performing SVD >>> X_centered = X - X.mean(axis=0) # Performing SVD >>> U, s, V = np.linalg.svd(X_centered) # Printing the principal components >>> c1, c2 = V.T[:,0], V.T[:,1] >>> print(c1, c2) Q. How many principal components are we printing in the above code? Ans: 2
  • 55. Machine Learning - Dimensionality Reduction PCA- Projecting down to d dimensions Once the PCs have been found, the original dataset has to be projected onto them As we have seen in the linear algebra session, ● A vector v can be projected onto ● another vector u ● By doing a dot product of v and u.
  • 56. Machine Learning - Dimensionality Reduction PCA- Projecting down to d dimensions Similarly, the ● original training dataset X can be projected onto ● the first ‘d’ principal components Wd ○ Composed of first ‘d’ columns of transpose(V) obtained in SVD ● Reducing the dataset dimensions to ‘d’ Wd = first d columns of transpose(V) containing the first d principal components Xd-proj = X.Wd
  • 57. Machine Learning - Dimensionality Reduction PCA- Projecting down to d dimensions Similarly, the ● original training dataset X can be projected onto ● the first ‘d’ principal components Wd ○ Composed of first ‘d’ columns of transpose(V) obtained in SVD ● Reducing the dataset dimensions to ‘d’ First ‘d’ columns of the transpose(V) Wd =
  • 58. Machine Learning - Dimensionality Reduction PCA- SVD and PCA So, PCA involves two steps ● SVD and ● Projection of the training dataset onto the orthogonal principal components We can implement this either way ● SVD and projection by hand (NumPy) or ● Scikit-Learn's combined PCA class We will compare the code for both approaches
  • 59. Machine Learning - Dimensionality Reduction PCA- SVD and PCA PCA using SVD with NumPy PCA using Scikit-Learn's PCA class # Centering the data and doing SVD X_centered = X - X.mean(axis=0) U, s, V = np.linalg.svd(X_centered) # Extracting the components and projecting the # original dataset W2 = V.T[:, :2] X2D = X_centered.dot(W2) from sklearn.decomposition import PCA # Directly doing PCA and transforming the original dataset # Takes care of centering pca = PCA(n_components = 2) X2D = pca.fit_transform(X) Switch to Notebook
  • 60. Machine Learning - Dimensionality Reduction PCA- Explained Variance Ratio The variance explained by each of the components is important ● We would like to cover as much of the variance in the original dataset as possible ● Available via the explained_variance_ratio_ variable >>> print(pca.explained_variance_ratio_) [ 0.95369864 0.04630136] 1st component covers 95.3 % of the variance 2nd component covers 4.6 % of the variance
  • 61. Machine Learning - Dimensionality Reduction PCA- Number of PCs How to select the number of principal components ● Choose the number of dimensions that explains a sufficiently large portion of the variance in the original dataset (e.g., 95%) ● For visualization, it has to be reduced to 2 or 3 Calculating the variance explained in Scikit-Learn >>> pca = PCA() >>> pca.fit(X) >>> cumsum = np.cumsum(pca.explained_variance_ratio_) # Calculating the number of dimensions which explain 95% of variance >>> d = np.argmax(cumsum >= 0.95) + 1 2
  • 62. Machine Learning - Dimensionality Reduction PCA- Number of PCs # Calculating the PCs directly specifying the variance to be explained >>> pca = PCA(n_components=0.95) >>> X_reduced = pca.fit_transform(X)
  • 63. Machine Learning - Dimensionality Reduction PCA- Number of PCs Another option is to plot the explained variance ● As a function of the number of dimensions ● Elbow curve: explained variance stops growing fast after a certain number of dimensions (a plotting sketch follows below)
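A minimal plotting sketch for that elbow curve (the random 50-feature dataset below is just a stand-in so the snippet runs on its own; in the notebook, PCA would be fitted on the actual training set):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.RandomState(0).randn(500, 50) * np.linspace(5, 0.1, 50)   # toy stand-in data

pca = PCA().fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
plt.plot(np.arange(1, len(cumsum) + 1), cumsum)
plt.axhline(0.95, linestyle="--")                  # 95% variance threshold for reference
plt.xlabel("Number of dimensions")
plt.ylabel("Cumulative explained variance")
plt.show()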
  • 64. Machine Learning - Dimensionality Reduction ● For the above 2d dataset, we shall demonstrate ○ Calculating the explained variance ratio ○ Calculating the number of principal components Dimensionality Reduction - Projection Switch to Notebook
  • 65. Machine Learning - Dimensionality Reduction PCA- Compression of dataset Another aspect of dimensionality reduction: ● the training set takes up much less space ● For example, applying PCA to the MNIST dataset ● ORIGINAL: Each image ○ 28 X 28 pixels ○ 784 features ○ Each pixel is an intensity value from 0 to 255
  • 66. Machine Learning - Dimensionality Reduction PCA- Compression of dataset After applying PCA to the MNIST data ● Number of dimensions reduces to 154 features from 784 features ● Keeping 95% of its variance Hence, the training set is about 20% of its original size (154/784 ≈ 0.2) >>> pca = PCA() >>> pca.fit(X) >>> d = np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.95) + 1 154 Number of features required to explain 95% variance
  • 67. Machine Learning - Dimensionality Reduction PCA- Compression of dataset - Demo Loading the MNIST Dataset #MNIST compression: >>> from sklearn.model_selection import train_test_split >>> from sklearn.datasets import fetch_mldata >>> mnist = fetch_mldata('MNIST original') >>> X, y = mnist["data"], mnist["target"] >>> X_train, X_test, y_train, y_test = train_test_split(X, y) >>> X = X_train
  • 68. Machine Learning - Dimensionality Reduction PCA- Compression of dataset Applying PCA to the MNIST dataset # Applying PCA to the MNIST Dataset >>> pca = PCA() >>> pca.fit(X) >>> d = np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.95) + 1 154 # Projecting onto the principal components >>> pca = PCA(n_components=0.95) >>> X_reduced = pca.fit_transform(X) >>> pca.n_components_ 154 # Checking for the variance explained # did we hit the 95% minimum? >>> np.sum(pca.explained_variance_ratio_) 0.9503623084769206 Switch to Notebook
  • 69. Machine Learning - Dimensionality Reduction PCA- Decompression The compressed dataset can be decompressed to the original size ● For MNIST dataset, the reduced dataset (154 features) ● Back to 784 features ● Using inverse transformation of the PCA projection # use inverse_transform to decompress back to 784 dimensions >>> X_mnist = X_train >>> pca = PCA(n_components = 154) >>> X_mnist_reduced = pca.fit_transform(X_mnist) >>> X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)
  • 70. Machine Learning - Dimensionality Reduction PCA- Decompression Plotting the recovered digits ● The recovered digits have lost some information ● Dimensionality reduction captured only 95% of the variance ● The difference is called the reconstruction error Switch to Notebook Original | Recovered
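A rough sketch of that side-by-side plot (it assumes X_mnist and X_mnist_recovered from the previous slide are in scope; the digit index 0 is an arbitrary choice):

import matplotlib.pyplot as plt

def plot_digit(flat_image):
    plt.imshow(flat_image.reshape(28, 28), cmap="binary")   # 784 features back to 28x28 pixels
    plt.axis("off")

plt.figure(figsize=(6, 3))
plt.subplot(121); plot_digit(X_mnist[0]);           plt.title("Original")
plt.subplot(122); plot_digit(X_mnist_recovered[0]); plt.title("Recovered")
plt.show()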
  • 71. Machine Learning - Dimensionality Reduction PCA- Incremental PCA Problem with PCA (Batch-PCA) ● Requires the entire training dataset in-the-memory to run SVD Incremental PCA (IPCA) ● Splits the training set into mini-batches ● Feeds one mini-batch at a time to the IPCA algorithm ● Useful for large datasets and online learning
  • 72. Machine Learning - Dimensionality Reduction PCA- Incremental PCA Incremental PCA using Scikit Learn’s IncrementalPCA class ● And associated partial_fit() function instead of fit() and fit_transform() # split MNIST into 100 mini-batches using Numpy array_split() # reduce MNIST down to 154 dimensions as before. # note use of partial_fit() for each batch. >>> from sklearn.decomposition import IncrementalPCA >>> n_batches = 100 >>> inc_pca = IncrementalPCA(n_components=154) >>> for X_batch in np.array_split(X_mnist, n_batches): print(".", end="") inc_pca.partial_fit(X_batch) >>> X_mnist_reduced_inc = inc_pca.transform(X_mnist)
  • 73. Machine Learning - Dimensionality Reduction PCA- Incremental PCA Another way is to use the NumPy memmap class ● Uses a binary array on the disk as if it were in memory # alternative: NumPy memmap class (use binary array on disk as if it was in memory) >>> filename = "my_mnist.data" >>> X_mm = np.memmap( filename, dtype='float32', mode='write', shape=X_mnist.shape) >>> X_mm[:] = X_mnist >>> del X_mm >>> X_mm = np.memmap(filename, dtype='float32', mode='readonly', shape=X_mnist.shape) >>> batch_size = len(X_mnist) // n_batches >>> inc_pca = IncrementalPCA(n_components=154, batch_size=batch_size) >>> inc_pca.fit(X_mm) Switch to Notebook
  • 74. Machine Learning - Dimensionality Reduction PCA- Randomized PCA Using a stochastic algorithm ● To approximate the first d principal components ● O(m × d^2) + O(d^3), instead of O(m × n^2) + O(n^3) ● Dramatically faster than (Batch) PCA and Incremental PCA ○ When d << n >>> rnd_pca = PCA(n_components=154, svd_solver="randomized") >>> t1 = time.time() >>> X_reduced = rnd_pca.fit_transform(X_mnist) >>> t2 = time.time() >>> print(t2-t1, "seconds") 4.414088487625122 seconds Switch to Notebook
  • 75. Machine Learning - Dimensionality Reduction Kernel PCA Using Kernel PCA ● The kernel trick can also be applied to PCA ● Makes nonlinear projections possible for dimensionality reduction ● This is called Kernel PCA (kPCA) Important points about Kernel PCA to remember: ● Good at preserving clusters ● Useful when unrolling datasets that lie close to a twisted manifold
  • 76. Machine Learning - Dimensionality Reduction Kernel PCA Kernel PCA in Scikit-Learn using KernelPCA class ● Linear Kernel ● RBF Kernel ● Sigmoid Kernel
  • 77. Machine Learning - Dimensionality Reduction Kernel PCA Kernel PCA in Scikit-Learn using KernelPCA class >>> from sklearn.decomposition import KernelPCA >>> rbf_pca = KernelPCA(n_components = 2, kernel="rbf", gamma=0.04) >>> X_reduced = rbf_pca.fit_transform(X) Switch to Notebook
  • 78. Machine Learning - Dimensionality Reduction Kernel PCA - Selecting hyperparameters Selecting hyper parameters ● Kernel PCA is an unsupervised learning algorithm ● No obvious performance measure to help select the best kernel and hyperparameters Instead, we can follow these steps: ● Create a pipeline with KernelPCA and Classification model ● Do a grid search using GridSearchCV to find the best kernel and gamma value for kPCA
  • 79. Machine Learning - Dimensionality Reduction Kernel PCA - Selecting hyperparameters Selecting hyperparameters ● Create a pipeline with KernelPCA and a classification model ● Do a grid search using GridSearchCV to find the best kernel and gamma value for kPCA >>> from sklearn.pipeline import Pipeline >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.model_selection import GridSearchCV >>> clf = Pipeline([ ("kpca", KernelPCA(n_components=2)), ("log_reg", LogisticRegression())]) >>> param_grid = [{ "kpca__gamma": np.linspace(0.03, 0.05, 10), "kpca__kernel": ["rbf", "sigmoid"]}] >>> grid_search = GridSearchCV(clf, param_grid, cv=3) Switch to Notebook
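The likely follow-up (not shown on the slide) is to fit the grid search on labeled data and read back the winning combination; X and y here are assumed to be the training features and labels used in the notebook:

grid_search.fit(X, y)
print(grid_search.best_params_)   # e.g. the best kpca__kernel and kpca__gamma found by cross-validation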
  • 80. Machine Learning - Dimensionality Reduction Kernel PCA - Reconstruction Reconstruction in Kernel PCA
  • 81. Machine Learning - Dimensionality Reduction Kernel PCA - Reconstruction Reconstruction in Kernel PCA ● 2 steps are followed in Kernel PCA ○ Mapping the training set to an infinite-dimensional feature space (via the kernel trick) ○ Then projecting the transformed training set into 2d using linear PCA ● The inverse of the linear PCA step would lie in the feature space, not in the original space ○ Since the feature space is infinite-dimensional, we cannot compute the reconstructed point ○ Therefore, we cannot compute the true reconstruction error
  • 82. Machine Learning - Dimensionality Reduction Kernel PCA - Reconstruction Reconstruction in Kernel PCA ● For reconstruction, we instead use a pre-image ○ Found by locating a point in the original space that would map close to the reconstructed point ○ We can then compute the squared distance between the pre-image and the original instance ○ And select the kernel and hyperparameters that minimize this reconstruction pre-image error
  • 83. Machine Learning - Dimensionality Reduction Kernel PCA - Reconstruction
  • 84. Machine Learning - Dimensionality Reduction Kernel PCA - Reconstruction Error Calculating reconstruction error when using kernel PCA ● Inverse_transform in scikit-learn creates the pre-image ● Which can be used to calculate the mean squared error ## Performing Kernel PCA and enabling inverse transform ## to enable pre-image computation >>> rbf_pca = KernelPCA( n_components = 2, kernel="rbf", gamma=0.0433, fit_inverse_transform=True) # perform reconstruction ...contd
  • 85. Machine Learning - Dimensionality Reduction Kernel PCA - Reconstruction Error Calculating reconstruction error when using kernel PCA ● Inverse_transform in scikit-learn creates the pre-image ● Which can be used to calculate the mean squared error ## Calculating the reduced space using kernel PCA and pre-image >>> X_reduced = rbf_pca.fit_transform(X) >>> X_preimage = rbf_pca.inverse_transform(X_reduced) # return reconstruction pre-image error >>> from sklearn.metrics import mean_squared_error >>> mean_squared_error(X, X_preimage) Switch to Notebook
  • 86. Machine Learning - Dimensionality Reduction Dimensionality Reduction Dimensionality Reduction Methods Projection - Techniques: PCA, Incremental PCA, Randomized PCA, Kernel PCA Manifold Learning - Technique: LLE
  • 87. Machine Learning - Dimensionality Reduction LLE Locally Linear Embedding (LLE) ● Another powerful nonlinear dimensionality reduction (NLDR) technique ● A manifold technique that does not rely on projections ● Works by ○ Measuring how each training instance linearly relates to its closest neighbours ○ Then looking for a low-dimensional representation where these local relationships are best preserved ● Good at unrolling twisted manifolds, especially when there is not much noise
  • 88. Machine Learning - Dimensionality Reduction LLE Local Linear Embedding (LLE) in scikit-learn ● LocallyLinearEmbedding class in sklearn.manifold ● Run on the swiss roll example ● Step 1: Make the swiss roll >>> from sklearn.datasets import make_swiss_roll >>> X, t = make_swiss_roll( n_samples=1000, noise=0.2, random_state=41) ...contd
  • 89. Machine Learning - Dimensionality Reduction LLE Local Linear Embedding (LLE) in scikit-learn ● LocallyLinearEmbedding class in sklearn.manifold ● Run on the swiss roll example ● Step 2: Instantiate LLE class in sklearn and fit the swiss roll training features using the LLE model >>> from sklearn.manifold import LocallyLinearEmbedding >>> lle = LocallyLinearEmbedding( n_neighbors=10, n_components=2, random_state=42) >>> X_reduced = lle.fit_transform(X) ...contd
  • 90. Machine Learning - Dimensionality Reduction LLE Local Linear Embedding (LLE) in scikit-learn ● LocallyLinearEmbedding class in sklearn.manifold ● Run on the swiss roll example ● Step 3: Plot the reduced dimension data >>> plt.title("Unrolled swiss roll using LLE", fontsize=14) >>> plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap=plt.cm.hot) >>> plt.xlabel("$z_1$", fontsize=18) >>> plt.ylabel("$z_2$", fontsize=18) >>> plt.axis([-0.065, 0.055, -0.1, 0.12]) >>> plt.grid(True) >>> plt.show() ...contd
  • 91. Machine Learning - Dimensionality Reduction LLE Local Linear Embedding (LLE) in scikit-learn ● LocallyLinearEmbedding class in sklearn.manifold ● Run on the swiss roll example Switch to Notebook
  • 92. Machine Learning - Dimensionality Reduction LLE Observations ● Swiss roll is completely unrolled ● Distances between the instances are locally preserved ● Not preserved on a larger scale ○ Left most part is squeezed ○ Right part is stretched Distance locally preserved
  • 93. Machine Learning - Dimensionality Reduction LLE - How it Works? Maths! How LLE works? Step 1: For each training instance, the algorithm identifies the k closest neighbours Step 2: reconstructs the instance as a linear function of these closest neighbours ● More specifically, finds the weight w vector such that distance between the closest neighbours and the instance is as small as possible.
  • 94. Machine Learning - Dimensionality Reduction LLE - How it Works? Maths! How LLE works? Step 3: Map the training instances into a d-dimensional space while preserving the local relationships as much as possible ● Basically, keeping the weights calculated in the previous step fixed, each low-dimensional instance should stay as close as possible to the weighted combination of its neighbours (same weights, same relationships) - see the formulation sketched below
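The two optimization steps above can be written compactly; this is the standard LLE formulation consistent with the description, with notation chosen here (x are the original instances, z their low-dimensional images, w the reconstruction weights):

% Step 2: weights that best reconstruct each instance from its k nearest neighbours
\hat{W} = \operatorname*{argmin}_{W} \sum_{i=1}^{m} \Big\| \mathbf{x}^{(i)} - \sum_{j=1}^{m} w_{i,j}\,\mathbf{x}^{(j)} \Big\|^{2}
\quad \text{s.t.}\; w_{i,j} = 0 \text{ if } \mathbf{x}^{(j)} \text{ is not one of the } k \text{ nearest neighbours of } \mathbf{x}^{(i)}, \;\; \textstyle\sum_{j} w_{i,j} = 1

% Step 3: keep the weights fixed and place the low-dimensional points so these relationships are preserved
\hat{Z} = \operatorname*{argmin}_{Z} \sum_{i=1}^{m} \Big\| \mathbf{z}^{(i)} - \sum_{j=1}^{m} \hat{w}_{i,j}\,\mathbf{z}^{(j)} \Big\|^{2}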
  • 95. Machine Learning - Dimensionality Reduction LLE - Time Complexity How LLE works? Step 1: finding the k nearest neighbours: O(m x log(m) x n x log(k)) Step 2: weight optimization: O(m x n x k^3) Step 3: constructing the low-dimensional representations: O(d x m^2) Where m = number of training instances, n = number of original dimensions k = number of nearest neighbours d = number of reduced dimensions Step 3 makes the model very slow for a large number of training instances
  • 96. Machine Learning - Dimensionality Reduction Other dimensionality techniques Multidimensional Scaling (MDS) ● Reduces dimensionality ● while trying to preserve the distances between the instances >>> from sklearn.manifold import MDS >>> mds = MDS(n_components=2, random_state=42) >>> X_reduced_mds = mds.fit_transform(X)
  • 97. Machine Learning - Dimensionality Reduction Other dimensionality techniques Isomap ● Creates a graph connecting each instance to its nearest neighbours ● Then, reduces dimensionality ● Trying to preserve geodesic distances between instances >>> from sklearn.manifold import Isomap >>> isomap = Isomap(n_components=2) >>> X_reduced_isomap = isomap.fit_transform(X)
  • 98. Machine Learning - Dimensionality Reduction Other dimensionality techniques t-Distributed Stochastic Neighbour Embedding (t-SNE) ● Reduces dimensionality ● Keeping similar instances close and dissimilar instances apart ● Mostly used to visualize clusters of instances in high-dimensional space >>> from sklearn.manifold import TSNE >>> tsne = TSNE(n_components=2) >>> X_reduced_tsne = tsne.fit_transform(X)
  • 99. Machine Learning - Dimensionality Reduction Other dimensionality techniques Linear Discriminant Analysis (LDA) ● A classification algorithm ● During training, it learns the most discriminative axes between the classes ● These axes can be used to define the hyperplane onto which to project the data ● The projection keeps the classes as far apart as possible ● A good technique to reduce dimensionality before running a classification algorithm such as an SVM classifier (a short sketch follows below)
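A minimal sketch of using LDA this way with Scikit-Learn's LinearDiscriminantAnalysis (the iris dataset is used here purely as an example; LDA is supervised, so it needs the class labels):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_iris, y_iris = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)    # at most (n_classes - 1) components
X_reduced_lda = lda.fit_transform(X_iris, y_iris)   # supervised: uses the labels
print(X_reduced_lda.shape)                          # (150, 2)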
  • 100. Machine Learning - Dimensionality Reduction Other dimensionality techniques Plotting the results for each of the techniques on the notebook >>> titles = ["MDS", "Isomap", "t-SNE"] >>> plt.figure(figsize=(11,4)) for subplot, title, X_reduced in zip((131, 132, 133), titles, (X_reduced_mds, X_reduced_isomap, X_reduced_tsne)): plt.subplot(subplot) plt.title(title, fontsize=14) plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap=plt.cm.hot) plt.xlabel("$z_1$", fontsize=18) if subplot == 131: plt.ylabel("$z_2$", fontsize=18, rotation=0) plt.grid(True) >>> plt.show() Switch to Notebook
  • 101. Machine Learning - Dimensionality Reduction Other dimensionality techniques Plotting the results for each of the techniques on the notebook