CHAPTER-08
Dimensionality Reduction
@St_Hakky
Hands-On Machine Learning with Scikit-Learn and TensorFlow
Github : https://github.com/ageron/handson-ml
CHAPTER-08
Dimensionality Reduction
Why should we think about this topic?
• Many Machine Learning problems involve thousands or even millions of features for each training instance.
• Problems caused by the Curse of Dimensionality:
• It makes training extremely slow
• It makes it much harder to find a good solution
• For example, we often gain a lot of insight from data visualization, but this is difficult in high dimensions.
• It requires much more training data
Reducing the number of features
• Fortunately, it is often possible to reduce
the number of features considerably.
• If we can reduce the number of dimensions without losing information needed for the task…
• Training becomes faster
• It becomes much easier to find a good solution
• Less training data is needed to solve the task
MNIST : Example of Reducing Dimension
Pixels on the image borders are almost always white.
We can drop these dimensions without losing information.
For the classification task, many pixels
are utterly unimportant.
Moreover, two neighboring pixels are
often highly correlated
If we merge them into a single pixel,
we will not lose much information.
Reducing Dimension for Visualization
1. Can you understand what’s going on in this data? (42 dimensions)
Dimensionality reduction is also extremely useful
for data visualization.
2. Reducing the number of dimensions down to two makes it
possible to plot a high-dimensional training set on a graph
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
The Curse of Dimensionality
Even a basic 4D hypercube is incredibly hard to picture, let
alone a 200-dimensional ellipsoid bent in a 1,000-dimensional
space.
We live in three dimensions, so our intuition fails us when we try to imagine a high-dimensional space.
https://youtu.be/BVo2igbFSPE
https://youtu.be/-x60xZe0Si0
Example of our intuition failing
• Let’s think about picking a random point in a unit
square (1 × 1 square).
• Only about a 0.4% chance of being located less than
0.001 from a border.
• What happens in a 10,000-dimensional unit hypercube?
• This probability is greater than 99.999999%.
• Most points in a high-dimensional hypercube are very
close to the border.
• This is quite counterintuitive.
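These percentages can be checked with a quick back-of-the-envelope sketch in Python. It assumes independent uniform coordinates, so 1 − (1 − 2·0.001)^d is the probability that at least one coordinate falls within 0.001 of a border:
# Probability that a random point in a d-dimensional unit hypercube
# lies within 0.001 of at least one border.
for d in (2, 10_000):
    p = 1 - (1 - 2 * 0.001) ** d
    print(d, p)  # d=2 -> ~0.004 (0.4%), d=10_000 -> greater than 99.999999%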
Example of our intuition failing
• Let’s think about another example here.
• Picking two points randomly in a unit square.
• The distance between these two points will be, on
average, roughly 0.52.
• But what about two points picked randomly in a
1,000,000-dimensional hypercube?
• The average distance, believe it or not, will be about
408.25!
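Both averages are easy to reproduce with a small Monte Carlo sketch (the values are approximate; in high dimensions the mean distance is close to sqrt(d / 6)):
import numpy as np
rng = np.random.default_rng(42)

def avg_distance(d, n_pairs):
    # Mean Euclidean distance between two random points
    # in a d-dimensional unit hypercube.
    total = 0.0
    for _ in range(n_pairs):
        total += np.linalg.norm(rng.random(d) - rng.random(d))
    return total / n_pairs

print(avg_distance(2, 100_000))      # ~0.52
print(avg_distance(1_000_000, 100))  # ~408.25, roughly sqrt(d / 6)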
Need more data in High Dimension
• These examples mean that a new instance will
likely be far away from any training instance.
• This makes predictions much less reliable than in
lower dimensions, since they will be based on much
larger extrapolations.
• In short, the more dimensions the training set
has, the greater the risk of overfitting it.
• So, we need more data.
The solution to the Curse of Dimensionality
• Increase the size of the training set to reach a
sufficient density of training instances.
• Unfortunately, in practice, the number of training
instances required to reach a given density grows
exponentially with the number of dimensions.
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Main Approaches for Dimensionality
Reduction
• There are two main approaches to
reducing dimensionality
• Projection
• Manifold Learning.
Projection
• In most problems, training instances are
not spread out uniformly across all
dimensions.
• as discussed earlier for MNIST
You can see a 3D dataset represented by
the circles
All training instances actually
lie within a much lower-
dimensional subspace.
3D→2D
You can see a 3D
dataset represented
by the circles
The new 2D dataset
after projection
Project every
training instance
We have just reduced the dataset’s dimensionality from 3D to
2D!!
The axes correspond to
new features z1 and z2
Projection is not always the best
approach
In many cases the subspace may twist and turn.
Swiss roll
Projection is not always the best
approach
Simply projecting onto a
plane would squash
different layers of
the Swiss roll together.
Swiss roll
Dropping x3
Projection is not always the best
approach
Simply projecting onto a
plane would squash
different layers of
the Swiss roll together.
Swiss roll
What you really want is this.
Dropping x3
More example : https://goo.gl/7ILsqR
Manifold
• What is Manifold?
• A d-dimensional manifold is a part of an n-
dimensional space (where d < n) that locally
resembles a d-dimensional hyperplane.
• A 2D manifold is a 2D shape that can be bent and twisted in a higher-dimensional space.
d = 2 and n = 3
Example of a 2D manifold : Swiss roll
Manifold Learning
• What is Manifold Learning?
• Modeling the manifold on which the training
instances lie.
• It relies on the manifold assumption (also called the manifold hypothesis)
• Most real-world high-dimensional datasets lie close
to a much lower-dimensional manifold.
Once again, MNIST example
• Handwritten digit images have some
similarities.
• Connected lines
• Borders are white
• They are more or less centered…
• These constraints tend to squeeze the
dataset into a lower dimensional manifold.
Manifold assumption
• The manifold assumption is often accompanied
by another implicit assumption
• The task at hand will be simpler if expressed in the
lower-dimensional space of the manifold.
This can be
split into two
classes
The decision boundary would
be fairly complex, but…
The decision boundary is
a simple straight line.
This assumption does not always hold
It looks more complex
in the unrolled
manifold.
x1 = 5
This decision boundary looks
very simple in the original
3D space
Only the data knows the best way of Dimensionality Reduction
Reducing the dimensionality of your training set before training a model will definitely speed up training.
But it may not always lead to a better or simpler solution: it all depends on the dataset.
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Principal Component Analysis (PCA)
• PCA is the most popular dimensionality
reduction algorithm.
• PCA has two steps:
1. It identifies the hyperplane that lies closest
to the data
2. It projects the data onto it.
Preserving the Variance
Before projecting the training set onto a lower-
dimensional hyperplane, you first need to choose
the right hyperplane.
The projection of the dataset onto each of the new axes.
If you select the axis that preserves the maximum variance, it will most likely lose less information than the other projections.
Another way to choose axis
• Another way is to choose the axis that
minimizes the mean squared distance
between the original dataset and its
projection onto that axis.
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
PCA identifies the axis
PCA identifies the axis that accounts for the
largest amount of variance.
It also finds a second axis that is orthogonal to the
first one and accounts for the largest amount of
remaining variance.
Principal Components
1st axis
2nd axis
So how can you find the principal components
of a training set?
The unit vector that defines the ith axis is called the
ith principal component (PC).
Singular Value Decomposition (SVD)
SVD can decompose the training set matrix 𝑋 into the dot product of three matrices 𝑈・Σ・𝑉ᵀ, where 𝑉ᵀ contains all the principal components.
Python Code of SVD
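The code on this slide is shown as an image; a minimal sketch of the same idea with NumPy (the dataset X below is just a stand-in) might look like this:
import numpy as np

# Stand-in 3D dataset; in the book this is a small generated dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))

X_centered = X - X.mean(axis=0)        # PCA assumes centered data
U, s, Vt = np.linalg.svd(X_centered)   # SVD of the training set matrix
c1 = Vt.T[:, 0]                        # first principal component
c2 = Vt.T[:, 1]                        # second principal component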
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d
Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Projecting Down to 𝑑 Dimensions
You can reduce the dataset’s dimensionality down to 𝑑 dimensions by projecting it onto the hyperplane defined by the first 𝒅 principal components.
Projecting Down to 𝑑 Dimensions
𝑊𝑑 is defined as the matrix containing the first 𝑑 principal components.
To project the training set onto the hyperplane, you can simply compute the dot product of the training set matrix 𝑋 by 𝑊𝑑: 𝑋𝑑-proj = 𝑋・𝑊𝑑.
The following Python code projects the training set onto the plane defined by the first two principal components:
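A minimal sketch of that code, re-creating the SVD result so it runs standalone (the dataset is again a stand-in):
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))           # stand-in dataset
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered)

W2 = Vt.T[:, :2]                       # matrix of the first 2 principal components
X2D = X_centered.dot(W2)               # X_d-proj = X · W_d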
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Using Scikit-Learn
Scikit-Learn’s PCA class implements PCA using SVD decomposition, just like we did before.
※It automatically takes care of centering the data.
After fitting the PCA, you can access the principal components using the components_ variable.
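A minimal sketch of what this slide shows, using Scikit-Learn's PCA class on a stand-in dataset:
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(42).normal(size=(60, 3))  # stand-in dataset

pca = PCA(n_components=2)      # centering is handled internally
X2D = pca.fit_transform(X)
print(pca.components_)         # each row is a principal component (unit vector)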
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Explained Variance Ratio
The explained variance ratio indicates the proportion of the dataset’s variance that lies along the axis of each principal component.
In this example, 84.2% of the dataset’s variance lies along the first axis, and 14.6% lies along the second axis.
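The ratios are available through the fitted PCA's explained_variance_ratio_ attribute; a small sketch (stand-in data, so the numbers will differ from the 84.2% / 14.6% quoted above):
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(42).normal(size=(60, 3))  # stand-in dataset
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # on the book's 3D example: roughly [0.842, 0.146]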
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Choosing the Right Number of
Dimensions
• Generally, it is preferable to choose the
number of dimensions that add up to a
sufficiently large portion of the variance
(e.g., 95%).
Sample Code
Sample Code
Computing PCA without reducing
dimensionality
Sample Code
Computing PCA without reducing
dimensionality
Computing the minimum number of dimensions
required to preserve 95% of the training set’s
variance
Sample Code
There is a much better way: you can set n_components to be a float between 0.0 and 1.0, indicating the ratio of variance you wish to preserve.
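Both approaches from these sample-code slides can be sketched roughly like this (X_train is a stand-in for the real training set, e.g. MNIST):
import numpy as np
from sklearn.decomposition import PCA

X_train = np.random.default_rng(42).normal(size=(1000, 50))  # stand-in data

# Option 1: fit a full PCA, then find the smallest d preserving 95% of the variance.
pca = PCA()
pca.fit(X_train)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1

# Option 2 (the better way): pass the variance ratio directly as n_components.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)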
Plot the explained variance
Another option is to plot the explained variance as a function of the number of dimensions.
Elbow = the point where the explained variance stops growing fast.
You can think of the elbow point as the intrinsic dimensionality of the dataset.
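A sketch of such a plot, reusing the cumulative explained variance from the previous snippet (matplotlib assumed; stand-in data):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X_train = np.random.default_rng(42).normal(size=(1000, 50))  # stand-in data
cumsum = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)

plt.plot(np.arange(1, len(cumsum) + 1), cumsum)
plt.xlabel("Number of dimensions")
plt.ylabel("Cumulative explained variance")
plt.show()  # look for the elbow where the curve stops growing fast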
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
PCA for Compression
Example: applying PCA to the MNIST dataset while preserving 95% of its variance.
Each instance will have just over 150 features, instead of the original 784 features.
The dataset is now less than 20% of its original size!
Obviously, after dimensionality reduction the training set takes up much less space.
This is a reasonable compression ratio, and it can speed up a classification algorithm tremendously.
Decompress the reduced dataset
You can also decompress the reduced dataset by applying the inverse transformation of the PCA projection.
The equation of the inverse transformation: 𝑋recovered = 𝑋𝑑-proj・𝑊𝑑ᵀ
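A rough sketch of compressing and then decompressing MNIST-like data (X_train is a stand-in with 784 features; on real MNIST, 154 components preserve about 95% of the variance):
import numpy as np
from sklearn.decomposition import PCA

X_train = np.random.default_rng(42).normal(size=(2000, 784))  # stand-in for MNIST

pca = PCA(n_components=154)
X_reduced = pca.fit_transform(X_train)          # compression: 784 -> 154 features
X_recovered = pca.inverse_transform(X_reduced)  # decompression: X_d-proj · W_d^T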
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Incremental PCA (IPCA)
• One problem with the preceding implementation of PCA:
• It requires the whole training set to fit in memory in order for the SVD to run.
• Fortunately, Incremental PCA (IPCA) algorithms have been developed:
• Split the training set into mini-batches
• Feed an IPCA algorithm one mini-batch at a time
• This is useful for large training sets, and also for applying PCA online.
Sample Code
Sample Code
Splitting the MNIST dataset into 100 mini-batches
Sample Code
Feeding them to Scikit-Learn’s IPCA class
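The code on these slides is shown as images; a minimal sketch of the mini-batch approach with Scikit-Learn's IncrementalPCA (stand-in data and component count; on MNIST the slides use 100 batches and 154 components):
import numpy as np
from sklearn.decomposition import IncrementalPCA

X_train = np.random.default_rng(42).normal(size=(10_000, 100))  # stand-in data

n_batches = 100
inc_pca = IncrementalPCA(n_components=50)
for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)   # feed one mini-batch at a time

X_reduced = inc_pca.transform(X_train)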
Another Sample Code
NumPy’s memmap class allows you to
manipulate a large array stored in a binary
file on disk.
The class loads only the data it
needs in memory, when it needs it.
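A sketch of the memmap variant (the file name, shape, and dtype are illustrative; the file is written first so the example is self-contained):
import numpy as np
from sklearn.decomposition import IncrementalPCA

m, n = 10_000, 100
filename = "my_data.dat"  # hypothetical file for this sketch

# Write some stand-in data to disk first.
mm = np.memmap(filename, dtype="float32", mode="w+", shape=(m, n))
mm[:] = np.random.default_rng(42).normal(size=(m, n))
mm.flush()

# Memory-map the file: only the needed chunks are loaded into memory.
X_mm = np.memmap(filename, dtype="float32", mode="r", shape=(m, n))
inc_pca = IncrementalPCA(n_components=50, batch_size=m // 100)
inc_pca.fit(X_mm)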
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Randomized PCA (RPCA)
This is a stochastic algorithm that quickly finds an approximation of the first 𝑑 principal components.
Computational complexity:
PCA (full SVD): 𝑂(𝑚 × 𝑛²) + 𝑂(𝑛³)
RPCA: 𝑂(𝑚 × 𝑑²) + 𝑂(𝑑³)
It is dramatically faster than full SVD when 𝑑 is much smaller than 𝑛.
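In Scikit-Learn this is just a solver option of the PCA class; a minimal sketch (stand-in data):
import numpy as np
from sklearn.decomposition import PCA

X_train = np.random.default_rng(42).normal(size=(2000, 784))  # stand-in data

rnd_pca = PCA(n_components=154, svd_solver="randomized")
X_reduced = rnd_pca.fit_transform(X_train)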
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Kernel Trick
• Kernel Trick
• A mathematical technique that implicitly maps
instances into a very high-dimensional space
• A linear decision boundary in the high-dimensional
feature space corresponds to a complex nonlinear
decision boundary in the original space.
Kernel PCA (kPCA)
The kernel trick makes it possible to perform complex nonlinear projections for dimensionality reduction. This is called Kernel PCA (kPCA).
It is often good at preserving clusters of instances after projection.
Sample Code
• Scikit-Learn’s KernelPCA class to perform
kPCA with an RBF kernel
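The slide's code is an image; a minimal sketch of kPCA with an RBF kernel (the gamma value is illustrative; the Swiss roll comes from sklearn.datasets):
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)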
Selecting a Kernel and Tuning
Hyperparameters
• Since kPCA is an unsupervised learning algorithm:
• There is no obvious performance measure to
select the best kernel and hyperparameters.
• However, dimensionality reduction is
often a preparation step for a supervised
learning task.
Grid Search
You can simply use grid
search to select the
kernel and
hyperparameters.
The best kernel and
hyperparameters are then
available.
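A sketch of such a grid search: kPCA followed by a classifier in a pipeline, tuning the kernel and gamma by cross-validated accuracy. The labels y below are an illustrative binary target derived from the Swiss roll, purely for this sketch:
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)
y = (t > 6.9).astype(int)  # illustrative binary labels for this sketch

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]
grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)  # the best kernel and hyperparameters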
Selecting a Kernel and Tuning Hyperparameters
with the Lowest Reconstruction Error
• Another approach is to select the kernel
and hyperparameters that yield the
lowest reconstruction error.
• This time entirely unsupervised
• However, reconstruction is not as easy as
with linear PCA.
Example : Reconstruction is not easy
The original Swiss roll 3D dataset
The resulting 2D dataset after kPCA is applied using an RBF kernel
Mapping the dataset to an infinite-dimensional space by the kernel trick
Reconstruction pre-image
Example : Reconstruction is not easy
The original Swiss roll 3D dataset
The resulting 2D dataset after kPCA is applied using an RBF kernel
Mapping the dataset to an infinite-dimensional space by the kernel trick
Reconstruction pre-image
We calculate the
reconstruction
error by this.
Reconstruction
by kernel PCA
Example : Reconstruction is not easy
The resulting 2D dataset after kPCA is applied using an RBF kernel
Mapping the dataset to an infinite-dimensional space by the kernel trick
Reconstruction pre-image
Reconstruction
by kernel PCA
Since the feature space is infinite-
dimensional, we cannot compute
the reconstructed point.
→ We cannot compute the true
reconstruction error.
We calculate the
reconstruction
error by this.
Example : Reconstruction is not easy
The original Swiss roll 3D dataset
The resulting 2D dataset after kPCA is applied using an RBF kernel
Mapping the dataset to an infinite-dimensional space by the kernel trick
Reconstruction pre-image
Fortunately, it is possible to find a
point in the original space that
would map close to the
reconstructed point. This is called
the reconstruction pre-image.
Example : Reconstruction is not easy
The original Swiss roll 3D dataset
The resulting 2D dataset after kPCA is applied using an RBF kernel
Mapping the dataset to an infinite-dimensional space by the kernel trick
Reconstruction pre-image
You can then measure the squared distance between this pre-image and the original instance.
Using this reconstruction pre-image error, you can select the kernel and hyperparameters that minimize it.
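A sketch of measuring the reconstruction pre-image error with Scikit-Learn (fit_inverse_transform=True trains the inverse mapping; the gamma value is illustrative):
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.0433,
                    fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)

print(mean_squared_error(X, X_preimage))  # reconstruction pre-image error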
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Locally Linear Embedding (LLE)
• LLE is a powerful nonlinear dimensionality reduction method.
• A Manifold Learning technique that does not
rely on projections like the previous algorithms.
• This makes it good at unrolling twisted
manifolds
• Especially when there is not too much noise.
Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction
by locally linear embedding." science 290.5500 (2000): 2323-2326.
Sample Code
Result
The Swiss roll is completely unrolled.
The distances between
instances are locally well
preserved.
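The sample code on this slide is an image; a minimal sketch of LLE with Scikit-Learn on the Swiss roll looks like this:
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)  # the unrolled Swiss roll in 2D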
Sample Code
Result
However, distances are not preserved on a larger scale: parts of the unrolled manifold are squeezed while others are stretched.
How LLE works
1. First, the algorithm identifies its 𝑘 closest neighbors for each training instance 𝑥(𝑖)
• Find the weights 𝑤𝑖,𝑗 such that the squared distance between 𝑥(𝑖) and the weighted sum Σ𝑗=1..𝑚 𝑤𝑖,𝑗 𝑥(𝑗) is as small as possible.
• 𝑤𝑖,𝑗 = 0 if 𝑥(𝑗) is not one of the 𝑘 closest neighbors of 𝑥(𝑖)
2. Then, it tries to reconstruct 𝑥(𝑖) as a linear function of these neighbors.
How LLE works
1. First, the algorithm identifies its 𝒌 closest neighbors for each training instance 𝒙(𝒊)
• Find the weights 𝑤𝑖,𝑗 such that the squared distance between 𝑥(𝑖) and the weighted sum Σ𝑗=1..𝑚 𝑤𝑖,𝑗 𝑥(𝑗) is as small as possible.
• 𝑤𝑖,𝑗 = 0 if 𝑥(𝑗) is not one of the 𝑘 closest neighbors of 𝑥(𝑖)
2. Then, it tries to reconstruct 𝑥(𝑖) as a linear function of these neighbors.
How LLE works
• Here is the first step of LLE in more detail.
• The first step of LLE is the constrained optimization problem described in Equation 8-4.
• The second constraint simply normalizes the weights for each training instance 𝑥(𝑖).
𝑾 is the weight matrix containing all the weights 𝑤𝑖,𝑗
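The equation itself appears on the slide as an image; reconstructed here in LaTeX as the standard LLE step-1 problem (notation follows the slides):
$$\hat{W} = \underset{W}{\operatorname{argmin}} \sum_{i=1}^{m} \left\| \mathbf{x}^{(i)} - \sum_{j=1}^{m} w_{i,j}\,\mathbf{x}^{(j)} \right\|^{2}
\quad \text{subject to} \quad
\begin{cases}
w_{i,j} = 0 & \text{if } \mathbf{x}^{(j)} \text{ is not one of the } k \text{ closest neighbors of } \mathbf{x}^{(i)} \\
\sum_{j=1}^{m} w_{i,j} = 1 & \text{for } i = 1, 2, \dots, m
\end{cases}$$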
How LLE works
1. First, the algorithm identifies its 𝑘 closest neighbors for each training instance 𝑥(𝑖)
• Find the weights 𝑤𝑖,𝑗 such that the squared distance between 𝑥(𝑖) and the weighted sum Σ𝑗=1..𝑚 𝑤𝑖,𝑗 𝑥(𝑗) is as small as possible.
• 𝑤𝑖,𝑗 = 0 if 𝑥(𝑗) is not one of the 𝑘 closest neighbors of 𝑥(𝑖)
2. Then, it tries to reconstruct 𝒙(𝒊) as a linear function of these neighbors.
How LLE works
• Here is the second step of LLE in more detail.
• The second step is to map the training instances into a 𝑑-dimensional space (where 𝑑 < 𝑛).
• If 𝑧(𝑖) is the image of 𝑥(𝑖) in this 𝑑-dimensional space, then we want the squared distance between 𝑧(𝑖) and the weighted sum Σ𝑗=1..𝑚 𝑤𝑖,𝑗 𝑧(𝑗) to be as small as possible.
Note that 𝒁 is the matrix containing all 𝑧(𝑖)
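Similarly, the second-step objective (also shown as an image on the slide) can be written as:
$$\hat{Z} = \underset{Z}{\operatorname{argmin}} \sum_{i=1}^{m} \left\| \mathbf{z}^{(i)} - \sum_{j=1}^{m} \hat{w}_{i,j}\,\mathbf{z}^{(j)} \right\|^{2}$$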
How LLE works
These two optimization problems look very similar.
How LLE works
In the first step, we keep the instances fixed and find the optimal weights.
In the second step, we do the reverse: we keep the weights fixed and find the optimal positions of the instances’ images in the low-dimensional space.
Contents
• The Curse of Dimensionality
• Main Approaches for
Dimensionality Reduction
• Projection
• Manifold Learning
• PCA
• Preserving the Variance
• Principal Components
• Projecting Down to d Dimensions
• Using Scikit-Learn
• Explained Variance Ratio
• Choosing the Right Number of
Dimensions
• PCA for Compression
• Incremental PCA
• Randomized PCA
• Kernel PCA
• Selecting a Kernel and Tuning
Hyperparameters
• LLE
• Other Dimensionality Reduction
Techniques
• MDS
• SOM
• Isomap
• t-SNE
Other Dimensionality Reduction
Techniques
• There are many other dimensionality
reduction techniques.
• MDS
• Isomap
• t-SNE
Multidimensional Scaling (MDS)
• MDS reduces dimensionality while trying to
preserve the distances between the instances.
Isomap
• First, creating a graph by connecting each
instance to its nearest neighbors
• Then reducing dimensionality while trying
to preserve the geodesic distances
between the instances.
t-Distributed Stochastic Neighbor
Embedding (t-SNE)
• t-SNE reduces dimensionality while trying
to keep similar instances close and
dissimilar instances apart.
• It is mostly used for visualization
Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-
SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
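All three techniques are available in sklearn.manifold; a minimal sketch on the Swiss roll (parameter values are illustrative):
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS, Isomap, TSNE

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

X_mds = MDS(n_components=2, random_state=42).fit_transform(X)    # preserves pairwise distances
X_iso = Isomap(n_components=2).fit_transform(X)                  # preserves geodesic distances
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)  # keeps similar instances close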
Thank you
More Related Content

What's hot

Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Simplilearn
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Usama Fayyaz
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)Sharayu Patil
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierNeha Kulkarni
 
Machine Learning
Machine LearningMachine Learning
Machine LearningKumar P
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsankit_ppt
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryAhmed Yousry
 
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Edureka!
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaEdureka!
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and RegressionMegha Sharma
 
Activation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkActivation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkGayatri Khanvilkar
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 

What's hot (20)

Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
KNN
KNN KNN
KNN
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topics
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
 
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | Edureka
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Activation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkActivation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural network
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 

Viewers also liked

Diet networks thin parameters for fat genomic
Diet networks thin parameters for fat genomicDiet networks thin parameters for fat genomic
Diet networks thin parameters for fat genomicHakky St
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節Hakky St
 
Reducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networksReducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networksHakky St
 
Tensorflow
TensorflowTensorflow
TensorflowHakky St
 
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.Hakky St
 
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
[DL輪読会]Xception: Deep Learning with Depthwise Separable ConvolutionsDeep Learning JP
 
【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章 【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章 Hakky St
 
強くなるロボティック・ ゲームプレイヤーの作り方3章
強くなるロボティック・ ゲームプレイヤーの作り方3章強くなるロボティック・ ゲームプレイヤーの作り方3章
強くなるロボティック・ ゲームプレイヤーの作り方3章Hakky St
 
Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...Hakky St
 
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPsDeep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPsHakky St
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章Hakky St
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節Hakky St
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節Hakky St
 
【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章Hakky St
 
スパース性に基づく機械学習 2章 データからの学習
スパース性に基づく機械学習 2章 データからの学習スパース性に基づく機械学習 2章 データからの学習
スパース性に基づく機械学習 2章 データからの学習hagino 3000
 
劣モジュラ最適化と機械学習1章
劣モジュラ最適化と機械学習1章劣モジュラ最適化と機械学習1章
劣モジュラ最適化と機械学習1章Hakky St
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnArnaud Joly
 
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael VaroquauxPyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael VaroquauxPôle Systematic Paris-Region
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learnYoss Cohen
 
Exploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnExploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnKan Ouivirach, Ph.D.
 

Viewers also liked (20)

Diet networks thin parameters for fat genomic
Diet networks thin parameters for fat genomicDiet networks thin parameters for fat genomic
Diet networks thin parameters for fat genomic
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 4.2節
 
Reducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networksReducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networks
 
Tensorflow
TensorflowTensorflow
Tensorflow
 
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
 
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
 
【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章 【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル2章
 
強くなるロボティック・ ゲームプレイヤーの作り方3章
強くなるロボティック・ ゲームプレイヤーの作り方3章強くなるロボティック・ ゲームプレイヤーの作り方3章
強くなるロボティック・ ゲームプレイヤーの作り方3章
 
Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...
 
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPsDeep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 1章
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 3.3節と3.4節
 
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節
スパース性に基づく機械学習(機械学習プロフェッショナルシリーズ) 2.3節〜2.5節
 
【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章
【機械学習プロフェッショナルシリーズ】グラフィカルモデル1章
 
スパース性に基づく機械学習 2章 データからの学習
スパース性に基づく機械学習 2章 データからの学習スパース性に基づく機械学習 2章 データからの学習
スパース性に基づく機械学習 2章 データからの学習
 
劣モジュラ最適化と機械学習1章
劣モジュラ最適化と機械学習1章劣モジュラ最適化と機械学習1章
劣モジュラ最適化と機械学習1章
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
 
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael VaroquauxPyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learn
 
Exploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnExploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-Learn
 

Similar to Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8

How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?Tuan Yang
 
Machine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedMachine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedOmid Vahdaty
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern PresentationDaniel Cahall
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratchFEG
 
30thSep2014
30thSep201430thSep2014
30thSep2014Mia liu
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabCloudxLab
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programmingSoumya Mukherjee
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptxssuserf07225
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 

Similar to Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8 (20)

PCA.pptx
PCA.pptxPCA.pptx
PCA.pptx
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
random forest.pptx
random forest.pptxrandom forest.pptx
random forest.pptx
 
Machine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedMachine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data Demystified
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Svm ms
Svm msSvm ms
Svm ms
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLab
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptx
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Deep learning
Deep learningDeep learning
Deep learning
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 

Recently uploaded

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 

Recently uploaded (20)

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 

Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8

  • 1. CHAPTER-08 Dimensionality Reduction @St_Hakky Hands-On Machine Learning with Scikit-Learn and TensorFlow Github : https://github.com/ageron/handson-ml
  • 3. Why should we think about this topic? • Machine Learning problems involve thousands or even millions of features for each training instance. • Curse of Dimensionality’s problems • Make training extremely slow • Make it much harder to find a good solution • For example, we often get much information from data visualization but it is difficult in high dimensionality. • Need much more training data
  • 4. Reducing the number of features • Fortunately, it is often possible to reduce the number of features considerably. • If we can reducing dimension without loosing information for some task… • Make training faster • Make it much easier to find a good solution • Reduce the training data to resolve the task
  • 5. MNIST : Example of Reducing Dimension Pixels on the image borders are almost always white. We can drop dimension without loosing info For the classification task, many pixels are utterly unimportant. Moreover, two neighboring pixels are often highly correlated If we merge them into a single pixel, we will not lose much information.
  • 6. Reducing Dimension for Visualization 1. Can you understand what’s going on this data?(42 dimensions) Dimensionality reduction is also extremely useful for data visualization. 2. Reducing the number of dimensions down to two makes it possible to plot a high-dimensional training set on a graph
  • 7. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 8. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 9. The Curse of Dimensionality Even a basic 4D hypercube is incredibly hard to picture, let alone a 200-dimensional ellipsoid bent in a 1,000-dimensional space. We live in three dimensions that our intuition fails us when we try to imagine a high-dimensional space.
  • 11. Example of our intuition failing • Let’s think about picking a random point in a unit square (1 × 1 square). • Only about a 0.4% chance of being located less than 0.001 from a border. • What’s happen in a 10,000-dimensional unit hypercube? • This probability is greater than 99.999999%. • Most points in a high-dimensional hypercube are very close to the border. • This is quite counterintuitive.
  • 12. Example of our intuition failing • Let’s think about another example here. • Picking two points randomly in a unit square. • The distance between these two points will be, on average, roughly 0.52. • But what about two points picked randomly in a 1,000,000-dimensional hypercube? • The average distance, believe it or not, will be about 408.25!
  • 13. Need more data in High Dimension • These examples means that a new instance will likely be far away from any training instance. • This makes predictions much less reliable than in lower dimensions, since they will be based on much larger extrapolations. • In short, the more dimensions the training set has, the greater the risk of overfitting it. • So, we need more data.
  • 14. The solution of Curse of Dimensionality • Increase the size of the training set to reach a sufficient density of training instances. • Unfortunately, in practice, the number of training instances required to reach a given density grows exponentially with the number of dimensions.
  • 15. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 16. Main Approaches for Dimensionality Reduction • There are two main approaches to reducing dimensionality • Projection • Manifold Learning.
  • 17. Projection • In most problems, training instances are not spread out uniformly across all dimensions. • as discussed earlier for MNIST You can see a 3D dataset represented by the circles All training instances actually lie within a much lower- dimensional subspace.
  • 18. 3D→2D You can see a 3D dataset represented by the circles The new 2D dataset after projection Project every training instance We have just reduced the dataset’s dimensionality from 3D to 2D!! The axes correspond to new features z1 and z2
  • 19. Projection is not always the best approach In many cases the subspace may twist and turn, as in the Swiss roll dataset.
  • 20. Projection is not always the best approach Simply projecting the Swiss roll onto a plane (e.g., by dropping x3) would squash its different layers together.
  • 21. Projection is not always the best approach Simply projecting onto a plane (dropping x3) would squash the different layers of the Swiss roll together; what you really want is to unroll the Swiss roll instead. More examples: https://goo.gl/7ILsqR
  • 22. Manifold • What is a manifold? • A d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally resembles a d-dimensional hyperplane. • A 2D manifold is a 2D shape that can be bent and twisted in a higher-dimensional space. • Example of a 2D manifold: the Swiss roll (d = 2 and n = 3).
  • 23. Manifold Learning • What is Manifold Learning? • Modeling the manifold on which the training instances lie. • It relies on the manifold assumption (also called the manifold hypothesis): most real-world high-dimensional datasets lie close to a much lower-dimensional manifold.
  • 24. Once again, the MNIST example • Handwritten digit images have some similarities: • Connected lines • White borders • They are more or less centered… • These constraints tend to squeeze the dataset into a lower-dimensional manifold.
  • 25. Manifold assumption • The manifold assumption is often accompanied by another implicit assumption: the task at hand will be simpler if expressed in the lower-dimensional space of the manifold. • In the figure, the dataset can be split into two classes: in the original 3D space the decision boundary would be fairly complex, but in the unrolled 2D manifold space it is a simple straight line.
  • 26. This assumption does not always hold • For example, the decision boundary x1 = 5 looks very simple in the original 3D space, but it looks more complex in the unrolled manifold.
  • 27. Only the data knows the best way of dimensionality reduction • Reducing the dimensionality of your training set before training a model will definitely speed up training, but it may not always lead to a better or simpler solution: it all depends on the dataset.
  • 28. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 29. Principal Component Analysis (PCA) • PCA is the most popular dimensionality reduction algorithm. • PCA has two steps: 1. It identifies the hyperplane that lies closest to the data. 2. It projects the data onto it.
  • 30. Preserving the Variance Before projecting the training set onto a lower-dimensional hyperplane, you first need to choose the right hyperplane. The figure shows the projection of the dataset onto each of several candidate axes. If you select the axis that preserves the maximum variance, you will most likely lose less information.
  • 31. Another way to choose the axis • Another way is to choose the axis that minimizes the mean squared distance between the original dataset and its projection onto that axis.
  • 32. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 33. PCA identifies the axis PCA identifies the axis that accounts for the largest amount of variance. It also finds a second axis that is orthogonal to the first one and accounts for the largest amount of remaining variance.
  • 34. Principal Components The unit vector that defines the ith axis is called the ith principal component (PC) (e.g., the 1st and 2nd axes in the figure). So how can you find the principal components of a training set?
  • 35. Singular Value Decomposition (SVD) SVD can decompose the training set matrix 𝑋 into the dot product of three matrices, 𝑋 = 𝑈 · Σ · 𝑉ᵀ, where 𝑉ᵀ contains all the principal components (Python code of the SVD is sketched below).
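A minimal NumPy sketch of such an SVD-based computation of the principal components. X is assumed to be the training set matrix, and the variable names (X_centered, c1, c2) are illustrative:

import numpy as np

X_centered = X - X.mean(axis=0)        # PCA assumes the data is centered around the origin
U, s, Vt = np.linalg.svd(X_centered)   # the rows of Vt are the principal components
c1 = Vt.T[:, 0]                        # 1st principal component
c2 = Vt.T[:, 1]                        # 2nd principal component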
  • 36. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 37. Projecting Down to 𝑑 Dimensions You can reduce the dimensionality of the dataset down to 𝑑 dimensions by projecting it onto the hyperplane defined by the first 𝒅 principal components.
  • 38. Projecting Down to 𝑑 Dimensions To project the training set onto the hyperplane, you simply compute the dot product of the training set matrix 𝑋 by the matrix 𝑊𝑑, defined as the matrix containing the first 𝑑 principal components: 𝑋𝑑-proj = 𝑋 · 𝑊𝑑. The Python code below projects the training set onto the plane defined by the first two principal components:
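A minimal sketch of that projection, continuing from the SVD sketch above (Vt and X_centered are assumed from there):

W2 = Vt.T[:, :2]             # W_d with d = 2: the first two principal components as columns
X2D = X_centered.dot(W2)     # X_d-proj = X · W_d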
  • 39. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 40. Using Scikit-Learn Scikit-Learn’s PCA class implements PCA using SVD decomposition, just like we did before (note that it automatically takes care of centering the data). After fitting the PCA, you can access the principal components via the components_ variable.
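A minimal sketch with Scikit-Learn’s PCA API, assuming X is the training set:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X2D = pca.fit_transform(X)   # centering is handled automatically
print(pca.components_)       # each row is one principal component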
  • 41. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 42. Explained Variance Ratio The explained variance ratio indicates the proportion of the dataset’s variance that lies along the axis of each principal component. In the 3D dataset example, 84.2% of the dataset’s variance lies along the first axis, and 14.6% lies along the second axis.
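A sketch of inspecting this ratio on the pca object fitted in the sketch above; the example values are the ones quoted on the slide:

print(pca.explained_variance_ratio_)
# e.g. array([0.842..., 0.146...]): 84.2% of the variance along the 1st PC, 14.6% along the 2nd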
  • 43. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 44. Choosing the Right Number of Dimensions • Generally, it is preferable to choose the number of dimensions that add up to a sufficiently large portion of the variance (e.g., 95%).
  • 46. Sample Code Computing PCA without reducing dimensionality
  • 47. Sample Code Computing PCA without reducing dimensionality, then computing the minimum number of dimensions required to preserve 95% of the training set’s variance.
  • 48. Sample Code There is a much better way: you can set n_components to a float between 0.0 and 1.0, indicating the ratio of variance you wish to preserve.
  • 49. Plot the explained variance Another option is to plot the explained variance as a function of the number of dimensions. There is usually an elbow in the curve, where the explained variance stops growing fast; you can think of the elbow point as the intrinsic dimensionality of the dataset.
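A minimal sketch of both computations, assuming a training set X_train:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA()                                    # compute PCA without reducing dimensionality
pca.fit(X_train)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1              # minimum number of dimensions preserving 95% of the variance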
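A sketch of the same idea, letting PCA pick the number of dimensions:

pca = PCA(n_components=0.95)                   # preserve 95% of the training set's variance
X_reduced = pca.fit_transform(X_train)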
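A sketch of such a plot, using the cumsum array from the sketch above (matplotlib assumed):

import matplotlib.pyplot as plt

plt.plot(cumsum)                               # explained variance vs. number of dimensions
plt.xlabel("Dimensions")
plt.ylabel("Cumulative explained variance")
plt.show()                                     # look for the elbow in this curve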
  • 50. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 51. PCA for Compression Example: applying PCA to the MNIST dataset while preserving 95% of its variance. After dimensionality reduction each instance has just over 150 features, instead of the original 784 features, so the training set takes up much less space: the dataset is now less than 20% of its original size! This is a reasonable compression ratio, and it can speed up a classification algorithm tremendously.
  • 52. Decompress the reduced dataset You can also decompress the reduced dataset back to 784 dimensions by applying the inverse transformation of the PCA projection: 𝑋recovered = 𝑋𝑑-proj · 𝑊𝑑ᵀ.
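A sketch of compressing MNIST and decompressing it again; n_components=154 is an example value consistent with the “just over 150 features” mentioned above:

from sklearn.decomposition import PCA

pca = PCA(n_components=154)
X_reduced = pca.fit_transform(X_train)            # compress: 784 -> 154 features
X_recovered = pca.inverse_transform(X_reduced)    # decompress back to 784 features (with a small loss)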
  • 53. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 54. Incremental PCA (IPCA) • One problem with the preceding implementations of PCA: they require the whole training set to fit in memory for the SVD to run. • IPCA algorithms have been developed to address this: • Split the training set into mini-batches. • Feed an IPCA algorithm one mini-batch at a time. • This is useful for large training sets, and also for applying PCA online.
  • 56. Sample Code Splitting the MNIST dataset into 100 mini-batches
  • 57. Sample Code Feeding them to Scikit-Learn’s IPCA class
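A minimal sketch covering this and the previous slide, assuming the MNIST training set X_train (n_components=154 is an example value):

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_batches = 100
inc_pca = IncrementalPCA(n_components=154)
for X_batch in np.array_split(X_train, n_batches):   # split MNIST into 100 mini-batches
    inc_pca.partial_fit(X_batch)                     # feed one mini-batch at a time
X_reduced = inc_pca.transform(X_train)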
  • 58. Another Sample Code NumPy’s memmap class allows you to manipulate a large array stored in a binary file on disk. The class loads only the data it needs in memory, when it needs it.
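A sketch of the memmap variant; the filename and array shape are placeholders for data you have already written to disk:

import numpy as np
from sklearn.decomposition import IncrementalPCA

m, n = 60000, 784                                   # placeholder shape for an MNIST-sized file
X_mm = np.memmap("my_mnist.data", dtype="float32", mode="r", shape=(m, n))
inc_pca = IncrementalPCA(n_components=154, batch_size=m // 100)
inc_pca.fit(X_mm)                                   # only the needed chunks are loaded into memory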
  • 59. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 60. Randomized PCA (RPCA) This is a stochastic algorithm that quickly finds an approximation of the first 𝑑 principal components. Its computational complexity is 𝑂(𝑚 × 𝑑²) + 𝑂(𝑑³), instead of 𝑂(𝑚 × 𝑛²) + 𝑂(𝑛³) for full PCA, so it is dramatically faster when 𝑑 is much smaller than 𝑛.
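A minimal sketch; in Scikit-Learn this can be requested through the svd_solver argument of the PCA class:

from sklearn.decomposition import PCA

rnd_pca = PCA(n_components=154, svd_solver="randomized")
X_reduced = rnd_pca.fit_transform(X_train)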
  • 61. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 62. Kernel Trick • Kernel Trick • A mathematical technique that implicitly maps instances into a very high-dimensional space • A linear decision boundary in the high-dimensional feature space corresponds to a complex nonlinear decision boundary in the original space.
  • 63. Kernel PCA (kPCA) The kernel trick makes it possible to perform complex nonlinear projections for dimensionality reduction; this is called Kernel PCA (kPCA). It is often good at preserving clusters of instances after projection.
  • 64. Sample Code • Use Scikit-Learn’s KernelPCA class to perform kPCA with an RBF kernel:
  • 65. Selecting a Kernel and Tuning Hyperparameters • As kPCA is an unsupervised learning algorithm, there is no obvious performance measure to select the best kernel and hyperparameters. • However, dimensionality reduction is often a preparation step for a supervised learning task.
  • 66. Grid Search You can simply use grid search to select the kernel and hyperparameters that lead to the best performance on that supervised task. The best kernel and hyperparameters are then available through the best_params_ variable.
  • 67. Selecting a Kernel and Tuning Hyperparameters with the lowest reconstruction error • Another approach, this time entirely unsupervised, is to select the kernel and hyperparameters that yield the lowest reconstruction error. • However, reconstruction is not as easy as with linear PCA.
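A minimal sketch, assuming a dataset X; the gamma value is illustrative:

from sklearn.decomposition import KernelPCA

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)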
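A sketch of grid search over a kPCA → Logistic Regression pipeline (the setup described in the notes at the end), assuming a labeled dataset X, y; the grid values are illustrative:

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),       # nonlinear dimensionality reduction first
    ("log_reg", LogisticRegression()),         # then a supervised classifier
])
param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]
grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)                # best kernel and hyperparameters for kPCA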
  • 68. Example: Reconstruction is not easy The figure shows the original Swiss roll 3D dataset, the resulting 2D dataset after kPCA is applied using an RBF kernel, the mapping of the dataset to an infinite-dimensional space by the kernel trick, and the reconstruction pre-image.
  • 69. Example: Reconstruction is not easy If we could invert the linear PCA step, the reconstructed point (the reconstruction by kernel PCA) would lie in the feature space; this is what we would calculate the reconstruction error against.
  • 70. Example: Reconstruction is not easy Since the feature space is infinite-dimensional, we cannot compute the reconstructed point, and therefore we cannot compute the true reconstruction error.
  • 71. Example: Reconstruction is not easy Fortunately, it is possible to find a point in the original space that would map close to the reconstructed point. This is called the reconstruction pre-image.
  • 72. Example: Reconstruction is not easy Once you have this pre-image, you can measure its squared distance to the original instance; you can then select the kernel and hyperparameters that minimize this reconstruction pre-image error.
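A sketch of measuring this pre-image error with Scikit-Learn; fit_inverse_transform=True asks KernelPCA to learn the pre-image mapping, and gamma is illustrative:

from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04,
                    fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)   # reconstruction pre-image in the original space
print(mean_squared_error(X, X_preimage))            # squared distance to the original instances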
  • 73. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 74. Locally Linear Embedding (LLE) • LLE is a powerful nonlinear dimensionality reduction method. • It is a Manifold Learning technique that does not rely on projections like the previous algorithms. • This makes it good at unrolling twisted manifolds, especially when there is not too much noise. Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction by locally linear embedding." Science 290.5500 (2000): 2323-2326.
  • 75. Sample Code Result The Swiss roll is completely unrolled, and the distances between instances are locally well preserved.
  • 76. Sample Code Result However, distances are not preserved on a larger scale: parts of the unrolled Swiss roll are squeezed while others are stretched.
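A minimal sketch of the kind of code behind this result, using Scikit-Learn’s LocallyLinearEmbedding; X is assumed to be the Swiss roll dataset and n_neighbors=10 is illustrative:

from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)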
  • 77. How LLE works 1. First, the algorithm identifies its 𝑘 closest neighbors for each training instance 𝑥(𝑖). • It finds the weights 𝑤𝑖,𝑗 such that the squared distance between 𝑥(𝑖) and ∑ⱼ 𝑤𝑖,𝑗 𝑥(𝑗) is as small as possible. • 𝑤𝑖,𝑗 = 0 if 𝑥(𝑗) is not one of the 𝑘 closest neighbors of 𝑥(𝑖). 2. Then, it tries to reconstruct 𝑥(𝑖) as a linear function of these neighbors.
  • 78. How LLE works 1. First, the algorithm identifies its 𝒌 closest neighbors for each training instance 𝒙(𝒊). • It finds the weights 𝑤𝑖,𝑗 such that the squared distance between 𝑥(𝑖) and ∑ⱼ 𝑤𝑖,𝑗 𝑥(𝑗) is as small as possible. • 𝑤𝑖,𝑗 = 0 if 𝑥(𝑗) is not one of the 𝑘 closest neighbors of 𝑥(𝑖). 2. Then, it tries to reconstruct 𝑥(𝑖) as a linear function of these neighbors.
  • 79. How LLE works • Here is the first step of LLE in detail. • The first step of LLE is the constrained optimization problem described in Equation 8-4 (a reconstruction is sketched below), where 𝑾 is the weight matrix containing all the weights 𝑤𝑖,𝑗. • The second constraint simply normalizes the weights for each training instance 𝑥(𝑖).
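A LaTeX reconstruction of the constrained optimization problem referred to as Equation 8-4, based on the constraints listed above:

\hat{W} = \underset{W}{\operatorname{argmin}} \sum_{i=1}^{m} \Big\| x^{(i)} - \sum_{j=1}^{m} w_{i,j}\, x^{(j)} \Big\|^2
\quad \text{subject to} \quad
\begin{cases}
  w_{i,j} = 0 & \text{if } x^{(j)} \text{ is not one of the } k \text{ closest neighbors of } x^{(i)} \\
  \sum_{j=1}^{m} w_{i,j} = 1 & \text{for } i = 1, 2, \dots, m
\end{cases}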
  • 80. How LLE works 1. First, the algorithm identifies its 𝑘 closest neighbors for each training instance 𝑥(𝑖). • It finds the weights 𝑤𝑖,𝑗 such that the squared distance between 𝑥(𝑖) and ∑ⱼ 𝑤𝑖,𝑗 𝑥(𝑗) is as small as possible. • 𝑤𝑖,𝑗 = 0 if 𝑥(𝑗) is not one of the 𝑘 closest neighbors of 𝑥(𝑖). 2. Then, it tries to reconstruct 𝒙(𝒊) as a linear function of these neighbors.
  • 81. How LLE works • Here is the second step of LLE in detail. • The second step is to map the training instances into a 𝑑-dimensional space (where 𝑑 < 𝑛). • If 𝑧(𝑖) is the image of 𝑥(𝑖) in this 𝑑-dimensional space, then we want the squared distance between 𝑧(𝑖) and ∑ⱼ 𝑤𝑖,𝑗 𝑧(𝑗) to be as small as possible. Note that 𝒁 is the matrix containing all 𝑧(𝑖).
  • 82. How LLE works The objectives of the two steps look very similar.
  • 83. How LLE works In the first step we kept the instances fixed and found the optimal weights; now we are doing the reverse: keeping the weights fixed and finding the optimal positions of the instances’ images in the low-dimensional space.
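Similarly, a LaTeX sketch of the second step’s objective (the weights ŵ found in the first step are now fixed):

\hat{Z} = \underset{Z}{\operatorname{argmin}} \sum_{i=1}^{m} \Big\| z^{(i)} - \sum_{j=1}^{m} \hat{w}_{i,j}\, z^{(j)} \Big\|^2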
  • 84. Contents • The Curse of Dimensionality • Main Approaches for Dimensionality Reduction • Projection • Manifold Learning • PCA • Preserving the Variance • Principal Components • Projecting Down to d Dimensions • Using Scikit-Learn • Explained Variance Ratio • Choosing the Right Number of Dimensions • PCA for Compression • Incremental PCA • Randomized PCA • Kernel PCA • Selecting a Kernel and Tuning Hyperparameters • LLE • Other Dimensionality Reduction Techniques • MDS • SOM • Isomap • t-SNE
  • 85. Other Dimensionality Reduction Techniques • There are many other dimensionality reduction techniques. • MDS • Isomap • t-SNE
  • 86. Multidimensional Scaling (MDS) • MDS reduces dimensionality while trying to preserve the distances between the instances.
  • 87. Isomap • First, it creates a graph by connecting each instance to its nearest neighbors. • Then it reduces dimensionality while trying to preserve the geodesic distances between the instances.
  • 88. t-Distributed Stochastic Neighbor Embedding (t-SNE) • t-SNE reduces dimensionality while trying to keep similar instances close and dissimilar instances apart. • It is mostly used for visualization Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
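A minimal sketch of trying these three techniques with Scikit-Learn’s manifold module, assuming a dataset X (t-SNE is typically used only for 2D/3D visualization):

from sklearn.manifold import MDS, Isomap, TSNE

mds = MDS(n_components=2)
X_reduced_mds = mds.fit_transform(X)

isomap = Isomap(n_components=2)
X_reduced_isomap = isomap.fit_transform(X)

tsne = TSNE(n_components=2)
X_reduced_tsne = tsne.fit_transform(X)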

Editor's Notes

  1. We often gain some important insights easily.
  2. We will discuss the curse of dimensionality and get a sense of what goes on in high-dimensional space. Then, we will present the two main approaches to dimensionality reduction (projection and Manifold Learning), and we will go through three of the most popular dimensionality reduction techniques: PCA, Kernel PCA, and LLE.
  4. Vocabulary: “let alone” means “not to mention”; an “ellipsoid” is a 3D (or higher-dimensional) analogue of an ellipse.
  5. How can two points be so far apart when they both lie within the same unit hypercube?
  7. Before we dive into specific dimensionality reduction algorithms, let’s take a look at the two main approaches to reducing dimensionality
  8. Notice that all training instances lie close to a plane: this is a lower-dimensional (2D) subspace of the high-dimensional (3D) space. Now if we project every training instance perpendicularly onto this subspace (as represented by the short lines connecting the instances to the plane), we get the new 2D dataset shown in Figure 8-3. Ta-da! We have just reduced the dataset’s dimensionality from 3D to 2D. Note that the axes correspond to new features z1 and z2 (the coordinates of the projections on the plane).
  9. However, projection is not always the best approach to dimensionality reduction.
  11. However, projection is not always the best approach to dimensionality reduction. (“Squash” means to crush or flatten.)
  12. This assumption is very often empirically observed.
  13. If you randomly generated images, only a ridiculously tiny fraction of them would look like handwritten digits. In other words, the degrees of freedom available to you if you try to create a digit image are dramatically lower than the degrees of freedom you would have if you were allowed to generate any image you wanted.
  14. “Implicit” means implied rather than explicitly stated.
  16. As you can see, the projection onto the solid line preserves the maximum variance, while the projection onto the dotted line preserves very little variance, and the projection onto the dashed line preserves an intermediate amount of variance.
  18. Singular Value Decomposition (SVD): V represents an orthonormal basis (its columns are orthonormal vectors).
  22. Another very useful piece of information is the explained variance ratio of each principal component, available via the explained_variance_ratio_ variable. For example, let’s look at the explained variance ratios of the first two components of the 3D dataset represented in Figure 8-2:
  24. Instead of arbitrarily choosing the number of dimensions to reduce down to, it is preferable to choose the number of dimensions that add up to a sufficiently large portion of the variance. Unless, of course, you are reducing dimensionality for data visualization; in that case you will generally want to reduce the dimensionality down to 2 or 3.
  25. You could then set n_components=d and run PCA again.
  28. instead of specifying the number of principal components you want to preserve, you can set n_components to be a float between 0.0 and 1.0, indicating the ratio of variance you wish to preserve:
  29. In this case, you can see that reducing the dimensionality down to about 100 dimensions wouldn’t lose too much explained variance. (“Intrinsic” means inherent, built-in.)
  31. Of course this won’t give you back the original data, since the projection lost a bit of information (within the 5% variance that was dropped), but it will likely be quite close to the original data. Figure 8-9 shows a few digits from the original training set (on the left), and the corresponding digits after compression and decompression. You can see that there is a slight image quality loss, but the digits are still mostly intact.
  35. For example, the following code creates a two-step pipeline, first reducing dimensionality to two dimensions using kPCA, then applying Logistic Regression for classification. Then it uses GridSearchCV to find the best kernel and gamma value for kPCA in order to get the best classification accuracy at the end of the pipeline.
  36. Notice that if we could invert the linear PCA step for a given instance in the reduced space, the reconstructed point would lie in feature space, not in the original space (e.g., like the one represented by an x in the diagram). Since the feature space is infinite-dimensional, we cannot compute the reconstructed point, and therefore we cannot compute the true reconstruction error. Fortunately, it is possible to find a point in the original space that would map close to the reconstructed point. This is called the reconstruction pre-image. Once you have this pre-image, you can measure its squared distance to the original instance. You can then select the kernel and hyperparameters that minimize this reconstruction pre-image error.
  37. Since the feature space is infinite-dimensional, we cannot compute the reconstructed point, and therefore we cannot compute the true reconstruction error. Fortunately, it is possible to find a point in the original space that would map close to the reconstructed point. This is called the reconstruction pre-image. Once you have this pre-image, you can measure its squared distance to the original instance. You can then select the kernel and hyperparameters that minimize this reconstruction pre-image error.
  38. Fortunately, it is possible to find a point in the original space that would map close to the reconstructed point. This is called the reconstruction pre-image. Once you have this pre-image, you can measure its squared distance to the original instance. You can then select the kernel and hyperparameters that minimize this reconstruction pre-image error.
  42. LLE works in two steps: first, it measures how each training instance linearly relates to its closest neighbors; then, it looks for a low-dimensional representation of the training set where these local relationships are best preserved.
  46. It is mostly used for visualization, in particular to visualize clusters of instances in high-dimensional space (e.g., to visualize the MNIST images in 2D).