Dimensionality Reduction
Legal Notices and Disclaimers
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY.
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. Check with your system manufacturer or retailer or learn more at intel.com.
This sample source code is released under the Intel Sample Source Code License Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2017, Intel Corporation. All rights reserved.
Curse of Dimensionality
• Theoretically, increasing the number of features should improve performance
• In practice, too many features lead to worse performance
• The number of training examples required increases exponentially with dimensionality
(Figure: with 10 positions per axis, 1 dimension has 10 positions, 2 dimensions have 100 positions, and 3 dimensions have 1,000 positions)
Solution: Dimensionality Reduction
• Data can be represented by fewer dimensions (features)
• Reduce dimensionality by selecting a subset of features (feature elimination)
• Or combine features with linear and non-linear transformations
(Figure: scatter plot of Cigarettes/Day vs. Height)
Solution: Dimensionality Reduction
• Two features: height and cigarettes per day
• Both features increase together (they are correlated)
• Can we reduce the number of features to one?
(Figure: scatter plot of Cigarettes/Day vs. Height)
Solution: Dimensionality Reduction
• Create a single feature that is a combination of height and cigarettes per day
• This is Principal Component Analysis (PCA)
(Figure: the data projected onto the new combined axis)
Dimensionality Reduction
Given an 𝑁-dimensional data set 𝑥, find an 𝑁 × 𝐾 matrix 𝑈 such that

𝑦 = 𝑈ᵀ𝑥, where 𝑦 has 𝐾 dimensions and 𝐾 < 𝑁

𝑥 = (𝑥₁, 𝑥₂, …, 𝑥_𝑁)ᵀ   ⟶ 𝑈ᵀ ⟶   𝑦 = (𝑦₁, 𝑦₂, …, 𝑦_𝐾)ᵀ,   with 𝐾 < 𝑁
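As a quick illustration (not part of the original deck; the values and the matrix 𝑈 are chosen by hand), the reduction 𝑦 = 𝑈ᵀ𝑥 is just a matrix product:

import numpy as np

# Hypothetical example: reduce N = 3 dimensions to K = 2.
x = np.array([2.0, 1.0, 3.0])            # an N-dimensional data point

# U is an N x K matrix whose columns are the projection directions
# (chosen by hand here; PCA/SVD would learn them from the data).
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

y = U.T @ x                               # y = U^T x has K = 2 dimensions
print(y)                                  # [2. 1.]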
Principal Component Analysis (PCA)
(Figure: data in the X1–X2 plane with its two principal axes)
• First principal direction 𝑣₁ with length 𝜆₁ (the variance captured along 𝑣₁)
• Second principal direction 𝑣₂ with length 𝜆₂
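A minimal sketch (synthetic correlated data, NumPy only; not from the deck) showing that the directions 𝑣₁, 𝑣₂ and lengths 𝜆₁, 𝜆₂ are the eigenvectors and eigenvalues of the data's covariance matrix:

import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2-D data, similar in spirit to the X1/X2 figure.
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=500)
X = np.column_stack([x1, x2])

# Eigen-decomposition of the covariance matrix gives the principal axes.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]     # sort by decreasing variance

lambdas = eigenvalues[order]              # lengths: variance along each axis
vs = eigenvectors[:, order]               # directions v1, v2 as columns
print("lambda_1, lambda_2:", lambdas)
print("v1:", vs[:, 0], "v2:", vs[:, 1])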
Singular Value Decomposition (SVD)
• SVD is a matrix factorization method normally used to compute PCA
• It does not require a square data matrix
• Scikit-learn uses SVD to implement PCA

𝐴 (𝑚 × 𝑛) = 𝑈 (𝑚 × 𝑚) 𝑆 (𝑚 × 𝑛) 𝑉ᵀ (𝑛 × 𝑛)

(Diagram: the 𝑚 × 𝑛 data matrix 𝐴 factored into 𝑈, a diagonal matrix 𝑆 of singular values, and 𝑉ᵀ)
Truncated Singular Value Decomposition (SVD)
• How can SVD be used for dimensionality reduction?
• The principal components are calculated from 𝑈𝑆
• "Truncated SVD" is used for dimensionality reduction by keeping only the top singular values (𝑛 → 𝑘)

𝐴 (𝑚 × 𝑛) ≈ 𝑈 (𝑚 × 𝑘) 𝑆 (𝑘 × 𝑘) 𝑉ᵀ (𝑘 × 𝑛)

(Diagram: only the 𝑘 largest singular values, e.g. 9, 7, 1, and the matching columns of 𝑈 and rows of 𝑉ᵀ are kept)
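A short NumPy sketch (random data, not from the deck) of the truncation step: keeping only the top 𝑘 singular values gives a low-rank approximation of 𝐴, and 𝑈ₖ𝑆ₖ is the reduced representation:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                  # an m x n data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                         # keep the top k singular values
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

A_approx = U_k @ S_k @ Vt_k                   # rank-k approximation of A
reduced = U_k @ S_k                           # m x k reduced representation
print("approximation error:", np.linalg.norm(A - A_approx))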
Importance of Feature Scaling
• PCA and SVD seek to find the vectors that capture the most variance
• Variance is sensitive to axis scale
• Must scale the data first (see the sketch below)
(Figures: the same data unscaled, with X2 spanning 100–500 while X1 spans 10–50, and scaled, with both axes spanning 10–50)
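A hedged sketch of the scaling step with scikit-learn (the data here is made up so one feature deliberately dwarfs the other): standardize the features before fitting PCA so neither axis dominates the variance.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

# Assumed example data: feature 2 has a much larger scale than feature 1.
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.normal(30, 10, size=200),
                           rng.normal(300, 100, size=200)])

# Chain scaling and PCA so any new data gets the same treatment.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=1))
X_trans = pca_pipeline.fit_transform(X_train)
print(X_trans.shape)                          # (200, 1)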
PCA: The Syntax
Import the class containing the dimensionality reduction method
from sklearn.decomposition import PCA
Create an instance of the class
PCAinst = PCA(n_components=3, whiten=True)
(n_components=3: the final number of dimensions; whiten=True: scale and center the transformed data)
Fit the instance on the data and then transform the data
X_trans = PCAinst.fit_transform(X_train)
Does not work with sparse matrices
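For instance (a sketch on the small, dense iris data set, an assumption made for illustration), explained_variance_ratio_ can guide how many components to keep:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)             # 150 samples, 4 dense features

PCAinst = PCA(n_components=3, whiten=True)
X_trans = PCAinst.fit_transform(X)

print(X_trans.shape)                          # (150, 3)
print(PCAinst.explained_variance_ratio_)      # variance captured per component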
Truncated SVD: The Syntax
Import the class containing the dimensionality reduction method
from sklearn.decomposition import TruncatedSVD
Create an instance of the class
SVD = TruncatedSVD(n_components=3)
Fit the instance on the data and then transform the data
X_trans = SVD.fit_transform(X_sparse)
Works with sparse matrices (but does not center the data); used with text data for Latent Semantic Analysis (LSA)
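A minimal LSA sketch (the toy documents and n_components are made up for illustration): TF-IDF yields a sparse document-term matrix, which TruncatedSVD reduces directly.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

X_sparse = TfidfVectorizer().fit_transform(docs)   # sparse document-term matrix
SVD = TruncatedSVD(n_components=2)                 # keep 2 latent "topics"
X_trans = SVD.fit_transform(X_sparse)
print(X_trans.shape)                               # (3, 2)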
Moving Beyond Linearity
• Transformations calculated with PCA/SVD are linear
• Data can have non-linear structure
• This can cause linear dimensionality reduction to fail
(Figures: original space vs. projection by PCA; the linear projection mixes the groups, so dimensionality reduction fails)
Kernel PCA
• Solution: kernels can be used to perform non-linear PCA
• Like the kernel trick introduced for SVMs
(Figures: original space vs. projection by KPCA; a mapping Φ takes the data from 𝑅² to a feature space 𝐹 in which linear PCA works)
Kernel PCA: The Syntax
Import the class containing the dimensionality reduction method
from sklearn.decomposition import KernelPCA
Create an instance of the class
kPCA = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)
Fit the instance on the data and then transform the data
X_trans = kPCA.fit_transform(X_train)
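A quick sketch of where the kernel matters (synthetic data and illustrative parameters, not from the deck): concentric circles cannot be separated by linear PCA, but an RBF-kernel PCA can unfold them.

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_linear = PCA(n_components=2).fit_transform(X)          # classes stay tangled
kPCA = KernelPCA(n_components=2, kernel='rbf', gamma=10.0)
X_kpca = kPCA.fit_transform(X)                           # classes separate
print(X_linear.shape, X_kpca.shape)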
Multi-Dimensional Scaling (MDS)
• Non-linear transformation
• Doesn't focus on maintaining overall variance
• Instead, maintains the geometric distances between points
(Figure: data in X, Y, Z projected to a lower-dimensional space)
MDS: The Syntax
Import the class containing the dimensionality reduction method
from sklearn.manifold import MDS
Create an instance of the class
mdsMod = MDS(n_components=2)
Fit the instance on the data and then transform the data
X_trans = mdsMod.fit_transform(X_train)
(Note: unlike TruncatedSVD, MDS expects a dense array, so X_train rather than a sparse matrix is used here.)
Many other manifold dimensionality reduction methods exist, such as Isomap and t-SNE.
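An illustrative sketch (small random data, metric MDS with default settings; not from the deck) of embedding 5-dimensional points into 2-D while approximately preserving pairwise distances:

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # assumed 5-dimensional data

mdsMod = MDS(n_components=2, random_state=0)
X_trans = mdsMod.fit_transform(X)          # 2-D embedding of the distances
print(X_trans.shape)                        # (100, 2)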
Uses of Dimensionality Reduction
• Frequently used for high-dimensionality data
• Natural language processing (NLP): many word combinations
• Image-based data sets: pixels are features
Image Source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg

Uses of Dimensionality Reduction
• Divide the image into 12 x 12 pixel sections
• Flatten each section to create a row of data with 144 features
• Perform PCA on all the data points (a sketch of this pipeline follows below)
(Figure: each 12 x 12 patch becomes one row with features 1, 2, 3, …, 142, 143, 144)
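A hedged sketch of this pipeline (a random synthetic grayscale "image" stands in for the butterfly photo; the patch size is the deck's 12 x 12): cut the image into non-overlapping patches, flatten each to 144 features, fit PCA, and measure the reconstruction error, as in the compression slides that follow.

import numpy as np
from sklearn.decomposition import PCA

# Stand-in image; the deck uses a monarch butterfly photograph.
rng = np.random.default_rng(0)
image = rng.random((120, 120))

# Non-overlapping 12 x 12 patches, each flattened to 144 features.
patch = 12
rows = [image[r:r + patch, c:c + patch].ravel()
        for r in range(0, image.shape[0], patch)
        for c in range(0, image.shape[1], patch)]
X = np.array(rows)                            # shape (100, 144)

# Compress to 60 dimensions, then reconstruct and measure relative L2 error.
pca = PCA(n_components=60).fit(X)
X_rebuilt = pca.inverse_transform(pca.transform(X))
rel_error = np.linalg.norm(X - X_rebuilt) / np.linalg.norm(X)
print(X.shape, "->", pca.n_components_, "dims, relative L2 error:", round(rel_error, 3))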
PCA Compression: 144 → 60 Dimensions
(Figure: image reconstructed from 60 dimensions alongside the original 144-dimensional patches)
PCA Compression: 144 → 16 Dimensions
(Figure: image reconstructed from 16 dimensions)
Sixteen Most Important Eigenvectors
PCA Compression: 144 → 4 Dimensions
(Figure: image reconstructed from 4 dimensions)
L2 Error and PCA Dimension
(Plot: relative L2 reconstruction error vs. number of PCA dimensions kept, from 20 to 140; the error falls as more dimensions are retained)
Four Most Important Eigenvectors
PCA Compression: 144 → 1 Dimension
(Figure: image reconstructed from a single dimension)