SlideShare a Scribd company logo
zekeLabs
Dimensionality Reduction
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
● Real Data
● PCA
● Eigenvectors
● Covariance Matrix
● Matrix Decomposition
● Covariance Matrix Decomposition
● SVD
● PCA vs SVD
● LDA
● PCA vs LDA
Overview of
Dimensionality
Reduction
Techniques
Real Data
● Real world data and information therein may be
○ Noisy
■ Some dimensions may not carry any useful information
■ Variation in that dimension is purely due to noise in the observations
○ Redundant
■ One variables may carry the same information as the other variable
■ Information covered by a set of variable may overlap
● How to reduce the dimensions?
PCA
● Dimensionality reduction technique
● Linear projection of Data into Orthogonal Basis System
● Minimum redundancy and preserves variance in the data
● Smallest reconstruction error
● Applications include Image-compression, Data Visualisation
Eigenvectors
● If A is a square matrix, vector v is an eigenvector of if there is a scalar 𝝀
such that
Av = 𝝀v
● Example:
● Simply, if we transform the eigenvector, it doesn’t change it’s direction
● Matrix multiplication of A with v is a transformation
Covariance Matrix
● Covariance Matrix of X
● Diagonal Terms: Variance
● Off-Diagonal Terms: covariance
● Covariance matrix is always symmetric
● n - no. of observation in X here
Matrix Decomposition
● If square d x d matrix S is a real and symmetric matrix then
Covariance Matrix Decomposition
● If square d x d matrix S is a real and symmetric matrix then
PCA
● Compute the covariance matrix decomposition
● The first principle component will always be the eigenvector with highest
eigenvalue
● Second will be chosen with the next highest eigenvalue
Linear Discriminant Analysis - LDA
● Dimensionality reduction technique in the pre-processing step for pattern-
classification and machine learning applications.
● Project data into lower dimension with good class separability to avoid
over-fitting & reduced computation
● In addition to finding component axis that maximizes the variance of our
data, we are additionally interested in the axes that maximize the
separation between multiple classes
● project a feature space onto a smaller subspace k (where k≤n−1) while
maintaining the class-discriminatory information.
PCA vs LDA
● It’s not true that LDA is superior
than PCA for classification
● PCA outperforms classification
for smaller dataset.
● PCA & SVD can be combined
PCA vs LDA
Normality Assumptions of LDA
● It should be mentioned that LDA assumes normal distributed data,
features that are statistically independent, and identical covariance
matrices for every class.
● However, this only applies for LDA as classifier and LDA for dimensionality
reduction can also work reasonably well if those assumptions are violated.
Singular Value Decomposition - SVD
SVD - Basics
SVD
SVD - Understanding Decomposition
SVD - Understanding Decomposition
PCA vs SVD
● The eigenvectors of C are the same as the right singular vectors of X
● The eigenvectors of covariance matrix are same as vector V
● Working directly with X will produce much more accurate results
● Working directly with X is faster
LDA
● LDA - Linear Discriminant Analysis
○ Compute the d-dimensional mean vectors for the different classes from the dataset
○ Compute the scatter matrices (in-between-class and within-class scatter matrix)
○ Compute the eigenvectors and corresponding eigenvalues for the scatter matrices
○ Sort the eigenvectors by decreasing eigenvalues and choose kk eigenvectors
PCA vs LDA

More Related Content

Similar to Dimentionality reduction

Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108
Ting-Shuo Yo
 
M5.pptx
M5.pptxM5.pptx
M5.pptx
MayuraD1
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
Cloudera, Inc.
 
PCACONFUSIONMATRIX.pptx
PCACONFUSIONMATRIX.pptxPCACONFUSIONMATRIX.pptx
PCACONFUSIONMATRIX.pptx
TechohiT
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
pca.pdf
pca.pdfpca.pdf
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
Md Abul Hayat
 
Fr pca lda
Fr pca ldaFr pca lda
Fr pca lda
ultraraj
 
Sparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxSparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptx
ssuser2624f71
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
Jihun Yun
 
Recent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionRecent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detection
ActiveEon
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptx
Sivam Chinna
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
Lucian Neghina
 
Pattern recognition UNIT 5
Pattern recognition UNIT 5Pattern recognition UNIT 5
Pattern recognition UNIT 5
SURBHI SAROHA
 
Distributed approximate spectral clustering for large scale datasets
Distributed approximate spectral clustering for large scale datasetsDistributed approximate spectral clustering for large scale datasets
Distributed approximate spectral clustering for large scale datasets
Bita Kazemi
 
Cutting Edge Predictive Modeling For Classification
Cutting Edge Predictive Modeling For ClassificationCutting Edge Predictive Modeling For Classification
Cutting Edge Predictive Modeling For Classification
Pankaj Sharma
 
Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.
Sahana B S
 
Handling the growth of data
Handling the growth of dataHandling the growth of data
Handling the growth of data
Piyush Katariya
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
DataStax Academy
 
Skytree big data london meetup - may 2013
Skytree   big data london meetup - may 2013Skytree   big data london meetup - may 2013
Skytree big data london meetup - may 2013
bigdatalondon
 

Similar to Dimentionality reduction (20)

Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108
 
M5.pptx
M5.pptxM5.pptx
M5.pptx
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
PCACONFUSIONMATRIX.pptx
PCACONFUSIONMATRIX.pptxPCACONFUSIONMATRIX.pptx
PCACONFUSIONMATRIX.pptx
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
 
pca.pdf
pca.pdfpca.pdf
pca.pdf
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
 
Fr pca lda
Fr pca ldaFr pca lda
Fr pca lda
 
Sparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxSparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptx
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
 
Recent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionRecent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detection
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptx
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Pattern recognition UNIT 5
Pattern recognition UNIT 5Pattern recognition UNIT 5
Pattern recognition UNIT 5
 
Distributed approximate spectral clustering for large scale datasets
Distributed approximate spectral clustering for large scale datasetsDistributed approximate spectral clustering for large scale datasets
Distributed approximate spectral clustering for large scale datasets
 
Cutting Edge Predictive Modeling For Classification
Cutting Edge Predictive Modeling For ClassificationCutting Edge Predictive Modeling For Classification
Cutting Edge Predictive Modeling For Classification
 
Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.
 
Handling the growth of data
Handling the growth of dataHandling the growth of data
Handling the growth of data
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
 
Skytree big data london meetup - may 2013
Skytree   big data london meetup - may 2013Skytree   big data london meetup - may 2013
Skytree big data london meetup - may 2013
 

More from zekeLabs Technologies

Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
zekeLabs Technologies
 
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabsDesign Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
zekeLabs Technologies
 
[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs
zekeLabs Technologies
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
zekeLabs Technologies
 
A curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & KubernetesA curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & Kubernetes
zekeLabs Technologies
 
Docker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container worldDocker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container world
zekeLabs Technologies
 
Serverless and cloud computing
Serverless and cloud computingServerless and cloud computing
Serverless and cloud computing
zekeLabs Technologies
 
02 terraform core concepts
02 terraform core concepts02 terraform core concepts
02 terraform core concepts
zekeLabs Technologies
 
08 Terraform: Provisioners
08 Terraform: Provisioners08 Terraform: Provisioners
08 Terraform: Provisioners
zekeLabs Technologies
 
Outlier detection handling
Outlier detection handlingOutlier detection handling
Outlier detection handling
zekeLabs Technologies
 
Nearest neighbors
Nearest neighborsNearest neighbors
Nearest neighbors
zekeLabs Technologies
 
Naive bayes
Naive bayesNaive bayes
Master guide to become a data scientist
Master guide to become a data scientist Master guide to become a data scientist
Master guide to become a data scientist
zekeLabs Technologies
 
Linear regression
Linear regressionLinear regression
Linear regression
zekeLabs Technologies
 
Linear models of classification
Linear models of classificationLinear models of classification
Linear models of classification
zekeLabs Technologies
 
Grid search, pipeline, featureunion
Grid search, pipeline, featureunionGrid search, pipeline, featureunion
Grid search, pipeline, featureunion
zekeLabs Technologies
 
Feature selection
Feature selectionFeature selection
Feature selection
zekeLabs Technologies
 
Essential NumPy
Essential NumPyEssential NumPy
Essential NumPy
zekeLabs Technologies
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
zekeLabs Technologies
 

More from zekeLabs Technologies (20)

Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
Webinar - Build Cloud-native platform using Docker, Kubernetes, Prometheus, I...
 
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabsDesign Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
Design Patterns for Pods and Containers in Kubernetes - Webinar by zekeLabs
 
[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs[Webinar] Following the Agile Footprint - zekeLabs
[Webinar] Following the Agile Footprint - zekeLabs
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
A curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & KubernetesA curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & Kubernetes
 
Docker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container worldDocker - A curtain raiser to the Container world
Docker - A curtain raiser to the Container world
 
Serverless and cloud computing
Serverless and cloud computingServerless and cloud computing
Serverless and cloud computing
 
SQL
SQLSQL
SQL
 
02 terraform core concepts
02 terraform core concepts02 terraform core concepts
02 terraform core concepts
 
08 Terraform: Provisioners
08 Terraform: Provisioners08 Terraform: Provisioners
08 Terraform: Provisioners
 
Outlier detection handling
Outlier detection handlingOutlier detection handling
Outlier detection handling
 
Nearest neighbors
Nearest neighborsNearest neighbors
Nearest neighbors
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Master guide to become a data scientist
Master guide to become a data scientist Master guide to become a data scientist
Master guide to become a data scientist
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Linear models of classification
Linear models of classificationLinear models of classification
Linear models of classification
 
Grid search, pipeline, featureunion
Grid search, pipeline, featureunionGrid search, pipeline, featureunion
Grid search, pipeline, featureunion
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Essential NumPy
Essential NumPyEssential NumPy
Essential NumPy
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Dimentionality reduction

  • 2. “Goal - Become a Data Scientist” “A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett “The Plan” “A Goal without a Plan is just a wish”
  • 3. ● Real Data ● PCA ● Eigenvectors ● Covariance Matrix ● Matrix Decomposition ● Covariance Matrix Decomposition ● SVD ● PCA vs SVD ● LDA ● PCA vs LDA Overview of Dimensionality Reduction Techniques
  • 4. Real Data ● Real world data and information therein may be ○ Noisy ■ Some dimensions may not carry any useful information ■ Variation in that dimension is purely due to noise in the observations ○ Redundant ■ One variables may carry the same information as the other variable ■ Information covered by a set of variable may overlap ● How to reduce the dimensions?
  • 5. PCA ● Dimensionality reduction technique ● Linear projection of Data into Orthogonal Basis System ● Minimum redundancy and preserves variance in the data ● Smallest reconstruction error ● Applications include Image-compression, Data Visualisation
  • 6. Eigenvectors ● If A is a square matrix, vector v is an eigenvector of if there is a scalar 𝝀 such that Av = 𝝀v ● Example: ● Simply, if we transform the eigenvector, it doesn’t change it’s direction ● Matrix multiplication of A with v is a transformation
  • 7. Covariance Matrix ● Covariance Matrix of X ● Diagonal Terms: Variance ● Off-Diagonal Terms: covariance ● Covariance matrix is always symmetric ● n - no. of observation in X here
  • 8. Matrix Decomposition ● If square d x d matrix S is a real and symmetric matrix then
  • 9. Covariance Matrix Decomposition ● If square d x d matrix S is a real and symmetric matrix then
  • 10. PCA ● Compute the covariance matrix decomposition ● The first principle component will always be the eigenvector with highest eigenvalue ● Second will be chosen with the next highest eigenvalue
  • 11. Linear Discriminant Analysis - LDA ● Dimensionality reduction technique in the pre-processing step for pattern- classification and machine learning applications. ● Project data into lower dimension with good class separability to avoid over-fitting & reduced computation ● In addition to finding component axis that maximizes the variance of our data, we are additionally interested in the axes that maximize the separation between multiple classes ● project a feature space onto a smaller subspace k (where k≤n−1) while maintaining the class-discriminatory information.
  • 12. PCA vs LDA ● It’s not true that LDA is superior than PCA for classification ● PCA outperforms classification for smaller dataset. ● PCA & SVD can be combined
  • 14. Normality Assumptions of LDA ● It should be mentioned that LDA assumes normal distributed data, features that are statistically independent, and identical covariance matrices for every class. ● However, this only applies for LDA as classifier and LDA for dimensionality reduction can also work reasonably well if those assumptions are violated.
  • 17. SVD
  • 18. SVD - Understanding Decomposition
  • 19. SVD - Understanding Decomposition
  • 20. PCA vs SVD ● The eigenvectors of C are the same as the right singular vectors of X ● The eigenvectors of covariance matrix are same as vector V ● Working directly with X will produce much more accurate results ● Working directly with X is faster
  • 21. LDA ● LDA - Linear Discriminant Analysis ○ Compute the d-dimensional mean vectors for the different classes from the dataset ○ Compute the scatter matrices (in-between-class and within-class scatter matrix) ○ Compute the eigenvectors and corresponding eigenvalues for the scatter matrices ○ Sort the eigenvectors by decreasing eigenvalues and choose kk eigenvectors