Introduction to Dimensionality
Reduction Technique
What is Dimensionality Reduction?
• Dimensionality reduction is a way of converting a higher-dimensional
dataset into a lower-dimensional dataset while ensuring that it conveys
similar information. These techniques are widely used in machine
learning to obtain a better-fitting predictive model when solving
classification and regression problems.
• It is commonly used in fields that deal with high-dimensional
data, such as speech recognition, signal processing, and bioinformatics.
It can also be used for data visualization, noise reduction, and cluster
analysis.
Benefits of applying Dimensionality
Reduction
• Some benefits of applying dimensionality reduction to a dataset are
given below:
• Reducing the number of features reduces the space required to store the
dataset.
• Less computation and training time is required with fewer
features.
• With fewer features, the data is easier to visualize.
• It removes redundant features (if present), which mitigates
multicollinearity.
Disadvantages of dimensionality Reduction
• There are also some disadvantages of applying dimensionality
reduction, which are given below:
• Some information may be lost due to dimensionality reduction.
• In PCA, the number of principal components to retain is sometimes
not known in advance.
Approaches of Dimension Reduction
There are two ways to apply the dimension reduction technique, which
are given below:
1. Feature Selection
• Feature selection is the process of selecting the subset of the relevant
features and leaving out the irrelevant features present in a dataset
to build a model of high accuracy. In other words, it is a way of
selecting the optimal features from the input dataset.
• Three methods are used for the feature selection:
Feature Selection
1. Filter Methods
• In this method, the features are scored with a statistical measure, and
a subset that contains only the relevant features is kept. Some common
filter techniques are:
• Correlation
• Chi-Square Test
• ANOVA
• Information Gain, etc.
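As an illustration of the correlation technique listed above, here is a minimal sketch that ranks features by the absolute Pearson correlation with the target and keeps the top k. The function name `correlation_filter`, the synthetic data, and the choice of k are assumptions made for the example; the sketch also assumes no feature column is constant.

```python
import numpy as np

def correlation_filter(X, y, k):
    """Return the indices of the k features most correlated with y, plus all scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature
    yc = y - y.mean()                # center the target
    # Absolute Pearson correlation of every column with y
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / denom
    # Highest score first
    return np.argsort(-scores)[:k], scores

# Synthetic data (hypothetical): features 0 and 2 depend on the target, feature 1 is noise
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
    -signal + 0.5 * rng.normal(size=n),
])
y = signal

selected, scores = correlation_filter(X, y, k=2)
print(sorted(selected.tolist()))
```

Note that no model is trained here: the filter looks only at a statistic of each feature, which is what makes filter methods cheap.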
1. Feature Selection
2. Wrapper Methods
• The wrapper method has the same goal as the filter method,
but it uses a machine learning model for its evaluation. In this
method, a subset of features is fed to the ML model and its
performance is evaluated. The performance decides whether to
add or remove features in order to increase the accuracy of the
model. This method is usually more accurate than the filter method
but more computationally expensive. Some common wrapper
techniques are:
• Forward Selection
• Backward Selection
• Bi-directional Elimination
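Forward selection can be sketched as a greedy loop: start with no features, and at each step add the feature that most improves the model's score on held-out data. The sketch below assumes an ordinary least-squares model, a single train/validation split, and a stopping tolerance `tol`; all of these, along with the function names and synthetic data, are illustrative choices rather than part of the slides.

```python
import numpy as np

def val_mse(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares (with intercept) on the chosen columns; return validation MSE."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_va = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_va @ coef - y_va) ** 2))

def forward_selection(X_tr, y_tr, X_va, y_va, tol=1e-2):
    selected = []
    remaining = list(range(X_tr.shape[1]))
    # Baseline: predict the training mean for every validation sample
    best = float(np.mean((y_va - y_tr.mean()) ** 2))
    while remaining:
        # Score every candidate feature when added to the current subset
        scores = {j: val_mse(X_tr, y_tr, X_va, y_va, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < tol:   # stop when the improvement is negligible
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

# Synthetic data (hypothetical): only features 0 and 2 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

selected = forward_selection(X_tr, y_tr, X_va, y_va)
print(sorted(selected))
```

Each step refits the model once per remaining feature, which shows why wrapper methods cost more than filter methods.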
1. Feature Selection
3. Embedded Methods: Embedded methods evaluate the
importance of each feature during the training iterations of
the machine learning model, so that feature selection happens
as part of model fitting. Some common embedded techniques are:
• LASSO
• Elastic Net
• Ridge Regression (shrinks coefficients but does not set them exactly to zero), etc.
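To show how an embedded method selects features while fitting, here is a minimal LASSO sketch using cyclic coordinate descent with soft thresholding. It assumes centered data with no intercept and minimizes (1/(2n))·||y − Xw||² + alpha·||w||₁; the function names, the value of `alpha`, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become zero."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    n, d = X.shape
    w = np.zeros(d)
    z = (X ** 2).sum(axis=0) / n               # per-feature scale
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's current contribution removed
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / z[j]
    return w

# Synthetic data (hypothetical): only features 0 and 2 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
X = X - X.mean(axis=0)
y = y - y.mean()

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = np.flatnonzero(w)                   # features with non-zero weight
print(selected.tolist())
```

The L1 penalty drives the weights of uninformative features to exactly zero, so the fitted weight vector itself tells us which features were selected.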
2. Feature Extraction
• Feature extraction is the process of transforming a
space with many dimensions into a space with fewer
dimensions. This approach is useful when we want to
retain most of the information but use fewer resources while
processing it.
• Some common feature extraction techniques are:
1. Principal Component Analysis
2. Linear Discriminant Analysis
3. Kernel PCA
4. Quadratic Discriminant Analysis
Common techniques of Dimensionality
Reduction
• Principal Component Analysis
• Backward Elimination
• Forward Selection
• Score comparison
• Missing Value Ratio
• Low Variance Filter
• High Correlation Filter
• Random Forest
• Factor Analysis
• Auto-Encoder
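Three of the techniques in this list (Missing Value Ratio, Low Variance Filter, High Correlation Filter) are simple enough to sketch together. The thresholds `max_missing`, `min_variance`, and `max_corr` below are arbitrary assumptions, as are the function name and the synthetic data; in practice they would be tuned to the dataset.

```python
import numpy as np

def simple_filters(X, max_missing=0.4, min_variance=1e-3, max_corr=0.95):
    """Return the column indices that survive three simple filters."""
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Missing value ratio: drop columns with too many NaNs
        if np.isnan(col).mean() > max_missing:
            continue
        # Low variance filter: drop (nearly) constant columns
        if np.nanvar(col) < min_variance:
            continue
        # High correlation filter: drop columns nearly collinear with a kept one
        redundant = False
        for i in keep:
            ok = ~np.isnan(col) & ~np.isnan(X[:, i])
            r = np.corrcoef(col[ok], X[ok, i])[0, 1]
            if abs(r) > max_corr:
                redundant = True
                break
        if not redundant:
            keep.append(j)
    return keep

# Synthetic data (hypothetical)
rng = np.random.default_rng(0)
n = 100
a = rng.normal(size=n)
sparse = rng.normal(size=n)
sparse[:60] = np.nan                      # 60% missing
X = np.column_stack([
    a,                                    # column 0: informative
    np.full(n, 5.0),                      # column 1: constant
    2.0 * a + 1.0,                        # column 2: linear copy of column 0
    sparse,                               # column 3: mostly missing
    rng.normal(size=n),                   # column 4: independent
])

kept = simple_filters(X)
print(kept)
```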
Principal Component Analysis
• Principal Component Analysis is a statistical process that
converts the observations of correlated features into a
set of linearly uncorrelated features with the help of
orthogonal transformation. These new transformed
features are called the Principal Components. It is one of
the popular tools that is used for exploratory data
analysis and predictive modeling.
• PCA works by considering the variance of each attribute,
on the assumption that directions of high variance carry the most
information (for example, a good split between classes), and it
reduces the dimensionality by keeping only those directions.
Some real-world applications of PCA are image
processing, movie recommendation systems, and optimizing
the power allocation in various communication channels.
Principal Component Analysis
• Principal Component Analysis is an unsupervised learning algorithm
that is used for dimensionality reduction (a technique to reduce the
number of features within a dataset) in machine learning.
• It is a technique to draw strong patterns from the given dataset by
reducing the number of dimensions while retaining as much of the
variance as possible.
Applications of Machine Learning &
19CT3611
Department of Computer Science & Technology 13
The PCA algorithm is based on some mathematical concepts such as:
• Variance and Covariance
• Eigenvalues and Eigenvectors
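For reference, the concepts listed above can be written out as follows. The notation is an assumption introduced here, not taken from the slides: x and y are two features observed on n samples, C is the covariance matrix of the features, v is an eigenvector of C, and λ is the corresponding eigenvalue.

```latex
\operatorname{Var}(x) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2
\qquad
\operatorname{Cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)
\qquad
C\,v = \lambda\,v
```

The variance of a feature is its covariance with itself, so the variances appear on the diagonal of C.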
Principal Component Analysis
• The Principal Component Analysis is a popular
unsupervised learning technique for reducing the
dimensionality of data. It increases interpretability
yet, at the same time, it minimizes information loss.
It helps to find the most significant features in a
dataset and makes the data easy for plotting in 2D
and 3D. PCA helps in finding a sequence of linear
combinations of variables.
• In the accompanying figure, several points are plotted
on a 2-D plane. There are two principal components.
PC1 is the first principal component, which
explains the maximum variance in the data. PC2 is
the second principal component, which is orthogonal to
PC1.
What is a Principal Component?
• A principal component is a direction (a straight line through the
data) that captures as much of the variance of the data as possible.
Each has a direction and a magnitude. Principal components are
mutually orthogonal (perpendicular), and projecting the data onto them
gives a lower-dimensional representation.
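A small 2-D example may make this concrete. The sketch below generates points spread mostly along the line y = x (synthetic data chosen for illustration), then reads the two principal components off the eigenvectors of the covariance matrix. The variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D points: a shared component t plus small independent noise
t = rng.normal(size=500)
points = np.column_stack([
    t + 0.2 * rng.normal(size=500),
    t + 0.2 * rng.normal(size=500),
])

cov = np.cov(points, rowvar=False)                # 2 x 2 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]             # re-order: largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

pc1, pc2 = eigenvectors[:, 0], eigenvectors[:, 1]
print("PC1 direction:", pc1)
print("PC1 . PC2:", float(pc1 @ pc2))
print("share of variance along PC1:", eigenvalues[0] / eigenvalues.sum())
```

PC1 points roughly along the diagonal (up to sign), its dot product with PC2 is zero to within floating-point error, and it accounts for most of the total variance, so dropping PC2 loses little information.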
Some common terms used in PCA algorithm:
• Dimensionality
• Correlation
• Orthogonal
• Eigenvectors
• Covariance Matrix
How does Principal Component Analysis
Work?
1. Normalize the data
• Standardize the data before performing PCA. This ensures that
each feature has mean = 0 and variance = 1.
2. Build the covariance matrix
• Construct a square matrix that expresses the covariance between each
pair of features in a multidimensional dataset. (For standardized data,
this equals the correlation matrix.)
3. Find the Eigenvectors and Eigenvalues
• Calculate the eigenvectors (unit vectors) and eigenvalues of the
covariance matrix. An eigenvalue is the scalar by which the covariance
matrix scales its eigenvector.
4. Sort the eigenvectors by eigenvalue, from highest to lowest, and select
the number of principal components.
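The four steps above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation: the function name `pca`, the synthetic data, and the use of the sample standard deviation (`ddof=1`) are assumptions, and the sketch assumes no feature is constant.

```python
import numpy as np

def pca(X, n_components):
    X = np.asarray(X, dtype=float)
    # 1. Standardize: mean 0 and variance 1 per feature
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data (features x features)
    C = np.cov(Z, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # 4. Sort from highest to lowest eigenvalue and keep the leading components
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    W = eigenvectors[:, :n_components]
    # Project the standardized data onto the selected components
    return Z @ W, eigenvalues, W

# Synthetic data (hypothetical): 4 observed features driven by 2 hidden factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(150, 2))
mixing = rng.normal(size=(2, 4))
X = factors @ mixing + 0.1 * rng.normal(size=(150, 4))

scores, eigenvalues, W = pca(X, n_components=2)
print(scores.shape)
print(eigenvalues / eigenvalues.sum())   # share of variance per component
```

The columns of `scores` are the new features: they are uncorrelated with each other, and the variance of each equals the corresponding eigenvalue.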
Steps for PCA algorithm
• Getting the dataset
• Representing the data in a structure (a matrix of samples by features)
• Standardizing the data
• Calculating the covariance of Z (the standardized data)
• Calculating the eigenvalues and eigenvectors
• Sorting the eigenvectors
• Calculating the new features, or principal components
• Removing the less important features from the new dataset
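For the last step, one common heuristic for deciding how many components to keep is to retain the smallest number whose eigenvalues account for a chosen share of the total variance. The sketch below assumes that heuristic; the function name `n_components_for`, the threshold values, and the example eigenvalues are all made up for illustration.

```python
import numpy as np

def n_components_for(eigenvalues, threshold=0.95):
    """Smallest k whose leading eigenvalues explain at least `threshold` of the variance."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First position where the cumulative share reaches the threshold
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(eigenvalues))       # guard against rounding at threshold = 1.0

# Hypothetical eigenvalues; cumulative shares are 0.5, 0.75, 0.875, 0.9375, 1.0
eigs = [4.0, 2.0, 1.0, 0.5, 0.5]
print(n_components_for(eigs, 0.85))   # 3
print(n_components_for(eigs, 0.75))   # 2
```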
Applications of Principal Component Analysis
• PCA is mainly used as a dimensionality reduction technique in
various AI applications such as computer vision and image compression.
• It can also be used for finding hidden patterns in high-dimensional
data. Some fields where PCA is used are finance, data mining,
and psychology.
