2. • Dimensionality reduction is the process of reducing the number of random variables or
attributes under consideration.
• As the number of dimensions increases, the data become sparse and the distance between any two independent points grows. The data points therefore appear less similar to one another, which degrades most machine learning and data mining techniques that rely on distance or similarity. Compensating would require a very large number of data points, but at high dimensions collecting that much data is practically impossible, and even where it is possible it is inefficient. This is known as the curse of dimensionality.
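One quick way to see this effect is to measure how far random points lie from a query point as the dimension grows: the gap between the nearest and the farthest point shrinks relative to the distances themselves. The snippet below is a minimal sketch of this, assuming uniformly distributed points; the sample size and the dimensions tried are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    """(max - min) / min distance from a query point (the origin) to random points."""
    X = rng.uniform(size=(n_points, dim))
    d = np.linalg.norm(X, axis=1)          # distance of each point from the query
    return (d.max() - d.min()) / d.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast = {relative_contrast(dim):.3f}")
# The contrast shrinks as the dimension grows: all points end up at nearly
# the same distance, so similarity-based methods become less informative.
```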
3. Techniques of dimensionality reduction
Dimensionality reduction is accomplished through either feature selection or feature
extraction.
Feature selection omits those features, among the available measurements, that do not
contribute to class separability. In other words, redundant and irrelevant features are
ignored.
4. Feature extraction, on the other hand, considers the whole information content and maps the
useful part of it into a lower-dimensional feature space.
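To make the selection/extraction distinction concrete, the sketch below keeps the two most class-separating original features with a univariate test (selection) and, separately, builds two new features with PCA (extraction). It assumes scikit-learn is available and uses its bundled iris data purely as a stand-in example, not anything from these slides.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features

# Feature selection: keep the 2 original features that best separate the classes
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print("selected original columns:", selector.get_support(indices=True))

# Feature extraction: build 2 new features (principal components) from all 4
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)
print("variance explained by the 2 components:", pca.explained_variance_ratio_.sum())
```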
5. Why Dimensionality Reduction is Important
• Dimensionality reduction brings many advantages to your machine learning data, including:
• Fewer features mean less complexity
• You will need less storage space because there is less data
• Fewer features require less computation time
• Model accuracy improves because there is less misleading data
• Algorithms train faster with less data
• Reducing the data set’s feature dimensions makes the data easier to visualize
• It removes noise and redundant features
6. Dimensionality Reduction Techniques
• Here are some techniques machine learning professionals use.
• Principal Component Analysis (feature extraction).
• PCA extracts a new set of variables from an existing, larger set. The new set is called “principal components.”
• Backward Feature Elimination.
• Forward Feature Selection.
• Low Variance Filter.
• High Correlation Filter (this filter and the low variance filter are sketched after this list).
• Decision Trees (feature selection).
• Random Forest.
• Factor Analysis (feature extraction).
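Two of the simpler filter techniques above can be written in a few lines of pandas. The sketch below drops columns whose variance falls below a threshold and then drops one column from each highly correlated pair; the thresholds (0.01 and 0.9) and the toy data are illustrative choices, not values from these slides.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.normal(size=200),
    "c": np.full(200, 3.0),                                   # constant column -> low variance
})
df["d"] = df["a"] * 0.98 + rng.normal(scale=0.05, size=200)   # nearly duplicates "a"

# Low variance filter: drop columns whose variance is below a threshold
variances = df.var()
df = df.drop(columns=variances[variances < 0.01].index)

# High correlation filter: for each highly correlated pair, drop one column
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_corr_cols = [col for col in upper.columns if (upper[col] > 0.9).any()]
df = df.drop(columns=high_corr_cols)

print("remaining columns:", list(df.columns))   # e.g. ['a', 'b']
```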
7. How do you do a PCA?
1. Standardize the range of the continuous initial variables
2. Compute the covariance matrix to identify correlations
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
4. Create a feature vector to decide which principal components to keep
5. Recast the data along the principal component axes (see the sketch below)
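The five steps map almost one-to-one onto NumPy operations. The sketch below is a minimal implementation assuming the input is a 2-D array with samples in rows; the function name and the choice of how many components to keep are illustrative.

```python
import numpy as np

def pca(X, k):
    """Steps 1-5 above: standardize, covariance, eigendecomposition, select, project."""
    # 1. Standardize each variable (zero mean, unit variance)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized variables
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigenvalues and eigenvectors (eigh: symmetric matrix, ascending order)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # 4. Feature vector: keep the eigenvectors of the k largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    feature_vector = eigvecs[:, order]

    # 5. Recast the data along the principal component axes
    return X_std @ feature_vector

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))     # 10 samples, 3 variables
print(pca(X, k=2).shape)         # (10, 2)
```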
8. Exercise:
• Consider the two-dimensional patterns
(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
• Compute the principal component using the PCA algorithm.
9.–16. Solution:
• Mean vector: μ = ((2+3+4+5+6+7)/6, (1+5+3+6+7+8)/6) = (4.5, 5).
• Centred points (each point minus μ): (-2.5, -4), (-1.5, 0), (-0.5, -2), (0.5, 1), (1.5, 2), (2.5, 3).
• Covariance matrix of the centred points (dividing by n = 6):
S = [ 2.92  3.67 ]
    [ 3.67  5.67 ]
• Characteristic equation: det(S − λI) = 0, i.e. (2.92 − λ)(5.67 − λ) − 3.67² = 0,
which simplifies to λ² − 8.59λ + 3.09 = 0.
17. Thus, the two eigenvalues are λ1 = 8.22 and λ2 = 0.38.
Clearly, the second eigenvalue is very small compared to the first.
So, the second eigenvector can be left out.
The eigenvector corresponding to the greatest eigenvalue is the principal component of the given data
set.
So, we find the eigenvector corresponding to the eigenvalue λ1.
18.–19. Solving (S − λ1 I) e = 0 with λ1 = 8.22: writing e = (u, v),
(2.92 − 8.22) u + 3.67 v = 0, i.e. v ≈ 1.44 u,
so the unit eigenvector corresponding to λ1 is e1 ≈ (0.57, 0.82).
20. • We project the centred data points onto the new subspace spanned by e1:
y = e1ᵀ (x − μ) for each pattern x.
The projected points are the six resulting one-dimensional coordinates (computed in the sketch below).
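As a check on the worked example, the short NumPy sketch below recomputes the covariance matrix (dividing by n, as above), its eigenvalues and principal eigenvector, and the one-dimensional projections of the six points; the eigenvalues should come out as roughly 8.22 and 0.38.

```python
import numpy as np

# The six two-dimensional patterns from the exercise
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

mu = X.mean(axis=0)                      # mean vector (4.5, 5)
Xc = X - mu                              # centred data

S = np.cov(Xc, rowvar=False, bias=True)  # covariance matrix, divided by n
eigvals, eigvecs = np.linalg.eigh(S)     # ascending order for symmetric matrices

e1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
projections = Xc @ e1                    # 1-D coordinates along the principal component

print("eigenvalues:", eigvals[::-1])     # ~ [8.22, 0.38]
print("principal eigenvector:", e1)      # ~ [0.57, 0.82] (up to sign)
print("projected points:", projections)
```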