“The Curse of Dimensionality”
October 9, 2019
Amit Praseed Classification October 9, 2019 1 / 12
Are all Features Equally Important?
F1 F2 F3 F4 F5 F6 T
7 18 7 11 22 1 B
1 5 1 18 36 0 B
0 15 0 2 4 1 B
7 5 7 12 24 0 A
1 15 1 12 24 1 B
3 20 3 6 12 2 B
0 5 0 18 36 1 B
7 10 7 12 24 0.5 A
10 8 10 20 40 1 A
9 20 9 17 34 0.5 A
Do we really need so many Dimensions?
Dimensionality Reduction Techniques are broadly classified into two
categories:
Feature Selection: These techniques select a subset of dimensions.
Filter Methods: Evaluate the importance of each feature one by one.
Wrapper Methods: Evaluate different subsets of features, and test their
performance on a classifier to select the best subset.
Embedded Methods: Certain classification algorithms, such as Decision
Trees, select a useful subset of features as part of fitting the model.
Feature Extraction: These techniques transform the data to a lower
dimensional space with as little loss of information as possible.
E.g. Principal Component Analysis (PCA)
Filter Methods
Filter methods inspect each independent variable individually, or
occasionally inspect a single independent variable together with the dependent
variable (which will be the class value for classification).
Advantages:
Fast and Simple.
Works well for simple applications.
Disadvantages:
Does not consider relationships between variables.
It is a very general technique and not tied to a particular classifier. So
there is no guarantee that the newly reduced data will perform well on
all classifiers.
Eliminate Dimensions based on Variance
F1 F2 F3 F4 F5 F6 T
7 18 7 11 22 1 B
1 5 1 18 36 0 B
0 15 0 2 4 1 B
7 5 7 12 24 0 A
1 15 1 12 24 1 B
3 20 3 6 12 2 B
0 5 0 18 36 1 B
7 10 7 12 24 0.5 A
10 8 10 20 40 1 A
9 20 9 17 34 0.5 A
Variance measures the spread
of a random variable around its
mean value µ.
$$\mathrm{Var}(X) = E[(X - \mu)^2]$$
Dimensions with low values of
variance can be eliminated with
minimal loss of information.
In this case, F6 has very low
variance compared to the other
dimensions and hence can be
ignored.
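The variance filter described above can be sketched in a few lines. This is only an illustration on the toy table from the slide, using the population variance from Python's standard `statistics` module; the names `COLUMNS`, `variances` and `lowest` are made up for this example. Variance depends on the scale of each feature, so in practice features are often rescaled before comparing them.

```python
from statistics import pvariance

# Columns of the toy table on this slide (class label T omitted).
COLUMNS = {
    "F1": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F2": [18, 5, 15, 5, 15, 20, 5, 10, 8, 20],
    "F3": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F4": [11, 18, 2, 12, 12, 6, 18, 12, 20, 17],
    "F5": [22, 36, 4, 24, 24, 12, 36, 24, 40, 34],
    "F6": [1.0, 0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.5, 1.0, 0.5],
}

# Population variance Var(X) = E[(X - mu)^2] of every feature.
variances = {name: pvariance(values) for name, values in COLUMNS.items()}

# The feature with the smallest variance is the candidate for removal.
lowest = min(variances, key=variances.get)

for name, var in variances.items():
    print(f"{name}: variance = {var:.2f}")
print("lowest-variance feature:", lowest)
```

On this table the lowest-variance feature is F6, matching the slide.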
Do all of the Features Influence the Output?
F1 F2 F3 F4 F5 T
7 18 7 11 22 A
1 5 1 18 36 B
0 15 0 2 4 B
7 5 7 12 24 B
1 15 1 12 24 A
3 20 3 6 12 B
0 5 0 18 36 B
7 10 7 12 24 B
10 8 10 20 40 A
9 20 9 17 34 A
A feature can be regarded as irrelevant if it is conditionally independent of the class labels.
How to identify irrelevant features?
Pearson's Correlation between numeric features and a numeric output variable
Point-Biserial Correlation between a binary (two-class) variable and a numeric variable
Cramér's V value between two nominal variables
Advantage: Simple, and works well for certain datasets.
Drawback: Does not consider relationships between variables.
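A relevance score of this kind could be computed as follows. This is a minimal sketch, assuming the class label is encoded as A = 1 and B = 0 (an arbitrary choice made for this example); with a binary label, the point-biserial correlation is numerically the same as Pearson's correlation. The helper `pearson` and the names `ROWS`, `FEATURES`, `labels` and `relevance` are illustrative only.

```python
import math

# Toy table from this slide: five features and the class label T.
ROWS = [
    (7, 18, 7, 11, 22, "A"),
    (1, 5, 1, 18, 36, "B"),
    (0, 15, 0, 2, 4, "B"),
    (7, 5, 7, 12, 24, "B"),
    (1, 15, 1, 12, 24, "A"),
    (3, 20, 3, 6, 12, "B"),
    (0, 5, 0, 18, 36, "B"),
    (7, 10, 7, 12, 24, "B"),
    (10, 8, 10, 20, 40, "A"),
    (9, 20, 9, 17, 34, "A"),
]
FEATURES = ["F1", "F2", "F3", "F4", "F5"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Assumed encoding of the class label: A -> 1, B -> 0.
labels = [1.0 if row[-1] == "A" else 0.0 for row in ROWS]

# Correlation of each feature with the encoded class label.
relevance = {
    name: pearson([row[i] for row in ROWS], labels)
    for i, name in enumerate(FEATURES)
}

# Features with |r| close to 0 are candidates for removal.
for name, r in sorted(relevance.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.3f}")
```

Each feature is scored on its own, so the drawback listed above still applies: two features that are only informative in combination would both receive low scores.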
Are all the Variables Independent?
F1 F2 F3 F4 F5 T
7 18 7 11 22 A
1 5 1 18 36 B
0 15 0 2 4 B
7 5 7 12 24 B
1 15 1 12 24 A
3 20 3 6 12 B
0 5 0 18 36 B
7 10 7 12 24 B
10 8 10 20 40 A
9 20 9 17 34 A
A feature can be removed from
the feature set if it provides no
more information than the other
features already provide.
How to identify redundant features?
Correlation between features
$$\mathrm{Corr}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \mu_y)^2}}$$
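The redundancy check can be illustrated on the table from this slide. The sketch below computes the correlation for every pair of features and flags pairs whose absolute correlation exceeds a threshold; the threshold of 0.999 and the names `corr`, `THRESHOLD` and `redundant` are assumptions made for this example, not values from the slides.

```python
import math
from itertools import combinations

# Feature columns of the toy table on this slide.
COLUMNS = {
    "F1": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F2": [18, 5, 15, 5, 15, 20, 5, 10, 8, 20],
    "F3": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F4": [11, 18, 2, 12, 12, 6, 18, 12, 20, 17],
    "F5": [22, 36, 4, 24, 24, 12, 36, 24, 40, 34],
}


def corr(xs, ys):
    """Corr(x, y) as defined on the slide (Pearson correlation)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) * math.sqrt(
        sum((y - my) ** 2 for y in ys)
    )
    return num / den


THRESHOLD = 0.999  # assumed cut-off for "provides no new information"

# Pairs of features that are (almost) perfectly linearly related.
redundant = [
    (a, b)
    for a, b in combinations(COLUMNS, 2)
    if abs(corr(COLUMNS[a], COLUMNS[b])) > THRESHOLD
]
print(redundant)
```

On this table F3 is a copy of F1 and F5 is exactly twice F4, so one feature from each of those pairs can be dropped.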
Wrapper Methods
Wrapper methods also test for relationships between variables.
The wrapper method essentially selects a subset of features and feeds
the reduced data to a classifier. A heuristic value, which is the accuracy
of the classifier on the newly reduced data, is obtained.
The wrapper tests different subsets until it finds the feature subset
that gives the best value of the heuristic among those it has tried.
The resulting feature subset, or reduced feature space, is therefore tuned to
that particular classifier.
It is considerably more complex and time consuming than the filter
method, but usually performs better with the chosen classifier.
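A wrapper loop of this kind might look like the following sketch. It is not taken from the slides: it assumes a 1-nearest-neighbour classifier, leave-one-out accuracy as the heuristic, and greedy forward selection as the search strategy, all chosen only to keep the example small. The names `loo_accuracy` and `forward_selection` are hypothetical.

```python
# Toy table from the earlier slides: five features and a class label.
ROWS = [
    (7, 18, 7, 11, 22, "A"),
    (1, 5, 1, 18, 36, "B"),
    (0, 15, 0, 2, 4, "B"),
    (7, 5, 7, 12, 24, "B"),
    (1, 15, 1, 12, 24, "A"),
    (3, 20, 3, 6, 12, "B"),
    (0, 5, 0, 18, 36, "B"),
    (7, 10, 7, 12, 24, "B"),
    (10, 8, 10, 20, 40, "A"),
    (9, 20, 9, 17, 34, "A"),
]
FEATURES = ["F1", "F2", "F3", "F4", "F5"]


def loo_accuracy(cols):
    """Heuristic: leave-one-out accuracy of 1-NN using only columns `cols`."""
    correct = 0
    for i, row in enumerate(ROWS):
        # Nearest other row by squared Euclidean distance on the subset.
        nearest = min(
            (other for j, other in enumerate(ROWS) if j != i),
            key=lambda other: sum((row[c] - other[c]) ** 2 for c in cols),
        )
        correct += nearest[-1] == row[-1]
    return correct / len(ROWS)


def forward_selection():
    """Greedily add the feature that most improves the heuristic."""
    selected, best_score = [], -1.0
    remaining = list(range(len(FEATURES)))
    while remaining:
        scored = [(loo_accuracy(selected + [c]), c) for c in remaining]
        score, col = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the heuristic any further
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


selected, score = forward_selection()
print("selected features:", [FEATURES[c] for c in selected])
print("leave-one-out accuracy:", score)
```

The classifier is retrained and evaluated for every candidate subset, which is where the extra cost relative to filter methods comes from.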
Wait... Subset Selection is NP-Hard!!!
Selecting the optimal subset of features is an NP-hard problem,
so approximations are usually used.
Simple approximations include specifying the maximum number of features or iterations.
Typical heuristic-based search algorithms such as Hill Climbing, Steepest
Ascent Hill Climbing, Simulated Annealing etc. are used.
Wrapper methods usually fall into three categories:
Forward Selection
Backward Elimination
Recursive Selection/Elimination
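To see why exhaustive search is avoided, it helps to count how many subsets each strategy evaluates. The sketch below compares an exhaustive wrapper, which tries every non-empty subset of d features, with greedy forward selection, which tries at most d + (d - 1) + ... + 1 subsets. The function names are made up for this illustration.

```python
def exhaustive_evaluations(d):
    """Number of non-empty feature subsets of d features."""
    return 2 ** d - 1


def forward_selection_evaluations(d):
    """Upper bound on subsets tried by greedy forward selection."""
    return d * (d + 1) // 2


for d in (5, 10, 20, 40):
    print(
        f"d = {d:2d}: exhaustive = {exhaustive_evaluations(d):,}, "
        f"forward selection <= {forward_selection_evaluations(d):,}"
    )
```

The greedy search gives up the guarantee of finding the best subset in exchange for a number of classifier evaluations that grows quadratically rather than exponentially.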
Feature Extraction
Feature selection involves loss of data due to the loss of features.
Even though care is taken to remove dimensions which are unlikely to
contribute much to data mining, it is still encouraged to retain all the
input data in one way or the other.
So, how do we retain all the input data and still reduce the number of dimensions?
Feature Extraction maps the data from a higher dimensional feature space
to a lower dimensional feature space without much loss of information.
The most common feature extraction technique is Principal Component
Analysis (PCA).
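As a rough illustration of feature extraction, the sketch below applies the core step of PCA to the two columns F4 and F5 from the earlier table. It assumes the usual formulation (centre the data, build the covariance matrix, take its dominant eigenvector) and finds that eigenvector by power iteration, a choice made here only to avoid external libraries. All variable names are illustrative.

```python
import math

# Two perfectly related features from the earlier table (F5 = 2 * F4).
F4 = [11, 18, 2, 12, 12, 6, 18, 12, 20, 17]
F5 = [22, 36, 4, 24, 24, 12, 36, 24, 40, 34]


def center(values):
    """Subtract the mean so the data is centred at the origin."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


x, y = center(F4), center(F5)
n = len(x)

# Entries of the 2x2 (population) covariance matrix.
cxx = sum(a * a for a in x) / n
cyy = sum(b * b for b in y) / n
cxy = sum(a * b for a, b in zip(x, y)) / n

# Power iteration: repeatedly multiply by the covariance matrix and
# normalise, which converges to the first principal component.
v = (1.0, 0.0)
for _ in range(100):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = math.hypot(w[0], w[1])
    v = (w[0] / norm, w[1] / norm)

# Variance captured along v, as a fraction of the total variance.
eigenvalue = v[0] * (cxx * v[0] + cxy * v[1]) + v[1] * (cxy * v[0] + cyy * v[1])
explained = eigenvalue / (cxx + cyy)

# One-dimensional representation of the two-dimensional data.
projected = [a * v[0] + b * v[1] for a, b in zip(x, y)]

print("first principal component:", v)
print("fraction of variance retained:", explained)
```

Because F5 is an exact multiple of F4, a single extracted dimension retains essentially all of the variance here; on real data some variance is normally lost when dimensions are dropped.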
The Idea behind PCA

Dimensionality Reduction
