"The Curse of Dimensionality"
October 9, 2019
Amit Praseed Classification October 9, 2019 1 / 12
Are all Features Equally Important?
F1 F2 F3 F4 F5 F6 T
7 18 7 11 22 1 B
1 5 1 18 36 0 B
0 15 0 2 4 1 B
7 5 7 12 24 0 A
1 15 1 12 24 1 B
3 20 3 6 12 2 B
0 5 0 18 36 1 B
7 10 7 12 24 0.5 A
10 8 10 20 40 1 A
9 20 9 17 34 0.5 A
Do we really need so many Dimensions?
Dimensionality reduction techniques are broadly classified into two
categories:
Feature Selection: These techniques select a subset of the original dimensions.
  Filter Methods: Evaluate the importance of each feature one by one.
  Wrapper Methods: Evaluate different subsets of features, and test their
  performance on a classifier to select the best subset.
  Embedded Methods: Certain classification algorithms, such as Decision
  Trees, select a useful subset of features as part of building the model.
Feature Extraction: These techniques transform the data to a lower
dimensional space without much loss of data.
  E.g. Principal Component Analysis (PCA)
Filter Methods
Filter methods inspect each independent variable individually, or
occasionally inspect a single independent variable together with the
dependent variable (which is the class label in classification).
Advantages:
Fast and Simple.
Works well for simple applications.
Disadvantages:
Does not consider relationships between variables.
It is a very general technique and not tied to a particular classifier. So
there is no guarantee that the newly reduced data will perform well on
all classifiers.
Eliminate Dimensions based on Variance
F1 F2 F3 F4 F5 F6 T
7 18 7 11 22 1 B
1 5 1 18 36 0 B
0 15 0 2 4 1 B
7 5 7 12 24 0 A
1 15 1 12 24 1 B
3 20 3 6 12 2 B
0 5 0 18 36 1 B
7 10 7 12 24 0.5 A
10 8 10 20 40 1 A
9 20 9 17 34 0.5 A
Variance measures the spread of a random variable around its mean value.
$\mathrm{Var}(X) = E[(X - \mu)^2]$
Dimensions with low variance can be eliminated with minimal loss of
information, provided the features are measured on comparable scales.
In this case, F6 has very low variance compared to the other dimensions
and hence can be ignored.
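The variance filter described above can be sketched in a few lines of Python. This is only an illustration on the table from this slide: the cut-off `THRESHOLD = 1.0` is a hypothetical value picked for the example, not something the slides specify, and population variance is used to match the formula Var(X) = E[(X - mu)^2].

```python
from statistics import pvariance

# Columns of the table on this slide (class column T omitted).
features = {
    "F1": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F2": [18, 5, 15, 5, 15, 20, 5, 10, 8, 20],
    "F3": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F4": [11, 18, 2, 12, 12, 6, 18, 12, 20, 17],
    "F5": [22, 36, 4, 24, 24, 12, 36, 24, 40, 34],
    "F6": [1, 0, 1, 0, 1, 2, 1, 0.5, 1, 0.5],
}

# Population variance, matching Var(X) = E[(X - mu)^2].
variances = {name: pvariance(values) for name, values in features.items()}

# THRESHOLD is a hypothetical cut-off chosen for this illustration only.
THRESHOLD = 1.0
kept = [name for name, v in variances.items() if v > THRESHOLD]
dropped = [name for name, v in variances.items() if v <= THRESHOLD]

for name, v in variances.items():
    print(f"{name}: variance = {v:.2f}")
print("kept:", kept)
print("dropped:", dropped)
```

On this table only F6 falls below the assumed threshold. In practice the threshold would have to be chosen per dataset, and features on very different scales would need rescaling before their variances are compared.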
Do all of the Features Influence the Output?
F1 F2 F3 F4 F5 T
7 18 7 11 22 A
1 5 1 18 36 B
0 15 0 2 4 B
7 5 7 12 24 B
1 15 1 12 24 A
3 20 3 6 12 B
0 5 0 18 36 B
7 10 7 12 24 B
10 8 10 20 40 A
9 20 9 17 34 A
A feature can be regarded as irrelevant if it is conditionally
independent of the class labels.
How to identify irrelevant features?
  Pearson's correlation between a numeric feature and a numeric output
  variable
  Point-biserial correlation between a binary nominal variable and a
  numeric variable
  Cramér's V between two nominal variables
Advantage: Simple, and works well for certain datasets.
Drawback: Does not consider relationships between variables.
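As a rough sketch of the relevance check listed above, the point-biserial correlation between each numeric feature and the two-valued class T can be computed as Pearson's correlation with the class coded as 0/1. The coding A = 1, B = 0 is an arbitrary assumption made for this example (swapping it only flips the sign), and the helper name `pearson` is illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

# Columns of the table on this slide.
features = {
    "F1": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F2": [18, 5, 15, 5, 15, 20, 5, 10, 8, 20],
    "F3": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F4": [11, 18, 2, 12, 12, 6, 18, 12, 20, 17],
    "F5": [22, 36, 4, 24, 24, 12, 36, 24, 40, 34],
}
labels = list("ABBBABBBAA")

# Assumed coding: A -> 1, B -> 0.
y = [1 if t == "A" else 0 for t in labels]

# With a 0/1-coded class, Pearson's r is the point-biserial correlation.
relevance = {name: pearson(xs, y) for name, xs in features.items()}

# Print features from most to least correlated with the class.
for name, r in sorted(relevance.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.3f}")
```

Features whose |r| is close to zero are candidates for removal. Because each feature is scored on its own, this check cannot tell that two features carry the same information, which is the drawback noted on the slide.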
Are all the Variables Independent?
F1 F2 F3 F4 F5 T
7 18 7 11 22 A
1 5 1 18 36 B
0 15 0 2 4 B
7 5 7 12 24 B
1 15 1 12 24 A
3 20 3 6 12 B
0 5 0 18 36 B
7 10 7 12 24 B
10 8 10 20 40 A
9 20 9 17 34 A
A feature can be removed from the feature set if it provides no more
information than already provided.
How to identify redundant features?
Correlation between features
$$\mathrm{Corr}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \mu_y)^2}}$$
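The correlation formula above can be applied pairwise to spot redundant features. The sketch below does this for the table on this slide. The cut-off `REDUNDANCY_THRESHOLD = 0.95` and the function name `corr` are assumptions made for the illustration, not part of the original slides.

```python
from itertools import combinations
from math import sqrt

def corr(xs, ys):
    """Corr(x, y) as defined above: covariance sum over the product
    of the two root sums of squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den

features = {
    "F1": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F2": [18, 5, 15, 5, 15, 20, 5, 10, 8, 20],
    "F3": [7, 1, 0, 7, 1, 3, 0, 7, 10, 9],
    "F4": [11, 18, 2, 12, 12, 6, 18, 12, 20, 17],
    "F5": [22, 36, 4, 24, 24, 12, 36, 24, 40, 34],
}

# Hypothetical cut-off: pairs with |Corr| above it are treated as redundant.
REDUNDANCY_THRESHOLD = 0.95

redundant_pairs = [
    (a, b)
    for a, b in combinations(features, 2)
    if abs(corr(features[a], features[b])) > REDUNDANCY_THRESHOLD
]
print(redundant_pairs)
```

In this table F3 is an exact copy of F1 and F5 is exactly twice F4, so both pairs have correlation 1 and one feature from each pair can be dropped without losing information.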
Wrapper Methods
Wrapper methods also test for relationships between variables.
The wrapper method essentially selects a subset of features and feeds
the reduced data to a classifier. A heuristic value, which is the accuracy
of the classifier on the newly reduced data, is obtained.
The wrapper tests different subsets until it obtains the feature subset
which gives the best value of the heuristic.
The resulting feature subset (the reduced feature space) is tuned to
that particular classifier.
It is considerably more complex and time consuming than the filter
method, but usually gives better accuracy for the chosen classifier.
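A minimal sketch of the wrapper loop described above, using the five-feature table from the earlier slides. The classifier (1-nearest-neighbour) and the heuristic (leave-one-out accuracy) are assumptions chosen to keep the example self-contained, as the slides do not prescribe either. With only five features every non-empty subset can be scored exhaustively.

```python
from itertools import combinations

names = ["F1", "F2", "F3", "F4", "F5"]
rows = [
    (7, 18, 7, 11, 22), (1, 5, 1, 18, 36), (0, 15, 0, 2, 4),
    (7, 5, 7, 12, 24), (1, 15, 1, 12, 24), (3, 20, 3, 6, 12),
    (0, 5, 0, 18, 36), (7, 10, 7, 12, 24), (10, 8, 10, 20, 40),
    (9, 20, 9, 17, 34),
]
labels = list("ABBBABBBAA")

def loo_accuracy(cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in cols."""
    correct = 0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        # Squared Euclidean distance restricted to the chosen columns.
        nearest = min(
            others,
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in cols),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

# Score every non-empty subset: 2**5 - 1 = 31 candidates.
scored = [
    (loo_accuracy(cols), cols)
    for k in range(1, len(names) + 1)
    for cols in combinations(range(len(names)), k)
]
best_score, best_cols = max(scored, key=lambda sc: sc[0])

print("subsets tried:", len(scored))
print("best subset:", [names[c] for c in best_cols], "accuracy:", best_score)
```

The subset found this way is only the best for this classifier and this accuracy estimate. A different classifier could prefer a different subset.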
Wait... Subset Selection is NP-Hard!!!
Selecting the optimal subset of features is an NP-hard problem (n
features give 2^n - 1 non-empty subsets), so approximations are usually
used.
Simple approximations include specifying the maximum number of
features or iterations.
Typical heuristic-based search algorithms such as Hill Climbing,
Steepest Ascent Hill Climbing, Simulated Annealing etc. are used.
Wrapper methods usually fall into three categories:
Forward Selection
Backward Elimination
Recursive Selection/Elimination
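Forward Selection, the first category listed above, can be sketched as a greedy hill climb: start with no features, add the one that improves the heuristic most, and stop when no addition helps. As before, the 1-nearest-neighbour classifier, the leave-one-out accuracy heuristic and the function names are assumptions made for this illustration.

```python
names = ["F1", "F2", "F3", "F4", "F5"]
rows = [
    (7, 18, 7, 11, 22), (1, 5, 1, 18, 36), (0, 15, 0, 2, 4),
    (7, 5, 7, 12, 24), (1, 15, 1, 12, 24), (3, 20, 3, 6, 12),
    (0, 5, 0, 18, 36), (7, 10, 7, 12, 24), (10, 8, 10, 20, 40),
    (9, 20, 9, 17, 34),
]
labels = list("ABBBABBBAA")

def loo_accuracy(cols):
    """Leave-one-out accuracy of 1-nearest-neighbour on the given columns."""
    correct = 0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in cols),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def forward_selection():
    """Greedily add the feature that raises accuracy most; stop when
    no remaining feature gives a strict improvement."""
    selected, best = [], 0.0
    remaining = list(range(len(names)))
    while remaining:
        # Ties on score are broken by the higher column index.
        score, col = max((loo_accuracy(selected + [c]), c) for c in remaining)
        if score <= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

selected, best = forward_selection()
print("selected:", [names[c] for c in selected], "accuracy:", best)
```

This evaluates at most 5 + 4 + 3 + 2 + 1 = 15 subsets instead of all 31, at the cost of possibly missing the best subset. Backward Elimination runs the same loop in reverse, starting from all features and removing one at a time.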
Feature Extraction
Feature selection involves loss of data due to the loss of features.
Even though care is taken to remove only dimensions which are unlikely
to contribute much to data mining, it is still preferable to retain
all the input data in one way or another.
So, how can we retain the input data and still reduce the number of
dimensions?
Feature Extraction maps the data from a higher dimensional feature
space to a lower dimensional feature space without much loss of data.
The most common feature extraction technique is Principal Component
Analysis (PCA).
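To make the mapping to a lower dimensional space concrete, here is a small sketch of PCA on just two columns of the earlier table, F4 and F5, reduced to one dimension. It uses the closed-form eigen-decomposition of a 2 x 2 covariance matrix, so it does not generalise beyond two features. The variable names are illustrative and the choice of these two columns is an assumption made for the example.

```python
from math import sqrt

f4 = [11, 18, 2, 12, 12, 6, 18, 12, 20, 17]
f5 = [22, 36, 4, 24, 24, 12, 36, 24, 40, 34]
n = len(f4)

# Centre both features on their means.
m4, m5 = sum(f4) / n, sum(f5) / n
c4 = [v - m4 for v in f4]
c5 = [v - m5 for v in f5]

# Sample covariance matrix [[a, b], [b, c]].
a = sum(x * x for x in c4) / (n - 1)
b = sum(x * y for x, y in zip(c4, c5)) / (n - 1)
c = sum(y * y for y in c5) / (n - 1)

# Eigenvalues of a symmetric 2 x 2 matrix in closed form.
mid = (a + c) / 2
half_gap = sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = mid + half_gap, mid - half_gap

# Eigenvector for the larger eigenvalue (valid because b != 0 here).
vx, vy = b, lam1 - a
norm = sqrt(vx * vx + vy * vy)
pc1 = (vx / norm, vy / norm)

# Project the centred 2-D points onto the first principal component.
projected = [x * pc1[0] + y * pc1[1] for x, y in zip(c4, c5)]
explained = lam1 / (lam1 + lam2)

print("first principal component:", pc1)
print("share of variance retained:", explained)
```

Because F5 is exactly twice F4 in this table, all points lie on one line and a single component retains essentially all of the variance. On real data the retained share is usually below 1, which is why the slide says "without much loss" rather than no loss.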
The Idea behind PCA
