Feature Selection Methods
Zahra Mojtahedin
Fall 2024
Outline
Introduction
Filter Methods
Wrapper Methods
Embedded Methods
Results
Conclusion
Introduction
• The problem of reducing irrelevant and redundant variables.
• Feature selection:
1. Understanding the data
2. Reducing computation requirements
3. Improving predictor performance
Filter Methods
• Variable ranking techniques.
• They are applied before classification, independently of the classifier.
• Correlation criteria:
• The Pearson correlation coefficient: R(i) = cov(x_i, Y) / sqrt(var(x_i) * var(Y))
• Python: numpy.corrcoef
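A minimal sketch of Pearson-based feature ranking with numpy.corrcoef; the pearson_rank helper and the toy data are illustrative assumptions:

```python
import numpy as np

def pearson_rank(X, y):
    """Rank features by absolute Pearson correlation with the target.

    X: (n_samples, n_features) array, y: (n_samples,) array.
    Returns (indices sorted from most to least correlated, scores).
    """
    # np.corrcoef of two 1-D arrays returns a 2x2 matrix; entry [0, 1]
    # is the correlation between feature j and the target.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

# Toy usage: feature 0 and feature 2 drive the target, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=100)
order, scores = pearson_rank(X, y)
print(order, np.round(scores, 2))
```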
Filter Methods
• Mutual Information (MI): a measure of the dependency between two variables.
• The uncertainty in the output Y (entropy): H(Y) = -Σ_y p(y) log p(y)
• The conditional entropy: H(Y|X) = -Σ_x p(x) Σ_y p(y|x) log p(y|x)
• The mutual information: I(Y; X) = H(Y) - H(Y|X)
• Python: sklearn.feature_selection.mutual_info_classif
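A minimal sketch of MI-based filtering with sklearn.feature_selection.mutual_info_classif; the breast-cancer dataset is used purely as an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

# Estimate MI between each feature and the class label, then rank features.
X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]
print("Top 5 features by estimated MI:", ranking[:5])
```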
Filter Methods
• Feature ranking:
1. Computationally light
2. Avoids overfitting
3. Works well for certain datasets
4. The selected subset might not be optimal:
   • Correlation between variables is not considered, so redundant features may be selected.
Wrapper Methods
• Use the predictor as a black box.
• Objective: evaluate variable subsets based on predictor performance.
• Search algorithms are employed to find suboptimal feature subsets, since exhaustive search is intractable.
• These methods aim to balance computational feasibility and good results.
• Classified into Sequential Selection Algorithms and Heuristic Search Algorithms.
• Sequential selection starts with an empty set (or the full set), adding or removing features one at a time.
Sequential Selection Algorithms:
• These algorithms are iterative in nature (a minimal sketch follows this list).
1. Sequential Feature Selection (SFS): Starts with an empty set and adds the single feature with the highest objective function value. New features are added iteratively, provided they increase classification accuracy, until the desired number of features is included.
2. Sequential Backward Selection (SBS): Similar to SFS but starts with all variables and removes one at a time, discarding the feature whose removal has the lowest impact on predictor performance.
3. Sequential Floating Forward Selection (SFFS): Enhances SFS with a backtracking step. Features are added as in SFS and then conditionally excluded again if their removal improves the objective function. Repeats until the required number of features or the target performance is reached.
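A minimal sketch of SFS and SBS using scikit-learn's SequentialFeatureSelector with a linear SVM as the black-box predictor; the dataset, subset size, and cross-validation setting are illustrative assumptions. (Scikit-learn does not provide the floating SFFS variant; mlxtend's SequentialFeatureSelector with floating=True is one option for that.)

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
clf = SVC(kernel="linear")

# SFS: grow the subset one feature at a time, scored by cross-validated accuracy.
sfs = SequentialFeatureSelector(clf, n_features_to_select=5, direction="forward", cv=5)
sfs.fit(X, y)
print("SFS picked:", sfs.get_support(indices=True))

# SBS: start from all features and drop the least useful one at a time.
sbs = SequentialFeatureSelector(clf, n_features_to_select=5, direction="backward", cv=5)
sbs.fit(X, y)
print("SBS picked:", sbs.get_support(indices=True))
```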
Wrapper Methods
Heuristic Search Algorithms
• Genetic Algorithm (GA): Used for feature selection, where each chromosome bit represents the inclusion of a feature. It searches for the global maximum of the objective function, which is predictor performance (a simplified sketch follows this list).
• GA parameters and operators can be adapted to the specific data and application to optimize performance.
• Cluster-based Half Uniform Crossover Genetic Algorithm: a non-traditional GA with unique characteristics:
1. Selects the best N individuals from the combined parent and offspring pool.
2. Utilizes the highly disruptive Half Uniform Crossover (HUX), which swaps a random half of the non-matching alleles.
3. Mating only occurs between diverse parents: the Hamming distance between parents is calculated, and if it does not exceed a threshold, they do not mate.
4. If no offspring are generated (the threshold drops to zero), a cataclysmic mutation is introduced.
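A simplified GA sketch for feature selection: one bit per feature, with cross-validated SVM accuracy as the fitness. It uses plain elitist selection, uniform crossover, and bit-flip mutation rather than the HUX and Hamming-threshold scheme described above, and the dataset, population size, and rates are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature columns (0 if empty)."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=3).mean()

# Initial population: random binary chromosomes, one bit per feature.
pop_size, n_gen, p_mut = 20, 15, 0.05
population = rng.integers(0, 2, size=(pop_size, n_features)).astype(bool)

for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in population])
    # Elitist selection: the better half survives and becomes the parent pool.
    parents = population[np.argsort(scores)[::-1][: pop_size // 2]]
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_features) < 0.5, a, b)   # uniform crossover
        child ^= rng.random(n_features) < p_mut                # bit-flip mutation
        children.append(child)
    population = np.vstack([parents] + children)

best = population[np.argmax([fitness(ind) for ind in population])]
print("Selected features:", np.flatnonzero(best))
```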
Embedded Methods
Embedded Feature Selection Methods:
• Objective: reduce computation time by incorporating feature selection into the training process.
• Motivation: filter methods based on Mutual Information (MI) have limitations.
• Greedy search algorithm: an objective function for feature selection that maximizes the MI between a feature and the class output.
• Goal: maximize MI with the class output while minimizing MI between the candidate feature and the subset of previously selected features.
• Criterion: select the feature f that maximizes I(Y; f) - β Σ_{s∈S} I(f; s), where S is the set of already selected features and β weights the redundancy penalty (a minimal sketch follows this list).
• Selection based on inter-feature MI yields a non-redundant feature subset.
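A minimal sketch of this greedy criterion, estimating I(Y; f) with mutual_info_classif and the inter-feature terms I(f; s) with mutual_info_regression; the dataset, β value, and subset size are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True)
n_select, beta = 5, 0.5

relevance = mutual_info_classif(X, y, random_state=0)   # I(Y; f) for every feature
selected, remaining = [], list(range(X.shape[1]))

while len(selected) < n_select:
    best_f, best_score = None, -np.inf
    for f in remaining:
        # Redundancy: MI between candidate f and each already-selected feature s.
        redundancy = sum(
            mutual_info_regression(X[:, [s]], X[:, f], random_state=0)[0]
            for s in selected
        )
        score = relevance[f] - beta * redundancy
        if score > best_score:
            best_f, best_score = f, score
    selected.append(best_f)
    remaining.remove(best_f)

print("Greedily selected features:", selected)
```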
• Classifier weight-based ranking: features are ranked for removal based on classifier weights w_j (a minimal sketch follows this list).
• w_j = (μ_j(+) - μ_j(-)) / (σ_j(+) + σ_j(-)), where μ_j(±) and σ_j(±) are the mean and standard deviation of feature j over the positive and negative classes.
• Features are ranked in proportion to their contribution to this correlation-like criterion.
• Sensitivity analysis on w_j is used for feature selection.
• Suggested method: track the change in the objective function, a linear discriminant function J based on w_j.
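A minimal sketch of this weight-based ranking, assuming w_j is computed from the per-class means and standard deviations as written above; the dataset is an illustrative assumption. For the sensitivity-analysis flavour, scikit-learn's RFE with a linear SVM is a closely related, readily available alternative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
pos, neg = X[y == 1], X[y == 0]

# w_j = (mu_j(+) - mu_j(-)) / (sigma_j(+) + sigma_j(-)):
# a large |w_j| means feature j separates the two classes well on its own.
w = (pos.mean(axis=0) - neg.mean(axis=0)) / (pos.std(axis=0) + neg.std(axis=0))

ranking = np.argsort(np.abs(w))[::-1]
print("Top 5 features by |w_j|:", ranking[:5])
```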
Results
• Datasets: from the UCI Machine Learning Repository and MKS Instruments
• SVM classifier
• Results for correlation criteria using SVM (figure)
• Results for MI using SVM (figure)
• Results for SFFS using SVM (figure)
• An example of Mutual Information (figure)
Conclusion
• More information is not always better in machine learning applications.
• Applying feature selection provides:
1. Insight into the data
2. A better classifier model
3. Enhanced generalization
4. Identification of irrelevant variables
Thank You For Your Attention!