Introduction to Data Science
Lab 8
(Feature Selection)
Feature Selection
There are three main approaches to feature selection:
• Filter Methods
• Wrapper Methods
• Embedded Methods
Univariate Selection: Filter Method
• The filter method ranks each feature based on some univariate metric and then selects the highest-ranking features. Common criteria include:
  • Variance
  • No duplicate columns
  • High correlation
  • Chi-square test
  • Mutual information between each independent variable and the target
• The filter method looks at individual features to identify their relative importance.
• A feature may not be useful on its own but may be an important influencer when combined with other features. Filter methods may miss such features.
Filter Method: Variance
Remove features whose variance is zero, i.e., every row has the same value; such features carry no information for the model.
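A minimal sketch using sklearn's VarianceThreshold; the DataFrame and column names here are hypothetical:

    import pandas as pd
    from sklearn.feature_selection import VarianceThreshold

    # Hypothetical data: 'const' has zero variance (the same value in every row)
    df = pd.DataFrame({
        "const": [1, 1, 1, 1],
        "age":   [23, 45, 31, 52],
        "score": [0.5, 0.9, 0.3, 0.7],
    })

    # threshold=0 (the default) removes only zero-variance features
    selector = VarianceThreshold(threshold=0)
    selector.fit(df)
    print(list(df.columns[selector.get_support()]))  # ['age', 'score']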
Filter Method: No duplicate Columns
Remove any column that is an exact duplicate of another column, since it adds no new information.
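One way to do this in pandas, shown as a sketch with hypothetical column names (transposing lets duplicated() compare whole columns):

    import pandas as pd

    # Hypothetical data: column 'b' is an exact copy of column 'a'
    df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})

    # Keep the first occurrence of each distinct column
    deduped = df.loc[:, ~df.T.duplicated()]
    print(list(deduped.columns))  # ['a', 'c']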
Filter Method: Correlation
• When two or more features are mutually correlated, they convey redundant information to the model, so only one of the correlated features should be retained to reduce the number of features.
• If independent features are correlated with the dependent feature (target variable), you don't need to remove them.
• If two independent features are correlated with each other by 80% or 90%, drop one of them and train the model with the remaining features (see the sketch below).
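A minimal sketch of dropping one feature from each highly correlated pair; the drop_correlated helper, the 0.9 threshold, and the feature names are all illustrative assumptions:

    import numpy as np
    import pandas as pd

    def drop_correlated(df, threshold=0.9):
        """Drop one feature from every pair whose |correlation| exceeds threshold."""
        corr = df.corr().abs()
        # Keep only the upper triangle so each pair is considered once
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
        return df.drop(columns=to_drop)

    # Hypothetical features: x2 is almost a copy of x1
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    X = pd.DataFrame({"x1": x1, "x2": x1 * 1.01, "x3": rng.normal(size=100)})
    print(list(drop_correlated(X).columns))  # ['x1', 'x3']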
Chi-Square Calculation: An Example
χ² (chi-square) is calculated as follows; the numbers in parentheses are the expected counts, computed from the row and column totals of the contingency table:

                              Play chess    Not play chess    Sum (row)
    Like science fiction       250 (90)        200 (360)          450
    Not like science fiction    50 (210)      1000 (840)         1050
    Sum (col.)                 300            1200               1500

χ² = (250 - 90)²/90 + (50 - 210)²/210 + (200 - 360)²/360 + (1000 - 840)²/840
   = 284.44 + 121.90 + 71.11 + 30.48
   = 507.93

Since 507.93 is far greater than 10.828 (the chi-square critical value at the 0.001 significance level with 1 degree of freedom), like_science_fiction and play_chess are correlated in this group.
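The same statistic can be checked with scipy; correction=False disables Yates' continuity correction so the result matches the hand calculation above:

    from scipy.stats import chi2_contingency

    # Observed counts from the contingency table above
    observed = [[250, 200],    # like science fiction
                [50, 1000]]    # not like science fiction

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(f"{chi2:.1f}")  # 507.9
    print(expected)       # [[ 90. 360.] [210. 840.]]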
Filter Method: Chi Square Test & Mutual Information
For the chi-square test, the data must be categorical: encode categorical values as non-negative integers before computing the statistic.
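A minimal sketch using sklearn's SelectKBest with both scoring functions; the iris dataset is used purely as an illustration (its features are already non-negative, as chi2 requires — string-valued categorical features would first need integer encoding, e.g. with OrdinalEncoder):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

    X, y = load_iris(return_X_y=True)

    # Keep the 2 features with the highest chi-square score
    X_chi2 = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)

    # The same selection driven by mutual information instead
    X_mi = SelectKBest(score_func=mutual_info_classif, k=2).fit_transform(X, y)

    print(X_chi2.shape, X_mi.shape)  # (150, 2) (150, 2)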
Wrapper Method
• The feature selection process is based on a specific machine learning algorithm that is to be applied to a particular dataset.
• It follows a greedy search approach, evaluating combinations of features against the evaluation criterion.
• Wrapper methods include the following types:
  • Forward Elimination
  • Backward Elimination
Forward Elimination
The procedure starts with an empty set of features.
The best of the original features is determined and added to the
reduced set.
At each subsequent iteration, the best of the remaining original
attributes is added to the set.
SequentialFeatureSelector from sklearn can be used for forward elimination by setting the parameter direction='forward', as sketched below.
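A minimal sketch; the LogisticRegression estimator, the iris data, and n_features_to_select=2 are illustrative choices, not part of the lab:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Greedily add one feature at a time until 2 are selected,
    # scoring each candidate subset with cross-validation
    sfs = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=2,
        direction="forward",
    )
    sfs.fit(X, y)
    print(sfs.get_support())  # boolean mask of the selected features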
Forward Elimination
ExtraTreesClassifier from sklearn can also be used for feature selection, by ranking features on their impurity-based importance scores and keeping the top-ranked ones.
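A sketch of importance-based selection with ExtraTreesClassifier; pairing it with SelectFromModel is one common pattern, not necessarily the one used in the lab:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = load_iris(return_X_y=True)

    # Fit the tree ensemble and inspect impurity-based importances
    model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(model.feature_importances_)

    # Keep features whose importance exceeds the mean importance (the default)
    X_reduced = SelectFromModel(model, prefit=True).transform(X)
    print(X_reduced.shape)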
Backward Elimination
The procedure starts with the full set of features.
At each step, it removes the worst attribute remaining in the set.
SequentialFeatureSelector from sklearn can be used for backward feature elimination by setting the parameter direction='backward', as sketched below.
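The same sketch as for forward elimination, with the direction flipped (estimator and feature count are again illustrative):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Start from all features and greedily drop the worst one at each step
    sfs = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=2,
        direction="backward",
    )
    sfs.fit(X, y)
    print(sfs.get_support())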
Comparison
Filter
• Filter methods do not use a machine learning model to determine whether a feature is good or bad.
• Filter methods are much faster than wrapper methods because they do not involve training models.
• Filter methods may fail to find the best subset of features when there is not enough data to model the statistical correlation of the features.
Wrapper
• Wrapper methods train a machine learning model on the features to decide whether each one is essential to the final model.
• Wrapper methods are computationally costly, and for massive datasets they are probably not the most effective feature selection method to consider.
• Wrapper methods can always provide the best subset of features because of their exhaustive nature.
• Using features from wrapper methods in your final machine learning model can lead to overfitting, since the wrapper has already trained models with those features, which can affect the model's true generalization power.
