Module 3
Advanced Feature Engineering and Feature Selection
Introduction to Feature Engineering
Feature engineering is the process of improving a model’s accuracy by using domain knowledge to select and transform the most relevant variables in raw data into features that better represent the underlying problem to the predictive model.
Feature Engineering
Feature engineering covers four broad areas:
▪ Feature Transformation – missing value imputation, handling categorical features, outlier detection, feature scaling
▪ Feature Construction
▪ Feature Selection
▪ Feature Extraction
Missing Value Imputation
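As a minimal sketch of this step, the example below uses scikit-learn's SimpleImputer on a small, invented DataFrame; the column names, values, and imputation strategies are assumptions chosen only for illustration.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with missing entries in a numeric and a categorical column
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31, np.nan],
    "city": ["Pune", "Delhi", np.nan, "Delhi", "Pune"],
})

# Numeric column: replace missing values with the column mean
num_imputer = SimpleImputer(strategy="mean")
df[["age"]] = num_imputer.fit_transform(df[["age"]])

# Categorical column: replace missing values with the most frequent category
cat_imputer = SimpleImputer(strategy="most_frequent")
df[["city"]] = cat_imputer.fit_transform(df[["city"]])

print(df)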
Handling Categorical Features
Outlier Detection
Interquartile range (IQR) = Upper Quartile − Lower Quartile = Q3 − Q1
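A minimal sketch of IQR-based outlier detection in Python; the 1.5 × IQR fence is the conventional rule, and the sample values are made up for illustration.

import numpy as np

# Hypothetical sample with one obvious outlier
values = np.array([12, 15, 14, 10, 13, 16, 11, 14, 95, 12])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1  # interquartile range = Q3 - Q1

# Conventional fences: points beyond 1.5 * IQR from the quartiles are flagged
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print("IQR:", iqr, "Outliers:", outliers)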
Feature Scaling
Why do we need feature scaling?
Feature scaling techniques fall into two categories: standardization and normalization.
Standardization (Z-score normalization)
Assume our dataset has random numeric values in the range 1 to 95,000, in random order. For illustration, consider a small dataset of just 10 values drawn from this range.
Looking at these values, their range is so wide that training a model on 10,000 such values would take a lot of time.
Standardization helps solve this problem by:
● Rescaling the values to a common scale, as z-scores with mean 0 and standard deviation 1.
● Keeping the relative spacing between the values intact.
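A minimal sketch of standardization with scikit-learn; the sample values are invented to mirror the 1 to 95,000 example above.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values spanning a wide range, as in the example above
x = np.array([[1], [250], [4_800], [12_000], [33_500],
              [47_000], [58_250], [69_900], [82_300], [95_000]], dtype=float)

scaler = StandardScaler()          # z-score: (x - mean) / std
x_std = scaler.fit_transform(x)

print(x_std.round(2).ravel())      # mean ~0, standard deviation ~1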
Normalization
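Normalization (min-max scaling) rescales each feature to a fixed range, typically [0, 1]. A minimal sketch, reusing the same invented values as in the standardization example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same hypothetical wide-range values as in the standardization sketch
x = np.array([[1], [250], [4_800], [12_000], [33_500],
              [47_000], [58_250], [69_900], [82_300], [95_000]], dtype=float)

# Min-max scaling: (x - min) / (max - min), mapping values into [0, 1]
x_norm = MinMaxScaler(feature_range=(0, 1)).fit_transform(x)
print(x_norm.round(3).ravel())     # smallest value -> 0.0, largest -> 1.0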
Feature Selection Techniques
Feature selection is a crucial step in the machine learning pipeline, involving
the selection of a subset of relevant features (variables, predictors) for use in
model construction. Effective feature selection can improve model
performance, reduce overfitting, and decrease training time.
The role of feature selection in machine learning is:
1. To reduce the dimensionality of the feature space.
2. To speed up the learning algorithm.
3. To improve the predictive accuracy of a classification algorithm.
There are several techniques for feature selection:
Filter Methods
▪ In filter methods, features are selected on the basis of statistical measures.
▪ These methods do not depend on the learning algorithm and choose features as a pre-processing step.
▪ They are faster and less computationally expensive than wrapper methods.
▪ When dealing with high-dimensional data, it is computationally cheaper to use filter methods.
▪ They are very good for removing duplicated, correlated, and redundant features, but they do not by themselves remove multicollinearity.
Information Gain
Information gain is the amount of information a feature provides for identifying the target value; it measures the reduction in entropy when the data is split on that feature. The information gain of each attribute is calculated with respect to the target values and used for feature selection.
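Information gain is closely related to mutual information. A minimal sketch using scikit-learn's mutual information estimator to rank features; the built-in iris dataset is used purely for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Estimate how much information each feature carries about the target
scores = mutual_info_classif(X, y, random_state=0)

# Rank features by score, highest (most informative) first
ranking = sorted(enumerate(scores), key=lambda s: s[1], reverse=True)
for idx, score in ranking:
    print(f"feature {idx}: {score:.3f}")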
Chi-square Test
The chi-square test is a technique to determine whether there is a relationship between two categorical variables. The chi-square statistic is calculated between each feature and the target variable, and the desired number of features with the best chi-square scores is selected.
Chi-square Test Example
Steps:
1. Define the null and alternative hypotheses:
Null hypothesis: there is no significant association between the two categorical variables.
Alternative hypothesis: there is a significant association between the two categorical variables.
2. Calculate the contingency table.
3. Calculate the expected values.
4. Calculate the chi-square value.
5. Compare the chi-square value with the critical value to accept or reject the null hypothesis.
Degrees of freedom = (r − 1)(c − 1)
Significance level = 0.05
In the worked example the chi-square value exceeds the critical value, so the null hypothesis is rejected; therefore, income level is a relevant feature for predicting subscription status.
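A minimal sketch of this test in Python using scipy; the contingency table of income level versus subscription status is invented for illustration.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = income level (low, medium, high),
# columns = subscription status (subscribed, not subscribed)
observed = np.array([
    [20, 80],
    [45, 55],
    [70, 30],
])

chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}, p-value = {p_value:.4f}")
# At a 0.05 significance level, reject the null hypothesis when p_value < 0.05,
# i.e. conclude the feature is associated with the target
if p_value < 0.05:
    print("Income level appears to be a relevant feature.")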
Fisher’s Score
Fisher score is one of the most widely used supervised feature selection methods. The algorithm returns the ranks of the variables based on their Fisher scores in descending order.
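A minimal NumPy sketch of the idea behind the Fisher score: for each feature, the between-class spread of the class means is divided by the within-class variance. This is one common formulation; dedicated implementations (e.g. in the skfeature package) may differ in details.

import numpy as np
from sklearn.datasets import load_iris

def fisher_score(X, y):
    """Fisher score per feature: between-class spread over within-class spread."""
    scores = np.zeros(X.shape[1])
    overall_mean = X.mean(axis=0)
    for j in range(X.shape[1]):
        numerator, denominator = 0.0, 0.0
        for c in np.unique(y):
            Xc = X[y == c, j]
            numerator += len(Xc) * (Xc.mean() - overall_mean[j]) ** 2
            denominator += len(Xc) * Xc.var()
        scores[j] = numerator / denominator
    return scores

X, y = load_iris(return_X_y=True)
scores = fisher_score(X, y)
# Rank features by Fisher score, highest first
print(np.argsort(scores)[::-1], scores.round(2))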
Missing Value Ratio
The missing value ratio can be used to evaluate each feature against a threshold value. It is computed as the number of missing values in a column divided by the total number of observations. Any variable whose ratio exceeds the threshold can be dropped.
Missing Value Ratio:
1. Calculate the missing value ratio for each feature by dividing the number of missing values by
the total number of instances in the dataset.
2. Set a threshold for the acceptable missing value ratio (e.g., 0.8, meaning a feature may have at most 80% of its values missing to be kept).
3. Filter out features that have a missing value ratio above the threshold.
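A minimal pandas sketch of this filter; the DataFrame and the 0.8 threshold mirror the steps above and are invented for illustration.

import numpy as np
import pandas as pd

# Hypothetical dataset: feature "c" is entirely missing
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1.0, np.nan, 3.0, 4.0, 5.0],
    "c": [np.nan, np.nan, np.nan, np.nan, np.nan],
})

threshold = 0.8                              # maximum acceptable missing ratio
missing_ratio = df.isna().mean()             # per column: missing values / total rows
keep = missing_ratio[missing_ratio <= threshold].index

print(missing_ratio.to_dict())               # {'a': 0.0, 'b': 0.2, 'c': 1.0}
df_filtered = df[keep]                       # column "c" is dropped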
Advanced Feature Selection
Wrapper Methods
Wrapper methods, also referred to as greedy algorithms, train a model using a subset of features in an iterative manner.
Based on the conclusions drawn from the previously trained model, features are added or removed.
The stopping criterion for selecting the best subset is usually pre-defined by the person training the model, for example when the performance of the model starts to decrease or when a specific number of features has been reached.
The main advantage of wrapper methods over filter methods is that they provide an optimal set of features for training the model, thus resulting in better accuracy than filter methods, but they are computationally more expensive.
Forward selection
Forward selection is an iterative process that begins with an empty set of features. In each iteration it adds a feature and evaluates the performance to check whether it improves. The process continues until the addition of a new variable/feature does not improve the performance of the model (see the sketch after backward elimination below).
Backward elimination
Backward elimination is also an iterative approach, but it is the opposite of forward selection.
This technique begins the process by considering all the features and removes the least
significant feature. This elimination process continues until removing the features does not
improve the performance of the model.
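A minimal sketch of both strategies using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24+); the estimator and the number of features to keep are arbitrary choices for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start from an empty set and add features one by one
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)

# Backward elimination: start from all features and remove the least useful ones
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="backward", cv=5
).fit(X, y)

print("forward keeps:", forward.get_support(indices=True))
print("backward keeps:", backward.get_support(indices=True))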
Recursive Feature Elimination
Recursive feature elimination is a recursive greedy optimization approach in which features are selected by recursively considering smaller and smaller subsets of features. An estimator is trained on each set of features, and the importance of each feature is determined from the estimator's coef_ attribute or its feature_importances_ attribute.
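A minimal scikit-learn sketch of recursive feature elimination; the estimator and the target number of features are arbitrary for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# At each step, the least important feature (by coef_) is eliminated
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
rfe.fit(X, y)

print("selected features:", rfe.get_support(indices=True))
print("ranking (1 = selected):", rfe.ranking_)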
Exhaustive Feature Selection
Exhaustive feature selection evaluates every possible feature subset by brute force, which makes it the most thorough (and most expensive) feature selection method. It tries each possible combination of features and returns the best-performing feature set.
How Exhaustive Feature Selection Works
1. Generate all possible feature subsets: for a dataset with n features, this means evaluating 2^n subsets (including the empty set).
2. Evaluate each subset: train and evaluate a model using each subset of features. The evaluation metric could be accuracy, precision, recall, F1 score, etc.
3. Select the best subset: Identify the subset of features that provides the best performance
according to the chosen evaluation metric.
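A minimal brute-force sketch with itertools that scores every non-empty subset of the iris features by cross-validated accuracy; a dedicated implementation such as mlxtend's ExhaustiveFeatureSelector offers the same idea with more options.

from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -1.0, None
# Enumerate every non-empty subset of feature indices (2^n - 1 subsets)
for size in range(1, n_features + 1):
    for subset in combinations(range(n_features), size):
        score = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, list(subset)], y, cv=5
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("best subset:", best_subset, "accuracy:", round(best_score, 3))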
Embedded Methods
1. Regularization
This method adds a penalty on the parameters (coefficients) of the machine learning model to avoid overfitting.
▪ Lasso Regression (L1 Regularization): Adds an L1 penalty (the absolute value of
the magnitude of coefficients) to the loss function. This can shrink some coefficients
to zero, effectively performing feature selection.
▪ Ridge Regression (L2 Regularization): Adds an L2 penalty (the square of the
magnitude of coefficients) to the loss function. While it does not perform feature
selection by shrinking coefficients to zero, it helps in reducing overfitting and
improving model generalization.
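A minimal sketch of embedded selection with L1 regularization, wrapping scikit-learn's Lasso in SelectFromModel; the dataset and the regularization strength alpha are arbitrary choices for illustration.

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# The L1 penalty shrinks some coefficients exactly to zero; those features are dropped
selector = SelectFromModel(Lasso(alpha=0.5)).fit(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("coefficients:", selector.estimator_.coef_.round(3))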
2. Tree-based methods
Decision Trees:
Decision Trees split the data into subsets based on the value of input features, and
the splits that provide the best separation (based on criteria like Gini impurity or
information gain) indicate the most important features.
The depth of the tree and the features selected for splits at various levels provide
insights into feature importance.
Random Forests:
Random Forests are ensembles of decision trees. They provide feature importance
by averaging the importance measures of each feature across all the trees.
Feature importance in Random Forests is typically calculated from the decrease in impurity (e.g., Gini impurity) attributable to each feature.
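A minimal sketch of tree-based importance with scikit-learn's RandomForestClassifier; the dataset and hyperparameters are illustrative only.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, averaged over all trees in the forest
for idx, importance in enumerate(forest.feature_importances_):
    print(f"feature {idx}: {importance:.3f}")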
Automated Feature Engineering
Automated feature engineering aims to simplify and speed up the process of creating features from raw data by leveraging algorithms and tools. This approach reduces manual effort and can uncover complex patterns and interactions that might otherwise be missed.
Benefits of Automated Feature Engineering
● Speed: Quickly generates and evaluates a large number of features.
● Complexity Handling: Captures complex interactions and transformations that might be
difficult to manually specify.
● Consistency: Applies feature engineering techniques uniformly across different datasets and
tasks.
● Performance: Often improves model performance by discovering useful features that
enhance predictive power.
EvalML AutoML library to automate Feature Engineering
EvalML is an open-source Python library designed to automate and streamline the machine learning workflow, with a particular focus on end-to-end model development.
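A minimal sketch of how such a search is typically started with EvalML's AutoMLSearch; the demo dataset, the data split, and the binary problem type are assumptions chosen for the example, so consult the EvalML documentation for the full API.

import evalml
from evalml.automl import AutoMLSearch

# Load one of EvalML's demo datasets and split it (binary classification example)
X, y = evalml.demos.load_breast_cancer()
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X, y, problem_type="binary"
)

# Search over candidate pipelines; each pipeline bundles feature engineering steps
# (imputation, encoding, scaling) with an estimator
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type="binary")
automl.search()

print(automl.rankings.head())        # leaderboard of evaluated pipelines
best = automl.best_pipeline          # best pipeline, including its feature engineering steps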
Feature Engineering for Specific Data Types
1. Numerical Data
▪ Feature Scaling
▪ Power Transformations
2.Categorical Data
▪ One hot encoding
▪ Label encoding
▪ Target Encoding
3.Text Data
▪ Bag of Words (BoW)
▪ TF-IDF (Term Frequency-Inverse Document Frequency)
▪ Word Embeddings
4.Time-Series Data
▪ Lag
▪ Fourier Transforms
▪ Time-Based Features
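A minimal pandas sketch of two of these transformations, one-hot encoding for categorical data and lag/time-based features for time-series data; the column names and values are invented for illustration.

import pandas as pd

# Hypothetical daily sales data with a categorical "store" column
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "store": ["A", "B", "A", "B", "A", "B"],
    "sales": [100, 80, 120, 90, 130, 95],
})

# Categorical data: one-hot encoding
df = pd.get_dummies(df, columns=["store"], prefix="store")

# Time-series data: lag feature and simple time-based features
df["sales_lag_1"] = df["sales"].shift(1)        # previous day's sales
df["day_of_week"] = df["date"].dt.dayofweek     # 0 = Monday
df["month"] = df["date"].dt.month

print(df)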
